Clone or download the repository
Clone the repository from its source, or download and extract the archive.The repository contains two main data directories:
Install dependencies
The dataset examples use
pandas and pyarrow. Install them with your package manager of choice.Load a chapter from JSONL
Each chapter is available as a JSONL file with one record per line. Use Example output:
pandas.read_json with lines=True to load it into a DataFrame.load_jsonl.py
Load Parquet with pandas
Parquet files load faster than JSONL for large queries. Use the per-chapter files or the combined file.
load_parquet.py
ChatML format
Thedata/chatml/deltarune_story_chatml.jsonl file contains the same transcript data pre-formatted as ChatML message sequences, suitable for supervised fine-tuning of instruction-following models.
Each record in this file uses the standard ChatML structure with three roles:
chatml example
load_chatml.py