context: Scene identifier, e.g. "Scene: Obj Krisroom"
speaker: Speaker name or tag (Narrator, Player, or a character name)
text: The dialogue or narration text
Install dependencies
Clone the repository
Load data
Load a single chapter (JSONL)
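A minimal sketch of reading one chapter with only the Python standard library, assuming each line of the file is a single JSON object carrying the context, speaker, and text fields. The helper name load_chapter_jsonl is hypothetical and not part of the repository.

```python
import json
from pathlib import Path


def load_chapter_jsonl(path):
    """Read a JSONL file into a list of dicts, one per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records


# Example call, assuming this runs from the root of a local clone:
# rows = load_chapter_jsonl("data/chap1_dataset.jsonl")
# print(rows[0]["speaker"], rows[0]["text"])
```

Returning plain dicts keeps the sketch dependency-free; the same list can be handed to pandas.DataFrame(rows) if tabular operations are needed.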
Each chapter has its own JSONL file under data/. Available files: chap1_dataset.jsonl through chap4_dataset.jsonl.
Load a single chapter (Parquet)
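A minimal sketch of reading one chapter with pandas. The file name pattern chapN_dataset.parquet is an assumption that mirrors the JSONL naming, and both helper names are hypothetical; check the repository for the actual file names.

```python
from pathlib import Path


def chapter_parquet_path(chapter, root="parquet"):
    """Build the path for one chapter; the file name pattern is an assumption."""
    if chapter not in (1, 2, 3, 4):
        raise ValueError(f"expected a chapter from 1 to 4, got {chapter!r}")
    return Path(root) / f"chap{chapter}_dataset.parquet"


def load_chapter_parquet(chapter, root="parquet"):
    """Load one chapter into a pandas DataFrame."""
    # Imported here so the path helper stays usable without pandas installed.
    import pandas as pd

    # read_parquet needs a Parquet engine such as pyarrow or fastparquet.
    return pd.read_parquet(chapter_parquet_path(chapter, root))


# Example call, assuming this runs from the root of a local clone:
# df = load_chapter_parquet(1)
# print(df.head())
```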
Parquet files are stored under parquet/ and load faster than JSONL for large chapters.
Load all chapters combined
Use the combined Parquet file to query across all chapters without manually concatenating files.
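As an illustration of querying across chapters, the sketch below loads the combined file with pandas and filters on the speaker field. The combined file's name is not given here, so it is left as a placeholder, and the helper names are hypothetical. The only columns assumed are context, speaker, and text.

```python
import pandas as pd


def load_combined(path):
    """Load the combined Parquet file (needs pyarrow or fastparquet)."""
    return pd.read_parquet(path)


def lines_by_speaker(df, speaker):
    """Return the rows spoken by one speaker, keeping the original order."""
    return df.loc[df["speaker"] == speaker, ["context", "speaker", "text"]]


def line_counts(df):
    """Count lines per speaker across every chapter in the frame."""
    return df["speaker"].value_counts()


# Example call; replace the placeholder with the real combined file name:
# df = load_combined("parquet/<combined-file>.parquet")
# print(lines_by_speaker(df, "Narrator").head())
# print(line_counts(df).head(10))
```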
Load the ChatML file
The ChatML file is used for fine-tuning and contains multi-turn conversations in the OpenAI messages format.
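A minimal sketch of reading such a file, under two assumptions that should be checked against the actual data: the file is JSONL, and each line is an object with a "messages" list of {"role", "content"} entries. The role set below is deliberately narrow for illustration, and the function names are hypothetical.

```python
import json

# Roles accepted by this sketch; the real file may use others.
ALLOWED_ROLES = {"system", "user", "assistant"}


def parse_conversation(line):
    """Parse one JSONL line and return its list of message dicts."""
    record = json.loads(line)
    messages = record["messages"]
    for message in messages:
        if message["role"] not in ALLOWED_ROLES:
            raise ValueError(f"unexpected role: {message['role']!r}")
        if not isinstance(message["content"], str):
            raise ValueError("message content must be a string")
    return messages


def load_chatml(path):
    """Read every conversation from a JSONL file, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [parse_conversation(line) for line in handle if line.strip()]
```

Validating roles at load time surfaces malformed records before a fine-tuning job starts, rather than partway through it.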
Load from HuggingFace Hub
If the dataset has been published to the HuggingFace Hub, load it directly with the datasets library; no local clone is required.
The exact dataset identifier and available splits depend on how the dataset was published to the Hub. Check the dataset card on HuggingFace for the authoritative identifier and split names.
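A minimal sketch of the Hub route using load_dataset from the datasets library. The identifier and the "train" split in the example call are placeholders, not the real values, and the wrapper name load_from_hub is hypothetical.

```python
def load_from_hub(dataset_id, split=None):
    """Load a dataset from the Hub; returns a DatasetDict when split is None."""
    if not dataset_id:
        raise ValueError("pass the identifier shown on the dataset card")
    # Imported lazily so the check above works even without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(dataset_id, split=split)


# Example call with a placeholder identifier (replace it with the real one):
# train = load_from_hub("<namespace>/<dataset-name>", split="train")
# print(train[0])
```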