context, speaker, and text. This schema is consistent across JSONL, Parquet, and ChatML formats.
JSONL / Parquet fields
The scene identifier where this line occurs. Always formatted as
"Scene: <scene_name>". Scene names may be descriptive (e.g., "Scene: Cyber World") or internal object identifiers (e.g., "Scene: Obj Krisroom").The entity delivering this line. One of: a character name (e.g.,
"Susie", "Toriel"), "Narrator" for game narration and descriptions, or "Player" for player choice options.The actual content of the line — dialogue, narration, or a player choice label.
Example records
HuggingFace dataset_info
The schema is declared in the repository’sREADME.md as HuggingFace dataset metadata:
Format comparison
All three available formats represent the same underlying data:| Format | Fields | Notes |
|---|---|---|
| JSONL | context, speaker, text | One JSON object per line |
| Parquet | context, speaker, text | Columnar, same column names |
| ChatML | messages[].role, messages[].content | Reformatted for instruction tuning |
ChatML message structure
The ChatML format wraps the same data in a multi-turn message structure:Array of message objects with
role and content.The ChatML format aggregates all individual JSONL records for a scene into a single assistant response. One ChatML record equals one full scene.