Every record in the JSONL and Parquet formats contains three string fields: context, speaker, and text. The ChatML format carries the same data, repackaged as an array of messages.

JSONL / Parquet fields

context (string, required)
The scene identifier where this line occurs. Always formatted as "Scene: <scene_name>". Scene names may be descriptive (e.g., "Scene: Cyber World") or internal object identifiers (e.g., "Scene: Obj Krisroom").
speaker (string, required)
The entity delivering this line. One of: a character name (e.g., "Susie", "Toriel"), "Narrator" for game narration and descriptions, or "Player" for player choice options.
text (string, required)
The actual content of the line — dialogue, narration, or a player choice label.

Example records

{"context": "Scene: Device Contact", "speaker": "Narrator", "text": "ARE YOU THERE?"}
{"context": "Scene: Device Contact", "speaker": "Narrator", "text": "ARE WE CONNECTED?"}
{"context": "Scene: Device Contact", "speaker": "Narrator", "text": "EXCELLENT."}
{"context": "Scene: Obj Krisroom", "speaker": "Toriel", "text": "Kris...!"}
{"context": "Scene: Obj Krisroom", "speaker": "Toriel", "text": "Wake up!"}
{"context": "Scene: Obj Classscene", "speaker": "Alphys", "text": "So, does everyone have a..."}
{"context": "Scene: Obj Classscene", "speaker": "Susie", "text": "... am I late?"}
{"context": "Scene: Device Contact", "speaker": "Player", "text": "YES"}
{"context": "Scene: Device Contact", "speaker": "Player", "text": "SWEETS"}

HuggingFace dataset_info

The schema is declared in the YAML metadata block of the repository’s README.md as Hugging Face dataset metadata:
dataset_info:
  features:
  - name: context
    dtype: string
  - name: speaker
    dtype: string
  - name: text
    dtype: string
task_categories:
- text-generation
language:
- en

Format comparison

All three available formats represent the same underlying data:
| Format | Fields | Notes |
| --- | --- | --- |
| JSONL | context, speaker, text | One JSON object per line |
| Parquet | context, speaker, text | Columnar, same column names |
| ChatML | messages[].role, messages[].content | Reformatted for instruction tuning |

ChatML message structure

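The ChatML format wraps the same data in a messages array. The example below has one system prompt, one user request naming the scene, and one assistant response containing the transcript: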
messages (array, required)
Array of message objects with role and content.
{
  "messages": [
    {
      "role": "system",
      "content": "You are the Deltarune Script Archive. You provide exact, chronological scene transcripts including dialogues, actions, and pauses."
    },
    {
      "role": "user",
      "content": "Provide the transcript for Scene: Obj Krisroom.\nContext: This scene occurs after 'Device Contact' and before 'Obj Carcutscene'."
    },
    {
      "role": "assistant",
      "content": "Toriel: \"Kris...!\"\nToriel: \"Wake up!\"\nToriel: \"KRIS...!\"\nToriel: \"Kris, if you do not wake up, we will be late for school!\"\nToriel: \"I will wait outside for you, alright?\""
    }
  ]
}
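The ChatML format aggregates all individual JSONL records for a scene into a single assistant response. In the example above, each record becomes one line of the form Speaker: "text", and the lines are joined with newlines. One ChatML record equals one full scene.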
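
The aggregation can be sketched roughly as follows. This is an illustration under stated assumptions, not the script that produced the dataset: `to_chatml` is a hypothetical name, the line format is inferred from the single example above, and the "Context: ..." sentence about neighbouring scenes in the user message is left out because the rules for generating it are not documented here.

```python
SYSTEM_PROMPT = (
    "You are the Deltarune Script Archive. You provide exact, chronological "
    "scene transcripts including dialogues, actions, and pauses."
)


def to_chatml(records: list[dict]) -> list[dict]:
    """Collapse per-line records into one ChatML record per scene."""
    # Group records by context. A dict keeps first-seen scene order,
    # so scenes come out in the order they first appear in the input.
    scenes: dict[str, list[dict]] = {}
    for r in records:
        scenes.setdefault(r["context"], []).append(r)

    chatml = []
    for context, lines in scenes.items():
        # Assumed line format, inferred from the example: Speaker: "text"
        transcript = "\n".join(f'{r["speaker"]}: "{r["text"]}"' for r in lines)
        chatml.append({
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                # Simplified user prompt: the neighbouring-scene context
                # sentence from the real data is omitted in this sketch.
                {"role": "user", "content": f"Provide the transcript for {context}."},
                {"role": "assistant", "content": transcript},
            ]
        })
    return chatml


records = [
    {"context": "Scene: Obj Krisroom", "speaker": "Toriel", "text": "Kris...!"},
    {"context": "Scene: Obj Krisroom", "speaker": "Toriel", "text": "Wake up!"},
    {"context": "Scene: Obj Classscene", "speaker": "Alphys", "text": "So, does everyone have a..."},
]
chatml = to_chatml(records)
print(len(chatml))  # prints: 2
print(chatml[0]["messages"][2]["content"])
```

Grouping by context value rather than by consecutive runs means a scene whose lines are split across the file still yields a single record, which matches the one-record-per-scene rule. Whether the original conversion handles split scenes this way is an assumption.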