How to read and work with the Deltarune dataset in JSONL (newline-delimited JSON) format.
JSONL (newline-delimited JSON) stores one JSON object per line. Each line is a self-contained, valid JSON record, which makes it easy to stream large files without loading everything into memory and to process records incrementally in scripts and pipelines.

The JSONL files for this dataset are located at data/chap*_dataset.jsonl, one file per chapter.
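Because each line is independent, the files can be streamed with nothing but the standard library. A minimal sketch (the `stream_records` helper is illustrative, not part of the dataset tooling; it assumes UTF-8 files and skips blank lines):

```python
import json

def stream_records(path):
    """Yield one parsed JSON object per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Example usage: count dialogue lines per speaker without loading
# the whole file into memory (the "speaker" field is present in
# each record of this dataset):
#
#   from collections import Counter
#   counts = Counter(r["speaker"] for r in stream_records("data/chap1_dataset.jsonl"))
```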
Records within each file are in chronological order as they appear in the game. Do not assume ordering is preserved when combining multiple files unless you sort explicitly.
Filter by speaker:

```python
import pandas as pd

df = pd.read_json("data/chap1_dataset.jsonl", lines=True)

# All lines spoken by Susie
susie_lines = df[df["speaker"] == "Susie"]
print(f"Susie has {len(susie_lines)} lines in Chapter 1")
print(susie_lines[["context", "text"]].head(10))
```
Filter by scene context:

```python
import pandas as pd

df = pd.read_json("data/chap1_dataset.jsonl", lines=True)

# All lines in the classroom scene
classroom = df[df["context"] == "Scene: Obj Classscene"]
for _, row in classroom.iterrows():
    print(f"{row['speaker']}: {row['text']}")
```
Combine all chapters:

```python
import glob

import pandas as pd

files = sorted(glob.glob("data/chap*_dataset.jsonl"))
dfs = [pd.read_json(f, lines=True) for f in files]
df_all = pd.concat(dfs, ignore_index=True)
print(f"Total records across all chapters: {len(df_all)}")
print(df_all["speaker"].value_counts().head(10))
```
If you need cross-chapter analysis regularly, use the parquet/full_chapters_dataset.parquet file instead. It is pre-built from all JSONL files and loads faster for repeated queries.