As of early 2026, major LLMs — including models with training cutoffs after the July 2025 public release of Deltarune Chapters 3 and 4 — consistently fail to recall basic plot details from those chapters. This dataset exists to fill that gap by providing a structured, machine-readable transcript of all four released chapters.
The dataset was processed from video playthroughs of Deltarune. The workflow combined manual transcription with AI-assisted segmentation using Google Gemini. All transcription, formatting, quality control, and cross-referencing were performed by one person.
No. This dataset was not extracted from Deltarune’s game files. All content was processed from video playthroughs only. This means the data reflects what appears during actual gameplay rather than raw asset dumps.
This is a solo project. One person performed all transcription, formatting, quality control, and cross-referencing. There is no team behind it.
Chapters 1, 2, and 3 are stable. Chapter 4 is currently in Beta status and may have quality issues. There are also known gaps across chapters — see Known gaps for the full list.
No. The Snowgrave/Weird Route is not included for any chapter. Chapter 2 and Chapter 4 cover the Normal Route only. Chapter 3 includes both the Normal Route and the Sword Route, but has no Snowgrave content. See Route coverage for the full breakdown.
Yes. The dataset is released under CC0 1.0 — Public Domain. There are no restrictions on use, including commercial use. No attribution is required. See License for details.
The dataset is available in the following formats:
  • JSONL — Structured JSON Lines format (one record per line), at data/chap1_dataset.jsonl through data/chap4_dataset.jsonl
  • Plain text — Human-readable cleaned text at data/chap1_cleaned.txt through data/chap4_cleaned.txt
  • Parquet — Columnar format for efficient querying at parquet/chap1_dataset.parquet through parquet/chap4_dataset.parquet, plus parquet/full_chapters_dataset.parquet combining all chapters
  • ChatML — Instruction fine-tuning format at data/chatml/deltarune_story_chatml.jsonl
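As a rough illustration of working with the JSONL files listed above, the following sketch reads each chapter file one record at a time using only the Python standard library. It assumes the intermediate files follow the same naming pattern as the first and last (chap2, chap3), and it makes no assumption about the fields inside each record, since the schema is not described here. The helper names `jsonl_path` and `read_jsonl` are hypothetical and not part of the dataset.

```python
import json
from pathlib import Path
from typing import Iterator

# Chapters covered by the dataset, per the format list above.
CHAPTERS = (1, 2, 3, 4)


def jsonl_path(chapter: int, root: Path = Path(".")) -> Path:
    """Build the path to one chapter's JSONL file.

    Assumes the chapN_dataset.jsonl naming pattern holds for every chapter.
    """
    if chapter not in CHAPTERS:
        raise ValueError(f"unknown chapter: {chapter}")
    return root / "data" / f"chap{chapter}_dataset.jsonl"


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of the file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            yield json.loads(line)


if __name__ == "__main__":
    for chapter in CHAPTERS:
        path = jsonl_path(chapter)
        if not path.exists():
            print(f"{path}: not found, skipping")
            continue
        records = list(read_jsonl(path))
        # Field names are not documented here, so only report what is present.
        keys = sorted(records[0].keys()) if records else []
        print(f"{path}: {len(records)} records, first record keys: {keys}")
```

Reading line by line keeps memory use flat for large files. Printing the keys of the first record is a quick way to discover the actual schema before writing any code that depends on specific field names.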
Issues and contributions can be reported at the project’s GitHub repository: https://github.com/ntvm/Deltarune-Complete-Transcript-Cleaned
  • raw/ — Contains pre-processed source files before structured formatting was applied.
  • data/ — Contains the cleaned, structured JSONL files suitable for direct use in training pipelines or retrieval systems.
In most cases, you should use the files in data/ unless you need the unprocessed source material for a specific reason.