FAQ

Why does this dataset exist?

As of early 2026, major LLMs, including models with training cutoffs after the June 2025 public release of Deltarune Chapters 3 and 4, consistently fail to recall basic plot details from those chapters. This dataset fills that gap by providing a structured, machine-readable transcript of all four released chapters.

How was the data collected?

The dataset was produced from video playthroughs of Deltarune. The workflow combined manual transcription with AI-assisted segmentation using Google Gemini. All transcription, formatting, quality control, and cross-referencing were performed by one person.

Was this extracted from game files?

No. This dataset was not extracted from Deltarune’s game files. All content was processed from video playthroughs only. This means the data reflects what appears during actual gameplay rather than raw asset dumps.

Who created this?

This is a solo project. One person performed all transcription, formatting, quality control, and cross-referencing. There is no team behind it.

Is this dataset complete?

Chapters 1, 2, and 3 are stable. Chapter 4 is currently in Beta status and may have quality issues. There are also known gaps across chapters — see Known gaps for the full list.

Does this include the Snowgrave/Weird Route?

No. The Snowgrave/Weird Route is not included for any chapter. Chapter 2 and Chapter 4 cover the Normal Route only. Chapter 3 includes both the Normal Route and the Sword Route, but has no Snowgrave content. See Route coverage for the full breakdown.

Can I use this commercially?

Yes. The dataset is released under CC0 1.0 — Public Domain. There are no restrictions on use, including commercial use. No attribution is required. See License for details.

What formats are available?

The dataset is available in the following formats:

JSONL — Structured JSON Lines format (one record per line), at data/chap1_dataset.jsonl through data/chap4_dataset.jsonl
Plain text — Human-readable cleaned text at data/chap1_cleaned.txt through data/chap4_cleaned.txt
Parquet — Columnar format for efficient querying at parquet/chap1_dataset.parquet through parquet/chap4_dataset.parquet, plus parquet/full_chapters_dataset.parquet combining all chapters
ChatML — Instruction fine-tuning format at data/chatml/deltarune_story_chatml.jsonl
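
The JSONL files listed above can be read with the Python standard library alone. The following is a minimal sketch, not an official loader: the helper names chapter_jsonl_path and iter_jsonl are hypothetical, and the code makes no assumption about which fields each record contains. It only relies on the file layout given above and on the one-record-per-line convention.

```python
import json
from pathlib import Path
from typing import Any, Iterator, Union


def chapter_jsonl_path(chapter: int, root: Union[str, Path] = ".") -> Path:
    """Build the JSONL path for one chapter (hypothetical helper).

    Follows the layout listed in this FAQ: data/chap1_dataset.jsonl
    through data/chap4_dataset.jsonl, relative to the repository root.
    """
    if chapter not in (1, 2, 3, 4):
        raise ValueError("only chapters 1 through 4 are listed")
    return Path(root) / "data" / f"chap{chapter}_dataset.jsonl"


def iter_jsonl(path: Union[str, Path]) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)
```

Assuming the repository root is the working directory, list(iter_jsonl(chapter_jsonl_path(1))) would return every record of Chapter 1. Inspect the first record to see its fields before building anything on top of them.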

How do I report issues or contribute?

Report issues or submit contributions at the project’s GitHub repository: https://github.com/ntvm/Deltarune-Complete-Transcript-Cleaned

What's the difference between raw/ and data/ files?

raw/ contains the source files as they stood before structured formatting was applied.
data/ contains the cleaned output: the structured JSONL files, the cleaned plain-text files, and the ChatML file, all suitable for direct use in training pipelines or retrieval systems.

In most cases, you should use the files in data/ unless you need the unprocessed source material for a specific reason.
