Deltarune Complete Transcript Dataset
A fan-made transcript dataset covering the complete story of Deltarune Chapters 1 through 4. The transcripts were processed from video playthroughs with AI-assisted segmentation and are structured for use in LLM fine-tuning, retrieval-augmented generation, and narrative research.

As of early 2026, major LLMs fail to recall basic plot details of Deltarune Chapters 3 and 4 despite their public release. This dataset exists to close that gap.
Quick Start
Load the dataset in Python in under 5 minutes
Data Schema
Understand the context, speaker, and text fields
Dataset Files
JSONL, Parquet, and ChatML formats available
LLM Fine-tuning
Use the ChatML format for instruction tuning
What’s included
Chapter 1 — Full
Complete Chapter 1 transcript. Stable release. Pre-vid2text version.
Chapter 2 — Full (Normal Route)
Complete Chapter 2 Normal Route transcript. Stable release.
Chapter 3 — Full + Sword Route
Complete Chapter 3 transcript including Sword Route content. Stable release.
Chapter 4 — Full (Normal Route)
Complete Chapter 4 Normal Route transcript. Beta release.
Dataset formats
The transcript data is available in three formats depending on your use case:

| Format | File | Best for |
|---|---|---|
| JSONL | data/chap*_dataset.jsonl | General use, streaming, pandas |
| Parquet | parquet/chap*_dataset.parquet | Columnar queries, large-scale processing |
| ChatML | data/chatml/deltarune_story_chatml.jsonl | LLM instruction fine-tuning |
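
As a rough illustration of the JSONL row in the table above, the sketch below reads every file matching the `data/chap*_dataset.jsonl` pattern using only the Python standard library. The helper name `load_jsonl` is hypothetical and not part of the dataset, and the sketch assumes each non-empty line is one JSON object and that the files sit at that relative path in your working directory.

```python
import glob
import json


def load_jsonl(pattern):
    """Read every JSONL file matching `pattern` into one list of dicts."""
    records = []
    # Sort the matches so chapters are read in a stable order.
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Path pattern taken from the formats table; adjust it to your checkout.
    records = load_jsonl("data/chap*_dataset.jsonl")
    print(f"Loaded {len(records)} records")
```

If pandas is installed, `pandas.read_json(path, lines=True)` should read a single JSONL file into a DataFrame, and `pandas.read_parquet` can read the Parquet files when a Parquet engine such as pyarrow is available.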
Record structure
Every record in the JSONL dataset has three fields:

- `context`: The scene or location where the line occurs
- `speaker`: Who is speaking (character name, `Narrator`, or `Player`)
- `text`: The actual line of dialogue or narration
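
To make the three-field structure concrete, here is a minimal sketch that checks a record for the `context`, `speaker`, and `text` fields and counts lines per speaker. The function names are hypothetical, the sample records are placeholders rather than real dataset lines, and the check that every field holds a string is an assumption the description above does not state.

```python
from collections import Counter

REQUIRED_FIELDS = ("context", "speaker", "text")


def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], str):
            # Assumption: all three fields hold strings.
            problems.append(f"non-string field: {field}")
    return problems


def lines_per_speaker(records):
    """Count how many records each speaker has."""
    return Counter(record["speaker"] for record in records)


# Placeholder records for illustration only; not actual dataset content.
sample = [
    {"context": "Example scene", "speaker": "Narrator", "text": "Example narration."},
    {"context": "Example scene", "speaker": "Player", "text": "Example choice."},
    {"context": "Example scene", "speaker": "Narrator", "text": "More narration."},
]

for record in sample:
    print(validate_record(record))
print(lines_per_speaker(sample).most_common())
```

Running a check like this before fine-tuning or indexing is one way to catch malformed rows early. Grouping by `context` instead of `speaker` would follow the same pattern.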
License
This dataset is released under CC0 1.0 Universal (public domain dedication). No attribution required. No conditions. Use it however you want.

Source material (Deltarune) is © Toby Fox. The CC0 release covers the transcription and structural processing work only. Standard fan-project precedent applies.