Scene context

The context field in every record identifies which scene the line belongs to. You use it to filter records by scene, group lines together, and recover the order of scenes within a chapter.

Format

All context values follow this pattern:

"Scene: <scene_name>"

The scene name portion uses one of two conventions.
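As a minimal sketch of the pattern above, the scene name can be recovered by stripping the fixed prefix. The helper name `scene_name` and the choice to raise on unexpected values are assumptions made for illustration; they are not part of the dataset or its tooling.

```python
SCENE_PREFIX = "Scene: "


def scene_name(context: str) -> str:
    """Return the scene name portion of a context value.

    Hypothetical helper: raises ValueError if the value does not follow
    the documented "Scene: <scene_name>" pattern.
    """
    if not context.startswith(SCENE_PREFIX):
        raise ValueError(f"unexpected context value: {context!r}")
    return context[len(SCENE_PREFIX):]


print(scene_name("Scene: Obj Krisroom"))  # prints: Obj Krisroom
```

Raising on a mismatch, rather than returning the input unchanged, makes it easier to spot records that do not follow the documented format.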

Naming conventions

Descriptive scene names

Some scenes use human-readable names that describe the setting or situation:

Context value	What it represents
`"Scene: Cyber World"`	The Dark World cyber-themed area
`"Scene: Card Castle"`	The card-themed castle area
`"Scene: Dark Worlds"`	Generic Dark World scenes

Internal object names

Other scenes use the game's internal object names, usually prefixed with Obj:

Context value	What it represents
`"Scene: Device Contact"`	Opening sequence (the SOUL creation screen)
`"Scene: Obj Krisroom"`	Kris’s bedroom (game start)
`"Scene: Obj Carcutscene"`	Car ride to school
`"Scene: Obj Classscene"`	Classroom scene
`"Scene: Obj Schoollobbycutscene"`	School lobby encounter with Susie
`"Scene: Obj Insideclosetcutscene"`	Inside the closet

The internal object name convention (Obj prefix) comes from the game’s asset/object naming system. These names appear in the data exactly as the transcription process produced them.
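If you need to separate the two conventions programmatically, one rough heuristic is to test for the Obj prefix. The function name `naming_convention` and the labels it returns are hypothetical. Note that this check treats "Scene: Device Contact" as descriptive, even though the table above lists it among the internal names, so treat the result as an approximation.

```python
def naming_convention(context: str) -> str:
    """Classify a context value by its scene-name convention.

    Heuristic sketch: a scene name starting with "Obj " is labeled
    "internal"; anything else is labeled "descriptive".
    """
    prefix = "Scene: "
    name = context[len(prefix):] if context.startswith(prefix) else context
    return "internal" if name.startswith("Obj ") else "descriptive"


for value in ("Scene: Obj Classscene", "Scene: Card Castle", "Scene: Device Contact"):
    print(value, "->", naming_convention(value))
```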

Scene ordering

Within each JSONL file, records are stored in chronological order — the order in which they appear in the game’s story progression. The first record in chap1_dataset.jsonl is the first line of Chapter 1; the last record is the final line. All records for a given scene are grouped together consecutively.
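The grouping property described above can be checked with a short sketch: collapse consecutive duplicates into runs, then confirm that no scene starts a second run. The function names and the sample list are illustrative assumptions; in practice you would pass the context values read from a JSONL file in file order.

```python
from itertools import groupby


def scene_runs(contexts):
    """Collapse consecutive duplicates: one entry per uninterrupted run."""
    return [key for key, _ in groupby(contexts)]


def scenes_are_contiguous(contexts):
    """True if every scene occupies exactly one consecutive run."""
    runs = scene_runs(contexts)
    return len(runs) == len(set(runs))


# Illustrative sample, not real dataset rows.
sample = [
    "Scene: Device Contact",
    "Scene: Obj Krisroom",
    "Scene: Obj Krisroom",
    "Scene: Obj Carcutscene",
]
print(scene_runs(sample))
print(scenes_are_contiguous(sample))
```

When the check passes, the output of `scene_runs` is also the list of scenes in story order.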

Scene context in ChatML

The ChatML format makes scene ordering explicit. Each user prompt names the scenes immediately before and after the requested one:

{
  "role": "user",
  "content": "Provide the transcript for Scene: Obj Krisroom.\nContext: This scene occurs after 'Device Contact' and before 'Obj Carcutscene'."
}

This temporal context is encoded to help language models understand scene sequencing when fine-tuned on this data.
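A prompt of this shape could be rebuilt from an ordered list of scenes roughly as follows. This is a sketch inferred from the single example above, not the dataset's actual generation code. The function name `build_user_message` is hypothetical, and because the example does not show how the first or last scene of a chapter is worded, the sketch only handles scenes that have both neighbors.

```python
def build_user_message(scenes, index):
    """Build a ChatML-style user message for scenes[index].

    scenes: context values in story order, e.g. "Scene: Obj Krisroom".
    Only interior scenes are supported; the wording for the first and
    last scene is not documented, so those raise ValueError.
    """
    if index <= 0 or index >= len(scenes) - 1:
        raise ValueError("scene needs both a predecessor and a successor")

    def short(context):
        # The example drops the "Scene: " prefix for neighboring scenes.
        prefix = "Scene: "
        return context[len(prefix):] if context.startswith(prefix) else context

    content = (
        f"Provide the transcript for {scenes[index]}.\n"
        f"Context: This scene occurs after '{short(scenes[index - 1])}' "
        f"and before '{short(scenes[index + 1])}'."
    )
    return {"role": "user", "content": content}


ordered = ["Scene: Device Contact", "Scene: Obj Krisroom", "Scene: Obj Carcutscene"]
print(build_user_message(ordered, 1)["content"])
```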

Working with scenes in Python

import pandas as pd

df = pd.read_json('data/chap1_dataset.jsonl', lines=True)

# Get all unique scenes in chapter order
scenes = df['context'].unique()
print(scenes)

# Get all lines for a specific scene
closet_scene = df[df['context'] == 'Scene: Obj Insideclosetcutscene']

# Find all scenes containing a keyword
cyber_scenes = df[df['context'].str.contains('Cyber', case=False)]

# Count lines per scene
scene_counts = df.groupby('context').size().sort_values(ascending=False)
print(scene_counts.head(10))

Cross-chapter scenes

Some location names appear across multiple chapters (e.g., recurring settings), so a scene name alone does not always identify a chapter. To tell them apart, load the per-chapter Parquet files, tag each with its chapter number, and filter on both columns:

import pandas as pd

# Load per-chapter data and tag with chapter number
chapters = []
for i in range(1, 5):
    df = pd.read_parquet(f'parquet/chap{i}_dataset.parquet')
    df['chapter'] = i
    chapters.append(df)

full = pd.concat(chapters, ignore_index=True)

# Filter a scene name in a specific chapter
result = full[(full['context'] == 'Scene: Card Castle') & (full['chapter'] == 2)]

The full_chapters_dataset.parquet file does not include a chapter column — it’s a direct concatenation of the four per-chapter Parquet files. Add the chapter label yourself as shown above if you need to distinguish sources.
