Corpus statistics

Distributions across the 105-task suite. Media counts and durations come from media-stats.json; difficulty tiers and partitions come from baseline results (internal).

tasks 105
media files 536
total media 6.9 h
median files/task 2
max files/task 50
median duration/task 1.3 min

Tasks per meta-category

Media Production 40
Enterprise & Compliance 23
Performance & Coaching 23
Personal & Education 13
Operations & Research 6

Tasks per modality (overlapping)

audio 98
video 67
image 17
text 7
document 4

Media files by kind

video 204
audio 198
image 79
other 55

Files per task

1 file 30
2–3 40
4–9 19
10–19 11
20+ 5

Total media duration per task

<30s 10
30–60s 19
1–3 min 37
3–10 min 26
10 min+ 6

Tasks per family

media_production 19
media_qa 18
ui_event_audit 10
spoken_media 9
dataset_annotation 7
spoken_media_perception 6
audio_production 5
compliance 4
document_extraction 4
captioning 4
clip_retrieval 3
personal_workflow 2
meeting_comprehension 2
av_qa 2
lecture_understanding 2
retrieval 2
speaker_attribution 1
gameplay_qa 1
long_form_clip_extract 1
audio_visual_audit 1
clip_mining_retrieval 1
office_workflow 1

Difficulty & baselines (internal)

Difficulty score = 1 − (mean continuous reward across all 24 baseline cells run for the task). A score of 0 means every baseline cell earns full reward; a score of 1 means no baseline earns any reward. The tier is a coarse label derived from the binary solve rate: unsolved = 0% · frontier ≤15% · hard ≤40% · moderate ≤70% · easy >70%.
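The score and tier definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the suite's actual scoring code: the function names are hypothetical, and it assumes each baseline cell yields a continuous reward in [0, 1] plus a boolean solved flag.

```python
from statistics import mean


def difficulty_score(rewards):
    """Return 1 - mean continuous reward over a task's baseline cells.

    `rewards` is assumed to hold one value in [0, 1] per baseline cell.
    """
    if not rewards:
        raise ValueError("at least one baseline cell is required")
    return 1.0 - mean(rewards)


def difficulty_tier(solved_flags):
    """Map a task's binary solve rate to the coarse tier label.

    `solved_flags` is assumed to hold one boolean per baseline cell.
    Thresholds are compared in integer arithmetic to avoid float rounding.
    """
    n = len(solved_flags)
    if n == 0:
        raise ValueError("at least one baseline cell is required")
    solved = sum(1 for flag in solved_flags if flag)
    if solved == 0:
        return "unsolved"
    if solved * 100 <= 15 * n:
        return "frontier"
    if solved * 100 <= 40 * n:
        return "hard"
    if solved * 100 <= 70 * n:
        return "moderate"
    return "easy"
```

With 24 cells per task, for example, 3 solved cells (12.5%) fall in the frontier tier and 4 solved cells (about 16.7%) fall in the hard tier.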

Tasks per difficulty tier

unsolved 36
frontier 25
hard 22
moderate 17
easy 5

Per-baseline solve rate

Pro·T2 12% (13/105)
Pro·KIRA 10% (11/105)
Pro·MM 37% (39/105)
GPT5.2·Codex 16% (17/105)
Sonnet·CC 16% (17/105)
Flash·MM 23% (24/105)

Pro modality-ladder partition

unresolved-by-Pro 49
multimodal-unlocked 22
audio-unlocked 16
text-tool-solvable 13
image-unlocked 5

Codex × Pro-MM partition

both-fail 60
MM-only (omni-necessary evidence) 28
both-solve 11
Codex-only (CLI-strategy evidence) 6
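The four cells of this partition follow from two per-task booleans, and their marginals should reproduce the per-baseline solve counts. The sketch below is illustrative only: the `partition` helper is hypothetical, and it assumes the "Codex" axis corresponds to the GPT5.2·Codex baseline and the "MM" axis to Pro·MM.

```python
def partition(codex_solved, mm_solved):
    """Assign a task to one cell of the Codex x Pro-MM partition."""
    if codex_solved and mm_solved:
        return "both-solve"
    if mm_solved:
        return "MM-only"
    if codex_solved:
        return "Codex-only"
    return "both-fail"


# Cell counts as reported in the table above.
counts = {"both-fail": 60, "MM-only": 28, "both-solve": 11, "Codex-only": 6}

# Marginals: tasks solved by each baseline, regardless of the other.
total_tasks = sum(counts.values())
pro_mm_solved = counts["both-solve"] + counts["MM-only"]
codex_solved = counts["both-solve"] + counts["Codex-only"]

print(total_tasks, pro_mm_solved, codex_solved)
```

The marginals come to 39 and 17 out of 105 tasks, matching the Pro·MM (39/105) and GPT5.2·Codex (17/105) solve rates listed under the per-baseline solve rate.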