MMTB Stats — Task Explorer

Corpus statistics

Distributions across the 105-task suite. Media counts and durations come from media-stats.json; difficulty scores and partitions come from baseline results (internal).

105 tasks

536 media files

6.9 h total media

median files/task

max files/task

1.3 min median duration/task

Tasks per meta-category

Media Production 40

Enterprise & Compliance 23

Performance & Coaching 23

Personal & Education 13

Operations & Research 6

Tasks per modality (overlapping)

audio 98

video 67

image 17

text 7

document 4

Media files by kind

video 204

audio 198

image 79

other 55

Files per task

1 file 30

2–3 40

4–9 19

10–19 11

20+ 5

Total media duration per task

<30s 10

30–60s 19

1–3 min 37

3–10 min 26

10 min+ 6

Tasks per family

media_production 19

media_qa 18

ui_event_audit 10

spoken_media 9

dataset_annotation 7

spoken_media_perception 6

audio_production 5

compliance 4

document_extraction 4

captioning 4

clip_retrieval 3

personal_workflow 2

meeting_comprehension 2

av_qa 2

lecture_understanding 2

retrieval 2

speaker_attribution 1

gameplay_qa 1

long_form_clip_extract 1

audio_visual_audit 1

clip_mining_retrieval 1

office_workflow 1

Difficulty & baselines (internal)

Difficulty score = 1 − (mean continuous reward across all 24 baseline cells run for the task). A score of 0 means every baseline cell earns full reward; a score of 1 means no baseline earns any reward. The tier is a coarse label derived from the binary solve rate: unsolved = 0% · frontier ≤15% · hard ≤40% · moderate ≤70% · easy >70%, with each tier starting just above the previous tier's upper bound.
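The definition above can be sketched in a few lines of Python. This is an illustrative sketch, not the explorer's actual code: the function names are hypothetical, rewards are assumed to lie in [0, 1], and the rule that turns a continuous reward into a binary "solved" flag is not stated on this page, so the flags are taken as given.

```python
from statistics import mean


def difficulty_score(rewards):
    """Return 1 - mean continuous reward over the baseline cells run for one task.

    Assumption: each reward is a float in [0, 1].
    """
    if not rewards:
        raise ValueError("need at least one baseline cell")
    return 1.0 - mean(rewards)


def difficulty_tier(solved_flags):
    """Map the binary solve rate across baseline cells to a coarse tier.

    Assumption: solved_flags holds one boolean per baseline cell; how a cell
    is judged "solved" is not specified here.
    """
    if not solved_flags:
        raise ValueError("need at least one baseline cell")
    rate = sum(solved_flags) / len(solved_flags)
    if rate == 0:
        return "unsolved"
    if rate <= 0.15:
        return "frontier"
    if rate <= 0.40:
        return "hard"
    if rate <= 0.70:
        return "moderate"
    return "easy"
```

With 24 cells, for example, 3 solved cells give a solve rate of 12.5% (frontier), while 4 solved cells give about 16.7% (hard).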

Tasks per difficulty tier

unsolved 36

frontier 25

hard 22

moderate 17

easy 5

Per-baseline solve rate

Pro·T2 12% (13/105)

Pro·KIRA 10% (11/105)

Pro·MM 37% (39/105)

GPT5.2·Codex 16% (17/105)

Sonnet·CC 16% (17/105)

Flash·MM 23% (24/105)
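As a quick sanity check, the percentages above can be reproduced from the raw counts. The helper below is a hypothetical sketch; the page does not state its rounding rule, but rounding to the nearest whole percent matches all six labels.

```python
def solve_rate_label(solved, total=105):
    """Format a solve rate as 'NN% (solved/total)', rounded to the nearest percent."""
    return f"{round(100 * solved / total)}% ({solved}/{total})"


# Solved-task counts copied from the chart above.
solved_counts = {
    "Pro·T2": 13,
    "Pro·KIRA": 11,
    "Pro·MM": 39,
    "GPT5.2·Codex": 17,
    "Sonnet·CC": 17,
    "Flash·MM": 24,
}

for name, solved in solved_counts.items():
    print(name, solve_rate_label(solved))
```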

Pro modality-ladder partition

unresolved-by-Pro 49

multimodal-unlocked 22

audio-unlocked 16

text-tool-solvable 13

image-unlocked 5

Codex × Pro-MM partition

both-fail 60

MM-only (omni-necessary evidence) 28

both-solve 11

Codex-only (CLI-strategy evidence) 6
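A two-baseline partition like the one above can be derived from the sets of tasks each baseline solves. The sketch below is illustrative only: the function name is hypothetical and the task IDs are synthetic integers arranged to reproduce the reported counts, not real task identifiers. The margins appear consistent with the per-baseline solve rates: 11 + 6 = 17 tasks solved by Codex and 28 + 11 = 39 solved by Pro·MM.

```python
from collections import Counter


def cross_partition(task_ids, codex_solved, mm_solved):
    """Count tasks in each cell of the Codex x Pro-MM solve/fail table."""
    counts = Counter()
    for task in task_ids:
        by_codex = task in codex_solved
        by_mm = task in mm_solved
        if by_codex and by_mm:
            label = "both-solve"
        elif by_mm:
            label = "MM-only"
        elif by_codex:
            label = "Codex-only"
        else:
            label = "both-fail"
        counts[label] += 1
    return counts


# Synthetic IDs (an assumption for illustration): 0-5 Codex-only,
# 6-16 both-solve, 17-44 MM-only, 45-104 both-fail.
task_ids = range(105)
codex_solved = set(range(0, 17))
mm_solved = set(range(6, 45))

partition = cross_partition(task_ids, codex_solved, mm_solved)
print(dict(partition))
```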