105 / 105 tasks

2-speaker-diarized-transcript-from-podcast-audio

Produce a diarized transcript labeling each utterance with its speaker for a 2-person podcast clip

Media Production frontier 🔊1 · 3.2m
🔊 audio native A
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .58 | GPT5.2·Codex .00 | audio-unlocked

accessibility-sync-audit

Accessibility tester audits a 47 s benefits-portal screen-reader walkthrough and produces a 6-row desync log of screen-reader vs visible-focus mismatches.

Enterprise & Compliance moderate 🎥1 📝1 · 47s
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM 1.00 | GPT5.2·Codex .00 | audio-unlocked

adr-edit-detection

Detect ADR replacement time intervals in a narration scene via acoustic continuity

Performance & Coaching hard 🔊1 · 60s
🔊 audio native A
Pro·T2 .86 | Pro·KIRA .22 | Pro·MM .86 | GPT5.2·Codex .00 | text-tool-solvable

animation-narration-audit

Designer narrates 8 transient UI animations (300-700ms each) on a Helio Studio prototype. 6 narrations correctly describe the visible animation; 2 misdescribe (wrong color, wrong direction). Agent emits 8-row CSV: (claim_idx, described_animation, observed_animation, match).

Enterprise & Compliance hard 🎥1 📝1 · 53s
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM 1.00 | GPT5.2·Codex .00 | multimodal-unlocked

articulation-deviation-detection

Read a piano score with explicit articulation markings (staccato dots, slurs, accents, tenutos), listen to a recording where some notes are played with the wrong articulation, and produce a feedback.json listing each mismatched note with its expected and played articulation category.

Performance & Coaching unsolved 🔊1 🖼1 · 20s
🖼 image🔊 audio native I+A
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

audience-ringtone-detection

Find the recital recording containing an audience cellphone ringtone and sort recordings

Personal & Education frontier 🔊6 · 6.0m
🔊 audio native A
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | multimodal-unlocked

audio-visual-dub-detection

Find audio-dub slips in a lecture recording where short audio spans have been replaced by audio from elsewhere in the same talk; requires joint audio-visual reasoning to detect rhythm mismatches between lip motion and heard syllables.

Personal & Education hard 🎥1 · 4.4m
🔊 audio🎥 video native I+A+V
Pro·T2 1.00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | text-tool-solvable

av-desync-detection

Detect which video clips have noticeable audio-video desynchronization

Media Production frontier 🎥6 · 6.0m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

av-desync-offset-repair

Repair a desynced clip so audio and video are aligned

Media Production hard 🎥1 · 1.0m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .80 | unresolved-by-Pro

av-identity-leak-detect

Detect cross-channel identity leaks (badge + spoken name/title) in a pre-release marketing clip

Enterprise & Compliance unsolved 🎥1 · 2.7m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

av-privacy-exposure

Detect cross-modal PII exposures in an Acme CRM screen recording where reveal-on-click toggles transient visibility, and produce both a pii_flags.csv and an edited.mp4 with audio muted + visual mask over the customer-detail panel during exposure intervals.

Enterprise & Compliance hard 🎥1 📝1 · 52s
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .13 | GPT5.2·Codex .00 | multimodal-unlocked

b-roll-pool-assignment

Assign each of 10 narration segments to its matching clip from a 30-clip B-roll pool

Media Production unsolved 🎥30 🔊1 · 9.5m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

batch-media-qc-audit

Audit a batch of 10 video delivery bundles against a manifest spec; report which seeded compliance defects each bundle carries.

Media Production frontier 📝1 o1 · 0s
🔊 audio🎥 video🖼 image native I+A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | multimodal-unlocked

birthday-money-shot

Cut the singing and candle-blow segments from a birthday-party video

Personal & Education moderate 🎥1 · 41s
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .77 | GPT5.2·Codex .00 | multimodal-unlocked

blind-audition-match

Pick the audition candidate whose line-by-line readings most match the director's script directions

Performance & Coaching hard 🔊30 📝1 · 1.3m
🔊 audio📝 text native A
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM 1.00 | GPT5.2·Codex .00 | audio-unlocked

blood-test-pdfs-to-csv

Flatten five scanned multi-locale pathology PDFs into a normalised analyte CSV (52 rows, mixed SI/conventional units)

Enterprise & Compliance moderate p5 · 0s
📄 document native I
Pro·T2 1.00 | Pro·KIRA .00 | Pro·MM .99 | GPT5.2·Codex .00 | text-tool-solvable

boss-cooldown-cheat-audit

Audit boss-fight ability casts against posted cooldown rules. For each cast, determine if it fired before the ability's cooldown bar refilled (illegal) or after (legal). Joint-AV required: each cast plays a distinctive spell SFX, a visible animation, AND triggers a UI cooldown bar drain — agent must hold a unified A+V state across the clip to track per-ability cooldowns and flag premature casts.

Media Production frontier 🎥5 🔊5 · 8.4m
🔊 audio🎥 video native I+A+V
Pro·T2 1.00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .24 | text-tool-solvable

broadcast-package-edit

Broadcast certification-style package edit — agent assembles bars+tone / black / main+music+mosaic+logo into a 25 s 360x240 MP4 with edit log

Media Production easy 🎥1 🔊1 🖼1 · 1.0m
🎥 video🔊 audio🖼 image native I+A+V
Pro·T2 1.00 | Pro·KIRA 1.00 | Pro·MM 1.00 | GPT5.2·Codex 1.00 | text-tool-solvable

bug-repro-claim-audit

Support engineer triages a 60s synthetic Acme HR portal bug-repro screen recording. User narrates 6 claims; some match the visible screen sequence, some don't. Agent emits a 6-row CSV: (claim_idx, claimed_event, actual_event, confirmed).

Enterprise & Compliance moderate 🎥1 📝1 · 50s
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM 1.00 | GPT5.2·Codex .00 | audio-unlocked

call-center-disclosure-audit

Audit a recorded support call for compliance: spoken disclosures + CRM UI actions

Enterprise & Compliance frontier 🎥1 · 1.8m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .17 | GPT5.2·Codex .39 | unresolved-by-Pro

caption-nonspeech-enrichment

Enrich a speech-only SRT with cues for the recording's non-speech audio events

Media Production unsolved 🎥1 📝2 · 3.0m
🔊 audio📝 text native A
Pro·T2 .22 | Pro·KIRA .22 | Pro·MM .22 | GPT5.2·Codex .22 | unresolved-by-Pro

caption-speech-mismatch

Find captions in a 4:23 lecture recording that disagree with the spoken audio. Joint-AV required: must compare visual caption text against audio speech to identify semantic mismatches at known intervals.

Media Production unsolved 🎥1 · 4.4m
🔊 audio🎥 video native I+A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

chapter-repair

Refine a 3-entry coarse chapter file for a shell-tools lecture into a 7-9 entry fine chapter file aligned to topic + visual transitions

Personal & Education frontier 🎥1 📝1 · 48.9m
🎥 video🔊 audio native A+V
Pro·T2 .40 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

code-review-comment-attribution

Code-review compliance reviewer audits a 73 s GitHub-PR-style screen-share and produces a 4-row attribution log: intended vs committed line, mismatch flag, sentiment.

Enterprise & Compliance moderate 🎥1 📝1 · 1.2m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM 1.00 | GPT5.2·Codex 1.00 | audio-unlocked

comping-chord-substitution

Read a piano lead sheet (bass-clef melody with chord symbols above each bar), listen to a comping recording with four wrong chords substituted into the harmonic accompaniment, and produce a feedback.json listing each wrong-chord bar with its expected and played chord names.

Performance & Coaching unsolved 🔊1 🖼1 · 48s
🖼 image🔊 audio native I+A
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

constant-hum-attenuation

Attenuate 60/120/180 Hz mains-style hum from a voice recording without damaging speech intelligibility

Media Production frontier 🔊1 · 35s
🔊 audio native A
Pro·T2 .67 | Pro·KIRA .51 | Pro·MM .51 | GPT5.2·Codex .93 | unresolved-by-Pro

constant-offset-srt

Correct a constant-offset timing shift on an SRT file by re-anchoring it to the spoken audio

Media Production moderate 🎥1 📝1 · 1.5m
🎥 video🔊 audio
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .96 | GPT5.2·Codex .00 | audio-unlocked

cooking-instruction-alignment

Label the first frame where each narrated cooking event (grasp, shake-end, release) becomes visually true

Operations & Research frontier 🎥1 · 36s
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | multimodal-unlocked

coop-voice-callout-audit

Audit teammate voice callouts on a coop FPS team-comms channel against the visible game state. For each match, flag callouts that don't match what's visible: false_call, wrong_direction, wrong_state, wrong_attribution. Joint-AV required: each callout names a speaker, an event, a direction; the agent must hold a unified A+V picture across the clip — which teammate is talking (4 distinct voices), the live HUD state (per-teammate HP/ammo/status), the kill-feed history, the minimap layout — to decide whether the call is correct.

Media Production unsolved 🎥5 🔊5 · 7.0m
🔊 audio🎥 video native I+A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .29 | GPT5.2·Codex .00 | unresolved-by-Pro

creator-voiceover-lipsync-mismatch

Sponsored creator vs voiceover lip-sync mismatch flagging — joint-AV detection of voiceover-during-lip-motion intervals

Media Production unsolved 🎥4 · 2.0m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .83 | GPT5.2·Codex .00 | unresolved-by-Pro

crm-compliance-audit

Sales-ops compliance reviewer audits a 60 s screen-share of a sales rep on a discovery call inside a Salesforce-style CRM. Produce a 5-row promised-vs-logged audit CSV.

Enterprise & Compliance hard 🎥1 📝1 · 1.1m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM 1.00 | GPT5.2·Codex .00 | audio-unlocked

cross-channel-privacy-leak

Flag corporate-comms clips where a moving callout on a diagram element co-occurs with the voiceover naming that same element, AND deliver a redacted MP4 for each leaked clip. Joint-AV required: callout-on-element and audio-naming-element each appear separately throughout but only their precise temporal intersection constitutes a leak.

Enterprise & Compliance unsolved 🎥5 · 2.5m
🔊 audio🎥 video native I+A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .44 | GPT5.2·Codex .00 | unresolved-by-Pro

cursor-deictic-thumbnails

Photo curator reviews 24 thumbnails on a Pixelmine asset library, narrating with deictic-only references ('this one', 'that one', 'the one underneath'). The cursor hovers over the referenced thumbnail at each utterance moment. Agent emits an 8-row CSV: (utterance_idx, thumbnail_id_referenced).

Enterprise & Compliance hard 🎥1 📝1 · 52s
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

dead-air-removal

Identify mid-sentence dead-air regions to cut from a narration recording while preserving sentence-boundary pauses

Media Production hard 🔊1 📝1 · 1.9m
🔊 audio
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM 1.00 | GPT5.2·Codex .00 | multimodal-unlocked

debate-attribution

Attribute each of 8 utterances in a 4-speaker panel-debate video to one of the 4 on-screen positions (A/B/C/D). Voices are paired across positions so voice alone is insufficient; lip-sync on the active tile is required for full disambiguation.

Enterprise & Compliance easy 🎥1 📝1 · 1.3m
🎥 video🔊 audio native A+V
Pro·T2 1.00 | Pro·KIRA 1.00 | Pro·MM .50 | GPT5.2·Codex 1.00 | text-tool-solvable

deictic-ui-reference

Recovering which on-screen UI element a reviewer's deictic remark referred to

Personal & Education unsolved 🎥1 · 1.4m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .33 | GPT5.2·Codex .00 | unresolved-by-Pro

delivery-clip-defect-triage

Multi-defect triage on a delivery clip set — classify 8 short clips into a closed defect set spanning audio, visual, and joint-AV failure modes

Media Production unsolved 🎥8 · 1.3m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .63 | GPT5.2·Codex .00 | unresolved-by-Pro

design-review-approval-audit

PM audits an 83 s Figma-style design-review screen-share with three voices and produces a 4-row committed-vs-claimed approval audit per frame.

Enterprise & Compliance moderate 🎥1 📝1 · 1.0m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM 1.00 | GPT5.2·Codex .00 | multimodal-unlocked

design-review-version-approval

Identify the trustee-recruitment plan agreement moment in a meeting recording (which plan slide + when + verbatim phrase)

Enterprise & Compliance unsolved 🎥1 📝1 · 12.0m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

dialogue-exchange-match

Pick the 4-turn dialogue take whose per-turn speaker AND emotion arc match the director's brief, with stable emotion within each turn

Performance & Coaching unsolved 🔊10 📝2 · 1.8m
🔊 audio📝 text native A
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

dub-speaker-mismatch

Detect intervals where heard voice does not match on-screen speaker in a dubbed multi-character scene

Media Production unsolved 🎥1 · 4.0m
🔊 audio🎥 video native A+V
Pro·T2 .54 | Pro·KIRA .00 | Pro·MM .39 | GPT5.2·Codex .00 | unresolved-by-Pro

emotional-arc-match

Pick the monologue take whose per-sentence emotional arc matches the director's brief

Performance & Coaching unsolved 🔊15 📝2 · 3.7m
🔊 audio📝 text native A
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

external-mic-sync-repair

Sync an external-mic WAV recording to camera footage with non-trivial drift; produce synced MP4 + drift report

Media Production unsolved 🎥1 🔊1 · 2.5m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .15 | unresolved-by-Pro

fugal-subject-entry-labeling

Read a four-voice string-quartet fugue score, listen to the recording, and produce a feedback.json listing each bar where the fugue's subject is stated — labeling which voice (violin_1, violin_2, viola, or cello) carries the subject at that entry. Requires recognizing the subject's melodic shape as it migrates through all four voices during the exposition.

Performance & Coaching frontier 🔊1 🖼1 · 49s
🖼 image🔊 audio native I+A
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .80 | GPT5.2·Codex .00 | multimodal-unlocked

game-alert-mismatch

Find clips where a voiced game alert ('combat engaged', 'fuel low') does not match the visible game event; output a bugs.csv with flagged clip_ids and timestamps

Media Production moderate 🎥8 · 1.3m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM 1.00 | GPT5.2·Codex .00 | multimodal-unlocked

game-outcome-qa

Find the 2 of 6 gameplay outcome clips whose banner outcome does not match the played outcome jingle

Media Production unsolved 🎥8 · 1.7m
🔊 audio🎥 video native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

interview-music-ducking-audit

Family-video music-vs-kid-voice mix audit — flag windows where background music is too hot relative to a child's voice

Personal & Education frontier 🎥3 · 1.5m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .50 | Pro·MM 1.00 | GPT5.2·Codex .00 | multimodal-unlocked

interview-srt-refine

Refine an interview's auto-generated SRT to broadcast-grade quality

Media Production easy 🎥1 📝1 · 42s
🎥 video🔊 audio
Pro·T2 .00 | Pro·KIRA .89 | Pro·MM .96 | GPT5.2·Codex .95 | image-unlocked

invoice-estimate-pdfs-to-xlsx

Extract fields from 5 invoice + 5 estimate PDFs into a target Excel template, including PO-based estimate↔invoice association

Enterprise & Compliance moderate p10 o1 · 0s
📄 document native I
Pro·T2 1.00 | Pro·KIRA 1.00 | Pro·MM 1.00 | GPT5.2·Codex .00 | text-tool-solvable

lecture-demo-clip-extract

Locate the on-screen timestamp window for each of 7 labeled slides in a 25-min CC-BY-NC Python conference talk AND quote a verbatim phrase the presenter says while each slide is visible.

Personal & Education unsolved 🎥1 · 25.0m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

lecturer-visual-term-ref

Resolve a lecturer's deictic references to specific terms in on-screen equations

Personal & Education hard 🎥1 · 4.0m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .50 | Pro·MM .50 | GPT5.2·Codex .00 | multimodal-unlocked

lexical-stress-classification

Per-word lexical-stress correctness classification across 3 L2 English read-aloud recordings; suprasegmental rather than segmental

Performance & Coaching frontier 🔊3 · 19s
🔊 audio native A
Pro·T2 .69 | Pro·KIRA .69 | Pro·MM .50 | GPT5.2·Codex .00 | unresolved-by-Pro

line-failure-annotation

Flag which line indices of a single monologue take diverged from the director's per-line emotion brief

Performance & Coaching hard 🔊1 📝2 · 33s
🔊 audio📝 text native A
Pro·T2 .00 | Pro·KIRA .67 | Pro·MM .75 | GPT5.2·Codex .00 | audio-unlocked

lipsync-drift-correction

Lip-sync drift correction on single-talker clip — measure audio→video offset via joint mouth motion + audio onset analysis, then re-mux with the inverse offset

Media Production moderate 🎥1 o1 · 30s
🎥 video🔊 audio native A+V
Pro·T2 1.00 | Pro·KIRA 1.00 | Pro·MM 1.00 | GPT5.2·Codex .00 | text-tool-solvable

long-form-clip-miner

Mine the strongest short-form clip candidates from a ~49-minute developer lecture, requiring hooks that combine the spoken concept with the distinctive on-screen terminal tokens.

Media Production unsolved 🎥1 · 48.9m
🔊 audio🎥 video native I+A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

mock-call-automation

Write a doorbell-cam automation script that detects when audible footsteps coincide with a visible person entering the camera frame within 2 seconds. Tests the agent's ability to plan a joint audio-visual detection pipeline and write a working, generalising script — not solvable by perceiving sample clips and hardcoding answers, because scoring runs on previously-unseen test clips.

Operations & Research hard 🎥13 📝1 · 2.2m
🔊 audio🎥 video native I+A+V
Pro·T2 .80 | Pro·KIRA .80 | Pro·MM 1.00 | GPT5.2·Codex .50 | text-tool-solvable

multi-mic-bleed-attribution

Identify cross-mic bleed events on a 4-lavalier panel recording, naming source speaker via diagram

Media Production hard 🔊4 🖼1 · 4.7m
🔊 audio🖼 image native I+A
Pro·T2 .20 | Pro·KIRA .50 | Pro·MM .92 | GPT5.2·Codex .00 | audio-unlocked

multi-utterance-pronunciation-errors

Detect and characterise per-phone pronunciation errors across 3 L2 English read-aloud recordings; ARPABET phone substitutions / deletions

Performance & Coaching unsolved 🔊3 · 19s
🔊 audio native A
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .26 | GPT5.2·Codex .00 | unresolved-by-Pro

multicam-active-speaker-cut

Multicam active-speaker cut — given 3 ISO angles + a boom mix, identify per second which camera frames the active speaker and emit a cut list + cut video

Media Production unsolved 🎥3 🔊1 · 2.0m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .83 | GPT5.2·Codex .00 | unresolved-by-Pro

musical-mood-shot-pick

Pick which of 5 candidate silent video shots best matches a reference music cue's pacing and mood.

Media Production moderate 🎥5 🔊1 · 3.5m
🔊 audio🎥 video native I+A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM 1.00 | GPT5.2·Codex 1.00 | audio-unlocked

narration-drift-qc

Find the interval where documentary narration stops matching on-screen footage

Media Production moderate 🎥1 · 3.0m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .99 | GPT5.2·Codex 1.00 | audio-unlocked

narration-mars-rover

Find on-screen captions in a 3:09 NASA Mars-rover panorama narrated video that disagree with the spoken narration. Joint-AV required: must compare visual caption text against audio narration to identify semantic mismatches.

Media Production frontier 🎥1 · 3.2m
🔊 audio🎥 video native I+A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM 1.00 | GPT5.2·Codex .00 | multimodal-unlocked

narration-music-ducking

Mix narration over background music with proper ducking (audio production task)

Media Production frontier 🔊2 · 1.0m
🔊 audio native A
Pro·T2 .25 | Pro·KIRA .25 | Pro·MM .50 | GPT5.2·Codex .50 | multimodal-unlocked

narration-visual-align

Find on-screen captions in a 4:14 NASA Juno narrated video that disagree with the spoken narration. Joint-AV required: must compare visual caption text against audio narration to identify semantic mismatches.

Media Production hard 🎥1 · 4.2m
🔊 audio🎥 video native I+A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .60 | GPT5.2·Codex .00 | audio-unlocked

near-duplicate-frame-dedup

Cluster 20 lecture-slide screenshots into one canonical frame per distinct slide state

Personal & Education easy 🖼11 · 0s
🖼 image native I
Pro·T2 1.00 | Pro·KIRA 1.00 | Pro·MM 1.00 | GPT5.2·Codex 1.00 | text-tool-solvable

ornament-classification-detection

Read a baroque-style keyboard score with explicit ornament symbols (trill tr, mordent squiggle, turn ~), listen to a recording, and produce a feedback.json listing each note where the ornament played differs from the ornament notated on the score.

Performance & Coaching unsolved 🔊1 🖼1 · 19s
🖼 image🔊 audio native I+A
Pro·T2 .00 | Pro·KIRA .40 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

page-photo-to-text

Transcribe a marked-up scan of a printed memo, applying the handwritten edits

Enterprise & Compliance moderate p1 · 0s
🖼 image📄 document native I
Pro·T2 .23 | Pro·KIRA .85 | Pro·MM .00 | GPT5.2·Codex 1.00 | image-unlocked

partial-srt-resync

Resynchronise a drifted SRT after a mid-video cut; produce a corrected WebVTT

Media Production unsolved 🎥1 📝1 · 5.3m
🔊 audio🎥 video
Pro·T2 .00 | Pro·KIRA .70 | Pro·MM .52 | GPT5.2·Codex .52 | unresolved-by-Pro

phone-level-pronunciation-errors

Detect and characterise per-phone pronunciation errors in an L2 English learner's read-aloud recording (ARPABET phone substitutions / deletions, native-rater gold)

Performance & Coaching unsolved 🔊1 · 8s
🔊 audio native A
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .26 | GPT5.2·Codex .00 | unresolved-by-Pro

phoneme-confusion-patterns

Identify recurring phoneme-confusion patterns across 4 utterances from one L2 English speaker; aggregate-level pronunciation coaching workflow

Performance & Coaching unsolved 🔊9 · 1.1m
🔊 audio native A
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

piano-practice-feedback

Read a printed piano sheet-music image, listen to a practice recording, and flag wrong-pitch, missed-note, and timing-error mistakes as a structured feedback JSON.

Media Production frontier 🔊1 🖼1 · 20s
🖼 image🔊 audio native I+A
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .62 | GPT5.2·Codex .00 | unresolved-by-Pro

podcast-episode-assembly

Retrofit a mid-roll sponsor spot at an editorially specified topic break in a published 47-minute podcast episode, apply crossfades, and master to broadcast-spec LUFS and true peak.

Media Production frontier 🔊2 · 48.0m
🔊 audio native A
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | multimodal-unlocked

polyphonic-piano-feedback

Read a polyphonic piano grand-staff sheet music image (treble + bass), listen to a practice recording with seeded mistakes in both hands, and flag per-hand wrong-pitch / missed / timing-error mistakes as structured feedback JSON.

Performance & Coaching unsolved 🔊1 🖼1 · 1.2m
🖼 image🔊 audio native I+A
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

polyrhythm-accuracy-detection

Read a piano score notated with explicit 3:2 polyrhythm (triplet brackets in the right hand over steady quarters in the left), listen to a recording, and produce a feedback.json listing each bar where the right-hand polyrhythm was played sloppily (flattened to a duple rhythm instead of the notated 3-against-2 triplet figure).

Performance & Coaching unsolved 🔊1 🖼1 · 1.2m
🖼 image🔊 audio native I+A
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

pronunciation-error-flagging

Flag nonstandard mispronunciations in a learner's read-aloud recording against a script + closed-set error labels

Performance & Coaching frontier 🔊1 · 19s
🔊 audio native A
Pro·T2 .40 | Pro·KIRA .00 | Pro·MM .38 | GPT5.2·Codex .00 | multimodal-unlocked

proof-step-note

Write a focused study note for step 3 of a 4-step Pythagorean proof. Step→lemma binding is audio-only; lemma slides are visually labelled.

Personal & Education easy 🎥1 · 5.0m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM 1.00 | GPT5.2·Codex 1.00 | audio-unlocked

prosody-multi-dim-selection

Pick the voice-over take matching the director's 3-D brief (actor gender × emotion × intensity) across 18 same-text takes

Performance & Coaching frontier 🔊18 📝2 · 40s
🔊 audio📝 text native A
Pro·T2 1.00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | text-tool-solvable

prosody-take-selection

Pick the voice-over take whose prosody matches the director's brief among 6 same-text takes

Performance & Coaching moderate 🔊6 📝2 · 11s
🔊 audio📝 text native A
Pro·T2 .00 | Pro·KIRA 1.00 | Pro·MM 1.00 | GPT5.2·Codex .00 | image-unlocked

question-statement-intonation

Classify each of 6 short audio clips as a question or statement based on intonation alone (no transcript provided)

Performance & Coaching hard 🔊6 · 23s
🔊 audio native A
Pro·T2 .67 | Pro·KIRA .50 | Pro·MM .83 | GPT5.2·Codex .50 | multimodal-unlocked

quote-clip-retrieval

Find a moment in a 150-second academic lecture excerpt where BOTH a verbal cue is spoken AND a specific visual condition holds, then export that segment as a 3-30s MP4. Joint-AV required: the verifier checks both audio (Whisper substring) and visual (SSIM vs reference frame) coincidence.

Personal & Education frontier 🎥1 · 2.5m
🔊 audio🎥 video native I+A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

receipt-photo-to-json

Extract vendor, date, total, and currency from a folder of real printed receipts

Enterprise & Compliance moderate 🖼6 · 0s
🖼 image📄 document native I
Pro·T2 .00 | Pro·KIRA 1.00 | Pro·MM 1.00 | GPT5.2·Codex 1.00 | image-unlocked

robotics-demo-command-audit

Audit 12 tabletop robot demos for command-vs-action mismatches across object identity, destination, and ignored corrections

Operations & Research unsolved 🎥16 · 2.7m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .69 | GPT5.2·Codex .00 | unresolved-by-Pro

safe-single-cue-keep

Decide which 20-second segments of a 120-second 6-segment Acme CRM tutorial must be removed (cross-modal PII exposure) vs kept (single-cue / mismatch / benign). Same CRM reveal-on-click + 2.2s auto-redact mechanic as T015 but at segment-level EDL granularity.

Enterprise & Compliance frontier 🎥1 📝2 · 1.8m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .50 | GPT5.2·Codex .83 | unresolved-by-Pro

screenshare-deictic-grounding

Ground multiple spoken deictic decisions to specific cards on a Kanban screen-share recording

Enterprise & Compliance frontier 🎥1 · 1.6m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .75 | GPT5.2·Codex .69 | multimodal-unlocked

semantic-chaptering

Chapter a 23-minute academic lecture into its major sections by identifying the start timestamp of each chapter. Tests precise boundary detection in long-form content using both visual slide cues and verbal section transitions.

Personal & Education unsolved 🎥1 · 23.7m
🔊 audio🎥 video native I+A+V
Pro·T2 .13 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

semantic-image-retrieval

Rank the top 3 matches for each natural-language query against a controlled 50-image gallery

Operations & Research hard 🖼50 · 0s
🖼 image
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM 1.00 | GPT5.2·Codex 1.00 | multimodal-unlocked

signal-based-qc-report

Signal-based playback-defect QC on a delivered video

Media Production hard 🎥1 · 5.0m
🎥 video🔊 audio
Pro·T2 1.00 | Pro·KIRA .00 | Pro·MM 1.00 | GPT5.2·Codex 1.00 | text-tool-solvable

slack-action-extraction

Compliance auditor classifies 6 messages in a Slack channel by both auditor-voice and visible message-state. 6-row CSV output.

Enterprise & Compliance hard 🎥1 📝1 · 1.1m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM 1.00 | GPT5.2·Codex .00 | audio-unlocked

speaker-action-attribution

3-person Zoom call with screen-share. Sarah hosts; Mike, Priya, and Sarah herself issue 6 verbal instructions/observations about a CRM dashboard. Sarah executes screen actions in response; some match the speaker's instruction, some don't. Agent emits 6-row CSV: (action_idx, speaker_name, instructed_action, executed_action, match).

Enterprise & Compliance moderate 🎥1 📝1 · 1.2m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM 1.00 | GPT5.2·Codex .00 | audio-unlocked

speaker-roster-identification

Identify which rostered speakers (from 20 voice exemplars) are present in a mixed audio call

Enterprise & Compliance unsolved 🔊41 · 5.0m
🔊 audio native A
Pro·T2 .00 | Pro·KIRA .75 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

speedrun-input-tamper-detect

Game-engine audio-trigger QA: review 5 paired playtest captures (visual frame log + audio event log on the same engine t=0 clock) of a Godot platformer build, find frames where the in-game SFX (jump / attack / coin pickup / hit) is desynced from the visible player action — either audio_only (orphan SFX, audio system fired without trigger) or visual_only (orphan animation, audio system failed to fire). Joint-AV required at sub-second precision: visual + audio are split into two engine-trace files specifically because the QA workflow exposes the engine's separate audio-event and frame-event logs.

Media Production unsolved 🎥5 🔊5 · 8.4m
🔊 audio🎥 video native I+A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

spoken-decision-cell-ref

Log spoken decisions in a quarterly ops review against the spreadsheet cell each decision refers to

Enterprise & Compliance hard 🎥1 · 1.6m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM 1.00 | GPT5.2·Codex .00 | multimodal-unlocked

spoken-vs-displayed-claim

Find on-screen captions in a 5:00 TEDx lecture that disagree with the spoken audio. Joint-AV required: must compare visual caption text against audio speech to identify semantic mismatches.

Media Production unsolved 🎥1 · 5.0m
🔊 audio🎥 video native I+A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

sports-broadcast-events

Log the official events (fouls, baskets) from short basketball broadcast clips. Joint-AV required: each official event needs concurring audio (whistle / ball-net) AND visual (ref-signal graphic / scoreboard change) cues; single-channel cues alone (crowd whistles, replay-graphic score blips) are decoys.

Media Production frontier 🎥5 · 2.1m
🔊 audio🎥 video native I+A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .47 | GPT5.2·Codex .00 | unresolved-by-Pro

stereo-channel-flip-repair

Identify which clips in a 4-clip stereo video batch have their L/R audio channels wired backwards relative to visible source motion, repair only those, and write a batch QC report

Media Production hard 🎥4 · 36s
🎥 video🔊 audio native A+V
Pro·T2 1.00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex 1.00 | text-tool-solvable

stream-alert-ack-audit

Twitch-style live-stream session: streamer plays a game with continuous voice commentary while 8 transient alert overlays slide in (follows, donations, subs, raid, gift). Some are verbally acknowledged (named or unnamed); some are silently ignored. Agent emits 8-row CSV: (alert_idx, alert_type, sender_name, acknowledged).

Media Production frontier 🎥1 📝1 · 2.5m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex 1.00 | unresolved-by-Pro

string-quartet-mistake-attribution

Read a four-staff string quartet score, listen to a rehearsal recording with mistakes seeded in different voices, and produce per-part wrong-pitch / missed / timing-error feedback with the mistake attributed to the correct player (violin_1, violin_2, viola, cello).

Performance & Coaching unsolved 🔊1 🖼1 · 49s
🖼 image🔊 audio native I+A
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

take-tone-reaction-pick

Pick the audition take whose voice tone AND facial affect both convey 'angry but restrained' (joint-AV affect congruence)

Performance & Coaching frontier 🎥8 · 48s
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

tempo-drift-detection

Read a piano score with rehearsal letters A-H marking 8 sections, listen to a recording mixed with a steady metronome click track, and produce a feedback.json listing every section where the pianist's tempo drifts away from the click (rushing or dragging by more than ±3 BPM). Ignore intentional tempo changes written in the score.

Performance & Coaching frontier 🔊1 🖼1 · 1.2m
🖼 image🔊 audio native I+A
Pro·T2 .00 | Pro·KIRA .80 | Pro·MM .00 | GPT5.2·Codex .00 | image-unlocked

traffic-cam-incident-audit

Audit dispatch radio calls over traffic-cam captures of an intersection. For each clip, flag dispatches that don't match the visible event (false calls, wrong vehicle attribution, wrong action, or late). Joint-AV required: each dispatch line names a vehicle, action, and direction; the agent must hold a unified A+V picture across the clip — light state, vehicle identities, and the dispatch claim — to decide whether the call is correct.

Operations & Research unsolved 🎥5 🔊5 · 7.4m
🔊 audio🎥 video native I+A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

travel-clip-retrieval

Find and select the specific travel video clip that matches a described scene from a folder of mixed clips

Personal & Education moderate 🎥10 · 4.8m
🎥 video native V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM 1.00 | GPT5.2·Codex .00 | audio-unlocked

tutorial-edit-recreation

Reproduce a Kdenlive screencast tutorial's edits on the same source media — trim, lower-third title, ducking, 9:16 export.

Media Production hard 🎥2 🔊1 📝1 · 3.1m
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM 1.00 | GPT5.2·Codex .00 | multimodal-unlocked

vfr-drift-repair

Measure the progressive audio-video drift in a 60s recording where post-production introduced a smooth A/V offset that worsens over time. Requires joint audio-visual reasoning to compare lip-motion timing against heard syllable timing at multiple points along the timeline.

Media Production unsolved 🎥1 · 1.0m
🔊 audio🎥 video native I+A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .00 | GPT5.2·Codex .00 | unresolved-by-Pro

violin-intonation-detection

Read a solo violin score, listen to a recording where some notes are played with intonation errors (played sharp or flat by more than ±10 cents), and produce a feedback.json listing each out-of-tune note with a signed cents-error magnitude.

Performance & Coaching unsolved 🔊1 🖼1 · 37s
🖼 image🔊 audio native I+A
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM .02 | GPT5.2·Codex .00 | unresolved-by-Pro

warehouse-sku-pack-audit

Cooking-instructor multi-voice attribution audit — joint-AV: which of two simultaneously-dubbed voices (chef vs director) the visible cook followed (REDESIGNED v2 2026-04-27)

Operations & Research hard 🎥1 · 40s
🎥 video🔊 audio native A+V
Pro·T2 .00 | Pro·KIRA .00 | Pro·MM 1.00 | GPT5.2·Codex 1.00 | multimodal-unlocked