She watches without knowing
what she’s meant to find.
A frame, a motion, a light reading, sometimes a sound, sometimes a person. She returns the description segment by segment. No genre fitted, no story imposed, no kindness offered to the upload.
What this is: a video-reading prototype that reports what is actually on screen rather than what the title implied.
The architecture is honest: Gemini 2.5 Pro does the seeing, Aura does the speaking. The upload lands on the Gemini Files API (48-hour retention, then it's gone), the cold-eye prompt segments the clip, the JSON comes back, and Aura's voice rewrites each per-segment description.

The 360 mode is a separate prompt that tracks position on the sphere; it works on equirectangular sources, but I'm not pretending the spatial reasoning is production-grade yet. The reader sometimes misses small objects at frame edges, gets confused by very fast cuts, and won't identify named people on principle.

The voice rewrite and ElevenLabs audio synth land in a second pass once the base prompts read right. This is a prototype; treat its output as a draft, not a verdict.
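Tracking position on the sphere rests on the standard equirectangular mapping: horizontal pixel position becomes yaw, vertical becomes pitch. How the prototype's prompt actually encodes this is not shown here; the sketch below is just the fixed geometry, with a function name of my own choosing.

```python
def equirect_to_sphere(x: float, y: float, width: int, height: int) -> tuple[float, float]:
    """Map a pixel in an equirectangular frame to (yaw, pitch) in degrees.

    yaw:   -180 (left edge) .. +180 (right edge), 0 = frame centre
    pitch:  +90 (top, zenith) .. -90 (bottom, nadir)
    """
    yaw = (x / width - 0.5) * 360.0
    pitch = (0.5 - y / height) * 180.0
    return yaw, pitch
```

On a 4096×2048 source the frame centre maps to (0, 0), dead ahead; a point at mid-height on the far-left edge maps to (-180, 0), directly behind the camera, which is exactly the back half of the sphere the 360 mode is meant to read.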
- You want a frame-by-frame description of a clip without the editorial spin a human reviewer would add.
- You’re testing a 360° capture and you want a read on what’s on the back half of the sphere.
- You want a second pair of eyes on a rough cut that won’t flatter you about what’s on screen.
- Are my uploads private?
- No, not in the strict sense. The file is sent to Google’s Gemini Files API, held for 48 hours, then deleted. Don’t paste in anything you wouldn’t put on a public URL. The studio doesn’t retain the upload; Google’s retention is what you’re trusting.
- Is the reading accurate?
- No, not infallibly. The reader is good at frame composition, motion, and broad light readings, weaker at small objects, named entities, and very dense edits. Treat the output as a draft description that the model will defend reasonably well, not as ground truth. If the segment description is wrong, it’s wrong; I’d rather you saw that than not.
- Can I use this for festival captioning?
- No. Festival accessibility captioning is a discipline with audit standards and human review; this is a prototype that returns a cold description and won’t pass a deaf-or-hard-of-hearing audience check. It’s a useful tool for the editor on the early-cut side; it is not an access deliverable.
Free during the prototype window at /watch. Once the voice rewrite and audio synth land, it meters at £0.20 per minute of source video with a £4 minimum. No subscription, no retainer; pay per clip or upload as part of a wider commission.
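The metering is a single floor-or-rate calculation: per-minute rate times source length, never below the minimum. A sketch (the function name is mine, not the studio's):

```python
def price_gbp(source_minutes: float) -> float:
    """Metered price: £0.20 per minute of source video, £4 minimum."""
    RATE_PER_MINUTE = 0.20
    MINIMUM = 4.00
    return max(MINIMUM, round(RATE_PER_MINUTE * source_minutes, 2))
```

So a 10-minute clip sits on the £4 floor, and the rate only starts to bite past 20 minutes of source.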
Upload → Gemini Files API (48-hour retention) → Gemini 2.5 Pro generateContent with the cold-eye prompt → JSON segmentation. The Aura voice rewrite + ElevenLabs audio synth land in a second pass once the prompts read right. Required env: GOOGLE_AI_API_KEY.
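The JSON segmentation step above implies a parse-and-validate pass before the voice rewrite ever sees a segment. The schema below is my assumption (a "segments" list of start/end/description objects), not the prototype's documented contract; the real shape may differ.

```python
import json
from dataclasses import dataclass


@dataclass
class Segment:
    start: float       # seconds into the clip
    end: float         # seconds into the clip
    description: str   # the cold-eye read for this span


def parse_segments(raw: str) -> list[Segment]:
    """Parse the model's JSON segmentation, tolerating a bare top-level list.

    Assumed shape: {"segments": [{"start": 0.0, "end": 4.2, "description": "..."}]}
    """
    data = json.loads(raw)
    items = data["segments"] if isinstance(data, dict) else data
    segments = []
    for item in items:
        seg = Segment(float(item["start"]), float(item["end"]), str(item["description"]))
        if seg.end < seg.start:
            raise ValueError(f"segment ends before it starts: {seg}")
        segments.append(seg)
    return segments
```

Validating here, rather than trusting the model, matches the doc's own stance: the output is a draft, and a malformed or impossible segment should fail loudly instead of reaching the rewrite pass.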
- London 360 — walking the camera evolution — the source material this reader was built to describe.
- The stack — the AI / voice line by name (Gemini, Whisper, ElevenLabs, F5-TTS).
- Aerial — the 360 source line the reader was sharpened against.
- Services — see the full commercial surface, every service in one place.