MISE: Designing a Voice-First Kitchen Agent

Turning “What can I cook?” into a fluid, multimodal conversation

Project: MISE
Context: Google Gemini Live Agent Challenge
Technology: Gemini Live 2.5 Flash native audio, Google Cloud, Vertex AI, Cloud Run and Firebase

Role: Product concept, Human-AI Experience design and conversational architecture

Challenge

MISE is a voice-first, vision-enabled kitchen companion designed for the moment when someone has ingredients but no clear idea what to cook.

Instead of asking people to search recipes, complete forms or move between apps, MISE lets them show the agent what is available and talk naturally about what matters, such as cost, health or both. The agent identifies ingredients, suggests suitable meals, highlights anything missing and can connect those items to a shopping basket.
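The flow described above (identified ingredients in, ranked meal suggestions and missing items out) can be illustrated with a minimal sketch. This is not MISE's actual implementation: the `Recipe` fields, the `suggest_meals` and `build_basket` helpers, the `max_missing` threshold and the sample scoring values are all hypothetical, and the ingredient list is assumed to have already been extracted from the camera and voice input.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Recipe:
    name: str
    ingredients: FrozenSet[str]
    cost_per_serving: float  # hypothetical figure, e.g. in pounds
    health_score: int        # hypothetical 1-5 rating, higher is healthier


Suggestion = Tuple[Recipe, List[str]]  # (recipe, missing ingredients)


def suggest_meals(
    available: Iterable[str],
    recipes: Iterable[Recipe],
    priority: str = "both",   # assumed values: "cost", "health" or "both"
    max_missing: int = 2,     # assumed cut-off for "suitable" meals
) -> List[Suggestion]:
    """Rank recipes by fewest missing ingredients, then by the stated priority."""
    have = {item.strip().lower() for item in available}
    suggestions: List[Suggestion] = []
    for recipe in recipes:
        missing = sorted(recipe.ingredients - have)
        if len(missing) <= max_missing:
            suggestions.append((recipe, missing))

    def sort_key(entry: Suggestion):
        recipe, missing = entry
        if priority == "cost":
            preference = (recipe.cost_per_serving,)
        elif priority == "health":
            preference = (-recipe.health_score,)
        else:
            preference = (-recipe.health_score, recipe.cost_per_serving)
        return (len(missing), *preference)

    return sorted(suggestions, key=sort_key)


def build_basket(suggestion: Suggestion, confirmed: bool) -> List[str]:
    """Return the missing items only when the person has explicitly agreed."""
    _, missing = suggestion
    return list(missing) if confirmed else []
```

In this sketch nothing reaches the basket unless `confirmed` is true, which is one simple way to express the idea that the agent proposes and the person decides.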

The project explored a broader Human-AI Experience question:

How can an AI agent reduce decision fatigue without taking control away from the person using it?

The Problem

Most recipe services begin after someone has decided what they want to make. Retail services begin after they know what they need to buy.

Neither adequately supports the uncertain moment before those decisions: standing in front of the fridge at the end of the day, with limited attention, competing priorities and no plan.

This creates several connected problems:

  • Ingredients already at home can go unused.

  • Budget or health intentions can be displaced by convenience.

  • Search creates more cognitive work when the user is already tired.

  • Existing voice-commerce experiences often execute commands without helping form the underlying intention.

MISE was designed to intervene at that decision point.


My Contribution

The trailer was built through a structured AI production process, using CapCut Video Studio as the core creative workspace.

1. Story Development

The concept began with one line:

“My mother said that side of the family only leaves you two things. Money. Or trouble.”

That line shaped the emotional centre of the trailer: family history, suspicion, legacy, and a woman stepping into a story that started long before her.

From there, I built the trailer around a confrontation between two characters in a surreal desert world. The aim was not to explain the full story, but to make the audience feel there was a larger mystery behind it.

2. Visual Direction

The visual language was designed to feel cinematic, surreal, and slightly dangerous.

The desert created scale and isolation. The pink horses added strangeness and myth. The vintage cars suggested family history, money, arrival, and escape.

Every visual choice had to serve the same emotional idea: inheritance as something beautiful, unresolved, and potentially dangerous.

3. Storyboarding and Scene Generation

Using CapCut Video Studio’s canvas-based workflow, I planned the trailer as a sequence of emotional beats rather than isolated AI shots.

Scenes were generated with Dreamina Seedance 2.0, with prompts focused on camera movement, atmosphere, character presence, and continuity.

The challenge was keeping the world coherent across shots. Each scene needed to feel like it belonged to the same story, not a separate visual experiment.

4. Edit and Trailer Structure

The final edit was shaped around tension, withholding, and momentum.

The dialogue created the emotional thread, while the visuals expanded the world around it. The trailer was designed to reveal just enough to make the audience curious without over-explaining the story.

The edit turned the generated scenes into a cinematic sequence with rhythm, pressure, and narrative shape.


Creative Insights

A strong AI trailer needs a question at its centre. For Inheritance, the question was simple: what did she really inherit?

The tool does not create the story for you. CapCut Video Studio can give you the workspace and Seedance can generate the scenes, but the creative direction has to come from the filmmaker.

Visual consistency starts with story consistency. The desert, horses, cars and characters worked because they all belonged to the same emotional idea: family legacy as something strange, beautiful and dangerous.

Surreal details need emotional logic. Pink horses are strange. Family inheritance is understandable. Put them together and the audience has something to hold onto.

AI video still needs direction. Camera movement, pacing, shot order and silence all matter. The tool can generate motion, but it cannot decide what a scene should mean.

A trailer is not a summary. It is an invitation. The job is to make people want the rest of the story.

Inheritance tested whether a new AI production workspace could support story, mood, pacing and cinematic world-building inside a single creative workflow.

The result is an original AI trailer that uses generative tools not as a shortcut, but as a production system guided by a clear creative point of view.

Get in touch.

hello@dalveenmahal.co.uk

Dalveen Mahal © 2026.

Instagram
