Integrate Real-Time Audio-to-Audio Workflows with the Gemini 3.1 Live API

Google / DeepMind · AI Model Update · 2026-03-26 · notable

Briefing for: Engineering

What happened

Google released gemini-3.1-flash-live-preview, a native audio-to-audio (A2A) model built for real-time dialogue. Served through the Live API, it processes audio directly instead of chaining separate speech-to-text and text-to-speech steps.

Why it matters

Removing the intermediate transcription and synthesis steps cuts latency in voice applications, enabling fluid, natural conversations that were difficult to achieve with cascaded models. Because audio is processed end to end, developers can build voice-first apps that handle interruptions and emotional nuance better.

What this enables

If you build voice assistants, consider replacing high-latency STT-LLM-TTS pipelines with a single A2A model.
If you develop real-time communication tools, test the new Live API to implement low-latency conversational features.
If you use Gemini for audio processing, evaluate this model for applications requiring immediate verbal feedback.
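
For teams testing the Live API, one early task is cutting captured audio into small frames so it can be streamed as the user speaks. The sketch below shows only that client-side step, using the Python standard library. It assumes 16 kHz, 16-bit mono PCM input and 100 ms frames; both are assumptions to confirm against the Live API documentation. The helper names chunk_pcm and sine_pcm are hypothetical, and the network call that would send each frame is deliberately left out.

```python
import math
import struct

SAMPLE_RATE_HZ = 16_000  # assumed input sample rate; confirm in the Live API docs
BYTES_PER_SAMPLE = 2     # assumed 16-bit mono PCM


def chunk_pcm(pcm: bytes, chunk_ms: int = 100) -> list[bytes]:
    """Split raw PCM bytes into frames of chunk_ms milliseconds each.

    The final frame may be shorter if the audio length is not a multiple
    of the frame size.
    """
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    # Bytes per frame = samples per second * bytes per sample * seconds.
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]


def sine_pcm(duration_s: float, freq_hz: float = 440.0) -> bytes:
    """Generate a half-amplitude test tone as 16-bit little-endian mono PCM."""
    n = int(SAMPLE_RATE_HZ * duration_s)
    samples = [
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE_HZ))
        for t in range(n)
    ]
    return struct.pack(f"<{n}h", *samples)


if __name__ == "__main__":
    audio = sine_pcm(1.0)  # stand-in for microphone capture
    frames = chunk_pcm(audio)
    # In a real client, each frame would be handed to the SDK's streaming
    # call as soon as it is available; that call is not shown here.
    print(f"{len(frames)} frames of up to {len(frames[0])} bytes")
```

Smaller frames reduce the delay before the first audio reaches the model but increase the number of messages sent, so frame size is worth tuning during evaluation.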
