Built on the Gemini 3.1 Flash model, Gemini Live delivers ultra-low latency bidirectional voice streaming via the Google Gemini API. No waiting — converse with Gemini AI as naturally as a phone call. Multilingual, real-time transcription, pure frontend.
First-byte response latency
Bidirectional WebSocket streaming
Languages supported
Pure frontend integration
Google Gemini's latest real-time multimodal Gemini AI model, purpose-built for low-latency streaming interactions via the Gemini Flash API.
The Gemini Flash Live API uses bidirectional WebSocket streaming to deliver a phone-call-like experience. No need to wait for full recordings — speak and process simultaneously.
The 3.1 Flash model delivers first-byte responses in under 500 ms. Streamed audio transmission and playback make Gemini AI replies feel virtually instantaneous.
Both input and output audio are transcribed in real time through the Gemini API, giving you full visibility of the conversation for easy review and record-keeping.
The Gemini model automatically detects spoken language and responds in kind. Seamlessly switch between 40+ languages including English, Chinese, Japanese, and more.
Choose from prebuilt voices like Kore, Charon, and Puck on the Google Gemini Flash platform to give your AI assistant a unique personality that fits your brand.
Connect directly to the Google API from the browser. No server required. Run the entire Gemini Flash API integration with just three files and zero dependencies.
From establishing a Google Gemini API connection to playing back audio — the entire Gemini Live flow is clean and simple.
Open a WebSocket to the Google Gemini Flash endpoint and send your Gemini model configuration — voice, language, and system instructions.
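A minimal sketch of this step is below. The endpoint path and the shape of the setup message follow the publicly documented Live API (`BidiGenerateContent` over WebSocket); treat the exact field names as assumptions to verify against the official docs, and note that the model identifier may change as the preview evolves.

```javascript
// Endpoint for the bidirectional Live API (v1beta); assumption based on
// the public Gemini Live API docs.
const LIVE_ENDPOINT =
  "wss://generativelanguage.googleapis.com/ws/" +
  "google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent";

// Build the first frame sent after the socket opens: model, voice,
// response modality, and system instructions.
function buildSetupMessage(voiceName, systemText) {
  return {
    setup: {
      model: "models/gemini-3.1-flash-live-preview",
      generationConfig: {
        responseModalities: ["AUDIO"],
        speechConfig: {
          voiceConfig: { prebuiltVoiceConfig: { voiceName } },
        },
      },
      systemInstruction: { parts: [{ text: systemText }] },
    },
  };
}

// Open the socket and send the setup frame once connected.
function connect(apiKey, voiceName, systemText) {
  const ws = new WebSocket(`${LIVE_ENDPOINT}?key=${apiKey}`);
  ws.onopen = () =>
    ws.send(JSON.stringify(buildSetupMessage(voiceName, systemText)));
  return ws;
}
```

Because the key travels straight from the browser to Google's endpoint, no backend is involved in this step.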
Use AudioWorklet to capture real-time PCM audio from the mic, resample to 16kHz, and stream it to the Gemini Flash Live API.
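The capture step boils down to two conversions: downsample the Float32 blocks the AudioWorklet produces (assumed here to run at 48 kHz) to 16 kHz, and pack them as signed 16-bit PCM. The sketch below uses simple decimation; a production app may want a low-pass filter first to avoid aliasing. The `realtimeInput` frame shape follows the public Live API and should be treated as an assumption.

```javascript
// Downsample a Float32 audio block to 16 kHz signed 16-bit PCM.
function downsampleTo16k(float32, inputRate = 48000) {
  const ratio = inputRate / 16000;
  const outLength = Math.floor(float32.length / ratio);
  const out = new Int16Array(outLength);
  for (let i = 0; i < outLength; i++) {
    // Clamp to [-1, 1] and scale to the 16-bit integer range.
    const s = Math.max(-1, Math.min(1, float32[Math.floor(i * ratio)]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Wrap a PCM chunk in the realtimeInput frame sent over the socket
// (field names assumed from the public Live API docs).
function buildAudioFrame(pcm16) {
  const bytes = new Uint8Array(pcm16.buffer);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return {
    realtimeInput: {
      mediaChunks: [
        { mimeType: "audio/pcm;rate=16000", data: btoa(binary) },
      ],
    },
  };
}
```

Each frame is then sent with `ws.send(JSON.stringify(buildAudioFrame(pcm)))` as the worklet delivers blocks.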
The Gemini 3.1 Flash model processes your audio stream in real time — listening, understanding, and generating voice replies on the fly.
The browser decodes and plays audio chunks while displaying real-time input/output transcriptions powered by Gemini AI.
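Playback is the mirror image of capture: each incoming chunk is base64 16-bit PCM (24 kHz output is an assumption taken from the public Live API docs), decoded to Float32 and scheduled back to back on an `AudioContext` so playback stays gapless. A sketch:

```javascript
// Decode a base64 chunk of little-endian 16-bit PCM into Float32 samples.
function pcm16ToFloat32(base64) {
  const binary = atob(base64);
  const view = new DataView(new ArrayBuffer(binary.length));
  for (let i = 0; i < binary.length; i++) view.setUint8(i, binary.charCodeAt(i));
  const out = new Float32Array(binary.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 0x8000; // normalize to [-1, 1)
  }
  return out;
}

// Schedule each chunk immediately after the previous one so the
// streamed reply plays without gaps.
let nextStart = 0;
function playChunk(ctx, base64) {
  const samples = pcm16ToFloat32(base64);
  const buffer = ctx.createBuffer(1, samples.length, 24000);
  buffer.getChannelData(0).set(samples);
  const src = ctx.createBufferSource();
  src.buffer = buffer;
  src.connect(ctx.destination);
  nextStart = Math.max(nextStart, ctx.currentTime);
  src.start(nextStart);
  nextStart += buffer.duration;
}
```

Resetting `nextStart` to the current time when the model is interrupted is what makes barge-in feel immediate.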
Zero dependencies, zero build step. All you need is a Gemini API key and an HTTP server. Deploy to Vercel, Netlify, or any static host.
Gemini Flash Live natively supports streaming voice, eliminating the multi-step STT → LLM → TTS pipeline entirely.
| Feature | Traditional (STT → LLM → TTS) | Gemini Flash Live |
|---|---|---|
| Latency | 2-5 seconds (3 serial API calls) | < 500ms end-to-end |
| Architecture | 3 separate APIs + backend orchestration | Single WebSocket connection |
| Deployment | Requires backend server | Pure frontend, zero servers |
| Interruption | Difficult to support real-time barge-in | Native support, smooth & natural |
| Context | STT → LLM loses tone & nuance | End-to-end understanding preserves emotion |
| Code Size | Hundreds of lines + multiple dependencies | ~400 lines, zero dependencies |
Wondering how Gemini 3.1 Flash Live stacks up against other Gemini models and competing platforms? Here's a quick overview.
Gemini 3.1 Flash is the latest generation optimized for speed and cost. It succeeds Gemini 2.5 Flash and Gemini 2.0 Flash, offering significantly lower latency. For lighter workloads, Gemini Flash Lite (also known as Gemini Flash 3.1 Lite) provides an even more cost-effective option. For maximum capability, Gemini Pro remains the flagship reasoning model. Additionally, Flash Express offers a high-throughput variant for batch and non-interactive use cases.
The Gemini 2.5 Flash and Gemini 2.5 Flash API are excellent for text and Gemini Flash Image generation tasks, but they don't support real-time voice streaming. Gemini 3.1 Flash Live is purpose-built for bidirectional audio, making it the right choice for voice-first applications. If you're using the Google Gemini Flash 2.5 API today, moving to its live successor adds real-time voice with minimal code changes.
Other real-time AI options include MiniMax M2.7 for multilingual voice synthesis and OpenAI's Realtime API. For developers who want to route across multiple providers, LiteLLM offers a unified proxy that supports the Gemini API alongside OpenAI, Anthropic, and others — useful for A/B testing latency and cost. All of these can be deployed to Vercel or similar edge platforms.
Enter your Gemini API key and click "Start Chat" to talk to Gemini AI in real time. Your key is only used client-side and is never sent to any third-party server.
Enter your API Key and click "Start Chat"
Once connected, speak directly or type a message below. Requires HTTPS or localhost access.
Gemini Flash Live is Google Gemini's real-time multimodal AI model purpose-built for low-latency, bidirectional streaming interactions. It receives live audio over WebSocket, understands on the fly, and replies with streamed voice — delivering a natural, phone-call-like conversation experience. The current Gemini model identifier is gemini-3.1-flash-live-preview.
Gemini 3.1 Flash Live pricing follows Google's pay-as-you-go model. The Gemini API free tier provides generous rate limits at no cost — perfect for prototyping and personal projects. For production workloads, Gemini pricing is based on audio duration and token usage. The Gemini Flash pricing tier is significantly cheaper than Gemini Pro, making it ideal for real-time voice applications. Check Google's pricing page for current rates.
Visit Google AI Studio, sign in with your Google account, and create a free Google Gemini API key. The free tier includes a generous allowance for the Gemini Flash API; upgrade to a paid plan for higher quotas.
Yes. This demo is a pure frontend application — your Google API key is sent directly from the browser to Google's servers via WebSocket. It never passes through any third-party backend. We do not store, transmit, or log your key.
Gemini 3.1 Flash is the latest generation, succeeding Gemini 2.5 Flash and Gemini 2.0 Flash. Key improvements include native real-time voice (Gemini Live), lower latency, and better multilingual performance. The Gemini 2.5 Flash API and Google Gemini Flash 2.5 remain excellent for text and Gemini Flash Image generation, but only the Gemini 3.1 family supports live bidirectional audio streaming.
Gemini Flash Lite (also called Gemini Flash 3.1 Lite) is a smaller, cheaper variant of the Gemini Flash model — great for lightweight tasks where cost is a priority. Flash Express is a high-throughput variant optimized for batch processing and non-interactive workloads. For real-time voice, you need the full Gemini 3.1 Flash Live model.
Gemini Flash Live offers a more generous free tier and broader language support than OpenAI's Realtime API, with comparable latency. MiniMax M2.7 is strong in multilingual voice synthesis but lacks Gemini's multimodal reasoning. For developers routing across providers, LiteLLM supports the Gemini API alongside OpenAI and Anthropic via a unified proxy.
All modern browsers: Chrome 66+, Edge 79+, Safari 14.1+, Firefox 76+. Requires AudioWorklet API and WebSocket support. The page must be served over HTTPS or localhost (file:// is not supported).
Absolutely. Use the systemInstruction config to define the Gemini AI's role, language style, and response rules. You can also choose different prebuilt voices (Kore, Charon, Puck, etc.) and configure response modalities (audio, text, or both).
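A sketch of a customized session config is below. The field names follow the setup message of the public Live API and should be verified against the official docs; the voice name and instruction text are just examples.

```javascript
// Example session config: persona, voice, and response modalities.
const sessionConfig = {
  model: "models/gemini-3.1-flash-live-preview",
  generationConfig: {
    // Audio, text, or both, per your app's needs.
    responseModalities: ["AUDIO", "TEXT"],
    speechConfig: {
      voiceConfig: { prebuiltVoiceConfig: { voiceName: "Charon" } },
    },
  },
  systemInstruction: {
    parts: [
      {
        text:
          "You are a friendly support agent. Answer in the user's " +
          "language and keep replies under two sentences.",
      },
    ],
  },
};
```

This object is sent inside the `setup` frame as the first message after the WebSocket opens.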
Simply copy the three project files and serve them over HTTP — deploy to Vercel, Netlify, or any static host. The core Gemini Flash API integration is under 400 lines with zero external dependencies. For full API reference, see the official Google docs.
Grab a free Gemini API key and build your own Google Gemini-powered real-time voice AI assistant in under 3 minutes.