Built on the Gemini 3.1 Flash model, Gemini Live delivers ultra-low latency bidirectional voice streaming via the Google Gemini API. No waiting — converse with Gemini AI as naturally as a phone call. Multilingual, real-time transcription, pure frontend.
First-byte response latency
Bidirectional WebSocket streaming
Languages supported
Pure frontend integration
Google Gemini's latest real-time multimodal Gemini AI model, purpose-built for low-latency streaming interactions via the Gemini Flash API.
The Gemini Flash Live API uses bidirectional WebSocket streaming to deliver a phone-call-like experience. No need to wait for full recordings — speak and process simultaneously.
The 3.1 Flash model delivers first-byte responses in under 500 ms. Streamed audio transmission and playback make Gemini AI replies feel virtually instantaneous.
Both input and output audio are transcribed in real time through the Gemini API, giving you full visibility of the conversation for easy review and record-keeping.
The Gemini model automatically detects spoken language and responds in kind. Seamlessly switch between 40+ languages including English, Chinese, Japanese, and more.
Choose from prebuilt voices like Kore, Charon, and Puck on the Google Gemini Flash platform to give your AI assistant a unique personality that fits your brand.
Connect directly to the Google API from the browser. No server required. Run the entire Gemini Flash API integration with just three files and zero dependencies.
From establishing a Google Gemini API connection to playing back audio — the entire Gemini Live flow is clean and simple.
Open a WebSocket to the Google Gemini Flash endpoint and send your Gemini model configuration — voice, language, and system instructions.
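A minimal sketch of this step is below. The endpoint path and the shape of the setup message follow the publicly documented Live API (`BidiGenerateContent` over WebSocket); treat the exact field names as assumptions to verify against the official docs, and note that the model identifier may change as the preview evolves.

```javascript
// Endpoint for the bidirectional Live API (v1beta); assumption based on
// the public Gemini Live API docs.
const LIVE_ENDPOINT =
  "wss://generativelanguage.googleapis.com/ws/" +
  "google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent";

// Build the first frame sent after the socket opens: model, voice,
// response modality, and system instructions.
function buildSetupMessage(voiceName, systemText) {
  return {
    setup: {
      model: "models/gemini-3.1-flash-live-preview",
      generationConfig: {
        responseModalities: ["AUDIO"],
        speechConfig: {
          voiceConfig: { prebuiltVoiceConfig: { voiceName } },
        },
      },
      systemInstruction: { parts: [{ text: systemText }] },
    },
  };
}

// Open the socket and send the setup frame once connected.
function connect(apiKey, voiceName, systemText) {
  const ws = new WebSocket(`${LIVE_ENDPOINT}?key=${apiKey}`);
  ws.onopen = () =>
    ws.send(JSON.stringify(buildSetupMessage(voiceName, systemText)));
  return ws;
}
```

Because the key travels straight from the browser to Google's endpoint, no backend is involved in this step.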
Use AudioWorklet to capture real-time PCM audio from the mic, resample to 16kHz, and stream it to the Gemini Flash Live API.
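The capture step boils down to two conversions: downsample the Float32 blocks the AudioWorklet produces (assumed here to run at 48 kHz) to 16 kHz, and pack them as signed 16-bit PCM. The sketch below uses simple decimation; a production app may want a low-pass filter first to avoid aliasing. The `realtimeInput` frame shape follows the public Live API and should be treated as an assumption.

```javascript
// Downsample a Float32 audio block to 16 kHz signed 16-bit PCM.
function downsampleTo16k(float32, inputRate = 48000) {
  const ratio = inputRate / 16000;
  const outLength = Math.floor(float32.length / ratio);
  const out = new Int16Array(outLength);
  for (let i = 0; i < outLength; i++) {
    // Clamp to [-1, 1] and scale to the 16-bit integer range.
    const s = Math.max(-1, Math.min(1, float32[Math.floor(i * ratio)]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Wrap a PCM chunk in the realtimeInput frame sent over the socket
// (field names assumed from the public Live API docs).
function buildAudioFrame(pcm16) {
  const bytes = new Uint8Array(pcm16.buffer);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return {
    realtimeInput: {
      mediaChunks: [
        { mimeType: "audio/pcm;rate=16000", data: btoa(binary) },
      ],
    },
  };
}
```

Each frame is then sent with `ws.send(JSON.stringify(buildAudioFrame(pcm)))` as the worklet delivers blocks.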
The Gemini 3.1 Flash model processes your audio stream in real time — listening, understanding, and generating voice replies on the fly.
The browser decodes and plays audio chunks while displaying real-time input/output transcriptions powered by Gemini AI.
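Playback is the mirror image of capture: each incoming chunk is base64 16-bit PCM (24 kHz output is an assumption taken from the public Live API docs), decoded to Float32 and scheduled back to back on an `AudioContext` so playback stays gapless. A sketch:

```javascript
// Decode a base64 chunk of little-endian 16-bit PCM into Float32 samples.
function pcm16ToFloat32(base64) {
  const binary = atob(base64);
  const view = new DataView(new ArrayBuffer(binary.length));
  for (let i = 0; i < binary.length; i++) view.setUint8(i, binary.charCodeAt(i));
  const out = new Float32Array(binary.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 0x8000; // normalize to [-1, 1)
  }
  return out;
}

// Schedule each chunk immediately after the previous one so the
// streamed reply plays without gaps.
let nextStart = 0;
function playChunk(ctx, base64) {
  const samples = pcm16ToFloat32(base64);
  const buffer = ctx.createBuffer(1, samples.length, 24000);
  buffer.getChannelData(0).set(samples);
  const src = ctx.createBufferSource();
  src.buffer = buffer;
  src.connect(ctx.destination);
  nextStart = Math.max(nextStart, ctx.currentTime);
  src.start(nextStart);
  nextStart += buffer.duration;
}
```

Resetting `nextStart` to the current time when the model is interrupted is what makes barge-in feel immediate.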
Zero dependencies, zero build step. All you need is a Gemini API key and an HTTP server. Deploy to Vercel, Netlify, or any static host.
Gemini Flash Live natively supports streaming voice, eliminating the multi-step STT → LLM → TTS pipeline entirely.
| Feature | Traditional (STT → LLM → TTS) | Gemini Flash Live |
|---|---|---|
| Latency | 2-5 seconds (3 serial API calls) | < 500ms end-to-end |
| Architecture | 3 separate APIs + backend orchestration | Single WebSocket connection |
| Deployment | Requires backend server | Pure frontend, zero servers |
| Interruption | Difficult to support real-time barge-in | Native support, smooth & natural |
| Context | STT → LLM loses tone & nuance | End-to-end understanding preserves emotion |
| Code Size | Hundreds of lines + multiple dependencies | ~400 lines, zero dependencies |
Wondering how Gemini 3.1 Flash Live stacks up against other Gemini models and competing platforms? Here's a quick overview.
Gemini 3.1 Flash is the latest generation optimized for speed and cost. It succeeds Gemini 2.5 Flash and Gemini 2.0 Flash, offering significantly lower latency. For lighter workloads, Gemini Flash Lite (also known as Gemini Flash 3.1 Lite) provides an even more cost-effective option. For maximum capability, Gemini Pro remains the flagship reasoning model. Additionally, Flash Express offers a high-throughput variant for batch and non-interactive use cases.
The Gemini 2.5 Flash and Gemini 2.5 Flash API are excellent for text and Gemini Flash Image generation tasks, but they don't support real-time voice streaming. Gemini 3.1 Flash Live is purpose-built for bidirectional audio, making it the right choice for voice-first applications. If you're using the Google Gemini Flash 2.5 API today, moving to its live successor adds real-time voice with minimal code changes.
Other real-time AI options include MiniMax M2.7 for multilingual voice synthesis and OpenAI's Realtime API. For developers who want to route across multiple providers, LiteLLM offers a unified proxy that supports the Gemini API alongside OpenAI, Anthropic, and others — useful for A/B testing latency and cost. All of these can be deployed to Vercel or similar edge platforms.
Enter your Gemini API key and click "Start Chat" to talk to Gemini AI in real time. Your key is only used client-side and is never sent to any third-party server.
Enter your API Key and click "Start Chat"
Once connected, speak directly or type a message below. Requires HTTPS or localhost access.
Gemini Flash Live is Google Gemini's real-time multimodal AI model purpose-built for low-latency, bidirectional streaming interactions. It receives live audio over WebSocket, understands on the fly, and replies with streamed voice — delivering a natural, phone-call-like conversation experience. The current Gemini model identifier is gemini-3.1-flash-live-preview.
Gemini 3.1 Flash Live pricing follows Google's pay-as-you-go model. The Gemini API free tier provides generous rate limits at no cost — perfect for prototyping and personal projects. For production workloads, Gemini pricing is based on audio duration and token usage. The Gemini Flash pricing tier is significantly cheaper than Gemini Pro, making it ideal for real-time voice applications. Check Google's pricing page for current rates.
Visit Google AI Studio, sign in with your Google account, and create a free Google Gemini API key. The free tier includes a generous allowance for the Gemini Flash API; upgrade to a paid plan for higher quotas.
Yes. This demo is a pure frontend application — your Google API key is sent directly from the browser to Google's servers via WebSocket. It never passes through any third-party backend. We do not store, transmit, or log your key.
Gemini 3.1 Flash is the latest generation, succeeding Gemini 2.5 Flash and Gemini 2.0 Flash. Key improvements include native real-time voice (Gemini Live), lower latency, and better multilingual performance. The Gemini 2.5 Flash API and Google Gemini Flash 2.5 remain excellent for text and Gemini Flash Image generation, but only the Gemini 3.1 family supports live bidirectional audio streaming.
Gemini Flash Lite (also called Gemini Flash 3.1 Lite) is a smaller, cheaper variant of the Gemini Flash model — great for lightweight tasks where cost is a priority. Flash Express is a high-throughput variant optimized for batch processing and non-interactive workloads. For real-time voice, you need the full Gemini 3.1 Flash Live model.
Gemini Flash Live offers a more generous free tier and broader language support than OpenAI's Realtime API, with comparable latency. MiniMax M2.7 is strong in multilingual voice synthesis but lacks Gemini's multimodal reasoning. For developers routing across providers, LiteLLM supports the Gemini API alongside OpenAI and Anthropic via a unified proxy.
All modern browsers: Chrome 66+, Edge 79+, Safari 14.1+, Firefox 76+. Requires AudioWorklet API and WebSocket support. The page must be served over HTTPS or localhost (file:// is not supported).
Absolutely. Use the systemInstruction config to define the Gemini AI's role, language style, and response rules. You can also choose different prebuilt voices (Kore, Charon, Puck, etc.) and configure response modalities (audio, text, or both).
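A sketch of a customized session config is below. The field names follow the setup message of the public Live API and should be verified against the official docs; the voice name and instruction text are just examples.

```javascript
// Example session config: persona, voice, and response modalities.
const sessionConfig = {
  model: "models/gemini-3.1-flash-live-preview",
  generationConfig: {
    // Audio, text, or both, per your app's needs.
    responseModalities: ["AUDIO", "TEXT"],
    speechConfig: {
      voiceConfig: { prebuiltVoiceConfig: { voiceName: "Charon" } },
    },
  },
  systemInstruction: {
    parts: [
      {
        text:
          "You are a friendly support agent. Answer in the user's " +
          "language and keep replies under two sentences.",
      },
    ],
  },
};
```

This object is sent inside the `setup` frame as the first message after the WebSocket opens.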
Simply copy the three project files and serve them over HTTP — deploy to Vercel, Netlify, or any static host. The core Gemini Flash API integration is under 400 lines with zero external dependencies. For full API reference, see the official Google docs.
Grab a free Gemini API key and build your own Google Gemini-powered real-time voice AI assistant in under 3 minutes.