google/gemini-2.5-flash

Pricing: 105 DZD input / 875 DZD output per 1M tokens

Partner

Gemini 2.5 Flash is Google's latest thinking model, designed to tackle increasingly complex problems. It can reason through its thoughts before responding, which improves performance and accuracy. It is best suited to workloads that need a balance of reasoning and speed.

Tags: Public | 1,000,000 | JSON | Function | Multimodal | Gemini

Architecture: Multimodal

Context Window: 1,000,000

Model Information

Description

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It is the next iteration in the Gemini 2.0 series of models, representing a suite of highly-capable, natively multimodal, reasoning models.

Notably, Gemini 2.5 Flash is Google's first fully hybrid reasoning model, giving developers the ability to turn a model's "thinking" on or off via the thinkingConfig parameter. It is optimized for low latency and high-volume traffic, delivering the best balance of price, performance, and well-rounded capabilities.
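The on/off switch described above can be sketched as a small helper that produces the thinkingConfig object. This is a minimal illustration, not part of the SDK: the helper name buildThinkingConfig is hypothetical, and the conventions that a budget of 0 disables thinking and that 24576 is the upper bound for this model are assumptions based on Google's public documentation that should be checked against the current API reference.

```typescript
// Hypothetical helper (not part of @google/genai): turns an on/off flag
// into the thinkingConfig object that is passed under `config`.
interface ThinkingConfig {
  thinkingBudget: number;
}

// Assumed upper bound for Gemini 2.5 Flash; verify against current docs.
const MAX_THINKING_BUDGET = 24576;

function buildThinkingConfig(enabled: boolean, budget: number = 1024): ThinkingConfig {
  if (!enabled) {
    // Assumption: a budget of 0 switches thinking off for this model.
    return { thinkingBudget: 0 };
  }
  // Keep the budget a positive integer within the assumed range.
  const clamped = Math.min(Math.max(Math.round(budget), 1), MAX_THINKING_BUDGET);
  return { thinkingBudget: clamped };
}

console.log(buildThinkingConfig(false));
console.log(buildThinkingConfig(true, 2048));
```

The returned object can be placed under config.thinkingConfig in a generateContent request, so latency-sensitive calls and reasoning-heavy calls can share one code path.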

Architecture

Gemini 2.5 models are sparse mixture-of-experts (MoE) transformer-based models with native multimodal support. Sparse MoE models activate a subset of model parameters per input token by learning to dynamically route tokens to a subset of parameters (experts); this allows them to decouple total model capacity from computation and serving cost per token. They were trained using Google's Tensor Processing Units (TPUs).
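To make the routing idea concrete, here is a toy top-k router. It is only a sketch of the general sparse MoE technique, not Gemini's actual router, whose details are not given here. The names softmax, routeTopK, and moeForward are illustrative, and the gate logits are passed in directly instead of being computed by a learned gating network.

```typescript
// Toy sparse mixture-of-experts layer. Illustrative only.
type Expert = (x: number[]) => number[];

interface Route {
  index: number;  // which expert was selected
  weight: number; // renormalized gate weight for that expert
}

function softmax(logits: number[]): number[] {
  const max = Math.max(...logits); // subtract the max for numerical stability
  const exps = logits.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Pick the k highest-scoring experts and renormalize their weights to sum to 1.
function routeTopK(gateLogits: number[], k: number): Route[] {
  const top = softmax(gateLogits)
    .map((weight, index) => ({ index, weight }))
    .sort((a, b) => b.weight - a.weight)
    .slice(0, k);
  const total = top.reduce((s, r) => s + r.weight, 0);
  return top.map((r) => ({ index: r.index, weight: r.weight / total }));
}

// Only the selected experts run for this token; the others are never called.
function moeForward(token: number[], gateLogits: number[], experts: Expert[], k: number): number[] {
  const out: number[] = token.map(() => 0);
  for (const { index, weight } of routeTopK(gateLogits, k)) {
    const expert = experts[index];
    if (!expert) continue; // ignore logits that have no matching expert
    expert(token).forEach((v, i) => {
      out[i] = (out[i] ?? 0) + weight * v;
    });
  }
  return out;
}
```

With four experts and k = 2, each token pays the compute cost of two experts while the layer holds the parameters of all four, which is the decoupling of capacity from per-token cost described above.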

Context Window & Token Limits

Property	Limit
Maximum Input Tokens	1,048,576 (1M context window)
Maximum Output Tokens	65,535 (Default)
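A client can check these limits before sending a request. The sketch below hard-codes the two values from the table. The function name checkTokenBudget and its return shape are hypothetical, and the token counts are assumed to come from elsewhere (for example, a token-counting call).

```typescript
// Limits copied from the table above.
const MAX_INPUT_TOKENS = 1048576;
const MAX_OUTPUT_TOKENS = 65535;

interface BudgetCheck {
  inputFits: boolean;      // true when the prompt fits in the context window
  maxOutputTokens: number; // requested output length clamped to the model limit
}

// Hypothetical pre-flight check; token counts are supplied by the caller.
function checkTokenBudget(inputTokens: number, requestedOutputTokens: number): BudgetCheck {
  return {
    inputFits: inputTokens >= 0 && inputTokens <= MAX_INPUT_TOKENS,
    maxOutputTokens: Math.min(Math.max(requestedOutputTokens, 1), MAX_OUTPUT_TOKENS),
  };
}

console.log(checkTokenBudget(900000, 100000));
```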

Native Multimodal Capabilities

Gemini 2.5 Flash is natively multimodal, trained concurrently on text, images, audio, video, and code. This allows for seamless cross-modal reasoning without intermediate conversion steps.

Documents & Text:

Process up to 3,000 files per prompt.
Maximum 1,000 pages per file.
Ideal for Large Document Analysis: Process a 1M token archive in one shot without retrieval pipelines or chunking errors.
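Whether an archive can be sent in one shot can be estimated before any upload. The sketch below assumes a rule of thumb of about 4 characters per token, which is only an approximation for English text. The function names are hypothetical, and an exact count should come from the API's token-counting facility.

```typescript
// Assumption: roughly 4 characters per token (heuristic, not exact).
const APPROX_CHARS_PER_TOKEN = 4;
// Context window size quoted in this model card.
const CONTEXT_WINDOW_TOKENS = 1048576;

// Rough token estimate from a character count.
function estimateTokens(charCount: number): number {
  return Math.ceil(charCount / APPROX_CHARS_PER_TOKEN);
}

// True when the estimated token count fits in a single prompt.
function fitsInOneShot(charCount: number): boolean {
  return estimateTokens(charCount) <= CONTEXT_WINDOW_TOKENS;
}

console.log(estimateTokens(400));
console.log(fitsInOneShot(3500000));
```

Because the estimate is coarse, leaving headroom below the limit is a sensible precaution.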

Images (Vision):

Maximum images per prompt: 3,000.
Maximum file size (inline/direct upload): 7 MB.
Maximum file size (Google Cloud Storage): 30 MB.

Video:

Maximum video length (with audio): Approximately 45 minutes.
Maximum video length (without audio): Approximately 1 hour.
Maximum number of videos per prompt: 10.

Audio:

Maximum audio length per prompt: Approximately 8.4 hours (up to 1 million tokens).
Maximum number of audio files per prompt: 1.
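The per-prompt file counts listed above can be enforced client-side before a request is built. This is a minimal sketch: the validateAttachments helper and its input shape are hypothetical, and only the count limits are checked, not file sizes, page counts, or durations.

```typescript
type Modality = "document" | "image" | "video" | "audio";

// Per-prompt file counts taken from the lists above.
const MAX_FILES_PER_PROMPT: Record<Modality, number> = {
  document: 3000,
  image: 3000,
  video: 10,
  audio: 1,
};

// Hypothetical helper: returns one message per exceeded limit, or an empty array.
function validateAttachments(counts: Partial<Record<Modality, number>>): string[] {
  const problems: string[] = [];
  for (const modality of Object.keys(MAX_FILES_PER_PROMPT) as Modality[]) {
    const count = counts[modality] ?? 0;
    const limit = MAX_FILES_PER_PROMPT[modality];
    if (count > limit) {
      problems.push(`${modality}: ${count} exceeds limit of ${limit}`);
    }
  }
  return problems;
}

console.log(validateAttachments({ image: 12, video: 11, audio: 2 }));
```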

Usage & API Integration

Call the model through the official Google Gen AI SDK (the @google/genai package). Thinking is controlled per request by setting a thinking budget in thinkingConfig.

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function runHybridReasoning() {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
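    // Text-only prompt; attach the diagram as an image part for the model to actually see it.
    contents: "Analyze this complex architectural diagram and write a migration plan.",
    config: {
      // Cap the model's reasoning at 1024 thinking tokens for this request.
      thinkingConfig: {
        thinkingBudget: 1024,
      },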
      temperature: 0.4,
    }
  });

  console.log("Model Output:", response.text);
}

// Surface API or network failures instead of leaving an unhandled rejection.
runHybridReasoning().catch(console.error);

Performance Benchmarks

Gemini 2.5 Flash is heavily optimized for speed and large-scale deployment. Below are the average throughput and latency metrics based on independent evaluations:

Metric	Value	Context
Throughput	74 tokens / sec	Average on Google Vertex Global
First Token Latency	0.63 s	Average on Google Vertex Global
E2E Latency	1.70 s	Average on Google AI Studio
Structured Output Error Rate	1.17 %	Highly reliable JSON schema adherence
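The throughput and first-token figures can be combined into a rough estimate of how long a streamed response takes. This is back-of-the-envelope arithmetic that assumes a constant decode rate and uses only the two Vertex Global averages from the table. The function name estimateResponseSeconds is illustrative, and real latency varies with load, prompt size, and thinking budget.

```typescript
// Averages from the table above (Google Vertex Global).
const FIRST_TOKEN_LATENCY_S = 0.63;
const THROUGHPUT_TOKENS_PER_S = 74;

// Estimated wall-clock time: time to first token plus decode time.
function estimateResponseSeconds(outputTokens: number): number {
  return FIRST_TOKEN_LATENCY_S + outputTokens / THROUGHPUT_TOKENS_PER_S;
}

// 740 tokens at 74 tokens/s is 10 s of decoding, plus 0.63 s to first token.
console.log(estimateResponseSeconds(740).toFixed(2)); // prints "10.63"
```

The 1.70 s end-to-end figure is left out because it was measured on a different platform (Google AI Studio).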
