Model Library
Browse and deploy state-of-the-art AI models through the DevUp Gateway.
Gemini 2.5 Flash is Google's latest thinking model, designed to tackle increasingly complex problems. It can reason through its thoughts before responding, which improves performance and accuracy. It is best suited to workloads that need to balance reasoning and speed.

Model Page: Gemini 2.5 Flash
Resources and Technical Documentation:
Authors: Google DeepMind
Gemini 2.5 Flash is Google's state-of-the-art workhorse model, designed for advanced reasoning, coding, mathematics, and scientific tasks. It is the next iteration in the Gemini 2.0 series of models, a suite of highly capable, natively multimodal reasoning models.
Notably, Gemini 2.5 Flash is Google's first fully hybrid reasoning model, giving developers the ability to turn a model's "thinking" on or off via the thinkingConfig parameter. It is optimized for low latency and high-volume traffic, delivering the best balance of price, performance, and well-rounded capabilities.
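The on/off switch described above can be sketched as a small helper that builds the request config. This is a minimal sketch, not SDK code: buildGenerationConfig is a hypothetical name, and it assumes that a thinkingBudget of 0 disables thinking for this model while a positive value caps the number of thinking tokens.

```javascript
// Hypothetical helper (not part of the SDK): builds the `config` object
// that would be passed to ai.models.generateContent().
// Assumption: thinkingBudget 0 turns thinking off; a positive integer
// caps the number of tokens the model may spend on thinking.
function buildGenerationConfig({ thinking, budget = 1024, temperature = 0.4 }) {
  if (thinking && (!Number.isInteger(budget) || budget <= 0)) {
    throw new RangeError("budget must be a positive integer when thinking is enabled");
  }
  return {
    temperature,
    thinkingConfig: { thinkingBudget: thinking ? budget : 0 },
  };
}

// Low-latency request: thinking off.
const fastConfig = buildGenerationConfig({ thinking: false });
// Harder task: allow up to 2048 thinking tokens.
const deepConfig = buildGenerationConfig({ thinking: true, budget: 2048 });

console.log(fastConfig.thinkingConfig.thinkingBudget); // 0
console.log(deepConfig.thinkingConfig.thinkingBudget); // 2048
```

Keeping the toggle in one place makes it easy to route simple, high-volume requests to the non-thinking path and reserve a budget for harder prompts.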
Gemini 2.5 models are sparse mixture-of-experts (MoE) transformer-based models with native multimodal support. Sparse MoE models activate a subset of model parameters per input token by learning to dynamically route tokens to a subset of parameters (experts); this allows them to decouple total model capacity from computation and serving cost per token. They were trained using Google's Tensor Processing Units (TPUs).
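To make the routing idea concrete, here is a toy sketch of top-k expert routing, one common way to build a sparse MoE layer. It is illustrative only: the source does not say how many experts Gemini uses or how its router works, and the experts, router scores, and function names below are all made up for the example.

```javascript
// Numerically stable softmax over router scores.
function softmax(scores) {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Select the k highest-scoring experts and renormalize their weights to sum to 1.
function routeTopK(routerScores, k) {
  const top = softmax(routerScores)
    .map((p, index) => ({ index, p }))
    .sort((a, b) => b.p - a.p)
    .slice(0, k);
  const total = top.reduce((acc, e) => acc + e.p, 0);
  return top.map((e) => ({ index: e.index, weight: e.p / total }));
}

// Run only the selected experts and mix their outputs by routing weight.
// The unselected experts are never called, so they add no compute for this token.
function moeForward(token, experts, routerScores, k = 2) {
  return routeTopK(routerScores, k).reduce(
    (out, { index, weight }) => out + weight * experts[index](token),
    0
  );
}

// Toy experts: plain functions standing in for neural sub-networks.
const experts = [(x) => x + 1, (x) => x * 2, (x) => x * x, (x) => -x];
// In a real model the scores come from a learned router; here they are hard-coded.
const routing = routeTopK([0.1, 2.0, 1.5, -1.0], 2);
console.log(routing.map((r) => r.index)); // [ 1, 2 ]
```

With four experts and k = 2, each token pays for two expert evaluations even though the layer holds four experts' worth of parameters, which is the capacity/cost decoupling the paragraph describes.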
| Property | Limit |
|---|---|
| Maximum Input Tokens | 1,048,576 (1M context window) |
| Maximum Output Tokens | 65,535 (Default) |
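
A pre-flight check against these limits might look like the sketch below. The constants are copied from the table; checkTokenLimits is a hypothetical helper, and it assumes the caller already has a token count for the input (how that count is obtained is outside this example).

```javascript
// Limits copied from the table above.
const MAX_INPUT_TOKENS = 1_048_576;
const MAX_OUTPUT_TOKENS = 65_535;

// Hypothetical helper: reports whether a request fits within the model limits.
function checkTokenLimits(inputTokens, requestedOutputTokens = MAX_OUTPUT_TOKENS) {
  const errors = [];
  if (inputTokens > MAX_INPUT_TOKENS) {
    errors.push(`input exceeds the limit by ${inputTokens - MAX_INPUT_TOKENS} tokens`);
  }
  if (requestedOutputTokens > MAX_OUTPUT_TOKENS) {
    errors.push(`requested output exceeds the limit by ${requestedOutputTokens - MAX_OUTPUT_TOKENS} tokens`);
  }
  return { ok: errors.length === 0, errors };
}

console.log(checkTokenLimits(900_000, 8_192).ok); // true
console.log(checkTokenLimits(1_048_577).errors);  // [ 'input exceeds the limit by 1 tokens' ]
```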
Gemini 2.5 Flash is natively multimodal, trained concurrently on text, images, audio, video, and code. This allows for seamless cross-modal reasoning without intermediate conversion steps.
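As an illustration of a cross-modal request, the sketch below builds a contents value that pairs an image with a text prompt. It assumes the Gen AI SDK's part format, in which inline media is sent as base64 under inlineData; buildImagePrompt is a hypothetical helper, and the four placeholder bytes stand in for a real image file.

```javascript
// Hypothetical helper: builds a `contents` array with one image part and one text part.
// Assumption: inline media is passed as a base64 string in an `inlineData` part.
function buildImagePrompt(imageBytes, mimeType, prompt) {
  return [
    { inlineData: { mimeType, data: Buffer.from(imageBytes).toString("base64") } },
    { text: prompt },
  ];
}

// Placeholder bytes (the first four bytes of the PNG signature), not a valid image.
const placeholderBytes = new Uint8Array([0x89, 0x50, 0x4e, 0x47]);
const contents = buildImagePrompt(placeholderBytes, "image/png", "Describe this diagram.");

console.log(contents[0].inlineData.data); // iVBORw==
console.log(contents[1].text);            // Describe this diagram.
```

The resulting array can be passed as the contents field of a generateContent request, so the image and the question reach the model in a single call.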
Call the model through the official Google Gen AI SDK. To enable reasoning (thinking) for a request, set a thinking budget in thinkingConfig.

    import { GoogleGenAI } from "@google/genai";

    // The API key is read from the GEMINI_API_KEY environment variable.
    const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

    async function runHybridReasoning() {
      const response = await ai.models.generateContent({
        model: "gemini-2.5-flash",
        contents: "Analyze this complex architectural diagram and write a migration plan.",
        config: {
          // Allow up to 1,024 tokens of internal reasoning before the answer
          thinkingConfig: {
            thinkingBudget: 1024,
          },
          temperature: 0.4,
        },
      });
      console.log("Model Output:", response.text);
    }

    // Log request failures instead of leaving the promise rejection unhandled.
    runHybridReasoning().catch(console.error);

Gemini 2.5 Flash is heavily optimized for speed and large-scale deployment. The table below lists average throughput and latency figures from independent evaluations:
| Metric | Value | Context |
|---|---|---|
| Throughput | 74 tokens / sec | Average on Google Vertex Global |
| First Token Latency | 0.63 s | Average on Google Vertex Global |
| E2E Latency | 1.70 s | Average on Google AI Studio |
| Structured Output Error Rate | 1.17 % | Highly reliable JSON schema adherence |
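
As a rough worked example, the first-token latency and throughput figures can be combined into a back-of-the-envelope estimate of how long a streamed response takes. This is a simplification, not a measured result: it assumes a steady decode rate equal to the average throughput, and estimateResponseSeconds is a hypothetical helper.

```javascript
// Averages copied from the table above (Google Vertex Global rows).
const FIRST_TOKEN_LATENCY_S = 0.63;
const THROUGHPUT_TOKENS_PER_S = 74;

// Estimated wall-clock time: wait for the first token, then stream the
// remaining output at the average throughput.
function estimateResponseSeconds(outputTokens) {
  return FIRST_TOKEN_LATENCY_S + outputTokens / THROUGHPUT_TOKENS_PER_S;
}

// 0.63 + 500 / 74 ≈ 7.39 seconds for a 500-token answer.
console.log(estimateResponseSeconds(500).toFixed(2)); // 7.39
```

Real latencies vary with region, load, prompt length, and whether thinking is enabled, so treat the result as an order-of-magnitude guide.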