Today, we’re introducing two new service tiers for the Gemini API: Flex and Priority. These options give you granular control over cost and reliability through a single, unified interface.
As AI evolves from simple chat into complex, autonomous agents, developers typically need to manage two distinct types of workload:
- Background tasks: High-volume workflows like data enrichment or “thinking” processes that don’t need immediate responses.
- Interactive tasks: User-facing features like chatbots and copilots where high reliability is required.
Until now, supporting both meant splitting your architecture between standard synchronous serving and the asynchronous Batch API. Flex and Priority bridge this gap: you can now route background jobs to Flex and interactive jobs to Priority, both through standard synchronous endpoints. This eliminates the complexity of async job management while giving you the economic and performance benefits of specialized tiers.
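The routing decision described above can be sketched as a small helper. The tier names `"flex"` and `"priority"` are taken from this announcement, but treating them as plain string values passed per request is an assumption; confirm the exact parameter values against the official API reference.

```python
# Sketch: route each job to a service tier based on whether a user is
# waiting on the response. Tier names are assumptions from this article.
def pick_service_tier(interactive: bool) -> str:
    """Interactive jobs go to Priority; background jobs go to Flex."""
    return "priority" if interactive else "flex"

# A chatbot turn needs reliability; a bulk enrichment job tolerates latency.
print(pick_service_tier(interactive=True))   # user-facing copilot
print(pick_service_tier(interactive=False))  # background data enrichment
```

Because both tiers share the same synchronous endpoints, this one parameter is the only thing that changes between the two call paths.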
Flex Inference: scale innovation for 50% less
Flex Inference is our new cost-optimized tier, designed for latency-tolerant workloads without the overhead of batch processing.
- 50% cost savings: Pay half the price of the Standard API in exchange for lower request criticality (reduced reliability and added latency).
- Synchronous simplicity: Unlike the Batch API, Flex is a synchronous interface. You use the same familiar endpoints without managing input/output files or polling for job completion.
- Ideal use cases: Background CRM updates, large-scale research simulations, and agentic workflows where the model “browses” or “thinks” in the background.
Get started quickly by configuring the service_tier parameter in your request:
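A minimal sketch of such a request body follows. The endpoint URL, model name, and payload layout here are illustrative assumptions in a generic REST chat style, not the documented Gemini API schema; only the `service_tier` parameter name comes from this announcement, so check the official API reference for the exact request format.

```python
# Illustrative only: URL, model name, and payload shape are assumptions.
import json

API_URL = "https://example.com/v1/chat/completions"  # placeholder endpoint

def build_request(prompt: str, service_tier: str = "flex") -> dict:
    """Assemble a request body with the service_tier parameter set."""
    return {
        "model": "gemini-latest",        # hypothetical model name
        "service_tier": service_tier,    # "flex" or "priority"
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_request("Summarize yesterday's CRM changes.", service_tier="flex")
print(json.dumps(body, indent=2))
```

Switching the same call from a background job to an interactive one is then just `service_tier="priority"`; no batch files or polling loops are involved.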
