Today we’re releasing Gemini Embedding 2, our first fully multimodal embedding model built on the Gemini architecture, available in Public Preview through the Gemini API and Vertex AI.
Expanding on our earlier text-only foundation, Gemini Embedding 2 maps text, images, videos, audio, and documents into a single, unified embedding space, and captures semantic intent across more than 100 languages. This simplifies complex pipelines and improves a wide range of multimodal downstream tasks, from Retrieval-Augmented Generation (RAG) and semantic search to sentiment analysis and data clustering.
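Because every modality lands in the same vector space, cross-modal retrieval reduces to nearest-neighbor search over one index. The sketch below illustrates the idea with placeholder vectors; the embedding values and dimensionality are illustrative, not real model output.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder embeddings: in practice each vector would come from the model,
# one per item, regardless of whether the item is text, an image, or audio.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=768)          # embedding of a text query
image_embs = rng.normal(size=(3, 768))   # embeddings of three images

# Cross-modal retrieval: rank images against the text query in the shared space.
scores = [cosine_similarity(text_emb, img) for img in image_embs]
best = int(np.argmax(scores))
```

The same ranking loop works unchanged whether the candidates are images, video clips, or documents, which is the practical payoff of a unified space.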
New modalities and flexible output dimensions
The model is based on Gemini and leverages its best-in-class multimodal understanding to create high-quality embeddings across:
- Text: supports an expansive context of up to 8,192 input tokens
- Images: processes up to 6 images per request, supporting PNG and JPEG formats
- Videos: supports up to 120 seconds of video input in MP4 and MOV formats
- Audio: natively ingests and embeds audio data without needing intermediate text transcriptions
- Documents: directly embeds PDFs up to 6 pages long
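The per-modality limits above can be checked client-side before a request is sent. The helper below is a hypothetical sketch: the function name, field names, and structure are ours for illustration, not part of the actual API.

```python
# Hypothetical client-side validator for the limits listed above.
# The names and structure are illustrative, not the real API schema.
LIMITS = {
    "max_images_per_request": 6,
    "max_video_seconds": 120,
    "max_pdf_pages": 6,
}

def validate_request(images: int = 0, video_seconds: float = 0.0,
                     pdf_pages: int = 0) -> list:
    """Return a list of limit violations; an empty list means the request is OK."""
    errors = []
    if images > LIMITS["max_images_per_request"]:
        errors.append(f"too many images: {images} > 6")
    if video_seconds > LIMITS["max_video_seconds"]:
        errors.append(f"video too long: {video_seconds}s > 120s")
    if pdf_pages > LIMITS["max_pdf_pages"]:
        errors.append(f"PDF too long: {pdf_pages} pages > 6")
    return errors
```

Validating early like this avoids a round trip to the service for requests that would be rejected anyway.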
Beyond processing one modality at a time, the model natively understands interleaved input, so you can pass multiple modalities (e.g., image + text) in a single request. This lets the model capture the complex, nuanced relationships between different media types, unlocking more accurate understanding of real-world data.
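Conceptually, an interleaved request is just an ordered list of heterogeneous parts. The sketch below shows that shape with a made-up structure; the dict keys and wire format are assumptions for illustration, not the actual request schema.

```python
# Illustrative interleaved request: one ordered list of parts mixing modalities.
# The dict structure is a sketch, not the exact API wire format.
def make_interleaved_request(parts: list) -> dict:
    """Bundle heterogeneous parts into a single embedding request payload."""
    return {"contents": parts}

request = make_interleaved_request([
    {"type": "image", "uri": "product_photo.jpg"},       # hypothetical image part
    {"type": "text", "text": "red running shoe, side view"},
])
```

Because both parts travel in one request, the model can embed the pair jointly rather than averaging two independent embeddings after the fact.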
