Gemma 4 with quantization-aware training

By CryptoAINews, June 5, 2026

Since releasing Gemma 4 two months ago, we have been continuously working to expand its capabilities. First, we launched Multi-Token Prediction (MTP) to speed up inference, and just a few days ago, we released a 12B model to bridge the gap between our E4B and 26B MoE models.

Today, we’re releasing new checkpoints optimized with Quantization-Aware Training (QAT) to make Gemma 4 even more efficient, so you can run models locally on everyday edge devices and consumer GPUs.

By simulating quantization during training, QAT minimizes quality loss when the model is compressed. This release includes QAT checkpoints for the popular Q4_0 quantization format as well as a novel quantization format specialized for mobile use cases. Using this mobile format, we’ve reduced the memory footprint of Gemma 4 E2B to 1GB. Together, these dramatically reduce memory requirements while preserving the capabilities and quality you expect from Gemma 4.
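To make "simulating quantization during training" concrete, here is a minimal sketch of the general idea. It is not the recipe used for Gemma 4, which the article does not describe. It assumes a common approach: the forward pass sees quantize-then-dequantize ("fake quantized") weights, while the backward pass uses a straight-through estimator so that updates land on the full-precision weights. The function names `fake_quantize` and `qat_step`, the single per-tensor scale, and the toy dot-product model are all illustrative assumptions.

```python
def fake_quantize(weights, bits=4):
    """Quantize-dequantize a list of floats with one symmetric scale.

    The result is still floating point, but it only takes values that
    a signed `bits`-bit integer grid could represent.
    """
    qmax = 2 ** (bits - 1) - 1       # 7 for 4 bits
    qmin = -(2 ** (bits - 1))        # -8 for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)         # nothing to scale
    scale = max_abs / qmax
    out = []
    for w in weights:
        q = max(qmin, min(qmax, round(w / scale)))  # integer code
        out.append(q * scale)                       # back to float
    return out


def qat_step(weights, inputs, target, lr=0.05, bits=4):
    """One SGD step on the toy model y = sum(w_q * x).

    Forward: uses fake-quantized weights, so the loss reflects
    quantization error.
    Backward: straight-through estimator, i.e. d(w_q)/d(w) is treated
    as 1, and the gradient updates the full-precision weights.
    Returns (updated_weights, loss_before_update).
    """
    w_q = fake_quantize(weights, bits)
    pred = sum(wq * x for wq, x in zip(w_q, inputs))
    err = pred - target              # gradient of 0.5 * err**2 w.r.t. pred
    new_weights = [w - lr * err * x for w, x in zip(weights, inputs)]
    return new_weights, 0.5 * err * err
```

Calling `qat_step` repeatedly, for example on `[0.9, -0.35, 0.1]` with inputs `[1.0, 2.0, -1.0]` and target `0.5`, trains weights that already account for rounding, which is the intuition behind QAT losing less quality than quantizing only after training.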

Preserving model quality while making models smaller

Quantization is a key technology for running models on consumer hardware: it reduces their memory footprint while also accelerating decode speed. However, standard Post-Training Quantization (PTQ) often leads to performance degradation. Instead of simply quantizing the model after training, QAT integrates the quantization process directly into training. While PTQ is already effective at preserving quality, our QAT results yield even higher overall quality compared to standard PTQ baselines.

We applied this QAT recipe to the popular Q4_0 format to maximize performance for all the models. For the edge models (E2B and E4B), we rethought how we approach quantization with a special mobile-specialized quantization schema.
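For readers unfamiliar with Q4_0, the sketch below shows a simplified version of the block layout, assuming the format as it is commonly implemented in ggml and llama.cpp: weights are grouped into blocks of 32, each block stores one scale plus a 4-bit code per weight. Here the scale is kept as a Python float rather than fp16 and the codes are not packed into nibbles, so this illustrates the arithmetic only, not the on-disk encoding. The helper names are hypothetical, and the mobile-specialized schema mentioned above is not covered because the article gives no details.

```python
BLOCK_SIZE = 32  # weights per block in this sketch


def quantize_block(block):
    """Return (scale, codes) for one block, with codes in 0..15."""
    assert len(block) == BLOCK_SIZE
    # The signed value with the largest magnitude is mapped to code 0,
    # which dequantizes to -8 * scale.
    extreme = max(block, key=abs)
    scale = extreme / -8.0
    if scale == 0.0:
        return 0.0, [8] * BLOCK_SIZE  # all-zero block
    # x / scale lies in [-8, 8]; adding 8.5 and truncating rounds to
    # the nearest code, and the clamp keeps the result in 4 bits.
    codes = [min(15, max(0, int(x / scale + 8.5))) for x in block]
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate weights from a scale and 4-bit codes."""
    return [(c - 8) * scale for c in codes]


def bits_per_weight(block_size=BLOCK_SIZE, code_bits=4, scale_bits=16):
    """Average storage cost per weight, including the per-block scale."""
    return (block_size * code_bits + scale_bits) / block_size
```

With a 16-bit scale per 32-weight block, `bits_per_weight()` evaluates to 4.5, which is why 4-bit block formats take slightly more than half a byte per parameter.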

Saving on VRAM and Storage

Below are the approximate memory requirements indicating how much VRAM is required to load the models:
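A rough way to sanity-check figures like these is weight-only arithmetic: parameter count times bits per weight, divided by 8. The sketch below assumes a Q4_0-style layout of 32-weight blocks with a 16-bit scale (4.5 bits per weight) and a nominal 12 billion parameters for the 12B model. Both are assumptions for illustration. It ignores the KV cache, activations, runtime overhead, and any tensors kept at higher precision, so real requirements will differ.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Weight-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumed storage costs per weight.
Q4_0_BITS = (32 * 4 + 16) / 32  # 4.5: 32 four-bit codes plus one 16-bit scale
BF16_BITS = 16

# Nominal count for a "12B" model; the exact parameter count may differ.
nominal_params = 12e9

q4_gb = weight_memory_gb(nominal_params, Q4_0_BITS)
bf16_gb = weight_memory_gb(nominal_params, BF16_BITS)
print(f"bf16: {bf16_gb:.2f} GB, Q4_0: {q4_gb:.2f} GB")
```

Under these assumptions the script prints `bf16: 24.00 GB, Q4_0: 6.75 GB`, a reduction of roughly 3.6x for the weights alone.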


