As AI models evolve from simple chatbots into reasoning agents that write code, use tools and solve complex problems, traditional benchmarks are no longer enough. The community needs dynamic, rigorous evaluations built by the people who use these models in the real world.
That’s why we launched Kaggle Benchmarks. Since then, the global AI community has created more than 10,000 evaluation tasks, building the reliable, transparent public leaderboards that help labs measure and accelerate AI progress.
Today, we’re taking the next step by launching local development for Kaggle Benchmarks.
Use Kaggle Benchmarks from your local development environment
Until now, creating evaluation tasks meant working exclusively in Kaggle’s web-based notebook editor instead of the stack developers prefer to build with.
Our new update lets developers create, validate, push, run and download tasks directly from their local development environments, such as Antigravity, VSCode, Cursor and coding agents. This update is designed to meet developers where they work, making the journey from idea to evaluation faster and more intuitive.
Build evaluation tasks in natural language with AI coding agents
Native improvement additionally unlocks a robust new workflow: utilizing AI coding brokers to put in writing benchmark duties by means of the write-kaggle-benchmarks skill. This talent includes a set of structured directions that teaches a coding agent how you can construct duties utilizing the kaggle-benchmarks SDK and the Kaggle CLI.
To add this skill to your agent, simply ask your agent to:
Once installed, you can describe an evaluation in plain language and get a working task on Kaggle. For example, you can tell your agent:
These powerful capabilities are driven by the new commands that we have built for Benchmarks in the Kaggle CLI.
