AI Model Rankings for February 24, 2026: 10 New Categories Launch — Here’s Who Leads Every One

AI Rankings — February 24, 2026

The Big Picture

Today’s update is a landmark one: the AI Arena leaderboard just expanded from a handful of tracked categories to a full ten, giving us the most comprehensive snapshot yet of who’s winning across the entire AI API landscape. Whether you’re generating code, editing images, converting speech to text, or building AI-powered search — there’s now a definitive, vote-backed ranking for every major capability you’re shipping to production.

What Changed Today

  • Code Generation — New category with 10 ranked models (2,524+ votes on leader)
  • Image Editing — New category with 10 ranked models (184,593+ votes on leader)
  • Image-to-Video — New category with 10 ranked models (13,668+ votes on leader)
  • AI Search — New category with 10 ranked models (11,062+ votes on leader)
  • Speech-to-Text — New category with 10 ranked models
  • Text Generation — New category with 10 ranked models
  • Text-to-Image — New category with 10 ranked models
  • Text-to-Speech — New category with 10 ranked models
  • Text-to-Video — New category with 10 ranked models
  • Vision — New category with 10 ranked models

Category Breakdown: Who Leads and Why It Matters

Code Generation

Claude Opus 4.6 from Anthropic takes the #1 spot with an ELO of 1561, built on over 2,500 head-to-head votes in agentic coding tasks. Anthropic is now running the table here — they hold four of the top six positions, with Claude Sonnet 4.6 (ELO 1524) offering a compelling fast/cheap alternative and OpenAI’s GPT-5.2 High (ELO 1471) sitting back in fifth place. For developers building coding agents, the practical call is clear: use claude-opus-4.6 for complex multi-step work ($5/$25 per MTok), drop down to claude-sonnet-4.6 ($3/$15 per MTok) for agentic loops where speed matters, and consider gemini-3-flash-preview (~$0.15/$0.60 per MTok) when you need frontier-class code generation on a budget.
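
For reference, here is a minimal sketch of that call using the Anthropic Python SDK. The model IDs are taken from the leaderboard listing (claude-opus-4.6, claude-sonnet-4.6), so verify them against Anthropic's current model documentation before shipping:

```python
# Minimal coding-agent request via the Anthropic Messages API.
# Assumes ANTHROPIC_API_KEY is set; model IDs below mirror the leaderboard listing.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4.6",   # swap to "claude-sonnet-4.6" for cheaper, faster agentic loops
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": "Write a Python function that deduplicates a list while preserving order.",
    }],
)
print(response.content[0].text)
```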

Image Editing

GPT-Image-1 (listed as chatgpt-image-latest-high-fidelity) from OpenAI leads with an ELO of 1413, backed by a massive 184,593 votes — the most statistically confident result across any category today. Google’s Gemini 3 Pro Image is breathing down its neck at ELO 1395, and xAI’s Grok Imagine Image Pro has emerged as a dark horse at ELO 1330. In the API, the model to call is gpt-image-1, priced from $0.011 to $0.25 per image depending on quality and resolution, which makes it viable for production pipelines in e-commerce product photography, marketing asset generation, and content editing workflows.
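
As a rough sketch (not production code), an edit call against the OpenAI Images API looks like this; quality and size parameters are omitted and should be tuned against the per-image pricing tiers:

```python
# Image-editing sketch with the OpenAI Python SDK.
# Assumes OPENAI_API_KEY is set; gpt-image-1 is the API model named in the ranking.
from openai import OpenAI

client = OpenAI()

result = client.images.edit(
    model="gpt-image-1",
    image=open("product_photo.png", "rb"),
    prompt="Replace the background with a plain white studio backdrop.",
)
print(result.data[0].b64_json[:60], "...")  # base64-encoded edited image
```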

Image-to-Video

Grok Imagine Video 720p from xAI edges out Google’s Veo 3.1 by a single ELO point (1402 vs 1401), essentially a statistical tie at the top of image-to-video. Google dominates the depth of this category with five models in the top ten spanning various speed/quality/resolution tradeoffs, with veo-3.1 priced at $0.40/second and veo-3.1-fast at $0.15/second. For developers who need to animate product images or reference frames, this is now a two-horse race between xAI and Google, with Alibaba’s open-source Wan2.5 I2V (ELO 1339) as the best self-hostable alternative.
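
To make the per-second pricing concrete, here is a back-of-the-envelope cost helper using the rates quoted above. It is illustrative only; actual billing may add per-request minimums or resolution surcharges:

```python
# Clip-cost estimate at the per-second rates quoted in this ranking.
# Rates come from the article; real invoices may differ.
RATES_PER_SECOND = {
    "veo-3.1": 0.40,
    "veo-3.1-fast": 0.15,
}

def batch_cost(model: str, clip_seconds: int, num_clips: int) -> float:
    """Estimated cost of generating num_clips clips of clip_seconds each."""
    return RATES_PER_SECOND[model] * clip_seconds * num_clips

# 100 eight-second product animations:
print(f"veo-3.1:      ${batch_cost('veo-3.1', 8, 100):.2f}")       # $320.00
print(f"veo-3.1-fast: ${batch_cost('veo-3.1-fast', 8, 100):.2f}")  # $120.00
```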

AI Search

Gemini 3 Flash Grounding from Google takes the top position with an ELO of 1224, narrowly beating its own Pro variant (1219) and OpenAI’s GPT-5.2 Search (1218). The remarkable story here is cost: Flash Grounding costs roughly $0.35 per 1,000 search-grounded requests on top of the already cheap ~$0.15/$0.60 per MTok base token cost, a fraction of what the Pro variant costs. For builders shipping RAG pipelines, research assistants, or anything that needs current web data, gemini-3-flash-preview with the Google Search tool enabled is the clear default. Anthropic’s Claude Opus 4.5 Search (ELO 1179) rounds out the options if you’re already in the Anthropic ecosystem.
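
A minimal sketch of a grounded request with the google-genai Python SDK looks like the following; the model ID gemini-3-flash-preview is taken from the leaderboard listing, so confirm it against Google's current model list:

```python
# Search-grounded request with the google-genai SDK (Google Search tool enabled).
# Assumes a Gemini API key is configured; the model ID mirrors the leaderboard listing.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="What changed in the latest stable release of PostgreSQL?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```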

Speech-to-Text

This new category now tracks 10 models for transcription and speech recognition. Developers building voice interfaces, transcription services, or meeting summarization tools should consult the full leaderboard for the latest rankings. We’ll be publishing a detailed deep-dive on this category as vote counts mature.

Text Generation

The foundational LLM category is now formally tracked with 10 ranked models. This is the broadest category covering general-purpose chat, writing, analysis, and reasoning, and will be one of the most closely watched rankings going forward. Stay tuned for detailed breakdowns as we integrate the full reference data.

Text-to-Image

Image generation has its own dedicated ranking separate from image editing, with 10 models now tracked. This distinction matters for developers: generation (creating from a text prompt) and editing (modifying an existing image) are different use cases with different leaders. Check the full leaderboard for current standings.
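
The split is literal at the API level: OpenAI, for instance, exposes generation and editing as separate endpoints. A minimal generation sketch, reusing gpt-image-1 purely for illustration (the text-to-image leader in the summary table below may map to a different model ID):

```python
# Text-to-image generation: create an image from a prompt alone
# (contrast with the images.edit call shown in the Image Editing section above).
from openai import OpenAI

client = OpenAI()

generated = client.images.generate(
    model="gpt-image-1",   # illustrative; check the leaderboard for the current text-to-image leader
    prompt="Isometric illustration of a solar-powered weather station.",
)
print(generated.data[0].b64_json[:60], "...")
```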

Text-to-Speech

Voice synthesis is now a tracked category with 10 ranked models. For developers building voice agents, audiobook generation, or accessibility features, this category will be essential for picking the right API. We’ll publish detailed pricing and quality comparisons as data stabilizes.

Text-to-Video

Distinct from image-to-video, this category covers generating video directly from text prompts — 10 models are now ranked. This is one of the fastest-moving areas in generative AI, with new entrants appearing monthly. Watch this space closely.

Vision

Visual understanding (image analysis, OCR, visual question answering, and document parsing) now has its own dedicated 10-model ranking. For developers building document processing pipelines, visual QA, or multimodal agents, this category separates visual comprehension ability from image generation quality.

Current Leaders at a Glance

Category | #1 Model | Provider | Score
Code Generation | Claude Opus 4.6 | Anthropic | 1561 ELO
Image Editing | GPT-Image-1 (High Fidelity) | OpenAI | 1413 ELO
Image-to-Video | Grok Imagine Video 720p | xAI | 1402 ELO
AI Search | Gemini 3 Flash Grounding | Google | 1224 ELO
Speech-to-Text | GPT-4o Transcribe | OpenAI | 5.4% WER
Text Generation | Claude Opus 4.6 | Anthropic | 1504 ELO
Text-to-Image | GPT-Image-1.5 (High Fidelity) | OpenAI | 1247 ELO
Text-to-Speech | Vocu V3.0 | Vocu | 1612 ELO
Text-to-Video | Veo 3.1 Audio 1080p | Google | 1392 ELO
Vision | Gemini 3 Pro | Google | 1288 ELO

So What?

The big takeaway from today’s expansion: no single provider dominates everything. Anthropic owns code generation, OpenAI leads image editing, xAI just barely edges out Google in image-to-video, and Google runs away with AI search on a cost-efficiency basis that’s hard to argue with. If you’re an AI builder still defaulting to one provider for all your API calls, today’s data makes the case for a multi-provider strategy stronger than ever. Practically, here are your immediate action items: swap your coding agents to claude-opus-4.6 if you haven’t already, evaluate gemini-3-flash-preview with grounding for any search or RAG pipeline (the cost savings over alternatives are significant), and start benchmarking gpt-image-1 for any image editing workflows you’re currently handling with older models. We’ll be filling in the remaining category details, including text generation, vision, TTS, and STT leaders, as the full reference data publishes. Bookmark this page and check back daily.
