Monday, 16 February 2026

The Week of the Three AI Musketeers


THE BELAMY

Weekly Newsletter of AIM

Monday, Feb 16, 2026 | By Mohit Pandey



Last week, the AI race stopped being about demos and benchmark slides. It turned into a distribution fight across terminals, desktops, APIs, and everyday developer workflows. And the contenders weren't startups. The battle was between the incumbents: OpenAI, Anthropic, and Google.

OpenAI fired the opening salvo with GPT-5.3-Codex and then quickly followed up with Codex-Spark, a real-time coding model that generates more than 1,000 tokens per second.

Meanwhile, Anthropic doubled down on Claude Code with Opus 4.6, agent teams, and a one-million-token context window. 

Google responded with an upgraded Gemini 3 Deep Think, now posting elite results across math, programming, and science benchmarks, including an Elo of 3455 on Codeforces.

This is no longer ChatGPT vs Claude vs Gemini. It's OpenAI vs Anthropic vs Google across the full AI agent stack.


The Coding Front Line: Claude Code vs Codex

Let's first look at the "Indian IT killer".

Anthropic's Claude Code built early credibility by embedding itself directly within the developer workflow rather than acting like a chat window. It runs primarily in the terminal and integrates with VS Code, JetBrains, and Slack. 

It can read repositories, propose plans, generate diffs, run tests, and commit changes. The plan-first workflow gives developers checkpoints before execution.

With Claude Opus 4.6, the case got stronger. It now scores 80.8% on SWE-bench Verified and 72.7% on OSWorld-Verified, ahead of Codex on desktop agent tasks. 

The one million token context window in beta also means it can track massive codebases and long documents with less drift. For teams working on complex repositories, it is a big deal.

Anthropic also introduced agent teams in a research preview. Instead of one reasoning engine attempting to ingest everything, a lead agent delegates tasks to smaller parallel subagents. Each handles a scoped slice of context, then reports back. 

Less overload, more coordination.
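The lead-and-subagent pattern can be sketched in a few lines of Python. This is only an illustration of the idea, not Anthropic's implementation: the names `run_subagent` and `lead_agent`, the one-task-per-file split, and the line-counting stand-in for a model call are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task: dict) -> dict:
    """Hypothetical subagent: sees only its own slice of context."""
    # Stand-in for a model call; here we just count lines in the slice.
    lines = task["context"].splitlines()
    return {"name": task["name"], "summary": f"{len(lines)} lines reviewed"}


def lead_agent(repo: dict, max_workers: int = 4) -> list:
    """Hypothetical lead agent: splits work, delegates, collects reports."""
    # One scoped task per file, so no subagent ingests the whole repository.
    tasks = [{"name": path, "context": text} for path, text in repo.items()]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() runs subagents in parallel and returns reports in task order.
        return list(pool.map(run_subagent, tasks))


if __name__ == "__main__":
    repo = {
        "lexer.rs": "fn lex() {}\nfn peek() {}",
        "parser.rs": "fn parse() {}",
    }
    for report in lead_agent(repo):
        print(report["name"], "->", report["summary"])
```

The point of the design is that each worker receives a small, scoped context and the lead only ever handles short reports, which is where the "less overload" claim comes from.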

In one experiment, 16 Claude agents ran in parallel for two weeks and built a Rust-based C compiler with nearly 100,000 (one lakh) lines of code, costing just under $20,000 in API usage.
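For a rough sense of scale, the figures from that experiment can be turned into unit costs. The sketch below uses the rounded numbers quoted above (16 agents, 14 days, one lakh or 100,000 lines, $20,000), so the results are approximate upper bounds rather than reported values; the function name `unit_costs` is made up for the example.

```python
# Rounded figures from the experiment; the real values are slightly lower
# ("nearly" 100,000 lines, "just under" $20,000).
AGENTS = 16
DAYS = 14
LINES_OF_CODE = 100_000
API_COST_USD = 20_000


def unit_costs(cost: float, lines: int, agents: int, days: int) -> dict:
    """Break a total API bill into per-line and per-agent figures."""
    return {
        "per_line": cost / lines,
        "per_agent": cost / agents,
        "per_agent_day": cost / (agents * days),
    }


if __name__ == "__main__":
    for name, value in unit_costs(API_COST_USD, LINES_OF_CODE, AGENTS, DAYS).items():
        print(f"{name}: ${value:,.2f}")
```

On these rounded inputs, that works out to roughly $0.20 per line of code and about $89 per agent per day.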

 

OpenAI answered with scale and speed.

GPT-5.3-Codex leads on Terminal-Bench 2.0 with 77.3%, compared with Claude's 65.4%. 

Inside OpenAI, leadership claims Codex now writes most of the code and handles a large share of debugging and operational tasks. CEO Sam Altman says Codex crossed one million downloads in its first week, with usage growing more than 60% week over week.

Then came Codex-Spark.

It is a smaller research-preview model built for real-time iteration. It supports a 128,000-token context window and delivers more than 1,000 tokens per second on Cerebras hardware. It is optimised for targeted edits, logic adjustments, and quick interface refinement.

During preview, its usage limits are separate from standard quotas. On Terminal-Bench 2.0, it scores 58.4%, lower than the full Codex model, but it finishes tasks significantly faster.
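To see why generation speed changes how a tool feels, it helps to convert tokens per second into waiting time. The sketch below is only arithmetic: the 1,000 tokens-per-second figure is the lower bound quoted for Codex-Spark, while the 400-token edit size and the 100 tokens-per-second comparison rate are hypothetical values chosen for contrast, not measurements of any model.

```python
def stream_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to emit `tokens` at a steady generation rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


SPARK_RATE = 1_000    # lower bound quoted for Codex-Spark
BASELINE_RATE = 100   # hypothetical slower model, for contrast only
EDIT_TOKENS = 400     # hypothetical size of a targeted edit

if __name__ == "__main__":
    print(f"Spark:    {stream_seconds(EDIT_TOKENS, SPARK_RATE):.1f} s")
    print(f"Baseline: {stream_seconds(EDIT_TOKENS, BASELINE_RATE):.1f} s")
```

Under these assumptions the same edit arrives in under half a second instead of several seconds, which is the difference between an interactive loop and a wait.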

OpenAI is effectively splitting the workflow into two modes. Frontier Codex handles long-running autonomous tasks that run for hours or days. Codex-Spark focuses on live collaboration.




Google Enters From Above

While OpenAI and Anthropic fight in terminals, Google is attacking from the research frontier.

Gemini 3 Deep Think just received a major upgrade. It now sets a new standard of 48.4% on Humanity's Last Exam without tools and achieves 84.6% on ARC-AGI-2.

It reaches an Elo of 3455 on Codeforces. Only seven humans rank above that score.

It achieved gold-medal-level performance on the International Mathematical Olympiad 2025 and strong results on the International Physics and Chemistry Olympiads.

Deep Think is built for scientific and engineering reasoning rather than pure code completion. Early testers report that it identified subtle logical flaws in advanced mathematics papers that had passed human peer review.

Pricing and Distribution: The Quiet Battle

Unlike Codex or Claude Code, Gemini 3 Deep Think is not a developer CLI tool. It is available inside the Gemini app for Google AI Ultra subscribers and, through early access, via the Gemini API for selected researchers and enterprises.

Google's strategy is ecosystem control. Models such as Gemini Pro, Flash, Deep Think, and Gemma sit alongside tools like Stitch, Whisk, Imagen, and NotebookLM. The bet is that reasoning power at the top cascades downwards into agents, design tools, research systems, and cloud infrastructure.

Google is not trying to win a race for coding assistants. It's trying to own the AI agent layer across search, productivity, research, and cloud.

The technical debate hides a more important war: access and limits.

Claude Code is bundled inside Anthropic's paid subscriptions. The Pro plan costs around $17 per month, the Max 5× plan $100 per month, and the Max 20× plan $200 per month. 

Codex is embedded inside ChatGPT tiers. During the current offer period it is included in the Free and Go tiers, with Go priced at $8 per month. ChatGPT Plus costs $20 per month, while ChatGPT Pro costs $200 per month and offers six to 10 times higher usage limits than Plus.
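The Plus and Pro numbers invite a quick per-unit comparison. The sketch below treats one month of Plus limits as a single "allowance" and asks what Pro charges for the same amount of usage; that framing and the helper name `pro_cost_per_plus_unit` are assumptions for illustration, and only the prices and the six-to-10-times range come from the figures above.

```python
PLUS_PRICE = 20             # USD per month
PRO_PRICE = 200             # USD per month
PRO_USAGE_RANGE = (6, 10)   # Pro limits as a multiple of Plus limits


def pro_cost_per_plus_unit(multiplier: float) -> float:
    """Pro price divided by how many 'Plus allowances' it buys."""
    return PRO_PRICE / multiplier


if __name__ == "__main__":
    low_mult, high_mult = PRO_USAGE_RANGE
    print(f"Plus: ${PLUS_PRICE:.2f} per allowance")
    print(f"Pro at {high_mult}x: ${pro_cost_per_plus_unit(high_mult):.2f}")
    print(f"Pro at {low_mult}x:  ${pro_cost_per_plus_unit(low_mult):.2f}")
```

On this reading, Pro costs between $20.00 and about $33.33 per Plus-sized allowance, so it buys headroom rather than a volume discount.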

Developers have publicly noted that Codex appears to allow more sustained usage at the $20 tier. That perception matters. In developer tools, friction is fatal.

OpenAI also ran a Codex commercial during the Super Bowl instead of advertising ChatGPT. That was not branding. That was a signal. Coding is the wedge.


The benchmarks reveal a split in strengths.

Codex dominates terminal-centric tasks. Claude Opus 4.6 leads on reasoning-heavy and software engineering evaluations. Gemini Deep Think leads on math and scientific rigour.

But developers do not buy benchmarks. They buy reliability. The real competition is orchestration.

Claude uses parallel subagents and lightweight Haiku workers under an orchestrator. Codex introduces Skills and parallel agents inside a macOS app with Git integration. Google exposes Deep Think through API surfaces where specialised agents can be built on top.

The developer's role is shifting from writing code to supervising intent.

Anthropic is optimising for depth. Long context, structured plans, architectural respect, and multi-agent coordination. It wants to be the trusted engineer's co-worker.

Meanwhile, OpenAI is optimising for scale. Distribution, downloads, aggressive integration, and real-time iteration with Codex-Spark. It wants to be the default operating layer for digital work.

Google, on the other hand, is optimising for intelligence ceilings. Scientific reasoning, competitive programming dominance, and ecosystem reach. Google wants to be the intelligence backbone behind everything.

Each strategy reflects its DNA.

AI is No Longer a Tool at Xebia: It is the Operating System


Over the last 18 months, Xebia has shifted from experimenting with AI to operationalising it across delivery teams and support functions. Today, almost every step of the software lifecycle includes AI assistance. 

The company has rolled out an enterprise generative AI stack built around Anthropic's Claude and positioned itself as an authorised reseller capable of deploying the model within a client's own cloud.
