The Coding Front Line: Claude Code vs Codex

Let's first look at the one being called the Indian IT killer. Anthropic's Claude Code built early credibility by embedding itself directly within the developer workflow rather than acting like a chat window. It runs primarily in the terminal and integrates with VS Code, JetBrains, and Slack. It can read repositories, propose plans, generate diffs, run tests, and commit changes. The plan-first workflow gives developers checkpoints before execution.

With Claude Opus 4.6, the case got stronger. It now scores 80.8% on SWE-bench Verified and 72.7% on OSWorld-Verified, ahead of Codex on desktop agent tasks. The one-million-token context window in beta also means it can track massive codebases and long documents with less drift. For teams working on complex repositories, that is a big deal.

Anthropic also introduced agent teams in a research preview. Instead of one reasoning engine attempting to ingest everything, a lead agent delegates tasks to smaller parallel subagents. Each handles a scoped slice of context, then reports back. Less overload, more coordination. In one experiment, 16 Claude agents ran in parallel for two weeks and built a Rust-based C compiler with nearly one lakh lines of code, costing just under $20,000 in API usage.

OpenAI answered with scale and speed. GPT-5.3-Codex leads on Terminal-Bench 2.0 with 77.3%, compared with Claude's 65.4%. Inside OpenAI, leadership claims Codex now writes most of the code and handles a large share of debugging and operational tasks. CEO Sam Altman says Codex crossed one million downloads in its first week, with usage growing more than 60% week over week.

Then came Codex-Spark. It is a smaller research-preview model built for real-time iteration. It supports a 1,28,000-token context window and delivers more than 1,000 tokens per second on Cerebras hardware. It is optimised for targeted edits, logic adjustments, and quick interface refinement. During preview, its usage limits are separate from standard quotas. On Terminal-Bench 2.0, it scores 58.4%, lower than full Codex, but it finishes tasks significantly faster.

OpenAI is effectively splitting the workflow into two modes. Frontier Codex handles long-running autonomous tasks that run for hours or days. Codex-Spark focuses on live collaboration.
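The agent-team pattern described above can be reduced to a simple fan-out/fan-in structure. The sketch below is purely illustrative and is not Anthropic's implementation: `run_subagent` is a hypothetical stand-in for a real model call, and all names are assumptions. It only shows the coordination idea, i.e. a lead agent partitioning context into scoped slices, running subagents in parallel, and merging their reports.

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(slice_id: int, files: list[str]) -> dict:
    # Hypothetical subagent: it sees only its own slice of context,
    # never the whole repository, which keeps each context window small.
    return {"slice": slice_id, "summary": f"reviewed {len(files)} files"}

def lead_agent(repo_files: list[str], n_subagents: int = 4) -> list[dict]:
    # Partition the repository round-robin so no single agent ingests everything.
    slices = [repo_files[i::n_subagents] for i in range(n_subagents)]
    with ThreadPoolExecutor(max_workers=n_subagents) as pool:
        # Fan out: one scoped task per subagent, run in parallel.
        futures = [pool.submit(run_subagent, i, s) for i, s in enumerate(slices)]
        # Fan in: the lead agent collects each report and can then merge,
        # re-plan, or delegate follow-up work.
        return [f.result() for f in futures]

reports = lead_agent([f"src/file_{n}.rs" for n in range(10)])
```

The point of the structure is the one the article makes: each worker carries a bounded slice of context, and coordination, not a single giant context window, is what scales.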