Anthropic, like OpenAI, has been releasing multi-agent tools aimed at AI-driven software development. Now the company has unveiled an ambitious experiment showcasing what its AI can do on a substantial coding task, albeit with the caveats typical of such demonstrations.
On Thursday, Nicholas Carlini, a researcher at Anthropic, detailed in a blog post how he harnessed 16 instances of the Claude Opus 4.6 AI model to collaboratively work on a single codebase with minimal guidance. The task assigned was to develop a C compiler from the ground up.
Over roughly two weeks, nearly 2,000 sessions, and approximately $20,000 in API fees, the agents produced a 100,000-line compiler written in Rust. It can compile a bootable Linux 6.9 kernel for the x86, ARM, and RISC-V architectures.
Carlini, a member of Anthropic’s Safeguards team who previously worked at Google Brain and DeepMind, leveraged a newly introduced Claude Opus 4.6 feature known as “agent teams.” In practice, each AI instance ran in its own Docker container with access to a shared Git repository, claimed tasks via lock files, and merged completed code back into the main repository. There was no central coordination; each instance independently identified and tackled whatever it judged to be the most pressing issue, and when merge conflicts arose, the models resolved them on their own.
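Carlini’s post describes the coordination only at a high level, so the following is an illustrative sketch rather than the actual mechanism: one simple way an agent can claim a task in a shared repository is to atomically create a lock file and treat creation failure as “someone else got there first.” The directory layout, task names, and agent IDs below are hypothetical.

```rust
// Minimal sketch of lock-file task claiming in a shared repo (illustrative only).
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

/// Try to claim a task by atomically creating a lock file.
/// Returns Ok(true) if this agent won the claim, Ok(false) if the lock already exists.
fn try_claim_task(repo_dir: &Path, task_name: &str, agent_id: &str) -> std::io::Result<bool> {
    let lock_path = repo_dir.join("locks").join(format!("{task_name}.lock"));
    match OpenOptions::new().write(true).create_new(true).open(&lock_path) {
        Ok(mut f) => {
            // Record which agent holds the lock so others can inspect it.
            writeln!(f, "claimed-by: {agent_id}")?;
            Ok(true)
        }
        // create_new fails with AlreadyExists when another agent holds the lock.
        Err(e) if e.kind() == std::io::ErrorKind::AlreadyExists => Ok(false),
        Err(e) => Err(e),
    }
}

fn main() -> std::io::Result<()> {
    // Hypothetical paths and names for illustration.
    let repo = Path::new("/workspace/shared-repo");
    std::fs::create_dir_all(repo.join("locks"))?;
    if try_claim_task(repo, "parser-switch-statements", "agent-07")? {
        println!("claimed task, starting work");
    } else {
        println!("task already taken, picking another");
    }
    Ok(())
}
```

In a setup like this, the lock files would themselves be committed to the shared repository, so Git’s own conflict handling decides the winner if two agents try to claim the same task in the same push window.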
The resulting compiler, now available on GitHub, has compiled several major open source projects, including PostgreSQL, SQLite, Redis, FFmpeg, and QEMU. It passes 99 percent of the GCC torture test suite and, notably, can compile and run Doom, which Carlini called “the developer’s ultimate litmus test.”
However, it’s worth noting that a C compiler is unusually well suited to semi-autonomous AI coding: the task comes with an established, well-defined specification, comprehensive existing test suites, and a well-known reference compiler to check against. Most real-world software projects offer none of these advantages, and the hard part is often not writing code that passes the tests but deciding what those tests should be in the first place.