Cointegrity

Cointegrity Exclusive: Anthropic's Claude Code Leak Exposes the New Security Surface

• 8 min read • Weekly Intelligence

In what is being called one of the most significant leaks in the AI industry, Anthropic accidentally exposed the entire source code for its flagship AI coding assistant, Claude Code. On March 31, 2026, a simple release packaging error led to the publication of over 512,000 lines of unobfuscated TypeScript code, revealing unreleased features, internal model roadmaps, and anti-competitive countermeasures. Inside that package, hitching a ride nobody at Anthropic intended to offer, was a single 59.8 MB source map file. Every internal comment, every codename, every benchmark, every mechanism Anthropic had spent years building in private. All of it, publicly indexed, publicly downloadable, and within hours mirrored across GitHub at a velocity no repository in the platform’s history had ever matched.

84,000 stars. 82,000 forks. The fastest-growing repository in GitHub history. By the time Anthropic’s legal team was drafting DMCA notices, the internet had already made copies of the copies.

The timing made the entire thing feel like a bit. The internet could not decide whether the day-before-April-Fools adjacency was incompetence, performance art, or a cry for help. What it actually was, and this is the part that got buried under the memes, was an open window that someone with bad intentions walked straight through. Users who installed or updated Claude Code during a specific three-hour window on March 31 may have had malicious software installed on their machines without knowing it. The chaos of the leak was the cover. More on that below, and if you updated Claude Code that morning, read the security section before anything else.


Anthropic Did Not Get Hacked. That Is Almost More Embarrassing.

Let us be precise about what happened, because the mechanism matters more than the drama.

Source maps are internal debugging files. They exist so engineers, during development, can trace a compiled and minified production binary back to the original, human-readable source code that produced it. They are never meant to ship with a public package. A correctly configured .npmignore file would have excluded it in the same routine way that thousands of packages exclude thousands of internal files every day.

The .npmignore file was not configured correctly. That is the entire story. No sophisticated adversary. No zero-day exploit. No state-sponsored intrusion. A build step that did not exclude a file it was supposed to exclude.

Security researcher Chaofan Shou spotted it early Tuesday morning and posted on X. Within hours, the mirroring had begun. The original fork became the fastest-growing repository in GitHub history before the DMCA requests arrived. By then the damage, if you want to call it that, or the windfall, depending on your position in the industry, was complete.

Anthropic confirmed it promptly: “Earlier today, a Claude Code release included some internal source code. No sensitive customer data or credentials were involved or exposed. This was a release packaging issue caused by human error, not a security breach. We’re rolling out measures to prevent this from happening again.”

Clean statement. Professional. The kind of sentence that does a lot of heavy lifting in 48 words. Translation: we left the door open, the internet walked in, and we have now installed a lock.


KAIROS Was Never Meant To Have A Name Yet. Now Everyone Has The Spec Sheet.

The leaked codebase contained 44 feature flags gating more than 20 unshipped capabilities. Reading through them is the experience of finding a studio’s unfinished album online. The bones are there. The ambition is obvious. The release schedule has now been rendered irrelevant.

The most significant of these is KAIROS, referenced over 150 times across the source. KAIROS is not a feature. It is an architectural shift. It is the move from Claude-as-tool to Claude-as-agent, from reactive assistant to proactive daemon that runs in the background, subscribes to GitHub webhooks, maintains daily log files, and periodically decides whether to act without being asked.

If KAIROS is a product manager, autoDream is what happens when that product manager goes home at night. autoDream is a background memory consolidation process that runs while the user is idle. The system’s own documentation describes it as a process that merges observations, removes contradictions, and converts vague insights into absolute facts. The AI dreams. Not metaphorically. There is a function called autoDream. Anthropic built an agent that uses your idle time to update its own understanding of your project, so that when you return to the terminal, it is already primed.

The planning layer on top of this is called ULTRAPLAN, which offloads complex multi-step reasoning to a remote cloud session running Opus 4.6 with up to 30 minutes of dedicated think time. Not a local inference call. A remote session with a dedicated compute budget for thinking.

This is not a coding assistant. This is the skeleton of an autonomous software development agent. KAIROS plus autoDream plus ULTRAPLAN is a system that monitors, plans, executes, and learns while you sleep. The feature flags just needed more time and a functioning .npmignore.


The Undercover Mode Is The One That Will Get Anthropic Into A Room They Did Not Plan To Enter.

Of all the disclosures in the 512,000 lines, undercover.ts is the one that lawyers are reading most carefully this week.

Undercover Mode is exactly what it sounds like. When Claude Code operates in a public or open-source repository, this feature injects a system prompt instructing the model to never disclose that it is an AI, to strip all Anthropic attribution from commit messages, pull request titles, and PR bodies, and to, in the source code’s own words: “not blow your cover.”

The system activates automatically for Anthropic employees contributing to external open-source repositories and has no manual off-switch. The AI makes open-source contributions under what is functionally a false identity. The commits look human. The PR descriptions look human. The cover, until March 31, 2026, was intact.

The open-source community’s response ranged from “this is an interesting product decision” to “this is a fundamental violation of contributor trust” to things that cannot be reprinted here. The philosophical objection is obvious: open-source contribution norms have always assumed transparency about who, or what, is submitting code. Undercover Mode is a feature designed to operate in that ecosystem while systematically concealing the nature of the contributor.

Whether this rises to legal liability, in terms of open-source license compliance, contributor agreements, or disclosure obligations, is a question several law firms are now billing hours to answer. The ethical question has already been answered by the community, loudly, and the verdict is not charitable.

That last line, “do not blow your cover,” is doing more work than it appears.


Anthropic’s Internal Benchmarks Are Now Their Competitors’ KPIs.

Beyond the features, the leak exposed something arguably more damaging in the long run: internal performance benchmarks that Anthropic never intended to publish.

The source code reveals internal codenames for current and upcoming models:

  • Capybara: A Claude 4.6 variant, currently on iteration v8.

  • Fennec: Maps to Opus 4.6.

  • Numbat: An unreleased model still in testing.

Internal comments on Capybara v8 note a 29 to 30% false claims rate, described as a regression from the 16.7% rate seen in v4. That number is now public. Every competitor building agentic systems now has a ceiling to benchmark against and a regression trajectory to study. Every enterprise customer evaluating Claude for high-stakes autonomous workflows now has a data point they were not supposed to have. It is the kind of internal quality metric that teams spend months debating whether to disclose. The decision has been made for them.

The three-layer Self-Healing Memory architecture is also now fully documented. The system uses a MEMORY.md index file as a lightweight pointer store, distributes actual project knowledge across topic files loaded on demand, and enforces Strict Write Discipline, meaning the agent only updates its memory index after a confirmed successful file write. The design solves context entropy, the tendency for long-running agents to become confused as sessions compound, with an elegance that will be studied, referenced, and replicated across the industry for the next two years.

Anthropic built it in private. The industry will build on it in public. That is not a small sentence.


The Anti-Distillation Mechanisms Were Pointed Directly At The Competition.

The codebase also revealed that Anthropic was not merely playing defense. It was actively building offensive architecture against competitors attempting to train on Claude Code’s outputs.

The system injects fake tool definitions into API requests and summarizes assistant reasoning with cryptographic signatures, ensuring that any entity intercepting or scraping Claude Code’s output captures only summaries rather than full chain-of-thought reasoning. The goal is explicit: make it structurally difficult for a competitor to distill Claude Code’s capabilities into their own models by feeding them poisoned or incomplete training data.

This is not paranoia. Model distillation, training a smaller or alternative model on the outputs of a more capable one, is an established technique that has accelerated the whole industry. Anthropic knew it was a target. These mechanisms were the countermeasure.

They are also now documented. In public. For anyone building a Claude Code competitor. Sometimes the anti-distillation mechanisms get distilled too.


There Was Also A Virtual Pet. This Is A Real Thing That Happened.

Among the more unexpectedly delightful discoveries: Buddy.

Buddy is a fully built Tamagotchi-style companion system. A virtual pet, placed next to the user’s input box in the Claude Code interface, with 18 different species, rarity tiers ranging from common to 1% legendary, and attributes including Debugging, Patience, Chaos, Wisdom, and Snark.

The Snark stat, in a coding assistant, is either a very on-brand design decision or an extremely self-aware piece of internal humor. Possibly both.

The internet’s response to Buddy was, predictably, to immediately want one. While Undercover Mode generated the ethical discourse and KAIROS generated the architectural analysis, Buddy generated the memes. There are already fan communities requesting that Anthropic ship Buddy as-is regardless of the circumstances under which it was revealed.

A coding assistant with a gamified companion system and a Snark attribute is either a stroke of retention genius or evidence that Anthropic’s product team has been spending quality time on the feature backlog. Either way, the 1% legendary tier has already been named and theorized about on Reddit, which is more community engagement than most deliberately launched features receive.


The Security Implications Are Not Funny At All. Read This Part Carefully.

The comedy ends here.

When a high-profile leak like this breaks, it creates noise. Enormous, fast-moving, everyone-is-talking-about-it noise. And noise, in security, is not just a distraction. It is operational cover. While developers across the world were downloading and exploring the leaked code, someone was quietly doing something else entirely.

Users who installed or updated Claude Code via npm on March 31, 2026, between 00:21 and 03:29 UTC may have installed a version of the software that had been tampered with. The attack was tucked inside a common third-party component called axios, something Claude Code depends on, the same way your phone depends on a dozen invisible background services to function. That component, during that specific window, contained a remote access trojan. In plain terms: software designed to give an unknown third party access to your machine.

The leak also created a honeypot. Developers curious about the source code were trying to download it and build it themselves. Attackers anticipated this, and registered package names closely resembling internal Anthropic packages, specifically audio-capture-napi and color-diff-napi, waiting for developers to accidentally install the fake versions. Same name, one letter off, malicious payload inside. It is the digital equivalent of setting up a fake ATM next to a real one during a bank robbery and waiting for people to get confused.

Beyond the immediate attack window, the full internal documentation of how Claude Code processes, stores, and manages information is now public. That is the kind of detail that lets sophisticated attackers craft inputs specifically designed to survive the system’s own defenses and persist in ways they were not supposed to.

If you installed or updated Claude Code during that UTC window: rotate your secrets, verify your dependencies, and audit your environment. This is not a general advisory. It is a specific, time-bounded risk with a documented attack vector.


What Went Under The Radar.

The DMCA takedowns arrived, but the forks did not disappear. GitHub removed the primary mirror after 84,000 stars and 82,000 forks, but the distributed nature of git means that every fork is itself a complete copy. The copyright claim on the original publication does not retroactively delete the mirrors. Anthropic’s intellectual property is now as distributed as the technology it was building to compete with.

ULTRAPLAN’s 30-minute compute budget for remote planning is a direct product signal. Allocating dedicated Opus 4.6 cloud sessions for planning tasks means Anthropic is betting that deep, extended reasoning will justify per-task infrastructure costs at the enterprise tier. That pricing model, inference-per-plan rather than inference-per-token, will reshape how the industry thinks about billing for agentic workflows.

The Capybara v8 regression from 16.7% to 29-30% false claims is the number that will follow Anthropic into every enterprise sales conversation for the next cycle. It will be cited in competitor decks. It will appear in analyst reports. It will be the footnote that prospects bring to procurement discussions. The benchmark was internal. It is not internal anymore.


Our Take.

The theme of this incident is not negligence. Negligence is what you call it when a company makes no effort. What happened here is the ordinary failure mode of a team moving fast on a complex build pipeline, where one configuration file in one packaging step produced consequences that will be felt across the industry for years.

The leak is not interesting because Anthropic made a mistake. Every company makes mistakes. It is interesting because of what the mistake revealed: that the most sophisticated agentic coding infrastructure in the industry, the thing competitors have been attempting to reverse-engineer from API outputs for two years, is now fully documented and publicly archived. The moat that took years to dig was depth-mapped in a morning.

Both things are true. The technology inside the leak is genuinely impressive, and the disclosure of that technology is genuinely catastrophic for Anthropic’s competitive position. KAIROS is a blueprint for autonomous software development. autoDream is a novel solution to a hard problem. The Self-Healing Memory architecture is the kind of thing that gets presented at NeurIPS. All of it is now in the public domain, with attribution to the engineers who built it and the comment history that explains why.

The Undercover Mode is the one that will not fade quietly. The other features are architectural disclosures. Undercover Mode is a values disclosure. It says something about how Anthropic thought about the relationship between its tools and the open-source ecosystem they operate within. That conversation is now happening publicly, with the source code as evidence, and it will not be resolved by a patch.

Regulation, as always, is somewhere in traffic. The questions Undercover Mode raises about AI disclosure in contributor workflows, about synthetic identity in open-source communities, about the obligations of AI labs to the ecosystems they depend on, will eventually produce policy. That policy will arrive after the next three versions of the feature have already shipped under different names.

The infrastructure is the story. The missing .npmignore was just the door.


The Cointegrity Perspective.

This is the layer we operate in. Not the memes about Buddy the virtual pet. Not the GitHub star count. Not the discourse about whether the timing felt like an April Fools stunt. The structural layer: what the code reveals about the next generation of agentic architecture, what the benchmarks mean for enterprise procurement, what the Undercover Mode disclosure means for AI governance frameworks that are still being written.

This week had two registers. The loud one: a leak, a GitHub sensation, a Tamagotchi with a Snark attribute, and a thousand hot takes about AI transparency. The quiet one: a complete technical blueprint for autonomous software agents, a competitor’s internal performance benchmarks in the public domain, and a supply chain attack exploiting the noise.

The loud register got the memes. The quiet register will shape the next 18 months of infrastructure development.

If you are building in this space, whether in AI product, developer tooling, enterprise procurement, or compliance, and you want to understand what the architecture inside this leak means for your roadmap, not the drama around it, this is what we do.

The infrastructure is the story. Everything else is weather.

Related internal resources: Bitcoin, Ethereum, Stablecoin, Blockchain.

Want more Web3 insights? Get in touch with our experts.