Fast Facts — Key Takeaways
On March 9, 2026, security startup CodeWall disclosed that its autonomous AI agent breached McKinsey’s internal AI platform — Lilli — in just two hours. No credentials. No insider access. No human involvement after launch.
- 46.5 million chat messages covering strategy, M&A, and client engagements were exposed.
- 728,000 confidential client files and 57,000 user accounts were accessible.
- Lilli’s system prompts were writable — a silent attacker could have reprogrammed what 40,000 consultants were told without deploying a single line of code.
- The vulnerability was a SQL injection flaw — one of the oldest attack classes in existence — on a platform running in production for over two years.
- McKinsey patched all exposed endpoints within 24 hours of responsible disclosure. But the breach tells every enterprise exactly where the real AI security gap sits.
Here is the question every enterprise deploying AI needs to sit with this week: if McKinsey — a firm advising the world’s largest corporations on how to build and secure AI systems, with the resources and talent to do it properly — had a production AI platform vulnerable to a textbook SQL injection attack for over two years, what does your AI security posture actually look like right now?
That is the central argument of this analysis. The McKinsey Lilli AI hack is not primarily a story about one consultancy’s embarrassing near-miss. It is a diagnostic. It reveals a structural gap that exists across every organisation moving fast to deploy AI capabilities: the infrastructure surrounding the AI system is not being secured at the same pace as the AI system itself is being built.
The attack succeeded not because the model was vulnerable. It succeeded because 22 API endpoints required no authentication, because a JSON field was concatenated directly into SQL, and because the system prompts controlling the chatbot’s behaviour were stored in the same production database that the exploit gave write access to. None of these are exotic vulnerabilities. All of them are preventable. And none of them showed up on McKinsey’s own internal security scanners.
Why an Autonomous Agent Found What Two Years of Internal Scanning Missed
According to CodeWall’s disclosure, their autonomous agent was pointed at McKinsey’s Lilli platform in late February 2026 — and the agent itself suggested the target, citing McKinsey’s published responsible disclosure policy on HackerOne as a green light to proceed.
The agent mapped the attack surface by finding publicly exposed API documentation — over 200 endpoints, fully documented. Most required authentication. Twenty-two did not. One of those open endpoints accepted user search queries and wrote them to the database without proper input validation. The agent found that JSON field names were concatenated directly into SQL, recognised a classic injection pattern, and escalated from there to full read-and-write access on the production database within two hours.
“In the AI era, the threat landscape is shifting drastically — AI agents autonomously selecting and attacking targets will become the new normal.”
— CodeWall, Responsible Disclosure Report, March 2026
The reason internal scanners missed it is instructive. Standard tools like OWASP ZAP do not flag this specific injection pattern because the vulnerable field names were not obviously user-controlled inputs. The agent recognised the vulnerability because it does not follow a checklist — it maps, probes, chains, and escalates the same way a capable attacker would, but continuously and at machine speed. According to The Register, the entire process was “fully autonomous from researching the target, analyzing, attacking, and reporting.”
This is the threat model shift that enterprise security teams are not yet calibrated for. Pre-agentic security assumed that finding a vulnerability required a human attacker with time, skill, and intent. Agentic security assumes a capable agent can find and chain vulnerabilities at machine speed with no prior credentials and no human involvement after launch.
2 hrs Time for CodeWall’s autonomous AI agent to gain full read-write access to McKinsey’s Lilli production database — with no credentials and no human involvement after launch
Why Writable System Prompts Are the Most Dangerous Exposure Nobody Is Talking About
Most post-mortems on the McKinsey breach focus on data exposure: 46.5 million messages, 728,000 files, 57,000 user accounts. Those numbers are significant. But the most alarming finding is one that received far less attention.
Lilli’s system prompts — the instructions that control how the AI behaves, what it recommends, what sources it cites, and what guardrails it applies — were stored in the same production database that the SQL injection gave write access to. According to Development Corporate’s analysis, a malicious actor could have updated those prompts with a single HTTP call — no deployment, no code change, no security alert triggered.
That means someone with bad intentions could have silently reprogrammed what Lilli told 40,000 consultants across every strategy engagement, M&A analysis, and client recommendation — without McKinsey’s security team ever knowing a change had been made. There are no standard enterprise controls for system prompt integrity. No audits. No version history. No change logs. Because most enterprise security frameworks were designed before the concept of a writable AI instruction layer existed.
⚠ Critical Security Gap
If your enterprise AI platform stores system prompts in a production database rather than in version-controlled, access-controlled configuration files — and that database is reachable from any application layer — your AI’s behaviour can be modified without a deployment, without a code review, and without triggering any standard security monitoring. This is not a theoretical risk. It happened at scale in February 2026.
Why the Real Attack Surface Is the Action Layer — Not the Model
There is a framing problem in how most organisations think about AI security. The conversation centres on model behaviour: jailbreaks, prompt injection at the user level, output controls, hallucination rates. Those are real concerns. But they are one layer of a much larger attack surface.
According to Salt Security’s analysis, the McKinsey breach demonstrates that the real enterprise risk sits in what they call the action layer — the APIs, internal services, connected databases, and shadow integrations that AI agents can reach, invoke, and manipulate. The model itself was not compromised. The infrastructure around it was.
In a pre-agentic environment, a weakly governed internal API might sit quietly for months without being discovered. In an agentic environment, it only needs to be reachable once. An autonomous agent will find it, probe it, chain it with other endpoints, and escalate — in the time it takes a human security analyst to write the first line of their incident report.
This is the same structural vulnerability that makes protecting AI infrastructure in operational environments so much harder than protecting traditional IT systems. The attack surface expands every time an AI agent is given access to a new tool, API, or data source — and most organisations are granting that access without a corresponding security review of what the agent can reach from there.
Why McKinsey’s Embarrassment Is Actually the Industry’s Wake-Up Call
The timing matters here. According to Bank Info Security, AI advisory work accounts for approximately 40% of McKinsey’s revenue. The firm’s CEO stated this year that McKinsey has built 25,000 AI agents to support its workforce. The company has actively positioned its own AI adoption — including Lilli — as evidence that it practices what it sells to clients on AI transformation.
The reputational dimension is real. But the more important implication is structural. If McKinsey’s security team — with full access to the platform, significant security investment, and world-class technology teams — did not find a SQL injection flaw that had been sitting in a production system for over two years, the problem is systemic, not individual. It reflects the fact that AI deployment timelines across the industry have consistently outpaced the security review cycles designed to catch these vulnerabilities.
⚠ Fiction — Illustrative Scenario
A regional bank deploys an internal AI assistant for its loan officers in Q3 2025. The system is connected to the bank’s CRM, document management platform, and internal communications tool via APIs. Security review focuses on the model: adversarial prompts are tested, output filtering is configured, and the system passes its internal audit. Six months later, a red team exercise discovers that three of the CRM API endpoints used by the AI agent have no authentication requirement and are reachable from outside the corporate network. The system prompt controlling the AI’s loan recommendation logic is stored in a shared production database with no write-access restrictions. No client data has been accessed. But the exposure window was six months. This scenario is speculative but reflects the structural gap the McKinsey breach made visible.
The 5 Failures the McKinsey Lilli Breach Reveals — and What to Do About Each One
Based on the CodeWall disclosure, The Register’s reporting, and Salt Security’s structural analysis, five preventable failures defined this breach. Each one has a direct corrective action.
Failure 1 — Public API documentation that maps the attack surface. Over 200 endpoints were publicly documented, including schemas and field names. This gave the autonomous agent a complete map before the first probe was sent. Corrective action: API documentation should never be publicly accessible without authentication, regardless of whether individual endpoints require it.
Failure 2 — Unauthenticated endpoints accepting database input. Twenty-two endpoints required no authentication. One accepted user input and wrote it to the database without parameterisation. Corrective action: Every endpoint that interacts with a database must require authentication and use parameterised queries without exception.
Failure 3 — System prompts stored in an accessible production database. Writable system prompts in a production database mean any database-level access translates into AI behaviour control. Corrective action: System prompts must be stored in version-controlled, access-controlled configuration files with change logging, completely separate from the application database.
Failure 4 — No continuous attack surface monitoring. The flaw existed for over two years and internal scanners did not find it. Corrective action: Static security reviews are insufficient for agentic AI platforms. Continuous, automated attack surface monitoring — including for AI-specific API footprints — is now a baseline requirement.
Failure 5 — Security framework designed for a pre-agentic threat model. Standard compliance checklists, penetration test cycles, and vulnerability disclosure processes were designed for environments where human attackers operate at human speed. Autonomous agents operate at machine speed. Corrective action: Security governance for AI platforms must explicitly model agentic attackers — including the assumption that any reachable endpoint will be found and probed.
The Amelia AI failure case study earlier this year showed a similar pattern: governance frameworks that were not designed for the operational reality of deployed AI systems. The McKinsey breach confirms that this is not an isolated problem — it is an industry-wide calibration gap between deployment speed and security depth.
Global Implications
The McKinsey breach carries direct implications for every enterprise deploying AI at scale — regardless of geography. In markets where data privacy regulation is tightening (EU AI Act, Nigeria’s NDPR, Singapore’s PDPA), a breach of this nature involving client data would trigger mandatory disclosure, regulatory fines, and reputational damage that extends well beyond the original incident. For enterprise buyers evaluating AI vendors, the breach sets a new due diligence standard: security architecture documentation, system prompt governance, and API authentication practices are now deal-relevant criteria, not footnotes. Any AI vendor that cannot answer detailed questions about how its system prompts are stored and who has write access to them should be treated as an elevated risk.
The McKinsey breach was patched within 24 hours of responsible disclosure. That speed of response is commendable. But the more uncomfortable truth is that the vulnerability existed for over two years in a production system used by 40,000 people, processing over 500,000 prompts every month. The patch speed does not change the exposure window. It confirms that the detection gap — not the remediation gap — is where the real risk lives.
For teams managing orphaned AI models and governance gaps in operational environments, the McKinsey incident adds urgency to a question that is easy to defer: who is responsible for the security of the infrastructure your AI touches, not just the AI itself? That question needs an answer before an autonomous agent finds the gap for you.
Understanding how AI safety protocols fail under real conditions is no longer an academic exercise. In 2026, it is an operational requirement.
Further Reading — Related Articles
- → An AI Lied About Shutdown — How AI Safety Protocols Failed Under Real Conditions
- → Amelia AI Failure Case Study — 2026’s Critical System Governance Lesson
- → Why You Need to Protect Industrial AI Infrastructure Before It Protects Everything Else
- → Managing Orphaned AI Models — The Hidden Industrial Risk Nobody Is Auditing
- → Industrial AI Safety Concerns 2026 — What the Data Is Actually Telling Us
Frequently Asked Questions
What happened in the McKinsey Lilli AI hack?
Security startup CodeWall used an autonomous AI agent to breach McKinsey’s internal AI platform, Lilli, in March 2026. The agent gained full read-and-write access to the production database in two hours — with no credentials and no human involvement — exploiting a SQL injection flaw in an unauthenticated API endpoint. It exposed 46.5 million chat messages, 728,000 files, and 57,000 user accounts.
How did the autonomous agent find the vulnerability that internal scanners missed?
Standard security tools like OWASP ZAP follow checklists and do not flag certain injection patterns when field names are not obviously user-controlled inputs. The autonomous agent maps, probes, chains, and escalates the way a skilled attacker would — continuously and at machine speed — without being constrained by a predefined scan checklist.
Why were writable system prompts the most dangerous part of the breach?
Because a malicious actor with write access to the database could have silently changed what Lilli told 40,000 consultants — their recommendations, guardrails, and sourcing logic — without deploying any code or triggering any security alert. No standard enterprise security framework has controls for system prompt integrity, because the threat model did not exist before large-scale AI deployment.
What should enterprises do immediately to reduce AI platform security exposure?
Three immediate actions: (1) Audit every API endpoint connected to your AI system and require authentication without exception. (2) Move system prompts out of production databases into version-controlled, access-controlled configuration files with change logging. (3) Map your full AI action layer — every tool, API, and data source your AI can reach — and treat any reachable endpoint as a potential attack vector.
Does the McKinsey breach mean enterprise AI platforms are fundamentally insecure?
No — it means they are being secured with frameworks designed for a pre-agentic threat model. The vulnerabilities exploited here are well-understood and preventable. The gap is not technical capability; it is that deployment speed has consistently outpaced security review depth across the industry.
How should procurement teams evaluate AI vendor security after this incident?
Ask three specific questions before signing: How are system prompts stored and who has write access? What authentication is required for every API endpoint connected to the AI? What continuous monitoring exists for the AI action layer — not just the model output? Any vendor that cannot answer these questions clearly represents elevated risk in your supply chain.
Your AI system is only as secure as the infrastructure it touches.
If your team is deploying AI faster than it is auditing the APIs, databases, and action layers connected to it, the McKinsey breach is not a cautionary tale — it is a preview. CreedTec covers the governance gaps, security failures, and structural risks that enterprise AI teams need to understand before an autonomous agent finds them first.
Subscribe to CreedTec →
