The AI War on Open Source Is a Fundraising Pitch, Not an Engineering Argument

AI industry hype beasts are spreading one of the craziest narratives I've ever seen. It goes a little something like this... "You don't need open source libraries anymore. Large Language Models can generate the code you need on demand. Why carry the risk of third-party dependencies when you can just synthesize the functionality yourself? Why expose your software supply chain to vulnerabilities in projects like XZ Utils or Firefox when a model can write you something bespoke in seconds?"

This argument is being floated implicitly and explicitly by the same companies charging you per token to generate that replacement code. Who would've thought? It's also one of the most intellectually dishonest positions in modern software engineering.

However, the argument does resonate with some, as software supply chain attacks have genuinely gotten worse. The XZ Utils backdoor discovered in March 2024 (CVE-2024-3094) is the canonical recent example. A likely state-sponsored threat actor spent over two years cultivating a persona named "Jia Tan," slowly building trust within a small, under-resourced open source project before gaining co-maintainer status and injecting a backdoor that, had it shipped undetected, could have provided remote code execution on countless Linux servers running OpenSSH. It was found almost by accident by a Microsoft engineer and PostgreSQL developer investigating anomalous SSH latency. As one security researcher put it at the time, it was "a nightmare scenario: malicious, competent, authorized upstream in a widely used library."

It's a serious attack that deserves our attention, as these sorts of infiltrations will likely continue. However, it does not warrant being used as ammunition to sell a more expansive fear-mongering subscription that will generate fresh unmaintained, unaudited, in-house replacements for battle-tested infrastructure software. It also doesn't warrant yet another static analysis tool, likely with its own LLM, circling back into the LLM so you've got slop watching slop un-slop itself until the next slop cycle re-slops. Let's not forget that the models being floated as the solution were trained on the hard labor of open source. The output can't really be better than the source that fed it.

Before we get to major works like Postgres and Linux, we need to talk about left-pad. It's a prerequisite for understanding the pre-LLM take on dependency risk in the open source community. In March 2016, a developer named Azer Koçulu unpublished 273 npm packages after a dispute with Kik Interactive and npm, Inc. over a package name. Among those packages was left-pad, an eleven-line JavaScript function that pads the left side of a string. Because thousands of projects, including Babel, Webpack, React, and React Native, had taken left-pad as a hard dependency, builds failed globally for hours across companies including Facebook, PayPal, Netflix, and Spotify. npm's CTO had to take the unprecedented step of manually restoring a deleted package version to stop the bleeding. Eleven lines of code. Padded strings. Internet broken. I was there. I was recovering from a stint typing CoffeeScript at the time.

This incident was embarrassing, and it exposed something real about the JavaScript ecosystem's dependency culture. As David Haney noted at the time, "I can't help but be amazed by the fact that developers are taking on dependencies for single line functions that they should be able to write with their eyes closed." He had a point. There's an isArray package on npm that was downloaded 18 million times in a single month. It contains one line of code. The is-positive-integer package is four lines long and originally required three dependencies to function.
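To make the scale concrete, here is a minimal sketch of what owning these primitives looks like. It is written in Python rather than JavaScript, it is not the code of the original npm packages, and the names left_pad and is_positive_integer are my own. Python already ships str.rjust, which only sharpens the point.

```python
def left_pad(text, width, fill=" "):
    """Pad the left side of text with fill until it is width characters long."""
    text = str(text)
    if len(fill) != 1:
        raise ValueError("fill must be exactly one character")
    missing = width - len(text)
    # Nothing to add when the text is already wide enough.
    return fill * missing + text if missing > 0 else text


def is_positive_integer(value):
    """True for ints greater than zero; bool is excluded on purpose."""
    return isinstance(value, int) and not isinstance(value, bool) and value > 0
```

Calling left_pad("7", 3, "0") gives "007". Neither function needs a registry, a version pin, or a maintainer who might walk away.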

This was a library bloat problem: treating npm like a standard library, packaging trivial functions as public dependencies, and a community building software on a tower of micro-packages it had never read. It is absolutely a legitimate engineering concern. It is the correct reason to own your primitives and internalize your core business logic. It also left me with a kind of PTSD that has had me retreat from the JavaScript ecosystem completely, except when I need to churn out a website. I prefer to solve tooling problems with a good CLI or TUI these days.

The dependency pyramid problem has absolutely nothing to do with whether you should use Postgres, Chromium, Redis, OpenSSL, etc. Generating trivial code like left-pad is reasonable. Generating your own front-end framework is delusional. I know I'm being a bit over the top here. I'm also a huge advocate for abandoning bloated frameworks. I also believe that, in spirit, the anti-OSS mouths aren't telling anyone to go and regenerate something like Postgres. However, the words echo and amplify as they percolate up to tech-ignorant executives, security professionals who've never written a line of code, vendors oiling the sales pitch, and consultant communities looking to lock in a scope of work with infinite billing potential.

For instance, PostgreSQL has been in active development since 1996. It is maintained by a global community of contributors. It has millions of production deployments. Its codebase is audited by security researchers, funded by enterprises, stress-tested by financial institutions, and relied upon by infrastructure that underpins substantial portions of the global economy. The same is true of the Linux kernel, the OpenSSL library, the Go standard library, GCC, LLVM, and the C runtime.

Are we seriously proposing that a team of engineers feeling a vibe should generate their own replacement for any of these because they only use a subset of the available feature set? By that logic, no one should use a C++ compiler because they don't need every construct in the standard. No team should run Linux because they don't touch every kernel subsystem. The argument collapses immediately under its own weight.

The security argument inverts even more dramatically. The premise is that AI-generated, internally-owned code is safer because it avoids open source supply chain risk. But please consider what is actually being proposed.

On one side, thousands of security researchers review public code for a production database. On the other, your team reviews AI-generated code that nobody outside your organization has ever seen. The open source model for critical infrastructure provides something that generated code fundamentally cannot: an entire industry's worth of security eyes, funded testing, formal audits, and decades of hardening against adversarial conditions.

Let's look at what some research data actually says about the security posture of AI-generated code, since that's the axis on which the "generate instead of depend" argument is being made.

Veracode's 2025 GenAI Code Security Report, which analyzed code produced by over 100 LLMs across 80 real-world coding tasks, found that AI-generated code introduces security vulnerabilities in 45 percent of cases. Across Java, JavaScript, Python, and C#, AI-generated code contained 2.74 times more vulnerabilities than equivalent human-written code. Java was the worst performer, with a 72 percent security failure rate for AI-generated code.

Apiiro's independent research across Fortune 50 enterprises found 322 percent more privilege escalation paths and 153 percent more design flaws in AI-assisted codebases. Granted, these numbers come from vendors shilling a separate product to help repair generated messes. They also aren't wrong, judging from my own code review experience over the last three years.

Separately, research published in 2025 found that even leading LLMs recommended hallucinated package names or incorrect, outdated, or non-existent dependency versions in 27 percent of cases, opening the door to attacks where an attacker registers the made-up package name before your build system pulls it (often called slopsquatting, a close cousin of dependency confusion). I've personally experienced generated code taking "shortcuts" into unsafe code. In a way, it has become an amplifier of the dependency pyramid.
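One cheap guard against a hallucinated package name is to refuse any dependency a human has not already vetted. What follows is a minimal sketch, assuming a requirements-style text file and a hand-maintained allowlist. The VETTED set, the unvetted function, and the package name fast-json-utilz are all made up for illustration, and the name normalization follows PEP 503.

```python
import re

# Hypothetical allowlist: names a human has actually reviewed and pinned.
VETTED = {"requests", "psycopg2", "cryptography"}

# A requirement line starts with the package name; version specifiers follow.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name):
    """PEP 503 normalization: lowercase, runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def unvetted(requirement_lines, vetted=VETTED):
    """Return the requirement names that are not on the vetted list."""
    allowed = {normalize(n) for n in vetted}
    flagged = []
    for line in requirement_lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        match = _NAME.match(line)
        if match and normalize(match.group(1)) not in allowed:
            flagged.append(match.group(1))
    return flagged
```

Run against a list like ["requests==2.32.0", "fast-json-utilz>=1.0"], it flags only the second entry. This does not prove a package is safe. It only forces a person to look before the build system does.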

The Open Source Security Foundation, summarizing this body of work, stated it plainly: "AI can generate lots of code, but that code must be reviewed and reworked. Productivity measurement must include this rework time to be meaningful. AI doesn't replace humans; it's an assistant to humans that needs constant guidance."

So the security argument being made against open source applies with far greater force to AI-generated code. When you depend on open source, the failure modes are public, documented, and subject to CVE tracking. When you depend on code you generated, you own every vulnerability in silence until that generated MFA bypass becomes incredibly loud.

Here is what the proponents of "generate your own infrastructure" seem to want you to forget: the XZ Utils backdoor was caught by the open source community. A software engineer at Microsoft, Andres Freund, noticed an anomaly while doing routine performance investigation. He published his findings to the Openwall mailing list. The open source security community converged on the analysis within days and mitigations were distributed through the same open source infrastructure that had been targeted.

Freund himself acknowledged it was a lucky find. He wasn't a security researcher, and he wasn't even auditing XZ Utils; he stumbled across the backdoor while investigating something else entirely. But the critical point is that "something else" was visible. The source code was public and the anomaly was observable. The entire machinery of distributed open source trust, despite its failure in this case to catch the attack before it shipped, ultimately surfaced it before mass exploitation occurred.

Compare this to what happens when your organization generates five hundred custom utility functions over the course of a year and distributes them internally. There is no Openwall list for your private codebase. There is no CVE entry when your AI-generated authentication middleware has a subtle logic flaw. There is no community working to release a patch while your team is asleep. There is no Andres Freund watching your SSH performance.

The open source model has supply chain risks. So does every alternative, including your closed source. How many internal Git repositories in your enterprise lack branch protection that requires multiple reviews and blocks force pushes to main? I'd be willing to bet your internal processes for upstreaming, reviewing, and releasing code are far less rigorous than SQLite's. The question is which risks are manageable and which are catastrophic.

There is a correct lesson from the left-pad incident that is applicable to generated code in trivial places. Left-pad did not prove that open source is dangerous. It proved that dependency culture without engineering discipline is dangerous. The correct takeaways were:

Own your primitives. If a function is short enough that any competent engineer should be able to write it in five minutes, it should not be a dependency. This is good engineering hygiene, not a critique of open source as a concept.
Understand your dependency graph. You should know what you're pulling in and why. Blindly chaining npm packages five levels deep without reading any of them is a risk, regardless of whether those packages are legitimate.
Use frameworks, not micro-packages. A trigonometry library is a reasonable dependency. A cosine package is not. A cohesive framework maintained by a community serves an entirely different function than a one-liner someone published on a weekend.
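The second takeaway above is the easiest to automate. Here is a minimal sketch, assuming you can already export your dependency graph as a plain mapping from package to direct dependencies. The GRAPH data and the transitive_deps name are hypothetical, and a real project would read this from its lockfile or SBOM.

```python
from collections import deque

# Hypothetical graph: package -> direct dependencies.
GRAPH = {
    "app": ["web-framework", "left-pad"],
    "web-framework": ["http-parser", "template-engine"],
    "template-engine": ["left-pad"],
    "http-parser": [],
    "left-pad": [],
}


def transitive_deps(graph, root):
    """Map every package reachable from root to its shortest depth.

    Direct dependencies of root sit at depth 1.
    """
    depths = {}
    queue = deque((dep, 1) for dep in graph.get(root, []))
    while queue:
        name, depth = queue.popleft()
        if name in depths:
            continue  # already reached by a shorter or equal path; also stops cycles
        depths[name] = depth
        for child in graph.get(name, []):
            queue.append((child, depth + 1))
    return depths
```

The length of the result is how many packages you actually ship, and the largest depth is how far you are from code you chose on purpose. If either number surprises you, that is the audit telling you where to start reading.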

None of these lessons say: generate your own Postgres. Generate your own React. Reimplement your own TLS stack. These are the arguments of people who either have never maintained a line of code or are selling you something.

What I actually think this is about is far more manipulative and it's resonating. When a major LLM vendor implies that AI-generated code can displace open source dependencies, they are making a fundraising argument, not an engineering argument. They are screaming, "You need this more than ever. Tokens! Tokens! Tokens!"

The economics are transparent. If you believe open source is a liability and generated code is the answer, then every function you write becomes a billable token. Every dependency you remove becomes a new LLM API call. The company selling you inference at scale benefits directly from you believing that your software should be generated rather than assembled from the commons that the industry has spent decades building.

It is worth being clear about what "the commons" actually means here. The Linux kernel is open source. The GCC and LLVM compilers are open source. The Go and Rust standard libraries are open source. OpenSSH is open source. The TLS implementations securing nearly every connection on the internet are open source. The languages, operating systems, and architectures on which every piece of AI-generated code runs are themselves built on open source. The suggestion that we can route around this foundation by generating our own code on top of it is not a safety argument.

The path forward is not "generate everything" and it is not "depend on everything." It is engineering judgment applied with rigor, armed with tooling that, given good guidance, eliminates the mundane. Write and own the code that is core to your business logic and competitive differentiation. If you're building a security product, your detection logic should be yours. If you're building a video platform, your streaming pipelines should be yours. Code that is your IP encodes your domain knowledge, and it probably doesn't and shouldn't exist in a third-party library you don't control.

Depend on the things that represent massive shared investment in correctness and security, such as databases, compilers, operating systems, cryptographic primitives, and widely adopted frameworks. Contribute back where you can. A software bill of materials (SBOM) is more important than ever, so audit your dependency graph. Know what you're pulling in and from whom. Treat supply chain security as a real engineering discipline, not a reason to abandon the infrastructure the industry runs on.

Be ruthless about micro-package bloat. Left-pad should never have been a dependency for anyone, and isArray should never have had 18 million downloads. Stop using React when a few hundred lines of straightforward code or, yes, sometimes a small, well-understood library is what the problem actually calls for. Please be skeptical when the company selling you code generation tells you that the problem with software is that it relies too much on open source. The generation is quite literally inferred from existing open source. That particular argument has a very clear financial beneficiary, and it isn't you.


rydonahue
