QA as mitigation for the uncertainty of AI-generated code

If you’ve ever backed up files remotely, you’ve almost certainly used rsync. It is one of the most widely deployed tools in the Unix world, running on servers, in backup systems and in scientific computing clusters. It is infrastructure in the most unglamorous sense: invisible when it works, catastrophic when it doesn’t.

In the wake of a spike in rsync bug reports and related complaints about the new vibe-coded releases, its creator and maintainer Andrew Tridgell published a Medium post in which he defends his decision to make AI integral to his development process.


All hands on deck

In the Medium post, Tridgell discusses the background to this story. He was facing an increasing flood of security vulnerability reports. Many of them were AI generated, though he says “there are some notable ones with very careful and high quality manual analysis.” This is clearly not a compliment to the AI-generated ones, but it does seem that Claude Mythos and other companies’ latest models are well capable of discovering vulnerabilities in software. Many of the reports Tridgell received concerned real exploits in the rsync code, he says.

Tridgell was (and is) under pressure, and realised that rsync’s defenses (test coverage, pipelines, security scans) needed a lot of work. He is not an AI enthusiast per se, though. His motivation for using it is, well, pragmatic:

Andrew Tridgell: “I’m retired (though my wife may dispute that!) and I’d rather be out sailing than working on rsync security issues, so I have reached for several AI tools to help with what needs to be done.”

The regressions, which Tridgell admits happened in the 3.4.3 release, drew a lot of attention to the project and its maintainer, with one user going so far as to open a GitHub issue titled “Please Do Not Vibe Fuck Up This Software.” Sentiment about vibe coding online is largely negative, but Tridgell defends his actions as deliberate:

Andrew Tridgell: “I quite deliberately tried to err on the side of fixing security issues for that release, and there were some valid (but unusual) use cases that got caught up in the changes. None of those cases were covered by the existing rsync test suite (…)”

What are some ways AI generated code can cause regressions on a project like this, run by an engineer with 40 years of experience and a comprehensive test suite? In one example from a Hacker News thread, a contributor noticed a commit introducing this change:

- if (!ptr)
-   ptr = malloc(num * size);
- else if (ptr == do_calloc)
+  if (!ptr || ptr == do_calloc)
     ptr = calloc(num, size);

The now-reverted change, written with Claude, quietly routes every new allocation through calloc, which zeroes the memory it returns and is typically more expensive, instead of reserving it for the callers that need zeroed memory. “For large and recursive allocations, this becomes a significant cost,” user GodelNumbering writes.
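To make the distinction concrete, here is a minimal sketch of an allocation wrapper that keeps the two paths separate. It is not rsync's actual code: the names `alloc_array`, `WANT_ZEROED` and `is_zeroed` are hypothetical, and both the overflow guard and the `realloc` branch are assumptions added so the sketch is complete.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sentinel: passing its address as `ptr` asks for zeroed memory. */
static char zero_request;
#define WANT_ZEROED ((void *)&zero_request)

/*
 * Sketch of a three-way allocation wrapper (names are illustrative only):
 *   ptr == NULL         -> fresh, uninitialised memory via malloc
 *   ptr == WANT_ZEROED  -> fresh, zero-filled memory via calloc
 *   anything else       -> resize an existing block via realloc (assumed)
 */
void *alloc_array(void *ptr, size_t num, size_t size)
{
    /* Assumed guard: refuse requests where num * size would overflow. */
    if (size != 0 && num > SIZE_MAX / size)
        return NULL;

    if (ptr == NULL)
        return malloc(num * size);   /* cheap path: contents are indeterminate */
    if (ptr == WANT_ZEROED)
        return calloc(num, size);    /* costlier path: every byte is zero */
    return realloc(ptr, num * size);
}

/* Returns 1 if the first `len` bytes of `buf` are all zero, 0 otherwise. */
int is_zeroed(const void *buf, size_t len)
{
    const unsigned char *bytes = (const unsigned char *)buf;

    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        if (bytes[i] != 0)
            return 0;
    }
    return 1;
}
```

With this shape, a caller that needs zeroed memory says so explicitly with `alloc_array(WANT_ZEROED, n, size)`, while every other caller stays on the `malloc` path. Folding the two conditions into a single branch, as the diff above does, removes that distinction.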

The 3.4.3 release, rushed out to address security issues, introduced regressions like this that broke legitimate use cases not covered by the existing test suite. Because it was framed as a security update, it propagated quickly into distributions and automated upgrade paths. Real data was at risk.


One man army

Rsync is effectively maintained by a single person. Tridgell is retired. If he would rather be out sailing, why isn’t he? He is not paid to do this, though by rights he probably should be. Because he cares about his project, he felt he had to choose. Patch fast, and risk regressions. Patch carefully, and users remain exposed for longer.

While previously the task would have been too gargantuan for one person to even start on, AI offered something that looked like a way out of the bind: patch fast and even increase test coverage. And because AI tools were both part of the problem and part of the solution, the surrounding debate quickly became an ugly AI-versus-anti-AI turf war. But this obscures a deeper issue, as one commenter on the Medium post observes:

Medium reaction: “Congratulations. You’ve succeeded in rescinding Claude and Vibe Coding of responsibilities for this mess. Turns out the problem was a mismanagement of the project.”


Uncertainty both ways

Reasonable people can argue past each other when the situation is genuinely uncertain in two directions at once, and this is the case with rsync.

How serious are the vulnerabilities, really?

Firstly: we do not know how serious the vulnerabilities actually are in practice. One commenter put it bluntly: “Most of these CVEs are going to be completely inconsequential to the material realities of our lives.” But if you are the one maintaining the project, faced with a thousand CVEs, are you so sure?

What is the risk of speed?

At the same time, shipping AI-assisted refactors at speed carries risk and uncertainty too, especially with software as widely used as rsync. Regressions in software like this will certainly cause harm to users. How likely is a regression, how many people will it affect, and how badly?

Both concerns are legitimate, and both can be true simultaneously. Neither settles the debate. This uncertainty is not Tridgell’s fault, nor that of any open source maintainer, whether they use AI or not. Uncertainty and risk are intrinsic to software development at speed. If we could pull out a measuring stick and pick the option with the lowest risk, we’d be doing that. This is why Quality Assurance exists (and also why individual contributors are generally not liable when harm does occur).


Sailboat

Photo: Evan Smogor via Unsplash

Quality Assurance fundamentals

Some humility is warranted when we are not the ones making the call: signing off on the code, shipping the release, waking up to floods of regression tickets and personal threats. But should a project as widely used as rsync rest on the shoulders of a handful of maintainers, however capable?

This is not a new problem. Open source software that underpins commercial systems has always been disproportionately maintained by uncompensated individuals working in the open, without the organisational scaffolding that normally exists to manage risk.

Why QA processes and release management exist

🧭 They ensure that tradeoffs are made deliberately, not under the pressure of a backlog of unanswered CVEs.
⚖️ They were never there to produce perfect software - perfection is not a realistic bar for software shipped at speed.
👥 They ensure someone can carry the risk with accountability - not one person in isolation.

We cannot eliminate uncertainty, but we can make reasoned decisions about it, so that those of us who have earned our retirement can spend it sailing if we want to.
