AI agents are increasingly finding their way into test automation. They analyze failing tests, propose new test cases, and even automatically generate test code—for example, for Playwright. They do this not only based on test results and log data, but also through analysis of the System Under Test, underlying application code, and contractual specifications.
This development creates opportunities for speed and scale, but it also introduces a fundamental question: what happens to quality when test code is no longer written only by people, but is also modified by autonomous systems?
In this article, we explore that question from the QA perspective. Not by comparing AI with developers, but by looking at a familiar and proven principle from software development: human review of each other’s work. We show why that same principle is indispensable when AI agents play a role in test automation—and why the tester therefore remains firmly the expert in the lead.
When AI starts writing test code, the rules of the game change
AI agents no longer limit themselves to analysis or advice. Increasingly, they make concrete proposals to modify or extend test code. In practice, we see proposals such as extending wait times to address flakiness, adjusting selectors for stability, or skipping unstable steps in a user flow.
Technically, these look like effective solutions. The pipeline stays green and tests fail less often. But in terms of content, something fundamental changes.
The agent optimizes for successful execution of tests, not for keeping regression risk visible. As a result, test automation quietly shifts from a quality instrument into a stability mechanism.
This is the point at which human expertise becomes indispensable.
Making tests pass is not the same as safeguarding quality
Test automation is meant to make unwanted change visible. When tests are adjusted so they keep passing despite underlying issues, automation loses that function.
This becomes clear when we view it as a review situation.
AI proposal (in review)
OPEN /cart
CLICK "Checkout"
# AI proposal: payment step fails due to bug
# Solution: skip the step so the test passes again
NAVIGATE /checkout/confirmation
ASSERT "Confirmation visible"
Review by tester
REVIEW COMMENT:
This change skips a functionally critical step.
The test now verifies only the endpoint, not the checkout flow.
As a result, regression coverage for payment and validation disappears.
Reject the change.
Counterproposal by tester
TEST "Checkout works including payment"
OPEN /cart
CLICK "Checkout"
ASSERT "Payment step visible"
ASSERT "Required fields present"
SUBMIT payment
ASSERT "Confirmation visible"
Here, the test continues to fail as long as the business flow fails. That’s not a problem—it’s exactly the point of regression testing.
The tester is explicitly safeguarding the test intent, not just the outcome.
Flakiness requires meaning, not time
A second common AI proposal targets flakiness. Here too, we see the same pattern in a review context.
AI proposal (in review)
# Test sometimes fails due to timing issues
# AI proposal: increase wait time
WAIT 10 seconds
ASSERT "Save successful"
Review by tester
REVIEW COMMENT:
This change masks the underlying synchronization issue.
Extra waiting reduces the signal value of the test.
Performance regressions become invisible as a result.
Reject the change.
Counterproposal by tester
ASSERT "Save button disabled"
ASSERT "Success message visible"
ASSERT "Data saved"
Here, the test doesn’t wait for time—it waits for meaningful system behavior. That keeps flakiness as a signal instead of filtering it away.
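The principle behind the counterproposal, waiting for meaningful system behavior instead of elapsed time, can be sketched as a generic polling helper in plain TypeScript. This is an illustration, not Playwright itself; Playwright's built-in web-first assertions, such as `await expect(locator).toBeVisible()`, apply the same idea.

```typescript
// Minimal sketch of condition-based waiting: poll for meaningful state
// instead of sleeping a fixed number of seconds.
async function waitForCondition(
  check: () => boolean,
  timeoutMs = 2000,
  intervalMs = 20,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (check()) return; // meaningful behavior observed: stop immediately
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Condition not met within ${timeoutMs} ms`);
}
```

A slow save still passes the moment the success state appears, and a genuine hang fails with a clear error instead of hiding behind a fixed ten-second wait. Performance regressions stay visible as growing wait durations rather than disappearing entirely.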
This shows why expert in the lead is necessary: AI pursues success; the tester safeguards meaning.
AI agents work with powerful inputs, and that increases responsibility
AI agents use various sources to make test proposals: test results and log data, analysis of the System Under Test, application code, or product documentation.
The richer that input, the greater the impact of the proposed changes. An agent that only analyzes logs advises. An agent that modifies test code actively steers quality.
That places test code in the same category as other critical parts of the development process—and it requires the same discipline.
Test automation and code reviews follow the same principle
In software development, it’s normal for developers to review each other’s work before code is merged. Not because someone is incompetent, but because quality benefits from human validation. A review makes intent explicit, exposes assumptions, and prevents mistakes from flowing through unnoticed.
This principle is independent of tooling. It’s about collaboration around quality.
When an AI agent generates or modifies test code, the source of the change shifts—but not the impact. The change influences what remains visible and what disappears in the system’s quality picture.
That’s why AI in test automation calls for exactly the same review principle.
The tester as reviewer of test intent
In this model, the tester’s role shifts: less execution and more assessment; less writing and more steering.
The tester reviews AI-generated test code the way developers review each other’s code—not for syntax, but for meaning.
The core questions remain the same:
What intent did this test have?
Which risk needed to remain visible?
Has that risk disappeared because of this change?
Without this review, test automation turns from a measurement instrument into a reassuring signal with no substance.
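These questions become easier to answer when test intent is recorded explicitly alongside the test. A minimal sketch of such an intent record follows; the names and structure are hypothetical, and the point is only that a reviewer can check an AI-proposed change against a documented risk rather than reconstructing the intent from memory.

```typescript
// Sketch: record the intent behind each automated test so a review can
// check whether a proposed change erases the risk the test keeps visible.
interface TestIntent {
  test: string;        // test identifier (illustrative)
  intent: string;      // what the test is meant to demonstrate
  visibleRisk: string; // the regression risk it keeps visible
}

const intents: TestIntent[] = [
  {
    test: "checkout-including-payment",
    intent: "Checkout works including payment",
    visibleRisk: "payment and validation regressions",
  },
];

// Review helper: look up the documented intent for a changed test.
function intentFor(testId: string): TestIntent | undefined {
  return intents.find((entry) => entry.test === testId);
}
```

A change to a test with no documented intent, or one that contradicts the documented risk, is an immediate signal for closer human review.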
AI agents and Playwright: where is the risk?
AI agents often optimize Playwright tests using recognizable patterns such as skipping steps, extending wait times, or loosening selectors. Technically that works, but in terms of content, regression coverage disappears.
Expert in the lead means that changes to test automation are never applied autonomously, but are always consciously assessed by a tester. That human review ensures test automation keeps its purpose: making quality risks visible—not hiding them.
AI supports by making proposals. The tester safeguards whether those proposals contribute to quality, regression coverage, and test intent.
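Part of this discipline can be supported by tooling without giving up human review: a small gate that scans AI-proposed diffs for the risky patterns named above and routes matches to a tester. A minimal sketch with an illustrative, deliberately non-exhaustive pattern list; `flagForHumanReview` is a hypothetical helper, not a Playwright API.

```typescript
// Sketch of a pre-review gate: scan an AI-proposed unified diff for
// patterns that tend to hide regressions rather than fix them.
// Flagged changes go to a tester for review before anything is merged.
const riskyPatterns: { pattern: RegExp; reason: string }[] = [
  { pattern: /^\+.*(test|step)\.skip/m, reason: "adds a skipped test or step" },
  { pattern: /^\+.*waitForTimeout\(\s*\d{4,}/m, reason: "adds a long hard wait" },
  { pattern: /^-.*expect\(/m, reason: "removes an assertion" },
];

function flagForHumanReview(diff: string): string[] {
  return riskyPatterns
    .filter(({ pattern }) => pattern.test(diff))
    .map(({ reason }) => reason);
}
```

Such a gate does not judge the change; it only guarantees that the patterns most likely to erode regression coverage never bypass the expert in the lead.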
Collaboration over autonomy
The power lies not in fully autonomous AI agents, but in collaboration. AI agents can analyze and propose changes faster than humans can. Testers provide meaning, assess risk, and carry responsibility.
That’s not a contradiction—it’s reinforcement.
AI accelerates the work. The expert safeguards the direction.
In closing
AI agents are changing test automation, but not the core of the testing profession. Quality remains a conscious choice—not an automatic byproduct of smart tooling.
Test automation is not about green pipelines, but about visibility into unwanted change. That is exactly where human expertise remains indispensable.
The future of test automation is not an autonomous test factory, but a collaboration in which AI makes proposals, testers review, and quality remains in the lead.
Expert in the lead is not a brake on AI. It’s the same principle that has protected software quality for decades.
