Published: · 6 min read
Where AI helps in QA, where it lies, and the one rule that keeps AI tests trustworthy: the AI does the work, a fixed check decides. With a working example.
To implement AI in QA, let AI do the work and let a fixed check decide the result. Use AI to generate tests, repair broken selectors, and explore your app. Never let AI grade its own output. Add an independent check the AI cannot change, test for repeatable results, and test that the agent did only what you asked.
AI in QA means using an AI model to help test software. That is the whole idea. The model can write test cases, fix tests that broke, click through your app like a user, or read a failure and guess the cause.
It does not mean the AI replaces testing. It means the AI does some of the work a tester used to do by hand. The judgment stays with you.
I test software for a living. The teams that win with AI in QA all draw the same line. AI does the work. A human, and a fixed check, decide if the work is good.
Here is the rule the rest of this guide hangs on.
Let AI do the work. Never let AI judge its own work.
Picture an AI agent that tests your login. It clicks around. It reports green. Everyone relaxes. But the agent decided what "pass" means. If it is too kind, it passes a broken page. Now you have shipped a bug with a green check on top.
An AI that grades its own work is not a test. It is an opinion.
So you give the AI room to explore, and you keep one thing it cannot touch. A fixed check. A known-good answer. A hard assert (a check that fails loudly) on the real outcome. The agent finds the path. The fixed check says pass or fail.
In testing this fixed answer has a name: an oracle. The oracle is the part the system being tested is not allowed to influence. Keep your oracle out of the AI's reach and most AI-in-QA risk goes away.
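A minimal sketch of that separation, using the login example. The type names, the `decide` function, and the `userId` field are assumptions made for illustration, not part of any library. The point is only that the verdict is computed from real state and never from the agent's report.

```typescript
// Hypothetical shapes, for illustration only.
interface AgentReport {
  status: 'green' | 'red'; // what the agent claims
  notes: string;
}

interface SessionState {
  userId: string | null; // read from the real session store, not from the agent
}

// The oracle: a fixed comparison against a known-good answer.
function oraclePasses(session: SessionState, expectedUserId: string): boolean {
  return session.userId === expectedUserId;
}

// The agent's report is accepted for logging but never consulted for the verdict.
function decide(
  _report: AgentReport,
  session: SessionState,
  expectedUserId: string,
): 'pass' | 'fail' {
  return oraclePasses(session, expectedUserId) ? 'pass' : 'fail';
}
```

With this shape, an agent that reports green on a broken login still gets a fail, because `session.userId` is `null` and the oracle does not care what the agent said.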
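These four jobs are where AI pays off today: writing test cases, repairing tests that broke, exploring your app like a user, and reading a failure to suggest a cause.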
In all four, the AI proposes. You and your fixed checks dispose.
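This is the part most guides skip. AI in QA fails in four ways: the agent grades itself too kindly, the same request gives a different outcome on the next run, the agent does more than you asked, and the model changes underneath your tests. Plan for each.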
That last one is not theoretical. In June 2026 a widely used model was pulled offline overnight. Teams that pinned their model switched to a fallback in one line. Teams that did not found out when their build broke.
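One way to make the fallback a one-line switch is to keep the pinned model and its fallback in a single config object. The sketch below assumes that approach. The model IDs are placeholders, and `ModelConfig` and `pickModel` are hypothetical names, not part of any SDK.

```typescript
// Hypothetical config shape. Both IDs are pinned, never "latest".
interface ModelConfig {
  primary: string;
  fallback: string;
}

// Placeholder model IDs, for illustration only.
const MODEL_CONFIG: ModelConfig = {
  primary: 'vendor/model-a-pinned',
  fallback: 'vendor/model-b-pinned',
};

// Pick the pinned model if the provider still serves it, else the pinned fallback.
// Fail loudly if neither is available, so the build does not silently change models.
function pickModel(available: ReadonlySet<string>, config: ModelConfig = MODEL_CONFIG): string {
  if (available.has(config.primary)) return config.primary;
  if (available.has(config.fallback)) return config.fallback;
  throw new Error(`Neither ${config.primary} nor ${config.fallback} is available`);
}
```

How you learn which models are available depends on your provider. Here it is passed in as a plain set so the choice stays a pure, testable function.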
Here is the pattern in code. An AI agent books a meeting room. Then three fixed checks decide if it really worked. The agent never grades itself.
import { test, expect } from '@playwright/test';
import { Stagehand } from '@browserbasehq/stagehand';
// getBookingFromDb and getChangesExcept are your own helpers.
// Import them from wherever they live in your project.

test('AI books a room, and only that', async ({ page }) => {
  // Pin the model. Do not let it auto-upgrade under your tests.
  const stage = new Stagehand({ env: 'LOCAL', model: 'anthropic/claude-opus-4-8' });
  await stage.init();
  try {
    // 1) Let the AI do the work. It decides HOW to book the room.
    await stage.act('Book room B for 2pm tomorrow, for 30 minutes');

    // 2) The fixed check the AI cannot move (the oracle).
    // These helpers read your real database, not the agent's report.
    const booking = await getBookingFromDb({ room: 'B', time: '14:00' });
    expect(booking).toBeTruthy(); // it did the task
    expect(booking.durationMin).toBe(30); // exactly what we asked for

    // 3) Scope check. Did it touch anything it should not have?
    const otherChanges = await getChangesExcept(booking.id);
    expect(otherChanges).toHaveLength(0); // no surprise side effects
  } finally {
    // Close the agent's browser even when a check fails.
    await stage.close();
  }
});
Read the three checks again. The agent's own "I booked it" is never trusted. The database is the oracle. The duration check catches a sloppy booking. The scope check catches the agent doing extra. (getBookingFromDb and getChangesExcept are your own helpers. They read real state, not the agent's words.)
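The scope check is the least familiar of the three, so here is a rough sketch of the idea behind it. It assumes your app keeps a change log (an audit table, for example) that you can read after the agent finishes. `ChangeRecord` and `changesOutsideScope` are hypothetical names, and the in-memory array stands in for a real database query.

```typescript
// Hypothetical audit-log row: one entry per write the app performed.
interface ChangeRecord {
  table: string; // which table was written
  rowId: string; // which row was written
  action: 'insert' | 'update' | 'delete';
}

// Return every change that did NOT touch the one row the agent was allowed to create.
// An empty result means the agent stayed in scope.
function changesOutsideScope(log: readonly ChangeRecord[], allowedRowId: string): ChangeRecord[] {
  return log.filter((change) => change.rowId !== allowedRowId);
}

// Example: the agent booked room B (row "bk-1") but also cancelled someone else's booking.
const log: ChangeRecord[] = [
  { table: 'bookings', rowId: 'bk-1', action: 'insert' },
  { table: 'bookings', rowId: 'bk-7', action: 'delete' },
];

const surprises = changesOutsideScope(log, 'bk-1');
console.log(surprises.length); // 1: the unrequested delete
```

In a real suite the log would cover only the window of the test run, so changes from other tests do not show up as false alarms.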
To catch the repeatable-result trap, run the same prompt twice and compare:
test('same request, same result', async () => {
  // runBooking is your own helper. It is assumed here to run the agent once
  // and return the booking as read from real state.
  const a = await runBooking('Book room B for 2pm tomorrow, 30 minutes');
  const b = await runBooking('Book room B for 2pm tomorrow, 30 minutes');
  // The wording of the agent's reply may differ. The outcome may not.
  expect(a.room).toBe(b.room);
  expect(a.durationMin).toBe(b.durationMin);
});
The model is allowed to phrase its answer differently each time. It is not allowed to book a different room.
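The same comparison extends to more than two runs if you reduce each run to the fields that must not vary. The sketch below assumes a `BookingOutcome` shape with a free-text `reply` field. Both the shape and the function names are illustrative, not taken from a library.

```typescript
// Hypothetical result of one agent run.
interface BookingOutcome {
  room: string;
  durationMin: number;
  reply: string; // the agent's wording, allowed to vary
}

// Reduce a run to the fields that must be identical across runs.
// The reply text is deliberately left out.
function outcomeKey(outcome: BookingOutcome): string {
  return `${outcome.room}|${outcome.durationMin}`;
}

// Collect the distinct outcomes across any number of runs.
// A repeatable agent yields exactly one.
function distinctOutcomes(runs: readonly BookingOutcome[]): string[] {
  return Array.from(new Set(runs.map(outcomeKey)));
}

const runs: BookingOutcome[] = [
  { room: 'B', durationMin: 30, reply: 'Done, room B is yours.' },
  { room: 'B', durationMin: 30, reply: 'Booked room B for half an hour.' },
];

console.log(distinctOutcomes(runs).length); // 1: different wording, same outcome
```

In a test you would assert that the length is 1, and print the distinct keys when it is not, so the failure shows which outcomes diverged.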
If you ship an AI feature to users, these three tests catch the failures that page you at 2am. Most teams only write the first easy one ("does it give a good answer?").
You do not need a platform or a budget. Start small.
Do those four and you have AI in QA that you can trust. The AI does more work. You keep the judgment. The fixed checks keep everyone honest.
That is the whole job: the gap between "the AI says it passed" and "it passed, for the right reason."
Anton Gulin is the AI QA Architect, the first person to claim this title on LinkedIn. He builds AI-powered test automation systems where AI agents and human engineers collaborate on quality. Former Apple SDET (Apple.com / Apple Card pre-release testing). Find him at anton.qa or on LinkedIn.