Notebook · 2026

Now Testing 🎮 Gaming & sim UIs

Ten years making sure simulation
software didn't fail. Then GenAI
rewrote the rules of testing.

I'm Pragati. Most of the last decade has been inside CAE Canada's simulation group — the kind of place where a single regression in a flight-deck training rig isn't a Slack ping, it's a missed certification cycle. That bar is what I bring to the next decade of QE, where the hard problem isn't writing tests but validating non-deterministic systems before they ship. This page isn't a résumé. It's a notebook on five things I built to figure that shift out — what each one is trying to solve, how it works, what I'd do differently.

Read the notebook ↓ Download CV ↓ Get in touch →

The projects

Five builds. Five problems I kept seeing.

Each of these started the same way — a recurring tax on a QE team I'd worked with, or one I was watching at a distance. Each card has the same three beats: why I built it, how it actually works, and what I'd do differently. They're sequenced roughly in the order the questions hit me — not by date shipped.

Visual QE · Claude Vision 001

DriftGate

Visual regression as a CI gate

Why I built it

Visual QA was the slowest gate on every release I'd ever owned — 15 to 30 minutes of a human clicking through screens looking for the kind of drift no test asserts on. The bug isn't in the diff, it's in the didn't-notice.

How it works

A Playwright + Claude-Vision loop renders every front-end change, scores it against a design system, and runs a bounded fix loop until the screen conforms. Deterministic pixel diff and design-token assertions are the hard pass/fail; the vision critic stays advisory. That hybrid keeps a non-deterministic model from ever wrongly failing a build.

What I'd do differently

Hold the cost line earlier. Prompt-caching the design system dropped per-run cost to ~$0.01 — I should have wired that in on day one, not after the first bill.

Playwright
Claude Vision
FastAPI
GitHub Actions

Multi-Agent · Sonnet + Haiku 002

Qalibur

Ten agents, one merge-ready test PR

Why I built it

Most "AI for testing" tools generate a handful of cases and stop. The work that actually eats a senior QE's day — risk strategy, traceability, triaging which failures are flaky vs. real — gets handed straight back. I wanted the pipeline a QE lead would build if they had ten engineers each owning one stage.

How it works

Ten focused agents own the lifecycle end-to-end: Scout walks the repo, Strategist produces a risk matrix and equivalence-partition tables, Scribe writes Gherkin, Crafter writes Playwright, Deployer opens the PR, Runner dispatches CI, Triage classifies failures. A Gatekeeper scores every handoff ≥8.0/10 or retries upstream up to 3× before escalating. Every artefact records its parent, gate score, and attempt — fully auditable chain.

What I learned

Specialisation beats one big agent. Sonnet for the synthesis stages, Haiku for the comparison-heavy ones with cacheable prompts. The gating loop is the part that makes non-deterministic models safe to put in a CI position.

Claude Sonnet 4.6
Claude Haiku 4.5
TypeScript
Node + Express
React + Vite
GitHub API

AI · LangChain 003

Spectra

AI test generation from any OpenAPI spec

Why I built it

Authoring API tests from an OpenAPI spec is mechanical work — and yet a senior engineer's whole day disappears into it. The schema already tells you what the happy-path, boundary, and auth-bypass cases look like. Why is anyone still typing them out?

How it works

Parses any OpenAPI 3.x spec, scores each endpoint by risk tier (Low / Med / High / Critical) with a LangChain agent, then generates four test categories per endpoint — happy path, boundary, auth bypass, malformed input. Pydantic contracts on every LLM output mean zero malformed cases reach CI. Cosine-similarity dedup at 0.92 threshold cuts test bloat 40%+ before the suite ever runs.

What I learned

Schema-enforced LLM output is the difference between a demo and a tool you'd actually wire into CI. LangSmith made token cost a number I could explain in a review — that's what made the project legible to non-QE stakeholders.

LangChain
FastAPI
Pydantic
Docker
LangSmith

Security QE · Terraform + Claude 004

TerraGuard

Catch Terraform AWS security regressions before they ship

Why I built it

Terraform PRs ship security regressions silently. Static scanners flag the same hundred findings on every run, so the signal that matters — what's new in this PR — gets buried in noise. The team stops looking.

How it works

An end-to-end pipeline that lives entirely inside GitHub Actions — no AWS credentials, no external infra. Checkov (CIS/NIST-mapped) + tfsec + Trivy scan the plan; findings diff against the main baseline to isolate true regressions; pytest invariants per domain gate the PR by severity; Claude Haiku triages each new finding with exploitability scoring and flags the auto-remediable ones; an auto-fix PR lands a minimal HCL patch back on the contributor's branch. A public dashboard publishes the posture-score delta every run.

What I learned

Baselines are what turn raw scanner output into a working CI gate. Teams don't react to a number that's been red for a year — they react to a delta. The dashboard kept the right metric visible: not absolute findings, but regressions per PR.

Checkov
tfsec
Trivy
pytest
Claude Haiku
GitHub Actions
Pydantic

Automation · LLM 005

Self-Healing E2E

Autonomous Playwright selector repair

Why I built it

Flaky locators are the single biggest tax on E2E suites. Every selector break is hours of debugging that doesn't ship value — and the team that pays it is the same team writing the next set of features.

How it works

The framework detects a broken Playwright locator at run time, queries an LLM with the live DOM, generates a repaired selector, and files an auto-PR — turning hours of investigation into zero human intervention per failure. A Next.js dashboard shows healed-selector history per run, across stable and breaking app versions side-by-side.

What I learned

The dashboard mattered more than the model. Without the audit trail, self-healing feels like a black box and teams won't trust the auto-PRs — with it, they treat the healed selector like any other reviewable diff.

Playwright
Node.js
LLM
Next.js

Background

The decade behind the projects.

Every choice in the projects above comes from somewhere — usually a real release where I watched something fail (or almost did). This is the short version of where those instincts came from, not the long-form CV. The CV is one click away if you want it.

2014
QA Tester

VISCAR Education · Chandigarh

Stood up the QA practice from scratch — 100% planned test execution across 3 client releases before each one reached production.
2015
QA Engineer

Infostretch · Ahmedabad

Caught a payment-redirect security flaw across three checkout pages before release, and led the team's transition to TDD — quality ownership moved upstream.
2016 — 2018
QA Analyst

Teleperformance Canada · Montréal

Ran QA across a 50+ agent team and overhauled the QA integration gates for SAP CRM e-commerce releases, cutting go-live defects ticket-by-ticket.
2018 — Apr 2026
Software Test Engineer

CAE Canada (Presagis) · Montréal

Eight years inside aerospace simulation. Sustained ISO 9001:2015 across six audit cycles, converted manual regression into a Selenium + Java automated suite, and owned Nessus security testing end-to-end for the mission-critical sim modules.

Education

Where I learned the craft.

Masters in Automation Testing SimpliLearn · Online 2021 — 2023
M.Eng. — Quality Systems Engineering Concordia University · Montréal 2016 — 2018
Masters in Computer Applications HP Institute of Management Studies · Shimla 2011 — 2014
B.C.A. St. Bede's College · Shimla 2008 — 2011

Get in touch

Always up for a conversation about AI-native QE.

Hiring, collaborating, or just want to compare notes on what's working in your QE org — send me a line. The CV is below if you need the structured version.

Email guptapragati1990@gmail.com
Phone +1 (514) 754-9774
GitHub github.com/pragatig25
Location Montréal, QC · EST/GMT-5
Languages English (primary) · French (basic)
CV pragati-gupta-cv.pdf

Ten years making sure simulation software didn't fail. Then GenAI rewrote the rules of testing.

Five builds. Five problems I kept seeing.

DriftGate

Qalibur

Spectra

TerraGuard

Self-Healing E2E

The decade behind the projects.

QA Tester

QA Engineer

QA Analyst

Software Test Engineer

Where I learned the craft.

Always up for a conversation about AI-native QE.

Ten years making sure simulation
software didn't fail. Then GenAI
rewrote the rules of testing.