Test Plans

Aiqaramba's smallest unit of testing is a journey: a single prompt that an agent works through in a single browser. That model handles a lot of the day-to-day testing work, but plenty of real flows do not fit inside one journey. An admin invites a teammate, the teammate accepts the invite from their own inbox, and the admin then verifies that the new seat showed up. A buyer places an order, a fulfillment user picks it, and a customer-support user issues a refund. A signed-up user runs through onboarding while an anonymous visitor checks that the marketing pages still render. Stretching any of these into a single prompt produces an unwieldy mess of "now switch users, now check email, now go back" instructions that the agent has to re-derive every run.

A test plan is how you split that workload up. You write each segment as its own journey, small and focused and reusable, and let the test plan stitch them together into the larger scenario. The two primitives it gives you are step dependencies, which describe the order of work, and browser profiles, which describe whose browser each step runs in. Most of designing a test plan is figuring out the right combination of the two.

If you are after JSON payloads, addressing rules, and endpoint reference, head to the Test Plans API reference. This page is about what test plans are and how to use them.

Modeling the order of work

A test plan is a graph. Each node is a step that runs a journey; each edge says "this step needs that step to finish first." You are not writing a procedure top to bottom; you are declaring which steps depend on which.

The runner uses that graph in two ways. It walks it in dependency order, so a step only starts once everything it depends on has finished. It also runs in parallel any steps that do not depend on one another: if three steps sit at the root with no incoming edges, they start at the same time; if two branches share an ancestor, they fan out the moment that ancestor completes. For a long test plan with mostly independent verification steps, this can shrink the wall-clock time of a run considerably, in the best case from hours of serial execution to minutes.
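To make the ordering rule concrete, here is a minimal sketch that groups steps into "waves", where every step in a wave has all of its dependencies in earlier waves. It is an illustration, not the runner's implementation: the function name, the dictionary shape, and the step labels are all made up for this example, and the real runner starts each step as soon as its own dependencies finish instead of waiting for a whole wave.

```python
def execution_waves(depends_on: dict[str, list[str]]) -> list[list[str]]:
    """Group steps so that each wave only depends on steps in earlier waves."""
    remaining = {step: set(deps) for step, deps in depends_on.items()}
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        # A step is ready once every step it depends on has finished.
        ready = sorted(step for step, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        waves.append(ready)
        done.update(ready)
        for step in ready:
            del remaining[step]
    return waves


# Hypothetical plan: three root steps, two steps that wait on signup.
plan = {
    "signup": [],
    "homepage_renders": [],
    "pricing_renders": [],
    "onboarding": ["signup"],
    "checkout": ["signup"],
}
waves = execution_waves(plan)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

The three root steps land in the first wave, and the two steps that depend on signup fan out together in the second.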

When a step fails, the runner skips its descendants by default, because running them produces noise. A checkout step is not interesting if signup never completed. There is an opt-out for the cases where a step really does need to run regardless: cleanup, teardown, or a separate verification that does not actually depend on the failing step's success.
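The skip behavior can be sketched as a small function over the same kind of graph. This is a model of the semantics described above, not the runner's code: the run_always parameter is a hypothetical stand-in for the opt-out (see the API reference for the real field), and the set of failing steps is supplied by hand.

```python
def resolve_statuses(depends_on, failed, run_always=frozenset()):
    """Map each step to 'passed', 'failed', or 'skipped'.

    failed: steps whose journey fails when it actually runs.
    run_always: hypothetical opt-out; these steps run even if a dependency
    did not pass.
    """
    status = {}

    def visit(step):
        if step in status:
            return status[step]
        dependency_results = [visit(dep) for dep in depends_on[step]]
        blocked = any(result != "passed" for result in dependency_results)
        if blocked and step not in run_always:
            status[step] = "skipped"
        else:
            status[step] = "failed" if step in failed else "passed"
        return status[step]

    for step in depends_on:
        visit(step)
    return status


# Hypothetical plan: signup fails, so checkout and its descendants are
# skipped, except the cleanup step that opted out.
plan = {
    "signup": [],
    "checkout": ["signup"],
    "receipt_email": ["checkout"],
    "cleanup": ["checkout"],
}
statuses = resolve_statuses(plan, failed={"signup"}, run_always={"cleanup"})
print(statuses)
```

Note that a skip propagates: receipt_email is skipped because checkout was skipped, even though checkout itself never failed.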

The point of the graph model is that it lets you describe scenarios as they really are, with their natural concurrency and natural failure semantics, instead of flattening everything into a script.

Sharing browser state across steps

The other half of a real multi-step scenario is browser state. A journey that "logs in" is only useful to a later step if that later step runs in the same logged-in browser. A journey that asserts "the public homepage is reachable" is only useful if it runs in a browser that is not logged in.

Test plans handle this with browser profiles. Every step is assigned a profile, which is just a label like admin or invitee. Steps that share a profile share a browser session: the same cookies, the same local storage, the same logged-in identity. Steps with different profiles run in fully isolated sessions that know nothing about each other.

Once you see this, the awkward multi-actor scenarios become straightforward. The admin invite flow uses two profiles: an admin profile that runs the invite step and the later verification step, and an invitee profile that runs the accept-invite step in between. The two browsers exist side by side. The admin step at the end picks up exactly where the earlier admin step left off, still authenticated and still on the dashboard, while the invitee browser stays a separate, isolated client throughout. There are no "log out then log back in as someone else" gymnastics to script.
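As a rough illustration, the invite flow can be written down as three steps across two profiles. The Step class, its field names, and the step labels below are assumptions made for this sketch, not the plan schema; the Test Plans API reference has the real shape.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    label: str
    profile: str  # steps with the same profile share one browser session
    depends_on: list[str] = field(default_factory=list)


# Hypothetical labels for the admin invite flow.
invite_flow = [
    Step("send_invite", profile="admin"),
    Step("accept_invite", profile="invitee", depends_on=["send_invite"]),
    Step("verify_seat", profile="admin", depends_on=["accept_invite"]),
]


def sessions_by_profile(steps):
    """Group step labels by the browser session (profile) they run in."""
    sessions = {}
    for step in steps:
        sessions.setdefault(step.profile, []).append(step.label)
    return sessions


sessions = sessions_by_profile(invite_flow)
print(sessions)
```

The grouping shows two sessions: send_invite and verify_seat share the admin browser, while accept_invite runs alone in the invitee browser.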

Step-level browser sharing also makes anonymous flows easy: a step whose journey does not attach any mailbox or pre-authentication simply runs unauthenticated. That is how you verify that public pages are reachable, that gated routes redirect, or that a brand-new visitor can complete signup, without standing up a separate project.

Passing data between steps

Beyond ordering and browser state, the runner also gives you a way to thread data through a plan, so a step that creates something can hand the result to a step that needs it. There are two pieces to understand: plan variables and step outputs.

A plan variable is a value the caller supplies when they start a run. You declare each variable on the plan with a name, a description, an optional default, and a type. A variable without a default is required at run time. The caller passes variables in the body of the run request; a request that omits a required variable is rejected before any step starts.
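The required-versus-default rule can be sketched as follows. The PlanVariable class and resolve_variables function are hypothetical names for this example, the type field is carried but not checked, and using None to mean "no default" is a simplification; the real validation happens on the server when the run request arrives.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlanVariable:
    name: str
    description: str
    type: str = "string"
    default: Optional[str] = None  # None here means "no default", so required


def resolve_variables(declared, supplied):
    """Merge caller-supplied values with defaults; fail fast on missing ones."""
    missing = [
        variable.name
        for variable in declared
        if variable.default is None and variable.name not in supplied
    ]
    if missing:
        # Mirrors the rule above: reject before any step starts.
        raise ValueError("missing required variables: " + ", ".join(missing))
    return {
        variable.name: supplied.get(variable.name, variable.default)
        for variable in declared
    }


# Hypothetical declarations: base_url is required, plan_tier has a default.
declared = [
    PlanVariable("base_url", "Environment under test"),
    PlanVariable("plan_tier", "Subscription tier to sign up with", default="free"),
]
resolved = resolve_variables(declared, {"base_url": "https://staging.example.com"})
print(resolved)
```

Calling resolve_variables(declared, {}) raises a ValueError naming base_url, which is the behavior you want from a run that would otherwise fail halfway through.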

A step output is a piece of data the step's journey will publish during its run, for downstream steps to consume. You declare each output by name on the step; the journey itself is responsible for actually producing the value. A signup step might declare a user_id output; the journey records the new user's id during its run, and any descendant step can pick it up.

Inside a journey's prompt template, you can reference these values directly using Go-template-style double-brace syntax: a step output is referenced as {{ steps.LABEL.outputs.NAME }}, and a plan variable as {{ plan.variables.NAME }}. The runner resolves every reference into a concrete string at dispatch time.
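To show what resolution amounts to, here is a small sketch that substitutes both reference forms with a regular expression. It is not the runner's template engine: render_prompt is a hypothetical helper, it assumes labels and names consist only of letters, digits, and underscores, and it raises KeyError for a reference with no value, which may differ from how the runner reports the same problem.

```python
import re

# Matches {{ steps.LABEL.outputs.NAME }} or {{ plan.variables.NAME }}.
REFERENCE = re.compile(
    r"\{\{\s*(?:steps\.(\w+)\.outputs\.(\w+)|plan\.variables\.(\w+))\s*\}\}"
)


def render_prompt(template, variables, step_outputs):
    """Replace every reference in the template with a concrete string."""

    def substitute(match):
        label, output_name, variable_name = match.groups()
        if variable_name is not None:
            return str(variables[variable_name])
        return str(step_outputs[label][output_name])

    return REFERENCE.sub(substitute, template)


# Hypothetical values: one plan variable and one output from a signup step.
prompt = render_prompt(
    "Open {{ plan.variables.base_url }}/users/{{ steps.signup.outputs.user_id }} "
    "and confirm the profile page loads.",
    variables={"base_url": "https://staging.example.com"},
    step_outputs={"signup": {"user_id": "u_123"}},
)
print(prompt)
```

Because a step output only exists once the producing step has run, a step that references steps.signup.outputs.user_id should also depend on signup in the graph.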

For the JSON shapes of the variables and outputs fields, see the Test Plans API reference. For the full journey API reference, see User Journeys.

Designing long-running flows

A useful way to approach a new plan is to write down the actors first: who is involved in the scenario, and which of them needs their own browser. That gives you the set of profiles. Then write down what each actor does, in order, and which of those actions only make sense after some other actor has done their part. That gives you the steps and the edges between them. Anything that is not connected by an edge is a hint that those steps can run in parallel; if that is not what you want, add the edge.
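One way to act on that last hint while drafting a plan is to list every pair of steps that no chain of edges orders, since those are the pairs that may run at the same time. The helper below is a design-time sketch with made-up names and labels, not a feature of the product.

```python
from itertools import combinations


def ancestors(step, depends_on):
    """Every step that must finish, directly or transitively, before this one."""
    seen = set()
    stack = list(depends_on[step])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(depends_on[current])
    return seen


def unordered_pairs(depends_on):
    """Pairs of steps with no ordering between them, so they may overlap."""
    closure = {step: ancestors(step, depends_on) for step in depends_on}
    return [
        (first, second)
        for first, second in combinations(sorted(depends_on), 2)
        if first not in closure[second] and second not in closure[first]
    ]


# Hypothetical draft of the order flow: both later steps wait only on the order.
draft = {
    "place_order": [],
    "pick_order": ["place_order"],
    "issue_refund": ["place_order"],
}
pairs = unordered_pairs(draft)
print(pairs)
```

Here the draft allows the refund to run alongside picking. If the refund should only happen after the order is picked, add pick_order to the dependencies of issue_refund and the pair disappears from the list.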

For long, deep flows like a full e-commerce checkout, a multi-day approval workflow compressed into a single run, or an onboarding sequence that touches half a dozen screens, the payoff is that you stop writing one giant brittle journey and start composing small ones. Each journey stays short enough that an agent can complete it reliably. The test plan owns the orchestration, the parallelism, and the actor switching, so the journeys themselves do not need to know they are part of something larger. When a step fails, you get a precise failure on a small surface, not a forty-step prompt that died somewhere in the middle.