Completion truth
ASSAY checks the observable end state, not just the agent's final message or a hopeful plan.
ASSAY AI
ASSAY runs evidence-focused evaluations for task completion, false completion, stop-condition obedience, precision navigation, dynamic pages, and prompt-injection resistance.
Evaluation coverage
ASSAY checks the observable end state, not just the agent's final message or a hopeful plan.
Stop-condition tests catch risky follow-through before purchases, submissions, account actions, or irreversible steps.
Dynamic content and injection prompts reveal whether page text can pull the agent away from the user's task.
How it works
Define target tasks, risky stop conditions, and the exact evidence required for a pass.
Exercise the agent against local and live-style pages with controlled grading criteria.
Deliver clear pass, fail, and review calls that product and safety teams can act on.
Pilot evaluations