Evaluations

Run evaluations against traces and gate workflows on a minimum score.

Evaluations score agent runs against criteria you define. From the CLI you can run an evaluation on demand or use it as a pass/fail gate.

Run an evaluation

Score one or more traces against an evaluation:

retrace eval run --evaluation <eval_id> --traces <trace_id_1>,<trace_id_2>
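When the trace IDs come from another script rather than being typed by hand, the comma-separated `--traces` value can be assembled first. The sketch below is one way to do that in POSIX shell; `join_traces` and the `EVAL_ID` variable are hypothetical names used for illustration and are not part of the CLI.

```shell
# join_traces (hypothetical helper): join all arguments with commas.
# The body runs in a subshell, so the IFS change does not leak out.
join_traces() (
  IFS=,
  printf '%s\n' "$*"
)

# Example invocation, assuming EVAL_ID holds your evaluation ID and the
# script's arguments are trace IDs (left commented out: it needs the CLI):
# retrace eval run --evaluation "$EVAL_ID" --traces "$(join_traces "$@")"
```

Building the list in one place keeps the command itself unchanged as the number of traces grows.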

Quality gate

The gate exits with code 0 when the score meets your threshold and code 1 when it falls below it, which makes it well suited to blocking a deploy:

retrace eval gate --evaluation <eval_id> --trace <trace_id> --threshold 0.8
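Because the gate reports its result through the exit code, a deploy script can branch on it directly. The following is a minimal sketch, not a prescribed integration: `gate_then_deploy` is a hypothetical wrapper, and `EVAL_ID` and `TRACE_ID` are assumed to be set by your own tooling.

```shell
# gate_then_deploy (hypothetical helper): run the command passed as
# arguments and continue only if it exits with code 0.
gate_then_deploy() {
  if "$@"; then
    echo "gate passed: deploying"
    return 0
  else
    echo "gate failed: blocking deploy" >&2
    return 1
  fi
}

# Example invocation (left commented out: it needs the CLI and real IDs):
# gate_then_deploy retrace eval gate --evaluation "$EVAL_ID" --trace "$TRACE_ID" --threshold 0.8
```

Passing the gate command as arguments keeps the wrapper independent of the threshold and IDs, so the same function can guard several deploy steps.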

See CI/CD for a complete pipeline example.

Next steps

  • CI/CD — wire the gate into your pipeline.
  • Forks — replay a run before scoring it.
