Reusable eval suites
Save datasets once, rerun them on demand, and keep the comparison loop honest.
Use datasets for release gates, customer-specific workflows, or stable scorecards that should stay comparable over time.
Product loop
Create a dataset, run it, read the report, then compare candidates.
Create dataset
Build a reusable eval suite
Start from a realistic workflow template or your own recorded outputs. Once a suite is saved, you can relaunch it, version it, and compare candidates against the same benchmark.
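As a rough illustration of why a saved suite keeps comparisons fair, here is a minimal sketch in Python. It is not this product's API: the names `Case`, `EvalSuite`, `run_suite`, and `compare`, the exact-match scoring rule, and the sample cases are all assumptions made up for the example. The point it shows is that every candidate is scored against the same frozen, versioned set of cases.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Case:
    """One saved input with its expected output (hypothetical shape)."""
    prompt: str
    expected: str


@dataclass(frozen=True)
class EvalSuite:
    """A saved, versioned set of cases. Frozen so reruns see identical data."""
    name: str
    version: int
    cases: tuple[Case, ...]


def run_suite(suite: EvalSuite, candidate: Callable[[str], str]) -> float:
    """Return the fraction of cases where the candidate's output matches exactly."""
    if not suite.cases:
        return 0.0
    passed = sum(
        1 for case in suite.cases
        if candidate(case.prompt).strip() == case.expected
    )
    return passed / len(suite.cases)


def compare(
    suite: EvalSuite, candidates: dict[str, Callable[[str], str]]
) -> dict[str, float]:
    """Score every candidate against the same suite so results stay comparable."""
    return {name: run_suite(suite, fn) for name, fn in candidates.items()}


# Illustrative data only: two toy cases and two stand-in candidates.
suite = EvalSuite(
    name="example-suite",
    version=1,
    cases=(Case("2+2", "4"), Case("capital of France", "Paris")),
)


def baseline(prompt: str) -> str:
    return {"2+2": "4"}.get(prompt, "")


def candidate(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "")


scores = compare(suite, {"baseline": baseline, "candidate": candidate})
print(scores)  # {'baseline': 0.5, 'candidate': 1.0}
```

Exact-match scoring is the simplest choice for a sketch; a real suite would likely use richer scoring, but the comparison only stays meaningful if the cases and the scoring rule are held fixed across runs of the same version.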
Starter workflows
Case 1
Saved datasets
Reusable suites
Selected dataset
Select a saved dataset to inspect its cases and launch a run.