Aegis: Closed-Loop Intelligence Engine
Ground behavior, improve it, and defend every ship decision with evidence.
Mode: Eval-first
Release: Gate-aware
Reports: Shareable
Access: Accounts enabled
Shell policy: the workspace chrome never injects sample benchmark rows, synthetic scores, or decorative regression traces. Live evidence appears in the Closed Loop, Research Runs, Review Queue, and Release Train only after a real workspace is populated.
Closed Loop
Import traces, run the strict loop, and open the dossier.
Research Runs
Measure benchmark deltas and investigate candidate behavior.
Review Queue
Attach ownership, severity, and operator judgment.
Release Train
Persist gate state beside the same artifact lineage.
Launch-grade proof should be grounded in persisted artifacts, not shell placeholders.
surface    | purpose                     | required | owner
-----------|-----------------------------|----------|---------
dataset    | fixed benchmark contract    | yes      | research
comparison | baseline vs candidate delta | yes      | operator
review     | annotated release judgment  | yes      | human
promotion  | gate outcome + lineage      | yes      | release
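The four required artifacts in the table above can be modeled as one record per surface. This is a minimal sketch for illustration only; the class and field names are assumptions, not Aegis's actual schema.

```python
from dataclasses import dataclass

# Illustrative only: names here are assumptions, not the Aegis schema.
@dataclass
class SurfaceArtifact:
    surface: str   # dataset | comparison | review | promotion
    purpose: str
    owner: str
    required: bool = True

LOOP_SURFACES = [
    SurfaceArtifact("dataset", "fixed benchmark contract", "research"),
    SurfaceArtifact("comparison", "baseline vs candidate delta", "operator"),
    SurfaceArtifact("review", "annotated release judgment", "human"),
    SurfaceArtifact("promotion", "gate outcome + lineage", "release"),
]

# Every surface in the loop must persist its artifact.
assert all(s.required for s in LOOP_SURFACES)
```

Making `required` default to true mirrors the table: no surface in the closed loop is optional.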
Reusable eval suites

Save datasets once, rerun them on demand, and keep the comparison loop honest.

Use datasets for release gates, customer-specific workflows, or stable scorecards that should stay comparable over time.

Saved datasets: 0
Loaded dimensions: 0
Product loop
Dataset → run → report → compare.
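The product loop above can be sketched as a small pipeline: score a saved dataset, summarize the run as a report, then compare two runs of the same dataset. All function names and the toy dataset are hypothetical, chosen only to illustrate why rerunning an identical suite keeps deltas comparable.

```python
# Hypothetical sketch of the product loop: dataset -> run -> report -> compare.
# Function names, case fields, and the toy models are assumptions.

def run_suite(dataset, model):
    """Score each case in a saved dataset against one model."""
    return {case["id"]: model(case["input"]) == case["expected"]
            for case in dataset}

def report(results):
    """Summarize a run as a pass rate."""
    return sum(results.values()) / len(results)

def compare(dataset, baseline, candidate):
    """Rerun the same saved dataset on both models so the delta is honest."""
    return report(run_suite(dataset, candidate)) - report(run_suite(dataset, baseline))

dataset = [{"id": 1, "input": 2, "expected": 4},
           {"id": 2, "input": 3, "expected": 6}]
baseline = lambda x: x + x   # toy "models" for illustration
candidate = lambda x: 2 * x
print(compare(dataset, baseline, candidate))  # 0.0: identical pass rates
```

Because both runs use the same frozen dataset, any nonzero delta is attributable to the candidate, not to drift in the benchmark itself.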
Create dataset
Build a reusable eval suite

Start from a realistic workflow template or your own recorded outputs. Once a suite is saved, you can relaunch it, version it, and compare candidates against the same benchmark.

Starter workflows
Case 1
Saved datasets
Reusable suites
Open compare
Selected dataset
Select a saved dataset to inspect its cases and launch a run.