Aegis: Closed-Loop Intelligence Engine

Ground behavior, improve it, and defend every ship decision with evidence.

Mode: Eval-first

Release: Gate-aware

Reports: Shareable

Access: Accounts enabled


Shell policy

The workspace chrome does not inject sample benchmark rows, synthetic scores, or decorative regression traces. Live evidence belongs in the closed loop, research runs, review queue, and release train after a real workspace is populated.

Closed Loop

Import traces, run the strict loop, and open the dossier.

Research Runs

Measure benchmark deltas and investigate candidate behavior.

Review Queue

Attach ownership, severity, and operator judgment.

Release Train

Persist gate state beside the same artifact lineage.

Launch-grade proof should be grounded in persisted artifacts, not shell placeholders.

surface     purpose                      required  owner
dataset     fixed benchmark contract     yes       research
comparison  baseline vs candidate delta  yes       operator
review      annotated release judgment   yes       human
promotion   gate outcome + lineage       yes       release
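
As an illustration of how the four evidence surfaces above could back a release gate, here is a minimal sketch in Python. It is not Aegis's implementation: the names `Surface`, `SURFACES`, `missing_evidence`, and `gate_is_open` are hypothetical, and the sketch assumes each surface is satisfied by a non-empty reference to a persisted artifact.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Surface:
    """One row of the evidence table (hypothetical structure)."""
    name: str
    purpose: str
    required: bool
    owner: str


# Rows copied from the table above.
SURFACES = [
    Surface("dataset", "fixed benchmark contract", True, "research"),
    Surface("comparison", "baseline vs candidate delta", True, "operator"),
    Surface("review", "annotated release judgment", True, "human"),
    Surface("promotion", "gate outcome + lineage", True, "release"),
]


def missing_evidence(persisted: Dict[str, str]) -> List[str]:
    """Return required surfaces that have no persisted artifact reference.

    `persisted` maps a surface name to an artifact identifier. An absent key
    or an empty string both count as missing (an assumption of this sketch).
    """
    return [s.name for s in SURFACES if s.required and not persisted.get(s.name)]


def gate_is_open(persisted: Dict[str, str]) -> bool:
    """The gate opens only when every required surface has evidence."""
    return not missing_evidence(persisted)
```

Under these assumptions, a workspace that holds only dataset and comparison artifacts would still be blocked on review and promotion.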

Reusable eval suites

Save datasets once, rerun them on demand, and keep the comparison loop honest.

Use datasets for release gates, customer-specific workflows, or stable scorecards that should stay comparable over time.


Saved datasets

Loaded dimensions

Product loop

Dataset, then run, then report, then compare.

Create dataset

Build a reusable eval suite

Start from a realistic workflow template or your own recorded outputs. Once a suite is saved, you can relaunch it, version it, and compare candidates against the same benchmark.
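
The save, version, and compare flow described above can be sketched roughly as follows. This is an assumption-laden illustration, not the product's API: `Suite`, `SuiteVersion`, `save`, and `compare` are hypothetical names, and scores are assumed to be one number per case.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class SuiteVersion:
    """An immutable snapshot of a suite's cases (hypothetical structure)."""
    version: int
    cases: Tuple[str, ...]
    change_summary: str


@dataclass
class Suite:
    name: str
    versions: List[SuiteVersion] = field(default_factory=list)

    def save(self, cases: Iterable[str], change_summary: str) -> SuiteVersion:
        """Append a new version; earlier versions are never modified."""
        snapshot = SuiteVersion(len(self.versions) + 1, tuple(cases), change_summary)
        self.versions.append(snapshot)
        return snapshot


def compare(
    version: SuiteVersion,
    baseline: Dict[str, float],
    candidate: Dict[str, float],
) -> Dict[str, float]:
    """Per-case delta (candidate minus baseline) on one fixed suite version.

    Raises KeyError if either run lacks a score for a case, so both sides
    are always measured against the same benchmark.
    """
    deltas = {}
    for case in version.cases:
        if case not in baseline or case not in candidate:
            raise KeyError(f"missing score for case {case!r}")
        deltas[case] = candidate[case] - baseline[case]
    return deltas
```

Pinning the comparison to a single `SuiteVersion` is the design choice that keeps scorecards comparable over time: editing the cases creates a new version rather than silently changing the benchmark under an existing report.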

Starter workflows

Dataset name

Tags

Description

Change summary

Case 1

Saved datasets

Reusable suites


Selected dataset

Select a saved dataset to inspect its cases and launch a run.