# gecx eval reference
Usage: `gecx eval <scenarios-dir> [options]`
## Arguments
`<scenarios-dir>` — directory to walk for `*.scenario.ts`, `*.scenario.yaml`, and `*.scenario.yml` files.
## Options
| Flag | Description |
|---|---|
| `--json` | Print the JSON EvalReport to stdout instead of the table |
| `--output <path>` | Write the JSON report to a file |
| `--baseline <path>` | Path to a previous EvalReport JSON to compare against |
| `--fail-on-regress` | Exit non-zero when any regression threshold trips, or a new scenario fails relative to baseline |
| `--regression-config <path>` | JSON file with `{ regressionThresholds: {...} }` overrides |
| `--update-baseline <path>` | Write the new report to this path (use after intentional improvements) |
| `--filter <tag>` | Only run scenarios that include this tag |
| `--config <path>` | EvalConfig JSON file (providers, scorers, regression thresholds) |
| `--help`, `-h` | Print usage |
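As a sketch of how these flags fit together in CI, the snippet below writes a regression-config override and shows a typical invocation. The threshold key `maxPassRateDrop` and all file paths are illustrative assumptions, not names from the documented schema; consult your EvalConfig schema for the real threshold names.

```shell
# Write a regression-config override file.
# NOTE: "maxPassRateDrop" is a hypothetical key used for illustration only.
cat > regress.json <<'EOF'
{ "regressionThresholds": { "maxPassRateDrop": 0.05 } }
EOF

# A typical CI run against a stored baseline (commented out here,
# since it requires gecx to be installed; paths are illustrative):
# gecx eval ./scenarios \
#   --baseline reports/baseline.json \
#   --regression-config regress.json \
#   --fail-on-regress \
#   --output reports/current.json
```

After an intentional improvement, the same run would typically add `--update-baseline reports/baseline.json` so subsequent comparisons use the new report.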
## Exit codes
- `0` — all scenarios passed (or were skipped) and no regression tripped
- `1` — at least one scenario failed, the baseline or config failed to parse, or `--fail-on-regress` tripped
## Environment variables
- `ANTHROPIC_API_KEY` — enable the Anthropic judge provider
- `OPENAI_API_KEY` — enable the OpenAI judge provider
- `GEMINI_API_KEY` — enable the Gemini judge provider
When a key is missing, scorers that need that provider return `status: 'skipped'`; the scenario itself is not marked as failed.
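To illustrate the skip semantics, here is a small preflight loop (illustrative tooling, not part of gecx) that classifies the three documented keys before a run:

```shell
# Preflight: report which judge providers have API keys available.
# This loop is an illustration, not part of gecx itself.
active=""
skipped=""
for key in ANTHROPIC_API_KEY OPENAI_API_KEY GEMINI_API_KEY; do
  if [ -n "$(printenv "$key")" ]; then
    active="$active $key"
  else
    # Scorers that need this provider will return status: 'skipped'.
    skipped="$skipped $key"
  fi
done
echo "providers with keys:$active"
echo "providers whose scorers will be skipped:$skipped"
```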
## JSON report
The shape is defined in `schemas/eval-report.schema.json`. Top-level keys:
- `runId` — UUID for this run
- `startedAt` / `finishedAt` — ISO timestamps
- `scenarios[]` — per-scenario result with expectations and the full `ScenarioRunRecord`
- `metrics` — aggregate metrics (see schema for full list)
- `env` — `{ node, sdkVersion, providers: { anthropic, openai, gemini } }`
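An abridged sketch of a report, using only the top-level keys listed above. All values are placeholders, `scenarios` and `metrics` are left empty rather than guessed, and representing each provider as a boolean is an assumption of this sketch; the schema file is authoritative.

```json
{
  "runId": "00000000-0000-0000-0000-000000000000",
  "startedAt": "2025-01-01T12:00:00.000Z",
  "finishedAt": "2025-01-01T12:04:10.000Z",
  "scenarios": [],
  "metrics": {},
  "env": {
    "node": "v20.11.0",
    "sdkVersion": "0.0.0",
    "providers": { "anthropic": true, "openai": false, "gemini": false }
  }
}
```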
Source: `docs/reference/eval-cli.md`