Replay a Run
Stub
This How-to is a stub. JARVIS emits durable artifacts, but a first-class “replay API” is not yet implemented.
Goal
You will replay a run with the same inputs and pinned versions, then compare its outputs against the original to debug regressions.
When to use this
- You want deterministic regression testing across stack versions.
- You need to debug “why did it do that?” with stable inputs.
Prerequisites
- A run with durable artifacts (candidate sets, decisions, outputs)
- A way to re-run with pinned versions (recommended: JARVIS_Releasepinning); a pin-manifest sketch follows this list.
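JARVIS does not prescribe how pins are recorded, so the sketch below simply captures the intended stack/node/model pins as a JSON manifest next to the run's artifacts. The per-run directory layout, the file name pins.json, the write_pin_manifest helper, and the example version values are assumptions for illustration, not a documented format.

```python
import json
from pathlib import Path


def write_pin_manifest(run_dir: str, pins: dict) -> Path:
    """Store the stack/node/model pins used for a run as a JSON manifest.

    The manifest layout is hypothetical; adapt the fields to whatever your
    release-pinning setup (e.g. JARVIS_Releasepinning) actually records.
    """
    base = Path(run_dir)
    base.mkdir(parents=True, exist_ok=True)
    path = base / "pins.json"
    path.write_text(json.dumps(pins, indent=2, sort_keys=True))
    return path


if __name__ == "__main__":
    write_pin_manifest(
        "runs/2024-05-01_1234",            # hypothetical run directory
        {
            "stack_version": "1.4.2",      # example values, not real releases
            "node_versions": {"planner": "0.9.0", "executor": "0.9.1"},
            "model_profile": "default@2024-04",
        },
    )
```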
Steps
- Identify the run to replay (run_id).
- Capture the exact inputs and the pinned stack/node versions.
- Start a new run with the same inputs and pins.
- Compare artifacts and outputs, as sketched below.
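Because there is no first-class replay API yet, a replay is just a new run fed with the recorded inputs and pins. The sketch below shows one way to wire that up; load_run_record, the inputs.json/pins.json file names, and the start_run placeholder are all assumptions you would replace with however you actually launch runs.

```python
import json
from pathlib import Path


def load_run_record(run_dir: str) -> tuple[dict, dict]:
    """Load the captured inputs and pins for an existing run.

    Assumes inputs.json and pins.json exist in the run directory; both
    file names are illustrative, not a JARVIS convention.
    """
    base = Path(run_dir)
    inputs = json.loads((base / "inputs.json").read_text())
    pins = json.loads((base / "pins.json").read_text())
    return inputs, pins


def start_run(inputs: dict, pins: dict) -> str:
    """Placeholder for however you actually launch a run.

    Replace the body with your own entry point (CLI call, API request, ...).
    Here it only echoes what would be submitted and returns a fake run_id.
    """
    print("would start run with pins:", pins)
    return "replay-of-" + inputs.get("run_id", "unknown")


if __name__ == "__main__":
    original = "runs/2024-05-01_1234"      # hypothetical original run directory
    inputs, pins = load_run_record(original)
    replay_id = start_run(inputs, pins)
    print("replay run started:", replay_id)
```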
Verify
- The replay run completes and produces comparable artifacts (see the comparison sketch below).
- Differences are explainable (version change, policy change, model/profile change).
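One simple way to check comparability is to hash every artifact file in both run directories and list what is missing or differs. The sketch assumes artifacts are stored as files under a per-run directory; adjust it to your actual artifact storage.

```python
import hashlib
from pathlib import Path


def artifact_digests(run_dir: str) -> dict[str, str]:
    """Hash every artifact file under a run directory (layout is assumed)."""
    base = Path(run_dir)
    return {
        str(p.relative_to(base)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(base.rglob("*"))
        if p.is_file()
    }


def diff_runs(original_dir: str, replay_dir: str) -> None:
    """Report artifacts that are missing or differ between two runs."""
    a, b = artifact_digests(original_dir), artifact_digests(replay_dir)
    for name in sorted(set(a) | set(b)):
        if name not in b:
            print(f"missing in replay: {name}")
        elif name not in a:
            print(f"only in replay:    {name}")
        elif a[name] != b[name]:
            print(f"differs:           {name}")


if __name__ == "__main__":
    diff_runs("runs/2024-05-01_1234", "runs/2024-05-02_5678")  # hypothetical dirs
```

Differences that remain after accounting for known changes (version, policy, model/profile) are the ones worth investigating.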
Troubleshooting
- Non-determinism → pin model/profile and reduce LLM variability.
- Missing artifacts → ensure artifact emission is enabled and stored durably.
- External dependencies changed → record external inputs as artifacts or mock them; a recording sketch follows this list.
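For the external-dependency case, one option is to wrap each outbound call so the original run records the raw response as an artifact and the replay reads that artifact back instead of calling out again. For LLM variability, pin the model/profile and set deterministic generation parameters (for example temperature 0 and a fixed seed) where your provider supports them. The fetch_external helper and the external/ artifact layout below are illustrative assumptions, not part of JARVIS.

```python
import json
from pathlib import Path
from typing import Callable


def fetch_external(name: str, run_dir: str, fetch: Callable[[], dict],
                   replay: bool = False) -> dict:
    """Record an external input as an artifact, or read it back on replay.

    'fetch' is whatever call reaches the external dependency; on the original
    run its result is written to <run_dir>/external/<name>.json, and on replay
    the saved copy is returned instead of calling out again.
    """
    path = Path(run_dir) / "external" / f"{name}.json"
    if replay:
        return json.loads(path.read_text())
    data = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2, sort_keys=True))
    return data


if __name__ == "__main__":
    # Original run records the external response; the replay reuses it.
    live = fetch_external("exchange_rates", "runs/2024-05-01_1234",
                          fetch=lambda: {"USD_EUR": 0.92})
    replayed = fetch_external("exchange_rates", "runs/2024-05-01_1234",
                              fetch=lambda: {"USD_EUR": 0.95}, replay=True)
    assert live == replayed
```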
Cleanup / Rollback
- Optional: delete replay runs according to retention policy (one possible cleanup sketch follows).
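If your retention policy allows it, replay runs can be cleaned up mechanically. The sketch assumes replay runs live under a common runs/ root and are named with a replay- prefix; both are illustrative conventions, not JARVIS defaults.

```python
import shutil
import time
from pathlib import Path


def delete_old_replays(runs_root: str, max_age_days: int = 30) -> None:
    """Remove replay run directories older than the retention window."""
    cutoff = time.time() - max_age_days * 86400
    for run_dir in Path(runs_root).glob("replay-*"):
        if run_dir.is_dir() and run_dir.stat().st_mtime < cutoff:
            shutil.rmtree(run_dir)
            print("deleted", run_dir)


if __name__ == "__main__":
    delete_old_replays("runs", max_age_days=30)  # hypothetical retention window
```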
Next steps
- How-to: Diff two runs
- Concept: Artifacts and replay