
Diff two Runs

Stub

This how-to is a stub. A canonical “diff tool” is planned for JARVIS but has not shipped yet, so the steps below describe a manual comparison.

Goal

You will compare two runs to find where they diverged (planner output, candidate sets, binding, execution outputs).

When to use this

You are debugging regressions between versions.
You want to understand differences caused by policy or inventory changes.

Prerequisites

Two runs (run_id_a, run_id_b)
Durable artifacts for both runs

Steps

1. Compare run-level metadata:
   - root node type ref
   - policy profile
   - model/profile (if applicable)
2. Compare decomposition artifacts (subtask lists).
3. Compare candidate sets and binding decisions per subtask.
4. Compare node run outputs and evaluation results.
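
Until a diff tool ships, the comparison can be scripted by hand. The following is a minimal sketch, not a JARVIS API: it assumes each run's durable artifacts have already been loaded into a plain dictionary, and the layout used here (`metadata`, `subtasks`, `candidates`, `binding`, `output`, `evaluation`, and the metadata key names) is hypothetical and chosen only for illustration.

```python
from typing import Any

# Hypothetical metadata keys; JARVIS does not define this schema.
META_KEYS = ("root_node_type_ref", "policy_profile", "model_profile")


def diff_runs(run_a: dict[str, Any], run_b: dict[str, Any]) -> list[str]:
    """Return human-readable divergence points, ordered by pipeline stage."""
    points: list[str] = []

    # Run-level metadata.
    meta_a, meta_b = run_a.get("metadata", {}), run_b.get("metadata", {})
    for key in META_KEYS:
        if meta_a.get(key) != meta_b.get(key):
            points.append(
                f"metadata.{key}: {meta_a.get(key)!r} != {meta_b.get(key)!r}"
            )

    # Decomposition: which subtasks exist in each run.
    subs_a = {s["subtask_id"]: s for s in run_a.get("subtasks", [])}
    subs_b = {s["subtask_id"]: s for s in run_b.get("subtasks", [])}
    for sid in sorted(subs_a.keys() - subs_b.keys()):
        points.append(f"subtask {sid}: only in run A")
    for sid in sorted(subs_b.keys() - subs_a.keys()):
        points.append(f"subtask {sid}: only in run B")

    # Per-subtask candidate sets, binding decisions, outputs, evaluations.
    for sid in sorted(subs_a.keys() & subs_b.keys()):
        a, b = subs_a[sid], subs_b[sid]
        if set(a.get("candidates", [])) != set(b.get("candidates", [])):
            points.append(f"subtask {sid}: candidate sets differ")
        for field in ("binding", "output", "evaluation"):
            if a.get(field) != b.get(field):
                points.append(f"subtask {sid}: {field} differs")
    return points
```

Because the result is ordered by stage, the first entry is usually the best place to start: a difference in policy profile or decomposition tends to explain the binding and output differences that follow it.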

Verify

You can name a small set of specific divergence points, rather than concluding that “everything is different”.

Troubleshooting

Too much noise → pin versions and run with fixed seeds/temperature (where supported).
Missing correlation → ensure artifacts include stable IDs (subtask_id, candidate_set_id, node_run_id).
External side effects → isolate write/irreversible nodes and handle them separately (for example, compare them by hand rather than as part of the automated diff).
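
The missing-correlation case can be checked before any comparison is attempted. This is a rough sketch under the assumption that artifact records are available as dictionaries; the function name `find_missing_ids` is hypothetical, and only the three ID field names come from this page.

```python
# Stable IDs named in the troubleshooting list above.
REQUIRED_IDS = ("subtask_id", "candidate_set_id", "node_run_id")


def find_missing_ids(records: list[dict]) -> list[tuple[int, list[str]]]:
    """Return (record index, missing ID names) for records that cannot be correlated."""
    problems = []
    for index, record in enumerate(records):
        # Treat an absent key and an empty value the same way.
        missing = [name for name in REQUIRED_IDS if not record.get(name)]
        if missing:
            problems.append((index, missing))
    return problems
```

An empty result means every record carries all three IDs, so records from the two runs can be matched by ID instead of by position.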

Cleanup / Rollback

None.

Next steps

How-to: Replay a Run
Troubleshooting: Evaluation flaky or inconsistent
