Skip to main content

Implement recovery actions

Stub

This How-to is a stub. JARVIS models recovery actions as artifacts/events, but the supported recovery catalog is not yet stabilized.

Goal

You will implement bounded recovery actions (retry, remap, escalate, abort) and record them as durable artifacts.

When to use this

A subtask fails and you need deterministic “what happens next”.
You want replayable debugging (recovery decisions are durable, not ephemeral).

Prerequisites

A recovery action schema
Budgets for recovery (max retries, max remaps, max total node runs)
An enforcement point that prevents infinite loops

Steps

Define the supported recovery actions for the executor (start small).
On failure, choose a recovery action within budget.
Emit a recovery artifact/event and apply it deterministically:
- retry → re-run the same node run (idempotency required),
- remap → pick another candidate and regenerate args,
- escalate → produce an “operator required” artifact,
- abort → fail the composite node run.

Verify

Failures always produce a recovery artifact/event.
Recovery does not exceed budgets and cannot loop indefinitely.

Troubleshooting

Infinite retries/remaps → enforce max retries/remaps and fail closed.
Recovery is non-deterministic → record the action + rationale in the artifact payload.
Recovery needs approval semantics → integrate PDP “require approval” later.

Cleanup / Rollback

None.

Next steps

Concept: Policy checkpoints
Reference: ARP Standard: Run Coordinator
How-to: Add budgets to composite execution

Goal
When to use this
Prerequisites
Steps
Verify
Troubleshooting
Cleanup / Rollback
Next steps