Enable OpenTelemetry tracing
Stub
This How-to is a stub. JARVIS does not yet ship full OpenTelemetry instrumentation across all services, but the trace correlation model is planned.
Goal
You will enable OpenTelemetry (OTel) tracing and correlate traces to run_id and node_run_id.
When to use this
- You are operating the stack and need end-to-end visibility.
- You want to debug latency and failures across services.
Prerequisites
- An OpenTelemetry collector endpoint
- A trace propagation strategy (headers) across HTTP calls
- A place to record correlation IDs (
run_id,node_run_id) as span attributes
Steps
- Configure each service with OTel exporter settings (endpoint, service name).
- Ensure HTTP client/server middleware propagates trace context.
- Add
run_id/node_run_idas span attributes at request boundaries.
Verify
- A single run produces a trace with spans across:
- gateway,
- coordinator,
- selection,
- executors,
- stores.
Troubleshooting
- No traces → exporter misconfigured or collector unreachable.
- Broken trace continuity → missing propagation headers between services.
- Too much noise → sample aggressively in dev, tune in prod.
Cleanup / Rollback
- Disable exporter or point to a local collector.
Next steps
- How-to: Connect external OTel collector / SIEM pipeline
- Concept: Artifacts and replay