Metrics

Metrics are outcome scores attached to traces. Use them to capture both synthetic evaluations and observed user signals.

Shape of a metric

A metric has a name, a kind (boolean, numeric, or enum), a value, and the ID of the trace it scores. It can optionally carry a step ID, which scores a single step of the trace instead of the whole trace.
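To make the shape concrete, here is a minimal local model of those fields. It is a hypothetical sketch, not obsrv's actual schema: the class name `Metric`, the field names `trace_id` and `step_id`, and the type check in `__post_init__` are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Literal, Optional, Union

# Hypothetical local model of a metric; field names are assumptions, not obsrv's schema.
MetricKind = Literal["boolean", "numeric", "enum"]


@dataclass(frozen=True)
class Metric:
    name: str
    kind: MetricKind
    value: Union[bool, float, str]
    trace_id: str
    step_id: Optional[str] = None  # optional: score one step instead of the whole trace

    def __post_init__(self) -> None:
        # Check that the value's Python type matches the declared kind.
        expected = {"boolean": bool, "numeric": (int, float), "enum": str}[self.kind]
        # bool is a subclass of int, so reject True/False for numeric metrics explicitly.
        is_bool_as_number = self.kind == "numeric" and isinstance(self.value, bool)
        if not isinstance(self.value, expected) or is_bool_as_number:
            raise TypeError(f"{self.kind} metric {self.name!r} got {self.value!r}")
```

For example, `Metric("task_adherence", "boolean", True, "tr_example")` scores a whole trace, while passing `step_id="step_example"` would narrow it to one step (both IDs are placeholders).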

Recording metrics

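# Boolean metric: did the trace stay on task? (`client` is an initialized obsrv client)
client.record_metric(
    "task_adherence",
    "tr_…",  # trace ID
    passed=True,
)

# Numeric metric: response latency in milliseconds
client.record_metric(
    "response_latency_ms",
    "tr_…",  # trace ID
    score=1840,
)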

Synthetic vs observed

Synthetic metrics come from automated evals (LLM judges, deterministic rules, rubrics). They run on a schedule or inline.
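As an example of the deterministic-rule case, the sketch below scores responses with a simple regex check. The rule itself (flagging unfilled `{placeholder}` tokens) and the function names are hypothetical; the code only produces pass/fail values locally and does not call obsrv.

```python
import re
from typing import Dict

# Hypothetical deterministic rule: a response passes if it contains no
# unfilled template placeholders such as "{customer_name}".
PLACEHOLDER = re.compile(r"\{[a-z_]+\}")


def no_unfilled_placeholders(response: str) -> bool:
    """Return True when the response contains no {placeholder} tokens."""
    return PLACEHOLDER.search(response) is None


def evaluate(responses: Dict[str, str]) -> Dict[str, bool]:
    """Run the rule over {trace_id: response} and return {trace_id: passed}."""
    # Each result could then be recorded as a boolean metric, e.g. with
    # client.record_metric("no_placeholders", trace_id, passed=passed).
    return {trace_id: no_unfilled_placeholders(text) for trace_id, text in responses.items()}
```

Running the same function on a schedule over recent traces, or inline right after each response is generated, corresponds to the two execution modes described above.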

Observed metrics come from real-world signals: thumbs-down ratings, refunds, escalations, reopened conversations, purchases, and so on. obsrv treats both kinds the same way for storage, rollup, and monitoring.
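One way to wire observed signals in is to map product events onto boolean metrics and record them against the trace that produced the interaction. This is a sketch under assumptions: the event names, metric names, and `forward_signal` helper are hypothetical, and `RecordingStub` is a stand-in that only mirrors the `record_metric(name, trace_id, passed=...)` call shape used earlier on this page.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from product events to (metric name, passed) outcomes.
# Event and metric names are illustrative, not built into obsrv.
SIGNALS: Dict[str, Tuple[str, bool]] = {
    "thumbs_down": ("user_satisfied", False),
    "thumbs_up": ("user_satisfied", True),
    "refund_issued": ("no_refund", False),
    "escalated_to_human": ("resolved_without_escalation", False),
    "item_purchased": ("converted", True),
}


class RecordingStub:
    """Stand-in for an obsrv client: stores calls instead of sending them."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, bool]] = []

    def record_metric(self, name: str, trace_id: str, passed: bool) -> None:
        self.calls.append((name, trace_id, passed))


def forward_signal(client, event: str, trace_id: str) -> bool:
    """Record the metric for a known event; return False for unmapped events."""
    if event not in SIGNALS:
        return False
    name, passed = SIGNALS[event]
    client.record_metric(name, trace_id, passed=passed)
    return True
```

Keeping the mapping in one table makes it easy to see which user signals feed which metric, and unmapped events are ignored rather than recorded with a guessed outcome.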

Pass rates & rollups

Boolean metrics roll up into pass rates per release, per cluster, or per metadata facet. Numeric metrics roll up into p50, p95, and p99 percentiles. The dashboard renders both natively.
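The arithmetic behind these rollups can be sketched in a few lines. This assumes grouping by release and uses the nearest-rank percentile definition; obsrv's own rollup may group differently or interpolate between samples, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pass_rates(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Group (release, passed) pairs by release and return the fraction that passed."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # release -> [passed, total]
    for release, passed in results:
        totals[release][0] += int(passed)
        totals[release][1] += 1
    return {release: p / n for release, (p, n) in totals.items()}


def percentile(values: List[float], q: int) -> float:
    """Nearest-rank percentile: smallest value with at least q% of samples <= it."""
    if not values:
        raise ValueError("percentile of empty list")
    ordered = sorted(values)
    # Multiply before dividing so integer q and len give an exact rank.
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(values: List[float]) -> Dict[str, float]:
    """Summarize a numeric metric as p50/p95/p99."""
    return {f"p{q}": percentile(values, q) for q in (50, 95, 99)}
```

Rolling up per cluster or per metadata facet only changes the grouping key passed to `pass_rates`; the computation stays the same.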