Synthetic metrics

Record the outcomes of automated evaluations as metrics on traces. The current SDK path does not require you to register a built-in evaluator.

Recommended metrics

Task adherence — did the agent solve the user's actual ask?
Tool selection — was the right tool chosen for the step?
Response coherence — does the final reply read well?
Unsupported request — did the agent gracefully decline out-of-scope asks?
User frustration — does the user appear frustrated in the conversation?
Context retrieval — were the right docs retrieved?

Inline evaluation

Run your evaluator in application code, then write the result to the trace with record_metric or recordMetric, whichever your SDK exposes.

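# Assumes `client` is an already-initialized SDK client and
# `my_judge` is your own evaluator returning a pass/fail result.

# Fetch up to 50 eval traces tagged with the release under test.
result = client.list_traces(
    run_type="eval",
    metadata_filters=[{"key": "release", "value": "v3.2.1"}],
    limit=50,
)

# Score each trace and attach the outcome as a metric.
for trace in result.data:
    passed = my_judge(trace)
    client.record_metric("task_adherence", trace.trace_id, passed=passed)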
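
The example above leaves my_judge undefined. One possible shape for such an evaluator is sketched below, assuming a trace exposes the user's request and the agent's final reply as plain strings. The Trace dataclass, its field names, and the keyword heuristic are all hypothetical stand-ins and not part of the SDK; a real judge is more likely an LLM call or a more careful rule set.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    # Hypothetical trace shape; the real SDK object may differ.
    trace_id: str
    user_request: str
    final_reply: str


def my_judge(trace: Trace) -> bool:
    """Toy task-adherence check: pass if the reply is non-empty and
    mentions at least one substantive word from the request."""
    reply = trace.final_reply.lower()
    if not reply.strip():
        return False
    # Words longer than three characters are treated as substantive.
    keywords = {w for w in trace.user_request.lower().split() if len(w) > 3}
    return any(word in reply for word in keywords)
```

Because the judge returns a plain boolean, its result can be passed directly as the passed argument shown in the loop above.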

Dashboard behavior

Recorded metrics appear next to their traces in the dashboard and can be used in filters, monitor thresholds, and release comparisons.
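
To make the idea of a threshold and a release comparison concrete, here is a small local sketch of the arithmetic involved, assuming each metric is a pass/fail value tagged with a release. It does not describe how the dashboard computes anything; the function names, the v3.2.0 label, and the sample values are illustrative only and are not real results.

```python
from collections import defaultdict


def pass_rates(results):
    """Return {release: fraction passed} from (release, passed) pairs."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for release, passed in results:
        totals[release] += 1
        passes[release] += int(passed)
    return {release: passes[release] / totals[release] for release in totals}


def below_threshold(rates, threshold):
    """List releases whose pass rate falls under the threshold."""
    return sorted(r for r, rate in rates.items() if rate < threshold)


# Made-up sample outcomes for two releases (hypothetical data).
results = [
    ("v3.2.0", True), ("v3.2.0", True), ("v3.2.0", False), ("v3.2.0", True),
    ("v3.2.1", True), ("v3.2.1", False), ("v3.2.1", False), ("v3.2.1", True),
]
rates = pass_rates(results)
flagged = below_threshold(rates, 0.7)
```

With this sample, rates maps v3.2.0 to 0.75 and v3.2.1 to 0.5, so a 0.7 threshold flags only v3.2.1.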
