
Synthetic metrics

Record the outcomes of automated evaluations as metrics on traces. With the current SDK, you do not need to register a built-in evaluator first.

Recommended metrics

  • Task adherence — did the agent solve the user's actual ask?
  • Tool selection — was the right tool chosen for the step?
  • Response coherence — does the final reply read well?
  • Unsupported request — did the agent gracefully decline out-of-scope asks?
  • User frustration — does the user appear frustrated in the conversation?
  • Context retrieval — were the right docs retrieved?
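One way to keep these metrics consistent is to give each a stable snake_case name and pair it with a judge function. The sketch below is illustrative only. Apart from task_adherence, which appears in the inline example on this page, the metric names are assumptions, and so are the Trace fields and the toy judges.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Trace:
    # Hypothetical trace shape for illustration; real SDK objects may differ.
    trace_id: str
    expected_tool: str
    tools_called: List[str] = field(default_factory=list)
    retrieved_docs: List[str] = field(default_factory=list)


def judge_tool_selection(trace: Trace) -> bool:
    # Pass when the expected tool was called at least once.
    return trace.expected_tool in trace.tools_called


def judge_context_retrieval(trace: Trace) -> bool:
    # Pass when at least one document was retrieved.
    return len(trace.retrieved_docs) > 0


# Metric name -> judge. These metric names are assumed, not taken from the SDK.
JUDGES: Dict[str, Callable[[Trace], bool]] = {
    "tool_selection": judge_tool_selection,
    "context_retrieval": judge_context_retrieval,
}


def evaluate(trace: Trace) -> Dict[str, bool]:
    """Run every registered judge and return {metric_name: passed}."""
    return {name: judge(trace) for name, judge in JUDGES.items()}
```

Keeping the judges in one mapping means a new metric can be added without touching the code that records results.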

Inline evaluation

Run your evaluator in application code, then write the result with record_metric or recordMetric.

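# Assumes `client` is an initialized SDK client and `my_judge` is your own
# evaluator that takes a trace and returns True (pass) or False (fail).

# Fetch up to 50 eval traces tagged with the release under test.
result = client.list_traces(
    run_type="eval",
    metadata_filters=[{"key": "release", "value": "v3.2.1"}],
    limit=50,
)

# Judge each trace and attach the outcome to it as a metric.
for trace in result.data:
    passed = my_judge(trace)
    client.record_metric("task_adherence", trace.trace_id, passed=passed)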
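In a batch run, a judge that raises on one trace would stop the loop above before later traces are scored. A minimal sketch of a more tolerant loop follows. FakeClient, Trace, and record_all are hypothetical stand-ins written for this example; only the record_metric call shape (metric name, trace ID, passed keyword) mirrors the snippet above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Trace:
    # Hypothetical trace shape; the real SDK object may carry more fields.
    trace_id: str
    output: str


class FakeClient:
    """In-memory stand-in for the SDK client, used only for illustration."""

    def __init__(self) -> None:
        self.recorded: List[Tuple[str, str, bool]] = []

    def record_metric(self, name: str, trace_id: str, passed: bool) -> None:
        self.recorded.append((name, trace_id, passed))


def record_all(
    client,
    traces: Iterable[Trace],
    judge: Callable[[Trace], bool],
    metric: str = "task_adherence",
) -> List[str]:
    """Judge each trace and record the result.

    Returns the IDs of traces whose judge raised, so they can be retried.
    """
    skipped: List[str] = []
    for trace in traces:
        try:
            passed = bool(judge(trace))
        except Exception:
            # Do not record a metric for a trace the judge could not score.
            skipped.append(trace.trace_id)
            continue
        client.record_metric(metric, trace.trace_id, passed=passed)
    return skipped
```

Skipped traces get no metric at all rather than a default failure, so a judge outage is not mistaken for a quality regression.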

Dashboard behavior

Metrics appear next to traces and can be used for filters, monitor thresholds, and release comparisons.
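How the dashboard computes a release comparison is not described here. Assuming it comes down to comparing pass rates for one metric across releases, the same check can be sketched locally. The function name, the input shape, and the v3.2.0 label below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pass_rate_by_release(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the fraction of passing results for each release label."""
    totals: Dict[str, int] = defaultdict(int)
    passes: Dict[str, int] = defaultdict(int)
    for release, passed in results:
        totals[release] += 1
        passes[release] += int(passed)
    # Every release in `totals` has at least one result, so no division by zero.
    return {release: passes[release] / totals[release] for release in totals}


# Hypothetical task_adherence outcomes as (release, passed) pairs.
sample = [
    ("v3.2.0", True),
    ("v3.2.0", False),
    ("v3.2.1", True),
    ("v3.2.1", True),
]
rates = pass_rate_by_release(sample)
```

With this sample, rates maps v3.2.0 to 0.5 and v3.2.1 to 1.0, which is the kind of difference a monitor threshold could alert on.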