Synthetic metrics
Record automated evaluation outcomes as metrics on traces. With the current SDK, you do not need to register a built-in evaluator first.
Recommended metrics
- Task adherence — did the agent solve the user's actual ask?
- Tool selection — was the right tool chosen for the step?
- Response coherence — does the final reply read well?
- Unsupported request — did the agent gracefully decline out-of-scope asks?
- User frustration — does the user appear frustrated in the conversation?
- Context retrieval — were the right docs retrieved?
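As an illustration, an evaluator for one of these metrics can be an ordinary function that returns a pass/fail value. The sketch below checks tool selection; the trace shape (a dict with `steps`, each carrying `tool` and `expected_tool`) and the function name `judge_tool_selection` are hypothetical and are not the SDK's trace schema.

```python
from typing import Any


def judge_tool_selection(trace: dict[str, Any]) -> bool:
    """Return True if every step used the tool that was expected for it.

    Assumes a hypothetical trace shape:
    {"steps": [{"tool": ..., "expected_tool": ...}, ...]}
    """
    steps = trace.get("steps", [])
    if not steps:
        # Nothing to evaluate; fail so that empty traces get reviewed.
        return False
    return all(step.get("tool") == step.get("expected_tool") for step in steps)


example = {
    "steps": [
        {"tool": "search_docs", "expected_tool": "search_docs"},
        {"tool": "send_email", "expected_tool": "create_ticket"},
    ]
}
print(judge_tool_selection(example))  # False: the second step chose the wrong tool
```

Returning a plain boolean keeps the evaluator easy to unit test and matches a pass/fail style of metric. A model-based judge could sit behind the same signature.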
Inline evaluation
Run your evaluator in application code, then write the result with record_metric or recordMetric.
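# client is an already-initialized SDK client.
# Fetch up to 50 eval traces tagged with release v3.2.1.
result = client.list_traces(
    run_type="eval",
    metadata_filters=[{"key": "release", "value": "v3.2.1"}],
    limit=50,
)
for trace in result.data:
    # my_judge is your own evaluator; it returns a pass/fail value for the trace.
    passed = my_judge(trace)
    client.record_metric("task_adherence", trace.trace_id, passed=passed)
Dashboard behavior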
Metrics appear next to traces and can be used for filters, monitor thresholds, and release comparisons.
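To make the idea of a release comparison concrete, here is a minimal local sketch that turns pass/fail outcomes into a per-release pass rate and checks it against a threshold. It does not describe how the dashboard computes anything; the `pass_rates` helper, the sample data, and the 0.7 threshold are all illustrative assumptions.

```python
from collections import defaultdict


def pass_rates(results):
    """Map each release to the fraction of its traces that passed."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for release, passed in results:
        totals[release] += 1
        passes[release] += int(passed)
    return {release: passes[release] / totals[release] for release in totals}


# Hypothetical (release, passed) pairs, e.g. collected while recording task_adherence.
results = [
    ("v3.2.0", True), ("v3.2.0", True), ("v3.2.0", False), ("v3.2.0", True),
    ("v3.2.1", True), ("v3.2.1", False), ("v3.2.1", False), ("v3.2.1", True),
]
rates = pass_rates(results)

THRESHOLD = 0.7  # illustrative monitor threshold, not a product default
for release, rate in sorted(rates.items()):
    status = "ok" if rate >= THRESHOLD else "below threshold"
    print(f"{release}: {rate:.2f} ({status})")
```

With this sample data, v3.2.0 passes 3 of 4 traces and v3.2.1 passes 2 of 4, so only the newer release falls below the illustrative threshold.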