# Observability

Tracing, logging, and monitoring for production agent systems.
Observability is the difference between agents that work in demos and agents that work in production. Every agent run should produce a complete trace of what happened, why it happened, and how long it took.
## Built-in Tracing

Every `Runner.run()` and `Runner.stream()` call produces a trace:
```typescript
const result = await Runner.run(agent, { messages })

console.log(result.trace)
// {
//   runId: 'run_abc123',
//   agent: 'researcher',
//   status: 'completed',
//   startedAt: '2026-03-12T10:00:00Z',
//   completedAt: '2026-03-12T10:00:05Z',
//   durationMs: 5000,
//   turns: [
//     { type: 'model_call', tokens: { input: 1200, output: 340 }, durationMs: 800 },
//     { type: 'tool_call', name: 'web_search', args: { query: '...' }, durationMs: 1200 },
//     { type: 'model_call', tokens: { input: 2100, output: 520 }, durationMs: 1100 },
//     { type: 'tool_call', name: 'read_url', args: { url: '...' }, durationMs: 900 },
//     { type: 'model_call', tokens: { input: 3500, output: 1200 }, durationMs: 1000 },
//   ],
//   usage: { inputTokens: 6800, outputTokens: 2060, totalTokens: 8860 },
//   guardrails: [],
//   handoffs: [],
//   approvals: [],
// }
```

## OpenTelemetry Integration
Export traces to any OpenTelemetry-compatible backend:
```typescript
import { OTelTracer } from 'assistme-agent-sdk/otel'

const tracer = new OTelTracer({
  serviceName: 'my-agent-app',
  endpoint: 'http://localhost:4317', // OTLP endpoint
})

const result = await Runner.run(agent, {
  messages,
  tracer,
})
```

This produces OpenTelemetry spans following the AI Agent semantic conventions:
```
agent_run (root span)
├── guardrail.input.content_filter
├── model.generate (turn 1)
├── tool.web_search
├── model.generate (turn 2)
├── tool.read_url
├── model.generate (turn 3)
└── guardrail.output.pii_filter
```

## Custom Trace Exporters
Send traces to any destination:
```typescript
import { TraceExporter, RunTrace } from 'assistme-agent-sdk'

class CustomExporter implements TraceExporter {
  async export(trace: RunTrace): Promise<void> {
    // Send to your analytics platform
    await analytics.track('agent_trace', {
      runId: trace.runId,
      agent: trace.agent,
      status: trace.status,
      duration: trace.durationMs,
      tokens: trace.usage.totalTokens,
      toolCalls: trace.turns.filter(t => t.type === 'tool_call').length,
    })
  }
}

const result = await Runner.run(agent, {
  messages,
  tracer: new CustomExporter(),
})
```

## Metrics
### Built-in Metrics
```typescript
import { Metrics } from 'assistme-agent-sdk'

const metrics = Metrics.create({
  exporter: 'prometheus', // or 'otlp', 'statsd', 'custom'
  endpoint: 'http://localhost:9090',
})

// Metrics are collected automatically:
// agent_run_duration_seconds{agent="researcher", status="completed"}
// agent_run_tokens_total{agent="researcher", type="input"}
// agent_tool_call_duration_seconds{agent="researcher", tool="web_search"}
// agent_guardrail_triggers_total{agent="researcher", guardrail="pii_filter", phase="output"}
// agent_handoff_total{from="triage", to="billing"}
// agent_error_total{agent="researcher", type="timeout"}
```

### Custom Metrics
```typescript
const result = await Runner.run(agent, {
  messages,
  hooks: {
    onEnd: async (result, context) => {
      metrics.record({
        'agent.cost_usd': calculateCost(result.usage),
        'agent.user_satisfaction': await getUserFeedback(context.runId),
      })
    },
  },
})
```

## Logging
### Structured Logging
```typescript
import { Logger } from 'assistme-agent-sdk'

const logger = Logger.create({
  level: 'info', // 'debug' | 'info' | 'warn' | 'error'
  format: 'json', // or 'pretty' for development
  output: 'stdout', // or file path, or custom writer
})

const result = await Runner.run(agent, {
  messages,
  logger,
})

// Output:
// {"level":"info","ts":"2026-03-12T10:00:00Z","msg":"run_started","runId":"run_abc","agent":"researcher"}
// {"level":"info","ts":"2026-03-12T10:00:01Z","msg":"tool_called","runId":"run_abc","tool":"web_search","duration":1200}
// {"level":"info","ts":"2026-03-12T10:00:05Z","msg":"run_completed","runId":"run_abc","status":"completed","tokens":8860}
```

### Log Levels
| Level | What's logged |
|---|---|
| `error` | Errors, guardrail blocks, failed tool calls |
| `warn` | Max turns reached, slow tool calls, high token usage |
| `info` | Run start/end, tool calls, handoffs |
| `debug` | Model prompts, tool arguments/results, memory operations |
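Tool arguments and results logged at `debug` often contain user data, which is why they should stay out of production logs. Independent of any SDK, a small redaction helper can scrub known-sensitive keys before a value is serialized; the `SENSITIVE_KEYS` list below is illustrative, not part of the SDK:

```typescript
// Recursively replace values under sensitive keys before logging.
// SENSITIVE_KEYS is a hypothetical list — adjust it for your tools.
const SENSITIVE_KEYS = new Set(['email', 'password', 'ssn', 'apiKey'])

function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact)
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {}
    for (const [k, v] of Object.entries(value)) {
      out[k] = SENSITIVE_KEYS.has(k) ? '[REDACTED]' : redact(v)
    }
    return out
  }
  return value
}

console.log(JSON.stringify(redact({ query: 'refunds', email: 'a@b.com' })))
// {"query":"refunds","email":"[REDACTED]"}
```

Running tool arguments through a helper like this at the logging boundary keeps the raw values available to the agent while keeping them out of persisted logs.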
## Dashboards

### Key Metrics to Track
| Metric | Alert Threshold | Why |
|---|---|---|
| Run success rate | < 95% | Core reliability |
| P99 latency | > 30s | User experience |
| Token usage per run | > 50K | Cost control |
| Guardrail block rate | > 10% | Possible attack or bad UX |
| Tool error rate | > 5% | Integration health |
| Handoff rate | > 50% | Triage agent accuracy |
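If you export raw traces rather than pre-aggregated metrics, the first three rows of the table can be computed directly from a batch of runs. A minimal sketch, assuming a `RunSummary` shape that mirrors the trace fields shown earlier (this helper is not part of the SDK):

```typescript
interface RunSummary {
  status: 'completed' | 'error'
  durationMs: number
  totalTokens: number
}

// Compute success rate, p99 latency, and mean tokens per run
// from a non-empty batch of run summaries.
function dashboardStats(runs: RunSummary[]) {
  const sorted = [...runs].sort((a, b) => a.durationMs - b.durationMs)
  // Nearest-rank p99: the value at the 99th-percentile position.
  const p99Index = Math.min(sorted.length - 1, Math.ceil(sorted.length * 0.99) - 1)
  return {
    successRate: runs.filter(r => r.status === 'completed').length / runs.length,
    p99LatencyMs: sorted[p99Index].durationMs,
    avgTokensPerRun: runs.reduce((sum, r) => sum + r.totalTokens, 0) / runs.length,
  }
}
```

Comparing these values against the alert thresholds in the table is then a straightforward check on each aggregation window.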
### Example Dashboard Query (Prometheus)
```promql
# Success rate over the last hour
sum(rate(agent_run_total{status="completed"}[1h]))
  /
sum(rate(agent_run_total[1h]))

# P99 latency
histogram_quantile(0.99, rate(agent_run_duration_seconds_bucket[5m]))

# Approximate cost per run (tokens per run * $/token)
sum(rate(agent_run_tokens_total[1h])) / sum(rate(agent_run_total[1h])) * 0.000003
```

## Debugging Failed Runs
```typescript
const result = await Runner.run(agent, { messages })

if (result.status === 'error') {
  // Full trace for debugging
  console.log(JSON.stringify(result.trace, null, 2))

  // What happened, step by step
  for (const turn of result.trace.turns) {
    if (turn.type === 'tool_call') {
      console.log(`Tool: ${turn.name}`)
      console.log(`Args: ${JSON.stringify(turn.args)}`)
      console.log(`Result: ${JSON.stringify(turn.result)}`)
      console.log(`Error: ${turn.error}`)
    }
  }
}
```

## Best Practices
- **Trace everything in production.** The cost of tracing is negligible compared to the cost of debugging blind.
- **Use OpenTelemetry.** It's the industry standard, and your traces will integrate with any observability platform.
- **Alert on anomalies, not just errors.** A spike in token usage or a drop in success rate often signals a problem before errors appear.
- **Log tool inputs and outputs at debug level.** Don't log them in production (privacy, volume), but make them available when debugging.
- **Track cost per user.** Token usage varies dramatically between users; per-user tracking catches abuse and guides optimization.
- **Retain traces.** Keep traces for at least 30 days. You'll need them when investigating user-reported issues.
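Per-user cost tracking can be sketched without any SDK support by aggregating token usage from run records. The prices and the `RunUsage` shape below are illustrative assumptions, not SDK APIs:

```typescript
// Hypothetical per-token prices — substitute your model's actual rates.
const PRICE_PER_INPUT_TOKEN = 0.000003   // $/input token (assumed)
const PRICE_PER_OUTPUT_TOKEN = 0.000015  // $/output token (assumed)

interface RunUsage {
  userId: string
  inputTokens: number
  outputTokens: number
}

// Sum the dollar cost of each user's runs into a userId -> cost map.
function costPerUser(runs: RunUsage[]): Map<string, number> {
  const costs = new Map<string, number>()
  for (const run of runs) {
    const cost =
      run.inputTokens * PRICE_PER_INPUT_TOKEN +
      run.outputTokens * PRICE_PER_OUTPUT_TOKEN
    costs.set(run.userId, (costs.get(run.userId) ?? 0) + cost)
  }
  return costs
}
```

Sorting the resulting map by cost makes the heaviest users visible at a glance, which is usually the fastest way to spot abuse or a runaway prompt.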