Agent SDK

Observability

Tracing, logging, and monitoring for production agent systems

Observability is the difference between agents that work in demos and agents that work in production. Every agent run should produce a complete trace of what happened, why it happened, and how long it took.

Built-in Tracing

Every Runner.run() and Runner.stream() call produces a trace:

const result = await Runner.run(agent, { messages })

console.log(result.trace)
// {
//   runId: 'run_abc123',
//   agent: 'researcher',
//   status: 'completed',
//   startedAt: '2026-03-12T10:00:00Z',
//   completedAt: '2026-03-12T10:00:06Z',
//   durationMs: 6000,
//   turns: [
//     { type: 'model_call', tokens: { input: 1200, output: 340 }, durationMs: 800 },
//     { type: 'tool_call', name: 'web_search', args: { query: '...' }, durationMs: 1200 },
//     { type: 'model_call', tokens: { input: 2100, output: 520 }, durationMs: 1100 },
//     { type: 'tool_call', name: 'read_url', args: { url: '...' }, durationMs: 900 },
//     { type: 'model_call', tokens: { input: 3500, output: 1200 }, durationMs: 1900 },
//   ],
//   usage: { inputTokens: 6800, outputTokens: 2060, totalTokens: 8860 },
//   guardrails: [],
//   handoffs: [],
//   approvals: [],
// }
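The trace is plain data, so you can post-process it directly. A minimal sketch that breaks run time down into tool time, model time, and overhead (the `Turn` and trace shapes below are assumptions mirroring the example above, not the SDK's exported types):

```typescript
// Assumed minimal shapes, mirroring the example trace above.
interface Turn {
  type: 'model_call' | 'tool_call'
  durationMs: number
}

interface RunTraceLike {
  durationMs: number
  turns: Turn[]
}

// Sum time spent in tools vs. the model across all turns.
function timeBreakdown(trace: RunTraceLike) {
  const toolMs = trace.turns
    .filter(t => t.type === 'tool_call')
    .reduce((ms, t) => ms + t.durationMs, 0)
  const modelMs = trace.turns
    .filter(t => t.type === 'model_call')
    .reduce((ms, t) => ms + t.durationMs, 0)
  return { toolMs, modelMs, overheadMs: trace.durationMs - toolMs - modelMs }
}
```

A breakdown like this is a quick way to tell whether a slow run is waiting on tools or on the model.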

OpenTelemetry Integration

Export traces to any OpenTelemetry-compatible backend:

import { OTelTracer } from 'assistme-agent-sdk/otel'

const tracer = new OTelTracer({
  serviceName: 'my-agent-app',
  endpoint: 'http://localhost:4317', // OTLP endpoint
})

const result = await Runner.run(agent, {
  messages,
  tracer,
})

This produces OpenTelemetry spans following the AI Agent semantic conventions:

agent_run (root span)
├── guardrail.input.content_filter
├── model.generate (turn 1)
├── tool.web_search
├── model.generate (turn 2)
├── tool.read_url
├── model.generate (turn 3)
└── guardrail.output.pii_filter

Custom Trace Exporters

Send traces to any destination:

import { TraceExporter } from 'assistme-agent-sdk'

class CustomExporter implements TraceExporter {
  async export(trace: RunTrace): Promise<void> {
    // Send to your analytics platform
    await analytics.track('agent_trace', {
      runId: trace.runId,
      agent: trace.agent,
      status: trace.status,
      duration: trace.durationMs,
      tokens: trace.usage.totalTokens,
      toolCalls: trace.turns.filter(t => t.type === 'tool_call').length,
    })
  }
}

const result = await Runner.run(agent, {
  messages,
  tracer: new CustomExporter(),
})

Metrics

Built-in Metrics

import { Metrics } from 'assistme-agent-sdk'

const metrics = Metrics.create({
  exporter: 'prometheus', // or 'otlp', 'statsd', 'custom'
  endpoint: 'http://localhost:9090',
})

// Metrics are automatically collected
// agent_run_duration_seconds{agent="researcher", status="completed"}
// agent_run_tokens_total{agent="researcher", type="input"}
// agent_tool_call_duration_seconds{agent="researcher", tool="web_search"}
// agent_guardrail_triggers_total{agent="researcher", guardrail="pii_filter", phase="output"}
// agent_handoff_total{from="triage", to="billing"}
// agent_error_total{agent="researcher", type="timeout"}
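If you expose metrics yourself (for example behind a custom `/metrics` endpoint), the Prometheus text exposition format the samples above use is simple to emit. A dependency-free sketch:

```typescript
// Render one sample with labels in Prometheus text exposition format,
// e.g. agent_error_total{agent="researcher",type="timeout"} 3
function formatCounter(
  name: string,
  labels: Record<string, string>,
  value: number,
): string {
  const labelStr = Object.entries(labels)
    .map(([k, v]) => `${k}="${v}"`)
    .join(',')
  return `${name}{${labelStr}} ${value}`
}
```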

Custom Metrics

const result = await Runner.run(agent, {
  messages,
  hooks: {
    onEnd: async (result, context) => {
      metrics.record({
        'agent.cost_usd': calculateCost(result.usage),
        'agent.user_satisfaction': await getUserFeedback(context.runId),
      })
    },
  },
})
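`calculateCost` above is left to you. A sketch with made-up per-token prices; substitute your model's actual rates:

```typescript
// Hypothetical per-token prices in USD; replace with your provider's real rates.
const PRICE_PER_INPUT_TOKEN = 0.000003
const PRICE_PER_OUTPUT_TOKEN = 0.000015

function calculateCost(usage: { inputTokens: number; outputTokens: number }): number {
  return (
    usage.inputTokens * PRICE_PER_INPUT_TOKEN +
    usage.outputTokens * PRICE_PER_OUTPUT_TOKEN
  )
}
```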

Logging

Structured Logging

import { Logger } from 'assistme-agent-sdk'

const logger = Logger.create({
  level: 'info', // 'debug' | 'info' | 'warn' | 'error'
  format: 'json', // or 'pretty' for development
  output: 'stdout', // or file path, or custom writer
})

const result = await Runner.run(agent, {
  messages,
  logger,
})

// Output:
// {"level":"info","ts":"2026-03-12T10:00:00Z","msg":"run_started","runId":"run_abc","agent":"researcher"}
// {"level":"info","ts":"2026-03-12T10:00:01Z","msg":"tool_called","runId":"run_abc","tool":"web_search","duration":1200}
// {"level":"info","ts":"2026-03-12T10:00:05Z","msg":"run_completed","runId":"run_abc","status":"completed","tokens":8860}

Log Levels

| Level | What's logged |
| --- | --- |
| error | Errors, guardrail blocks, failed tool calls |
| warn | Max turns reached, slow tool calls, high token usage |
| info | Run start/end, tool calls, handoffs |
| debug | Model prompts, tool arguments/results, memory operations |
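Level filtering is just an ordered comparison. A minimal sketch of how a structured logger decides whether to emit a record (the real `Logger` also handles formatting and output routing as configured above):

```typescript
type Level = 'debug' | 'info' | 'warn' | 'error'

// Severity order: a record is emitted only if its level is at or
// above the configured threshold.
const LEVEL_ORDER: Record<Level, number> = { debug: 0, info: 1, warn: 2, error: 3 }

function shouldLog(configured: Level, record: Level): boolean {
  return LEVEL_ORDER[record] >= LEVEL_ORDER[configured]
}
```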

Dashboards

Key Metrics to Track

| Metric | Alert threshold | Why |
| --- | --- | --- |
| Run success rate | < 95% | Core reliability |
| P99 latency | > 30s | User experience |
| Token usage per run | > 50K | Cost control |
| Guardrail block rate | > 10% | Possible attack or bad UX |
| Tool error rate | > 5% | Integration health |
| Handoff rate | > 50% | Triage agent accuracy |
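The same thresholds can be checked in application code as a sanity pass alongside your alerting system. A sketch, assuming you can snapshot these values from your metrics backend:

```typescript
// Hypothetical snapshot of the dashboard metrics above.
interface MetricsSnapshot {
  successRate: number        // 0..1
  p99LatencyMs: number
  tokensPerRun: number
  guardrailBlockRate: number // 0..1
  toolErrorRate: number      // 0..1
  handoffRate: number        // 0..1
}

// Return the names of metrics that breach the suggested thresholds.
function breachedAlerts(m: MetricsSnapshot): string[] {
  const alerts: string[] = []
  if (m.successRate < 0.95) alerts.push('success_rate')
  if (m.p99LatencyMs > 30_000) alerts.push('p99_latency')
  if (m.tokensPerRun > 50_000) alerts.push('token_usage')
  if (m.guardrailBlockRate > 0.10) alerts.push('guardrail_blocks')
  if (m.toolErrorRate > 0.05) alerts.push('tool_errors')
  if (m.handoffRate > 0.50) alerts.push('handoffs')
  return alerts
}
```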

Example Dashboard Query (Prometheus)

# Success rate over last hour
sum(rate(agent_run_total{status="completed"}[1h]))
/
sum(rate(agent_run_total[1h]))

# P99 latency
histogram_quantile(0.99, rate(agent_run_duration_seconds_bucket[5m]))

# Approximate cost per run
sum(rate(agent_run_tokens_total[1h])) / sum(rate(agent_run_total[1h])) * 0.000003  # Approximate $/token

Debugging Failed Runs

const result = await Runner.run(agent, { messages })

if (result.status === 'error') {
  // Full trace for debugging
  console.log(JSON.stringify(result.trace, null, 2))

  // What happened step by step
  for (const turn of result.trace.turns) {
    if (turn.type === 'tool_call') {
      console.log(`Tool: ${turn.name}`)
      console.log(`Args: ${JSON.stringify(turn.args)}`)
      console.log(`Result: ${JSON.stringify(turn.result)}`)
      console.log(`Error: ${turn.error}`)
    }
  }
}
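The loop above can be packaged as a helper that pulls out the first failing tool call, which is usually where debugging starts. The turn shapes below are assumptions mirroring the trace example, not the SDK's exported types:

```typescript
// Assumed turn shapes, mirroring the trace example.
interface ToolTurn {
  type: 'tool_call'
  name: string
  args: unknown
  error?: string
}
interface ModelTurn { type: 'model_call' }
type Turn = ToolTurn | ModelTurn

// Find the first tool call that recorded an error, if any.
function firstFailedToolCall(turns: Turn[]): ToolTurn | undefined {
  return turns.find(
    (t): t is ToolTurn => t.type === 'tool_call' && t.error !== undefined,
  )
}
```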

Best Practices

  1. Trace everything in production — The cost of tracing is negligible compared to the cost of debugging blind.

  2. Use OpenTelemetry — It's the industry standard. Your traces will integrate with any observability platform.

  3. Alert on anomalies, not just errors — A spike in token usage or a drop in success rate often indicates a problem before errors appear.

  4. Log tool inputs and outputs at debug level — Don't log them in production (privacy, volume), but make them available when debugging.

  5. Track cost per user — Token usage varies dramatically between users. Track per-user to catch abuse and optimize.

  6. Retain traces — Keep traces for at least 30 days. You'll need them when investigating user-reported issues.
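Practice 5 amounts to keeping a running total per user. A sketch (in production this would live in a durable store, not process memory; the class and method names are illustrative):

```typescript
// Track cumulative token spend per user to catch abuse and outliers.
class PerUserUsage {
  private totals = new Map<string, number>()

  record(userId: string, totalTokens: number): void {
    this.totals.set(userId, (this.totals.get(userId) ?? 0) + totalTokens)
  }

  // Users whose cumulative usage exceeds a threshold, for review.
  overBudget(maxTokens: number): string[] {
    return [...this.totals].filter(([, t]) => t > maxTokens).map(([u]) => u)
  }
}
```

Feed it from an `onEnd` hook, the same way the custom-metrics example records cost.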