Agent SDK

Observability

Tracing, logging, and monitoring for production agent systems

Observability is the difference between agents that work in demos and agents that work in production. Every agent run should produce a complete trace of what happened, why it happened, and how long it took.

Built-in Tracing

Every Runner.run() and Runner.stream() call produces a trace:

const result = await Runner.run(agent, { messages })

console.log(result.trace)
// {
//   runId: 'run_abc123',
//   agent: 'researcher',
//   status: 'completed',
//   startedAt: '2026-03-12T10:00:00Z',
//   completedAt: '2026-03-12T10:00:06Z',
//   durationMs: 6000,
//   turns: [
//     { type: 'model_call', tokens: { input: 1200, output: 340 }, durationMs: 800 },
//     { type: 'tool_call', name: 'web_search', args: { query: '...' }, durationMs: 1200 },
//     { type: 'model_call', tokens: { input: 2100, output: 520 }, durationMs: 1100 },
//     { type: 'tool_call', name: 'read_url', args: { url: '...' }, durationMs: 900 },
//     { type: 'model_call', tokens: { input: 3500, output: 1200 }, durationMs: 1900 },
//   ],
//   usage: { inputTokens: 6800, outputTokens: 2060, totalTokens: 8860 },
//   guardrails: [],
//   handoffs: [],
//   approvals: [],
// }
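The trace is plain data, so you can post-process it directly. A minimal sketch that breaks run time down into tool time, model time, and overhead (the `Turn` and trace shapes below are assumptions mirroring the example above, not the SDK's exported types):

```typescript
// Assumed minimal shapes, mirroring the example trace above.
interface Turn {
  type: 'model_call' | 'tool_call'
  durationMs: number
}

interface RunTraceLike {
  durationMs: number
  turns: Turn[]
}

// Sum time spent in tools vs. the model across all turns.
function timeBreakdown(trace: RunTraceLike) {
  const toolMs = trace.turns
    .filter(t => t.type === 'tool_call')
    .reduce((ms, t) => ms + t.durationMs, 0)
  const modelMs = trace.turns
    .filter(t => t.type === 'model_call')
    .reduce((ms, t) => ms + t.durationMs, 0)
  return { toolMs, modelMs, overheadMs: trace.durationMs - toolMs - modelMs }
}
```

A breakdown like this is a quick way to tell whether a slow run is waiting on tools or on the model.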

OpenTelemetry Integration

Export traces to any OpenTelemetry-compatible backend:

import { OTelTracer } from 'assistme-agent-sdk/otel'

const tracer = new OTelTracer({
  serviceName: 'my-agent-app',
  endpoint: 'http://localhost:4317', // OTLP endpoint
})

const result = await Runner.run(agent, {
  messages,
  tracer,
})

This produces OpenTelemetry spans following the AI Agent semantic conventions:

agent_run (root span)
├── guardrail.input.content_filter
├── model.generate (turn 1)
├── tool.web_search
├── model.generate (turn 2)
├── tool.read_url
├── model.generate (turn 3)
└── guardrail.output.pii_filter

Custom Trace Exporters

Send traces to any destination:

import { TraceExporter } from 'assistme-agent-sdk'

class CustomExporter implements TraceExporter {
  async export(trace: RunTrace): Promise<void> {
    // Send to your analytics platform
    await analytics.track('agent_trace', {
      runId: trace.runId,
      agent: trace.agent,
      status: trace.status,
      duration: trace.durationMs,
      tokens: trace.usage.totalTokens,
      toolCalls: trace.turns.filter(t => t.type === 'tool_call').length,
    })
  }
}

const result = await Runner.run(agent, {
  messages,
  tracer: new CustomExporter(),
})

Metrics

Built-in Metrics

import { Metrics } from 'assistme-agent-sdk'

const metrics = Metrics.create({
  exporter: 'prometheus', // or 'otlp', 'statsd', 'custom'
  endpoint: 'http://localhost:9090',
})

// Metrics are automatically collected
// agent_run_duration_seconds{agent="researcher", status="completed"}
// agent_run_tokens_total{agent="researcher", type="input"}
// agent_tool_call_duration_seconds{agent="researcher", tool="web_search"}
// agent_guardrail_triggers_total{agent="researcher", guardrail="pii_filter", phase="output"}
// agent_handoff_total{from="triage", to="billing"}
// agent_error_total{agent="researcher", type="timeout"}
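If you expose metrics yourself (for example behind a custom `/metrics` endpoint), the Prometheus text exposition format the samples above use is simple to emit. A dependency-free sketch:

```typescript
// Render one sample with labels in Prometheus text exposition format,
// e.g. agent_error_total{agent="researcher",type="timeout"} 3
function formatCounter(
  name: string,
  labels: Record<string, string>,
  value: number,
): string {
  const labelStr = Object.entries(labels)
    .map(([k, v]) => `${k}="${v}"`)
    .join(',')
  return `${name}{${labelStr}} ${value}`
}
```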

Custom Metrics

const result = await Runner.run(agent, {
  messages,
  hooks: {
    onEnd: async (result, context) => {
      metrics.record({
        'agent.cost_usd': calculateCost(result.usage),
        'agent.user_satisfaction': await getUserFeedback(context.runId),
      })
    },
  },
})
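`calculateCost` above is left to you. A sketch with made-up per-token prices; substitute your model's actual rates:

```typescript
// Hypothetical per-token prices in USD; replace with your provider's real rates.
const PRICE_PER_INPUT_TOKEN = 0.000003
const PRICE_PER_OUTPUT_TOKEN = 0.000015

function calculateCost(usage: { inputTokens: number; outputTokens: number }): number {
  return (
    usage.inputTokens * PRICE_PER_INPUT_TOKEN +
    usage.outputTokens * PRICE_PER_OUTPUT_TOKEN
  )
}
```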

Logging

Structured Logging

import { Logger } from 'assistme-agent-sdk'

const logger = Logger.create({
  level: 'info', // 'debug' | 'info' | 'warn' | 'error'
  format: 'json', // or 'pretty' for development
  output: 'stdout', // or file path, or custom writer
})

const result = await Runner.run(agent, {
  messages,
  logger,
})

// Output:
// {"level":"info","ts":"2026-03-12T10:00:00Z","msg":"run_started","runId":"run_abc","agent":"researcher"}
// {"level":"info","ts":"2026-03-12T10:00:01Z","msg":"tool_called","runId":"run_abc","tool":"web_search","duration":1200}
// {"level":"info","ts":"2026-03-12T10:00:05Z","msg":"run_completed","runId":"run_abc","status":"completed","tokens":8860}

Log Levels

| Level | What's logged |
| --- | --- |
| error | Errors, guardrail blocks, failed tool calls |
| warn | Max turns reached, slow tool calls, high token usage |
| info | Run start/end, tool calls, handoffs |
| debug | Model prompts, tool arguments/results, memory operations |
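Level filtering is just an ordered comparison. A minimal sketch of how a structured logger decides whether to emit a record (the real `Logger` also handles formatting and output routing as configured above):

```typescript
type Level = 'debug' | 'info' | 'warn' | 'error'

// Severity order: a record is emitted only if its level is at or
// above the configured threshold.
const LEVEL_ORDER: Record<Level, number> = { debug: 0, info: 1, warn: 2, error: 3 }

function shouldLog(configured: Level, record: Level): boolean {
  return LEVEL_ORDER[record] >= LEVEL_ORDER[configured]
}
```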

Dashboards

Key Metrics to Track

| Metric | Alert threshold | Why |
| --- | --- | --- |
| Run success rate | < 95% | Core reliability |
| P99 latency | > 30s | User experience |
| Token usage per run | > 50K | Cost control |
| Guardrail block rate | > 10% | Possible attack or bad UX |
| Tool error rate | > 5% | Integration health |
| Handoff rate | > 50% | Triage agent accuracy |
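The same thresholds can be checked in application code as a sanity pass alongside your alerting system. A sketch, assuming you can snapshot these values from your metrics backend:

```typescript
// Hypothetical snapshot of the dashboard metrics above.
interface MetricsSnapshot {
  successRate: number        // 0..1
  p99LatencyMs: number
  tokensPerRun: number
  guardrailBlockRate: number // 0..1
  toolErrorRate: number      // 0..1
  handoffRate: number        // 0..1
}

// Return the names of metrics that breach the suggested thresholds.
function breachedAlerts(m: MetricsSnapshot): string[] {
  const alerts: string[] = []
  if (m.successRate < 0.95) alerts.push('success_rate')
  if (m.p99LatencyMs > 30_000) alerts.push('p99_latency')
  if (m.tokensPerRun > 50_000) alerts.push('token_usage')
  if (m.guardrailBlockRate > 0.10) alerts.push('guardrail_blocks')
  if (m.toolErrorRate > 0.05) alerts.push('tool_errors')
  if (m.handoffRate > 0.50) alerts.push('handoffs')
  return alerts
}
```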

Example Dashboard Query (Prometheus)

# Success rate over last hour
sum(rate(agent_run_total{status="completed"}[1h]))
/
sum(rate(agent_run_total[1h]))

# P99 latency
histogram_quantile(0.99, rate(agent_run_duration_seconds_bucket[5m]))

# Approximate cost per run
sum(rate(agent_run_tokens_total[1h])) / sum(rate(agent_run_total[1h])) * 0.000003  # Approximate $/token

Debugging Failed Runs

const result = await Runner.run(agent, { messages })

if (result.status === 'error') {
  // Full trace for debugging
  console.log(JSON.stringify(result.trace, null, 2))

  // What happened step by step
  for (const turn of result.trace.turns) {
    if (turn.type === 'tool_call') {
      console.log(`Tool: ${turn.name}`)
      console.log(`Args: ${JSON.stringify(turn.args)}`)
      console.log(`Result: ${JSON.stringify(turn.result)}`)
      console.log(`Error: ${turn.error}`)
    }
  }
}
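The loop above can be packaged as a helper that pulls out the first failing tool call, which is usually where debugging starts. The turn shapes below are assumptions mirroring the trace example, not the SDK's exported types:

```typescript
// Assumed turn shapes, mirroring the trace example.
interface ToolTurn {
  type: 'tool_call'
  name: string
  args: unknown
  error?: string
}
interface ModelTurn { type: 'model_call' }
type Turn = ToolTurn | ModelTurn

// Find the first tool call that recorded an error, if any.
function firstFailedToolCall(turns: Turn[]): ToolTurn | undefined {
  return turns.find(
    (t): t is ToolTurn => t.type === 'tool_call' && t.error !== undefined,
  )
}
```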

Best Practices

  1. Trace everything in production — The cost of tracing is negligible compared to the cost of debugging blind.

  2. Use OpenTelemetry — It's the industry standard. Your traces will integrate with any observability platform.

  3. Alert on anomalies, not just errors — A spike in token usage or a drop in success rate often indicates a problem before errors appear.

  4. Log tool inputs and outputs at debug level — Don't log them in production (privacy, volume), but make them available when debugging.

  5. Track cost per user — Token usage varies dramatically between users. Track per-user to catch abuse and optimize.

  6. Retain traces — Keep traces for at least 30 days. You'll need them when investigating user-reported issues.
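Practice 5 amounts to keeping a running total per user. A sketch (in production this would live in a durable store, not process memory; the class and method names are illustrative):

```typescript
// Track cumulative token spend per user to catch abuse and outliers.
class PerUserUsage {
  private totals = new Map<string, number>()

  record(userId: string, totalTokens: number): void {
    this.totals.set(userId, (this.totals.get(userId) ?? 0) + totalTokens)
  }

  // Users whose cumulative usage exceeds a threshold, for review.
  overBudget(maxTokens: number): string[] {
    return [...this.totals].filter(([, t]) => t > maxTokens).map(([u]) => u)
  }
}
```

Feed it from an `onEnd` hook, the same way the custom-metrics example records cost.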