# SDK Cost Tracking

The Claude Agent SDK provides detailed token usage information for each interaction with Claude. This guide explains how to properly track costs and understand usage reporting, especially when dealing with parallel tool uses and multi-step conversations.

For complete API documentation, see the TypeScript SDK reference.

## Understanding Token Usage

When Claude processes requests, it reports token usage at the message level. This usage data is essential for tracking costs and billing users appropriately.

### Key Concepts

- **Steps**: A step is a single request/response pair between your application and Claude.
- **Messages**: Individual messages within a step (text, tool uses, tool results).
- **Usage**: Token consumption data attached to assistant messages.

## Usage Reporting Structure

### Single vs Parallel Tool Use

When Claude executes tools, the usage reporting differs based on whether tools are executed sequentially or in parallel:

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

// Example: Tracking usage in a conversation
const result = await query({
  prompt: "Analyze this codebase and run tests",
  options: {
    onMessage: (message) => {
      if (message.type === 'assistant' && message.usage) {
        console.log(`Message ID: ${message.id}`);
        console.log(`Usage:`, message.usage);
      }
    }
  }
});
```

```typescript
// ...
console.log(`Steps processed: ${stepUsages.length}`);
console.log(`Total cost: $${totalCost.toFixed(4)}`);
```

```python
from claude_agent_sdk import query, AssistantMessage, ResultMessage
from datetime import datetime
import asyncio

class CostTracker:
    def __init__(self):
        self.processed_message_ids = set()
        self.step_usages = []

    async def track_conversation(self, prompt):
        result = None

        # Process messages as they arrive
        async for message in query(prompt=prompt):
            self.process_message(message)

            # Capture the final result message
            if isinstance(message, ResultMessage):
                result = message

        return {
            "result": result,
            "step_usages": self.step_usages,
            "total_cost": result.total_cost_usd if result else 0
        }

    def process_message(self, message):
        # Only process assistant messages with usage
        if not isinstance(message, AssistantMessage) or not hasattr(message, 'usage'):
            return

        # Skip if already processed this message ID
        message_id = getattr(message, 'id', None)
        if not message_id or message_id in self.processed_message_ids:
            return

        # Mark as processed and record usage
        self.processed_message_ids.add(message_id)
        self.step_usages.append({
            "message_id": message_id,
            "timestamp": datetime.now().isoformat(),
            "usage": message.usage,
            "cost_usd": self.calculate_cost(message.usage)
        })

    def calculate_cost(self, usage):
        # Implement your pricing calculation
        input_cost = usage.get("input_tokens", 0) * 0.00003
        output_cost = usage.get("output_tokens", 0) * 0.00015
        cache_read_cost = usage.get("cache_read_input_tokens", 0) * 0.0000075

        return input_cost + output_cost + cache_read_cost

# Usage
async def main():
    tracker = CostTracker()
    result = await tracker.track_conversation("Analyze and refactor this code")

    print(f"Steps processed: {len(result['step_usages'])}")
    print(f"Total cost: ${result['total_cost']:.4f}")

asyncio.run(main())
```

## Handling Edge Cases

### Output Token Discrepancies

In rare cases, you might observe different `output_tokens` values for messages with the same ID. When this occurs:

1. **Use the highest value** - The final message in a group typically contains the accurate total
2. **Verify against total cost** - The `total_cost_usd` in the result message is authoritative
3. **Report inconsistencies** - File issues at the [Claude Code GitHub repository](https://github.com/anthropics/claude-code/issues)
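
The first rule above can be sketched as a small helper that keeps the largest `output_tokens` value seen for each message ID. This is a minimal illustration, not part of the SDK: `resolve_output_tokens` and the `(message_id, usage)` record layout are hypothetical, and the usage data is assumed to have been collected as plain dictionaries.

```python
def resolve_output_tokens(records):
    """Return {message_id: highest output_tokens seen for that ID}.

    `records` is an iterable of (message_id, usage_dict) pairs, a
    hypothetical shape for usage data gathered while streaming.
    """
    resolved = {}
    for message_id, usage in records:
        tokens = usage.get("output_tokens", 0)
        # Keep the highest value reported for each message ID.
        if tokens > resolved.get(message_id, -1):
            resolved[message_id] = tokens
    return resolved


records = [
    ("msg_1", {"output_tokens": 40}),
    ("msg_1", {"output_tokens": 100}),  # later message in the same group
    ("msg_2", {"output_tokens": 25}),
]
print(resolve_output_tokens(records))
```

Any total derived this way is still an estimate; when it disagrees with `total_cost_usd` from the result message, the result message should win.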

### Cache Token Tracking

When using prompt caching, track these token types separately:

```typescript
interface CacheUsage {
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
  cache_creation: {
    ephemeral_5m_input_tokens: number;
    ephemeral_1h_input_tokens: number;
  };
}
```

## Best Practices

1. **Use Message IDs for Deduplication** - Always track processed message IDs to avoid double-charging
2. **Monitor the Result Message** - The final result contains authoritative cumulative usage
3. **Implement Logging** - Log all usage data for auditing and debugging
4. **Handle Failures Gracefully** - Track partial usage even if a conversation fails
5. **Consider Streaming** - For streaming responses, accumulate usage as messages arrive
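
Several of these practices can be combined in one consumer loop. The sketch below deduplicates by message ID, accumulates usage as messages arrive, and keeps the partial totals if the stream fails partway through. It uses plain dictionaries and a made-up `fake_stream` generator in place of real SDK messages, so `consume_with_partial_usage` and the message shape are assumptions for illustration only.

```python
import asyncio


async def consume_with_partial_usage(messages):
    """Accumulate usage from an async message stream, keeping whatever
    was collected even if the stream raises partway through.

    `messages` yields dicts with "id" and "usage" keys, a stand-in
    for SDK message objects.
    """
    seen_ids = set()
    totals = {"input_tokens": 0, "output_tokens": 0}
    error = None
    try:
        async for message in messages:
            message_id = message.get("id")
            if not message_id or message_id in seen_ids:
                continue  # deduplicate by message ID
            seen_ids.add(message_id)
            usage = message.get("usage", {})
            for key in totals:
                totals[key] += usage.get(key, 0)
    except Exception as exc:
        # Record the failure but keep the partial totals for auditing.
        error = exc
    return {"totals": totals, "steps": len(seen_ids), "error": error}


async def fake_stream():
    """Hypothetical stream that fails after two distinct messages."""
    yield {"id": "msg_1", "usage": {"input_tokens": 100, "output_tokens": 50}}
    yield {"id": "msg_1", "usage": {"input_tokens": 100, "output_tokens": 50}}
    yield {"id": "msg_2", "usage": {"input_tokens": 200, "output_tokens": 30}}
    raise RuntimeError("connection lost")


summary = asyncio.run(consume_with_partial_usage(fake_stream()))
print(summary["totals"], summary["steps"], summary["error"])
```

Returning the error alongside the totals, rather than re-raising immediately, lets the caller log the partial usage before deciding how to handle the failure.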

## Usage Fields Reference

Each usage object contains:

- `input_tokens`: Base input tokens processed
- `output_tokens`: Tokens generated in the response
- `cache_creation_input_tokens`: Tokens used to create cache entries
- `cache_read_input_tokens`: Tokens read from cache
- `service_tier`: The service tier used (e.g., "standard")
- `total_cost_usd`: Total cost in USD (only in result message)
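
As a rough illustration of how these fields might feed a per-step estimate, the sketch below multiplies each token count by a per-token rate. The input, output, and cache-read rates mirror the placeholder values used in the cost tracker above; the cache-creation rate is invented for this example. None of them should be read as actual pricing, and `estimate_cost` is a hypothetical helper, not an SDK function.

```python
# Placeholder per-token rates in USD. Replace with your model's pricing.
RATES = {
    "input_tokens": 0.00003,
    "output_tokens": 0.00015,
    "cache_creation_input_tokens": 0.0000375,  # hypothetical rate
    "cache_read_input_tokens": 0.0000075,
}


def estimate_cost(usage, rates=RATES):
    """Estimate cost in USD from a usage dict; missing fields count as 0."""
    return sum(usage.get(field, 0) * rate for field, rate in rates.items())


usage = {
    "input_tokens": 1_000,
    "output_tokens": 500,
    "cache_creation_input_tokens": 2_000,
    "cache_read_input_tokens": 10_000,
    "service_tier": "standard",  # non-token fields are ignored
}
print(f"Estimated cost: ${estimate_cost(usage):.4f}")
```

An estimate like this is useful for per-step attribution, while `total_cost_usd` on the result message remains the figure to bill against.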

## Example: Building a Billing Dashboard

Here's how to aggregate usage data for a billing dashboard: