Agents And Tools Tool Use Token Efficient Tool Use

Updated 2 days ago

Starting with Claude Sonnet 3.7, Claude is capable of calling tools in a token-efficient manner. Requests save an average of 14% in output tokens, up to 70%, which also reduces latency. Exact token reduction and latency improvements depend on the overall response shape and size.

Info: Token-efficient tool use is a beta feature that only works with Claude 3.7 Sonnet. To use this beta feature, add the beta header token-efficient-tools-2025-02-19 to a tool use request. This header has no effect on other Claude models.

All Claude 4 models support token-efficient tool use by default. No beta header is needed.

Warning: Token-efficient tool use does not currently work with disable_parallel_tool_use.

Here's an example of how to use token-efficient tools with the API in Claude Sonnet 3.7:

The above request should, on average, use fewer input and output tokens than a normal request. To confirm this, try making the same request but remove token-efficient-tools-2025-02-19 from the beta headers list.

Tip: To keep the benefits of prompt caching, use the beta header consistently for requests you'd like to cache. If you selectively use it, prompt caching will fail.

Was this page helpful?