"stream": true
to incrementally stream the response using server-sent events (SSE).
Streaming with SDKs
Our Python and TypeScript SDKs offer multiple ways of streaming. The Python SDK allows both sync and async streams. See the documentation in each SDK for details.Event types
Each server-sent event includes a named event type and associated JSON data. Each event will use an SSE event name (e.g.event: message_stop
), and include the matching event type
in its data.
Each stream uses the following event flow:
message_start
: contains aMessage
object with emptycontent
.- A series of content blocks, each of which have a
content_block_start
, one or morecontent_block_delta
events, and acontent_block_stop
event. Each content block will have anindex
that corresponds to its index in the final Messagecontent
array. - One or more
message_delta
events, indicating top-level changes to the finalMessage
object. - A final
message_stop
event.
The token counts shown in the
usage
field of the message_delta
event are cumulative.Ping events
Event streams may also include any number ofping
events.
Error events
We may occasionally send errors in the event stream. For example, during periods of high usage, you may receive anoverloaded_error
, which would normally correspond to an HTTP 529 in a non-streaming context:
Example error
Other events
In accordance with our versioning policy, we may add new event types, and your code should handle unknown event types gracefully.Content block delta types
Eachcontent_block_delta
event contains a delta
of a type that updates the content
block at a given index
.
Text delta
Atext
content block delta looks like:
Text delta
Input JSON delta
The deltas fortool_use
content blocks correspond to updates for the input
field of the block. To support maximum granularity, the deltas are partial JSON strings, whereas the final tool_use.input
is always an object.
You can accumulate the string deltas and parse the JSON once you receive a content_block_stop
event, by using a library like Pydantic to do partial JSON parsing, or by using our SDKs, which provide helpers to access parsed incremental values.
A tool_use
content block delta looks like:
Input JSON delta
input
at a time. As such, when using tools, there may be delays between streaming events while the model is working. Once an input
key and value are accumulated, we emit them as multiple content_block_delta
events with chunked partial json so that the format can automatically support finer granularity in future models.
Thinking delta
When using extended thinking with streaming enabled, you’ll receive thinking content viathinking_delta
events. These deltas correspond to the thinking
field of the thinking
content blocks.
For thinking content, a special signature_delta
event is sent just before the content_block_stop
event. This signature is used to verify the integrity of the thinking block.
A typical thinking delta looks like:
Thinking delta
Signature delta
Full HTTP Stream response
We strongly recommend that you use our client SDKs when using streaming mode. However, if you are building a direct API integration, you will need to handle these events yourself. A stream response is comprised of:- A
message_start
event - Potentially multiple content blocks, each of which contains:
- A
content_block_start
event - Potentially multiple
content_block_delta
events - A
content_block_stop
event
- A
- A
message_delta
event - A
message_stop
event
ping
events dispersed throughout the response as well. See Event types for more details on the format.
Basic streaming request
Response
Streaming request with tool use
Tool use now supports fine-grained streaming for parameter values as a beta feature. For more details, see Fine-grained tool streaming.
Response
Streaming request with extended thinking
In this request, we enable extended thinking with streaming to see Claude’s step-by-step reasoning.Response
Streaming request with web search tool use
In this request, we ask Claude to search the web for current weather information.Response
Error recovery
When a streaming request is interrupted due to network issues, timeouts, or other errors, you can recover by resuming from where the stream was interrupted. This approach saves you from re-processing the entire response. The basic recovery strategy involves:- Capture the partial response: Save all content that was successfully received before the error occurred
- Construct a continuation request: Create a new API request that includes the partial assistant response as the beginning of a new assistant message
- Resume streaming: Continue receiving the rest of the response from where it was interrupted
Error recovery best practices
- Use SDK features: Leverage the SDK’s built-in message accumulation and error handling capabilities
- Handle content types: Be aware that messages can contain multiple content blocks (
text
,tool_use
,thinking
). Tool use and extended thinking blocks cannot be partially recovered. You can resume streaming from the most recent text block.