Description
Confirm this is an issue with the Python library and not an underlying OpenAI API
- This is an issue with the Python library
Describe the bug
In streaming mode, each response chunk is created using the OpenAI type ChatCompletionChunk,
whose usage field is captured by CompletionUsage.
CompletionUsage currently does not support some of the usage stats that the model reports when it is called in streaming mode.
Sample output from the model as ChatCompletionChunk
{
  "id": "xxxxxx",
  "choices": [
    {
      "delta": {
        "content": " straight-line drawing or architectural work",
        "function_call": null,
        "refusal": null,
        "role": "assistant",
        "tool_calls": null
      },
      "finish_reason": "length",
      "index": 0,
      "logprobs": null
    }
  ],
  "created": 1757617904,
  "model": "gemini-2.5-flash",
  "object": "chat.completion.chunk",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 1994,
    "prompt_tokens": 12,
    "total_tokens": 2006,
    "completion_tokens_details": {
      "accepted_prediction_tokens": null,
      "audio_tokens": null,
      "reasoning_tokens": 1269,
      "rejected_prediction_tokens": null
    },
    "prompt_tokens_details": {
      "audio_tokens": null,
      "cached_tokens": null
    }
  }
}
The following attributes/properties are missing from the usage in this chunk:
- completion_tokens_details.text_tokens
- prompt_tokens_details.text_tokens
- prompt_tokens_details.image_tokens
The usage reported by the model
{
  "completion_tokens": 1994,
  "prompt_tokens": 12,
  "total_tokens": 2006,
  "completion_tokens_details": {
    "accepted_prediction_tokens": null,
    "audio_tokens": null,
    "reasoning_tokens": 1269,
    "rejected_prediction_tokens": null,
    "text_tokens": 725
  },
  "prompt_tokens_details": {
    "audio_tokens": null,
    "cached_tokens": null,
    "text_tokens": 12,
    "image_tokens": null
  }
}
Could you please update the pydantic model defined for capturing usage, CompletionUsage
(see CompletionTokensDetails),
to accommodate these additional usage properties that the model reports in streaming mode?
Note: These properties are already supported by the pydantic model used for regular, non-streaming model responses.
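To make the request concrete, here is a minimal sketch of what the extended models could look like. The class names ending in `Sketch` are hypothetical stand-ins, not the library's actual definitions; the field names are taken from the usage payloads above, and the example assumes pydantic is installed.

```python
from typing import Optional

from pydantic import BaseModel


class CompletionTokensDetailsSketch(BaseModel):
    # Fields already present in the chunk shown above.
    accepted_prediction_tokens: Optional[int] = None
    audio_tokens: Optional[int] = None
    reasoning_tokens: Optional[int] = None
    rejected_prediction_tokens: Optional[int] = None
    # Requested addition (reported by the model, missing from the chunk).
    text_tokens: Optional[int] = None


class PromptTokensDetailsSketch(BaseModel):
    audio_tokens: Optional[int] = None
    cached_tokens: Optional[int] = None
    # Requested additions.
    text_tokens: Optional[int] = None
    image_tokens: Optional[int] = None


class CompletionUsageSketch(BaseModel):
    completion_tokens: int
    prompt_tokens: int
    total_tokens: int
    completion_tokens_details: Optional[CompletionTokensDetailsSketch] = None
    prompt_tokens_details: Optional[PromptTokensDetailsSketch] = None


# Usage payload as reported by the model (copied from this issue).
reported_usage = {
    "completion_tokens": 1994,
    "prompt_tokens": 12,
    "total_tokens": 2006,
    "completion_tokens_details": {
        "accepted_prediction_tokens": None,
        "audio_tokens": None,
        "reasoning_tokens": 1269,
        "rejected_prediction_tokens": None,
        "text_tokens": 725,
    },
    "prompt_tokens_details": {
        "audio_tokens": None,
        "cached_tokens": None,
        "text_tokens": 12,
        "image_tokens": None,
    },
}

# Nested dicts are validated into the nested sketch models.
usage = CompletionUsageSketch(**reported_usage)
print(usage.completion_tokens_details.text_tokens)
print(usage.prompt_tokens_details.text_tokens)
```

All new fields are optional with a `None` default, so payloads that omit them would still validate.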
To Reproduce
- Call the model in streaming mode using the OpenAI SDK
- Call the model directly through the vendor's API
- Compare the usage reported in the two responses
Code snippets
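The comparison step can be sketched with the standard library alone. The helper names `flatten_keys` and `missing_usage_fields` are illustrative, not part of the SDK; the two dictionaries are the usage payloads shown in this issue.

```python
from typing import Any, Dict, List, Set


def flatten_keys(payload: Dict[str, Any], prefix: str = "") -> Set[str]:
    """Return the dotted path of every key in a nested dict."""
    paths: Set[str] = set()
    for key, value in payload.items():
        path = f"{prefix}.{key}" if prefix else key
        paths.add(path)
        if isinstance(value, dict):
            paths |= flatten_keys(value, path)
    return paths


def missing_usage_fields(
    vendor_usage: Dict[str, Any], sdk_usage: Dict[str, Any]
) -> List[str]:
    """List the key paths present in the vendor usage but absent from the SDK usage."""
    return sorted(flatten_keys(vendor_usage) - flatten_keys(sdk_usage))


# Usage as captured in the ChatCompletionChunk (from the sample above).
sdk_usage = {
    "completion_tokens": 1994,
    "prompt_tokens": 12,
    "total_tokens": 2006,
    "completion_tokens_details": {
        "accepted_prediction_tokens": None,
        "audio_tokens": None,
        "reasoning_tokens": 1269,
        "rejected_prediction_tokens": None,
    },
    "prompt_tokens_details": {"audio_tokens": None, "cached_tokens": None},
}

# Usage as reported by the model (from the sample above).
vendor_usage = {
    "completion_tokens": 1994,
    "prompt_tokens": 12,
    "total_tokens": 2006,
    "completion_tokens_details": {
        "accepted_prediction_tokens": None,
        "audio_tokens": None,
        "reasoning_tokens": 1269,
        "rejected_prediction_tokens": None,
        "text_tokens": 725,
    },
    "prompt_tokens_details": {
        "audio_tokens": None,
        "cached_tokens": None,
        "text_tokens": 12,
        "image_tokens": None,
    },
}

missing = missing_usage_fields(vendor_usage, sdk_usage)
for path in missing:
    print(path)
```

Comparing key paths rather than values means a field reported as null (such as image_tokens) still counts as present in the vendor payload.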
OS
Linux/macOS
Python version
Python 3.10
Library version
openai v1.101.0