The usage object for streaming doesn't support all the stats reported by LLMs #2641

@sreekarkamireddy

Description

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

In streaming mode, the response chunk is created with the OpenAI type ChatCompletionChunk, whose usage is captured by CompletionUsage. CompletionUsage currently does not support some of the stats that the model reports when it is called in streaming mode.
Sample output from the model as a ChatCompletionChunk:

{
    "id": "xxxxxx",
    "choices": [
        {
            "delta": {
                "content": " straight-line drawing or architectural work",
                "function_call": null,
                "refusal": null,
                "role": "assistant",
                "tool_calls": null
            },
            "finish_reason": "length",
            "index": 0,
            "logprobs": null
        }
    ],
    "created": 1757617904,
    "model": "gemini-2.5-flash",
    "object": "chat.completion.chunk",
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 1994,
        "prompt_tokens": 12,
        "total_tokens": 2006,
        "completion_tokens_details": {
            "accepted_prediction_tokens": null,
            "audio_tokens": null,
            "reasoning_tokens": 1269,
            "rejected_prediction_tokens": null
        },
        "prompt_tokens_details": {
            "audio_tokens": null,
            "cached_tokens": null
        }
    }
} 

The following properties reported by the model are missing from the usage above:

  1. completion_tokens_details.text_tokens
  2. prompt_tokens_details.text_tokens
  3. prompt_tokens_details.image_tokens

The usage reported by the model:

{
  "completion_tokens": 1994,
  "prompt_tokens": 12,
  "total_tokens": 2006,
  "completion_tokens_details": {
    "accepted_prediction_tokens": null,
    "audio_tokens": null,
    "reasoning_tokens": 1269,
    "rejected_prediction_tokens": null,
    "text_tokens": 725
  },
  "prompt_tokens_details": {
    "audio_tokens": null,
    "cached_tokens": null,
    "text_tokens": 12,
    "image_tokens": null
  }
}
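
To make the gap concrete, here is a minimal sketch that compares the two usage payloads shown above and lists the key paths present in the vendor payload but absent from the chunk. The helper name `missing_paths` and the two dict names are hypothetical and only for illustration; they are not part of the openai library.

```python
# Usage as it appears in the ChatCompletionChunk sample above.
sdk_usage = {
    "completion_tokens": 1994,
    "prompt_tokens": 12,
    "total_tokens": 2006,
    "completion_tokens_details": {
        "accepted_prediction_tokens": None,
        "audio_tokens": None,
        "reasoning_tokens": 1269,
        "rejected_prediction_tokens": None,
    },
    "prompt_tokens_details": {"audio_tokens": None, "cached_tokens": None},
}

# Usage as reported by the model (second sample above).
vendor_usage = {
    "completion_tokens": 1994,
    "prompt_tokens": 12,
    "total_tokens": 2006,
    "completion_tokens_details": {
        "accepted_prediction_tokens": None,
        "audio_tokens": None,
        "reasoning_tokens": 1269,
        "rejected_prediction_tokens": None,
        "text_tokens": 725,
    },
    "prompt_tokens_details": {
        "audio_tokens": None,
        "cached_tokens": None,
        "text_tokens": 12,
        "image_tokens": None,
    },
}


def missing_paths(reference, candidate, prefix=""):
    """Return dotted key paths present in `reference` but absent from `candidate`."""
    missing = []
    for key, value in reference.items():
        path = f"{prefix}{key}"
        if key not in candidate:
            missing.append(path)
        elif isinstance(value, dict) and isinstance(candidate[key], dict):
            # Recurse into nested detail objects.
            missing.extend(missing_paths(value, candidate[key], prefix=path + "."))
    return missing


dropped = missing_paths(vendor_usage, sdk_usage)
print(dropped)
```

The same helper could be pointed at any pair of usage payloads when checking other models or vendors.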

Could you please update the pydantic model that captures usage, CompletionUsage (including its nested CompletionTokensDetails), to accommodate these additional usage properties that the model reports in streaming mode?
Note: these properties are already supported by the pydantic model used for regular non-streaming responses.
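
As a rough illustration of the requested change, the sketch below defines stand-in pydantic models with the three extra optional fields. The class names ending in `Sketch` are hypothetical and are not the library's actual definitions; the field names and values are taken from the usage payload reported by the model above.

```python
from typing import Optional

from pydantic import BaseModel


class CompletionTokensDetailsSketch(BaseModel):
    accepted_prediction_tokens: Optional[int] = None
    audio_tokens: Optional[int] = None
    reasoning_tokens: Optional[int] = None
    rejected_prediction_tokens: Optional[int] = None
    text_tokens: Optional[int] = None  # requested addition


class PromptTokensDetailsSketch(BaseModel):
    audio_tokens: Optional[int] = None
    cached_tokens: Optional[int] = None
    text_tokens: Optional[int] = None  # requested addition
    image_tokens: Optional[int] = None  # requested addition


class CompletionUsageSketch(BaseModel):
    completion_tokens: int
    prompt_tokens: int
    total_tokens: int
    completion_tokens_details: Optional[CompletionTokensDetailsSketch] = None
    prompt_tokens_details: Optional[PromptTokensDetailsSketch] = None


# The usage payload reported by the model, as shown above.
vendor_usage = {
    "completion_tokens": 1994,
    "prompt_tokens": 12,
    "total_tokens": 2006,
    "completion_tokens_details": {
        "accepted_prediction_tokens": None,
        "audio_tokens": None,
        "reasoning_tokens": 1269,
        "rejected_prediction_tokens": None,
        "text_tokens": 725,
    },
    "prompt_tokens_details": {
        "audio_tokens": None,
        "cached_tokens": None,
        "text_tokens": 12,
        "image_tokens": None,
    },
}

# Nested dicts are coerced into the nested models on construction.
usage = CompletionUsageSketch(**vendor_usage)
print(usage.completion_tokens_details.text_tokens)
print(usage.prompt_tokens_details.text_tokens)
```

Keeping the new fields optional with a default of None means payloads from models that do not report them would still validate.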

To Reproduce

  1. Call the model in streaming mode using the OpenAI SDK
  2. Call the same model directly through the vendor's API
  3. Compare the usage reported in the two responses

Code snippets

OS

Linux/macOS

Python version

Python 3.10

Library version

openai v1.101.0

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests