vllm.v1.outputs
EMPTY_MODEL_RUNNER_OUTPUT  module-attribute
EMPTY_MODEL_RUNNER_OUTPUT = ModelRunnerOutput(
    req_ids=[],
    req_id_to_index={},
    sampled_token_ids=[],
    logprobs=None,
    prompt_logprobs_dict={},
    pooler_output=[],
    num_nans_in_logits=None,
)
AsyncModelRunnerOutput
Bases: ABC
Source code in vllm/v1/outputs.py
get_output  abstractmethod
get_output() -> ModelRunnerOutput
Get the ModelRunnerOutput for this async output.
This is a blocking call that waits until the results are ready, which might involve copying device tensors to the host. This method should only be called once per AsyncModelRunnerOutput.
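The contract described above (one blocking call, made once) can be sketched without vLLM installed. Everything in this block is a hypothetical stand-in: `StubRunnerOutput`, `StubAsyncOutput`, and `OneShotAsyncOutput` are illustrative names, and the guard that raises on a second call is one possible way to enforce the call-once rule, not something the source says vLLM does.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StubRunnerOutput:
    """Hypothetical stand-in for ModelRunnerOutput (two fields only)."""
    req_ids: list[str] = field(default_factory=list)
    sampled_token_ids: list[list[int]] = field(default_factory=list)


class StubAsyncOutput(ABC):
    """Hypothetical stand-in mirroring the documented abstract interface."""

    @abstractmethod
    def get_output(self) -> StubRunnerOutput:
        """Block until the result is ready, then return it."""


class OneShotAsyncOutput(StubAsyncOutput):
    """Wraps a deferred computation and rejects a second get_output() call."""

    def __init__(self, produce: Callable[[], StubRunnerOutput]) -> None:
        # `produce` stands in for work such as a device-to-host copy.
        self._produce = produce
        self._consumed = False

    def get_output(self) -> StubRunnerOutput:
        if self._consumed:
            raise RuntimeError("get_output() was already called on this object")
        self._consumed = True
        return self._produce()  # runs synchronously, so the caller blocks here


pending = OneShotAsyncOutput(lambda: StubRunnerOutput(["req-0"], [[11, 42]]))
result = pending.get_output()
```

Callers that need the result in more than one place should keep the returned object rather than calling `get_output()` again.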
DraftTokenIds  dataclass
KVConnectorOutput  dataclass
invalid_block_ids  class-attribute instance-attribute
kv_connector_stats  class-attribute instance-attribute
kv_connector_stats: KVConnectorStats | None = None
LogprobsLists
Bases: NamedTuple
cu_num_generated_tokens  class-attribute instance-attribute
slice
LogprobsTensors
Bases: NamedTuple
empty_cpu  staticmethod
empty_cpu(
    num_positions: int, num_tokens_per_position: int
) -> LogprobsTensors
Create empty LogprobsTensors on CPU.
tolists
ModelRunnerOutput  dataclass
kv_connector_output  class-attribute instance-attribute
kv_connector_output: KVConnectorOutput | None = None
num_nans_in_logits  class-attribute instance-attribute
__init__
__init__(
    req_ids: list[str],
    req_id_to_index: dict[str, int],
    sampled_token_ids: list[list[int]],
    logprobs: LogprobsLists | None,
    prompt_logprobs_dict: dict[str, LogprobsTensors | None],
    pooler_output: list[Tensor | None],
    kv_connector_output: KVConnectorOutput | None = None,
    num_nans_in_logits: dict[str, int] | None = None,
) -> None
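To make the constructor signature concrete, here is a minimal sketch that runs without vLLM. `ModelRunnerOutputSketch` is a hypothetical stand-in whose fields copy the documented `__init__` parameters, with `Any` replacing the vLLM-specific types. The example also assumes, based only on the field names, that `req_id_to_index` maps each request ID to its position in `req_ids` and that `sampled_token_ids` is ordered the same way.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ModelRunnerOutputSketch:
    """Hypothetical stand-in with the fields of the documented __init__."""
    req_ids: list[str]
    req_id_to_index: dict[str, int]
    sampled_token_ids: list[list[int]]
    logprobs: Any                      # LogprobsLists | None in the real class
    prompt_logprobs_dict: dict[str, Any]
    pooler_output: list[Any]
    kv_connector_output: Any = None    # optional, defaults to None
    num_nans_in_logits: dict[str, int] | None = None


req_ids = ["req-a", "req-b"]
output = ModelRunnerOutputSketch(
    req_ids=req_ids,
    # Assumption: index of each request within req_ids.
    req_id_to_index={rid: i for i, rid in enumerate(req_ids)},
    sampled_token_ids=[[101], [7, 8]],
    logprobs=None,
    prompt_logprobs_dict={},
    pooler_output=[],
)

# Look up the sampled tokens for one request through the index map.
tokens_for_b = output.sampled_token_ids[output.req_id_to_index["req-b"]]
```

The six leading parameters have no defaults, so they must always be supplied; only `kv_connector_output` and `num_nans_in_logits` can be omitted.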