CVE-2026-44223

May 13, 2026, 6:16 p.m.

6.5
Medium

Description

vLLM is an inference and serving engine for large language models (LLMs). From to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.

Product(s) Impacted

Vendor Product Versions
Vllm
  • Vllm
  • <0.20.0

Weaknesses

Common security weaknesses mapped to this vulnerability.

CWE-131
Incorrect Calculation of Buffer Size
The product does not correctly calculate the size to be used when allocating a buffer, which could lead to a buffer overflow.

*CPE(s)

Affected systems and software identified for this CVE.

Type Vendor Product Version Update Edition Language Software Edition Target Software Target Hardware Other Information
a vllm vllm <0.20.0 / / / / / / /

CVSS Score

6.5 / 10

CVSS Data - 3.1

  • Attack Vector: NETWORK
  • Attack Complexity: LOW
  • Privileges Required: LOW
  • Scope: UNCHANGED
  • Confidentiality Impact: NONE
  • Integrity Impact: NONE
  • Availability Impact: HIGH
  • CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

    View Vector String

Timeline

Published: May 12, 2026, 8:16 p.m.
Last Modified: May 13, 2026, 6:16 p.m.

Status : Undergoing Analysis

CVE has been recently published to the CVE List and has been received by the NVD.

More info

*Disclaimer: Some vulnerabilities do not have an associated CPE. To enhance the data, we use AI to infer CPEs based on CVE details. This is an automated process and might not always be accurate.