SecuriTricks - CVE-2026-44223

Description

vLLM is an inference and serving engine for large language models (LLMs). In versions before 0.20.0, the extract_hidden_states speculative decoding proposer returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses a sampling penalty parameter (repetition_penalty, frequency_penalty, or presence_penalty); a single such request (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.
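The trigger condition in the description can be illustrated with a small check over a request body. This is a minimal sketch and not part of vLLM: the function name uses_sampling_penalty and the payload layout (a JSON object carrying the parameters at the top level) are assumptions, as are the neutral values (1.0 for repetition_penalty, 0.0 for the other two) at which a penalty is taken to have no effect. The advisory does not say whether a neutral value sent explicitly still reaches the crashing code path.

```python
# Hypothetical helper: flags request payloads that set a sampling penalty.
# The three parameter names come from the advisory; the rest is assumed.

# Assumed neutral values at which each penalty has no effect.
NEUTRAL_PENALTIES = {
    "repetition_penalty": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}


def uses_sampling_penalty(payload: dict) -> bool:
    """Return True if the payload sets any penalty to a non-neutral value."""
    for name, neutral in NEUTRAL_PENALTIES.items():
        value = payload.get(name)
        if value is not None and float(value) != neutral:
            return True
    return False


if __name__ == "__main__":
    # The example value from the description.
    request = {"prompt": "Hello", "max_tokens": 16, "repetition_penalty": 1.1}
    print(uses_sampling_penalty(request))
```

A check like this could be used to log or count requests that match the described trigger on a deployment that has not yet moved to 0.20.0.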

Product(s) Impacted

Vendor	Product	Versions
vLLM	vLLM	<0.20.0
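The affected range in the table above can be checked programmatically. The following is a rough sketch under stated assumptions: release_tuple and is_affected are hypothetical helper names, and the parser only looks at the leading numeric release segments, so a pre-release such as "0.20.0rc1" is treated the same as "0.20.0". A full PEP 440 comparison would need a dedicated version library.

```python
import re

FIXED_VERSION = (0, 20, 0)  # first fixed release according to the advisory


def release_tuple(version: str) -> tuple:
    """Extract the leading numeric segments, e.g. '0.19.1' -> (0, 19, 1)."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group(1).split("."))
    # Pad to three segments so '0.20' compares equal to '0.20.0'.
    return parts + (0,) * (3 - len(parts))


def is_affected(version: str) -> bool:
    """Return True if the version falls in the '<0.20.0' range."""
    return release_tuple(version) < FIXED_VERSION


if __name__ == "__main__":
    from importlib import metadata

    try:
        installed = metadata.version("vllm")
        print(installed, "affected" if is_affected(installed) else "not affected")
    except metadata.PackageNotFoundError:
        print("vllm is not installed in this environment")
```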

Weaknesses

Common security weaknesses mapped to this vulnerability.

CWE-131

Incorrect Calculation of Buffer Size

The product does not correctly calculate the size to be used when allocating a buffer, which could lead to a buffer overflow.

CPE(s)

Affected systems and software identified for this CVE.

Type	Vendor	Product	Version	Update	Edition	Language	Software Edition	Target Software	Target Hardware	Other Information
a	vllm	vllm	<0.20.0	/	/	/	/	/	/	/

References

https://github.com/vllm-project/vllm/pull/38610

https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw

CVSS Score

6.5 / 10

CVSS Data - 3.1

Attack Vector: NETWORK
Attack Complexity: LOW
Privileges Required: LOW
User Interaction: NONE
Scope: UNCHANGED
Confidentiality Impact: NONE
Integrity Impact: NONE
Availability Impact: HIGH

CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
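The 6.5 base score can be reproduced from the vector string with the CVSS v3.1 base score equations. The sketch below is an illustration, not the scoring tool used by the publisher: the function names are hypothetical, and only Scope: Unchanged vectors are handled. For this vector the impact sub-score is about 3.60 and the exploitability sub-score about 2.84, and their sum (about 6.43) rounds up to 6.5.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a CVSS:3.1 vector with Scope: Unchanged."""
    prefix, *fields = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("only CVSS:3.1 vectors are handled")
    metrics = dict(field.split(":") for field in fields)
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))
```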

Timeline

Published: May 12, 2026, 8:16 p.m.
Last Modified: May 13, 2026, 6:16 p.m.

Status: Undergoing Analysis

CVE has been recently published to the CVE List and has been received by the NVD.

Source

[email protected]
