Paintou
2026-05-14
Linux & DevOps

Critical Bug in Linux CUBIC Congestion Controller Permanently Stalls QUIC Connections – One-Line Fix Deployed

Cloudflare found a critical bug in the CUBIC congestion controller of its quiche QUIC library, introduced when a Linux kernel change was ported over, that permanently stalls QUIC connections after congestion collapse. A one-line fix restores recovery.

Urgent: Widespread Internet Impact Averted

Cloudflare engineers have discovered and fixed a critical bug in CUBIC, the default Linux congestion control algorithm, as implemented in quiche, Cloudflare's open-source QUIC library. After a congestion collapse, the flaw caused the congestion window (cwnd) to lock at its minimum value, preventing recovery and effectively halting data transfer. A one-line code change resolved the issue.

Source: blog.cloudflare.com

The Symptom: Tests Failing 61% of the Time

The investigation began after erratic failures in Cloudflare's ingress proxy integration tests. In scenarios with heavy packet loss early in a connection, CUBIC failed to recover from congestion collapse. "Recovery after congestion collapse is exactly the regime a congestion controller exists to handle," said a Cloudflare networking engineer. "Yet most tests skip this corner case."

The bug was invisible in standard throughput tests but surfaced in real-world traffic patterns where connections experience early loss.
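A recovery check of the kind described above can be sketched in a few lines. This is a minimal illustration, not Cloudflare's test suite: `ToyController`, `recovers_after_collapse`, and `MIN_CWND` are hypothetical names, and the controller is a simple additive-increase model rather than CUBIC. The point is only the shape of the test: force the window to its floor with early loss, then confirm that clean acknowledgements grow it again.

```python
MIN_CWND = 2.0  # hypothetical window floor, in packets


class ToyController:
    """Hypothetical additive-increase controller, used only to exercise the check."""

    def __init__(self, cwnd=10.0):
        self.cwnd = cwnd

    def on_loss(self):
        # Halve the window on loss, but never go below the floor.
        self.cwnd = max(MIN_CWND, self.cwnd / 2)

    def on_ack(self):
        # Grow by roughly one packet per window's worth of acks.
        self.cwnd += 1.0 / self.cwnd


def recovers_after_collapse(cc, losses=10, acks=200):
    """Drive cc to its floor with early loss, then check that acks grow the window."""
    for _ in range(losses):
        cc.on_loss()
    floor = cc.cwnd
    for _ in range(acks):
        cc.on_ack()
    return floor == MIN_CWND and cc.cwnd > floor


print(recovers_after_collapse(ToyController()))
```

A controller whose `on_ack` never raises the window once it sits at the floor would make this check return `False`, which is the behavior the failing tests exposed.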

Background: CUBIC's Role and the Linux Kernel Change

CUBIC, standardized in RFC 9438, is the default congestion control algorithm in Linux. It governs how TCP and QUIC connections probe for bandwidth and respond to loss. Cloudflare's quiche uses CUBIC as its default, placing this code in the critical path for a significant share of internet traffic.
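For readers unfamiliar with the algorithm, the window curve defined in RFC 9438 can be written out directly. The sketch below uses the RFC's constant C = 0.4 and its formulas for K and W_cubic(t); the function names are our own, units are packets and seconds, and nothing here is taken from quiche's source.

```python
C = 0.4  # CUBIC scaling constant from RFC 9438


def cubic_k(w_max, cwnd_epoch):
    """Time in seconds for the curve to climb from cwnd_epoch back to w_max."""
    # Clamp at zero so the cube root is always taken of a non-negative number.
    return (max(w_max - cwnd_epoch, 0.0) / C) ** (1.0 / 3.0)


def w_cubic(t, w_max, cwnd_epoch):
    """Target window t seconds into the current congestion-avoidance epoch."""
    k = cubic_k(w_max, cwnd_epoch)
    return C * (t - k) ** 3 + w_max


# Window at the start of the epoch, at the plateau (t = K), and shortly after.
k = cubic_k(100.0, 70.0)
for t in (0.0, k, k + 2.0):
    print(round(t, 2), round(w_cubic(t, 100.0, 70.0), 2))
```

The curve depends entirely on t, the time elapsed in the current epoch. If t does not advance, the target window does not move, which is why the way elapsed time is tracked matters so much for recovery.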

The bug originated from a Linux kernel change meant to align CUBIC with an app-limited exclusion in RFC 9438 §4.2-12. That change addressed a real TCP problem, but when ported to quiche, it triggered unexpected behavior: after a congestion collapse reduced cwnd to its minimum, the algorithm never increased it again. "The fix was well-intentioned, but it exposed a subtle interaction in the QUIC state machine," explained a Cloudflare developer.
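One common way to exclude app-limited time from CUBIC's clock is to shift the epoch start forward by the excluded period, capped at the current time; the Linux kernel does something similar after idle periods. The sketch below models only that idea. It is an assumption-laden illustration, not quiche's code and not a claim about the exact root cause: `cubic_elapsed` and the interval format are hypothetical.

```python
def cubic_elapsed(intervals):
    """Return CUBIC's elapsed epoch time after a list of (duration_s, app_limited) intervals."""
    now = 0.0
    epoch_start = 0.0
    for duration, app_limited in intervals:
        now += duration
        if app_limited:
            # Exclude this interval by moving the epoch forward, never past `now`.
            epoch_start = min(epoch_start + duration, now)
    return now - epoch_start


mixed = [(1.0, False), (2.0, True), (1.0, False)]
always_limited = [(1.0, True)] * 4

print(cubic_elapsed(mixed))           # 2.0: only the app-limited interval is excluded
print(cubic_elapsed(always_limited))  # 0.0: the clock never advances
```

The second case shows why such an exclusion is delicate. If a sender at its minimum window is classified as app-limited on every interval, the elapsed time stays at zero and the cubic curve never leaves its starting point, which matches the pinned-cwnd symptom described above.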

The Fix: An Elegant (Almost) One-Line Change

The solution was deceptively simple. By adjusting how CUBIC tracks the time since the last congestion event, the engineers broke the cycle that kept cwnd pinned at its minimum. The change ensures the algorithm properly resets its internal state after a recovery attempt. "We were thrilled to find such a clean fix for a bug that caused so much trouble," said the engineer. The patch has been merged into the quiche repository.

What This Means

For Cloudflare users and the broader internet, this bug meant that any quiche-based QUIC connection suffering heavy early loss could remain stuck in a low-throughput state, leading to poor performance or complete stalls. With the fix, connections now properly recover after congestion collapse, restoring normal bandwidth probing.

This incident highlights the complexity of porting kernel-level optimizations to user-space stacks. "What works fine in TCP can break in QUIC if we don't carefully model the differences," the engineer noted. Cloudflare urges all users of quiche to update to the latest version immediately.

Further analysis of the bug and the fix is available on Cloudflare's engineering blog.