The Unseen Dependencies: How TCMalloc Challenged Kernel's API Stability

Last updated: 2026-05-02 04:31:08 · Programming

In software engineering, it is often said that any observable behavior of a system—no matter how trivial or unintended—will eventually become a dependency for someone. This principle, known as Hyrum's Law, has recently come into sharp focus within the Linux kernel community. The kernel's strict no-regressions rule mandates that new changes must not break existing userspace applications. Yet, a recent incident involving restartable sequences (rseq) and Google's TCMalloc memory allocator has put this rule to the test, forcing maintainers to find a creative solution.

What Are Restartable Sequences?

Restartable sequences are a performance mechanism in the Linux kernel that lets user-space code update per-CPU data in short critical sections without taking locks or using expensive atomic instructions. If a thread is preempted, migrated to another CPU, or interrupted by a signal in the middle of such a section, the kernel aborts the section and transfers control to a user-supplied abort handler, which typically restarts the section from the beginning. The documented API for rseq is well-defined: a set of flags and structures that user-space must respect to ensure correct behavior. When used properly, this mechanism substantially reduces overhead in multithreaded applications.
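The restart idea can be illustrated with a small Python model. This is only a conceptual sketch, not real rseq usage: in the kernel mechanism the interruption is detected by the kernel and the section is written in assembly, whereas here a hypothetical ScriptedScheduler class replays a fixed list of events, and the names PerCpuCounter, current_cpu, and interrupted are invented for illustration.

```python
class ScriptedScheduler:
    """Hypothetical stand-in for the kernel: replays (cpu, interrupted) events."""

    def __init__(self, events):
        self._events = list(events)
        self._current = None

    def current_cpu(self):
        # Each attempt at the critical section consumes one scripted event.
        self._current = self._events.pop(0)
        return self._current[0]

    def interrupted(self):
        # True if this attempt was "preempted" before it could commit.
        return self._current[1]


class PerCpuCounter:
    """Toy per-CPU counter updated through a restartable critical section."""

    def __init__(self, num_cpus):
        self.counts = [0] * num_cpus

    def increment(self, scheduler):
        """Retry until one attempt completes undisturbed; return the restart count."""
        restarts = 0
        while True:
            cpu = scheduler.current_cpu()       # which CPU are we on?
            new_value = self.counts[cpu] + 1    # prepare: nothing is written yet
            if scheduler.interrupted():         # model of the kernel aborting the section
                restarts += 1
                continue                        # restart from the top
            self.counts[cpu] = new_value        # commit: a single store
            return restarts


counter = PerCpuCounter(num_cpus=2)
# The first attempt on CPU 0 is interrupted; the retry lands on CPU 1 and commits.
restarts = counter.increment(ScriptedScheduler([(0, True), (1, False)]))
print(restarts, counter.counts)  # prints: 1 [0, 1]
```

The point of the structure is that all preparation happens before a single committing store, so an aborted attempt leaves no partial state behind and can simply be rerun.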

The Problem: TCMalloc's Violation

During development of the 6.19 kernel release, developers worked to address performance bottlenecks in the rseq implementation itself. The changes were designed to maintain full backward compatibility—all officially documented behaviors remained unchanged. Yet, a hidden assumption broke down: Google's TCMalloc library had been relying on undocumented details of the rseq mechanism.

Specifically, TCMalloc used a particular sequence of operations that, while functional under earlier kernel versions, was never part of the guaranteed stable API. The library effectively locked itself into a specific kernel behavior, violating the documented contract. As a result, with kernel 6.19, TCMalloc began misbehaving: it could no longer correctly claim a restartable sequence slot, preventing itself (and other code) from using the feature.

Impact on Other Users

The consequences of TCMalloc's dependency extended beyond Google's own software. Because TCMalloc is widely used in production environments, any breakage threatened to stall adoption of the new kernel. Moreover, by occupying a restartable sequence slot in a non-standard way, TCMalloc blocked other user-space code from registering new sequences, degrading performance for all applications on the system. The kernel's no-regressions rule meant that simply telling TCMalloc to comply with the documented API was not an option—maintainers had to find a way to keep TCMalloc working.
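Why one component can block others is easier to see with a toy model of a per-thread registration slot. The sketch below assumes a slot that accepts a single owner and rejects later attempts. The class ThreadRseqSlot and the owner names are hypothetical, and the generic RuntimeError stands in for whatever error the real system call would return.

```python
class ThreadRseqSlot:
    """Toy model: a thread has at most one active registration."""

    def __init__(self):
        self.owner = None

    def register(self, owner):
        # A second registration attempt is rejected while the slot is taken.
        if self.owner is not None:
            raise RuntimeError(f"slot already registered by {self.owner}")
        self.owner = owner


slot = ThreadRseqSlot()
slot.register("allocator")      # the first component claims the slot

try:
    slot.register("libc")       # a later component cannot claim it
    outcome = "registered"
except RuntimeError as exc:
    outcome = str(exc)

print(outcome)  # prints: slot already registered by allocator
```

In this model the later component has no way to use the feature unless the first owner cooperates, which is the kind of knock-on effect described above.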

Kernel's Response: Balancing No-Regressions and Progress

The kernel community faced a classic dilemma: uphold the no-regressions principle by accommodating TCMalloc's behavior, or push for strict API compliance and risk alienating a major user. After extensive discussion, maintainers opted for the former. They introduced a new mechanism that allows TCMalloc to continue using its internal assumptions while preserving the documented API for all other users. This required careful re-engineering of the restartable sequences subsystem, adding a compatibility mode that recognizes the original (non-conformant) usage pattern.

The solution was not simple. It involved:

  • Adding a fallback code path for sequences registered with flags that TCMalloc used.
  • Modifying the kernel's internal state machine to detect and handle the legacy behavior.
  • Creating new ABI documentation that explicitly warns against relying on undocumented features.
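The kernel changes themselves are not reproduced in this article, so the following is only a generic illustration of the pattern the list describes: detect the legacy usage and route it to a compatibility path, while everyone else gets the optimized path. Every name here is hypothetical. In particular, relies_on_legacy_behavior is an invented marker and does not correspond to any real rseq flag or kernel symbol.

```python
from dataclasses import dataclass


@dataclass
class Registration:
    owner: str
    relies_on_legacy_behavior: bool  # hypothetical marker, not a real rseq flag


def choose_path(reg):
    """Pick the handling path for one registration."""
    if reg.relies_on_legacy_behavior:
        return "compat"  # keep the old observable behavior, at some extra cost
    return "fast"        # optimized path for users of the documented API


registrations = [
    Registration("legacy_allocator", True),
    Registration("conforming_library", False),
]
paths = {reg.owner: choose_path(reg) for reg in registrations}
print(paths)  # prints: {'legacy_allocator': 'compat', 'conforming_library': 'fast'}
```

The design trade-off is that the compatibility branch must be maintained and tested for as long as the legacy user exists, which is part of why such accommodations are not made lightly.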

This compromise allowed the 6.19 release to proceed without breaking existing systems, but it also set a precedent: even when kernel developers adhere strictly to API specifications, they may still need to support patterns that inadvertently become de facto standards.

Lessons from Hyrum's Law

This incident is a vivid illustration of Hyrum's Law in action. The kernel's rseq API was designed with a clear contract, yet TCMalloc's developers assumed a certain behavior that was never guaranteed. Over time, that assumption became a hidden dependency. When the kernel changed, the dependency broke. The lesson for API designers is twofold:

  1. Document everything, even the accidental behaviors. If a particular sequence of operations produces a deterministic result, it is likely someone will depend on it.
  2. Be cautious about exposing internal details through an API. Once user-space code builds a performance optimization on such a detail, it can easily become a de facto standard, as seen with TCMalloc.

For library authors, the takeaway is equally important: always consult the documented API and avoid relying on implementation artifacts. Testing against a single kernel version is not enough; future changes can invalidate hidden assumptions.
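A minimal illustration of this trap, unrelated to rseq itself: the two functions below satisfy the same documented contract, but the first happens to return sorted output. All names (list_ids_v1, list_ids_v2, newest_id_fragile, newest_id_robust) are invented for the example.

```python
def list_ids_v1(records):
    """Documented contract: returns every record id. Order is NOT specified."""
    # Happens to return sorted ids; that is an implementation artifact.
    return sorted(r["id"] for r in records)


def list_ids_v2(records):
    """Same contract, different implementation: ids in storage order."""
    return [r["id"] for r in records]


def newest_id_fragile(ids):
    # Silently assumes the list is sorted, which was never promised.
    return ids[-1]


def newest_id_robust(ids):
    # Relies only on the documented contract.
    return max(ids)


records = [{"id": 3}, {"id": 1}, {"id": 2}]

print(newest_id_fragile(list_ids_v1(records)))  # prints: 3
print(newest_id_fragile(list_ids_v2(records)))  # prints: 2 (wrong after the "upgrade")
print(newest_id_robust(list_ids_v1(records)))   # prints: 3
print(newest_id_robust(list_ids_v2(records)))   # prints: 3
```

A test suite that only ever ran against the first implementation would pass for the fragile caller, which is why testing against a single version cannot reveal this class of dependency.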

Conclusion

The TCMalloc/rseq saga demonstrates that maintaining a stable kernel API is not merely about adhering to specifications—it is also about anticipating how those specifications (and their gaps) will be interpreted by users. Thanks to the kernel community's commitment to the no-regressions rule, a careful compromise was reached without breaking any existing software. However, the incident serves as a powerful reminder that Hyrum's Law is always lurking, and that the only way to tame it is through rigorous documentation, proactive communication, and a willingness to accommodate unintended dependencies when necessary.