Linux Kernel Moves Toward Programmable Page Cache Eviction via BPF

Linux kernel developers are advancing a proposal to integrate BPF-driven custom eviction policies into the page cache subsystem, a move that could reshape how enterprise cloud platforms and database systems manage memory under pressure. The proposal, discussed at the 2026 Linux Storage, Filesystem, Memory Management, and BPF Summit, outlines a path toward programmable, workload-aware memory management that replaces the one-size-fits-all heuristics currently baked into the kernel.

According to LWN.net, which covered the summit discussions, the kernel's page cache is responsible for maintaining copies of file data in memory, and its eviction decisions directly impact overall system performance. Tal Z presented the case for allowing administrators and platform engineers to attach custom, BPF-verified eviction logic without modifying kernel source or rebooting systems. The proposal has gained traction among developers who argue that static tuning knobs like sysctl parameters are increasingly inadequate for the heterogeneous workloads found in modern containerized and cloud-native environments.

The approach builds on BPF's existing verifier and sandboxing infrastructure, so custom eviction policies are checked before they load and then run in a constrained context. This avoids much of the stability risk historically associated with out-of-tree kernel modules while still giving infrastructure teams the flexibility to optimize memory behavior for specific applications, such as keeping database hot pages resident longer or prioritizing eviction of transient container artifacts.
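To make the idea of a pluggable eviction policy concrete, here is a minimal user-space sketch in Python. It is not the proposed kernel interface: the class `ToyPageCache`, the policy signature, and the `tmp/` and `db/` key prefixes are all assumptions invented for illustration. A policy may name a victim or return `None` to defer to plain LRU, which stands in for the kernel's default heuristic.

```python
from collections import OrderedDict
from typing import Callable, List, Optional

# Hypothetical policy signature: given cached keys in LRU order (oldest
# first), return the key to evict, or None to defer to plain LRU.
EvictionPolicy = Callable[[List[str]], Optional[str]]


class ToyPageCache:
    """User-space stand-in for a page cache with a pluggable eviction hook."""

    def __init__(self, capacity: int, policy: Optional[EvictionPolicy] = None):
        self.capacity = capacity
        self.policy = policy
        self.pages = OrderedDict()  # key -> True, ordered oldest to newest
        self.misses = 0

    def access(self, key: str) -> None:
        if key in self.pages:
            self.pages.move_to_end(key)  # mark as most recently used
            return
        self.misses += 1  # a miss stands in for a read from storage
        if len(self.pages) >= self.capacity:
            self._evict()
        self.pages[key] = True

    def _evict(self) -> None:
        candidates = list(self.pages)  # oldest first
        victim = self.policy(candidates) if self.policy else None
        if victim is None or victim not in self.pages:
            victim = candidates[0]  # default behavior: plain LRU
        del self.pages[victim]


def prefer_transient(candidates: List[str]) -> Optional[str]:
    """Evict the oldest page under the (hypothetical) 'tmp/' prefix first."""
    for key in candidates:
        if key.startswith("tmp/"):
            return key
    return None


def run(policy: Optional[EvictionPolicy]) -> int:
    """Replay a synthetic trace: two hot pages, then a scan of transient pages."""
    cache = ToyPageCache(capacity=4, policy=policy)
    for round_no in range(5):
        cache.access("db/hot0")
        cache.access("db/hot1")
        for i in range(3):
            cache.access(f"tmp/layer{round_no}_{i}")
    return cache.misses


if __name__ == "__main__":
    print("plain LRU misses:", run(None))
    print("prefer_transient misses:", run(prefer_transient))
```

On this synthetic trace the scan of transient pages pushes the two hot pages out under plain LRU (25 misses), while the custom policy keeps them resident (17 misses). The numbers describe only this toy trace and say nothing about real kernel behavior.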

Early benchmarking cited in the summit discussions suggests that targeted BPF policies can reduce I/O bottlenecks by up to 40 percent under sustained memory pressure. However, developers caution that injecting executable logic into the page cache eviction path, a critical performance hot path, adds latency to every eviction decision, and that overhead must be carefully measured before the feature is considered production-ready.
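The trade-off described above can be framed as simple arithmetic: a policy pays a small cost on every eviction and earns back a larger amount for every storage read it avoids. The sketch below is a deliberately simplified model, not a measurement method from the summit. The function names and every number in the usage example are hypothetical.

```python
def net_time_saved_ns(evictions: int, hook_overhead_ns: float,
                      misses_avoided: int, miss_cost_ns: float) -> float:
    """Time saved by avoided misses minus time spent running the policy hook."""
    return misses_avoided * miss_cost_ns - evictions * hook_overhead_ns


def break_even_overhead_ns(evictions: int, misses_avoided: int,
                           miss_cost_ns: float) -> float:
    """Largest per-eviction hook cost at which the policy still breaks even."""
    if evictions <= 0:
        raise ValueError("evictions must be positive")
    return misses_avoided * miss_cost_ns / evictions


if __name__ == "__main__":
    # Hypothetical inputs: 1,000,000 evictions, 200 ns of hook overhead each,
    # 10,000 misses avoided, 100 microseconds per avoided miss.
    print(net_time_saved_ns(1_000_000, 200, 10_000, 100_000))
    print(break_even_overhead_ns(1_000_000, 10_000, 100_000))
```

With these made-up inputs the policy comes out ahead as long as the hook costs less than 1,000 ns per eviction. The model ignores lock contention, cache effects, and tail latency, which is part of why real measurement on the hot path is needed.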

The shift toward BPF-driven eviction represents a move from static sysctl tuning toward software-defined, workload-aware memory management that can be changed without kernel patches or reboots. The trade-off is hot-path cost: custom eviction logic can reduce I/O bottlenecks, but any code injected into a critical kernel path needs rigorous optimization and measurement before production deployment.

Several open questions remain before the feature can be merged into mainline. Developers must define a stable API for policy attachment and lifecycle management, establish mechanisms for resolving conflicts when multiple policies target overlapping file types, and implement automatic fallback to default heuristics if a custom policy fails or exceeds resource limits. The community is also expected to produce reference policy templates and user-space tooling to lower the barrier to adoption. Broad enterprise uptake will depend on these abstractions and safety nets, because operations teams need robust utilities to bridge the gap between kernel development and day-to-day infrastructure management.
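The three open items (attachment lifecycle, conflict resolution, automatic fallback) can be sketched together in a small user-space model. Everything here is an assumption made for illustration: `PolicyManager`, its methods, the priority rule, and the failure threshold are hypothetical and do not reflect any agreed kernel API. Resource limits are not modeled; only policies that raise an error count as failures.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical policy signature: take candidate page names (oldest first),
# return a victim, or None for "no opinion".
Policy = Callable[[List[str]], Optional[str]]


def default_heuristic(candidates: List[str]) -> str:
    """Stand-in for the built-in heuristic: evict the oldest candidate."""
    return candidates[0]


class PolicyManager:
    """Toy attach/detach lifecycle with priority ordering and fallback."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self._policies: Dict[str, Tuple[int, Policy]] = {}
        self._failures: Dict[str, int] = {}

    def attach(self, name: str, priority: int, policy: Policy) -> None:
        self._policies[name] = (priority, policy)
        self._failures[name] = 0

    def detach(self, name: str) -> None:
        self._policies.pop(name, None)
        self._failures.pop(name, None)

    def attached(self) -> List[str]:
        return sorted(self._policies)

    def pick_victim(self, candidates: List[str]) -> str:
        # Conflict resolution: highest priority is asked first; ties break by name.
        ordered = sorted(self._policies.items(),
                         key=lambda item: (-item[1][0], item[0]))
        for name, (_, policy) in ordered:
            try:
                victim = policy(candidates)
            except Exception:
                # A failing policy is skipped, and detached after repeated failures.
                self._failures[name] += 1
                if self._failures[name] >= self.max_failures:
                    self.detach(name)
                continue
            if victim in candidates:
                return victim
        # No policy produced a usable answer: fall back to the default.
        return default_heuristic(candidates)
```

A short usage example, again with made-up policy names and file suffixes:

```python
manager = PolicyManager(max_failures=2)
manager.attach("tmp_first", 10,
               lambda c: next((p for p in c if p.endswith(".tmp")), None))
print(manager.pick_victim(["a.log", "b.db", "c.tmp"]))  # policy picks the .tmp page
print(manager.pick_victim(["a.log", "b.db"]))           # no opinion, default picks oldest
```

Asking policies in priority order is only one possible conflict rule; a real design might instead scope each policy to a cgroup or file set so that overlaps cannot occur.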

If the proposal proceeds as planned, Linux distributions could begin shipping BPF-enabled page cache tuning capabilities within the next two kernel release cycles. For enterprises running PostgreSQL, MongoDB, or custom caching layers on Linux, the ability to programmatically control what stays in memory and what gets evicted could translate directly into reduced infrastructure costs and more predictable application latency. Tracking the three pending engineering milestones (stable API, conflict resolution, and automatic fallback) will be key to gauging progress toward production readiness and mainline inclusion.

Original News Source

Hong Kong Linux User Group 香港Linux用家協會 (HKLUG)

Linux Kernel Moves Toward Programmable Page Cache Eviction via BPF