Opportunistic Resource Reclamation in Kubernetes: From Aggressive Resizing to Flash Jobs

Warning

This publication does not fall under the Institute of Computer Science, but under the Faculty of Informatics. The official publication page is on the muni.cz website.
Authors

SPIŠAKOVÁ Viktória, STOYANOV Radostin, KLUSÁČEK Dalibor, HEJTMÁNEK Lukáš

Year of publication 2026
Type Article in proceedings
Conference Job Scheduling Strategies for Parallel Processing
MU Faculty / Unit

Faculty of Informatics

Keywords Kubernetes; Resource Management; Resource Utilization; In-place Resizing
Description Modern cloud data centers suffer from chronic resource under-utilization. The gap between static resource allocations and dynamic workload demand creates systemic inefficiency that current orchestration platforms fail to address adequately. In this work, we explore resource reclamation strategies in production Kubernetes clusters using two emerging infrastructure-level primitives: in-place resource resizing and transparent checkpoint/restore (C/R). For CPU resources, we analyze a production workload trace, which we release publicly, and reveal significant allocation-utilization gaps. Through trace-driven simulation, we demonstrate that aggressive in-place resizing substantially increases resource utilization but also increases workload evictions. We find a balanced strategy for in-place resizing and identify C/R as the missing primitive that makes aggressive resizing safe by enabling graceful termination and resumable migrations instead of progress loss. For GPU resources, where dynamic resizing is infeasible, we propose a C/R-enabled sharing strategy that allocates reserved-but-idle GPU memory to secondary workloads (flash jobs) with safety guarantees for reclamation. Our work demonstrates how the same infrastructure primitives address resource reclamation across different resource types, each with distinct technical constraints, validated through real production cluster deployments.
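The trade-off between reclaimed capacity and evictions can be illustrated with a toy trace replay. The following is a minimal sketch, not the authors' simulator: the function `simulate_resize`, the `headroom` and `window` parameters, the sliding-window peak policy, and the sample trace are all hypothetical. It also assumes, for simplicity, that any step where usage exceeds the resized allocation counts as an eviction.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResizeResult:
    reclaimed_core_steps: float  # sum over steps of (request - resized allocation)
    evictions: int               # steps where usage exceeded the resized allocation


def simulate_resize(request: float, usage: List[float],
                    headroom: float, window: int = 3) -> ResizeResult:
    """Replay one pod's CPU usage trace under a hypothetical resize policy.

    After each step, the allocation for the next step is set to the peak
    usage over the last `window` steps times (1 + headroom), capped at the
    original request. Usage above the current allocation is counted as an
    eviction (a simplifying assumption), after which the pod is assumed to
    restart with its full request.
    """
    reclaimed = 0.0
    evictions = 0
    alloc = request  # the pod starts with its static request
    for t, used in enumerate(usage):
        if used > alloc:
            evictions += 1
            alloc = request  # assumed: restart at the full request
        reclaimed += request - alloc
        recent = usage[max(0, t - window + 1): t + 1]
        alloc = min(request, max(recent) * (1.0 + headroom))
    return ResizeResult(reclaimed, evictions)


if __name__ == "__main__":
    trace = [0.5, 0.5, 0.6, 2.0, 0.5, 0.5]  # made-up CPU usage in cores
    for label, headroom in [("aggressive", 0.1), ("balanced", 1.0)]:
        result = simulate_resize(4.0, trace, headroom)
        print(label, result.reclaimed_core_steps, result.evictions)
```

On this made-up trace, the smaller headroom reclaims more capacity but is evicted more often, which is the kind of tension a balanced strategy has to resolve. With checkpoint/restore, an eviction of this kind could resume from saved state rather than lose progress.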