Understanding the Importance of Interactions among Job Scheduling Policies

Authors

TÓTH Šimon, KLUSÁČEK Dalibor

Year of publication 2014
Type Article in Proceedings
Conference Memics 2014
MU Faculty or unit

Faculty of Informatics

Field Informatics
Keywords Scheduling; Queues; Fairshare; Simulation
Description Many studies over the past two decades have focused on the problem of efficient job scheduling in large computational systems. While many new scheduling algorithms have been proposed, mainstream resource management systems and schedulers still use only a limited set of scheduling policies. For example, the core of the system is generally based on the simple First Come First Served (FCFS) approach, while backfilling (a trivial optimization of FCFS that increases utilization) is typically the most advanced option available. Since backfilling was proposed in 1995, there is evidently some misunderstanding between the research community and system administrators about what is really important. In this work, recently presented at the Euro-Par conference, we show that the problem of operating a production scheduler is far more complex than just choosing a proper scheduling algorithm. Using our experience from the Czech National Grid Infrastructure MetaCentrum, we explain several additional challenges that appear when searching for a functional solution. These problems stem from the fact that real systems must meet far more complicated requirements than those typically considered in classical research papers. In fact, production systems need to balance various policies that are put in place to satisfy both resource providers and users. While many works address these policies separately, e.g., fairshare for fair resource allocation, the complex interactions between policies are not properly discussed in the literature. In our work we describe how to approach these interactions when developing site-specific policies. Notably, we describe how (priority) queues interact with scheduling algorithms, with fairshare, and with anti-starvation mechanisms. Moreover, we present a case study describing how detailed simulations were used to find a new configuration for MetaCentrum, significantly increasing its performance.
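To make the difference between FCFS and backfilling mentioned in the description concrete, here is a minimal sketch in Python. It is an illustration only, not the scheduler used in MetaCentrum or the simulator from the paper. The function name `simulate`, the job tuples, and the 4-CPU cluster are hypothetical. The sketch assumes that all jobs are submitted at time 0, that runtimes are known exactly, and that no job requests more CPUs than the cluster has. The backfilling variant reserves a start time only for the blocked job at the head of the queue, which is one common form of backfilling.

```python
import heapq


def simulate(jobs, total_cpus, backfill):
    """Return {job name: start time} for jobs that are all submitted at time 0.

    jobs: list of (name, cpus, runtime) tuples in queue (arrival) order.
    Assumes unique job names and cpus <= total_cpus for every job.
    """
    queue = list(jobs)
    running = []  # min-heap of (end_time, cpus) for jobs currently running
    free = total_cpus
    now = 0
    starts = {}

    def start(job):
        nonlocal free
        name, cpus, runtime = job
        starts[name] = now
        free -= cpus
        heapq.heappush(running, (now + runtime, cpus))

    while queue:
        # Plain FCFS: start jobs strictly in queue order while the head fits.
        while queue and queue[0][1] <= free:
            start(queue.pop(0))
        if not queue:
            break
        if backfill:
            # Find the earliest time ("shadow") at which the blocked head job
            # can start, and how many CPUs will be left over at that time.
            head_cpus = queue[0][1]
            avail, shadow = free, now
            for end_time, cpus in sorted(running):
                avail += cpus
                shadow = end_time
                if avail >= head_cpus:
                    break
            extra = avail - head_cpus
            # A later job may jump ahead only if it cannot delay the head job:
            # it either finishes before the shadow time or fits in the extra CPUs.
            for job in list(queue[1:]):
                _, cpus, runtime = job
                if cpus > free:
                    continue
                if now + runtime <= shadow:
                    queue.remove(job)
                    start(job)
                elif cpus <= extra:
                    extra -= cpus
                    queue.remove(job)
                    start(job)
        # Advance time to the next completion and release all CPUs freed then.
        now, cpus = heapq.heappop(running)
        free += cpus
        while running and running[0][0] == now:
            free += heapq.heappop(running)[1]
    return starts


# Hypothetical workload on a hypothetical 4-CPU cluster.
jobs = [("A", 2, 10), ("B", 4, 5), ("C", 2, 8), ("D", 2, 20)]
print("FCFS:    ", simulate(jobs, 4, backfill=False))
print("Backfill:", simulate(jobs, 4, backfill=True))
```

In this toy workload, plain FCFS leaves two CPUs idle while job B waits for the whole cluster, so C cannot start until time 15. With backfilling, C starts at time 0 because it finishes before B's reserved start at time 10, and B is not delayed. D is too long to be backfilled and still starts at time 15. The sketch deliberately ignores queues, fairshare, and anti-starvation mechanisms, which are the interacting policies the description says matter in production.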