Automatic tuning based on hardware performance counters and machine learning

Authors

GEVORGYAN Suren Harutyunyan; CESAR Eduardo; SIKORA Anna; FILIPOVIČ Jiří; ALCARAZ Jordi

Year of publication 2026
Type Article in a scholarly periodical
Journal / Source FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE
Faculty / MU unit

Institute of Computer Science

Citation
www https://www.sciencedirect.com/science/article/pii/S0167739X25006521
DOI https://doi.org/10.1016/j.future.2025.108358
Keywords Hardware performance counters; Automatic dimension reduction; Machine learning ensembles; Tuning parameter optimization; Parallel region classification
Description This paper presents a Machine Learning (ML) methodology for automatically tuning parallel applications in heterogeneous High Performance Computing (HPC) environments using Hardware Performance Counters (HwPCs). The methodology addresses three critical challenges: the tradeoff between counter quantity and accessibility, the complexity of data interpretation, and the need for dynamic optimization. The introduced ensemble-based methodology automatically identifies minimal yet informative HwPC sets for code region identification and tuning parameter optimization. Experimental validation demonstrates high accuracy in predicting optimal thread allocation (> 0.90 K-fold accuracy) and thread affinity (> 0.95 accuracy) while requiring only 4-6 HwPCs. Compared to search-based methods like OpenTuner, the methodology achieves competitive performance with dramatically reduced optimization time. The architecture-agnostic design enables consistent performance across CPU and GPU platforms. These results establish a foundation for efficient, portable, automatic, and scalable tuning of parallel applications.
