Automatic tuning based on hardware performance counters and machine learning
| Authors | |
|---|---|
| Year of publication | 2026 |
| Type | Article in a scholarly periodical |
| Journal / Source | FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE |
| MU Faculty or unit | |
| Citation | |
| www | https://www.sciencedirect.com/science/article/pii/S0167739X25006521 |
| DOI | https://doi.org/10.1016/j.future.2025.108358 |
| Keywords | Hardware performance counters; Automatic dimension reduction; Machine learning ensembles; Tuning parameter optimization; Parallel region classification |
| Description | This paper presents a Machine Learning (ML) methodology for automatically tuning parallel applications in heterogeneous High Performance Computing (HPC) environments using Hardware Performance Counters (HwPCs). The methodology addresses three critical challenges: the tradeoff between counter quantity and accessibility, the complexity of data interpretation, and the need for dynamic optimization. The introduced ensemble-based methodology automatically identifies minimal yet informative HwPC sets for code region identification and tuning parameter optimization. Experimental validation demonstrates high accuracy in predicting optimal thread allocation (> 0.90 k-fold accuracy) and thread affinity (> 0.95 accuracy) while requiring only 4-6 HwPCs. Compared to search-based methods such as OpenTuner, the methodology achieves competitive performance with dramatically reduced optimization time. The architecture-agnostic design enables consistent performance across CPU and GPU platforms. These results establish a foundation for efficient, portable, automatic, and scalable tuning of parallel applications. |
| Related projects: |