Acceleration of dRMSD Calculation and Efficient Usage of GPU Caches
| Title in Czech | Akcelerace dRMSD výpočtu a efektivní užití GPU cache |
|---|---|
| Authors | |
| Year of publication | 2015 |
| Type | Article in proceedings |
| Conference | Proceedings of IEEE International Conference on High Performance Computing & Simulation |
| MU Faculty or unit | |
| Citation | |
| DOI | https://doi.org/10.1109/HPCSim.2015.7237020 |
| Field | Informatics |
| Keywords | RMSD; GPU; code optimization; cache |
| Description | In this paper, we introduce a GPU acceleration of the dRMSD algorithm, which is used to compare different structures of a molecule. Compared to a multithreaded CPU implementation, we have reached a 13.4x speedup in clustering and a 62.7x speedup in 1:1 dRMSD computation using a mid-range GPU. The dRMSD computation exhibits strong memory locality and thus is compute-bound. Alongside a conservative implementation using shared memory, we have implemented variants of the algorithm that rely on GPU caches to maintain memory locality. Our cache-based implementation reaches 96.5 % and 91.6 % of the shared memory performance on Fermi and Maxwell, respectively. We have identified several performance pitfalls related to cache blocking in compute-bound codes and suggested optimization techniques to improve performance. |
| Related projects | |
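
For readers unfamiliar with the measure named in the description, the following is a minimal CPU reference sketch of a 1:1 dRMSD computation. It assumes the commonly used definition (root mean square of the differences between corresponding intramolecular pair distances, normalized by the number of atom pairs); the paper's exact normalization and its GPU kernels are not reproduced here, and the function name `drmsd` is chosen only for illustration.

```python
import math


def drmsd(a, b):
    """Distance RMSD between two conformations of the same molecule.

    `a` and `b` are equal-length sequences of (x, y, z) atom coordinates.
    Assumed definition: sqrt(mean over pairs i<j of (dA_ij - dB_ij)^2).
    """
    if len(a) != len(b):
        raise ValueError("conformations must have the same number of atoms")
    n = len(a)
    if n < 2:
        raise ValueError("need at least two atoms")

    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Compare the i-j distance inside each conformation.
            diff = math.dist(a[i], a[j]) - math.dist(b[i], b[j])
            total += diff * diff

    pairs = n * (n - 1) // 2  # number of unordered atom pairs
    return math.sqrt(total / pairs)
```

Each atom position is reused in n - 1 pair distances, so the arithmetic grows quadratically while the input grows only linearly. This is consistent with the strong memory locality the description mentions, and it is why keeping atom coordinates in shared memory or cache pays off.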