Time Predictable Modeling Method for GPU Architecture with SIMT and Cache Miss Awareness

Keywords

Heterogeneous computing
GPU
Architecture modeling
Time predictability

DOI

10.26689/jera.v8i2.6323

Submitted : 2024-02-28
Accepted : 2024-03-14
Published : 2024-03-29

Abstract

Graphics Processing Units (GPUs) are used to accelerate computing-intensive tasks such as neural networks, data analysis, and high-performance computing. Over the past decade, researchers have proposed a variety of theories and methods for studying the microarchitectural characteristics of different GPUs. In this study, the GPU serves as a co-processor that works alongside the CPU in an embedded real-time system to handle computationally intensive tasks. Building on prior work, the study models the GPU architecture and refines the model by accounting for the SIMT execution mechanism and cache-miss behavior, yielding a more detailed analysis of the architecture. To verify the proposed GPU architecture model, experiments were performed with 10 GPU kernel tasks on an Nvidia GPU device. The results showed that the error between the kernel execution times predicted by the proposed model and the measured execution times ranged from a minimum of 3.80% to a maximum of 8.30%.
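
As a rough, illustrative companion to the abstract, the sketch below shows how a simplified analytical timing estimate for a SIMT device, with a cache-miss-weighted memory latency term, can be compared against a measured kernel time to produce the kind of relative error reported above (3.80% to 8.30%). All names, parameters, and the timing formula itself (predict_kernel_time, miss_penalty, active_warps_per_sm, and so on) are assumptions made for illustration and do not reproduce the model proposed in the article.

# Hedged sketch: a simplified analytical GPU kernel-time estimate and the
# relative-error check used to compare a prediction with a measurement.
# The names and the timing formula are illustrative assumptions, not the
# model proposed in the article.

def predict_kernel_time(n_warps, compute_cycles_per_warp, mem_accesses_per_warp,
                        miss_rate, hit_latency, miss_penalty, active_warps_per_sm,
                        n_sms, clock_ghz):
    """Estimate kernel execution time (ms) for a SIMT device.

    Assumes warps are spread evenly across SMs and that memory latency is
    partially hidden by the active warps on each SM (a crude stand-in for
    SIMT latency hiding and cache-miss effects).
    """
    # Average memory latency per access, weighted by the cache miss rate.
    avg_mem_latency = (1 - miss_rate) * hit_latency + miss_rate * miss_penalty

    # Cycles one warp spends on compute plus (partially hidden) memory stalls.
    mem_cycles = mem_accesses_per_warp * avg_mem_latency / max(active_warps_per_sm, 1)
    cycles_per_warp = compute_cycles_per_warp + mem_cycles

    # Warps execute in rounds across the SMs of the device (ceiling division).
    rounds = -(-n_warps // (n_sms * active_warps_per_sm))
    total_cycles = rounds * cycles_per_warp
    return total_cycles / (clock_ghz * 1e6)  # cycles -> milliseconds


def relative_error(predicted_ms, measured_ms):
    """Relative prediction error (%), the metric reported in the abstract."""
    return abs(predicted_ms - measured_ms) / measured_ms * 100.0


if __name__ == "__main__":
    pred = predict_kernel_time(n_warps=4096, compute_cycles_per_warp=1200,
                               mem_accesses_per_warp=64, miss_rate=0.25,
                               hit_latency=30, miss_penalty=400,
                               active_warps_per_sm=32, n_sms=20, clock_ghz=1.5)
    print(f"predicted: {pred:.3f} ms, error vs. a 2.0 ms measurement: "
          f"{relative_error(pred, 2.0):.2f}%")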
