High Performance Computing Architectures for Accelerating Data-Intensive Applications in Heterogeneous Environments

Authors

  • Charlotte G. Evie, High Performance Computing (HPC) Architect, Australia

Keywords

High-performance computing, heterogeneous architectures, data-intensive applications, accelerators, memory hierarchy, parallel processing, FPGA, GPU, data movement optimization

Abstract

In the data-centric era, the increasing complexity and scale of data-intensive applications across scientific research, industry, and artificial intelligence necessitate the evolution of high-performance computing (HPC) architectures. Traditional homogeneous systems are rapidly being outpaced by heterogeneous computing environments that integrate CPUs, GPUs, FPGAs, and emerging accelerators. This paper investigates the design and deployment of modern HPC architectures to accelerate data-intensive workloads within heterogeneous ecosystems. We survey recent advances in hardware accelerators, data movement optimization, memory hierarchies, and parallel programming models. Emphasis is placed on architectural strategies that improve throughput, scalability, and energy efficiency. The paper also presents a critical review of the recent literature, identifies key performance bottlenecks, and proposes future directions for next-generation heterogeneous HPC systems.


Published

2025-01-13