UC Irvine Electronic Theses and Dissertations

Exascalable Communication for Modern Supercomputing

Abstract

Supercomputing applications rely on strong scaling to achieve faster results on a larger number of processing units. At the strong-scaling limit, however, where communication constitutes a relatively large portion of an application's runtime, today's state-of-the-art hybrid MPI+threads applications perform slower than their traditional MPI-everywhere counterparts. This slowdown stems primarily from the supercomputing community's outdated view of the network as a single device. The NICs of modern interconnects feature multiple network hardware contexts, yet these parallel interfaces into the network go unused in MPI+threads applications today because MPI libraries still rely on conservative approaches to maintain MPI's ordering constraints. MPI libraries do so because domain scientists do not adequately expose logically parallel communication in their multithreaded MPI code, even though the existing MPI standard provides them with opportunities to do so. Only when domain scientists and MPI developers take a step forward together can we eliminate the communication bottleneck in MPI+threads applications.
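
To make this concrete, the following is a minimal sketch, not drawn from the dissertation, of one way a multithreaded MPI application can expose logically parallel communication using facilities already in the MPI standard: each thread communicates on its own duplicated communicator, so the MPI library is free to relax cross-thread ordering and map threads onto separate network hardware contexts. The thread count, buffer size, tag, and ring-exchange pattern are illustrative assumptions, not the application designs studied in this work.

```c
/* Hypothetical sketch: exposing logically parallel communication by
 * giving each thread its own communicator (independent ordering domain).
 * Assumes an MPI library supporting MPI_THREAD_MULTIPLE and OpenMP. */
#include <mpi.h>
#include <omp.h>

#define NTHREADS 4
#define COUNT    1024

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* One communicator per thread: no ordering constraints across threads. */
    MPI_Comm comms[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        MPI_Comm_dup(MPI_COMM_WORLD, &comms[i]);

    int dst = (rank + 1) % size;          /* illustrative ring exchange */
    int src = (rank - 1 + size) % size;

    #pragma omp parallel num_threads(NTHREADS)
    {
        int tid = omp_get_thread_num();
        double sendbuf[COUNT], recvbuf[COUNT];
        for (int i = 0; i < COUNT; i++) sendbuf[i] = rank + tid;

        /* Each thread's traffic is independent of the others', which the
         * library can route through a separate network hardware context. */
        MPI_Request reqs[2];
        MPI_Isend(sendbuf, COUNT, MPI_DOUBLE, dst, 0, comms[tid], &reqs[0]);
        MPI_Irecv(recvbuf, COUNT, MPI_DOUBLE, src, 0, comms[tid], &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    }

    for (int i = 0; i < NTHREADS; i++)
        MPI_Comm_free(&comms[i]);
    MPI_Finalize();
    return 0;
}
```

Separate tags, windows, or the newer partitioned-communication interfaces can serve the same purpose; the key point is that the application, not the library, must declare which operations are logically independent.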

This dissertation eliminates the communication bottleneck by bridging the two ends of the HPC stack, MPI library developers and domain experts, who typically do not talk to each other directly. Through collaborations with systems researchers and MPI library developers, we develop a fast MPI+threads library whose communication throughput scales similarly to that of MPI everywhere, making high-speed multithreaded communication a reality. Through collaborations with domain scientists, we use various designs to expose logically parallel communication to the fast MPI+threads library in exemplar applications targeted at upcoming exascale systems. Our conversations with the end users, the domain experts, educate us on the usability of these designs. Hence, in addition to comparing the performance of the designs, we discuss their strengths and limitations and provide a design recommendation for the supercomputing community. Through such collaborations on both ends of the HPC stack, we unlock the true potential of the MPI+threads programming model. Prominent modern applications and computational frameworks, such as Uintah, WOMBAT, and Legion, now perform significantly faster (up to 2x) at the strong-scaling limit.
