Optimizing MPI collectives on Intel MIC through effective use of cache

Pinak Panigrahi; Sriram Kanchiraju; Ashok Srinivasan; Pallav Kumar Baruah; C D Sudheer

doi:10.1109/PDGC.2014.7030721


Conference proceeding


2014 International Conference on Parallel, Distributed and Grid Computing, pp.88-93

International Conference on Parallel, Distributed and Grid Computing (Solan, India, 12/11/2014–12/13/2014)

12/01/2014

DOI: https://doi.org/10.1109/PDGC.2014.7030721


Abstract

The Intel MIC architecture, implemented in the Xeon Phi coprocessor, is targeted at highly parallel applications. In order to exploit it, one needs to make full use of simultaneous multi-threading, which permits four simultaneous threads per core. Our results also show that distributed tag directories can be a greater bottleneck than the ring for small messages when multiple threads access the same cache line. Careful design of algorithms and implementations based on these results can yield substantial performance improvement. We demonstrate these ideas by optimizing MPI collective calls. We obtain a speedup of 9x on barrier and 10x on broadcast when compared with Intel's MPI implementation. We also show the usefulness of our collectives in two realistic codes: particle transport and the load balancing phase in QMC. Another important contribution of our work lies in showing that optimization techniques used with programmer-controlled caches, such as double buffering, are also useful on MIC. These results can help optimize other communication-intensive codes running on MIC.

Details

Title: Optimizing MPI collectives on Intel MIC through effective use of cache
Publication Details: 2014 International Conference on Parallel, Distributed and Grid Computing, pp.88-93
Resource Type: Conference proceeding
Conference: International Conference on Parallel, Distributed and Grid Computing (Solan, India, 12/11/2014–12/13/2014)
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Identifiers: 99380178096306600
Academic Unit: Hal Marcus College of Science and Engineering ; Computer Science
Language: English
