2015 IEEE 22nd International Conference on High Performance Computing (HiPC), pp.165-173
International Conference on High Performance Computing, 22nd (Bengaluru, India, 12/16/2015–12/19/2015)
01/01/2015
Abstract
POWER8 is a new generation of POWER processor capable of 8-way simultaneous multi-threading per core. It integrates high-performance computing capabilities, such as a high degree of instruction-level and thread-level parallelism, with a deep memory hierarchy. Fine-grained parallel applications running on such architectures often rely on an efficient barrier implementation for synchronization. We present a variety of barrier implementations for a 4-chip POWER8 node, each optimized on the basis of a careful study of the POWER8 memory subsystem. Our best implementation achieves barrier times one to two orders of magnitude lower than those of the current MPI-based and POSIX-threads-based barrier implementations on POWER8. Beyond providing efficient barrier implementations, this work demonstrates how certain features of the memory subsystem, such as NUMA access to remote L3 cache and the impact of prefetching, can be used to design efficient primitives on POWER8.
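For readers unfamiliar with how a shared-memory barrier works, the following is a minimal sketch of a generic centralized sense-reversing barrier written with C11 atomics and POSIX threads. It is a common textbook design and is not the implementation from this paper. The padding constant `LINE` assumes POWER8's 128-byte cache line, and every name in the code (`sr_barrier`, `sr_barrier_wait`, `run_barrier_demo`, and so on) is hypothetical.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 128-byte cache lines (POWER8). Aligning each field to a line
   keeps the arrival counter and the release flag from sharing a line. */
#define LINE 128
#define MAX_THREADS 64

typedef struct {
    _Alignas(LINE) atomic_int count;  /* threads still to arrive this episode */
    _Alignas(LINE) atomic_bool sense; /* flipped by the last arriving thread  */
    int nthreads;
} sr_barrier;

static void sr_barrier_init(sr_barrier *b, int n) {
    atomic_store_explicit(&b->count, n, memory_order_relaxed);
    atomic_store_explicit(&b->sense, false, memory_order_relaxed);
    b->nthreads = n;
}

/* local_sense is per-thread state that flips on every barrier episode. */
static void sr_barrier_wait(sr_barrier *b, bool *local_sense) {
    *local_sense = !*local_sense;
    if (atomic_fetch_sub_explicit(&b->count, 1, memory_order_acq_rel) == 1) {
        /* Last arriver: reset the counter, then release all waiters. */
        atomic_store_explicit(&b->count, b->nthreads, memory_order_relaxed);
        atomic_store_explicit(&b->sense, *local_sense, memory_order_release);
    } else {
        /* Spin on the shared flag; yield so oversubscribed runs still progress. */
        while (atomic_load_explicit(&b->sense, memory_order_acquire) != *local_sense)
            sched_yield();
    }
}

/* ---- Hypothetical self-check: every thread must see all round-r writes. ---- */
static sr_barrier demo_barrier;
static atomic_int slots[MAX_THREADS];
static atomic_int violations;
static int demo_nthreads, demo_rounds;

static void *worker(void *arg) {
    int id = (int)(intptr_t)arg;
    bool local_sense = false;
    for (int r = 1; r <= demo_rounds; r++) {
        atomic_store_explicit(&slots[id], r, memory_order_relaxed);
        sr_barrier_wait(&demo_barrier, &local_sense); /* all writes done */
        for (int i = 0; i < demo_nthreads; i++)
            if (atomic_load_explicit(&slots[i], memory_order_relaxed) != r)
                atomic_fetch_add(&violations, 1);
        sr_barrier_wait(&demo_barrier, &local_sense); /* all checks done */
    }
    return NULL;
}

/* Returns the number of ordering violations observed (0 if the barrier
   works), or -1 if nthreads is out of range. */
static int run_barrier_demo(int nthreads, int rounds) {
    pthread_t tids[MAX_THREADS];
    if (nthreads < 1 || nthreads > MAX_THREADS) return -1;
    demo_nthreads = nthreads;
    demo_rounds = rounds;
    atomic_store(&violations, 0);
    for (int i = 0; i < nthreads; i++) atomic_store(&slots[i], 0);
    sr_barrier_init(&demo_barrier, nthreads);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, (void *)(intptr_t)i);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return atomic_load(&violations);
}
```

In this design every waiting thread spins on the same `sense` cache line, so each release invalidates that line in every spinning core's cache, including cores on remote chips of a multi-chip node. This is the kind of memory-subsystem cost that a design informed by the NUMA and prefetching behavior mentioned above would try to reduce.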
Details
Title
Efficient Barrier Implementation on the POWER8 Processor
Resource Type
Conference proceeding
Publisher
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)