The Cell is a heterogeneous multicore processor that has attracted much attention in the HPC community. The bulk of the computational workload on the Cell processor is carried by eight co-processors called SPEs. The SPEs are connected to each other and to main memory by a high-speed bus called the element interconnect bus (EIB), which is capable of 204.8 GB/s. However, access to main memory is limited to 25.6 GB/s by the performance of the memory interface controller (MIC). It is therefore advantageous to structure algorithms so that SPEs communicate directly with one another over the EIB and make less use of main memory. We show that, in many realistic communication patterns, the actual bandwidth obtained for inter-SPE communication is strongly influenced by the assignment of threads to SPEs (thread-SPE affinity). We identify the bottlenecks to optimal performance and use this information to determine good affinities for common communication patterns. Our solutions improve performance by up to a factor of two over the default assignment. We also discuss the optimization of affinity on a Cell blade consisting of two Cell processors, and provide a software tool to help with this. Our results will help Cell application developers choose good affinities for their applications.
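To illustrate why thread-SPE affinity can matter, the following is a minimal toy sketch and not the authors' method or tool. It treats the interconnect as a single bidirectional ring of eight slots with shortest-path routing, which is a simplifying assumption and not a description of the EIB's actual structure. The names `shortest_path_links`, `max_link_load`, `identity`, and `scattered` are hypothetical. The sketch counts how many transfers share the busiest link when each thread sends to its successor.

```python
N = 8  # number of SPEs (from the abstract); the single-ring layout is an assumption


def shortest_path_links(src, dst, n=N):
    """Directed links used when going from slot src to slot dst the shorter way round."""
    clockwise = (dst - src) % n
    step = 1 if clockwise <= n - clockwise else -1
    links = []
    pos = src
    while pos != dst:
        nxt = (pos + step) % n
        links.append((pos, nxt))
        pos = nxt
    return links


def max_link_load(affinity, pattern):
    """affinity[t] is the ring slot of thread t; pattern lists (sender, receiver) threads."""
    load = {}
    for sender, receiver in pattern:
        for link in shortest_path_links(affinity[sender], affinity[receiver]):
            load[link] = load.get(link, 0) + 1
    return max(load.values())


# Ring communication pattern: thread t sends to thread t + 1 (wrapping around).
ring_pattern = [(t, (t + 1) % N) for t in range(N)]

identity = list(range(N))              # threads placed in slot order
scattered = [0, 4, 1, 5, 2, 6, 3, 7]   # an arbitrary, less favourable placement

print(max_link_load(identity, ring_pattern))
print(max_link_load(scattered, ring_pattern))
```

In this toy model the in-order placement puts one transfer on each link, while the scattered placement puts four transfers on a single link. Real bandwidth depends on hardware details that this sketch does not model.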
Details
Title
Optimizing Assignment of Threads to SPEs of the Cell BE Processor
Publication Details
2009 IEEE International Symposium on Parallel & Distributed Processing
Resource Type
Conference proceeding
Conference
Tenth IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC) (Rome, Italy, 05/23/2009–05/29/2009)
Publisher
Institute of Electrical and Electronics Engineers (IEEE); https://doi.org/10.1109/IPDPS14427.2009