Dr. Ashok Srinivasan

Professor, Hal Marcus College of Science and Engineering

Supercomputing

High Performance Computing

Mathematical Software

Applications in Computational Sciences

Supercomputing for Public Policy Analysis

Scalable Algorithms

Book chapter

Random number generators for parallel applications

by Ashrok Srinivasan, David M. Ceperley and Michael Mascagni

Published 2009

Advances in Chemical Physics, Volume 105: Monte Carlo Methods in Chemical Physics, 1 - 30

Book chapter Peer reviewed

A Synchronous Mode MPI Implementation on the Cell BETM Architecture

by Murali Krishna, Arun Kumar, Naresh Jayam, Ganapathy Senthilkumar, Pallav K. Baruah, Raghunath Sharma, Shakti Kapoor and Ashok Srinivasan

Published 2007

Parallel and Distributed Processing and Applications: 5th International Symposium, ISPA 2007, Niagara Falls, Canada, August 29-31, 2007, Proceedings, 982 - 991

The Cell Broadband Engine shows much promise in high performance computing applications. The Cell is a heterogeneous multi-core processor, with the bulk of the computational work load meant to be borne by eight co-processors called SPEs. Each SPE operates on a distinct 256 KB local store, and all the SPEs also have access to a shared 512 MB to 2 GB main memory through DMA. The unconventional architecture of the SPEs, and in particular their small local store, creates some programming challenges. We have provided an implementation of core features of MPI for the Cell to help deal with this. This implementation views each SPE as a node for an MPI process, with the local store used as if it were a cache. In this paper, we describe synchronous mode communication in our implementation, using the rendezvous protocol, which makes MPI communication for long messages efficient. We further present experimental results on the Cell hardware, where it demonstrates good performance, such as throughput up to 6.01 GB/s and latency as low as 0.65 μs on the pingpong test. This demonstrates that it is possible to efficiently implement MPI calls even on the simple SPE cores.

Book chapter Peer reviewed

Optimization of Collective Communication in Intra-cell MPI

by M. K. Velamati, A. Kumar, N. Jayam, G. Senthilkumar, P. K. Baruah, R. Sharma, S. Kapoor and A. Srinivasan

Published 2007

High Performance Computing – HiPC 2007: 14th International Conference, Goa, India, December 18-21, 2007, Proceedings, 488 - 499

The Cell is a heterogeneous multi-core processor, which has eight co-processors, called SPEs. The SPEs can access a common shared main memory through DMA, and each SPE can directly operate on a small distinct local store. An MPI implementation can use each SPE as if it were a node for an MPI process. In this paper, we discuss the efficient implementation of collective communication operations for intra-Cell MPI, both for cores on a single chip, and for a Cell blade. While we have implemented all the collective operations, we describe in detail the following: barrier, broadcast, and reduce. The main contributions of this work are (i) describing our implementation, which achieves low latencies and high bandwidths using the unique features of the Cell, and (ii) comparing different algorithms, and evaluating the influence of the architectural features of the Cell processor on their effectiveness.

Book chapter Peer reviewed

A buffered-mode MPI implementation for the cell BE (TM) processor

by Arun Kumar, Ganapathy Senthilkumar, Murali Krishna, Naresh Jayam, Pallav K. Baruah, Raghunath Sharma, Ashok Srinivasan and Shakti Kapoor

Published 01/01/2007

Computational Science – ICCS 2007: 7th International Conference, Beijing China, May 27-30, 2007, Proceedings, Part I, 4487, 603 - 610

The Cell Broadband Engine (TM) is a heterogeneous multi-core architecture developed by IBM, Sony and Toshiba. It has eight computation intensive cores (SPEs) with a small local memory, and a single PowerPC core. The SPEs have a total peak single precision performance of 204.8 Gflops/s, and 14.64 Gflops/s in double precision. Therefore, the Cell has a good potential for high performance computing. But the unconventional architecture makes it difficult to program. We propose an implementation of the core features of MPI as a solution to this problem. This can enable a large class of existing applications to be ported to the Cell. Our MPI implementation attains bandwidth up to 6.01 GB/s, and latency as small as 0.41 mu s. The significance of our work is in demonstrating the effectiveness of intra-Cell MPI, consequently enabling the porting of MPI applications to the Cell with minimal effort.

Book chapter Peer reviewed

Domain decomposition for particle methods on the sphere

by Ömer Eğecioğlu and Ashok Srinivasan

Published 06/16/2005

Parallel Algorithms for Irregularly Structured Problems: Third International Workshop, IRREGULAR '96, Santa Barbara, CA, USA, August 19 - 21, 1996. Proceedings, 119 - 130

We present an algorithm for efficient parallelization of particle methods when the domain is the surface of a sphere. Such applications typically arise when dealing with directional data. We propose a domain decomposition scheme based on geometric partitioning that provides domains suitable for practical implementation. This algorithm has the advantage of being fast enough to be applied dynamically, and at the same time provides good partitions, comparable in quality to those produced by spectral graph partitioning schemes.

Book chapter Peer reviewed

Parallel Simulation of Carbon Nanotube Based Composites

by Jyoti Kolhe, Usha Chandra, Sirish Namilae, Ashok Srinivasan and Namas Chandra

Published 2004

High Performance Computing - HiPC 2004: 11th International Conference, Bangalore, India, December 19-22, 2004, Proceedings, 211

Computational simulation plays a vital role in nanotechnology. Molecular dynamics (MD) is an important computational method to understand the fundamental behavior of nanoscale systems, and to transform that understanding into useful products. MD computations, however, are severely restricted by the spatial and temporal scales of simulations. This paper describes the methods used to achieve effective spatial parallelization of a MD code that is based on a multi-body bond order potential. The material system studied here is a carbon nanotube (CNT). We discuss the scientific and computational issues in the development and implementation of parallel algorithms, when the domain needs to be discretized with fine granularity. Specific issues in terms of neighbor-list computation, communication reduction, and cache awareness are delineated, with corresponding benefit in terms of speed up.Important practical problems relevant to CNT based composites are studied, and the effectiveness of various strategies reported. Our implementation achieves efficient parallelization at a finer granularity compared with published works on CNTs with complex configurations.

Book chapter Peer reviewed

Improved Monte Carlo Linear Solvers Through Non-diagonal Splitting

by Ashok Srinivasan and Vkram Aggarwal

Published 2003

Computational Science and Its Applications — ICCSA 2003: International Conference, Montreal, Canada, May 18-21, 2003, Proceedings, Part III, 168 - 177

Fast, but approximate, solutions to linear algebra problems have many potential applications, such as in graph partitioning, preconditioning, information retrieval, etc. Monte Carlo techniques appear attractive for such needs. While Monte Carlo linear solvers have a long history, their application has been limited due to slow convergence. Despite the development of techniques to improve their accuracy, current methods suffer from the drawback that they are stochastic realizations of inherently poor iterative methods. The reason for such choices is the need for efficient Monte Carlo implementation, which has restricted the splittings that are considered. However, in this paper we demonstrate that such restrictions are not necessarily required, and that efficient Monte Carlo implementations are possible even with splittings that do not appear amenable to it.

Book chapter Peer reviewed

Monte Carlo techniques for estimating the Fiedler vector in graph applications

by Ashok Srinivasan and Michael Mascagni

Published 2002

Computational Science — ICCS 2002: International Conference Amsterdam, The Netherlands, April 21–24, 2002 Proceedings, Part II, 2330, 2, 635 - 645

Determining the Fiedler vector of the Laplacian or adjacency matrices of graphs is the most computationally intensive component of several applications, such as graph partitioning, graph coloring, envelope reduction, and seriation. Often an approximation of the Fiedler vector is sufficient. We discuss issues involved in the use of Monte Carlo techniques for this purpose.

Book chapter Peer reviewed

A compact storage scheme for fast wavelet-based subregion retrieval

by A. S Poulakidas, A Srinivasan, O Egecioglu, O Ibarra and T Yang

Published 1997

Computing and Combinatorics: Third Annual International Conference, COCOON '97, Shanghai, China, August 20-22, 1997. Proceedings, 1276, 353 - 362

Home Computing and Combinatorics Conference paper
A compact storage scheme for fast wavelet-based subregion retrieval
Session 11: Algorithms and Applications

A. S. Poulakidas, A. Srinivasan, O. Egecioglu, O. Ibarra & T. Yang
Conference paper
First Online: 01 January 2006
94 Accesses

1 Citations

Part of the Lecture Notes in Computer Science book series (LNCS,volume 1276)

Abstract
In an image browsing environment there is need for progressively viewing image subregions at various resolutions. We describe a storage scheme that accomplishes good image compression, while supporting fast image subregion retrieval. We evaluate analytically and experimentally the compression performance of our algorithm. We also provide results on the speed of the algorithm to demonstrate its effectiveness, and present an extension to a client/server environment.

Dr. Ashok Srinivasan

Professor, Hal Marcus College of Science and Engineering

List of works

University of West Florida Social media