- Article, June 2005
Automatic generation and tuning of MPI collective communication routines
ICS '05: Proceedings of the 19th annual international conference on Supercomputing, Pages 393–402. https://doi.org/10.1145/1088149.1088202
In order for collective communication routines to achieve high performance on different platforms, they must be able to adapt to the system architecture and use different algorithms for different situations. Current Message Passing Interface (MPI) ...
- Article, June 2005
affinity-on-next-touch: increasing the performance of an industrial PDE solver on a cc-NUMA system
ICS '05: Proceedings of the 19th annual international conference on Supercomputing, Pages 387–392. https://doi.org/10.1145/1088149.1088201
The non-uniform memory access times of modern cc-NUMA systems often impair performance for shared memory applications. This is especially true for applications exhibiting complex access patterns. To improve performance, a mechanism for co-locating ...
- Article, June 2005
Disk layout optimization for reducing energy consumption
ICS '05: Proceedings of the 19th annual international conference on Supercomputing, Pages 274–283. https://doi.org/10.1145/1088149.1088186
Excessive power consumption is becoming a major barrier to extracting the maximum performance from high-performance parallel systems. Therefore, techniques oriented towards reducing power consumption of such systems are expected to become increasingly ...
- Article, June 2005
Optimization of MPI collective communication on BlueGene/L systems
- George Almási,
- Philip Heidelberger,
- Charles J. Archer,
- Xavier Martorell,
- C. Chris Erway,
- José E. Moreira,
- B. Steinmacher-Burow,
- Yili Zheng
ICS '05: Proceedings of the 19th annual international conference on Supercomputing, Pages 253–262. https://doi.org/10.1145/1088149.1088183
BlueGene/L is currently the world's fastest supercomputer. It consists of a large number of low power dual-processor compute nodes interconnected by high speed torus and collective networks. Because compute nodes do not have shared memory, MPI is the ...
- Article, June 2005
Generating new general compiler optimization settings
ICS '05: Proceedings of the 19th annual international conference on Supercomputing, Pages 161–168. https://doi.org/10.1145/1088149.1088171
Finding nearly optimal optimization settings for modern compilers which can utilize a large number of optimizations is a combinatorially exponential problem. In this paper, we investigate whether in the presence of many optimization choices random ...
- Article, June 2005
Lightweight reference affinity analysis
ICS '05: Proceedings of the 19th annual international conference on Supercomputing, Pages 131–140. https://doi.org/10.1145/1088149.1088167
Previous studies have shown that array regrouping and structure splitting significantly improve data locality. The most effective technique relies on profiling every access to every data element. The high overhead impedes its adoption in a general ...
- Article, June 2005
Automatic thread distribution for nested parallelism in OpenMP
ICS '05: Proceedings of the 19th annual international conference on Supercomputing, Pages 121–130. https://doi.org/10.1145/1088149.1088166
OpenMP is becoming the standard programming model for shared-memory parallel architectures. One of the most interesting features in the language is the support for nested parallelism. Previous research and parallelization experiences have shown the ...
- Article, June 2005
A hybrid hardware/software approach to efficiently determine cache coherence bottlenecks
ICS '05: Proceedings of the 19th annual international conference on Supercomputing, Pages 21–30. https://doi.org/10.1145/1088149.1088153
High-end computing increasingly relies on shared-memory multiprocessors (SMPs), such as clusters of SMPs, nodes of chip-multiprocessors (CMP) or large-scale single-system image (SSI) SMPs. In such systems, performance is often affected by the sharing ...