DOI: 10.1145/1128022.1128066

Lazy direct-to-cache transfer during receive operations in a message passing environment

Published: 03 May 2006

Abstract

This work focuses on techniques that promise to reduce message delivery latency in Message Passing Interface (MPI) environments. The main contributors to this latency are the copying operations needed to transfer and bind a received message to the consuming process or thread. To reduce this copying overhead and to move toward finer granularity, we introduce architectural extensions comprising a specialized network cache and instructions to manage its operations. We study the caching environment and evaluate a new technique called Lazy Direct-to-Cache Transfer (DTCT). Our simulations show that messages can be bound and kept in a network cache, where they persist long enough to be consumed. We also demonstrate that lazy DTCT significantly reduces access latency in I/O-intensive environments such as message passing configurations and SMPs, without polluting the data cache.

    Published In

    CF '06: Proceedings of the 3rd conference on Computing frontiers
    May 2006
    430 pages
    ISBN: 1595933026
    DOI: 10.1145/1128022

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. MPI
    2. latency hiding
    3. network cache

    Qualifiers

    • Article

    Conference

    CF06: Computing Frontiers Conference
    May 3 - 5, 2006
    Ischia, Italy

    Acceptance Rates

    Overall Acceptance Rate 273 of 785 submissions, 35%

    Cited By

    • (2015) Stream my models. Proceedings of the 18th International Conference on Model Driven Engineering Languages and Systems, pp. 80-89. DOI: 10.5555/3351736.3351750. Online publication date: 30-Sep-2015
    • (2011) Single-port and multi-port collective communication operations on single and dual Cell BE processor systems. International Journal of Communication Networks and Distributed Systems, 6:4, pp. 373-391. DOI: 10.1504/IJCNDS.2011.040559. Online publication date: 1-Jun-2011
    • (2009) Hiding message delivery latency using Direct-to-Cache-Transfer techniques in message passing environments. Microprocessors & Microsystems, 33:7-8, pp. 430-440. DOI: 10.1016/j.micpro.2009.07.001. Online publication date: 1-Oct-2009
    • (2008) Extended characterization of DMA transfers on the Cell BE processor. 2008 IEEE International Symposium on Parallel and Distributed Processing, pp. 1-8. DOI: 10.1109/IPDPS.2008.4536190. Online publication date: Apr-2008
    • (2007) Comparing direct-to-cache transfer policies to TCP/IP and M-VIA during receive operations in MPI environments. Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications, pp. 208-222. DOI: 10.5555/2395970.2395994. Online publication date: 29-Aug-2007
    • (2007) Using the Cell Processor As a Network Assist to Minimize Latency. 2007 Canadian Conference on Electrical and Computer Engineering, pp. 936-939. DOI: 10.1109/CCECE.2007.239. Online publication date: Apr-2007
    • (2007) Comparing Direct-to-Cache Transfer Policies to TCP/IP and M-VIA During Receive Operations in MPI Environments. Parallel and Distributed Processing and Applications, pp. 208-222. DOI: 10.1007/978-3-540-74742-0_21. Online publication date: 2007
