Lazy release consistency for hardware-coherent multiprocessors

Published: 08 December 1995

Abstract

Release consistency is a widely accepted memory model for distributed shared memory systems. Eager release consistency represents the state of the art in release-consistent protocols for hardware-coherent multiprocessors, while lazy release consistency has been shown to provide better performance for software distributed shared memory (DSM). Several of the optimizations performed by lazy protocols have the potential to improve the performance of hardware-coherent multiprocessors as well, but their complexity has precluded a hardware implementation. With the advent of programmable protocol processors, it may become possible to use them after all. We present and evaluate a lazy release-consistent protocol suitable for machines with dedicated protocol processors. This protocol admits multiple concurrent writers, sends write notices concurrently with computation, and delays invalidations until acquire operations. We also consider a lazier protocol that delays sending write notices until release operations. Our results indicate that the first protocol outperforms eager release consistency by as much as 20% across a variety of applications. The lazier protocol, on the other hand, is unable to recoup its high synchronization overhead. This represents a qualitative shift from the DSM world, where lazier protocols always yield performance improvements. Based on our results, we conclude that machines with flexible hardware support for coherence should use protocols based on lazy release consistency, but in a less "aggressively lazy" form than is appropriate for DSM.
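
The idea of sending write notices right away while delaying invalidations until an acquire can be illustrated with a toy model. The sketch below is not the protocol evaluated in the paper. It assumes a single write-through memory, no directory, no protocol processor, and a release that does nothing, and every name in it (`ToyLazyRC`, `Node`, `pending`) is hypothetical. It only shows that a cached copy may stay stale after another node writes, and that it is discarded when the reader performs an acquire.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One processor with a private cache (hypothetical toy structure)."""
    name: str
    cache: dict = field(default_factory=dict)   # block -> locally cached value
    pending: set = field(default_factory=set)   # write notices received, not yet applied


class ToyLazyRC:
    """Toy model: write notices go out at write time, invalidations wait for acquire."""

    def __init__(self, names):
        self.memory = {}                             # assumed write-through backing store
        self.nodes = {n: Node(n) for n in names}

    def read(self, who, block):
        node = self.nodes[who]
        if block not in node.cache:                  # miss: fetch the current value
            node.cache[block] = self.memory.get(block, 0)
        return node.cache[block]                     # hit: may be stale until next acquire

    def write(self, who, block, value):
        self.nodes[who].cache[block] = value
        self.memory[block] = value
        # The write notice is delivered immediately, but sharers only record it.
        for other in self.nodes.values():
            if other.name != who and block in other.cache:
                other.pending.add(block)

    def acquire(self, who):
        # Invalidations are applied only here, at the acquire.
        node = self.nodes[who]
        for block in node.pending:
            node.cache.pop(block, None)
        node.pending.clear()


system = ToyLazyRC(["p0", "p1"])
system.read("p0", "x")                   # p0 caches the initial value of x
system.write("p1", "x", 1)               # p1 writes; p0 records a write notice
stale = system.read("p0", "x")           # p0 still sees its old cached copy
system.acquire("p0")                     # p0 applies the pending invalidation
fresh = system.read("p0", "x")           # p0 misses and fetches the new value
```

Under these assumptions, the lazier variant mentioned in the abstract would correspond to moving the notice loop out of `write` and into a release operation, so that notices are batched at the release instead of being sent during computation.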



Published In

Supercomputing '95: Proceedings of the 1995 ACM/IEEE conference on Supercomputing
December 1995
875 pages
ISBN:0897918169
DOI:10.1145/224170
  • Chairman: Sid Karin

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Cache Coherence
  2. Lazy Release Consistency
  3. Protocol Processors
  4. Shared Memory

Qualifiers

  • Article

Conference

SC '95

Acceptance Rates

Supercomputing '95 Paper Acceptance Rate 69 of 241 submissions, 29%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
