research-article
Open access

DRFx: An Understandable, High Performance, and Flexible Memory Model for Concurrent Languages

Published: 15 September 2016

Abstract

The most intuitive memory model for shared-memory multi-threaded programming is sequential consistency (SC), but it disallows the use of many compiler and hardware optimizations and thus affects performance. Data-race-free (DRF) models, such as the C++11 memory model, guarantee SC execution for data-race-free programs. But these models provide no guarantee at all for racy programs, compromising the safety and debuggability of such programs. To address the safety issue, the Java memory model, which is also based on the DRF model, provides a weak semantics for racy executions. However, this semantics is subtle and complex, making it difficult for programmers to reason about their programs and for compiler writers to ensure the correctness of compiler optimizations.
We present the DRFx memory model, which is simple for programmers to understand and use while still supporting many common optimizations. We introduce a memory model (MM) exception that can be signaled to halt execution. If a program executes without throwing this exception, then DRFx guarantees that the execution is SC. If a program throws an MM exception during an execution, then DRFx guarantees that the program has a data race. We observe that SC violations can be detected in hardware through a lightweight form of conflict detection. Furthermore, our model safely allows aggressive compiler and hardware optimizations within compiler-designated program regions. We formalize our memory model, prove several properties of this model, describe a compiler and hardware design suitable for DRFx, and evaluate the performance overhead due to our compiler and hardware requirements.
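To make the idea of region-level conflict detection concrete, the following is a minimal software sketch, not the hardware design described in the article. It assumes that each compiler-designated region records the addresses it reads and writes, and that two regions conflict when they touch the same address and at least one access is a write. The names `Region`, `MMException`, `conflicts`, and `check_concurrent` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


class MMException(Exception):
    """Hypothetical stand-in for the memory model (MM) exception."""


@dataclass
class Region:
    """A compiler-designated region and the addresses it has accessed so far."""
    thread: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def conflicts(a: Region, b: Region) -> bool:
    """Two regions conflict if one writes an address the other reads or writes."""
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & a.reads)


def check_concurrent(active: list) -> None:
    """Raise MMException if two concurrently executing regions of
    different threads conflict (assumed detection rule for this sketch)."""
    for i, a in enumerate(active):
        for b in active[i + 1:]:
            if a.thread != b.thread and conflicts(a, b):
                raise MMException(f"conflict between {a.thread} and {b.thread}")


if __name__ == "__main__":
    # Racy pattern: T1 writes x then reads y, while T2 writes y then reads x.
    t1 = Region("T1", reads={"y"}, writes={"x"})
    t2 = Region("T2", reads={"x"}, writes={"y"})
    try:
        check_concurrent([t1, t2])
    except MMException as exc:
        print("MM exception:", exc)
```

In this sketch, regions that run concurrently without conflicting accesses pass the check silently, which mirrors the guarantee that an execution raising no MM exception is SC. A raised exception corresponds to a pair of conflicting, unsynchronized accesses, that is, a data race.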


Published In

ACM Transactions on Programming Languages and Systems, Volume 38, Issue 4
October 2016
204 pages
ISSN:0164-0925
EISSN:1558-4593
DOI:10.1145/2982214

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 September 2016
Accepted: 01 April 2016
Revised: 01 February 2016
Received: 01 July 2013
Published in TOPLAS Volume 38, Issue 4

Author Tags

  1. DRFx
  2. Sequential consistency
  3. memory models

Qualifiers

  • Research-article
  • Research
  • Refereed
