Open access

Dynamic Deadlock Verification for General Barrier Synchronisation

Published: 11 December 2018

Abstract

We present Armus, a verification tool for dynamically detecting or avoiding barrier deadlocks. The core design of Armus is based on phasers, a generalisation of barriers that supports split-phase synchronisation, dynamic membership, and optional-waits. This allows Armus to handle the key barrier synchronisation patterns found in modern languages and libraries. We implement Armus for X10 and Java, giving the first sound and complete barrier deadlock verification tools in these settings.
Armus introduces a novel event-based graph model of barrier concurrency constraints that distinguishes task-event and event-task dependencies. Decoupling these two kinds of dependencies facilitates the verification of distributed barriers with dynamic membership, a challenging feature of X10. Further, our base graph representation can be dynamically switched between a task-to-task model, Wait-for Graph (WFG), and an event-to-event model, State Graph (SG), to improve the scalability of the analysis.
Formally, we show that the verification is sound and complete with respect to the occurrence of deadlock in our core phaser language, and that switching graph representations preserves the soundness and completeness properties. These results are machine checked with the Coq proof assistant. Practically, we evaluate the runtime overhead of our implementations using three benchmark suites in local and distributed scenarios. Regarding deadlock detection, distributed scenarios show negligible overheads and local scenarios show overheads below 1.15×. Deadlock avoidance is more demanding, and highlights the potential gains from dynamic graph selection. In one benchmark scenario, the runtime overheads are 1.8× for dynamic selection, 2.6× for SG-static selection, and 5.9× for WFG-static selection.
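
As a rough illustration of the graph model the abstract describes, the Python sketch below keeps task-event ("waits") and event-task ("impedes") dependencies separate, projects them onto a task-to-task Wait-for Graph or an event-to-event State Graph, and reports a deadlock when the chosen graph has a cycle. All names here (`waits`, `impedes`, `wait_for_graph`, `state_graph`, `has_cycle`, `deadlocked`) are hypothetical, and picking the projection with fewer source vertices is an assumption made for illustration, not necessarily the selection policy Armus uses. Armus itself targets X10 and Java.

```python
from collections import defaultdict


def wait_for_graph(waits, impedes):
    """Task-to-task projection: t1 -> t2 when t1 waits on an event that t2 impedes."""
    wfg = defaultdict(set)
    for task, events in waits.items():
        for event in events:
            wfg[task] |= impedes.get(event, set())
    return dict(wfg)


def state_graph(waits, impedes):
    """Event-to-event projection: e1 -> e2 when a task impeding e1 waits on e2."""
    sg = defaultdict(set)
    for event, tasks in impedes.items():
        for task in tasks:
            sg[event] |= waits.get(task, set())
    return dict(sg)


def has_cycle(graph):
    """Iterative three-colour depth-first search for a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    done = object()  # sentinel marking an exhausted successor iterator
    colour = defaultdict(int)  # every vertex starts WHITE (0)
    for root in graph:
        if colour[root] != WHITE:
            continue
        colour[root] = GREY
        stack = [(root, iter(graph.get(root, ())))]
        while stack:
            node, successors = stack[-1]
            nxt = next(successors, done)
            if nxt is done:
                colour[node] = BLACK
                stack.pop()
            elif colour[nxt] == GREY:
                return True  # back edge: the vertex is on the current path
            elif colour[nxt] == WHITE:
                colour[nxt] = GREY
                stack.append((nxt, iter(graph.get(nxt, ()))))
    return False


def deadlocked(waits, impedes):
    """Check the projection with fewer source vertices (illustrative heuristic only)."""
    if len(waits) <= len(impedes):
        return has_cycle(wait_for_graph(waits, impedes))
    return has_cycle(state_graph(waits, impedes))


# Two tasks, each blocked on an event that only the other task can trigger.
waits = {"t1": {"e1"}, "t2": {"e2"}}
impedes = {"e1": {"t2"}, "e2": {"t1"}}
print(deadlocked(waits, impedes))
```

In this example both projections contain a cycle (t1 → t2 → t1 and e1 → e2 → e1), which matches the abstract's claim that switching representations does not change the verdict.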

Published In

ACM Transactions on Programming Languages and Systems, Volume 41, Issue 1
March 2019
235 pages
ISSN: 0164-0925
EISSN: 1558-4593
DOI: 10.1145/3299867
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 December 2018
Accepted: 01 May 2018
Revised: 01 January 2018
Received: 01 March 2017
Published in TOPLAS Volume 41, Issue 1

Author Tags

  1. Barrier synchronisation
  2. Java
  3. X10
  4. deadlock avoidance
  5. deadlock detection
  6. phasers

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2024) Pipelines and Beyond: Graph Types for ADTs with Futures. Proceedings of the ACM on Programming Languages 8:POPL (482-511). DOI: 10.1145/3632859. Online publication date: 5-Jan-2024
  • (2023) Static Prediction of Parallel Computation Graphs (Abstract). Proceedings of the 2023 ACM Workshop on Highlights of Parallel Computing (21-22). DOI: 10.1145/3597635.3598026. Online publication date: 18-Jul-2023
  • (2022) Static prediction of parallel computation graphs. Proceedings of the ACM on Programming Languages 6:POPL (1-31). DOI: 10.1145/3498708. Online publication date: 12-Jan-2022
  • (2021) An ownership policy and deadlock detector for promises. Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (348-361). DOI: 10.1145/3437801.3441616. Online publication date: 17-Feb-2021
  • (2020) Low-overhead deadlock prediction. Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (1298-1309). DOI: 10.1145/3377811.3380367. Online publication date: 27-Jun-2020
