DOI: 10.1145/3572848.3577495

TL4x: Buffered Durable Transactions on Disk as Fast as in Memory

Published: 21 February 2023

Abstract

The arrival of persistent memory devices on the consumer market has revived interest in durable transactional algorithms. Persistent memory (PM) is touted as having two attributes that distinguish it from other storage technologies: byte-addressability and fast transactional persistence.
In this work we investigate how these attributes differentiate PM from block storage in the context of buffered durability. We present a novel algorithm, TL4x, capable of providing buffered durable linearizable transactions with high scalability for disjoint writes and efficient persistence on either PM or block storage devices. TL4x is a software-only user-space solution that optimizes writes to persistent storage, providing buffered durable transactions whose cost is negligible compared to similar non-durable transactions. TL4x maintains a volatile consistent snapshot which is used for buffered durability and shared with irrevocable read-only transactions, allowing long range-query operations to run in parallel with write transactions. We use TL4x to implement a transactional database engine that can outperform RocksDB by an order of magnitude.

Cited By

  • (2024) Scaling Up Transactions with Slower Clocks. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2-16. DOI: 10.1145/3627535.3638472. Online publication date: 2-Mar-2024
  • (2024) A Fully Verified Persistency Library. Verification, Model Checking, and Abstract Interpretation, 26-47. DOI: 10.1007/978-3-031-50521-8_2. Online publication date: 15-Jan-2024

    Published In

    PPoPP '23: Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming
    February 2023
    480 pages
    ISBN: 9798400700156
    DOI: 10.1145/3572848

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. buffered durability
    2. disk persistence
    3. persistent memory
    4. transactions

    Qualifiers

    • Research-article

    Conference

    PPoPP '23

    Acceptance Rates

    Overall Acceptance Rate 230 of 1,014 submissions, 23%

    Article Metrics

    • Downloads (Last 12 months): 110
    • Downloads (Last 6 weeks): 10
    Reflects downloads up to 16 Oct 2024
