Holistic twig joins: optimal XML pattern matching

Published: 03 June 2002
DOI: 10.1145/564691.564727

Abstract

XML employs a tree-structured data model, and, naturally, XML queries specify patterns of selection predicates on multiple elements related by a tree structure. Finding all occurrences of such a twig pattern in an XML database is a core operation for XML query processing. Prior work has typically decomposed the twig pattern into binary structural (parent-child and ancestor-descendant) relationships, and twig matching is achieved by: (i) using structural join algorithms to match the binary relationships against the XML database, and (ii) stitching together these basic matches. A limitation of this approach for matching twig patterns is that intermediate result sizes can get large, even when the input and output sizes are more manageable.

In this paper, we propose a novel holistic twig join algorithm, TwigStack, for matching an XML query twig pattern. Our technique uses a chain of linked stacks to compactly represent partial results to root-to-leaf query paths, which are then composed to obtain matches for the twig pattern. When the twig pattern uses only ancestor-descendant relationships between elements, TwigStack is I/O and CPU optimal among all sequential algorithms that read the entire input: it is linear in the sum of sizes of the input lists and the final result list, but independent of the sizes of intermediate results. We then show how to use (a modification of) B-trees, along with TwigStack, to match query twig patterns in sub-linear time. Finally, we complement our analysis with experimental results on a range of real and synthetic data, and query twig patterns.
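To make the limitation of the decomposition approach concrete, the following is a minimal Python sketch, not the paper's TwigStack algorithm. It assumes each element carries a hypothetical `(start, end)` region label in document order, so that one element contains another exactly when its region encloses the other's. The names `Elem`, `structural_join`, `build_doc`, and `match_twig_by_decomposition` are illustrative only. The twig `A[.//B]//C` is decomposed into the binary relationships `A//B` and `A//C`, each is evaluated with a stack-based structural join, and the results are stitched together on the shared `A` element.

```python
from collections import namedtuple

# Hypothetical region label: an element spans [start, end] in document order.
Elem = namedtuple("Elem", ["tag", "start", "end"])


def structural_join(ancestors, descendants):
    """Return all (a, d) pairs in which element a contains element d.

    Both lists must be sorted by start position. The stack holds the
    chain of ancestor candidates whose regions are still open.
    """
    out, stack, i = [], [], 0
    for d in descendants:
        # Push every ancestor candidate that starts before d.
        while i < len(ancestors) and ancestors[i].start < d.start:
            a = ancestors[i]
            # Discard stacked elements that ended before a starts.
            while stack and stack[-1].end < a.start:
                stack.pop()
            stack.append(a)
            i += 1
        # Discard stacked elements that ended before d starts.
        while stack and stack[-1].end < d.start:
            stack.pop()
        # Every element still on the stack contains d.
        out.extend((a, d) for a in stack)
    return out


def build_doc(n):
    """Labels for n <A><B/></A> subtrees, then one <A><B/><C/></A>."""
    a_list, b_list, c_list, pos = [], [], [], 0
    for _ in range(n):
        a_list.append(Elem("A", pos, pos + 3))
        b_list.append(Elem("B", pos + 1, pos + 2))
        pos += 4
    a_list.append(Elem("A", pos, pos + 5))
    b_list.append(Elem("B", pos + 1, pos + 2))
    c_list.append(Elem("C", pos + 3, pos + 4))
    return a_list, b_list, c_list


def match_twig_by_decomposition(a_list, b_list, c_list):
    """Match A[.//B]//C via two binary joins plus a stitching step."""
    ab = structural_join(a_list, b_list)  # intermediate result for A//B
    ac = structural_join(a_list, c_list)  # intermediate result for A//C
    c_by_a = {}
    for a, c in ac:
        c_by_a.setdefault(a, []).append(c)
    # Stitch: keep (a, b, c) only when both joins agree on the same a.
    matches = [(a, b, c) for a, b in ab for c in c_by_a.get(a, [])]
    return ab, ac, matches


if __name__ == "__main__":
    ab, ac, matches = match_twig_by_decomposition(*build_doc(1000))
    print(len(ab), len(ac), len(matches))
```

On this synthetic document the `A//B` join produces 1001 pairs, yet only one `A` element also has a `C` descendant, so the final twig result has a single match. A different join order would shrink the intermediate result in this particular case; the sketch only illustrates how a binary join can produce many partial matches that never reach the output.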

Published In

SIGMOD '02: Proceedings of the 2002 ACM SIGMOD international conference on Management of data
June 2002
654 pages
ISBN: 1581134975
DOI: 10.1145/564691
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

SIGMOD/PODS02

Acceptance Rates

SIGMOD '02 Paper Acceptance Rate 42 of 240 submissions, 18%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Cited By

  • (2024) Privacy-Preserving Regular Expression Matching Using TNFA. Computer Security – ESORICS 2024. 10.1007/978-3-031-70890-9_12 (225-246). Online publication date: 6-Sep-2024
  • (2023) Querying Uncertain Spatiotemporal Data Based on XML Twig Pattern. Uncertain Spatiotemporal Data Management for the Semantic Web. 10.4018/978-1-6684-9108-9.ch017 (373-394). Online publication date: 15-Dec-2023
  • (2023) Fast Leaf-to-Root Holistic Twig Query on Spatiotemporal XML Data. Uncertain Spatiotemporal Data Management for the Semantic Web. 10.4018/978-1-6684-9108-9.ch010 (183-192). Online publication date: 15-Dec-2023
  • (2023) Querying Spatiotemporal Data Based on XML Twig Pattern. Uncertain Spatiotemporal Data Management for the Semantic Web. 10.4018/978-1-6684-9108-9.ch009 (174-182). Online publication date: 15-Dec-2023
  • (2023) Integrated method for distributed processing of large XML data. Cluster Computing. 10.1007/s10586-023-04010-0. 27:2 (1375-1399). Online publication date: 13-May-2023
  • (2022) Cross-Model Conjunctive Queries over Relation and Tree-Structured Data. Database Systems for Advanced Applications. 10.1007/978-3-031-00123-9_2 (21-37). Online publication date: 8-Apr-2022
  • (2021) Parallel XPath query based on cost optimization. The Journal of Supercomputing. 10.1007/s11227-021-04074-y. 78:4 (5420-5449). Online publication date: 24-Sep-2021
  • (2020) Distributed Tree-Pattern Matching in Big Data Analytics Systems. Advances in Databases and Information Systems. 10.1007/978-3-030-54832-2_14 (171-186). Online publication date: 17-Aug-2020
  • (2019) Paying Crowd Workers for Collaborative Work. Proceedings of the ACM on Human-Computer Interaction. 10.1145/3359227. 3:CSCW (1-24). Online publication date: 7-Nov-2019
  • (2019) A Forensic Qualitative Analysis of Contributions to Wikipedia from Anonymity Seeking Users. Proceedings of the ACM on Human-Computer Interaction. 10.1145/3359155. 3:CSCW (1-26). Online publication date: 7-Nov-2019
