DOI: 10.1145/1559845.1559910

ROX: run-time optimization of XQueries

Published: 29 June 2009

Abstract

Optimization of complex XQueries combining many XPath steps and joins is currently hindered by the absence of good cardinality estimation and cost models for XQuery. Additionally, even state-of-the-art relational query optimization still struggles to cope with cost model estimation errors that increase with plan size, as well as with the effect of correlated joins and selections.
In this research, we propose to radically depart from the traditional path of separating the query compilation and query execution phases, by having the optimizer execute, materialize partial results, and use sampling-based estimation techniques to observe the characteristics of intermediates. The proposed technique takes as input a Join Graph whose edges are either equi-joins or XPath steps; the execution environment provides value- and structural-join algorithms, as well as structural and value-based indices.
While run-time optimization with sampling removes many of the vulnerabilities of classical optimizers, it brings its own challenges in keeping resource usage under control, both for the materialization of intermediates and for the cost of plan exploration using sampling. Our approach deals with these issues by limiting the run-time search space to so-called "zero-investment" algorithms, for which sampling can be guaranteed to be strictly linear in sample size. All operators and XML value indices used by ROX for sampling have the zero-investment property.
We perform an extensive experimental evaluation on large XML datasets, which shows that our run-time query optimizer finds good query plans in a robust fashion and has limited run-time overhead.
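
The idea of choosing the next Join Graph edge by sampling can be illustrated with a minimal sketch. This is not the ROX implementation: the function names (`build_index`, `estimate_join_size`, `cheapest_edge`), the row-as-dict data layout, and the use of a plain hash index in place of the structural and value indices mentioned above are all assumptions made for illustration. The sketch probes a prebuilt index with a random sample of the current intermediate, so the number of probes grows linearly with the sample size, and scales the observed match count up to the full intermediate.

```python
import random
from collections import defaultdict


def build_index(rows, key):
    """Hypothetical value index: key value -> list of matching rows.

    Assumed to exist before the query runs, so sampling pays no build cost.
    """
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def estimate_join_size(outer, outer_key, inner_index, sample_size, rng):
    """Estimate the size of an equi-join by probing the index with a sample."""
    if not outer:
        return 0.0
    sample = rng.sample(outer, min(sample_size, len(outer)))
    # One index lookup per sampled row; .get() avoids inserting missing keys.
    matches = sum(len(inner_index.get(row[outer_key], ())) for row in sample)
    # Scale the sampled match count up to the whole intermediate.
    return matches * len(outer) / len(sample)


def cheapest_edge(intermediate, edges, sample_size=100, seed=0):
    """Pick the candidate edge with the smallest estimated result size.

    `edges` maps an edge name to (join key in intermediate, index on other side).
    """
    rng = random.Random(seed)
    estimates = {
        name: estimate_join_size(intermediate, key, index, sample_size, rng)
        for name, (key, index) in edges.items()
    }
    best = min(estimates, key=estimates.get)
    return best, estimates
```

In a run-time optimizer of this style, the chosen edge would then be executed in full, its result materialized, and the estimation repeated from the new intermediate. How ROX bounds the cost of each sampled probe is described in the paper itself and is not reproduced here.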

Published In

SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
June 2009
1168 pages
ISBN: 9781605585512
DOI: 10.1145/1559845

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. optimization
  2. xml
  3. xquery

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '09: International Conference on Management of Data
June 29 - July 2, 2009
Providence, Rhode Island, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 16
  • Downloads (last 6 weeks): 2
Reflects downloads up to 17 Oct 2024.

Cited By

  • (2024) PilotScope: Steering Databases with Machine Learning Drivers. Proceedings of the VLDB Endowment 17:5 (980-993). DOI: 10.14778/3641204.3641209. Online publication date: 1-Jan-2024
  • (2023) Revisiting Runtime Dynamic Optimization for Join Queries in Big Data Management Systems. ACM SIGMOD Record 52:1 (104-113). DOI: 10.1145/3604437.3604460. Online publication date: 8-Jun-2023
  • (2023) Speeding Up End-to-end Query Execution via Learning-based Progressive Cardinality Estimation. Proceedings of the ACM on Management of Data 1:1 (1-25). DOI: 10.1145/3588708. Online publication date: 30-May-2023
  • (2018) Smooth Scan. The VLDB Journal 27:4 (521-545). DOI: 10.1007/s00778-018-0507-8. Online publication date: 1-Aug-2018
  • (2018) Query optimization through the looking glass, and what we found running the Join Order Benchmark. The VLDB Journal 27:5 (643-668). DOI: 10.1007/s00778-017-0480-7. Online publication date: 1-Oct-2018
  • (2018) ROSIE: Runtime Optimization of SPARQL Queries over RDF Using Incremental Evaluation. Knowledge Science, Engineering and Management (117-131). DOI: 10.1007/978-3-319-99247-1_11. Online publication date: 11-Aug-2018
  • (2016) On the Use of Abstract Models for RDF/S Provenance. Linked Data Management (419-440). DOI: 10.1201/b16859-26. Online publication date: 19-Apr-2016
  • (2015) How good are query optimizers, really? Proceedings of the VLDB Endowment 9:3 (204-215). DOI: 10.14778/2850583.2850594. Online publication date: 1-Nov-2015
  • (2015) VXQ. Information Systems Frontiers 17:4 (961-981). DOI: 10.1007/s10796-013-9480-3. Online publication date: 1-Aug-2015
  • (2012) The database architectures research group at CWI. ACM SIGMOD Record 40:4 (39-44). DOI: 10.1145/2094114.2094124. Online publication date: 11-Jan-2012
