DOI: 10.5555/1182635.1164155
Article

Putting context into schema matching

Published: 01 September 2006

Abstract

Attribute-level schema matching has proven to be an important first step in developing mappings for data exchange, integration, restructuring and schema evolution. In this paper we investigate contextual schema matching, in which selection conditions are associated with matches by the schema matching process in order to improve overall match quality. We define a general space of matching techniques, and within this framework we identify a variety of novel, concrete algorithms for contextual schema matching. Furthermore, we show how common schema mapping techniques can be generalized to take more effective advantage of contextual matches, enabling automatic construction of mappings across certain forms of schema heterogeneity. An experimental study examines a wide variety of quality and performance issues. In addition, it demonstrates that contextual schema matching is an effective and practical technique to further automate the definition of complex data transformations.
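To make the idea of a contextual match concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm from the paper. It assumes a hypothetical source table in which a `type` attribute separates books from CDs. It scores a match by the Jaccard overlap of column values, a deliberately simple stand-in similarity measure swapped in for this example. It then tries single-attribute equality conditions of the form `type = v` and attaches a condition to the match only when that condition raises the score above that of the plain attribute-level match. All names (`ContextualMatch`, `best_contextual_match`, `INVENTORY`, the `code` and `isbn` columns) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical condition of the form (attribute, value), read as "attribute = value".
Condition = Tuple[str, str]


@dataclass(frozen=True)
class ContextualMatch:
    """An attribute match source_attr -> target_attr, optionally guarded by a condition."""
    source_attr: str
    target_attr: str
    condition: Optional[Condition]  # None means an ordinary, unconditioned match
    score: float


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two value sets (a stand-in similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def column_values(rows, attr, condition=None):
    """Values of `attr` over the rows that satisfy `condition` (all rows if None)."""
    return {r[attr] for r in rows
            if condition is None or r[condition[0]] == condition[1]}


def best_contextual_match(rows, source_attr, target_attr, target_values, condition_attr):
    """Keep the unconditioned match unless some `condition_attr = v` scores strictly higher."""
    best = ContextualMatch(source_attr, target_attr, None,
                           jaccard(column_values(rows, source_attr), target_values))
    # Candidate conditions: one equality test per distinct value of condition_attr.
    for v in sorted({r[condition_attr] for r in rows}):
        cond = (condition_attr, v)
        score = jaccard(column_values(rows, source_attr, cond), target_values)
        if score > best.score:
            best = ContextualMatch(source_attr, target_attr, cond, score)
    return best


# Hypothetical source table mixing books and CDs, distinguished by `type`.
INVENTORY = [
    {"type": "book", "code": "b1"},
    {"type": "book", "code": "b2"},
    {"type": "cd", "code": "c1"},
    {"type": "cd", "code": "c2"},
]
# Hypothetical instance values of an `isbn` column in a target Book table.
BOOK_ISBNS = {"b1", "b2", "b3"}

m = best_contextual_match(INVENTORY, "code", "isbn", BOOK_ISBNS, "type")
print(m.condition, round(m.score, 3))
```

Running it prints `('type', 'book') 0.667`: the unconditioned match `code -> isbn` scores 0.4 because the CD codes dilute the overlap, while restricting the source to `type = 'book'` raises the score to 2/3. A practical matcher would likely search a larger space of conditions and combine richer name- and instance-based evidence than value overlap alone.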

Published In

VLDB '06: Proceedings of the 32nd international conference on Very large data bases
September 2006
1269 pages

Sponsors

  • SIGMOD: ACM Special Interest Group on Management of Data
  • K.I.S.S. SIG on Databases
  • AJU Information Technology Co., Ltd
  • US Army ITC-PAC Asian Research Office
  • Google Inc.
  • The Database Society of Japan
  • Samsung SOS
  • Advanced Information Technology Research Center
  • Naver
  • Microsoft
  • Korea Information Science Society
  • SK telecom
  • Systems Applications Products
  • Oracle
  • International Business Management
  • Air Force Office of Scientific Research/Asian Office of Aerospace R&D
  • Kosef
  • Kaist
  • LG Electronics
  • CCF-DBS

Publisher

VLDB Endowment

Article Metrics

  • Downloads (last 12 months): 6
  • Downloads (last 6 weeks): 1
Reflects downloads up to 18 Oct 2024.

Cited By

  • (2017) Human-in-the-loop data integration. Proceedings of the VLDB Endowment, 10(12):2006-2017. DOI: 10.14778/3137765.3137833. Online publication date: 1-Aug-2017.
  • (2014) Query Rewriting and Optimization for Ontological Databases. ACM Transactions on Database Systems, 39(3):1-46. DOI: 10.1145/2638546. Online publication date: 7-Oct-2014.
  • (2012) Appearance-Order-Based schema matching. Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part I, pp. 79-94. DOI: 10.1007/978-3-642-29038-1_8. Online publication date: 15-Apr-2012.
  • (2011) Discovering implicit categorical semantics for schema matching. Proceedings of the 16th international conference on Database systems for advanced applications: Part II, pp. 179-194. DOI: 10.5555/1997251.1997269. Online publication date: 22-Apr-2011.
  • (2011) Leveraging query logs for schema mapping generation in U-MAP. Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, pp. 121-132. DOI: 10.1145/1989323.1989337. Online publication date: 12-Jun-2011.
  • (2011) Polymorphic queries for P2P systems. Information Systems, 36(5):825-842. DOI: 10.1016/j.is.2011.01.001. Online publication date: 1-Jul-2011.
  • (2010) A context-based schema integration process applied to healthcare data sources. Proceedings of the 2010 international conference on On the move to meaningful internet systems, pp. 100-109. DOI: 10.5555/1948509.1948544. Online publication date: 25-Oct-2010.
  • (2010) Contextual factors in database integration. Proceedings of the 29th international conference on Conceptual modeling, pp. 274-287. DOI: 10.5555/1929757.1929784. Online publication date: 1-Nov-2010.
  • (2010) Synthesizing view definitions from data. Proceedings of the 13th International Conference on Database Theory, pp. 89-103. DOI: 10.1145/1804669.1804683. Online publication date: 23-Mar-2010.
  • (2010) Schema mapping and query translation in heterogeneous P2P XML databases. The VLDB Journal: The International Journal on Very Large Data Bases, 19(2):231-256. DOI: 10.1007/s00778-009-0159-9. Online publication date: 1-Apr-2010.