DOI:10.1145/2213836.2213846

Sample-driven schema mapping

Published: 20 May 2012

Abstract

End users increasingly need to perform lightweight, customized schema mapping. State-of-the-art tools provide powerful functions to generate schema mappings, but they usually require an in-depth understanding of the semantics of multiple schemas and their correspondences. They are thus unsuitable for technically unsophisticated users and for settings in which a large number of mappings must be performed.
We propose a system for sample-driven schema mapping. It automatically constructs schema mappings, in real time, from user-input sample target instances. Because the user does not have to provide any explicit attribute-level match information, she is isolated from the possibly complex structure and semantics of both the source schemas and the mappings. In addition, the user never has to master any operations specific to schema mappings: she simply types data values into a spreadsheet-style interface. As a result, the user can construct mappings with a much lower cognitive burden.
In this paper we present Mweaver, a prototype sample-driven schema mapping system. It employs novel algorithms that enable the system to obtain the desired mapping results while meeting interactive response-time requirements. We report the results of a user study that compares Mweaver with two state-of-the-art mapping tools across several mapping tasks, both real and synthetic. The results suggest that Mweaver enables users to perform practical mapping tasks in about one-fifth of the time needed with the state-of-the-art tools.
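
To make the idea of deriving a mapping from sample target values concrete, the following is a minimal sketch in Python. It is not Mweaver's algorithm, which the abstract does not describe; it is a naive brute-force search written under stated assumptions. The toy tables in `SOURCE`, the hand-listed `JOIN_PATHS`, and the function names `join_rows` and `candidate_mappings` are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy source database: table name -> list of rows.
SOURCE = {
    "movie": [
        {"mid": 1, "title": "Avatar", "did": 10},
        {"mid": 2, "title": "Big Fish", "did": 11},
    ],
    "director": [
        {"did": 10, "name": "James Cameron"},
        {"did": 11, "name": "Tim Burton"},
    ],
}

# Hypothetical join paths the naive search may consider (an assumption:
# a real system would derive these from the source schema's keys).
# Each entry is (tables, equality conditions linking those tables).
JOIN_PATHS = [
    (("movie",), ()),
    (("director",), ()),
    (("movie", "director"), ((("movie", "did"), ("director", "did")),)),
]


def join_rows(path):
    """Yield joined rows for a path as {(table, column): value} dicts."""
    tables, conditions = path
    for combo in product(*(SOURCE[t] for t in tables)):
        row = {(t, c): v for t, r in zip(tables, combo) for c, v in r.items()}
        if all(row[a] == row[b] for a, b in conditions):
            yield row


def candidate_mappings(samples):
    """Return (tables, columns) pairs consistent with every sample tuple.

    A candidate assigns one source column to each target column such that,
    for each sample, some joined row holds the sample's values in exactly
    those columns.
    """
    width = len(samples[0])
    results = []
    for path in JOIN_PATHS:
        rows = list(join_rows(path))
        if not rows:
            continue
        columns = list(rows[0].keys())
        # Brute force: try every assignment of source columns to targets.
        for assignment in product(columns, repeat=width):
            if all(
                any(all(r[c] == v for c, v in zip(assignment, s)) for r in rows)
                for s in samples
            ):
                results.append((path[0], assignment))
    return results


# The user "types" one target row; no attribute-level matches are given.
mappings = candidate_mappings([("Avatar", "James Cameron")])
for tables, columns in mappings:
    print(tables, columns)
```

In this sketch a single sample row is enough to single out the join of `movie` and `director` projected onto `title` and `name`, because no other path contains both values. A one-column sample such as `("Avatar",)` is ambiguous (it fits both the `movie` table alone and the join), which illustrates why additional samples help narrow the candidates. The exhaustive enumeration here would not meet interactive response times on realistic schemas.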


Published In

SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
May 2012
886 pages
ISBN:9781450312479
DOI:10.1145/2213836
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data integration
  2. sample-driven
  3. schema mapping
  4. usability

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '12

Acceptance Rates

SIGMOD '12 Paper Acceptance Rate 48 of 289 submissions, 17%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%
