DOI: 10.1145/1321440.1321449
research-article

Autonomously semantifying wikipedia

Published: 06 November 2007

Abstract

Berners-Lee's compelling vision of a Semantic Web is hindered by a chicken-and-egg problem, which is best solved by bootstrapping: creating enough structured data to motivate the development of applications. This paper argues that autonomously "Semantifying Wikipedia" is the best way to solve the problem. We choose Wikipedia as an initial data source because it is comprehensive, not too large, high-quality, and contains enough manually derived structure to bootstrap an autonomous, self-supervised process. We identify several types of structure that can be automatically enhanced in Wikipedia (e.g., link structure, taxonomic data, and infoboxes), and we describe a prototype implementation of a self-supervised machine learning system that realizes our vision. Preliminary experiments demonstrate the high precision of our system's extracted data, in one case equaling that of humans.
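
The abstract does not spell out how existing structure bootstraps the learner. One plausible reading of "self-supervised" is that attribute values already present in an infobox are matched against sentences in the same article, and each match becomes a labeled training example for an extractor. The sketch below illustrates only that reading; the function name `label_sentences`, the sample infobox, and the sample text are hypothetical and are not taken from the paper.

```python
import re

def label_sentences(infobox, article_text):
    """Pair each sentence with the infobox attributes whose value it mentions.

    Assumption for illustration: a case-insensitive substring match counts
    as a mention. Labels produced this way can be noisy.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", article_text)
                 if s.strip()]
    examples = []
    for sentence in sentences:
        lowered = sentence.lower()
        for attribute, value in infobox.items():
            if value.lower() in lowered:
                examples.append((sentence, attribute))
    return examples

# Hypothetical infobox and article text, invented for this sketch.
infobox = {"capital": "Olympia", "largest_city": "Seattle"}
article = ("Washington is a state in the Pacific Northwest. "
           "Its capital is Olympia. "
           "Seattle is the largest city in the state.")

for sentence, attribute in label_sentences(infobox, article):
    print(attribute, "<-", sentence)
```

Sentences labeled this way could then train a per-attribute classifier or sequence labeler without any hand annotation, which is the sense in which such a process would be self-supervised.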

    Published In

    CIKM '07: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management
    November 2007
    1048 pages
    ISBN: 9781595938039
    DOI: 10.1145/1321440
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. information extraction
    2. semantic web
    3. wikipedia

    Qualifiers

    • Research-article

    Conference

    CIKM07

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Article Metrics

    • Downloads (last 12 months): 111
    • Downloads (last 6 weeks): 14

    Reflects downloads up to 21 Oct 2024

    Cited By

    • (2024) G-SAP: Graph-based Structure-Aware Prompt Learning over Heterogeneous Knowledge for Commonsense Reasoning. Proceedings of the 2024 International Conference on Multimedia Retrieval (1051-1060). DOI: 10.1145/3652583.3658040. Online publication date: 30-May-2024
    • (2024) Box2Go: Collaborative Interactive Infobox Filling. Companion Proceedings of the ACM Web Conference 2024 (1003-1006). DOI: 10.1145/3589335.3651235. Online publication date: 13-May-2024
    • (2023) Distantly Supervised Relation Extraction via Contextual Information Interaction and Relation Embeddings. Symmetry 15:9 (1788). DOI: 10.3390/sym15091788. Online publication date: 18-Sep-2023
    • (2023) Psychiq and Wwwyzzerdd: Wikidata completion using Wikipedia. Semantic Web (1-14). DOI: 10.3233/SW-233450. Online publication date: 12-Sep-2023
    • (2023) Review of Knowledge Graph and Its Vertical Applications in Industry. 2023 42nd Chinese Control Conference (CCC) (5151-5157). DOI: 10.23919/CCC58697.2023.10240572. Online publication date: 24-Jul-2023
    • (2023) Dynamic Dense-Sparse Representations for Real-Time Question Answering. 2023 IEEE International Conference on Multimedia and Expo (ICME) (1445-1446). DOI: 10.1109/ICME55011.2023.00250. Online publication date: Jul-2023
    • (2023) PD-Box: A People Place Data Box for Processing Engine Anatomy. 2023 2nd Edition of IEEE Delhi Section Flagship Conference (DELCON) (1-6). DOI: 10.1109/DELCON57910.2023.10127379. Online publication date: 24-Feb-2023
    • (2022) Information asymmetry in Wikipedia across different languages. Journal of the Association for Information Science and Technology 73:3 (347-361). DOI: 10.1002/asi.24553. Online publication date: 7-Feb-2022
    • (2021) Predicting Links on Wikipedia with Anchor Text Information. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (1758-1762). DOI: 10.1145/3404835.3462994. Online publication date: 11-Jul-2021
    • (2021) Named Entity Location Prediction Combining Twitter and Web. IEEE Transactions on Knowledge and Data Engineering 33:11 (3618-3633). DOI: 10.1109/TKDE.2020.2973261. Online publication date: 1-Nov-2021
