Article
DOI: 10.1007/978-3-031-71167-1_15

Enhancing Machine Learning Predictions Through Knowledge Graph Embeddings

Published: 10 September 2024

Abstract

Despite their widespread use, machine learning (ML) methods often perform sub-optimally. Their accuracy is primarily limited by insufficient training data and poor data quality, with particularly severe consequences in critical areas such as medical diagnosis prediction. We hypothesize that enriching ML pipelines with semantic information, such as that available in knowledge graphs (KGs), can address these challenges and improve prediction accuracy. To that end, we extend the state of the art with a novel approach that uses KG embeddings to augment tabular data in several ways within ML pipelines. Concretely, we introduce and examine several techniques for integrating KG embeddings, as well as the influence of KG characteristics on model performance, measured by accuracy and F2 score. We evaluate our approach with four ML algorithms and two embedding techniques, applied to heart disease and chronic kidney disease prediction. Our results show consistent improvements in model performance across ML models and tasks, confirming our hypothesis. For example, for heart disease prediction, we increased the F2 score of KNN from 70% to 82.22% and that of SVM from 74.53% to 81.71%.
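The abstract describes augmenting tabular data with KG embeddings and scoring models with F2, but it does not spell out the integration techniques or name the two embedding methods. The block below is a minimal sketch under one assumed, simple variant: each categorical cell value is linked to a KG entity, and that entity's pretrained embedding vector is concatenated onto the row's numeric features. The `kg` lookup table, the entity keys such as "chest_pain=typical_angina", and the zero-vector fallback for unlinked values are hypothetical illustrations, not the authors' pipeline. The F2 score is the F-beta score with beta = 2, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), which weights recall (P is precision, R is recall) more heavily than precision; this suits diagnosis tasks where a missed case is costlier than a false alarm.

```python
from typing import Dict, List, Sequence

# Hypothetical KG embedding lookup: maps a categorical cell value to a vector.
# In practice such vectors might come from a KG embedding method
# (e.g. RDF2Vec or node2vec) trained on a domain knowledge graph.
Embedding = List[float]


def augment_row(
    numeric: Sequence[float],
    categorical: Sequence[str],
    embeddings: Dict[str, Embedding],
    dim: int,
) -> List[float]:
    """Concatenate numeric features with the KG embedding of each categorical value.

    Values with no linked KG entity fall back to a zero vector of length `dim`
    (an assumption made for this sketch).
    """
    features = list(numeric)
    for value in categorical:
        vec = embeddings.get(value, [0.0] * dim)
        if len(vec) != dim:
            raise ValueError(f"embedding for {value!r} has length {len(vec)}, expected {dim}")
        features.extend(vec)
    return features


def f_beta(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 2.0) -> float:
    """F-beta score for binary labels (1 = disease present).

    beta = 2 gives the F2 score, which weights recall more than precision and so
    penalises false negatives (missed diagnoses) more than false positives.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Toy, made-up values purely for illustration.
    kg = {
        "chest_pain=typical_angina": [0.12, -0.40],
        "chest_pain=asymptomatic": [0.85, 0.07],
    }
    row = augment_row([63.0, 145.0], ["chest_pain=typical_angina"], kg, dim=2)
    print(row)  # [63.0, 145.0, 0.12, -0.4]

    # A classifier that flags every patient: perfect recall, precision 0.6.
    y_true = [1, 1, 1, 0, 0]
    y_pred = [1, 1, 1, 1, 1]
    print(round(f_beta(y_true, y_pred, beta=1.0), 3))  # 0.75  (F1)
    print(round(f_beta(y_true, y_pred), 3))            # 0.882 (F2)
```

The usage example shows why F2 is a reasonable choice here: the same predictions score higher under F2 than under F1 because F2 rewards the perfect recall. Concatenation is only one way to inject embeddings; alternatives such as replacing the categorical column or aggregating several entity vectors per row are equally plausible and would change only `augment_row`.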

Information

Published In

Guide Proceedings
Neural-Symbolic Learning and Reasoning: 18th International Conference, NeSy 2024, Barcelona, Spain, September 9–12, 2024, Proceedings, Part I
Sep 2024
440 pages
ISBN: 978-3-031-71166-4
DOI: 10.1007/978-3-031-71167-1

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

1. Neurosymbolic AI
2. Knowledge Graph Embeddings
3. Machine Learning
4. Data Augmentation

Qualifiers

• Article
