research-article

OVANA: An Approach to Analyze and Improve the Information Quality of Vulnerability Databases

Authors:

Marc Wendelborn,

Christian Reuter

ARES '21: Proceedings of the 16th International Conference on Availability, Reliability and Security

Article No.: 22, Pages 1 - 11

https://doi.org/10.1145/3465481.3465744

Published: 17 August 2021

Abstract

Vulnerability databases are one of the main information sources for IT security experts, so the quality of their information is of utmost importance for anyone working in this area. Previous work has shown that the machine-readable information in these databases can be missing, incorrect, or inconsistent with other data sources. In this paper, we introduce a system called Overt Vulnerability source ANAlysis (OVANA). It analyzes the information quality of vulnerability databases using state-of-the-art machine learning (ML) and natural language processing (NLP) techniques, searches the free-form description for relevant information missing from the structured fields, and updates those fields accordingly. We demonstrate the approach on the National Vulnerability Database and show that OVANA improves the information quality by 51.23%, measured by the indicators of accuracy, completeness, and uniqueness. Moreover, we present information that should be incorporated into the structured fields to increase the uniqueness of vulnerability entries and make different entries easier to tell apart. The information identified by OVANA enables a more targeted vulnerability search and guides IT security experts in finding the information in vulnerability descriptions that is relevant for severity assessment.
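The idea of checking a free-form description for information missing from structured fields can be illustrated with a minimal sketch. This is not OVANA's implementation: the abstract states that OVANA relies on ML and NLP techniques, whereas the sketch below swaps in a plain regular expression for version strings. The entry layout, the field names (`description`, `structured`, `versions`), and the functions `missing_versions` and `fill_missing` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, simplified entry layout (an assumption for this sketch);
# real vulnerability database records are structured differently.
entry = {
    "id": "EXAMPLE-0001",
    "description": (
        "Buffer overflow in ExampleServer 2.4.1 allows remote attackers "
        "to execute arbitrary code via a crafted request."
    ),
    "structured": {"product": "ExampleServer", "versions": []},
}

# Matches dotted version strings such as 2.4.1 or 10.0 (a regex stand-in
# for the ML/NLP extraction that the abstract describes).
VERSION_PATTERN = re.compile(r"\b\d+(?:\.\d+)+\b")


def missing_versions(entry):
    """Return versions named in the description but absent from the structured field."""
    known = set(entry["structured"]["versions"])
    result = []
    for version in VERSION_PATTERN.findall(entry["description"]):
        # Keep first-seen order and skip duplicates.
        if version not in known and version not in result:
            result.append(version)
    return result


def fill_missing(entry):
    """Return a copy of the entry whose structured field includes the missing versions."""
    updated = dict(entry)
    updated["structured"] = dict(entry["structured"])
    updated["structured"]["versions"] = (
        list(entry["structured"]["versions"]) + missing_versions(entry)
    )
    return updated


if __name__ == "__main__":
    print(missing_versions(entry))
    print(fill_missing(entry)["structured"])
```

The sketch only touches completeness, one of the three indicators named in the abstract. A regular expression will miss or misread many real descriptions, which is one reason a learned extraction approach is attractive for this task.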

Published In

ARES '21: Proceedings of the 16th International Conference on Availability, Reliability and Security

August 2021

1447 pages

ISBN:9781450390514

DOI:10.1145/3465481

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Bundesministerium für Bildung und Forschung
Hessisches Ministerium für Wissenschaft und Kunst

Conference

ARES 2021

ARES 2021: The 16th International Conference on Availability, Reliability and Security

August 17 - 20, 2021

Vienna, Austria

Acceptance Rates

Overall Acceptance Rate 228 of 451 submissions, 51%
