DOI:10.1145/3465481.3465744
research-article

OVANA: An Approach to Analyze and Improve the Information Quality of Vulnerability Databases

Published: 17 August 2021

Abstract

Vulnerability databases are one of the main information sources for IT security experts, so the quality of their information is of utmost importance for anyone working in this area. Previous work has shown that machine-readable information in these databases may be missing, incorrect, or inconsistent with other data sources. In this paper, we introduce Overt Vulnerability source ANAlysis (OVANA), a system that analyzes the information quality of vulnerability databases using state-of-the-art machine learning (ML) and natural language processing (NLP) techniques. OVANA searches the free-form description of an entry for relevant information that is missing from the structured fields and updates those fields accordingly. We demonstrate the approach on the National Vulnerability Database (NVD), showing that OVANA improves information quality by 51.23%, measured by the indicators accuracy, completeness, and uniqueness. Moreover, we identify information that should be incorporated into the structured fields to increase the uniqueness of vulnerability entries and make different entries easier to distinguish. The information identified by OVANA enables a more targeted vulnerability search and guides IT security experts toward the parts of a vulnerability description that are relevant for severity assessment.
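
The idea of filling structured fields from the free-form description can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the entries, the field names, and the metric definitions are hypothetical, the abstract does not state how the indicators are computed, and a simple regular expression stands in for the ML and NLP models that OVANA actually uses.

```python
import re
from collections import Counter

# Hypothetical, simplified entries; the field names do not follow the NVD schema.
ENTRIES = [
    {"id": "EX-1",
     "description": "Buffer overflow in ExampleApp 2.3.1 allows remote attackers to execute code.",
     "version": None},
    {"id": "EX-2",
     "description": "Buffer overflow in ExampleApp 2.3.1 allows remote attackers to execute code.",
     "version": "2.3.1"},
    {"id": "EX-3",
     "description": "SQL injection in the login form of DemoShop.",
     "version": None},
]

# Assumption: a dotted number such as "2.3.1" is treated as a version string.
VERSION_RE = re.compile(r"\b\d+(?:\.\d+)+\b")


def completeness(entries, field):
    """Share of entries whose structured field is filled (illustrative definition)."""
    return sum(e[field] is not None for e in entries) / len(entries)


def uniqueness(entries):
    """Share of entries whose description occurs exactly once (illustrative definition)."""
    counts = Counter(e["description"] for e in entries)
    return sum(counts[e["description"]] == 1 for e in entries) / len(entries)


def fill_missing_version(entry):
    """Return a copy of the entry; a missing version is taken from the description if found."""
    if entry["version"] is not None:
        return dict(entry)
    match = VERSION_RE.search(entry["description"])
    return {**entry, "version": match.group(0) if match else None}


before = completeness(ENTRIES, "version")
updated = [fill_missing_version(e) for e in ENTRIES]
after = completeness(updated, "version")

print(f"completeness before: {before:.2f}, after: {after:.2f}")
print(f"uniqueness of descriptions: {uniqueness(ENTRIES):.2f}")
```

In this toy data set, one of the two entries without a version has a recoverable version in its description, so completeness of the `version` field rises, while the entry with no version in its text stays incomplete.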

Information & Contributors

Information

Published In

ARES '21: Proceedings of the 16th International Conference on Availability, Reliability and Security
August 2021
1447 pages
ISBN:9781450390514
DOI:10.1145/3465481
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CVSS
  2. Deep-Learning
  3. Information Quality
  4. NVD
  5. Security

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ARES 2021

Acceptance Rates

Overall Acceptance Rate 228 of 451 submissions, 51%

Cited By

  • (2024) CySecBERT: A Domain-Adapted Language Model for the Cybersecurity Domain. ACM Transactions on Privacy and Security 27:2 (1-20). DOI:10.1145/3652594. Online publication date: 8-Apr-2024
  • (2024) Shedding Light on CVSS Scoring Inconsistencies: A User-Centric Study on Evaluating Widespread Security Vulnerabilities. 2024 IEEE Symposium on Security and Privacy (SP) (1102-1121). DOI:10.1109/SP54263.2024.00058. Online publication date: 19-May-2024
  • (2024) A Study of Fine-Tuned Language Models in Vulnerability Classification. 2024 12th International Symposium on Digital Forensics and Security (ISDFS) (1-6). DOI:10.1109/ISDFS60797.2024.10527294. Online publication date: 29-Apr-2024
  • (2024) Navigating the Shadows: Manual and Semi-Automated Evaluation of the Dark Web for Cyber Threat Intelligence. IEEE Access 12 (118903-118922). DOI:10.1109/ACCESS.2024.3448247. Online publication date: 2024
  • (2023) ExTRUST: Reducing Exploit Stockpiles With a Privacy-Preserving Depletion System for Inter-State Relationships. IEEE Transactions on Technology and Society 4:2 (158-170). DOI:10.1109/TTS.2023.3280356. Online publication date: Jun-2023
  • (2023) The anatomy of a vulnerability database. Journal of Systems and Software 201:C. DOI:10.1016/j.jss.2023.111679. Online publication date: 1-Jul-2023
