DOI: 10.1145/3486622.3493973
Short paper

From CIC-IDS2017 to LYCOS-IDS2017: A corrected dataset for better performance

Published: 13 April 2022

Abstract

As connected objects become a standard part of everyday life, network intrusion detection is more critical than ever. Over the past decades, various datasets have been developed to address this security challenge. Analyses of earlier datasets, such as KDD-Cup99 and NSL-KDD, exposed several of their flaws and paved the way for newer datasets that correct the identified problems. CIC-IDS2017, one of the most recent network intrusion detection datasets, has become a popular choice. Its main advantage is that it provides both the raw traffic as PCAP files and flow-based features as CSV files.
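Flow-based features such as those in the CSV files are computed by grouping the captured packets into bidirectional flows. The following is a minimal, hypothetical sketch of that grouping step, not the algorithm of CICFlowMeter or LycoSTand: `Packet`, `Flow` and `aggregate_biflows` are names introduced here for illustration, packets are assumed to be already parsed out of the PCAP, and the flow timeouts and TCP connection-teardown handling that real extractors need are deliberately omitted.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass(frozen=True)
class Packet:
    """Hypothetical pre-parsed packet; a real extractor would read these from a PCAP file."""
    ts: float      # capture timestamp (seconds)
    src: str       # source IP address
    sport: int     # source port
    dst: str       # destination IP address
    dport: int     # destination port
    proto: int     # IP protocol number (6 = TCP, 17 = UDP)
    length: int    # packet length in bytes


@dataclass
class Flow:
    key: tuple     # 5-tuple of the first packet seen; it defines the forward direction
    first_ts: float
    last_ts: float
    fwd_lengths: list = field(default_factory=list)
    bwd_lengths: list = field(default_factory=list)

    def features(self) -> dict:
        """A few example flow features (real tools compute many more)."""
        lengths = self.fwd_lengths + self.bwd_lengths
        return {
            "duration": self.last_ts - self.first_ts,
            "fwd_packets": len(self.fwd_lengths),
            "bwd_packets": len(self.bwd_lengths),
            "mean_packet_length": mean(lengths),
        }


def aggregate_biflows(packets):
    """Group packets into bidirectional flows keyed by the 5-tuple.

    Simplification (assumption): no idle/active timeouts and no TCP FIN/RST
    handling, so all packets sharing a 5-tuple, in either direction, form one flow.
    """
    flows = {}
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        fwd_key = (p.src, p.sport, p.dst, p.dport, p.proto)
        bwd_key = (p.dst, p.dport, p.src, p.sport, p.proto)
        if fwd_key in flows:
            flow, forward = flows[fwd_key], True
        elif bwd_key in flows:
            flow, forward = flows[bwd_key], False
        else:
            # First packet of a new flow: its direction becomes "forward".
            flow = Flow(key=fwd_key, first_ts=p.ts, last_ts=p.ts)
            flows[fwd_key] = flow
            forward = True
        flow.last_ts = p.ts
        (flow.fwd_lengths if forward else flow.bwd_lengths).append(p.length)
    return list(flows.values())


# Usage: a request, its reply, and a follow-up packet on the same connection.
packets = [
    Packet(0.0, "10.0.0.1", 40000, "10.0.0.2", 80, 6, 60),
    Packet(0.5, "10.0.0.2", 80, "10.0.0.1", 40000, 6, 1500),
    Packet(1.0, "10.0.0.1", 40000, "10.0.0.2", 80, 6, 40),
]
print(aggregate_biflows(packets)[0].features())
```

Because the first packet of a flow fixes its forward direction, the reply is counted as a backward packet; choices like this one, together with how flows are split and terminated, directly shape every feature derived from the flow.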
In this paper, we perform a detailed analysis of this dataset and report several problems in the flows extracted from the network packets. To overcome these problems, we propose a new feature extraction tool, LycoSTand. In addition, we propose a feature selection based on feature correlations and feature importance. A performance comparison between the original and the new dataset shows significant improvements for all evaluated machine learning algorithms.
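The abstract does not spell out how correlations and feature importance are combined. One common way to use both, shown here purely as an illustration and not as the authors' procedure, is a greedy filter: visit features from most to least important and drop any feature that is strongly correlated with one already kept. The function name `select_features`, the 0.9 threshold and the toy data are assumptions; the importance scores are taken as an input (they could come, for instance, from the `feature_importances_` attribute of a scikit-learn random forest).

```python
import numpy as np


def select_features(X, names, importances, corr_threshold=0.9):
    """Correlation-filtered, importance-ranked feature selection (illustrative sketch).

    X            : array of shape (n_samples, n_features)
    names        : list of n_features feature names
    importances  : n_features importance scores, higher = more useful
    A feature is kept only if its absolute Pearson correlation with every
    already-kept feature is below corr_threshold; since features are visited
    by decreasing importance, the more important member of a redundant pair survives.
    """
    X = np.asarray(X, dtype=float)
    importances = np.asarray(importances, dtype=float)

    # Constant columns carry no information and would make corrcoef return NaN.
    varying = np.flatnonzero(X.std(axis=0) > 0)
    corr = np.abs(np.corrcoef(X[:, varying], rowvar=False))
    corr = np.atleast_2d(corr)  # a single varying column yields a 0-d result

    kept = []  # indices into `varying`
    for j in np.argsort(importances[varying], kind="stable")[::-1]:
        if all(corr[j, k] < corr_threshold for k in kept):
            kept.append(j)
    return [names[varying[j]] for j in kept]


# Usage on toy data: "b" duplicates "a" (scaled), "c" is uncorrelated, "d" is constant.
x1 = [1, 2, 3, 4, 5]
X = np.column_stack([x1, [2 * v for v in x1], [1, 3, 5, 3, 1], [7] * 5])
print(select_features(X, ["a", "b", "c", "d"], [0.2, 0.5, 0.1, 0.2]))  # prints ['b', 'c']
```

Ordering by importance before filtering is the design choice that ties the two criteria together: correlation decides which features are redundant, importance decides which one of them to keep.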
Building on the improvements obtained on CIC-IDS2017, we also identify other datasets affected by the same issues, to which LycoSTand can be applied to produce improved datasets for network intrusion detection.


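Cited By

  • (2024) Development of a Neural Network Module for Detecting Network Threats in Traffic Based on Convolutional and Recurrent Neural Networks. 2024 Conference of Young Researchers in Electrical and Electronic Engineering (ElCon), pp. 28–31. DOI: 10.1109/ElCon61730.2024.10468509. Online publication date: 29-Jan-2024.
  • (2024) A vehicular network based intelligent transport system for smart cities using machine learning algorithms. Scientific Reports 14(1). DOI: 10.1038/s41598-023-50906-7. Online publication date: 3-Jan-2024.
  • (2023) Machine Learning on Public Intrusion Datasets: Academic Hype or Concrete Advances in NIDS? 2023 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume (DSN-S), pp. 132–136. DOI: 10.1109/DSN-S58398.2023.00038. Online publication date: Jun-2023.
  • (2023) A Multi-Class Intrusion Detection System Based on Continual Learning. 2023 IEEE International Conference on Cyber Security and Resilience (CSR), pp. 86–91. DOI: 10.1109/CSR57506.2023.10224974. Online publication date: 31-Jul-2023.
  • (2023) Enhancing ML-Based DoS Attack Detection Through Combinatorial Fusion Analysis. 2023 IEEE Conference on Communications and Network Security (CNS), pp. 1–6. DOI: 10.1109/CNS59707.2023.10288981. Online publication date: 2-Oct-2023.
  • (2023) Enhancing ML-Based DoS Attack Detection with Feature Engineering: IEEE CNS 23 Poster. 2023 IEEE Conference on Communications and Network Security (CNS), pp. 1–2. DOI: 10.1109/CNS59707.2023.10288689. Online publication date: 2-Oct-2023.
  • (2023) Successful intrusion detection with a single deep autoencoder: theory and practice. Software Quality Journal 32(1), pp. 95–123. DOI: 10.1007/s11219-023-09636-2. Online publication date: 25-May-2023.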


Published In

WI-IAT '21: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology
December 2021
698 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CIC-IDS2017
  2. CICFlowMeter
  3. LYCOS-IDS2017
  4. LycoSTand
  5. Machine Learning
  6. Network Intrusion Detection
  7. datasets

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

WI-IAT '21: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology
December 14–17, 2021
Melbourne, VIC, Australia


Article Metrics

  • Downloads (last 12 months): 130
  • Downloads (last 6 weeks): 15
Reflects downloads up to 16 Oct 2024

