short-paper

From CIC-IDS2017 to LYCOS-IDS2017: A corrected dataset for better performance

Authors:

Florent CARLIER,

Eloïse CHEVAL,

Pascal LEROUX

WI-IAT '21: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology

Pages 570 - 575

https://doi.org/10.1145/3486622.3493973

Published: 13 April 2022

Abstract

As connected objects become the standard for quality of life, network intrusion detection is getting more critical than ever. Over the past decades, various datasets have been developed to address this security challenge. Analysis of earlier datasets, such as KDD-Cup99 and NSL-KDD, highlighted some of the issues, leading the way for newer datasets that have corrected the identified problems. CIC-IDS2017, one of the newest network intrusion detection datasets, has become a popular choice. Its advantage is the availability of raw data in PCAP files as well as flow-based features in CSV files.

In this paper, we perform a detailed analysis of this dataset and report several problems discovered in the flows extracted from the network packets. To overcome these problems, we propose a new feature extraction tool named LycoSTand. In addition, we propose a feature selection that considers correlations and feature importance. The performance comparison between the original and the new dataset shows significant improvements for all evaluated machine learning algorithms.

Building on the improvements obtained for CIC-IDS2017, we also examine other datasets affected by the same issues, to which LycoSTand can be applied to produce improved datasets for network intrusion detection.
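The abstract mentions a feature selection that considers correlations and feature importance, but it does not say how the two are combined. The following is a minimal sketch of one common approach, not the authors' procedure: when two features are highly correlated, keep only the one with the higher importance score. The function names, the 0.95 threshold, the toy flow values, and the importance scores are all assumptions made for illustration; in practice the scores might come from a trained model such as a random forest.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(columns, importance, threshold=0.95):
    """Drop the less important feature of every highly correlated pair.

    columns: dict mapping feature name -> list of values
    importance: dict mapping feature name -> importance score (assumed given)
    threshold: absolute correlation above which two features count as redundant
    """
    # Visit features from most to least important so the stronger one is kept.
    ranked = sorted(columns, key=lambda name: importance[name], reverse=True)
    kept = []
    for name in ranked:
        redundant = any(
            abs(pearson(columns[name], columns[other])) > threshold
            for other in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Toy flow features (made-up values, not taken from the dataset).
flows = {
    "pkt_count": [2, 4, 6, 8, 10],
    "byte_count": [200, 400, 600, 800, 1000],  # perfectly correlated with pkt_count
    "duration": [5.0, 1.0, 4.0, 2.0, 3.0],
}
# Hypothetical importance scores, e.g. as a tree-based model might report them.
scores = {"pkt_count": 0.5, "byte_count": 0.3, "duration": 0.2}

print(select_features(flows, scores))
```

In this toy example `byte_count` is an exact multiple of `pkt_count`, so it is treated as redundant and dropped, while `duration` is only weakly correlated with `pkt_count` and is kept.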

Published In

WI-IAT '21: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology

December 2021

698 pages

ISBN:9781450391153

DOI:10.1145/3486622

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGAI: ACM Special Interest Group on Artificial Intelligence

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper
Research
Refereed limited

Conference

WI-IAT '21

December 14 - 17, 2021

Melbourne, VIC, Australia
