Article

Exploiting machine learning to subvert your spam filter

Published: 15 April 2008

Abstract

Using statistical machine learning to make security decisions introduces new vulnerabilities in large-scale systems. This paper shows how an adversary can exploit statistical machine learning, as used in the SpamBayes spam filter, to render it useless, even if the adversary's access is limited to only 1% of the training messages. We further demonstrate a new class of focused attacks that successfully prevent victims from receiving specific email messages. Finally, we introduce two new types of defenses against these attacks.
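
To make the poisoning idea concrete, the sketch below simulates a dictionary-style attack against a toy Bayesian-style spam filter. It is a hypothetical illustration, not the paper's experimental setup or SpamBayes itself: the smoothed frequency-ratio score stands in for SpamBayes' chi-squared combination of token probabilities, and the vocabulary size, message counts, and 1% poisoning fraction are invented for demonstration.

import random
from collections import Counter

random.seed(0)


def train(messages):
    """Count, for each label, how many training messages contain each token."""
    spam_counts, ham_counts = Counter(), Counter()
    n_spam = n_ham = 0
    for tokens, is_spam in messages:
        if is_spam:
            n_spam += 1
            spam_counts.update(tokens)
        else:
            n_ham += 1
            ham_counts.update(tokens)
    return spam_counts, ham_counts, n_spam, n_ham


def spam_score(tokens, model):
    # Toy score in [0, 1]: average over tokens of a smoothed
    # "seen in spam" vs. "seen in ham" frequency ratio. SpamBayes
    # actually combines per-token probabilities with a chi-squared
    # test; this stand-in only mimics the qualitative behaviour.
    spam_counts, ham_counts, n_spam, n_ham = model
    total = 0.0
    for t in tokens:
        p_spam = (spam_counts[t] + 1) / (n_spam + 2)
        p_ham = (ham_counts[t] + 1) / (n_ham + 2)
        total += p_spam / (p_spam + p_ham)
    return total / max(len(tokens), 1)


# Hypothetical corpus: 100 ham messages drawn from a 500-word vocabulary
# and 100 obvious spam messages.
vocab = [f"word{i}" for i in range(500)]
ham = [(set(random.sample(vocab, 10)), False) for _ in range(100)]
spam = [({"viagra", "winner", "lottery", "cheap"}, True) for _ in range(100)]
clean = ham + spam

# Dictionary attack: about 1% of the training data (2 of 202 messages)
# is attacker email, labelled spam, containing the entire vocabulary,
# so ordinary words start to look spammy.
poison = [(set(vocab), True)] * 2

victim_mail = set(random.sample(vocab, 10))  # an ordinary future ham message

for name, training in [("clean", clean), ("poisoned", clean + poison)]:
    model = train(training)
    print(f"{name:8s} spam score for legitimate mail: "
          f"{spam_score(victim_mail, model):.3f}")

In this toy run the legitimate message's score roughly doubles once the attacker's two dictionary messages enter the training set as spam. The focused variant described in the abstract works the same way, except that the attack messages contain only the tokens the adversary expects to appear in the specific target email.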


Published In

LEET'08: Proceedings of the 1st USENIX Workshop on Large-Scale Exploits and Emergent Threats
April 2008
96 pages

Publisher

USENIX Association

United States

Cited By

  • (2024) Machine Learning Security Against Data Poisoning: Are We There Yet? Computer 57(3), 26-34. DOI: 10.1109/MC.2023.3299572. Online publication date: 1-Mar-2024.
  • (2024) An Overview of Techniques for Obfuscated Android Malware Detection. SN Computer Science 5(4). DOI: 10.1007/s42979-024-02637-3. Online publication date: 16-Mar-2024.
  • (2023) What distributions are robust to indiscriminate poisoning attacks for linear learners? Proceedings of the 37th International Conference on Neural Information Processing Systems, 34942-34980. DOI: 10.5555/3666122.3667641. Online publication date: 10-Dec-2023.
  • (2023) Uncovering adversarial risks of test-time adaptation. Proceedings of the 40th International Conference on Machine Learning, 37456-37495. DOI: 10.5555/3618408.3619967. Online publication date: 23-Jul-2023.
  • (2023) Poisoning Network Flow Classifiers. Proceedings of the 39th Annual Computer Security Applications Conference, 337-351. DOI: 10.1145/3627106.3627123. Online publication date: 4-Dec-2023.
  • (2023) Classification Auto-Encoder Based Detector Against Diverse Data Poisoning Attacks. Data and Applications Security and Privacy XXXVII, 263-281. DOI: 10.1007/978-3-031-37586-6_16. Online publication date: 19-Jul-2023.
  • (2022) Robust Learning with Adversarial Perturbations and Label Noise. Proceedings of the 4th ACM International Conference on Multimedia in Asia, 1-7. DOI: 10.1145/3551626.3564934. Online publication date: 13-Dec-2022.
  • (2022) Practical Attacks on Machine Learning: A Case Study on Adversarial Windows Malware. IEEE Security and Privacy 20(5), 77-85. DOI: 10.1109/MSEC.2022.3182356. Online publication date: 1-Sep-2022.
  • (2022) Poisoning Attacks Against Machine Learning: Can Machine Learning Be Trustworthy? Computer 55(11), 94-99. DOI: 10.1109/MC.2022.3190787. Online publication date: 1-Nov-2022.
  • (2022) Stronger data poisoning attacks break data sanitization defenses. Machine Learning 111(1), 1-47. DOI: 10.1007/s10994-021-06119-y. Online publication date: 1-Jan-2022.
