DOI: 10.1145/2184512.2184568
research-article

Identifying features to improve real time clustering and domain blacklisting

Published: 29 March 2012

Abstract

Feature analysis is an important task in information extraction: appropriate features improve the performance of classification and clustering algorithms. In this paper we analyze different features that can be used to cluster spam emails in real time and thus improve IP blacklisting. These features also make domain blacklisting easier, because large numbers of IP addresses are grouped together. We explore several features, including the sender and subject of the email, email attachments, and stylistic and semantic features. These features yield appropriate clustering of spam originating from dominant hosts. We evaluate the effectiveness of these features in terms of how well they group emails and gather domain/IP information, and thus how much they improve domain blacklisting.
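The core idea, grouping spam by shared features so that the many IPs behind one campaign land in the same cluster and can be blacklisted together, can be illustrated with a small sketch. This is not the authors' method: it assumes a hypothetical, simplified `SpamEmail` record and hypothetical helpers (`normalize_subject`, `feature_keys`, `cluster`, `blacklist_candidates`). It also substitutes a simple technique: emails are linked with union-find whenever they share an exact sender, normalized subject, or attachment hash. The paper's stylistic and semantic features are not modeled here.

```python
import hashlib
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SpamEmail:
    # Hypothetical, simplified record; a real system would parse raw messages.
    msg_id: str
    sender: str
    subject: str
    source_ip: str
    attachment: bytes = b""
    domains: list = field(default_factory=list)  # domains linked in the body


def normalize_subject(subject):
    # Lowercase, drop digits (often randomized per message), collapse whitespace.
    s = re.sub(r"\d+", "", subject.lower())
    return re.sub(r"\s+", " ", s).strip()


def feature_keys(email):
    # Each key is one feature value; emails sharing any key are linked.
    keys = [("sender", email.sender.lower())]
    subject = normalize_subject(email.subject)
    if subject:
        keys.append(("subject", subject))
    if email.attachment:
        # SHA-256 is used only as a content fingerprint of the attachment.
        keys.append(("attachment", hashlib.sha256(email.attachment).hexdigest()))
    return keys


def cluster(emails):
    # Union-find over email indices: shared feature values merge clusters.
    parent = list(range(len(emails)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}
    for idx, email in enumerate(emails):
        for key in feature_keys(email):
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx, email in enumerate(emails):
        groups[find(idx)].append(email)
    return list(groups.values())


def blacklist_candidates(clusters, min_size=2):
    # For every sufficiently large cluster, gather its source IPs and linked domains.
    result = []
    for members in clusters:
        if len(members) >= min_size:
            ips = sorted({m.source_ip for m in members})
            domains = sorted({d for m in members for d in m.domains})
            result.append((ips, domains))
    return result


if __name__ == "__main__":
    emails = [
        SpamEmail("1", "promo@a.example", "Win 1000 dollars now", "192.0.2.1",
                  domains=["pills.example"]),
        SpamEmail("2", "offers@b.example", "Win 2500 dollars now", "198.51.100.7",
                  domains=["pills.example"]),
        SpamEmail("3", "news@c.example", "Meeting notes", "203.0.113.5"),
    ]
    print(blacklist_candidates(cluster(emails)))
    # prints [(['192.0.2.1', '198.51.100.7'], ['pills.example'])]
```

Because union-find links transitively, a campaign that rotates senders but reuses a subject template still collapses into one group, exposing all of its sending IPs at once. The trade-off is that a single shared key can over-merge unrelated mail, so a practical system would weigh features rather than treat every exact match as a link.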

Published In

ACMSE '12: Proceedings of the 50th annual ACM Southeast Conference
March 2012
424 pages
ISBN:9781450312035
DOI:10.1145/2184512
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ACM SE '12: ACM Southeast Regional Conference
March 29-31, 2012
Tuscaloosa, Alabama

Acceptance Rates

ACMSE '12 Paper Acceptance Rate 28 of 56 submissions, 50%;
Overall Acceptance Rate 502 of 1,023 submissions, 49%

Article Metrics

  • Total citations: 0
  • Total downloads: 127
  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 0
Reflects downloads up to 16 Oct 2024.
