DOI: 10.1109/SP.2012.46
Article

On the Feasibility of Internet-Scale Author Identification

Published: 20 May 2012

Abstract

We study techniques for identifying an anonymous author via linguistic stylometry, i.e., comparing the writing style against a corpus of texts of known authorship. We experimentally demonstrate the effectiveness of our techniques with as many as 100,000 candidate authors. Given the increasing availability of writing samples online, our result has serious implications for anonymity and free speech: an anonymous blogger or whistleblower may be unmasked unless they take steps to obfuscate their writing style. While there is a huge body of literature on authorship recognition based on writing style, almost none of it has studied corpora of more than a few hundred authors. The problem becomes qualitatively different at a large scale, as we show, and techniques from prior work fail to scale in terms of both accuracy and performance. We study a variety of classifiers, both "lazy" and "eager," and show how to handle the huge number of classes. We also develop novel techniques for confidence estimation of classifier outputs. Finally, we demonstrate stylometric authorship recognition on texts written in different contexts. In over 20% of cases, our classifiers can correctly identify an anonymous author given a corpus of texts from 100,000 authors; in about 35% of cases the correct author is one of the top 20 guesses. If we allow the classifier the option of not making a guess, confidence estimation lets us increase the precision of the top guess from 20% to over 80% with only a halving of recall.
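
The trade-off described in the abstract, where a classifier that may decline to guess gains precision at the cost of recall, can be illustrated with a minimal sketch. This is not the paper's feature set, classifiers, or confidence estimator: the ten-word `FUNCTION_WORDS` list, the cosine-similarity nearest-neighbor match, the score-gap confidence rule, and the `min_gap` threshold are all assumptions chosen only to keep the example small.

```python
import math
from collections import Counter

# Hypothetical, tiny feature set: relative frequencies of a few function words.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "it", "but")


def features(text):
    """Return a unit-length vector of function-word frequencies for a text."""
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    vec = [counts[w] / total for w in FUNCTION_WORDS]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid division by zero
    return [x / norm for x in vec]


def cosine(u, v):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(a * b for a, b in zip(u, v))


def identify(anonymous_text, known, min_gap=0.05):
    """Guess the author, or return None when the top two scores are too close.

    known maps author name -> sample text. The gap between the best and
    second-best similarity serves as an assumed, simplistic confidence score.
    """
    query = features(anonymous_text)
    scored = sorted(
        ((cosine(query, features(text)), author) for author, text in known.items()),
        reverse=True,
    )
    best_score, best_author = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    gap = best_score - runner_up
    return (best_author if gap >= min_gap else None), gap


def precision_recall(predictions, truths):
    """Precision over the guesses made; recall over all texts (None = abstained)."""
    guessed = [(p, t) for p, t in zip(predictions, truths) if p is not None]
    correct = sum(p == t for p, t in guessed)
    precision = correct / len(guessed) if guessed else 0.0
    recall = correct / len(truths) if truths else 0.0
    return precision, recall
```

Raising `min_gap` makes the sketch abstain more often, so fewer guesses are made overall: precision is computed only over the guesses that remain, while recall is still measured against every text.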

Published In

SP '12: Proceedings of the 2012 IEEE Symposium on Security and Privacy
May 2012
600 pages
ISBN:9780769546810

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Cited By

  • (2024) Re-pseudonymization Strategies for Smart Meter Data Are Not Robust to Deep Learning Profiling Attacks. Proceedings of the Fourteenth ACM Conference on Data and Application Security and Privacy, pp. 295-306. DOI: 10.1145/3626232.3653272. Online publication date: 19-Jun-2024
  • (2022) PRSONA. Proceedings of the 21st Workshop on Privacy in the Electronic Society, pp. 55-68. DOI: 10.1145/3559613.3563197. Online publication date: 7-Nov-2022
  • (2021) Exploiting Two-Level Information Entropy across Social Networks for User Identification. Wireless Communications & Mobile Computing, vol. 2021. DOI: 10.1155/2021/1082391. Online publication date: 1-Jan-2021
  • (2021) Similarity ranking using handcrafted stylometric traits in a Swedish context. Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 635-642. DOI: 10.1145/3487351.3492719. Online publication date: 8-Nov-2021
  • (2021) UrduAI: Writeprints for Urdu Authorship Identification. ACM Transactions on Asian and Low-Resource Language Information Processing 21:2, pp. 1-18. DOI: 10.1145/3476467. Online publication date: 31-Oct-2021
  • (2021) Pivoting Image-based Profiles Toward Privacy: Inhibiting Malicious Profiling with Adversarial Additions. Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization, pp. 267-273. DOI: 10.1145/3450613.3456832. Online publication date: 21-Jun-2021
  • (2020) StyloThai: ACM Transactions on Asian and Low-Resource Language Information Processing 19:3, pp. 1-15. DOI: 10.1145/3365832. Online publication date: 9-Jan-2020
  • (2020) 'Uh-oh Spaghetti-oh': When Successful Genetic and Evolutionary Feature Selection Makes You More Susceptible to Adversarial Authorship Attacks. 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 567-571. DOI: 10.1109/SMC42975.2020.9283352. Online publication date: 11-Oct-2020
  • (2020) Author Identification of Micro-Messages via Multi-Channel Convolutional Neural Networks. 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 675-681. DOI: 10.1109/SMC42975.2020.9283214. Online publication date: 11-Oct-2020
  • (2019) Robust website fingerprinting through the cache occupancy channel. Proceedings of the 28th USENIX Conference on Security Symposium, pp. 639-656. DOI: 10.5555/3361338.3361383. Online publication date: 14-Aug-2019