The Locally Weighted Bag of Words Framework for Document Representation

Published: 01 December 2007

Abstract

The popular bag of words assumption represents a document as a histogram of word occurrences. While computationally efficient, such a representation cannot retain any sequential information. We present an effective sequential document representation that goes beyond the bag of words representation and its n-gram extensions. This representation uses local smoothing to embed documents as smooth curves in the multinomial simplex, thereby preserving valuable sequential information. In contrast to bag of words or n-grams, the new representation robustly captures medium- and long-range sequential trends in the document. We discuss the representation and its geometric properties and demonstrate its applicability to various text processing tasks.
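
The idea of local smoothing can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `lowbow_curve`, the Gaussian kernel, the bandwidth `sigma`, the additive `smoothing` constant, and the evenly spaced sample locations are all assumptions chosen for illustration. At each location `mu` in [0, 1] the sketch builds a word histogram in which nearby tokens count more than distant ones, so the document becomes a sequence of points on the simplex rather than a single histogram.

```python
import math


def lowbow_curve(tokens, vocab, num_points=5, sigma=0.1, smoothing=0.01):
    """Return one locally weighted word histogram per sampled location.

    Illustrative assumptions: Gaussian kernel, token j placed at the
    midpoint of its slot in [0, 1], additive smoothing of each histogram.
    Every token must appear in vocab, otherwise a KeyError is raised.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    index = {word: i for i, word in enumerate(vocab)}
    n = len(tokens)
    # Normalized position of each token within the document.
    positions = [(j + 0.5) / n for j in range(n)]

    curve = []
    for k in range(num_points):
        # Sample locations evenly across [0, 1].
        mu = k / (num_points - 1) if num_points > 1 else 0.5
        # Kernel weight of every token relative to the location mu.
        weights = [math.exp(-0.5 * ((p - mu) / sigma) ** 2) for p in positions]
        total = sum(weights)
        # Start from a small constant so every word has nonzero mass.
        hist = [smoothing] * len(vocab)
        for weight, token in zip(weights, tokens):
            hist[index[token]] += weight / total
        # Renormalize so the histogram is a point on the simplex.
        norm = sum(hist)
        curve.append([h / norm for h in hist])
    return curve


if __name__ == "__main__":
    doc = "a a a b b b".split()
    for point in lowbow_curve(doc, ["a", "b"]):
        print([round(p, 3) for p in point])
```

For the toy document above, a plain bag of words gives equal mass to both words, whereas the sampled histograms favor `a` near the start and `b` near the end. A very large `sigma` flattens the curve toward the ordinary bag of words histogram; a very small `sigma` can make all kernel weights underflow to zero, which this sketch does not guard against.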

Publisher

JMLR.org

Publication History

Published: 01 December 2007
Published in JMLR Volume 8

Qualifiers

  • Article
