Automatic Detection of Idiomatic Clauses

Feldman, Anna; Peng, Jing

doi:10.1007/978-3-642-37247-6_35


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7816)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics


Abstract

We describe several experiments that aim to automatically identify idiomatic expressions in written text. We explore two approaches to the task: (1) idiom recognition as outlier detection and (2) supervised classification of sentences. For outlier detection, we apply principal component analysis. Because detecting idioms as lexical outliers does not exploit class label information, the subsequent experiments use linear discriminant analysis to obtain a discriminant subspace and then apply a three-nearest-neighbor classifier to measure classification accuracy. We discuss the pros and cons of each approach. All of the approaches are more general than previous algorithms for idiom detection: they neither rely on target idiom types, lexicons, or large manually annotated corpora, nor limit the search space to a particular type of linguistic construction.
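
The two approaches named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the feature representation, so the toy numeric vectors below stand in for sentence features, and the function names, the use of reconstruction error as the outlier score, and the small ridge term are all assumptions made for illustration.

```python
import numpy as np

def pca_outlier_scores(X, n_components):
    """Score each row by its reconstruction error after projecting onto
    the top principal components (larger score = more outlying)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components].T
    residual = Xc - Xc @ W @ W.T
    return np.linalg.norm(residual, axis=1)

def fisher_direction(X, y):
    """Two-class linear discriminant direction, proportional to Sw^{-1}(m1 - m0)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Assumption: a small ridge term keeps the within-class scatter invertible.
    Sw = Sw + 1e-6 * np.eye(X.shape[1])
    return np.linalg.solve(Sw, m1 - m0)

def knn_predict(train_z, train_y, test_z, k=3):
    """Majority vote among the k nearest training points in the 1-D discriminant space."""
    preds = []
    for z in test_z:
        nearest = np.argsort(np.abs(train_z - z))[:k]
        preds.append(int(2 * train_y[nearest].sum() > k))
    return np.array(preds)

# Approach 1 (unsupervised): ten points on a line plus one point far off it.
X_lex = np.array([[t, 2.0 * t, 0.0] for t in range(10)] + [[5.0, 10.0, 8.0]])
scores = pca_outlier_scores(X_lex, n_components=1)
print("most outlying row:", int(np.argmax(scores)))

# Approach 2 (supervised): project onto the discriminant direction, then 3-NN.
X_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                    [4, 4], [4, 5], [5, 4], [5, 5]], dtype=float)
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X_test = np.array([[0.5, 0.2], [4.5, 4.8]])
w = fisher_direction(X_train, y_train)
y_pred = knn_predict(X_train @ w, y_train, X_test @ w, k=3)
print("predicted labels:", y_pred.tolist())
```

The first part uses no labels at all, which mirrors the abstract's remark that outlier detection does not exploit class label information; the second part uses the labels only to choose the projection before the nearest-neighbor vote.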


Author information

Authors and Affiliations

Department of Computer Science, Montclair State University, Montclair, NJ, 07043, USA
Anna Feldman & Jing Peng
Department of Linguistics, Montclair State University, Montclair, NJ, 07043, USA
Anna Feldman

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico D.F., Mexico
Alexander Gelbukh

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Feldman, A., Peng, J. (2013). Automatic Detection of Idiomatic Clauses. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37247-6_35

DOI: https://doi.org/10.1007/978-3-642-37247-6_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37246-9
Online ISBN: 978-3-642-37247-6
eBook Packages: Computer Science, Computer Science (R0)
