Resources for Turkish morphological processing

Sak, Haşim; Güngör, Tunga; Saraçlar, Murat

doi:10.1007/s10579-010-9128-6

Published: 10 August 2010

Language Resources and Evaluation, Volume 45, pages 249–261 (2011)

Haşim Sak¹,
Tunga Güngör¹ &
Murat Saraçlar²

Abstract

We present a set of language resources and tools—a morphological parser, a morphological disambiguator, and a text corpus—for exploiting Turkish morphology in natural language processing applications. The morphological parser is a state-of-the-art finite-state transducer-based implementation of Turkish morphology. The disambiguator is based on the averaged perceptron algorithm and has the best accuracy reported for Turkish in the literature. The text corpus has been compiled from the web and contains about 500 million tokens. This is the largest Turkish web corpus published.
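The abstract names the averaged perceptron as the learning algorithm behind the disambiguator. As a rough illustration of that general technique only, and not the authors' implementation, the sketch below picks one analysis out of a candidate list and averages the weight vector over all training steps. The feature extractor, the tag strings, and the toy data are all hypothetical and chosen purely for readability.

```python
from collections import defaultdict


def features(analysis):
    """Hypothetical feature extractor: each '+'-separated tag is one feature."""
    return analysis.split("+")


def score(weights, analysis):
    """Sum the weights of the features that fire for this analysis."""
    return sum(weights[f] for f in features(analysis))


def train_averaged_perceptron(data, epochs=5):
    """Train on (candidates, gold_index) pairs and return averaged weights."""
    weights = defaultdict(float)  # current weight vector
    totals = defaultdict(float)   # running sum of the weight vector per step
    steps = 0
    for _ in range(epochs):
        for candidates, gold in data:
            steps += 1
            predicted = max(range(len(candidates)),
                            key=lambda i: score(weights, candidates[i]))
            if predicted != gold:
                # Standard perceptron update: reward gold, penalise the guess.
                for f in features(candidates[gold]):
                    weights[f] += 1.0
                for f in features(candidates[predicted]):
                    weights[f] -= 1.0
            # Naive averaging: add the full vector after every step.
            for f, w in weights.items():
                totals[f] += w
    return {f: total / steps for f, total in totals.items()}


def predict(avg_weights, candidates):
    """Return the index of the highest-scoring candidate analysis."""
    return max(range(len(candidates)),
               key=lambda i: sum(avg_weights.get(f, 0.0)
                                 for f in features(candidates[i])))


# Toy data (made up): each item is (candidate analyses, index of the gold one).
toy_data = [
    (["Noun+Nom", "Verb+Imp"], 0),
    (["Verb+Imp", "Noun+Acc"], 1),
    (["Adj", "Noun+Nom"], 1),
]
avg = train_averaged_perceptron(toy_data)
predictions = [predict(avg, candidates) for candidates, _ in toy_data]
```

Averaging the weights over all steps, rather than keeping only the final vector, is what distinguishes the averaged perceptron from the plain one and generally makes it less sensitive to the order of the training examples. The per-step summation here is written for clarity; practical implementations usually track the averages lazily.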

Notes

Personal communication.
All resources are available at http://www.cmpe.boun.edu.tr/~hasim.

Acknowledgments

This work was supported by the Boğaziçi University Research Fund under grant numbers 06A102 and 08M103, the Scientific and Technological Research Council of Turkey (TÜBİTAK) under grant number 107E261, and the Turkish State Planning Organization (DPT) under TAM Project number 2007K120610. Murat Saraçlar is supported by the TUBA-GEBIP award. Haşim Sak is supported by TÜBİTAK BİDEB 2211. The authors would like to thank Kemal Oflazer and Deniz Yüret for the disambiguation data set.

Author information

Authors and Affiliations

Department of Computer Engineering, Boğaziçi University, Bebek, 34342, Istanbul, Turkey
Haşim Sak & Tunga Güngör
Department of Electrical & Electronic Engineering, Boğaziçi University, Bebek, 34342, Istanbul, Turkey
Murat Saraçlar

Corresponding author

Correspondence to Haşim Sak.

About this article

Cite this article

Sak, H., Güngör, T. & Saraçlar, M. Resources for Turkish morphological processing. Lang Resources & Evaluation 45, 249–261 (2011). https://doi.org/10.1007/s10579-010-9128-6

Issue Date: May 2011
