research-article

A comparative study of parametric coding and wavelet coding based feature extraction techniques in recognizing spoken words

Authors:

S. David Peter,

K. Poulose Jacob

CUBE '12: Proceedings of the CUBE International Information Technology Conference

Pages 326 - 331

https://doi.org/10.1145/2381716.2381777

Published: 03 September 2012

Abstract

Speech recognition is a fascinating application of digital signal processing that offers unparalleled opportunities. This paper presents a comparative study of three feature extraction techniques, Linear Predictive Coding (LPC), Discrete Wavelet Transform (DWT) and Wavelet Packet Decomposition (WPD), for recognizing speaker-independent isolated spoken words. Voice signals are sampled directly from the microphone and then processed with each of the three techniques to extract features. Words from Malayalam, one of the four major Dravidian languages of southern India, are chosen for recognition. Training, testing and pattern recognition are performed using Artificial Neural Networks (ANN). The work covers three speech recognition methods: the first is a hybrid of LPC and ANN, the second combines DWT and ANN, and the third combines WPD and ANN. The ANN is trained with the back-propagation method. The methods are implemented for 50 speakers uttering 20 isolated words each. All three methods produce good recognition accuracy: the LPC-based method achieved 81.20%, DWT 90% and WPD 87.50%. Wavelet-based methods are thus found to be more suitable for recognizing speech because of their multi-resolution characteristics and efficient time-frequency localization. Moreover, wavelet methods are better able to model the details of unvoiced sounds.
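The three feature extractors compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the LPC order, the wavelet family, the decomposition depth, or how coefficients are turned into ANN inputs, so the Haar wavelet, the per-band energy features, and all function names below (`lpc`, `haar_step`, `dwt`, `wpd`, `band_energies`) are assumptions chosen for brevity. LPC is computed here with the standard autocorrelation method and Levinson-Durbin recursion.

```python
import math

def lpc(frame, order):
    """LPC coefficients a_1..a_p (autocorrelation method, Levinson-Durbin).

    Convention: x[n] is predicted as -(a_1*x[n-1] + ... + a_p*x[n-p]).
    """
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k))
         for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err == 0:
            break  # silent frame: nothing left to predict
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def haar_step(x):
    """One level of Haar analysis: (approximation, detail), each half length."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def dwt(x, levels):
    """DWT: only the approximation branch is split again at each level.

    Returns [cA_L, cD_L, ..., cD_1], i.e. levels + 1 sub-bands.
    """
    bands = []
    approx = list(x)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.insert(0, detail)
    bands.insert(0, approx)
    return bands

def wpd(x, levels):
    """WPD: both branches are split at every level, giving 2**levels leaves."""
    nodes = [list(x)]
    for _ in range(levels):
        nodes = [band for node in nodes for band in haar_step(node)]
    return nodes

def band_energies(bands):
    """Assumed feature vector: one energy value per sub-band."""
    return [sum(c * c for c in band) for band in bands]

# Usage on toy data (a real system would process frames of sampled speech).
decaying = [0.9 ** i for i in range(200)]      # x[n] = 0.9 * x[n-1]
lpc_features = lpc(decaying, 1)                # close to [-0.9]

signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
dwt_features = band_energies(dwt(signal, 3))   # 4 sub-bands
wpd_features = band_energies(wpd(signal, 3))   # 8 sub-bands
```

The difference between `dwt` and `wpd` is the point of the comparison: the DWT refines only the low-frequency branch, while the WPD also splits the detail branches and so yields a finer partition of the high-frequency range. In a pipeline like the one the abstract describes, feature vectors of this kind would be fed to a neural network classifier trained with back-propagation.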

Cited By

Gupta H., Gupta D. (2016). LPC and LPCC method of feature extraction in Speech Recognition System. 2016 6th International Conference - Cloud System and Big Data Engineering (Confluence), 498-502. Online publication date: Jan-2016. https://doi.org/10.1109/CONFLUENCE.2016.7508171

Varshney P., Bansal A., Farooq O. (2014). Phoneme confusability reduction by using visual information in noisy environment. 2014 International Conference on Signal Propagation and Computer Technology (ICSPCT 2014), 476-481. Online publication date: Jul-2014. https://doi.org/10.1109/ICSPCT.2014.6884883

Index Terms

A comparative study of parametric coding and wavelet coding based feature extraction techniques in recognizing spoken words
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Speech recognition



Published In


CUBE '12: Proceedings of the CUBE International Information Technology Conference

September 2012

879 pages

ISBN:9781450311854

DOI:10.1145/2381716

General Chair: Vidyasagar Potdar, Curtin University, Australia

Program Chair: Debajyoti Mukhopadhyay, Maharashtra Institute of Technology, India

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

CUOT: Curtin University of Technology

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

CUBE '12

Sponsor:

CUOT

CUBE '12: CUBE International IT Conference & Exhibition

September 3 - 5, 2012

Pune, India
