Harmonic-Based Robust Voice Activity Detection for Enhanced Low SNR Noisy Speech Recognition System

Po-Chuan LIN
Jhing-Fa WANG

IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences Vol.E99-A No.11 pp.1928-1936
Publication Date: 2016/11/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.E99.A.1928
Type of Manuscript: Special Section PAPER (Special Section on Smart Multimedia & Communication Systems)
Category: Speech and Hearing
Keywords: robust voice activity detection, harmonic spectral local peak, low SNR noisy speech recognition

This paper describes a novel harmonic-based robust voice activity detection (H-RVAD) method that uses a harmonic spectral local peak (HSLP) feature. HSLP is extracted by analyzing the spectral amplitude between adjacent formants, and this characteristic can be used to accurately identify and verify audio streams that contain meaningful human speech in low SNR environments. An enhanced low SNR noisy speech recognition framework comprising a wakeup module, a speech recognition module, and a confirmation module is also proposed. When the framework returns a recognition result, users can confirm or reject the system feedback, so that voiced noise does not mislead the recognition result. The H-RVAD method is evaluated on the AURORA2 corpus with eight types of noise at three SNR levels and increases the overall average performance from 4% to 20%. In home noise, the H-RVAD method improves the sentence recognition rate from 4% to 14% on average.
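
The abstract does not spell out how HSLP is computed, so the following Python sketch is only an illustration of the general idea of counting harmonically spaced local peaks in a frame's spectrum. It is not the authors' H-RVAD algorithm: the function names, the 80-400 Hz fundamental range, the 10% relative peak threshold, the one-bin tolerance, and the three-harmonic decision rule are all assumptions chosen for the example.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Hann-windowed magnitude spectrum via a naive DFT (adequate for short frames)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]


def local_peaks(mag, rel_threshold=0.1):
    """Bins that exceed both neighbours and a fraction of the global maximum."""
    top = max(mag)
    if top <= 0.0:
        return set()
    floor = rel_threshold * top  # assumed threshold, not taken from the paper
    return {k for k in range(1, len(mag) - 1)
            if mag[k] > floor and mag[k] > mag[k - 1] and mag[k] > mag[k + 1]}


def harmonic_peak_count(frame, sample_rate, f0_range=(80.0, 400.0), tol_bins=1):
    """Largest number of harmonics of any candidate f0 that coincide with a local peak."""
    mag = magnitude_spectrum(frame)
    peaks = local_peaks(mag)
    hz_per_bin = sample_rate / len(frame)
    # Keep harmonic windows disjoint so one peak cannot match two harmonics.
    lo = max(2 * tol_bins + 1, math.ceil(f0_range[0] / hz_per_bin))
    hi = math.floor(f0_range[1] / hz_per_bin)
    best = 0
    for f0_bin in range(lo, hi + 1):
        count = 0
        for h in range(f0_bin, len(mag), f0_bin):
            if any((h + d) in peaks for d in range(-tol_bins, tol_bins + 1)):
                count += 1
        best = max(best, count)
    return best


def is_speech_like(frame, sample_rate, min_harmonics=3):
    """Hypothetical decision rule: enough harmonic peaks suggests voiced speech."""
    return harmonic_peak_count(frame, sample_rate) >= min_harmonics
```

Under these assumptions, a frame built from several harmonics of one fundamental yields a high count, while a single pure tone or a silent frame does not reach the three-harmonic threshold. A practical detector would replace the naive DFT with an FFT and would need thresholds tuned on noisy data.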
