Deep neural networks for small footprint text-dependent speaker verification

E Variani, X Lei, E McDermott, IL Moreno… - … on acoustics, speech …, 2014 - ieeexplore.ieee.org
2014 IEEE International Conference on Acoustics, Speech and Signal …, 2014 - ieeexplore.ieee.org
In this paper we investigate the use of deep neural networks (DNNs) for a small footprint text-dependent speaker verification task. In the development stage, a DNN is trained to classify speakers at the frame level. During speaker enrollment, the trained DNN is used to extract speaker-specific features from the last hidden layer. The average of these speaker features, or d-vector, is taken as the speaker model. In the evaluation stage, a d-vector is extracted for each utterance and compared to the enrolled speaker model to make a verification decision. Experimental results show that the DNN-based speaker verification system achieves good performance compared to a popular i-vector system on a small footprint text-dependent speaker verification task. In addition, the DNN-based system is more robust to additive noise and outperforms the i-vector system at low false rejection operating points. Finally, the combined DNN and i-vector system reduces the equal error rate (EER) of the i-vector system by 14% and 25% relative for clean and noisy conditions, respectively.
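The enrollment and verification steps described in the abstract can be sketched as follows. This is a minimal, hypothetical Python sketch, not the authors' implementation. `TinySpeakerDNN` uses random, untrained weights and ReLU units in place of a DNN trained on development speakers. The abstract does not specify the per-frame L2 normalization, the averaging of per-utterance d-vectors at enrollment, cosine scoring, or the threshold-sweep `equal_error_rate` helper, so all of these are assumptions. Every name in the code is illustrative.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


class TinySpeakerDNN:
    """Hypothetical stand-in for the hidden layers of a frame-level speaker DNN.

    In the described system the DNN is trained to classify development-set
    speakers; afterwards only the layers up to the last hidden layer are used.
    Here the weights are random (untrained) and every layer uses ReLU.
    """

    def __init__(self, layer_sizes: Sequence[int], seed: int = 0) -> None:
        rng = random.Random(seed)
        self.layers = []  # list of (weights, biases); weights[j][i] maps input i to unit j
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
            weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                       for _ in range(n_out)]
            self.layers.append((weights, [0.0] * n_out))

    def last_hidden(self, frame: Vector) -> Vector:
        """Forward one feature frame and return the last hidden layer's activations."""
        h = frame
        for weights, biases in self.layers:
            h = [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
                 for row, b in zip(weights, biases)]
        return h


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def d_vector(dnn: TinySpeakerDNN, frames: Sequence[Vector]) -> Vector:
    """d-vector: average of the (L2-normalized) last-hidden-layer activations."""
    return mean_vector([l2_normalize(dnn.last_hidden(f)) for f in frames])


def enroll(dnn: TinySpeakerDNN, utterances: Sequence[Sequence[Vector]]) -> Vector:
    """Speaker model: average of the enrollment utterances' d-vectors (assumed)."""
    return mean_vector([d_vector(dnn, u) for u in utterances])


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def verify(dnn: TinySpeakerDNN, model: Vector, frames: Sequence[Vector],
           threshold: float) -> bool:
    """Accept the claimed speaker if the cosine score reaches the threshold."""
    return cosine(model, d_vector(dnn, frames)) >= threshold


def equal_error_rate(target: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate EER: sweep thresholds, pick where FRR and FAR are closest."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(target) | set(impostor)):
        frr = sum(s < t for s in target) / len(target)  # true speakers rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Synthetic 8-dim frames; the resulting score carries no meaning.
    gen = random.Random(1)

    def utterance(n: int = 40) -> List[Vector]:
        return [[gen.gauss(0.0, 1.0) for _ in range(8)] for _ in range(n)]

    dnn = TinySpeakerDNN([8, 16, 16])
    model = enroll(dnn, [utterance() for _ in range(3)])
    print("score:", cosine(model, d_vector(dnn, utterance())))
```

Because cosine similarity ignores vector length, re-normalizing the averaged d-vector would not change any score. The decision threshold passed to `verify` would typically be tuned on held-out trials, for example near the EER operating point.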