
Showing 1–5 of 5 results for author: Swaminathan, R V

Searching in archive cs.
  1. Accelerator-Aware Training for Transducer-Based Speech Recognition

    Authors: Suhaila M. Shakiah, Rupak Vignesh Swaminathan, Hieu Duy Nguyen, Raviteja Chinta, Tariq Afzal, Nathan Susanj, Athanasios Mouchtaris, Grant P. Strimel, Ariya Rastrow

    Abstract: Machine learning model weights and activations are represented in full-precision during training. This leads to performance degradation in runtime when deployed on neural network accelerator (NNA) chips, which leverage highly parallelized fixed-point arithmetic to improve runtime memory and latency. In this work, we replicate the NNA operators during the training phase, accounting for the degradat…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: Accepted to SLT 2022

    Journal ref: IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar, 2023, pp. 100-107

  2. arXiv:2210.16238 [pdf, ps, other]

    eess.AS cs.LG cs.SD eess.SP

    Contextual-Utterance Training for Automatic Speech Recognition

    Authors: Alejandro Gomez-Alanis, Lukas Drude, Andreas Schwarz, Rupak Vignesh Swaminathan, Simon Wiesler

    Abstract: Recent studies of streaming automatic speech recognition (ASR) recurrent neural network transducer (RNN-T)-based systems have fed the encoder with past contextual information in order to improve its word error rate (WER) performance. In this paper, we first propose a contextual-utterance training technique which makes use of the previous and future contextual utterances in order to do an implicit…

    Submitted 27 October, 2022; originally announced October 2022.

  3. arXiv:2209.14868 [pdf, other]

    cs.SD cs.CL eess.AS

    ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition

    Authors: Martin Radfar, Rohit Barnwal, Rupak Vignesh Swaminathan, Feng-Ju Chang, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris

    Abstract: The recurrent neural network transducer (RNN-T) is a prominent streaming end-to-end (E2E) ASR technology. In RNN-T, the acoustic encoder commonly consists of stacks of LSTMs. Very recently, as an alternative to LSTM layers, the Conformer architecture was introduced where the encoder of RNN-T is replaced with a modified Transformer encoder composed of convolutional layers at the frontend and betwee…

    Submitted 29 September, 2022; originally announced September 2022.

    Comments: This paper was presented at Interspeech 2022

  4. arXiv:2106.07734 [pdf, other]

    cs.CL cs.LG eess.AS

    CoDERT: Distilling Encoder Representations with Co-learning for Transducer-based Speech Recognition

    Authors: Rupak Vignesh Swaminathan, Brian King, Grant P. Strimel, Jasha Droppo, Athanasios Mouchtaris

    Abstract: We propose a simple yet effective method to compress an RNN-Transducer (RNN-T) through the well-known knowledge distillation paradigm. We show that the transducer's encoder outputs naturally have a high entropy and contain rich information about acoustically similar word-piece confusions. This rich information is suppressed when combined with the lower entropy decoder outputs to produce the joint…

    Submitted 14 June, 2021; originally announced June 2021.

    Comments: Accepted at Interspeech 2021

  5. arXiv:2106.06126 [pdf, other]

    cs.SD cs.LG eess.AS

    Exploiting Large-scale Teacher-Student Training for On-device Acoustic Models

    Authors: Jing Liu, Rupak Vignesh Swaminathan, Sree Hari Krishnan Parthasarathi, Chunchuan Lyu, Athanasios Mouchtaris, Siegfried Kunzmann

    Abstract: We present results from Alexa speech teams on semi-supervised learning (SSL) of acoustic models (AM) with experiments spanning over 3000 hours of GPU time, making our study one of the largest of its kind. We discuss SSL for AMs in a small footprint setting, showing that a smaller capacity model trained with 1 million hours of unsupervised data can outperform a baseline supervised system by 14.3% w…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: TSD 2021
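The first result, "Accelerator-Aware Training for Transducer-Based Speech Recognition", describes replicating NNA fixed-point operators during training so the model accounts for the degradation that fixed-point arithmetic causes. The abstract is truncated before the operator details, so the sketch below shows only the generic idea of simulated ("fake") quantization: values are rounded onto a signed integer grid and mapped back to floats, so training sees the rounding and clamping error that fixed-point inference would introduce. The helper name fake_quantize, the symmetric per-tensor scale, and the 8-bit default are illustrative assumptions, not the paper's method.

```python
def fake_quantize(values, num_bits=8):
    """Quantize-then-dequantize floats with a symmetric per-tensor scale.

    Hypothetical helper (not from the paper): it reproduces the rounding and
    clamping error of signed fixed-point storage while keeping float outputs.
    """
    qmax = 2 ** (num_bits - 1) - 1            # largest signed level, e.g. 127 for 8 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in values]          # all-zero tensor: nothing to scale
    scale = max_abs / qmax                    # float value of one integer step
    out = []
    for v in values:
        q = round(v / scale)                  # nearest integer level (Python rounds ties to even)
        q = max(-qmax, min(qmax, q))          # clamp to the representable range
        out.append(q * scale)                 # dequantize back to float
    return out


weights = [0.5, -1.0, 0.25, 0.0]
print(fake_quantize(weights))       # each value snapped to a multiple of 1/127
print(fake_quantize(weights, 4))    # coarser 4-bit grid: multiples of 1/7
```

In an autograd framework, the rounding step is commonly paired with a straight-through estimator, so gradients flow through it as if it were the identity; that pairing is a standard quantization-aware-training convention assumed here, not something the truncated abstract states.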
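The fourth result, "CoDERT", argues that a transducer's encoder outputs have high entropy and carry information about acoustically similar word-piece confusions, which a smaller model can learn through knowledge distillation. The abstract is truncated before the training objective, so the sketch below uses a generic temperature-scaled distillation loss (KL divergence from teacher to student word-piece distributions, averaged over frames) plus an entropy helper that makes "high entropy" concrete. It assumes, hypothetically, that teacher and student encoder outputs have already been projected to logits over the same word-piece vocabulary with the same number of frames; the function names and toy logits are illustrative, and this is not the paper's co-learning method.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one frame of word-piece logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means mass is spread over more word pieces."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def frame_kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) for one frame, scaled by T^2 (a common distillation convention)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def sequence_kd_loss(teacher_frames, student_frames, temperature=2.0):
    """Average the per-frame loss; assumes both encoders emit the same number of frames."""
    if len(teacher_frames) != len(student_frames):
        raise ValueError("teacher and student must have the same number of frames")
    if not teacher_frames:
        return 0.0
    losses = [frame_kd_loss(t, s, temperature) for t, s in zip(teacher_frames, student_frames)]
    return sum(losses) / len(losses)


# Toy example: three word pieces, two frames of made-up logits.
teacher = [[2.0, 1.5, -1.0], [0.2, 3.0, 0.1]]
student = [[2.5, 0.0, -1.0], [0.0, 2.0, 0.5]]
print(entropy(softmax(teacher[0])))        # teacher frame splits mass over the first two pieces
print(sequence_kd_loss(teacher, student))  # positive; zero only when the distributions match
```

The T² factor keeps gradient magnitudes roughly comparable when the temperature changes; the default temperature of 2.0 is an arbitrary choice for illustration.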