short-paper

Open access

Discovering Undiscovered States in Human Robot Verbal Interaction

Authors:

Sai Nitish Vemuri,

Maged Mikhail,

Sayanti Roy

HRI '24: Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction

Pages 1077 - 1079

https://doi.org/10.1145/3610978.3640604

Published: 11 March 2024


Abstract

Although automatic speech recognition systems such as CMU Sphinx, the Google Speech-to-Text API, and Amazon Transcribe can recognize a wide variety of voices, they often struggle to accurately process complete information. To overcome this limitation, we propose a novel approach based on Markov Decision Processes (MDPs). Our research involves an intelligent agent that evaluates human speech (n=1) and identifies new states through learning, enabling it to process more comprehensive information than traditional systems. The paper illustrates two scenarios: one in which the intelligent agent explores by detecting undiscovered states and ultimately reaches the goal state, and another in which, while discovering new states, it also revisits previously discovered states.
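The abstract does not specify the agent's update rule. As a rough illustration only, the sketch below assumes tabular Q-learning (a standard MDP learning method, not necessarily the one the authors use) over a state table that grows whenever a previously unseen state label appears, for example an intent extracted from a speech transcript. The class `StateDiscoveringAgent`, the four-state dialogue chain `CHAIN`, the `next`/`back` actions, the `step` environment, and all reward values are hypothetical.

```python
import random

# Hypothetical four-state dialogue chain; in practice the state labels might
# come from intents parsed out of a speech-recognition transcript.
CHAIN = ["greet", "ask_task", "confirm", "goal"]


class StateDiscoveringAgent:
    """Tabular Q-learning agent whose state table grows on first encounter.

    Hypothetical illustration only; the paper's actual MDP formulation,
    state definitions, and update rule are not given in the abstract.
    """

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.q = {}           # state -> {action: estimated value}
        self.discovered = []  # states in the order they were first observed

    def observe(self, state):
        """Add `state` to the Q-table if unseen; return True if it was new."""
        if state in self.q:
            return False      # revisiting a known state adds nothing
        self.q[state] = {a: 0.0 for a in self.actions}
        self.discovered.append(state)
        return True

    def act(self, state):
        """Epsilon-greedy choice; greedy ties go to the earliest action listed."""
        self.observe(state)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        values = self.q[state]
        return max(values, key=values.get)

    def update(self, s, a, r, s_next, done):
        """One-step Q-learning update toward r + gamma * max_a' Q(s', a')."""
        self.observe(s_next)
        target = r if done else r + self.gamma * max(self.q[s_next].values())
        self.q[s][a] += self.alpha * (target - self.q[s][a])


def step(state, action):
    """Hypothetical environment: 'next' advances along CHAIN, 'back' revisits."""
    i = CHAIN.index(state)
    i = min(i + 1, len(CHAIN) - 1) if action == "next" else max(i - 1, 0)
    s_next = CHAIN[i]
    done = s_next == "goal"
    return s_next, (1.0 if done else -0.01), done


def run_episode(agent, max_steps=50):
    """Run one episode from the first state; return the visited states."""
    s = CHAIN[0]
    visits = [s]
    for _ in range(max_steps):
        a = agent.act(s)
        s_next, r, done = step(s, a)
        agent.update(s, a, r, s_next, done)
        visits.append(s_next)
        s = s_next
        if done:
            break
    return visits


if __name__ == "__main__":
    agent = StateDiscoveringAgent(["next", "back"], epsilon=0.2, seed=1)
    for episode in range(5):
        print(episode, run_episode(agent))
    print("discovered:", agent.discovered)
```

With `epsilon` above zero the agent occasionally chooses `back` and returns to states it has already recorded; `observe` then returns `False` and the table does not grow, which loosely mirrors the second scenario in the abstract. Tabular Q-learning is used here only because it is the simplest standard MDP learner; a speech-driven system would additionally need a front end that maps transcripts to state labels.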

Supplemental Material

Supplemental video (MP4 file, 31.36 MB)


Index Terms

Discovering Undiscovered States in Human Robot Verbal Interaction
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Speech recognition
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Machine learning theory
      1. Markov decision processes


Information

Published In

HRI '24: Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction

March 2024

1408 pages

ISBN:9798400703232

DOI:10.1145/3610978

General Chairs:
Dan Grollman (Plus One Robotics, USA)
Elizabeth Broadbent (University of Auckland, New Zealand)

Program Chairs:
Wendy Ju (Cornell Tech, USA)
Harold Soh (National University of Singapore, Singapore)
Tom Williams (Colorado School of Mines, USA)

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Short-paper

Funding Sources

Purdue Northwest Catalyst Grant

Conference

HRI '24

Sponsor:

HRI '24: ACM/IEEE International Conference on Human-Robot Interaction

March 11 - 15, 2024

Boulder, CO, USA

Acceptance Rates

Overall Acceptance Rate 268 of 1,124 submissions, 24%


Bibliometrics

Total citations: 0
Total downloads: 78
Downloads (last 12 months): 78
Downloads (last 6 weeks): 15

Reflects downloads up to 18 Oct 2024
