Approaches for Interactions in Robotics Applications

Authors

Roh, Junha

Abstract

As increasing numbers of robots are developed and deployed to help humans, it is important that they interact effectively with humans and other agents. Robots that cooperate with humans must be robust for safety and are often expected to be explainable. Beyond the challenges that general-purpose robots face, communicating with humans requires the ability to understand human directions and to condition the robot's behavior on them as well as on its internal state and observations. Additionally, training models for robotics applications requires costly environments for collecting data or active simulation for generating it, whereas recent large-model approaches collect their data from the web. This makes training expensive, especially for models that must use language in specific contexts. In this thesis, we propose methods that produce interpretable results by composing sub-models for interaction in robotics applications.

In the first part of the thesis, we propose models for interaction in driving. We develop a model that lets humans control a vehicle with language instructions such as "turn left and then turn right." The model consists of two sub-models: a high-level policy that translates the language instruction into a sequence of sub-tasks, and a low-level policy that controls the vehicle to accomplish each sub-task. We also propose a model that predicts the future trajectories of agents at a four-way intersection, addressing another important form of interaction for autonomous vehicles. Its first sub-model predicts destinations and a topologically invariant description of the order of execution from reference trajectories; given this abstract description of the scene, the second sub-model predicts multiple future trajectories.

In the second part, we propose visual grounding models for 3D point clouds and RGBD images, essential tasks for robot navigation and human-robot interaction. The task is to identify the referred object, given a language description, in either a reconstructed 3D scene or a pair of RGBD images. The model for 3D visual grounding extends a large language model into a spatial-language model that identifies the target object. The model for RGBD visual grounding combines a pre-trained 2D visual grounding model with a 3D bounding-box proposal model. These models leverage the strong generalization of large models, achieve performance comparable to state-of-the-art methods, and produce interpretable intermediate results.
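To make the two-level decomposition of the driving model concrete, the following is a minimal Python sketch of the interface it describes: a high-level policy that turns an instruction into sub-tasks and a low-level policy that executes them one at a time. All names here (SubTask, HighLevelPolicy, LowLevelPolicy, drive) are hypothetical illustrations, not the thesis's implementation.

from dataclasses import dataclass
from typing import List


@dataclass
class SubTask:
    """An abstract sub-task produced by the high-level policy, e.g. 'turn left'."""
    name: str


class HighLevelPolicy:
    """Translates a language instruction into a sequence of sub-tasks."""

    def plan(self, instruction: str) -> List[SubTask]:
        # A learned model would parse the instruction; this stub only
        # illustrates the interface by splitting on a connective.
        return [SubTask(part.strip()) for part in instruction.split(" and then ")]


class LowLevelPolicy:
    """Produces control commands that accomplish one sub-task at a time."""

    def act(self, subtask: SubTask, observation) -> dict:
        # A real policy would condition on the observation; here we
        # return a placeholder control command.
        return {"steer": 0.0, "throttle": 0.0, "subtask": subtask.name}


def drive(instruction: str, observations) -> None:
    """Compose the two policies: plan once, then execute each sub-task."""
    high, low = HighLevelPolicy(), LowLevelPolicy()
    for subtask, obs in zip(high.plan(instruction), observations):
        print(low.act(subtask, obs))


drive("turn left and then turn right", observations=[None, None])

The composition keeps each sub-model separately inspectable: the sub-task sequence emitted by the high-level policy is itself an interpretable intermediate result, in the spirit of the abstract's claims.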

Description

Thesis (Ph.D.)--University of Washington, 2022

Keywords

Computer science, Robotics
