Speech Interaction

Voice Command Interface for Language-guided Navigation

Problem Formulation

  • Problem: Language-guided navigation for mobile manipulators and quadrupeds
  • Challenge: Robust speech command recognition and seamless integration with navigation frameworks
  • Solution: A real-time voice command framework integrating the NVIDIA Riva ASR engine with ROS and an RViz plugin

Approach

  • Deployed NVIDIA Riva Citrinet ASR on the Jetson AGX Orin, achieving sub-second speech recognition in an edge computing environment
  • Integrated the navigation-goal grounding and path-planning algorithm into the ROS framework
  • Improved robustness through a human-in-the-loop validation process that enables visualization and manual correction of speech recognition results
  • Designed a dynamic interface that intuitively displays speech recognition and communication status, keeping the operator aware of the system state
  • Implemented configurable options for setting speech command recognition locations and languages to adapt to various command environments
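The human-in-the-loop validation step above can be sketched as a small, framework-agnostic component. This is a minimal illustration, not the deployed ROS node: the names (`Transcript`, `CommandValidator`) and the confidence threshold are assumptions, and the operator-facing `confirm` callback stands in for the RViz plugin.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Transcript:
    text: str          # ASR hypothesis, e.g. from a Riva streaming response
    confidence: float  # per-utterance confidence in [0, 1]


class CommandValidator:
    """Gates ASR output before it is forwarded as a navigation goal.

    High-confidence transcripts pass through automatically; low-confidence
    ones are routed to the operator for visualization and manual correction,
    mirroring the human-in-the-loop validation described above.
    """

    def __init__(self, threshold: float = 0.85,
                 confirm: Optional[Callable[[str], str]] = None):
        self.threshold = threshold
        # `confirm` shows the hypothesis and returns the (possibly corrected)
        # command text; in the real system this role is played by the RViz plugin.
        self.confirm = confirm or (lambda text: text)

    def validate(self, t: Transcript) -> str:
        if t.confidence >= self.threshold:
            return t.text
        return self.confirm(t.text)


# Example: a low-confidence hypothesis is routed to the operator for correction.
validator = CommandValidator(confirm=lambda text: "go to the kitchen")
assert validator.validate(Transcript("go to the chicken", 0.41)) == "go to the kitchen"
assert validator.validate(Transcript("go to the kitchen", 0.97)) == "go to the kitchen"
```

Keeping the validator independent of ROS makes it easy to unit-test the gating logic before wiring it to live ASR output and the navigation stack.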

Results

  • Led the setup and operation of a language-guided navigation demonstration for the KAIST Institute Meta-Research Project using Hello Robot Stretch 2
  • Delivered five successful language-guided navigation demonstrations with Boston Dynamics Spot

Inter-linguistic Composition for Second-Language Pronunciation (under review at NAACL 2025)

Problem Formulation

  • Problem: Low recognition rates of second-language (L2) speakers’ pronunciations by automatic speech recognition (ASR) models
  • Challenge: Unconscious substitution of unfamiliar L2 phonemes with similar L1 phonemes, even though native speakers of the target language perceive these phonemes as distinct
  • Solution: Inter-linguistic Phonetic Composition (IPC), a novel computational method that minimizes incorrect phonological transfer by reconstructing L2 phonemes as composite sounds derived from multiple L1 phonemes

Approach

  • Developed a phoneme approximation algorithm that transforms phonemes across languages using universal phonological features, regardless of their phoneme inventories or orthographic transparency
  • Created an IPC-based Korean grapheme system that visually encodes English pronunciations using the Korean writing system, Hangul
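The approximation idea can be illustrated with a toy feature-based search: represent each phoneme as a vector of distinctive features, then pick the pair of L1 phonemes whose blended features lie closest to the target L2 phoneme. The feature inventory, the averaging rule, and the phoneme labels below are simplified stand-ins, not the actual IPC algorithm or feature set.

```python
from itertools import combinations

# Toy distinctive-feature vectors (+1 / -1) for a few phonemes; the real
# algorithm operates over a full universal phonological feature set.
FEATURES = {
    # English (L2) phoneme absent from the Korean inventory:
    "f": {"labiodental": 1, "continuant": 1, "voiced": -1},
    # L1 phonemes available for composition (illustrative labels):
    "p": {"labiodental": -1, "continuant": -1, "voiced": -1},
    "h": {"labiodental": -1, "continuant": 1, "voiced": -1},
    "m": {"labiodental": -1, "continuant": -1, "voiced": 1},
}


def distance(a, b):
    """L1 distance between two feature vectors (missing features count as 0)."""
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0) - b.get(k, 0)) for k in keys)


def compose(a, b):
    """Blend two L1 phonemes by averaging their features (a toy stand-in
    for the composite-sound construction)."""
    keys = set(a) | set(b)
    return {k: (a.get(k, 0) + b.get(k, 0)) / 2 for k in keys}


def best_composite(target, l1_phonemes):
    """Return the L1 phoneme pair whose blend best approximates the target."""
    tvec = FEATURES[target]
    return min(
        combinations(l1_phonemes, 2),
        key=lambda pair: distance(tvec, compose(FEATURES[pair[0]], FEATURES[pair[1]])),
    )


# With these toy features, /f/ is best approximated by blending "p" and "h".
assert best_composite("f", ["p", "h", "m"]) == ("p", "h")
```

Searching over feature vectors rather than orthography is what lets the method work across languages regardless of phoneme inventories or orthographic transparency.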

Results

  • Improved the recognition rate of target L2 phonemes by 26% after ten minutes of pronunciation training, compared with pronunciations shaped by the speakers’ original phonological transfer patterns
  • Validated pronunciation improvements for all 14 English phonemes approximated using the IPC algorithm
  • Deployed a publicly accessible web service for pronunciation training


© 2024. All rights reserved.