Metamorph: Injecting inaudible commands into over-the-air voice controlled systems

T Chen, L Shangguan, Z Li, K Jamieson - Network and Distributed …, 2020 - par.nsf.gov
Network and Distributed Systems Security (NDSS) Symposium, 2020
This paper presents Metamorph, a system that generates imperceptible audio that can survive over-the-air transmission to attack the neural network of a speech recognition system. The key challenge is ensuring that a perturbation added to the original audio in advance, at the sender side, is immune to unknown signal distortions during transmission. Our empirical study reveals that signal distortion is mainly due to device and channel frequency selectivity, each with different characteristics. This creates an opportunity to capture and pre-code this impact to generate adversarial examples that are robust to over-the-air transmission. We leverage this opportunity in Metamorph: we obtain an initial perturbation that captures the core distortion's impact from only a small set of prior measurements, and then use a domain adaptation algorithm to refine the perturbation and further improve the attack distance and reliability. Moreover, we also consider reducing human perceptibility of the added perturbation. In our evaluation, Metamorph achieves a high attack success rate (90%) at attack distances of up to 6 m. Within a moderate distance, e.g., 3 m, Metamorph maintains this high success rate and can be further adapted to substantially improve audio quality, as confirmed by a human perceptibility study.
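To make the notion of frequency selectivity concrete, the following is a minimal sketch and not the authors' implementation. It assumes the over-the-air path can be approximated as a linear time-invariant channel, so that transmission is a convolution with an impulse response. The function name `transmit`, the sine-wave stand-in for speech, the random stand-in for the perturbation, and the three-tap impulse response `h` are all hypothetical choices made for illustration.

```python
import numpy as np

def transmit(signal, impulse_response):
    """Approximate over-the-air playback as convolution with an impulse response.

    Assumption: the speaker, room, and microphone together behave as a
    linear time-invariant channel. The output is truncated to the input length.
    """
    return np.convolve(signal, impulse_response)[: len(signal)]

rng = np.random.default_rng(0)
fs = 16_000                                   # sample rate in Hz (assumed)
t = np.arange(fs) / fs
audio = 0.5 * np.sin(2 * np.pi * 440 * t)     # stand-in for the original audio
perturbation = 0.01 * rng.standard_normal(fs) # stand-in for an adversarial perturbation

# Hypothetical impulse response: a direct path plus two weaker echoes.
h = np.zeros(64)
h[0], h[17], h[40] = 1.0, 0.5, 0.3

received = transmit(audio + perturbation, h)
received_perturbation = transmit(perturbation, h)

# Magnitude response of the channel: unequal gains across frequency bins
# are what "frequency selectivity" refers to.
gain = np.abs(np.fft.rfft(h, n=512))
print(f"channel gain ranges from {gain.min():.2f} to {gain.max():.2f}")
```

Because the channel reshapes the perturbation differently at each frequency, a perturbation crafted for the clean signal no longer arrives as designed, which is the distortion the abstract says must be captured and pre-coded.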