Augmenting Telephony Audio Data using Robust Principal Component Analysis

Mo, Ronald K. and Lam, Albert Y.S. (2020) Augmenting Telephony Audio Data using Robust Principal Component Analysis. In: IEEE Symposium Series on Computational Intelligence, Dec 1, 2020 - Dec 4, 2020, Canberra, ACT, Australia.

Item Type: Conference or Workshop Item (Paper)

Abstract

Audio augmentation (e.g., corrupting audio data with noise) has been shown to improve the performance of Automatic Speech Recognition (ASR) systems for low-resource languages. In light of this, we ask whether corrupting speech data with telephone channel characteristics (e.g., background music, artifacts caused by down-sampling) improves ASR performance as well. In this work, we investigate the possibility of applying Sound Source Separation (SSS) approaches to capture the telephone channel characteristics. We are particularly interested in Robust Principal Component Analysis (RPCA), an unsupervised approach used for various SSS tasks. Our results show that augmenting a clean speech corpus with telephone channel characteristics yields a more robust ASR system, with a Word Error Rate reduction of 7.8%. We also find that the characteristic with the lowest spectral features improves ASR the most.
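
For readers unfamiliar with RPCA, the following is a minimal sketch of the generic decomposition it performs: a matrix M is split into a low-rank part L and a sparse part S by principal component pursuit, solved here with a basic augmented Lagrangian iteration. It is not the authors' pipeline. The function names (`rpca`, `shrink`, `svd_threshold`) and the default values of `lam`, `mu`, `tol`, and `max_iter` are illustrative assumptions. In SSS applications the input is typically a magnitude spectrogram, but the abstract does not state which component is taken to carry the telephone channel characteristics.

```python
import numpy as np


def shrink(X, tau):
    """Element-wise soft thresholding: pull every entry toward zero by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Soft-threshold the singular values of X (proximal step for the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def rpca(M, lam=None, mu=None, tol=1e-6, max_iter=1000):
    """Split M into low-rank L and sparse S so that M is approximately L + S.

    Assumed defaults (common choices, not taken from the paper):
    lam = 1 / sqrt(max(m, n)) and mu = m * n / (4 * sum(|M|)).
    M must be a non-zero 2-D array.
    """
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    mu = m * n / (4.0 * np.abs(M).sum()) if mu is None else mu

    S = np.zeros_like(M)  # sparse component
    Y = np.zeros_like(M)  # Lagrange multiplier for the constraint M = L + S
    norm_M = np.linalg.norm(M)

    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)  # update low-rank part
        S = shrink(M - L + Y / mu, lam / mu)         # update sparse part
        residual = M - L - S
        Y = Y + mu * residual                        # dual ascent step
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S
```

Calling `L, S = rpca(M)` on a two-dimensional array returns two arrays of the same shape as `M`. How L and S would then be mixed into clean speech for augmentation is specific to the paper and is not sketched here.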
