Disfluency detection using a noisy channel model and deep neural language model
Thesis posted on 2022-03-28, 16:02, authored by Paria Jamshid Lou
Although speech recognition technology has improved considerably in recent years, current systems still output only a sequence of words, with no information about where disfluencies occur. Yet such information is necessary for improving the readability of speech transcripts: transcripts containing many disfluencies are difficult to understand, and removing the disfluent words makes them more readable. Moreover, many applications, including dialogue systems, take spontaneous speech as input. Such systems are usually trained on fluent, clean corpora, so disfluent input would degrade their performance. This thesis introduces the LSTM Noisy Channel Model, a model for automatic disfluency detection in spontaneous speech transcripts. The model uses a Noisy Channel Model (NCM) to find "rough copies" that are likely to indicate disfluencies and to generate n-best candidate disfluency analyses. The underlying fluent sentence of each candidate analysis is then scored using a Long Short-Term Memory (LSTM) language model. The LSTM language model scores, along with other features, are used in a reranker to identify the most plausible analysis. We show that using LSTM language model scores as features to rerank the analyses generated by an NCM improves the state of the art in disfluency detection.
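The generate-then-rerank pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the candidate analyses and their channel scores are hand-written rather than produced by an NCM, an add-one smoothed bigram model stands in for the LSTM language model, the tag names ("F" for fluent, "E" for disfluent), function names, and weights are hypothetical, and the reranker is reduced to a fixed weighted sum of two features.

```python
import math
from collections import Counter

# Tiny fluent corpus (made up) used to train the stand-in language model.
FLUENT_CORPUS = [
    "i want a flight to denver".split(),
    "i want a flight to boston".split(),
    "show me a flight to denver".split(),
]

def train_bigram(corpus):
    """Collect counts for an add-one smoothed bigram model (LSTM stand-in)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        padded = ["<s>"] + sent + ["</s>"]
        unigrams.update(padded[:-1])                  # history counts
        bigrams.update(zip(padded[:-1], padded[1:]))  # (history, word) counts
    vocab = {w for sent in corpus for w in sent} | {"<s>", "</s>"}
    return unigrams, bigrams, len(vocab)

def lm_log_prob(tokens, model):
    """Log-probability of a token sequence under the bigram model."""
    unigrams, bigrams, vocab_size = model
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(padded[:-1], padded[1:])
    )

def fluent_part(tokens, tags):
    """Keep only the words an analysis marks as fluent ("F")."""
    return [tok for tok, tag in zip(tokens, tags) if tag == "F"]

def rerank(tokens, candidates, model, w_ncm=1.0, w_lm=1.0):
    """Return the candidate with the highest weighted feature sum.

    The two features and their fixed weights are hypothetical.
    """
    def score(candidate):
        tags, ncm_score = candidate
        lm_score = lm_log_prob(fluent_part(tokens, tags), model)
        return w_ncm * ncm_score + w_lm * lm_score
    return max(candidates, key=score)

TOKENS = "i want a flight to boston uh to denver".split()

# Hand-written candidate analyses with made-up channel log-scores.
CANDIDATES = [
    (["F"] * 9, -1.0),                                         # nothing removed
    (["F", "F", "F", "F", "E", "E", "E", "F", "F"], -1.5),     # "to boston uh" removed
    (["F", "F", "F", "F", "F", "F", "E", "F", "F"], -1.2),     # only "uh" removed
]

MODEL = train_bigram(FLUENT_CORPUS)
best_tags, best_ncm_score = rerank(TOKENS, CANDIDATES, MODEL)
print(" ".join(fluent_part(TOKENS, best_tags)))
```

In this toy setup the language model score favours the analysis whose remaining words form the most fluent sentence, even though its channel score is the lowest of the three. A real system would use an LSTM language model trained on a fluent corpus and a reranker with further features, as the abstract states.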