Keiichiro Suzuki: Automatic Speech Recognition of Japanese numeral-numeral classifier combinations
The goal of automatic speech recognition (ASR) systems is to achieve a complete, or human-level, mapping of an acoustic signal to a string of words.
Dealing with natural speech, ASR systems encounter various linguistically significant issues such as coarticulation and morphophonemic alternations as
well as linguistically insignificant issues such as speech rate and environmental noise.
This paper presents a study aimed at improving the performance of a Japanese large-vocabulary continuous speech recognizer (LVCSR) by modeling morphophonemic
alternation and pronunciation variation (i.e., free variation). In particular, I report the results of performance tests run on numeral-numeral
classifier combinations in Japanese (e.g., ni-hon 'two stick-type objects',
san-bon 'three stick-type objects'), showing how the accuracy of our LVCSR
engine was improved by modeling the morphophonemic alternation and pronunciation variation of the target lexical items. On the one hand, these
numeral-numeral classifier combinations can be a typical subject of phonological/morphological study, displaying linguistically significant,
"regular" voicing alternation patterns. On the other hand, the same set of data shows linguistically insignificant pronunciation variation involving
voicing. I demonstrate that these two seemingly unrelated phenomena are indeed resolved by the same process of statistical adjustment.