Abstract: SUMMIT is a speaker-independent, continuous-speech recognition system that we have developed at MIT [12]. To date, the system has been ported to a variety of tasks with vocabulary sizes of up to 1000 words and perplexities of up to 73. Its architecture is the product of two guiding principles. First, we wanted a flexible and modular framework so that we could explore alternative strategies for embedding speech knowledge into the system. Second, we required that the system be stochastic and trainable from a large body of speech data, to account for our currently incomplete knowledge of the acoustic realization of speech. The current implementation reflects both of these ideas. SUMMIT differs from the majority of prevailing HMM approaches in many respects, ranging from its use of auditory models and selected acoustic measurements to its segmental framework and use of pronunciation networks. In time, the specific implementation of these ideas will undoubtedly be modified as we discover superior techniques and approaches. Until phonetic and word recognition accuracies are competitive with those of human listeners, however, we believe it will be appropriate to incorporate both flexibility and trainability into the system.