Speech Recognition

Considerations for Use in Language Training
by Norman Harris
DynEd’s Manager for Europe, the Middle East, and Africa
Speech Recognition technology has finally come of age – at least for language training purposes for young adults and adults. Computer programs that truly “understand” natural speech, the Holy Grail of artificial intelligence researchers, may be a decade or more away, and today’s SR programs may be merely pattern-matching devices, still incapable of parsing real language, of achieving anything like “understanding,” but, nonetheless, they can now provide language students with realistic, highly effective, and motivating speech practice. In this article I shall try to provide a brief overview of the state of development of continuous speech, speaker-independent SR programs, and some of the ways that they have been adapted for use in language training.
Historically, Speech Recognition programs required mainframe computers, were very expensive, or in cheaper PC versions performed inadequately for serious language learning. Most early SR programs were limited to discrete speech (single words or short phrases carefully enunciated) and were usually speaker-dependent, requiring each user to train the program by reading a long list of specially selected words or phrases that familiarized the program with that speaker. These technologies were also limited by their dependence on a particular regional English accent, usually a fairly neutral American accent. These early programs allowed users to control their computers with simple oral commands, but for language training purposes, they were not ideal. The essence of real language is not in discrete single words: language students need to practice complete phrases and sentences in realistic contexts. Moreover, programs which were trained to accept a speaker’s individual pronunciation quirks were not ideally suited to helping students move toward more standard pronunciation. These technologies also failed if the speaker’s voice changed due to common colds, laryngitis, and other throat ailments, rendering them useless until the speaker recovered or retrained the speech engine.
The solution to these problems came with the development of continuous-speech SR engines…
Even if we accept that accuracy needs to be responsive to proficiency in order to encourage students to speak, we must, as teachers, be concerned that errors do not become reinforced. Higher levels of accuracy can also be expected if the task required is appropriate to the language level of the students, and if there is a language focus other than just speaking. Good SR-based language programs use lesson types for which today’s SR technology is optimized, i.e., lessons which focus on things like phrase discrimination, word order, key words, and/or syntax. Speech-enhanced exercises include answering and asking questions, fill-in-the-blank and sentence transformation grammar exercises, fluency reading, branching dialogs, and role plays, which can even be integrated with video sequences. Though many of these exercise types have existed in multimedia programs in the past, their transformation from mouse-click to SR programs changes them radically. The new interface hugely increases interactivity, student motivation, and focus. Students are also far more likely to repeat exercises, substantially increasing their effectiveness. This increased level of practice helps students achieve real mastery of the material they are studying. Perhaps most important of all is that the safe, private environment helps students develop confidence and encourages them to do something most of us find very difficult to get them to do in class: speak.
Norman Harris is DynEd’s Manager for Europe, the Middle East, and Africa.
He has extensive experience in language teaching in different parts of the world, especially in the use of multimedia for instructional purposes.
