The problem with speech recognition in the classroom – TechCrunch
Before the pandemic, more than 40% of new internet users were children. Estimates now suggest it Screen time for kids has increased by 60% or more in children under the age of 12 who spend more than five hours a day on screens (with all associated Benefits and dangers).
While it is easy to marvel at the technological prowess of digital natives, educators (and parents) are painfully aware that young “remote learners” often have difficulty navigating the keyboards, menus, and interfaces required to access Keeping the promise of educational technology.
With this in mind, voice-activated digital assistants are hoping for a smoother interaction with the technology. But while kids like to ask Alexa or Siri To box, tell jokes, or make animal noises, parents and teachers know that these systems struggle to understand their youngest users when they deviate from predictable requirements.
The challenge stems from the fact that the speech recognition software that supports popular voice assistants like Alexa, Siri and Google was never designed for children, whose voices, language and behavior are far more complex than adults’.
It’s not just that children’s voices are more squeaky, their vocal channels are thinner and shorter, their vocal folds are smaller, and their larynx is not fully developed. This leads to very different speech patterns than those of an older child or an adult.
It is easy to see from the graphic below that simply changing the pitch of the adult voices used to train speech recognition does not reproduce the complexity of the information required to understand a child’s speech. The language structures and patterns of the children are very different. They make leaps in syntax, pronunciation, and grammar that must be accounted for by the natural language processing component of speech recognition systems. This complexity is exacerbated by the variability between speakers in children at a variety of different stages of development that need not be accounted for in adult language.
A child’s speech behavior is not only more variable than that of an adult, it is also very unpredictable. Children pronounce words excessively, lengthen certain syllables, insert each word when thinking aloud, or skip some words entirely. Their speech patterns are not committed to the usual cadences known to adult users. As adults, we learned how best to interact with these devices and how to evoke the best response. We straighten up, formulate the request in our heads, modify it based on the behavior we have learned and speak our requests out loud, take a deep breath … “Alexa …” Children just burst out their thoughtless requests as if Siri or Alexa were human , and most of the time you get an incorrect or canned answer.
In an educational setting, these challenges are exacerbated by the fact that speech recognition not only has to deal with ambient noise and the unpredictability of the classroom, but also changes in a child’s language over the course of the year and the multitude of accents and dialects in a typical Elementary school. The physical, linguistic, and behavioral differences between children and adults also increase dramatically the younger the child is. This means that young learners who benefit the most from speech recognition are the most difficult for developers to build.
Explaining and understanding the very different quirks of children’s language requires speech recognition systems that intentionally learn from the way children speak. Children’s language cannot be treated as just another accent or dialect for speech recognition. It is fundamentally and practically different and changes as children grow and develop physically and verbally.
Unlike most consumer contexts, accuracy has a profound effect on children. A system that tells a child that they are wrong when they are right (false negative) damages their trust. that tells them that if they are wrong (false positive) they are right, risking socio-emotional (and psychometric) harm. In a entertainment setting, in apps, games, robotics, and smart toys, these false negatives or positives lead to frustrating experiences. In schools, mistakes, misunderstandings, or prepackaged answers can have far more profound effects on education and justice.
Well documented preload In speech recognition, for example, this can have harmful effects on children. It is unacceptable for a product aimed at children with a particular demographic or socio-economic background to perform with poorer accuracy and produce false positives and negatives. A growing number of researches suggests that voice can be an extremely valuable interface for children, but we cannot allow or ignore that it reinforces already endemic prejudices and inequalities in our schools.
Speech recognition can be a powerful tool for kids at home and in the classroom. It can fill critical gaps in helping children through the literacy and language learning stages, and help children better understand and be understood by the world around them. It can pave the way for a new era of “invisibleObservation measures that work reliably even in a remote environment. However, most of today’s speech recognition tools are not suitable for this purpose. Siri, Alexa, and other voice assistant technologies have a job to do – understanding adults speaking clearly and predictably – and for the most part, they do the job well. If speech recognition is to work for children, it must be modeled for and respond to their unique voices, language, and behavior.