. . . bringing technology to you
AT Messenger
Volume 20, No. 1 – Winter 2012
Jane Chandlee, Ph.D., and H. Timothy Bunnell, Ph.D.
Nemours A.I. duPont Hospital for Children
Anyone who has ever asked Siri a question has experience with a synthetic voice, but many people who have lost or never developed the ability to speak rely on synthetic voices for daily communication. Though these voices and devices have improved over the years, there are typically only a few voices to choose from. As a result, while the person's communicative needs are met, what's missing is the sense of personal identity that comes with having one's own voice.
This need for personalized synthetic voices is the focus of current research led by Dr. Timothy Bunnell at the Center for Pediatric Auditory and Speech Sciences (CPASS) at the Nemours A.I. duPont Hospital for Children. Dr. Bunnell's team developed ModelTalker, a speech synthesis system designed to build voices for those who have lost or will soon lose the ability to speak. To date, this technology has been of most benefit to patients diagnosed with ALS.
ALS patients and others who are diagnosed with a neurodegenerative disease undertake a process called voice banking while they are still able to speak fluently. For voice banking, they record approximately an hour's worth of sentences designed to include the range of speech sounds found in American English. Those recordings are then used to build a database of these speech sounds, which in turn can be used to synthesize any English sentence (not just the ones that were recorded by the user) via a process called concatenative synthesis.
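For readers curious about the mechanics, the idea behind concatenative synthesis can be sketched in a few lines of Python. This is a toy illustration only, not ModelTalker's implementation: the function names, the sound labels, and the sample values are all made up for the example, and a real system chooses among many recorded candidates for each sound and smooths the joins between them.

```python
from typing import Dict, List, Tuple

# Hypothetical unit inventory: each speech-sound label maps to a short
# recorded snippet, represented here as a list of audio sample values.
UnitDatabase = Dict[str, List[float]]
Recording = Tuple[List[str], List[List[float]]]


def build_database(recordings: List[Recording]) -> UnitDatabase:
    """Collect one snippet per speech sound from labeled recordings.

    Each recording is (labels, snippets), where snippets[i] holds the
    samples for labels[i]. The first snippet seen for a label is kept.
    """
    database: UnitDatabase = {}
    for labels, snippets in recordings:
        for label, snippet in zip(labels, snippets):
            database.setdefault(label, snippet)
    return database


def synthesize(sounds: List[str], database: UnitDatabase) -> List[float]:
    """Concatenate stored snippets in the order the sounds are requested."""
    output: List[float] = []
    for sound in sounds:
        if sound not in database:
            raise KeyError(f"no recorded unit for sound {sound!r}")
        output.extend(database[sound])
    return output


# Two "recorded" words: "cat" (k ae t) and "bad" (b ae d).
# The sample values are invented placeholders, not real audio.
recordings: List[Recording] = [
    (["k", "ae", "t"], [[0.1, 0.2], [0.3, 0.4, 0.5], [0.6]]),
    (["b", "ae", "d"], [[0.7], [0.8, 0.9], [1.0, 1.1]]),
]
database = build_database(recordings)

# "bat" was never recorded, but all of its sounds were.
bat = synthesize(["b", "ae", "t"], database)
```

The point of the sketch is the last line: a word that was never recorded can still be produced, because every sound it needs is already in the database.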
Unfortunately, this procedure requires a large amount of usable speech, which means it won't be successful for those who cannot record the needed sentences. This includes ALS patients who have progressed to a point where they can't speak, as well as children whose speech development has been impaired. With these populations in mind, CPASS is now working on the next generation of technology that was pioneered in partnership with Rupal Patel at Northeastern University. This technology, called VocaliD, combines the voice quality of the impaired speaker with 'healthy' speech that has been donated by a speaker of a similar age, sex, and dialect. The result is a synthetic voice that sounds as much like the impaired speaker as possible and is unique to that speaker.
Here at Nemours, we are focusing on taking this exciting technology to the next level by improving the naturalness of the synthetic speech and demonstrating how important it is for children to have a voice of their own. A person's voice is as individual as his or her face or fingerprint; consider how easy it can be to recognize someone by voice alone. The goal is to provide that same sense of vocal individuality to children who use a synthetic voice. Even giving them a voice that sounds like a child is an improvement, as the commonly available synthetic voices are built from adult speech. The hope is that having a unique voice will strengthen these children's sense of identity and even encourage them to communicate more.
To refine and extend this experimental technology, CPASS's newly formed Clinical Speech Technology program is undertaking a research project that will build voices for non-speaking children in Delaware and surrounding states. All children who use speech-generating devices (SGDs) as a means of communication and who are able to make at least some vocalizations are potentially eligible to participate in this ground-breaking research.
In addition, CPASS has teamed up with the company Therapy Box to provide ModelTalker voices in upcoming releases of their award-winning Predictable and ChatAble applications for Apple mobile devices. Currently, ModelTalker voices can be used only with Android and Windows devices, so these new apps represent an exciting step toward expanding the number of users who can benefit from a personalized synthetic voice.