WellSaid Labs, whose tools produce synthetic speech that could be mistaken for the real thing, has raised a $10 million Series A to help grow the business. The company’s in-house text-to-speech engine works faster than real-time, producing natural-sounding clips of pretty much any length, from quick snippets to hours-long readings.
WellSaid emerged from the Allen Institute for AI Incubator in 2019 with the goal of creating synthetic voices that sound less robotic, for common business purposes such as training and marketing content.
The company initially achieved this by basing its solution on Tacotron, a text-to-speech system developed by Google and academic researchers. But it soon built its own engine, which was more efficient, produced more convincing voices, and could generate clips of any length. Speech engines often break down after a few sentences, lapsing into gibberish or losing their intonation, but WellSaid’s read Mary Shelley’s Frankenstein without any hiccups.
The voices were good enough that listeners rated them as human or as good as human, which cannot be said of the usual virtual assistant suspects when they speak more than a handful of words. Additionally, speech was generated significantly faster than real-time, where other high-quality options often ran at a tenth of real-time or slower. That means three minutes of speech would take about a minute from WellSaid and half an hour or more from Tacotron.
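The arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration: the function name `synthesis_minutes` is hypothetical, and the 3x speed for WellSaid is inferred from the "three minutes of speech in a minute" figure above rather than taken from any published benchmark.

```python
def synthesis_minutes(audio_minutes: float, speed_vs_real_time: float) -> float:
    """Return the wall-clock minutes needed to generate a clip.

    speed_vs_real_time is the number of minutes of audio produced per
    minute of compute: 1.0 is real-time, 0.1 is a tenth of real-time.
    """
    if speed_vs_real_time <= 0:
        raise ValueError("speed_vs_real_time must be positive")
    return audio_minutes / speed_vs_real_time


clip_minutes = 3.0

# Assumption: roughly 3x real-time, inferred from the article's example.
wellsaid = synthesis_minutes(clip_minutes, 3.0)

# "A tenth of real-time", the figure the article gives for slower engines.
slower_engine = synthesis_minutes(clip_minutes, 0.1)

print(f"about {wellsaid:.0f} min vs about {slower_engine:.0f} min")
```

The same ratio scales linearly, so an hour-long reading at a tenth of real-time would tie up an engine for roughly ten hours.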
Finally, the system enables the creation of new “voice avatars” based on existing speakers, such as a trusted company spokesperson or voice-over artist. It originally took around 20 hours of audio to model a speaker’s quirks and vocal style, but now it can be done with just 2 hours, said CEO Matt Hocking.
The company is currently strictly business-focused, which means there is no consumer-facing app for turning your own voice into an avatar or anything like that. There are risks involved and no realistic business model for it, so that’s off the table for now.
Such a realistic voice could still be of tremendous help to people with disabilities, but Hocking admits the company is not quite ready to take that on just yet.
“We are committed to expanding access to this technology so that non-verbal communicators, nonprofits and others can benefit from it,” he said.
In the meantime, the company has grown beyond its first market, corporate training videos, into marketing, long-form text, interactive products with extensive text, and app experiences. One hopes that the voice talent on which these avatars are based is being adequately compensated for creating a digital representation of their voice.
The oversubscribed $10 million round was led by FUSE, with participation from repeat investor Voyager, Qualcomm Ventures LLC and GoodFriends, all of whom were likely impressed with the product and the growth of the business. Synthetic voices have so far served a handful of popular use cases, but the amount of content voiced this way is still small, so there’s plenty of room to grow. The company will invest the money in deepening its product offering and expanding the team.