Not so long ago, generative AI could only communicate with human users through text. Now it is increasingly being given the power of speech, and that ability is improving by the day.
On Thursday, AI voice platform ElevenLabs introduced v3, described on the company’s website as “the most expressive text-to-speech model ever.” The new model can exhibit a wide range of emotions and subtle communicative quirks, such as sighs, laughter, and whispering, making its speech more humanlike than the company’s previous models.
Additionally: Could WWDC be Apple’s AI turning point? Here’s what analysts are predicting
In a demo shared on X, v3 was shown generating the voices of two characters, one male and the other female, having a lighthearted conversation about their newfound ability to speak in more humanlike voices.
Introducing Eleven v3 (alpha) – the most expressive Text to Speech model ever.
Supporting 70+ languages, multi-speaker dialogue, and audio tags such as [excited], [sighs], [laughing], and [whispers].
Now in public alpha and 80% off in June. pic.twitter.com/n56BersdUc— ElevenLabs (@elevenlabsio) June 5, 2025
There is certainly none of the Alexa-esque flatness of tone, but the v3-generated voices tend to be almost excessively animated, to the point that their laughter is more creepy than charming. Take a listen yourself.
The model can also speak more than 70 languages, compared to its v2 predecessor's limit of 29. It is available now in public alpha, and its price has been slashed by 80% until the end of this month.
The future of AI interaction
AI-generated voice has become a major focus of innovation as tech developers look toward the future of human-machine interaction.
Automated assistants like Siri and Alexa have long been able to speak, of course, but as anyone who routinely uses these systems can attest, their voices are rather mechanical, with a relatively narrow range of emotional cadences and tones. They're useful for handling quick and easy tasks, like playing a song or setting an alarm, but they don't make great conversation partners.
Some of the latest text-to-speech (TTS) AI tools, on the other hand, have been engineered to speak in voices that are maximally realistic and engaging.
Additionally: You shouldn’t trust AI for therapy – here’s why
Users can prompt v3, for example, to speak in voices that are easily customizable through the use of "audio tags." Think of these as stylistic filters that modify the output and can be inserted directly into text prompts: "Excited," "Loudly," "Sings," "Laughing," "Angry," and so on.
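To make the idea concrete, here is a minimal sketch of how such tags might be embedded in text sent to ElevenLabs' text-to-speech REST endpoint. The API key, voice ID, and the "eleven_v3" model identifier are placeholder assumptions rather than values confirmed by the announcement, so check ElevenLabs' documentation before running it.

# Minimal sketch, assuming the ElevenLabs text-to-speech REST endpoint;
# the API key, voice ID, and "eleven_v3" model ID below are placeholders.
import requests

API_KEY = "YOUR_ELEVENLABS_API_KEY"   # placeholder
VOICE_ID = "YOUR_VOICE_ID"            # placeholder

# Audio tags are written inline in the text itself, as in the v3 announcement.
text = "[excited] We can finally sound like this! [laughs] [whispers] Don't tell the older models."

response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={"text": text, "model_id": "eleven_v3"},  # model ID is an assumption
)
response.raise_for_status()

# The endpoint returns raw audio bytes, which can be written straight to a file.
with open("v3_demo.mp3", "wb") as f:
    f.write(response.content)

Because the tags live inside the prompt text itself, swapping [sighs] or [whispers] in and out is all it takes to change the delivery; no separate styling parameters are required.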
ElevenLabs isn't the only company racing to build more lifelike TTS models, which big tech companies are promoting as a more intuitive and accessible way to interact with AI.
In late May, ElevenLabs competitor Hume AI unveiled its Empathic Voice Interface (EVI) 3 model, which allows users to generate custom voices by describing them in natural language. Similarly nuanced conversational abilities are also now on offer through Google's Gemini 2.5 Pro Flash model.
Want more stories about AI? Sign up for Innovation, our weekly newsletter.