Real emotions with a synthetic voice

The adjectives “emotional” and “expressive” probably don’t come to mind when we think about voice assistants like Amazon’s Alexa, Google Assistant, and Apple’s Siri. Their unmistakably flat, courteous voice, devoid of feeling, is fine for an assistant, but it won’t work in games, movies, or other storytelling media that demand more emotional engagement.

To fill that gap, new artificial intelligence software can simulate emotion in speech by adding ‘non-word sounds’ such as sighs and breaths.

Sonantic is a London-based company that creates these expressive artificial intelligence voices for a variety of applications, including Hollywood films and video games. Its most recent project, ‘What’s Her Secret?’, was created with an ‘unnamed Hollywood client’ with the goal of producing a flirtatious female lead ‘that has never lived’.

The team produced a video that pairs a real actress’s face with an A.I. voice to show that ‘hyper-realistic emotional interactions’ can be created.

While designing the flirty A.I., the researchers identified techniques people use to sound more romantic and flirtatious, such as slowing down to build suspense, smiling gently while speaking, and keeping a calm, steady pace.

Sonantic’s speech models can already convey happiness and grief, but flirting requires a subtler delivery that plain speech cannot capture. The team has therefore developed sly and teasing speaking styles, in addition to flirting, to make non-playable characters in games and characters in films feel far more realistic. Although the new video features a real actress, the voice-over is entirely artificial: an A.I. voice reads a love monologue.

At first, the video gives the impression that the actress is speaking the voice-over herself, but the voice then reveals that “what you are hearing me say was never said by a human; it was generated by a computer”. “I am not real; I was never born”, the voice concludes.

The video was released on Valentine’s Day to demonstrate how convincingly A.I. can mimic human speech patterns, a capability Sonantic calls the ‘CGI of Audio’. Non-verbal sounds such as laughter, breathing, sobbing, and scoffing helped the team generate convincing ‘flirty audio’.

Sonantic CEO Zeena Qureshi said that human beings are complex by nature and that our voices play a key role in how we connect with the world around us.

“At Sonantic, we are committed to capturing the nuances of the human voice, and we’re incredibly proud of these technological breakthroughs that we have brought to life through ‘What’s Her Secret?’”, she explained.

“From flirting and giggling to breathing and pausing, this is the most realistic romantic demo we’ve created to date, helping us inch closer to our vision of being the CGI of Audio”, she added.

According to Dr. Maggie Vaughan, a New York City-based psychologist who specializes in romantic relationships, speech patterns, tone, and pacing can make or break a flirtatious conversation.

Sonantic’s A.I. voices can also cry and shout for added realism.


In August 2021, the company drew attention again when it announced that it had given Val Kilmer his voice back by recreating his speech from recordings made before he developed throat cancer. Kilmer’s voice has been barely recognizable since he underwent a tracheotomy in 2014 as part of his cancer treatment. He can now use the A.I. tool in his personal life to help him communicate instead of relying on a voice box.

Customers, mostly in the film and gaming industries, can use Sonantic’s dashboard editor to give voices to characters. The editor lets users adjust the inflection, speed, volume, and style of a voice, as well as add nonverbal sounds to the script.

Sonantic isn’t the only company working on synthetic voices; Respeecher recreated the voice of a young Luke Skywalker for The Mandalorian.

Other companies are instead pairing fully computer-generated likenesses of people, such as William Shatner and Albert Einstein, with synthesized versions of their voices.

A.I. voice generators are opening up new possibilities for the entertainment industry, with increasingly specific controls for designing realistic speech that suits a character or a narration. In addition, voice-cloning technology could allow people with voice problems, or those unable to speak, to use a real voice instead of a robotic, unemotional one.

Filmmaking stands to benefit as well, since the technology gives creators more tools to produce content on their own. Combined with A.I. face generators, A.I. voice generators could let them create entire characters.

There are also drawbacks, however, particularly when a synthetic voice, or worse, a cloned one, is used to deceive people or to impersonate someone else. For this reason, deepfakes and voice generators can be very dangerous.

Source: Daily Mail