A.I. creates engagement with human-like voices too

Since chatbots were first introduced, they have never stopped evolving, and neither has Text-To-Speech technology. Now, thanks to A.I., these technologies can merge into new forms of virtual assistants with more human-like voices. As a result, companies can employ speech A.I. to listen and respond to customers in an expressive voice that is unique to their brand, resulting in more engaging and enjoyable conversations.

Conversational A.I. has advanced over the past three years to incorporate new types of models that can better summarize and classify text, understand sentiment, and handle new speech tasks thanks to NLP (Natural Language Processing).

Recently, NVIDIA presented Riva Custom Voice, a new toolkit that allows customers to create their own human-like voice with just 30 minutes of recorded speech. Companies can use it to give their virtual assistants a distinct voice, while developers can use it to launch brand voices and apps that help people with speech and language impairments.

Progressive Insurance, for example, employed A.I. to build a Facebook Messenger chatbot featuring the voice of Stephanie Courtney, the actress who plays Flo. Duolingo is using artificial intelligence to produce voices for its language learning apps. And National Australia Bank has deployed an AI-powered Australian English voice for customers who call its contact centers.

Riva Custom Voice uses semi-supervised learning to create synthetic, bespoke voices for software, IVRs, and other business applications, and is included in the newest version of NVIDIA's Riva conversational AI software development kit.


Semi-supervised learning can solve a variety of real-world problems where supervised learning algorithms would fail due to a lack of labeled data: a model is trained on a small labeled set, then uses its own confident predictions on unlabeled data to keep improving.
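The article does not describe NVIDIA's actual training setup, but the general idea behind semi-supervised self-training can be sketched with scikit-learn's `SelfTrainingClassifier`: hide most of the labels (marked with `-1`) and let the model pseudo-label them from a small labeled seed set. The dataset and parameters below are purely illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Synthetic stand-in data; real voice models would use audio features.
X, y = make_classification(n_samples=500, random_state=0)

# Pretend 90% of the labels are missing: -1 means "unlabeled".
rng = np.random.RandomState(0)
y_train = y.copy()
y_train[rng.rand(len(y)) < 0.9] = -1

# Self-training: fit on the ~10% labeled points, then iteratively
# pseudo-label the unlabeled ones where prediction confidence > 0.8.
clf = SelfTrainingClassifier(LogisticRegression(), threshold=0.8)
clf.fit(X, y_train)

accuracy = clf.score(X, y)
```

The point of the sketch is the workflow, not the numbers: with only a fraction of the labels available, self-training still recovers a usable classifier.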

AI-powered voices can also provide brand consistency, one of the keys to increasing customer loyalty.

Amazon, too, offers its own speech service through Amazon Polly, a cloud platform that converts text into lifelike speech and includes Brand Voice, a feature that produces AI-generated voices representing specific personas. KFC's Colonel Sanders, with his typical Southern U.S. English accent, is an example of a cloned voice built by a system that can learn a new speaking style from just a few hours of training data.

There are two components to Amazon's A.I. model. The first is a generative neural network that turns a sequence of phonemes into spectrograms, which are visual representations of the spectrum of sound frequencies as they change over time. The second is a vocoder, which turns those spectrograms into a continuous audio waveform.
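Amazon has not published this pipeline's code, but the two-stage structure can be illustrated with a deliberately toy numpy sketch: a stand-in "acoustic model" maps hypothetical phoneme IDs to spectrogram frames, and a stand-in "vocoder" resynthesizes each frame as a sum of sinusoids. Every function, constant, and the bin-to-frequency mapping below are invented for illustration only.

```python
import numpy as np

SR = 16_000   # sample rate in Hz (assumption)
FRAME = 256   # audio samples generated per spectrogram frame
N_BINS = 64   # frequency bins per spectrogram frame

def acoustic_model(phoneme_ids, frames_per_phoneme=10):
    """Toy stand-in for the generative network: give each phoneme one
    dominant frequency bin and hold it for a fixed number of frames."""
    frames = []
    for p in phoneme_ids:
        frame = np.zeros(N_BINS)
        frame[p % N_BINS] = 1.0
        frames.extend([frame] * frames_per_phoneme)
    return np.array(frames)            # shape: (n_frames, N_BINS)

def vocoder(spectrogram):
    """Toy vocoder: turn each spectrogram frame into FRAME samples by
    summing sinusoids weighted by the frame's bin amplitudes."""
    t = np.arange(FRAME) / SR
    chunks = []
    for frame in spectrogram:
        chunk = np.zeros(FRAME)
        for k, amp in enumerate(frame):
            if amp > 0:
                freq = 100.0 + 50.0 * k    # invented bin-to-Hz mapping
                chunk += amp * np.sin(2 * np.pi * freq * t)
        chunks.append(chunk)
    return np.concatenate(chunks)      # continuous audio output

phonemes = [3, 7, 12]                  # hypothetical phoneme IDs
spec = acoustic_model(phonemes)
wave = vocoder(spec)
```

Real systems replace both stubs with trained neural networks (e.g. a sequence-to-sequence spectrogram predictor and a neural vocoder), but the data flow is the same: phonemes in, spectrogram in the middle, waveform out.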

Although this technology enables more realistic conversations, it can also be abused, as in the case of a CEO whose voice was cloned to authorize a $243,000 wire transfer. There is plenty of audio and video data that can be fed into a machine learning system to create a persuasive copy, and according to the FBI, malicious actors are already planning to use synthetic content for cybercrimes.

For this reason, some providers require that voice actors consent before a synthetic voice is deployed, that each prospective use case is reviewed, and that customers sign a code of conduct. Microsoft is working on a way to embed a digital watermark within a synthetic voice to identify content made with Custom Neural Voice. Others, such as Resemble AI, another voice cloning service, have developed open-source techniques to detect vocal "deepfakes".
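Microsoft has not disclosed how its watermark works, but the general idea of an audio watermark can be sketched with a classic spread-spectrum scheme: add a low-amplitude pseudorandom sequence derived from a secret key, then detect it later by correlation. Everything here (functions, key, strength, threshold) is an illustrative assumption, not Custom Neural Voice's actual method.

```python
import numpy as np

SR = 16_000  # sample rate in Hz (assumption)

def embed_watermark(audio, key, strength=0.02):
    """Add a low-amplitude pseudorandom +/-1 sequence seeded by `key`.
    At this strength the mark is far below the audible signal level."""
    rng = np.random.RandomState(key)
    mark = rng.choice([-1.0, 1.0], size=audio.shape)
    return audio + strength * mark

def detect_watermark(audio, key, threshold=0.01):
    """Correlate the signal with the keyed sequence. The normalized
    correlation is near `strength` if the mark is present and near
    zero otherwise (or if the wrong key is used)."""
    rng = np.random.RandomState(key)
    mark = rng.choice([-1.0, 1.0], size=audio.shape)
    score = float(np.dot(audio, mark) / len(audio))
    return score > threshold

# Ten seconds of a 440 Hz tone standing in for synthetic speech.
clean = 0.5 * np.sin(2 * np.pi * 440 * np.arange(10 * SR) / SR)
marked = embed_watermark(clean, key=42)
```

Only someone holding the key can run the check, which is what makes such a watermark useful for provenance: `detect_watermark(marked, key=42)` fires, while the clean signal, or a check with the wrong key, does not.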


NVIDIA did not initially announce any safeguards to prevent abuse of Riva Custom Voice, but the company's Riva terms of service forbid the creation of "fraudulent, false, misleading, or deceptive" content, as well as content that "promote[s] discrimination, bigotry, racism, hatred, harassment, or harm against any individual or group".

Imagine, however, a cloned voice attached to a deepfake face, conversing through GPT-3 technology: the result would be amazing and scary at the same time. We could have a complete replica of a person, highly engaging when used by companies for marketing, customer assistance, games, or movies, but this also makes identity theft easier. It would therefore be helpful if such artificial entities were recognizable, perhaps by rendering them as lower-quality avatars that look more digital than real, or by including a digital watermark as mentioned above.

Source: venturebeat.com