Humans communicate a wide range of nonverbal emotion. Even ordinary speech is packed with emotional cues, and with considerable value for any company collecting personal data.
Researchers at Imperial College London have used AI to mask the emotional cues in users' voices when they speak to internet-connected voice assistants.
Their method inserts a “layer” between the user and the cloud: before data is uploaded, emotional speech is automatically converted into “normal” speech. The team recently published their paper, “Emotionless: Privacy-Preserving Speech Analysis for Voice Assistants,” on the arXiv preprint server.
Our voices can reveal our confidence and stress levels, physical condition, age, gender, and personal traits. This is not lost on speaker makers like Amazon, which are continually trying to improve the emotion-detecting abilities of their AI.
Such an AI could accurately detect people’s “personal preferences, and emotional states,” said Ranya Aloufi, the study’s lead researcher.
The method for masking emotion works in stages: it collects the user’s speech, analyzes it, and extracts the emotional features from the raw signal. An AI program then replaces the emotional indicators in the speech, effectively flattening them. Finally, a voice synthesizer regenerates normalized speech from the AI’s outputs, and only that version is sent to the cloud.
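The paper’s actual pipeline uses learned models and a neural synthesizer, but the core “flattening” idea can be illustrated with a much simpler toy. The sketch below (an assumption for illustration, not the researchers’ implementation; the function names `frame_rms` and `flatten_energy` are hypothetical) suppresses one prosodic cue that emotion classifiers rely on — per-frame loudness variation — by pulling each frame’s energy toward the utterance’s mean before the audio would be uploaded:

```python
import math

def frame_rms(x, frame_len=256, hop=128):
    """Per-frame RMS energy of a waveform given as a list of floats."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return [math.sqrt(sum(v * v for v in x[i*hop : i*hop + frame_len]) / frame_len)
            for i in range(n)]

def flatten_energy(x, frame_len=256, hop=128, alpha=0.9):
    """Toy stand-in for the paper's flattening step: rescale each frame
    so its energy moves toward the utterance mean, reducing the loudness
    swings that carry emotional information. alpha=1 flattens fully."""
    rms = frame_rms(x, frame_len, hop)
    target = sum(rms) / len(rms)          # utterance-level mean energy
    out = list(x)
    for i, r in enumerate(rms):
        # Gain that interpolates this frame's energy toward the mean.
        g = ((1 - alpha) * r + alpha * target) / (r + 1e-9)
        for j in range(i * hop, min(i * hop + hop, len(out))):
            out[j] *= g
    return out
```

A real system would flatten pitch contour and spectral cues as well, and would resynthesize the speech rather than rescale it, but the same principle applies: measure the emotional feature, then regenerate audio in which that feature is held near a neutral baseline.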
The researchers report that this method reduced the accuracy of emotion identification by 96 percent in their experiments, though it also decreased speech recognition accuracy.
Understanding emotion is an important part of making a machine seem human, a longtime goal of many AI companies and futurists. Speech Emotion Recognition (SER) has existed as a field since the late 1990s, making it considerably older than the speech recognition systems running on today’s Alexa, Siri, and Google Home devices.
Just last year, Huawei announced that it was building an emotion-detecting AI for its voice assistant.
“We think that, in the future, all our end users wish [that] they can interact with the system in the emotional mode,” vice president of software engineering at Huawei, Felix Zhang, shared with CNBC. “This is the direction we see in the long run.”
In May, Amazon’s Alexa speech engineers revealed research on using adversarial networks to discern emotion in the company’s home voice assistants. Emotion recognition, they wrote, “can aid in health monitoring; it can make conversational-AI systems more engaging; and it can provide implicit customer feedback that could help voice agents like Alexa learn from their mistakes.”
Companies are also hoping to use this highly sensitive data to serve you hyper-targeted advertisements.
Amazon’s 2017 patent for emotional speech recognition uses illness as an example: “physical conditions such as sore throats and coughs may be determined based at least in part on a voice input from the user, and emotional conditions such as an excited emotional state or a sad emotional state may be determined based at least in part on voice input from a user,” the patent says. “A cough or sniffle, or crying, may indicate that the user has a specific physical or emotional abnormality.”
According to the patent, Alexa would then use those cues, paired with your browsing and purchase history, to offer you hyper-specific advertisements for medications or other products.
But unless smart home devices like Alexa or Google Home integrate more privacy protections, there is not much consumers can do to protect themselves against this use of their data.
“Users should be informed and given awareness on the ways in which these systems will use and analyse their recording so they are able to make informed choices about the placement of these devices in their households, or their interaction with these devices,” she said. “There is a dire need to apply these [privacy-protecting] technologies to protect the users’ and their emotional state.”