TechLair

Google’s New AI Can Adjust Pitch, Emotion, and Speed with Just 30 Minutes of Data

Saturday, May 2, 2020 by Piyush Suthar

AI researchers at Google and University College London have detailed an AI model that can control speech characteristics such as pitch, emotion and speaking rate with just 30 minutes of labeled data. Their paper, published at the International Conference on Learning Representations (ICLR), describes how they trained the system for 300,000 steps across 32 of Google’s custom-designed tensor processing units (TPUs).

According to the study, just 30 minutes of labeled data gave the model a ‘significant degree’ of control over speech rate, valence, and arousal. The researchers further said that the system outputs spectrograms, visual representations of the frequencies in a signal over time, and that a second model, such as DeepMind’s WaveNet, is trained to act as a vocoder (a voice codec that analyzes and synthesizes voice data) that turns those spectrograms into audio.
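For readers unfamiliar with spectrograms, here is a minimal sketch of how one can be computed from raw audio samples. It is an illustration only, not the researchers' pipeline: the function name, frame sizes, sample rate and test tone are all assumptions made for the demo, and it uses a naive Fourier transform from the Python standard library rather than an optimized audio library.

```python
import cmath
import math

def magnitude_spectrogram(signal, frame_len=128, hop=64):
    """Return one list of frequency-bin magnitudes per overlapping frame."""
    # A Hann window tapers each frame's edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    spectrogram = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive discrete Fourier transform: fine for a demo, slow for real audio.
        row = [abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                       for n, x in enumerate(frame)))
               for k in range(n_bins)]
        spectrogram.append(row)
    return spectrogram

# Hypothetical demo values: 0.1 seconds of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
frame_len = 128
tone = [math.sin(2 * math.pi * 440.0 * n / sample_rate) for n in range(800)]
spec = magnitude_spectrogram(tone, frame_len=frame_len, hop=64)

# Find the frequency bin holding the most energy in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / frame_len
print(len(spec), "frames,", len(spec[0]), "bins, peak near", peak_hz, "Hz")
```

Each row describes which frequencies are present during one short slice of time. A vocoder works in the opposite direction, turning rows like these back into an audible waveform.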

What’s really interesting is that the new AI model seems to address a critical limitation of an earlier study that investigated the use of ‘style tokens’, which represented different categories of emotion, to control speech effects. While that model achieved good results with only 5 percent of labeled data, it wasn’t able to satisfactorily modify speech samples that used different tones, stress, intonations and rhythms while conveying the same emotion.

The labeled data set included a total of around 45 hours of audio, made up of 72,405 5-second recordings from 40 English speakers. The speakers were all trained voice actors who read pre-written texts with varying levels of valence (emotions like sadness or happiness) and arousal (excitement or energy). The researchers used those recordings to obtain six ‘affective states’, which were modeled and used as labels for the AI algorithm to train on.
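To put the 30-minute figure in perspective, here is a quick back-of-the-envelope calculation. It assumes the roughly 45 hours quoted above is the full pool of recordings; the variable names are illustrative.

```python
# Figures quoted in the article; both are approximate.
labeled_minutes = 30
total_hours = 45

# Share of the full audio pool that the 30 minutes represents.
fraction = labeled_minutes / (total_hours * 60)
print(f"{fraction:.1%} of the audio")
```

In other words, the model needed labels for only about one percent of the audio.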

While the researchers admit that the new AI model could make it easier for unscrupulous parties to spread misinformation or commit fraud, they also claim that the benefits far outweigh the possible risks, because the study could eventually improve human-computer interfaces significantly.
