TechLair

This AI Learns and Mimics Your Voice in 5 Seconds!

Sunday, November 17, 2019 by Piyush Suthar | Comments

Have you ever wondered how long it would take for an AI to learn and replicate your voice? The answer might come as a surprise: a new AI can mimic your voice after listening to it for a mere 5 seconds.

Yes, you read that right. Researchers at Google have developed a neural network-based text-to-speech (TTS) system that can replicate the voices of different speakers, including speakers it never heard during training.

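The proposed system consists of three major components: a speaker encoder, a synthesizer, and a vocoder. The speaker encoder is trained on a dataset of untranscribed speech from over a thousand people, and it turns a short reference clip into a compact representation of the speaker's voice. The synthesizer generates a “mel spectrogram” from the input text, conditioned on that representation so the output matches the target voice.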

A vocoder based on DeepMind’s WaveNet then converts the mel spectrograms generated by the synthesizer into waveform samples. The diagram below shows the overall flow of the system.
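To make the data flow concrete, here is a minimal Python sketch of how the three components hand data to each other. It is only an illustration: the function names, the constants, and the toy arithmetic inside each function are all assumptions made for this example. In the real system each stage is a trained neural network.

```python
import math
from typing import List

# All names and sizes below are hypothetical, chosen only for illustration.
EMBED_DIM = 8   # size of the fixed-length speaker embedding
N_MELS = 4      # number of mel bands per spectrogram frame
HOP = 3         # waveform samples produced per spectrogram frame


def speaker_encoder(reference_audio: List[float]) -> List[float]:
    """Toy stand-in: squeeze audio of any length into a fixed-size unit vector."""
    vec = [0.0] * EMBED_DIM
    for i, sample in enumerate(reference_audio):
        vec[i % EMBED_DIM] += sample
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def synthesizer(text: str, embedding: List[float]) -> List[List[float]]:
    """Toy stand-in: one mel frame per character, shifted by the speaker embedding."""
    bias = sum(embedding) / len(embedding)
    return [[(ord(ch) % 32) / 32.0 + bias for _ in range(N_MELS)] for ch in text]


def vocoder(mel: List[List[float]]) -> List[float]:
    """Toy stand-in: expand every mel frame into HOP waveform samples."""
    waveform: List[float] = []
    for frame in mel:
        level = sum(frame) / len(frame)
        waveform.extend([level] * HOP)
    return waveform


def clone_voice(reference_audio: List[float], text: str) -> List[float]:
    """Chain the three stages: reference audio + text -> waveform."""
    embedding = speaker_encoder(reference_audio)
    mel = synthesizer(text, embedding)
    return vocoder(mel)


if __name__ == "__main__":
    samples = clone_voice([0.5] * 16, "hello")
    print(len(samples))  # 5 characters * HOP samples each
```

The point of the sketch is the shape of the pipeline: the reference audio is only ever used to produce the embedding, and the text only enters at the synthesizer stage.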

The researchers tested the system to measure how natural the synthesized speech sounds. For this, they created an evaluation set of 100 phrases that do not appear in the training set and tested it with two groups of speakers: those seen during training and those unseen. The proposed model scored about 4.0 on the Mean Opinion Score (MOS), a listener rating scale that runs from 1 to 5, with the results reported at a 95% confidence level.
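If you are curious how a MOS figure with a 95% confidence level is typically computed, the sketch below averages a list of listener ratings and adds a confidence interval. The ratings are made up for illustration, and the normal approximation (the 1.96 multiplier) is an assumption on our part, not necessarily the exact procedure the researchers used.

```python
import math
import statistics
from typing import List, Tuple


def mos_with_ci(ratings: List[int], z: float = 1.96) -> Tuple[float, float]:
    """Return (mean opinion score, half-width of the confidence interval).

    Uses a normal approximation: half-width = z * standard error of the mean.
    z = 1.96 corresponds to a 95% confidence level.
    """
    mean = statistics.mean(ratings)
    std_err = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, z * std_err


if __name__ == "__main__":
    # Hypothetical listener ratings on the 1-to-5 scale.
    ratings = [4, 4, 5, 3, 4, 4, 5, 4, 3, 4]
    mos, half_width = mos_with_ci(ratings)
    print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

With more listeners and more rated phrases, the interval narrows, which is why evaluations like this one collect many ratings per system.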

It is worth noting that the audio generated for unseen speakers sounded as natural as the audio generated for seen speakers, that is, the speakers whose voices were used during training.

If you want to hear how good the synthesized output is, listen to the reference voice and the synthesized sample below.

(Reference Audio) https://google.github.io/tacotron/publications/speaker_adaptation/demos/groundtruth/8230_00000.wav

(Synthesized Audio) https://google.github.io/tacotron/publications/speaker_adaptation/demos/synthesized/8230_00082.wav

More speech samples are available here if you want to explore further. To learn more about how the system works behind the scenes, check out the research paper here, and let us know your thoughts in the comments.

