Microsoft’s new AI can simulate anyone’s voice with 3 seconds of audio | #VALL_E - Antonios Bouris


18.01.23

On Thursday, Microsoft researchers announced a new text-to-speech AI model called VALL-E that can closely simulate a person’s voice when given a three-second audio sample. Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything—and do it in a way that attempts to preserve the speaker’s emotional tone.

Its creators speculate that VALL-E could be used for high-quality text-to-speech applications; for speech editing, where a recording of a person is altered by editing its text transcript (making them say something they originally didn’t); and for audio content creation when combined with other generative AI models like GPT-3.

Learn more:

https://www.scoop.it/t/21st-century-innovative-technologies-and-developments/?&tag=AI

https://www.scoop.it/topic/21st-century-innovative-technologies-and-developments/?&tag=Ethics

Read the full article at: arstechnica.com
