On the surface at least, Meta’s latest AI development doesn’t look like a major step.
Today, Meta has published an overview of its new ‘Voicebox’ AI system, which will enable users to translate text to audio, in a range of styles and voices.
Introducing Voicebox, a new breakthrough generative speech system based on Flow Matching, a new method proposed by Meta AI. It can synthesize speech across six languages, perform noise removal, edit content, transfer audio style & more.
More details on this work & examples ⬇️
— Meta AI (@MetaAI) June 16, 2023
As presented in this overview clip, the Voicebox system can take text inputs and translate them into audio, with different voice options, enabling more advanced text-to-audio translation, but with reduced learning and processing requirements than other, similar options.
Though, on the surface at least, it’s not a heap different from the text-to-audio tools that we’re now accustomed to (whether we like them or not) on TikTok and other apps.
The Voicebox translations sound pretty similar, and I’m willing to bet Meta won’t let me use the voice of Rocket Raccoon or a Transformer in these new translations.
But the Voicebox system is also more than just a direct text-to-speech translation tool.
As explained by Meta:
“Voicebox can produce high quality audio clips and edit pre-recorded audio – like removing car horns or a dog barking – all while preserving the content and style of the audio. The model is also multilingual and can produce speech in six languages. In the future, multipurpose generative AI models like Voicebox could give natural-sounding voices to virtual assistants and non-player characters in the metaverse. They could allow visually impaired people to hear written messages from friends read by AI in their voices, give creators new tools to easily create and edit audio tracks for videos, and much more.”
As Meta notes, Voicebox also lets you use voice models for translation, so you can use an audio clip of another person to make your text-to-speech output sound like that person is speaking, based on just seconds of audio input.
Which will definitely lead to a new raft of deepfakes, though again, similar tools do already exist. They’re just not the same, and Meta says not as good, as this new process.
The real benefit of Voicebox, in a broad-reaching sense, will be in translation, and in enabling simplified, native-sounding versions of your text inputs in different languages. That could open up new, cross-market opportunities, while the advanced modeling of the system will also facilitate broader use cases and processes, which could provide other key benefits.
But Meta is also aware of the risks.
At this stage, Meta isn’t releasing the source code or app to the public, citing ‘the potential risks of misuse’. It’s hoping to find more practical, valuable use cases for the technology over time, so its announcement today is more of an FYI than a launch, as such.
You can read more about Meta’s Voicebox project here.