Function Repository Resource:

VoiceRestyle


Generate speech from a given text in a vocal style emulating a given audio sample

Contributed by: Arnoud Buzing

ResourceFunction["VoiceRestyle"][text,audiofile]

generates speech for text in a voice that emulates the voice sample stored in audiofile.

Details

ResourceFunction["VoiceRestyle"] uses the 'xtts_v2' model from Coqui. The first time you use ResourceFunction["VoiceRestyle"], you must agree to the terms of the model's license agreement.
The first time ResourceFunction["VoiceRestyle"] is used, it downloads a large neural network model, which can take several minutes.
ResourceFunction["VoiceRestyle"] requires a voice sample stored in an audio file. You can record voice samples with AudioCapture.
ResourceFunction["VoiceRestyle"] encapsulates functionality from the Python TTS package.
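The Details say that ResourceFunction["VoiceRestyle"] wraps Coqui's 'xtts_v2' model from the Python TTS package. The following is a hypothetical sketch of how such a wrapper could be written with ExternalEvaluate; it is not the function's published source. The names restyleSketch and restyle are illustrative, the fixed language code 'en' is an assumption, and the sketch assumes a local Python installation with the TTS package in which the model has already been downloaded and its license accepted.

```wolfram
(* Hypothetical sketch: NOT the published source of VoiceRestyle.
   Assumes a local Python with the Coqui TTS package installed, and that
   the xtts_v2 model is already downloaded with its license accepted. *)
restyleSketch[text_String, sampleFile_String, outFile_String] :=
 Module[{session, restyle, result},
  (* Start a Python session managed by ExternalEvaluate *)
  session = StartExternalSession["Python"];
  (* Load the XTTS-v2 model through the TTS package's high-level API *)
  ExternalEvaluate[session,
   "from TTS.api import TTS\ntts = TTS('tts_models/multilingual/multi-dataset/xtts_v2')"];
  (* Define a Python helper that clones the voice in speaker_wav;
     language='en' is an assumption, since XTTS-v2 expects a language code *)
  ExternalEvaluate[session,
   "def restyle(text, speaker_wav, file_path):\n    tts.tts_to_file(text=text, speaker_wav=speaker_wav, language='en', file_path=file_path)\n    return file_path"];
  (* Call the Python helper from Wolfram Language with positional arguments *)
  restyle = ExternalFunction[session, "restyle"];
  result = restyle[text, sampleFile, outFile];
  (* Shut down the Python session and import the generated speech *)
  DeleteObject[session];
  Audio[result]
  ]
```

Starting a fresh session and reloading the model on every call keeps the sketch simple but is slow; a real wrapper would more likely keep one session and one loaded model alive between calls.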

Examples

Basic Examples

Evaluate the input, click the red record button to start recording a voice sample, and click the same button again to stop. A good voice sample is about 30 to 60 seconds of clearly spoken text, for example a passage read aloud from a random Wikipedia page:

In[1]:=
audio = AudioCapture[]
Out[1]=
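Since a good voice sample is about 30 to 60 seconds long, it can help to check the recording before using it. This is a minimal sketch; voiceSampleLengthOkQ is a hypothetical helper name, and the bounds come from the guidance above rather than from a documented limit of the model.

```wolfram
(* Hypothetical helper: True if an Audio object lasts between 30 and 60 seconds *)
voiceSampleLengthOkQ[a_Audio] :=
 30 <= QuantityMagnitude[Duration[a], "Seconds"] <= 60

(* Check the sample captured above *)
voiceSampleLengthOkQ[audio]
```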

Get the voice sample file path:

In[2]:=
sample = Information[audio, "ResourcePath"]
Out[2]=

Run the function with a sample text and your voice sample:

In[3]:=
ResourceFunction["VoiceRestyle"][
 "You need to record a sample of your own voice. You can do this with AudioCapture.",
 sample]
Out[3]=

Publisher

Arnoud Buzing

Requirements

Wolfram Language 14.0 (January 2024) or above

Version History

  • 1.0.0 – 17 January 2025
