Function Repository Resource:

VoiceRestyle

Generate speech from a given text in a vocal style emulating a given audio sample

Contributed by: Arnoud Buzing

ResourceFunction["VoiceRestyle"][text,audiofile]

generates audio for text using the voice sample from audiofile.

Details

ResourceFunction["VoiceRestyle"] makes use of the 'xtts_v2' model from Coqui. You must agree to the terms of the license agreement when using ResourceFunction["VoiceRestyle"] for the first time.

ResourceFunction["VoiceRestyle"] downloads a large neural network model the first time it is used, which can take several minutes.

ResourceFunction["VoiceRestyle"] requires a voice sample stored in an audio file. You can record voice samples with AudioCapture.

ResourceFunction["VoiceRestyle"] encapsulates functionality from the Python TTS package.

Examples

Basic Examples

Evaluate the input, click the red record button to start recording a voice sample, then click the same button again to stop. A good voice sample is about 30-60 seconds of clearly spoken text, for example read aloud from a random Wikipedia page:

In[1]:=

Out[1]=

Get the voice sample file path:

In[2]:=

Out[2]=
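Before generating speech, it can help to confirm that the recording falls within the suggested 30-60 second range. The following is a minimal sketch, assuming `sample` holds the path to the voice sample file and that the file imports as an Audio object; the names `audio` and `seconds` are illustrative only:

```wolfram
(* Assumption: sample is the file path of the recorded voice sample *)
audio = Import[sample];

(* Length of the recording in seconds, as a plain number *)
seconds = QuantityMagnitude[Duration[audio], "Seconds"];

(* Compare against the 30-60 second guideline for a good voice sample *)
If[30 <= seconds <= 60,
  "Sample length looks reasonable",
  "Consider recording a new sample"
]
```

The thresholds only mirror the guideline given for a good voice sample; they are not limits enforced by the function.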

Run the function with a sample text and your voice sample:

In[3]:=

ResourceFunction["VoiceRestyle"]["You need to record a sample of your own voice. You can do this with AudioCapture.", sample]

Out[3]=
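The generated speech can be kept for later use by exporting it to a file. This is a sketch only: it assumes the function returns an Audio object (the description says it "generates audio" but does not state the return type), and the names `result` and "restyled.wav" are hypothetical:

```wolfram
(* Assumption: sample is the voice sample path from the previous step *)
result = ResourceFunction["VoiceRestyle"]["This is a short test sentence.", sample];

(* Export only if the result is an Audio object; otherwise return it for inspection *)
If[AudioQ[result],
  Export[FileNameJoin[{$TemporaryDirectory, "restyled.wav"}], result],
  result
]
```

Checking with AudioQ first means that an unexpected result, such as a failure during the initial model download, is returned for inspection rather than written to disk.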

Publisher

Arnoud Buzing

Requirements

Wolfram Language 14.0 (January 2024) or above

Version History

1.0.0 – 17 January 2025

Source Metadata

Citation:
Coqui

Related Resources

License Information

This work is licensed under a Creative Commons Attribution 4.0 International License
