Function Repository Resource:

AudioSeparate


Divide an audio signal into musical sources (such as vocals, drums and bass) using a neural network model

Contributed by: Hrachya Aslanyan

ResourceFunction["AudioSeparate"][input]

separates the input audio into all available sources.

ResourceFunction["AudioSeparate"][input,source]

separates the input audio and returns only the specified source.

Details and Options

ResourceFunction["AudioSeparate"] requires version 15.0+ of the Wolfram Language.

ResourceFunction["AudioSeparate"][input] returns an Association that maps each source name to its separated Audio object.

AudioSeparate performs music source separation using a pretrained deep neural network model (Demucs).

The function splits the input audio into overlapping chunks, processes them in batches, and merges the results using weighted overlap-add reconstruction.
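The chunk-and-merge step can be illustrated on a plain list of samples. The following is a minimal sketch, not the function's actual implementation: overlapAdd is a hypothetical helper, the triangular weighting and the chunk length are assumptions, batching is omitted, and process stands in for the neural network.

```wolfram
(* Hypothetical helper: split samples into overlapping chunks, apply `process`
   to each chunk, and merge the results by weighted overlap-add. *)
overlapAdd[samples_List, chunkLen_Integer, overlap_, process_] :=
 Module[{n = Length[samples], hop, weights, out, norm, stop, len},
  (* distance between chunk starts; overlap is a fraction of the chunk length *)
  hop = Max[1, Round[chunkLen (1 - overlap)]];
  (* assumed triangular weights: largest in the middle of a chunk *)
  weights = N[Table[Min[i, chunkLen + 1 - i], {i, chunkLen}]];
  out = ConstantArray[0., n];
  norm = ConstantArray[0., n];
  Do[
   stop = Min[s + chunkLen - 1, n];
   len = stop - s + 1;
   (* accumulate the weighted chunk output and the weights themselves *)
   out[[s ;; stop]] += Take[weights, len]*process[samples[[s ;; stop]]];
   norm[[s ;; stop]] += Take[weights, len],
   {s, 1, n, hop}];
  (* dividing by the accumulated weights normalizes the overlapping regions *)
  out/norm]

(* With an identity "model", the merged output should reproduce the input. *)
signal = N[Sin[Range[100]/5]];
restored = overlapAdd[signal, 16, 0.25, Identity];
Max[Abs[restored - signal]]
```

Because every sample is divided by the sum of the weights that covered it, regions where chunks overlap are blended rather than doubled, which is why the identity case recovers the input up to floating-point error.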

By default, all available sources are returned as an association of the form: <|"drums"→Audio[…],"bass"→Audio[…],"other"→Audio[…],"vocals"→Audio[…]|>

If a single source is specified, the result is a single Audio object.

If a list of sources is specified, as in ResourceFunction["AudioSeparate"][input,{source₁,source₂,…}], the result is an Association containing only those sources.
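The three calling forms described above can be compared side by side. This is an illustrative sketch: the input is a synthetic tone generated with AudioGenerator so that the example needs no external file, and the variable names are placeholders. A real music recording would give more meaningful results.

```wolfram
separate = ResourceFunction["AudioSeparate"];

(* Placeholder input: a 10-second synthetic tone *)
audio = AudioGenerator[{"Sin", 220}, 10];

(* No source given: an Association with one Audio object per source *)
all = separate[audio];

(* A single source name: a single Audio object *)
vocals = separate[audio, "vocals"];

(* A list of source names: an Association restricted to those sources *)
rhythm = separate[audio, {"drums", "bass"}];

Keys[all]
```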

The input audio is automatically resampled and channel-mixed to match the model's expected sample rate and number of channels.
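An equivalent manual preparation can be written with the built-in functions AudioResample and AudioChannelMix. This is only a sketch of the idea: the target of 44100 Hz stereo is an assumption based on the values commonly used by Demucs models, and the function performs this step automatically, so calling code does not need to.

```wolfram
(* Placeholder input: a mono tone at 22.05 kHz *)
audio = AudioGenerator[{"Sin", 220}, 2, SampleRate -> 22050];

(* Assumed model format: 44100 Hz, two channels *)
prepared = AudioChannelMix[AudioResample[audio, 44100], "Stereo"];

(* Inspect the resulting sample rate and channel count *)
{AudioSampleRate[prepared], AudioChannels[prepared]}
```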

Computation can be performed on CPU or GPU depending on the value of TargetDevice.
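The device is selected per call through the option. In this sketch the input is a synthetic placeholder tone. The value "GPU" is the usual TargetDevice setting for GPU execution in the Wolfram Language and requires compatible hardware, so it is shown only in a comment.

```wolfram
(* Placeholder input: a 10-second synthetic tone *)
audio = AudioGenerator[{"Sin", 220}, 10];

(* Explicitly run on the CPU *)
vocalsCPU = ResourceFunction["AudioSeparate"][audio, "vocals", TargetDevice -> "CPU"];

(* On a machine with a supported GPU, the same call would use:
   ResourceFunction["AudioSeparate"][audio, "vocals", TargetDevice -> "GPU"] *)
```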

ResourceFunction["AudioSeparate"] supports the following options:

"Overlap"

0.25

fraction of overlap between consecutive chunks. Must be between 0 and 0.5.

BatchSize

number of chunks processed in a single batch. Can be a positive integer or Infinity.

TargetDevice

"CPU"