Wolfram Function Repository
Instant-use add-on functions for the Wolfram Language
Function Repository Resource: StableDiffusionSynthesize
Synthesize images using the Stable Diffusion neural network
ResourceFunction["StableDiffusionSynthesize"][prompt] synthesize an image given a string or explicit text embedding vector as prompt. | |
ResourceFunction["StableDiffusionSynthesize"][prompt→latent] use an initial image or a noise as latent for the diffusion starting point. | |
ResourceFunction["StableDiffusionSynthesize"][prompt→latent→guidanceScale] specify a guidance scale. | |
ResourceFunction["StableDiffusionSynthesize"][{negativeprompt,prompt}→latent→guidanceScale] specify a guidance scale with a negative prompt. | |
ResourceFunction["StableDiffusionSynthesize"][<|"Prompt"→…,"NegativePrompt"→…,"Latent"→…,"GuidanceScale"→…,…|>] provide an association with explicit arguments. | |
ResourceFunction["StableDiffusionSynthesize"][prompt,n] generate n instances for the same prompt specification. | |
ResourceFunction["StableDiffusionSynthesize"][{p1,p2,…}] generate multiple images. | |
ResourceFunction["StableDiffusionSynthesize"][{p1,p2,…},n] generate multiple images for each prompt. |
Generate an image by giving a text prompt:
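A minimal sketch of this call; the prompt string is an illustrative assumption:

    ResourceFunction["StableDiffusionSynthesize"]["a photograph of an astronaut riding a horse"]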
Generate multiple images:
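A sketch requesting several instances of one prompt; the prompt and the count are illustrative assumptions:

    ResourceFunction["StableDiffusionSynthesize"]["a watercolor painting of a red fox", 4]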
Guide an initial image with a prompt:
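A sketch of image-guided synthesis, passing an initial image as the latent; the test image and prompt are illustrative assumptions:

    img = ExampleData[{"TestImage", "House"}];
    ResourceFunction["StableDiffusionSynthesize"]["an oil painting of a country house" -> img]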
Use a negative prompt for additional guidance:
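A sketch using the association form to add a negative prompt; both strings are illustrative assumptions:

    ResourceFunction["StableDiffusionSynthesize"][
     <|"Prompt" -> "a studio portrait of a cat", "NegativePrompt" -> "blurry, low quality, deformed"|>]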
Use a precomputed text embedding:
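A sketch of the calling form only: a random array stands in for a real text embedding (which would normally come from the checkpoint's CLIP text encoder), and the {77, 768} shape is an assumption based on the standard Stable Diffusion v1 text-conditioning shape:

    (* placeholder for a precomputed text embedding; a random array will not give a meaningful image *)
    embedding = RandomReal[{-1, 1}, {77, 768}];
    ResourceFunction["StableDiffusionSynthesize"][embedding]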
Use explicit initial noise:
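A sketch passing explicit Gaussian noise as the latent; the {4, 64, 64} shape is an assumption matching the usual Stable Diffusion latent dimensions for 512×512 output:

    noise = RandomVariate[NormalDistribution[], {4, 64, 64}];
    ResourceFunction["StableDiffusionSynthesize"]["a castle on a cliff at dawn" -> noise]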
A higher guidance scale encourages generation of images that are more closely linked to the prompt, usually at the expense of lower image quality:
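A sketch comparing several guidance scales via the association form; the prompt, the scale values and the omission of an explicit "Latent" are assumptions:

    Table[
     ResourceFunction["StableDiffusionSynthesize"][
      <|"Prompt" -> "a bowl of fruit on a wooden table", "GuidanceScale" -> g|>],
     {g, {1, 7.5, 20}}]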
Specify encoding strength (how much to transform the reference image):
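A sketch assuming a hypothetical "EncodingStrength" association key; the key name, image and value are illustrative assumptions, not the documented interface:

    (* "EncodingStrength" is a hypothetical key name used for illustration only *)
    img = ExampleData[{"TestImage", "House"}];
    ResourceFunction["StableDiffusionSynthesize"][
     <|"Prompt" -> "a pencil sketch of a house", "Latent" -> img, "EncodingStrength" -> 0.6|>]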
Specify the number of diffusion iterations (the default is 50):
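A sketch assuming the iteration count is set with the MaxIterations option; the option name and the values are illustrative assumptions:

    (* hypothetical option name; the documented spelling may differ *)
    ResourceFunction["StableDiffusionSynthesize"]["a snowy mountain landscape", MaxIterations -> 25]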
By default, ProgressReporting→Automatic shows the latent image as the diffusion progresses; ProgressReporting→False disables this display.
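A sketch that turns the progress display off; the prompt is an illustrative assumption:

    ResourceFunction["StableDiffusionSynthesize"]["a lighthouse at sunset", ProgressReporting -> False]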
Return intermediate images for each diffusion iteration:
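A sketch assuming a hypothetical "IntermediateImages" key selects this return form; the key name and prompt are illustrative assumptions:

    (* hypothetical key name used for illustration only *)
    ResourceFunction["StableDiffusionSynthesize"][
     <|"Prompt" -> "a forest path in autumn", "IntermediateImages" -> True|>]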
Return a pair {latents, result} containing the list of intermediate latents and the final result:
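A sketch assuming a hypothetical "ReturnLatents" key selects the {latents, result} return form; the key name and prompt are illustrative assumptions:

    (* hypothetical key name used for illustration only *)
    {latents, result} = ResourceFunction["StableDiffusionSynthesize"][
       <|"Prompt" -> "a forest path in autumn", "ReturnLatents" -> True|>];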
Specify custom neural network parts from different trained checkpoints, modified by Textual Inversion, LoRA or other techniques:
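A purely hypothetical sketch, assuming a "UNet" key accepts a replacement network and that a fine-tuned net has been loaded separately:

    (* customUNet: assumed to be a fine-tuned UNet loaded separately; the "UNet" key is hypothetical *)
    ResourceFunction["StableDiffusionSynthesize"][
     <|"Prompt" -> "a portrait in the style of the fine-tuned checkpoint", "UNet" -> customUNet|>]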
By default, TargetDevice is "GPU"; the network is extremely slow on "CPU" and running it there is not recommended:
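A sketch setting TargetDevice explicitly; the prompt is an illustrative assumption:

    ResourceFunction["StableDiffusionSynthesize"]["a city skyline at night", TargetDevice -> "GPU"]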
"UNetTargetDevice" and "UNetBatchSize" options can overwrite TargetDevice and BatchSize, which may be useful when Decoder can't handle the same BatchSize for decoding too many images:

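A sketch combining these options so that the UNet runs with a larger batch than the decoder; the prompt and the specific values are illustrative assumptions:

    ResourceFunction["StableDiffusionSynthesize"]["a field of sunflowers", 8,
     "UNetBatchSize" -> 8, BatchSize -> 2, TargetDevice -> "GPU"]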
Progressively modify the neural network to see how it gradually breaks down.
Requirements: Wolfram Language 13.0 (December 2021) or above
This work is licensed under a Creative Commons Attribution 4.0 International License