Wolfram Language Paclet Repository

Community-contributed installable additions to the Wolfram Language

Generating Molecules With SELFIES
In contrast to other string-based molecule formats such as SMILES and InChI, any combination of SELFIES tokens can be parsed to a valid chemical structure. We can take any molecule and find the tokens in its SELFIES string:
In[8]:=
tokens = SplitSelfies[Molecule["adenosine triphosphate"]]
Out[8]=
{[N],[C],[=N],[C],[=N],[C],[=C],[Ring1],[=Branch1],[N],[=C],[N],[Ring1],[Branch1],[C@@H1],[O],[C@H1],[Branch2],[Ring1],[O],[C],[O],[P],[=Branch1],[C],[=O],[Branch1],[C],[O],[O],[P],[=Branch1],[C],[=O],[Branch1],[C],[O],[O],[P],[=Branch1],[C],[=O],[Branch1],[C],[O],[O],[C@@H1],[Branch1],[C],[O],[C@H1],[Ring2],[Ring1],[Ring2],[O]}
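As a rough illustration of what tokenization means here, the sketch below splits a SELFIES string into its bracketed tokens using only built-in string functions. It is not a description of how SplitSelfies is implemented: the pattern and the names selfiesString and sketchTokens are assumptions made for illustration, and the string is simply the first seven tokens of the output above.

```wolfram
(* hypothetical example string: the first seven tokens from the output above *)
selfiesString = "[N][C][=N][C][=N][C][=C]";

(* treat each token as "[", one or more characters other than "]", then "]" *)
sketchTokens = StringCases[selfiesString, "[" ~~ Except["]"].. ~~ "]"]

(* joining the tokens gives back the original string *)
StringJoin[sketchTokens] === selfiesString
```

Because a SELFIES string is just the concatenation of its tokens, joining any list of tokens with StringJoin yields another candidate SELFIES string.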
Now take any 30 random tokens and combine them. The result is a valid SELFIES string that can be converted into a SMILES string and from there to a Molecule:
In[9]:=
MoleculeEcho
FromSelfies
@Echo[StringJoin[RandomChoice[tokens,30]],"selfies:"],"smiles:"
» selfies: [Branch1][C][O][Ring1][=Branch1][C][C][C][C][Ring1][Ring1][O][P][Branch1][Branch1][O][N][P][C@H1][C][C@@H1][N][O][P][O][=O][Ring2][Branch1][C][C@@H1]
» smiles: C=O
Out[9]=
Molecule[Formula: CH₂O, Atoms: 4, Bonds: 3]

From this one list of tokens we could create any number of random molecules:
In[10]:=
MapMoleculePlot@*
FromSelfies
@*StringJoin,RandomChoice[tokens,{3,50}]
Out[10]=
[Plots of three randomly generated molecules]

So far our list of tokens, our chemical alphabet if you will, is limited to the tokens in the input molecule. If we start with a larger alphabet we can generate a larger variety of structures:
In[11]:=
tokens2 = SelfiesAlphabet[ToSelfies[«5K random SMILES from ChEMBL»]]
Out[11]=
{[/C@@H1],[/Cl],[=Branch2],[/-Ring1],[C@],[\Br],[-\Ring1],[N-1],[Br-1],[/S],[=P],[-/Ring1],[Se],[/N],[Cl],[#N],[\-Ring1],[BH3-1],[\C@H1],[P@],[=C],[\S],[NH1],[S],[/O],[=NH2+1],[C],[OH0],[Ring3],[\N],[P+1],[C@@],[F],[N@+1],[\Cl],[=Ring2],[=N-1],[\O],[Ring1],[/C@H1],[#C],[I-1],[O],[N],[#C-1],[=N+1],[Ring2],[\C@@],[=O],[Branch2],[=N],[NH4+1],[Branch3],[Br],[=Se],[#Branch1],[-/Ring2],[\S+1],[I],[N+1],[O-1],[C@@H1],[Na+1],[P],[=Branch1],[As],[#Branch2],[\-Ring2],[Branch1],[\C],[K+1],[Li+1],[/C],[/-Ring2],[\N+1],[=Ring1],[Cl-1],[\C@@H1],[Si],[=O+1],[C@H1],[\O-1],[B-1],[NH1+1],[He],[B],[S+1],[=S]}
In[12]:=
MapMoleculePlot@*
FromSelfies
@*StringJoin,RandomChoice[tokens,{3,50}]
Out[12]=
[Plots of three randomly generated molecules]
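Conceptually, an alphabet is the set of distinct tokens found across a collection of SELFIES strings. The sketch below shows that idea with built-in functions only. The token lists are made-up toy data, the names tokenLists and sketchAlphabet are hypothetical, and this is not a statement about how SelfiesAlphabet works internally.

```wolfram
(* hypothetical token lists, as might result from splitting three SELFIES strings *)
tokenLists = {
  {"[C]", "[C]", "[O]"},
  {"[C]", "[=O]", "[N]"},
  {"[N]", "[C]", "[Ring1]"}
};

(* the alphabet is the set of distinct tokens across all lists *)
sketchAlphabet = Union @@ tokenLists;

(* five distinct tokens in this toy data *)
Length[sketchAlphabet]
```

The number of distinct token sequences of length n grows as the alphabet size raised to the power n, which is why a larger alphabet gives access to a larger variety of structures.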

We can generate more realistic random molecules by taking into account the relative frequency of the tokens in a representative dataset: carbon and oxygen, for example, occur far more often than selenium or helium. Use the function SelfiesCounts to find token frequency information:
In[13]:=
counts = SelfiesCounts[ToSelfies[«5K random SMILES from ChEMBL»]]
Out[13]=
[O]9859,[C]90873,[=Branch1]17214,[=O]9239,[N]13255,[=C]29860,[Ring2]5050,[Ring1]19890,[=N]4023,[NH1]444,[=Branch2]1990,[Branch2]4383,[C@@H1]2410,[Branch1]17893,[#Branch2]1832,[P]870,[#Branch1]3008,[F]1989,[Cl]1281,.322,[C@H1]2496,[C@@]295,[O-1]280,[N+1]295,[S]2492,[Br]266,[\C]1042,[/C]187,[\S]17,[=Ring1]886,[#N]223,[C@]350,[#C]933,[=Ring2]156,[=S]137,[I]41,[Br-1]14,[/O]20,[\O]38,[\N]180,[Branch3]59,[/S]22,[\-Ring1]46,[/N]32,[-/Ring1]23,[I-1]17,[/-Ring1]28,[Cl-1]13,[=O+1]1,[Na+1]34,[\-Ring2]4,[=N+1]34,[=N-1]13,[\C@@H1]10,[S+1]18,[P+1]2,[Si]17,[/C@H1]4,[\C@H1]7,[\N+1]3,[\O-1]2,[NH1+1]2,[N-1]1,[/C@@H1]2,[Se]1,[B]11,[/-Ring2]6,[As]1,[-\Ring1]2,[Ring3]6,[=P]2,[-/Ring2]3,[OH0]2,[N@+1]1,[#C-1]1,[He]1,[BH3-1]1,[P@]1,[Li+1]3,[NH4+1]1,[\C@@]3,[/Cl]1,[\Cl]1,[\S+1]1,[K+1]1,[B-1]1,[=NH2+1]2,[=Se]1,[\Br]2
Now write a function to generate a random molecule using the counts:
In[14]:=
randomMol[tokenCount_] := Molecule@FromSelfies@StringJoin@RandomChoice[Values[counts] -> Keys[counts], tokenCount]
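The weighted sampling step can be tried in isolation. The following sketch uses the built-in form RandomChoice[weights -> items, n] on a small made-up table. The association toyCounts and its values are hypothetical stand-ins for the full counts above, chosen only so that the expected proportions are easy to read off.

```wolfram
(* hypothetical toy counts standing in for the full token table *)
toyCounts = <|"[C]" -> 90, "[O]" -> 9, "[Se]" -> 1|>;

(* draw 10000 tokens with probability proportional to each count *)
SeedRandom[1];
sample = RandomChoice[Values[toyCounts] -> Keys[toyCounts], 10000];

(* observed frequencies should roughly track 0.90, 0.09 and 0.01 *)
N[Counts[sample]/Length[sample]]
```

Sampling this way keeps rare tokens available while making common ones dominate, which is the effect the counts are meant to have on the generated molecules.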
In[15]:=
randomMol[50]
Out[15]=
Molecule[Formula: C₉H₁₄NO₄P, Atoms: 29, Bonds: 30]

In[16]:=
MoleculePlot@%
Out[16]=
[Plot of the randomly generated molecule]

Related Guides

  • Selfies Functions
  • Molecular Structure & Computation

© 2025 Wolfram. All rights reserved.
