Function Repository Resource:

RandomSmilesString

Get a random SMILES string for a molecule

Contributed by: Jason Biggs

ResourceFunction["RandomSmilesString"][mol]

returns a SMILES string for the molecule mol with the atoms in a random order.

ResourceFunction["RandomSmilesString"][mol,n]

returns a list of n random SMILES strings for mol.

Details and Options

SMILES is an acronym for Simplified Molecular-Input Line-Entry System, a system to encode molecular structures as a string. SMILES strings change depending on the order in which the atoms are listed.
Randomized SMILES strings are a non-unique encoding of a molecule and can be used in machine learning applications, for example to augment training data.
ResourceFunction["RandomSmilesString"] has the following options:
"AllBondsExplicit"    False    whether to explicitly show all bonds
"AllHsExplicit"       False    whether to explicitly list all implicit hydrogens
"Isomeric"            True     include stereochemistry and isotope information
"Kekulized"           False    whether to use aromatic or Kekule form
IncludeHydrogens      False    whether to include hydrogens as explicit atoms
mol can be a Molecule object or anything that can be readily converted to one, such as a systematic chemical name, an Entity, or an ExternalIdentifier.
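
The dependence of a SMILES string on atom order can be illustrated with a minimal sketch. The two ethanol strings and the variable names smiles and mols are illustrative and do not come from this page; the sketch assumes that Molecule accepts a SMILES string directly and that MoleculeMatchQ accepts a Molecule as its pattern, as in the examples on this page.

```wolfram
(* Two hand-written SMILES strings for ethanol, atoms listed in opposite order *)
smiles = {"CCO", "OCC"};
mols = Molecule /@ smiles;

(* Compare the raw strings, then compare the parsed structures *)
{SameQ @@ smiles, MoleculeMatchQ[mols[[1]], mols[[2]]]}

(* Compare the canonical strings derived from each molecule *)
SameQ @@ (MoleculeValue[#, "IsomericSMILES"] & /@ mols)
```

The strings differ as text, while the structure and canonical-string comparisons are the checks that should agree if both strings encode the same molecule.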

Examples

Basic Examples (3) 

Create a molecule and get a random SMILES string:

In[1]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/bb62831f-99af-4f97-a8b7-91d5b4b3e4ba"]
Out[1]=

Get a list of five random SMILES strings:

In[2]:=
ResourceFunction["RandomSmilesString"][m, 5] // TableForm
Out[2]=

All of the strings encode the same molecular structure:

In[3]:=
AllTrue[%, MoleculeMatchQ[m]]
Out[3]=

Compare this with the "IsomericSMILES" property, which lists the atoms in canonical order:

In[4]:=
MoleculeValue[m, "IsomericSMILES"]
Out[4]=

To get a reproducible SMILES string, use SeedRandom:

In[5]:=
{ResourceFunction["RandomSmilesString"]["caffeine"], ResourceFunction["RandomSmilesString"]["caffeine"]}
Out[5]=
In[6]:=
{SeedRandom[1234]; ResourceFunction["RandomSmilesString"]["caffeine"],
  SeedRandom[1234]; ResourceFunction["RandomSmilesString"]["caffeine"]}
Out[6]=

Use BlockRandom to block one use of RandomSmilesString from affecting others:

In[7]:=
{BlockRandom@ResourceFunction["RandomSmilesString"]["caffeine"], ResourceFunction["RandomSmilesString"]["caffeine"]}
Out[7]=
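
The two ideas above can be combined: seeding inside BlockRandom gives a reproducible result without disturbing the global random state. This is only a sketch; reproducibleSmiles is a hypothetical helper name, and it assumes the function draws from the standard random generator, as the SeedRandom example suggests.

```wolfram
(* Hypothetical helper: seed locally so the global random state is left untouched *)
reproducibleSmiles[mol_, n_Integer, seed_Integer] :=
  BlockRandom[
    SeedRandom[seed];
    ResourceFunction["RandomSmilesString"][mol, n]
  ]

(* Two calls with the same seed are expected to return the same list *)
reproducibleSmiles["caffeine", 3, 1234] === reproducibleSmiles["caffeine", 3, 1234]
```

Keeping the seed as an explicit argument makes it easy to generate different but repeatable batches by varying the seed.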

Options (5) 

AllBondsExplicit (1) 

By default, single and aromatic bonds are elided. Use "AllBondsExplicit" -> True to change this:

In[8]:=
ResourceFunction["RandomSmilesString"]["caffeine", 3, "AllBondsExplicit" -> True]
Out[8]=

AllHsExplicit (1) 

By default, implicit hydrogen counts are only included when necessary. Use "AllHsExplicit" -> True to change this:

In[9]:=
ResourceFunction["RandomSmilesString"]["phosphoric acid", "AllHsExplicit" -> #] & /@ {True, False}
Out[9]=

Isomeric (1) 

With the default setting "Isomeric" -> True, stereochemistry information is included. Use "Isomeric" -> False to omit it:

In[10]:=
ResourceFunction["RandomSmilesString"]["l-alanine", "Isomeric" -> #] & /@ {True, False}
Out[10]=

Kekulized (1) 

Use "Kekulized" -> True to use localized single and double bonds in place of aromatic bonds:

In[11]:=
ResourceFunction["RandomSmilesString"]["coronene", "Kekulized" -> #] & /@ {True, False}
Out[11]=

IncludeHydrogens (1) 

Use IncludeHydrogens -> True to include hydrogen atoms as explicit atoms:

In[12]:=
ResourceFunction["RandomSmilesString"]["phosphoric acid", IncludeHydrogens -> #] & /@ {True, False}
Out[12]=

Possible Issues (1) 

Small molecules will have a limited number of SMILES representations:

In[13]:=
ResourceFunction["RandomSmilesString"]["water", 3]
Out[13]=
In[14]:=
ResourceFunction["RandomSmilesString"]["cyclohexane", 3]
Out[14]=
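
One way to see how limited the set of representations is for a given molecule is to draw many strings and count the distinct ones. The following is a rough sketch; countDistinct is a hypothetical helper name, and the sample size of 50 is an arbitrary choice.

```wolfram
(* Hypothetical helper: draw n random strings and count how many are distinct *)
countDistinct[mol_, n_Integer] :=
  Length[DeleteDuplicates[ResourceFunction["RandomSmilesString"][mol, n]]]

(* Compare two small, symmetric molecules with a larger one *)
AssociationMap[countDistinct[#, 50] &, {"water", "cyclohexane", "caffeine"}]
```

A count well below the requested n suggests that the molecule has few possible atom orderings, so repeated strings are to be expected.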

Publisher

JasonB

Version History

  • 1.0.1 – 15 April 2022
  • 1.0.0 – 06 April 2022
