Function Repository Resource:

SmilesString

Get a SMILES string for a molecule

Contributed by: Jason Biggs

ResourceFunction["SmilesString"][mol]

returns the SMILES string for the molecule mol.

Details and Options

SMILES is an acronym for simplified molecular-input line-entry system.
ResourceFunction["SmilesString"] works on "Chemical" entities as well as molecules.
ResourceFunction["SmilesString"] has the following options:
"AllBondsExplicit"          False        whether to explicitly show all bonds
"Canonical"                 True         whether to list atoms in canonical order
IncludeAromaticBonds        Automatic    whether to use aromatic or Kekule form
"IncludedAtoms"             All          which atoms to include in the string
IncludeHydrogens            Automatic    include hydrogens as distinct atoms
"Isomeric"                  True         include stereochemistry and isotope information
"RootedAtom"                Automatic    the atom to begin the string
"WriteImplicitHydrogens"    False        whether to show all implicit hydrogens with their heavy atom
With the default options, ResourceFunction["SmilesString"][mol] is equivalent to MoleculeValue[mol,"SMILES"].

Examples

Basic Examples (3) 

Get the SMILES string from a molecule:

In[1]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/da725a70-bf69-4598-9550-d1ab313c4d48"]
Out[2]=
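
The input for this example is only available as a cloud object. As a rough stand-in, the sketch below builds a molecule from a SMILES string chosen purely for illustration (an assumption, not the molecule used on this page), binds it to m, and calls the function:

```wolfram
(* hypothetical stand-in molecule with one stereocenter; not the page's original input *)
m = Molecule["C[C@H](N)C(=O)O"];

(* get the SMILES string using the default option settings *)
ResourceFunction["SmilesString"][m]
```

Any valid Molecule expression should work in place of this one.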

This is equivalent to the molecule property "SMILES":

In[3]:=
SameQ[%, MoleculeValue[m, "SMILES"]]
Out[3]=

Get the SMILES string without stereochemistry information:

In[4]:=
ResourceFunction["SmilesString"][m, "Isomeric" -> False]
Out[4]=

Scope (3) 

Get the SMILES string for a chemical entity:

In[5]:=
ResourceFunction["SmilesString"][
 Entity["Chemical", "QuercetinDihydrate"], IncludeHydrogens -> False]
Out[5]=

The SMILES string returned can be used to construct a new Molecule object:

In[6]:=
Molecule[%]
Out[6]=

Use ToEntity to get back to the entity:

In[7]:=
ToEntity@%
Out[7]=

Options (8) 

By default, single bonds are omitted from the string. Use the "AllBondsExplicit" option to control this:

In[8]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/73f5f9e8-0cbd-4b28-abf7-c99f1e98eb22"]
Out[8]=
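
Since the original input is hidden behind a cloud object, here is a minimal sketch of the same idea. The ethanol molecule and the variable name ethanol are assumptions made for illustration:

```wolfram
(* ethanol, chosen only for illustration *)
ethanol = Molecule["CCO"];

(* compare the default setting with explicitly written bonds *)
ResourceFunction["SmilesString"][ethanol, "AllBondsExplicit" -> #] & /@ {False, True}
```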

Two equivalent molecules will give the same SMILES string even if their atom ordering is different:

In[9]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/d636761b-47e3-43bb-983e-9a574ed0afa7"]
Out[11]=
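
A minimal sketch of this comparison, assuming two hand-written SMILES inputs for ethanol that list the atoms in opposite order (these are illustrative stand-ins for the hidden input, reusing the names m and m2):

```wolfram
(* the same molecule entered with two different atom orderings *)
m = Molecule["OCC"];
m2 = Molecule["CCO"];

(* with canonical ordering, both strings should agree *)
SameQ @@ (ResourceFunction["SmilesString"] /@ {m, m2})
```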

To disable canonicalization of the atom ordering, use "Canonical" -> False:

In[12]:=
ResourceFunction["SmilesString"][#, "Canonical" -> False] & /@ {m, m2}
Out[12]=

With the default setting of IncludeAromaticBonds -> Automatic, aromaticity in the SMILES string reflects the aromaticity in the Molecule expression:

In[13]:=
benzene = Molecule[{
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"]}, {
Bond[{1, 2}, "Aromatic"], 
Bond[{2, 3}, "Aromatic"], 
Bond[{3, 4}, "Aromatic"], 
Bond[{4, 5}, "Aromatic"], 
Bond[{5, 6}, "Aromatic"], 
Bond[{6, 1}, "Aromatic"]}];
benzeneKekule = MoleculeModify[benzene, "Kekulize"];
ResourceFunction["SmilesString"] /@ {benzene, benzeneKekule}
Out[15]=

Giving an explicit setting for the IncludeAromaticBonds option will override this behavior:

In[16]:=
Table[ResourceFunction["SmilesString"][mol, IncludeAromaticBonds -> bool], {bool, {True, False}}, {mol, {benzene, benzeneKekule}}]
Out[16]=

The "IncludedAtoms" option allows finding the SMILES string for a molecule fragment. The value of the option should be All or a list of atom indices:

In[17]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/951d7b53-7e79-4f08-a3e3-618d2fd92644"]
Out[17]=

Note that the SMILES string for a fragment will not necessarily be valid:

In[18]:=
Molecule@%
Out[18]=

When the included atoms are not bonded, the fragment SMILES will be disconnected:

In[19]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/334ebf84-95ee-435f-b011-44ca5256ca16"]
Out[19]=

With the default setting of IncludeHydrogens -> Automatic, hydrogen atoms explicitly present in a Molecule expression will be in the resulting string:

In[20]:=
benzene = Molecule[{
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"]}, {
Bond[{1, 2}, "Aromatic"], 
Bond[{2, 3}, "Aromatic"], 
Bond[{3, 4}, "Aromatic"], 
Bond[{4, 5}, "Aromatic"], 
Bond[{5, 6}, "Aromatic"], 
Bond[{6, 1}, "Aromatic"]}];
benzeneWithH = MoleculeModify[benzene, "AddHydrogens"];
ResourceFunction["SmilesString"] /@ {benzene, benzeneWithH}
Out[22]=

Giving an explicit setting for the IncludeHydrogens option will override this behavior:

In[23]:=
Table[ResourceFunction["SmilesString"][mol, IncludeHydrogens -> bool], {bool, {True, False}}, {mol, {benzene, benzeneWithH}}]
Out[23]=

Use the "Isomeric" option to control whether isotope information is encoded:

In[24]:=
m = Molecule[{Entity["Isotope", "Hydrogen2"], "O", Entity["Isotope", "Hydrogen3"]}, {Bond[{1, 2}], Bond[{2, 3}]}];
ResourceFunction["SmilesString"][m, "Isomeric" -> #] & /@ {True, False}
Out[25]=

Double-bond and tetrahedral stereochemistry are controlled by this option as well:

In[26]:=
ResourceFunction["SmilesString"][Molecule[{
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"]}, {
Bond[{1, 2}, "Single"], 
Bond[{2, 3}, "Single"], 
Bond[{3, 4}, "Single"], 
Bond[{4, 5}, "Double"], 
Bond[{5, 6}, "Single"], 
Bond[{6, 7}, "Single"], 
Bond[{7, 8}, "Single"], 
Bond[{6, 9}, "Single"]}, StereochemistryElements -> {<|"StereoType" -> "Tetrahedral", "ChiralCenter" -> 6, "Direction" -> "Counterclockwise", "FiducialAtom" -> 5, "Ligands" -> {7, 9}|>, <|"StereoType" -> "DoubleBond", "StereoBond" -> {4, 5}, "Ligands" -> {3, 6}, "Value" -> "Together"|>}], "Isomeric" -> #] & /@ {True, False}
Out[26]=

Use "RootedAtom" -> n to create a SMILES string starting at the atom with index n:

In[27]:=
listOfSmiles = ResourceFunction["SmilesString"][Molecule[{
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"]}, {
Bond[{1, 2}, "Single"], 
Bond[{2, 3}, "Single"], 
Bond[{3, 4}, "Single"], 
Bond[{4, 5}, "Double"], 
Bond[{5, 6}, "Single"], 
Bond[{6, 7}, "Single"], 
Bond[{7, 8}, "Single"], 
Bond[{6, 9}, "Single"]}, StereochemistryElements -> {<|"StereoType" -> "Tetrahedral", "ChiralCenter" -> 6, "Direction" -> "Counterclockwise", "FiducialAtom" -> 5, "Ligands" -> {7, 9}|>, <|"StereoType" -> "DoubleBond", "StereoBond" -> {4, 5}, "Ligands" -> {3, 6}, "Value" -> "Together"|>}], "RootedAtom" -> #] & /@ Range[9];
Column[listOfSmiles]
Out[28]=

These SMILES strings all create equivalent molecules:

In[29]:=
MoleculeEquivalentQ @@ Molecule /@ listOfSmiles
Out[29]=

Implicit hydrogens are not included in a SMILES string when their presence can be inferred from normal valence rules. Use "WriteImplicitHydrogens" -> True to write all implicit hydrogens:

In[30]:=
mol = Molecule["hexane", IncludeHydrogens -> False];
ResourceFunction["SmilesString"][mol, "WriteImplicitHydrogens" -> #] & /@ {True, False}
Out[31]=

Publisher

JasonB

Version History

  • 1.0.0 – 29 July 2020
