Function Repository Resource:

MACCSKeys

Compute the 166-bit MACCS (Molecular ACCess System) key

Contributed by: Joshua Schrier

ResourceFunction["MACCSKeys"][molecule]

returns the MACCS key for the Molecule molecule.

ResourceFunction["MACCSKeys"][smiles]

returns the MACCS key for a molecule specified by the SMILES string smiles.

Details and Options

MACCS keys are used to compute molecular similarity for computational drug design and database matching.
This function follows the RDKit implementation of the 166-bit MACCS keys, so the same caveats apply: the isotope flag (key 1) is undefined, and the public MACCS key definitions were "reverse engineered".
The default output returns the keys as a 1-indexed, 166-entry SparseArray.
ResourceFunction["MACCSKeys"] takes an option "OutputStyle", whose possible values are: "SparseArray", "OnBits", "MoleculePlot", "Function" and "SMARTS".
ResourceFunction["MACCSKeys"] is Listable.
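
As a rough illustration of how the default 166-entry SparseArray relates to a list of 1-indexed on-bit positions, the sketch below builds a key by hand using only built-in functions. The positions in onBits are hypothetical values chosen for the example, not the output of MACCSKeys for any real molecule, and the names onBits and key are introduced here for illustration only.

```wolfram
(* Hypothetical on-bit positions, chosen only for illustration *)
onBits = {65, 113, 164, 165};

(* Build a 166-entry 0/1 SparseArray with 1s at those 1-indexed positions *)
key = SparseArray[Thread[onBits -> 1], {166}];

(* Recover the positions of the non-zero entries *)
Flatten[key["NonzeroPositions"]] === onBits   (* True *)

(* Expand to an ordinary list when a dense vector is needed *)
Length[Normal[key]]   (* 166 *)
```

The same "NonzeroPositions" property should work on a SparseArray returned by the function, assuming it stores only the active bits explicitly.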

Examples

Basic Examples (2) 

MACCSKeys can take either a SMILES string or a Molecule as input. By default, it returns a SparseArray containing the 166 bits:

In[1]:=
ResourceFunction["MACCSKeys"]["Cn1c(=O)c2c(ncn2C)n(C)c1=O"]
Out[1]=
In[2]:=
ResourceFunction["MACCSKeys"][Molecule["caffeine"]]
Out[2]=

MACCSKeys is a Listable function:

In[3]:=
ResourceFunction[
 "MACCSKeys"][{"CCO", "O" , "[H][C@@]1([C@@H](C2=CC=NC3=CC=C(C=C23)OC)O)C[C@@H]4CC[N@]1C[C@@H]4C=C"}]
Out[3]=

Options (5) 

OutputStyle (5) 

Option values include "SparseArray", "OnBits", "MoleculePlot", "Function" and "SMARTS". The default setting of "SparseArray" returns the 166-bit vector:

In[4]:=
ResourceFunction["MACCSKeys"]["caffeine", "OutputStyle" -> "SparseArray"]
Out[4]=

The "OnBits" setting returns a list of the active (non-zero) bits in the MACCS key. These are 1-indexed (as is conventional in the Wolfram Language):

In[5]:=
ResourceFunction["MACCSKeys"]["caffeine", "OutputStyle" -> "OnBits"]
Out[5]=

The "MoleculePlot" setting returns an association whose keys are the active bits and whose values are the MoleculePlots corresponding to the MoleculePattern that was matched for that key. Here we take the first three, for brevity:

In[6]:=
ResourceFunction["MACCSKeys"]["caffeine", "OutputStyle" -> "MoleculePlot"][[;; 3]]
Out[6]=

The "Function" setting returns an association whose values are pure functions responsible for generating each key:

In[7]:=
ResourceFunction["MACCSKeys"]["caffeine", "OutputStyle" -> "Function"][[;; 3]]
Out[7]=

The "SMARTS" setting returns an Association whose values are the SMARTS specifications of the patterns. Not all MACCS keys can be defined as SMARTS patterns (these return a “?”), and some keys require the number of matches to exceed a threshold, so the SMARTS specification alone is not always a complete description of the key:

In[8]:=
ResourceFunction["MACCSKeys"]["caffeine", "OutputStyle" -> "SMARTS"][[;; 3]]
Out[8]=

Applications (2) 

Compare the structural similarity of six common statin drugs using the JaccardDissimilarity of their MACCS keys. One minus the Jaccard dissimilarity equals the Tanimoto similarity:

In[9]:=
statins = <| (*define 6 common statin drugs*)
   "Zocor" -> "CCC(C)(C)C(=O)O[C@H]1C[C@H](C=C2[C@H]1[C@H]([C@H](C=C2)C)CC[C@@H]3C[C@H](CC(=O)O3)O)C",
   "Pravachol" -> "CC[C@H](C)C(=O)O[C@H]1C[C@@H](C=C2[C@H]1[C@H]([C@H](C=C2)C)CC[C@H](C[C@H](CC(=O)[O-])O)O)O",
   "Lipitor" -> "CC(C)C1=C(C(=C(N1CC[C@H](C[C@H](CC(=O)[O-])O)O)C2=CC=C(C=C2)F)C3=CC=CC=C3)C(=O)NC4=CC=CC=C4.[Ca+2]",
   "Lescol" -> "CC(C)N1C2=CC=CC=C2C(=C1/C=C/[C@H](C[C@H](CC(=O)O)O)O)C3=CC=C(C=C3)F",
   "Crestor" -> "CC(C1=NC(=NC(=C1/C=C/[C@@H](O)C[C@@H](O)CC(=O)[O-])C2=CC=C(C=C2)F)N(S(=O)(=O)C)C)C.CC(C1=NC(=NC(=C1/C=C/[C@@H](O)C[C@@H](O)CC(=O)[O-])C2=CC=C(C=C2)F)N(S(=O)(=O)C)C)C.[Ca+2]",
   "Altoprev" -> "CC[C@H](C)C(=O)O[C@H]1C[C@H](C=C2[C@H]1[C@H]([C@H](C=C2)C)CC[C@@H]3C[C@H](CC(=O)O3)O)C"|>;

similarity = 1. - DistanceMatrix[
    Values@
     ResourceFunction["MACCSKeys"]@
      statins, (*use the MACCS keys to calculate the (dis)similarity*)
    DistanceFunction -> JaccardDissimilarity];

TableForm[similarity, (*display*)
 TableHeadings -> {Keys[statins], Keys[statins]}]
Out[527]=
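
To make the relationship between the Jaccard dissimilarity and the Tanimoto similarity concrete, here is a minimal sketch on two short bit vectors. The vectors u and v are toy values, far shorter than a real 166-bit key, and are assumptions made only for this example. The Tanimoto similarity is the number of bits set in both vectors divided by the number of bits set in either.

```wolfram
(* Two toy 0/1 vectors standing in for MACCS keys *)
u = {1, 1, 0, 1, 0, 0};
v = {1, 0, 0, 1, 1, 0};

(* Bits set in both (2) divided by bits set in either (4) *)
tanimoto = Total[u v]/Total[Unitize[u + v]]   (* 1/2 *)

(* The same value via the built-in dissimilarity *)
1 - JaccardDissimilarity[u, v]   (* 1/2 *)
```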

Empirically, less than 3% of pairs of randomly selected molecules have a MACCS Tanimoto similarity above 0.6. Use 0.6 as a threshold to visualize which statins are similar to one another:

In[528]:=
AdjacencyGraph[
 Keys@statins,(*use drug names as vertexes*)
 UnitStep[
  similarity - IdentityMatrix[6] - 0.6 ], (*only draw edges above threshold*)
 VertexLabels -> KeyValueMap[#1 -> Tooltip[#1, Thumbnail@MoleculePlot@Molecule@#2] &,
    statins] (*create mouseover graphics with molecule images*)
 ]
Out[528]=
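
The thresholding step above can be checked on a small matrix. The sketch below uses a hypothetical 3×3 similarity matrix (the values in sim are made up for illustration) to show how subtracting the identity matrix removes self-loops and how UnitStep turns the remaining entries into a 0/1 adjacency matrix.

```wolfram
(* Hypothetical symmetric similarity matrix with 1s on the diagonal *)
sim = {{1., 0.8, 0.3}, {0.8, 1., 0.5}, {0.3, 0.5, 1.}};

(* Diagonal entries become 1 - 1 - 0.6 < 0, so no vertex links to itself;
   off-diagonal entries give 1 only when the similarity reaches 0.6 *)
adj = UnitStep[sim - IdentityMatrix[3] - 0.6]
(* {{0, 1, 0}, {1, 0, 0}, {0, 0, 0}} *)
```

Because UnitStep[0] is 1, a similarity exactly equal to the threshold would also produce an edge, subject to floating-point rounding.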

Neat Examples (7) 

How similar are random PubChem molecules? (7) 

Generate random compound IDs and SMILES strings:

In[529]:=
SeedRandom[1841];
cids = RandomInteger[10^8, 200] + 1; (*generate random CompoundIDs*)

smiles = (*Look up SMILES strings for the CIDS*)
  ServiceExecute["PubChem", "CompoundProperties", {"CompoundID" -> cids, "Property" -> "IsomericSMILES"}][[All, "IsomericSMILES"]] // Normal;

Get their MACCS keys:

In[530]:=
maccs = ResourceFunction["MACCSKeys"][
   smiles]; (*compute MACCS keys using Listable form*)

Define a function that computes the Tanimoto similarity between one key and each key in a list:

In[531]:=
tanimotoSimilarity[u_, v_List] := Map[1. - JaccardDissimilarity[u, #] &, v]

Use the function to compute the Tanimoto similarities of all pairs of the random compounds:

In[532]:=
scores = ParallelMap[ (*compute all the pairwise Tanimoto Similarities*)
     tanimotoSimilarity[maccs[[#]], maccs[[# + 1 ;; -1]]] &,
    Range[199]] // Flatten;
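
The loop above compares each key only with the keys that follow it, so every unordered pair is scored exactly once. As a sanity check on that counting, the sketch below does the same thing for four toy bit vectors using Subsets; the names toy and pairScores and the vector values are assumptions for illustration, not part of the original example.

```wolfram
(* Four toy 0/1 vectors standing in for MACCS keys *)
toy = {{1, 1, 0, 0}, {1, 0, 1, 0}, {0, 1, 1, 1}, {1, 1, 1, 0}};

(* Score every unordered pair exactly once *)
pairScores = 1. - (JaccardDissimilarity @@@ Subsets[toy, {2}]);

(* n vectors give n (n - 1)/2 pairs *)
Length[pairScores] == Binomial[Length[toy], 2]   (* True: 6 pairs *)

(* For 200 keys, the number of pairwise scores is *)
Binomial[200, 2]   (* 19900 *)
```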

Generate a histogram of the similarity scores:

In[533]:=
Histogram[scores, Automatic, #, PlotLabel -> #] & /@ {"PDF", "CDF"} // GraphicsRow
Out[533]=

The mean Tanimoto similarity score for randomly selected molecules is approximately 0.35:

In[534]:=
Mean[scores]
Out[534]=

Only about 3% of pairs of randomly chosen molecules have a Tanimoto similarity score above 0.6, as the 97th percentile of the scores shows:

In[535]:=
Quantile[scores, 0.97]
Out[535]=

Publisher

Joshua Schrier

Version History

  • 1.0.0 – 27 December 2019
