Wolfram Research

Function Repository Resource:

MACCSKeys

Compute the 166-bit MACCS (Molecular ACCess System) key

Contributed by: Joshua Schrier

ResourceFunction["MACCSKeys"][molecule]

returns the MACCS key for the Molecule molecule.

ResourceFunction["MACCSKeys"][smiles]

returns the MACCS key for a molecule specified by the SMILES string smiles.

Details and Options

MACCS keys are used to compute molecular similarity for computational drug design and database matching.
This function follows the RDKit implementation of the 166-bit MACCS keys.
The default output returns the keys as a 1-indexed, 166-entry SparseArray.
MACCSKeys takes an option "OutputStyle", whose possible values are: "SparseArray", "OnBits", "MoleculePlot", "Function" and "SMARTS".
ResourceFunction["MACCSKeys"] is Listable.

Examples

Basic Examples

The function can take either a SMILES string or a Molecule as input. By default it returns a SparseArray containing the 166 bits:

In[1]:=
ResourceFunction["MACCSKeys"]["Cn1c(=O)c2c(ncn2C)n(C)c1=O"]
Out[1]=
In[2]:=
ResourceFunction["MACCSKeys"][Molecule["caffeine"]]
Out[2]=

MACCSKeys is a Listable function:

In[3]:=
ResourceFunction["MACCSKeys"][{"CCO", "O", "[H][C@@]1([C@@H](C2=CC=NC3=CC=C(C=C23)OC)O)C[C@@H]4CC[N@]1C[C@@H]4C=C"}]
Out[3]=

Options

OutputStyle

Option values include "SparseArray", "OnBits", "MoleculePlot", "Function" and "SMARTS". The default setting of "SparseArray" returns the 166-bit vector:

In[4]:=
ResourceFunction["MACCSKeys"]["caffeine", "OutputStyle" -> "SparseArray"]
Out[4]=

The "OnBits" setting returns a list of the active (non-zero) bits in the MACCS key. These are 1-indexed (as is conventional in the Wolfram Language):

In[5]:=
ResourceFunction["MACCSKeys"]["caffeine", "OutputStyle" -> "OnBits"]
Out[5]=
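
The "SparseArray" and "OnBits" styles carry the same information: the on-bit list gives the positions of the nonzero entries of the 166-entry vector. A minimal sketch of converting between the two forms, using a made-up on-bit list (the values in onBits are hypothetical and are not the key of any particular molecule):

```wolfram
onBits = {3, 7, 10}; (* hypothetical 1-indexed on bits, for illustration only *)

(* on bits -> 166-entry sparse vector with a 1 at each listed position *)
key = SparseArray[Thread[List /@ onBits -> 1], {166}];

(* sparse vector -> on bits: the positions of the nonzero entries *)
Flatten[key["NonzeroPositions"]]
```

The last line recovers the list stored in onBits, so either form can be rebuilt from the other.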

The "MoleculePlot" setting returns an association whose keys are the active bits and whose values are the MoleculePlots corresponding to the MoleculePattern that was matched for that key. Here we take the first three, for brevity:

In[6]:=
ResourceFunction["MACCSKeys"]["caffeine", "OutputStyle" -> "MoleculePlot"][[;; 3]]
Out[6]=

The "Function" setting returns an association whose values are pure functions responsible for generating each key:

In[7]:=
ResourceFunction["MACCSKeys"]["caffeine", "OutputStyle" -> "Function"][[;; 3]]
Out[7]=

The "SMARTS" setting returns an association whose values are the SMARTS specifications for the patterns. Not all MACCS keys can be defined as SMARTS patterns (these return “?”), and some MACCS keys require the number of matches to exceed a threshold, so the SMARTS specification alone is not always a complete description of the key:

In[8]:=
ResourceFunction["MACCSKeys"]["caffeine", "OutputStyle" -> "SMARTS"][[;; 3]]
Out[8]=

Applications

Compare the structural similarity of six common statin drugs using the JaccardDissimilarity of the MACCS keys (one minus this is equivalent to the Tanimoto similarity):

In[9]:=
statins = <|
   "Zocor" -> "CCC(C)(C)C(=O)O[C@H]1C[C@H](C=C2[C@H]1[C@H]([C@H](C=C2)C)CC[C@@H]3C[C@H](CC(=O)O3)O)C",
   "Pravachol" -> "CC[C@H](C)C(=O)O[C@H]1C[C@@H](C=C2[C@H]1[C@H]([C@H](C=C2)C)CC[C@H](C[C@H](CC(=O)[O-])O)O)O",
   "Lipitor" -> "CC(C)C1=C(C(=C(N1CC[C@H](C[C@H](CC(=O)[O-])O)O)C2=CC=C(C=C2)F)C3=CC=CC=C3)C(=O)NC4=CC=CC=C4.[Ca+2]",
   "Lescol" -> "CC(C)N1C2=CC=CC=C2C(=C1/C=C/[C@H](C[C@H](CC(=O)O)O)O)C3=CC=C(C=C3)F",
   "Crestor" -> "CC(C1=NC(=NC(=C1/C=C/[C@@H](O)C[C@@H](O)CC(=O)[O-])C2=CC=C(C=C2)F)N(S(=O)(=O)C)C)C.CC(C1=NC(=NC(=C1/C=C/[C@@H](O)C[C@@H](O)CC(=O)[O-])C2=CC=C(C=C2)F)N(S(=O)(=O)C)C)C.[Ca+2]",
   "Altoprev" -> "CC[C@H](C)C(=O)O[C@H]1C[C@H](C=C2[C@H]1[C@H]([C@H](C=C2)C)CC[C@@H]3C[C@H](CC(=O)O3)O)C"|>; (*define 6 common statin drugs*)

similarity = 1. - DistanceMatrix[
    Values@ResourceFunction["MACCSKeys"]@statins, (*use the MACCS keys to calculate the (dis)similarity*)
    DistanceFunction -> JaccardDissimilarity];

TableForm[similarity, (*display*) TableHeadings -> {Keys[statins], Keys[statins]}]
Out[10]=
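
For two bit vectors with on-bit sets A and B, the Tanimoto similarity is the number of shared on bits divided by the number of bits set in either vector, |A ∩ B| / |A ∪ B|. For binary data this equals one minus the Jaccard dissimilarity. A small sketch on two made-up 5-bit vectors (not real MACCS keys) checks that the two routes agree:

```wolfram
a = {1, 0, 1, 1, 0}; (* made-up bit vectors, for illustration only *)
b = {1, 1, 0, 1, 0};

(* route 1: one minus the built-in Jaccard dissimilarity *)
viaJaccard = 1 - JaccardDissimilarity[a, b];

(* route 2: shared on bits divided by all on bits *)
onA = Flatten[Position[a, 1]];
onB = Flatten[Position[b, 1]];
viaSets = Length[Intersection[onA, onB]]/Length[Union[onA, onB]];

{viaJaccard, viaSets}
```

Both entries come out to 1/2, since two of the four distinct on bits (positions 1 and 4) are shared.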

Empirically, less than 3% of randomly selected molecules have a MACCS Tanimoto similarity above 0.6. Use this as a threshold to visualize which molecules are similar to one another:

In[11]:=
AdjacencyGraph[
 Keys@statins, (*use drug names as vertices*)
 UnitStep[similarity - IdentityMatrix[6] - 0.6], (*only draw edges above threshold*)
 VertexLabels -> KeyValueMap[#1 -> Tooltip[#1, Thumbnail@MoleculePlot@Molecule@#2] &, statins] (*create mouseover graphics with molecule images*)
 ]
Out[11]=
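
The thresholding step can be seen in isolation on a small example. This sketch uses a made-up 3×3 similarity matrix and hypothetical vertex names, not the statin data:

```wolfram
(* made-up symmetric similarity matrix with ones on the diagonal *)
sim = {{1., 0.7, 0.2}, {0.7, 1., 0.65}, {0.2, 0.65, 1.}};

(* subtracting the identity pushes the diagonal below the threshold, so no self-loops are drawn;
   UnitStep then maps nonnegative entries to 1 and negative entries to 0 *)
adjacency = UnitStep[sim - IdentityMatrix[3] - 0.6]

(* hypothetical vertex names; edges appear only where adjacency is 1 *)
AdjacencyGraph[{"a", "b", "c"}, adjacency, VertexLabels -> "Name"]
```

Here adjacency evaluates to {{0, 1, 0}, {1, 0, 1}, {0, 1, 0}}: the pairs with similarity 0.7 and 0.65 are connected, and the pair with similarity 0.2 is not.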

Possible Issues

The same caveats as with the RDKit implementation apply here: the isotope flag (key 1) is undefined, and the public MACCS keys have been “reverse engineered”.

Neat Examples
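
The inputs in this section use a list named scores whose definition is not shown. A minimal sketch of how such a list could be built, assuming it holds the pairwise Tanimoto similarities of a set of molecules. The list smilesList is a small hypothetical stand-in, not the random PubChem sample behind the results quoted here, so it will not reproduce those numbers:

```wolfram
(* hypothetical stand-in molecules; the original random PubChem sample is not reproduced here *)
smilesList = {"CCO", "CCN", "c1ccccc1", "CC(=O)O", "CCCCCC"};

(* one 166-entry bit vector per molecule; the function is Listable *)
keys = Normal /@ ResourceFunction["MACCSKeys"][smilesList];

(* Tanimoto similarity for every unordered pair of distinct molecules *)
scores = 1. - (JaccardDissimilarity @@@ Subsets[keys, {2}]);
```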

How similar are random PubChem molecules?

In[12]:=
Histogram[scores, Automatic, #, PlotLabel -> #] & /@ {"PDF", "CDF"} // GraphicsRow
Out[12]=

The mean Tanimoto similarity score for randomly selected molecules is approximately 0.35:

In[13]:=
Mean[scores]
Out[13]=

Only about 3% of randomly chosen molecules will have a Tanimoto similarity score above 0.6:

In[14]:=
Quantile[scores, 0.97]
Out[14]=
