Wolfram Language Paclet Repository

Community-contributed installable additions to the Wolfram Language

MoleculeFingerprints

Substructure Screening

Molecule fingerprints enable fast substructure searching: if a particular substructure appears in a molecule, then every bit set in the substructure's fingerprint is also set in the molecule's fingerprint.
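The subset property can be illustrated without any chemistry at all. The following is a minimal sketch using plain integers as toy 8-bit fingerprints; the bit patterns are made up for illustration and the helper name passesScreenQ is hypothetical, not part of the paclet.

```wolfram
(* Toy 8-bit fingerprints written as binary integers; values are invented for illustration *)
queryBits = 2^^00101001;
containsBits = 2^^10111001;  (* has every bit of queryBits set, plus others *)
missingBits = 2^^10110001;   (* lacks one of the bits set in queryBits *)

(* Hypothetical helper: a fingerprint passes the screen when ANDing it
   with the query leaves the query unchanged *)
passesScreenQ[q_Integer, m_Integer] := BitAnd[q, m] == q

passesScreenQ[queryBits, containsBits]  (* True *)
passesScreenQ[queryBits, missingBits]   (* False *)
```

Passing the screen is necessary but not sufficient for containment, so a passing fingerprint marks a candidate rather than a confirmed match.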

Prepare a dataset of fingerprints

First import one hundred thousand molecules randomly selected from the ChEMBL database.
In[22]:=
mols=Import[PacletObject["WolframChemistry/MoleculeFingerprints"]["AssetLocation","SMILES strings for 100K molecules from ChEMBL"]];
It is important to end the expression with a semicolon here, suppressing the output so that the system does not try to format each molecule.
Now precompute the fingerprints for these molecules, using the "BitVector" output format:
In[140]:=
fprints=PatternFingerprint[mols,"BitVector"];

Search the dataset for a query molecule

Now that we have precomputed the pattern fingerprints we can search through them for a query substructure. This works because all bits set in the query fingerprint will also be set in the fingerprint for a molecule containing the query.
In[47]:=
query=Molecule["caffeine"];
MoleculePlot[query]
Out[48]=
[Plot of the caffeine molecule]
Compute the query fingerprint:
In[49]:=
queryFP=PatternFingerprint[query,"BitVector"]
Out[49]=
DataStructure
Type:BitVector
Capacity:2048

Use Pick to find the molecules whose fingerprints contain the query:
In[50]:=
prescreened=Pick[mols,queryFP["Copy"]["BitAnd",#]===queryFP&/@fprints];//AbsoluteTiming
Out[50]=
{0.309261,Null}
Molecules that don't contain the query can still have all of the query's bits set, due to bit collisions, so check the prescreened candidates with a full substructure match:
In[51]:=
CountsBy[prescreened,MoleculeContainsQ[query,IncludeHydrogens->False]]
Out[51]=
<|True->167,False->26|>
The fingerprints let us quickly screen out the majority of the molecules without doing a full substructure search. For comparison, run the full substructure search on every molecule:
In[52]:=
CountsBy[mols,MoleculeContainsQ[query,IncludeHydrogens->False]]//AbsoluteTiming
Out[52]=
{21.0927,<|False->99833,True->167|>}
Comparing the full search time with the prescreen time shows a large speedup (the ratio does not include the time to verify the 193 prescreened candidates):
In[53]:=
%〚1〛/%%%〚1〛
Out[53]=
68.2037
Of course, computing the fingerprints all at once is an expensive operation, and the time savings are fully realized only when searching for many substructures in a large set of molecules.
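A rough break-even estimate makes this trade-off concrete. In the sketch below, breakEvenQueries is a hypothetical helper, and the 60-second precompute time is a made-up placeholder because the fingerprint computation was not timed above. The per-query times come from Out[52] and Out[50], and the prescreen time excludes verification of the candidates.

```wolfram
(* Hypothetical helper: smallest number of queries n such that
   tPrecompute + n*tScreened <= n*tFull *)
breakEvenQueries[tPrecompute_, tFull_, tScreened_] :=
 Ceiling[tPrecompute/(tFull - tScreened)]

(* 60. seconds is an assumed placeholder, not a measured value *)
breakEvenQueries[60., 21.0927, 0.309261]  (* 3 *)
```

Under that assumed precompute time, the fingerprints pay for themselves by the third query; a measured precompute time would change this figure.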

© 2025 Wolfram. All rights reserved.
