Function Repository Resource:

FindNucleicAcidBasePairs

Source Notebook

Find the positions of base pairs in a nucleic acid BioMolecule

Contributed by: Robert Nachbar

ResourceFunction["FindNucleicAcidBasePairs"][biomol,part]

gives the positions of base-pairs in part of BioMolecule biomol.

Details and Options

part is an Association of the form <|"Model"m,"ChainIDs"{c1,c2,}|>, m defaults to 1 and Automatic can be used for chain IDs.
Base-pair positions are given as a list of {{c1,r1,a1},{c2,r2,a2}}, where ci are chain IDs, ri are residue IDs, and ai are nucleotide residue abbreviations.
Only interchain base-pairs are returned when 2 chains are given.
The presence of a base-pair is determined solely by the geometry as given by the ResourceFunction["NucleicAcidBasePairParameters"] and the number of hydrogen bonds.
A hydrogen bond is defined as a distance of 3.3 Å or less between a hydrogen bond donor atom in one base and a hydrogen bond acceptor atom in the other base.
The following options can be given:
"MaxOriginDistance"maximum distance between base reference frame origins
"MaxVerticalSeparation"maximum vertical separation of base (also known as stagger)
"MaxBaseNormalsAngle"maximum angle between base plane normals
"MinHydrogenBondCount"2minimum number of hydrogen bonds between bases

Examples

Basic Examples (3) 

Import the structure for the yeast phenylalanine tRNA:

In[1]:=
biomol = BioMolecule[ExternalIdentifier["PDBStructureID", "1EHZ"]]
Out[1]=

Visualize the molecule:

In[2]:=
BioMoleculePlot3D[biomol, ImageSize -> Small, ColorRules -> "Residues"]
Out[3]=

Find the base-pairs:

In[4]:=
ResourceFunction["FindNucleicAcidBasePairs"][biomol]
Out[4]=

Scope (2) 

Import a duplex RNA:

In[5]:=
biomol = BioMolecule[ExternalIdentifier["PDBStructureID", "4JRD"]]

Base-pairs between chains are found when more than one nucleic acid polymer is present in the BioMolecule:

In[6]:=
BioMoleculePlot3D[biomol, ImageSize -> Small]
Out[7]=

Find the base-pairs:

In[8]:=
ResourceFunction["FindNucleicAcidBasePairs"][biomol]
Out[8]=

Import a short RNA that has four models:

In[9]:=
biomol = BioMolecule[ExternalIdentifier["PDBStructureID", "1ATV"]]
Out[9]=

Base-pairs can be found for different models in a BioMolecule:

In[10]:=
BioMoleculePlot3D[biomol, ImageSize -> Small, ColorRules -> "Residues"]
Out[9]=

Find the base-pairs for all the models:

In[11]:=
ResourceFunction["FindNucleicAcidBasePairs"][biomol, <|"Model" -> All|>]
Out[11]=

Options (12) 

MaxOriginDistance (3) 

Import an RNA biomolecule:

In[12]:=
biomol = BioMolecule[ExternalIdentifier["PDBStructureID", "1ATV"]];

Compute the base-pairs with the default criteria:

In[13]:=
ResourceFunction["FindNucleicAcidBasePairs"][biomol]
Out[13]=

Use a severely constrained "MaxOriginDistance" criterion:

In[14]:=
ResourceFunction["FindNucleicAcidBasePairs"][biomol, "MaxOriginDistance" -> 0.7]
Out[14]=

MaxVerticalSeparation (3) 

Import an RNA biomolecule:

In[15]:=
biomol = BioMolecule[ExternalIdentifier["PDBStructureID", "1ATV"]];

Compute the base-pairs with the default criteria:

In[16]:=
ResourceFunction["FindNucleicAcidBasePairs"][biomol]
Out[16]=

Use a severely constrained "MaxVerticalSeparation" criterion:

In[17]:=
ResourceFunction["FindNucleicAcidBasePairs"][biomol, "MaxVerticalSeparation" -> 0.3]
Out[17]=

MaxBaseNormalsAngle (3) 

Import an RNA biomolecule:

In[18]:=
biomol = BioMolecule[ExternalIdentifier["PDBStructureID", "1ATV"]];

Compute the base-pairs with the default criteria:

In[19]:=
ResourceFunction["FindNucleicAcidBasePairs"][biomol]
Out[19]=

Use a severely constrained "MaxBaseNormalsAngle" criterion:

In[20]:=
ResourceFunction["FindNucleicAcidBasePairs"][biomol, "MaxBaseNormalsAngle" -> 25]
Out[20]=

MinHydrogenBondCount (3) 

Import an RNA biomolecule:

In[21]:=
biomol = BioMolecule[ExternalIdentifier["PDBStructureID", "1ATV"]];

Compute the base-pairs with the default criteria:

In[22]:=
ResourceFunction["FindNucleicAcidBasePairs"][biomol]
Out[22]=

Use a relaxed "MinHydrogenBondCount" criterion:

In[23]:=
ResourceFunction["FindNucleicAcidBasePairs"][biomol, "MinHydrogenBondCount" -> 1]
Out[23]=

Applications (6) 

Nucleic acid base-pairs can be used to construct the sequence bonds of a BioSequence. Import a short RNA strand that forms a simple loop:

In[24]:=
biomol = BioMolecule[ExternalIdentifier["PDBStructureID", "1ATV"]];
BioMoleculePlot3D[biomol, ImageSize -> Small, ColorRules -> "Residues"]
Out[25]=

The sequence bonds are not present in the BioMolecule:

In[26]:=
biomol["BioSequences"]["A"]["SequenceBondList"]
Out[26]=

Find the positions of the base-pairs:

In[27]:=
basePairs = ResourceFunction["FindNucleicAcidBasePairs"][biomol]
Out[27]=

Construct the BioSequence with "Multihydrogen" bonds:

In[28]:=
bioseq = BioSequence[biomol["BioSequences"]["A"]["SequenceType"], biomol["BioSequences"]["A"]["SequenceString"], Bond[#, "MultiHydrogen"] & /@ basePairs[[All, All, 3]]]
Out[28]=

Display the secondary structure:

In[29]:=
BioSequencePlot[bioseq]
Out[29]=

Display the biomolecule with the base-pairs colored:

In[30]:=
BioMoleculePlot3D[biomol, ColorRules -> Append[MapThread[
    Splice@Thread[#1 -> #2] &, {basePairs[[All, All, 2 ;; 3]], {StandardRed, StandardBlue, StandardGreen, StandardCyan, StandardYellow, StandardMagenta}}], _ -> StandardGray]]
Out[30]=

Possible Issues (4) 

Atom labels in biomolecules are generally standardized, but exceptions are known, so the function may occasionally fail to find the atoms of a nucleotide base and therefore be unable to compute the base-pair parameters. When that occurs, the residues are removed from further consideration.

Import the structure of 3B5A, a minimally hinged hairpin ribozyme:

In[31]:=
biomol = BioMolecule[ExternalIdentifier["PDBStructureID", "3B5A"]];
BioMoleculePlot3D[biomol]
Out[32]=

Display the residue abbreviations for the RNA chains:

In[33]:=
biomol["ResidueAbbreviations"][[
 Keys@Select[biomol["ChainTypes"], EqualTo["RNA"]]]]
Out[33]=

Note the nonstandard residue A2m in chain A and S9l in chain B. Attempt to compute the base pairs:

In[34]:=
basePairs = ResourceFunction["FindNucleicAcidBasePairs"][biomol]
Out[34]=

Count the intra- (A-A, B-B, C-C) and inter-strand (A-B, A-C, B-C) base pairs:

In[35]:=
CountsBy[basePairs, #[[All, 2]] &]
Out[35]=

Neat Examples (2) 

Base-pairing in RNA can be quite intricate, and many topologies are known.

Import the structure of an RNA "kissing" hairpin complex of the HIV TAR hairpin loop and its complement:

In[36]:=
biomol = BioMolecule[ExternalIdentifier["PDBStructureID", "1KIS"]];
BioMoleculePlot3D[biomol]
Out[37]=

Compute the base-pairs:

In[38]:=
basePairs = ResourceFunction["FindNucleicAcidBasePairs"][biomol, "MaxVerticalSeparation" -> 2.25]
Out[38]=

Count the intra- (A-A, B-B) and inter-strand (A-B) base pairs:

In[39]:=
CountsBy[basePairs, #[[All, 2]] &]
Out[39]=

Construct the BioSequence:

In[40]:=
bioseq = BioSequence[
  BioSequence["RNA", #] & /@ StringJoin @@@ Values@biomol["ResidueAbbreviations"],
  Bond[#, "MultiHydrogen"] & /@ (basePairs[[All, All, {2, 3}]] /. {"A" -> 1, "B" -> 2})
  ]
Out[40]=

Display the secondary structure:

In[41]:=
BioSequencePlot[bioseq]
Out[41]=

Import an RNA structure with a three way junction (3WJ):

In[42]:=
biomol = BioMolecule[ExternalIdentifier["PDBStructureID", "4KZ2"]];
BioMoleculePlot3D[biomol]
Out[43]=

Compute the base-pairs:

In[44]:=
basePairs = ResourceFunction["FindNucleicAcidBasePairs"][biomol]
Out[44]=

Count the intra- (A-A, B-B, C-C) and inter-strand (A-B, A-C, B-C) base pairs:

In[45]:=
CountsBy[basePairs, #[[All, 2]] &]
Out[45]=

Construct the BioSequence:

In[46]:=
bioseq = BioSequence[
  BioSequence["RNA", #] & /@ StringJoin @@@ Values@KeySelect[biomol["ResidueAbbreviations"], biomol["ChainTypes"][#] == "RNA" &],
  Bond[#, "MultiHydrogen"] & /@ (basePairs[[All, All, {2, 3}]] /. {"A" -> 1, "B" -> 2, "C" -> 3})
  ]
Out[46]=

Display the secondary structure:

In[47]:=
BioSequencePlot[bioseq]
Out[47]=

Publisher

Robert Nachbar

Version History

  • 1.0.0 – 13 May 2026

Source Metadata

Related Resources

License Information