Galactooligosaccharide Conformations

Generate and cluster conformations of galactooligosaccharides for 2D NMR or enzyme active site docking studies

Galactooligosaccharides (GOS) are molecules found in plant-derived foods that are important for the health of beneficial gut microflora. Humans cannot metabolize them, but the bacteria can. Understanding their shape in solution and how they might fit in the active site of a bacterial enzyme can support the design of new GOS. This example shows how to create a large sample of randomly generated conformations of 4-galactosyllactose (Gal(β1-4)Gal(β1-4)β-Glc), optimize their geometries to minimize the conformational energy, filter out unrealistic conformations, cluster the remaining conformations based on shape, and select the lowest-energy conformation from each cluster for further analysis.

Here is the starting molecule:

In[1]:=

galGalGlc=Molecule[…]//MoleculePlot

Out[1]=

We can use parallel computation to reduce the waiting time:

In[2]:=

LaunchKernels[8];

Generate 20,000 random conformations with MoleculeModify, as 200 parallel batches of 100 conformations, each batch with its own random seed:

In[3]:=

conformers=ParallelTable[MoleculeModify[galGalGlc,{"GenerateConformers",100,RandomSeeding->RandomInteger[{1,1000000000}]}],200,Method->"FinestGrained"]//Catenate;//EchoTiming

⌚ 590.008
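The batch arithmetic is 200 × 100 = 20,000 conformers. Giving each batch its own RandomSeeding value matters because a pseudorandom generator restarted from the same seed replays the same sequence, so batches sharing one seed would presumably return identical conformers. A minimal sketch of that reseeding behavior, using plain random numbers instead of molecules (a, b and c are illustrative names):

```wolfram
(* Illustrative only: reseeding with the same value replays the same sequence *)
SeedRandom[1]; a = RandomReal[{0, 1}, 3];
SeedRandom[1]; b = RandomReal[{0, 1}, 3];
(* a different seed starts a different sequence *)
SeedRandom[2]; c = RandomReal[{0, 1}, 3];
{a === b, a === c}  (* {True, False} *)
```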

This is the first conformation:

In[4]:=

First@conformers//MoleculePlot3D

Out[4]=

The structures need to be optimized to reduce the internal strain from close non-bonded contacts and partially eclipsed bonds. Here is a histogram of the MMFF conformational energy before the optimization:

In[5]:=

#["MMFFEnergy"]&/@conformers//Histogram

Out[5]=

Now, optimize the geometry of each conformation, again using parallel computation:

In[6]:=

conformers=ParallelMap[MoleculeModify[#,"EnergyMinimizeAtomCoordinates"]&,conformers];//EchoTiming

⌚ 1182.7

Compute a new histogram of the energies:

In[7]:=

#["MMFFEnergy"]&/@conformers//Histogram

Out[7]=

Many of the structures have non-chair pyranose ring conformations, so next select only the conformations in which all three rings are chairs. The helper function chairQ returns True if the ring defined by the input list of atoms has the chair shape, that is, if all six endocyclic torsion angles have magnitudes between 45° and 75°.

Define the helper function:

In[8]:=

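(* Map the test over a list of rings *)
chairQ[mol_?MoleculeQ,atoms:{{__Integer?Positive}...}]:=chairQ[mol,#]&/@atoms
(* A six-membered ring is a chair if all six endocyclic torsion angles have magnitudes between 45° and 75° *)
chairQ[mol_?MoleculeQ,atoms:{__Integer?Positive}]/;Length[atoms]==6:=AllTrue[QuantityMagnitude@MoleculeValue[mol,{"TorsionAngle",Partition[atoms,4,1,{1,1}]}],45≤Abs[#]≤75&]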
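Because chairQ looks only at the six torsion angles, its criterion can be tried out on plain numbers. The sketch below uses a hypothetical helper, chairTorsionsQ, that applies the same 45° to 75° magnitude window to a list of angles in degrees; the angle values are illustrative, not measurements from these conformers:

```wolfram
(* Hypothetical helper (not part of the original example): the chairQ window applied to six torsion angles in degrees *)
chairTorsionsQ[t : {__?NumericQ}] := Length[t] == 6 && AllTrue[t, 45 <= Abs[#] <= 75 &]

(* illustrative chair: magnitudes near 55 degrees with alternating signs *)
chairTorsionsQ[{56., -54., 53., -55., 57., -58.}]  (* True *)
(* illustrative twist-boat-like ring: several torsions fall outside the window *)
chairTorsionsQ[{30., -65., 32., 29., -63., 31.}]  (* False *)
```

Like chairQ, this test checks only the magnitudes of the torsions, not their alternating signs.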

The lists of atoms for each of the rings can be computed using FindMoleculeSubstructure with a MoleculePattern for the ring:

In[9]:=

ringAtoms=FindMoleculeSubstructure[First@conformers,MoleculePattern["C1CCCCO1"],All]//Values

Out[9]=

{{3,33,31,29,5,4},{12,10,8,7,26,25},{19,17,15,14,22,21}}
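Inside chairQ, Partition[atoms,4,1,{1,1}] treats the ring list as cyclic, so a six-membered ring yields six consecutive atom quadruples, one per endocyclic torsion angle. Applied to the first ring found above (ring and quads are illustrative names):

```wolfram
ring = {3, 33, 31, 29, 5, 4};  (* first ring from the ringAtoms output above *)
(* overlapping windows of 4 with offset 1, wrapping around the end of the list *)
quads = Partition[ring, 4, 1, {1, 1}]
(* {{3,33,31,29},{33,31,29,5},{31,29,5,4},{29,5,4,3},{5,4,3,33},{4,3,33,31}} *)
Length[quads]  (* 6 *)
```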

Now do the filtering:

In[10]:=

(goodConformers=Select[conformers,And@@chairQ[#,ringAtoms]&])//Length

Out[10]=

511

Here is the first conformation from the result:

In[11]:=

First@goodConformers//MoleculePlot3D

Out[11]=

Looking at the central galactose moiety, one can see that there are four axial substituents and one equatorial substituent, which is energetically unfavorable even though the ring is in a chair conformation. The first and third pyranose rings each have more equatorial than axial substituents. Flipping the chair to its inverted form would swap the axial and equatorial orientations of the substituents. We can calculate a sign ±1 for each ring and then select the structures with the favorable combination of signs. The helper function chairSign returns 1 or -1, the sign of the first endocyclic torsion angle, which reverses when the chair flips. So we just need to know the signs for the first conformation, and then we can determine the combination of signs for the best conformations.

Define the helper function:

In[12]:=

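(* Map over a list of rings *)
chairSign[mol_?MoleculeQ,atoms:{{__Integer?Positive}...}]:=chairSign[mol,#]&/@atoms
(* Sign of the first endocyclic torsion angle; it reverses when the chair flips *)
chairSign[mol_?MoleculeQ,atoms:{__Integer?Positive}]/;Length[atoms]==6:=Sign@QuantityMagnitude@MoleculeValue[mol,{"TorsionAngle",Take[atoms,4]}]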
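In a chair, the endocyclic torsion angles alternate in sign, and inverting the chair reverses the sign of every one of them, so a single torsion (chairSign uses the first four ring atoms) is enough to distinguish the two chair forms. A minimal sketch with illustrative torsion values in degrees, where chairB stands in for the inverted form of chairA (both names are hypothetical):

```wolfram
(* illustrative torsions for one chair form; the inverted chair has every sign reversed *)
chairA = {56., -54., 53., -55., 57., -58.};
chairB = -chairA;
(* same quantity as chairSign: the sign of the first torsion angle *)
Sign[First /@ {chairA, chairB}]  (* {1, -1} *)
```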

Here are the signs for the first structure:

In[13]:=

chairSign[First@goodConformers,ringAtoms]

Out[13]=

{1,1,-1}

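So, we want structures with the sign combination {1,-1,-1}, in which only the central galactose ring (the second entry) is flipped relative to the first conformation. Select the conformations whose chairs have these optimal signs: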

In[14]:=

bestConformers=Select[goodConformers,MatchQ[chairSign[#,ringAtoms],{1,-1,-1}]&]//Echo[#,"",Length]&;

238
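The target pattern follows from the first good conformer's signs {1,1,-1} by reversing only the central ring's entry, that is, multiplying elementwise by {1,-1,1}. The selection is then a MatchQ against that fixed list. A sketch on illustrative sign triples standing in for chairSign results (firstSigns, target and signs are hypothetical names):

```wolfram
firstSigns = {1, 1, -1};
(* flip only the central galactose ring *)
target = firstSigns*{1, -1, 1}  (* {1, -1, -1} *)
(* illustrative chairSign results for four conformers *)
signs = {{1, 1, -1}, {1, -1, -1}, {-1, -1, -1}, {1, -1, -1}};
Select[signs, MatchQ[#, target] &] // Length  (* 2 *)
```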

And here is the first molecule in that result:

In[15]:=

First@bestConformers//MoleculePlot3D

Out[15]=

In the set of 238 structures there are likely many with very similar conformations. We can quantify their 3D similarity with MoleculeAlign by passing the property "Error" as the last argument, which returns the RMS difference between the aligned structures.

To reduce the computational overhead, we can use an atom map that includes just the carbon and oxygen atoms:

In[16]:=

atomMap=FindMoleculeSubstructure[First@bestConformers,MoleculePattern["[C,O]"],All]//Values//ReplaceAll[{atom_Integer}->(atom->atom)]

Out[16]=

{11,22,33,44,55,66,77,88,99,1010,1111,1212,1313,1414,1515,1616,1717,1818,1919,2020,2121,2222,2323,2424,2525,2626,2727,2828,2929,3030,3131,3232,3333,3434}

Let's compute the RMS difference of the first two structures:

In[17]:=

MoleculeAlign[bestConformers〚1〛,bestConformers〚2〛,atomMap,"Error"]

Out[17]=

0.759672
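To get a feel for what an RMS difference after alignment measures, here is a minimal sketch of the standard Kabsch superposition on plain coordinate matrices. It is an assumption that the "Error" property of MoleculeAlign corresponds to this kind of least-squares RMS deviation; centered and kabschRMSD are hypothetical helpers, not what MoleculeAlign uses internally. The result is in the same length units as the coordinates (Å for molecular coordinates):

```wolfram
(* Hypothetical helpers, not part of the original example *)
centered[m_?MatrixQ] := With[{c = Mean[m]}, (# - c) & /@ m]

(* RMS deviation between two N x 3 coordinate matrices (same atom order)
   after optimal rigid-body superposition, using the Kabsch algorithm *)
kabschRMSD[p_?MatrixQ, q_?MatrixQ] := Module[{pc, qc, u, w, v, d, r},
  pc = centered[N[p]];
  qc = centered[N[q]];
  {u, w, v} = SingularValueDecomposition[Transpose[pc] . qc];
  (* correct for a reflection so that r is a proper rotation *)
  d = If[Det[v . Transpose[u]] < 0, -1, 1];
  r = v . DiagonalMatrix[{1, 1, d}] . Transpose[u];
  (* rotate each row of pc onto qc, then take the root-mean-square distance *)
  Sqrt[Mean[Total[(pc . Transpose[r] - qc)^2, {2}]]]
 ]

(* illustrative check: a rotated and translated copy has an RMSD of essentially 0 *)
pts = {{0, 0, 0}, {1.5, 0, 0}, {0, 1.2, 0}, {0, 0, 1.4}, {1, 1, 1}};
moved = (RotationMatrix[0.7, {0, 0, 1}] . # + {2, -1, 3}) & /@ pts;
Chop[kabschRMSD[pts, moved]]
```

Because the superposition removes any rigid rotation and translation, only genuine differences in shape, such as a rotated hydroxymethyl group, contribute to the value.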

A priori, we can't say whether or not that is a big or small difference, so let's see how the first structure compares with all the others:

In[18]:=

distances=DistanceMatrix[bestConformers〚{1}〛,bestConformers,DistanceFunction->(MoleculeAlign[#1,#2,atomMap,"Error"]&)]//First;
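DistanceMatrix[u, v] returns one row per element of u, which is why the code wraps the first conformer as bestConformers〚{1}〛 and takes First of the result: that gives a flat vector of distances from the first conformer to all 238. The same shape on illustrative 2D points with EuclideanDistance (ref and pts2D are hypothetical names):

```wolfram
ref = {{0, 0}};                    (* a single reference point, wrapped in a list *)
pts2D = {{0, 0}, {3, 4}, {6, 8}};  (* illustrative points *)
(* one-row distance matrix; First turns it into a flat vector of the distances 0, 5 and 10 *)
First@DistanceMatrix[ref, pts2D, DistanceFunction -> EuclideanDistance]
```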

In[19]:=

Histogram[distances,{0.05}]

Out[19]=

The bar at 0 is, of course, the first structure compared with itself. The other very close ones are:

In[20]:=

Position[distances,x_Real/;0<x<0.2]

Out[20]=

{{42},{86},{200},{224}}
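The same Position query reappears below with different bounds, so it can be convenient to wrap it in a small function that restricts the search to the top level and returns plain indices. distanceBand is a hypothetical helper and the distances are illustrative:

```wolfram
(* Hypothetical helper: indices of top-level entries strictly between lo and hi *)
distanceBand[d_List, lo_?NumericQ, hi_?NumericQ] :=
 Flatten@Position[d, x_Real /; lo < x < hi, {1}]

dists = {0., 0.12, 0.5, 0.76, 0.18, 1.3};  (* illustrative RMS differences *)
distanceBand[dists, 0, 0.2]      (* {2, 5}; the self-distance 0. is excluded *)
distanceBand[dists, 0.49, 0.51]  (* {3} *)
```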

This is how we can view the alignment of the first structure to one of the close ones:

In[21]:=

Show[MoleculePlot3D@bestConformers〚1〛,MoleculePlot3D@MoleculeAlign[bestConformers〚1〛,bestConformers〚42〛,atomMap]]

Out[21]=

Now, let's look at those with an RMS difference of about half an Angstrom:

In[22]:=

Position[distances,x_Real/;0.49<x<0.51]

Out[22]=

{{36},{151},{172},{189},{204}}

In[23]:=

Show[MoleculePlot3D@bestConformers〚1〛,MoleculePlot3D@MoleculeAlign[bestConformers〚1〛,bestConformers〚36〛,atomMap]]

Out[23]=

Now we're starting to see some atoms that do not overlap. Let's try a slightly larger difference:

In[24]:=

Position[distances,x_Real/;0.74<x<0.76]

Out[24]=

{{2},{37},{46},{95},{210},{221}}

In[25]:=

Show[MoleculePlot3D@bestConformers〚1〛,MoleculePlot3D@MoleculeAlign[bestConformers〚1〛,bestConformers〚2〛,atomMap]]

Out[25]=

So, it looks like 0.75 Å is on the large side, because none of the three hydroxymethyl groups overlap. Armed with this insight, we can do the clustering using ClusteringTree with the average linkage method.
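For reference, average linkage (also known as UPGMA) defines the distance between two clusters A and B as the mean of all pairwise distances between their members; here d(a,b) would be the RMS difference between conformers a and b, and at each step the two clusters with the smallest D are merged:

```latex
D(A,B) = \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} d(a,b)
```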

It will be convenient to use the helper function moleculeDistance, which takes the indices of two molecules and is memoized for efficiency, storing each result under both argument orders so that each pair is aligned only once:

In[26]:=

moleculeDistance[m1_Integer,m2_Integer]:=moleculeDistance[m1,m2]=moleculeDistance[m2,m1]=MoleculeAlign[bestConformers〚m1〛,bestConformers〚m2〛,atomMap,"Error"]
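The double assignment stores the value under both (i, j) and (j, i), so a later request in either order is answered from the stored definitions instead of a new alignment. A minimal sketch of the same pattern, with a cheap stand-in distance and a call counter (calls, slowDistance and memoDistance are hypothetical names):

```wolfram
calls = 0;
(* stand-in for the expensive MoleculeAlign call; counts its evaluations *)
slowDistance[i_Integer, j_Integer] := (calls++; Abs[i - j]/10.)
(* same symmetric memoization pattern as moleculeDistance *)
memoDistance[i_Integer, j_Integer] := memoDistance[i, j] = memoDistance[j, i] = slowDistance[i, j]

{memoDistance[1, 3], memoDistance[3, 1], memoDistance[1, 3]}  (* {0.2, 0.2, 0.2} *)
calls  (* 1: the stand-in ran only once *)
```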

And now compute the clustering tree:

We can examine the histogram of cluster merge distances to see if there is a natural break for determining the number of clusters:

A grid line at 0.5 was included based on the earlier analysis of the RMS difference and the visual overlap above. It appears to be a useful choice, so we can use that distance as the cluster selection criterion:
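To see what a 0.5 cutoff on an average-linkage clustering means operationally, the sketch below hand-rolls a deliberately naive agglomerative clustering. It is a swapped-in illustration, not the ClusteringTree computation used in this example: it repeatedly merges the two clusters with the smallest average-linkage distance and stops once that distance exceeds the cutoff. Because average-linkage merge distances never decrease, stopping there corresponds to cutting the tree at the cutoff height. averageLinkage and agglomerate are hypothetical names, and the 1D positions stand in for conformers; with the real data, d could be moleculeDistance and n would be Length[bestConformers]:

```wolfram
(* Hypothetical helpers, not part of the original example *)
(* average linkage: mean of all pairwise distances between two clusters *)
averageLinkage[a_List, b_List, d_] := Mean[Flatten[Outer[d, a, b]]]

(* items are the integers 1..n and d[i, j] is the distance between items i and j *)
agglomerate[n_Integer, d_, cutoff_?NumericQ] :=
 Module[{clusters = List /@ Range[n], pairs, best, dBest},
  While[Length[clusters] > 1,
   pairs = Subsets[Range[Length[clusters]], {2}];
   best = First@MinimalBy[pairs, averageLinkage[clusters[[#[[1]]]], clusters[[#[[2]]]], d] &];
   dBest = averageLinkage[clusters[[best[[1]]]], clusters[[best[[2]]]], d];
   If[dBest > cutoff, Break[]];
   (* replace the two merged clusters with their union *)
   clusters = Append[Delete[clusters, List /@ best], Join @@ clusters[[best]]]
  ];
  SortBy[clusters, -Length[#] &]  (* largest cluster first *)
 ]

(* illustrative 1D "conformers"; distance = absolute difference of positions *)
positions1D = {0., 0.1, 0.2, 5., 5.1, 10.};
agglomerate[Length[positions1D], Abs[positions1D[[#1]] - positions1D[[#2]]] &, 0.5]
```

For these toy positions the cutoff leaves three clusters with members {1,2,3}, {4,5} and the singleton {6}; the order of indices inside a cluster reflects the merge order.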

Let's see how we did. Here are the structures in the first, and largest, cluster all aligned to each other:

Pretty good! And here is the next cluster:

Also pretty good. Let's look at one of the small clusters (not the singletons):

The clustering has been successful, so take the lowest-energy conformation from each cluster as a representative:
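Picking a representative amounts to taking, within each cluster, the member with the minimum energy. A minimal sketch with cluster index lists and illustrative energies (representatives and conformerEnergies are hypothetical names); with the real data, the energies could come from #["MMFFEnergy"]& applied to the conformers, as in the histograms above:

```wolfram
(* Hypothetical helper: index of the lowest-energy member of each cluster *)
representatives[clusters_List, e_List] :=
 Function[cluster, First@MinimalBy[cluster, e[[#]] &]] /@ clusters

conformerEnergies = {5.2, 3.1, 4.0, 1.7, 2.9};  (* illustrative energies, one per conformer *)
representatives[{{1, 2, 3}, {4, 5}}, conformerEnergies]  (* {2, 4} *)
```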

Here are the first six:

A joint alignment gives an overall sense of the diversity of the conformations:

Publisher Information

Contributed by: Robert Nachbar

Wolfram Language Example Repository
