Galactooligosaccharides (GOS) are molecules found in plant-derived foods that are important for the good health of beneficial gut microflora. Humans cannot metabolize them but the bacteria can. Understanding their shape in solution and how they might fit in the bacterial enzyme active site can support the design of new GOS. This example shows how to create a large sample of randomly generated conformations of 4-galactosyllactose (Gal(β1-4)Gal(β1-4)β-Glc), optimize their geometries to minimize the conformational energy, filter out unrealistic conformations, cluster the remaining conformations based on shape, and select the low energy conformation from each cluster for further analysis.
Here is the starting molecule:
In[1]:=
galGalGlc=Molecule
Formula:
C
18
H
32
O
16
Atoms:
66
Bonds:
68
//MoleculePlot
Out[1]=
We can use parallel computation to reduce the waiting time:
The structures need to be optimized to reduce the internal strain from close non-bonded contacts and partially eclipsed bonds. Here is a histogram of the MMFF conformational energy before the optimization:
In[5]:=
#["MMFFEnergy"]&/@conformers//Histogram
Out[5]=
Now, optimize the geometry of each conformation, again using parallel computation:
Many of the structures have non-chair pyranose ring conformations, so next filter out those that have all chair conformations for carrying forward. The helper function
chairQ
returns True if the ring defined by the input list of atoms has the chair shape.
Looking at the central galactose moiety, one can see that there four axial substituents and one equatorial substituent, which is energetically unfavorable even though the ring is in a chair conformation. The first and third pyranose rings each have more equatorial than axial substituents. Flipping the chair to its inverted form would swap the axial and equatorial orientation of the substituents. We can calculate a sign ±1 for each ring and then filter out the structures with the favorable combination of signs. The helper function
chairSign
will return
1
or
-1
, so we just need to know what the signs are for the first conformation and then we can determine the combination of signs for the best conformations.
We can examine the histogram of cluster merge distances to see if there is a natural break for determining the number of clusters:
A grid line at 0.5 was included based on the earlier analysis of the RMS difference and the visual overlap above. It appears to be a useful choice, so we can use that distance as the cluster selection criterion:
Let's see how we did. Here are the structures in the first, and largest, cluster all aligned to each other:
Pretty good! And here is the next cluster:
Also pretty good. Let's look at one of the small clusters (not the singletons):
The clustering has been successful, so take the lowest energy conformation form each one as a representative:
Here are the first six:
A joint alignment can give us an overall sense of diversity of the conformations: