Wolfram Research

Function Repository Resource:

DNAAlignmentPlot


Generate a visualization of a DNA sequence alignment

Contributed by: Jessica Shi

ResourceFunction["DNAAlignmentPlot"][str1,str2]

generates a visualization of the alignment of the DNA sequences str1 and str2, choosing a plot style suited to their length.

Details and Options

The sequences are aligned using global (Needleman-Wunsch) alignment.
By default, if the sequences are shorter than 1000 bases, it produces a one-to-one alignment plot in which green marks a match and red marks a mismatch between the aligned sequences. Darker red marks a deletion, with a hyphen taking the place of the missing base; lighter red marks a substitution. Each base (A, T, G and C) has its own color for easier viewing.
By default, if the sequences are longer than 1000 bases, it produces a color chain, which summarizes the alignment in groups of 100 bases for sequences shorter than 10,000 bases and in groups of 1000 bases for longer ones. After the sequences are partitioned into sections, each pair of corresponding sections is aligned with the same method and the number of matching bases is counted.
The color chain's scale runs from red (less similar) to green (more similar). Each color can also be read as the percentage of bases in that section that match after alignment.
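The alignment and mismatch classification described above can be approximated with the built-in SequenceAlignment, which by default computes a global alignment and returns matching runs as strings and differing runs as pairs {part of str1, part of str2}. The following is a minimal sketch, not the function's actual source: alignmentSummary is a hypothetical helper, and splitting each differing run into substitutions (bases present in both sequences) and gaps (the length difference) is an assumption about how the one-to-one plot classifies mismatches.

```wolfram
(* Hypothetical helper, not part of DNAAlignmentPlot: summarize a global
   alignment as counts of matching, substituted, and gapped positions. *)
alignmentSummary[s1_String, s2_String] :=
 Module[{al = SequenceAlignment[s1, s2], matches, diffs},
  (* matching runs are returned as plain strings at the top level *)
  matches = Total[StringLength /@ Cases[al, _String]];
  (* differing runs are returned as pairs {part of s1, part of s2} *)
  diffs = Cases[al, {a_String, b_String} :> {StringLength[a], StringLength[b]}];
  <|"Matches" -> matches,
    (* assumption: bases present on both sides of a differing run are substitutions *)
    "Substitutions" -> Total[Min /@ diffs],
    (* assumption: the length difference of a differing run counts as gaps *)
    "Gaps" -> Total[Abs[Subtract @@ #] & /@ diffs]|>]
```

For example, alignmentSummary["GATTACA", "GACTACA"] returns an association of the three counts, which is roughly the information the one-to-one plot encodes as green, light red and dark red cells.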
ResourceFunction["DNAAlignmentPlot"] accepts the following options:
Method Automatic specify whether to use a one-to-one ("OneOnOne") or color chain ("ColorChain") plot
"GroupSize" Automatic specify the number of bases in each group of a color chain
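The color chain can be sketched along the same lines: pick a group size, cut both sequences into sections, align corresponding sections, and color each section by the fraction of matching bases. This is a hedged illustration rather than the function's implementation: defaultGroupSize, sectionSimilarity and colorChainSketch are hypothetical names, and measuring similarity against the longer section, pairing sections only up to the shorter sequence, and sending exactly 10,000 bases to the 1000-base grouping are assumptions, not documented behavior.

```wolfram
(* Hypothetical sketch of the color-chain logic; not part of DNAAlignmentPlot. *)

(* default group size: 100 bases below 10,000 bases, 1000 otherwise
   (assumption: the longer of the two sequences decides) *)
defaultGroupSize[s1_String, s2_String] :=
 If[Max[StringLength[s1], StringLength[s2]] < 10000, 100, 1000]

(* fraction of bases that match after globally aligning one pair of sections
   (assumption: measured against the longer section; multiply by 100 for percent) *)
sectionSimilarity[a_String, b_String] :=
 Module[{al = SequenceAlignment[a, b], matches},
  matches = Total[StringLength /@ Cases[al, _String]];
  N[matches/Max[StringLength[a], StringLength[b], 1]]]

(* split both sequences into sections and draw one square per section,
   blended from red (low similarity) to green (high similarity) *)
colorChainSketch[s1_String, s2_String, size_Integer?Positive] :=
 Module[{split, p1, p2, k, scores},
  (* UpTo keeps a shorter final section instead of dropping it *)
  split[s_] := StringJoin /@ Partition[Characters[s], UpTo[size]];
  {p1, p2} = {split[s1], split[s2]};
  (* assumption: only sections present in both sequences are compared *)
  k = Min[Length[p1], Length[p2]];
  scores = MapThread[sectionSimilarity, {Take[p1, k], Take[p2, k]}];
  Graphics[Table[{Blend[{Red, Green}, scores[[i]]], Rectangle[{i - 1, 0}]}, {i, k}]]]

(* usage: two random 2000-base sequences give 20 sections of 100 bases *)
With[{s1 = StringJoin@RandomChoice[{"A", "T", "G", "C"}, 2000],
  s2 = StringJoin@RandomChoice[{"A", "T", "G", "C"}, 2000]},
 colorChainSketch[s1, s2, defaultGroupSize[s1, s2]]]
```

Passing an explicit size such as 50 instead of defaultGroupSize mirrors what the "GroupSize" option is described as controlling.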

Examples

Basic Examples

Define a function that generates a random sequence of n bases drawn from A, T, G and C:

In[1]:=
random[n_] := StringJoin@RandomChoice[{"A", "T", "G", "C"}, n]

Create a one-to-one alignment plot for sequences of fewer than 1000 bases:

In[2]:=
ResourceFunction["DNAAlignmentPlot"][random[900], random[900]]
Out[2]=

Create a color chain for sequences of more than 1000 bases:

In[3]:=
ResourceFunction["DNAAlignmentPlot"][random[2000], random[2000]]
Out[3]=

Options

Method

Create a color chain for sequences of fewer than 1000 bases:

In[4]:=
ResourceFunction["DNAAlignmentPlot"][random[900], random[900], Method -> "ColorChain"]
Out[4]=

Create a one-to-one plot for sequences of more than 1000 bases:

In[5]:=
ResourceFunction["DNAAlignmentPlot"][random[1200], random[1200], Method -> "OneOnOne"]
Out[5]=

GroupSize

Specify the group size of a color chain:

In[6]:=
ResourceFunction["DNAAlignmentPlot"][random[2000], random[2000], "GroupSize" -> 50]
Out[6]=

Applications

Compare the human insulin gene with that of the chimpanzee:

In[7]:=
ResourceFunction["DNAAlignmentPlot"][
 Entity["Gene", {"INS", {"Species" -> "HomoSapiens"}}][
  "ReferenceSequence"], Entity["Gene", {"INS", {"Species" -> "PanTroglodytes"}}][
  "ReferenceSequence"]]
Out[7]=

Visualize the sequence alignment between the LALBA genes of a cow and a dog:

In[8]:=
ResourceFunction["DNAAlignmentPlot"][
 Entity["Gene", {"LALBA", {"Species" -> "BosTaurus"}}][
  "ReferenceSequence"], Entity["Gene", {"LALBA", {"Species" -> "CanisLupusFamiliaris"}}][
  "ReferenceSequence"]]
Out[8]=


Author Notes

Special thanks to Lauren Cooper, my mentor at Wolfram Summer Camp, and Katja DellaLibera for helping me with this function.

License Information