Function Repository Resource:

DNAAlignmentPlot

Generate a visualization for DNA sequence alignment

Contributed by: Jessica Shi

ResourceFunction["DNAAlignmentPlot"][str1,str2]

generates a visualization of the alignment of the DNA sequences str1 and str2, using a plot style suited to their length.

Details and Options

The sequences are aligned using global (Needleman-Wunsch) alignment.
By default, if the sequences are shorter than 1000 bases, the function produces a one-to-one alignment plot, in which green marks a match and red marks a mismatch after the sequences are aligned. A hyphen, shown in darker red, takes the place of a deleted base; lighter red marks a substitution. Each base (A, T, G, C) has its own color for easier viewing.
By default, if the sequences are longer than 1000 bases, the function produces a color chain, which describes the alignment in groups of 100 bases for sequences shorter than 10,000 bases and in groups of 1000 bases above 10,000. After the sequences are partitioned into groups, the same alignment method is applied to each pair of groups and the number of common bases is counted.
In the color chain, the scale runs from red (less similarity) to green (more similarity). Each color can also be read as the percentage of bases that match after alignment in that section of the sequences.
ResourceFunction["DNAAlignmentPlot"] accepts the following options:
Method        Automatic    specify whether to use a one-on-one or ColorChain approach
"GroupSize"   Automatic    specify the group size of a grouped plot
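
The global alignment described above can be explored directly with the built-in SequenceAlignment function, which performs a global alignment by default. The following is a minimal sketch, not the resource function's own implementation: the short sequences and the names s1, s2, alignment, matched and similarity are made up for illustration, and dividing by the length of the longer input is an assumed way of expressing similarity as a percentage.

```wolfram
(* Two short illustrative sequences (made up for this sketch) *)
s1 = "ATGCCGTA";
s2 = "ATGCGTTA";

(* Global alignment: matched runs appear as strings,
   differences appear as {partOfS1, partOfS2} pairs *)
alignment = SequenceAlignment[s1, s2];

(* Count the bases that fall inside matched runs *)
matched = Total[StringLength /@ Cases[alignment, _String]];

(* Assumed similarity measure: matched bases as a percentage of the longer sequence *)
similarity = N[100 matched/Max[StringLength[s1], StringLength[s2]]]
```

In the alignment result, a pair with an empty string on one side corresponds to a deletion (drawn as a hyphen in the one-to-one plot), while a pair with two nonempty strings corresponds to a substitution.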

Examples

Basic Examples (3) 

Generate random sequences of ATGC:

In[1]:=
random[n_] := StringJoin@RandomChoice[{"A", "T", "G", "C"}, n]

Create a one-to-one alignment plot for sequences of fewer than 1000 bases:

In[2]:=
ResourceFunction["DNAAlignmentPlot"][random[900], random[900]]
Out[2]=

Create a color chain for sequences of more than 1000 bases:

In[3]:=
ResourceFunction["DNAAlignmentPlot"][random[2000], random[2000]]
Out[3]=

Options (5) 

Method (3) 

Generate random sequences of ATGC:

In[4]:=
random[n_] := StringJoin@RandomChoice[{"A", "T", "G", "C"}, n]

Create a color chain for sequences of fewer than 1000 bases:

In[5]:=
ResourceFunction["DNAAlignmentPlot"][random[900], random[900], Method -> "ColorChain"]
Out[5]=

Create a one-on-one plot for sequences of more than 1000 bases:

In[6]:=
ResourceFunction["DNAAlignmentPlot"][random[1200], random[1200], Method -> "OneOnOne"]
Out[6]=

GroupSize (2) 

Generate random sequences of ATGC:

In[7]:=
random[n_] := StringJoin@RandomChoice[{"A", "T", "G", "C"}, n]

Specify the group size of a grouped plot:

In[8]:=
ResourceFunction["DNAAlignmentPlot"][random[2000], random[2000], "GroupSize" -> 50]
Out[8]=
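
The grouped comparison behind a color chain can be sketched as follows. This is an illustration under stated assumptions rather than the function's actual code: groupSimilarity and groupScores are hypothetical helper names, the score is taken to be the number of matched bases divided by the longer group length, and any trailing groups of the longer sequence are simply ignored.

```wolfram
(* Hypothetical helper: fraction of matching bases after
   global alignment of two groups of bases *)
groupSimilarity[a_String, b_String] :=
 Total[StringLength /@ Cases[SequenceAlignment[a, b], _String]]/
  Max[StringLength[a], StringLength[b]]

(* Hypothetical helper: split both sequences into groups of up to n bases
   and score the corresponding groups pairwise *)
groupScores[s1_String, s2_String, n_Integer?Positive] :=
 Module[{p1, p2, k},
  p1 = StringPartition[s1, UpTo[n]];
  p2 = StringPartition[s2, UpTo[n]];
  (* Assumption: only groups present in both sequences are compared *)
  k = Min[Length[p1], Length[p2]];
  MapThread[groupSimilarity, {Take[p1, k], Take[p2, k]}]]

(* Map each score onto a red-to-green scale, as in the color chain legend *)
Blend[{Red, Green}, #] & /@ groupScores["ATGCATGCAT", "ATGCTTGCAA", 5]
```

A smaller group size gives a finer-grained chain with more colored segments, while a larger one averages similarity over longer stretches of the sequences.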

Applications (2) 

Compare the human insulin gene to that of a chimpanzee:

In[9]:=
ResourceFunction["DNAAlignmentPlot"][
 Entity["Gene", {"INS", {"Species" -> "HomoSapiens"}}][
  "ReferenceSequence"], 
 Entity["Gene", {"INS", {"Species" -> "PanTroglodytes"}}][
  "ReferenceSequence"]]
Out[9]=

Visualize the sequence alignment between the LALBA genes of a cow and a dog:

In[10]:=
ResourceFunction["DNAAlignmentPlot"][
 Entity["Gene", {"LALBA", {"Species" -> "BosTaurus"}}][
  "ReferenceSequence"], Entity["Gene", {"LALBA", {"Species" -> "CanisLupusFamiliaris"}}][
  "ReferenceSequence"]]
Out[10]=

Publisher

Jessica Shi

Version History

  • 1.0.0 – 16 October 2019

Author Notes

Special thanks to Lauren Cooper, my mentor at Wolfram Summer Camp, and Katja DellaLibera for helping me with this function.