Function Repository Resource:

RemoveDegenerateSequenceDifferences

Remove degenerate matches from differences between biomolecular sequences

Contributed by: John Cassel, Wolfram|Alpha Scientific Content

ResourceFunction["RemoveDegenerateSequenceDifferences"][diff,type]

removes from the positional differences diff every letter pair that is a degenerate match for a biosequence of type type.

Details

A degenerate letter is one that can stand for any of several specific letters, each of which represents a particular chemical unit. In DNA, for example, "Y" stands for either "C" or "T". See the BioSequence documentation for further details.
The positional differences reduced by this function can be generated with the AlignmentToPositionDifferences resource function.
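
To make the notion of a degenerate match concrete, here is a minimal sketch that assumes the standard IUPAC nucleotide ambiguity codes for DNA. The names iupacDNA and degenerateMatchQ are hypothetical and introduced only for illustration; this is not the resource function's actual implementation.

```wolfram
(* Assumption: standard IUPAC nucleotide ambiguity codes for DNA. *)
(* iupacDNA and degenerateMatchQ are hypothetical names for illustration. *)
iupacDNA = <|
   "R" -> {"A", "G"}, "Y" -> {"C", "T"}, "S" -> {"C", "G"},
   "W" -> {"A", "T"}, "K" -> {"G", "T"}, "M" -> {"A", "C"},
   "B" -> {"C", "G", "T"}, "D" -> {"A", "G", "T"},
   "H" -> {"A", "C", "T"}, "V" -> {"A", "C", "G"},
   "N" -> {"A", "C", "G", "T"}|>;

(* A specific letter stands only for itself, hence the default {alt}. *)
degenerateMatchQ[ref_String, alt_String] :=
  MemberQ[Lookup[iupacDNA, alt, {alt}], ref]

degenerateMatchQ["C", "Y"]  (* True: "Y" can stand for "C" *)
degenerateMatchQ["G", "T"]  (* False: "T" is a specific letter *)
```

Under this reading, a difference such as "C" -> "Y" carries no definite disagreement with the reference, while "G" -> "T" does.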

Examples

Basic Examples (1) 

Reduce a set of sequence differences to non-degenerate matches:

In[1]:=
ResourceFunction[
 "RemoveDegenerateSequenceDifferences"][{{241, "C" -> "Y"}, {6312, "C" -> "MA"}, {11083, "G" -> "K"}, {13123, "G" -> "T"}, {28311, "C" -> ""}}, "DNA"]
Out[1]=

Scope (1) 

Reducing degenerate differences can split an existing difference:

In[2]:=
ResourceFunction[
 "RemoveDegenerateSequenceDifferences"][{{241, "CGAA" -> "YTRG"}}, "DNA"]
Out[2]=
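
Why a single difference can split may be easier to see in a letter-by-letter sketch. The code below assumes the IUPAC DNA ambiguity codes, assumes that the position in a difference refers to its first letter, and handles only differences whose two sides have equal length. The names iupacDNA, degenerateMatchQ and splitDifference are hypothetical, and the real resource function may group or treat its results differently.

```wolfram
(* Assumption: standard IUPAC nucleotide ambiguity codes for DNA. *)
iupacDNA = <|
   "R" -> {"A", "G"}, "Y" -> {"C", "T"}, "S" -> {"C", "G"},
   "W" -> {"A", "T"}, "K" -> {"G", "T"}, "M" -> {"A", "C"},
   "B" -> {"C", "G", "T"}, "D" -> {"A", "G", "T"},
   "H" -> {"A", "C", "T"}, "V" -> {"A", "C", "G"},
   "N" -> {"A", "C", "G", "T"}|>;

(* Hypothetical helper: True if alt can stand for ref. *)
degenerateMatchQ[ref_String, alt_String] :=
  MemberQ[Lookup[iupacDNA, alt, {alt}], ref]

(* Hypothetical helper: compare letter by letter and keep only the
   positions where the letters definitely disagree.
   Assumption: pos is the position of the first letter. *)
splitDifference[{pos_Integer, ref_String -> alt_String}] /;
   StringLength[ref] == StringLength[alt] :=
  Module[{triples},
   triples = Transpose[{
      Range[pos, pos + StringLength[ref] - 1],
      Characters[ref],
      Characters[alt]}];
   Cases[triples,
    {p_, r_, a_} /; ! degenerateMatchQ[r, a] :> {p, r -> a}]]

splitDifference[{241, "CGAA" -> "YTRG"}]
(* this sketch gives {{242, "G" -> "T"}, {244, "A" -> "G"}} *)
```

In this sketch the pairs "C"/"Y" and "A"/"R" are dropped as degenerate matches, so the remaining mismatches are no longer contiguous and appear as two separate differences.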

Neat Examples (2) 

Some SARS-CoV-2 sequences have a number of degenerate differences with the reference sequence:

In[3]:=
diffs = ResourceFunction["AlignmentToPositionDifferences"][
  ResourceFunction["AlignNearlyIdenticalSequences"][
   ResourceData["Genetic Sequences for the SARS-CoV-2 Coronavirus", "ReferenceBioSequence"],
   BioSequence["DNA", ResourceFunction["ImportFASTA"]["MW849897"][[2, 1]]]
   ]
  ]
Out[3]=

Many of these differences involve degenerate letters that are compatible with the reference letter, so the list can be simplified:

In[4]:=
ResourceFunction["RemoveDegenerateSequenceDifferences"][diffs, "DNA"]
Out[4]=

Version History

  • 1.0.0 – 13 April 2021
