Function Repository Resource:

NuIDEncode

Generate the nucleotide universal identifier (nuID) of an oligonucleotide sequence

Contributed by: Jan Mangaldan

ResourceFunction["NuIDEncode"][bioseq]

encodes the "DNA" BioSequence bioseq as a nucleotide universal identifier (nuID) string.

ResourceFunction["NuIDEncode"]["seq"]

encodes the DNA oligonucleotide represented by "seq".

Details

The nucleotide universal IDentifier (nuID) is a lossless compression scheme that encodes a DNA oligonucleotide as a modified Base64 string.
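
One way to see why a 64-symbol alphabet can hold DNA without loss: four letters need two bits each, so three letters fit into one six-bit symbol. The sketch below illustrates only that packing idea. The names packTriplets and unpackTriplets, the A/C/G/T to 0/1/2/3 mapping, the alphabet order and the zero padding are all assumptions made for illustration; it makes no attempt to reproduce any extra characters a real nuID may carry, so its output should not be expected to match NuIDEncode.

```wolfram
(* Illustrative sketch only: the 2-bit code and the alphabet are assumptions *)
baseCode = <|"A" -> 0, "C" -> 1, "G" -> 2, "T" -> 3|>;

(* 64 symbols: A-Z, a-z, 0-9, plus two extra characters chosen for this sketch *)
alphabet = Join[CharacterRange["A", "Z"], CharacterRange["a", "z"],
   CharacterRange["0", "9"], {"_", "."}];
symbolValue = AssociationThread[alphabet -> Range[0, 63]];

(* Pack each run of three letters into one symbol *)
packTriplets[seq_String] := Module[{digits, padded},
  digits = Lookup[baseCode, Characters[seq]];
  (* pad with zeros so the length is a multiple of 3 *)
  padded = PadRight[digits, 3 Ceiling[Length[digits]/3]];
  (* three base-4 digits form one number between 0 and 63 *)
  StringJoin[alphabet[[FromDigits[#, 4] + 1]] & /@ Partition[padded, 3]]
]

(* Reverse the packing; n is the original length, needed to drop the padding *)
unpackTriplets[str_String, n_Integer] := Module[{values, digits},
  values = Lookup[symbolValue, Characters[str]];
  digits = Flatten[IntegerDigits[#, 4, 3] & /@ values];
  StringJoin[Take[digits, n] /. {0 -> "A", 1 -> "C", 2 -> "G", 3 -> "T"}]
]
```

In this sketch the decoder has to be told the original length, because padding zeros are indistinguishable from trailing A letters. A self-describing identifier would need to record that information inside the string itself.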

Examples

Basic Examples (2) 

A DNA BioSequence:

In[1]:=
oligo = BioSequence["DNA", "GCTGATATTTAAAAGAG"]
Out[1]=

Encode it as a nuID:

In[2]:=
ResourceFunction["NuIDEncode"][oligo]
Out[2]=

Scope (1) 

NuIDEncode accepts string arguments:

In[3]:=
ResourceFunction["NuIDEncode"]["TGTATATGTCTGGTTTTCTTACCCC"]
Out[3]=

Properties and Relations (1) 

The resource function NuIDDecode recovers the original sequence from a nuID produced by NuIDEncode:

In[4]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/d77282b0-1d3d-4dad-997f-53f4621d335f"]
Out[4]=
In[5]:=
ResourceFunction["NuIDDecode"][%]
Out[5]=

Possible Issues (2) 

NuIDEncode does not evaluate for BioSequence objects that contain degenerate letters, such as R or Y:

In[6]:=
ResourceFunction["NuIDEncode"][BioSequence["DNA", "GRYT"]]
Out[6]=

Use BioSequenceQ to check if a BioSequence is fully specified:

In[7]:=
BioSequenceQ[BioSequence["DNA", "GRYT"], "FullySpecifiedDNA"]
Out[7]=

NuIDEncode does not evaluate for RNA sequences, whether given as a string or as an "RNA" BioSequence:

In[8]:=
ResourceFunction["NuIDEncode"]["UGA"]
Out[8]=
In[9]:=
ResourceFunction["NuIDEncode"][BioSequence["RNA", "UGA"]]
Out[9]=
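
Since string inputs with degenerate or RNA letters are left unevaluated, it can be convenient to check a string before encoding it. The following is a minimal sketch; plainDNAStringQ and encodeIfPlainDNA are hypothetical helper names, and the check assumes uppercase A, C, G and T are the only letters that should be passed on.

```wolfram
(* Hypothetical helper: True only for nonempty strings made of A, C, G, T *)
plainDNAStringQ[s_String] := StringMatchQ[s, ("A" | "C" | "G" | "T") ..]
plainDNAStringQ[_] := False

(* Encode only when the check passes; otherwise return a Missing marker *)
encodeIfPlainDNA[s_] := If[plainDNAStringQ[s],
  ResourceFunction["NuIDEncode"][s],
  Missing["NotPlainDNA"]
]
```

With this guard, inputs such as "UGA" or "GRYT" give a Missing value instead of an unevaluated expression, which is easier to filter out when mapping over a list of sequences.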

Version History

  • 1.0.0 – 28 June 2021
