Function Repository Resource:

NCBITranslationTableConvert

Convert between codon associations and NCBI-style translation tables

Contributed by: John Cassel

ResourceFunction["NCBITranslationTableConvert"][ncbiTranslationTable]

converts an NCBI-style translation table to an Association from codon to amino acid.

ResourceFunction["NCBITranslationTableConvert"][codonAssociation]

converts an Association from codon to amino acid to an NCBI-style translation table String.

Details and Options

The National Center for Biotechnology Information (NCBI) syntax for a translation table is a string of sixty-four amino acid and stop characters. Each position in the string corresponds to a codon, a three-letter sequence over the four DNA bases, so these sixty-four characters define every codon translation (4³=64). Positions are enumerated with the bases in the order T, C, A, G and the third base varying fastest, so that TTT is at the first position, TTC is at the second position and so forth.
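The enumeration can be reproduced with built-in functions. The following is a minimal sketch that does not use the resource function; the names `bases` and `codons` are chosen here for illustration only.

```wolfram
(* Illustrative names, not part of the resource function *)
bases = {"T", "C", "A", "G"};

(* Tuples varies the last slot fastest, so the third base cycles first *)
codons = StringJoin /@ Tuples[bases, 3];

Length[codons]                        (* 64 *)
Take[codons, 4]                       (* {"TTT", "TTC", "TTA", "TTG"} *)
First[FirstPosition[codons, "ATG"]]   (* 36 *)
```

The position of a codon in `codons` is the position of its translation in the NCBI string.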

For example, consider "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG". Aligning this string with the enumeration of codons and reading down each column shows that "F" occupies the first and second positions and is therefore the translation of "TTT" and "TTC", while "I" lines up with "ATT", "ATC" and "ATA":

FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG

TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG (1st letter)

TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG (2nd letter)

TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG (3rd letter)
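The alignment above amounts to pairing the i-th codon with the i-th character of the string. A sketch of both directions using only built-in functions follows; it illustrates the format rather than the resource function's implementation, and `codons`, `ncbi` and `table` are names assumed here.

```wolfram
(* Codons in T, C, A, G order, third base varying fastest *)
codons = StringJoin /@ Tuples[{"T", "C", "A", "G"}, 3];
ncbi = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

(* String -> Association: pair each codon with the character at its position *)
table = AssociationThread[codons, Characters[ncbi]];
Lookup[table, {"TTT", "TTC", "ATT", "ATC", "ATA", "ATG"}]
(* {"F", "F", "I", "I", "I", "M"} *)

(* Association -> String: look the codons up in enumeration order *)
StringJoin[Lookup[table, codons]] === ncbi
(* True *)
```

Looking values up in the order of `codons` makes the reverse conversion independent of the key order inside the Association.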

In the NCBI syntax, the character "*" marks a stop codon. Under the IUPAC convention, a period is used as the stop character instead.
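Switching between the two stop characters is a character-level substitution. The sketch below uses only built-in functions; the name `iupacStyle` is illustrative, and it makes no claim about which stop character the resource function itself accepts or emits.

```wolfram
ncbi = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

(* Replace each "*" with "." one character at a time *)
iupacStyle = StringJoin[Characters[ncbi] /. "*" -> "."]
(* "FFLLSSSSYY..CC.WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG" *)

(* This table has three stop positions: TAA, TAG and TGA *)
Count[Characters[ncbi], "*"]
(* 3 *)
```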

Examples

Basic Examples (2)

Convert an NCBI translation table to a codon Association:

In[1]:=

Out[1]=

Conversely, convert a codon Association to NCBI translation table syntax:

In[2]:=

ResourceFunction[
"NCBITranslationTableConvert"][<|"TTT" -> "F", "TTC" -> "F", "TTA" -> "L", "TTG" -> "L", "TCT" -> "S", "TCC" -> "S", "TCA" -> "S", "TCG" -> "S", "TAT" -> "Y", "TAC" -> "Y", "TAA" -> "*", "TAG" -> "*", "TGT" -> "C", "TGC" -> "C", "TGA" -> "*", "TGG" -> "W", "CTT" -> "L", "CTC" -> "L", "CTA" -> "L", "CTG" -> "L", "CCT" -> "P", "CCC" -> "P", "CCA" -> "P", "CCG" -> "P", "CAT" -> "H", "CAC" -> "H", "CAA" -> "Q", "CAG" -> "Q", "CGT" -> "R", "CGC" -> "R", "CGA" -> "R", "CGG" -> "R", "ATT" -> "I", "ATC" -> "I", "ATA" -> "I", "ATG" -> "M", "ACT" -> "T", "ACC" -> "T", "ACA" -> "T", "ACG" -> "T", "AAT" -> "N", "AAC" -> "N", "AAA" -> "K", "AAG" -> "K", "AGT" -> "S", "AGC" -> "S", "AGA" -> "R", "AGG" -> "R", "GTT" -> "V", "GTC" -> "V", "GTA" -> "V", "GTG" -> "V", "GCT" -> "A", "GCC" -> "A", "GCA" -> "A", "GCG" -> "A", "GAT" -> "D", "GAC" -> "D", "GAA" -> "E", "GAG" -> "E", "GGT" -> "G", "GGC" -> "G", "GGA" -> "G", "GGG" -> "G"|>]

Out[2]=

Applications (2)

If you have an NCBI-style translation table and would like to incorporate an environment-specific translation rule, such as the translation to selenocysteine (U), converting to an Association makes this change easy:

In[3]:=
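A minimal sketch of such an edit, assuming the rule of interest is reading the codon TGA as selenocysteine instead of stop. To stay self-contained it builds the Association with built-in functions; the names are illustrative, and an Association returned by the resource function could be used in place of `table`.

```wolfram
codons = StringJoin /@ Tuples[{"T", "C", "A", "G"}, 3];
ncbi = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
table = AssociationThread[codons, Characters[ncbi]];

(* Assumption: in this environment TGA is read as selenocysteine (U), not stop *)
table["TGA"] = "U";

(* Back to the 64-character form; position 15 (TGA) is now "U" *)
StringJoin[Lookup[table, codons]]
(* "FFLLSSSSYY**CCUWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG" *)
```

Editing one key of the Association avoids counting character positions in the string by hand.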