Function Repository Resource:

KEGGGenome

Get information about a KEGG genome

Contributed by: Lina Marcela Ruiz Galvis and Keiko Hirayama

ResourceFunction["KEGGGenome"]["Genomes"]

gives a Dataset with basic information for all the KEGG Genomes.

ResourceFunction["KEGGGenome"][keggcode]

gives a Dataset with all the properties for a specific keggcode.

ResourceFunction["KEGGGenome"][keggcode,prop]

gives the property prop for a specific keggcode.

ResourceFunction["KEGGGenome"][keyword,"Query"]

gives a Dataset of genomes associated with a species-related keyword.

Details

The KEGG GENOME database is a collection of KEGG organisms, that is, organisms with complete genome sequences. Each organism is identified by a three- or four-letter organism code.
Supported keggcode values are those that refer to a KEGG Genome entry. A keggcode can be given as a String or an ExternalIdentifier.
prop supports the following values:
"Entry"                ExternalIdentifier of the genome KEGG code
"OrgCode"              organism KEGG code
"Category"             designation of KEGG Reference genome and Type strain
"Fullname"             the genome full name
"TaxonomicSpecies"     "TaxonomicSpecies" Entity
"Annotation"           whether the genome is annotated or not
"AnnotationLink"       more information about the annotation
"Taxonomy"             taxonomy information taken from the NCBI taxonomy database
"Lineage"              genetic ancestry or evolutionary path
"LinkTaxonomy"         more information about the taxonomy
"LinkGenomeBrowser"    more information about the genome
"DataSource"           links to the data source, usually RefSeq
"OriginalDB"           links to the original database where the sequencing was done
"Keywords"             associated keywords
"Brite"                associated KEGG brite
"BriteLink"            associated KEGG brite link
"Disease"              disease information for pathogen genomes
"Comment"              associated comment
"Chromosome"           chromosome information
"Plasmid"              plasmid information
"Created"              year of creation
"Statistics"           statistics of the complete genome
"Reference"            references reporting the complete genome (or chromosome) with links to PubMed
The "Query" property performs a search operation to retrieve the genome entry identifier associated with a given keyword.
The keyword corresponds to a species name. Examples include "Homo sapiens", "E coli", or "Dermacoccus", among others. Alternatively, a "TaxonomicSpecies" Entity can be used.

Examples

Basic Examples (2) 

Get the list of all the genomes in the KEGG genome database:

In[1]:=
ResourceFunction["KEGGGenome"]["Genomes"]
Out[1]=

Get the information from KEGG about a specific genome:

In[2]:=
ResourceFunction["KEGGGenome"][ExternalIdentifier["KEGGID", "T09341"]]
Out[2]=

Scope (2) 

Get the KEGG genome codes for a specific query:

In[3]:=
ResourceFunction["KEGGGenome"][
 Entity["TaxonomicSpecies", "HomoSapiens::4pydj"], "Query"]
Out[3]=

Keywords may have more than one word:

In[4]:=
ResourceFunction["KEGGGenome"]["E coli", "Query"]
Out[4]=
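
The two keyword forms shown above, a plain string and a "TaxonomicSpecies" Entity, can be run side by side to see how many genomes each returns. This is a minimal sketch: the names kegg, byName and byEntity are illustrative, and it assumes both calls succeed and return a Dataset, as described in the usage section.

```wolfram
(* Shorthand for the resource function; the name kegg is illustrative *)
kegg = ResourceFunction["KEGGGenome"];

(* The same species given as a string keyword and as an Entity *)
byName = kegg["Homo sapiens", "Query"];
byEntity = kegg[Entity["TaxonomicSpecies", "HomoSapiens::4pydj"], "Query"];

(* Count the rows returned by each form of the query *)
Length[Normal[#]] & /@ <|"String" -> byName, "Entity" -> byEntity|>
```

Whether the two forms return the same set of genomes depends on how KEGG matches the keyword, so the counts are not guaranteed to agree.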

Options (1) 

Get a single property of a genome on its own, such as its plasmid information:

In[5]:=
ResourceFunction["KEGGGenome"][
 ExternalIdentifier["KEGGID", "T09341"], "Plasmid"]
Out[5]=
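
To collect several properties in one pass, the two-argument form can be mapped over a list of property names. The following is a minimal sketch rather than a documented usage: the names kegg, code and props are illustrative, and the property names are taken from the table in the Details section.

```wolfram
(* Shorthand for the resource function; the name kegg is illustrative *)
kegg = ResourceFunction["KEGGGenome"];
code = ExternalIdentifier["KEGGID", "T09341"];

(* Property names from the Details table *)
props = {"Fullname", "Lineage", "Created"};

(* Build an association of property name -> value, one request per property *)
AssociationMap[kegg[code, #] &, props]
```

Each property triggers its own call, so when many properties are needed it may be simpler to request the full Dataset with the one-argument form and extract from it.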

Possible Issues (3) 

Arbitrary strings are not accepted as a keggcode:

In[6]:=
ResourceFunction["KEGGGenome"]["anything"]
Out[6]=

Only valid KEGG codes are supported:

In[7]:=
ResourceFunction["KEGGGenome"]["T0001j0"]
Out[7]=

Unrecognized queries give an error:

In[8]:=
ResourceFunction["KEGGGenome"]["oxigenu", "Query"]
Out[8]=
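
When queries come from user input, it can help to check that a Dataset actually came back before using the result. The sketch below relies only on the documented behavior that a successful query gives a Dataset; the helper name safeGenomeQuery and the Missing tag "NoGenomeFound" are hypothetical and not part of the resource function.

```wolfram
(* Hypothetical helper: return the query Dataset, or a Missing[...] placeholder otherwise *)
safeGenomeQuery[keyword_] := Module[{result},
  (* Quiet suppresses any messages issued for unrecognized keywords *)
  result = Quiet[ResourceFunction["KEGGGenome"][keyword, "Query"]];
  If[MatchQ[result, _Dataset], result, Missing["NoGenomeFound", keyword]]
]

safeGenomeQuery["oxigenu"]
```

Testing for the Dataset head avoids depending on the exact form of the error, which this page does not specify.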

Neat Examples (2)

Retrieve the "Number of nucleotides", "Number of protein genes", and "Number of RNA genes" for various species, grouped by phylum:

In[9]:=
entriesANDSp = ResourceFunction["KEGGGenome"]["Genomes"][
   All, {"Entry", "TaxonomicSpecies"}];
txSpecies = Union[Select[DeleteMissing[Normal[entriesANDSp[[All, 2]]]], MatchQ[#, _Entity] &]];
In[10]:=
SeedRandom[1];
sampleSpe = RandomChoice[txSpecies, 100];
phylum = # -> #[EntityProperty["TaxonomicSpecies", "Phylum"]] & /@ sampleSpe;
moreFrePhy = Select[Tally[phylum[[All, 2]]], #[[2]] > 4 &];
In[11]:=
sampleSpe2 = entriesANDSp[
    Select[
     MemberQ[sampleSpe, #TaxonomicSpecies] &&
       MemberQ[moreFrePhy[[All, 1]], ReplaceAll[#TaxonomicSpecies, phylum]]
      &]
    ][All, Join[#, <|"Phylum" -> ReplaceAll[#TaxonomicSpecies, phylum]|>] &];
In[12]:=
rulesEntAndPhy = Normal[sampleSpe2[All, #Entry -> #Phylum &]];
In[13]:=
sampleSpe3 = (# /. rulesEntAndPhy) -> ResourceFunction["KEGGGenome"][#] & /@ Normal[sampleSpe2[[All, 1]]];
sampleSpe3Grp = GroupBy[sampleSpe3, First];

Visualize the results:

In[14]:=
BoxWhiskerChart[
 Map[ToExpression, DeleteMissing[
     Lookup[Normal[#[[All, 2]]], "NumberOfNucleotides"]]] & /@ sampleSpe3Grp,
 ChartLabels -> Automatic, BarOrigin -> Left, FrameLabel -> {"Number of nucleotides"}]
Out[14]=
In[15]:=
BoxWhiskerChart[
 Map[ToExpression, DeleteMissing[
     Lookup[Normal[#[[All, 2]]], "NumberOfProteinGenes"]]] & /@ sampleSpe3Grp,
 ChartLabels -> Automatic, BarOrigin -> Left, FrameLabel -> {"Number of protein genes"}]
Out[15]=
In[16]:=
BoxWhiskerChart[
 Map[ToExpression, DeleteMissing[
     Lookup[Normal[#[[All, 2]]], "NumberOfRNAGenes"]]] & /@ sampleSpe3Grp,
 ChartLabels -> Automatic, BarOrigin -> Left, FrameLabel -> {"Number of RNA genes"}]
Out[16]=

Publisher

Lina Marcela

Requirements

Wolfram Language 13.0 (December 2021) or above

Version History

  • 1.0.0 – 25 April 2025
