Function Repository Resource:

KEGGReference

Get cross-references between KEGG entries, databases and outside databases

Contributed by: Lina Marcela Ruiz Galvis and Keiko Hirayama

ResourceFunction["KEGGReference"]["Organisms"]

gives a dataset with basic information for all KEGG organisms.

ResourceFunction["KEGGReference"]["Databases"]

gives a dataset with basic information for all KEGG databases.

ResourceFunction["KEGGReference"][entry,prop]

gives property prop for a specific entry.

ResourceFunction["KEGGReference"][keyword,"Query"]

searches for entry identifiers matching keyword.

Details and Options

ResourceFunction["KEGGReference"] is based on the KEGG API endpoints for cross-referencing and querying entries.
The supported values for entry are keywords corresponding to KEGG entries, KEGG codes for entries and databases, and names of entries or databases from external sources linked to KEGG: NCBIGeneID, NCBIProteinID, UniProt, PubChem, ChEBI, PubMed, ATC, JTC, NDC, YK. An entry can be a String, an ExternalIdentifier expression, or a List of up to ten of them.
The KEGG databases include pathway, brite, module, ko, genes, vg, vp, ag, genome (T numbers), compound, glycan, reaction, rclass, enzyme, network, variant, disease, drug and dgroup (drug groups).
The argument prop supports the following values:
"ExternalCrossReference"    list of outside-database identifiers associated with a KEGG identifier, and vice versa
"KEGGCrossReference"    list of identifiers from KEGG databases (and some outside databases) associated with a KEGG identifier
"Query"    list of entry identifiers matching a keyword
KEGGReference accepts the option "Database". Its value can be a String or a List of strings, and the allowed values depend on prop:
"ExternalCrossReference"    the KEGG databases genes, compound, glycan and drug, and the outside databases NCBIGeneID, NCBIProteinID, UniProt, PubChem and ChEBI
"KEGGCrossReference"    all KEGG databases and the outside databases PubMed, ATC, JTC, NDC and YK
"Query"    all KEGG databases

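For orientation, the three properties resemble the conv, link and find operations of the public KEGG REST API. The sketch below only assembles the request URLs such operations would use and makes no network call. The helper name keggURL and the pairing of each property with an operation are assumptions made for illustration; the documentation does not state which endpoints the function calls internally.

```wolfram
(* Hypothetical helper (not part of KEGGReference): join path segments into a KEGG REST URL. *)
keggURL[segments__String] := StringRiffle[{"https://rest.kegg.jp", segments}, "/"]

(* Assumed counterpart of "ExternalCrossReference": the conv operation. *)
keggURL["conv", "ncbi-geneid", "hsa:10458"]

(* Assumed counterpart of "KEGGCrossReference": the link operation. *)
keggURL["link", "pathway", "D00252"]

(* Assumed counterpart of "Query": the find operation. *)
keggURL["find", "drug", "acetaminophen"]
```

Each call returns a plain string, so the result can be inspected or passed to URLRead when a raw response is needed for comparison.
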
Examples

Basic Examples (12) 

Get the list of all the KEGG databases and additional information:

In[1]:=
ResourceFunction["KEGGReference"]["Databases"]
Out[1]=

Get the list of all the organisms in KEGG with their KEGG code and genome entry:

In[2]:=
ResourceFunction["KEGGReference"]["Organisms"]
Out[2]=

Retrieve, for each human gene in KEGG, the associated entry identifiers (accession numbers) from external databases (i.e., NCBIGeneID, NCBIProteinID, and UniProt):

In[3]:=
ResourceFunction["KEGGReference"]["hsa", "ExternalCrossReference"]
Out[3]=

Retrieve, for a specific human gene in KEGG, the associated entry identifiers (accession numbers) from external databases (i.e., NCBIGeneID, NCBIProteinID, UniProt):

In[4]:=
ResourceFunction["KEGGReference"][
 ExternalIdentifier["KEGGID", "hsa:10458"], "ExternalCrossReference"]
Out[4]=

Retrieve, for a specific identifier from an outside database (i.e., NCBIGeneID, NCBIProteinID, UniProt), the associated KEGG entry identifier:

In[5]:=
ResourceFunction["KEGGReference"][
 ExternalIdentifier["NCBILocusTag", "948364"], "ExternalCrossReference"]
Out[5]=

Retrieve, for a specific KEGG chemical substance identifier (i.e., Compound, Glycan, or Drug), the associated entry identifiers (accession numbers) from external databases (i.e., PubChem, ChEBI):

In[6]:=
ResourceFunction["KEGGReference"][
 ExternalIdentifier["KEGGID", "D00001"], "ExternalCrossReference"]
Out[6]=

Retrieve, for a specific identifier from an outside database (i.e., PubChem, ChEBI), the associated KEGG chemical substance identifiers (i.e., Compound, Glycan, Drug):

In[7]:=
ResourceFunction["KEGGReference"][
 ExternalIdentifier["PubChemSubstanceID", "7847069"], "ExternalCrossReference"]
Out[7]=

Get cross-references between KEGG DGroup database and all KEGG databases and some outside databases (i.e., PubMed, ATC, JTC, NDC, YK):

In[8]:=
ResourceFunction["KEGGReference"]["dgroup", "KEGGCrossReference"]
Out[8]=

Get cross-references between a specific KEGG entry and all KEGG databases including some outside databases (i.e., PubMed, ATC, JTC, NDC, YK):

In[9]:=
ResourceFunction["KEGGReference"]["D00001", "KEGGCrossReference"]
Out[9]=

Find various relationships or cross-references between an external database (i.e., PubMed, ATC, JTC, NDC, YK) and KEGG databases:

In[10]:=
ResourceFunction["KEGGReference"]["pubmed", "KEGGCrossReference"]
Out[10]=

Get cross-references between an outside database entry (i.e., PubMed, ATC, JTC, NDC, YK) and all KEGG databases:

In[11]:=
ResourceFunction["KEGGReference"][
 ExternalIdentifier["ATCCode", "A01AA01"], "KEGGCrossReference"]
Out[11]=

Find all the entries matching the query keyword "tp53":

In[12]:=
ResourceFunction["KEGGReference"]["tp53", "Query"]
Out[12]=

Scope (2) 

Retrieve, for a list of human genes in KEGG, the associated entry identifiers (accession numbers) from external databases (i.e., NCBIGeneID, NCBIProteinID, UniProt):

In[13]:=
ResourceFunction[
 "KEGGReference"][{"hsa:10458", "hsa:1"}, "ExternalCrossReference"]
Out[13]=

For a list of selected KEGG pathway identifiers, get cross-references within all KEGG databases and some outside databases (i.e., PubMed, ATC, JTC, NDC, YK):

In[14]:=
ResourceFunction[
 "KEGGReference"][{"hsa05211", "hsa05214"}, "KEGGCrossReference"]
Out[14]=
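
Because a list of entries may hold at most ten elements, longer lists have to be split before they are passed to the function. A minimal sketch of that splitting follows; batchMap is a hypothetical helper name, and the final comment shows the intended use with a placeholder list ids that is not defined here.

```wolfram
(* Hypothetical helper: apply f to consecutive chunks of at most n elements. *)
batchMap[f_, list_List, n_Integer : 10] := f /@ Partition[list, UpTo[n]]

(* Offline check of the chunking: 25 items are split into chunks of 10, 10 and 5. *)
batchMap[Length, Range[25]]

(* Intended use (requires network access); ids is a placeholder for your own list of entries:
   batchMap[ResourceFunction["KEGGReference"][#, "ExternalCrossReference"] &, ids] *)
```

The result is a list with one element per chunk. How to merge the per-chunk results depends on the output form of the property requested, so that step is left to the caller.
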

Options (11) 

Retrieve, for each gene in the KEGG human genome (i.e., T01001), the associated entry identifiers (accession numbers) from a specific external database:

In[15]:=
ResourceFunction["KEGGReference"][
 ExternalIdentifier["KEGGID", "T01001"], "ExternalCrossReference", "Database" -> "NCBIGeneID"]
Out[15]=

Retrieve, for all the entries in a chemical substance KEGG database (i.e., Compound, Glycan, Drug), the associated entry identifiers (accession numbers) from a specific external database:

In[16]:=
ResourceFunction["KEGGReference"]["Glycan", "ExternalCrossReference", "Database" -> "PubChem"]
Out[16]=

Retrieve, for a specific human gene in KEGG, the associated entry identifiers (accession numbers) from a specific external database:

In[17]:=
ResourceFunction["KEGGReference"][
 ExternalIdentifier["KEGGID", "hsa:10458"], "ExternalCrossReference", "Database" -> "NCBIProteinID"]
Out[17]=

Retrieve, for a specific identifier from an outside database (i.e., NCBIGeneID, NCBIProteinID, UniProt), the associated KEGG gene entry:

In[18]:=
ResourceFunction["KEGGReference"][
 ExternalIdentifier["EntrezGeneID", "948364"], "ExternalCrossReference", "Database" -> "genes"]
Out[18]=

Retrieve, for a specific KEGG chemical substance identifier (i.e., Compound, Glycan, Drug), the associated entry identifiers from the PubChem outside database:

In[19]:=
ResourceFunction[
 "KEGGReference"]["dr:D00001", "ExternalCrossReference", "Database" -> "PubChem"]
Out[19]=

Get the cross-references between the KEGG genome database and the KEGG disease database:

In[20]:=
ResourceFunction["KEGGReference"]["genome", "KEGGCrossReference", "Database" -> "disease"]
Out[20]=

For a selected entry, get cross-references with the KEGG brite database:

In[21]:=
ResourceFunction["KEGGReference"][
 ExternalIdentifier["KEGGID", "D10520"], "KEGGCrossReference", "Database" -> "brite"]
Out[21]=

Get cross-references between an outside database entry (i.e., PubMed, ATC, JTC, NDC, YK) and the drug KEGG database:

In[22]:=
ResourceFunction["KEGGReference"]["yk:7131001X1", "KEGGCrossReference",
  "Database" -> "drug"]
Out[22]=

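Get cross-references between the KEGG KO database and all KEGG databases when no "Database" value is given:

In[23]:=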
ResourceFunction["KEGGReference"]["KO", "KEGGCrossReference"]
Out[23]=

Find all the KEGG Drug entries matching the query keyword acetaminophen:

In[24]:=
ResourceFunction["KEGGReference"]["acetaminophen", "Query", "Database" -> "Drug"]
Out[24]=

Find all the KEGG Network and Pathway entries matching the query keyword RNA:

In[25]:=
ResourceFunction["KEGGReference"]["RNA", "Query", "Database" -> {"network", "pathway"}]
Out[25]=

Applications (3) 

Get the compounds in the KEGG Pathway map00020 using KEGGReference:

In[26]:=
crossRefe = ResourceFunction["KEGGReference"]["map00020", "KEGGCrossReference", "Database" -> {"compound", "ko"}];
cpdCrossRef = Normal[crossRefe[Select[StringContainsQ[#Target[[2]], "cpd"] &]][All,
     2]][[All, 2]]
Out[27]=

Get the compounds in the KEGG Pathway map00020 using KEGGPathway:

In[28]:=
entriesTCA = Normal[ResourceFunction["KEGGPathway"]["hsa", "00020", "Entries"][All, "EntryName"]];
cpdInPath = Select[Flatten[entriesTCA], StringContainsQ[#, "cpd"] &]
Out[29]=

Compare both lists to check if they contain the same compounds:

In[30]:=
Complement[cpdCrossRef, cpdInPath]
Out[30]=
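
Complement[a, b] lists only the elements of a that are absent from b, so an empty result does not rule out extra elements in b. The sketch below adds a symmetric check; sameElementsQ is a hypothetical helper name, and the compound identifiers are toy values chosen for illustration, not output of the calls above.

```wolfram
(* Hypothetical helper: True when both lists hold the same elements, ignoring order and duplicates. *)
sameElementsQ[a_List, b_List] := Union[a] === Union[b]

(* Complement alone misses elements that occur only in its second argument. *)
Complement[{"cpd:C00022"}, {"cpd:C00022", "cpd:C00024"}]     (* {} *)
sameElementsQ[{"cpd:C00022"}, {"cpd:C00022", "cpd:C00024"}]  (* False *)
```

Applied to the lists computed above, the check would read sameElementsQ[cpdCrossRef, cpdInPath].
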

Possible Issues (1) 

The function returns Missing when the entry is not valid:

In[31]:=
ResourceFunction["KEGGReference"]["D01", "ExternalCrossReference"]
Out[31]=
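
When several entries are processed in a pipeline, it can help to replace such results with a default before further steps. This is a minimal sketch, assuming the failed call returns an expression with head Missing at the top level; the exact form of that expression is not stated here, and withDefault is a hypothetical helper name.

```wolfram
(* Hypothetical helper: substitute a default when a lookup result is Missing[...]. *)
withDefault[result_, default_] := If[MissingQ[result], default, result]

withDefault[Missing["NotAvailable"], {}]   (* {} *)
withDefault[{"cpd:C00022"}, {}]            (* {"cpd:C00022"} *)
```

A call such as withDefault[ResourceFunction["KEGGReference"]["D01", "ExternalCrossReference"], {}] would then yield an empty list for the invalid entry, under the assumption above.
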

Neat Examples (2) 

Get all the KEGG pathways associated with the D00252 drug:

In[32]:=
entries = ResourceFunction["KEGGReference"]["D00252", "KEGGCrossReference", "Database" -> "Pathway"]
Out[32]=

Get a Graph of one of the pathways and highlight the D00252 drug:

In[33]:=
ResourceFunction["KEGGPathway"]["hsa", "00982", "Graph", GraphHighlight -> {"dr:D00252"}, VertexSize -> 2]
Out[33]=

Publisher

Lina Marcela

Requirements

Wolfram Language 13.0 (December 2021) or above

Version History

  • 1.0.0 – 10 July 2025
