Function Repository Resource:

BioDBnetGeneData


Access information on genes available in major biological databases

Contributed by: Keiko Hirayama

ResourceFunction["BioDBnetGeneData"][gene]

gives a dataset of all available properties for the specified gene.

ResourceFunction["BioDBnetGeneData"][{gene1,gene2,…}]

gives the dataset for each of the specified genes.

ResourceFunction["BioDBnetGeneData"][gene, prop]

gives a property value for a specified gene.

ResourceFunction["BioDBnetGeneData"][gene,prop,{"Species"->species}]

gives a property value for a gene symbol of the specified species.

Details and Options

BioDBnetGeneData is based on bioDBnet (biological DataBase network), which provides information on genes and cross-references to major biological databases.
Genes can be given as supported gene entities, gene symbols, or gene IDs from Entrez, the global query system developed by NCBI (National Center for Biotechnology Information).
Species can be given as supported taxonomic species entities or as Taxon IDs assigned by NCBI Taxonomy. If no species is specified, human genes are assumed.
Available properties include:
  • "EntrezGeneID" – identifier for a gene as assigned by the Entrez Gene project
  • "TaxonID" – identifier of the species in the NCBI Taxonomy database
  • "GeneSymbol" – symbol of a gene
  • "Description" – detailed description of a gene
  • "GeneSynonyms" – aliases of the gene symbol
  • "Chromosome" – chromosome on which a gene is located
  • "ChromosomeBand" – cytoband location of a gene
  • "StartPosition" – starting sequence position of a gene
  • "EndPosition" – ending sequence position of a gene
  • "Strand" – coding strand of a gene
  • "NCBIHomologGeneIDs" – NCBI Taxonomy and Entrez Gene identifiers of associated homologous genes
  • "EnsemblGeneID" – identifier for a gene as assigned by the Ensembl project
  • "EnsemblBiotype" – biotype of a gene or protein as assigned by the Ensembl project
  • "EnsemblTranscriptID" – identifier for a transcript as assigned by the Ensembl project
  • "EnsemblProteinID" – identifier for a protein as assigned by the Ensembl project
  • "EnsemblHomologGeneIDs" – NCBI Taxonomy and Ensembl identifiers of associated homologous genes
  • "EnsemblHomologProteinIDs" – NCBI Taxonomy and Ensembl identifiers of associated homologous proteins
  • "RefSeqGenomicAccession" – RefSeq genomic accession
  • "RefSeqmRNAAccession" – RefSeq mRNA accession
  • "RefSeqncRNAAccession" – RefSeq non-coding RNA accession
  • "RefSeqProteinAccession" – RefSeq protein accession
  • "GenBankNucleotideAccession" – mRNA accession in the GenBank database
  • "GenBankProteinAccession" – protein accession in the GenBank database
  • "PDBID" – identifier in the RCSB PDB (Research Collaboratory for Structural Bioinformatics Protein Data Bank)
  • "UniProtAccession" – accession for UniProt (Universal Protein Resource) sequences
  • "GOBiologicalProcess" – Gene Ontology (GO) identifiers for associated biological processes
  • "GOCellularComponent" – Gene Ontology (GO) identifiers for associated cellular components
  • "GOMolecularFunction" – Gene Ontology (GO) identifiers for associated molecular functions
  • "CPDBProteinInteractor" – UniProt entry names of interacting proteins from ConsensusPathDB
  • "dbSNPID" – identifiers in dbSNP (Single Nucleotide Polymorphism Database)
  • "CTDDiseaseInfo" – associated diseases from CTD (Comparative Toxicogenomics Database)
  • "DrugBankDrugID" – identifiers of associated drugs in the DrugBank database
  • "KEGGDiseaseID" – identifiers of associated diseases in the KEGG (Kyoto Encyclopedia of Genes and Genomes) database
  • "KEGGGeneID" – KEGG gene identifier
  • "KEGGPathwayID" – identifiers of associated KEGG pathways
  • "ReactomeID" – identifiers of associated Reactome pathways
  • "PubMedID" – identifiers of associated PubMed articles
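
As a sketch of the Taxon ID form of the species specification, the fruit fly query from the basic examples could use the NCBI Taxon ID 7227 (Drosophila melanogaster) instead of a species entity. That the Taxon ID is accepted as a plain integer, rather than as a string, is an assumption here:

```wolfram
(* assumption: an NCBI Taxon ID may be passed as an integer; 7227 is Drosophila melanogaster *)
fly = ResourceFunction["BioDBnetGeneData"]["timeless", {"GeneSymbol", "EnsemblGeneID"}, {"Species" -> 7227}]
```

If the integer form is not accepted, the equivalent entity form used in the basic examples, Entity["TaxonomicSpecies", "DrosophilaMelanogaster::ynxq4"], remains available.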

Examples

Basic Examples (3) 

Retrieve a dataset of all available properties for a gene entity:

In[1]:=
ResourceFunction["BioDBnetGeneData"][
 Entity["Gene", {"BAK1", {"Species" -> "HomoSapiens"}}]]
Out[1]=

Find property values for a gene:

In[2]:=
ResourceFunction["BioDBnetGeneData"]["HFE1", {"EntrezGeneID", "Chromosome", "StartPosition", "EndPosition", "Strand", "UniProtAccession"}]
Out[2]=

Find information on selected fruit fly genes:

In[3]:=
ResourceFunction["BioDBnetGeneData"][{"swiss cheese", "timeless"}, {"GeneSymbol", "EnsemblGeneID", "GOBiologicalProcess", "KEGGPathwayID"},
 {"Species" -> Entity["TaxonomicSpecies", "DrosophilaMelanogaster::ynxq4"]}]
Out[3]=

Scope (7) 

Get the RefSeq mRNA accessions associated with the BMAL1 gene, which plays a role in the circadian clock:

In[4]:=
nuc = ResourceFunction["BioDBnetGeneData"]["BMAL1", "RefSeqmRNAAccession"]
Out[4]=
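
The steps below take one entry at a time, as in nuc[1, 1, 1]["ExternalID"]. Assuming the result is nested the way those indices suggest (gene, then property, then a list of entries that each carry an "ExternalID" key), all accessions could be pulled out in a single Dataset query using All. This is a sketch under that structural assumption:

```wolfram
(* re-run the query so this block stands alone *)
nuc = ResourceFunction["BioDBnetGeneData"]["BMAL1", "RefSeqmRNAAccession"];

(* assumption: nuc[1, 1] is a list of associations with an "ExternalID" key *)
accessions = nuc[1, 1, All, "ExternalID"]
```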

Use the ImportFASTA ResourceFunction to retrieve one of the sequences:

In[5]:=
ref = ResourceFunction["ImportFASTA"][nuc[1, 1, 1]["ExternalID"]][[2, 1]];
In[6]:=
ref // Short
Out[6]=

Get the PDB identifiers associated with the BMAL1 gene:

In[7]:=
pdb = ResourceFunction["BioDBnetGeneData"]["BMAL1", "PDBID"]
Out[7]=

Use the PDBImport ResourceFunction to visualize the 3D structure of the protein:

In[8]:=
ResourceFunction["PDBImport"]["RCSB" -> pdb[1, 1, 1]["ExternalID"]]
Out[8]=

Find the Reactome pathways associated with the BMAL1 gene:

In[9]:=
path = ResourceFunction["BioDBnetGeneData"]["BMAL1", "ReactomeID"]
Out[9]=

Use the ReactomePathways ResourceFunction to find information for one of the pathways:

In[10]:=
ResourceFunction["ReactomePathways"][
 path[1, 1, 3]["ExternalID"], "Information"]
Out[10]=

Visualize the pathway:

In[11]:=
ResourceFunction["ReactomePathways"][
 path[1, 1, 3]["ExternalID"], "Graph", VertexLabelStyle -> Directive[Black, 5], VertexSize -> Scaled[.007]]
Out[11]=

Properties and Relations (2) 

Find the SNPs associated with the BRCA1 gene:

In[12]:=
snps = ResourceFunction["BioDBnetGeneData"]["BRCA1", "dbSNPID"]
Out[12]=

Use the NCBIGenomicSNPData ResourceFunction to find clinical information associated with one of the SNPs:

In[13]:=
ResourceFunction["NCBIGenomicSNPData"][StringTrim[snps[1, 1, 1]["ExternalID"], "rs"], "ClinicalSignificance"]
Out[13]=

Neat Examples (1) 

Retrieve the ConsensusPathDB protein interactors ("CPDBProteinInteractor") of BAK1, then the interactors of each of those proteins, and visualize the resulting two-level network:

In[14]:=
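NestGraph[
 (* query the interactors of each vertex; StringTrim drops the "_HUMAN"-style suffix of a UniProt entry name so the rest can be passed back as the next query *)
 (Flatten[
      Normal@Values[
        ResourceFunction["BioDBnetGeneData"][StringTrim[#, "_" ~~ __], "CPDBProteinInteractor"]], 1] /. {l_List} :> l) &,
 (* start from BAK1 and expand two levels *)
 "BAK1", 2, VertexLabels -> Placed["Name", Tooltip], GraphStyle -> "LargeNetwork", GraphLayout -> "RadialEmbedding"]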
Out[14]=
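
To see how the NestGraph expansion above behaves without querying bioDBnet, the same pattern can be run on a small lookup table. The table toyInteractors and all names in it are hypothetical placeholders standing in for CPDBProteinInteractor results; only the StringTrim and NestGraph logic mirrors the example:

```wolfram
(* hypothetical stand-in for CPDBProteinInteractor results; names are made up *)
toyInteractors = <|
   "GENEA" -> {"GENEB_HUMAN", "GENEC_HUMAN"},
   "GENEB" -> {"GENEA_HUMAN"},
   "GENEC" -> {"GENED_HUMAN"}|>;

(* trim the "_HUMAN"-style suffix, then look up interactors; unknown keys give {} *)
interactors = Lookup[toyInteractors, StringTrim[#, "_" ~~ __], {}] &;

(* expand two levels out from the starting symbol *)
g = NestGraph[interactors, "GENEA", 2, VertexLabels -> "Name"]
```

The graph has 5 vertices and 4 edges. Because only the lookup key is trimmed, not the vertex name, "GENEA" and "GENEA_HUMAN" end up as separate vertices.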

Requirements

Wolfram Language 14.0 (January 2024) or above

Version History

  • 1.0.0 – 11 December 2024
