Function Repository Resource:

DAVIDGeneEnrichmentAnalysis

Retrieve a functional enrichment analysis on a gene list from the DAVID genetic website

Contributed by: Keiko Hirayama

ResourceFunction["DAVIDGeneEnrichmentAnalysis"][{gene1,gene2,…}]

returns the results of the DAVID website functional enrichment analysis for the specified list of genes.

Details and Options

DAVID is the Database for Annotation, Visualization, and Integrated Discovery. It provides annotation tools to help interpret the biological functions of genes.
DAVIDGeneEnrichmentAnalysis retrieves its results from DAVID and returns them as a Dataset.
DAVIDGeneEnrichmentAnalysis identifies enriched biological themes, such as Gene Ontology (GO) terms and KEGG pathways, and classifies genes into enriched functional groups.
By default, DAVIDGeneEnrichmentAnalysis accepts a list of Entrez Gene IDs as input. To use other identifier types, such as Ensembl gene IDs or GenBank accession numbers, one can specify the appropriate value using the "Type" option.
The following options can be given:
"Type" (default "ENTREZ_GENE_ID"): type of gene identifiers used for the input; allowed values include:
  "AFFYMETRIX_3PRIME_IVT_ID", "AFFYMETRIX_EXON_GENE_ID", "AFFYMETRIX_SNP_ID", "AGILENT_CHIP_ID", "AGILENT_ID", "AGILENT_OLIGO_ID", "ENSEMBL_GENE_ID", "ENSEMBL_TRANSCRIPT_ID", "ENTREZ_GENE_ID", "FLYBASE_GENE_ID", "FLYBASE_TRANSCRIPT_ID", "GENBANK_ACCESSION", "GENPEPT_ACCESSION", "GENOMIC_GI_ACCESSION", "PROTEIN_GI_ACCESSION", "ILLUMINA_ID", "IPI_ID", "MGI_ID", "GENE_SYMBOL", "PFAM_ID", "PIR_ACCESSION", "PIR_ID", "PIR_NREF_ID", "REFSEQ_GENOMIC", "REFSEQ_MRNA", "REFSEQ_PROTEIN", "REFSEQ_RNA", "RGD_ID", "SGD_ID", "TAIR_ID", "UCSC_GENE_ID", "UNIGENE", "UNIPROT_ACCESSION", "UNIPROT_ID", "UNIREF100_ID", "WORMBASE_GENE_ID", "WORMPEP_ID", "ZFIN_ID"
"Annotation" (default {"GOTERM_BP_ALL", "GOTERM_CC_ALL", "GOTERM_MF_ALL", "KEGG_PATHWAY"}): list of annotation terms to be analyzed; the list can include a mix of annotation groups and individual annotations; allowed annotation groups (left) and individual annotations (right) include:
  "Gene Ontology" -> {"GOTERM_BP_1", "GOTERM_BP_2", "GOTERM_BP_3", "GOTERM_BP_4", "GOTERM_BP_5", "GOTERM_BP_ALL", "GOTERM_BP_FAT", "GOTERM_CC_1", "GOTERM_CC_2", "GOTERM_CC_3", "GOTERM_CC_4", "GOTERM_CC_5", "GOTERM_CC_ALL", "GOTERM_CC_FAT", "GOTERM_MF_1", "GOTERM_MF_2", "GOTERM_MF_3", "GOTERM_MF_4", "GOTERM_MF_5", "GOTERM_MF_ALL", "GOTERM_MF_FAT"}
  "Pathways" -> {"BBID", "BIOCARTA", "EC_NUMBER", "KEGG_COMPOUND", "KEGG_PATHWAY", "KEGG_REACTION"}
  "Protein Domains" -> {"BLOCKS_ID", "COG", "INTERPRO", "PDB_ID", "PFAM", "PIR_ALN", "PIR_HOMOLOGY_DOMAIN", "PIR_SUPERFAMILY", "PRINTS", "PRODOM", "PROSITE", "SCOP_ID", "SMART", "TIGRFAMS"}
  "Disease" -> {"GENETIC_ASSOCIATION_DB_DISEASE", "OMIM_DISEASE"}
  "General Annotations" -> {"ALIAS_GENE_SYMBOL", "CHROMOSOME", "CYTOBAND", "GENE", "GENE_SYMBOL", "HOMOLOGOUS_GENE", "LL_SUMMARY", "OMIM_ID", "PIR_SUMMARY", "PROTEIN_MW", "REFSEQ_PRODUCT", "SEQUENCE_LENGTH", "SP_COMMENT"}
  "Functional Categories" -> {"CGAP_EST_QUARTILE", "CGAP_EST_RANK", "COG_ONTOLOGY", "PIR_SEQ_FEATURE", "SP_COMMENT_TYPE", "SP_PIR_KEYWORDS", "UP_SEQ_FEATURE"}
  "Protein-Protein Interaction" -> {"BIND", "DIP", "HIV_INTERACTION_CATEGORY", "HIV_INTERACTION", "MINT", "NCICB_CAPATHWAY", "TRANSFAC_ID"}
  "Literature" -> {"GENERIF_SUMMARY", "HIV_INTERACTION_PUBMED_ID", "PUBMED_ID"}
ResourceFunction["DAVIDGeneEnrichmentAnalysis"] supports a maximum query of 400 genes.
DAVID recommends that users perform no more than 200 requests per day from a single computer and allow a 10-second interval between analyses.
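
The recommended 10-second interval can be respected when several analyses are run in sequence. The following is a minimal sketch, not part of the resource function itself: the names geneLists and results are illustrative, and the two gene lists are taken from the examples on this page.

```wolfram
(* illustrative gene lists, reused from the examples on this page *)
geneLists = {{"3569", "3586", "90865", "7189"}, {"581", "637", "2810", "7157"}};

(* run one analysis per list, pausing 10 seconds before every request after the first *)
results = Table[
  If[i > 1, Pause[10]];
  ResourceFunction["DAVIDGeneEnrichmentAnalysis"][geneLists[[i]]],
  {i, Length[geneLists]}]
```

Each element of results is then the Dataset for the corresponding gene list. The daily limit of 200 requests is not tracked by this sketch and would have to be counted separately.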

Examples

Basic Examples (2) 

Retrieve a functional enrichment analysis on a set of Entrez genes:

In[1]:=
ResourceFunction[
 "DAVIDGeneEnrichmentAnalysis"][{"3569", "3586", "90865", "7189"}]
Out[1]=

Retrieve a functional enrichment analysis on a set of Ensembl genes:

In[2]:=
ResourceFunction[
 "DAVIDGeneEnrichmentAnalysis"][{ExternalIdentifier["EnsemblGeneID", "ENSG00000012048"], ExternalIdentifier["EnsemblGeneID", "ENSG00000139618"], ExternalIdentifier["EnsemblGeneID", "ENSG00000083093"]}, "Type" -> "ENSEMBL_GENE_ID"]
Out[2]=

Scope (5) 

Identify enriched KEGG pathways for a list of selected genes:

In[3]:=
keggpathwaygenes = ResourceFunction[
  "DAVIDGeneEnrichmentAnalysis"][{"7157", "7042", "6794", "655", "7057", "1029", "2066", "7128", "6772", "11200", "958", "4734", "578", "6375", "991", "51561", "3082", "916", "1869", "2252", "8792", "864", "8817", "5105", "990", "701", "9641", "7187", "3383",
    "699", "355", "894", "3696", "8555", "9134", "3339", "8317", "5971", "9088", "2258", "890", "1871", "8200", "3693", "9700", "10220", "2919", "266629", "3695", "3679"}, "Annotation" -> "KEGG_PATHWAY"]
Out[3]=

Compare counts of associated genes and p-values across clusters:

In[4]:=
clusters = keggpathwaygenes[GroupBy[#Group &], All, {Length[#"Genes"], -Log[#"PValue"]} &]
Out[4]=
In[5]:=
ListPlot[clusters, PlotRange -> Full, AxesLabel -> {"gene count", "-log p"}]
Out[5]=

Compare gene counts and p-values across the associated KEGG pathways:

In[6]:=
keggpvalue = SortBy[Normal@
   keggpathwaygenes[
    All, {#"Name", Length[#"Genes"], #"PValue"} &], #[[2]] &]
Out[6]=

Show a bar chart:

In[7]:=
Legended[
 BarChart[
  MapThread[
   Style[#1, ColorData["LightTemperatureMap"][
      Rescale[-Log10[#2], MinMax[-Log10 /@ keggpvalue[[All, 3]]]]]] &, {keggpvalue[[All, 2]], keggpvalue[[All, 3]]}], ChartLabels -> keggpvalue[[All, 1]],
   BarOrigin -> Left, AxesLabel -> "gene count", AspectRatio -> 1.7, ImageSize -> 600], BarLegend[{"LightTemperatureMap", MinMax[-Log10 /@ keggpvalue[[All, 3]]]}, LabelingFunction -> Function[(10^-HoldForm[#])], LegendLabel -> "p\[Hyphen]value"]]
Out[7]=

Visualize the network of genes and their associated KEGG pathways (the larger the node, the more genes are associated with that pathway):

In[8]:=
Graph[Flatten[
   Normal@keggpathwaygenes[All, Thread[{#"Name", #"Genes"}] &], 1] /. {bp_, gn_} :> UndirectedEdge[gn, bp], VertexStyle -> Map[# -> Orange &, Normal@keggpathwaygenes[All, "Name"]],
  VertexSize -> Join[Map[#[[1]] -> {#[[2]]*.002, #[[2]]*.002} &, Normal@keggpathwaygenes[All, {#"Name", Length[#"Genes"]} &]], Map[# -> .15 &, Union@Flatten@Normal@keggpathwaygenes[All, "Genes"]]],
  VertexLabels -> "Name", EdgeStyle -> Directive[Thick, LightGray], BaseStyle -> Directive[Gray, EdgeForm[None]], GraphLayout -> "CircularEmbedding"]
Out[8]=
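
Because the result is a Dataset, standard Dataset queries can narrow it down before plotting. The sketch below assumes that keggpathwaygenes has been defined as above and that its "PValue" column holds plain numbers; the 0.05 cutoff and the name significant are illustrative choices, not values prescribed by DAVID or by the resource function.

```wolfram
(* keep only the rows whose p-value is below the illustrative 0.05 cutoff *)
significant = keggpathwaygenes[Select[#"PValue" < 0.05 &]];

(* order the remaining rows from the smallest p-value to the largest *)
significant[SortBy[#"PValue" &]]
```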

Options (2) 

Type (1) 

Use the "Type" option to specify the gene identifier type for the input:

In[9]:=
ResourceFunction[
 "DAVIDGeneEnrichmentAnalysis"][{"Q12888", "Q9Y2B4", "P04637"}, {"Type" -> "UNIPROT_ACCESSION"}]
Out[9]=

Annotation (1) 

Use the "Annotation" option to specify the annotation to be used:

In[10]:=
ResourceFunction[
 "DAVIDGeneEnrichmentAnalysis"][{"11200", "11151", "1415"}, {"Annotation" -> "Literature"}]
Out[10]=

Applications (4) 

Identify enriched KEGG pathways for a list of selected genes:

In[11]:=
enrichedkeggpath = ResourceFunction[
  "DAVIDGeneEnrichmentAnalysis"][{"581", "637", "2810", "7157"}, "Annotation" -> {"KEGG_PATHWAY"}]
Out[11]=

Find the pathway code for one of the enriched KEGG pathways:

In[12]:=
pathcode = enrichedkeggpath[1, "ID"]["ExternalID"]
Out[12]=

Find the gene symbols for the associated genes:

In[13]:=
genes = Normal@Flatten@Values@ResourceFunction["BioDBnetGeneData"][{"581", "637", "2810", "7157"}, "GeneSymbol"]
Out[13]=

Visualize the relevant genes on the KEGG pathway graph:

In[14]:=
ResourceFunction["KEGGPathway"][StringTake[pathcode, 3], StringDrop[pathcode, 3], "Graph", VertexSize -> {.02, .01}, VertexStyle -> Map[# -> Yellow &, genes], ImageSize -> 800]
Out[14]=

Properties and Relations (2) 

Identify enriched Gene Ontology concepts for a list of genes:

In[15]:=
enrichedgo = ResourceFunction[
  "DAVIDGeneEnrichmentAnalysis"][{"5286", "5728", "219699", "90249"}, {"Annotation" -> "GOTERM_BP_ALL"}]
Out[15]=

Use the GeneOntologyData resource function to get additional information on a selected Gene Ontology concept:

In[16]:=
ResourceFunction["GeneOntologyData"][enrichedgo[1, "ID"]["ExternalID"]]
Out[16]=

Possible Issues (1) 

The analysis fails when the input list contains more than 400 genes:

In[17]:=
ResourceFunction["DAVIDGeneEnrichmentAnalysis"][
 ToString /@ Table[n, {n, 1, 401}]]
Out[17]=
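
One way to avoid sending an oversized query is to check the list length before calling the function. This is only a sketch of such a guard: the name candidates and the "TooManyGenes" failure tag are hypothetical, and the only value taken from the documentation is the 400-gene limit.

```wolfram
(* hypothetical input: 401 identifiers, one more than the documented limit *)
candidates = ToString /@ Range[401];

(* return a Failure object locally instead of contacting DAVID when the list is too long *)
If[Length[candidates] > 400,
 Failure["TooManyGenes", <|
   "MessageTemplate" -> "`1` genes supplied; the documented limit is 400.",
   "MessageParameters" -> {Length[candidates]}|>],
 ResourceFunction["DAVIDGeneEnrichmentAnalysis"][candidates]]
```

With the 401-element list above, the first branch is taken and no request is made.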

Requirements

Wolfram Language 14.0 (January 2024) or above

Version History

  • 1.0.0 – 07 May 2025
