Function Repository Resource:

EnsemblGeneTree (1.0.0); current version: 1.0.1


Get an evolutionary tree of homologous genes

Contributed by: Keiko Hirayama

ResourceFunction["EnsemblGeneTree"][gene, "Dataset"]

gives the dataset of tree components associated with a specified gene.

ResourceFunction["EnsemblGeneTree"][gene, "TreeGraphic"]

gives the gene tree graphic associated with a specified gene.

ResourceFunction["EnsemblGeneTree"][gene, "Tree"]

gives the Tree object associated with a specified gene.

Details and Options

EnsemblGeneTree retrieves data from Ensembl, a genome database that provides genomic information, including phylogenetic relationships.
Nodes and branches of trees illustrate homologous genes of various species and their evolutionary relationships, respectively.
The argument gene can be a "Gene" entity, a gene symbol, or an Ensembl gene or tree ID.
The following options can be given:
option            default       description
"TreeType"        "GeneTree"    type of tree to retrieve for the "Dataset" and "TreeGraphic" results; allowed values are "GeneTree" and "GeneGainLossTree"
"SequenceType"    "Protein"     type of sequences included in the "Dataset" result; allowed values are "Protein" and "cDNA"
"Species"         "human"       species used for the query; TaxonomicSpecies entities or species names can be given
"Highlight"       {}            elements of the "TreeGraphic" result to highlight, given as rules such as species -> color
The "GeneTree" illustrates the evolutionary relationships among a group of homologous genes. Branch lengths are estimated from the DNA alignment. The tree may include ambiguous nodes, because duplication nodes with a low consistency score are hard to interpret phylogenetically.
The "GeneGainLossTree" shows how the size of a gene family has changed over evolutionary time. Significant gene gain events (expansions) are shown as orange branches and significant gene loss events (contractions) as blue branches.
ResourceFunction["EnsemblGeneTree"][gene,"Tree"] returns a Tree object showing the relationships among the organisms. It does not encode branch lengths (genetic distances) or gene gain/loss events.
ResourceFunction["EnsemblGeneTree"][gene] is equivalent to ResourceFunction["EnsemblGeneTree"][gene,"TreeGraphic"].
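As a sketch of the accepted forms of gene, the calls below request the BRAF gene tree in three ways: by gene symbol, as a "Gene" entity written in the same form as the "TreeType" example, and by the Ensembl gene ID ENSG00000157764 used in the "Highlight" example. They assume network access to Ensembl; the three forms are expected, but not guaranteed, to resolve to the same tree. The last call omits the property and so, per the equivalence stated above, should return the "TreeGraphic" result. The variable names are illustrative only.

```wolfram
(* Illustrative names; requires network access to Ensembl *)
ensemblGeneTree = ResourceFunction["EnsemblGeneTree"];

(* 1. gene symbol *)
bySymbol = ensemblGeneTree["BRAF", "Dataset"];

(* 2. "Gene" entity, in the same form as the "TreeType" example *)
byEntity = ensemblGeneTree[
   Entity["Gene", {"BRAF", {"Species" -> "HomoSapiens"}}], "Dataset"];

(* 3. Ensembl gene ID (BRAF) *)
byID = ensemblGeneTree["ENSG00000157764", "Dataset"];

(* no property given: equivalent to requesting "TreeGraphic" *)
graphic = ensemblGeneTree["BRAF"];
```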

Examples

Basic Examples (2) 

Retrieve the dataset of the gene tree associated with CLOCK genes, which are involved in circadian rhythms:

In[1]:=
ResourceFunction["EnsemblGeneTree"]["CLOCK", "Dataset"]
Out[1]=

Visualize the gene tree. Hover over or click any tree elements to find associated species information:

In[2]:=
ResourceFunction["EnsemblGeneTree"]["CLOCK", "TreeGraphic"]
Out[2]=

Retrieve the dataset of the gene gain/loss tree associated with the olfactory receptor family 1 subfamily A member 1 (OR1A1) gene:

In[3]:=
ResourceFunction["EnsemblGeneTree"]["OR1A1", "Dataset", "TreeType" -> "GeneGainLossTree"]
Out[3]=

Visualize the gene gain/loss events over time. Orange and blue branches illustrate the significant gene expansions and contractions, respectively. Hover over or click any tree elements to find associated species information:

In[4]:=
ResourceFunction["EnsemblGeneTree"]["OR1A1", "TreeGraphic", "TreeType" -> "GeneGainLossTree"]
Out[4]=

Scope (2) 

Retrieve the Tree object for the CLOCK genes to analyze the relationships among the associated species:

In[5]:=
clocktree = ResourceFunction["EnsemblGeneTree"]["CLOCK", "Tree"]
Out[5]=

Find the subtree rooted at the Marsupialia node:

In[6]:=
marsupialiatree = TreeCases[clocktree, "Marsupialia"]
Out[6]=

Highlight the subtree:

In[7]:=
Tree[clocktree, TreeElementStyle -> (TreeCases[
     Alternatives @@ Join[{TreeData[marsupialiatree[[1]]], TreeData[TreeChildren[marsupialiatree[[1]]][[1]]]}, TreeData /@ TreeLeaves[marsupialiatree[[1]]]]] -> Red)]
Out[7]=
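The highlighting step above works by collecting the TreeData of the elements to color and turning that list into an Alternatives pattern for TreeElementStyle. A minimal offline sketch of the same technique on a small hypothetical tree (the species names and topology are illustrative, not Ensembl output):

```wolfram
(* Hypothetical toy tree; leaves are written explicitly as Tree[data, None] *)
toyTree = Tree["Mammalia", {
    Tree["Marsupialia", {Tree["Phascolarctos cinereus", None],
      Tree["Monodelphis domestica", None]}],
    Tree["Eutheria", {Tree["Homo sapiens", None], Tree["Mus musculus", None]}]}];

(* TreeCases returns the list of subtrees whose data matches the pattern *)
sub = First[TreeCases[toyTree, "Marsupialia"]];

(* data of the subtree root followed by the data of all its leaves *)
highlightData = Join[{TreeData[sub]}, TreeData /@ TreeLeaves[sub]];

(* style every element whose data is one of the collected values *)
Tree[toyTree, TreeElementStyle -> (TreeCases[Alternatives @@ highlightData] -> Red)]
```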

Use the GeneOntologyData resource function to find genes associated with the Gene Ontology concept GO:0007165, which describes signal transduction:

In[8]:=
signalTransduction = ResourceFunction["GeneOntologyData"]["GO:0007165"]
Out[8]=

Find the list of genes involved in signal transduction:

In[9]:=
signalTransductionGenes = signalTransduction["AssociatedGenes"]
Out[9]=

Retrieve the datasets of the gene gain/loss tree for genes associated with signal transduction (it may take some time to download all data):

In[10]:=
signalTransductionGeneTreeData = ResourceFunction["EnsemblGeneTree"][#[[1]], "Dataset", {"TreeType" -> "GeneGainLossTree", "Species" -> #[[2]]}] & /@ Normal@Values@
      signalTransductionGenes[All, {"Gene", "ScientificName"}] // DeleteMissing;

Inspect the first of the retrieved datasets:

In[11]:=
signalTransductionGeneTreeData[[1]]
Out[11]=

For each species and ancestral node, record its parent and compute the total number of gene family members, summed over all retrieved gene trees:

In[12]:=
signalTransductionGeneFamilyTotal = Normal@GroupBy[
    Flatten[Cases[signalTransductionGeneTreeData, d_Dataset :> Normal@d[All, {"ScientificName", "Parent", "Members"}]]], First, {#[[1, 2]], Total@#[[All, -1]]} &];
In[13]:=
signalTransductionGeneFamilyTotal // Short
Out[13]=
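The GroupBy call above groups the rows of all datasets by their first value (the scientific name), keeps the parent from the first row of each group, and sums the last value ("Members"). A sketch on hypothetical rows (the names and counts are made up, not Ensembl data) shows the shape of the result:

```wolfram
(* Hypothetical rows standing in for the "ScientificName", "Parent" and
   "Members" columns of two gain/loss datasets; the numbers are invented *)
toyRows = {
   <|"ScientificName" -> "Homo sapiens", "Parent" -> "Homininae", "Members" -> 3|>,
   <|"ScientificName" -> "Homininae", "Parent" -> "Primates", "Members" -> 3|>,
   <|"ScientificName" -> "Homo sapiens", "Parent" -> "Homininae", "Members" -> 5|>,
   <|"ScientificName" -> "Homininae", "Parent" -> "Primates", "Members" -> 4|>};

(* group by the first value (scientific name); keep the parent of the
   first row in each group and total the last value (members) *)
toyTotal = Normal@GroupBy[toyRows, First, {#[[1, 2]], Total@#[[All, -1]]} &]
(* {"Homo sapiens" -> {"Homininae", 8}, "Homininae" -> {"Primates", 7}} *)
```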

For the selected species (human, Atlantic salmon, chicken, and Indian cobra), plot the total number of gene family members along the lineage:

In[14]:=
signalTransductionGeneFamilyLineage = # -> (MapThread[
        Rule, {#, # /. signalTransductionGeneFamilyTotal /. {_, i_Integer} :> i}] &@Reverse@
       DeleteMissing@
        NestWhileList[
         Cases[signalTransductionGeneFamilyTotal, Rule[#, {par_, _}] :> par] /. {s_} :> s &, #, ! MatchQ[#, {} | _Missing] &]) & /@ {"Homo sapiens", "Salmo salar", "Gallus gallus reference breed", "Naja naja"}
Out[14]=
In[15]:=
GraphicsGrid[
 Partition[
  ListLinePlot[#[[2]][[All, 2]], Ticks -> {MapIndexed[
        List[#2[[1]], Style[Rotate[#1, Pi/2]]] &, #[[2]][[All, 1]]], Automatic}, PlotLabel -> (#[[1]] /. Normal@signalTransductionGeneTreeData[[1]][
          All, #"ScientificName" -> #"CommonName" &])] & /@ signalTransductionGeneFamilyLineage, 2]]
Out[15]=
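The lineage computation above walks each species up to the root by repeatedly looking up its parent, then pairs each node with its total. A simplified sketch of that walk on a hypothetical table in the same name -> {parent, total} shape; it uses Association lookups instead of Cases and assumes the root's parent is stored as Missing[], which may differ from the actual datasets:

```wolfram
(* Hypothetical table: name -> {parent, total members}; values are invented *)
toyTable = {"Homo sapiens" -> {"Homininae", 8}, "Homininae" -> {"Primates", 7},
   "Primates" -> {"Mammalia", 6}, "Mammalia" -> {Missing[], 5}};
parentOf = Association[toyTable];

(* follow parents while the current node has a known, non-missing parent;
   Reverse puts the root first, as in the plots above *)
toyLineage = Reverse@NestWhileList[First@parentOf[#] &, "Homo sapiens",
    KeyExistsQ[parentOf, #] && ! MissingQ[First@parentOf[#]] &];

(* total members at each node of the lineage *)
toyCounts = Last@parentOf[#] & /@ toyLineage;

ListLinePlot[toyCounts,
 Ticks -> {MapIndexed[{First[#2], Rotate[#1, Pi/2]} &, toyLineage], Automatic},
 PlotLabel -> "Homo sapiens"]
```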

Use the time tree information from the gene gain/loss tree dataset to plot the evolutionary timeline of selected species:

In[16]:=
TimelinePlot[
 Tooltip[Labeled["TimeTree", "ScientificName"], "ScientificName"] /. Select[Union@
     Flatten[Normal@
       Cases[signalTransductionGeneTreeData, d_Dataset :> Normal@d[All, {"ScientificName", "TimeTree"}]], 1] /. q_Quantity :> Now + q, MemberQ[Union[
      Flatten[signalTransductionGeneFamilyLineage[[All, 2]]][[All, 1]]],
      "ScientificName" /. #[[1]]] &]]
Out[16]=

Options (4) 

SequenceType (1) 

Use "SequenceType" -> "cDNA" to retrieve the dataset of the gene tree associated with the cytochrome c oxidase I (COX1) gene, including the corresponding cDNA sequences:

In[17]:=
ResourceFunction["EnsemblGeneTree"]["COX1", "Dataset", "SequenceType" -> "cDNA"]
Out[17]=

Species (1) 

Use the "Species" option to retrieve the dataset of the BAK1 gene tree, querying koala instead of the default human:

In[18]:=
ResourceFunction["EnsemblGeneTree"]["BAK1", "Dataset", "Species" -> Entity["TaxonomicSpecies", "PhascolarctosCinereus::2kft4"]]
Out[18]=

TreeType (1) 

Use "TreeType" -> "GeneGainLossTree" to visualize the gene gain/loss tree:

In[19]:=
ResourceFunction["EnsemblGeneTree"][
 Entity["Gene", {"TP53", {"Species" -> "HomoSapiens"}}], "TreeGraphic",
  "TreeType" -> "GeneGainLossTree"]
Out[19]=

Highlight (1) 

Use the "Highlight" option to highlight the genes of selected species in the BRAF gene tree (Ensembl gene ID ENSG00000157764):

In[20]:=
ResourceFunction["EnsemblGeneTree"]["ENSG00000157764", "TreeGraphic", "Highlight" -> {"Zebrafish" -> Red, "Fugu" -> Blue}]
Out[20]=

Requirements

Wolfram Language 14.0 (January 2024) or above

Version History

  • 1.0.1 – 21 April 2025
  • 1.0.0 – 12 March 2025
