Function Repository Resource:

DISCOSingleCellAtlas

Explore single-cell atlases by tissue type

Contributed by: Keiko Hirayama

ResourceFunction["DISCOSingleCellAtlas"][tissue, "CellData"]

gives the dataset of cells for a specified tissue.

ResourceFunction["DISCOSingleCellAtlas"][tissue, "GeneExpressionData"]

gives the dataset of gene expressions for a specified tissue.

Details

Information is based on Deeply Integrated human Single-Cell Omics (DISCO) data.
Single-cell atlases are constructed using Uniform Manifold Approximation and Projection (UMAP), a nonlinear dimensionality reduction technique.
A tissue can be specified either by its name as a string or as an "AnatomicalStructure" Entity.
The "CellData" content is a Dataset containing details of cells with the following properties:
"Index"              unique identifier
"CellType"           type of a cell
"RNACount"           total number of RNA transcripts
"FeatureRNACount"    total number of feature RNA transcripts
"Tissue"             name of a tissue
"AnatomicalSite"     more specific anatomical site, if available
"UMAP"               values to construct Uniform Manifold Approximation and Projection (UMAP) graphs
The "GeneExpressionData" content is a Dataset containing details of gene expressions with the following properties:
"CellType"           type of a cell
"Gene"               name of a gene
"Values"             pairs of gene expression and unique molecular identifier (UMI) values
Some datasets are large and will take time to download.
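
As a sketch of the two ways to specify a tissue described above, the following calls are expected to request the same data. The canonical entity name "AdrenalGland" is an assumption and may need to be checked against the "AnatomicalStructure" entity type. Both calls download data and need network access.

```wolfram
(* Tissue given as a string, as in the examples on this page *)
byName = ResourceFunction["DISCOSingleCellAtlas"]["AdrenalGland", "CellData"];

(* Tissue given as an "AnatomicalStructure" entity; the canonical name is assumed *)
byEntity = ResourceFunction["DISCOSingleCellAtlas"][
   Entity["AnatomicalStructure", "AdrenalGland"], "CellData"];
```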

Examples

Basic Examples (3) 

Retrieve a cell dataset for adrenal glands:

In[1]:=
ResourceFunction["DISCOSingleCellAtlas"]["AdrenalGland", "CellData"]
Out[1]=

Get a gene expression dataset for the adrenal gland. The dataset is large and will take time to download:

In[2]:=
First@AbsoluteTiming[
  data = ResourceFunction["DISCOSingleCellAtlas"]["AdrenalGland", "GeneExpressionData"]]
Out[2]=

View a sample of the data:

In[3]:=
data[[1 ;; 3]]
Out[3]=

Applications (7) 

Retrieve a cell dataset for adrenal glands:

In[4]:=
celldata = ResourceFunction["DISCOSingleCellAtlas"]["AdrenalGland", "CellData"]
Out[4]=

Compare the median RNA count across cell types:

In[5]:=
expressedGeneCount = ReverseSort[
   GroupBy[Normal@celldata[All, {"CellType", "RNACount"}], First -> Last, N@Median@# &]];
BarChart[expressedGeneCount, ChartLabels -> Keys@expressedGeneCount]
Out[6]=
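
The grouping step above can be tried without downloading anything. This minimal sketch uses a small hypothetical list of rows (the cell type names and counts are invented for illustration) to show how GroupBy with First -> Last collects the counts per cell type before the median is taken:

```wolfram
(* Hypothetical rows mimicking Normal@celldata[All, {"CellType", "RNACount"}] *)
rows = {
   <|"CellType" -> "Cortex", "RNACount" -> 10|>,
   <|"CellType" -> "Medulla", "RNACount" -> 4|>,
   <|"CellType" -> "Cortex", "RNACount" -> 20|>,
   <|"CellType" -> "Medulla", "RNACount" -> 6|>,
   <|"CellType" -> "Cortex", "RNACount" -> 30|>};

(* First picks the cell type of each row as the group key, Last keeps its count;
   the third argument reduces each group of counts to a numeric median *)
medians = ReverseSort[GroupBy[rows, First -> Last, N@Median@# &]]
(* <|"Cortex" -> 20., "Medulla" -> 5.|> *)
```

First and Last act on the values of each association, so the idiom relies on "CellType" being the first column and the count being the last.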

Construct a cell overview atlas using Uniform Manifold Approximation and Projection (UMAP):

In[7]:=
umap = GroupBy[Normal@celldata[All, {"CellType", "UMAP"}], First -> Last];
ListPlot[Values@umap, PlotLegends -> Keys@umap, AxesOrigin -> {-15, -15}, AxesLabel -> {"UMAP_1", "UMAP_2"}]
Out[8]=

Get a gene expression dataset for the adrenal gland. The dataset is large and will take time to download:

In[9]:=
geneexpdata = ResourceFunction["DISCOSingleCellAtlas"]["AdrenalGland", "GeneExpressionData"];
In[10]:=
geneexpdata[[1]]
Out[10]=

Compare the expression of a single gene (here C1QB) across cell types:

In[11]:=
expr = geneexpdata[Select[MatchQ[#"Gene", "C1QB"] &]][
   All, {#CellType, Log2[#Values[[All, 1]]]} &];
DistributionChart[expr[All, 2], ChartLabels -> (Style[Rotate[#1, Pi/2]] & /@ Normal@expr[All, 1])]
Out[12]=

Make a gene expression matrix for specified genes:

In[13]:=
genes = {"S100A9", "S100A8", "CHGA", "LYZ", "PENK", "C1QB", "C1QA", "STMN2", "CHGB", "TUBB2B", "COL1A1", "NKG7", "TUBA1A", "COL1A2", "C1QC", "GAP43", "KLRB1", "NPY", "COL3A1", "S100B", "ACTA2", "SPP1", "S100A12", "CCL3", "HLA-DPB1", "CCL4", "DCN", "S100A4", "PRPH", "MLLT11", "CARTPT", "RETN", "HLA-DRA", "RNASE1", "RTN1", "LUM", "SCG2", "HLA-DPA1", "TAGLN", "G0S2", "RGS5", "MAP1B", "CRYAB", "CD74", "PLAC9", "IGFBP5", "STMN4", "PCSK1N", "CXCL8", "FCN1"};
In[14]:=
geneexp = KeyUnion[
   Association @@@ (Normal@
       geneexpdata[Select[MemberQ[genes, #"Gene"] &]][
           All, {#"CellType", #"Gene", Log2[(Total@#"Values"[[All, 1]])/(Total[#"Values"[[All, 2]]/
                  Median[#"Values"[[All, 2]]]])]} &][
          GroupBy[#[[2]] &]][genes][Values] /. {cl_String, _, v_} :> Rule[cl, v])];
In[15]:=
MatrixPlot[Transpose[Values@geneexp] /. _Missing :> None, FrameTicks -> {MapIndexed[List[#2[[1]], #1] &, Keys@geneexp[[1]]], MapIndexed[List[#2[[1]], Rotate[#1, Pi/2]] &, genes]}, PlotRangePadding -> .5, PlotLegends -> Automatic, Mesh -> All, PlotRange -> All, ClippingStyle -> None]
Out[15]=
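
The value placed in each matrix cell above is the base-2 logarithm of the summed expression divided by the summed median-scaled UMI values. A minimal sketch of that arithmetic on a single hypothetical "Values" entry (the numbers are invented for illustration):

```wolfram
(* Hypothetical "Values" entry: {expression, UMI} pairs for one gene in one cell type *)
values = {{2., 1}, {4., 2}, {6., 3}};

expression = values[[All, 1]];  (* first element of each pair *)
umi = values[[All, 2]];         (* second element of each pair *)

(* Total expression is 12.; UMI values divided by their median (2) sum to 3 *)
scaled = Log2[Total[expression]/Total[umi/Median[umi]]]
(* 2. *)
```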

Create a bubble chart showing the percentage of cells in which each gene is expressed:

In[16]:=
cellct = Association[
  Rule @@@ Normal@Tally[celldata[All, "CellType"]]];
geneexpct = KeyUnion[
  Association @@@ (Normal@
      geneexpdata[Select[MemberQ[genes, #"Gene"] &]][
          All, {#"CellType", #"Gene", Length[#"Values"]*100/cellct[#"CellType"] // N, Log2[(Total@#"Values"[[All, 1]])/(Total[#"Values"[[All, 2]]/
                 Median[#"Values"[[All, 2]]]])]} &][
         GroupBy[#[[2]] &]][genes][Values] /. {cl_String, _, c_, v_} :>
       Rule[cl, {c, v}])];
In[17]:=
Legended[
 BubbleChart[
  MapThread[
     Style[Flatten@{#1, #3}, ColorData[{"BeachColors", "Reverse"}][#2]] &, {#[[All, 1 ;; 2]], Rescale[#[[All, 4]], MinMax@#[[All, 4]]], #[[All, 3]]}] &@
   Flatten[MapIndexed[Flatten /@ Thread[List[#2[[1]], #1]] &, MapIndexed[List[#2[[1]], #1] &, #] & /@ Values@geneexpct] /. _Missing :> Sequence[0, 0], 1], BubbleSizes -> {.01, .02}, FrameTicks -> {{MapIndexed[List[#2[[1]], #1] &, Keys@geneexpct[[1]]], Automatic}, {MapIndexed[List[#2[[1]], Rotate[#1, Pi/2]] &, genes], Automatic}}, GridLines -> {Function[{min, max}, Range[Floor[min], Ceiling[max]]],
     None}, PlotRangePadding -> Scaled[.01], ImageSize -> 500], Column[{BarLegend[{"BeachColors", "Reverse"}, LegendLayout -> "Row", LegendLabel -> "scaled gene expression"], SwatchLegend[Table[Blue, 5], {0, 25, 50, 75, 100}, LegendMarkers -> Graphics[{Gray, Circle[]}], LegendMarkerSize -> {.01, .0125, .015, .0175, .02}*700, LegendLayout -> "Row", LegendLabel -> "expressed in cells (%)"]}]]
Out[17]=
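
The bubble size above comes from the number of "Values" pairs for a gene divided by the number of cells of that type. A minimal sketch of that calculation on hypothetical data, assuming each pair in "Values" corresponds to one cell expressing the gene; Counts is used here in place of the Tally construction above, and the cell type and gene names are invented:

```wolfram
(* Hypothetical cell type labels, one per cell *)
cellTypes = {"Cortex", "Medulla", "Cortex", "Cortex", "Medulla"};
cellct = Counts[cellTypes];  (* number of cells per cell type *)

(* Hypothetical gene expression record with two {expression, UMI} pairs *)
record = <|"CellType" -> "Cortex", "Gene" -> "GENE1",
   "Values" -> {{2., 1}, {4., 2}}|>;

(* Two expressing cells out of three Cortex cells, as a percentage *)
percent = N[Length[record["Values"]]*100/cellct[record["CellType"]]]
```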

Requirements

Wolfram Language 13.0 (December 2021) or above

Version History

  • 1.0.0 – 06 November 2024
