Wolfram Language Paclet Repository

Community-contributed installable additions to the Wolfram Language

TCGADataTool

Data Exploration
This tech note explores data available through the TCGA Data Tool paclet.
Load the paclet.
In[1]:=
Needs["JaneShenGunther`TCGADataTool`"]
TCGA project exploration
Use data from GDCProject entities to visualize basic information about projects in the TCGA program.
Get all GDCProject entities for the TCGA program and their properties.
In[64]:=
tcgaProjectEntities=EntityList[EntityClass["GDCProject","TCGA"]]
Out[64]=

{TCGA-CHOL, TCGA-LIHC, TCGA-DLBC, TCGA-BLCA, TCGA-ACC, TCGA-CESC, TCGA-PCPG, TCGA-PAAD, TCGA-MESO, TCGA-TGCT, TCGA-KIRP, TCGA-UVM, TCGA-UCS, TCGA-THYM, TCGA-COAD, TCGA-ESCA, TCGA-GBM, TCGA-KICH, TCGA-HNSC, TCGA-PRAD, TCGA-OV, TCGA-LUSC, TCGA-LAML, TCGA-LGG, TCGA-SARC, TCGA-BRCA, TCGA-READ, TCGA-LUAD, TCGA-STAD, TCGA-THCA, TCGA-KIRC, TCGA-SKCM, TCGA-UCEC}

In[65]:=
properties={"ProjectID","CaseCount","TCIADataQ","FileSize"};
tcgaProjectsMetadataSubset=EntityValue[tcgaProjectEntities,properties,"PropertyAssociation"];
In[67]:=
tcgaProjectsMetadataSubset[[;;3]]
Out[67]=
{ProjectIDTCGA-CHOL,CaseCountGDC51,TCIA0,TCIADataQFalse,FileSizeTotalFileSize
96.7764
GB
,ImageTotalFileSize
95.1052
GB
,TextTotalFileSize
1.67123
GB
,ProjectIDTCGA-LIHC,CaseCountGDC377,TCIA97,TCIADataQTrue,FileSizeTotalFileSize
627.952
GB
,ImageTotalFileSize
611.982
GB
,TextTotalFileSize
15.9699
GB
,ProjectIDTCGA-DLBC,CaseCountGDC58,TCIA0,TCIADataQFalse,FileSizeTotalFileSize
55.7289
GB
,ImageTotalFileSize
53.9161
GB
,TextTotalFileSize
1.81283
GB
}
Create a PieChart showing how many TCGA projects also have data in the TCIA portal.
In[68]:=
PieChart[
 ReverseSort@Counts[tcgaProjectsMetadataSubset[[All,"TCIADataQ"]]],
 SectorOrigin->Top,LabelingFunction->"RadialCenter",
 PlotLabel->"TCGA projects that have TCIA data",
 ChartLabels->None,ChartLegends->Automatic
]
Out[68]=
[Pie chart: TCGA projects that have TCIA data; legend: True, False]
Create a BarChart showing the number of cases per TCGA project.
In[72]:=
BarChart[
 Sort@Association[Query[All,#ProjectID->#["CaseCount"]["GDC"]&]@tcgaProjectsMetadataSubset],
 BarOrigin->Left,ImageSize->600,Frame->True,GridLines->Automatic,
 FrameLabel->{"TCGA Projects on GDC portal","# GDC Cases"},ChartLabels->Automatic,
 LabelingFunction->Right,PlotLabel->"Number of GDC cases per TCGA project"
]
Out[72]=
[Bar chart: Number of GDC cases per TCGA project]
Plot TCGA project file sizes broken down by GDC file type.
In[76]:=
BarChart[
 Sort@(Association[Query[All,#ProjectID->#["FileSize"]&]@tcgaProjectsMetadataSubset]/.q_Quantity:>Round[UnitConvert[q,"Terabytes"],.001]),
 BarOrigin->Left,ImageSize->600,Frame->True,GridLines->Automatic,
 FrameLabel->{"TCGA Projects on GDC portal","GDC File Size (TB)"},ChartLabels->{Automatic,None},ChartLegends->Placed[Automatic,Below],
 LabelingFunction->Automatic,PlotLabel->"GDC file size per TCGA project",BarSpacing->{0,1}
]
Out[76]=
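The unit conversion inside the BarChart call can be tried in isolation. The sketch below applies the same replacement rule to the TCGA-LIHC file sizes copied from the earlier output. It assumes the "GB" shown there corresponds to the "Gigabytes" unit, and the names lihcSizes and lihcSizesTB are illustrative.

```wolfram
(* File sizes for TCGA-LIHC, copied from the output shown earlier; "Gigabytes" is an assumption *)
lihcSizes = <|
  "TotalFileSize" -> Quantity[627.952, "Gigabytes"],
  "ImageTotalFileSize" -> Quantity[611.982, "Gigabytes"],
  "TextTotalFileSize" -> Quantity[15.9699, "Gigabytes"]
|>;

(* Convert every Quantity to terabytes and round to the nearest 0.001 TB *)
lihcSizesTB = lihcSizes /. q_Quantity :> Round[UnitConvert[q, "Terabytes"], .001];

(* Bare magnitudes, convenient for plotting or comparison *)
QuantityMagnitude /@ lihcSizesTB
```

Because the rule matches any Quantity, it reaches the values nested inside each project's FileSize association without restructuring the data first.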
TCGA-CESC patient coverage
Use the example data for the TCGA-CESC project to visualize how many patients have data in each data subcategory.
Load the example data for the TCGA-CESC project.
In[120]:=
dataStructure=exampleDataTCGA[{"TCGAProjectData","TCGACESCExceptGenomicDataAllPatients"}];
In[121]:=
dataScope=exampleDataTCGA[{"TCGAProjectData","TCGACESCExceptGenomicDataAllPatients"},"Metadata"]["DataScope"]
Out[121]=
{{Clinical,Patient},{Clinical,Drug},{Clinical,Radiation},{Clinical,NewTumorEvent},{Clinical,OtherMalignancyForm},{Clinical,FollowUp},{Clinical,NewTumorEventFollowUp},{Clinical,Ablation},{Biospecimen,Aliquot},{Biospecimen,Analyte},{Biospecimen,Auxiliary},{Biospecimen,DiagnosticSlide},{Biospecimen,Sample},{Biospecimen,Portion},{Biospecimen,Protocol},{Biospecimen,ShipmentPortion},{Biospecimen,Slide},{Biospecimen,SSFNormalControl},{Biospecimen,SSFTumorSample},{ScrapedData,RadiologicalImages},{ScrapedData,HistologicalImages},{ScrapedData,FollowUp},{ScrapedData,NewTumorEventFollowUp}}
Ignore the Clinical::FollowUp and Clinical::NewTumorEventFollowUp subcategories, since they are already covered by ScrapedData::FollowUp and ScrapedData::NewTumorEventFollowUp, respectively.
In[122]:=
dataScope=DeleteCases[dataScope,Alternatives[{"Clinical","FollowUp"},{"Clinical","NewTumorEventFollowUp"}]];
Compute the number of patients covered by each data subcategory.
In[123]:=
patientsCovered=Flatten/@Thread[{dataScope,(Length[dataStructure]-Count[dataStructure[[All,Sequence@@#]],_Missing])&/@dataScope}]
Out[123]=
{{Clinical,Patient,307},{Clinical,Drug,141},{Clinical,Radiation,177},{Clinical,NewTumorEvent,23},{Clinical,OtherMalignancyForm,8},{Clinical,Ablation,0},{Biospecimen,Aliquot,307},{Biospecimen,Analyte,307},{Biospecimen,Auxiliary,307},{Biospecimen,DiagnosticSlide,271},{Biospecimen,Sample,307},{Biospecimen,Portion,307},{Biospecimen,Protocol,307},{Biospecimen,ShipmentPortion,173},{Biospecimen,Slide,307},{Biospecimen,SSFNormalControl,305},{Biospecimen,SSFTumorSample,307},{ScrapedData,RadiologicalImages,54},{ScrapedData,HistologicalImages,307},{ScrapedData,FollowUp,248},{ScrapedData,NewTumorEventFollowUp,25}}
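The counting step subtracts the number of Missing entries from the total number of patients for each {category, subcategory} path. The toy example below illustrates the idea on three made-up patient records. It assumes the real dataStructure is laid out the same way (patients at the first level, then category, then subcategory), and the names toyData, toyScope, and toyCoverage are hypothetical.

```wolfram
(* Three made-up patient records; only the first has drug data *)
toyData = {
  <|"Clinical" -> <|"Patient" -> "p1", "Drug" -> "d1"|>|>,
  <|"Clinical" -> <|"Patient" -> "p2", "Drug" -> Missing["NotAvailable"]|>|>,
  <|"Clinical" -> <|"Patient" -> "p3", "Drug" -> Missing["NotAvailable"]|>|>
};

(* Paths to inspect, in the same {category, subcategory} form as dataScope *)
toyScope = {{"Clinical", "Patient"}, {"Clinical", "Drug"}};

(* For each path: total records minus records whose value at that path is Missing *)
toyCoverage = Function[path,
   Length[toyData] - Count[toyData[[All, Sequence @@ path]], _Missing]
  ] /@ toyScope
(* {3, 1} *)
```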
In[124]:=
patientsCovered=GroupBy[patientsCovered,First,Association[Rule@@@#[[All,2;;]]]&]
Out[124]=
ClinicalPatient307,Drug141,Radiation177,NewTumorEvent23,OtherMalignancyForm8,Ablation0,BiospecimenAliquot307,Analyte307,Auxiliary307,DiagnosticSlide271,Sample307,Portion307,Protocol307,ShipmentPortion173,Slide307,SSFNormalControl305,SSFTumorSample307,ScrapedDataRadiologicalImages54,HistologicalImages307,FollowUp248,NewTumorEventFollowUp25
Compute percentage of patients covered by each data subcategory.
In[125]:=
totalPatientsNumber=Length[dataStructure]
Out[125]=
307
In[126]:=
patientsCovered=Map[<|"%"->(Round[100.#/totalPatientsNumber]),"#"->#|>&,patientsCovered,{2}]
Out[126]=
ClinicalPatient%100,#307,Drug%46,#141,Radiation%58,#177,NewTumorEvent%7,#23,OtherMalignancyForm%3,#8,Ablation%0,#0,BiospecimenAliquot%100,#307,Analyte%100,#307,Auxiliary%100,#307,DiagnosticSlide%88,#271,Sample%100,#307,Portion%100,#307,Protocol%100,#307,ShipmentPortion%56,#173,Slide%100,#307,SSFNormalControl%99,#305,SSFTumorSample%100,#307,ScrapedDataRadiologicalImages%18,#54,HistologicalImages%100,#307,FollowUp%81,#248,NewTumorEventFollowUp%8,#25
Visualize patient coverage by subcategory.

© 2025 Wolfram. All rights reserved.
