Wolfram Language Paclet Repository

Community-contributed installable additions to the Wolfram Language

TCGADataTool

Data Modeling
Example workflow for data modeling functionality in the TCGA Data Tool.
This loads the package.
In[1]:=
Needs["JaneShenGunther`TCGADataTool`"]
Basic data modeling workflow
Basic data modeling workflow using example data for the TCGA-CESC project.
Load the TCGA-CESC example data structure and its description.
In[148]:=
exampleDataTCGA[{"TCGAProjectData","TCGACESCExceptGenomicDataAllPatients"},"Description"]
Out[148]=
Example data structure for project TCGA-CESC, including all patients and full data scope except for genomic data. This example shows how data is stored and organized under the hood by the TCGADataToolUserInterface[]. Files exported in the .m format from the TCGADataToolUserInterface[] will adhere to this format.
In[149]:=
dataStructure=exampleDataTCGA[{"TCGAProjectData","TCGACESCExceptGenomicDataAllPatients"}];

Outcome and predictor definition

Define the outcome property you wish your model to predict. Outcome properties must be given in their standard format "Category::Subcategory::Property".
Use "Clinical::Patient::vital_status" as outcome property.
In[150]:=
outcomeProperty="Clinical::Patient::vital_status";
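The standard format can be illustrated with built-in string functions. The following is a minimal sketch, not part of the paclet: parseStandardName is a hypothetical helper introduced here only to show how a name of the form "Category::Subcategory::Property" decomposes.

```wolfram
(* parseStandardName is a hypothetical helper, NOT a TCGADataTool symbol. *)
(* It splits a standard property name on "::" and labels the three parts. *)
(* Names that do not have exactly three parts are left unevaluated. *)
parseStandardName[name_String] :=
  AssociationThread[
    {"Category", "Subcategory", "Property"},
    StringSplit[name, "::"]
  ] /; Length[StringSplit[name, "::"]] == 3

parseStandardName["Clinical::Patient::vital_status"]
(* <|"Category" -> "Clinical", "Subcategory" -> "Patient", "Property" -> "vital_status"|> *)
```

Identifier columns such as "bcr_patient_uuid" do not follow the three-part format, so the sketch leaves them unevaluated instead of guessing a category.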
Inspect outcome property values for TCGA-CESC patients.
In[151]:=
Counts@Flatten@Values@pullDataSlice[dataStructure,{outcomeProperty}]
Out[151]=
<|Alive->247,Dead->60|>
Define potential predictors for the selected outcome. They must be provided as a list of properties expressed in their standard format "Category::Subcategory::Property".
Use a manually curated set of predictor properties.
In[152]:=
predictors={"Clinical::Patient::person_neoplasm_cancer_status","Clinical::Patient::eastern_cancer_oncology_group","Clinical::Patient::tobacco_smoking_history","Clinical::Drug::drug_name"};
Alternatively, use the function getPotentialPredictors to get a list of potential predictors from all properties available in the current data structure.
In[153]:=
Shallow[getPotentialPredictors[dataStructure,outcomeProperty]["PotentialPredictors"]]
Out[153]//Shallow=
{Biospecimen::Aliquot::center_id,Biospecimen::Aliquot::plate_column,Biospecimen::Aliquot::plate_id,Biospecimen::Aliquot::plate_row,Biospecimen::Aliquot::quantity,Biospecimen::Aliquot::volume,Biospecimen::Analyte::concentration,Biospecimen::Analyte::spectrophotometer_method,Biospecimen::Analyte::subportion_sequence,Biospecimen::Auxiliary::hpv_call_1,<<128>>}

Design matrix creation

Create a design matrix for the selected outcome and predictor properties. When a property has multiple values for a single patient (e.g. "Clinical::Drug::drug_name"), take the first value in the list.
Example data for property "Clinical::Drug::drug_name".
In[154]:=
pullDataSlice[dataStructure,{"bcr_patient_uuid","Clinical::Drug::drug_name"},"DeleteMissing"->True]〚;;7〛
Out[154]=
{<|bcr_patient_uuid->f150d999-cdec-427e-b8b2-00731c648989,Clinical::Drug::drug_name->{Cisplatin}|>,
 <|bcr_patient_uuid->cf87812f-6db6-4592-9842-c62b3e4ff03f,Clinical::Drug::drug_name->{cisplatin}|>,
 <|bcr_patient_uuid->adb7c5c8-4afd-40dc-89f1-571fbd88e4f9,Clinical::Drug::drug_name->{Cisplatin}|>,
 <|bcr_patient_uuid->6868ba1d-b612-454c-8146-74af2f573d76,Clinical::Drug::drug_name->{Cisplatin,Aloxi}|>,
 <|bcr_patient_uuid->cb695fdd-381e-4c00-836e-0ba27e32176b,Clinical::Drug::drug_name->{Paclitaxel}|>,
 <|bcr_patient_uuid->2e08a89e-4035-4782-b54f-f98eef5db80d,Clinical::Drug::drug_name->{Cisplatin,Cisplatin,Fluoruracil}|>,
 <|bcr_patient_uuid->c14468eb-e725-4029-96cc-75bb9ebd9def,Clinical::Drug::drug_name->{Cisplatin}|>}
Create the design matrix.
In[155]:=
designMatrix=buildDesignMatrix[dataStructure,predictors,outcomeProperty,"GeneralListHandlingFunction"->(If[ListQ[#],First[#],#]&)];
In[156]:=
designMatrix//Length
Out[156]=
307
In[157]:=
designMatrix〚;;3〛
Out[157]=
{bcr_patient_barcodeTCGA-ZJ-AAXN,bcr_patient_uuid5f22e050-c172-4f2d-a1f9-6cb749b6ef98,Clinical::Drug::drug_nameMissing[NotAvailable],Clinical::Patient::eastern_cancer_oncology_groupMissing[NotAvailable],Clinical::Patient::person_neoplasm_cancer_statusMissing[Unknown],Clinical::Patient::tobacco_smoking_history2,Clinical::Patient::vital_statusAlive,bcr_patient_barcodeTCGA-HM-A3JJ,bcr_patient_uuidf150d999-cdec-427e-b8b2-00731c648989,Clinical::Drug::drug_nameCisplatin,Clinical::Patient::eastern_cancer_oncology_group0,Clinical::Patient::person_neoplasm_cancer_statusWITH TUMOR,Clinical::Patient::tobacco_smoking_historyMissing[NotAvailable],Clinical::Patient::vital_statusDead,bcr_patient_barcodeTCGA-HG-A9SC,bcr_patient_uuidcf87812f-6db6-4592-9842-c62b3e4ff03f,Clinical::Drug::drug_namecisplatin,Clinical::Patient::eastern_cancer_oncology_group0,Clinical::Patient::person_neoplasm_cancer_statusTUMOR FREE,Clinical::Patient::tobacco_smoking_history1,Clinical::Patient::vital_statusAlive}

Model creation

Create a machine learning model that predicts outcome property values. Train the model on a random sample of 80% of the design matrix and hold out the remaining 20% for testing.
Define training and test sets.
In[158]:=
SeedRandom[123];
trainingSet=RandomSample[designMatrix,Round[Length[designMatrix]*.8]];
testSet=Complement[designMatrix,trainingSet];
In[161]:=
trainingSet//Length
Out[161]=
246
In[162]:=
testSet//Length
Out[162]=
61
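The split above relies on Complement, which sorts its result and removes duplicate rows. That is harmless here because every row carries a unique patient identifier. As a sketch of an alternative that preserves order and duplicates, the rows can be shuffled once and cut with TakeDrop. The toy data and variable names below are illustrative only.

```wolfram
SeedRandom[123];

(* Toy stand-in for a design matrix: ten rows with a unique "id" each. *)
toyRows = Table[<|"id" -> i|>, {i, 10}];

(* Shuffle once, then take the first 80% for training and the rest for testing. *)
{toyTrain, toyTest} =
  TakeDrop[RandomSample[toyRows], Round[0.8*Length[toyRows]]];

Length /@ {toyTrain, toyTest}
(* {8, 2} *)

(* The two sets share no rows. *)
Intersection[toyTrain, toyTest]
(* {} *)
```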
Train a classifier model for the categorical outcome property "Clinical::Patient::vital_status" on TCGA-CESC data.
In[163]:=
model=buildModel[trainingSet,outcomeProperty]
Out[163]=
ClassifierFunction[Input type: Mixed (number: 4), Classes: Alive, Dead]


Model performance evaluation

Evaluate model performance on the test set using built-in Wolfram Language functions.
Compute a measurement report using the trained model and the test set. Drop patient identifier columns since they were not used for training.
In[165]:=
modelMeasures=ClassifierMeasurements[model,KeyDrop[testSet,{"bcr_patient_uuid","bcr_patient_barcode"}]]
Out[165]=
[Output: ClassifierMeasurementsObject]
Inspect various measurements available.
In[36]:=
modelMeasures["ROCCurve"]
Out[36]=
[Plot: ROC curve]
In[37]:=
modelMeasures["ProbabilityHistogram"]
Out[37]=
[Plot: probability histogram]
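Besides plots, a measurement object also exposes scalar and matrix properties. The sketch below is self-contained and uses a toy one-feature classifier instead of the TCGA model. The training values and variable names are made up for illustration, and the resulting numbers say nothing about the TCGA-CESC model.

```wolfram
SeedRandom[1];

(* Toy labeled examples: a single numeric feature mapped to a class label. *)
toyTrain = {1.0 -> "Alive", 1.2 -> "Alive", 0.9 -> "Alive",
   3.1 -> "Dead", 2.9 -> "Dead", 3.3 -> "Dead"};
toyTest = {1.1 -> "Alive", 3.0 -> "Dead"};

(* Train with the built-in Classify and measure on the held-out examples. *)
toyModel = Classify[toyTrain];
toyMeasures = ClassifierMeasurements[toyModel, toyTest];

(* Fraction of correctly classified test examples. *)
toyMeasures["Accuracy"]

(* Counts of actual versus predicted classes. *)
toyMeasures["ConfusionMatrix"]
```

The same property names can be requested from modelMeasures. With imbalanced classes such as Alive and Dead here, the confusion matrix is usually more informative than accuracy alone.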
Related Guides

  • TCGA Data Tool

© 2025 Wolfram. All rights reserved.
