Wolfram Language Paclet Repository

Community-contributed installable additions to the Wolfram Language

TCGADataTool

Property Standard Name
The paclet refers to data properties throughout its functions. This tech note describes the standard names used for those properties.
This loads the paclet.
In[23]:=
Needs["JaneShenGunther`TCGADataTool`"]
Data hierarchy
TCGADataTool makes it easy to retrieve and process data for TCGA projects from the GDC and TCIA portals. Data for a TCGA project is internally organized at the patient level as a List of Association, where each Association represents a patient.
Patient data follows the hierarchy: Category ▶ Subcategory ▶ Property. Each patient Association is nested: its first level contains the patient identifiers and the data category labels. Each category label points to its subcategory labels, and each subcategory label points to the actual subcategory data, stored as a List of Association:
Example data for a single patient:
<|
  (* patient identifiers *)
  "bcr_patient_uuid" -> "00ad0ffe-2105-4829-a495-1c2aceb5bb31",
  "bcr_patient_barcode" -> "TCGA-EK-A2R9",

  (* categories *)
  "Clinical" -> <|
    (* "Clinical" subcategories *)
    "Patient" -> {<|"bcr_patient_uuid" -> "00AD0FFE-2105-4829-A495-1C2ACEB5BB31", "bcr_patient_barcode" -> "TCGA-EK-A2R9", "vital_status" -> "Alive", ...|>},
    "FollowUp4.0" -> {<|"bcr_patient_uuid" -> "00AD0FFE-2105-4829-A495-1C2ACEB5BB31", "bcr_patient_barcode" -> "TCGA-EK-A2R9", "bcr_followup_barcode" -> "TCGA-EK-A2R9-F58255", "bcr_followup_uuid" -> "B7035727-5282-454B-AC18-BA317E344D00", ...|>}
  |>,

  "Biospecimen" -> <|
    (* "Biospecimen" subcategories *)
    "Aliquot" -> {
      <|"bcr_patient_uuid" -> "00AD0FFE-2105-4829-A495-1C2ACEB5BB31", "bcr_sample_barcode" -> "TCGA-EK-A2R9-01A", "bcr_aliquot_barcode" -> "TCGA-EK-A2R9-01A-11D-A18H-01", ...|>,
      <|"bcr_patient_uuid" -> "00AD0FFE-2105-4829-A495-1C2ACEB5BB31", "bcr_sample_barcode" -> "TCGA-EK-A2R9-01A", "bcr_aliquot_barcode" -> "TCGA-EK-A2R9-01A-11D-A18I-02", "bcr_aliquot_uuid" -> "B282ED3B-A192-4814-BD71-2DB05E1CCE1F", ...|>,
      ...
    },
    ...
  |>,
  ...
|>
Some patients may lack data for certain categories or subcategories, and some categories or subcategories may not have been selected for download in the "Download parameters" pane of the user interface. In those cases, the corresponding category or subcategory is simply absent for the affected patients.
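Because a category or subcategory can be absent for a given patient, code that walks the hierarchy directly may want an explicit fallback. The following is a minimal sketch on toy data, not on a real TCGA download: toyPatients and subcategoryRecords are hypothetical names introduced here for illustration and are not part of the paclet.

```wolfram
(* Toy data mirroring the Category -> Subcategory -> List of Association layout. *)
(* Values are illustrative only; the second patient has no "Biospecimen" category. *)
toyPatients = {
  <|"bcr_patient_barcode" -> "P-001",
    "Clinical" -> <|"Patient" -> {<|"vital_status" -> "Alive"|>}|>,
    "Biospecimen" -> <|"Aliquot" -> {<|"bcr_aliquot_barcode" -> "P-001-A1"|>}|>|>,
  <|"bcr_patient_barcode" -> "P-002",
    "Clinical" -> <|"Patient" -> {<|"vital_status" -> "Dead"|>}|>|>
};

(* Hypothetical helper: return the records of a subcategory, or an empty list
   when the category or the subcategory is absent for that patient. *)
subcategoryRecords[patient_Association, category_String, subcategory_String] :=
  Lookup[Lookup[patient, category, <||>], subcategory, {}];

subcategoryRecords[#, "Biospecimen", "Aliquot"] & /@ toyPatients
(* {{<|"bcr_aliquot_barcode" -> "P-001-A1"|>}, {}} *)
```

Returning an empty list keeps the result shape uniform across patients. Whether an empty list or a Missing value is the better default depends on the downstream analysis.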
Most TCGADataTool functions reference data properties in a standard format. The format leverages the data hierarchy and expresses each property as the concatenation of three elements (category, subcategory and property) separated by "::". For example, the standard name of the property "vital_status" from the category Clinical and subcategory Patient is "Clinical::Patient::vital_status".
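The naming rule can be sketched with built-in string functions. The helpers toStandardName and fromStandardName below are hypothetical, introduced only for illustration; they are not paclet functions, and they assume that none of the three elements itself contains "::".

```wolfram
(* Hypothetical helper: join category, subcategory and property with "::". *)
toStandardName[category_String, subcategory_String, property_String] :=
  StringRiffle[{category, subcategory, property}, "::"];

(* Hypothetical helper: split a standard name back into its three elements. *)
(* Assumes the name has exactly three "::"-separated parts. *)
fromStandardName[name_String] :=
  AssociationThread[{"Category", "Subcategory", "Property"}, StringSplit[name, "::"]];

toStandardName["Clinical", "Patient", "vital_status"]
(* "Clinical::Patient::vital_status" *)

fromStandardName["Clinical::Patient::vital_status"]
(* <|"Category" -> "Clinical", "Subcategory" -> "Patient", "Property" -> "vital_status"|> *)
```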
Supported data categories:
  Clinical       GDC clinical patient data
  Biospecimen    GDC biospecimen patient data
  ScrapedData    data processed from different sources
  GenomicData    GDC genomic data
Supported data subcategories:
  Patient                    demographic and general patient information
  Drug                       drug treatments
  Radiation                  radiation treatments
  FollowUpXX                 follow-up data
  NewTumorEvent              new tumor events data
  NewTumorEventFollowUpXX    new tumor events follow-up data
  OtherMalignancyForm        other malignancy data
  Ablation                   ablation data
"Clinical" subcategories.
  Aliquot             aliquot data
  Analyte             analyte data
  Auxiliary           auxiliary data
  DiagnosticSlide     metadata on diagnostic slides
  Sample              biospecimen sample data
  Portion             portion data
  Protocol            protocol data
  ShipmentPortion     shipment portion
  Slide               slide metadata
  SSFNormalControl    sample submission form normal control
  SSFTumorSample      sample submission form tumor sample data
"Biospecimen" subcategories.
  RadiologicalImages       radiological metadata from TCIA
  HistologicalImages       histological images metadata from GDC API
  FollowUp                 union of different versions of Clinical FollowUp data
  NewTumorEventFollowUp    union of different versions of Clinical NewTumorEventFollowUp data
"ScrapedData" subcategories.
  SimpleNucleotideVariation_MaskedSomaticMutation    GDC data for category "simple nucleotide variation" and type "masked somatic mutation"
"GenomicData" subcategories.
The standard name of a property can be assembled manually following the structure described above, or copied directly from the user interface TCGADataToolUserInterface[]: in the "Data filtering" and "Data inspection" panes, click the Label value in the table that summarizes the column metadata.
Properties retrieval
  Query            built-in functionality to interact with tabular data
  pullDataSlice    custom function to extract property values
Ways to retrieve patient property values.
Use Query to extract data for the first 3 patients for property "Clinical::Patient::vital_status":
In[21]:=
Query[;;3,"Clinical","Patient",All,"vital_status"]@exampleDataTCGA[{"TCGAProjectData","TCGACESCExceptGenomicDataAllPatients"}]
Out[21]=
{{Alive},{Dead},{Alive}}
Use pullDataSlice to extract data for the property "Clinical::Patient::vital_status" and keep the first 3 patients:
In[23]:=
pullDataSlice[exampleDataTCGA[{"TCGAProjectData","TCGACESCExceptGenomicDataAllPatients"}],{"Clinical::Patient::vital_status"}][[;;3]]
Out[23]=
{<|Clinical::Patient::vital_status -> {Alive}|>, <|Clinical::Patient::vital_status -> {Dead}|>, <|Clinical::Patient::vital_status -> {Alive}|>}
Related Guides
  ▪ TCGA Data Tool
Related Tech Notes
  ▪ Data Modeling
  ▪ Data Visualization
  ▪ User Interface

© 2025 Wolfram. All rights reserved.
