Wolfram Language Paclet Repository

Community-contributed installable additions to the Wolfram Language


TCGADataTool

Images Download
The TCGA Data Tool user interface allows easy retrieval of data for the TCGA projects from the GDC and TCIA portals. Data categories such as ScrapedData::HistologicalImages and ScrapedData::RadiologicalImages do not contain the actual patient images, only their metadata. Histological images can be downloaded with a separate paclet function, while radiological images can be downloaded from the dedicated pane of the user interface.
This loads the paclet.
In[1]:=
Needs["JaneShenGunther`TCGADataTool`"]
Histological images
Histological images can be downloaded from the GDC portal using the function getHistologicalImages.
getHistologicalImages[dataStructure, patientUUID]
    downloads all histological images for patientUUID
Get the example data structure for the TCGA-CESC project, then select a valid patient UUID from that data structure.
In[2]:=
dataStructure = exampleDataTCGA[{"TCGAProjectData", "TCGACESCExceptGenomicDataAllPatients"}];
In[3]:=
examplePatientUUID="03804F9B-DF7C-462C-8984-8EB3A5ED4999";
Download all histological images for the selected patient UUID.
In[4]:=
getHistologicalImages[dataStructure, examplePatientUUID]
Mon 27 Feb 2023 14:34:18 Getting histological images for patient UUID 03804F9B-DF7C-462C-8984-8EB3A5ED4999...
Mon 27 Feb 2023 14:34:18 Identified 2 histological images for patient UUID 03804F9B-DF7C-462C-8984-8EB3A5ED4999
Mon 27 Feb 2023 14:34:18 Total histological images file size for patient UUID 03804F9B-DF7C-462C-8984-8EB3A5ED4999: 400.982 MB
Mon 27 Feb 2023 14:34:18 Starting download...
Mon 27 Feb 2023 14:34:18 Downloading file ID 821e284e-0846-466e-862a-3d8e80cc4b3a: file size: 171.884 MB, file name: TCGA-VS-A9V2-01Z-00-DX1.61AD73DF-4A1D-4398-9558-00ADAF82952A.svs, file experimental strategy: Diagnostic Slide
Mon 27 Feb 2023 14:35:07 Download completed successfully.
Mon 27 Feb 2023 14:35:07 Downloading file ID 89fb2c28-2eba-406c-a687-385e8ccfc468: file size: 229.098 MB, file name: TCGA-VS-A9V2-01A-01-TS1.84BB47F0-6ECC-4C57-BBFF-7A833624E45E.svs, file experimental strategy: Tissue Slide
Mon 27 Feb 2023 14:36:09 Download completed successfully.
Mon 27 Feb 2023 14:36:09 Download completed.
Out[4]=
{/private/var/folders/9s/ssw_w4pj5tsfxv3r6yrjj1mw0000gn/T/TCGADataTool/03804F9B-DF7C-462C-8984-8EB3A5ED4999/821e284e-0846-466e-862a-3d8e80cc4b3a/TCGA-VS-A9V2-01Z-00-DX1.61AD73DF-4A1D-4398-9558-00ADAF82952A.svs,
 /private/var/folders/9s/ssw_w4pj5tsfxv3r6yrjj1mw0000gn/T/TCGADataTool/03804F9B-DF7C-462C-8984-8EB3A5ED4999/89fb2c28-2eba-406c-a687-385e8ccfc468/TCGA-VS-A9V2-01A-01-TS1.84BB47F0-6ECC-4C57-BBFF-7A833624E45E.svs}
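As Out[4] shows, getHistologicalImages returns the local paths of the downloaded slide files. Because these files are large (about 400 MB in total for this patient), it can be worth checking them before further processing. The sketch below uses a hypothetical helper, summarizeDownloads, that is not part of the paclet; it relies only on built-in file functions and reports sizes in units of 10^6 bytes, which may not match the unit convention used in the paclet's own log messages.

```wolfram
(* summarizeDownloads is a hypothetical helper, not a paclet function.
   It takes a list of local file paths, such as the list returned by
   getHistologicalImages, and returns one association per file. *)
ClearAll[summarizeDownloads];
summarizeDownloads[files : {___String}] :=
 Map[
  <|"FileName" -> FileNameTake[#],
    "ExistsQ" -> FileExistsQ[#],
    (* size in units of 10^6 bytes; Missing[...] if the file is absent *)
    "SizeMB" -> If[FileExistsQ[#], N[FileByteCount[#]/10^6], Missing["NotAvailable"]]|> &,
  files]
```

For example, summarizeDownloads[Out[4]] // Dataset tabulates the two slide files downloaded above without downloading them again.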
Radiological images
Radiological images can be downloaded and inspected from the TCGADataToolUserInterface. In addition, radiological images can be downloaded and processed in batches using the function radiologicalImagesBatchProcessing. In general, the user interface is best suited to inspecting radiological images for a small sample of patients, while the batch processing functionality is designed to process many radiological images at once in order to extract meaningful patient measurements for inclusion in a design matrix.
radiologicalImagesBatchProcessing[dataStructure, {patientUUID1, ...}]
    downloads and processes the radiological images for the listed patient UUIDs
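Judging from its option names, batch processing applies each function given in "ImageProcessingFunctions" to every image of a patient and then reduces the per-image results with the "PatientImageProcessingResultsAggregationFunction". The following is a minimal sketch of that idea on images already in memory, assuming one aggregated value per processing function. processPatientImages is a hypothetical name; the sketch does not reproduce the paclet's implementation, and downloading, file cleanup, and the paclet's output format are omitted.

```wolfram
(* processPatientImages is a hypothetical helper illustrating the idea only;
   it is not the paclet's implementation. Each function in fns is mapped over
   all images of one patient, and the resulting list is reduced with aggregate,
   giving one aggregated value per processing function. *)
ClearAll[processPatientImages];
processPatientImages[images : {__Image}, fns_List, aggregate_] :=
 Map[Function[f, aggregate[Map[f, images]]], fns]

(* Usage on two tiny synthetic images: per-image pixel sums 1. and 2.,
   aggregated with Mean *)
processPatientImages[{Image[{{0., 1.}}], Image[{{1., 1.}}]},
 {Total[Flatten[ImageData[#]]] &}, Mean]
(* -> {1.5} *)
```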
Get the example data for the TCGA-CESC project and select a set of valid patient UUIDs.
In[29]:=
dataStructure = exampleDataTCGA[{"TCGAProjectData", "TCGACESCExceptGenomicDataAllPatients"}];
In[79]:=
patientUUIDs={"03804f9b-df7c-462c-8984-8eb3a5ed4999"};
Define a processing function to apply to each radiological image. The function below computes the fraction of white pixels (a value between 0 and 1) in the binarized image.
In[35]:=
ClearAll[computeWhitePercentage];
computeWhitePercentage[img_Image] :=
 Module[{binaryImg, imgData, whitePixelsCount, pixelsCount},
  (* binarize: every pixel becomes 0 (black) or 1 (white) *)
  binaryImg = Binarize[img];
  imgData = ImageData[binaryImg];
  (* count white pixels and all pixels *)
  whitePixelsCount = Count[Flatten[imgData], 1];
  pixelsCount = Length[Flatten[imgData]];
  (* fraction of white pixels, between 0 and 1 *)
  N[whitePixelsCount/pixelsCount]
 ]
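Because the pixel values of a binarized image are 0 (black) or 1 (white), the same measurement can also be written as the mean pixel value. The sketch below, with the hypothetical name computeWhiteFraction, is an illustrative alternative rather than something the paclet requires; assuming the images reach the processing function as Image objects, it should be interchangeable with computeWhitePercentage in "ImageProcessingFunctions".

```wolfram
(* computeWhiteFraction: illustrative alternative to computeWhitePercentage.
   Binarize gives pixel values 0 or 1, so the mean pixel value equals the
   fraction of white pixels; N converts the exact rational to a real number. *)
ClearAll[computeWhiteFraction];
computeWhiteFraction[img_Image] := N[Mean[Flatten[ImageData[Binarize[img]]]]]
```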
Batch process all images for the selected patient UUIDs. For each patient, aggregate the per-image results with Mean, giving the mean fraction of white pixels.
In[80]:=
processingResult = radiologicalImagesBatchProcessing[
  dataStructure,
  patientUUIDs,
  "DeleteDownloadedFilesAfterProcessingQ" -> True,
  "ImageProcessingFunctions" -> {computeWhitePercentage[#] &},
  "PatientImageProcessingResultsAggregationFunction" -> (Mean[#] &)
 ]
Mon 27 Feb 2023 18:47:36 Processing patient UUID 03804f9b-df7c-462c-8984-8eb3a5ed4999
Mon 27 Feb 2023 18:47:36 Getting radiological images for patient UUID 03804f9b-df7c-462c-8984-8eb3a5ed4999...
Mon 27 Feb 2023 18:47:36 Identified 11 radiological image series for patient UUID 03804f9b-df7c-462c-8984-8eb3a5ed4999
Mon 27 Feb 2023 18:47:36 Download for patient UUID 03804f9b-df7c-462c-8984-8eb3a5ed4999 started...
Mon 27 Feb 2023 18:47:56 Download for patient UUID 03804f9b-df7c-462c-8984-8eb3a5ed4999 completed.
Mon 27 Feb 2023 18:47:57 Starting processing of 386 images...
Mon 27 Feb 2023 18:48:58 Processing of images completed.
Mon 27 Feb 2023 18:48:58 Processing results aggregation...
Mon 27 Feb 2023 18:48:58 Deleted downloaded files for patient UUID 03804f9b-df7c-462c-8984-8eb3a5ed4999
Mon 27 Feb 2023 18:48:58 Done processing for patient UUID 03804f9b-df7c-462c-8984-8eb3a5ed4999
Out[80]=
{bcr_patient_uuid -> 03804f9b-df7c-462c-8984-8eb3a5ed4999, ImageProcessingResult -> 0.27155}
The result can then be passed to buildDesignMatrix through its "AdditionalPatientInformation" option.
In[81]:=
outcomeProperty = "Clinical::Patient::vital_status";
predictors = {
   "Clinical::Patient::person_neoplasm_cancer_status",
   "Clinical::Patient::eastern_cancer_oncology_group",
   "Biospecimen::Analyte::analyte_type"};
In[94]:=
designMatrix = buildDesignMatrix[dataStructure, predictors, outcomeProperty,
   "GeneralListHandlingFunction" -> (If[ListQ[#], First[#], #] &),
   "AdditionalPatientInformation" -> processingResult,
   "DeleteAllSame" -> False];
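The "GeneralListHandlingFunction" value used above is an ordinary pure function, so its effect can be checked in isolation: list-valued entries are replaced by their first element, and all other values pass through unchanged. The name firstIfList and the sample values below are purely illustrative.

```wolfram
(* The same pure function passed as "GeneralListHandlingFunction" above,
   bound to an illustrative name for inspection. *)
firstIfList = If[ListQ[#], First[#], #] &;

firstIfList /@ {{"a", "b"}, "c", {1, 2}}
(* -> {"a", "c", 1} *)
```

Note that an empty list would trigger a First::nofirst message, so a different handling function may be preferable if a property value can be an empty list.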
In[95]:=
designMatrix//Length
Out[95]=
307
In[96]:=
designMatrix〚;;3〛//Dataset
Out[96]=
Related Guides
  • TCGA Data Tool
Related Tech Notes
  • User Interface
  • Data Modeling

© 2025 Wolfram. All rights reserved.
