Wolfram Language Paclet Repository

Community-contributed installable additions to the Wolfram Language

TCGADataTool

Genomic Data
Genomic Data walkthrough
DNA methylation download
An example workflow for using genomic data from the TCGA Data Tool.
Load the package:
Needs["JaneShenGunther`TCGADataTool`"]
Genomic Data walkthrough
Summary of the genomic data available for the TCGA-CESC project.
Load TCGA-CESC example data structure and its description.
In[72]:=
exampleDataTCGA[{"TCGAProjectData","TCGACESCFullDataScopePatientSample"},"Description"]
Out[72]=
Example data structure for project TCGA-CESC, including 10 randomly sampled patients and full data scope. This example shows how data is stored and organized under the hood by the TCGADataToolUserInterface[]. Files exported in the .m format from the TCGADataToolUserInterface[] will adhere to this format.
In[4]:=
dataStructure=exampleDataTCGA[{"TCGAProjectData","TCGACESCFullDataScopePatientSample"}];

Summary

For each patient, the genomic data is structured as a dataset:
In[62]:=
Dataset[dataStructure〚1〛["GenomicData"]["SimpleNucleotideVariation_MaskedSomaticMutation"]]
Out[62]=
HugoGeneSymbol
EntrezGeneID
Center
NCBIBuild
Chromosome
StartPosition
EndPosition
Strand
VariantClassification
VariantType
CAMTA1
23261
WUGSC
GRCh38
chr1
7736540
7736540
+
Missense_Mutat
ion
SNP
DEPDC1
55635
WUGSC
GRCh38
chr1
68479301
68479301
+
Missense_Mutat
ion
SNP
MFSD14A
64645
WUGSC
GRCh38
chr1
100082009
100082009
+
Missense_Mutat
ion
SNP
HIPK1
204851
WUGSC
GRCh38
chr1
113973045
113973045
+
Missense_Mutat
ion
SNP
GJA8
2703
WUGSC
GRCh38
chr1
147908810
147908810
+
Silent
SNP
CHIT1
1118
WUGSC
GRCh38
chr1
203225849
203225849
+
Missense_Mutat
ion
SNP
TTC13
79573
WUGSC
GRCh38
chr1
230924915
230924915
+
Silent
SNP
RYR2
6262
WUGSC
GRCh38
chr1
237783900
237783900
+
Missense_Mutat
ion
SNP
RYR2
6262
WUGSC
GRCh38
chr1
237793889
237793889
+
Missense_Mutat
ion
SNP
KIF26B
55083
WUGSC
GRCh38
chr1
245686014
245686014
+
Missense_Mutat
ion
SNP
SLC4A1AP
22950
WUGSC
GRCh38
chr2
27664394
27664394
+
Silent
SNP
PRADC1
84279
WUGSC
GRCh38
chr2
73228936
73228936
+
Missense_Mutat
ion
SNP
RANBP2
5903
WUGSC
GRCh38
chr2
108758413
108758413
+
Missense_Mutat
ion
SNP
NFE2L2
4780
WUGSC
GRCh38
chr2
177234071
177234071
+
Missense_Mutat
ion
SNP
ALS2
57679
WUGSC
GRCh38
chr2
201744425
201744425
+
Missense_Mutat
ion
SNP
FARP2
9855
WUGSC
GRCh38
chr2
241492946
241492946
+
Silent
SNP
OR5H1
26341
WUGSC
GRCh38
chr3
98133006
98133006
+
Silent
SNP
TMCC1
23023
WUGSC
GRCh38
chr3
129670520
129670520
+
Missense_Mutat
ion
SNP
ECE2
9718
WUGSC
GRCh38
chr3
184276690
184276690
+
Silent
SNP
ZBTB49
166793
WUGSC
GRCh38
chr4
4320939
4320939
+
Missense_Mutat
ion
SNP
rows 1–20 of
67
columns 1–10 of
140
Data not saved. Save now
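Because each patient's mutation table is a list of associations, it can be summarized with ordinary list operations. The following is a minimal sketch on a hypothetical four-row stand-in (`mockMutations`, with values copied from the table above), not on the paclet's actual data structure; the variable names are illustrative assumptions.

```wolfram
(* Hypothetical stand-in for a few rows and columns of the masked somatic mutation table *)
mockMutations = {
   <|"HugoGeneSymbol" -> "CAMTA1", "Chromosome" -> "chr1", "VariantClassification" -> "Missense_Mutation"|>,
   <|"HugoGeneSymbol" -> "GJA8", "Chromosome" -> "chr1", "VariantClassification" -> "Silent"|>,
   <|"HugoGeneSymbol" -> "RYR2", "Chromosome" -> "chr1", "VariantClassification" -> "Missense_Mutation"|>,
   <|"HugoGeneSymbol" -> "FARP2", "Chromosome" -> "chr2", "VariantClassification" -> "Silent"|>
   };

(* Count how many mutations fall into each classification *)
classificationCounts = Counts[mockMutations[[All, "VariantClassification"]]]

(* Keep only two columns of interest for every row *)
selectedColumns = mockMutations[[All, {"HugoGeneSymbol", "VariantClassification"}]]
```

The same expressions should apply to a real patient entry by replacing `mockMutations` with the list stored under "SimpleNucleotideVariation_MaskedSomaticMutation".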
Columns available for the masked somatic mutations:
In[65]:=
dataStructure〚1〛["GenomicData"]["SimpleNucleotideVariation_MaskedSomaticMutation"]〚1〛//Keys//Multicolumn[#,3]&
Out[65]=
HugoGeneSymbol              Gene               gnomAD_NFE_AF
EntrezGeneID                Feature            gnomAD_OTH_AF
Center                      FeatureType        gnomAD_SAS_AF
NCBIBuild                   OneConsequence     MAX_AF
Chromosome                  Consequence        MAX_AF_POPS
StartPosition               cDNAPosition       gnomAD_non_cancer_AF
EndPosition                 CDSPosition        gnomAD_non_cancer_AFR_AF
Strand                      ProteinPosition    gnomAD_non_cancer_AMI_AF
VariantClassification       AminoAcids         gnomAD_non_cancer_AMR_AF
VariantType                 Codons             gnomAD_non_cancer_ASJ_AF
ReferenceAllele             ExistingVariation  gnomAD_non_cancer_EAS_AF
TumorSeqAllele1             Distance           gnomAD_non_cancer_FIN_AF
TumorSeqAllele2             TranscriptStrand   gnomAD_non_cancer_MID_AF
dbSNP_RS                    GeneSymbol         gnomAD_non_cancer_NFE_AF
dbSNP_Val_Status            SymbolSource       gnomAD_non_cancer_OTH_AF
TumorSampleBarcode          HGNCGeneID         gnomAD_non_cancer_SAS_AF
MatchedNormSampleBarcode    Biotype            gnomAD_non_cancer_MAX_AF_adj
MatchNormSeqAllele1         Canonical          gnomAD_non_cancer_MAX_AF_POPS_adj
MatchNormSeqAllele2         CCDS               ClinicalSignificance
TumorValidationAllele1      ENSP               Somatic
TumorValidationAllele2      SwissProt          PubmedID
MatchNormValidationAllele1  TrEMBL             TranscriptionFactors
MatchNormValidationAllele2  UniParc            MotifName
VerificationStatus          UniProtIsoform     MotifPosition
ValidationStatus            RefSeq             HighInformationPositionFlag
MutationStatus              Mane               MotifScoreChange
SequencingPhase             APPRIS             miRNA
SequenceSource              Flags              Impact
ValidationMethod            SIFT               Pick
Score                       PolyPhen           VariantClass
BAMFile                     EXON               TranscriptSupportLevel
Sequencer                   Intron             HGVSOffset
TumorSampleUUID             Domains            Phenotype
MatchedNormSampleUUID       1000G_AF           GenePhenotype
HGVSc                       1000G_AFR_AF       Context
HGVSp                       1000G_AMR_AF       TumorBAMUUID
HGVSpShort                  1000G_EAS_AF       normal_bam_uuid
TranscriptID                1000G_EUR_AF       bcr_patient_uuid
ExonNumber                  1000G_SAS_AF       GDCFilter
t_depth                     ESP_AA_AF          COSMIC
t_ref_count                 ESP_EA_AF          Hotspot
t_alt_count                 gnomAD_AF          RNASupport
n_depth                     gnomAD_AFR_AF      RNADepth
n_ref_count                 gnomAD_AMR_AF      RNARefCount
n_alt_count                 gnomAD_ASJ_AF      RNAAltCount
AllEffects                  gnomAD_EAS_AF      Callers
Allele                      gnomAD_FIN_AF
To demonstrate selecting genomic data for different samples, we first identify patients whose data come from multiple aliquots, that is, patients with more than one distinct “TumorSampleBarcode” resulting from multiple genomic files.
Define a variable holding the patients with multiple samples tested for genomic data:
In[66]:=
dataPatientwithmultiplegenomicfiles=Select[Length[Union[Query[All,"TumorSampleBarcode"]@(#["GenomicData"]["SimpleNucleotideVariation_MaskedSomaticMutation"])]]>1&]@dataStructure;
Length[dataPatientwithmultiplegenomicfiles]
Out[67]=
1
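The selection above keeps a patient only when the mutation rows carry more than one distinct barcode. A small self-contained sketch of the same idea follows; `mockPatients`, `barcodeCount`, and the "SAMPLE-…" barcodes are hypothetical placeholders, and only the nesting of keys is taken from the example data structure.

```wolfram
mutationKey = "SimpleNucleotideVariation_MaskedSomaticMutation";

(* Hypothetical patients: the first has one sample, the second has two *)
mockPatients = {
   <|"GenomicData" -> <|mutationKey -> {
        <|"HugoGeneSymbol" -> "RYR2", "TumorSampleBarcode" -> "SAMPLE-A"|>,
        <|"HugoGeneSymbol" -> "ALS2", "TumorSampleBarcode" -> "SAMPLE-A"|>}|>|>,
   <|"GenomicData" -> <|mutationKey -> {
        <|"HugoGeneSymbol" -> "RYR2", "TumorSampleBarcode" -> "SAMPLE-B"|>,
        <|"HugoGeneSymbol" -> "ECE2", "TumorSampleBarcode" -> "SAMPLE-C"|>}|>|>
   };

(* Number of distinct sample barcodes among a patient's mutation rows *)
barcodeCount[patient_] :=
  Length[DeleteDuplicates[patient["GenomicData"][mutationKey][[All, "TumorSampleBarcode"]]]];

(* Keep only patients whose mutations come from more than one sample *)
multiSamplePatients = Select[mockPatients, barcodeCount[#] > 1 &];
Length[multiSamplePatients]
```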

Extract data based on sample type

Define example data structure
In[1]:=
dataWithMultiplegenomics=Union[
  dataStructure〚;;3〛,
  (* adding a patient known to have multiple files *)
  dataPatientwithmultiplegenomicfiles];
The output differs when restricting to a single sample type:
Define sample type
In[1]:=
$sampletype="Primary Tumor";
Example showing how to restrict to a sample type, and displaying differences
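One possible way to restrict mutation rows to a sample type is sketched here. It assumes the sample type is inferred from characters 14 and 15 of a standard TCGA barcode (for example, 01 for a primary tumor); the paclet may store or derive the sample type differently. The names `sampleTypeCodes`, `sampleTypeFromBarcode`, `sampleTypeOfInterest`, and the placeholder barcodes are illustrative assumptions, and the code labels are shortened.

```wolfram
(* Assumption: sample type is read from characters 14-15 of the TCGA barcode *)
sampleTypeCodes = <|"01" -> "Primary Tumor", "06" -> "Metastatic",
   "10" -> "Blood Derived Normal", "11" -> "Solid Tissue Normal"|>;

sampleTypeFromBarcode[barcode_String] :=
  Lookup[sampleTypeCodes, StringTake[barcode, {14, 15}], "Other"];

(* Hypothetical rows; the barcodes are placeholders, not real TCGA samples *)
mockRows = {
   <|"HugoGeneSymbol" -> "RYR2", "TumorSampleBarcode" -> "TCGA-00-0000-01A-00D-0000-00"|>,
   <|"HugoGeneSymbol" -> "ALS2", "TumorSampleBarcode" -> "TCGA-00-0000-06A-00D-0000-00"|>,
   <|"HugoGeneSymbol" -> "ECE2", "TumorSampleBarcode" -> "TCGA-00-0000-01A-00D-0000-00"|>
   };

sampleTypeOfInterest = "Primary Tumor";

(* Keep only rows whose barcode maps to the requested sample type *)
restrictedRows = Select[mockRows,
   sampleTypeFromBarcode[#["TumorSampleBarcode"]] === sampleTypeOfInterest &];

(* Row counts before and after the restriction *)
{Length[mockRows], Length[restrictedRows]}
```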

Create example data to extend existing design matrix

Define sample type
Example 1: Computation based on all columns for a given patient, determining the total count of high impact mutations.
Determine total high impact
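A count of high impact mutations can be sketched from the "Impact" column listed among the available keys. The rows below are hypothetical, the impact values are invented for illustration and are not real annotations of these genes, and the value "HIGH" is assumed to follow the usual variant-annotation convention (HIGH, MODERATE, LOW, MODIFIER).

```wolfram
(* Hypothetical mutation rows for one patient; the Impact values are made up *)
mockMutations = {
   <|"HugoGeneSymbol" -> "RYR2", "Impact" -> "MODERATE"|>,
   <|"HugoGeneSymbol" -> "NFE2L2", "Impact" -> "HIGH"|>,
   <|"HugoGeneSymbol" -> "ECE2", "Impact" -> "LOW"|>,
   <|"HugoGeneSymbol" -> "ALS2", "Impact" -> "HIGH"|>
   };

(* Total number of rows annotated as high impact *)
highImpactCount = Count[mockMutations[[All, "Impact"]], "HIGH"]
```

A per-patient count of this kind is a single number, which makes it a natural candidate for an additional column in a design matrix.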
Example 2: Computation based on single mutation
Define the mutation of interest
Determine total high impact mutations
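A computation based on a single mutation can be reduced to a per-patient indicator. The sketch below flags whether each patient carries any mutation in a gene of interest; `mockPatients`, the patient labels, and `geneOfInterest` are hypothetical, and matching on "HugoGeneSymbol" alone is an assumed simplification (a real analysis might also match on position or protein change).

```wolfram
geneOfInterest = "RYR2";

(* Hypothetical patients keyed by placeholder labels, each with a list of mutation rows *)
mockPatients = <|
   "patient-1" -> {
     <|"HugoGeneSymbol" -> "RYR2", "VariantClassification" -> "Missense_Mutation"|>,
     <|"HugoGeneSymbol" -> "ALS2", "VariantClassification" -> "Missense_Mutation"|>},
   "patient-2" -> {
     <|"HugoGeneSymbol" -> "ECE2", "VariantClassification" -> "Silent"|>}
   |>;

(* True when at least one of the patient's rows is in the gene of interest *)
hasMutation = Map[MemberQ[#[[All, "HugoGeneSymbol"]], geneOfInterest] &, mockPatients]
```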

Extend a design matrix

Extend a design matrix using the "AdditionalPatientInformation" option of buildDesignMatrix.
Define the variables for design matrix creation
Create a design matrix
DNA methylation download
Load TCGA-CESC example data structure and its description.

Get methylation data

Workflow functions for methylation data import.
Select the project and patient UUIDs for the example
Select the download folder
Download methylation raw data
Inspect the data

Get human methylation genomic coordinates and merge data

Get data for human methylation genomic coordinates from Wolfram Data Repository.
Merge methylation data with genomic coordinates
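Merging per-probe beta values with genomic coordinates amounts to a join on a shared probe identifier. The sketch below uses the built-in JoinAcross on hypothetical data; the column names ("ProbeID", "Beta", "Chromosome", "Position"), the probe labels, and all values are assumptions, not the actual schema of the downloaded files or of the Wolfram Data Repository resource.

```wolfram
(* Hypothetical beta values for one patient, one row per probe *)
mockBetaValues = {
   <|"ProbeID" -> "probe-1", "Beta" -> 0.12|>,
   <|"ProbeID" -> "probe-2", "Beta" -> 0.87|>,
   <|"ProbeID" -> "probe-3", "Beta" -> 0.45|>
   };

(* Hypothetical genomic coordinates; probe-3 is deliberately missing *)
mockCoordinates = {
   <|"ProbeID" -> "probe-1", "Chromosome" -> "chr1", "Position" -> 1000|>,
   <|"ProbeID" -> "probe-2", "Chromosome" -> "chr2", "Position" -> 2000|>
   };

(* Inner join: only probes present in both lists are kept *)
merged = JoinAcross[mockBetaValues, mockCoordinates, "ProbeID"]
```

With the default inner join, probes without coordinates are dropped; passing "Left" as a fourth argument would keep them with missing coordinate fields instead.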

Brief data exploration

Create an overlapped histogram of beta values from the two patients
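An overlapped histogram of two beta-value samples can be sketched with the built-in Histogram, which overlays multiple datasets when given a list of lists. The data here are simulated from beta distributions purely for illustration; `betaPatient1` and `betaPatient2` are hypothetical stand-ins for the two patients' downloaded values.

```wolfram
SeedRandom[1];

(* Simulated stand-ins for two patients' beta values, all in the interval [0, 1] *)
betaPatient1 = RandomVariate[BetaDistribution[0.5, 0.5], 1000];
betaPatient2 = RandomVariate[BetaDistribution[2, 5], 1000];

(* Overlay both distributions using a common bin width of 0.05 *)
betaHistogram = Histogram[{betaPatient1, betaPatient2}, {0, 1, 0.05},
   ChartLegends -> {"Patient 1", "Patient 2"},
   AxesLabel -> {"Beta value", "Count"}]
```

Fixing the bin specification to the unit interval keeps the two distributions on identical bins, which makes them directly comparable.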
Compare distributions for a specific gene name

© 2026 Wolfram. All rights reserved.
