Wolfram Language Paclet Repository

Community-contributed installable additions to the Wolfram Language


TCGADataTool

Genomic Data
Genomic Data walkthrough
DNA methylation download
An example workflow for using genomic data from the TCGA Data Tool.
This loads the package.
Needs["JaneShenGunther`TCGADataTool`"]
Genomic Data walkthrough
Summary of the genomic data available for the TCGA-CESC project.
Load TCGA-CESC example data structure and its description.
In[84]:=
exampleDataTCGA[{"TCGAProjectData","TCGACESCFullDataScopePatientSample"},"Description"]
Out[84]=
Example data structure for project TCGA-CESC, including 10 randomly sampled patients and full data scope. This example shows how data is stored and organized under the hood by the TCGADataToolUserInterface[]. Files exported in the .m format from the TCGADataToolUserInterface[] will adhere to this format.
In[85]:=
dataStructure=exampleDataTCGA[{"TCGAProjectData","TCGACESCFullDataScopePatientSample"}];

Summary

For each patient, Genomic Data is structured as a List of Associations.
In[112]:=
Short[#,10]&@dataStructure〚1〛["GenomicData"]["SimpleNucleotideVariation_MaskedSomaticMutation"]
Out[112]//Short=
{HugoGeneSymbolCAMTA1,EntrezGeneID23261,CenterWUGSC,NCBIBuildGRCh38,Chromosomechr1,StartPosition7736540,EndPosition7736540,Strand+,VariantClassificationMissense_Mutation,VariantTypeSNP,ReferenceAlleleG,TumorSeqAllele1G,TumorSeqAllele2A,dbSNP_RSrs1285071931,dbSNP_Val_StatusMissing[],TumorSampleBarcodeTCGA-VS-A8EH-01A-11D-A36J-09,109,HGVSOffsetMissing[],Phenotype0;1,GenePhenotype1,ContextATGGCGGTAAG,TumorBAMUUID9994b31f-bde3-408a-bfb8-626faa375aac,normal_bam_uuid32be9198-0aad-4fd7-ae06-f19cbd44868e,bcr_patient_uuid05026179-b1da-411e-a286-89727b1ae380,GDCFilterMissing[],COSMICMissing[],HotspotN,RNASupportUnknown,RNADepthMissing[],RNARefCountMissing[],RNAAltCountMissing[],Callersmuse;mutect2;varscan2,65,1}
Columns available for the masked somatic mutations:
In[87]:=
dataStructure〚1〛["GenomicData"]["SimpleNucleotideVariation_MaskedSomaticMutation"]〚1〛//Keys//Multicolumn[#,3]&
Out[87]=
HugoGeneSymbol              Gene               gnomAD_NFE_AF
EntrezGeneID                Feature            gnomAD_OTH_AF
Center                      FeatureType        gnomAD_SAS_AF
NCBIBuild                   OneConsequence     MAX_AF
Chromosome                  Consequence        MAX_AF_POPS
StartPosition               cDNAPosition       gnomAD_non_cancer_AF
EndPosition                 CDSPosition        gnomAD_non_cancer_AFR_AF
Strand                      ProteinPosition    gnomAD_non_cancer_AMI_AF
VariantClassification       AminoAcids         gnomAD_non_cancer_AMR_AF
VariantType                 Codons             gnomAD_non_cancer_ASJ_AF
ReferenceAllele             ExistingVariation  gnomAD_non_cancer_EAS_AF
TumorSeqAllele1             Distance           gnomAD_non_cancer_FIN_AF
TumorSeqAllele2             TranscriptStrand   gnomAD_non_cancer_MID_AF
dbSNP_RS                    GeneSymbol         gnomAD_non_cancer_NFE_AF
dbSNP_Val_Status            SymbolSource       gnomAD_non_cancer_OTH_AF
TumorSampleBarcode          HGNCGeneID         gnomAD_non_cancer_SAS_AF
MatchedNormSampleBarcode    Biotype            gnomAD_non_cancer_MAX_AF_adj
MatchNormSeqAllele1         Canonical          gnomAD_non_cancer_MAX_AF_POPS_adj
MatchNormSeqAllele2         CCDS               ClinicalSignificance
TumorValidationAllele1      ENSP               Somatic
TumorValidationAllele2      SwissProt          PubmedID
MatchNormValidationAllele1  TrEMBL             TranscriptionFactors
MatchNormValidationAllele2  UniParc            MotifName
VerificationStatus          UniProtIsoform     MotifPosition
ValidationStatus            RefSeq             HighInformationPositionFlag
MutationStatus              Mane               MotifScoreChange
SequencingPhase             APPRIS             miRNA
SequenceSource              Flags              Impact
ValidationMethod            SIFT               Pick
Score                       PolyPhen           VariantClass
BAMFile                     EXON               TranscriptSupportLevel
Sequencer                   Intron             HGVSOffset
TumorSampleUUID             Domains            Phenotype
MatchedNormSampleUUID       1000G_AF           GenePhenotype
HGVSc                       1000G_AFR_AF       Context
HGVSp                       1000G_AMR_AF       TumorBAMUUID
HGVSpShort                  1000G_EAS_AF       normal_bam_uuid
TranscriptID                1000G_EUR_AF       bcr_patient_uuid
ExonNumber                  1000G_SAS_AF       GDCFilter
t_depth                     ESP_AA_AF          COSMIC
t_ref_count                 ESP_EA_AF          Hotspot
t_alt_count                 gnomAD_AF          RNASupport
n_depth                     gnomAD_AFR_AF      RNADepth
n_ref_count                 gnomAD_AMR_AF      RNARefCount
n_alt_count                 gnomAD_ASJ_AF      RNAAltCount
AllEffects                  gnomAD_EAS_AF      Callers
Allele                      gnomAD_FIN_AF
To demonstrate selecting genomic data for different samples, we first identify patients who have data for multiple aliquots, that is, multiple distinct “TumorSampleBarcode” values resulting from multiple genomic files.
Define a variable holding the patients with multiple samples tested for genomic data.
In[88]:=
dataPatientwithmultiplegenomicfiles=Select[Length[Union[Query[All,"TumorSampleBarcode"]@(#["GenomicData"]["SimpleNucleotideVariation_MaskedSomaticMutation"])]]>1&]@dataStructure;
Length[dataPatientwithmultiplegenomicfiles]
Out[89]=
1
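The selection above keeps a patient only when the mutation table contains more than one distinct barcode. A minimal sketch of the same test on two hand-made patients follows; the patient records, the barcode strings, and the helper name distinctBarcodes are hypothetical and serve only to make the logic easy to check by eye.

```wolfram
(* Hypothetical patients; barcodes are placeholders, not real TCGA identifiers *)
toyPatients = {
   <|"bcr_patient_uuid" -> "patient-A",
     "GenomicData" -> <|"SimpleNucleotideVariation_MaskedSomaticMutation" -> {
         <|"TumorSampleBarcode" -> "A-aliquot-1"|>,
         <|"TumorSampleBarcode" -> "A-aliquot-1"|>}|>|>,
   <|"bcr_patient_uuid" -> "patient-B",
     "GenomicData" -> <|"SimpleNucleotideVariation_MaskedSomaticMutation" -> {
         <|"TumorSampleBarcode" -> "B-aliquot-1"|>,
         <|"TumorSampleBarcode" -> "B-aliquot-2"|>}|>|>
   };

(* Hypothetical helper: distinct barcodes found in one patient's mutation table *)
distinctBarcodes[patient_] :=
  Union[Query[All, "TumorSampleBarcode"]@
    patient["GenomicData"]["SimpleNucleotideVariation_MaskedSomaticMutation"]];

(* Patients whose mutations come from more than one aliquot *)
multiAliquot = Select[toyPatients, Length[distinctBarcodes[#]] > 1 &];

multiAliquotIDs = Query[All, "bcr_patient_uuid"]@multiAliquot
```

Only "patient-B" is kept, because it is the only record with two different barcodes.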

Extract data based on sample type

Define example data structure
In[90]:=
dataWithMultiplegenomics=
 Union[
  dataStructure〚;;3〛,
  (* adding a patient known to have multiple files *)
  dataPatientwithmultiplegenomicfiles];
Show how the output differs when restricting to a single sample type:
Define sample type
In[91]:=
$sampletype="Primary Tumor";
Example showing how to restrict to a sample type, and displaying differences
In[120]:=
Query[
 All,
 <|
  "bcr_patient_uuid"->#["bcr_patient_uuid"],
  "Clinical::Patient::weight"->Query[First,"weight"]@#["Clinical","Patient"],
  "Clinical::Patient::height"->Query[First,"height"]@#["Clinical","Patient"],

  (* here there is no constraint on the sample type *)
  "Impact"->(Query[Counts,"Impact"]@#["GenomicData","SimpleNucleotideVariation_MaskedSomaticMutation"]),
  "Impact_high"->(Query[Counts,"Impact"]@#["GenomicData","SimpleNucleotideVariation_MaskedSomaticMutation"])["HIGH"],

  (* here we add a constraint on the sample type by "Select[#["sample_type"]==$sampletype&]" *)
  "Impact_in_selected_sampletype"->
   Query[Counts,"Impact"]@
    Query[Select[#["sample_type"]==$sampletype&],All]@
     Query[All,{"Impact","sample_type"}]@
      JoinAcross[
       (Query[All,{"TumorSampleBarcode","Impact"}]@#["GenomicData","SimpleNucleotideVariation_MaskedSomaticMutation"]),
       JoinAcross[
        (* we need a further JoinAcross because TumorSampleBarcode is equivalent to "bcr_aliquot_barcode" and not simply to "bcr_sample_barcode" *)
        (Query[All,{"bcr_sample_barcode","sample_type"}]@#["Biospecimen","Sample"]),
        (Query[All,{"bcr_aliquot_barcode","bcr_sample_barcode"}]@#["Biospecimen","Aliquot"]),
        "bcr_sample_barcode",
        "Outer"],
       "TumorSampleBarcode"->"bcr_aliquot_barcode"
      ],

  "Impact_high_in_selected_sampletype"->
   (Query[Counts,"Impact"]@
     Query[Select[#["sample_type"]==$sampletype&],All]@
      Query[All,{"Impact","sample_type"}]@
       JoinAcross[
        (Query[All,{"TumorSampleBarcode","Impact"}]@#["GenomicData","SimpleNucleotideVariation_MaskedSomaticMutation"]),
        JoinAcross[
         (Query[All,{"bcr_sample_barcode","sample_type"}]@#["Biospecimen","Sample"]),
         (Query[All,{"bcr_aliquot_barcode","bcr_sample_barcode"}]@#["Biospecimen","Aliquot"]),
         "bcr_sample_barcode",
         "Outer"],
        "TumorSampleBarcode"->"bcr_aliquot_barcode"
       ])["HIGH"]
 |>&
]@dataWithMultiplegenomics
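The restriction to a sample type rests on a two-step join: samples are joined to aliquots on "bcr_sample_barcode", and the result is joined to the mutations by matching "TumorSampleBarcode" against "bcr_aliquot_barcode". The sketch below isolates that chain on three hand-made tables. All barcodes and values are hypothetical, and for simplicity it uses the default inner join, whereas the cell above passes "Outer" for the sample-to-aliquot step.

```wolfram
(* Hypothetical toy tables; all barcodes and values are placeholders *)
toyMutations = {
   <|"TumorSampleBarcode" -> "ALQ-1", "Impact" -> "HIGH"|>,
   <|"TumorSampleBarcode" -> "ALQ-1", "Impact" -> "LOW"|>,
   <|"TumorSampleBarcode" -> "ALQ-2", "Impact" -> "HIGH"|>};
toySamples = {
   <|"bcr_sample_barcode" -> "SMP-1", "sample_type" -> "Primary Tumor"|>,
   <|"bcr_sample_barcode" -> "SMP-2", "sample_type" -> "Metastatic"|>};
toyAliquots = {
   <|"bcr_aliquot_barcode" -> "ALQ-1", "bcr_sample_barcode" -> "SMP-1"|>,
   <|"bcr_aliquot_barcode" -> "ALQ-2", "bcr_sample_barcode" -> "SMP-2"|>};

(* Step 1: attach the sample type to every aliquot *)
aliquotWithType = JoinAcross[toySamples, toyAliquots, "bcr_sample_barcode"];

(* Step 2: attach the aliquot's sample type to every mutation;
   the two tables use different key names for the same barcode *)
mutationWithType = JoinAcross[toyMutations, aliquotWithType,
   "TumorSampleBarcode" -> "bcr_aliquot_barcode"];

(* Step 3: restrict to one sample type, then tally impacts *)
primaryImpactCounts = Query[Counts, "Impact"]@
   Query[Select[#["sample_type"] == "Primary Tumor" &]]@mutationWithType
```

In this toy data only aliquot "ALQ-1" belongs to a primary tumor sample, so the tally contains one "HIGH" and one "LOW" mutation, and the "HIGH" mutation from "ALQ-2" is excluded.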

Create example data to extend existing design matrix

Define sample type
Example 1: Computation based on all columns for a given patient, determining the total count of high impact mutations.
Determine total high impact
Example 2: Computation based on single mutation
Define the mutation of interest
Determine total high impact mutations
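The code cells for these two examples are not shown on this page. As a minimal sketch of what such computations could look like, the block below counts high-impact mutations per patient (Example 1) and then restricts the count to one gene (Example 2). The toy patients, the helper names countHighImpact and countHighImpactInGene, the output keys, and the gene symbol "GENE1" are all hypothetical, and reading "mutation of interest" as a gene symbol is an assumption; only the key names "Impact" and "HugoGeneSymbol" come from the column list earlier in this note.

```wolfram
(* Hypothetical patients; identifiers and values are placeholders *)
toyPatients = {
   <|"bcr_patient_uuid" -> "patient-A",
     "GenomicData" -> <|"SimpleNucleotideVariation_MaskedSomaticMutation" -> {
         <|"HugoGeneSymbol" -> "GENE1", "Impact" -> "HIGH"|>,
         <|"HugoGeneSymbol" -> "GENE2", "Impact" -> "LOW"|>,
         <|"HugoGeneSymbol" -> "GENE2", "Impact" -> "HIGH"|>}|>|>,
   <|"bcr_patient_uuid" -> "patient-B",
     "GenomicData" -> <|"SimpleNucleotideVariation_MaskedSomaticMutation" -> {
         <|"HugoGeneSymbol" -> "GENE1", "Impact" -> "MODERATE"|>}|>|>
   };

(* Example 1 (sketch): number of HIGH-impact rows in a mutation table *)
countHighImpact[mutations_List] :=
  Count[mutations, row_Association /; (row["Impact"] === "HIGH")];

(* Example 2 (sketch): the same count, restricted to one gene symbol *)
countHighImpactInGene[mutations_List, gene_String] :=
  Count[mutations,
   row_Association /; (row["Impact"] === "HIGH" && row["HugoGeneSymbol"] === gene)];

(* One row per patient, keyed by patient UUID so it can be joined to other columns *)
extraColumns = Map[
   Function[patient,
    With[{m = patient["GenomicData", "SimpleNucleotideVariation_MaskedSomaticMutation"]},
     <|"bcr_patient_uuid" -> patient["bcr_patient_uuid"],
       "TotalHighImpact" -> countHighImpact[m],
       "HighImpact_GENE1" -> countHighImpactInGene[m, "GENE1"]|>]],
   toyPatients];
```

Using Count with a pattern returns 0 for a patient with no matching rows, which avoids the Missing value that looking up "HIGH" in a tally would give in that case.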

Extend a design matrix

Define the variables for design matrix creation
Create a design matrix
DNA methylation download
Load TCGA-CESC example data structure and its description.

Get methylation data

Workflow functions for methylation data import.
Select project and patients UUID for the example
Select the download folder
Download methylation raw data
Inspect the data

Get human methylation genomic coordinate and merge data

Get data for human methylation genomic coordinates from Wolfram Data Repository.
Merge methylation data with genomic coordinates

Brief data exploration

Create an overlapped histogram of beta values from the two patients
Compare distributions for a specific gene name

© 2025 Wolfram. All rights reserved.
