Function Repository Resource:

NCBIGenomicSNPData (1.0.0) current version: 1.1.0

Retrieve information on reference SNPs from the NCBI database

Contributed by: Keiko Hirayama

ResourceFunction["NCBIGenomicSNPData"][snp, "VariantDetails"]

gives the dataset of variant information for a specified snp.

ResourceFunction["NCBIGenomicSNPData"][snp, "FrequencyData"]

gives the dataset of allele frequencies for a specified snp.

ResourceFunction["NCBIGenomicSNPData"][snp, "ClinicalSignificance"]

gives the dataset of associated diseases for a specified snp.

Details

The retrieved single nucleotide polymorphism (SNP) report is based on the Single Nucleotide Polymorphism database (dbSNP) hosted by the National Center for Biotechnology Information (NCBI).
The snp argument can be given as one of the selected "SNP" entities or as a dbSNP reference SNP number.
The "VariantDetails" content is a Dataset containing known variant placements on genomic sequences with the following properties for each SNP:
"Gene"                                    associated gene ID
"GeneSymbol"                              associated gene symbol
"Orientation"                             orientation of the genomic sequence
"NucleotideSeqAccession"                  NCBI nucleotide sequence accession ID
"NucleotidePosition"                      position of the allele on the nucleotide sequence
"DeletedSequence"                         sequence of deleted nucleotides or the codon
"InsertedSequence"                        sequence of inserted nucleotides or the codon
"NucleotideVarSequenceOntologyAccession"  accession ID of the Sequence Ontology (SO) concept describing the nucleotide sequence variation
"NucleotideVarSequenceOntologyTerm"       name of the Sequence Ontology (SO) concept describing the nucleotide sequence variation
"HGVS"                                    Human Genome Variation Society (HGVS) notation
"ProteinSeqAccession"                     NCBI protein sequence accession ID
"ProteinPosition"                         position of the amino acid change on the protein sequence
"DeletedAminoAcid"                        letter of the deleted amino acid
"InsertedAminoAcid"                       letter of the inserted amino acid
"ProteinVarSequenceOntologyAccession"     accession ID of the Sequence Ontology (SO) concept describing the protein sequence variation
"ProteinVarSequenceOntologyTerm"          name of the Sequence Ontology (SO) concept describing the protein sequence variation
The "FrequencyData" content is a Dataset of the reference and alternate allele frequencies reported by various studies:
"StudyName"           name of the study
"RefSeqAccession"     NCBI reference sequence accession ID
"Position"            position of the allele on the reference sequence
"RefAllele"           reference allele
"AltAllele"           alternate allele
"RefAlleleFrequency"  reported reference allele frequency
"AltAlleleFrequency"  reported alternate allele frequency
"TotalCount"          total sample size
The "ClinicalSignificance" content is a Dataset of clinical significance information from ClinVar associated with the variations:
"AssociatedGenes"       associated gene IDs
"ClinicalSignificance"  reported clinical significance
"DiseaseNames"          names of associated diseases
"MedGen"                associated MedGen concepts
"ClinVarID"             associated ClinVar ID
"AlleleID"              assigned allele ID reported in ClinVar
"ReviewStatus"          assigned review status

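As a rough sketch of how the property names listed in the Details can be used, the returned Dataset can be queried by column. The snippet assumes the results behave like ordinary Dataset objects with the documented keys and that "TotalCount" holds plain numbers; neither assumption is verified here, and the variable names fn, details and freq are illustrative only.

```wolfram
fn = ResourceFunction["NCBIGenomicSNPData"];

(* Keep a few of the documented "VariantDetails" columns *)
details = fn["RS429358", "VariantDetails"];
details[All, {"GeneSymbol", "NucleotideSeqAccession", "HGVS"}]

(* Order "FrequencyData" rows by sample size, largest first
   (assumes "TotalCount" is numeric) *)
freq = fn["RS429358", "FrequencyData"];
freq[SortBy[-#TotalCount &], {"StudyName", "AltAlleleFrequency", "TotalCount"}]
```
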
Examples

Basic Examples (1) 

List variant details for SNP RS429358, a genetic variation in the APOE gene that is associated with a risk of Alzheimer's disease:

In[1]:=
ResourceFunction["NCBIGenomicSNPData"]["RS429358", "VariantDetails"]
Out[1]=

Scope (2) 

Retrieve clinical significance information for a given SNP:

In[2]:=
ResourceFunction["NCBIGenomicSNPData", ResourceVersion->"1.0.0"]["RS748709116", "ClinicalSignificance"]
Out[2]=

Retrieve allele frequency data for a given SNP entity:

In[3]:=
ResourceFunction["NCBIGenomicSNPData"][
 Entity["SNP", "RS1801133"], "FrequencyData"]
Out[3]=

Applications (7) 

Compare the reference and the alternate sequences associated with a given SNP:

In[4]:=
brca = ResourceFunction["NCBIGenomicSNPData"]["80359421", "VariantDetails"]
Out[4]=

Use the ImportFASTA ResourceFunction to retrieve the reference sequence:

In[5]:=
ref = ResourceFunction["ImportFASTA", ResourceVersion -> "1.0.0"][
    brca[1, "NucleotideSeqAccession"]["ExternalID"]][[2, 1]];
In[6]:=
ref // Short
Out[6]=

Compute the alternate sequence using "NucleotidePosition", "DeletedSequence" and "InsertedSequence" information:

In[7]:=
alt = StringReplacePart[ref, brca[1, "InsertedSequence"],
   {brca[1, "NucleotidePosition"],
    brca[1, "NucleotidePosition"] + StringLength[brca[1, "DeletedSequence"]] - 1}];
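
The splice relies on StringReplacePart replacing the characters in a 1-based {start, end} range. A minimal sketch with made-up data (the toy sequence, position and alleles are hypothetical and not taken from dbSNP) shows the mechanics:

```wolfram
(* Hypothetical toy data, not from dbSNP *)
toyRef = "ACGTACGT";
toyPos = 3;          (* 1-based start of the replaced span *)
toyDeleted = "GT";   (* bases removed from the reference *)
toyInserted = "TT";  (* bases put in their place *)

StringReplacePart[toyRef, toyInserted,
 {toyPos, toyPos + StringLength[toyDeleted] - 1}]
(* "ACTTACGT" *)
```

One way to sanity-check the real data before splicing is to confirm that StringTake[ref, {pos, pos + StringLength[del] - 1}] equals the reported "DeletedSequence", where pos and del stand for the "NucleotidePosition" and "DeletedSequence" values; this page does not state the indexing convention of "NucleotidePosition".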

Use the DNAAlignmentPlot function to visualize the allele position:

In[8]:=
ResourceFunction["DNAAlignmentPlot", ResourceVersion -> "1.0.0"][
 Sequence @@ StringTake[{ref, alt}, {4201, 4300}], Method -> "OneOnOne"]
Out[8]=

Next, explore how this change affects the translated peptide sequence. Apply the BioSequenceTranslate function to part of the reference sequence, starting at the variant position, to obtain the corresponding amino acids:

In[9]:=
refPep = BioSequenceTranslate[
  BioSequence[
   StringTake[
    ref, {brca[1, "NucleotidePosition"], brca[1, "NucleotidePosition"] + 14}]]]
Out[9]=

Notice that the reading frame is shifted in the alternate peptide sequence and that a stop codon appears four amino acids downstream:

In[10]:=
altPep = BioSequenceTranslate[
  BioSequence[
   StringTake[
    alt, {brca[1, "NucleotidePosition"], brca[1, "NucleotidePosition"] + 14}]]]
Out[10]=
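
To illustrate why a change whose length is not a multiple of three shifts the reading frame, here is a small sketch on a made-up coding sequence (hypothetical data, unrelated to the BRCA variant in this example). It only splits the strings into triplets and does not perform a translation:

```wolfram
(* Hypothetical 12-base coding sequence, not from dbSNP *)
toyCoding = "ATGGCCAAGTAA";

(* Codons in the original reading frame *)
StringPartition[toyCoding, 3]
(* {"ATG", "GCC", "AAG", "TAA"} *)

(* Delete the single base at position 4 and re-read in triplets;
   the incomplete trailing "AA" is dropped by StringPartition *)
StringPartition[StringDrop[toyCoding, {4}], 3]
(* {"ATG", "CCA", "AGT"} *)
```

Every codon after the deletion point changes, which is the same effect seen when comparing refPep and altPep.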

Compare the molecule plots:

In[11]:=
GraphicsGrid[{{MoleculePlot3D[refPep], MoleculePlot3D[altPep]}}]
Out[11]=

Requirements

Wolfram Language 13.0 (December 2021) or above

Version History

  • 1.1.0 – 17 January 2025
  • 1.0.0 – 11 September 2024
