Function Repository Resource:

VCFRefSNPAnnotation

Annotate the provided VCF dataset with matching RefSNP identifiers

Contributed by: Keiko Hirayama

ResourceFunction["VCFRefSNPAnnotation"][dataset]

returns the VCF dataset with RefSNP annotations.

Details and Options

VCFRefSNPAnnotation identifies the matching RefSNPs for the provided Variant Call Format (VCF) dataset.
VCF data can be given in the form of a dataset, tabular object, association, list data or plain text.
The following option can be given:
"Assembly"    "GCF_000001405.40"    genomic collection accession of the GenBank or RefSeq assembly to use; the default is the accession for GRCh38.p14, the latest human genome assembly
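
As a rough illustration of how the accepted input forms relate to one another, the following sketch converts a plain-text VCF string into list data and then into a dataset using only built-in functions. The variable names vcfText, rows and vcfDataset are hypothetical, and this is not necessarily how VCFRefSNPAnnotation parses its input internally.

```wolfram
(* Tab-separated VCF text: header plus the first two records of the first basic example *)
vcfText = "CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t69134\t2205837\tA\tG\t.\t.\tALLELEID=2193183\n1\t69314\t3205580\tT\tG\t.\t.\tALLELEID=3374047";

(* Parse into a list of rows; the first row holds the column names *)
rows = ImportString[vcfText, "TSV"];

(* Pair each data row with the header to build a dataset of associations *)
vcfDataset = Dataset[AssociationThread[First[rows], #] & /@ Rest[rows]];
```

Under these assumptions, either rows or vcfDataset should be usable as the argument of ResourceFunction["VCFRefSNPAnnotation"], since list data and datasets both appear among the accepted forms.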

Examples

Basic Examples (2) 

Identify the RefSNP IDs for the provided VCF data:

In[1]:=
ResourceFunction[
 "VCFRefSNPAnnotation"]["CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t69134\t2205837\tA\tG\t.\t.\tALLELEID=2193183\n1\t69314\t3205580\tT\tG\t.\t.\tALLELEID=3374047\n1\t69423\t3205581\tG\tA\t.\t.\tALLELEID=3374048"]
Out[1]=

Get the VCF dataset:

In[2]:=
vcfdataset = CloudGet[
CloudObject[
  "https://www.wolframcloud.com/obj/d3c284ef-be89-4863-b51a-4821679759c8"]]
Out[2]=

Identify the RefSNP IDs for the provided VCF dataset:

In[3]:=
ResourceFunction["VCFRefSNPAnnotation"][vcfdataset]
Out[3]=

Scope (2) 

Get the VCF data in tabular format:

In[4]:=
vcftabular = CloudGet[
CloudObject[
  "https://www.wolframcloud.com/obj/414e5064-5ae5-4cb6-a16f-f285065740c6"]]
Out[4]=

Identify the RefSNP IDs for the provided VCF data:

In[5]:=
ResourceFunction["VCFRefSNPAnnotation"][vcftabular]
Out[5]=

Options (1) 

Assembly (1) 

Identify the matching RefSNPs for the VCF dataset based on the assembly GRCh37:

In[6]:=
dat37 = {
   {"CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"},
   {"Y", 249350, "2659854", "G", "A", ".", ".", "."},
   {"Y", 535121, "3036689", "C", "CT", ".", ".", "."},
   {"Y", 535121, "3036935", "C", "CTT", ".", ".", "."},
   {"Y", 535121, "265992", "C", "G", ".", ".", "."},
   {"Y", 535121, "3035933", "CT", "C", ".", ".", "."},
   {"Y", 535122, "3036731", "T", "TTTG", ".", ".", "."},
   {"Y", 535123, "191363", "T", "TTTG", ".", ".", "."},
   {"Y", 535124, "2659855", "T", "G", ".", ".", "."},
   {"Y", 535173, "3571558", "T", "G", ".", ".", "."},
   {"Y", 535258, "191362", "C", "A", ".", ".", "."}};
In[7]:=
ResourceFunction["VCFRefSNPAnnotation"][dat37, "Assembly" -> "GCF_000001405.13"]
Out[7]=
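
Because the option expects an accession rather than an assembly name, a small lookup can make such calls easier to read. The sketch below uses only the two accessions mentioned on this page; assemblyAccession and annotateWithAssembly are hypothetical helper names and are not part of the resource function.

```wolfram
(* Assembly names mapped to the accessions cited on this page *)
assemblyAccession = <|
   "GRCh38.p14" -> "GCF_000001405.40",
   "GRCh37" -> "GCF_000001405.13"
   |>;

(* Hypothetical wrapper: look up the accession, then call the resource function. *)
(* The condition leaves the expression unevaluated for unknown assembly names. *)
annotateWithAssembly[data_, name_String] /; KeyExistsQ[assemblyAccession, name] :=
  ResourceFunction["VCFRefSNPAnnotation"][data, "Assembly" -> assemblyAccession[name]]
```

With these definitions, annotateWithAssembly[dat37, "GRCh37"] issues the same call as the input above.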

Applications (4) 

Annotate the variants dataset with matching RefSNP identifiers:

In[8]:=
daty = CloudGet@CloudObject[
   "https://www.wolframcloud.com/obj/328cc910-e173-48bc-9cb7-aa144f6e4081"];
In[9]:=
datysnp = ResourceFunction["VCFRefSNPAnnotation"][daty]
Out[9]=

Plot the distribution of variant positions on chromosome Y:

In[10]:=
Histogram[datysnp[All, "POS"], 20]
Out[10]=

Use the NCBIGenomicSNPData resource function to get more information on individual identified variations:

In[11]:=
ResourceFunction["NCBIGenomicSNPData"][datysnp[[15, 3]]]
Out[11]=

Find the clinical significance of the same variant:

In[12]:=
ResourceFunction["NCBIGenomicSNPData"][datysnp[[15, 3]], "ClinicalSignificance"]
Out[12]=

Requirements

Wolfram Language 14.0 (January 2024) or above

Version History

  • 1.0.0 – 23 June 2025
