Function Repository Resource:

ImportVCF

Import files in the VCF format, a bioinformatics standard for storing gene sequence variations

Contributed by: Arnoud Buzing

ResourceFunction["ImportVCF"][file]

imports file, interpreting it as VCF, and returns a Dataset.

Details

The bioinformatics Variant Call Format (VCF) is a standard format for storing gene sequence variations, commonly used for DNA sequence analysis.
ResourceFunction["ImportVCF"] wraps functionality from the PyVCF3 Python package.
The data is imported as a Dataset.
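
For orientation, a VCF file is tab-delimited text: meta-information lines start with "##", one header line starting with "#CHROM" names the columns, and each remaining line describes one variant. The sketch below parses a tiny hand-written fragment with built-in string functions only. It is an illustration of the file layout, not how ResourceFunction["ImportVCF"] works internally, and the symbols vcfText, metaLines, columns and records are hypothetical names chosen for this example.

```wolfram
(* Hypothetical three-line VCF fragment, for illustration only *)
vcfText = StringRiffle[{
    "##fileformat=VCFv4.2",
    StringRiffle[{"#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"}, "\t"],
    StringRiffle[{"20", "14370", "rs6054257", "G", "A", "29", "PASS", "DP=14"}, "\t"]
   }, "\n"];

lines = StringSplit[vcfText, "\n"];

(* Meta-information lines begin with "##" *)
metaLines = Select[lines, StringStartsQ["##"]];

(* The single header line begins with "#CHROM"; drop the leading "#" *)
headerLine = SelectFirst[lines, StringStartsQ["#CHROM"]];
columns = StringSplit[StringDrop[headerLine, 1], "\t"];

(* Every line that does not begin with "#" is a variant record *)
dataLines = Select[lines, ! StringStartsQ[#, "#"] &];
records = AssociationThread[columns, StringSplit[#, "\t"]] & /@ dataLines;

(* One row per variant, one column per header field *)
Dataset[records]
```

All field values stay as strings in this sketch. A real importer would typically also convert fields such as POS and QUAL to numbers and split the INFO field into key-value pairs.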

Examples

Basic Examples (2) 

Import a VCF sample file from GitHub:

In[1]:=
ResourceFunction["ImportVCF"][
 URL["https://raw.githubusercontent.com/vcflib/vcflib/refs/heads/master/samples/10134514.vcf"]]
Out[1]=

Import another sample file:

In[2]:=
ResourceFunction["ImportVCF"][
 URL["https://raw.githubusercontent.com/vcflib/vcflib/refs/heads/master/samples/sample.vcf"]]
Out[2]=

Scope (4) 

Obtain the human variation set for GRCh38:

In[3]:=
archive = URLDownload[
   "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz", "clinvar.vcf.gz"];

Extract the compressed archive, which contains a single file:

In[4]:=
files = ExtractArchive[archive, OverwriteTarget -> True];

Import the VCF file (this takes a few minutes due to the large file size):

In[5]:=
ds = ResourceFunction["ImportVCF"][First[files]];

Examine the first 50 rows of data:

In[6]:=
ds[;; 50]
Out[6]=
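
Because the result is a Dataset, the usual Dataset query syntax should apply to it. The sketch below uses a small stand-in Dataset so that it runs on its own. The column names "CHROM", "POS", "REF" and "ALT" are the standard VCF field names and are assumed here; the keys actually produced by ResourceFunction["ImportVCF"] may differ, so check them (for example with Keys[Normal[ds[1]]], assuming each row is an association) before adapting these queries to ds. All values below are made up.

```wolfram
(* Stand-in for an imported Dataset; keys and values are hypothetical *)
sample = Dataset[{
    <|"CHROM" -> "1", "POS" -> 100, "REF" -> "G", "ALT" -> "A"|>,
    <|"CHROM" -> "1", "POS" -> 250, "REF" -> "C", "ALT" -> "T"|>,
    <|"CHROM" -> "2", "POS" -> 300, "REF" -> "A", "ALT" -> "G"|>
   }];

(* Number of variant rows *)
Length[sample]

(* Keep only the rows on chromosome "1" *)
sample[Select[#CHROM == "1" &]]

(* Same filter, restricted to a few columns *)
sample[Select[#CHROM == "1" &], {"POS", "REF", "ALT"}]

(* Tally the rows per chromosome *)
sample[Counts, "CHROM"]
```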

Publisher

Arnoud Buzing

Requirements

Wolfram Language 14.0 (January 2024) or above

Version History

  • 1.0.0 – 21 May 2025
