Function Repository Resource:

ImportFASTA

Import FASTA data from the NCBI

Contributed by: Brendan Elli and Keiko Hirayama

ResourceFunction["ImportFASTA"][seqref,database]

imports FASTA data for the specified seqref from the NCBI database and returns its "Header" and "Sequence" elements combined into a list.

Details

"Nucleotide" and "Protein" are the supported values for database.
FASTA-formatted nucleotide and protein sequences are retrieved from the databases provided by the NCBI (National Center for Biotechnology Information).
Nucleotide sequences can be queried by their NCBI Nucleotide Reference Sequence accession number, GenBank database nucleotide sequence accession number or RCSB PDB (Research Collaboratory for Structural Bioinformatics Protein Data Bank) accession number.
Protein sequences can be queried by their NCBI Protein Reference Sequence accession number, GenBank database protein sequence accession number, RCSB PDB accession number or UniProt (Universal Protein Resource) name or accession number.
ResourceFunction["ImportFASTA"][seqref] is equivalent to ResourceFunction["ImportFASTA"][seqref,"Nucleotide"].
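
As a rough, offline illustration of the shape of the return value, the built-in FASTA importer can be applied to an in-memory string. This is only a sketch: the record below is made up, and it assumes that the result of ResourceFunction["ImportFASTA"] has the same layout as a list of the built-in "Header" and "Sequence" elements, as the usage line above suggests.

```wolfram
(* A made-up FASTA record, used only for illustration *)
fasta = ">demo_1 hypothetical record\nACGTACGT\nGGCC\n";

(* Import the two elements separately with the built-in FASTA format *)
headers = ImportString[fasta, {"FASTA", "Header"}];
sequences = ImportString[fasta, {"FASTA", "Sequence"}];

(* Combine them into a list: headers first, sequences second *)
result = {headers, sequences}
```

Under that assumption, a part specification such as result[[2, 1]] picks out the first sequence string, and result[[1, 1]] the matching header.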

Examples

Basic Examples (2) 

Import a simple NCBI Reference Sequence and give the raw header and sequence:

In[1]:=
Short[mitoc = ResourceFunction["ImportFASTA"]["NC_013993", "Nucleotide"], 5]
Out[1]=

Use the chaos game representation to visualize this genome:

In[2]:=
srules = {"U" -> "T", Except[Characters["ACGT"]] -> ""};
In[3]:=
ResourceFunction["FCGRImage"][StringReplace[mitoc[[2, 1]], srules], 7]
Out[3]=

Retrieve a protein sequence for a UniProt protein:

In[4]:=
ResourceFunction["ImportFASTA"]["TP53B_HUMAN", "Protein"]
Out[4]=

Applications (2) 

Retrieve protein sequences for cytochrome c from various organisms:

In[5]:=
prot = ResourceFunction["ImportFASTA"][#, "Protein"] & /@ {"NP_001039526.1", "NP_001385227.1", "NP_001123442.1", "XP_069973776.1", "XP_064766840.1", "WP_276305918.1", "XP_059855190.1", "XP_068969445.1"}
Out[5]=

Use the PhylogeneticTreePlot resource function to generate the phylogenetic tree:

In[6]:=
ResourceFunction["PhylogeneticTreePlot"][prot[[All, 2, 1]],
  Flatten@StringCases[prot[[All, 1, 1]], "[" ~~ sp__ ~~ "]" :> sp]]
Out[6]=
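
The labels passed to PhylogeneticTreePlot above are taken from the bracketed organism name that NCBI protein headers usually carry. The following sketch shows that extraction step on its own. The two headers are hypothetical stand-ins written in that style, not records retrieved from the NCBI.

```wolfram
(* Hypothetical headers in the usual NCBI protein style, for illustration only *)
headers = {
   "XX_000001.1 cytochrome c [Homo sapiens]",
   "XX_000002.1 cytochrome c [Bos taurus]"};

(* sp__ captures the text between "[" and "]" in each header; *)
(* Flatten merges the per-header match lists into one list of names *)
species = Flatten@StringCases[headers, "[" ~~ sp__ ~~ "]" :> sp]
```

Note that sp__ matches greedily, so a header containing more than one bracketed group would produce a single match spanning all of them. Wrapping the whole pattern in Shortest is one way to guard against that, if such headers are expected.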

Version History

  • 2.0.0 – 18 December 2024
  • 1.0.0 – 10 July 2019
