Function Repository Resource:

BLASTSearch

Analyze biological sequence similarity using the Basic Local Alignment Search Tool

Contributed by: Keiko Hirayama

ResourceFunction["BLASTSearch"][query]

performs a sequence similarity search for the nucleotide or protein sequence query.

Details and Options

BLAST (Basic Local Alignment Search Tool) is a tool provided by the NCBI (National Center for Biotechnology Information) for aligning query sequences against those present in a selected target database.
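
BLAST's speed comes from a seed-and-extend strategy. It first finds short word matches between the query and a database sequence. These are exact matches for nucleotide searches and high-scoring word pairs for protein searches. It then extends only those seeds. The Python sketch below illustrates the idea for nucleotides under simplifying assumptions:

- exact seeds of length word_size (compare the "WordSize" option);
- ungapped extension to the right only;
- match/mismatch scoring with a reward/penalty pair (compare "RewardPenaltyScores");
- an X-drop cutoff.

The function names and the x_drop value are hypothetical. NCBI BLAST itself extends in both directions and then performs gapped extension.

```python
# Simplified seed-and-extend sketch for nucleotides (illustrative only,
# not NCBI's implementation).

def find_seeds(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where a word of length word_size matches exactly."""
    index = {}
    for i in range(len(subject) - word_size + 1):
        index.setdefault(subject[i:i + word_size], []).append(i)
    hits = []
    for q in range(len(query) - word_size + 1):
        for s in index.get(query[q:q + word_size], []):
            hits.append((q, s))
    return hits


def extend_right(query, subject, q, s, reward=2, penalty=-3, x_drop=10):
    """Ungapped extension from the seed start (q, s) towards the right.

    Each aligned pair adds `reward` (match) or `penalty` (mismatch); extension
    stops once the running score drops more than `x_drop` below the best score
    seen so far. Returns (best_score, best_length).
    """
    score = best = best_len = 0
    length = 0
    while q + length < len(query) and s + length < len(subject):
        score += reward if query[q + length] == subject[s + length] else penalty
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break
    return best, best_len


query = "ACGTACGTGG"
subject = "TTACGTACGTCC"
seeds = find_seeds(query, subject, word_size=4)
print(seeds[:3])                                # [(0, 2), (0, 6), (1, 3)]
print(extend_right(query, subject, *seeds[0]))  # (16, 8)
```

The default reward/penalty pair in the sketch, {2, -3}, matches the blastn default listed in the option table. Raising word_size yields fewer, longer seeds, which is the trade-off behind megablast's larger word sizes.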

The query can be a raw nucleotide or protein sequence, a FASTA-formatted sequence, a GI (GenInfo) identifier, or the accession number of a nucleotide or protein sequence.
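
To make the accepted query forms concrete, here is a heuristic Python sketch that guesses which form a query string takes. This page does not describe how the resource function actually recognizes its input. The function name classify_query and every pattern below are hypothetical assumptions.

```python
import re

# Hypothetical heuristic; the resource function's real detection logic is not documented.
NUCLEOTIDE_LETTERS = set("ACGTUN")

def classify_query(query):
    text = query.strip()
    if text.startswith(">"):
        return "fasta"                      # FASTA records start with a '>' header line
    if text.isdigit():
        return "gi"                         # GI identifiers are purely numeric
    # Assumed accession shape: letters, optional underscore (e.g. RefSeq "NM_"),
    # digits, and an optional ".version" suffix.
    if re.fullmatch(r"[A-Za-z]{1,6}_?\d+(\.\d+)?", text):
        return "accession"
    residues = re.sub(r"\s+", "", text).upper()
    if residues and set(residues) <= NUCLEOTIDE_LETTERS:
        return "nucleotide"
    if re.fullmatch(r"[A-Z*\-]+", residues):
        return "protein"
    return "unknown"

print(classify_query("NM_000546.6"))  # accession
```

A protein sequence made only of the letters A, C, G, T, U and N would be misread as nucleotide. A heuristic like this can therefore only suggest a program, not decide one.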

The following options can be given:

"Program"

"blastn"

BLAST program to run, including "blastn", "blastp", "blastx", "tblastn", "tblastx", and "megablast"

"Database"

"core_nt"

BLAST database to search, such as "core_nt" or "swissprot"

"Filter"

"mL"

masking of regions of low compositional complexity that may cause spurious or misleading results; "F" disables masking; "T" or "L" enables it; prepending "m" (e.g. "mL") masks the query only while producing the seeds used to scan the database, not during extensions; defaults: blastn: "mL"; blastp, tblastn, blastx: "F"; tblastx: "L"

"ExpectThreshold"

threshold on the expected number of chance matches in a random model; alignments with a higher E-value are not reported

"RewardPenaltyScores"

{2, -3}

pair of reward and penalty scores for matching and mismatching bases; applicable to blastn and megablast only; allowed combinations: {1, -2}, {1, -3}, {2, -3}, {1, -4}, {4, -5}, {1, -1}; defaults: blastn: {2, -3}; megablast: {1, -2}

"GapCosts"

{5, 2}

pair of non-negative integers giving the costs to create and to extend a gap in an alignment; applicable to blastn, blastp, blastx and tblastn only; allowed pairs: blastn: {4, 4}, {2, 4}, {0, 4}, {3, 3}, {6, 2}, {5, 2}, {4, 2}, {2, 2}; blastp, blastx, tblastn: {11, 2}, {10, 2}, {9, 2}, {8, 2}, {7, 2}, {6, 2}, {13, 1}, {12, 1}, {11, 1}, {10, 1}, {9, 1}; defaults: blastn: {5, 2}; blastp, tblastn, blastx: {11, 2}

"WordSize"

length of the seed word used for initial matches; allowed values: blastn: 7, 11, 15; megablast: 16, 20, 24, 28, 32, 48, 64; blastp: 3, 5, 6; tblastn, blastx: 2, 3, 5, 6; tblastx: 2, 3; defaults: blastn: 11; megablast: 28; blastp, tblastx: 3; tblastn, blastx: 5

"Matrix"

"BLOSUM62"

scoring matrix name; applicable to blastp, blastx, tblastn and tblastx only; allowed values include: "BLOSUM45", "BLOSUM50", "BLOSUM62", "BLOSUM80", "BLOSUM90", "PAM250", "PAM30", or "PAM70"

"CompositionBasedStatistics"

composition-based statistics algorithm to use; applicable to blastp, blastx, tblastn and tblastx only; allowed values: 0, 1, 2, or 3

"ShortQueryAdjust"

False

whether to automatically adjust parameters for input sequences shorter than 30 bases or residues to improve results; applicable to blastn and blastp only

"Species"

All

taxon to which the search is restricted

MaxItems

100

maximum number of aligned sequences to keep

TimeConstraint

Infinity

maximum time, in seconds, to allow for the computation
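
The program-dependent defaults and allowed values in the option table above fit naturally into a lookup table. The Python sketch below merges user options over the listed defaults and reports values outside the allowed sets. The names DEFAULTS, ALLOWED and resolve_options are hypothetical. This is not necessarily how the resource function validates its input. Options that the table does not list for a given program are passed through unchecked.

```python
# Defaults and allowed values transcribed from the option table; names are hypothetical.
DEFAULTS = {
    "blastn":    {"Filter": "mL", "RewardPenaltyScores": (2, -3), "GapCosts": (5, 2), "WordSize": 11},
    "megablast": {"RewardPenaltyScores": (1, -2), "WordSize": 28},
    "blastp":    {"Filter": "F", "GapCosts": (11, 2), "WordSize": 3},
    "blastx":    {"Filter": "F", "GapCosts": (11, 2), "WordSize": 5},
    "tblastn":   {"Filter": "F", "GapCosts": (11, 2), "WordSize": 5},
    "tblastx":   {"Filter": "L", "WordSize": 3},
}

REWARD_PENALTY = {(1, -2), (1, -3), (2, -3), (1, -4), (4, -5), (1, -1)}
NUCLEOTIDE_GAPS = {(4, 4), (2, 4), (0, 4), (3, 3), (6, 2), (5, 2), (4, 2), (2, 2)}
PROTEIN_GAPS = {(11, 2), (10, 2), (9, 2), (8, 2), (7, 2), (6, 2),
                (13, 1), (12, 1), (11, 1), (10, 1), (9, 1)}

ALLOWED = {
    "RewardPenaltyScores": {"blastn": REWARD_PENALTY, "megablast": REWARD_PENALTY},
    "GapCosts": {"blastn": NUCLEOTIDE_GAPS, "blastp": PROTEIN_GAPS,
                 "blastx": PROTEIN_GAPS, "tblastn": PROTEIN_GAPS},
    "WordSize": {"blastn": {7, 11, 15}, "megablast": {16, 20, 24, 28, 32, 48, 64},
                 "blastp": {3, 5, 6}, "blastx": {2, 3, 5, 6},
                 "tblastn": {2, 3, 5, 6}, "tblastx": {2, 3}},
}


def resolve_options(program, **options):
    """Merge options over the program defaults; return (merged, list_of_problems)."""
    if program not in DEFAULTS:
        raise ValueError(f"unknown program: {program}")
    merged = dict(DEFAULTS[program])
    for name, value in options.items():
        # Accept lists such as [5, 2] as well as tuples, so set membership works.
        merged[name] = tuple(value) if isinstance(value, list) else value
    problems = []
    for name, value in merged.items():
        allowed = ALLOWED.get(name, {}).get(program)  # None means "not checked"
        if allowed is not None and value not in allowed:
            problems.append(f"{name}={value!r} is not allowed for {program}")
    return merged, problems


print(resolve_options("blastp", WordSize=11)[1])  # ['WordSize=11 is not allowed for blastp']
```

For instance, resolve_options("megablast", GapCosts=[0, 4], RewardPenaltyScores=[2, -3]) reports no problems. This is because the table lists no gap-cost values for megablast and {2, -3} is an allowed reward/penalty pair.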

The result is a Dataset describing the nucleotide or protein sequences that align most closely with the query. Each row has the following properties:

Description

short description of the database sequence

RefSeqAccession

unique accession number assigned to the database sequence

TaxonID

NCBI taxonomy identifier associated with the database sequence

ScientificName

scientific name of the organism associated with the database sequence

Length

length of the database sequence

StartPosition

start position of the aligned sequence

EndPosition

end position of the aligned sequence

NumberOfMatches

total number of positions in the overlap between the query and database sequences

NumberOfGapOpenings

total number of gap openings

Identity

percent of nucleotides or amino acids that are identical between the aligned query and database sequence

Score

alignment score

EValue

expected number of alignments with the same score or better that would occur by chance

Sequence

aligned database sequence
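
Several of these properties follow standard definitions. The Python sketch below uses hypothetical function names and does two things:

- It computes identities, gap openings and percent identity from a pair of gapped alignment strings. Identity is taken as identical positions divided by alignment length.
- It evaluates the Karlin-Altschul expression E = K·m·n·e^(−λS). This gives the expected number of chance hits scoring at least S in a search space of m by n residues.

K and λ depend on the scoring system, and the values used below are arbitrary placeholders. NCBI BLAST additionally corrects m and n for edge effects, so this is only an illustration of the formula.

```python
import math


def alignment_stats(query_aln, subject_aln):
    """Identities, gap openings and percent identity for two equal-length
    gapped alignment strings, where '-' marks a gap."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have equal length")
    identities = sum(q == s and q != "-" for q, s in zip(query_aln, subject_aln))
    gap_openings = 0
    for aln in (query_aln, subject_aln):
        prev = ""
        for c in aln:
            if c == "-" and prev != "-":   # a new run of gaps starts here
                gap_openings += 1
            prev = c
    percent_identity = 100.0 * identities / len(query_aln)
    return identities, gap_openings, percent_identity


def karlin_altschul_evalue(score, m, n, lam, k):
    """E = K * m * n * exp(-lambda * S) for query length m and database length n."""
    return k * m * n * math.exp(-lam * score)


print(alignment_stats("ACGT-ACGT", "ACGTTAC-T"))  # 7 identities, 2 gap openings
# Placeholder K and lambda: doubling the database length doubles E.
print(karlin_altschul_evalue(20, 100, 2000, 0.3, 0.1) / karlin_altschul_evalue(20, 100, 1000, 0.3, 0.1))
```

The exponential term is why the EValue shrinks rapidly as the alignment Score grows, while a larger "Database" raises it linearly.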

Examples

Basic Examples (2)

Perform a sequence similarity search for a nucleotide sequence:

In[1]:=

ResourceFunction["BLASTSearch"]["GCTAGGCCTGAGTCAGCATAGGTTGCTGGCCTTGGTGGGTGTTCTGAGGCTCTACCTGCTCCCCTCGGAA", {MaxItems -> 10}]

Out[1]=

Specify the program, gap costs, reward/penalty scores and species for the sequence similarity search:

In[2]:=

ResourceFunction["BLASTSearch"]["TGAGTTTTTCTTAGGCAAGTAAGTGGCTTGGGACTTCGGGAGACAACCTTGTCAAGCACCTAATTGTGCC", {"Program" -> "megablast", "GapCosts" -> {0, 4}, "RewardPenaltyScores" -> {2, -3}, "Species" -> Entity["TaxonomicSpecies", "MusMusculus::y84t7"]}]

Out[2]=

Scope (2)

Perform a sequence similarity search for a nucleotide sequence:

In[3]:=

query = "CTCAAAAGTCTAGAAGCCACCGTCCAGGGAGCAGGTAGCTGCTGGGCTCCGGGGACACTTTGCGTTCGGGCTGGGAGCA";
seqset = ResourceFunction["BLASTSearch"][query, {MaxItems -> 10}]

Out[4]=

Use the resource function DNAAlignmentPlot to visualize the alignment:

In[5]:=

Out[5]=

Requirements

Wolfram Language 13.0 (December 2021) or above

Version History

1.0.1 – 05 February 2025
1.0.0 – 19 December 2024

Source Metadata

Citation:
- Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. (1990) "Basic local alignment search tool." J. Mol. Biol. 215:403-410.
- Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., Lipman D.J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402.
- Zhang Z., Schwartz S., Wagner L., Miller W. (2000) "A greedy algorithm for aligning DNA sequences." J. Comput. Biol. 7(1-2):203-214.
- Morgulis A., Coulouris G., Raytselis Y., Madden T.L., Agarwala R., Schaffer A.A. (2008) "Database indexing for production MegaBLAST searches." Bioinformatics 24:1757-1764.
- Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J., Bealer K., Madden T.L. (2009) "BLAST+: architecture and applications." BMC Bioinformatics 10:421.
- Boratyn G.M., Schaffer A.A., Agarwala R., Altschul S.F., Lipman D.J., Madden T.L. (2012) "Domain enhanced lookup time accelerated BLAST." Biol. Direct 7:12.
- Boratyn G.M., Thierry-Mieg J., Thierry-Mieg D., Busby B., Madden T.L. (2019) "Magic-BLAST, an accurate RNA-seq aligner for long and short reads." BMC Bioinformatics 20:405.
- Camacho C., Boratyn G.M., Joukov V., Vera Alvarez R., Madden T.L. (2023) "ElasticBLAST: accelerating sequence search via cloud computing." BMC Bioinformatics 24:117. doi: 10.1186/s12859-023-05245-9.

Related Resources

License Information

This work is licensed under a Creative Commons Attribution 4.0 International License
