Compute the overlap fraction between two strings or biosequences

Contributed by:
Soutick Saha

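ResourceFunction["SequenceOverlapFraction"][*seq*_{1},*seq*_{2}] returns a list of the total length of overlap between *seq*_{1} and *seq*_{2}, divided by the lengths of *seq*_{1} and *seq*_{2}, respectively.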

ResourceFunction["SequenceOverlapFraction"] aligns *seq*_{1} and *seq*_{2} with SequenceAlignment and computes the total length of their common elements. This total is then divided by the length of *seq*_{1} and by the length of *seq*_{2} to obtain the two overlap fractions.
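
The procedure described above can be sketched outside Wolfram Language. The following is a minimal Python analogue, not the resource function itself: the name `sequence_overlap_fraction` is hypothetical, Python's `difflib.SequenceMatcher` is swapped in for SequenceAlignment (its longest-matching-block heuristic may split some inputs differently), and BioSequences are replaced by plain DNA strings. Keyword options are forwarded to the matcher, loosely mirroring how the resource function passes options to SequenceAlignment, and `Fraction` keeps the results exact.

```python
from difflib import SequenceMatcher
from fractions import Fraction


def sequence_overlap_fraction(seq1, seq2, **options):
    """Hypothetical Python analogue of SequenceOverlapFraction.

    difflib.SequenceMatcher stands in for SequenceAlignment; keyword
    options (isjunk, autojunk) are forwarded to it unchanged.
    Empty inputs raise ZeroDivisionError.
    """
    matcher = SequenceMatcher(a=seq1, b=seq2, **options)
    # Total length of the runs the two sequences have in common.
    overlap = sum(block.size for block in matcher.get_matching_blocks())
    # First entry refers to seq1, second to seq2.
    return [Fraction(overlap, len(seq1)), Fraction(overlap, len(seq2))]


print(sequence_overlap_fraction("abcdef", "abcXYdefgh"))  # [Fraction(1, 1), Fraction(3, 5)]
print(sequence_overlap_fraction("ATGCGT", "ATGAAACGT"))   # [Fraction(1, 1), Fraction(2, 3)]
```

Both pairs share 6 elements, so the shorter sequence is fully covered (fraction 1) while the longer one is only partly covered.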

The first and second elements of the list correspond to *seq*_{1} and *seq*_{2}, respectively.

ResourceFunction["SequenceOverlapFraction"] accepts all options of SequenceAlignment.

Compute the overlap fraction between two strings:

Get the overlap fraction between two BioSequences:

Obtain different overlap fractions for the two sequences:

SequenceAlignment of the two sequences returns the following output:
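
To illustrate the shape of such an alignment, the Python sketch below builds a list modelled on the layout SequenceAlignment uses for strings: matching runs appear as plain strings and differing runs as two-element lists. The helper name `alignment` is hypothetical, and `difflib` opcodes stand in for SequenceAlignment, so for some inputs the runs may be split differently than in Wolfram Language.

```python
from difflib import SequenceMatcher


def alignment(seq1, seq2):
    """Hypothetical helper mimicking the shape of SequenceAlignment output.

    Common runs become plain strings; differing runs become
    [part_of_seq1, part_of_seq2] pairs (either part may be empty).
    """
    result = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, seq1, seq2).get_opcodes():
        if tag == "equal":
            result.append(seq1[i1:i2])
        else:  # "replace", "delete" or "insert"
            result.append([seq1[i1:i2], seq2[j1:j2]])
    return result


print(alignment("abcdef", "abcXYdefgh"))  # ['abc', ['', 'XY'], 'def', ['', 'gh']]
```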

Extract the common elements of the alignment:
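
In an alignment laid out with matching runs as strings and differing runs as pairs, the common elements are exactly the plain-string entries. A minimal Python sketch, using hypothetical example data:

```python
# Hypothetical alignment: matching runs are strings, differing runs are
# [seq1_part, seq2_part] pairs.
aligned = ["abc", ["", "XY"], "def", ["", "gh"]]

# Keep only the common runs, i.e. the plain-string entries.
common = [piece for piece in aligned if isinstance(piece, str)]
print(common)  # ['abc', 'def']
```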

Find the total length of the common elements:
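
The total overlap is the sum of the lengths of the common runs. Continuing with hypothetical example data in Python:

```python
common = ["abc", "def"]  # hypothetical common runs taken from an alignment

# Total overlap length = sum of the lengths of the common runs.
overlap = sum(len(piece) for piece in common)
print(overlap)  # 6
```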

Find the lengths of the input sequences:

Divide the overlap length by the sequence lengths to obtain the overlap fractions:
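
The last two steps (measuring the inputs and dividing) amount to two exact divisions. The Python sketch below uses hypothetical inputs and an overlap of 6; `Fraction` is a design choice that keeps the results exact and reduced, so 6/10 is reported as 3/5.

```python
from fractions import Fraction

seq1, seq2 = "abcdef", "abcXYdefgh"  # hypothetical example inputs
overlap = 6  # total length of the common runs "abc" and "def"

# Divide the overlap by each input length; the first entry refers to seq1.
overlap_fractions = [Fraction(overlap, len(seq1)), Fraction(overlap, len(seq2))]
print(overlap_fractions)  # [Fraction(1, 1), Fraction(3, 5)]
```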

SequenceOverlapFraction gives the same result directly:

Wolfram Language 14.0 (January 2024) or above

- 1.0.0 – 28 August 2024

This work is licensed under a Creative Commons Attribution 4.0 International License