Function Repository Resource:

MostFrequentKStringDistance


Calculate a distance metric between two strings based on the occurrences of their k most frequent characters

Contributed by: Haomin Yang

ResourceFunction["MostFrequentKStringDistance"][str1, str2, k]

gives the distance between strings str1 and str2 based on the k most frequent characters of each string.

ResourceFunction["MostFrequentKStringDistance"][str1, str2, k, max]

uses max as the base distance from which similarity is subtracted.

Details

ResourceFunction["MostFrequentKStringDistance"] identifies the k most frequent characters in each string.
It then sums the counts of the characters that appear in the top-k lists of both strings.
This similarity sum is subtracted from max (default 100) to give the final distance.
If k exceeds the number of distinct characters in a string, all of that string's characters are used.
The comparison is case-sensitive.
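
The steps above can be sketched in a few lines. This is a minimal illustration and not the resource function's actual implementation. The names topKCounts and mfkDistanceSketch are hypothetical. The sketch also makes two assumptions that the description leaves open: a shared character contributes its count from both strings, and ties at the k-th position are broken however TakeLargest happens to order them.

```wolfram
(* Hypothetical helper: association of the k most frequent characters and their counts. *)
(* UpTo[k] keeps every character when the string has fewer than k distinct ones. *)
topKCounts[s_String, k_Integer?Positive] :=
  TakeLargest[Counts[Characters[s]], UpTo[k]]

(* Hypothetical sketch of the distance; max defaults to 100 as described above. *)
mfkDistanceSketch[s1_String, s2_String, k_Integer?Positive, max_ : 100] :=
  Module[{t1 = topKCounts[s1, k], t2 = topKCounts[s2, k], shared},
    (* Characters present in both top-k lists; Characters keeps case, so "A" and "a" differ. *)
    shared = Intersection[Keys[t1], Keys[t2]];
    (* Assumption: the similarity adds the counts from both strings. *)
    max - Total[Lookup[t1, shared]] - Total[Lookup[t2, shared]]
  ]

(* Top-2 lists are {b -> 3, a -> 2} and {b -> 2, c -> 2}; only "b" is shared. *)
mfkDistanceSketch["aabbbc", "bbacc", 2]
(* 95 under this sketch's assumptions, i.e. 100 - (3 + 2) *)
```

Because the similarity sum has no upper bound, the result of the sketch drops below zero whenever the shared counts add up to more than max.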

Examples

Basic Examples (1) 

Compute the distance between two protein-like sequences:

In[1]:=
s1 = "LCLYTH";
s2 = "PYYTI";
ResourceFunction["MostFrequentKStringDistance"][s1, s2, 2, 100]
Out[3]=

Scope (2) 

Changing k affects the calculated distance:

In[4]:=
ResourceFunction["MostFrequentKStringDistance"]["apple", "pear", 1]
Out[4]=
In[5]:=
ResourceFunction["MostFrequentKStringDistance"]["apple", "pear", 5]
Out[5]=

Identical strings share all of their top-k characters, so the distance is low, and it becomes negative if the similarity sum exceeds max:

In[6]:=
ResourceFunction["MostFrequentKStringDistance"]["AAAAABBB", "AAAAABBB", 2, 100]
Out[6]=

Applications (1) 

Find the string in a list that is closest to a target, comparing only the single most frequent character of each string (k = 1):

In[7]:=
target = "111223";
candidates = {"33344", "11155", "22288"};
MinimalBy[candidates, ResourceFunction["MostFrequentKStringDistance"][target, #, 1] &]
Out[8]=

Publisher

138 Aspen

Requirements

Wolfram Language 14.0 (January 2024) or above

Version History

  • 1.0.0 – 16 January 2026
