Function Repository Resource:

LongestCommonPrefix

Source Notebook

Find the longest common contiguous prefix of a set of strings or lists

Contributed by: Richard Hennigan (Wolfram Research)

ResourceFunction["LongestCommonPrefix"][{"str1","str2",}]

gives the longest string "pre" such that StringStartsQ["stri","pre"] for all of the "stri".

ResourceFunction["LongestCommonPrefix"][{list1,list2,}]

gives the longest list {e1,,en} such that Take[listi,n]==={e1,,en} for all of the listi.

ResourceFunction["LongestCommonPrefix"][a1,a2,]

is equivalent to ResourceFunction["LongestCommonPrefix"][{a1,a2,}].

Details and Options

In ResourceFunction["LongestCommonPrefix"][{a1,a2,}], if no common prefix is found, the empty string "" is returned if the ai are strings, and an empty list is returned if the ai are lists.
ResourceFunction["LongestCommonPrefix"] has the following options:
IgnoreCaseFalsewhether to ignore case of letters in strings
MethodAutomaticalgorithm to be used
Setting values for IgnoreCase has no effect when operating on lists.
When using IgnoreCaseTrue, the case of the result will be inherited from the case of "str1".
Possible values for Method include:
"LinearSearch"check sequences one element at a time starting from the left
"BinarySearch"recursively check from the middle of each sequence
The result of ResourceFunction["LongestCommonPrefix"][{seq}] is seq, since seq is a prefix of itself.
ResourceFunction["LongestCommonPrefix"][{}] is undefined and results in an error.

Examples

Basic Examples (2) 

Get the longest common prefix of strings:

In[1]:=
ResourceFunction[
 "LongestCommonPrefix"][{"the cat in the hat", "the cat insists"}]
Out[1]=
In[2]:=
ResourceFunction[
 "LongestCommonPrefix"][{"the cat in the hat", "the cat insists", "the catapult is inferior to the trebuchet"}]
Out[2]=

Get the longest common prefix of lists:

In[3]:=
ResourceFunction["LongestCommonPrefix"][{{1, 2, 3}, {1, 2, x}}]
Out[3]=
In[4]:=
ResourceFunction[
 "LongestCommonPrefix"][{{a, b, c}, {a, b, 1, 2}, {a, b, "hello"}}]
Out[4]=
In[5]:=
ResourceFunction[
 "LongestCommonPrefix"][{Fibonacci[Range[3, 10]], Prime[Range[10]]}]
Out[5]=

Scope (2) 

Multiple arguments will be interpreted as a list:

In[6]:=
ResourceFunction[
 "LongestCommonPrefix"]["the cat in the hat", "the cat insists", "the catastrophe"]
Out[6]=
In[7]:=
ResourceFunction["LongestCommonPrefix"][{1, 2, 3}, {1, 2, x}]
Out[7]=

If given an Association, LongestCommonPrefix uses the values:

In[8]:=
ResourceFunction[
 "LongestCommonPrefix"][<|"a" -> {1, 2, 3}, "b" -> {1, 2, x}|>]
Out[8]=

Options (7) 

IgnoreCase (2) 

By default, LongestCommonPrefix is case sensitive for strings:

In[9]:=
ResourceFunction["LongestCommonPrefix"]["Testing", "TEST STRING"]
Out[9]=

Set IgnoreCaseTrue to treat uppercase and lowercase letters as equivalent:

In[10]:=
ResourceFunction["LongestCommonPrefix"]["Testing", "TEST STRING", IgnoreCase -> True]
Out[10]=

When ignoring case, the result will match the case of the first string:

In[11]:=
ResourceFunction[
 "LongestCommonPrefix"]["TEST STRING", "Testing", "tested", IgnoreCase -> True]
Out[11]=
In[12]:=
ResourceFunction[
 "LongestCommonPrefix"]["tested", "TEST STRING", "Testing", IgnoreCase -> True]
Out[12]=
In[13]:=
ResourceFunction[
 "LongestCommonPrefix"]["Testing", "tested", "TEST STRING", IgnoreCase -> True]
Out[13]=

Method (5) 

Specify that a linear search algorithm should be used:

In[14]:=
ResourceFunction["LongestCommonPrefix"][{"abc", "abx"}, Method -> "LinearSearch"]
Out[14]=

Use a binary search:

In[15]:=
ResourceFunction["LongestCommonPrefix"][{"abc", "abx"}, Method -> "BinarySearch"]
Out[15]=

Choose a search algorithm automatically:

In[16]:=
ResourceFunction["LongestCommonPrefix"][{"abc", "abx"}, Method -> Automatic]
Out[16]=

If prefixes are likely to be very short relative to the sequence length, a linear search can be a little faster:

In[17]:=
Short[strings = "abc" <> # & /@ RandomChoice[Alphabet[], {10000, 1000}]]
Out[17]=
In[18]:=
AbsoluteTiming[
 ResourceFunction["LongestCommonPrefix"][strings, Method -> "LinearSearch"]]
Out[18]=
In[19]:=
AbsoluteTiming[
 ResourceFunction["LongestCommonPrefix"][strings, Method -> "BinarySearch"]]
Out[19]=

Using Automatic resorts to a linear search in this case:

In[20]:=
AbsoluteTiming[
 ResourceFunction["LongestCommonPrefix"][strings, Method -> Automatic]]
Out[20]=

The worse-case performance of a linear search is much worse than a binary search however:

In[21]:=
prefix = StringJoin[ConstantArray[Alphabet[], 50]];
Short[strings = prefix <> # & /@ RandomChoice[Alphabet[], {10000, 1000}]]
Out[18]=
In[22]:=
AbsoluteTiming[
  ResourceFunction["LongestCommonPrefix"][strings, Method -> "LinearSearch"]] // Short
Out[22]=
In[23]:=
AbsoluteTiming[
  ResourceFunction["LongestCommonPrefix"][strings, Method -> "BinarySearch"]] // Short
Out[23]=

Using Automatic resorts to a binary search in this case:

In[24]:=
AbsoluteTiming[
  ResourceFunction["LongestCommonPrefix"][strings, Method -> Automatic]] // Short
Out[24]=

Applications (2) 

Find the common root directory for a list of files:

In[25]:=
files = RandomSample[FileNames["*", $InstallationDirectory, 3], 10]
Out[25]=
In[26]:=
root = ResourceFunction["LongestCommonPrefix"][files]
Out[26]=

Show relative file names:

In[27]:=
StringDelete[files, StartOfString ~~ root] // Sort // Column
Out[27]=

Find the common parent directory for a list of cloud objects:

In[28]:=
{CloudObject["path/to/file"], CloudObject["path/to/another/file"]}
Out[28]=
In[29]:=
CloudObject[ResourceFunction["LongestCommonPrefix"][First /@ %]]
Out[29]=

Properties and Relations (6) 

For a single string or list, the longest common prefix is simply itself:

In[30]:=
ResourceFunction["LongestCommonPrefix"][{"the cat in the hat"}]
Out[30]=
In[31]:=
ResourceFunction["LongestCommonPrefix"][{Range[5]}]
Out[31]=

For strings, an empty string is returned if no prefix is found:

In[32]:=
ResourceFunction["LongestCommonPrefix"][{"this one", "another one"}]
Out[32]=
In[33]:=
InputForm[%]
Out[33]=

For lists, an empty list is returned if no prefix is found:

In[34]:=
ResourceFunction["LongestCommonPrefix"][{{a, b, c}, {1, 2, 3}}]
Out[34]=

If an empty string appears in the list of strings, the output will always be an empty string:

In[35]:=
ResourceFunction[
 "LongestCommonPrefix"][{"", "the cat in the hat", "the cat insists", "the catastrophe"}]
Out[35]=

If an empty list appears in the list of lists, the output will always be an empty list:

In[36]:=
ResourceFunction["LongestCommonPrefix"][{{1, 2, 3}, {1, 2, x}, {}}]
Out[36]=

LongestCommonPrefix is similar to LongestCommonSubsequence:

In[37]:=
LongestCommonSubsequence[{1, 2, 3, 4, 5, 6}, {1, 2, 3, x, 5, 6}]
Out[37]=
In[38]:=
ResourceFunction[
 "LongestCommonPrefix"][{1, 2, 3, 4, 5, 6}, {1, 2, 3, x, 5, 6}]
Out[38]=

However, LongestCommonPrefix only checks subsequences at the beginning of each list:

In[39]:=
LongestCommonSubsequence[{1, 2, 3, 4, 5, 6}, {1, 2, x, 4, 5, 6}]
Out[39]=
In[40]:=
ResourceFunction[
 "LongestCommonPrefix"][{1, 2, 3, 4, 5, 6}, {1, 2, x, 4, 5, 6}]
Out[40]=

Possible Issues (4) 

Lists and strings cannot be mixed:

In[41]:=
ResourceFunction["LongestCommonPrefix"][{"string", {a, b, c}}]
Out[41]=

Since LongestCommonPrefix can return a list or string depending on the input, at least one element must be given to resolve ambiguity:

In[42]:=
ResourceFunction["LongestCommonPrefix"][{}]
Out[42]=
In[43]:=
ResourceFunction["LongestCommonPrefix"][{{}}]
Out[43]=
In[44]:=
ResourceFunction["LongestCommonPrefix"][{""}]
Out[44]=

When setting the option IgnoreCase to True, the order of items can affect the result:

In[45]:=
ResourceFunction["LongestCommonPrefix"][{"testing", "TEST STRING"}, IgnoreCase -> True]
Out[45]=
In[46]:=
ResourceFunction["LongestCommonPrefix"][{"TEST STRING", "testing"}, IgnoreCase -> True]
Out[46]=

The IgnoreCase option has no effect when comparing lists:

In[47]:=
ResourceFunction[
 "LongestCommonPrefix"][{{"A", "B", "C"}, {"a", "b", "x"}}, IgnoreCase -> True]
Out[47]=
In[48]:=
ResourceFunction["LongestCommonPrefix"][{"ABC", "abx"}, IgnoreCase -> True]
Out[48]=

Neat Examples (2) 

Find the lengths of longest common prefixes in random binary sequences of length 100:

In[49]:=
Table[Length[
   ResourceFunction["LongestCommonPrefix"][RandomInteger[1, 100], RandomInteger[1, 100]]], 10000] // Histogram
Out[49]=

Compare distributions to LongestCommonSequence and LongestCommonSubsequence:

In[50]:=
ints = RandomInteger[1, {10000, 2, 100}];
Table[Labeled[Histogram[Length /@ f @@@ ints], ToString[f]], {f, {LongestCommonSequence, LongestCommonSubsequence, ResourceFunction["LongestCommonPrefix"]}}]
Out[51]=

Find words that share a prefix with "BirdSay":

In[52]:=
Select[WordList["KnownWords", IncludeInflections -> True], StringLength[
    ResourceFunction["LongestCommonPrefix"]["BirdSay", #, IgnoreCase -> True]] >= 4 &]
Out[52]=

BirdSay the results:

In[53]:=
ResourceFunction["BirdSay"][Multicolumn[%, 5]]
Out[53]=

Version History

  • 1.0.0 – 26 September 2019

Related Resources

License Information