Wolfram Research

Function Repository Resource:

LongestCommonPrefix

Source Notebook

Find the longest common contiguous prefix of a set of strings or lists

Contributed by: Richard Hennigan (Wolfram Research)

ResourceFunction["LongestCommonPrefix"][{"str1","str2",}]

gives the longest string "pre" such that StringStartsQ["stri","pre"] for all of the "stri".

ResourceFunction["LongestCommonPrefix"][{list1,list2,}]

gives the longest list {e1,,en} such that Take[listi,n]==={e1,,en} for all of the listi.

ResourceFunction["LongestCommonPrefix"][a1,a2,]

is equivalent to ResourceFunction["LongestCommonPrefix"][{a1,a2,}].

Details and Options

In ResourceFunction["LongestCommonPrefix"][{a1,a2,}], if no common prefix is found, the empty string "" is returned if the ai are strings, and an empty list is returned if the ai are lists.
ResourceFunction["LongestCommonPrefix"] has the following options:
IgnoreCase False whether to ignore case of letters in strings
Method Automatic algorithm to be used
Setting values for IgnoreCase has no effect when operating on lists.
When using IgnoreCaseTrue, the case of the result will be inherited from the case of "str1".
Possible values for Method include:
"LinearSearch" check sequences one element at a time starting from the left
"BinarySearch" recursively check from the middle of each sequence
The result of ResourceFunction["LongestCommonPrefix"][{seq}] is seq, since seq is a prefix of itself.
ResourceFunction["LongestCommonPrefix"][{}] is undefined and results in an error.

Examples

Basic Examples

Get the longest common prefix of strings:

In[1]:=
ResourceFunction[
 "LongestCommonPrefix"][{"the cat in the hat", "the cat insists"}]
Out[1]=
In[2]:=
ResourceFunction[
 "LongestCommonPrefix"][{"the cat in the hat", "the cat insists", "the catapult is inferior to the trebuchet"}]
Out[2]=

Get the longest common prefix of lists:

In[3]:=
ResourceFunction["LongestCommonPrefix"][{{1, 2, 3}, {1, 2, x}}]
Out[3]=
In[4]:=
ResourceFunction[
 "LongestCommonPrefix"][{{a, b, c}, {a, b, 1, 2}, {a, b, "hello"}}]
Out[4]=
In[5]:=
ResourceFunction[
 "LongestCommonPrefix"][{Fibonacci[Range[3, 10]], Prime[Range[10]]}]
Out[5]=

Scope

Multiple arguments will be interpreted as a list:

In[6]:=
ResourceFunction[
 "LongestCommonPrefix"]["the cat in the hat", "the cat insists", "the \
catastrophe"]
Out[6]=
In[7]:=
ResourceFunction["LongestCommonPrefix"][{1, 2, 3}, {1, 2, x}]
Out[7]=

If given an Association, LongestCommonPrefix uses the values:

In[8]:=
ResourceFunction[
 "LongestCommonPrefix"][<|"a" -> {1, 2, 3}, "b" -> {1, 2, x}|>]
Out[8]=

Options

IgnoreCase

By default, LongestCommonPrefix is case sensitive for strings:

In[9]:=
ResourceFunction["LongestCommonPrefix"]["Testing", "TEST STRING"]
Out[9]=

Set IgnoreCaseTrue to treat uppercase and lowercase letters as equivalent:

In[10]:=
ResourceFunction["LongestCommonPrefix"]["Testing", "TEST STRING", IgnoreCase -> True]
Out[10]=

When ignoring case, the result will match the case of the first string:

In[11]:=
ResourceFunction[
 "LongestCommonPrefix"]["TEST STRING", "Testing", "tested", IgnoreCase -> True]
Out[11]=
In[12]:=
ResourceFunction[
 "LongestCommonPrefix"]["tested", "TEST STRING", "Testing", IgnoreCase -> True]
Out[12]=
In[13]:=
ResourceFunction[
 "LongestCommonPrefix"]["Testing", "tested", "TEST STRING", IgnoreCase -> True]
Out[13]=

Method

Specify that a linear search algorithm should be used:

In[14]:=
ResourceFunction["LongestCommonPrefix"][{"abc", "abx"}, Method -> "LinearSearch"]
Out[14]=

Use a binary search:

In[15]:=
ResourceFunction["LongestCommonPrefix"][{"abc", "abx"}, Method -> "BinarySearch"]
Out[15]=

Choose a search algorithm automatically:

In[16]:=
ResourceFunction["LongestCommonPrefix"][{"abc", "abx"}, Method -> Automatic]
Out[16]=

If prefixes are likely to be very short relative to the sequence length, a linear search can be a little faster:

In[17]:=
Short[strings = "abc" <> # & /@ RandomChoice[Alphabet[], {10000, 1000}]]
Out[17]=
In[18]:=
AbsoluteTiming[
 ResourceFunction["LongestCommonPrefix"][strings, Method -> "LinearSearch"]]
Out[18]=
In[19]:=
AbsoluteTiming[
 ResourceFunction["LongestCommonPrefix"][strings, Method -> "BinarySearch"]]
Out[19]=

Using Automatic resorts to a linear search in this case:

In[20]:=
AbsoluteTiming[
 ResourceFunction["LongestCommonPrefix"][strings, Method -> Automatic]]
Out[20]=

The worse-case performance of a linear search is much worse than a binary search however:

In[21]:=
prefix = StringJoin[ConstantArray[Alphabet[], 50]];
Short[strings = prefix <> # & /@ RandomChoice[Alphabet[], {10000, 1000}]]
Out[18]=
In[22]:=
AbsoluteTiming[
  ResourceFunction["LongestCommonPrefix"][strings, Method -> "LinearSearch"]] // Short
Out[22]=
In[23]:=
AbsoluteTiming[
  ResourceFunction["LongestCommonPrefix"][strings, Method -> "BinarySearch"]] // Short
Out[23]=

Using Automatic resorts to a binary search in this case:

In[24]:=
AbsoluteTiming[
  ResourceFunction["LongestCommonPrefix"][strings, Method -> Automatic]] // Short
Out[24]=

Applications

Find the common root directory for a list of files:

In[25]:=
files = RandomSample[FileNames["*", $InstallationDirectory, 3], 10]
Out[25]=
In[26]:=
root = ResourceFunction["LongestCommonPrefix"][files]
Out[26]=

Show relative file names:

In[27]:=
StringDelete[files, StartOfString ~~ root] // Sort // Column
Out[27]=

Find the common parent directory for a list of cloud objects:

In[28]:=
{CloudObject["path/to/file"], CloudObject["path/to/another/file"]}
Out[28]=
In[29]:=
CloudObject[ResourceFunction["LongestCommonPrefix"][First /@ %]]
Out[29]=

Properties and Relations

For a single string or list, the longest common prefix is simply itself:

In[30]:=
ResourceFunction["LongestCommonPrefix"][{"the cat in the hat"}]
Out[30]=
In[31]:=
ResourceFunction["LongestCommonPrefix"][{Range[5]}]
Out[31]=

For strings, an empty string is returned if no prefix is found:

In[32]:=
ResourceFunction["LongestCommonPrefix"][{"this one", "another one"}]
Out[32]=
In[33]:=
InputForm[%]
Out[33]=

For lists, an empty list is returned if no prefix is found:

In[34]:=
ResourceFunction["LongestCommonPrefix"][{{a, b, c}, {1, 2, 3}}]
Out[34]=

If an empty string appears in the list of strings, the output will always be an empty string:

In[35]:=
ResourceFunction[
 "LongestCommonPrefix"][{"", "the cat in the hat", "the cat insists", "the catastrophe"}]
Out[35]=

If an empty list appears in the list of lists, the output will always be an empty list:

In[36]:=
ResourceFunction["LongestCommonPrefix"][{{1, 2, 3}, {1, 2, x}, {}}]
Out[36]=

LongestCommonPrefix is similar to LongestCommonSubsequence:

In[37]:=
LongestCommonSubsequence[{1, 2, 3, 4, 5, 6}, {1, 2, 3, x, 5, 6}]
Out[37]=
In[38]:=
ResourceFunction[
 "LongestCommonPrefix"][{1, 2, 3, 4, 5, 6}, {1, 2, 3, x, 5, 6}]
Out[38]=

However, LongestCommonPrefix only checks subsequences at the beginning of each list:

In[39]:=
LongestCommonSubsequence[{1, 2, 3, 4, 5, 6}, {1, 2, x, 4, 5, 6}]
Out[39]=
In[40]:=
ResourceFunction[
 "LongestCommonPrefix"][{1, 2, 3, 4, 5, 6}, {1, 2, x, 4, 5, 6}]
Out[40]=

Possible Issues

Lists and strings cannot be mixed:

In[41]:=
ResourceFunction["LongestCommonPrefix"][{"string", {a, b, c}}]
Out[41]=

Since LongestCommonPrefix can return a list or string depending on the input, at least one element must be given to resolve ambiguity:

In[42]:=
ResourceFunction["LongestCommonPrefix"][{}]
Out[42]=
In[43]:=
ResourceFunction["LongestCommonPrefix"][{{}}]
Out[43]=
In[44]:=
ResourceFunction["LongestCommonPrefix"][{""}]
Out[44]=

When setting the option IgnoreCase to True, the order of items can affect the result:

In[45]:=
ResourceFunction["LongestCommonPrefix"][{"testing", "TEST STRING"}, IgnoreCase -> True]
Out[45]=
In[46]:=
ResourceFunction["LongestCommonPrefix"][{"TEST STRING", "testing"}, IgnoreCase -> True]
Out[46]=

The IgnoreCase option has no effect when comparing lists:

In[47]:=
ResourceFunction[
 "LongestCommonPrefix"][{{"A", "B", "C"}, {"a", "b", "x"}}, IgnoreCase -> True]
Out[47]=
In[48]:=
ResourceFunction["LongestCommonPrefix"][{"ABC", "abx"}, IgnoreCase -> True]
Out[48]=

Neat Examples

Find the lengths of longest common prefixes in random binary sequences of length 100:

In[49]:=
Table[Length[
   ResourceFunction["LongestCommonPrefix"][RandomInteger[1, 100], RandomInteger[1, 100]]], 10000] // Histogram
Out[49]=

Compare distributions to LongestCommonSequence and LongestCommonSubsequence:

In[50]:=
ints = RandomInteger[1, {10000, 2, 100}];
Table[Labeled[Histogram[Length /@ f @@@ ints], ToString[f]], {f, {LongestCommonSequence, LongestCommonSubsequence, ResourceFunction["LongestCommonPrefix"]}}]
Out[51]=

Find words that share a prefix with “BirdSay”:

In[52]:=
Select[WordList["KnownWords", IncludeInflections -> True], StringLength[
    ResourceFunction["LongestCommonPrefix"]["BirdSay", #, IgnoreCase -> True]] >= 4 &]
Out[52]=

BirdSay the results:

In[53]:=
ResourceFunction["BirdSay"][Multicolumn[%, 5]]
Out[53]=

Resource History

Related Resources

License Information