Function Repository Resource:

WordPhoneticSyllabify (1.0.0); current version: 1.0.1

Segment an English word and its phonetic form into syllables following phonetic rules

Contributed by: Mark Greenberg

ResourceFunction["WordPhoneticSyllabify"][w]

returns a list containing both w and its phonetic form with syllable delimiters.

Details and Options

ResourceFunction["WordPhoneticSyllabify"] falls back on LLMExampleFunction for words that have no phonetic form in WordData, provided an OpenAI API key is set in the system. You can check whether an API key is set by evaluating SystemCredential["OPENAI_API_KEY"]. If needed, the key can be set using SystemCredential["OPENAI_API_KEY"]="your-key-here".
ResourceFunction["WordPhoneticSyllabify"] places a bullet character • between adjacent syllables.
ResourceFunction["WordPhoneticSyllabify"] syllabifies the two parts of compound words separately.
ResourceFunction["WordPhoneticSyllabify"] divides syllables using the Maximal Onset Principle and does not take morphology into consideration.
ResourceFunction["WordPhoneticSyllabify"] is case-insensitive and returns results in lower case.
The second item of the output consists of phonetic symbols, some of which are easily confused with keyboard characters: ɡ (LatinSmallLetterScriptG) with the letter g, ˈ (ModifierLetterVerticalLine) with the apostrophe, and ˌ (ModifierLetterLowVerticalLine) with the comma.
Phonetic output of ResourceFunction["WordPhoneticSyllabify"][w] is Standard Modern English consistent with WordData[w,"PhoneticForm"], even if w is not English or includes characters that are not part of the English alphabet.
The "Stress" option, when set to "Metric", changes the phonetic part of the output from having primary and secondary stress marks (ˈ and ˌ) common in phonetics, to having marks for stressed and unstressed syllables (↑ and ↓) for the analysis of poetic meter.
Though similar to hyphenation and morphological syllabification, phonetic syllabification follows different rules. The most notable difference is that consonant sounds attach to the beginning of a syllable whenever phonetically feasible (e.g., strˈaɪ•kɪŋ instead of strˈaɪk•ɪŋ).
ResourceFunction["WordPhoneticSyllabify"] has the attribute Listable.
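
The Maximal Onset Principle mentioned above can be illustrated with a minimal sketch. This is not the function's actual implementation: the helper name maxOnsetSplit and the tiny vowels and legalOnsets inventories are assumptions made up for this example, and stress marks are left out for simplicity. Between two vowels, the sketch hands the later syllable the longest run of consonants that forms a legal onset and leaves the rest with the earlier syllable.

```wolfram
(* Toy inventories for illustration only (assumptions, not the function's real data) *)
vowels = {"aɪ", "ɪ", "ɛ", "ə"};
legalOnsets = {{}, {"k"}, {"s"}, {"t"}, {"r"}, {"s", "t"}, {"t", "r"}, {"s", "t", "r"}};

(* maxOnsetSplit is a hypothetical helper: it takes a list of phoneme tokens
   and returns them joined into syllables separated by bullets *)
maxOnsetSplit[phonemes_List] := Module[{nuclei, cuts, starts, ends},
  (* positions of the syllable nuclei (vowels) *)
  nuclei = Pick[Range[Length[phonemes]], MemberQ[vowels, #] & /@ phonemes];
  (* for each pair of adjacent nuclei, give the later syllable the longest
     suffix of the intervening consonant cluster that is a legal onset *)
  cuts = Table[
    With[{cluster = phonemes[[nuclei[[i]] + 1 ;; nuclei[[i + 1]] - 1]]},
      nuclei[[i + 1]] - SelectFirst[Range[Length[cluster], 0, -1],
        MemberQ[legalOnsets, Take[cluster, -#]] &]],
    {i, Length[nuclei] - 1}];
  starts = Prepend[cuts, 1];
  ends = Append[cuts - 1, Length[phonemes]];
  StringRiffle[StringJoin /@ MapThread[phonemes[[#1 ;; #2]] &, {starts, ends}], "•"]]

maxOnsetSplit[{"s", "t", "r", "aɪ", "k", "ɪ", "ŋ"}]  (* "straɪ•kɪŋ" *)
maxOnsetSplit[{"ɛ", "k", "s", "t", "r", "ə"}]        (* "ɛk•strə" *)
```

In the second call the cluster k-s-t-r is not a legal onset in the toy inventory, so only s-t-r moves to the second syllable and k stays behind as a coda.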

Examples

Basic Examples (1) 

Find the phonetic syllable breaks in a word:

In[1]:=
ResourceFunction["WordPhoneticSyllabify"]["computational"]
Out[1]=

Scope (2) 

Apply WordPhoneticSyllabify[w] to a list of words:

In[2]:=
ResourceFunction["WordPhoneticSyllabify", ResourceVersion->"1.0.0"][{"Once", "upon", "a", "midnight", "dreary"}]
Out[2]=

Find the phonetic syllable divisions of a name (as pronounced in English):

In[3]:=
ResourceFunction["WordPhoneticSyllabify"]["Galadriel"]
Out[3]=

Options (1) 

Find the stressed and unstressed syllables for poetic meter by changing the "Stress" option to "Metric":

In[4]:=
ResourceFunction["WordPhoneticSyllabify"]["alertness", Stress -> "Metric"]
Out[4]=

Applications (3) 

Determine whether words rhyme by comparing everything from the primary stress mark to the end of each word:

In[5]:=
words = {"terrain", "remain"};
phons = ResourceFunction["WordPhoneticSyllabify"][words][[All, 2]]
Out[6]=
In[7]:=
lastsyllable = StringCases[#, "ˈ" ~~ __ ~~ EndOfString] & /@ phons
Out[7]=
In[8]:=
SameQ @@ lastsyllable (* SameQ applied to the list itself as a single argument would always give True *)
Out[8]=

Count the syllables in a line of poetry ("Sonnet Composed At ____ Castle" by William Wordsworth):

In[9]:=
verse = "Degenerate Douglas! oh, the unworthy Lord!";
phons = (ResourceFunction["WordPhoneticSyllabify"] /@ TextWords[verse])[[All, 1]]
Out[10]=
In[11]:=
StringCount[StringRiffle[phons], " " | "\[Bullet]"] + 1
Out[11]=

Count the feet in the same line of poetry:

In[12]:=
verse = "Degenerate Douglas! oh, the unworthy Lord!";
phons = (ResourceFunction["WordPhoneticSyllabify"][#, Stress -> "Metric"] & /@ TextWords[verse])[[All, 2]]
Out[13]=
In[14]:=
StringCount[StringRiffle[phons], "\[UpArrow]"]
Out[14]=

Properties and Relations (3) 

WordData can give a word's phonetic form without the syllable breaks:

In[15]:=
words = {"elephant", "cheetah", "hippopotamus"};
<|"WordData" -> <|(# -> WordData[#, "PhoneticForm"] &) /@ words|>,
   "WordPhoneticSyllabify" -> <|(# -> ResourceFunction["WordPhoneticSyllabify"][#][[2]] &) /@ words|>
   |> // Transpose // Dataset
Out[16]=

WordData can also provide the hyphenation breaks of a word, but notice that they are not always the same as phonetic syllable breaks:

In[17]:=
words = {"elephant", "cheetah", "hippopotamus"};
<|"WordData" -> <|(# -> WordData[#, "Hyphenation"] &) /@ words|>,
   "WordPhoneticSyllabify" -> <|(# -> ResourceFunction["WordPhoneticSyllabify"][#][[1]] &) /@ words|>
   |> // Transpose // Dataset
Out[18]=

WordData returns Missing for many proper nouns, poetic words, nonsense words, etc., while WordPhoneticSyllabify uses AI to supply the missing data:

In[19]:=
words = {"Lenore", "surcease", "Jabberwocky"};
<|"WordData" -> <|(# -> WordData[#, "PhoneticForm"] &) /@ words|>,
   "WordPhoneticSyllabify" -> <|(# -> ResourceFunction["WordPhoneticSyllabify"][#] &) /@ words|>
   |> // Transpose // Dataset
Out[20]=

Possible Issues (1) 

WordPhoneticSyllabify makes mistakes, mostly in the placement of syllable breaks, on about 1% of Modern English words and about 2% of Elizabethan English words:

In[21]:=
ResourceFunction["WordPhoneticSyllabify"][{"braceleted", "quietus"}]
Out[21]=

Neat Examples (1) 

Use WordPhoneticSyllabify as the foundation for scansion of a line of poetry (here, Emily Dickinson's 656th poem):

In[22]:=
words = TextWords["I started Early\[Dash]Took my Dog\[Dash]"];
sylData = ResourceFunction["WordPhoneticSyllabify"][#, Stress -> "Metric"] & /@
    words;
{abcSyl, phoSyl} = Table[Flatten[
    StringSplit[#, "\[Bullet]"] & /@ sylData[[All, i]]], {i, 2}];
rawSeq = Prepend[ReplaceAll[#, {a_ /; StringContainsQ[a, "\[UpArrow]"] -> "\[UpArrow]", a_ /; StringContainsQ[a, "\[DownArrow]"] -> "\[DownArrow]", _ ->
         "?"}] & /@ phoSyl, "<"];
seq = FixedPoint[SequenceReplace[#, {
      {"\[UpArrow]", "\[UpArrow]" | "?", "\[UpArrow]"} -> Sequence["\[UpArrow]", "\[DownArrow]", "\[UpArrow]"],
      {"<", "\[UpArrow]" | "?", "\[UpArrow]"} -> Sequence["\[DownArrow]", "\[UpArrow]"]
      }] &,
   rawSeq];
Grid[{seq /. {"\[UpArrow]" -> "/", "\[DownArrow]" -> "\[Cup]"}, abcSyl}]
Out[23]=

Publisher

Mark Greenberg

Requirements

Wolfram Language 13.0 (December 2021) or above

Version History

  • 1.0.1 – 15 July 2024
  • 1.0.0 – 28 February 2024

Author Notes

An API key for OpenAI is needed for WordPhoneticSyllabify to work on words that don't have a phonetic form in WordData. When using WordPhoneticSyllabify[w, Stress -> "Metric"][[2]] as the basis for scanning a poem, keep in mind that some syllables may carry neither mark (↑ or ↓). The stress of such unmarked syllables needs to be determined from the surrounding text. The surrounding text can also change the stress of a word. For example, most single-syllable words are stressed when taken alone, but a run of three stressed syllables, in most cases, destresses the middle one. (See the rules applied in the entry in the "Neat Examples" section.)
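
As a rough sketch of how the unmarked syllables might be picked out before deciding their stress from context, the snippet below splits a metric phonetic string at the bullets and keeps the syllables that contain neither arrow. The helper name unmarkedSyllables is hypothetical, and the input string is a made-up placeholder rather than real output of the function.

```wolfram
(* unmarkedSyllables is a hypothetical helper: it lists the syllables of a
   metric phonetic string that carry neither ↑ nor ↓ *)
unmarkedSyllables[phon_String] :=
  Select[StringSplit[phon, "•"], StringFreeQ[#, "↑" | "↓"] &]

(* Placeholder input, not real function output *)
unmarkedSyllables["↑xx•yy•↓zz"]  (* {"yy"} *)
```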