Function Repository Resource:

WordPhoneticSyllabify

Segment an English word and its phonetic form into syllables following phonetic rules

Contributed by: Mark Greenberg

ResourceFunction["WordPhoneticSyllabify"][w]

returns a list containing both w and its phonetic form with syllable delimiters.

Details and Options

ResourceFunction["WordPhoneticSyllabify"] takes the following options:
AllowMissing    False    whether to return Missing for an unknown word when there is no OpenAI key
Stress    "Phonetic"    whether to mark primary and secondary stress (ˈ and ˌ) for phonetics, or on and off beats (↑ and ↓) for poetic meter
With the option AllowMissing set to False (the default), whenever ResourceFunction["WordPhoneticSyllabify"] encounters a word w for which WordData[w,"PhoneticForm"] returns Missing["NotAvailable"], the Wolfram system checks whether LLM credentials are set and prompts the user if they are not.
With the AllowMissing option set to True, ResourceFunction["WordPhoneticSyllabify"] returns Missing when there is no OpenAI key saved and otherwise uses an LLM to resolve the unknown word's phonetics. This may be preferable when mapping WordPhoneticSyllabify over a list of words.
The "Stress" option, when set to "Metric", changes the phonetic part of the output from having primary and secondary stress marks (ˈ and ˌ) common in phonetics, to having marks for stressed and unstressed syllables (↑ and ↓) for the analysis of poetic meter.
You can check whether an API key is set by evaluating SystemCredential["OPENAI_API_KEY"]. If needed, the key can be set using SystemCredential["OPENAI_API_KEY"]="your-key-here".
ResourceFunction["WordPhoneticSyllabify"] places a bullet character • between each syllable.
ResourceFunction["WordPhoneticSyllabify"] syllabifies the two parts of compound words separately.
ResourceFunction["WordPhoneticSyllabify"] divides syllables using the Maximal Onset Principle and does not take morphology into consideration.
ResourceFunction["WordPhoneticSyllabify"] is case-insensitive and returns results in lower case.
The second item of the output consists of phonetic symbols, some of which can be confused with keyboard characters, including ɡ (LatinSmallLetterScriptG), ˈ (ModifierLetterVerticalLine), and ˌ (ModifierLetterLowVerticalLine).
Phonetic output of ResourceFunction["WordPhoneticSyllabify"][w] is General American English consistent with WordData[w,"PhoneticForm"], even if w is not English or includes characters that are not part of the English alphabet.
Though similar, phonetic syllabification follows different rules than either hyphenation or morphologic syllabification, the most notable difference being consonant sounds that attach to the beginning of syllables if phonetically feasible (e.g., strˈaɪ•kɪŋ instead of strˈaɪk•ɪŋ).
ResourceFunction["WordPhoneticSyllabify"] has the attribute Listable.
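
The Maximal Onset Principle mentioned in the details above can be illustrated with a toy sketch. This is not the function's actual implementation: the names vowels, legalOnsets, maxOnset, and toySyllabify are hypothetical, the tiny vowel and onset inventories are assumptions chosen only to cover the example, and stress marks and adjacent vowels are ignored.

```wolfram
(* Toy inventories: illustrative assumptions, not the function's real tables *)
vowels = {"aɪ", "ɪ", "æ", "ə"};
legalOnsets = {{}, {"k"}, {"t"}, {"s"}, {"r"}, {"s", "t"}, {"t", "r"}, {"s", "t", "r"}};

(* Longest suffix of a consonant cluster that is a legal onset *)
maxOnset[cluster_List] := First[
  Select[Table[Take[cluster, -k], {k, Length[cluster], 0, -1}],
   MemberQ[legalOnsets, #] &]]

(* Split each word-internal consonant cluster so that the following
   syllable receives the largest legal onset *)
toySyllabify[phonemes_List] := Module[{runs, n, pieces},
  runs = SplitBy[phonemes, MemberQ[vowels, #] &];
  n = Length[runs];
  pieces = MapIndexed[
    Function[{run, idx},
     Which[
      MemberQ[vowels, First[run]], run,          (* vowel run: nucleus *)
      First[idx] == 1 || First[idx] == n, run,   (* word-initial onset or word-final coda *)
      True, With[{on = maxOnset[run]},
       Join[Drop[run, -Length[on]], {"\[Bullet]"}, on]]]],
    runs];
  StringJoin[Flatten[pieces]]]

(* Under these assumptions this gives "straɪ•kɪŋ", matching the break in the striking example *)
toySyllabify[{"s", "t", "r", "aɪ", "k", "ɪ", "ŋ"}]
```

Because "k" is a legal onset in the toy inventory, it attaches to the second syllable rather than closing the first, which is the behavior the details describe.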

Examples

Basic Examples (1) 

Find the phonetic syllable breaks in a word:

In[1]:=
ResourceFunction["WordPhoneticSyllabify"]["computational"]
Out[1]=

Scope (2) 

Apply WordPhoneticSyllabify[w] to a list of words:

In[2]:=
ResourceFunction[
 "WordPhoneticSyllabify"][{"Once", "upon", "a", "midnight", "dreary"}]
Out[2]=

Find the phonetic syllable divisions of a name (as pronounced in English):

In[3]:=
ResourceFunction["WordPhoneticSyllabify"]["Galadriel"]
Out[3]=

Options (1) 

Find the stressed and unstressed syllables for poetic meter by changing the "Stress" option to "Metric":

In[4]:=
ResourceFunction["WordPhoneticSyllabify"]["alertness", Stress -> "Metric"]
Out[4]=

Applications (3) 

Determine whether words rhyme by comparing everything from the primary stress mark to the end of each phonetic form:

In[5]:=
words = {"terrain", "remain"};
phons = ResourceFunction["WordPhoneticSyllabify"][words][[All, 2]]
Out[6]=
In[7]:=
lastsyllable = StringCases[#, "ˈ" ~~ __ ~~ EndOfString] & /@ phons
Out[7]=
In[8]:=
SameQ @@ lastsyllable
Out[8]=
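
The comparison above could be wrapped in a pair of small helpers. This is a minimal sketch: rhymePart and rhymeQ are hypothetical names, not part of WordPhoneticSyllabify, and the fallback to the whole string when no primary stress mark is present is an assumption.

```wolfram
(* Hypothetical helpers, not part of WordPhoneticSyllabify *)

(* Everything from the first primary stress mark (ˈ) to the end of the string;
   falls back to the whole string if no primary stress mark is present *)
rhymePart[phon_String] :=
 First[StringCases[phon, "ˈ" ~~ __ ~~ EndOfString], phon]

(* Two phonetic forms rhyme, in this simple sense, if their rhyme parts are identical *)
rhymeQ[a_String, b_String] := rhymePart[a] === rhymePart[b]
```

With network access to the Function Repository, the helpers might be applied as rhymeQ @@ ResourceFunction["WordPhoneticSyllabify"][{"terrain", "remain"}][[All, 2]].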

Count the syllables in a line of poetry ("Sonnet Composed At ____ Castle" by William Wordsworth):

In[9]:=
verse = "Degenerate Douglas! oh, the unworthy Lord!";
phons = (ResourceFunction["WordPhoneticSyllabify"] /@ TextWords[verse])[[All, 1]]
Out[10]=
In[11]:=
StringCount[StringRiffle[phons], " " | "\[Bullet]"] + 1
Out[11]=

Count the feet in the same line of poetry:

In[12]:=
verse = "Degenerate Douglas! oh, the unworthy Lord!";
phons = (ResourceFunction["WordPhoneticSyllabify"][#, Stress -> "Metric"] & /@ TextWords[verse])[[All, 2]]
Out[13]=
In[14]:=
StringCount[StringRiffle[phons], "\[UpArrow]"]
Out[14]=
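
When counts are wanted per word rather than per line, the same StringCount idea can be applied to each string separately. In this sketch wordCounts is a hypothetical helper, and the sample string is hand-built from placeholder letters in the documented format (bullets between syllables, ↑ and ↓ as beat marks); it is not actual output of the function.

```wolfram
(* Hypothetical helper: per-word syllable and on-beat counts
   from a string in the "Metric" output format *)
wordCounts[phon_String] := <|
  "Syllables" -> StringCount[phon, "\[Bullet]"] + 1,
  "OnBeats" -> StringCount[phon, "\[UpArrow]"]|>

(* Hand-built placeholder string, not real function output *)
wordCounts["a\[DownArrow]\[Bullet]b\[UpArrow]\[Bullet]c"]
(* gives <|"Syllables" -> 3, "OnBeats" -> 1|> *)
```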

Properties and Relations (3) 

WordData can give a word's phonetic form without the syllable breaks:

In[15]:=
words = {"elephant", "cheetah", "hippopotamus"};
<|"WordData" -> <|(# -> WordData[#, "PhoneticForm"] &) /@ words|>,
   "WordPhoneticSyllabify" -> <|(# -> ResourceFunction["WordPhoneticSyllabify"][#][[2]] &) /@ words|>
   |> // Transpose // Dataset
Out[16]=

WordData can also provide the hyphenation breaks of a word, but notice that they are not always the same as phonetic syllable breaks:

In[17]:=
words = {"elephant", "cheetah", "hippopotamus"};
<|"WordData" -> <|(# -> WordData[#, "Hyphenation"] &) /@ words|>,
   "WordPhoneticSyllabify" -> <|(# -> ResourceFunction["WordPhoneticSyllabify"][#][[1]] &) /@ words|>
   |> // Transpose // Dataset
Out[18]=

WordData returns Missing for many proper nouns, poetic words, nonsense words, etc., while WordPhoneticSyllabify uses AI to supply the missing data:

In[19]:=
words = {"Lenore", "surcease", "Jabberwocky"};
<|"WordData" -> <|(# -> WordData[#, "PhoneticForm"] &) /@ words|>,
   "WordPhoneticSyllabify" -> <|(# -> ResourceFunction["WordPhoneticSyllabify"][#] &) /@ words|>
   |> // Transpose // Dataset
Out[20]=

Possible Issues (1) 

WordPhoneticSyllabify does make mistakes, mostly in the placement of syllable breaks, on about 1% of Modern English words and about 2% of Elizabethan English words:

In[21]:=
ResourceFunction["WordPhoneticSyllabify"][{"braceleted", "quietus"}]
Out[21]=

Neat Examples (1) 

Use WordPhoneticSyllabify as the foundation for scansion of a line of poetry (here, Emily Dickinson's 656th poem):

In[22]:=
words = TextWords["I started Early\[Dash]Took my Dog\[Dash]"];
sylData = ResourceFunction["WordPhoneticSyllabify"][#, Stress -> "Metric"] & /@
    words;
{abcSyl, phoSyl} = Table[Flatten[
    StringSplit[#, "\[Bullet]"] & /@ sylData[[All, i]]], {i, 2}];
rawSeq = Prepend[ReplaceAll[#, {a_ /; StringContainsQ[a, "\[UpArrow]"] -> "\[UpArrow]", a_ /; StringContainsQ[a, "\[DownArrow]"] -> "\[DownArrow]", _ ->
         "?"}] & /@ phoSyl, "<"];
seq = FixedPoint[SequenceReplace[#, {
      {"\[UpArrow]", "\[UpArrow]" | "?", "\[UpArrow]"} -> Sequence["\[UpArrow]", "\[DownArrow]", "\[UpArrow]"],
      {"<", "\[UpArrow]" | "?", "\[UpArrow]"} -> Sequence["\[DownArrow]", "\[UpArrow]"]
      }] &,
   rawSeq];
Grid[{seq /. {"\[UpArrow]" -> "/", "\[DownArrow]" -> "\[Cup]"}, abcSyl}]
Out[23]=

Publisher

Mark Greenberg

Requirements

Wolfram Language 13.0 (December 2021) or above

Version History

  • 1.0.1 – 15 July 2024
  • 1.0.0 – 28 February 2024

Author Notes

When using WordPhoneticSyllabify[w, Stress -> "Metric"][[2]] as the basis for scanning a poem, keep in mind that some of the syllables may not have either mark (↑ or ↓). The stress of such unmarked syllables needs to be determined in the context of the surrounding text. The surrounding text can also change the stress of a word. For example, most single-syllable words are stressed when taken alone, but a run of three stressed syllables, in most cases, destresses the middle one. (See the rules applied in the entry in the "Neat Examples" section.)
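
The destressing rule described in this note can be tried in isolation on a plain list of marks. The sketch below reuses the first replacement rule from the Neat Examples entry; destress is a hypothetical name, and the input list is a made-up sequence of marks rather than the scansion of any real line.

```wolfram
(* Hypothetical helper: in a run of three on-beats (or an unmarked "?"
   syllable between two on-beats), turn the middle syllable into an off-beat.
   FixedPoint repeats the replacement until nothing changes. *)
destress[seq_List] := FixedPoint[
  SequenceReplace[#,
    {"\[UpArrow]", "\[UpArrow]" | "?", "\[UpArrow]"} ->
     Sequence["\[UpArrow]", "\[DownArrow]", "\[UpArrow]"]] &,
  seq]

(* Made-up sequence of marks for illustration *)
destress[{"\[UpArrow]", "\[UpArrow]", "\[UpArrow]", "\[DownArrow]", "\[UpArrow]"}]
```

Keeping the rule in its own function makes it easier to experiment with further context rules without touching the rest of the scansion pipeline.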
