Function Repository Resource:

ElevenLabsHighlightSpeak

Dynamically highlight content as it is spoken

Contributed by: Bob Sandheinrich

ResourceFunction["ElevenLabsHighlightSpeak"][text]

creates an interface highlighting words in text as they are spoken.

ResourceFunction["ElevenLabsHighlightSpeak"][list]

attempts to intelligently speak the elements of list while highlighting parts.

ResourceFunction["ElevenLabsHighlightSpeak"][…,prop]

returns the specified property.

Details and Options

ElevenLabsHighlightSpeak requires an ElevenLabs service connection with a valid API key. Free trial keys are available by registering with ElevenLabs.
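As a rough sketch of setting up that connection ahead of time, the following uses the built-in ServiceConnect. The service name string "ElevenLabs" and the variable name elevenLabs are assumptions here, not taken from this page, and an API key prompt may appear on first use.

```wolfram
(* Assumption: the service connection is registered under the name "ElevenLabs" *)
elevenLabs = ServiceConnect["ElevenLabs"]

(* List the requests the connection exposes, to confirm it is working *)
elevenLabs["Requests"]
```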

For non-string inputs, ResourceFunction["ElevenLabsHighlightSpeak"] requires access to an LLM. By default, this is enabled by LLM Kit.

Values supported for the property prop include:

"List"    list with an AudioStream and dynamically highlighted content
"Interface"    content along with a button for playing the speech

ResourceFunction["ElevenLabsHighlightSpeak"][expr] is equivalent to ResourceFunction["ElevenLabsHighlightSpeak"][expr,"Interface"].
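A minimal sketch of this equivalence follows. The sample sentence is a placeholder, and both calls assume a configured ElevenLabs service connection.

```wolfram
(* Placeholder sentence; any string should behave the same way *)
text = "The quick brown fox jumps over the lazy dog.";

(* Default call: returns the interface with a play button *)
ResourceFunction["ElevenLabsHighlightSpeak"][text]

(* Explicit property: documented as equivalent to the call above *)
ResourceFunction["ElevenLabsHighlightSpeak"][text, "Interface"]
```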

ResourceFunction["ElevenLabsHighlightSpeak"] supports the following options:

"HighlightStyle"    Background→Green    style specification for the highlighted content
"HighlightSize"    "Word"    granularity to highlight within text
"SpokenStringMethod"    Automatic    how to convert expressions into strings
LLMEvaluator    $LLMEvaluator    service for generating spoken strings with "SpokenStringMethod"→"LLM"
"CacheAudio"    True    whether to cache audio generation

"HighlightSize" supports "Word" or "Character".

"SpokenStringMethod" accepts the following values:

SpokenString    local text creation with SpokenString
{"SplitList",f}    applies f only to non-string components
"LLM"    uses LLMFunction to create spoken strings

arbitrary function

The default "SpokenStringMethod"→Automatic is equivalent to "SpokenStringMethod"→{"SplitList","LLM"}.
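The settings above might be compared with a sketch like the one below. The list content is a made-up example, and passing SpokenString as the function f inside {"SplitList",f} is an assumption based on the table above rather than something this page demonstrates.

```wolfram
(* Hypothetical mixed list of text and a symbolic expression *)
content = {"The area of the circle is ", Pi r^2, "."};

(* The documented default, written out explicitly: the LLM handles only non-string parts *)
ResourceFunction["ElevenLabsHighlightSpeak"][content,
  "SpokenStringMethod" -> {"SplitList", "LLM"}]

(* Assumed local alternative: SpokenString as f, so no LLM access is needed *)
ResourceFunction["ElevenLabsHighlightSpeak"][content,
  "SpokenStringMethod" -> {"SplitList", SpokenString}]

(* Inspect what SpokenString alone produces for the non-string part *)
SpokenString[Pi r^2]
```

The local variant avoids the LLM dependency, at the likely cost of less natural phrasing for complicated expressions.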

In the "Interface" result, the play button becomes a pause button during playback.

Examples

Basic Examples (4)

Create an interface for playing spoken text while it is highlighted:

In[1]:=

Out[1]=

Non-string expressions are highlighted in whole:

In[2]:=

ResourceFunction["ElevenLabsHighlightSpeak"][(-b \[PlusMinus] Sqrt[b^2 - 4 a c])/(2 a)]

Out[2]=

Include a mix of text and mathematical expressions:

In[3]:=

ResourceFunction["ElevenLabsHighlightSpeak"][{"this is a list with math ", a x^2/3, ". here is more text, and more math: ", HoldForm[1 + 2 + 3]}]

Out[3]=

Create an output containing text:

In[4]:=

Out[4]=

In[5]:=

Out[5]=

Use the CellObject as an input:

In[6]:=

Out[6]=

Use the Cell expression:

In[7]:=

Out[7]=

In[8]:=

Out[8]=
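The original inputs for the cell-based examples are not preserved here. One way they could be approximated is sketched below, assuming the current notebook already has at least one "Output" cell with readable text. The variable name cellObj and the way the cell is located are illustrative choices only.

```wolfram
(* Assumption: the notebook contains at least one "Output" cell; take the most recent *)
cellObj = Last[Cells[EvaluationNotebook[], CellStyle -> "Output"]];

(* Pass the CellObject directly *)
ResourceFunction["ElevenLabsHighlightSpeak"][cellObj]

(* Or read the underlying Cell expression and pass that instead *)
ResourceFunction["ElevenLabsHighlightSpeak"][NotebookRead[cellObj]]
```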

Scope (2)

Get the audio stream and highlight content without a pre-built interface:

In[9]:=

Out[9]=

Play the stream to see the highlighting:

In[10]:=

Out[10]=
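A sketch of this two-step workflow is given below. It assumes the "List" result is ordered as {audio stream, highlighted content}, which follows the wording of the property description but is not confirmed here. The variable names are illustrative.

```wolfram
(* Assumption: the list is ordered {AudioStream, highlighted content} *)
{stream, highlighted} = ResourceFunction["ElevenLabsHighlightSpeak"][
   "Each word lights up as it is spoken.", "List"];

(* Display the dynamically highlighted content in the notebook *)
highlighted

(* Start playback; the displayed content should update as the audio plays *)
AudioPlay[stream]
```

Working with the raw pieces makes it possible to embed the highlighted content in a custom layout instead of the pre-built interface.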

Speak a combination of text and code:

In[11]:=

Out[11]=

Options (7)

ElevenLabsParameters (2)

Choose a voice from the ElevenLabs service connection:

In[12]:=

Out[12]=

Speak a combination of text and code using the selected voice:

In[13]:=

Out[13]=

HighlightStyle (1)

Control the highlight styling:

In[14]:=

Out[14]=

HighlightSize (2)

Highlight each character instead of each word in the text:

In[15]:=

Out[15]=

Also set the styling to something weird:

In[16]:=

Out[16]=
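As an illustration of combining these two options, the sketch below highlights per character with a non-default background. The sentence and the color are arbitrary choices, and only the Background→color form shown in the options table is used.

```wolfram
(* Highlight one character at a time with a custom background color *)
ResourceFunction["ElevenLabsHighlightSpeak"][
  "Highlight one character at a time.",
  "HighlightSize" -> "Character",
  "HighlightStyle" -> (Background -> Yellow)
]
```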

CacheAudio (2)

By default, audio responses are cached in memory for fast results on repeated requests:

In[17]:=

Out[17]=

In[18]:=

Out[18]=

Turn off the caching:

In[19]:=

Out[19]=
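One way to observe the effect of caching is to time repeated calls, as in the sketch below. The sentence is a placeholder and actual timings depend on the network and the service, so no expected values are shown.

```wolfram
text = "Caching avoids a second request for the same audio.";

(* First call generates audio through the service; the repeat may come from the cache *)
AbsoluteTiming[ResourceFunction["ElevenLabsHighlightSpeak"][text, "List"];]
AbsoluteTiming[ResourceFunction["ElevenLabsHighlightSpeak"][text, "List"];]

(* With caching disabled, each call should request fresh audio *)
AbsoluteTiming[
 ResourceFunction["ElevenLabsHighlightSpeak"][text, "List", "CacheAudio" -> False];]
```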

Possible Issues (2)

Cell content that is hard to read as natural language gives strange results:

In[20]:=

Out[20]=

In[21]:=

Out[21]=

Large content is summarized automatically to save time and cost:

In[22]:=

Out[22]=

In[23]:=

Out[23]=

The spoken text is usually not helpful:

In[24]:=

Out[24]=

Version History

1.0.0 – 10 January 2025

Author Notes

Many improvements are possible, including chunking of inputs and prettier styling of highlighted text. I hope to improve this in the future and possibly generalize it to other text-to-speech services. Currently, ElevenLabs gives the best timestamp information through the service connection.

License Information

This work is licensed under a Creative Commons Attribution 4.0 International License.