Function Repository Resource:

YouTubeTranscript (1.0.0); current version: 1.1.0


Obtain YouTube transcripts

Contributed by: Anton Antonov

ResourceFunction["YouTubeTranscript"][id]

gets the transcript of the YouTube video with identifier id.

Details

ResourceFunction["YouTubeTranscript"] extracts the captions of the video, if they exist.
The transcript is returned as plain text.
The YouTube Data API has usage quotas.
Not all YouTube videos have automatic or manual captions. If no captions are available, the function returns a Failure object.
ResourceFunction["YouTubeTranscript"] retrieves and processes the metadata field "captionTracks" from the YouTube Data API.
The field "captionTracks" is an array of objects, where each object represents a single caption track (e.g., for a specific language or type).
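
To make the shape of such an array concrete, here is a minimal sketch that uses hand-written data rather than a real response. The keys "languageCode", "kind", and "baseUrl", the example.invalid URLs, and the helper name pickTrack are assumptions made for illustration; they are not confirmed details of this function or of the data it reads.

```wolfram
(* Hypothetical data: keys and URLs are illustrative assumptions, not real output *)
captionTracks = {
   <|"languageCode" -> "en", "kind" -> "asr",
    "baseUrl" -> "https://example.invalid/captions?lang=en"|>,
   <|"languageCode" -> "de",
    "baseUrl" -> "https://example.invalid/captions?lang=de"|>
   };

(* pickTrack is a hypothetical helper: return the first track for the given
   language, or Missing["NotAvailable"] if no track matches *)
pickTrack[tracks_List, lang_String] :=
  SelectFirst[tracks, Lookup[#, "languageCode"] === lang &,
   Missing["NotAvailable"]];

(* Address of the English track in the sample data *)
pickTrack[captionTracks, "en"]["baseUrl"]

(* No French track exists in the sample data *)
pickTrack[captionTracks, "fr"]
```

Treating each track as an Association keeps the per-language selection a one-line SelectFirst call with an explicit default for the no-match case.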

Examples

Basic Examples (2) 

Get a video transcript:

In[1]:=
transcript = ResourceFunction["YouTubeTranscript"]["ewU83vHwN8Y"];
transcript // StringLength
Out[2]=

Here is an excerpt:

In[3]:=
SeedRandom[332];
lines = StringSplit[transcript, "\n"];
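(* Start index must be at least 1: RandomInteger[n] can return 0, which is not a valid Part position *)
p = RandomInteger[{1, Length[lines] - 10}];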
lines[[p ;; p + 10]] // StringRiffle[#, "\n"] &
Out[6]=

Scope (1) 

If the video identifier is not found or the video has no captions, then a Failure object is returned:

In[7]:=
ResourceFunction["YouTubeTranscript"]["89328ewU83vHwN8Y"]
Out[7]=
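
Because a Failure object is returned rather than a string, calling code may want to test the result before passing it to string functions. The following is a minimal sketch of one way to do that; the helper name safeTranscript and the choice of Missing["NoTranscript", id] as a fallback value are assumptions for illustration, not part of the resource function.

```wolfram
(* safeTranscript is a hypothetical wrapper, not part of YouTubeTranscript *)
safeTranscript[id_String] := Module[{res},
   res = ResourceFunction["YouTubeTranscript"][id];
   (* FailureQ gives True for Failure objects and for $Failed *)
   If[FailureQ[res], Missing["NoTranscript", id], res]
   ];

(* Same unresolvable identifier as in the example above *)
safeTranscript["89328ewU83vHwN8Y"]
```

Returning a Missing value lets a batch of identifiers be processed with Map and filtered afterwards with DeleteMissing.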

Applications (3) 

Get a video transcript:

In[8]:=
transcript = ResourceFunction["YouTubeTranscript"]["_yUW-TGGKOc"];

Show the number of characters, words, and lines:

In[9]:=
Clear[TextStats];
TextStats[txt_String] := AssociationThread[
   {"Characters", "Words", "Lines"},
   Through[{StringLength, Length@*TextWords, Length[StringSplit[#, "\n"]] &}[txt]]];
TextStats[transcript]
Out[10]=

Summarize the transcript:

In[11]:=
LLMResourceFunction["Summarize"][transcript]
Out[11]=

Neat Examples (1) 

Get a video transcript and show a table of its themes:

In[12]:=
transcript = ResourceFunction["YouTubeTranscript"]["_yUW-TGGKOc"];
Clear[GridTableFormFromJSON];
GridTableFormFromJSON[json_String] := ResourceFunction["GridTableForm"][
   Dataset[Association /@ ImportString[StringReplace[json, {"```json" -> "", "```" -> ""}],
        "JSON"]] /. {x_String :> Style[x, FontFamily -> "Times New Roman"]}];
GridTableFormFromJSON[
 LLMResourceFunction["ThemeTableJSON"][transcript, "article", 30]]
Out[15]=

Publisher

Anton Antonov

Requirements

Wolfram Language 14.0 (January 2024) or above

Version History

  • 1.1.0 – 25 April 2025
  • 1.0.0 – 23 April 2025
