Function Repository Resource:

LinguaStopwords

Stopwords for multiple languages

Contributed by: Anton Antonov

ResourceFunction["LinguaStopwords"][lang]

gives the stopwords for the language lang.

Details

ResourceFunction["LinguaStopwords"] provides stopwords for 58 languages.
ResourceFunction["LinguaStopwords"] accepts language names given as strings, as well as language and country entities.
The stopwords of all languages can be retrieved with the argument All.

Examples

Basic Examples (4) 

Armenian stopwords:

In[1]:=
ResourceFunction["LinguaStopwords"]["Armenian"]
Out[1]=

Bulgarian stopwords:

In[2]:=
ResourceFunction["LinguaStopwords"]["Bulgarian"]
Out[2]=

Hindi stopwords:

In[3]:=
ResourceFunction["LinguaStopwords"]["Hindi"]
Out[3]=

Zulu stopwords:

In[4]:=
ResourceFunction["LinguaStopwords"]["Zulu"]
Out[4]=

Scope (2) 

ResourceFunction["LinguaStopwords"] accepts entities as arguments. Here is an example with a language entity:

In[5]:=
Shallow@ResourceFunction["LinguaStopwords"][
  Entity["Language", "Bulgarian::xmr5j"]]
Out[5]=

Here is an example with a country entity:

In[6]:=
Shallow@ResourceFunction["LinguaStopwords"][
  Entity["Country", "Romania"]]
Out[6]=

The stopwords for all languages can be obtained with the argument All:

In[7]:=
Length /@ ResourceFunction["LinguaStopwords"][All]
Out[7]=
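
As a minimal sketch building on the input above, the per-language sizes can be inspected further. The variable names `all` and `sizes` are illustrative only, and the sketch assumes nothing about the result beyond what `Length /@` already relies on:

```wl
(* Retrieve the stopwords for all languages once and reuse the result *)
all = ResourceFunction["LinguaStopwords"][All];

(* Number of stopwords in each language's collection *)
sizes = Length /@ all;

(* The five largest and the five smallest collections *)
TakeLargest[sizes, 5]
TakeSmallest[sizes, 5]
```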

Applications (2) 

Remove the stopwords from a text and show the top word counts:

In[8]:=
TakeLargest[#, 20] &@GroupBy[#, Identity, Length] &@
 Select[ToLowerCase@
   TextWords[ExampleData[{"Text", "UNHumanRightsRussian"}]],
  ! MemberQ[ResourceFunction["LinguaStopwords"]["Russian"], #] &]
Out[8]=
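
The input above looks up the stopword list inside the selection test, so the lookup is repeated for every word. One possible variation, sketched here under the assumption that the result is a list of lowercase strings (as the `MemberQ` test above implies), fetches the list once and removes the stopwords by pattern. The helper name `removeStopwords` is hypothetical and is not part of the resource function:

```wl
(* Fetch the Russian stopwords once, outside the filtering step *)
stopwords = ResourceFunction["LinguaStopwords"]["Russian"];

(* Hypothetical helper: delete every word that matches one of the stopwords *)
removeStopwords[words_List, stops_List] :=
  DeleteCases[words, Alternatives @@ stops];

(* Lowercase words of the example text used above *)
words = ToLowerCase@
   TextWords[ExampleData[{"Text", "UNHumanRightsRussian"}]];

(* Counts tallies each distinct word; TakeLargest keeps the 20 most frequent *)
TakeLargest[Counts[removeStopwords[words, stopwords]], 20]
```

`Counts` is used here in place of `GroupBy[#, Identity, Length] &`; both produce an association from each distinct word to its number of occurrences.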

Show the top word counts without stopword removal:

In[9]:=
TakeLargest[#, 20] &@GroupBy[#, Identity, Length] &@
 ToLowerCase@TextWords[ExampleData[{"Text", "UNHumanRightsRussian"}]]
Out[9]=

Neat Examples (1) 

Plot how closely the sizes of the stopword collections adhere to the Pareto principle:

In[10]:=
ResourceFunction["ParetoPrinciplePlot"][
 Length /@ ResourceFunction["LinguaStopwords"][All], PlotRange -> All,
  ImageSize -> Large]
Out[10]=

Publisher

Anton Antonov

Version History

  • 1.0.0 – 26 April 2022
