FaizonZaman/LexicalCases

Extract lexical patterns from text

Contributed By: Faizon Zaman

Search text for lexical patterns. The functionality is similar to StringCases and TextCases, but content types can be used freely inside a StringExpression instead of expressing complex lexical patterns through a Containing wrapper. Files, search index objects, and Wikipedia queries are supported as sources. Results are returned in a summary object that supports several subvalues, such as "Dataset", "CountGroups", and "WordCloud". Consult the documentation for usage.

Installation Instructions

To install this paclet in your Wolfram Language environment, evaluate this code:
PacletInstall["FaizonZaman/LexicalCases"]

Details

I developed this functionality for a particular work task (Lexical Programmer @ Wolfram|Alpha): finding adjectives that precede certain phrases. I started by gathering Wikipedia article text from the relevant domains, then searched it for the desired lexical sequences. The initial approach used TextCases and TextContents with the Containing wrapper (for example, Containing["AdjectivePhrase",Verbatim["music"]]), but it was slow. So I designed lexical tokens that can be placed in a StringExpression and matched with StringCases.
TextCases is used internally to extract examples of a content type from the source text. These examples then replace the TextType token in the StringExpression. For example, if your lexical pattern contains TextType["Adjective"], that token is replaced by every text snippet identified as an adjective in the source text. You can inspect the lexical pattern generated from a piece of text with ExpandPattern.
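The substitution described above can be sketched with built-in functions alone. This is an illustration under stated assumptions, not the paclet's implementation: the sample sentence and the helper expandToken are hypothetical, and the paclet's own TextType and ExpandPattern are deliberately not called here because their exact signatures are not given on this page.

```wolfram
(* Sketch only: built-in functions, not the paclet's internal code. *)
(* The sample text is a made-up sentence used for illustration. *)
text = "She played loud music while he preferred quiet music and classical music.";

(* Step 1: collect every snippet that TextCases identifies as an adjective. *)
adjectives = DeleteDuplicates[TextCases[text, "Adjective"]];

(* Step 2: turn those snippets into alternatives that stand in for the token. *)
(* expandToken is a hypothetical helper, not part of the paclet. *)
expandToken[candidates_List] :=
  WordBoundary ~~ (Alternatives @@ candidates) ~~ WordBoundary;

(* Step 3: the result is an ordinary string pattern, so StringCases can match it. *)
pattern = expandToken[adjectives] ~~ " " ~~ "music";
StringCases[text, pattern]
```

Which snippets appear in the result depends on what TextCases tags as an adjective in the given text, so no output is shown for the full example. With a fixed candidate list the behavior is deterministic: StringCases["loud music and quiet music", expandToken[{"loud", "quiet"}] ~~ " music"] returns {"loud music", "quiet music"}. The design point is that the expensive content-type detection runs once over the source, after which matching is plain string matching.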
I’d also like to say thanks to Swastik Banerjee for helping me improve SearchIndexObject support!

Examples

Basic Examples (1) 

Search for verb phrases beginning with "Alice" in "Alice in Wonderland":

In[1]:=
alice = ExampleData[{"Text", "AliceInWonderland"}];
In[2]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/f94107e4-a8f0-4f38-8933-cd986148a702"]
Out[2]=
In[3]:=
aliceVbAvb["Dataset"]
Out[3]=
In[4]:=
aliceVbAvb["CountGroups"]
Out[4]=

Scope (2) 

Search for a lexical pattern in Wikipedia articles containing "darwin":

In[5]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/1a20e2fe-25a3-4cc7-9968-176a589f33d0"]
In[6]:=
darwin["CountGroups"]
Out[6]=

Search over index objects:

In[7]:=
index = CreateSearchIndex["ExampleData/Text"]
Out[7]=
In[8]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/7c1a8c0c-b013-468e-885a-83ef50294718"]

Visualize the results in a WordCloud:

In[9]:=
indexResults["WordCloud"]
Out[9]=

Publisher

Faizon Zaman

Disclosures

Compatibility

Wolfram Language Version 12.3

Version History

  • 1.2.1 – 15 September 2022
  • 1.2.0 – 15 September 2022
  • 1.1.2 – 15 September 2022
  • 1.1.0 – 12 September 2022
  • 1.0.5 – 26 July 2022

License Information

MIT License
