FaizonZaman/LexicalCases | Paclet Repository

LexicalCases Overview

Installation
Introduction
LexicalSummary Properties
Searching Files and Services
Built-in String Functions
Abstractions
Conclusion

Installation

This paclet is hosted on the Wolfram Paclet Repository. First install the paclet from the ResourceObject, then load it with Needs.

Install the paclet and load its definitions with Needs

PacletInstall[ResourceObject["FaizonZaman/LexicalCases"]];Needs["FaizonZaman`LexicalCases`"]

Introduction

Where StringCases aims for character patterns and TextCases for text patterns, LexicalCases aims for lexical patterns. It does this by expanding the scope of StringExpression so that text content types can be expressed anywhere within a pattern. Listed below are the pattern types one can use when searching at the lexical level.

TypeToken["type"]	Represents a TextContentType
TypeToken[t1|…|ti]	Represents any of the ti
BoundToken[lp]	Represents a lexical pattern bounded by WordBoundary
BoundToken[lp1|…|lpi]	Represents any of the lpi with explicit boundaries
BoundToken[outer,inner]	Represents a string with inner bounded by outer
WordToken[n]	Represents n words separated by whitespace
WordToken[n,m]	Represents n to m words separated by whitespace
WordToken[n,"KeepContractions"]	Represents n words, with contractions counted as single words
WordToken[n,m,"KeepContractions"]	Represents n to m words, with contractions counted as single words
OptionalToken[lp]	Represents an optional lexical pattern

Additional pattern constructs made available by the LexicalCases paclet.
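To make the distinction between character patterns and text patterns concrete, here is a minimal sketch that uses only built-in functions, so it runs without the paclet. The first call is a character-level pattern with explicit WordBoundary markers, which is roughly the behaviour the table describes for BoundToken (an analogy, not the paclet's implementation). The second call asks TextCases for a grammatical type. The sample strings are made up for illustration.

```wolfram
(* Sample sentence invented for this sketch *)
text = "Many species and one specie, but not speciesism.";

(* Character-level search: alternatives wrapped in explicit word boundaries,
   so "speciesism" is not matched *)
bounded = StringCases[text, WordBoundary ~~ ("specie" | "species") ~~ WordBoundary]
(* {"species", "specie"} *)

(* Text-level search: extract words that are tagged as adjectives *)
adjectives = TextCases["The remarkably old species survived.", "Adjective"]
```

Neither built-in call on its own expresses "an adjective followed by a bounded word" as one pattern; that combination is the gap the tokens in the table are meant to fill.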

Here we search the Origin of Species for adverb-adjective patterns ending with "specie" or "species". Note the whitespace string between TypeToken["Adjective"] and BoundToken. Whitespace is given "for free" between some tokens, as between the two TypeToken expressions here. LexicalCases returns a LexicalSummary object with several properties.

Search the Origin of Species for a lexical pattern

In[22]:= oosp=ExampleData[{"Text","OriginOfSpecies"}];

In[25]:= ols=LexicalCases[oosp,TypeToken["Adverb"]~~TypeToken["Adjective"]~~" "~~BoundToken["specie"|"species"]]

Out[25]= LexicalSummary[Source: Text, Matches: 74]



LexicalSummary Properties

LexicalSummary properties range from the match data itself to metadata about the search. They give access to the results in several forms, including raw data, datasets, and visualizations.

"Data"	Returns a list of Associations which contain the match and related metadata like string position
"Dataset"	Returns "Data" as a Dataset
"Counts"	Returns a Dataset of matches and their counts
"CountGroups"	Returns a Dataset of matches grouped by their count
"CountGroupPercentages"	Returns "CountGroups" with the count replaced by percentage
"PartOfSpeechGroups"	Returns a Dataset of unique words in matches grouped by POS
"WordStemCountGroups"	Returns a Dataset of unique words in matches grouped by word stem
"Source"	Returns a string indicating whether the source is "Wikipedia" or "Text"
"TotalMatchCount"	Returns the total number of matches found
"LexicalStructure"	Returns a text structure diagram of the lexical pattern
"Survey"	Returns a Dashboard of results

LexicalSummary properties

View the list of properties by calling "Properties" on the LexicalSummary object.

View the list of properties.

In[26]:= ols["Properties"]

Out[26]= {Data, Dataset, Counts, CountGroups, CountGroupPercentages, LowercaseCountGroupPercentages, PartOfSpeechGroups, WordStemCountGroups, Source, TotalMatchCount, LexicalStructure, Survey}

LexicalStructure

Use the "LexicalStructure" property or the LexicalStructure function to visualize a lexical pattern.

View a lexical pattern's structure via the "LexicalStructure" property

In[12]:= ols["LexicalStructure"]

Out[12]= [Tree diagram: a StringExpression root with the TextType nodes Adverb and Adjective, a Text node for the whitespace string, and a BoundToken node over Alternatives of the Text nodes "specie" and "species"]

View a lexical pattern's structure via the LexicalStructure function

In[11]:= LexicalStructure[TypeToken["Adverb"]~~TypeToken["Adjective"]~~" "~~BoundToken["specie"|"species"]]

Out[11]= [Tree diagram: a StringExpression root with the TextType nodes Adverb and Adjective, a Text node for the whitespace string, and a BoundToken node over Alternatives of the Text nodes "specie" and "species"]

Data

Use the "Data" and "Dataset" properties to extract the results from the summary object.

View search results in a Dataset

In[27]:= ols["Dataset"]

Out[27]=

Match	Position
generally extinct species	{9949,9973}
aboriginally distinct species	{27844,27872} (3 total)
as well-defined species	{39968,39990}
so many species	{42078,42092}
extraordinarily abnormal species	{43663,43694}
as distinct species	{85564,85582} (16 total)
as good species	{87408,87422}
as undoubted species	{88427,88446} (2 total)
as independent species	{95992,96013}
closely related species	{100728,100750} (2 total)
as incipient species	{102609,102628}
as doubtful species	{107226,107244}
most vigorous species	{122687,122707}
already existing species	{193555,193578}
as new species	{201928,201941} (2 total)
nearly extreme species	{221865,221886}
widely diffused species	{224108,224130}
more ancient species	{231849,231868} (2 total)
most ancient species	{232133,232152}
very distinct species	{303893,303913} (6 total)
(rows 1–20 of 46 shown)
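The Position values appear to be inclusive {start, end} character spans, in the same style as the built-in StringPosition: "generally extinct species" is 25 characters long, and {9949,9973} covers exactly 25 positions. Assuming that reading is right, a span can be passed to StringTake to recover a match or its surrounding context. The sketch below uses a made-up sentence and built-in functions only; the 10-character context width is an arbitrary choice.

```wolfram
(* Sample sentence invented for this sketch *)
text = "There are many very distinct species and closely related species.";

(* Inclusive {start, end} span of the first occurrence *)
span = First[StringPosition[text, "very distinct species"]];

(* Recover the matched text from its span *)
match = StringTake[text, span]
(* "very distinct species" *)

(* Widen the span by up to 10 characters on each side, clipped to the string *)
context = StringTake[text,
  {Max[1, span[[1]] - 10], Min[StringLength[text], span[[2]] + 10]}]
```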

Counts

Use "Counts" to view the frequency of matches, or use "CountGroups" to group matches by frequency.

Return a dataset of matches grouped by their count

In[28]:= ols["CountGroups"]
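The difference between counting matches and grouping matches by their count can be sketched with the built-in Counts and GroupBy functions. This is only an illustration of the two shapes of result on a hypothetical list of match strings; it is not how the paclet computes these properties, and the properties themselves return Dataset objects rather than plain associations.

```wolfram
(* Hypothetical match strings, not real search output *)
matches = {"as distinct species", "very distinct species", "as distinct species",
   "as good species", "very distinct species", "as distinct species"};

(* Frequency of each distinct match *)
counts = Counts[matches]
(* <|"as distinct species" -> 3, "very distinct species" -> 2, "as good species" -> 1|> *)

(* Distinct matches grouped by how often they occur *)
countGroups = GroupBy[Keys[counts], Function[m, counts[m]]]
(* <|3 -> {"as distinct species"}, 2 -> {"very distinct species"}, 1 -> {"as good species"}|> *)
```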

Survey

Return a Survey of results

Note that "Survey" returns a lexical dispersion plot. This plot is not particular to the "Survey" property; there are two other ways to produce it.

LexicalDispersion

Quickly get a lexical dispersion plot from a summary object

Display the first 5 examples

Get a lexical dispersion plot of the 5 most common bigrams in the Origin of Species
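A lexical dispersion plot marks where in a text each occurrence of a term falls. The idea can be sketched with built-in functions alone: collect the start offsets of a term with StringPosition and place them on a number line. The sentence below is made up, and the paclet's own plot will look different; this only shows the underlying data.

```wolfram
(* Sample sentence invented for this sketch *)
text = "species vary; some species persist while other species go extinct";

(* Start offset of every occurrence of the term *)
starts = StringPosition[text, "species"][[All, 1]]
(* {1, 20, 48} *)

(* One point per occurrence along the length of the text *)
NumberLinePlot[starts]
```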

Searching Files and Services

Services — Wikipedia

Search over Wikipedia articles containing the keyword "science fiction"

Files

Built-in String Functions

Get example text

Uppercase all verbs in a string

It's also possible to use the operator forms of these functions.

Define a Q operator for matching a lexical pattern
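As a sketch of what an operator form looks like, here is a Q operator built from the built-in StringMatchQ with an ordinary string pattern. No paclet tokens are used, so it runs without LexicalCases; whether a given lexical token can be dropped into the same position is an assumption to check against the paclet. The name endsWithSpeciesQ is hypothetical.

```wolfram
(* Operator form: StringMatchQ[patt] returns a function awaiting a string *)
endsWithSpeciesQ = StringMatchQ[___ ~~ WordBoundary ~~ "species"];

endsWithSpeciesQ["very distinct species"]
(* True *)

(* The operator can be passed directly to Select *)
kept = Select[{"very distinct species", "speciesism", "subspecies"}, endsWithSpeciesQ]
(* {"very distinct species"} *)
```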

Abstractions

LexicalMap

Map a function over lexical patterns in a string. This is effectively the same as using a LexicalPattern in StringReplace.
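The StringReplace side of that comparison can be sketched with built-in patterns: name the matched substring and apply a function to it on the right-hand side of a delayed rule. This example uses a plain character pattern rather than a lexical one, and it does not show LexicalMap's own arguments, which are not given here.

```wolfram
(* Apply ToUpperCase to every whole-word occurrence of "cat" or "dog" *)
result = StringReplace["the cat chased the dog",
  WordBoundary ~~ (w : ("cat" | "dog")) ~~ WordBoundary :> ToUpperCase[w]]
(* "the CAT chased the DOG" *)
```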

LexigramCount

Compute the LexigramCount for a lexical pattern

Conclusion