Wolfram Language Paclet Repository

Community-contributed installable additions to the Wolfram Language

LexicalCases Overview

  • Installation
  • Introduction
  • LexicalSummary Properties
  • Searching Files and Services
  • Built-in String Functions
  • Abstractions
  • Conclusion

Installation
This paclet is hosted on the Wolfram Paclet Repository. First install the paclet from the ResourceObject, then load it with Needs.
Install the paclet and load its definitions with Needs
PacletInstall[ResourceObject["FaizonZaman/LexicalCases"]];
Needs["FaizonZaman`LexicalCases`"]
Introduction
Where StringCases targets character patterns and TextCases targets text patterns, LexicalCases targets lexical patterns. It does this by expanding the scope of StringExpression so that types can be expressed anywhere within the pattern. The pattern types available when searching at the lexical level are listed below.
TypeToken["type"]                    Represents a TextContentType
TypeToken[t_1|…|t_i]                 Represents any of the t_i
BoundToken[lp]                       Represents a lexical pattern bounded by WordBoundary
BoundToken[lp_1|…|lp_i]              Represents any of the lp_i with explicit boundaries
BoundToken[outer,inner]              Represents a string with inner bounded by outer
WordToken[n]                         Represents n words separated by whitespace
WordToken[n,m]                       Represents n to m words separated by whitespace
WordToken[n,"KeepContractions"]      Considers contractions as a single word
WordToken[n,m,"KeepContractions"]    Represents n to m words with contractions as single words
OptionalToken[lp]                    Represents an optional lexical pattern
Additional pattern constructs made available by the LexicalCases paclet.
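As a rough illustration of the word-boundary behavior the table describes for BoundToken, the sketch below uses only the built-in StringCases and WordBoundary. It assumes that BoundToken["specie" | "species"] behaves like the alternatives wrapped in WordBoundary on both sides, as the table entry suggests. It is not the paclet's implementation, and the sample string is made up for the example.

```wolfram
sample = "A specie, several species, and one conspecies are closely allied.";

(* Character-level search: a run of letters ending in "ly", delimited by word boundaries *)
lyWords = StringCases[sample,
   WordBoundary ~~ (LetterCharacter ..) ~~ "ly" ~~ WordBoundary]

(* Alternatives delimited by word boundaries; "conspecies" is skipped
   because there is no word boundary before its embedded "species" *)
bounded = StringCases[sample,
   WordBoundary ~~ ("specie" | "species") ~~ WordBoundary]
```

Without the boundaries, the second search would also pick up the "specie" inside "conspecies", which is the kind of partial-word match a bounded token is meant to exclude.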
Here we search the Origin of Species for adverb-adjective patterns ending with "specie" or "species". Note the whitespace string between TypeToken["Adjective"] and BoundToken. Whitespace is given "for free" between some tokens, such as TypeToken. LexicalCases returns a LexicalSummary object with several properties.
Search the Origin of Species for a lexical pattern
In[22]:= oosp = ExampleData[{"Text", "OriginOfSpecies"}];
In[25]:= ols = LexicalCases[oosp, TypeToken["Adverb"] ~~ TypeToken["Adjective"] ~~ " " ~~ BoundToken["specie" | "species"]]
Out[25]= LexicalSummary (Source: Text, Matches: 74)

LexicalSummary Properties
LexicalSummary properties range from the match data itself to metadata about the search. They give access to the results in several forms, including the raw data and visualizations of it.
"Data"                     Returns a list of Associations which contain the match and related metadata like string position
"Dataset"                  Returns "Data" as a Dataset
"Counts"                   Returns a Dataset of matches and their counts
"CountGroups"              Returns a Dataset of matches grouped by their count
"CountGroupPercentages"    Returns "CountGroups" with the count replaced by percentage
"PartOfSpeechGroups"       Returns a Dataset of unique words in matches grouped by POS
"WordStemGroups"           Returns a Dataset of unique words in matches grouped by word stem
"Source"                   Returns a string indicating whether the source is "Wikipedia" or "Text"
"TotalMatchCount"          Returns the total number of matches found
"LexicalStructure"         Returns a text structure diagram of the lexical pattern
"Survey"                   Returns a Dashboard of results
LexicalSummary properties
View the list of properties by calling "Properties" on the LexicalSummary object.
View the list of properties.
In[26]:= ols["Properties"]
Out[26]= {Data,Dataset,Counts,CountGroups,CountGroupPercentages,LowercaseCountGroupPercentages,PartOfSpeechGroups,WordStemCountGroups,Source,TotalMatchCount,LexicalStructure,Survey}

LexicalStructure

Use the "LexicalStructure" property or the LexicalStructure function to visualize a lexical pattern.
View a lexical pattern's structure via the "LexicalStructure" property
In[12]:= ols["LexicalStructure"]
Out[12]= [Tree diagram: a StringExpression root whose children are TextType "Adverb", TextType "Adjective", Text " ", and a BoundToken containing Alternatives of Text "specie" and Text "species"]
View a lexical pattern's structure via the LexicalStructure function
In[11]:= LexicalStructure[TypeToken["Adverb"] ~~ TypeToken["Adjective"] ~~ " " ~~ BoundToken["specie" | "species"]]
Out[11]= [Tree diagram: a StringExpression root whose children are TextType "Adverb", TextType "Adjective", Text " ", and a BoundToken containing Alternatives of Text "specie" and Text "species"]

Data

Use the "Data" and "Dataset" properties to extract the results from the summary object.
View search results in a Dataset.
In[27]:= ols["Dataset"]
Out[27]=
Match                               Position
generally extinct species           {9949,9973}
aboriginally distinct species       {27844,27872} (3 total)
as well-defined species             {39968,39990}
so many species                     {42078,42092}
extraordinarily abnormal species    {43663,43694}
as distinct species                 {85564,85582} (16 total)
as good species                     {87408,87422}
as undoubted species                {88427,88446} (2 total)
as independent species              {95992,96013}
closely related species             {100728,100750} (2 total)
as incipient species                {102609,102628}
as doubtful species                 {107226,107244}
most vigorous species               {122687,122707}
already existing species            {193555,193578}
as new species                      {201928,201941} (2 total)
nearly extreme species              {221865,221886}
widely diffused species             {224108,224130}
more ancient species                {231849,231868} (2 total)
most ancient species                {232133,232152}
very distinct species               {303893,303913} (6 total)
rows 1–20 of 46
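The positions in the output above look like inclusive {start, end} character offsets: "generally extinct species" is 25 characters long, and {9949,9973} spans 25 characters. The sketch below shows that relationship with built-in functions only. The record layout with "Match" and "Position" keys is an assumption modeled on the Dataset column names, and the sample text and pattern are made up; this is not how the paclet builds its results.

```wolfram
text = "Many closely related species and a few very distinct species.";
pattern = WordBoundary ~~ ("related" | "distinct") ~~ " species";

(* StringPosition returns inclusive {start, end} pairs *)
positions = StringPosition[text, pattern];

(* Hypothetical records mirroring the Match and Position columns *)
data = Map[<|"Match" -> StringTake[text, #], "Position" -> #|> &, positions];

(* Each stored position recovers its match from the source text *)
roundTrip = AllTrue[data, StringTake[text, #["Position"]] === #["Match"] &]
```

Keeping the position next to each match makes it possible to look up the surrounding context in the source text later.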

Counts

Use "Counts" to view the frequency of matches, or use "CountGroups" to group matches by frequency.
Return a dataset of matches grouped by their count
In[28]:= ols["CountGroups"]
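To make the difference between counting matches and grouping them by count concrete, here is a minimal sketch using the built-in Counts and GroupBy on a hand-made list of matches. It only illustrates the idea; the paclet's properties return Dataset objects, and their exact layout is not shown here.

```wolfram
matches = {"as distinct species", "very distinct species",
   "as distinct species", "as good species", "very distinct species",
   "as distinct species", "as new species"};

(* Frequency of each distinct match, in order of first appearance *)
counts = Counts[matches]

(* Invert the tally: each count maps to the matches that occur that often *)
countGroups = GroupBy[Normal[counts], Last -> First]
```

Here countGroups places "as good species" and "as new species" together under the key 1, since each occurs once.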

Survey

Return a Survey of results
Note that "Survey" returns a lexical dispersion plot. This plot is not specific to the "Survey" property; there are two other ways to produce it.
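For readers unfamiliar with the term, a lexical dispersion plot marks where each term occurs along the length of a text. The sketch below builds a bare-bones version with built-in functions only (StringPosition and ListPlot) on a made-up string. It is an illustration of the concept under those assumptions and does not use or reproduce the paclet's plotting functions.

```wolfram
text = "species vary; each species and each variety of a species";
terms = {"species", "variety"};

(* Start offset of every whole-word occurrence, keyed by term *)
starts = AssociationMap[
   Function[term,
    StringPosition[text, WordBoundary ~~ term ~~ WordBoundary][[All, 1]]],
   terms];

(* One row per term: {offset, row index} pairs *)
points = MapIndexed[
   Function[{offsets, idx}, Thread[{offsets, First[idx]}]],
   Values[starts]];

ListPlot[points, PlotLegends -> terms,
 AxesLabel -> {"character offset", "term row"}]
```

Each term gets its own horizontal row, so clusters and gaps along the x-axis show where in the text a term is concentrated.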

LexicalDispersion

Quickly get a lexical dispersion plot from a summary object
Display the first 5 examples
Get a lexical dispersion plot of the 5 most common bigrams in the origin of species
Searching Files and Services

Services — Wikipedia

Search over Wikipedia articles containing the keyword "science fiction"

Files

Built-in String Functions
Get example text
Uppercase all verbs in a string
It's also possible to use the operator forms of these functions.
Define a Q operator for matching a lexical pattern
Abstractions

LexicalMap

Map a function over lexical patterns in a string. This is effectively the same as using a LexicalPattern in StringReplace.
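The built-in counterpart of this idea is a delayed rule in StringReplace, which applies a function to every substring that matches a string pattern. The sketch below uses only built-in string patterns on a made-up sentence; the function f and the pattern are placeholders chosen for illustration, and no LexicalMap or LexicalPattern call is shown because their exact signatures are not given in this text.

```wolfram
text = "A specie and two species.";
f = ToUpperCase;

(* Name each match w and replace it with f[w] *)
mapped = StringReplace[text,
   w : (WordBoundary ~~ ("specie" | "species") ~~ WordBoundary) :> f[w]]
```

The rest of the string passes through unchanged; only the matched substrings are transformed.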

LexigramCount

Compute the LexigramCount for a lexical pattern
Conclusion
