Basic Examples (2)
Here is the stem of the word "качество":
Here are the stems for a list of words:
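The two calls above can be sketched roughly as follows. This is a minimal sketch that assumes the function is obtained with ResourceFunction["BulgarianStem"]; the word list is illustrative only, and the stems are computed one word at a time so that the sketch does not depend on how list arguments are handled:

```wolfram
(* Assumption: the function is published under the resource name "BulgarianStem" *)
stem = ResourceFunction["BulgarianStem"];

(* Stem of a single word *)
stem["качество"]

(* Stems of several words; this list is illustrative, not the one used on this page *)
words = {"качество", "качества", "качествен"};
stems = stem /@ words
```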
Scope (3)
The stem rules currently used by BulgarianStem can be retrieved with the argument "CurrentRules"; here is a sample of the current rules:
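A hedged sketch of such a retrieval is shown below. It assumes the function is obtained with ResourceFunction["BulgarianStem"] and that the returned rules form a list or an association; the exact structure of the result is not specified here, so only structure-agnostic operations are used:

```wolfram
(* Assumption: the function is published under the resource name "BulgarianStem" *)
stem = ResourceFunction["BulgarianStem"];

(* Retrieve the rules currently in use *)
rules = stem["CurrentRules"];

(* Number of rules; Length works for both lists and associations *)
Length[rules]

(* First few rules as a small sample; Take also works for both *)
Take[rules, UpTo[5]]
```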
Words without suffixes recognized by BulgarianStem are returned unchanged:
The symbol BulgarianStem is overloaded: it also takes arguments that control and monitor which stem rules are applied. There are three sets of rules.
The following command selects the third rule set, restricted to rules with a frequency (count) of at least 2:
Here is the number of rules that were just set:
Here is a sample of the rules:
Here are stems of the list of words above using the newly set rules:
Here we restore the default stem rules:
Applications (2)
Finding word stems is one of the fundamental procedures in information retrieval.
Take Bulgarian text from Wikipedia:
Here we get the words from the text:
Here we find the number of occurrences of each word and show the words with the largest counts:
Here we stem the words, find the number of occurrences of each word stem and show the stems with the largest counts:
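The counting steps above can be sketched as follows. This is a minimal sketch under stated assumptions: a short literal sentence stands in for the Wikipedia text, the function is obtained with ResourceFunction["BulgarianStem"], and the words are stemmed one at a time:

```wolfram
(* Assumption: the function is published under the resource name "BulgarianStem" *)
stem = ResourceFunction["BulgarianStem"];

(* Illustrative stand-in for the Wikipedia text *)
text = "Книгата е на масата. Книгите са на рафта, а масите са в стаята.";

(* Extract the words and normalize case *)
words = ToLowerCase[TextWords[text]];

(* Word counts, largest first *)
TakeLargest[Counts[words], UpTo[3]]

(* Stem each word, then count the stems *)
stems = stem /@ words;
TakeLargest[Counts[stems], UpTo[3]]
```

Because different inflected forms may share a stem, the stem counts are typically more concentrated than the word counts.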
Consider the following random job titles in Bulgarian:
Here is a list of tables that show the words of the job titles and their corresponding stems:
Properties and Relations (4)
Here is the number of rules in each of the three sets of rules:
Here are stem rule samples:
The current set of stem rules can be obtained with the argument "CurrentRules":
Each stem rule has an associated count (or frequency); here is the minimum count of the current rules:
Here is the number of current rules:
Here are the string lengths of the replacement values of the current stem rules:
Here is a histogram of the lengths of the suffixes that the current stem rules replace:
The function WordStem gives stems of English words:
Here is a corresponding call of BulgarianStem:
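A side-by-side sketch of the two calls, assuming BulgarianStem is obtained with ResourceFunction["BulgarianStem"]; the two example words are illustrative and are not taken from this page:

```wolfram
(* Built-in stemmer for English words *)
WordStem["qualities"]

(* Assumption: the function is published under the resource name "BulgarianStem" *)
stem = ResourceFunction["BulgarianStem"];

(* Analogous call for a Bulgarian word *)
bgStem = stem["качества"]
```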
Here is a list of tables that show the words of job titles in English and their corresponding stems (analogous to the list of tables above):
The words of a text can be obtained with StringSplit or TextWords and then given to BulgarianStem:
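Both ways of extracting words can be sketched as follows. This is a minimal sketch with an illustrative sentence, assuming the function is obtained with ResourceFunction["BulgarianStem"]; the StringSplit pattern (splitting on runs of non-letter characters) is one possible choice, not necessarily the one used on this page:

```wolfram
(* Assumption: the function is published under the resource name "BulgarianStem" *)
stem = ResourceFunction["BulgarianStem"];

(* Illustrative sentence *)
text = "Качеството на книгите е важно.";

(* Word extraction with TextWords *)
wordsA = TextWords[text];

(* Word extraction with StringSplit; any empty strings are dropped explicitly *)
wordsB = DeleteCases[StringSplit[text, Except[LetterCharacter] ..], ""];

(* Stem the lowercased words one at a time *)
stems = stem /@ ToLowerCase[wordsA]
```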
Neat Examples (3)
Here is some English text:
Here are the stems of the words in the English text:
Here we translate the English text into Bulgarian text, extract the words and stem them: