AntonAntonov/FunctionalParsers

Integer names parsing

Introduction
Generation of integer names
Generate rules
Random sentences
Tests
Performance tests
References

Introduction

In this notebook we describe how to make parser-interpreters that parse integer names and interpret them as the corresponding integer values.

Here is an example using Mathematica’s built-in functions:

In[1]:=

IntegerName[12321,"Words"]

Out[1]=

twelve thousand, three hundred twenty-one

In[2]:=

SemanticInterpretation["twelve thousand, three hundred twenty-one"]

Out[2]=

12321

We have two primary goals:

To show how this kind of functionality can be implemented.

To implement a faster parser-interpreter than the built-in function SemanticInterpretation.

In order to achieve those goals we more or less automatically generate grammar rules in Extended Backus-Naur Form (EBNF), [Wk1], and then we generate the corresponding parsers using functions of the paclet AntonAntonov/FunctionalParsers, [AAp2]. (The package [AAp1] and paclet [AAp2] follow closely the exposition in [JF1]. The package [AAp3] provides a "productized" version of the grammar and parsers code in this notebook.)

Load the paclet:

In[3]:=

Needs["AntonAntonov`FunctionalParsers`"]

Generation of integer names

The names of specified integers can be easily (and quickly) generated using the built-in function IntegerName. Here are a few examples:

In[4]:=

IntegerName[1234567,"Words"]

Out[4]=

one million, two hundred thirty-four thousand, five hundred sixty-seven

In[5]:=

IntegerName[9342,"Words"]

Out[5]=

nine thousand, three hundred forty-two

Here we generate the names of the “basic” integers from 0 to 10:

In[6]:=

IntegerName[#,"Words"]&/@Range[0,10]

Out[6]=

{zero,one,two,three,four,five,six,seven,eight,nine,ten}

Here we generate the names of the “basic” integers from 11 to 19:

In[7]:=

IntegerName[#,"Words"]&/@Range[11,19]

Out[7]=

{eleven,twelve,thirteen,fourteen,fifteen,sixteen,seventeen,eighteen,nineteen}

Here we generate the names of the “basic” multiples of 10, from 20 to 90:

In[8]:=

IntegerName[#,"Words"]&/@Range[20,90,10]

Out[8]=

{twenty,thirty,forty,fifty,sixty,seventy,eighty,ninety}

Generate rules

In this section we generate the EBNF grammar rules. We use the built-in function IntegerName.

The basic rules

Here are EBNF rules for parsing the “basic” integer names:

In[9]:=

nRules=Table["<"<>"name-of-"<>ToString[i]<>"> = '"<>StringReplace[IntegerName[i],{"1 "->"","one "->""}]<>"' <@ Function["<>ToString[i]<>"] ;",{i,Join[Range[0,19],Range[20,100,10],{1000,10^6}]}]

Out[9]=

{<name-of-0> = 'zero' <@ Function[0] ;,<name-of-1> = 'one' <@ Function[1] ;,<name-of-2> = 'two' <@ Function[2] ;,<name-of-3> = 'three' <@ Function[3] ;,<name-of-4> = 'four' <@ Function[4] ;,<name-of-5> = 'five' <@ Function[5] ;,<name-of-6> = 'six' <@ Function[6] ;,<name-of-7> = 'seven' <@ Function[7] ;,<name-of-8> = 'eight' <@ Function[8] ;,<name-of-9> = 'nine' <@ Function[9] ;,<name-of-10> = 'ten' <@ Function[10] ;,<name-of-11> = 'eleven' <@ Function[11] ;,<name-of-12> = 'twelve' <@ Function[12] ;,<name-of-13> = 'thirteen' <@ Function[13] ;,<name-of-14> = 'fourteen' <@ Function[14] ;,<name-of-15> = 'fifteen' <@ Function[15] ;,<name-of-16> = 'sixteen' <@ Function[16] ;,<name-of-17> = 'seventeen' <@ Function[17] ;,<name-of-18> = 'eighteen' <@ Function[18] ;,<name-of-19> = 'nineteen' <@ Function[19] ;,<name-of-20> = 'twenty' <@ Function[20] ;,<name-of-30> = 'thirty' <@ Function[30] ;,<name-of-40> = 'forty' <@ Function[40] ;,<name-of-50> = 'fifty' <@ Function[50] ;,<name-of-60> = 'sixty' <@ Function[60] ;,<name-of-70> = 'seventy' <@ Function[70] ;,<name-of-80> = 'eighty' <@ Function[80] ;,<name-of-90> = 'ninety' <@ Function[90] ;,<name-of-100> = 'hundred' <@ Function[100] ;,<name-of-1000> = 'thousand' <@ Function[1000] ;,<name-of-1000000> = 'million' <@ Function[1000000] ;}

Here we take the Left Hand Side (LHS) of each rule:

In[10]:=

nRulesLHS=Table["<"<>"name-of-"<>ToString[i]<>">",{i,Join[Range[0,19],Range[20,100,10],{1000,10^6}]}]

Out[10]=

{<name-of-0>,<name-of-1>,<name-of-2>,<name-of-3>,<name-of-4>,<name-of-5>,<name-of-6>,<name-of-7>,<name-of-8>,<name-of-9>,<name-of-10>,<name-of-11>,<name-of-12>,<name-of-13>,<name-of-14>,<name-of-15>,<name-of-16>,<name-of-17>,<name-of-18>,<name-of-19>,<name-of-20>,<name-of-30>,<name-of-40>,<name-of-50>,<name-of-60>,<name-of-70>,<name-of-80>,<name-of-90>,<name-of-100>,<name-of-1000>,<name-of-1000000>}

Here we join all basic integer rules:

In[11]:=

numberNameRules=StringRiffle[nRules,"\n"];Magnify[numberNameRules,0.8]

Out[12]=

<name-of-0> = 'zero' <@ Function[0] ;
<name-of-1> = 'one' <@ Function[1] ;
<name-of-2> = 'two' <@ Function[2] ;
<name-of-3> = 'three' <@ Function[3] ;
<name-of-4> = 'four' <@ Function[4] ;
<name-of-5> = 'five' <@ Function[5] ;
<name-of-6> = 'six' <@ Function[6] ;
<name-of-7> = 'seven' <@ Function[7] ;
<name-of-8> = 'eight' <@ Function[8] ;
<name-of-9> = 'nine' <@ Function[9] ;
<name-of-10> = 'ten' <@ Function[10] ;
<name-of-11> = 'eleven' <@ Function[11] ;
<name-of-12> = 'twelve' <@ Function[12] ;
<name-of-13> = 'thirteen' <@ Function[13] ;
<name-of-14> = 'fourteen' <@ Function[14] ;
<name-of-15> = 'fifteen' <@ Function[15] ;
<name-of-16> = 'sixteen' <@ Function[16] ;
<name-of-17> = 'seventeen' <@ Function[17] ;
<name-of-18> = 'eighteen' <@ Function[18] ;
<name-of-19> = 'nineteen' <@ Function[19] ;
<name-of-20> = 'twenty' <@ Function[20] ;
<name-of-30> = 'thirty' <@ Function[30] ;
<name-of-40> = 'forty' <@ Function[40] ;
<name-of-50> = 'fifty' <@ Function[50] ;
<name-of-60> = 'sixty' <@ Function[60] ;
<name-of-70> = 'seventy' <@ Function[70] ;
<name-of-80> = 'eighty' <@ Function[80] ;
<name-of-90> = 'ninety' <@ Function[90] ;
<name-of-100> = 'hundred' <@ Function[100] ;
<name-of-1000> = 'thousand' <@ Function[1000] ;
<name-of-1000000> = 'million' <@ Function[1000000] ;

Here we make a composite rule for the integers from 1 to 19:

In[13]:=

numberName1To19Rule="<name-1-to-19> = "<>StringRiffle[Take[nRulesLHS,{2,20}]," | "]<>" ;"

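Out[13]=

<name-1-to-19> = <name-of-1> | <name-of-2> | <name-of-3> | <name-of-4> | <name-of-5> | <name-of-6> | <name-of-7> | <name-of-8> | <name-of-9> | <name-of-10> | <name-of-11> | <name-of-12> | <name-of-13> | <name-of-14> | <name-of-15> | <name-of-16> | <name-of-17> | <name-of-18> | <name-of-19> ;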

Here we make a composite rule for the integers from 0 to 19:

In[14]:=

numberNameUpTo19Rule="<name-up-to-19> = <name-of-0> | <name-1-to-19> ;"

Out[14]=

<name-up-to-19> = <name-of-0> | <name-1-to-19> ;

Here we make a composite rule for the “basic” multiples of 10, from 20 to 90:

In[15]:=

numberNameM10Rule="<name-of-10s> = "<>StringRiffle[Take[nRulesLHS,{21,-4}]," | "]<>" ;"

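Out[15]=

<name-of-10s> = <name-of-20> | <name-of-30> | <name-of-40> | <name-of-50> | <name-of-60> | <name-of-70> | <name-of-80> | <name-of-90> ;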

Here we make a composite rule for the “basic” integers from 1 to 10:

In[16]:=

numberName1To10Rule="<name-1-to-10> = "<>StringRiffle[Take[nRulesLHS,{2,11}]," | "]<>" ;"

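Out[16]=

<name-1-to-10> = <name-of-1> | <name-of-2> | <name-of-3> | <name-of-4> | <name-of-5> | <name-of-6> | <name-of-7> | <name-of-8> | <name-of-9> | <name-of-10> ;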

Here we define the interpreter functions to be used with the parsers:

In[17]:=

TimesFlatten=Function[Apply[Times,Flatten[List[#]]]];
TotalFlatten=Function[Total[Flatten[List[#]]]];
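Both functions wrap their argument in List and flatten it before combining the values. The small check below, which uses only built-in functions, illustrates why that matters. The sample inputs are assumptions chosen for illustration; the generated parsers may hand over differently nested lists.

```wolfram
TimesFlatten = Function[Apply[Times, Flatten[List[#]]]];
TotalFlatten = Function[Total[Flatten[List[#]]]];

(* A bare atom is wrapped by List first, so it passes through unchanged *)
TimesFlatten[7]
TotalFlatten[7]

(* Nested lists are flattened before multiplying or summing *)
TimesFlatten[{3, {100}}]      (* product of 3 and 100 *)
TotalFlatten[{300, {20, 1}}]  (* sum of 300, 20, and 1 *)
```

The four expressions evaluate to 7, 7, 300, and 321, so the same interpreter works whether a rule matched a single name or a nested sequence of names.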

The full grammar

Here is the grammar for “worded integers” in EBNF:

In[19]:=

wordedNumberRule="
<worded-number> = <worded-number-up-to-bil> | <worded-number-up-to-1000000> | <worded-number-up-to-1000> | <worded-number-up-to-100> <@ WordedNumber@*TotalFlatten ;
<worded-number-100s> = <name-1-to-19> , <name-of-100> <@ TimesFlatten ;
<worded-number-up-to-100> = <name-up-to-19> | <name-of-10s> , [ '-' ] &> [ <name-1-to-10> ] <@ TotalFlatten ;
<worded-number-1000s> = <worded-number-up-to-1000> , <name-of-1000> <@ TimesFlatten ;
<worded-number-up-to-1000> = <worded-number-up-to-100> | <worded-number-100s> , [ [ 'and' | ',' ] &> <worded-number-up-to-100> ] <@ TotalFlatten ;
<worded-number-1000000s> = <worded-number-up-to-1000000> , <name-of-1000000> <@ TimesFlatten ;
<worded-number-up-to-1000000> = <worded-number-up-to-1000> | <worded-number-1000s> , [ [ 'and' | ',' ] &> <worded-number-up-to-1000> ] <@ TotalFlatten ;
<worded-number-up-to-bil> = <worded-number-up-to-1000000> | <worded-number-1000000s> , [ [ 'and' | ',' ] &> <worded-number-up-to-1000000> ] <@ TotalFlatten ;
";
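To make the interplay of the rules and the interpreter functions concrete, here is a hand-evaluated sketch of how "twelve thousand, three hundred twenty-one" could be interpreted. It uses only built-in functions; the intermediate variable names are hypothetical, and the lists are an assumed simplification of what the parsers actually pass to the interpreters.

```wolfram
TimesFlatten = Function[Apply[Times, Flatten[List[#]]]];
TotalFlatten = Function[Total[Flatten[List[#]]]];

(* <worded-number-1000s>: "twelve" followed by "thousand" *)
thousands = TimesFlatten[{12, 1000}];

(* <worded-number-100s>: "three" followed by "hundred" *)
hundreds = TimesFlatten[{3, 100}];

(* <worded-number-up-to-100>: "twenty", optional hyphen dropped by &>, then "one" *)
tens = TotalFlatten[{20, 1}];

(* <worded-number-up-to-1000>: hundreds part plus the remainder below 100 *)
upTo1000 = TotalFlatten[{hundreds, tens}];

(* <worded-number-up-to-1000000>: thousands part, comma dropped by &>, plus the rest *)
TotalFlatten[{thousands, upTo1000}]
```

The last expression evaluates to 12321: the multiplicative rules (the ones ending in 100s, 1000s, 1000000s) use TimesFlatten, and the additive "up-to" rules use TotalFlatten.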

Parser generation

Here we generate the parsers using the paclet [AAp2]:
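A minimal sketch of this step, under two assumptions: that the rule strings defined in the previous section are available in the current session, and that ParseToEBNFTokens and GenerateParsersFromEBNF are the paclet's functions for tokenizing EBNF code and generating parsers from it. The variable names ebnfCode and res are hypothetical.

```wolfram
Needs["AntonAntonov`FunctionalParsers`"]

(* Join the basic rules, the composite rules, and the full grammar into one EBNF string *)
ebnfCode = StringRiffle[
   {numberNameRules, numberName1To19Rule, numberNameUpTo19Rule,
    numberNameM10Rule, numberName1To10Rule, wordedNumberRule}, "\n"];

(* Assumption: tokenize the EBNF code, then generate one parser per non-terminal *)
res = GenerateParsersFromEBNF[ParseToEBNFTokens[ebnfCode]];
```

Check the paclet documentation for the exact names of the generated parsers before using them.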

Parser sanity check examples

Here we see that the parser works:

Random sentences

It is useful to see random sentences from the defined integer names grammar. We expect all random sentences to be meaningful integer names.

Tests

In this section we derive tests for the grammar (and the generated parsers) and verify them.

Tokenization function

Create a list of random integers

Make a table with parsing results

Here we verify that we get the expected results

Tally statistic
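The steps above can be sketched with built-in functions only. In this sketch parseName is a hypothetical stand-in that uses SemanticInterpretation; to test the dedicated parser-interpreter, replace it with that parser applied to the tokenized name. The seed, range, and sample size are arbitrary assumptions.

```wolfram
(* Hypothetical stand-in for the generated parser-interpreter *)
parseName = SemanticInterpretation;

(* Create a reproducible list of random integers and their names *)
SeedRandom[2023];
testIntegers = RandomInteger[{0, 10^6}, 20];
testNames = IntegerName[#, "Words"] & /@ testIntegers;

(* Interpret each name back into an integer *)
results = parseName /@ testNames;

(* A query passes when the interpreted value equals the integer the name came from *)
checks = MapThread[SameQ, {results, testIntegers}];

(* Count passed and failed queries *)
Tally[checks]
```

Generating the names with IntegerName means the expected value of every test is known in advance, so no hand-written test cases are needed.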

Performance tests

In this section we demonstrate that the obtained parser-interpreter is faster than the built-in function SemanticInterpretation.

Here is the number of queries:

Here is the timing with the dedicated parsers:

Here is what the results look like:

Remark: We can see that the dedicated parser provides the parsing results 100 times faster than the built-in function.

References

Articles

[JF1] Jeroen Fokker, Functional parsers. (1997), Advanced Functional Programming: First International Spring School on Advanced Functional Programming Techniques Båstad, Sweden, May 24–30, 1995 Tutorial Text. DOI: 10.1007/3-540-59451-5_1.

Packages

[AAp2] Anton Antonov, FunctionalParsers WL paclet, (2023), Wolfram Language Paclet Repository.