AntonAntonov/SoftwareDesignMethodsWithWLBook

Parsing and interpretation of integration commands

Introduction

This chapter exemplifies the use of Functional Parsers -- also known as Parser Combinators -- for the interpretation of a small Domain Specific Language (DSL) for symbolic and numeric integration commands.

The functional parsers implementations are based on the article “Functional Parsers” by Jeroen Fokker, [JF1].

The functional parsers presented form a monadic system. More explanations and references can be found in the Wikipedia article

Parser Combinator

, [Wk1].

It is assumed the readers of this chapter are at least somewhat familiar with the parsing problem, the Backus-Naur Form for grammar specifications, [Wk2], and the concept of definite integral in calculus.

Setup

Here we load the paclet

FunctionalParsers

, [AAp2]:

In[425]:=

Needs["AntonAntonov`FunctionalParsers`"]

Sentences

Suppose we want to make an interpreter of commands for single variable integrals calculation.

We can start by listing prototype sentences and then generalizing their form.

Here are some prototypes:

Calculate the integral of sin x over 0 to 1.

What is the integral of x^2+Log[x] for x in the interval 0 and 10?

Integrate Sin[x^2+5]

Integrate numerically 1/x+Sqrt[x] in the interval 4 and 10.

General command structure

We can say the integration commands have four parts:

1. type of integration (numerical or symbolical),

2. function to be integration (integrand),

3. variable (optional can be implied),

4. range (optional).

Roughly speaking we have the following structure:

Graph of the command structure

This graph represents the grammar of the integration commands we consider:
1. Branching represents alternatives or composition
2. The words and phrases and in quotes are tokens
3. The abstract rules are in angular brackets (“<xxx>”)
4. The dashed arrows mean “followed by”
5. Square brackets (“[xxx]”) are for optional parts
6. Examples are given as “free” text

Out[9]=

How we do the parsing

We are going to program parsers for the grammar rules in the graph of the previous slide.

The graph follows closely the Extended Backus-Naur Form (EBNF), [Wk2].

Here is the EBNF for the integration command language we consider:

<request-preamble> = 'compute' | 'what' , 'is' ;
<compute> = <request-preamble> , [ 'the' ] ;
<integral-type> = [ 'numerical' | 'symbolic' ] , 'integral' ;
<integrate> = [ 'symbolically' | 'numerically' ] , 'integrate' | 'integrate' , ( 'numerically' | 'symbolically' ) ;
<function> = '_String' ;
<variable> = '_IdentifierString' ;
<range-named> = 'R' | 'R+' | 'R-' ;
<range-interval> = [ 'from' ] , '_WordString' , [ 'to' | 'and' ] , '_WordString';
<range> = ( [ 'in' | 'over' ] , [ 'the' ] , [ 'interval' ] ) , ( <range-interval> | <range-named> ) ;
<var-range-spec> = ( 'for' | 'of' ) , <variable> , <range> ;
<integral-request> = <compute> , <integral-type> , 'of' ;
<command1> = ( <integral-request> | <integrate> ) , <function> , ( <var-range-spec> | <range> ) ;
<command2> = ( <integral-request> | <integrate> ) , <function> ;
<command> = <command1> | <command2> ;

We are going to write a parser for each EBNF rule.

Later in the talk we are going to show that using the package FunctionalParsers.m we can generate the parsers from a string with the EBNF of the grammar.

Parsing <compute>

The first grammar rule we consider is <compute> that has the <request-preamble> as sub-part.

<request-preamble> = 'compute' | 'what' , 'is' ;
<compute> = <request-preamble> , [ 'the' ] ;

Taking LHS of rule we remove the dashes, “-”, and put the prefix “p” in order to derive the symbol of parser corresponding to that rule.

Here is the parser for <request-preamble>:

In[426]:=

pREQUESTPREAMBLE=ParseAlternativeComposition[ParseSymbol["compute"],ParseSequentialComposition[ParseSymbol["what"],ParseSymbol["is"]]]

Out[426]=

ParseAlternativeComposition[If[Length[#1]>0&&compute===First[#1],{{Rest[#1],compute}},{}]&,ParseSequentialComposition[If[Length[#1]>0&&what===First[#1],{{Rest[#1],what}},{}]&,If[Length[#1]>0&&is===First[#1],{{Rest[#1],is}},{}]&]]

Which more compactly can be defines as:

In[427]:=

pREQUESTPREAMBLE=ParseSymbol["compute"]⊕ParseSymbol["what"]⊗ParseSymbol["is"];

The symbol “⊕” is an infix notation shorthand of ParseAlternativeComposition.
The symbol “⊗” is an infix notation shorthand of ParseSequentialComposition.

Let us test the parser with the integration command string

In[428]:=

icom="what is the integral of Sin[x] over 0 1";

which we split into tokens first:

In[429]:=

pREQUESTPREAMBLE[StringSplit[icom]]

Out[429]=

{{{the,integral,of,Sin[x],over,0,1},{what,is}}}

The parsing of <request-preamble> is successful, but in order to see than we need to explain the function parser definition.

Parser definition

First definition: We define a parser type to be a function that transforms a list of characters or tokens into (1) an unparsed list of tokens and (2) a parse tree:

P: _String  {_String, _ParseTree }

or

P: {_String...}  { {_String...}, _ParseTree }

It is better though to generalize this definition to allow a list of results:

P: {_String...}  { { {_String...}, _ParseTree }...}

(This is related to the so called “method of lists of success.”)

The functional parsers are categorized in the groups: basic, combinators, and transformers.
1. The basic parsers parse specified strings and strings adhering to predicates.
2. The combinator parsers allow sequential and alternative combinations of parsers.
3. The transformer parsers change the input or the output of the parsers that are transformed.

The parser ParseSymbol is a basic parser; ParseAlternativeComposition and ParseSequentialComposition are combinators.

Parsing <compute>, continued

So far we have the parsing result:

In[430]:=

pREQUESTPREAMBLE[StringSplit[icom]]

Out[430]=

{{{the,integral,of,Sin[x],over,0,1},{what,is}}}

The parsing of <request-preamble> is successful. Looking at the grammar rules

<request-preamble> = 'compute' | 'what' , 'is' ;
<compute> = <request-preamble> , [ 'the' ] ;

we are ready to define <compute> that combines <request-preamble> with an optional “the”:

The definition of <compute> uses another combinator parser ParseOption for optional tokens.

The parsed result contains two possible parsings, one with “the“ parsed, and one without:

We can use the parser combinator ParseShortest to select the one parsed the token “the”:

Parsing <integral-type> and <integral-request>

The definition of the parser <integral-type>

follows directly the EBNF rule

<integral-type> = [ 'numerical' | 'symbolic' ] , 'integral' ;

Now we can define <integral-request>

Picking only the important part

In many cases the semantic meaning of a grammar rule can be identified only by some part of the parsed input.
For example, in the context of the integration language the phrases “compute” and “what is” are needed in order to extend the number of admissible sentences, but as soon as they are parsed they are no longer of interest. What is of interest is to know what type of integration is requested: numerical or symbolical. (If the type is not explicitly specified we are going to interpret as “symbolic”.)

Instead of using ParseSequentialComposition (⊗) in the parser definition for the rule <integration-request> we can use ParseSequentialCompositionPickRight (⊳) :

Compare with the parsing with the previous definition using ParseSequentialComposition (⊗) :

If the type of integration is explicitly specified as in

then the new definition of <integral-request> outputs:

Transforming the parser results

Obviously we can (and should) go further and throw away also “integral“ using ParseSequentialCompositionPickLeft (⊲) as in

But now if do not specify explicitly the integration type we are going to have an empty successfully parsed part:

Of course when we interpret the parsed output we can imply from no integral type specification that the integration is symbolic, but it is better if we can modify the parsed output of the rule <integral-type> to indicate the integration type.

This can be done using the parser transformer ParseApply (infix shorthand ⊙) that applies a function to the successfully parsed part.

Transforming the parser results, continued

Let us apply a more complicated function to the rule <integral-type>. For example,

With the new definition we get

Other basic parsers

The only basic parse we used so far is ParseSymbol. A very often used one is ParsePredicate.

Suppose we want to parse a number string. Using ParsePredicate we can define the number parser as

ParserPredicate takes a function as an argument and the parsing is successful it that functions returns True.

Note that the parsed result is a string

We can use ParseApply to convert to a number in Mathematica:

Other basic parsers, continued

Here is the full list of the basic parser in the package FunctionalParsers.m :

Other parser combinators

Here is a couple of more useful parser combinators: ParseMany and ParseListOf.

Suppose we want to parse sequence of numbers. It is easy to define a parser doing that using ParseMany and ParsePredicate:

We parsed with the parser transformer ParseJust in order to get results that parsed out the sequence completely.

If we want to parse a sequence of numbers separated by, say, semicolon we can use ParseListOf:

Two other related parsers are ParseChainLeft and ParseChainRight :

The rest of the integration grammar rules

At this point it should be clear how we proceed with defining the rest of the grammar rules of the integration request language.

In order to keep things simple the integration function is assumed to be a Mathematica expression with no spaces within it.

For example

The type of the integration is going to be implied to be symbolic if IType has an empty list.

For the grammar rules <variable> and <range> we use IVar and IRange respectively:

Next we are going to generate automatically the parsers using the EBNF grammar.

Parser generation

Using the EBNF of the grammar of the integration commands language we can automatically generate the parsers for its grammar rules using the function GenerateParsersFromEBNF provided by the package FunctionalParsers.m .

In the current version of the EBNF parser functions there is a requirement that all elements of the parsed EBNF forms have to be separated by space.

The EBNF grammar string can have the pick-left and pick-right combinators (<& and &> respectively) and a ParseApply specification can be given within the form “<rhs> = parsers <@ transformer” . The application of functions can be done over the whole definition of an EBNF non-terminal symbol, not over the individual parts.

Here is the grammar of the integration language:

Parser generation, continued 1

It is beneficial to examine the random sentences with the grammar we have developed.

Parser generation, continued 2

Let us see how the parsed tree of the grammar looks like:

Parser generation, continued 3

Let us generate the grammars with GenerateParsersFromEBNF:

and tabulate the parsed results of a set of commands with ParsingTestTable:

Interpretation

After we have generated the parsers of the integration requests grammar we have to write an interpreter of the parsed results.

Because of the way the parsing results are structured we can just write a definition for the function ICommand and the integrals are going to be computed during the parsing with pCOMMAND.

Instead, we are going to write a function ICommandInterpreter and use another function, InterpretWithContext (also provided by the package FunctionalParsers.m) to trigger the computation of the integrals.

Interpretation, continued 1

The function InterpretWithContext returns the interpretation of the input together with a list of rules of the context data if such data is given.

Interpretation, continued 2

Here is a different set of integration commands:

Now let us parse the integration commands using also data values for the context of interpretation.

Interpretation, continued 3

To summarize the interpretation:

1. The context is given as a list of rules

2. There are two forms for the context rules:

{(_Symbol->_)..}
{"data"->{(_Symbol->_)...}, "functions"->{(_Symbol->_)...}}

3. If during the interpretation the context functions change the context data the result of InterpretWithContext will return the changed data.

Discussion

The automatic generation of parsers from a grammar specification is very useful.

Large grammars would demand too much book keeping of the parsers for them. Automatic generation of parsers definitely helps.

As an example let us consider a change in the integration requests language to include the sub-command of plotting the integrand over the specified integration range. We want to be able to interpret sentences like these:

compute the integral of Log[x] of x from 0 to 10 and plot it

integrate and plot Sin[x]*x^2 over 0 1

We can just extend the grammar, re-generate the parsers, and provide interpretation for plotting clauses if they are present.

Here is the grammar with “plot it” added (the new symbol <plot> is used in <command1>):

Discussion, continued 1

Here are some random sentences (commands) generated with the grammar we defined:

Discussion, continued 2

And here is the interpretation function that both integrates and plots:

Discussion, continued 3

Let us try the new functionality out with several integration-and-plot commands:

(We get the message “noplot” for the last integration request.)

Summary

The system of parsers of the package has

1. Tokenizer

2. Elementary parsers

3. Parser combinators

4. Parser transformers

5. Binding for infix notation

6. Functions for automatic parser generation

7. A function for interpretation with context

We discussed the automatic parser generation and its usefulness for grammar extension and bookkeeping.

References

Wolfram Language Paclet Repository

Setup

Articles

Packages, paclets