This chapter exemplifies the use of Functional Parsers -- also known as Parser Combinators -- for the interpretation of a small Domain Specific Language (DSL) for symbolic and numeric integration commands.
The functional parsers implementations are based on the article “Functional Parsers” by Jeroen Fokker, [JF1].
The functional parsers presented form a monadic system. More explanations and references can be found in the Wikipedia article
It is assumed the readers of this chapter are at least somewhat familiar with the parsing problem, the Backus-Naur Form for grammar specifications, [Wk2], and the concept of definite integral in calculus.
This graph represents the grammar of the integration commands we consider: 1. Branching represents alternatives or composition 2. The words and phrases and in quotes are tokens 3. The abstract rules are in angular brackets (“<xxx>”) 4. The dashed arrows mean “followed by” 5. Square brackets (“[xxx]”) are for optional parts 6. Examples are given as “free” text
Out[9]=
How we do the parsing
We are going to program parsers for the grammar rules in the graph of the previous slide.
The graph follows closely the Extended Backus-Naur Form (EBNF), [Wk2].
Here is the EBNF for the integration command language we consider:
We are going to write a parser for each EBNF rule.
Later in the talk we are going to show that using the package FunctionalParsers.m we can generate the parsers from a string with the EBNF of the grammar.
Parsing <compute>
The first grammar rule we consider is <compute> that has the <request-preamble> as sub-part.
The symbol “⊕” is an infix notation shorthand of ParseAlternativeComposition. The symbol “⊗” is an infix notation shorthand of ParseSequentialComposition.
Let us test the parser with the integration command string
In[428]:=
icom="whatistheintegralofSin[x]over01";
which we split into tokens first:
In[429]:=
pREQUESTPREAMBLE[StringSplit[icom]]
Out[429]=
{{{the,integral,of,Sin[x],over,0,1},{what,is}}}
The parsing of <request-preamble> is successful, but in order to see than we need to explain the function parser definition.
Parser definition
First definition: We define a parser type to be a function that transforms a list of characters or tokens into (1) an unparsed list of tokens and (2) a parse tree:
(This is related to the so called “method of lists of success.”)
The functional parsers are categorized in the groups: basic, combinators, and transformers. 1. The basic parsers parse specified strings and strings adhering to predicates. 2. The combinator parsers allow sequential and alternative combinations of parsers. 3. The transformer parsers change the input or the output of the parsers that are transformed.
The parser ParseSymbol is a basic parser; ParseAlternativeComposition and ParseSequentialComposition are combinators.
Parsing <compute>, continued
So far we have the parsing result:
In[430]:=
pREQUESTPREAMBLE[StringSplit[icom]]
Out[430]=
{{{the,integral,of,Sin[x],over,0,1},{what,is}}}
The parsing of <request-preamble> is successful. Looking at the grammar rules
In many cases the semantic meaning of a grammar rule can be identified only by some part of the parsed input. For example, in the context of the integration language the phrases “compute” and “what is” are needed in order to extend the number of admissible sentences, but as soon as they are parsed they are no longer of interest. What is of interest is to know what type of integration is requested: numerical or symbolical. (If the type is not explicitly specified we are going to interpret as “symbolic”.)
Instead of using ParseSequentialComposition (⊗) in the parser definition for the rule <integration-request> we can use ParseSequentialCompositionPickRight (⊳) :
Compare with the parsing with the previous definition using ParseSequentialComposition (⊗) :
If the type of integration is explicitly specified as in
then the new definition of <integral-request> outputs:
Transforming the parser results
Obviously we can (and should) go further and throw away also “integral“ using ParseSequentialCompositionPickLeft (⊲) as in
But now if do not specify explicitly the integration type we are going to have an empty successfully parsed part:
Of course when we interpret the parsed output we can imply from no integral type specification that the integration is symbolic, but it is better if we can modify the parsed output of the rule <integral-type> to indicate the integration type.
This can be done using the parser transformer ParseApply (infix shorthand ⊙) that applies a function to the successfully parsed part.
Transforming the parser results, continued
Let us apply a more complicated function to the rule <integral-type>. For example,
With the new definition we get
Other basic parsers
The only basic parse we used so far is ParseSymbol. A very often used one is ParsePredicate.
Suppose we want to parse a number string. Using ParsePredicate we can define the number parser as
ParserPredicate takes a function as an argument and the parsing is successful it that functions returns True.
Note that the parsed result is a string
We can use ParseApply to convert to a number in Mathematica:
Other basic parsers, continued
Here is the full list of the basic parser in the package FunctionalParsers.m :
Other parser combinators
Here is a couple of more useful parser combinators: ParseMany and ParseListOf.
Suppose we want to parse sequence of numbers. It is easy to define a parser doing that using ParseMany and ParsePredicate:
We parsed with the parser transformer ParseJust in order to get results that parsed out the sequence completely.
If we want to parse a sequence of numbers separated by, say, semicolon we can use ParseListOf:
Two other related parsers are ParseChainLeft and ParseChainRight :
The rest of the integration grammar rules
At this point it should be clear how we proceed with defining the rest of the grammar rules of the integration request language.
In order to keep things simple the integration function is assumed to be a Mathematica expression with no spaces within it.
For example
The type of the integration is going to be implied to be symbolic if IType has an empty list.
For the grammar rules <variable> and <range> we use IVar and IRange respectively:
Next we are going to generate automatically the parsers using the EBNF grammar.
Parser generation
Using the EBNF of the grammar of the integration commands language we can automatically generate the parsers for its grammar rules using the function GenerateParsersFromEBNF provided by the package FunctionalParsers.m .
In the current version of the EBNF parser functions there is a requirement that all elements of the parsed EBNF forms have to be separated by space.
The EBNF grammar string can have the pick-left and pick-right combinators (<& and &> respectively) and a ParseApply specification can be given within the form “<rhs> = parsers <@ transformer” . The application of functions can be done over the whole definition of an EBNF non-terminal symbol, not over the individual parts.
Here is the grammar of the integration language:
Parser generation, continued 1
It is beneficial to examine the random sentences with the grammar we have developed.
Parser generation, continued 2
Let us see how the parsed tree of the grammar looks like:
Parser generation, continued 3
Let us generate the grammars with GenerateParsersFromEBNF:
and tabulate the parsed results of a set of commands with ParsingTestTable:
Interpretation
After we have generated the parsers of the integration requests grammar we have to write an interpreter of the parsed results.
Because of the way the parsing results are structured we can just write a definition for the function ICommand and the integrals are going to be computed during the parsing with pCOMMAND.
Instead, we are going to write a function ICommandInterpreter and use another function, InterpretWithContext (also provided by the package FunctionalParsers.m) to trigger the computation of the integrals.
Interpretation, continued 1
The function InterpretWithContext returns the interpretation of the input together with a list of rules of the context data if such data is given.
Interpretation, continued 2
Here is a different set of integration commands:
Now let us parse the integration commands using also data values for the context of interpretation.
3. If during the interpretation the context functions change the context data the result of InterpretWithContext will return the changed data.
Discussion
The automatic generation of parsers from a grammar specification is very useful.
Large grammars would demand too much book keeping of the parsers for them. Automatic generation of parsers definitely helps.
As an example let us consider a change in the integration requests language to include the sub-command of plotting the integrand over the specified integration range. We want to be able to interpret sentences like these:
compute the integral of Log[x] of x from 0 to 10 and plot it
integrate and plot Sin[x]*x^2 over 0 1
We can just extend the grammar, re-generate the parsers, and provide interpretation for plotting clauses if they are present.
Here is the grammar with “plot it” added (the new symbol <plot> is used in <command1>):
Discussion, continued 1
Here are some random sentences (commands) generated with the grammar we defined:
Discussion, continued 2
And here is the interpretation function that both integrates and plots:
Discussion, continued 3
Let us try the new functionality out with several integration-and-plot commands:
(We get the message “noplot” for the last integration request.)
Summary
The system of parsers of the package has
1. Tokenizer
2. Elementary parsers
3. Parser combinators
4. Parser transformers
5. Binding for infix notation
6. Functions for automatic parser generation
7. A function for interpretation with context
We discussed the automatic parser generation and its usefulness for grammar extension and bookkeeping.