This is a chapter from
Creating a compiler with Raku
In this chapter, we will create a program that can evaluate simple arithmetic expressions such as 3 + 4
or 3 - 3 * 7
. We’ll start from the simplest equations with two operands and will work until we can introduce parentheses.
Enrolling a sum
Let us first take a test expression 3 + 4
and create a working prototype of the calculator for it. The grammar only needs to parse integers and a literal plus sign:
grammar Calculator {
rule TOP {
<number> '+' <number>
}
token number {
\d+
}
}
The actions are straightforward too. We are using the AST attributes to keep the values:
class CalculatorActions {
method TOP($/) {
$/.make($<number>[0].made + $<number>[1].made);
}
method number($/) {
$/.make(+$/);
}
}
Everything is ready to run the test and confirm that it prints 7
indeed:
Calculator.parse('3 + 4',
:actions(CalculatorActions)).made.say;
That was not difficult at all. The code works with other integer numbers with no changes in the grammar or actions, but our next goal is to teach it handling -
for subtraction.
The first approach can be introducing different kind of statements at the very top of the grammar:
rule TOP {
| <addition>
| <subtraction>
}
rule addition {
<number> '+' <number>
}
rule subtraction {
<number> '-' <number>
}
Accordingly, let’s create separate action methods for both addition and subtraction:
method TOP($/) {
$/.make($<addition> ??
$<addition>.made !! $<subtraction>.made);
}
method addition($/) {
$/.make($<number>[0].made + $<number>[1].made);
}
method subtraction($/) {
$/.make($<number>[0].made - $<number>[1].made);
}
Now, test the second case:
my @cases = '3 + 4', '3 - 4';
for @cases -> $test {
say "$test = " ~ Calculator.parse($test,
:actions(CalculatorActions)).made;
}
OK, everything works as expected:
3 + 4 = 7
3 - 4 = -1
It all works but you should not be satisfied with the solution. It is much better to merge both addition and subtraction down to a single rule:
rule TOP { <number> <op> <number> } token op { '+' | '-' }
The grammar became simpler, and the operator is explicitly defined in its own token that matches either a plus or a minus.
Similarly, the addition and subtraction action methods can be replaced with a generic solution:
method TOP($/) {
if $<op> eq '+' {
$/.make($<number>[0].made + $<number>[1].made);
}
else {
$/.make($<number>[0].made - $<number>[1].made);
}
}
And that’s it. The updated grammar treats both test strings as a valid expression and makes correct calculations based on the contents of $<op>
. When checking the condition in $<op> eq '+'
, the eq
string comparison operator is implicitly converting $<op>
to a string and no separate action to handle the op
token is needed.
Premature optimization
Are there any ways to make the code a bit more attractive? There are a few. We’ll try two approaches to unify the calls. The goal is to avoid repeating the longs lines of code, where the only difference is the sign of the operation, e. g.:
$/.make($<number>[0].made + $<number>[1].made);
and
$/.make($<number>[0].made - $<number>[1].made);
In both cases, the operator is surrounded by the same two operands, and that’s a good opportunity to write a couple of twin functions:
class CalculatorActions {
sub addition($a, $b) {
$a + $b
}
sub subtraction($a, $b) {
$a - $b
}
. . .
}
Both functions can be placed inside the CalculatorActions
class, and when they are called, no additional arguments pointing to the instance of the class are passed to it by Raku. To make a common entry point to both functions, let’s create a hash that keeps references to them:
class CalculatorActions {
my %operation =
'+' => &addition,
'-' => &subtraction;
. . .
}
It is quite easy to call a function according to the operator:
method TOP($/) {
$/.make(%operation{~$<op>}(
$<number>[0].made, $<number>[1].made));
}
Another interesting option to simplify the code and get rid of explicit if
checks is to use multiple dispatch.
class CalculatorActions {
multi sub operation('+', $a, $b) {
$a + $b
}
multi sub operation('-', $a, $b) {
$a - $b
}
method TOP($/) {
$/.make(operation(~$<op>,
$<number>[0].made,
$<number>[1].made));
}
. . .
}
Here, the operator sign is passed to the operation
function, and the compiler chooses which candidate to call: either the operation defined with a plus or the operation that needs a minus as its first argument. The Raku compiler does this job for us with pleasure.
More operands
The grammar and the actions are now smart enough to parse and evaluate expressions having two values in it, but it does not work with more complicated examples like 1 + 2 + 3
. It does not work as the TOP
level is restricted by the rule with two numbers only: <number> <op> <number>
.
In the grammar, the new optional items in the chain can be expressed by the *
quantifier:
rule TOP {
<number> [<op> <number> <ws>]*
}
The <ws>
token is a built-in tool to match optional whitespace. With this change, we also allowed expressions that contain a single number. So, the following test cases all match:
my @cases =
'3 + 4', '3 - 4',
'7',
'1 + 2 + 3', '1 + 3 + 5 + 7';
The two functions that do real calculations must also be adapted to accept more than two values:
multi sub operation('+', @values) {
[+] @values
}
multi sub operation('-', @values) {
[-] @values
}
Using a reduction operation simplifies the syntax a lot at this point. Finally, prepare an array of values for passing it to one of these multi-functions:
method TOP($/) {
$/.make(operation(~$<op>[0], $<number>.map: *.made));
}
To get the numbers from the AST tree, the map
method is used here. The *.made
construction is a WhateverCode
block that is executed for each element of the array of $<number>
s.
As we already expect that the calculator must work with single numbers, a small extension is required. With a single number, there is no operator in the expression:
method TOP($/) {
if $<op> {
$/.make(operation(~$<op>[0], $<number>.map: *.made));
}
else {
$/.make($<number>[0].made);
}
}
Run the test and check that all the test cases are properly evaluated:
3 + 4 = 7
3 - 4 = -1
7 = 7
1 + 2 + 3 = 6
1 + 3 + 5 + 7 = 16
A diversity test
The beauty and the simplicity of the reduction form of the operators (such as [+]
) made it possible to express the action in a few characters but it made impossible to evaluate expressions with different operators, for instance, 7 + 8 - 3
. To handle such examples, a loop over the operators and the operands can be performed.
At the top level, you’ve got to walk along the numbers and take the operator next to it. Here is an example of how you can loop over the values:
method TOP($/) {
my @numbers = $<number>.map: *.made;
my $make = @numbers.shift;
operation(~$<op>.shift, $make, @numbers.shift)
while @numbers.elems;
$/.make($make);
}
To simplify the while loop, let the operation
multi-functions to update one of their arguments:
multi sub operation('+', $a is rw, $b) {
$a += $b
}
multi sub operation('-', $a is rw, $b) {
$a -= $b
}
Again, the code looks compact and, what is even more important, it works correctly.
7 + 8 - 3 = 12
14 - 4 = 10
14 - 4 - 3 = 7
100 - 200 + 300 + 1 - 2 = 199
Adding more math
The calculator is now only working with addition and subtraction. The good part is that it can compute long expressions that contain more than two numbers. It is a right moment to teach it how to handle multiplication and division.
It would be naïve to just extend the op
token to create more candidates of the operation
function:
grammar Calculator {
. . .
token op {
'+' | '-' | '*' | '/'
}
. . .
}
class CalculatorActions {
. . .
multi sub operation('*', $a is rw, $b) {
$a *= $b
}
multi sub operation('/', $a is rw, $b) {
$a /= $b
}
. . .
}
This grammar parses and even evaluates all the possible expressions with all four arithmetic operations, but it does not follow the standard precedence rules:
3 * 4 = 12
100 / 25 = 41 + 2 * 3 = 9
One of the possible solutions is to use a stack to perform the calculations. You scan the input string from left to right and keep applying the next operator to the numbers until you meet an operator with higher precedence. In this case, you put the current result to the stack and continue with a new series of calculations until you reach an operator with lower precedence. Then you pop the numbers and the operators from the stack and reduce it until it is fully consumed. This is a good exercise to do at home, but we’ll choose a simpler strategy.
Another method of handling the precedence of arithmetic operations is to change the grammar in such a way so that it extracts multiplication and division operations first, and then passes the result to the remaining additions and subtractions.
In the next fragment, the new grammar is shown:
grammar Calculator {
rule TOP {
<term>* %% <op1>
}
rule term {
<factor>* %% <op2>
}
token op1 {
'+' | '-'
}
token op2 {
'*' | '/'
}
rule factor {
<number>
}
token number {
\d+
}
}
Let us examine it. First of all, notice how we changed the TOP
rule comparing to the previous variant. Fragments repetition is much simpler to express with the %%
operator. Compare the two regexes:
<number> [<op> <number> <ws>]*
and
<number>* %% <op>
The second change is introducing two operator sets: op1
for +
and -
and op2
for *
and /
. Operators in op1
have lower precedence, and they appear in the top-level rule of the grammar. In other words, we think of an input expression as of a sum that only contains numbers that you add or subtract.
The multiplication and division operators have higher precedence, and you should treat their results as a whole on the top level. This is why instead of matching numbers, the term
token is introduced. It can be a number in the end, but first of all, it is a series of factors separated by *
or /
. A factor is basically a number in our case. I intentionally added a separate proxy rule, factor
, to keep the names terms and factors, which you can often see in compiler-related literature.
In the actions class, we have the four operation
functions already; all we need is to add a trivial method for factors and create the action for terms.
class CalculatorActions {
. . .
method TOP($/) {
$/.make(process($<term>, $<op1>));
}
method term($/) {
$/.make(process($<factor>, $<op2>));
}
sub process(@data, @ops) {
my @nums = @data.map: *.made;
my $result = @nums.shift;
operation(~@ops.shift, $result, @nums.shift)
while @nums;
return $result;
}
method factor($/) {
$/.make($<number>.made);
}
method number($/) {
$/.make(+$/);
}
}
As you can see, both TOP
and term
are implementing the same algorithm; they just deal with different parts of the grammar (and different operators).
Testing the code
So far, we’ve got a calculator that performs four arithmetic operations with an arbitrary amount of numbers in the expression. You can come up with test cases, but you probably don’t want to calculate the correct result in your head and check it against each test case.
Raku distribution includes the Test
module that significantly helps to simplify the test cases loop. The module exports a few functions, and we’ll be using one of them, named is
. As the syntax of the calculator expressions coincides with Raku’s, let us ask it to check the result:
use Test; . . . for @cases -> $test { my $result = Calculator.parse( $test, :actions(CalculatorActions)).made; my $correct = EVAL($test); is($result, $correct, "$test = $correct"); }
Try different test cases, including those with a mixture of operators. For example:
ok 10 - 3 * 4 = 12
ok 11 - 100 / 25 = 4
ok 12 - 1 + 2 * 3 = 7
ok 13 - 1 + 2 - 3 * 4 / 5 = 0.6
Actually, each test have to check two things: 1) whether an example was parsed, and 2) if it was evaluated correctly. We can split the tests to satisfy this observation:
for @cases -> $test { my $parse = Calculator.parse($test, :actions(CalculatorActions)); next unless isa-ok($parse, Match, "parsed $test"); my $result = $parse.made; my $correct = EVAL($test); is($result, $correct, "computed $test = $correct"); }
The output gets updated correspondently:
ok 1 - parsed 3 + 4 ok 2 - computed 3 + 4 = 7 ok 3 - parsed 3 - 4 ok 4 - computed 3 - 4 = -1 . . .
Adding more power
In this section, we will make our calculator a bit more versatile as we are going to add the power operator, **
. The difference to the previous sets of operators is that the power operator has even higher precedence and must be processed first, before any multiplication or addition.
The operator can be added similarly to what we did earlier for the *
and /
operators. Let us replace the factor
rule and define a token for the operator itself:
rule factor {
<number>* %% <op3>
}
token op3 {
'**'
}
Another subtle change that needs to be done is adding whitespaces to the number
token. You either can do it explicitly:
token number {
<ws> \d+ <ws>
}
Or implicitly by converting the token to a rule:
rule number {
\d+
}
Update the actions class to support the new operator:
multi sub operation('**', $a is rw, $b) {
$a **= $b
}
method factor($/) {
$/.make(process($<number>, $<op3>));
}
All hard work is done (it was simple, wasn’t it?). The calculator is now handling five operators:
ok 27 - parsed 2 ** 3 ok 28 - computed 2 ** 3 = 8 ok 29 - parsed 2 + 3 ** 4 ok 30 - computed 2 + 3 ** 4 = 83 ok 31 - parsed 1 + 2 * 3 ** 4 - 5 * 6 ok 32 - computed 1 + 2 * 3 ** 4 - 5 * 6 = 133 ok 33 - parsed 2 ** 3 - 4 ok 34 - computed 2 ** 3 - 4 = 4
Allowing parentheses
The last touch to the calculator design is making it understand parentheses. Although it may seem a difficult task, in reality, the implementation is rather simple. This is because anything within parentheses is another expression that follows the very same grammatical rules. In other words, if you see parentheses, you can recursively start from TOP
.
After you compute the value inside parentheses, you get a single value, thus you can treat it as any other number. To extend the parser, just make a new value
rule out of the number and list two alternatives there:
rule factor {
<value>* %% <op3>
}
rule value {
| <number>
| '(' <TOP> ')'
}
The value
method picks up the value made on the previous level:
method value($/) {
$/.make($<number> ?? $<number>.made !! $<TOP>.made);
}
That’s it. Only three simple changes and we can parse much more complicated expressions:
ok 35 - parsed 10 * (20 - 30) ok 36 - computed 10 * (20 - 30) = -100 ok 37 - parsed 10 * 20 - 30 ok 38 - computed 10 * 20 - 30 = 170 ok 39 - parsed (5 * 6) ok 40 - computed (5 * 6) = 30 ok 41 - parsed (10) ok 42 - computed (10) = 10 ok 43 - parsed 1 - (5 * (3 + 4)) / 2 ok 44 - computed 1 - (5 * (3 + 4)) / 2 = -16.5
Our calculator is ready. In the next chapter, we will integrate it to the interpreter so that it can parse arithmetic expressions that involve variables.
Next: Chapter 4. A Better Interpreter
Great series! I’m in the process of reading PLP (Programming Language Pragmatics, 2nd ed.), so this is excellent to put the theory into practice!
A small correction: in the last step, a fourth change is necessary – number should be replaced with value in method factor.