This is a chapter from
Creating a compiler with Raku
We started Chapter 8 with an attempt to implement the if
keyword. It was tricky to do so in the interpreter that executes the code line by line. Introducing the AST allowed to parse the program without executing it, so we can first build the node and only run the code behind it if the condition is true.
Implementing if
The language construction we are going to implement first is the prefix if
clause, demonstrated on the following example:
my flag = 42;
if flag say "Depends";
say "Always";
The if
keyword is followed by a value (which is a condition) and a statement. A value is a value
in the sense of our grammar, thus it is either a variable as in the example, or a string, or an expression. A statement can be any construct recognised by the statement
rule.
Embedding the if
clause to the grammar is simple, as we have all statement
definitions in one rule:
rule condition { 'if' <value> } rule statement { <condition> ? [ | <statement=variable-declaration> | <statement=assignment> | <statement=function-call> ] }
We have seen earlier that we should not execute the statement before we know if the condition is true. In the AST world, we can pick the statement node out of the tree and place it to an attribute of the condition node. Later, the evaluator decides whether to execute the statement node. So, we need a new kind of an AST node:
class AST::Condition is ASTNode { has ASTNode $.value; has ASTNode $.statement; }
In the actions class, the method trigged in response to a parsed statement should now be split into two branches. When there is no if
clause, everything remains as is. When there is such a clause, the statement node is enters the tree via a proxy AST::Condition
node:
multi method statement($/ where !$<condition>) {
$/.make($<statement>.made);
}
multi method statement($/ where $<condition>) {
$/.make(AST::Condition.new(
value => $/<condition><value>.made,
statement => $/<statement>.made,
));
}
The whole AST for the test program is drawn below.
Take a look at the two calls of say
. The first call is a child of the AST::Condition
node. The second one (in the right side of the diagram) is located at the top level inside the TOP’s @.statements
array.
The tree is ready and we can evaluate it. The LinguaEvaluator
class needs a way to react to the AST::Condition
type of AST nodes. If should check the condition value and pass execution further if needed.
multi method eval-node(AST::Condition $node) {
if $node.value.value {
self.eval-node($node.statement);
}
}
That’s all! Test the program with different values of the flag variable and see how the program reacts.
To complete the implementation, add a test in the t
directory:
my flag = 1; if flag say "Printed"; ## Printed flag = 0; if flag say "Ignored"; say "Done"; ## Done
Implementing else
A logical next step is implementing the else
clause. Let us expand the test program:
my flag = 0;
if flag say "Depends" else say "Otherwise";
say "Always";
The else
keyword can only appear if there was an if
before. In our current grammar, adding the else
block after the statement is tricky. It is better to re-organise the grammar so that the statement
rule contains different variants of statements only.
Let us allow the if
–else
construct on the top level:
rule TOP {
[
| <one-line-comment>
| <statement=conditional-statement> ';'
| <statement> ';'
]*
}
rule conditional-statement {
'if' <value> <statement>
[
'else' <statement>
]?
}
rule statement {
| <statement=variable-declaration>
| <statement=assignment>
| <statement=function-call>
}
Notice that in the TOP
rule, the new alternative is given an alias, statement
, which let us keep the same loop in the corresponding action:
method TOP($/) {
. . .
for $<statement> -> $statement {
$top.statements.push($statement.made);
}
. . .
}
OK, we can have one or two statements in the conditional construct now. The AST::Conditional
node gets the second attribute:
class AST::Condition is ASTNode { has ASTNode $.value; has ASTNode $.statement; has ASTNode $.antistatement; }
The new attribute is filled with either AST::Null
(if there is no else
clause) or with a AST::Statement
node:
method conditional-statement($/) {
$/.make(AST::Condition.new(
value => $/<value>.made,
statement => $/<statement>[0].made,
antistatement => $/<statement>[1] ??
$/<statement>[1].made !! AST::Null,
));
}
The statement
action can now be reverted to a single method with a trivial body:
method statement($/) {
$/.make($<statement>.made);
}
Finally, the evaluator must deside which statement to execute:
multi method eval-node(AST::Condition $node) { if $node.value.value { self.eval-node($node.statement); } else { self.eval-node($node.antistatement); } }
We have to implement the no-operation behaviour of the AST::Null
node too:
multi method eval-node(AST::Null) {
}
The else
keyword is ready and functioning, and a test t/else.t
must be added to the repository:
if 1 say "A" else say "B"; ## A if 0 say "A" else say "B"; ## B my x; if 1 x = 10 else x = 20; say x; ## 10 if 0 x = 30 else x = 40; say x; ## 40
Implementing a loop
Let us implement a loop in our Lingua language. At the moment, let us make it simple and allow only the loop
keyword with the counter variable decremented by the compiler.
my n = 5;
loop n say n;
This program prints the values from 5 to 1 and stops.
The syntax of the loop
clause is similar to the if
condition we saw in this chapter, and it can be embedded on the top level of the grammar too:
rule TOP { [ | <one-line-comment> | <statement=conditional-statement> ';' | <statement=loopped-statement> ';' | <statement> ';' ]* } rule loopped-statement { 'loop' <variable-name> <statement> }
Next, we need a new type of AST node:
class AST::Loop is ASTNode { has ASTNode $.variable; has ASTNode $.statement; }
The syntax is chosen in such a way so that you have to provide the loop with an already existing variable. In this case, we do not need to create a temporary loop variable. In the action method, a reference to the variable is kept in an AST::Variable
node:
method loopped-statement($/) { $/.make(AST::Loop.new( variable => AST::Variable.new( variable-name => ~$/<variable-name> ), statement => $/<statement>.made, )); }
Decreasing the counter is done in the evaluator:
multi method eval-node(AST::Loop $node) { while %!var{$node.variable.variable-name} { self.eval-node($node.statement); %!var{$node.variable.variable-name}--; } }
It may be safer to re-write the stop condition a bit:
while %!var{$node.variable.variable-name} > 0 { . . .
This allows to avoid infinite loops when the initial value is not an integer number, for example:
my n = 5.5;
loop n say n;
Let’s save our achievements in the test suite. As the loop prints a few lines, you should add them all to the test file:
my n = 3; loop n say n; ## 3 ## 2 ## 1
Statement blocks
Until now, both if
and else
constructs could contain only a single statement. Of course, if you need more than one, you can always add an if
clause to each of them, but it would be much nicer to implement support of code blocks, which is present in many other languages. The goal of this section is to execute the following program with the blocks directed by both if
–else
and loop
:
my c = 0; if c { say "c = $c"; say "IF block"; } else { say "c = $c"; say "ELSE block"; } my n = 5; loop n { say n; say 2 * n; }
Let us make a first attempt to replace a statement
with a newly defined block
. If the block contains a single instruction, curly braces are optional:
rule block { | '{' ~ '}' <statement>* %% ';' | <statement> ';' }
The '{' ~ '}' <X>
construct in Raku is equivalent to '{' <X> '}'
, but it often looks cleaner, especially when X
is a represented by a big piece of code.
On a conceptual level, we have to allow code blocks where a statement is allowed in the grammar. From a practical perspective, we have to think about semicolons at the end of the statements.
For example, a semicolon is not allowed before the else
keyword:
if 1 say "A"; else say "B";
The problem is that it still stands after a statement in the grammar. So, we should distinguish between a statement in the if
clause with and without the else
clause.
Also, we should not insist on a semicolon after the closing curly brace:
loop n {
say n;
say 2 * n;
};
And finally, a semicolon before the closing brace can be declared optional:
loop n {
say n;
say 2 * n
}
To satisfy all the conditions listed above, let us tell the grammar where we expect a semicolon and where it is not needed. We can introduce two multi-alternatives of the block
rule (remember how we were adding arguments to the methods in our grammar for the calculator).
multi rule block() {
| '{' ~ '}' <statement>* %% ';'
| <statement>
}
multi rule block(';') {
| '{' ~ '}' <statement>* %% ';'
| <statement> ';'
}
By doing so, we do not add a new rule name. Now, we can rewrite the rules for conditions and preliminary introduce the while
loop (we’ll return to it at the end of the chapter):
rule conditional-statement {
| 'if' <value> <block()> 'else' <block(';')>
| 'if' <value> <block(';')>
}
rule loopped-statement {
'while' <variable-name> <block(';')>
}
Notice that in the conditional-statement
rule, it was also possible to use square brackets to extract the common part:
rule conditional-statement {
'if' <value> [
| <block()> 'else' <block(';')>
| <block(';')>
]
}
You may prefer either; my own personal preference is using the form where you see both branches one after another:
| 'if' <value> <block()> 'else' <block(';')>
| 'if' <value> <block(';')>
Finally, adjust the TOP
rule to remove extra semicolons from there:
rule TOP {
[
| <one-line-comment>
| <statement=conditional-statement>
| <statement=loopped-statement>
| <statement> ';'
]*
}
Now to the actions part. First, lets us change the actions for the conditional constructs and use blocks there:
method conditional-statement($/) {
$/.make(AST::Condition.new(
value => $/<value>.made,
statement => $/<block>[0]<statement>[0].made,
antistatement => $/<block>[1] ??
$/<block>[1]<statement>[0].made !! AST::Null,
));
}
method loopped-statement($/) {
$/.make(AST::Loop.new(
variable => AST::Variable.new(
variable-name => ~$/<variable-name>
),
statement => $/<block><statement>[0].made,
));
}
Although the result contains long chains of keys and indices of the match object, such as $/<block>[1]<statement>[0]
, the transformation is simple and easy to understand. You can test the program, and it will execute the first statement from each curly block.
The next step is to modify the AST so that it keeps more than one statement where a block is allowed. We have to keep all of them, so let us use arrays instead of scalars:
class AST::Condition is ASTNode {
has ASTNode $.value;
has ASTNode @.statements;
has ASTNode @.antistatements;
}
class AST::Loop is ASTNode {
has ASTNode $.variable;
has ASTNode @.statements;
}
The corresponding actions should create the new AST nodes and pass all the statements from the code blocks. In the next fragment, we are using map
s to loop over the statements:
method conditional-statement($/) { $/.make(AST::Condition.new( value => $/<value>.made, statements => ($/<block>[0]<statement>.map: *.made), antistatements => $/<block>[1] ?? ($/<block>[1]<statement>.map: *.made) !! (AST::Null), )); } method loopped-statement($/) { $/.make(AST::Loop.new( variable => AST::Variable.new( variable-name => ~$/<variable-name> ), statements => ($/<block><statement>.map: *.made), )); }
Blocks in the AST
OK, we can return to the AST and let the nodes keep arrays of statements when the block in curly braces was seen.
Let us take a closer look at the loop from the example above:
loop n {
say n;
say 2 * n;
}
In the form of AST, this fragment takes the following shape:
AST::Loop.new(
variable => AST::Variable.new(
. . .
),
statements => Array[ASTNode].new(
AST::FunctionCall.new(
. . .
),
AST::FunctionCall.new(
. . .
)
)
)
Again, the abstract nature of the abstract syntax tree is demonstrated here. Although we added the block
rule to the grammar, there is no trace of it in the tree. The two say
calls appeared immediately as the elements of the statements
array attribute within the AST::Loop
node.
What we need is to loop over the statements in the evaluator:
multi method eval-node(AST::Condition $node) {
if $node.value.value {
self.eval-node($_) for $node.statements;
}
else {
self.eval-node($_) for $node.antistatements;
}
}
multi method eval-node(AST::Loop $node) {
while %!var{$node.variable.variable-name} > 0 {
self.eval-node($_) for $node.statements;
%!var{$node.variable.variable-name}--;
}
}
That’s it! Introducing the {
}
blocks was a relatively simple task.
A real-life test
If you run the loop, you’ll see an error very soon:
5 10 4 Cannot shift from an empty Array[ASTNode] in method value at /Users/ash/lingua/LinguaAST.rakumod (LinguaAST) line 112
The first three lines display the expected beginning of the result, but then, everything suddenly stopped. The error message helps to understand that this happened because we were using shift
to compute the expressions. Within a loop, expressions can be reused, so we should not change them. Let us change all shifts to loops. There are two such places in our code. The first one in the AST::MathOperations
class:
method value() {
my $result = @.operands[0].value;
for 1 .. @.operands.elems - 1 -> $i {
$result = operation(@.operators[$i - 1],
$result, @.operands[$i].value);
}
return $result;
}
The second in one of the multi-methods of the evaluator:
multi method eval-node(AST::HashAssignment $node) {
%!var{$node.variable-name} = Hash.new;
for 0 .. $node.keys.elems - 1 -> $i {
%!var{$node.variable-name}.{$node.keys[$i]} =
$node.values[$i].value;
}
}
The test code is working now. Let us fix the behaviour in the test with similar instructions:
my n = 5; loop n { my n2 = n * n; say "n = $n, n2 = $n2"; } ## n = 5, n2 = 25 ## n = 4, n2 = 16 ## n = 3, n2 = 9 ## n = 2, n2 = 4 ## n = 1, n2 = 1
Don’t forget to test the if
and else
blocks, too. The program should be able to execute all the statements from each of them.
Conditional operators
So far, the if
clause can only check if the value is zero or not. Let us make it more powerful and allow conditional and Boolean operators:
| & < <= > >= == !=
We are not going to introduce the Boolean data type, but instead, let’s add the operators to existing expressions. It makes both 3 + n
and x < y
valid expressions, which can be used as a condition in the if
test.
The above listed operators are binary operators and have precedence lower than arithmetic operators +
and -
. Among themselves, &
has a higher precedence. So, we have to update the op
tokens in the grammar and add the new operators:
multi token op(1) { | '|' | '<' | '>' | '<=' | '>=' | '==' | '!=' } multi token op(2) { '&' }
Update the rest to explicitly give the precedence:
multi token op(3) {
'+' | '-'
}
multi token op(4) {
'*' | '/'
}
multi token op(5) {
'**'
}
multi rule expr(6) {
| <number>
| <variable-name> <index>?
| '(' <expression> ')'
}
Add the functions to the AST::MathOperations
class:
multi sub operation('|', $a, $b) {
$a || $b
}
multi sub operation('&', $a, $b) {
$a && $b
}
multi sub operation('<', $a, $b) {
$a < $b
}
multi sub operation('<=', $a, $b) {
$a <= $b
}
multi sub operation('>', $a, $b) {
$a > $b
}
multi sub operation('>=', $a, $b) {
$a >= $b
}
multi sub operation('!=', $a, $b) {
$a != $b
}
multi sub operation('==', $a, $b) {
$a == $b
}
We rely on Raku’s operators here. Another alternative would be to only generate 1 or 0, for example:
multi sub operation('<', $a, $b) {
$a < $b ?? 1 !! 0
}
In the test, try different operators and their combinations:
my x = 10; my y = 20; if x > y say ">" else say "<"; ## < if x < y say ">" else say "<"; ## > if x != y say "!=" else say "=="; ## != if x != x say "!=" else say "=="; ## == if x == y say "==" else say "!="; ## != if x == x say "==" else say "!="; ## == if 5 <= 5 say "5 <= 5"; ## 5 <= 5 if 5 <= 6 say "5 <= 6"; ## 5 <= 6 if 5 >= 5 say "5 >= 5"; ## 5 >= 5 if 6 >= 5 say "6 >= 5"; ## 6 >= 5 if (10 > 1) & (5 < 50) say "OK 1"; ## OK 1 if (10 < 1) | (5 < 50) say "OK 2"; ## OK 2
Implementing the while
loop
Before the end of the chapter, let us implement the while
keyword, which is illustrated by the following code:
my n = 1;
while n <= 5 {
say n;
n = n + 1
}
The loop body is run until the condition stops being true.
Syntactically and semantically, the while
loop is very close to the loop
cycle we worked on before. Implementing it is mostly a copy-and-paste work.
In the grammar:
rule TOP { [ . . . | <statement=loopped-statement> | <statement=while-statement> . . . ]* } rule while-statement { 'while' <value> <block(';')> }
Define a new AST node:
class AST::While is ASTNode {
has ASTNode $.value;
has ASTNode @.statements;
}
In the actions:
method while-statement($/) {
$/.make(AST::While.new(
value => $/<value>.made,
statements => ($/<block><statement>.map: *.made),
));
}
In the evaluator:
multi method eval-node(AST::While $node) {
while $node.value.value {
self.eval-node($_) for $node.statements;
}
}
The while
construct is implemented. Add a test and move on to the next chapter.
my n = 1; while n <= 5 { say n; n = n + 1 } ## 1 ## 2 ## 3 ## 4 ## 5 my k = 1; while k < 10 k = k + 1; say k; ## 10
What a masterpiece this entire series of articles has been!!
Until now i have only heard Raku is a power tool, but these articles make me see what a magnificient super-tool it is!!