Chapter 9. Evaluating AST

This is a chapter from
Creating a compiler with Raku

In the previous chapter, we learned how to transform an input program to an AST. The tree only holds information about the actual essence of the program, and ignores, for example, all the comments from the source code.

During the whole previous chapter, the output of the compiler was a tree. We could see what the program thought it has had to do, but we did not actually see the result of that work. In this chapter, we are traversing the AST and executing the tasks that it represents.

Evaluating from the top

The AST tree elements bubble up from the leaf rules and tokens to the TOP rule. Let us make our grammar return the whole AST:

method TOP($/) {
    my $top = AST::TOP.new;
    
    . . .
    
    $/.make($top);
}

While we were building the tree, the pair of methods, make and made, gave us the way to pass a tree fragment to the parent node. When you use make at the TOP level, you get the object as the result of the parse method:

my $ast = Lingua.parse($code, :actions(LinguaActions.new));
say $ast ?? 'OK' !! error('Error: parse failed');

We used the resulting AST in a Boolean context but that’s not the only thing we can do with it. In this chapter, we will work on creating the code to walk along the tree and execute the actions that the nodes demand.

In the main script, let’s pass the AST to some eval method (we are going to create it) of the LinguaEvaluator class:

use LinguaEvaluator; 

. . .

my $eval = LinguaEvaluator.new();
$eval.eval($ast.made);

Let me remind the two design ideas from the previous chapter. Any AST node is a child class of an abstract ASTNode class, and the top-level node contains everything in an array of statements:

class ASTNode {    
}

class AST::TOP is ASTNode {
    has ASTNode @.statements;
}

Executing a program means running through all the nodes. Thus, we start from the top level, taking all the statements one by one:

use LinguaAST;

class LinguaEvaluator {    
    method eval(ASTNode $top) {
        self.eval-node($_) for $top.statements;
    }
}

For each statement, the eval-node method is called. There is no need to explicitly check the real nature of each statement, as we can let Raku use its multiple-dispatching facilities to decide which method to call for a given statement.

Let us start with the simplest program that declares a variable and initialises it with a value:

my a = 7;

There is a single statement in the AST for this program:

AST::TOP.new(
    statements => Array[ASTNode].new(
        AST::ScalarDeclaration.new(
            variable-name => "a",
            value => AST::NumberValue.new(
                value => 7
            )
        )
    )
)

The type of the node is AST::ScalarDeclaration, so there should be a multi-method eval-node that takes an argument of that type:

class LinguaEvaluator {
    . . .

    multi method eval-node(AST::ScalarDeclaration $node) {
        . . .
    }
}

The job of the method is to create a variable and put a value into it. But where do you store the variables now? In the previous chapters, the %!var hash was a part of the LinguaActions class. Now, we don’t need it there anymore and can remove the hash declaration from the class.

By the way, there is a good sanity check: there should be no more references to the %!var hash in the LinguaActions class. We have one left in the LinguaActions.string method where it is used to make string interpolation. Let us remove it from there and keep the action method as simple as possible:

class LinguaActions {
    . . .
    method string($/) {
        $/.make(AST::StringValue.new(value => ~$/[0]));
    }
    . . .
}

As there are no data members in the class left, we can get rid of instantiating it and pass a class name instead of an instance to the parser:

my $ast = Lingua.parse($code, :actions(LinguaActions));

If everything runs now, then your definition of the LinguaActions class is cleaned up completely.

Working with variables

The variables can be stored inside the LinguaEvaluator object, and we can add our data storage to it:

class LinguaEvaluator {
    has %!var;

    . . .
}

Use this hash to create our first variable:

multi method eval-node(AST::ScalarDeclaration $node) {
    %!var{$node.variable-name} = 
        $node.value ~~ AST::Null ?? 0 !! $node.value.value;
}

All the information for the evaluator is taken from an AST node. In this case, it tells us its type (AST::ScalarDeclaration is scalar declaration), the name of the variable ($node.variable-name) and the content of the variable ($node.value). If there’s no value (the $node.value attribute contains AST::Null), let us store 0 in a variable. Otherwise, just copy the value attribute of the value node.

For debugging, add a couple of dd calls to display the AST and the variable storage content:

method eval(ASTNode $top) {
    dd $top;
    self.eval-node($_) for $top.statements;
    dd %!var;
}

The program executes our first example and successfully creates and initialises the variable:

Hash %!var = {:a("7")}

This is all great, so let the program print the value of the variable:

my a = 7;
say a;

The AST of this program contains two statements:

AST::TOP.new(
    statements => Array[ASTNode].new(
        AST::ScalarDeclaration.new(
            variable-name => "a",
            value => AST::NumberValue.new(
                value => 7
            )
        ),
        AST::FunctionCall.new(
            function-name => "say",
            value => AST::Variable.new(
                variable-name => "a"
            )
        )
    )
)

The second statement is an AST::FunctionCall node. It keeps the name of the function and, in another AST node, the node referencing the variable.

The evaluator should now find a multi-method for the function call node. Here it is:

multi method eval-node(AST::FunctionCall $node) {
    self.call-function($node.function-name, $node.value);
}

As we expect that there will be more than one function, and their arguments can be of different type, let us implement another set of multi-methods specifically for each function and its arguments. For the given program, we need a say function that prints a variable:

multi method call-function('say', AST::Variable $value) {
    say %!var{$value.variable-name};
}

This method does the final job: it accesses the variable storage and sends the variable to the output.

It is now obvious how to implement the function for working with different data types using multi-methods:

multi method call-function('say', AST::NumberValue $value) {
    say $value.value;
}

multi method call-function('say', AST::StringValue $value) {
    say $value.value;
}

With these additions, the compiler can handle the following program:

my a = 7;
say a;

say 8;
say "nine";

A few more steps can be done to allow other simple operations. We start with a scalar assignment:

a = 10;

There’s a variable on the left and a NumberValue on the right.

AST::ScalarAssignment.new(
    variable-name => "a",
    rhs => AST::NumberValue.new(
        value => 10
    )
)

The value is directly accessible via the $.value attribute:

multi method eval-node(AST::ScalarAssignment $node) {
    %!var{$node.variable-name} = $node.rhs.value;
}

The good part about AST is that its nodes represent simple, almost atomic, actions. Such as variable declaration or assignment. It means that the implementation is straightforward enough.

Let us teach the evaluator to create array and hashes:

my data[];
my relation{};

To implement it, we have to define another couple of multi-methods for AST::ArrayDeclaration and AST::HashDeclaration:

multi method eval-node(AST::ArrayDeclaration $node) {
    %!var{$node.variable-name} = Array.new;
}

multi method eval-node(AST::HashDeclaration $node) {
    %!var{$node.variable-name} = Hash.new;
}

After it is executed, the variable storage contains the desired objects:

Hash %!var = {:data($[]), :relation(${})}

We also have to think about array and hash initialisation an assignment, but let us first work with single elements:

data[3] = 4;
color{"red"} = "#FF0000";

Assignments to the elements of arrays and hashes are as simple as that:

multi method eval-node(AST::ArrayItemAssignment $node) {
    %!var{$node.variable-name}[$node.index] = $node.rhs.value;
}

multi method eval-node(AST::HashItemAssignment $node) {
    %!var{$node.variable-name}{$node.key} = $node.rhs.value;
}

The next logical step is to implement array and hash assignments:

data = 7, 8, 9;
color = "white": "#FFFFFF", "black": "#000000";

Codewise, this is about two more multi-methods:

multi method eval-node(AST::ArrayAssignment $node) {
    %!var{$node.variable-name} = 
        ($node.elements.map: *.value).Array;
}

multi method eval-node(AST::HashAssignment $node) {
    %!var{$node.variable-name} = Hash.new;
    while $node.keys {
        %!var{$node.variable-name}.{$node.keys.shift} =
            $node.values.shift.value;
    }
}

We did not talk about initialization yet, but you will be surprised that the following code already works:

my array[] = 1, 3, 4;
my h{} = "a": 1, "b": 2;

This is possible because in the AST tree, array and hash declaration with initialisations are presented by the same AST nodes: AST::ArrayAssignment and AST::HashAsignment.

String interpolation

Let us take a program with string interpolation:

my alpha = 54;
say "The value of α is $alpha.";

If you run it, the string will be printed as-is, with no interpolation of the variable. So, we have to restore the code that does that.

The problem is that the AST::StringValue node only contains a string itself, and to make a substitution, we need to parse the string again to find all variable names that are mentioned in there. This is not very efficient, and it is better to store the information about the variables immediately during the parsing phase. Let us remind the grammar token that describes strings:

token string {
    '"' ( [
            | <-["\\$]>+
            | '\\"'
            | '\\\\'
            | '\\$'
            | '$' <variable-name>
            ]* )
    '"'
}

Variable names land in the match object. Let us reserve room for them in the AST node:

class AST::StringValue is ASTNode {
    has Str $.value;
    has @.interpolations;
}

In the action, make some simple calculations to compute the name of the variable and its position in a string:

method string($/) {
    my $match = $/[0];

    my @interpolations;
    push @interpolations, [
            .Str,
            .from - $match.from - 1,
            .pos - .from + 1,
        ] for $match<variable-name>;

    $/.make(AST::StringValue.new(
        value => ~$match,
        interpolations => @interpolations
    ));
}

Finaly, in the LinguaEvaluator class, the string can be processed to substitute the variables:

multi method call-function('say', AST::StringValue $value) {
    say self.interpolate($value);
}

method interpolate(AST::StringValue $str) {
    my $s = $str.value;

    for $str.interpolations.reverse -> $var {
        $s.substr-rw($var[1], $var[2]) = %!var{$var[0]};
    }

    $s ~~ s:g/\\\"/"/;
    $s ~~ s:g/\\\\/\\/;
    $s ~~ s:g/\\\$/\$/;

    return $s;
}

You may optimize the code and move the three substitutions to the LinguaActions class, but it is important to realise that variable interpolation has to happen every time the string is accessed, because the value of the variables may change between the calls, as shown in the next example:

my alpha = 54;
say "The value of α is $alpha.";

my beta;
beta = 200;
alpha = 100;
say "And now it is $alpha (β = $beta).";

Using Raku getters

In the previous section, we created a separate method for interpolating variables in a string. Interpolation worked well with the say function. Imagine now that we want to assign a string to a variable:

my date = 18;
my month = "October";
my year = 2018;

my date = "$date $month $year";
say date;

This code does not work, as no interpolation has been made by the evaluator. The eval-node method for a scalar declaration (and initialisation, too) is using the value attribute of a AST::StringValue object:

multi method eval-node(AST::ScalarDeclaration $node) {
    %!var{$node.variable-name} =
        $node.value ~~ AST::Null ??
                       0 !! $node.value.value;
}

It would be great if the string taken from $node.value.value already contains interpolated variables. We should not do it statically (at the moment of creation of an AST::StringValue node), because the variables may change their values before the string is used.

Fortunately, Raku offers a mechanism that allows us to change the behaviour of the value attribute without changing the code that uses it. In fact, we are already using a getter method with the same name, value, which was created by Raku in response to declaring a class attribute:

class AST::StringValue . . . {
    has Str $.value;
    . . .
}

It is possible to redefine the method with our own implementation:

class AST::StringValue is ASTNode {
    has Str $.value;
    has @.interpolations;

    method value() {
        . . .
        return . . . ;
    }
}

Now, we can call the interpolation code from within the value method of AST::StringValue. As it needs access to the variable storage, let us pass it as an argument:

class AST::StringValue is ASTNode {
    has Str $.value;
    has @.interpolations;

    method value(%var) {
        my $s = $!value;

        for @!interpolations.reverse -> $var {
            $s.substr-rw($var[1], $var[2]) = %var{$var[0]};
        }

        . . .
        
        return $s;
    }
}

Notice how the %var variable is used together with the $!value and @!interpolations attributes in the method.

In the evaluator, pass the %!var storage:

multi method eval-node(AST::ScalarDeclaration $node) {
    %!var{$node.variable-name} =
        $node.value ~~ AST::Null ?? 
                       0 !! $node.value.value(%!var);
}

Another part where we can compute the values by redefining the value method of an AST node is accessing items of arrays and keys. The following program is an example containing both such cases:

my a[] = 10, 20;
my b{} = "A": "ey", "B": "bee";

my index = 1;
say a[index]; # prints 20

my key = "A";
say b{key}; # prints "ey"

Unlike the previous examples, variables are used here as array indices and hash keys.

Let us allow variables to compute their values themselves:

class AST::Variable is ASTNode {
    has Str $.variable-name;
    has $.evaluator;

    method value() {
        return $.evaluator.var{$.variable-name};
    }
}

Pass a reference of the evaluator to the AST nodes when you build them. For array items:

multi method array-index($/ where $<variable-name>) {
    $/.make(AST::Variable.new(
        variable-name => ~$<variable-name>,
        evaluator => $!evaluator,
    ));
}

And for the key—value pairs of hashes:

multi method hash-index($/ where $<variable-name>) {
    $/.make(AST::Variable.new(
        variable-name => ~$<variable-name>,
        evaluator => $!evaluator,
    ));
}

Also, don’t forget scalar variables. They can also return their values when they are used in expressions, for example:

multi method expr($/ where $<variable-name> && !$<index>) {
    $/.make(AST::Variable.new(
        variable-name => ~$<variable-name>,
        evaluator => $!evaluator,
    ));
}

Interpolating everywhere

At this moment, there is another problem. We only need interpolation for strings, but the eval-node method does a generic call of the value(%!var) method.

It fails on our test program, as two of the three AST::ScalarDeclaration nodes contain references to numbers, not strings, and thus there is no value method that takes a hash. If you change the program so that it only uses strings, it would works:

my date = "18";
my month = "October";
my year = "2018";
. . .

We have to find the way to make the code correct for both strings and numbers, but keep it simple. One of the possible solutions is to check the type of the AST node each time it is used in situations when the value is taken. Another solution is to define a method that takes the parameter in the AST::NumberValue class. Both approaches seem to be quite line-consuming and redundant. Instead, let us allow the AST nodes to access the variables directly.

We pass the evaluator object to the instance of LinguaActions and keep a reference to it in the node variables. The main program must be modified a bit: a few objects must be created before the actual parsing.

my $evaluator = LinguaEvaluator.new();
my $actions = LinguaActions.new(:evaluator($evaluator));
my $ast = Lingua.parse($code, :actions($actions));

if $ast {
    say 'OK';
    $evaluator.eval($ast.made);
}
else {
    error('Error: parse failed');
}

An evaluator object of the LinguaEvaluator type is passed to the constructor of the actions class. Notice that we pass an instance of LinguaActions to the parse method, not just the class name.

As we are willing to use the variables outside of the evaluator class, let us declare it public by changing its twigil to a dot.

class LinguaEvaluator {
    has %.var;
    . . .
}

(As we saw from the previous section, data methods that would be called public in other programming languages, are a combination of an attribute and a getter method in Raku.)

In the LinguaActions class, add an attribute for keeping the evaluator:

use LinguaEvaluator;
 
class LinguaActions {
    has LinguaEvaluator $.evaluator;
    . . .
}

We are passing it to the AST node, but only when creating an AST::StringValue node:

$/.make(AST::StringValue.new(
    evaluator => $!evaluator,
    value => ~$match,
    interpolations => @interpolations
));

Nodes of other types do not need such reference, as they are not using data from the evaluator object. So, we only need to change the AST::StringValue class:

class AST::StringValue is ASTNode {
    has Str $.value;
    has @.interpolations;
    has $.evaluator;
    . . .
}

In this class, we will redefine the default value getter with our own method that interpolates the variables, taking them from evaluator’s variable storage:

method value() {
    my $s = $!value;

    for @!interpolations.reverse -> $var {
        $s.substr-rw($var[1], $var[2]) = 
            $.evaluator.var{$var[0]};
    }

    . . .
}

Now the code works again, and the string can successfully interpolate both numbers and strings.

Arithmetic operations

It is time to restore the work of our calculator within the compiler, and to implement the solution for the AST::MathOperations node. The work we did earlier to re-use the value method brings its benefits, and the code is very simple:

class AST::MathOperations is ASTNode {
    . . .
    method value() {
        my $result = @.operands.shift.value;

        while @.operands {
            $result = operation(
                @.operators.shift,
                $result, @.operands.shift.value);
        }

        return $result;
    }
}

The actual operation methods are the same which we had in the calculator discussed in Chapter 3. Put them to the AST::MathOperations class:

multi sub operation('+', $a, $b) {
    $a + $b
}

multi sub operation('-', $a, $b) {
    $a - $b
}

multi sub operation('*', $a, $b) {
    $a * $b
}

multi sub operation('/', $a, $b) {
    $a / $b
}

multi sub operation('**', $a, $b) {
    $a ** $b
}

Having that done, we can simplify the implementation of the say function. Previously, we had two multi-variants reacting to either AST::NumberValue or AST::StringValue. As strings now interpolate variables themselves, we can just take the value of the node, and there’s no need to distinguish between the types of the nodes:

multi method call-function('say', ASTNode $value) {
    say $value.value;
}

Don’t forget to specify the behaviour of the say function for arrays and hashes. Again, multiple dispatch with the correct where clause helps here, while the condition becomes a bit wordy:

multi method call-function('say', AST::Variable $value 
    where %!var{$value.variable-name} ~~ Array) {

    say %!var{$value.variable-name}.join(', ');
}

multi method call-function('say', AST::Variable $value 
    where %!var{$value.variable-name} ~~ Hash) {
    
    my $data = %!var{$value.variable-name};
    my @str;
    for $data.keys.sort -> $key {
        @str.push("$key: $data{$key}");
    }
    say @str.join(', ');
}

For array and hash items the functions are straightforward:

multi method call-function('say', AST::ArrayItem $item) {
    say %!var{$item.variable-name}[$item.index.value];
}

multi method call-function('say', AST::HashItem $item) {
    say %!var{$item.variable-name}{$item.key.value};
}

This is the end of this chapter, where we used the AST tree to evaluate the values of its nodes and to run the program in the end. The evaluator class that was built is very compact, as we moved some logic to the AST nodes. This gave us a good balance between the pure design and the complexity of the code.

Next: Chapter 10. Test Suite