Chapter 8. Building AST. Part 2

This is a chapter from
Creating a compiler with Raku

In the second part of this chapter, I am talking about building the AST fragments for hosting strings, arrays, and hashes, as well as bigger constructs such as expressions, and how to manipulate these elements in a program.

Declaring strings

For the string values, let us create a new node type, AST::StringValue:

class AST::StringValue is ASTNode {
    has Str $.value;
}

An instance of this class is created whenever a string is parsed:

method string($a) {
    . . .

    $a.make(AST::StringValue.new(value => $s));
}

We will return to strings in the next chapter, because we still have to re-implement variable interpolation.

Declaring arrays and hashes

For arrays and hashes, we also have to create nodes that are responsible for declaring variables of those types. Let us start with arrays:

class AST::ArrayDeclaration is ASTNode {
    has Str $.variable-name;
    has ASTNode @.elements;
}

Now, let us change the action. Currently, array declarations immediately create a slot in the %!var storage:

multi method array-declaration($/ where !$<value>) {
    %!var{$<variable-name>} = Array.new;
}

When we are working with the AST tree, all such actions must be postponed until the compiler traverses the complete tree. For now, the only action is to pass an AST::ArrayDeclaration node further:

multi method array-declaration($/ where !$) {
    $/.make(AST::ArrayDeclaration.new(
        variable-name => ~$,
        elements => []
    ));
}

The new method just adds an AST node to the tree.

multi method variable-declaration($/ where $<array-declaration>) {
    $/.make($<array-declaration>.made);
}

multi method array-declaration($/ where !$<value>) {                
    $/.make(AST::ArrayDeclaration.new(
        variable-name => ~$<variable-name>,
        elements => []));
}

Hashes support requires similar code; the obvious difference is the aggregate type of the elements attribute.

multi method variable-declaration($/
    where $<hash-declaration>) {
    $/.make($<hash-declaration>.made);
}

multi method hash-declaration($/ where !$<string>) {
    $/.make(AST::HashDeclaration.new(
        variable-name => ~$<variable-name>,
        elements => {}));
}

The AST node definition is shown below:

class AST::HashDeclaration is ASTNode {
    has Str $.variable-name;
    has ASTNode %.elements;
}

After this change, you may notice three lookalike methods:

multi method variable-declaration($/ where $<scalar-declaration>) {
    $/.make($<scalar-declaration>.made);
}

multi method variable-declaration($/ where $<array-declaration>) {
    $/.make($<array-declaration>.made);
}

multi method variable-declaration($/ where $<hash-declaration>) {
    $/.make($<hash-declaration>.made);
}

All three methods can be re-written if you add an alias in the grammar rule:

rule variable-declaration {
    'my' [
        | <declaration=array-declaration>
        | <declaration=hash-declaration>
        | <declaration=scalar-declaration>
    ]
}

The match variable contains two entries now: one for the original rule name, the second for the alias:

｢my array[]｣
 declaration => ｢array[]｣
  variable-name => ｢array｣
 array-declaration => ｢array[]｣
  variable-name => ｢array｣

Having that, the three actions can be merged to a simple single method that does the job for all the branches of the rule.

multi method variable-declaration($/) {
    $/.make($<declaration>.made);
}

Scalar assignment

As you can see, when building the AST, we have to work with all the rules we’ve built in the previous chapters. Actually, we will need another round in the next chapter, when we will create the code to execute the actions required by different AST nodes.

For now, let us implement scalar assignments. The AST node for it should contain the name of the variable and something that will be assigned. Let’s call it $.rhs, which stands for right-hand side. This object is definitely an ASTNode and let’s not restrict what it hosts exactly. It can be a scalar value or an expression that has to be evaluated first.

class AST::ScalarAssignment is ASTNode {
    has Str $.variable-name;
    has ASTNode $.rhs;
}

Unlike the declaration rules, there is only one assignment rule, and we are using its parts to dispatch the action methods. Here is what needs to be changed to build an assignment node for scalars:

multi method assignment($/ where !$<index>) {
    . . .
    else {
        $/.make(AST::ScalarAssignment.new(
            variable-name => ~$<variable-name>,
            rhs => $<value>[0].made));
    }
}

To unify the way nodes are passed to the statement, we can do the same trick as in the previous section and add an alias to the rule:

rule statement {
    | <statement=variable-declaration>
    | <statement=assignment>
    | <statement=function-call>
}

Now, a single action method can handle all three branches:

method statement($/) {
    $/.make($<statement>.made);
}

Working with indices

It is important to realise that when we are building the AST, we are not going to evaluate it immediately. It means, for example, that we should not worry about the real content of a variable. Our task is just to put its name in the node’s attribute. The same counts for accessing elements of arrays and hashes. In AST, we only save the name of the variable and the index of an element or the key if that variable is a hash.

Let us create classes for the assignments to array and hash items.

class AST::ArrayItemAssignment is ASTNode {
    has Str $.variable-name;
    has Int $.index;
    has ASTNode $.rhs;
}

class AST::HashItemAssignment is ASTNode {
    has Str $.variable-name;
    has Str $.key;
    has ASTNode $.rhs;
}

An index of an array is stored in an integer $.index field, and the key for a hash is kept in the string $.key. In the rest, the nodes share the same structure.

Emitting AST nodes is also a job that looks similar for both arrays and hashes:

multi method assignment($/ 
    where $<index> && $<index><array-index>) {
    $/.make(AST::ArrayItemAssignment.new(
        variable-name => ~$<variable-name>,
        index => $<index>.made,
        rhs => $<value>[0].made
    ));
}

multi method assignment($/ 
    where $<index> && $<index><hash-index>) {
    $/.make(AST::HashItemAssignment.new(
        variable-name => ~$<variable-name>,
        key => $<index>.made.value,
        rhs => $<value>[0].made
    ));
}

Note that using the AST nodes already let us making an assignment quite versatile. For example, our code supports both numbers and strings in the right-hand side of the equation. Take the following Lingua example:

my a[];
a[2] = 3;

my b{};
b{"x"} = "y";

Its generated AST tree is shown in the next diagram together with the lines of code that the nodes reflect.

Array and hash assignment and initialisation

We skipped the code for initialising arrays and hashes and did it for a reason. Let us first make the AST for the cases when you declare and assign in two different statements, as shown in the next example:

my d[];
d = 3, 5, "7";

my e{};
e = "1": 2, "3": "4";

Array and hash assignment differs from assigning to individual elements, so these actions require their own types of AST nodes:

class AST::ArrayAssignment is ASTNode {
    has Str $.variable-name;
    has ASTNode @.elements;
}

class AST::HashAssignment is ASTNode {
    has Str $.variable-name;
    has Str @.keys;
    has ASTNode @.values;
}

For arrays, we collect the elements in the array consisting of some AST nodes. For hashes, the keys and the values are kept in separate arrays, and the keys are forced to be strings. Alternatively, we could create Pair objects (Pair is Raku’s built-in type) to combine the key—value pairs.

In the actions, we have to carefully separate assignments to scalars, arrays and hashes. Multi-methods work quite well here, while it may require a lot of conditions in the where clause. Another approach would be to split the assignment rule into three, one per the data type. Anyway, here’s our current implementation of the action methods:

multi method assignment($/ where !$<index> && 
                                  !$<value>) {
    $/.make(AST::ScalarAssignment.new(
        variable-name => ~$<variable-name>,
        rhs => $<value>[0].made
    ));
}

multi method assignment($/ where !$<index> && 
                                   $<value> &&
                                  !$<string>) {
    $/.make(AST::ArrayAssignment.new(
        variable-name => ~$<variable-name>,
        elements => $<value>.map: *.made
    ));
}

multi method assignment($/ where !$<index> && 
                                   $<value> &&
                                   $<string>) {
    $/.make(AST::HashAssignment.new(
        variable-name => ~$<variable-name>,
        keys => ($<string>.map: *.made.value),
        values => ($<value>.map: *.made)
    ));
}

Examine this code carefully: you should understand how methods’ signatures help to select the right method for each given situation. In this code, a lot of map methods are used too, they are always concise and mostly clear but are not always easy to write.

The next step is to complete the assignment with initialisation actions, and we already have the code for them. Extract it to separate functions (as we did it earlier in the interpreter), and here is the final result.

Array-related methods:

sub init-array($variable-name, $elements) {
    return AST::ArrayAssignment.new(
        variable-name => $variable-name,
        elements => $elements.map: *.made
    );
}

multi method array-declaration($/ where $<value>) {
    $/.make(init-array(~$<variable-name>, $<value>));
}

multi method assignment($/ where !$<index> && $<value> && !$<string>) {
    $/.make(init-array(~$<variable-name>, $<value>));
}

The code looks transparent and self-explanatory again. Repeat the same for the code for hashes:

sub init-hash($variable-name, $keys, $values) {
    return AST::HashAssignment.new(
        variable-name => $variable-name,
        keys => ($keys.map: *.made.value),
        values => ($values.map: *.made)
    );
}

multi method hash-declaration($/ where $<string> && $<value>) {
    $/.make(init-hash(~$<variable-name>, 
                  $<string>, $<value>));
}

multi method assignment($/ where !$<index> && $<value> && $<string>) {
    $/.make(init-hash(~$<variable-name>, 
                      $<string>, $<value>));
}

Many of the grammar rules are covered now. Let us continue with filling the gaps in the new architecture of the compiler and implement the AST elements for expressions and function calls. Our goal is to either have action methods that just pass the made value to the next level or those that put one of the AST::* objects in the made attribute.

AST for expressions

Consider a simple program:

my a;
a = 2 + 3;

The first line is a declaration of a variable, the second line is an assignment. There is one more action here: an addition, which has to be represented by an AST node too. Let us build the following tree:

As you may notice, the new node is called AST::MathOperations and it contains two arrays to keep the operators and the operands.

In the simplest example, there is only one operator and two operands, but the same object is also suitable for longer sequences, say 2 + 3 - 4, where all operators have the same precedence level. This solution helps to reduce the number of AST nodes and also simplifies the next stage of processing the tree.

So, here’s the class:

class AST::MathOperations is ASTNode {
    has Str @.operators;
    has ASTNode @.operands;
}

The operators are stored as strings; the operands are other AST nodes. Embedding AST generation to the current LinguaActions class has more deletions than insertions. We have to remove all the operation subroutines. They do actual addition, multiplication, etc., and we do not need that in the AST. All those real actions will be done on a later stage of traversing the tree.

The philosophy of having a few multi-variants of the expr action, on one side, makes the implementation more complicated, as you have to think how the AST node will be propagating to the top. On the other side, you can split it into even smaller actions and decide whether you have to emit an AST::MathOperations or an AST::NumberValue node depending on the presence of the $<op> key in the match object. In other words, if there are operators in the expression, use AST::MathOperations, otherwise, an AST::NumberValue node with the number stored it its attribute.

multi method expr($/ where $<expr> && !$<op>) {
    $/.make($<expr>[0].made);
}

multi method expr($/ where $<expr> && $<op>) {
    $/.make(AST::MathOperations.new(
        operators => ($<op>.map: ~*),
        operands => ($<expr>.map: *.made)
    ));
}

multi method expr($/ where $<expression>) {
    $/.make($<expression>.made);
}

Our implementation is not a typical textbook AST tree for arithmetical expressions. Let us examine a simple expression with operators of different precedence:

my x = 2 + 3 * 4 / 5 - 6;

For such an example, you would most likely meet a tree where each node represents a single arithmetical operation with two operands, such as shown on the left side in the diagram below.

The tree for our compiler is shown on the right. Each node can take any number of operands, and the tree becomes much more compact: in this example, there are two times fewer nodes comparing to the standard version. If the node has three operands, it should always have two operators, which you will later put between the operands.

After the tree has been built, the $top variable in the TOP method contains the following data structure:

AST::TOP $top = AST::TOP.new(
  statements => Array[ASTNode].new(
    AST::ScalarDeclaration.new(
      variable-name => "x",
      value => AST::MathOperations.new(
        operators => Array[Str].new(
          "+",
          "-"
        ),
        operands => Array[ASTNode].new(
          AST::NumberValue.new(
            value => 2
          ),
          AST::MathOperations.new(
            operators => Array[Str].new(
              "*",
              "/"
            ),
            operands => Array[ASTNode].new(
              AST::NumberValue.new(
                value => 3
              ),
              AST::NumberValue.new(
                value => 4
              ),
              AST::NumberValue.new(
                value => 5
              )
            )
          ),
          AST::NumberValue.new(
            value => 6
          )
        )
      )
    )
  )
)

Function calls

Function calls are also significantly simpler in the form of AST, as the node carries only the name of the function and an attribute node containing the argument of the function:

class AST::FunctionCall is ASTNode {
    has Str $.function-name;
    has ASTNode $.value;
}

The new action method does not do any real printing or string formatting:

method function-call($/) {
    $/.make(AST::FunctionCall.new(
        function-name => ~$<function-name>,
        value => $<value>.made
    ));
}

Final tuning

At the end of the chapter, let us make some changes that give us a simpler AST. It was not that important at the early stages, but now we can see a real parse tree. Let us add a shortcut for variable’s name in the value rule. Currently, the variable name should bubble up from the deepest expr(4) rule. We can let the parser see it immediately:

rule value {
    | <variable-name>
    | <expression>
    | <string>
}

The action, thus, immediately creates an AST::Variable node:

multi method value($/ where $<variable-name>) {
    $/.make(AST::Variable.new(
        variable-name => ~$<variable-name>
    ));
}

It is a good idea to extend the variable name with an optional index to allow arrays or hashes elements in the values:

rule value {
    | <variable-name> <index>?
    | <expression>
    | <string>
}

We also need to define new AST nodes for such types of data:

class AST::ArrayItem is ASTNode {
    has Str $.variable-name;
    has ASTNode $.index;
}

class AST::HashItem is ASTNode {
    has Str $.variable-name;
    has ASTNode $.key;
}

The reason we are using the ASTNode type for storing array indices and hash keys is that we expect them to be not only integers or strings, but, for example, other variables:

my c{} = "a": 1, "b": 2;
my s = "a";
say c{s};

Modify the existing action and add a couple of more:

multi method value($/ where $<variable-name> && !$<index>) {
    $/.make(AST::Variable.new(
        variable-name => ~$<variable-name>
    ));
}

multi method value($/ where $<variable-name> && $<index> && 
                            $<index><array-index>) {
    $/.make(AST::ArrayItem.new(
        variable-name => ~$<variable-name>,
        index => $<index>.made
    ));
}

multi method value($/ where $<variable-name> && $<index> && 
                            $<index><hash-index>) {
    $/.make(AST::HashItem.new(
        variable-name => ~$<variable-name>,
        key => $<index>.made 
    ));
}

The indices themselves have to be converted to AST nodes. For each kind of the index there are two multi-methods depending on whether we see a variable or not:

multi method array-index($/ where !$<variable-name>) {
    $/.make(AST::NumberValue.new(
        value => +$<integer>
    ));
}

multi method array-index($/ where $<variable-name>) {
    $/.make(AST::Variable.new(
        variable-name => ~$<variable-name>
    ));
}

multi method hash-index($/ where !$<variable-name>) {
    $/.make($<string>.made);
}

multi method hash-index($/ where $<variable-name>) {
    $/.make(AST::Variable.new(
        variable-name => ~$<variable-name>
    ));
}

Another place where we have to worry about variables is a part that is still managed by the old action:

multi method expr($/ where $<variable-name> && $<index>) {
    if %!var{$<variable-name>} ~~ Array {
        $/.make(%!var{$<variable-name>}[$<index>.made]);
    }
    elsif %!var{$<variable-name>} ~~ Hash {
        $/.make(%!var{$<variable-name>}{$<index>.made});
    }
    else {
        $/.make(%!var{$<variable-name>}.substr($<index>.made, 1));
    }
}

This code checks the type of the variable at hand and decides how to work with it. As we already did before in this chapter, let’s remove all this logic and replace it with one of the AST nodes depending not on the type of the variable but on the grammatical information that is available at this point.

multi method expr($/ where $<variable-name> && !$<index>) {
    $/.make(AST::Variable.new(
        variable-name => ~$<variable-name>
    ));
}

multi method expr($/ where $<variable-name> && $<index> && 
                           $<index><array-index>) {
    $/.make(AST::ArrayItem.new(
        variable-name => ~$<variable-name>,
        index => $<index>.made
    ));
}

multi method expr($/ where $<variable-name> && $<index> && 
                           $<index><hash-index>) {
    $/.make(AST::HashItem.new(
        variable-name => ~$<variable-name>,
        key => $<index>.made.value
    ));
}

As a home exercise, try modifying the grammar to eliminate the need of having variable shortcuts (you will also want to join the above code with the value methods we’ve seen in this section.

Finally, let us correct the actions to be ready for the following assignments:

my a;
my b[];

a = 1;
b = 1, 2, 3;

Until we do not know the type of the variable, the only way to decide if the variable in the assignment is an array or a scalar is to count the elements on the right side of the equals sign. Here is how we can do that:

multi method assignment($/ where !$<index> && $<value> && !$<string>) {
    if $<value>.elems == 1 {
        $/.make(AST::ScalarAssignment.new(
            variable-name => ~$<variable-name>,
            rhs => $<value>[0].made
        ));
    }
    else {
        $/.make(AST::ArrayAssignment.new(
            variable-name => ~$<variable-name>,
            elements => ($<value>.map: *.made)
        ));
    }
}

And that’s all for now. In this chapter, which is the biggest in the book, we came a long way to transform the compiler from an interpreter to a machine building an abstract syntax tree. As you have just seen, we removed some code and mostly had to rely on syntax only. The resulting tree, nevertheless, is a good representation of the program flow, which we will efficiently be using very soon.

Next: Chapter 9. Evaluating AST