Chapter 4. A Better Interpreter

This is a chapter from
Creating a compiler with Raku

This chapter is dedicated to Jeff Goff,
who was one of the most active promoters of Raku grammars

The goal of this chapter is to build a better interpreter using everything we achieved in the previous three chapters. The new interpreter will be able to work with numbers of different types and to perform arithmetic operations with variables. To make it even better, we’ll start with a very useful addition: comments.

Skipping comments

Comments are a must-have for any programming language, so let’s extend the Lingua grammar to allow comments for humans in our programs.

First, we will implement one-line comments that start with a hash (#) character and continue to the end of the line, as shown in the next example:

# Declare a variable
my alpha;
alpha = 100; # Assign a value

Our grammar dictates that a program is a set of statements separated by a semicolon.

rule TOP {
    <statement>* %% ';'
}

A statement is one of the following: variable declaration, assignment, or a function call.

rule statement {
    | <variable-declaration>
    | <assignment>
    | <function-call>
}

How can we add rules for comments here? A comment itself can be represented by a rule that matches a hash character followed by any number of non-newline characters:

rule comment {
    '#' \N*
}

At first, you may think that comments can be added to the grammar in a simple manner:

rule statement {
    | <comment>
    | <variable-declaration>
    | <assignment>
    | <function-call>
}

Unfortunately, that does not work. The comment rule requires that a comment runs until the end of the line, while the TOP rule wants a semicolon after every statement. A possible solution is to admit that the program is not only a list of statements.

rule TOP {
    [
        | <comment>
        | <statement> ';'
    ]*
}

Now the program consists of comments and statements, and each statement ends with a semicolon.
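To see the new TOP rule at work, here is a minimal sketch. Only TOP and comment are taken from the text above; the CommentDemo name and the reduced statement and variable-name rules are stand-ins invented for this illustration, not the full Lingua grammar.

```raku
# Hypothetical cut-down grammar: TOP and comment follow the chapter,
# statement and variable-name are simplified stand-ins.
grammar CommentDemo {
    rule TOP {
        [
            | <comment>
            | <statement> ';'
        ]*
    }

    rule comment {
        '#' \N*
    }

    rule statement {
        'my' <variable-name>
    }

    token variable-name {
        <[a..z]>+
    }
}

my $code = "# Declare a variable\nmy alpha;\nmy beta; # trailing comment\n";

my $result = CommentDemo.parse($code);
say $result ?? 'OK' !! 'Error';    # OK
say $result<comment>.elems;        # 2
say $result<statement>.elems;      # 2
```

Both the comment on its own line and the trailing one are picked up by the same alternative in TOP, because the whitespace (including the newline) after each statement and comment is consumed implicitly by the rules.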

After this change, a semicolon after the last statement of the program becomes mandatory. This is a tricky point, so let us spend some time to understand it properly.

Take a simple program with three statements:

my a;
a = 10;
say a

There’s no semicolon at the end of the program, but if you run it, you will still see 10 in the output, as if the last statement had also been executed.

In fact, the grammar could not parse the whole input. You can verify that by looking at the return value of the Lingua.parse method, which is Nil for this program. So the grammar did not confirm the validity of the program, but the actions of the rules that did match were still executed during parsing. We can avoid this behaviour by switching to real AST generation, which we’ll do in the following chapters. Meanwhile, let us update the interpreter so that it reports the status of parsing:

my $result = Lingua.parse($code, :actions(LinguaActions));
say $result ?? 'OK' !! 'Error';
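The following sketch reproduces the effect on a tiny scale. The StatusDemo grammar, the StatusActions class and the @executed array are hypothetical names used only here; the point is that the action of a matched statement runs even though the parse as a whole fails.

```raku
my @executed;   # records the statements whose action was called

# Hypothetical miniature grammar: every statement must end with ';'.
grammar StatusDemo {
    rule TOP { [ <statement> ';' ]* }
    rule statement { 'say' <number> }
    token number { \d+ }
}

class StatusActions {
    method statement($/) { @executed.push(~$<number>) }
}

# No semicolon after the last statement: the parse fails as a whole...
my $incomplete = StatusDemo.parse('say 1; say 2', :actions(StatusActions));
say $incomplete ?? 'OK' !! 'Error';   # Error
# ...but both statement actions were already triggered.
say @executed;                        # [1 2]

@executed = ();
my $complete = StatusDemo.parse('say 1; say 2;', :actions(StatusActions));
say $complete ?? 'OK' !! 'Error';     # OK
```

Checking the return value of parse is therefore the only reliable sign that the whole program was accepted.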

The second type of comment that we want to allow is the comment enclosed between the character sequences /* and */. Such comments can occupy part of a line or span several lines, and they may appear at any place in the code where whitespace is allowed. For example:

my /* inline comment */ a;
# one-line comment
a = 10;

/* multi-line
   comment */
say a;

The task seems difficult, as you cannot control where the user puts their comments. As just mentioned, a comment is allowed at any place where whitespace is allowed. Raku already handles whitespace for us, so can we ask it to skip the comments too?

Any rule in a grammar implicitly contains a regex for matching whitespaces. For example, take the assignment rule:

rule assignment {
    <variable-name> '=' <value>
}

The rule can be replaced with a token with a couple of embedded ws regexes:

token assignment {
    <variable-name> <ws> '=' <ws> <value>
}

Without <ws>, the token would not accept spaces around the equals sign, so you would have to write:

a=10;
b=20;

Adding optional whitespace lets us write more human-oriented programs:

a = 10;
b = 20;
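The difference is easy to check with three small grammars. This is only a sketch: the names Strict, Relaxed and AsRule, as well as the simplified variable-name and value tokens, are made up for the comparison.

```raku
# Hypothetical grammars that differ only in how TOP treats whitespace.
grammar Strict {
    token TOP { <variable-name> '=' <value> }
    token variable-name { <[a..z]>+ }
    token value { \d+ }
}

# A token with explicit <ws> calls.
grammar Relaxed is Strict {
    token TOP { <variable-name> <ws> '=' <ws> <value> }
}

# A rule inserts the whitespace matching implicitly.
grammar AsRule is Strict {
    rule TOP { <variable-name> '=' <value> }
}

say so Strict.parse('a=10');      # True
say so Strict.parse('a = 10');    # False
say so Relaxed.parse('a = 10');   # True
say so Relaxed.parse('a=10');     # True
say so AsRule.parse('a = 10');    # True
```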

It is possible to redefine the ws regex. By default, it matches any number (including none) of whitespace characters, as long as the current position is not inside a word:

regex ws {
    <!ww> \s*
}

This is a perfect place to define the /* */ comments. The regex has to allow both whitespaces and any text sequences between the comment delimiters:

regex ws {
    <!ww> [
        | \s*
        | \s* '/*' \s* .*? \s* '*/' \s*
    ]
}

It looks ugly because of all the slashes and stars, but it does the job as expected. Notice that you have to allow whitespace both before and after the /* and */ literals. Also notice that this grammar method is a regex, not a rule or a token: the frugal .*? quantifier needs backtracking to find the closing */.
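Here is a small sketch that plugs the redefined ws into a reduced grammar. The ws regex is the one from the text; the WsDemo name and the simplified TOP, statement and variable-name rules are assumptions made for the demonstration.

```raku
# Hypothetical reduced grammar; only ws is taken from the chapter.
grammar WsDemo {
    rule TOP { [ <statement> ';' ]* }
    rule statement { 'my' <variable-name> }
    token variable-name { <[a..z]>+ }

    # Wherever whitespace is allowed, a /* ... */ comment is allowed too.
    regex ws {
        <!ww> [
            | \s*
            | \s* '/*' \s* .*? \s* '*/' \s*
        ]
    }
}

my $code = "my /* inline comment */ a;\n/* multi-line\n   comment */\nmy b;";

my $result = WsDemo.parse($code);
say $result ?? 'OK' !! 'Error';
say ~$result<statement>[0]<variable-name>;
say ~$result<statement>[1]<variable-name>;
```

If everything goes as described above, the program is accepted and the two variable names come out as a and b, with both comments silently skipped as whitespace.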

Sophisticated numbers

The interpreter works with numbers, which we can already parse really well. Let us join the two grammars and let the program work with all kinds of numbers: positive and negative, integer and floating-point, and numbers in scientific notation.

Take the body of the Number grammar from Chapter 2 and copy it to the grammar of Lingua. The TOP rule of Number should become the value rule in Lingua.

token value {
    <sign>? [
        | <integer>
        | <floating-point>
    ] <exponent>?
}

Also do not forget to update the actions class:

method value($/) {
    $/.make(+$/);
}

Try the following program now:

my alpha;
my beta;
my gamma;

alpha = 3.14;
beta = 42;
gamma = -4.5E-2;

say alpha;
say beta;
say gamma;

It should parse the numbers and print them all. This step is fully completed now.
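As a quick check of the value token in isolation, consider the sketch below. The value token and its action are the ones shown above, while the sign, integer, floating-point and exponent tokens are simplified guesses standing in for the Number grammar of Chapter 2, and NumberDemo and NumberActions are hypothetical names.

```raku
grammar NumberDemo {
    token TOP { <value> }

    token value {
        <sign>? [
            | <integer>
            | <floating-point>
        ] <exponent>?
    }

    # Simplified stand-ins for the tokens defined in Chapter 2.
    token sign           { <[+-]> }
    token integer        { \d+ }
    token floating-point { \d* '.' \d+ }
    token exponent       { <[eE]> <sign>? \d+ }
}

class NumberActions {
    method TOP($/)   { $/.make($<value>.made) }
    # Numify the matched text and attach it to the match object.
    method value($/) { $/.make(+$/) }
}

for '3.14', '42', '-4.5E-2' -> $text {
    my $number = NumberDemo.parse($text, :actions(NumberActions)).made;
    say "$text => $number";
}
```

The prefix + operator converts the matched string to a number, so the action does not need to inspect the sign, the fraction or the exponent separately.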

Sophisticated expressions

The next step we can easily take is to merge the Calculator grammar into the language definition. Let’s rename the existing value token (and its action method) to number, so that the language grammar can use the value rule that was defined in Calculator.

grammar Lingua {
    . . .

    rule value {
        | <number>
        | '(' <expression> ')'
    }

    rule expression {
        <term>* %% <op1>
    }

    token number {
        <sign>? [
            | <integer>
            | <floating-point>
        ] <exponent>?
    }

    . . .
}

The expression rule here is the former TOP rule of the Calculator grammar. Update the actions class symmetrically:

class LinguaActions {
    . . .

    method value($/) {
        $/.make($<number> ??
            $<number>.made !! $<expression>.made);
    }

    method expression($/) {
        $/.make(process($<term>, $<op1>));
    }

    method number($/) {
        $/.make(+$/);
    }
}

After this, any number in the program can be replaced by an expression of arbitrary complexity. The only place in our current grammar where a number is used is the assignment, so we can replace its right-hand side with an expression.

In the grammar:

rule assignment {
    <variable-name> '=' <expression>
}

In the actions class:

method assignment($/) {
    %var{~$<variable-name>} = $<expression>.made;
}

method value($/) {
    $/.make($<number> ??
        $<number>.made !! $<expression>.made);
}

Change the test program to include a few expressions there. Here is my example:

my pi;
pi = 22/7 - 0.001265; # very rough approximation
say pi;

my x;
x = 2 * (3 + 4);
say x; # prints 14

Don’t forget that we can also use comments in the code!

Using variables

Our expressions can use only numbers so far. It would be much handier if we could use variables there too. To achieve that, a small change to the grammar is needed. In terms of grammar rules, to have a variable in an expression means to allow variable names in the value rule:

rule value {
    | <number>
    | <variable-name>
    | '(' <expression> ')'
}

There is no made value associated with the variable-name token, so we have to do a little work to fetch the value from the storage:

method value($/) {
    if $<number> {
        $/.make($<number>.made);
    }
    elsif $<variable-name> {
        $/.make(%var{~$<variable-name>});
    }
    else {
        $/.make($<expression>.made);
    }
}

The value method consists of three branches, each of which finds the appropriate value to pass on. In the case of a variable, this is a hash lookup.
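The three branches can be tried out with the following sketch. The value rule and its action follow the text; everything else is an assumption made to keep the example short: ValueDemo and ValueActions are hypothetical names, the %var hash is pre-filled by hand, and the expression token is a stand-in that only knows addition instead of the full Calculator rules.

```raku
my %var = 'pi' => 3.14, 'r' => 10;   # hand-made variable storage

grammar ValueDemo {
    rule TOP { <expression> }

    # Stand-in for the chapter's expression rule: sums only.
    token expression { <value> [ <.ws> '+' <.ws> <value> ]* }

    rule value {
        | <number>
        | <variable-name>
        | '(' <expression> ')'
    }

    token number { \d+ [ '.' \d+ ]? }
    token variable-name { <[a..z]>+ }
}

class ValueActions {
    method TOP($/)        { $/.make($<expression>.made) }
    method expression($/) { $/.make([+] $<value>.map(*.made)) }
    method number($/)     { $/.make(+$/) }

    method value($/) {
        if $<number> {
            $/.make($<number>.made);
        }
        elsif $<variable-name> {
            # Variables have no made value; look them up in the storage.
            $/.make(%var{~$<variable-name>});
        }
        else {
            $/.make($<expression>.made);
        }
    }
}

say ValueDemo.parse('r + (pi + 1)', :actions(ValueActions)).made;   # 14.14
```

The input exercises all three branches: r and pi go through the hash lookup, 1 is a plain number, and the parentheses lead to a nested expression.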

Having all that, the interpreter can now process the following program and compute the length of Earth’s equator (assuming we already assigned the value to pi earlier):

my r;
r = 6371; # km

my d;
d = 2 * pi * r;
say d;

Declaration with initialization

I hope you noticed how easy it was to make the recent changes compared to the material in the first chapters. This section brings another example of this kind.

Let us simplify the creation of variables and allow declarations with an optional assignment. So instead of

my x;
x = 10;

we can express both steps in one line:

my x = 10;

A variable declaration is handled by the variable-declaration rule, so let’s update it with an optional assignment clause:

rule variable-declaration {
    'my' <variable-name> [ '=' <expression> ]?
}

In the action, check whether there was an expression and use its value; otherwise, initialize the variable with 0:

method variable-declaration($/) {
    %var{$<variable-name>} =
        $<expression> ?? $<expression>.made !! 0;
}
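A compact sketch shows both forms of the declaration side by side. The variable-declaration rule and its action are the ones above (with the key explicitly stringified); DeclDemo, DeclActions and the integer-only expression token are hypothetical simplifications.

```raku
my %var;   # variable storage

grammar DeclDemo {
    rule TOP { [ <variable-declaration> ';' ]* }

    rule variable-declaration {
        'my' <variable-name> [ '=' <expression> ]?
    }

    token variable-name { <[a..z]>+ }

    # Stand-in: here an expression is just an integer.
    token expression { \d+ }
}

class DeclActions {
    method expression($/) { $/.make(+$/) }

    method variable-declaration($/) {
        # Use the initial value if it is given, and 0 otherwise.
        %var{~$<variable-name>} =
            $<expression> ?? $<expression>.made !! 0;
    }
}

DeclDemo.parse('my x = 10; my y;', :actions(DeclActions));
say %var<x>;   # 10
say %var<y>;   # 0
```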

Rewrite the test program using these additions:

my pi = 3.1415926;
my r = 6371; # km

my d = 2 * pi * r;
say d;

The program prints the result, and we have fewer than 200 lines of code (including empty lines and lines with only a single curly brace). Raku grammars are really great!

Next: Chapter 5. Working on Grammar