Chapter 7. Arrays and Hashes

This is a chapter from Creating a compiler with Raku.

In this chapter, we will extend the Lingua language with aggregate data types: arrays and hashes. From this point, we will call variables that contain numbers and strings scalar variables.

Arrays

Arrays are collections of elements that share a common variable name and are accessible via integer indices. Let us introduce the following syntax for array declaration:

my data[];

It uses the same my keyword as the declaration of scalar variables (which can keep numbers or strings) and adds a pair of square brackets after the name. The variable-declaration grammar rule can now be split into two parts, one for arrays and one for scalars:

rule variable-declaration {
    'my' [
         | <array-declaration>
         | <scalar-declaration>
    ]
}

Arrays go first here, as their definition contains extra characters after the variable name and can be caught earlier.

Alternatively, we could introduce a new keyword, say arr, to define arrays instead of my, and simplify parsing at this point: arr data. But let us stay with the my keyword, my data[], as it reduces the number of reserved keywords and has its own advantages when we come to initialisation.

The previous rule for scalar variable declaration migrates to a separate rule:

rule array-declaration {
    <variable-name> '[' ']'
}

rule scalar-declaration {
    <variable-name> [ '=' <value> ]?
}

The new array-declaration rule requires a pair of square brackets and does not yet include an initialiser part.

In the actions, we also have to distinguish between arrays and scalars, and we can do it by checking the presence of the $<array-declaration> match object.

method variable-declaration($/) {
    if $<scalar-declaration> {
        %!var{$<scalar-declaration><variable-name>} =
            $<scalar-declaration><value> ??
            $<scalar-declaration><value>.made !! 0;
    }
    elsif $<array-declaration> {
        %!var{$<array-declaration><variable-name>} = $[];
    }
}

It works, but the method looks overloaded because of the nested match object keys. There is no need for that, because an individual action can be created for each case.

method scalar-declaration($/) {
    %!var{$<variable-name>} = $<value> ?? $<value>.made !! 0;
}

method array-declaration($/) {
    %!var{$<variable-name>} = $[];
}

With this change, the variable-declaration method is not needed anymore and can be removed from the LinguaActions class.

You can temporarily replace it with the following code just to see how the parser works with arrays:

method variable-declaration($/) {
    dd %!var;
}

The method displays what the variable storage contains after processing each of the variable declarations. Let us test this in action:

my x = 3;
say x;

my data[];

This program successfully compiles, and you can see how the %!var hash changes:

Hash %!var = {:x(3)}
3
Hash %!var = {:data($[]), :x(3)}
OK
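To try the declaration rules and the two actions outside the full Lingua compiler, the following self-contained sketch may help. The names DeclSketch and DeclSketchActions, the simplified variable-name and value tokens, and the public %.var accessor are assumptions made for illustration; the real Lingua grammar defines these pieces differently.

```raku
# A minimal sketch, not the Lingua grammar itself.
grammar DeclSketch {
    rule TOP { [ <variable-declaration> ';' ]* }

    rule variable-declaration {
        'my' [
            | <array-declaration>
            | <scalar-declaration>
        ]
    }

    rule array-declaration  { <variable-name> '[' ']' }
    rule scalar-declaration { <variable-name> [ '=' <value> ]? }

    # Simplified tokens (assumptions): lowercase names, integer values.
    token variable-name { <[a..z]>+ }
    token value         { \d+ }
}

class DeclSketchActions {
    has %.var;   # variable storage, readable from outside for the demo

    method scalar-declaration($/) {
        %!var{~$<variable-name>} = $<value> ?? +$<value> !! 0;
    }

    method array-declaration($/) {
        %!var{~$<variable-name>} = $[];
    }
}

my $actions = DeclSketchActions.new;
my $match   = DeclSketch.parse('my x = 3; my data[];', :$actions);

say $actions.var<x>;            # 3
say $actions.var<data>.^name;   # Array
```

Each action fires only when its own rule matches, so neither method has to look into nested keys of the match object.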

Assigning to an array item

OK, we can create an array and it’s time to fill its elements with some data:

data[0] = 10;
data[1] = 20;

The assignment rule can be updated similarly to how we did it with string indexing in the previous chapter by adding an optional integer index in square brackets:

rule assignment {
    <variable-name> [ '[' <integer> ']' ]? '=' <value>
}

In the corresponding action, the presence of the index indicates that we are working with an array; otherwise, it is a scalar variable.

method assignment($/) {
    if $<integer> {
        %!var{~$<variable-name>}[+$<integer>] =
            $<value>.made;
    }
    else {
        %!var{~$<variable-name>} = $<value>.made;
    }
}

After you run the program with the above assignments, the data variable will keep two values in the storage:

Hash %!var = {:data($[10, 20])}
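The storage behaviour behind this output can be reproduced in plain Raku, without any grammar. The sketch below uses an ordinary %var hash as a stand-in for the %!var attribute (an assumption for illustration) and also shows what happens when an index beyond the current end is assigned.

```raku
my %var;

# Same initial state as after 'my data[];'
%var<data> = $[];

# Same effect as 'data[0] = 10; data[1] = 20;'
%var<data>[0] = 10;
%var<data>[1] = 20;
say %var<data>.elems;        # 2

# Raku arrays grow on demand; skipped positions stay undefined.
%var<data>[4] = 50;
say %var<data>.elems;        # 5
say %var<data>[3].defined;   # False
```

Because Raku extends the array automatically, the assignment action needs no bounds checking to accept data[4] = 50 on a two-element array.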

Side story: The joy of syntax

Before moving on towards more features for arrays and hashes, let us transform the grammar a bit. In the assignment method, the if–else check occupies more lines than the “useful” code. We can do a couple of transformations to make the methods more compact.

First, let us repeat the trick with splitting a rule into two. Instead of one universal assignment rule, we can have two subrules:

rule assignment {
    | <array-item-assignment>
    | <scalar-assignment>
}

rule array-item-assignment {
    <variable-name> [ '[' <integer> ']' ] '=' <value>
}

rule scalar-assignment {
    <variable-name> '=' <value>
}

It made the grammar more verbose, but the actions themselves became clearer:

method array-item-assignment($/) {
    %!var{~$<variable-name>}[+$<integer>] = $<value>.made;
}

method scalar-assignment($/) {
    %!var{~$<variable-name>} = $<value>.made;
}

The second possible solution is to keep the original assignment rule and use the where clause in the methods’ signatures to dispatch the calls depending on the content of the match object.

multi method assignment($/ where $<integer>) {
    %!var{~$<variable-name>}[+$<integer>] = $<value>.made;
}

multi method assignment($/ where !$<integer>) {
    %!var{~$<variable-name>} = $<value>.made;
}

The negative condition !$<integer> in the signature of the second variant of the multi-method is redundant, but I prefer to keep it for clarity of the code.
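The dispatch mechanism itself can be tried in isolation. In the sketch below, a plain hash plays the role of the match object and multi subs replace the multi methods; the describe name and the hash keys are hypothetical and chosen only for this illustration.

```raku
# A hash with an 'index' key stands in for a match with $<integer>.
multi sub describe(%m where { $_<index>:exists }) {
    "item " ~ %m<index> ~ " of " ~ %m<name>
}

# The explicit negative condition mirrors !$<integer> in the text.
multi sub describe(%m where { $_<index>:!exists }) {
    "scalar " ~ %m<name>
}

say describe(%( name => 'data', index => 0 ));   # item 0 of data
say describe(%( name => 'x' ));                  # scalar x
```

The two constraints are mutually exclusive, so exactly one candidate accepts any given argument.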

There are two more actions that can be re-written using the same principle. The new value actions:

multi method value($/ where $<expression>) {
    $/.make($<expression>.made);
}

multi method value($/ where $<string>) {
    $/.make($<string>.made);
}

Another action with a big if–elsif–else condition is expr. Let us transform it too:

multi method expr($/ where $<number>) {
    $/.make($<number>.made);
}

multi method expr($/ where $<string>) {
    $/.make($<string>.made);
}

multi method expr($/ where $<variable-name> && $<integer>) {
    $/.make(%!var{$<variable-name>}.substr(+$<integer>, 1));
}

multi method expr($/ where $<variable-name> && !$<integer>) {
    $/.make(%!var{$<variable-name>});
}

multi method expr($/ where $<expr>) {
    $/.make(process($<expr>, $<op>));
}

multi method expr($/ where $<expression>) {
    $/.make($<expression>.made);
}

These methods look so trivial now. Notice that some of the candidates check more than one key in the match object, for example: $<variable-name> && !$<integer>.

Accessing array elements

The next goal is to start using individual array items, for instance, as shown in the next fragment:

say data[0];
say data[1];

my n = data[0] * data[1];
say n;

Our current actions class supports string indexing already, and that’s the exact place which we have to extend:

multi method expr($/ where $<variable-name> && $<integer>) {
    if %!var{$<variable-name>} ~~ Array {
        $/.make(%!var{$<variable-name>}[+$<integer>]);
    }
    else {
        $/.make(%!var{$<variable-name>}.substr(
            +$<integer>, 1));
    }
}

This method checks the type of the variable stored in the %!var hash, and if it is an array, it returns the requested element. The other branch works with strings as it did before.
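The type check at the heart of this method can be tried separately. The helper name item-at below is hypothetical; it only demonstrates how smartmatching against Array selects between element access and substr.

```raku
sub item-at($container, Int $index) {
    if $container ~~ Array {
        return $container[$index];            # array element
    }
    else {
        return $container.substr($index, 1);  # single character
    }
}

say item-at([10, 20], 1);    # 20
say item-at("hello", 1);     # e
```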

The grammar can be simplified once again by extracting the sequence representing an array (and string) index to a separate rule:

rule index {
    '[' <integer> ']'
}

Use the new rule inside assignment and inside expr when you take the value:

rule assignment {
    <variable-name> <index>? '=' <value>
}

. . .

multi rule expr(4) {
    | <number>
    | <variable-name> <index>?
    | '(' <expression> ')'
}

If you ever want to change the syntax for indexing, say, to data:3 instead of data[3], there’s a single place to do that, the index rule.

The actions must be adapted too. The index’s attribute is an integer value:

method index($/) {
    $/.make(+$<integer>);
}

And thus you should use $<index>.made to read it from other methods:

multi method assignment($/ where $<index>) {
    %!var{~$<variable-name>}[$<index>.made] = $<value>.made;
}

multi method assignment($/ where !$<index>) {
    %!var{~$<variable-name>} = $<value>.made;
}

. . .

multi method expr($/ where $<variable-name> && $<index>) {
    if %!var{$<variable-name>} ~~ Array {
        $/.make(%!var{$<variable-name>}[$<index>.made]);
    }
    else {
        $/.make(%!var{$<variable-name>}.substr(
            $<index>.made, 1));
    }
}

multi method expr($/ where $<variable-name> && !$<index>) {
    $/.make(%!var{$<variable-name>});
}

Once again, redundant conditions such as !$<index> are used in the where clause to make the code more readable; the multi-method can be correctly dispatched without them.

List assignments

So far, arrays can be created but you have to assign their elements one by one. Let us allow list assignment and initialisation:

my data[] = 111, 222, 333;

data = 7, 9, 11;

A new syntax element, comma, appeared here. It does not clash with any other constructs of the language, so it can be easily embedded into the grammar.

rule array-declaration {
    <variable-name> '[' ']' [ '=' <value>+ %% ',' ]?
}

rule assignment {
    <variable-name> <index>? '=' <value>+ %% ','
}

In both cases, the value rule is used, which means you can use numbers, strings, and arithmetical expressions as initialising values for the array elements:

my strings[] = "alpha", "beta", "gamma";
say strings[1]; # beta

my arr[] = 11, 3 * 4, 2 * (6 + 0.5);
say arr[0]; # 11
say arr[1]; # 12
say arr[2]; # 13

To implement it in actions, let’s make a helper method init-array that takes the name of the variable and the list of the values:

method init-array($variable-name, @values) {
    %!var{$variable-name} = $[];
    for @values -> $value {
        %!var{$variable-name}.push($value.made);
    }
}

multi method array-declaration($/ where $<value>) {
    self.init-array($<variable-name>, $<value>);
}

multi method assignment($/ where !$<index>) {
    if %!var{$<variable-name>} ~~ Array {
        self.init-array($<variable-name>, $<value>);
    }
    . . .
}

When creating a new array, you can also type Array.new instead of $[].

Unlike, for example, the set of operator functions, the init-array routine is made a method as it has to have access to the variable storage %!var.
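The %% separator used in the array-declaration and assignment rules deserves a short illustration. Unlike %, it also accepts a trailing separator after the last element. The grammar below is a sketch with hypothetical names (ListSketch, with-trailing, without-trailing, comma) and a simplified integer-only value token; it uses tokens with explicit whitespace handling so that the behaviour does not depend on the implicit whitespace rules.

```raku
grammar ListSketch {
    token with-trailing    { <value>+ %% <comma> }   # trailing comma allowed
    token without-trailing { <value>+ %  <comma> }   # trailing comma rejected
    token comma            { ',' \s* }
    token value            { \d+ }
}

my $list = ListSketch.parse('111, 222, 333', :rule<with-trailing>);
say $list<value>.map(+*).join(' ');   # 111 222 333

say so ListSketch.parse('111, 222,', :rule<with-trailing>);      # True
say so ListSketch.parse('111, 222,', :rule<without-trailing>);   # False
```

With %% in the Lingua rules, a list written with a comma after its last value is therefore accepted as well.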

Printing arrays

Another desirable feature for arrays is a way to print all their elements in a single instruction. Instead of listing separate items, we’d like to pass the whole array to the say function:

my data[] = 5, 7, 9;
say data;

In fact, this already works, because our implementation of say just passes the whole container to Raku’s say, which prints the data like this:

[5 7 9]

Let us be less humble and create our own output format by checking the type of the variable, as we did before:

method function-call($/) {
    my $object = $<value>.made;

    if $object ~~ Array {
        say $object.join(', ');
    }
    else {
        say $object;
    }
}

This method prints the array as a comma-separated list of its items:

5, 7, 9

Hashes

In the remaining part of this chapter, we will implement hashes in our Lingua language. You have seen most of the ideas in the implementation of arrays, so the changes should be transparent and obvious.

So, we have to implement a few things: declaration, declaration with initialisation, assignment to the whole hash and to a single element, reading a single value and printing a hash.

The following fragments demonstrate the syntax we use. To declare a hash, use a pair of curly braces after the name of the variable:

my data{};

Initialisation and assignment use a comma-separated list of key-value pairs. Keys are always strings; values can be any scalar value (a number or a string). The separator between the key and the value is a colon:

my hash{} = "alpha" : 1, "beta": 2, "gamma": 3;

my days{};
days = "Mon": "work", "Sat": "rest";

The grammar includes a separate rule for hash declarations:

rule variable-declaration {
    'my' [
        | <array-declaration>
        | <hash-declaration>
        | <scalar-declaration>
    ]
}

rule hash-declaration {
    <variable-name> '{' '}' [
        '=' [ <string> ':' <value> ]+ %% ','
    ]?
}

The assignment rule should know how to deal with hashes. This time, the changes can be done in-place without creating new rules.

rule assignment {
    <variable-name> <index>? '='
        [
            | [ <string> ':' <value> ]+ %% ','
            |                <value>+   %% ','
        ]
}

In the actions, you have to carefully implement the declaration and hash assignment methods. They both use a common method, init-hash, to set the keys and the values of the hash.

method init-hash($variable-name, @keys, @values) {
    %!var{$variable-name} = Hash.new;
    while @keys {
        %!var{$variable-name}.{@keys.shift.made} =
            @values.shift.made;
    }
}

multi method hash-declaration($/) {
    self.init-hash($<variable-name>, $<string>, $<value>);
}

multi method assignment($/ where !$<index>) {
    . . .
    elsif %!var{$<variable-name>} ~~ Hash {
        self.init-hash($<variable-name>, 
                       $<string>, $<value>);
    }
    . . .
}
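Note that init-hash consumes both lists with shift. If you prefer to leave the inputs untouched, the pairs can be built with the zip metaoperator instead. The following is only a sketch of that alternative, not the version used in the book; the build-hash name is hypothetical, and the function works on plain values, so in the actions the .made values would have to be extracted first.

```raku
sub build-hash(@keys, @values) {
    # Z=> pairs the first key with the first value, and so on.
    my %h = @keys Z=> @values;
    return %h;
}

my @keys   = <alpha beta gamma>;
my @values = 1, 2, 3;

my %h = build-hash(@keys, @values);
say %h<beta>;       # 2
say @keys.elems;    # 3, the input list is not consumed
```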

Another part of hash implementation is allowing value access via the keys. It is wise to re-use the index rule and to make it a collection of two alternatives:

rule index {
    | <array-index>
    | <hash-index>
}

rule array-index {
    '[' <integer> ']'
}

rule hash-index {
    '{' <string> '}'
}

Use the new rules in the already existing methods. The where clauses receive an additional condition to make sure we caught the rule we want.

multi method assignment($/ where $<index> && 
                        $<index><array-index>) {
    %!var{$<variable-name>}[$<index>.made] =
        $<value>[0].made;
}

multi method assignment($/ where $<index> &&
                        $<index><hash-index>) {
    %!var{$<variable-name>}{$<index>.made} =
        $<value>[0].made;
}

multi method assignment($/ where !$<index>) {
    . . .
    elsif %!var{$<variable-name>} ~~ Hash {
        self.init-hash($<variable-name>,
                       $<string>, $<value>);
    }
    . . .
}
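The methods above read $<index>.made, so the index action has to pass along the value made by whichever alternative matched. A sketch of how that could look is shown below, under stated assumptions: the IndexSketch grammar, its simplified integer and string tokens, and the TOP rule exist only for this demonstration and differ from the real Lingua definitions.

```raku
grammar IndexSketch {
    rule TOP { <index> }

    rule index {
        | <array-index>
        | <hash-index>
    }

    rule array-index { '[' <integer> ']' }
    rule hash-index  { '{' <string> '}' }

    # Simplified tokens (assumptions made for this sketch).
    token integer { \d+ }
    token string  { '"' ( <-["]>* ) '"' }
}

class IndexSketchActions {
    method TOP($/) { $/.make($<index>.made) }

    # Forward the attribute of the alternative that matched.
    method index($/) {
        $/.make(($<array-index> // $<hash-index>).made);
    }

    method array-index($/) { $/.make(+$<integer>) }
    method hash-index($/)  { $/.make(~$<string>[0]) }
}

say IndexSketch.parse('[3]',     :actions(IndexSketchActions)).made;   # 3
say IndexSketch.parse('{"Sat"}', :actions(IndexSketchActions)).made;   # Sat
```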

The new index rule can already work in the expr(4) rule, which allows us to read the values by the given hash key using the hash{"key"} syntax. All we need is to update the expr method to let it know about the new data structure:

multi method expr($/ where $<variable-name> && $<index>) {
    . . .
    elsif %!var{$<variable-name>} ~~ Hash {
        $/.make(%!var{$<variable-name>}{$<index>.made});
    }
    . . .
}

Add the following lines to the test program and confirm that it works:

days{"Tue"} = "work";
say days{"Sat"};

Finally, teach the say function to print hashes:

method function-call($/) {
    . . .
    elsif $object ~~ Hash {
        my @str;
        for $object.keys.sort -> $key {
            @str.push("$key: $object{$key}");
        }
        say @str.join(', ');
    }
    . . .
}

If you are good at using the map method, try making a better version of the function. The expected output in response to say days; should be like this:

Mon: work, Sat: rest, Tue: work
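One possible map-based answer to this exercise is sketched below. It is written as a standalone function that returns the formatted string, so it can be tested without the parser; the format-value name is hypothetical, and in the actions class the result would simply be passed to say.

```raku
sub format-value($object) {
    return $object.join(', ') if $object ~~ Array;

    return $object.keys.sort.map(-> $key { "$key: $object{$key}" }).join(', ')
        if $object ~~ Hash;

    return ~$object;   # scalars are printed as they are
}

my %days = Mon => 'work', Sat => 'rest', Tue => 'work';

say format-value([5, 7, 9]);   # 5, 7, 9
say format-value(%days);       # Mon: work, Sat: rest, Tue: work
say format-value(42);          # 42
```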

Review and test

Let us take a look at the current state of the Lingua language. We have done a lot of work and implemented support for numbers, strings, arrays, and hashes. It is possible to change the content of the variables and print their values. Let us take another small step and allow variables to appear in place of array indices or hash keys.

rule array-index {
    '[' [ <integer> | <variable-name> ] ']'
}

rule hash-index {
    '{' [ <string> | <variable-name> ] '}'
}

The corresponding actions can be transformed to pairs of trivial multi-methods:

multi method array-index($/ where !$<variable-name>) {
    $/.make(+$<integer>);
}

multi method array-index($/ where $<variable-name>) {
    $/.make(%!var{$<variable-name>});
}

multi method hash-index($/ where !$<variable-name>) {
    $/.make($<string>.made);
}

multi method hash-index($/ where $<variable-name>) {
    $/.make(%!var{$<variable-name>});
}

Check your files against the repository. If they match, you will be able to run the following test program, which uses most of the features implemented so far.

# Illustrating the Pythagorean theorem
my a = 3;
my b = 4;
my c = 5;
my left = a**2 + b**2;
my right = c**2;
say "The hypotenuse of a rectangle triangle with the 
sides $a and $b is indeed $c, as $left = $right.";

/* Using floating-point numbers for 
computing the length of a circle */
my pi = 3.1415926;
my R = 7;
my c = 2 * pi * R;
say "The length of a circle of radius $R is $c.";

# A list of prime numbers
my n = 5;
my data[] = 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31;
my nth = data[n];
say "$n th prime number is $nth.";

# Demonstrating the use of hashes
my countries{} = 
    "France": "Paris", "Germany": "Berlin", "Italy": "Rome";
my country = "Italy";
my city = countries{country};
say "$city is the capital of $country.";

This program prints the following result.

$ ./lingua test22.lng 
The hypotenuse of a rectangle triangle with the 
sides 3 and 4 is indeed 5, as 25 = 25.
The length of a circle of radius 7 is 43.9822964.
5 th prime number is 13.
Rome is the capital of Italy.
OK

It is quite fascinating to see that the interpreter understands a program that never existed before. You wrote it and you can make lots of changes in it, and the program will still show the results that you expect.

In the next chapters, we will work on a few more complex aspects of the language.

Next: Chapter 8. Building AST. Part 1

2 thoughts on “Chapter 7. Arrays and Hashes”

Tim says:

April 30, 2020 at 7:18 am

Thanks for making this so easy to follow, and showing many of Raku’s features in context. I’m finding it very valuable!

I do see one bit that might be misleading the way it’s written, when you write “Arrays go first here, as their definition contains extra characters after the variable name and can be caught earlier.” The grammar is using “|” (a single pipe) for alternation here, so the order of the alternatives doesn’t determine which one is used. The longest token match rules will pick the right one even if their order is reversed.

If “||” is used in the alternation, then Raku works like most other languages and the alternatives are evaluated in the order they’re given in the expression. Since LTM is such a significant part of Raku, it might be worth mentioning its function here.