📘 Decoding Roman numerals using Raku

N. B. Perl 6 has been renamed to Raku.

Convert a string containing a Roman numeral to a decimal number.

The task is the opposite of Task 46, Convert to Roman numerals, but let's use grammars to solve it. The idea is to directly find the sequences of Roman digits that correspond to thousands, hundreds, tens, and ones. For example, as soon as the program sees LXX, it knows that it is equal to 70. We do not analyse which letters stand to the left or right of a given one; instead, each group of letters is mapped to its value directly.

Here is a complete program, the biggest part of which is the grammar class. It uses a global variable `$n` for accumulating the decimal number during the parsing.

```
my $n = 0;
grammar Roman {
    token TOP {
        <thousands>? <hundreds>? <tens>? <ones>?
    }

    token thousands {
        | M    { $n += 1000 }   | MM   { $n += 2000 }
        | MMM  { $n += 3000 }   | MMMM { $n += 4000 }
    }

    token hundreds {
        | C    { $n += 100 }    | CC   { $n += 200 }
        | CCC  { $n += 300 }    | CD   { $n += 400 }
        | D    { $n += 500 }    | DC   { $n += 600 }
        | DCC  { $n += 700 }    | DCCC { $n += 800 }
        | CM   { $n += 900 }
    }

    token tens {
        | X    { $n += 10 }     | XX   { $n += 20 }
        | XXX  { $n += 30 }     | XL   { $n += 40 }
        | L    { $n += 50 }     | LX   { $n += 60 }
        | LXX  { $n += 70 }     | LXXX { $n += 80 }
        | XC   { $n += 90 }
    }

    token ones {
        | I    { $n += 1 }      | II   { $n += 2 }
        | III  { $n += 3 }      | IV   { $n += 4 }
        | V    { $n += 5 }      | VI   { $n += 6 }
        | VII  { $n += 7 }      | VIII { $n += 8 }
        | IX   { $n += 9 }
    }
}

my $roman = 'MMXVIII';
Roman.parse($roman);
say $n; # 2018
```

The `TOP` token of the grammar describes how the Roman number is built. A Roman number is a sequence of thousands, hundreds, tens, and ones. All these parts are optional: `<thousands>? <hundreds>? <tens>? <ones>?`.
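To see how `TOP` splits a number into its parts, here is a small sketch that leaves out the code blocks and only inspects the named captures. The grammar name `RomanParts` is made up for this illustration; the alternatives are the same as in the program above.

```raku
# Hypothetical helper grammar: same structure as Roman, but without actions.
grammar RomanParts {
    token TOP       { <thousands>? <hundreds>? <tens>? <ones>? }
    token thousands { M | MM | MMM | MMMM }
    token hundreds  { C | CC | CCC | CD | D | DC | DCC | DCCC | CM }
    token tens      { X | XX | XXX | XL | L | LX | LXX | LXXX | XC }
    token ones      { I | II | III | IV | V | VI | VII | VIII | IX }
}

my $m = RomanParts.parse('MCMXCIV');
say ~$m<thousands>;  # M
say ~$m<hundreds>;   # CM
say ~$m<tens>;       # XC
say ~$m<ones>;       # IV

# A part that is absent from the input leaves no capture behind.
say RomanParts.parse('XL')<thousands>.defined;  # False
```

Each named token that matched is available under its own key in the match object, and the optional parts that did not occur are simply undefined.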

Then, the grammar defines the tokens for each individual part. Their structure is similar: each one is a set of alternatives. Examine, for instance, the `ones` token: `I | II | III | IV | V | VI | VII | VIII | IX`.
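The order of the alternatives does not matter here, because the `|` operator prefers the alternative with the longest match. The following sketch compares it with `||`, which tries the alternatives from left to right and takes the first one that matches. The variable names are chosen only for this example.

```raku
# `|` selects the longest matching alternative.
my $with-ltm   = 'VIII' ~~ / ^ [ V | VI | VII | VIII ] /;

# `||` stops at the first alternative that matches.
my $sequential = 'VIII' ~~ / ^ [ V || VI || VII || VIII ] /;

say ~$with-ltm;    # VIII
say ~$sequential;  # V
```

This is why the grammar can list `I` before `II` and `III` and still consume all three letters of `III`.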

Each branch of alternatives is equipped with a simple code block that updates the value of the global variable `$n`. To make the grammar reusable, add the `{ $n = 0 }` block at the beginning of `TOP`. As homework, convert the grammar to use the `$/.make` and `$/.made` methods of the match object to collect the parts of the value without using a global variable.
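One possible way to approach the homework is sketched below; it is not the only solution, and the grammar name `RomanValue` is invented for this example. Each branch attaches its value to the match with `make`, and a code block at the end of `TOP` adds up the `made` values of the parts that were actually present.

```raku
# Hypothetical variant of the grammar that keeps no global state.
grammar RomanValue {
    token TOP {
        <thousands>? <hundreds>? <tens>? <ones>?
        {
            # Sum the values of the parts that matched; skip the absent ones.
            my $sum = 0;
            $sum += .made with $<thousands>;
            $sum += .made with $<hundreds>;
            $sum += .made with $<tens>;
            $sum += .made with $<ones>;
            make $sum;
        }
    }

    token thousands {
        | M    { make 1000 }   | MM   { make 2000 }
        | MMM  { make 3000 }   | MMMM { make 4000 }
    }

    token hundreds {
        | C    { make 100 }    | CC   { make 200 }
        | CCC  { make 300 }    | CD   { make 400 }
        | D    { make 500 }    | DC   { make 600 }
        | DCC  { make 700 }    | DCCC { make 800 }
        | CM   { make 900 }
    }

    token tens {
        | X    { make 10 }     | XX   { make 20 }
        | XXX  { make 30 }     | XL   { make 40 }
        | L    { make 50 }     | LX   { make 60 }
        | LXX  { make 70 }     | LXXX { make 80 }
        | XC   { make 90 }
    }

    token ones {
        | I    { make 1 }      | II   { make 2 }
        | III  { make 3 }      | IV   { make 4 }
        | V    { make 5 }      | VI   { make 6 }
        | VII  { make 7 }      | VIII { make 8 }
        | IX   { make 9 }
    }
}

say RomanValue.parse('MMXVIII').made;  # 2018
say RomanValue.parse('XLII').made;     # 42
```

Because the value travels with the match object, the grammar can be called repeatedly without resetting anything between the calls.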