Remove repeated words from a sentence.
Repeated words are most often unintended typing mistakes. In rare cases, though, the repetition is correct, as with the word that:
He said that that tree is bigger.
Anyway, let us remove the doubled words and ignore the grammar for now. To find out whether a word is repeated, a regex that refers back to its own captured variable can be used. Then a substitution passes only one copy of the word to the resulting string.
my $string = 'This is is a string';
$string ~~ s:g/ << (\w+) >> ' ' << $0 >> /$0/;
say $string;
The first part of the substitution is a regex that looks for a word (a sequence of word characters, \w+) followed by a space and a copy of the same word. The first occurrence is saved in the $0 variable, which is immediately used in the same regex. It is also used in the replacement part.
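Note that the substitution collapses one pair per match, so a triple such as is is is would still leave a pair behind. The following is a minimal sketch of one way to handle longer runs, not part of the original solution: the helper name remove-repeats is hypothetical, and it uses the non-destructive subst method with a quantified group instead of the s/// operator.

```raku
# Hypothetical helper: collapses a run of the same word into a single copy.
sub remove-repeats(Str $text) {
    # [ ' ' << $0 >> ]+ matches one or more further copies of the captured word.
    # The replacement block receives the Match object and returns its first capture.
    $text.subst(/ << (\w+) >> [ ' ' << $0 >> ]+ /, -> $m { ~$m[0] }, :g);
}

say remove-repeats('This is is is a string');  # This is a string
```

Like the original regex, this sketch assumes that the copies are separated by exactly one space.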
To avoid matching parts of words, the word-boundary anchors are used: << for the beginning of a word and >> for its end. In the given example, this prevents treating the last two letters of the word This as a separate word, is, and thus the correct phrase This is a string will not be broken after the substitution.
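The effect of the anchors can be checked with a short comparison. This is only an illustrative sketch; the variable names are made up for the example, and the subst method is used so that the original string stays untouched.

```raku
my $phrase = 'This is a string';

# Without anchors, the "is" inside "This" pairs up with the following word.
my $without-anchors = $phrase.subst(/ (\w+) ' ' $0 /, -> $m { ~$m[0] }, :g);

# With anchors, only whole words are compared, so nothing is removed.
my $with-anchors = $phrase.subst(/ << (\w+) >> ' ' << $0 >> /, -> $m { ~$m[0] }, :g);

say $without-anchors;  # This a string
say $with-anchors;     # This is a string
```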
Notice that non-literal spaces in a regex do not take part in string matching, although they are necessary in the sequence << (\w+) >>. The construction <<(\w+)>> is a syntax error, as it resembles a character class such as <[...]> or a reference to a named regex like <alnum>, and the compiler expects explicit spaces in this case.
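A small sketch, with variable names chosen only for this illustration, shows the difference between unquoted spaces, which merely separate the parts of a regex, and a quoted space, which has to be present in the string:

```raku
# Unquoted whitespace between \w and \w matches nothing in the string.
my $adjacent  = ?('ab'  ~~ / ^ \w \w $ /);
my $separated = ?('a b' ~~ / ^ \w \w $ /);

# A quoted space is a literal character and must appear in the string.
my $literal = ?('a b' ~~ / ^ \w ' ' \w $ /);

say $adjacent;   # True
say $separated;  # False
say $literal;    # True
```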