Today, let’s look at another, and presumably the most well known method of sorting data, Quick sort.
The algorithm requires you to select the so-called pivot, one of the element from the data, and split the rest in two parts: the elements less than the pivot, and the elements more or equals to the pivot. Each part is then recursively sorted again. On each iteration, the parts become smaller and smaller until the sublists are trivial data sets of one or even zero elements.
A separate theory is how to choose the right pivot. There are a few methods, for example, taking a value from the middle of the list, or taking the first item, or the last, or a random item. There are also more complicated methods, and you can tune it to achieve the best performance on your type of data sets.
For simplicity, let’s choose the first element as a pivot, and here is the code:
sub quick-sort(@data) { return @data if @data.elems <= 1; my $pivot = @data[0]; my @left; my @right; for @data[1..*] -> $x { if $x < $pivot { push @left, $x; } else { push @right, $x; } } return flat(quick-sort(@left), $pivot, quick-sort(@right)); } my @data = 4, 5, 7, 1, 46, 78, 2, 2, 1, 9, 10; @data = quick-sort @data; say @data;
Unlike the previous examples of Bubble sort, this program does not sort in-place but returns a new list instead.
Now it is time to transform the code to make it more Perl 6-ish.
Multi-subs come for the rescue again, which while increasing the number of lines of code, make the whole algorithm easier to catch at a first glance.
multi sub quick-sort(@data where @data.elems <= 1) { return @data; } multi sub quick-sort(@data where @data.elems > 1) { my $pivot = @data[0]; . . . }
Now, take a look at these two pieces:
my $pivot = @data[0]; . . . for @data[1..*] -> $x {
The first element needs to be taken, and the rest of the algorithm only deals with the rest of the list. Currently, this is achieved by taking an element and a slice, but there’s a shift
method that does exactly what we need, and removes the element from the data. So, let’s use it:
my $pivot = @data.shift; . . . for @data -> $x {
The next comes the if
–else
selection, which can be effectively (although maybe a bit less efficiently) be replaced with the two grep
s: one selecting the part prior to the pivot, another selecting the rest.
my @left = @data.grep(* < $pivot); my @right = @data.grep(* >= $pivot);
Basically, that’s it. What you can also do is to get rid of temporary variables and put the filters to the return
statement:
return flat( quick-sort(@data.grep(* < $pivot)), $pivot, quick-sort(@data.grep(* >= $pivot)) );
It all worked before this last change, but now it is broken:
$ perl6 quick-sort.pl Cannot call 'pop' on an immutable 'List' in sub quick-sort at 3.pl line 6 in sub quick-sort at 3.pl line 8 in block <unit> at 3.pl line 12
The problem is that you need to return a single list of numbers, but each subcall of quick-sort
returns its own lists.
You can easily address the issue by slurping the elements by putting a *
before the function argument:
multi sub quick-sort(*@data where @data.elems > 1) { . . .
The final code looks like this:
multi sub quick-sort(*@data where @data.elems <= 1) { return @data; } multi sub quick-sort(*@data where @data.elems > 1) { my $pivot = @data.shift; return flat( quick-sort(@data.grep(* < $pivot)), $pivot, quick-sort(@data.grep(* >= $pivot)) ); }
Instead of using the flat
routine, you can also flat each list independently using the |
:
return |quick-sort(@data.grep(* < $pivot)), $pivot, |quick-sort(@data.grep(* >= $pivot));
But I think it is still better to have a couple of understandable intermediate variables to avoid punctuation noise:
multi sub quick-sort(@data where @data.elems <= 1) { return @data; } multi sub quick-sort(@data where @data.elems > 1) { my $pivot = @data.shift; my @left = @data.grep(* < $pivot); my @right = @data.grep(* >= $pivot); return flat(quick-sort(@left), $pivot, quick-sort(@right)); } my @data = 4, 5, 7, 1, 46, 78, 2, 2, 1, 9, 10; @data = quick-sort @data; say @data;
As a home work, create an implementation of the Quick sort algorithm that sorts the data in-place.
The code is available on GitHub, you are welcome to join this journey too!
And let me update the post with based on some comments on Reddit.
First, instead of two grep
s, it was nice to use some function that does the separation in one go (as it was in the original program with if
and else
). Actually, Perl 6 has one. It is called classify
, and it returns a hash, where the keys are all the possible values of the condition. In our case, it will be True
(when the element is less than the pivot), and False
otherwise.
multi sub quick-sort(@data where @data.elems > 1) { my $pivot = @data.shift; my %part = @data.classify(* < $pivot); return flat( quick-sort(%part{True} // []), $pivot, quick-sort(%part{False} // []) ); }
As it may happen that all elements are on the same side of the pivot, you need to make sure you pass an empty list in that case (thus, // []
).
The second change will lead to the the code similar to what is published (as of today) on Rosettacode. Indeed, if our choice of the pivot is the first element, than it is possible to separate it in the function signature instead of manually taking the item in the code. You can also simplify the signature of the first multi-method:
multi sub quick-sort([]) { return []; } multi sub quick-sort([$pivot, *@data]) { my %part = @data.classify(* < $pivot); return flat( quick-sort(%part{True} // []), $pivot, quick-sort(%part{False} // []) ); }
Rosettacode’s code also does not have a return
statement. You can do that too, while I prefer having an explicit return
in the function.
Can you suggest any further clever change?
Thank you for the example! Would you please expand on the pop error and the slurpy? Is the pop() call inside flat()? Does the slurpy make an array instead of a list? I don’t think I understand the error or the fix 🙂 . Thanks in advance!
Yes, it’s List vs Array. You can add @data.WHAT.say immediately at the beginning of the second function, and you’ll see that when you move the greps, you get List instead of Array. I cannot say I am happy with how Perl 6 deals that and how you can resolve it with *. It does not seem intuitively understandable.
Hi Andy,
very nice post, but I will be nitpicking a bit here. Choosing the first element of the list to be sorted as a pivot is often dangerous for the sorting performance. If the data is already sorted or almost sorted (forward or backward), then the partitions become totally unbalanced, and we go to the worst case scenario of quick sort, i.e. a complexity of O(n²) and very bad performance for large lists. This is not a problem with random data, but it happens quite often that we have to sort data that is almost sorted, with just a couple of items changed or added to the original sorted list. It is usually better to choose an item near the middle of the list, or, better yet, to pick a random item from the list. Having said that, I should acknowledge that I have sometimes shown similar implementations of quick sort in tutorials where I wanted to show simple implementations without wanting to go into such gory details.