NeatSort - A practical adaptive algorithm
Marcello La Rocca1 and Domenico Cantone2
1 Scuola Superiore Sant’Anna
2 Università di Catania
Abstract
We present a new adaptive sorting algorithm which is optimal for most disorder metrics
and, more importantly, has a simple and quick implementation. On input X, our algorithm has a
theoretical Ω(|X|) lower bound and an O(|X| log |X|) upper bound, and exhibits remarkable adaptive
properties which make it run closer to its lower bound as disorder (computed with respect to different
metrics) diminishes. From a practical point of view, NeatSort has proven itself competitive
with (and often better than) qsort and any random Quicksort implementation, even on random
arrays.
1 Introduction
Our algorithm NeatSort is based on a simple idea: exploit all the information one gathers while
reading the input array, as soon as one gets it. It is in this good practice that NeatSort's cleverness resides. NeatSort is a variant of the standard Mergesort algorithm, as its core workflow
consists of merging ordered lists. However, in order to speed up the merging phase, the input array X is preliminarily scanned so as to split it into a (minimal) sequence of nondecreasing sublists
L[0], L[1], . . . , L[m], by executing the following instructions (a minimal sketch of this scan follows the list):
0. i = 0;
1. add the first undiscovered element, X[i], to a new sublist;
2. keep adding elements X[i + 1], . . . , X[k] to the current sublist until either X[k] > X[k + 1] or
k = |X|;
3. if there are still undiscovered elements in X, go back to step 1.
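To make the scan concrete, here is a minimal Python sketch of the forward analysis described by the steps above; the function name split_into_runs is ours and not part of the paper's implementation.

def split_into_runs(X):
    """Split X into a sequence of maximal nondecreasing sublists (forward analysis only)."""
    runs = []
    i = 0
    while i < len(X):
        run = [X[i]]                      # step 1: start a new sublist
        while i + 1 < len(X) and X[i] <= X[i + 1]:
            run.append(X[i + 1])          # step 2: extend while nondecreasing
            i += 1
        runs.append(run)
        i += 1                            # step 3: continue with the next undiscovered element
    return runs

# Example: [3, 5, 7, 2, 2, 9, 1] -> [[3, 5, 7], [2, 2, 9], [1]]
print(split_into_runs([3, 5, 7, 2, 2, 9, 1]))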
The following properties are immediate:
(A) each sublist L[q] is in nondecreasing order, for q = 0, 1, . . . , m;
(B) if L[q][iq ] is the last element in L[q], then L[q + 1][0] < L[q][iq ], for q = 0, 1, . . . , m − 1;
(C) let L∗ [q] and L∗ [q +1] be, respectively, a sorted list resulting from merging L[q] with any subset
of the lists L[0], . . . , L[q − 1], and a sorted list resulting from merging L[q + 1] with any subset
of the lists L[q + 2], . . . , L[m], where q ∈ {1, . . . , m − 1}. Then L∗ [q + 1][0] < L∗ [q][i∗q ], where
i∗q is the index of the last element in L∗ [q].
After creating the sublists L[q], for q = 0, 1, . . . , m, adjacent pairs can be merged using an ad hoc
variant of mergesort’s merging procedure (which takes advantage of properties (B) and (C) above),
until a single (ordered) list remains.
In fact, property (B) allows one to save one comparison when merging the initial lists, and then,
thanks to property (C), one can take advantage of such saving at each subsequent merging step of
“superlists”.
1.1 Merging points
Adjacent sublists L[q], L[q + 1], where as above L[q] and L[q + 1] are in nondecreasing order and
L[q + 1][0] < L[q][iq ] holds (with iq the index of the last element in L[q]), can be stably merged into
a single nondecreasing list in a convenient way. For the sake of simplicity, let us first assume that
L[q][0] < L[q + 1][0]   and   L[q][iq] > L[q + 1][iq+1]     (1)
hold. Then, in order to merge L[q] and L[q + 1], it is enough to find out two sequences
0 < j0 < j1 < . . . < jt = iq + 1   and   0 = k0 < k1 < . . . < kt = iq+1 + 1     (2)
of merging points in L[q] and L[q + 1], respectively, such that
L[q][ji − 1] ≤ L[q + 1][ki] < L[q][ji]     (3)
L[q + 1][ki+1 − 1] < L[q][ji] ≤ L[q + 1][ki+1]     (4)
for i = 0, 1, . . . , t − 1 (where we stipulate that L[q + 1][iq+1 + 1] = +∞). Then the array resulting
from concatenating the slices¹
L[q][0 .. j0 − 1], L[q + 1][k0 .. k1 − 1], L[q][j0 .. j1 − 1], . . . , L[q + 1][kt−1 .. kt − 1], L[q][jt−1 .. jt − 1]
(in the order shown) is the stable merging of L[q] and L[q + 1].
Remark 1 By relaxing (2) so as to allow 0 ≤ j0 and jt−1 ≤ jt , the above considerations can be
immediately generalized also to the cases in which any of the conditions in (1) does not hold.
The merging points j0 , j1 , . . . , jt and k0 , k1 , . . . , kt can be computed quite efficiently. The index
j0 can be found by performing a binary search in L[q][0 .. iq − 1], as it is known in advance that
L[q][iq ] > L[q + 1][0]. Then, the remaining merging points can be found by a simple linear search
which is directly based on the very definitions (3) and (4). The number of comparisons for two lists
of lengths kq and kq+1 is at most O(log(kq) + (kq + kq+1)) = O(kq + kq+1); in every merging step the
total number of comparisons is therefore O(|X|): this is self-evident in the last step of the merging
phase, with just two sublists totalling |X| elements to merge, but in every merging step the sum of
the numbers of elements of all the sublists is always equal to |X|. Despite the asymmetry in the
sublists' sizes that is due to the very nature of the analysis phase, their number is guaranteed to be
at most ⌈|X|/2⌉, and at each merging step the number of sublists is halved, so there will be at most
O(log |X|) merging steps; therefore, the total number of comparisons is guaranteed to be O(|X| log |X|).
1 For an array T of length n and indices 0 ≤ i ≤ j ≤ n − 1, we denote by T [i .. j] the slice of T from T [i] to T [j].
When j < i, T [i .. j] will denote the empty array.
Figure 1: A comparison between Mergesort and NeatSort
1.2 Keys to improvements
The standard mergesort algorithm follows a strategy divided into two phases:
• A top-down phase, where the initial array is recursively divided into half-sized subarrays, until
a minimum size (1 element) is reached.
• A bottom-up phase, where the subarrays are recursively merged back together, resulting in
the (stably) sorted version of the initial array.
In NeatSort, the top-down phase is replaced by the preliminary phase which identifies the sequence of ordered sublists, as seen above. The latter will be used as the base for a subsequent
bottom-up phase, which, up to some optimizations, is basically the same as in the standard Mergesort. Figure 1 shows the different ways in which Mergesort and NeatSort work.
It is important to notice that, for an input array X, Mergesort's top-down phase requires 2 · |X|
steps, whereas NeatSort's preliminary phase requires just |X| − 1 comparisons.
2 Further improvements
The crucial improvement in NeatSort is the efficient partitioning of the initial array in sublists during
the preliminary phase, before the bottom-up phase starts. Notice, however, that in the worst case,
i.e., when the initial array is sorted backwards, |X| sublists (containing exactly one element each)
would be produced, thus resulting in no improvements in comparison to Mergesort.
For the sake of clarity, let us suppose that our initial array X is in strictly decreasing order, while
a nondecreasing order is sought. A first immediate solution would be to check, at the end of the
preliminary phase, whether the number of sublists produced is greater than or equal to ⌈|X|/2⌉: this
could happen if and only if the ratio of adjacent elements which are inverted is higher than 50%; in
this case, the preliminary phase could simply be repeated by examining the input array backwards (we
denote this as backward analysis, as opposed to forward analysis, where the array's elements are examined
from first to last), and be sure to obtain an improvement.
Settling for this solution, however, would betray NeatSort's philosophy of making use of all of
the information one has collected. Additionally, such a solution is not optimal. In fact, let us consider
the following array X, where
• the first half contains ⌊|X|/2⌋ elements in increasing order,
• the second half contains ⌈|X|/2⌉ elements in decreasing order.
The analysis phase would produce a partitioning consisting of one list on account of the first half,
plus ⌈|X|/2⌉ lists on account of the second half, so that the backward analysis would take place and
output one list for the second half of the initial array plus ⌊|X|/2⌋ lists for the first half of the input
array, for a total number of lists equal to ⌊|X|/2⌋ + 1. This would be inefficient, as we know that the
first half of the array is ordered, and so is the second one (though in nonincreasing order). Thus,
if the order of the second half is reversed, one ends up with just two lists, rather than ⌊|X|/2⌋ + 1 lists.
A solution to the above situation is the following: every time, during the preliminary phase, a
singleton sublist is created (i.e., there is an inversion in the input whose first element is not part
of any previously created sublist), a new sublist formed by those two elements is created and then
further elements are added to it until one is found which is greater than its predecessor; basically,
a backward analysis is started from the point of the inversion up to the first non-inverted couple of
adjacent elements. Subsequently, the sublist so obtained is reversed, and a check is made to see whether
any additional element can be added to its tail (in effect starting a new forward analysis); a sketch of
the combined procedure is given below.
In the particular situation in which the input is sorted in reverse order, the above procedure
creates just one list, proving itself as efficient as it is when dealing with sorted arrays (i.e., it is
optimal in both extreme situations).
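The combined forward/backward analysis can be sketched as follows; this is a minimal Python rendering under our own naming (split_into_runs_bidirectional), whereas the actual implementation works in place on the input array rather than building separate lists.

def split_into_runs_bidirectional(X):
    """Forward analysis that, on a singleton run, absorbs the descending run
    that follows, reverses it, and then tries to extend it forward again."""
    runs = []
    i = 0
    n = len(X)
    while i < n:
        run = [X[i]]
        # forward analysis: extend while nondecreasing
        while i + 1 < n and X[i] <= X[i + 1]:
            run.append(X[i + 1])
            i += 1
        if len(run) == 1 and i + 1 < n:
            # backward analysis: absorb the maximal descending run ...
            while i + 1 < n and X[i] > X[i + 1]:
                run.append(X[i + 1])
                i += 1
            run.reverse()                      # ... and reverse it
            # try to extend the reversed run forward again
            while i + 1 < n and X[i + 1] >= run[-1]:
                run.append(X[i + 1])
                i += 1
        runs.append(run)
        i += 1
    return runs

# A reverse-sorted input now yields a single run:
print(split_into_runs_bidirectional([9, 7, 5, 3, 1]))   # [[1, 3, 5, 7, 9]]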
In the situation reported above, when the array is composed of two subarrays (the first one in
increasing order and the second one in decreasing order), such a solution would create, during the
preliminary phase, two lists; in particular, the construction of the second list would require
⌈|X|/2⌉ / 2 element swaps (the first element in the sublist is swapped with the last one, the second
one with the second-last one, etc.), and thus a total of 3 · ⌈|X|/2⌉ / 2 assignments would be required.
After the preliminary phase, adjacent sublists are iteratively merged together using an ad hoc
variant of the canonical merge procedure until a single list is left.
2.1 Correctness
Let L[0], L[1], . . . , L[m] be the sequence of sublists created during the preliminary phase (with forward and backward analyses). Then, it is an easy matter to check that, by the very construction,
the following two properties hold:
(A) each sublist L[q] is in nondecreasing order, for q = 0, 1, . . . , m;
(B) if L[q][iq ] is the last element in L[q], then L[q + 1][0] < L[q][iq ], for q = 0, 1, . . . , m − 1.
Properties (A) and (B) readily imply
(C) let L∗ [q] and L∗ [q +1] be, respectively, a sorted list resulting from merging L[q] with any subset
of the lists L[0], . . . , L[q − 1], and a sorted list resulting from merging L[q + 1] with any subset
of the lists L[q + 2], . . . , L[m], where q ∈ {1, . . . , m − 1}. Then L∗ [q + 1][0] < L∗ [q][i∗q ], where
i∗q is the index of the last element in L∗ [q].
From Property (C), it follows that during any sequence of merging steps in which only adjacent
sublists are allowed to be merged, Properties (A) and (B) are maintained as invariants, and so is
Property (C).
2.2 Analysis phase performance
The combination of forward and backward analysis also proves optimal, with respect to the number
of sublists created, in every other situation.
We can compare forward analysis, backward analysis and their combination through the examples shown in Figure 2.
As is clear from each of those examples, the combination of forward and backward
analysis produces a minimal number of sublists in comparison with:
1. Mergesort (which will produce exactly |X| sublists);
2. forward analysis only (by definition);
3. the algorithm that applies forward analysis and then, if the number of sublists produced is
greater than ⌈|X|/2⌉, switches to backward analysis.
The first two statements are trivial to prove, while the last one, though intuitive, involves a simple
reasoning by contradiction, which is left to the reader.
This solution, however, is not always optimal with respect to the total distance (the sum of the
distances of each element from its final position in the ordered sequence), as can be seen in examples
E and F, where backward analysis produces the lowest value; example C, however, shows how
backward analysis can also lead to the highest possible value in other situations, so that backward
analysis does not prove optimal with respect to total distance either.
2.3 Heuristics
In order to improve merging efficiency, several attempts have been made. First, as described in
Section 1.1, different strategies have been tried to find the merging points between lists more
efficiently.
Let L = ⟨l1, l2, . . . , ln⟩ and R = ⟨r1, r2, . . . , rm⟩ be two sublists, in nondecreasing order, that are to be
merged and such that ln > r1, and let S = ⟨s1, s2, . . . , sn+m⟩ be the list resulting from their merge.
Due to the nature of the problem and to the overhead introduced by making an extra copy of at least
one of the lists, the best performance has been reached with the procedure outlined in Algorithm 1.
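As an illustration, the procedure can be rendered in Python roughly as follows; this simplified sketch builds a new list instead of reusing L's storage as the actual implementation does (see footnote 2), and the name neat_merge is ours.

from bisect import bisect_right

def neat_merge(L, R):
    """Stably merge two nondecreasing lists L and R, where it is known that L[-1] > R[0].
    A binary search places R[0]; the remaining merging points are found by linear scans."""
    i1 = bisect_right(L, R[0])        # number of elements of L that precede R[0]
    S = L[:i1]                        # elements of L not greater than R[0]
    T = L[i1:]                        # rest of L; T[0] > R[0] by construction
    i = j = 0
    while i < len(T) and j < len(R):
        # copy a run from R: all elements strictly smaller than the current head of T
        while j < len(R) and R[j] < T[i]:
            S.append(R[j])
            j += 1
        # copy a run from T: all elements not greater than the current head of R (stable)
        while i < len(T) and (j == len(R) or T[i] <= R[j]):
            S.append(T[i])
            i += 1
    S.extend(T[i:])                   # at most one of these tails is nonempty
    S.extend(R[j:])
    return S

# Example: neat_merge([2, 4, 9], [1, 3, 3, 10]) -> [1, 2, 3, 3, 4, 9, 10]
print(neat_merge([2, 4, 9], [1, 3, 3, 10]))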
Particular care has also been put into tuning the code. In order to further improve performance,
our efforts have been focused on the choice of the order used to merge the sublists: to introduce
adaptivity in the merging phase as well (thus obtaining a second level of adaptivity), a few heuristics
have been tested and compared against the simplest merge approach, to verify whether the possible
advantages deriving from the choice of a better merging order would outweigh the required overhead.
Notice that the number of merges needed to merge m lists is m − 1, independently of the strategy
followed.
We have tested and benchmarked the following alternative solutions:
2 In the actual implementation, in order to minimize the number of swaps and the extra memory consumption, S reuses
the array L, while T, a temporary array, has its elements copied from L[i1 ], . . . , L[n-1]; initially the size
of array S is set to i1 − 1 (possibly 0), and it will grow to n + m elements, reusing the memory previously occupied
by both L and R.
Figure 2: Performance of the different strategies described for the preliminary phase over a few
examples. ∆ indicates the distance of each element from its position in the sorted sequence.
ALGORITHM 1: NeatMerge
Input: two sublists L = ⟨l1, l2, . . . , ln⟩ and R = ⟨r1, r2, . . . , rm⟩ in nondecreasing order and such that
ln > r1.
Output: a single ordered list S containing all the elements in the input lists.
Using binary search, find the lowest element li1 in ⟨l1, l2, . . . , ln−1⟩ greater than r1 (i.e., the final position
of r1 in S).
// Note that 1 ≤ i1 ≤ n. Thus the first i1 elements in S will be ⟨l1, l2, . . . , li1−1, r1⟩, where,
if i1 = 1, the initial sublist ⟨l1, l2, . . . , li1−1⟩ is empty.
Init² S := ⟨l1, l2, . . . , li1−1⟩ and T := ⟨li1, . . . , ln⟩; // we write R[x] and T[x] for the x-th element of R and T
k := 1; i1 := 1; j1 := 1; // from here on, i1, i2, . . . index T and j1, j2, . . . index R
repeat
    // copy a run from R: R[jk] and the following elements smaller than T[ik]
    add R[jk] to the tail of S;
    j := 1;
    while jk + j ≤ m and R[jk + j] < T[ik] do
        add R[jk + j] to the tail of S; j +:= 1;
    end
    jk+1 := jk + j;
    // copy a run from T: T[ik] and the following elements not exceeding R[jk+1]
    add T[ik] to the tail of S;
    i := 1;
    while ik + i ≤ |T| and (jk+1 > m or T[ik + i] ≤ R[jk+1]) do
        add T[ik + i] to the tail of S; i +:= 1;
    end
    ik+1 := ik + i;
    k +:= 1;
until jk > m or ik > |T|; // i.e., until R or T is exhausted
if R is exhausted then
    copy all the elements left in T to the tail of S;
end
// if T is exhausted first, the remaining elements of R are already in their final positions
// in the in-place implementation (see footnote 2)
return S;
(I) The first (leftmost) list is always merged with the second one.
This heuristic is noticeably slower than merging adjacent pairs. These results suggested that
there might be a close connection between the degree of similarity of the sizes of the lists
being merged and the performance of the algorithm, which in turn suggested trying to
improve the pairing of the sublists so that their sizes match as closely as possible.
The slowdown observed when merging unbalanced lists is likely related to the number of
elements of the bigger list that have to be moved for each element of the smaller one: the
larger the difference, the higher this ratio, until it becomes a bottleneck.
(II) When one chooses to merge all pairs of adjacent lists and the number of lists is odd, one of
the lists goes unaltered to the next step; usually, the surviving list is the last (rightmost) one.
However, a 2% improvement in execution time has been observed by choosing to leave out
the longest one instead.
(III) For a triple A, B, C of adjacent lists (where A precedes B and B precedes C), one checks
whether |A| ≥ p(|B| + |C|) holds, for an assigned constant p. If this is the case, lists B and C
are merged whereas A goes unaltered to the next step; otherwise, A and B are merged.
A series of tests has been run to tune the parameter p; experimental results show that the best
performance is obtained for values of p ranging from 1.4 down to 1.25 as the size of the initial
arrays grows from a few hundred to millions of elements. Using an average value for p, we obtained a
performance improvement close to 3.2%. The pseudocode of the resulting algorithm is shown
in the box for Algorithm 2, and a sketch of one merging pass driven by this heuristic is given below.
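A sketch of one merging pass driven by heuristic (III) might look as follows in Python; the function name one_merge_pass is ours, the default p = 1.3 reflects the average value mentioned above, and heapq.merge stands in for NeatMerge.

from heapq import merge   # stands in for NeatMerge in this sketch

def one_merge_pass(lists, p=1.3):
    """One pass of pairwise merges over adjacent sublists, applying heuristic (III):
    if list A is much longer than the next two lists combined, let A skip this pass."""
    out = []
    j = 0
    while j < len(lists):
        if j + 1 >= len(lists):
            out.append(lists[j])                       # lone leftover list
            j += 1
        elif j + 2 < len(lists) and len(lists[j]) >= p * (len(lists[j + 1]) + len(lists[j + 2])):
            out.append(lists[j])                       # A goes unaltered to the next step
            out.append(list(merge(lists[j + 1], lists[j + 2])))   # merge B and C
            j += 3
        else:
            out.append(list(merge(lists[j], lists[j + 1])))       # merge A and B
            j += 2
    return out

# Repeatedly applying the pass until one list remains completes the bottom-up phase:
runs = [[1, 5], [2, 9, 11], [3], [4, 8]]
while len(runs) > 1:
    runs = one_merge_pass(runs)
print(runs[0])    # [1, 2, 3, 4, 5, 8, 9, 11]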
ALGORITHM 2: NeatSort
Input: A list X.
Output: The ordered version of the input list.
lists := []; listCounter := 1; add X[1] to lists[listCounter];
i := 1;
while i < |X| do
    while i < |X| and X[i] ≤ X[i + 1] do
        add X[i + 1] to lists[listCounter]; i +:= 1;
    end
    if length(lists[listCounter]) == 1 and i < |X| then
        while i < |X| and X[i] > X[i + 1] do
            append X[i + 1] to lists[listCounter]; i +:= 1;
        end
        reverse lists[listCounter];
        while i < |X| and X[i + 1] is not smaller than the last element of lists[listCounter] do
            append X[i + 1] to lists[listCounter]; i +:= 1;
        end
    end
    if listCounter > 1 and the first element of lists[listCounter] is greater than or equal to the last
    element of lists[listCounter − 1] then
        concatenate lists[listCounter] to the tail of lists[listCounter − 1]; listCounter −:= 1;
    end
    if i < |X| then
        listCounter +:= 1; add X[i + 1] to lists[listCounter]; i +:= 1;
    end
end
while listCounter > 1 do
    j := 1;
    while j < listCounter do
        if j + 2 > listCounter or length(lists[j]) ≤ p · (length(lists[j + 1]) + length(lists[j + 2])) then
            neatMerge(lists[j], lists[j + 1]); j +:= 2;
        else
            neatMerge(lists[j + 1], lists[j + 2]); j +:= 3;
        end
    end
    remove the lists that have been merged into a neighbour and update listCounter accordingly;
end
return lists[1];
2.4 Asymptotic Analysis
Upper and lower bounds for NeatSort can be computed quite trivially. Given an array X of length
n, the preliminary phase requires Θ(n) time, while the merging phase, as in Mergesort, requires
O(n log n) time: thus, the total time required by NeatSort is O(n log n).
As for space requirements, the preliminary phase can be realized efficiently with an array of length
at most ⌈n/2⌉, while the merging procedure requires an array of length at most n, so the additional
space required is O(n).
Summing up, denoting by T(n) and S(n) the execution time of NeatSort on a list with n elements
and the extra space it requires, respectively, we have
• T (n) = Ω(n) and T (n) = O(n log n);
• S(n) = O(n).
3 Disorder metrics
In this section we will review some of the most common measures of disorder for sorting algorithms
and then analyze NeatSort's performance with respect to them.
The disorder of a sequence is evaluated by a measure of presortedness (or measure of disorder),
namely a real-valued function over the collection of finite sequences of integers. More precisely, given
a sequence X of distinct elements³, a measure of disorder M satisfies the following properties:
(a) If X is sorted (i.e., if the elements in X are in nondecreasing order), then M(X) = 0.
(b) If X and Y are order isomorphic, then M(X) = M(Y).
(c) If X is a subsequence of Y, then M(X) ≤ M(Y).
(d) If every element of X is smaller than every element of Y, then M(X.Y) ≤ M(X) + M(Y).
(e) M({x}.X) ≤ |X| + M(X), for every x ∈ N.
The measure of efficiency of a sorting algorithm for a given input array X, instead, is the number
of comparisons it performs while sorting X.
A definition of optimal (or maximal) adaptivity is due to Mannila [Mannila(1985)]: a sorting algorithm is optimally adaptive with respect to a measure of disorder if it takes a number of comparisons
that is within a constant factor of the lower bound.
Let below(z, n, M) be the set of permutations of n distinct integers whose disorder is not larger
than z with respect to a disorder measure M, i.e.,
below(z, n, M) = {Y ∈ N^{<N} | |Y| = n ∧ M(Y) ≤ z} .
It can be shown that the comparison tree of any algorithm sorting all sequences Y of length n such
that M(Y) ≤ z has at least |below(z, n, M)| leaves, and so its height is Ω(log |below(z, n, M)|). Hence,
for an input array X, a comparison-based algorithm requires Ω(|X| + log |below(z, |X|, M)|) comparisons⁴.
Mannila [Mannila(1985)] also defines the notion of optimal adaptivity in the worst case: let M
be a measure of disorder and let S be a sorting algorithm which uses TS(X) comparisons on input
X. We say that S is optimal with respect to M (or M-optimal) if, for some c > 0, we have
TS(X) ≤ c · max{|X|, log |below(M(X), |X|, M)|} ,
for every finite sequence X of integers.
3 Every sequence with repetitions can be easily mapped to the sequence of unique tuples (xi, i), where xi = X[i].
4 Of course at least a linear number of comparisons is required in order to test presortedness.
3.1 Commonly used metrics
In this section we review 11 commonly used measures of disorder; a short Python sketch computing some of them on an example sequence follows the list.
1. Inv: given a sequence S = hs1 , s2 , . . . , sn i, an inversion is any pair (si , sj ) such that i < j and
si > sj ; Inv(S) is the number of inversions in S.
2. Dis: the largest distance determined by an inversion [Estivill-Castro and Wood(1989)]. For
example, let S1 = ⟨1, 8, 4, 3, 7, 6, 2, 5, 10⟩; then (8, 5) is the inversion whose elements are farthest
apart, so that Dis(S1) = 7. This measure puts more emphasis on inversions whose elements are
farther apart.
3. Max: the largest distance an element must travel to reach its sorted position. Let S1 be as above.
Then 8 must travel 6 positions to reach the right place, so Max(S1) = 6. This measure gives
more importance to global disorder rather than local disorder.
4. Exc: the minimum number of exchanges required to sort a sequence [Mannila(1985)]. Consider
again the sequence S1 above. It can be shown that 4 exchanges suffice to sort it, whereas 3
exchanges are not enough. Therefore, Exc(S1 ) = 4.
5. Rem: the minimum number of elements that must be removed to obtain a sorted subsequence
[Knuth(1973a)]. Considering again our sequence S1 , we have easily Rem(S1 ) = 5.
6. Runs: ascending runs are sorted portions of the input; for a sequence S, Runs(S) is the number
of boundaries between the maximal runs in S, called step-downs [Knuth(1973b)]. Thus, for
our example, we have Runs(S1 ) = 4.
7. SUS (short for Shuffled Up-Sequences [Levcopoulos and Petersson(1990)]); it is a generalization of the Runs measure and is defined as the minimum number of ascending subsequences
(of possibly not adjacent elements) into which we can partition a given sequence. In our example, SUS (S1 ) = 4.
8. SMS (short for Shuffled Monotone Subsequence); it further generalizes the previous
measure: it is defined as the minimum number of monotone (ascending or descending) subsequences into which one can partition the input sequence [Levcopoulos and Petersson(1990)].
In our example, SMS (S1 ) = 3.
9. Enc: it refers to the concept of encroaching lists introduced by Skiena in his adaptive
algorithm Melsort [Skiena(1988)]; it is defined as the number of sorted lists constructed by
Melsort when applied to a sequence.
10. Osc: it has been defined by Levcopoulos and Petersson [Levcopoulos and Petersson(1989)]
after a study of Heapsort; in some sense it evaluates the “oscillations” of large and small
elements in a given sequence.
11. Reg: this measure has been defined by Moffat and Petersson [Moffat and Petersson(1991),
Petersson and Moffat(1995)]; it turns out that any Reg-optimal sorting algorithm is optimally
adaptive with respect to the other 10 measures.
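As an illustration, some of the simpler measures can be computed directly from their definitions; the following Python sketch (function names are ours) evaluates Inv, Runs, Max and Rem on the example sequence S1 used above.

from bisect import bisect_right

S1 = [1, 8, 4, 3, 7, 6, 2, 5, 10]

def inv(S):
    """Number of inversions (pairs i < j with S[i] > S[j])."""
    return sum(1 for i in range(len(S)) for j in range(i + 1, len(S)) if S[i] > S[j])

def runs(S):
    """Number of step-downs, i.e., boundaries between maximal ascending runs."""
    return sum(1 for i in range(len(S) - 1) if S[i] > S[i + 1])

def max_distance(S):
    """Largest distance an element must travel to reach its sorted position (distinct elements)."""
    pos = {v: i for i, v in enumerate(sorted(S))}
    return max(abs(pos[v] - i) for i, v in enumerate(S))

def rem(S):
    """Minimum number of removals = |S| minus the length of a longest nondecreasing subsequence."""
    tails = []                # tails[k] = smallest tail of a nondecreasing subsequence of length k+1
    for v in S:
        k = bisect_right(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(S) - len(tails)

print(inv(S1), runs(S1), max_distance(S1), rem(S1))   # 14 4 6 5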
A partial order and related equivalence relation on the above measures is provided by the following
definition.
Definition 1 Let M1 , M2 be two measures of disorder. We state that:
Figure 3: Metrics partial order
1. M1 is algorithmically finer than M2 (denoted M1 ≤alg M2 ) if and only if any M1 -optimal
algorithm is also M2 -optimal.
2. M1 and M2 are algorithmically equivalent (denoted M1 =alg M2 ) if and only if M1 ≤alg M2
and M2 ≤alg M1 .
Figure 3 shows in detail the partial order introduced by ≤alg; as already remarked, Reg-optimality implies optimality with respect to any other of the above metrics, and SUS-optimality
implies Runs-optimality, while it is implied by SMS-optimality.
Therefore, to prove that NeatSort is optimal for all these metrics, it is enough to show that it is
Reg-optimal.
3.1.1 Reg-optimality of NeatSort
Moffat and Petersson [Moffat and Petersson(1991)] defined the measure Reg while studying more
efficient variants of Insertion-Sort which improve its performance by keeping
track of the information gathered during the algorithm's execution, such as the positions at which the
last elements⁵ have been inserted. Let
di = |{k | 1 ≤ k < i ∧ min{xi−1, xi} < xk < max{xi−1, xi}}| + 1
be the distance from the last insertion point to the current insertion point, and let
di,j = |{k | 1 ≤ k < i ∧ min{xi, xj} < xk < max{xi, xj}}| + 1
be the distance from xj, with j < i, to the insertion point of xi. Note that di = di,i−1.
Next, for i > 1, let ti = min{j | 1 < j < i ∧ di,i−j = 1}; ti represents the amount of history needed
for inserting xi in its final position.
Finally, by putting ri = min{ti + di, i − ti}, we then have Reg(X) = ∏_{i=2}^{|X|} (ri − 1).
Since every sublist L[q] is ordered after NeatSort's preliminary phase, the corresponding values ri
are all equal to 1, and therefore ∑_{q=0}^{m} Reg(L[q]) = 0; thus NeatSort is adaptive with respect to
the measure Reg, and it is also optimal for all the other measures defined above.
5 The algorithm Regional Insertion Sort searches, at each step, a logarithmic fraction of the elements in the ordered
portion of the array.
3.1.2 Metrics Lower Bounds for NeatSort
Estivill-Castro and Wood introduced, in 1992 [Estivill-Castro and Wood(1992)], the notion of generic
sorting algorithm (see Algorithm 3 below).
ALGORITHM 3: Generic Sort
Input: A list X with n elements.
Output: The ordered version of the input list.
if X is sorted then
terminate;
end
if X is simple then
sort X using an alternative sorting algorithm for simple sequences;
end
else if X is neither sorted nor simple then
apply a division protocol to divide X into at least s ≥ 2 disjoint sequences;
recursively sort the sequences using Generic Sort;
merge the sorted sequences to obtain X in sorted order;
end
Remark 2 The definition of “simple” in Algorithm 3 depends on the particular instantiation of the algorithm.
Table 1: Known lower bounds for disorder metrics

Measure | Lower bound: log |below(M(X), |X|, M)|
Dis     | Ω(|X| (1 + log(Dis(X) + 1)))
Exc     | Ω(|X| (1 + Exc(X) log(Exc(X) + 1)))
Enc     | Ω(|X| (1 + log(Enc(X) + 1)))
Inv     | Ω(|X| (1 + log(Inv(X)/|X| + 1)))
Max     | Ω(|X| (1 + log(Max(X) + 1)))
Osc     | Ω(|X| (1 + log(Osc(X)/|X| + 1)))
Reg     | Ω(|X| (1 + log(Reg(X) + 1)))
Rem     | Ω(|X| (1 + Rem(X) log(Rem(X) + 1)))
Runs    | Ω(|X| (1 + log(Runs(X) + 1)))
SMS     | Ω(|X| (1 + log(SMS(X) + 1)))
SUS     | Ω(|X| (1 + log(SUS(X) + 1)))
As is clear, NeatSort perfectly fits the description above. We can thus make use of the following
theorem [Estivill-Castro and Wood(1990)]:
Theorem 1 Let M be a measure of disorder such that a sequence X is simple whenever M (X) = 0,
and let D ∈ R and s ∈ N be constants such that 0 ≤ D < 2 and s > 1. Also, let DP be a linear-time
division protocol that divides any sequence X into s sequences of almost equal sizes. Then:
1. Generic Sort is worst-case optimal and it takes O(|X| log |X|) time in the worst case.
2. Generic Sort is adaptive with respect to the measure M and it takes O(|X| · (1 + log(M(X) + 1))) time
in the worst case, provided that
∑_{j=1}^{s} M(j-th sequence) ≤ D · ⌊s/2⌋ · M(Y)
holds, for all sufficiently long sequences Y.
Table 1 reports the known lower bounds for the metrics defined in Section 3.1: NeatSort, as
proved above, being optimal for all these metrics, meets all such lower bounds.
4 Performance
ALGORITHM 4: Melsort
Input: A list X of length n.
Output: The ordered version of the input list.
listCount := 1;
put X1 in list1 ;
for i := 2 to n do
for j := 1 to listCount do
if Xi < head(listj ) then
add Xi to the head of listj ;
break;
end
else if Xi > tail(listj ) then
add Xi to the tail of listj ;
break;
end
end
if Xi couldn’t be added to any list then
add 1 to listCount;
create list listCount ;
put Xi in the newly created list;
end
end
while listCount > 1 do
if odd (listCount) then
head(listCount − 1) := merge(head(listCount − 1), head(listCount));
end
for i := 1 to ⌊listCount/2⌋ do
    head(i) := merge(head(i), head(⌊listCount/2⌋ + i));
end
listCount /:= 2;
end
return head(1)
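For reference, a compact Python rendering of the Melsort pseudocode above might look as follows; this is only a sketch, in which collections.deque stands in for the encroaching lists and the final merging is done by a simple left-to-right fold rather than by the halving scheme of the pseudocode.

from collections import deque
from heapq import merge

def melsort(X):
    """Skiena's Melsort: build encroaching lists, then merge them."""
    lists = []                                   # each list is a deque, kept in nondecreasing order
    for x in X:
        for lst in lists:
            if x <= lst[0]:
                lst.appendleft(x)                # extend at the head
                break
            if x >= lst[-1]:
                lst.append(x)                    # extend at the tail
                break
        else:
            lists.append(deque([x]))             # start a new encroaching list
    result = []
    for lst in lists:                            # merge all encroaching lists
        result = list(merge(result, lst))
    return result

print(melsort([1, 8, 4, 3, 7, 6, 2, 5, 10]))     # [1, 2, 3, 4, 5, 6, 7, 8, 10]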
To test the performance of our algorithm, a test suite has been designed to benchmark NeatSort's
behavior against a tuned version of the random Quicksort algorithm, the standard C++ qsort function,
a tuned version of Mergesort, and Skiena's Melsort algorithm (whose pseudocode is shown in the
box for Algorithm 4).
To minimize the influence of the kernel and of other background programs running at the same time,
the test suite iterates a loop executing in turn all 5 algorithms, once per iteration, on (a copy of)
the same array; these arrays are generated randomly (or according to specific criteria) at every
iteration. In this way, possible computational lags due to external factors affect all the algorithms,
on average, in much the same way.
The test suite has been run on different machines:
• a desktop PC with an Intel Core Duo processor and 2 GB of RAM, running Windows Vista,
32-bit version;
• an Asus notebook with an Intel Core i7 2.0 GHz processor and 6 GB of RAM, running both
Windows 7, 64-bit version, and, in a separate partition, Ubuntu 10, 64-bit version;
• a Fujitsu Siemens notebook with an Intel Core 2 Duo P8400 processor (2.26 GHz) and 4 GB of RAM,
running Ubuntu 10, 64-bit version.
Under Windows, Microsoft Visual C++ Express has been used, setting the compiler to take
advantage of the multicore processor and to optimize the code for faster execution. Under Linux,
the NetBeans 6.9.1 suite has been used, with the g++ compiler configured for multicore, 64-bit
machines.
The simulation has provided consistent results on all the platforms. To ensure the greatest
precision in evaluating the algorithms' performance, the high-resolution timing mechanisms provided
by both systems have been used: the windows.h library in Windows (where the minimum measurable
interval is approximately 10 microseconds, with a resolution of 1,190,000 ticks per second) and the
time.h library in Linux (where, using the clock_gettime function, the interval resolution is 1 nanosecond). At
each iteration, the number of intervals consumed by every algorithm is stored and then, at the end
of the cycle, the median value is extracted for each algorithm; the median, unlike the average
(which is computed anyway), is not affected by extreme, out-of-scale values, which can be caused by
unpredictable peaks of requests for OS services: this is especially true for large testing sets. Both
values (median and average) are expressed in milliseconds and rounded to the microsecond.
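The benchmarking loop described above can be sketched in Python as follows; the actual test suite is written in C++ and uses the platform timers just mentioned, so time.perf_counter_ns, the function name benchmark, and the placeholder algorithm set are stand-ins of ours.

import random
import statistics
import time

def benchmark(algorithms, size, iterations):
    """Run every algorithm on a copy of the same freshly generated array at each
    iteration, and report median and average running times in milliseconds."""
    timings = {name: [] for name in algorithms}
    for _ in range(iterations):
        data = [random.randint(0, 10**9) for _ in range(size)]   # new input each iteration
        for name, algo in algorithms.items():
            copy = list(data)                                     # every algorithm sees the same input
            start = time.perf_counter_ns()
            algo(copy)
            timings[name].append((time.perf_counter_ns() - start) / 1e6)
    return {name: (statistics.median(t), statistics.mean(t)) for name, t in timings.items()}

# Example usage with Python's built-in sort as a placeholder algorithm:
print(benchmark({"sorted": sorted}, size=10_000, iterations=50))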
The first test suite has been run on random arrays; then a few specific cases are examined: ordered
arrays, inversely ordered arrays, and partially ordered ones. We tried to make the tests as extensive
as possible, given the time required to sort huge arrays; in detail, the number of iterations has been
fixed depending on the array size, as reported in Table 2.
Table 2: Relation between array size and number of test cases

Array size              | Test cases
100 to 102,400          | 10,000
204,800 to 409,600      | 50,000
819,200                 | 25,000
1,638,400               | 10,000
3,276,800               | 5,000
6,553,600 to 26,214,400 | 1,000
50,000,000+             | 500
4.1 Random Arrays
Tests on random arrays show consistent performance for NeatSort as the size of the arrays grows. For
small arrays, the best-performing algorithm is the implementation of random Quicksort provided
here, which had been optimized for best performance. For larger arrays, however, this algorithm's
performance progressively degrades, while NeatSort and qsort steadily grow with n log n, as highlighted
by the logarithmic scale used to visualize the results (Figure 6). The results have been averaged
over all the testing platforms.
Figure 4: Execution time (ms) on random arrays
Figure 5: Details of previous chart for arrays of size ≤ 50K elements
Figure 6: Execution time (ms) on random arrays - log10 scale
4.2 Sorted arrays (Most favourable case)
Sorted arrays are the most favourable case for NeatSort, and indeed the measured performance shows
that NeatSort's running time is several orders of magnitude smaller than that of the other algorithms.
Figure 7: Execution time (ms) on sorted arrays
Figure 8: Execution time (ms) on sorted arrays - log10 scale
4.3 Statistics about performance and disorder metrics
So far, we have only examined the two extremes of the input landscape; sorted arrays are, by design,
the most favourable case for NeatSort, and it would be reasonable to expect that, when run on nearly
sorted arrays, the algorithm largely benefits from the analysis phase and shows superior
performance. To further investigate this issue, we ran a series of comparative tests on qsort and
NeatSort, gathering, together with the performance measurements, a set of statistics about the degree
of disorder of the input, with the goal of bringing out correlations between the relative performance
and
• the number of inversions,
• the max distance of elements from their position in the sorted sequence,
• the number of runs in the input array.
For each of these metrics, two charts are shown:
1. a 2D chart stressing correlation between the metric and the relative performance of NeatSort
in comparison to qsort;
2. a 3D chart, where each point in the R² domain corresponds to an input sequence identified by
its size and by its value of the metric.
In both charts, the relative performance is expressed as a percentage and computed as
(Tqsort − TNeatSort) / max(Tqsort, TNeatSort) × 100 .
Positive values thus indicate better performance for NeatSort (the greater the absolute value, the better),
while negative values indicate cases in which qsort outperformed NeatSort. Values are shown
using a gradient going from green (positive values), to yellow (ties), to red (negative values).
In the 2D charts, the size of the dots is proportional to the size of the test case.
4.3.1 Inversions
The number of inversions is shown as a percentage of the maximum number of possible inversions
for the input size: for a sequence of length n, there can be at most n(n−1)/2 inversions.
As expected, the data plots a bowl-shaped figure with a minimum corresponding to 50% of
inversions, while ordered sequences (0% of inversions) and reversed sequences (100% of inversions)
represent the best-case scenario for NeatSort. The figure also shows a separate cluster, with
almost constant relative performance, corresponding to larger input sequences. As is also
clarified by the 3D chart, this anomaly in the results testifies that the performance delta in favour of
NeatSort grows with the size of the input.
Figure 9: Relative performance for NeatSort and qsort, with respect to percentage of inversions
(spots proportional to array size)
Figure 10: Relative performance for NeatSort and qsort, with respect to percentage of inversions
and size
4.3.2 Max distance
The charts in this section show the relative performance with respect to the maximum distance
of elements in the input (expressed as a percentage of the input length). Interestingly enough,
NeatSort’s relative performance steadily improves not only as the max distance becomes smaller,
but also as the size of the input grows.
Figure 11: Relative performance for NeatSort and qsort, with respect to max distance / array size
(spots proportional to array size)
Figure 12: Relative performance for NeatSort and qsort, with respect to max distance / array size
and size
4.3.3 Runs
The number of runs is expressed as a percentage of the input length; Figures 13 and 14 show, as
expected, much the same shape as Figure 9: 0% runs corresponds to a sorted sequence, and as the
number of runs grows from 50% (the global minimum for relative performance) to 100% (hence
toward reversed sequences), NeatSort performs increasingly better.
Interestingly, a local maximum is present for sequences with nearly 60K elements and 78% runs.
Figure 13: Relative performance for NeatSort and qsort, with respect to runs/size (spots proportional
to array size)
Figure 14: Relative performance for NeatSort and qsort, with respect to runs/size and size
5 Conclusions
We have presented an intuitive adaptive sorting algorithm that proves to be optimal for most of
the disorder metrics present in the literature. Although other algorithms exist that outperform it on
peculiar ad hoc metrics (in particular Melsort [Skiena(1988)]), those algorithms, as most of the
adaptive algorithms in the literature, have such an intricate workflow that their implementations turn
out to be slower by some orders of magnitude in comparison with Mergesort or Quicksort. For NeatSort,
instead, we have both carefully engineered its design to be as simple as possible and tuned its
implementation to make it extremely efficient. The result is a flexible and fast algorithm which
on average is as efficient as Quicksort and outperforms even the C implementation of qsort: the
ratio between the running times of NeatSort and qsort, besides being consistently below 1, gets
progressively smaller as the number of inversions moves from 50% towards both 0% and 100%, i.e.,
towards sequences sorted in direct and inverse order.
6 Acknowledgments
Charts in Sections 4.1 and 4.2 have been created with Microsoft Excel Starter 2010, while the remaining
charts have been created with the Matplotlib Python library.
References
[Estivill-Castro and Wood(1989)] Vladimir Estivill-Castro and Derick Wood. A new measure of
presortedness. Information and Computation, 83(1):111–119, 1989.
[Estivill-Castro and Wood(1990)] Vladimir Estivill-Castro and Derick Wood. A generic adaptive
sorting algorithm. University of Waterloo, Computer Science Department, 1990.
[Estivill-Castro and Wood(1992)] Vladimir Estivill-Castro and Derick Wood. A survey of adaptive
sorting algorithms. ACM Computing Surveys (CSUR), 24(4):441–476, 1992.
[Knuth(1973a)] Donald E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming, section 5.2.1. Addison-Wesley, Reading, Massachusetts, second edition, 10 January
1973a.
[Knuth(1973b)] Donald E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming, page 161. Addison-Wesley, Reading, Massachusetts, second edition, 10 January
1973b.
[Levcopoulos and Petersson(1989)] Christos Levcopoulos and Ola Petersson. A note on adaptive
parallel sorting. Information Processing Letters, 33(4):187–191, 1989.
[Levcopoulos and Petersson(1990)] Christos Levcopoulos and Ola Petersson. Sorting shuffled monotone sequences. Springer, 1990.
[Mannila(1985)] Heikki Mannila. Measures of presortedness and optimal sorting algorithms. IEEE Transactions on Computers, C-34(4):318–325, 1985.
[Moffat and Petersson(1991)] Alistair Moffat and Ola Petersson. Historical searching and sorting.
In ISA’91 Algorithms, pages 263–272. Springer, 1991.
[Petersson and Moffat(1995)] Ola Petersson and Alistair Moffat. A framework for adaptive sorting.
Discrete Applied Mathematics, 59(2):153–179, 1995.
[Skiena(1988)] Steven S. Skiena. Encroaching lists as a measure of presortedness. BIT Numerical
Mathematics, 28(4):775–784, 1988.