NeatSort - A practical adaptive algorithm
Marcello La Rocca1 and Domenico Cantone2
1 Scuola Superiore Sant’Anna
2 Università di Catania
Abstract
We present a new adaptive sorting algorithm which is optimal for most disorder metrics
and, more importantly, has a simple and quick implementation. On input X, our algorithm has a
theoretical Ω(|X|) lower bound and an O(|X| log |X|) upper bound, and exhibits remarkable adaptive
properties which make it run closer to its lower bound as disorder (computed with respect to different
metrics) diminishes. From a practical point of view, NeatSort has proven itself competitive
with (and often better than) qsort and any random Quicksort implementation, even on random
arrays.
1 Introduction
Our algorithm NeatSort is based on a simple idea: exploit all the information one gathers while
reading the input array, as soon as one gets it. It is in this good practice that NeatSort's cleverness resides. NeatSort is a variant of the standard Mergesort algorithm, as its core workflow
consists of merging ordered lists. However, in order to speed up the merging phase, the input array X is preliminarily scanned so as to split it into a (minimal) sequence of nondecreasing sublists
L[0], L[1], . . . , L[m], by executing the following instructions (a minimal sketch of this scan follows the list):
0. i = 0;
1. add the first undiscovered element, X[i], to a new sublist;
2. keep adding elements X[i + 1], . . . , X[k] to the current sublist until either X[k] > X[k + 1] or
k = |X|;
3. if there are still undiscovered elements in X, go back to step 1.
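To make the scan concrete, here is a minimal Python sketch of the forward analysis described by the steps above; the function name split_into_runs is ours and not part of the paper's implementation.

def split_into_runs(X):
    """Split X into a sequence of maximal nondecreasing sublists (forward analysis only)."""
    runs = []
    i = 0
    while i < len(X):
        run = [X[i]]                      # step 1: start a new sublist
        while i + 1 < len(X) and X[i] <= X[i + 1]:
            run.append(X[i + 1])          # step 2: extend while nondecreasing
            i += 1
        runs.append(run)
        i += 1                            # step 3: continue with the next undiscovered element
    return runs

# Example: [3, 5, 7, 2, 2, 9, 1] -> [[3, 5, 7], [2, 2, 9], [1]]
print(split_into_runs([3, 5, 7, 2, 2, 9, 1]))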
The following properties are immediate:
(A) each sublist L[q] is in nondecreasing order, for q = 0, 1, . . . , m;
(B) if L[q][iq ] is the last element in L[q], then L[q + 1][0] < L[q][iq ], for q = 0, 1, . . . , m − 1;
(C) let L∗ [q] and L∗ [q +1] be, respectively, a sorted list resulting from merging L[q] with any subset
of the lists L[0], . . . , L[q − 1], and a sorted list resulting from merging L[q + 1] with any subset
of the lists L[q + 2], . . . , L[m], where q ∈ {1, . . . , m − 1}. Then L∗ [q + 1][0] < L∗ [q][i∗q ], where
i∗q is the index of the last element in L∗ [q].
After creating the sublists L[q], for q = 0, 1, . . . , m, adjacent pairs can be merged using an ad hoc
variant of mergesort’s merging procedure (which takes advantage of properties (B) and (C) above),
until a single (ordered) list remains.
In fact, property (B) allows one to save one comparison when merging the initial lists, and then,
thanks to property (C), one can take advantage of such saving at each subsequent merging step of
“superlists”.
1.1 Merging points
Adjacent sublists L[q], L[q + 1], where as above L[q] and L[q + 1] are in nondecreasing order and
L[q + 1][0] < L[q][iq ] holds (with iq the index of the last element in L[q]), can be stably merged into
a single nondecreasing list in a convenient way. For the sake of simplicity, let us first assume that
L[q][0] < L[q + 1][0]   and   L[q][iq] > L[q + 1][iq+1]     (1)
hold. Then, in order to merge L[q] and L[q + 1], it is enough to find out two sequences
0 < j0 < j1 < . . . < jt = iq + 1   and   0 = k0 < k1 < . . . < kt = iq+1 + 1     (2)
of merging points in L[q] and L[q + 1], respectively, such that
L[q][ji − 1] ≤ L[q + 1][ki] < L[q][ji]     (3)
L[q + 1][ki+1 − 1] < L[q][ji] ≤ L[q + 1][ki+1]     (4)
for i = 0, 1, . . . , t − 1 (where we stipulate that L[q + 1][iq+1 + 1] = +∞). Then the array resulting
from concatenating the slices¹
L[q][0 .. j0 − 1], L[q + 1][k0 .. k1 − 1], L[q][j0 .. j1 − 1], . . . , L[q + 1][kt−1 .. kt − 1], L[q][jt−1 .. jt − 1]
(in the order shown) is the stable merging of L[q] and L[q + 1].
Remark 1 By relaxing (2) so as to allow 0 ≤ j0 and jt−1 ≤ jt , the above considerations can be
immediately generalized also to the cases in which any of the conditions in (1) does not hold.
The merging points j0 , j1 , . . . , jt and k0 , k1 , . . . , kt can be computed quite efficiently. The index
j0 can be found by performing a binary search in L[q][0 .. iq − 1], as it is known in advance that
L[q][iq ] > L[q + 1][0]. Then, the remaining merging points can be found by a simple linear search
which is directly based on the very definitions (3) and (4). The number of comparisons for two lists
of lengths kq and kq+1 is at most O(log(kq) + (kq + kq+1)) = O(kq + kq+1); in every merging step the
total number of comparisons is therefore O(|X|): this is self-evident in the last step of the merging
phase, with just two sublists totalling |X| elements to merge, but in every merging step the sum of
the numbers of elements of all the sublists is always equal to |X|. Despite the asymmetry in the
sublists' sizes that is due to the very nature of the analysis phase, their number is guaranteed to be
at most ⌈|X|/2⌉, and at each merging step the number of sublists is halved, so there will be at most
O(log |X|) merging steps; therefore, the total number of comparisons is guaranteed to be O(|X| log |X|).
1 For an array T of length n and indices 0 ≤ i ≤ j ≤ n − 1, we denote by T [i .. j] the slice of T from T [i] to T [j].
When j < i, T [i .. j] will denote the empty array.
Figure 1: A comparison between Mergesort and NeatSort
1.2 Keys to improvements
The standard mergesort algorithm follows a strategy divided into two phases:
• A top-down phase, where the initial array is recursively divided into half-sized subarrays, until
a minimum size (1 element) is reached.
• A bottom-up phase, where the subarrays are recursively merged back together, resulting in
the (stably) sorted version of the initial array.
In NeatSort, the top-down phase is replaced by the preliminary phase which identifies the sequence of ordered sublists, as seen above. The latter will be used as the base for a subsequent
bottom-up phase, which, up to some optimizations, is basically the same as in the standard Mergesort. Figure 1 shows the different ways in which Mergesort and NeatSort work.
It is important to notice that, for an input array X, Mergesort's top-down phase requires 2 · |X|
steps, whereas NeatSort's preliminary phase requires just |X| − 1 comparisons.
2 Further improvements
The crucial improvement in NeatSort is the efficient partitioning of the initial array in sublists during
the preliminary phase, before the bottom-up phase starts. Notice, however, that in the worst case,
i.e., when the initial array is sorted backwards, |X| sublists (containing exactly one element each)
would be produced, thus resulting in no improvements in comparison to Mergesort.
For the sake of clarity, let us suppose that our initial array X is in strictly decreasing order, while
a nondecreasing order is sought. A first immediate solution would be to check, at the end of the
preliminary phase, whether the number of sublists produced is greater than or equal to ⌈|X|/2⌉: this
could happen if and only if the ratio of adjacent elements which are inverted is higher than 50%; in
this case, the preliminary phase could simply be repeated by examining the input array backwards (we
denote this as backward analysis, as opposed to forward analysis, where the array's elements are examined
from first to last), and be sure to obtain an improvement.
Settling for this solution, however, would betray NeatSort's philosophy of making use of all of
the information one has collected. Additionally, such a solution is not optimal. In fact, let us consider
the following array X, where
• the first half contains ⌊|X|/2⌋ elements in increasing order,
• the second half contains ⌈|X|/2⌉ elements in decreasing order.
The analysis phase would produce a partitioning consisting of one list on account of the first half,
plus ⌈|X|/2⌉ lists on account of the second half, so that the backward analysis would take place and
output one list for the second half of the initial array plus ⌊|X|/2⌋ lists for the first half of the input
array, for a total number of lists equal to ⌊|X|/2⌋ + 1. This would be inefficient, as we know that the
first half of the array is ordered, and so is the second one (though in nonincreasing order). Thus,
if the order of the second half is reversed, one ends up with just two lists, rather than ⌊|X|/2⌋ + 1 lists.
A solution to the above situation is the following: every time, during the preliminary phase, a
singleton sublist is created (i.e., there is an inversion in the input whose first element is not part
of any previously created sublist), a new sublist formed by those two elements is created and then
further elements are added to it until one is found which is greater than its predecessor; basically,
a backward analysis is started from the point of the inversion up to the first non-inverted couple of
adjacent elements. Subsequently, the sublist so obtained is reversed, and a check is made to see whether
any additional element can be added to its tail (in effect starting a new forward analysis); a sketch of
the combined procedure is given below.
In the particular situation in which the input is sorted in reverse order, the above procedure
creates just one list, proving itself as efficient as it is when dealing with sorted arrays (i.e., it is
optimal in both extreme situations).
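The combined forward/backward analysis can be sketched as follows; this is a minimal Python rendering under our own naming (split_into_runs_bidirectional), whereas the actual implementation works in place on the input array rather than building separate lists.

def split_into_runs_bidirectional(X):
    """Forward analysis that, on a singleton run, absorbs the descending run
    that follows, reverses it, and then tries to extend it forward again."""
    runs = []
    i = 0
    n = len(X)
    while i < n:
        run = [X[i]]
        # forward analysis: extend while nondecreasing
        while i + 1 < n and X[i] <= X[i + 1]:
            run.append(X[i + 1])
            i += 1
        if len(run) == 1 and i + 1 < n:
            # backward analysis: absorb the maximal descending run ...
            while i + 1 < n and X[i] > X[i + 1]:
                run.append(X[i + 1])
                i += 1
            run.reverse()                      # ... and reverse it
            # try to extend the reversed run forward again
            while i + 1 < n and X[i + 1] >= run[-1]:
                run.append(X[i + 1])
                i += 1
        runs.append(run)
        i += 1
    return runs

# A reverse-sorted input now yields a single run:
print(split_into_runs_bidirectional([9, 7, 5, 3, 1]))   # [[1, 3, 5, 7, 9]]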
In the situation reported above, when the array is composed of two subarrays (the first one in
increasing order and the second one in decreasing order), such a solution would create, during the
preliminary phase, two lists; in particular, the construction of the second list would require
⌈|X|/2⌉ / 2 element swaps (the first element in the sublist is swapped with the last one, the second
one with the second-last one, etc.), and thus a total of 3 · ⌈|X|/2⌉ / 2 assignments would be required.
After the preliminary phase, adjacent sublists are iteratively merged together using an ad hoc
variant of the canonical merge procedure until a single list is left.
2.1 Correctness
Let L[0], L[1], . . . , L[m] be the sequence of sublists created during the preliminary phase (with forward and backward analyses). Then, it is an easy matter to check that, by the very construction,
the following two properties hold:
(A) each sublist L[q] is in nondecreasing order, for q = 0, 1, . . . , m;
(B) if L[q][iq ] is the last element in L[q], then L[q + 1][0] < L[q][iq ], for q = 0, 1, . . . , m − 1.
Properties (A) and (B) readily imply
(C) let L∗ [q] and L∗ [q +1] be, respectively, a sorted list resulting from merging L[q] with any subset
of the lists L[0], . . . , L[q − 1], and a sorted list resulting from merging L[q + 1] with any subset
of the lists L[q + 2], . . . , L[m], where q ∈ {1, . . . , m − 1}. Then L∗ [q + 1][0] < L∗ [q][i∗q ], where
i∗q is the index of the last element in L∗ [q].
From Property (C), it follows that during any sequence of merging steps in which only adjacent
sublists are allowed to be merged, Properties (A) and (B) are maintained as invariants, and so is
Property (C).
2.2 Analysis phase performance
The combination of forward and backward analysis also proves optimal, with respect to the number
of sublists created, in every other situation.
We can compare forward analysis, backward analysis and their combination through the examples shown in Figure 2.
As is clear from each of those examples, the combination of forward and backward
analysis produces a minimal number of sublists in comparison with:
1. Mergesort (which will produce exactly |X| sublists);
2. forward analysis only (by definition);
3. the algorithm that applies forward analysis and then, if the number of sublists produced is
greater than ⌈|X|/2⌉, switches to backward analysis.
The first two statements are trivial to prove, while the last one, though intuitive, involves a simple
reasoning by contradiction, which is left to the reader.
This solution, however, is not always optimal with respect to the total distance (the sum of the
distances of each element from its final position in the ordered sequence), as can be seen in examples
E and F, where backward analysis produces the lowest value; example C, however, shows how
backward analysis can also lead to the highest possible value in other situations, so that backward
analysis does not prove optimal with respect to total distance either.
2.3 Heuristics
In order to improve merging efficiency, several attempts have been made. First, as described in
Section 1.1, different strategies have been tried to find the merging points between lists more
efficiently.
Let L = ⟨l1, l2, . . . , ln⟩ and R = ⟨r1, r2, . . . , rm⟩ be two sublists, in nondecreasing order, that are to be
merged and such that ln > r1, and let S = ⟨s1, s2, . . . , sn+m⟩ be the list resulting from their merge.
Due to the nature of the problem and to the overhead introduced by making an extra copy of at least
one of the lists, the best performance has been reached with the procedure outlined in Algorithm 1.
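As an illustration, the procedure can be rendered in Python roughly as follows; this simplified sketch builds a new list instead of reusing L's storage as the actual implementation does (see footnote 2), and the name neat_merge is ours.

from bisect import bisect_right

def neat_merge(L, R):
    """Stably merge two nondecreasing lists L and R, where it is known that L[-1] > R[0].
    A binary search places R[0]; the remaining merging points are found by linear scans."""
    i1 = bisect_right(L, R[0])        # number of elements of L that precede R[0]
    S = L[:i1]                        # elements of L not greater than R[0]
    T = L[i1:]                        # rest of L; T[0] > R[0] by construction
    i = j = 0
    while i < len(T) and j < len(R):
        # copy a run from R: all elements strictly smaller than the current head of T
        while j < len(R) and R[j] < T[i]:
            S.append(R[j])
            j += 1
        # copy a run from T: all elements not greater than the current head of R (stable)
        while i < len(T) and (j == len(R) or T[i] <= R[j]):
            S.append(T[i])
            i += 1
    S.extend(T[i:])                   # at most one of these tails is nonempty
    S.extend(R[j:])
    return S

# Example: neat_merge([2, 4, 9], [1, 3, 3, 10]) -> [1, 2, 3, 3, 4, 9, 10]
print(neat_merge([2, 4, 9], [1, 3, 3, 10]))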
Particular care has also been put into tuning the code. In order to further improve performance,
our efforts have been focused on the choice of the order used to merge the sublists: to introduce
adaptivity in the merging phase as well (thus obtaining a second level of adaptivity), a few heuristics
have been tested and compared against the simplest merge approach, to verify whether the possible
advantages deriving from the choice of a better merging order would outweigh the required overhead.
Notice that the number of merges needed to merge m lists is m − 1, independently of the strategy
followed.
We have tested and benchmarked the following alternative solutions:
2 In the actual implementation, in order to minimize the number of swaps and the extra memory consumption, S reuses
the array L, while T, a temporary array, has its elements copied from L[i1 ], . . . , L[n-1]; initially the size
of array S is set to i1 − 1 (possibly 0), and it will grow to n + m elements, reusing the memory previously occupied
by both L and R.
Figure 2: Performance of the different strategies described for the preliminary phase over a few
examples. ∆ indicates the distance of each element from its position in the sorted sequence.
ALGORITHM 1: NeatMerge
Input: two sublists L = ⟨l1, l2, . . . , ln⟩ and R = ⟨r1, r2, . . . , rm⟩ in nondecreasing order and such that
ln > r1.
Output: a single ordered list S containing all the elements in the input lists.
Using binary search, find the lowest element li1 in ⟨l1, l2, . . . , ln−1⟩ greater than r1 (i.e., the final position
of r1 in S).
// Note that 1 ≤ i1 ≤ n. Thus the first i1 elements in S will be ⟨l1, l2, . . . , li1−1, r1⟩, where,
if i1 = 1, the initial sublist ⟨l1, l2, . . . , li1−1⟩ is empty.
Init² S := ⟨l1, l2, . . . , li1−1⟩ and T := ⟨li1, . . . , ln⟩; // we write R[x] and T[x] for the x-th element of R and T
k := 1; i1 := 1; j1 := 1; // from here on, i1, i2, . . . index T and j1, j2, . . . index R
repeat
    // copy a run from R: R[jk] and the following elements smaller than T[ik]
    add R[jk] to the tail of S;
    j := 1;
    while jk + j ≤ m and R[jk + j] < T[ik] do
        add R[jk + j] to the tail of S; j +:= 1;
    end
    jk+1 := jk + j;
    // copy a run from T: T[ik] and the following elements not exceeding R[jk+1]
    add T[ik] to the tail of S;
    i := 1;
    while ik + i ≤ |T| and (jk+1 > m or T[ik + i] ≤ R[jk+1]) do
        add T[ik + i] to the tail of S; i +:= 1;
    end
    ik+1 := ik + i;
    k +:= 1;
until jk > m or ik > |T|; // i.e., until R or T is exhausted
if R is exhausted then
    copy all the elements left in T to the tail of S;
end
// if T is exhausted first, the remaining elements of R are already in their final positions
// in the in-place implementation (see footnote 2)
return S;
(I) The first (leftmost) list is always merged with the second one.
This heuristic is noticeably slower than merging adjacent pairs. These results suggested that
there might be a close connection between the degree of similarity of the sizes of the lists
being merged and the performance of the algorithm, which in turn suggested trying to
improve the pairing of the sublists so that their sizes match as closely as possible.
The slowdown observed when merging unbalanced lists is likely related to the number of
elements of the bigger list that have to be moved for each element of the smaller one: the
larger the difference, the higher this ratio, until it becomes a bottleneck.
(II) When one chooses to merge all pairs of adjacent lists and the number of lists is odd, one of
the lists goes unaltered to the next step; usually, the surviving list is the last (rightmost) one.
However, a 2% improvement in execution time has been observed by choosing to leave out
the longest one instead.
(III) For a triple A, B, C of adjacent lists (where A precedes B and B precedes C), one checks
whether |A| ≥ p(|B| + |C|) holds, for an assigned constant p. If this is the case, lists B and C
are merged whereas A goes unaltered to the next step; otherwise, A and B are merged.
A series of tests has been run to tune the parameter p; experimental results show that the best
performance is obtained for values of p ranging from 1.4 down to 1.25 as the size of the initial
arrays grows from a few hundred to millions of elements. Using an average value for p, we obtained a
performance improvement close to 3.2%. The pseudocode of the resulting algorithm is shown
in the box for Algorithm 2, and a sketch of one merging pass driven by this heuristic is given below.
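A sketch of one merging pass driven by heuristic (III) might look as follows in Python; the function name one_merge_pass is ours, the default p = 1.3 reflects the average value mentioned above, and heapq.merge stands in for NeatMerge.

from heapq import merge   # stands in for NeatMerge in this sketch

def one_merge_pass(lists, p=1.3):
    """One pass of pairwise merges over adjacent sublists, applying heuristic (III):
    if list A is much longer than the next two lists combined, let A skip this pass."""
    out = []
    j = 0
    while j < len(lists):
        if j + 1 >= len(lists):
            out.append(lists[j])                       # lone leftover list
            j += 1
        elif j + 2 < len(lists) and len(lists[j]) >= p * (len(lists[j + 1]) + len(lists[j + 2])):
            out.append(lists[j])                       # A goes unaltered to the next step
            out.append(list(merge(lists[j + 1], lists[j + 2])))   # merge B and C
            j += 3
        else:
            out.append(list(merge(lists[j], lists[j + 1])))       # merge A and B
            j += 2
    return out

# Repeatedly applying the pass until one list remains completes the bottom-up phase:
runs = [[1, 5], [2, 9, 11], [3], [4, 8]]
while len(runs) > 1:
    runs = one_merge_pass(runs)
print(runs[0])    # [1, 2, 3, 4, 5, 8, 9, 11]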
ALGORITHM 2: NeatSort
Input: A list X.
Output: The ordered version of the input list.
lists := []; listCounter := 1; add X[1] to lists[listCounter];
i := 1;
while i < |X| do
    while i < |X| and X[i] ≤ X[i + 1] do
        add X[i + 1] to lists[listCounter]; i +:= 1;
    end
    if length(lists[listCounter]) == 1 and i < |X| then
        while i < |X| and X[i] > X[i + 1] do
            append X[i + 1] to lists[listCounter]; i +:= 1;
        end
        reverse lists[listCounter];
        while i < |X| and X[i + 1] is not smaller than the last element of lists[listCounter] do
            append X[i + 1] to lists[listCounter]; i +:= 1;
        end
    end
    if listCounter > 1 and the first element of lists[listCounter] is greater than or equal to the last
    element of lists[listCounter − 1] then
        concatenate lists[listCounter] to the tail of lists[listCounter − 1]; listCounter −:= 1;
    end
    if i < |X| then
        listCounter +:= 1; add X[i + 1] to lists[listCounter]; i +:= 1;
    end
end
while listCounter > 1 do
    j := 1;
    while j < listCounter do
        if j + 2 > listCounter or length(lists[j]) ≤ p · (length(lists[j + 1]) + length(lists[j + 2])) then
            neatMerge(lists[j], lists[j + 1]); j +:= 2;
        else
            neatMerge(lists[j + 1], lists[j + 2]); j +:= 3;
        end
    end
    remove the lists that have been merged into a neighbour and update listCounter accordingly;
end
return lists[1];
2.4 Asymptotic Analysis
Upper and lower bounds for NeatSort can be computed quite trivially. Given an array X of length
n, the preliminary phase requires Θ(n) time, while the merging phase, as in Mergesort, requires
O(n log n) time: thus, the total time required by NeatSort is O(n log n).
As for space requirements, the preliminary phase can be realized efficiently with an array of length
at most ⌈n/2⌉, while the merging procedure requires an array of length at most n, so the additional
space required is O(n).
Summing up, denoting by T(n) and S(n) the execution time of NeatSort on a list with n elements
and the extra space it requires, respectively, we have
• T (n) = Ω(n) and T (n) = O(n log n);
• S(n) = O(n).
3 Disorder metrics
In this section we will review some of the most common measures of disorder for sorting algorithms
and then analyze NeatSort's performance with respect to them.
The disorder of a sequence is evaluated by a measure of presortedness (or measure of disorder),
namely a real-valued function over the collection of finite sequences of integers. More precisely, given
a sequence X of distinct elements³, a measure of disorder M satisfies the following properties:
(a) If X is sorted (i.e., if the elements in X are in nondecreasing order), then M(X) = 0.
(b) If X and Y are order isomorphic, then M(X) = M(Y).
(c) If X is a subsequence of Y, then M(X) ≤ M(Y).
(d) If every element of X is smaller than every element of Y, then M(X.Y) ≤ M(X) + M(Y).
(e) M({x}.X) ≤ |X| + M(X), for every x ∈ N.
The measure of efficiency of a sorting algorithm for a given input array X, instead, is the number
of comparisons it performs while sorting X.
A definition of optimal (or maximal) adaptivity is due to Mannila [Mannila(1985)]: a sorting algorithm is optimally adaptive with respect to a measure of disorder if it takes a number of comparisons
that is within a constant factor of the lower bound.
Let below(z, n, M) be the set of permutations of n distinct integers whose disorder is not larger
than z with respect to a disorder measure M, i.e.,
below(z, n, M) = {Y ∈ N^{<N} | |Y| = n ∧ M(Y) ≤ z} .
It can be shown that the comparison tree of any algorithm sorting all sequences Y of length n such
that M(Y) ≤ z has at least |below(z, n, M)| leaves, and so its height is Ω(log |below(z, n, M)|). Hence,
for an input array X, a comparison-based algorithm requires Ω(|X| + log |below(z, |X|, M)|) comparisons⁴.
Mannila [Mannila(1985)] also defines the notion of optimal adaptivity in the worst case: let M
be a measure of disorder and let S be a sorting algorithm which uses TS(X) comparisons on input
X. We say that S is optimal with respect to M (or M-optimal) if, for some c > 0, we have
TS(X) ≤ c · max{|X|, log |below(M(X), |X|, M)|} ,
for every finite sequence X of integers.
3 Every sequence with repetitions can be easily mapped to the sequence of unique tuples (xi, i), where xi = X[i].
4 Of course at least a linear number of comparisons is required in order to test presortedness.
3.1 Commonly used metrics
In this section we review 11 commonly used measures of disorder; a short Python sketch computing some of them on an example sequence follows the list.
1. Inv: given a sequence S = hs1 , s2 , . . . , sn i, an inversion is any pair (si , sj ) such that i < j and
si > sj ; Inv(S) is the number of inversions in S.
2. Dis: the largest distance determined by an inversion [Estivill-Castro and Wood(1989)]. For
example, let S1 = ⟨1, 8, 4, 3, 7, 6, 2, 5, 10⟩; then (8, 5) is the inversion whose elements are farthest
apart, so that Dis(S1) = 7. This measure puts more emphasis on inversions whose elements are
farther apart.
3. Max: the largest distance an element must travel to reach its sorted position. Let S1 be as above.
Then 8 must travel 6 positions to reach the right place, so Max(S1) = 6. This measure gives
more importance to global disorder rather than local disorder.
4. Exc: the minimum number of exchanges required to sort a sequence [Mannila(1985)]. Consider
again the sequence S1 above. It can be shown that 4 exchanges suffice to sort it, whereas 3
exchanges are not enough. Therefore, Exc(S1 ) = 4.
5. Rem: the minimum number of elements that must be removed to obtain a sorted subsequence
[Knuth(1973a)]. Considering again our sequence S1 , we have easily Rem(S1 ) = 5.
6. Runs: ascending runs are sorted portions of the input; for a sequence S, Runs(S) is the number
of boundaries between the maximal runs in S, called step-downs [Knuth(1973b)]. Thus, for
our example, we have Runs(S1 ) = 4.
7. SUS (short for Shuffled Up-Sequences [Levcopoulos and Petersson(1990)]); it is a generalization of the Runs measure and is defined as the minimum number of ascending subsequences
(of possibly not adjacent elements) into which we can partition a given sequence. In our example, SUS (S1 ) = 4.
8. SMS (short for Shuffled Monotone Subsequence); it further generalizes the previous
measure: it is defined as the minimum number of monotone (ascending or descending) subsequences into which one can partition the input sequence [Levcopoulos and Petersson(1990)].
In our example, SMS (S1 ) = 3.
9. Enc: it refers to the concept of encroaching lists introduced by Skiena in his adaptive
algorithm Melsort [Skiena(1988)]; it is defined as the number of sorted lists constructed by
Melsort when applied to a sequence.
10. Osc: it has been defined by Levcopoulos and Petersson [Levcopoulos and Petersson(1989)]
after a study of Heapsort; in some sense it evaluates the “oscillations” of large and small
elements in a given sequence.
11. Reg: this measure has been defined by Moffat and Petersson [Moffat and Petersson(1991),
Petersson and Moffat(1995)]; it turns out that any Reg-optimal sorting algorithm is optimally
adaptive with respect to the other 10 measures.
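As an illustration, some of the simpler measures can be computed directly from their definitions; the following Python sketch (function names are ours) evaluates Inv, Runs, Max and Rem on the example sequence S1 used above.

from bisect import bisect_right

S1 = [1, 8, 4, 3, 7, 6, 2, 5, 10]

def inv(S):
    """Number of inversions (pairs i < j with S[i] > S[j])."""
    return sum(1 for i in range(len(S)) for j in range(i + 1, len(S)) if S[i] > S[j])

def runs(S):
    """Number of step-downs, i.e., boundaries between maximal ascending runs."""
    return sum(1 for i in range(len(S) - 1) if S[i] > S[i + 1])

def max_distance(S):
    """Largest distance an element must travel to reach its sorted position (distinct elements)."""
    pos = {v: i for i, v in enumerate(sorted(S))}
    return max(abs(pos[v] - i) for i, v in enumerate(S))

def rem(S):
    """Minimum number of removals = |S| minus the length of a longest nondecreasing subsequence."""
    tails = []                # tails[k] = smallest tail of a nondecreasing subsequence of length k+1
    for v in S:
        k = bisect_right(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(S) - len(tails)

print(inv(S1), runs(S1), max_distance(S1), rem(S1))   # 14 4 6 5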
A partial order and related equivalence relation on the above measures is provided by the following
definition.
Definition 1 Let M1 , M2 be two measures of disorder. We state that:
Figure 3: Metrics partial order
1. M1 is algorithmically finer than M2 (denoted M1 ≤alg M2 ) if and only if any M1 -optimal
algorithm is also M2 -optimal.
2. M1 and M2 are algorithmically equivalent (denoted M1 =alg M2 ) if and only if M1 ≤alg M2
and M2 ≤alg M1 .
Figure 3 shows in detail the partial order introduced by ≤alg; as already remarked, Reg-optimality implies optimality with respect to any other of the above metrics, and SUS-optimality
implies Runs-optimality, while it is implied by SMS-optimality.
Therefore, to prove that NeatSort is optimal for all these metrics, it is enough to show that it is
Reg-optimal.
3.1.1 Reg-optimality of NeatSort
Moffat and Petersson [Moffat and Petersson(1991)] defined the measure Reg while studying more
efficient variants of Insertion-Sort which improve its performance by keeping
track of the information gathered during the algorithm's execution, such as the positions at which the
last elements⁵ have been inserted. Let
di = |{k | 1 ≤ k < i ∧ min{xi−1, xi} < xk < max{xi−1, xi}}| + 1
be the distance from the last insertion point to the current insertion point, and let
di,j = |{k | 1 ≤ k < i ∧ min{xi, xj} < xk < max{xi, xj}}| + 1
be the distance from xj, with j < i, to the insertion point of xi. Note that di = di,i−1.
Next, for i > 1, let ti = min{j | 1 < j < i ∧ di,i−j = 1}; ti represents the amount of history needed
for inserting xi in its final position.
Finally, by putting ri = min{ti + di, i − ti}, we then have Reg(X) = ∏_{i=2}^{|X|} (ri − 1).
Since every sublist L[q] is ordered after NeatSort's preliminary phase, the corresponding values ri
are all equal to 1, and therefore ∑_{q=0}^{m} Reg(L[q]) = 0; thus NeatSort is adaptive with respect to
the measure Reg, and it is also optimal for all the other measures defined above.
5 The algorithm Regional Insertion Sort searches, at each step, a logarithmic fraction of the elements in the ordered
portion of the array.
3.1.2 Metrics Lower Bounds for NeatSort
Estivill-Castro and Wood introduced, in 1992 [Estivill-Castro and Wood(1992)], the notion of generic
sorting algorithm (see Algorithm 3 below).
ALGORITHM 3: Generic Sort
Input: A list X with n elements.
Output: The ordered version of the input list.
if X is sorted then
terminate;
end
if X is simple then
sort X using an alternative sorting algorithm for simple sequences;
end
else if X is neither sorted nor simple then
apply a division protocol to divide X into at least s ≥ 2 disjoint sequences;
recursively sort the sequences using Generic Sort;
merge the sorted sequences to obtain X in sorted order;
end
Remark 2 The definition of “simple” in Algorithm 3 depends on the particular instantiation of the algorithm.
Table 1: Known lower bounds for disorder metrics

Measure | Lower bound: log |below(M(X), |X|, M)|
Dis     | Ω(|X| (1 + log(Dis(X) + 1)))
Exc     | Ω(|X| (1 + Exc(X) log(Exc(X) + 1)))
Enc     | Ω(|X| (1 + log(Enc(X) + 1)))
Inv     | Ω(|X| (1 + log(Inv(X)/|X| + 1)))
Max     | Ω(|X| (1 + log(Max(X) + 1)))
Osc     | Ω(|X| (1 + log(Osc(X)/|X| + 1)))
Reg     | Ω(|X| (1 + log(Reg(X) + 1)))
Rem     | Ω(|X| (1 + Rem(X) log(Rem(X) + 1)))
Runs    | Ω(|X| (1 + log(Runs(X) + 1)))
SMS     | Ω(|X| (1 + log(SMS(X) + 1)))
SUS     | Ω(|X| (1 + log(SUS(X) + 1)))
As is clear, NeatSort perfectly fits the description above. We can thus make use of the following
theorem [Estivill-Castro and Wood(1990)]:
Theorem 1 Let M be a measure of disorder such that a sequence X is simple whenever M (X) = 0,
and let D ∈ R and s ∈ N be constants such that 0 ≤ D < 2 and s > 1. Also, let DP be a linear-time
division protocol that divides any sequence X into s sequences of almost equal sizes. Then:
1. Generic Sort is worst-case optimal and it takes O(|X| log |X|) time in the worst case.
2. Generic Sort is adaptive with respect to the measure M and it takes O(|X| · (1 + log(M(X) + 1))) time
in the worst case, provided that
∑_{j=1}^{s} M(j-th sequence) ≤ D · ⌊s/2⌋ · M(Y)
holds, for all sufficiently long sequences Y.
Table 1 reports the known lower bounds for the metrics defined in Section 3.1: NeatSort, as
proved above, being optimal for all these metrics, meets all such lower bounds.
4 Performance
ALGORITHM 4: Melsort
Input: A list X of length n.
Output: The ordered version of the input list.
listCount := 1;
put X1 in list1 ;
for i := 2 to n do
for j := 1 to listCount do
if Xi < head(listj ) then
add Xi to the head of listj ;
break;
end
else if Xi > tail(listj ) then
add Xi to the tail of listj ;
break;
end
end
if Xi couldn’t be added to any list then
add 1 to listCount;
create list listCount ;
put Xi in the newly created list;
end
end
while listCount > 1 do
if odd (listCount) then
head(listCount − 1) := merge(head(listCount − 1), head(listCount));
end
for i := 1 to ⌊listCount/2⌋ do
    head(i) := merge(head(i), head(⌊listCount/2⌋ + i));
end
listCount /:= 2;
end
return head(1)
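For reference, a compact Python rendering of the Melsort pseudocode above might look as follows; this is only a sketch, in which collections.deque stands in for the encroaching lists and the final merging is done by a simple left-to-right fold rather than by the halving scheme of the pseudocode.

from collections import deque
from heapq import merge

def melsort(X):
    """Skiena's Melsort: build encroaching lists, then merge them."""
    lists = []                                   # each list is a deque, kept in nondecreasing order
    for x in X:
        for lst in lists:
            if x <= lst[0]:
                lst.appendleft(x)                # extend at the head
                break
            if x >= lst[-1]:
                lst.append(x)                    # extend at the tail
                break
        else:
            lists.append(deque([x]))             # start a new encroaching list
    result = []
    for lst in lists:                            # merge all encroaching lists
        result = list(merge(result, lst))
    return result

print(melsort([1, 8, 4, 3, 7, 6, 2, 5, 10]))     # [1, 2, 3, 4, 5, 6, 7, 8, 10]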
To test the performance of our algorithm, a test suite has been designed to benchmark NeatSort's
behavior against a tuned version of the random Quicksort algorithm, the standard C++ qsort function,
a tuned version of Mergesort, and Skiena's Melsort algorithm (whose pseudocode is shown in the
box for Algorithm 4).
To minimize the influence of the kernel and of other background programs running at the same time,
the test suite iterates a loop executing in turn all 5 algorithms, once per iteration, on (a copy of)
the same array; these arrays are generated randomly (or according to specific criteria) at every
iteration. In this way, possible computational lags due to external factors affect all the algorithms,
on average, in much the same way.
The test suite has been run on different machines:
• a desktop PC with an Intel Core Duo processor and 2 GB of RAM, running Windows Vista,
32-bit version;
• an Asus notebook with an Intel Core i7 2.0 GHz processor and 6 GB of RAM, running both
Windows 7, 64-bit version, and, in a separate partition, Ubuntu 10, 64-bit version;
• a Fujitsu Siemens notebook with an Intel Core 2 Duo P8400 processor (2.26 GHz) and 4 GB of RAM,
running Ubuntu 10, 64-bit version.
Under Windows, Microsoft Visual C++ Express has been used, setting the compiler to take
advantage of the multicore processor and to optimize the code for faster execution. Under Linux,
the NetBeans 6.9.1 suite has been used, with the g++ compiler configured for multicore, 64-bit
machines.
The simulation has provided consistent results on all the platforms. To ensure the greatest
precision in evaluating the algorithms' performance, the high-resolution timing mechanisms provided
by both systems have been used: the windows.h library in Windows (where the minimum measurable
interval is approximately 10 microseconds, with a resolution of 1,190,000 ticks per second) and the
time.h library in Linux (where, using the clock_gettime function, the interval resolution is 1 nanosecond). At
each iteration, the number of intervals consumed by every algorithm is stored and then, at the end
of the cycle, the median value is extracted for each algorithm; the median, unlike the average
(which is computed anyway), is not affected by extreme, out-of-scale values, which can be caused by
unpredictable peaks of requests for OS services: this is especially true for large testing sets. Both
values (median and average) are expressed in milliseconds and rounded to the microsecond.
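The benchmarking loop described above can be sketched in Python as follows; the actual test suite is written in C++ and uses the platform timers just mentioned, so time.perf_counter_ns, the function name benchmark, and the placeholder algorithm set are stand-ins of ours.

import random
import statistics
import time

def benchmark(algorithms, size, iterations):
    """Run every algorithm on a copy of the same freshly generated array at each
    iteration, and report median and average running times in milliseconds."""
    timings = {name: [] for name in algorithms}
    for _ in range(iterations):
        data = [random.randint(0, 10**9) for _ in range(size)]   # new input each iteration
        for name, algo in algorithms.items():
            copy = list(data)                                     # every algorithm sees the same input
            start = time.perf_counter_ns()
            algo(copy)
            timings[name].append((time.perf_counter_ns() - start) / 1e6)
    return {name: (statistics.median(t), statistics.mean(t)) for name, t in timings.items()}

# Example usage with Python's built-in sort as a placeholder algorithm:
print(benchmark({"sorted": sorted}, size=10_000, iterations=50))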
The first test suite has been run on random arrays; then a few specific cases are examined: ordered
arrays, inversely ordered arrays, and partially ordered ones. We tried to make the tests as extensive
as possible, given the time required to sort huge arrays; in detail, the number of iterations has been
fixed depending on the array size, as reported in Table 2.
Table 2: Relation between array size and number of test cases

Array size              | Test cases
100 to 102,400          | 10,000
204,800 to 409,600      | 50,000
819,200                 | 25,000
1,638,400               | 10,000
3,276,800               | 5,000
6,553,600 to 26,214,400 | 1,000
50,000,000+             | 500
4.1 Random Arrays
Tests on random arrays show consistent performance for NeatSort as the size of the arrays grows. For
small arrays, the best-performing algorithm is the implementation of random Quicksort provided
here, which had been optimized for best performance. For larger arrays, however, this algorithm's
performance progressively degrades, while NeatSort and qsort steadily grow with n log n, as highlighted
by the logarithmic scale used to visualize the results (Figure 6). The results have been averaged
over all the testing platforms.
Figure 4: Execution time (ms) on random arrays
Figure 5: Details of previous chart for arrays of size ≤ 50K elements
Figure 6: Execution time (ms) on random arrays - log10 scale
4.2 Sorted arrays (Most favourable case)
Sorted arrays are the most favourable case for NeatSort, and indeed the measured performance shows
that NeatSort's running time is several orders of magnitude smaller than that of the other algorithms.
Figure 7: Execution time (ms) on sorted arrays
Figure 8: Execution time (ms) on sorted arrays - log10 scale
4.3 Statistics about performance and disorder metrics
So far, we have only examined the two extremes of the input landscape; sorted arrays are, by design,
the most favourable case for NeatSort, and it would be reasonable to expect that, when run on nearly
sorted arrays, the algorithm largely benefits from the analysis phase and shows superior
performance. To further investigate this issue, we ran a series of comparative tests on qsort and
NeatSort, gathering, together with the performance measurements, a set of statistics about the degree
of disorder of the input, with the goal of bringing out correlations between the relative performance
and
• the number of inversions,
• the max distance of elements from their position in the sorted sequence,
• the number of runs in the input array.
For each of these metrics, two charts are shown:
1. a 2D chart stressing correlation between the metric and the relative performance of NeatSort
in comparison to qsort;
2. a 3D chart, where each point in the R² domain corresponds to an input sequence identified by
its size and by its value of the metric.
In both charts, the relative performance is expressed as a percentage and computed as
(Tqsort − TNeatSort) / max(Tqsort, TNeatSort) × 100 .
Positive values thus indicate better performance for NeatSort (the greater the absolute value, the better),
while negative values indicate cases in which qsort outperformed NeatSort. Values are shown
using a gradient going from green (positive values), to yellow (ties), to red (negative values).
In the 2D charts, the size of the dots is proportional to the size of the test case.
4.3.1 Inversions
The number of inversions is shown as a percentage of the maximum number of possible inversions
for the input size: for a sequence of length n, there can be at most n(n−1)/2 inversions.
As expected, the data plots a bowl-shaped figure with a minimum corresponding to 50% of
inversions, while ordered sequences (0% of inversions) and reversed sequences (100% of inversions)
represent the best-case scenario for NeatSort. The figure also shows a separate cluster, with
almost constant relative performance, corresponding to larger input sequences. As is also
clarified by the 3D chart, this anomaly in the results testifies that the performance delta in favour of
NeatSort grows with the size of the input.
Figure 9: Relative performance for NeatSort and qsort, with respect to percentage of inversions
(spots proportional to array size)
Figure 10: Relative performance for NeatSort and qsort, with respect to percentage of inversions
and size
4.3.2 Max distance
The charts in this section show the relative performance with respect to the maximum distance
of elements in the input (expressed as a percentage of the input length). Interestingly enough,
NeatSort’s relative performance steadily improves not only as the max distance becomes smaller,
but also as the size of the input grows.
Figure 11: Relative performance for NeatSort and qsort, with respect to max distance / array size
(spots proportional to array size)
Figure 12: Relative performance for NeatSort and qsort, with respect to max distance / array size
and size
4.3.3 Runs
The number of runs is expressed as a percentage of the input length; Figures 13 and 14 show, as
expected, much the same shape as Figure 9: 0% runs corresponds to a sorted sequence, and as the
number of runs grows from 50% (the global minimum for relative performance) to 100% (hence
toward reversed sequences), NeatSort performs increasingly better.
Interestingly, a local maximum is present for sequences with nearly 60K elements and 78% runs.
Figure 13: Relative performance for NeatSort and qsort, with respect to runs/size (spots proportional
to array size)
Figure 14: Relative performance for NeatSort and qsort, with respect to runs/size and size
5 Conclusions
We have presented an intuitive adaptive sorting algorithm that proves to be optimal for most of
the disorder metrics present in the literature. Although other algorithms exist that outperform it on
peculiar ad hoc metrics (in particular Melsort [Skiena(1988)]), those algorithms, as most of the
adaptive algorithms in the literature, have such an intricate workflow that their implementations turn
out to be slower by some orders of magnitude in comparison with Mergesort or Quicksort. For NeatSort,
instead, we have both carefully engineered its design to be as simple as possible and tuned its
implementation to make it extremely efficient. The result is a flexible and fast algorithm which
on average is as efficient as Quicksort and outperforms even the C implementation of qsort: the
ratio between the running times of NeatSort and qsort, besides being consistently below 1, gets
progressively smaller as the number of inversions moves from 50% towards both 0% and 100%, i.e.,
towards sequences sorted in direct and inverse order.
6 Acknowledgments
Charts in Sections 4.1 and 4.2 have been created with Microsoft Excel Starter 2010, while the remaining
charts have been created with the Matplotlib Python library.
References
[Estivill-Castro and Wood(1989)] Vladimir Estivill-Castro and Derick Wood. A new measure of
presortedness. Information and Computation, 83(1):111–119, 1989.
[Estivill-Castro and Wood(1990)] Vladimir Estivill-Castro and Derick Wood. A generic adaptive
sorting algorithm. University of Waterloo, Computer Science Department, 1990.
[Estivill-Castro and Wood(1992)] Vladimir Estivill-Castro and Derick Wood. A survey of adaptive
sorting algorithms. ACM Computing Surveys (CSUR), 24(4):441–476, 1992.
[Knuth(1973a)] Donald E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming, section 5.2.1. Addison-Wesley, Reading, Massachusetts, second edition, 10 January
1973a.
[Knuth(1973b)] Donald E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming, page 161. Addison-Wesley, Reading, Massachusetts, second edition, 10 January
1973b.
[Levcopoulos and Petersson(1989)] Christos Levcopoulos and Ola Petersson. A note on adaptive
parallel sorting. Information Processing Letters, 33(4):187–191, 1989.
[Levcopoulos and Petersson(1990)] Christos Levcopoulos and Ola Petersson. Sorting shuffled monotone sequences. Springer, 1990.
[Mannila(1985)] Heikki Mannila. Measures of presortedness and optimal sorting algorithms. IEEE Transactions on Computers, C-34(4):318–325, 1985.
[Moffat and Petersson(1991)] Alistair Moffat and Ola Petersson. Historical searching and sorting.
In ISA’91 Algorithms, pages 263–272. Springer, 1991.
[Petersson and Moffat(1995)] Ola Petersson and Alistair Moffat. A framework for adaptive sorting.
Discrete Applied Mathematics, 59(2):153–179, 1995.
[Skiena(1988)] Steven S. Skiena. Encroaching lists as a measure of presortedness. BIT Numerical
Mathematics, 28(4):775–784, 1988.