Investigation of sorting algorithms on a variety of computers

1990, Computers in Physics

Four sorting algorithms (bubble, insertion, heap, and quick) are studied on an IBM 3090/600, a VAX 11/780, and the NYU Ultracomputer. It is verified that for N items the bubble and insertion sorts are of order $N^2$ whereas the heap and quick sorts are of order $N \ln N$. It is shown that the choice of algorithm is more important than the choice of machine. Moreover, the influence of paging on algorithm performance is examined.

Marvin Bishop,a) Jordan D. Perel, and Victoria Swan
Department of Mathematics and Computer Science, Manhattan College, Riverdale, New York 10471
(Received 18 April 1988; accepted 1 May 1989)

Citation: Computers in Physics 4, 179 (1990); doi: 10.1063/1.168362
View online: https://doi.org/10.1063/1.168362
Published by the American Institute of Physics

Many algorithms have been developed to perform sorting [1,2]. These can be classified on the basis of their time complexity.
Given N integers, one asks what time T(N) is required to sort them. In many cases T(N) can be determined analytically. Usually T(N), for large N, is of the form $AN^2$ or $BN \ln N$. Here, A and B are constants for a given computer; they reflect the cycle time and architecture of a particular machine.

In this Note we study four sorting methods (bubble sort, insertion sort, heap sort, and quick sort) on a variety of computers: an IBM 3090/600, a VAX 11/780, and the NYU Ultracomputer. The same random number generator is used on all machines to produce the integers for sorting. These integers are stored in an array and the sorting routines operate on that array. The four sorting routines considered here are written in FORTRAN and have been taken from the textbooks of Boillot [3] (bubble sort) and Press et al. [4] (insertion, heap, and quick sorts). The appropriate computer system timer has been employed to obtain T(N) in milliseconds.

The results of our study are presented in Table I and Figs. 1-4. The coefficients A and B are listed in Table I; they have been determined by graphical fits to the T(N) vs N data.

Consider first the $N^2$-order sorts. The ratios of the A's for the bubble and insertion sorts are 120.2 and 127.1, respectively, for Ultracomputer/IBM and 7.1 and 7.2 for VAX/IBM. These ratios reflect the difference in raw speed of the computers; the IBM is the fastest and the Ultracomputer is the slowest. It should be noted that the special features of these machines have not been used in this study; the IBM 3090/600 is a vector computer and the NYU Ultracomputer is an eight-processor parallel machine (each processor is a 10-MHz Motorola 68010 unit). We have done all the timing runs for serial code and thus have used only one processor in the Ultracomputer. Moreover, the current Ultracomputer is an early prototype and the timings reported here are not indicative of the speed of future models that will contain many more and much faster processing units.

The curves of Figs.
1 and 2 illustrate the $N^2$ order of the bubble and insertion sorts. The straight lines have been drawn with slope A. These figures further indicate the relative raw speeds of the different machines. The ratio of the A's for bubble/insertion sorts for a given computer is 4.51 (Ultracomputer), 4.73 (VAX), and 4.77 (IBM). Thus the bubble sort is significantly slower than the insertion sort. Indeed, insertion sort on the VAX is almost as fast as bubble sort on the IBM, even though the IBM computer is about seven times faster for either of these sorts.

Now consider the $N \ln N$-order sorts. The ratios of the B's for the heap and quick sorts are 97.0 and 149.5, respectively, for Ultracomputer/IBM and 6.2 and 8.7 for VAX/IBM. Again these reflect the raw processing speed. The ratio of the B's for heap/quick sort for a given machine is 1.01 (Ultracomputer), 1.10 (VAX), and 1.55 (IBM), demonstrating that there is not a great difference in performance between heap and quick sorts.

Of course the ratio of the orders of the sorts, $AN/(B \ln N)$, indicates that for large N the slowest machine will do better than the fastest if we use the proper sorting routine. In our case quick sort on the Ultracomputer for 1 000 000 integers will be nearly 215 times faster than bubble sort on the IBM, even though the IBM machine is about 130 times faster than the Ultracomputer.

TABLE I. The timing results.

Sort        Machine   Order coefficient
Bubble      Ultra     0.1290
            VAX       0.00762
            IBM       0.0010727
Insertion   Ultra     0.0286
            VAX       0.00161
            IBM       0.000225
Heap        Ultra     0.3651
            VAX       0.0232
            IBM       0.003763
Quick       Ultra     0.36205
            VAX       0.0210
            IBM       0.002421

a) Present address: Department of Chemistry, UMIST, Manchester M60 1QD, England.

FIG. 1. Bubble sort. [Plot of time versus $N^2/10^8$; symbols: * IBM, + VAX, x Ultra.]

FIG. 3. Heap sort.

Figures 3 and 4 present the timing data for heap and quick sorts, respectively.
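The crossover estimate quoted above for 1 000 000 integers follows directly from the Table I coefficients, and the arithmetic can be checked with a minimal Python sketch. The constant and function names here are illustrative and not from the original Note; the sketch assumes only that both coefficients are expressed in the same time units, and the break-even loop is an added illustration rather than a result reported by the authors.

```python
import math

# Order coefficients copied from Table I (assumed to share the same time units).
A_BUBBLE_IBM = 0.0010727   # bubble sort on the IBM:        T(N) ~ A * N**2
B_QUICK_ULTRA = 0.36205    # quick sort on the Ultracomputer: T(N) ~ B * N * ln(N)


def speedup(n, a, b):
    """Ratio of the two timing models: (a*n**2) / (b*n*ln n) = a*n / (b*ln n)."""
    return (a * n) / (b * math.log(n))


def break_even(a, b, start=2):
    """Smallest n >= start at which the N**2 model exceeds the N ln N model."""
    n = start
    while a * n * n <= b * n * math.log(n):
        n += 1
    return n


ratio = speedup(1_000_000, A_BUBBLE_IBM, B_QUICK_ULTRA)
crossover_n = break_even(A_BUBBLE_IBM, B_QUICK_ULTRA)

print(f"speed-up at N = 1 000 000: {ratio:.1f}")
print(f"models cross near N = {crossover_n}")
```

The ratio printed for N = 1 000 000 lands just under 215, consistent with the figure quoted in the text. The break-even value depends entirely on the fitted coefficients, so it should be read as a rough indication only.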
The straight lines have been drawn with slope B. The unusual feature of these plots is the behavior of heap sort on the VAX. Indeed, it seems as if the heap sort algorithm is of order $N \ln N$ for N below a maximum of about 150 000; for larger N's the order is larger than $N \ln N$.

This behavior is not surprising when we consider some details of the VAX architecture. It is a paging machine for which the page size is 512 bytes. The entire program does not need to reside in central memory for a paged machine [5]. Pages are transferred from backing store on demand. Since an integer is 4 bytes long, each VAX page can hold 128 integers. The working set size, the number of active pages needed for the program to run, is fixed at a maximum of 1000 for the VAX at Manhattan College. Hence, as N grows to 128 000 the page fault rate will increase greatly. After this critical number of pages is in use, most of the computer time will be spent swapping pages to and from the backing store. That is the reason for the dramatic increase in time illustrated in Fig. 3. The IBM is also a paged machine but its page size is eight times larger than that of the VAX and thus the swapping effects will not show up until a larger N is reached. The NYU Ultracomputer is not a paged machine and these effects are not present.

The observation that the paging only influences the heap sort and not the other sorts is explained by the manner in which memory addresses are referenced in the different sorts. For both bubble and insertion sorts the algorithm is coded as a set of nested sequential loops. In quick sort the initial unsorted data set is partitioned into small sequential files. These three sorts have good "locality" and their address reference strings are likely to be on the same page. In the heap sort, however, the address references jump around the heap and the data will therefore be found on different pages.

Either heap or quick sort should be used when sorting large data files.
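The paging argument above rests on two pieces of arithmetic: how many integers fit in the working set, and how widely a heap traversal scatters its references compared with a sequential scan. The following Python sketch illustrates both. It is an illustration only, not the authors' FORTRAN code: the names are hypothetical, the array is assumed to start on a page boundary, and the heap walk follows only left children from the root as a simplified stand-in for one sift-down path.

```python
PAGE_BYTES = 512           # VAX page size quoted in the text
INT_BYTES = 4              # size of one integer quoted in the text
WORKING_SET_PAGES = 1000   # working-set limit quoted in the text

INTS_PER_PAGE = PAGE_BYTES // INT_BYTES
RESIDENT_LIMIT = INTS_PER_PAGE * WORKING_SET_PAGES


def page_of(index):
    """Page holding 1-based array element `index` (array assumed page-aligned)."""
    return (index - 1) // INTS_PER_PAGE


def pages_on_heap_path(n):
    """Distinct pages touched walking root -> left child -> ... in an n-element heap."""
    pages = set()
    i = 1
    while i <= n:
        pages.add(page_of(i))
        i *= 2          # left child of node i in a 1-based array heap
    return len(pages)


def pages_on_scan(start, count):
    """Distinct pages touched reading `count` consecutive elements from `start`."""
    return len({page_of(i) for i in range(start, start + count)})


n = 200_000
path_length = n.bit_length()   # number of nodes on the left spine of the heap

print("integers per page:", INTS_PER_PAGE)
print("integers that fit in the working set:", RESIDENT_LIMIT)
print("pages touched by one heap path:", pages_on_heap_path(n))
print("pages touched by a scan of equal length:", pages_on_scan(1, path_length))
```

Under these assumptions a single root-to-leaf walk touches roughly one new page per level once the heap outgrows the first page, while a sequential scan of the same number of elements stays within one or two pages. That difference in locality is the effect the text describes.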
If one is using a paged machine, the quick sort algorithm is preferred.

ACKNOWLEDGMENTS

This research has been supported by the Donors of the Petroleum Research Fund, administered by the American Chemical Society, and the Manhattan College Computer Center. It was conducted using the supercomputing resources of the Center for Theory and Simulation in Science and Engineering at Cornell University, which receives funding in part from the National Science Foundation, New York State, and the IBM Corporation. We thank Professor Malvin H. Kalos for providing Ultracomputer time for our investigation. We also wish to thank John Hanley and Patricia J. Teller for helpful discussions and David Brown for assistance with the figures.

FIG. 2. Insertion sort. [Plot of time versus $N^2/10^8$; symbols: * IBM, + VAX, x Ultra.]

REFERENCES

1. D. E. Knuth, The Art of Computer Programming (Addison-Wesley, Reading, MA, 1973).
2. R. Sedgewick, Algorithms (Addison-Wesley, Reading, MA, 1983).
3. M. Boillot, Understanding FORTRAN 77 (West, St. Paul, 1987).
4. W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes: The Art of Scientific Computing (Cambridge U.P., New York, 1986).
5. A. Silberschatz and J. L. Peterson, Operating Systems Concepts (Addison-Wesley, Reading, MA, 1988).