Skip to content

Files

Latest commit

783bfb0 · Feb 15, 2019

History

History

data

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
Dec 17, 2018
Feb 15, 2019
Dec 17, 2018
Dec 17, 2018

WikiSort datasets

Command line arguments

The common code base parses command line arguments when running the test programs. The following options are available.

  • -m MAX_LEN set the maximum array length to test (inclusive) (default 1000)
  • -s START_LEN set the starting array length (default 100)
  • -i LEN_INC set the amount by which the array should grow; if linear growth is set, the length increases by this much every time, otherwise the length is scaled up by this amount (default 100)
  • -e enables exponential array length growth, thus disabling linear growth (default linear growth)
  • -t NUM_TRIALS sets the number of trials to run for each array length before computing an average (default 10)
  • -o OUTPUT sets the filename of the output file; this flag must be set

Description

The data included in the repository was generated by timing the execution of sorting algorithms on arrays.

Filename Description
ID.ods, ID 2.ods This test, which used random data, compared the runtime of Shellsort on lists containing integers and doubles. The first test used 3-smooth numbers as the gaps while the second used Sedgewick's single formula sequence from 1986. In the first test, the program was compiled without optimizations and integer comparison was measured to be slower than double comparison. In the second test, the program was compiled with optimizations and integer comparison was significantly faster than double comparison. The integer and double lists contained elements with the same integral portion. The elements in the double list also included two decimal places.
IHQ.ods, IHQ 2.ods This test used random data and compared the runtime of introsort, heapsort, and quicksort. The introsort algorithm uses both the two other sorting algorithms' techniques. The data shows that heapsort is much slower for large data sets while introsort isn't much faster than quicksort.
Shell Gaps.ods This test used random data and compared the runtime of Shellsort using different gap sequences. The data shows that under the given experimental conditions, the single-formula sequence described by Sedgewick in 1986 was the fastest by a margin of several thousand microseconds with large arrays. The two sequences described by Pratt in 1971 were by far the slowest, taking over 10 times as long as the other sequences to sort any array over 2000 elements in length.
Shell Gap Generation.ods To determine the overheads of different gap sequences used with Shellsort, this test compares the runtimes of just the gap sequence generation. This is including memory deallocation time. The data shows that the sequence described by Pratt in 1971 using 3-smooth numbers was the slowest, taking over 10 times as long as the other sequences to generate. The sequence described by Incerpi and Sedgewick in 1985 was the second slowest, but only taking on the order of 4-5 microseconds compared to over 30 with Pratt's sequence even for short arrays. The other sequences were all comparatively trivial to generate, taking only 2-3 microseconds or less.
Comb Shrink Factors.ods This test used random data and compared the runtime of comb sort using different shrink factors. The original author of the paper in which comb sort was described suggested that 1.3 was the ideal shrink factor for comb sort. However, the data shows that under the given experiemental conditions, 1.4 and 1.5 both outperformed this value. A shrink factor of 1.6 also outperformed 1.3 for most of the attempts. Using 1.4 resulted in the sorting operation taking an average of almost 36 milliseconds less than 1.3.
List Gaps (Merge).ods, List Gaps (Cocktail).ods, List Gaps (Cocktail) 2.ods These tests generated lists of integers that were random, in reverse order, kind of sorted, or that had few unique elements. These were then sorted using different sorting algorithms. The data shows that sorting the reversed lists was fastest for mergesort. One possible hypothesis to explain this is branch prediction. Since the input array was reversed, the merge operation always took the elements from the second sub-array first, as these were always the largest in the set. With large arrays, this led to the millisecond runtime difference shown. With the cocktail sort, having few unique elements had next to no impact on the runtime when compared to having random elements. However, since cocktail sort is an exchange sort, the reversed lists took the longest to sort, as would be expected.
Memoization.ods In this test, the Shellsort algorithm was used with and without memoization of the gap sequences. The tests used Shell's 1959 sequence and Pratt's sequence of 3-smooth numbers, the latter of which was found to be the slowest in the test on Shell gap generation. The data shows that gap generation is comparatively fast enough that the use of memoization does not prevent any noticeable overhead during sorting. At longer array lengths, the sorting time was on the order of thousands of microseconds, over a hundred times longer than required to generate Pratt's sequence even at arrays with ten times the length. Together with the data from the Shell gap generation test, it can be easily concluded that gap sequence memoization does not provide any benefit.