There are various methods of sorting lists, and a programmer has to weigh them against the build and length of the lists at hand in order to implement the most efficient sorting method.
Insertion sort is a sorting algorithm that iterates through the entire length of the list and, at each iteration, takes the first element of the unsorted part, say x, and inserts it into its correct position in the sorted part: the position where the element before x is no bigger than x and the element after x is no smaller (or where x lands at the very beginning or very end of the sorted part). After each iteration the sorted part grows by one element and the unsorted part shrinks by one, until the unsorted part no longer exists and the entire list is sorted. Its worst case occurs when the list is ordered decreasingly, as at every iteration the first element of the unsorted part has to be inserted at the very front of the sorted part, which means traversing the entire sorted part. Since this happens n times (n = length of list), insertion sort is O(n^2): up to n steps to traverse and insert, done n times. Its best case occurs when the list is already ordered increasingly, as no elements are displaced (the sorted part simply grows at every iteration), so the best case performance is O(n). On lists that are already nearly sorted it is more efficient than, say, selection or bubble sort, as there are few inversions left to undo and it therefore performs far fewer comparisons and moves.
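The description above can be sketched in Python (a minimal illustrative version, not the textbook's exact code; the shifting loop is the "traverse the sorted part" step from the paragraph):

```python
def insertion_sort(lst):
    # Invariant: lst[:i] is the sorted part before each pass.
    for i in range(1, len(lst)):
        x = lst[i]  # first element of the unsorted part
        j = i - 1
        # Shift bigger sorted elements one slot right to open a gap for x.
        while j >= 0 and lst[j] > x:
            lst[j + 1] = lst[j]
            j -= 1
        lst[j + 1] = x
    return lst
```

On an already increasing list the inner while loop never runs, which is exactly the O(n) best case; on a decreasing list it runs all the way to the front every time, giving the O(n^2) worst case.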
Selection sort is a different but similar sorting algorithm: at each iteration it takes the minimum value from the unsorted part of the list and puts it at the end of the sorted part (as with insertion sort, the sorted part grows by one and the unsorted part shrinks by one after each iteration). To find that minimum, the function has to iterate over all of the elements of the unsorted part at every iteration, to 'make sure' that no other number is smaller than the 'current minimum'. This algorithm is not the most efficient because, as it happens, even when the list is already sorted it still scans every remaining element each time it adds one element to the sorted part. Its runtime is therefore always the same: roughly (n-1) + (n-2) + ... + 1 = n(n-1)/2 comparisons, i.e. O(n^2) in the best case as well as the worst.
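A minimal sketch of selection sort (illustrative, not the course's exact code); note that the inner scan always runs over the whole unsorted part, which is why the runtime never improves on sorted input:

```python
def selection_sort(lst):
    n = len(lst)
    for i in range(n - 1):
        # Scan the entire unsorted part lst[i:] for the minimum,
        # even if the list happens to be sorted already.
        min_idx = i
        for j in range(i + 1, n):
            if lst[j] < lst[min_idx]:
                min_idx = j
        # Place the minimum at the end of the sorted part.
        lst[i], lst[min_idx] = lst[min_idx], lst[i]
    return lst
```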
Bubble sort is another algorithm that is in some cases more efficient than selection sort and in others less. The algorithm compares pairs of adjacent values in the unsorted part of the list (in this algorithm the sorted part sits at the end of the list, growing downwards until the first element of the list is also sorted). If the element at index i is bigger than the element at index i+1, the two are swapped; i is then incremented by one and the next pair is compared, until the end of the unsorted part is reached, at which point the largest remaining element has 'bubbled up' to the beginning of the sorted part. As with the other sorting algorithms, this process is repeated until the sorted part is the whole list. Just like the previously mentioned algorithms, bubble sort's worst case occurs when the list is ordered decreasingly, as there are up to n swaps per pass, over n passes (n being the length of the list), so bubble sort is O(n^2). One of the main advantages of bubble sort, however, is its ability to quickly recognise whether a list is already ordered: a pass that compares every adjacent pair without making a single swap proves the list is sorted, so the algorithm can stop after one O(n) pass.
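The early-exit advantage mentioned above is usually implemented with a "swapped" flag; a minimal sketch (illustrative, not the course's exact code):

```python
def bubble_sort(lst):
    n = len(lst)
    for end in range(n - 1, 0, -1):
        swapped = False
        # Each pass bubbles the largest remaining element up to index `end`,
        # the current boundary of the sorted part at the tail of the list.
        for i in range(end):
            if lst[i] > lst[i + 1]:
                lst[i], lst[i + 1] = lst[i + 1], lst[i]
                swapped = True
        if not swapped:
            break  # no swaps in a full pass: the list is already sorted
    return lst
```

On an already sorted list the first pass makes no swaps and the loop exits immediately, which is the O(n) best case.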
Merge sort is a rather efficient algorithm because it applies the method of 'divide and conquer'. It continually splits the to-be-ordered list in half, using recursion. If the lists we end up with are empty or have only one item, they are ordered by definition. Otherwise, bigger lists are rebuilt by comparing the first elements of two smaller lists at a time and merging them. Bigger and bigger lists are recreated in this way until the final list is made and there are no longer two lists left to compare, as all of them have been merged into one, in order. The worst case performance of this algorithm is O(n log n): the list is split log n times (a list of length 2 is split once; a list of 4 twice, making two lists of size 2 that are sorted before being merged together; a list of size 8 splits into two lists of 4, then 2, and so on), and after the splitting comes the merging, where the function has to move all n elements into the new lists at each level, i.e. n passes per level over log n levels, giving n log n overall. Unlike quicksort, this O(n log n) bound also holds in the worst case, although quicksort is often faster in practice on average inputs.
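The split-then-merge process can be sketched as follows (a minimal illustrative version; the base case is the "ordered by definition" observation, and the while loop is the first-element comparison described above):

```python
def merge_sort(lst):
    # Base case: empty or single-element lists are sorted by definition.
    if len(lst) <= 1:
        return lst
    # Divide: split the list in half and sort each half recursively.
    mid = len(lst) // 2
    left = merge_sort(lst[:mid])
    right = merge_sort(lst[mid:])
    # Conquer: merge by repeatedly taking the smaller front element.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One of the halves is exhausted; append the remainder of the other.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```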
The quicksort algorithm also uses a divide and conquer method. It partitions the list by picking a pivot value (often chosen at random) and placing two position markers at the beginning and at the end of the remaining unsorted items. It then moves items that are on the wrong side of the pivot to the other side (smaller values to the left, bigger values to the right) while converging the left and right position markers towards the split point. After each iteration the markers approach each other until they cross. This signifies that the split point has been found: the pivot is placed there, all the values to its left are smaller than it and all the values to its right are bigger, and the two halves are then sorted recursively in the same way. On average quicksort is O(n log n), as each partition pass touches all n remaining items (the distance travelled by the left marker plus the distance travelled by the right marker equals the size of the sublist) over roughly log n levels of recursion. Its worst case, however, occurs when the pivot is always the biggest (or smallest) value in the unsorted part, which is rather rare: the partitions are then maximally unbalanced and the runtime degrades to O(n^2).
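The two-marker partition described above can be sketched like this (a minimal illustrative version using the first element as pivot; a random pivot choice, as the paragraph notes, makes the O(n^2) worst case unlikely):

```python
def quick_sort(lst, first=0, last=None):
    if last is None:
        last = len(lst) - 1
    if first < last:
        pivot = lst[first]  # pivot choice: first element, for simplicity
        left, right = first + 1, last
        # Converge the markers, swapping items on the wrong side of the pivot.
        while True:
            while left <= right and lst[left] <= pivot:
                left += 1
            while left <= right and lst[right] >= pivot:
                right -= 1
            if left > right:  # markers crossed: split point found
                break
            lst[left], lst[right] = lst[right], lst[left]
        # Put the pivot at the split point, then sort both halves.
        lst[first], lst[right] = lst[right], lst[first]
        quick_sort(lst, first, right - 1)
        quick_sort(lst, right + 1, last)
    return lst
```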
The final sorting algorithm that was discussed in this course is Python's built-in sort, Timsort. It is a hybrid that combines merge sort and insertion sort, using each one where it is most efficient to do so (with respect to the size of the list, how ordered it already is, and a number of other factors). It essentially does the programmer's work of choosing the most efficient sorting approach. However, if one does not trust, or does not wish to use, the great algorithm created by Tim Peters, one can always use the sorting algorithms that are explained in this post.
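In practice one rarely writes these algorithms by hand; Python exposes Timsort through list.sort and sorted(). A short usage example (the data here is made up for illustration), which also shows that the built-in sort is stable, i.e. equal keys keep their original relative order:

```python
data = [("eve", 30), ("bob", 25), ("alice", 25)]
# sorted() returns a new list, ordered by the key function; here, by age.
# Timsort is stable, so bob stays before alice (both aged 25).
by_age = sorted(data, key=lambda pair: pair[1])
print(by_age)  # [('bob', 25), ('alice', 25), ('eve', 30)]
```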