In Randomized Quick Sort, a Random Element Is Chosen as the Pivot Element


Course Code: MCS-031
Course Title: Design and Analysis of Algorithms
Assignment Number: MCA(3)/031/Assign/2011
Maximum Marks: 100
Weightage: 25%
Last Dates for Submission: 15th April, 2011 (For January Session); 15th October, 2011 (For July Session)
There are four questions in this assignment, which together carry 80 marks. The remaining 20 marks are for the viva-voce. Answer all the questions. You may use illustrations and diagrams to enhance the explanations. Please go through the guidelines regarding assignments given in the MCA Programme Guide for the format of presentation. The examples, whenever asked to be given, should be different from those discussed in the course material.

Question 1:
(a) What is Randomized Quicksort? Analyse the expected running time of Randomized Quicksort, with the help of a suitable example. (5 Marks)

Ans :

In Randomized Quicksort a random element is chosen as the pivot element. In very early versions of quicksort, the leftmost element of the partition would often be chosen as the pivot element. Unfortunately, this causes worst-case behaviour on already sorted arrays, which is a rather common use case. The problem is easily solved by choosing a random index for the pivot, choosing the middle index of the partition, or (especially for longer partitions) choosing the median of the first, middle and last element of the partition as the pivot. Consider a zero-sum game between player A, whose strategies are deterministic algorithms, and player B, whose strategies are inputs for A's algorithms. The cost of a strategy profile is the running time of A's chosen algorithm on B's chosen input. Therefore, player A tries to minimize the cost, and player B tries to maximize it. In the world of pure strategies, for every algorithm that A chooses, B may choose the most costly input; this is the worst-case scenario, and it can be found using standard complexity analysis. But in the real world, inputs are normally not selected by an evil opponent; rather, they come from some distribution over inputs. Since this is the case, if we allow the algorithms also to be drawn from some distribution, we may look at the game as one that allows mixed strategies. That is, each player chooses a distribution over its strategies.

Analysis

Incorporating mixed strategies into the game allows us to use von Neumann's minimax theorem:

    min over R of max over D of T(R, D)  =  max over D of min over A of T(A, D)

where R is a distribution over the algorithms, D is a distribution over inputs, A is a single deterministic algorithm, and T(A, D) is the average running time of algorithm A on inputs drawn from D. More specifically, if we limit the set of algorithms to a specific family (for instance, all deterministic choices for pivots in the quicksort algorithm), choosing an algorithm A from R is equivalent to running a randomized algorithm (for instance, running quicksort and randomly choosing the pivots at each step). This gives us an insight into Yao's principle, which states that the expected cost of any randomized algorithm for solving a given problem, on the worst-case input for that algorithm, can be no better than the expected cost, for a worst-case random probability distribution on the inputs, of the deterministic algorithm that performs best against that distribution.
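For completeness, here is a standard sketch (not part of the original answer) of the expected running time. If the pivot is chosen uniformly at random, each of the n possible pivot ranks is equally likely, and the partitioning step costs n - 1 comparisons, giving the recurrence

    E[T(n)] = (n-1) + \frac{1}{n}\sum_{k=1}^{n}\bigl(E[T(k-1)] + E[T(n-k)]\bigr)
            = (n-1) + \frac{2}{n}\sum_{k=0}^{n-1} E[T(k)],

which solves to E[T(n)] = O(n log n), about 2n ln n (roughly 1.39 n log2 n) comparisons in expectation. For example, on the already sorted input (1, 2, ..., n) a fixed leftmost pivot forces roughly n^2/2 comparisons, while a random pivot still gives O(n log n) comparisons in expectation.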

Algorithm:

    #include <cstdlib>      // rand()
    #include <algorithm>    // std::swap
    using std::swap;

    // Randomized partition: choose a random pivot in Array[l..r], partition
    // around it, and return the pivot's final index.
    int RandPartition(int Array[], int l, int r)
    {
        int piv = l + (rand() % (r - l + 1));    // random pivot index in [l, r]
        swap(Array[l], Array[piv]);              // move the pivot to the front
        int i = l + 1;
        int j = r;
        while (1) {
            while (i < r && Array[i] <= Array[l]) ++i;   // skip elements <= pivot
            while (j > l && Array[j] >  Array[l]) --j;   // skip elements >  pivot
            if (i >= j) {
                swap(Array[j], Array[l]);                // put the pivot in place
                return j;
            }
            swap(Array[i], Array[j]);
        }
    }

    void RandQuickSort(int Array[], int l, int r)
    {
        if (l >= r) return;
        int p = RandPartition(Array, l, r);
        RandQuickSort(Array, l, p - 1);
        RandQuickSort(Array, p + 1, r);
    }
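A minimal driver (not part of the original answer; the array values are illustrative) that seeds the random number generator and sorts a small array using the routine above:

    #include <cstdio>
    #include <cstdlib>
    #include <ctime>
    // (RandPartition and RandQuickSort as defined above.)

    int main(void)
    {
        int Array[] = {7, 4, 5, 1, 3, 2, 6};
        int n = sizeof(Array) / sizeof(Array[0]);
        srand((unsigned) time(NULL));              // seed the pivot choices
        RandQuickSort(Array, 0, n - 1);
        for (int k = 0; k < n; ++k)
            printf("%d ", Array[k]);               // prints: 1 2 3 4 5 6 7
        printf("\n");
        return 0;
    }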

(b) Explain the Greedy Structure algorithm. Give an example in which the Greedy technique fails to deliver an optimal solution.

Ans: A greedy structure algorithm is any algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage[1] with the hope of finding the global optimum. For example, applying the greedy strategy to the traveling salesman problem yields the following algorithm: "At each stage visit the unvisited city nearest to the current city". In general, greedy algorithms are used for optimization problems, and they have five pillars:

1. A candidate set, from which a solution is created;
2. A selection function, which chooses the best candidate to be added to the solution;
3. A feasibility function, which is used to determine whether a candidate can be used to contribute to a solution;
4. An objective function, which assigns a value to a solution, or a partial solution; and

5. A solution function, which indicates when we have discovered a complete solution.
Types

Greedy algorithms can be characterized as being 'short-sighted' and 'non-recoverable'. They are ideal only for problems which have 'optimal substructure'. Despite this, greedy algorithms are best suited for simple problems (e.g. giving change). It is important, however, to note that the greedy algorithm can be used as a selection algorithm to prioritize options within a search or branch-and-bound algorithm. There are a few variations of the greedy algorithm:

Pure greedy algorithms
Orthogonal greedy algorithms
Relaxed greedy algorithms

Cases of failure

For many other problems, greedy algorithms fail to produce the optimal solution, and may even produce the unique worst possible solution. One example is the traveling salesman problem mentioned above: for each number of cities there is an assignment of distances between the cities for which the nearest-neighbour heuristic produces the unique worst possible tour.[3] Imagine the coin example with only 25-cent, 10-cent, and 4-cent coins. The greedy algorithm would not be able to make change for 41 cents, since after committing to use one 25-cent coin and one 10-cent coin it would be impossible to use 4-cent coins for the balance of 6 cents, whereas a person or a more sophisticated algorithm could make change for 41 cents with one 25-cent coin and four 4-cent coins.
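A small sketch (not from the original answer; the coin set and amount are the ones used above) showing the greedy choice getting stuck on 41 cents with 25-, 10- and 4-cent coins:

    #include <cstdio>

    // Greedy change-making: always take the largest coin that still fits.
    // Returns the number of coins used, or -1 if the greedy choice gets stuck.
    int greedyChange(const int coins[], int m, int amount)
    {
        int used = 0;
        for (int i = 0; i < m; ++i) {            // coins[] is sorted largest first
            while (amount >= coins[i]) {
                amount -= coins[i];
                ++used;
            }
        }
        return (amount == 0) ? used : -1;        // leftover amount => greedy failed
    }

    int main(void)
    {
        int coins[] = {25, 10, 4};
        int result = greedyChange(coins, 3, 41);
        printf("greedy result for 41: %d\n", result);   // prints -1: greedy takes 25, 10, 4 and is stuck with 2 left
        // The optimal answer uses 5 coins: 25 + 4 + 4 + 4 + 4 = 41.
        return 0;
    }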

(c) Describe the two properties that characterise a good dynamic programming problem. Ans: Dynamic programming is a useful type of algorithm that can be used to optimize hard problems by breaking them up into smaller subproblems. By storing and reusing partial solutions, it manages to avoid the pitfalls of using a greedy algorithm. There are two kinds of dynamic programming, bottom-up and top-down. In order for a problem to be solvable using dynamic programming, the problem must possess the first property, what is called an optimal substructure. This means that, if the problem is broken up into a series of subproblems and the optimal solution for each subproblem is found, then the resulting solution is realized through the solutions to these subproblems. A problem that does not have this structure cannot be solved with dynamic programming. The second property is overlapping subproblems: the same subproblems recur many times, so their solutions can be stored and reused rather than recomputed. Dynamic programming has also become very popular in competitive programming; the Dynamic Programming problem set on UVAJudge offers practice problems that will test your ability to implement and find recurrences for dynamic programming problems. Dynamic programming is a technique for solving a problem and arriving at an algorithm: it divides the problem into subparts, solves the subparts, and uses the solutions of the subparts to arrive at a solution to the whole. The main difference between dynamic programming and the divide-and-conquer design technique is that the partial solutions are stored in dynamic programming, but are not stored and reused in the divide-and-conquer technique.
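A bottom-up sketch (not part of the original answer; the coin values are chosen to match the greedy example above) that stores optimal sub-solutions in a table, illustrating both optimal substructure and the reuse of overlapping subproblems:

    #include <cstdio>
    #include <climits>

    // dp[a] = minimum number of coins needed to make amount a (INT_MAX if impossible).
    int minCoins(const int coins[], int m, int amount)
    {
        int dp[42];                              // enough for amounts 0..41 in this sketch
        dp[0] = 0;
        for (int a = 1; a <= amount; ++a) {
            dp[a] = INT_MAX;
            for (int i = 0; i < m; ++i) {
                if (coins[i] <= a && dp[a - coins[i]] != INT_MAX && dp[a - coins[i]] + 1 < dp[a])
                    dp[a] = dp[a - coins[i]] + 1;    // reuse the stored sub-solution
            }
        }
        return dp[amount];
    }

    int main(void)
    {
        int coins[] = {25, 10, 4};
        printf("%d\n", minCoins(coins, 3, 41));  // prints 5: 25 + 4 + 4 + 4 + 4
        return 0;
    }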

(d) State the Travelling Salesman Problem. Comment on the nature of solutions to the problem.
Ans : The Travelling Salesman Problem (TSP) is an NP-hard problem in combinatorial optimization studied in operations research and theoretical computer science. Given a list of cities and their pairwise distances, the task is to find a shortest possible tour that visits each city exactly once. The problem was first formulated as a mathematical problem in 1930 and is one of the most intensively studied problems in optimization. It is used as a benchmark for many

optimization methods. Even though the problem is computationally difficult, a large number of heuristics and exact methods are known, so that some instances with tens of thousands of cities can be solved. The TSP has several applications even in its purest formulation, such as planning, logistics, and the manufacture of microchips. Slightly modified, it appears as a subproblem in many areas, such as DNA sequencing. In these applications, the concept city represents, for example, customers, soldering points, or DNA fragments, and the concept distance represents travelling times or cost, or a similarity measure between DNA fragments. In many applications, additional constraints such as limited resources or time windows make the problem considerably harder. In the theory of computational complexity, the decision version of the TSP (where, given a length L, the task is to decide whether any tour is shorter than L) belongs to the class of NP-complete problems. Thus, it is likely that the worst case running time for any algorithm for the TSP increases exponentially with the number of cities. TSP can be modeled as an undirected weighted graph, such that cities are the graph's vertices, paths are the graph's edges, and a path's distance is the edge's length. A TSP tour becomes a Hamiltonian cycle, and the optimal TSP tour is the shortest Hamiltonian cycle. Often, the model is a complete graph (i.e., an edge connects each pair of vertices). If no path exists between two cities, adding an arbitrarily long edge will complete the graph without affecting the optimal tour.
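A short sketch (not part of the original answer; the 4-city distance matrix is invented for illustration) of the nearest-neighbour heuristic mentioned above, which builds a tour greedily and therefore carries no optimality guarantee in general:

    #include <cstdio>

    const int N = 4;
    // Symmetric distance matrix between 4 hypothetical cities.
    int dist[N][N] = {
        { 0, 2, 9, 10},
        { 2, 0, 6,  4},
        { 9, 6, 0,  3},
        {10, 4, 3,  0}
    };

    // Nearest-neighbour heuristic: start at city 0, repeatedly visit the
    // closest unvisited city, then return to the start.
    int nearestNeighbourTour(void)
    {
        bool visited[N] = {false};
        int current = 0, length = 0;
        visited[0] = true;
        for (int step = 1; step < N; ++step) {
            int next = -1;
            for (int c = 0; c < N; ++c)
                if (!visited[c] && (next == -1 || dist[current][c] < dist[current][next]))
                    next = c;
            length += dist[current][next];
            visited[next] = true;
            current = next;
        }
        return length + dist[current][0];        // close the tour
    }

    int main(void)
    {
        printf("nearest-neighbour tour length: %d\n", nearestNeighbourTour());  // prints 18 for this matrix
        return 0;
    }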

Question 2: Give an analysis of each one of the following, with examples: a. Insertion Sort b. Selection Sort c. Heap Sort d. Merge Sort e. Quick Sort

Ans:

1. Insertion Sort

We can give a more precise analysis of the operation of insertion sort. It is going to be a bit complicated. We will find out in the next section that insertion sort is not a very good sorting method and should not be used except in special circumstances. So why waste time on a complicated analysis? Simply because it happens to be a reasonably nice piece of mathematics from which you might learn something. The important quantity in the analysis of the timing of Insertion Sort is the total distance through which the elements of the list are moved to the right during the sort. Let us assume, as usual, that we are dealing with a permutation of (1, 2, ..., n).

By the nature of the algorithm, the number of places that i is moved in the sort is equal to the number of elements that are less than i and to the right of i in the original list. For example, if you apply insertion sort to the list π = (7, 4, 5, 1, 3, 2, 6) you will find that 1, 2 and 6 do not get moved at all, 3 is moved one place (2 is less than it), 5 is moved 3 places (1, 3, 2 are less than it) and so on. Bearing this in mind, let us associate to each permuted list π in Sn a list m = (m1, m2, ..., mn), where mi is the number of numbers in π that are less than i and to the right of it. For the permutation of the previous paragraph we have m = (0, 0, 1, 3, 3, 0, 6). In this, m5 = 3 because, of the four numbers less than 5, three of them are to the right of 5 in π. The lists m that we produce from permutations obviously have the property that 0 <= mi < i, because there are only i - 1 positive numbers less than i. In particular, m1 must be 0 and m2 can only take the values 0 or 1. Let us now introduce the following set Mn:

Mn = { (m1, m2, m3, ..., mn) : 0 <= mi < i, i = 1, ..., n }

Then what we have done is construct a mapping from Sn to Mn that sends π to m.

2. Selection Sort

Once again, here are the formulas we derived for the number of comparisons and swaps the Selection Sort requires in the worst case. The number of items to be sorted is represented by n.

# of swaps = (n-1)
# of comparisons = (n-1) + (n-2) + ... + 1

Notice that the formula for the number of comparisons is a summation, a series of terms added together. Each term represents the comparisons from one step of our sort. As we increase the number of items to be sorted, we also increase the number of terms in our formula. For example, suppose we were sorting six items and we wanted to know the maximum number of comparisons needed to sort them (i.e. the worst case). Here is how we would find the answer. First, we write out the correct number of terms in our formula. Remember that n = 6 in this example. The last term in our formula is (n - 5) because this is equal to 1.

# of comparisons = (n-1) + (n-2) + (n-3) + (n-4) + (n-5)

Now we substitute 6 for n and sum the values. # of comparisons = (6-1) + (6-2) + (6-3) + (6-4) + (6-5) # of comparisons = 5 + 4 + 3 + 2 + 1 # of comparisons = 15
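A brief sketch (not part of the original answer; the input values are illustrative) that counts the comparisons selection sort makes, confirming the count computed above for six items:

    #include <cstdio>

    // Selection sort that counts every comparison between two array elements.
    int selectionSortComparisons(int a[], int n)
    {
        int comparisons = 0;
        for (int i = 0; i < n - 1; ++i) {
            int smallest = i;
            for (int j = i + 1; j < n; ++j) {
                ++comparisons;                 // one comparison per inner-loop step
                if (a[j] < a[smallest])
                    smallest = j;
            }
            int tmp = a[i]; a[i] = a[smallest]; a[smallest] = tmp;   // one swap per pass
        }
        return comparisons;
    }

    int main(void)
    {
        int a[] = {6, 5, 4, 3, 2, 1};
        printf("comparisons: %d\n", selectionSortComparisons(a, 6));  // prints 15 = 5+4+3+2+1
        return 0;
    }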

When sorting six items with the Selection Sort, the algorithm will need to perform 15 comparisons in the worst case. Since we computed the performance in the worst case, we know that the Selection Sort will never need more than 15 comparisons regardless of how the six numbers are originally ordered.

3. Heap Sort

Heapsort is a really cool sorting algorithm. It has an extremely important advantage of worst-case O(n log n) runtime while being an in-place algorithm. Heapsort is not a stable sort. It mainly competes with mergesort, which has the same time bounds but requires Θ(n) auxiliary space, whereas heapsort requires only a constant amount.

Since its worst-case comparison count is lower than quicksort's, heapsort is safer than the latter, and it actually works pretty well in most practically important cases such as already sorted arrays (quicksort requires O(n^2) comparisons in this case). Heapsort does not have a locality-of-reference advantage, and thus it might run more slowly than alternatives on CPU architectures with small or slow data caches. On the other hand, mergesort has several advantages over heapsort:

Better data cache performance, because it accesses the elements in order.
Simpler to implement.
A stable sort.
Parallelizes better; the most trivial way of parallelizing mergesort achieves close to linear speedup, while there is no obvious way to parallelize heapsort at all.
Can be used on linked lists and very large lists stored on slow-to-access media such as disk storage or network-attached storage; heapsort relies strongly on random access and has poor locality of reference.
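A compact heapsort sketch (not part of the original answer; the input values are illustrative) that uses sift-down to build a max-heap and then repeatedly moves the maximum to the end, all in place:

    #include <cstdio>
    #include <algorithm>
    using std::swap;

    // Restore the max-heap property for the subtree rooted at i, within a[0..n-1].
    void siftDown(int a[], int n, int i)
    {
        while (true) {
            int largest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < n && a[l] > a[largest]) largest = l;
            if (r < n && a[r] > a[largest]) largest = r;
            if (largest == i) return;
            swap(a[i], a[largest]);
            i = largest;
        }
    }

    void heapSort(int a[], int n)
    {
        for (int i = n / 2 - 1; i >= 0; --i)   // build the max-heap bottom-up
            siftDown(a, n, i);
        for (int end = n - 1; end > 0; --end) {
            swap(a[0], a[end]);                // move the current maximum to the end
            siftDown(a, end, 0);               // re-heapify the remaining prefix
        }
    }

    int main(void)
    {
        int a[] = {5, 1, 4, 2, 3};
        heapSort(a, 5);
        for (int i = 0; i < 5; ++i) printf("%d ", a[i]);   // prints: 1 2 3 4 5
        printf("\n");
        return 0;
    }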

4. Merge Sort

In sorting n objects, merge sort has an average and worst-case performance of O(n log n). If the running time of merge sort for a list of length n is T(n), then the recurrence T(n) = 2T(n/2) + n follows from the definition of the algorithm (apply the algorithm to two lists

of half the size of the original list, and add the n steps taken to merge the resulting two lists). The closed form follows from the master theorem. In the worst case, merge sort does a number of comparisons equal to or slightly smaller than (n⌈lg n⌉ - 2^⌈lg n⌉ + 1), which is between (n lg n - n + 1) and (n lg n + n + O(lg n)).[2] For large n and a randomly ordered input list, merge sort's expected (average) number of comparisons approaches a constant times n fewer than the worst case (the constant is about 0.2645). In the worst case, merge sort does about 39% fewer comparisons than quicksort does in the average case; merge sort always makes fewer comparisons than quicksort, except in extremely rare cases, when they tie, where merge sort's worst case is found simultaneously with quicksort's best case. In terms of moves, merge sort's worst-case complexity is O(n log n), the same complexity as quicksort's best case, and merge sort's best case takes about half as many iterations as its worst case.[citation needed] Recursive implementations of merge sort make 2n - 1 method calls in the worst case, compared to quicksort's n, thus merge sort has roughly twice as much recursive overhead as quicksort. However, iterative, non-recursive implementations of merge sort, avoiding method-call overhead, are not difficult to code. Merge sort's most common implementation does not sort in place; therefore, the memory size of the input must be allocated for the sorted output to be stored in (see below for versions that need only n/2 extra spaces). Merge sort as described here also has an often overlooked, but practically important, best-case property. If the input is already sorted, its complexity falls to O(n). Specifically, n-1 comparisons and zero moves are performed, which is the same as simply running through the input, checking whether it is pre-sorted.

5. Quick Sort

Quicksort sorts by employing a divide-and-conquer strategy to divide a list into two sublists. The steps are:

Pick an element, called a pivot, from the list.
Reorder the list so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the partition operation.
Recursively sort the sub-list of lesser elements and the sub-list of greater elements.

The base case of the recursion is lists of size zero or one, which never need to be sorted. In simple pseudocode, the algorithm might be expressed as this:

    function quicksort(array)
        var list less, greater
        if length(array) <= 1
            return array
        select and remove a pivot value pivot from array
        for each x in array
            if x <= pivot then append x to less
            else append x to greater
        return concatenate(quicksort(less), pivot, quicksort(greater))

Question 3: (a) Consider the Best first search technique and the Breadth first search technique. Answer the following with respect to these techniques. Give justification for your answer in each case.
(i) Which algorithm has some knowledge of the problem space?
(ii) Which algorithm has the property that if a wrong path is chosen, it can be corrected afterwards?

Ans: We know two systematic control strategies: breadth first search (BFS) and depth first search (DFS), each of which has several variants, including heuristic techniques. Another technique, a newer method called Best First Search, combines the advantages of both depth first search and breadth first search into a single strategy. Breadth first search is a good search technique because it does not get trapped on dead-end paths (this is its merit). Depth first search is good because it allows a solution to be found without all competing branches having to be expanded, though it may waste time following a wrong path (a time delay is its demerit). To combine these two, Best First Search follows a single path at a time, but switches paths whenever some competing path looks more promising than the current one.

BFS is an uninformed search method that aims to expand and examine all nodes of a graph or combination of sequences by systematically searching through every solution. In other words, it exhaustively searches the entire graph or sequence without considering the goal until it finds it. It does not use a heuristic algorithm. From the standpoint of the algorithm, all child nodes obtained by expanding a node are added to a FIFO (i.e., First In, First Out) queue. In typical implementations, nodes that have not yet been examined for their neighbors are placed in some container (such as a

queue or linked list) called "open" and then, once examined, are placed in the container "closed".

FEATURES

Space complexity: Since all of the nodes of a level must be saved until their child nodes in the next level have been generated, the space complexity is proportional to the number of nodes at the deepest level. Given a branching factor b and graph depth d, the asymptotic space complexity is the number of nodes at the deepest level, O(b^d). When the number of vertices in the graph is known ahead of time, and additional data structures are used to determine which vertices have already been added to the queue, the space complexity can also be expressed as O(|V|), where |V| is the cardinality of the set of vertices. In the worst case the graph has a depth of 1 and all vertices must be stored. Since it is exponential in the depth of the graph, breadth-first search is often impractical for large problems on systems with bounded space.

Time complexity: Since in the worst case breadth-first search has to consider all paths to all possible nodes, the time complexity of breadth-first search is 1 + b + b^2 + ... + b^d, which is O(b^d). The time complexity can also be expressed as O(|E| + |V|), since every vertex and every edge will be explored in the worst case.

Completeness: Breadth-first search is complete. This means that if there is a solution, breadth-first search will find it regardless of the kind of graph. However, if the graph is infinite and there is no solution, breadth-first search will diverge.

Proof of completeness: If the shallowest goal node is at some finite depth, say d, breadth-first search will eventually find it after expanding all shallower nodes (provided that the branching factor b is finite).[1]

Optimality: For unit-step cost, breadth-first search is optimal. In general breadth-first search is not optimal, since it always returns the result with the fewest edges between the start node and the goal node. If the graph is a weighted graph, and therefore has costs associated with each step, a goal next to the start does not have to be the cheapest goal available. This problem is solved by improving breadth-first search to uniform-cost search, which considers the path costs. Nevertheless, if the graph is not weighted, and therefore all step costs are equal, breadth-first search will find the nearest and the best solution.

Bias towards nodes of high degree: It has been empirically observed (and analytically shown for random graphs) that incomplete breadth-first search is biased towards nodes of high degree. This makes a breadth-first search sample of a graph very difficult to interpret. For example, a breadth-first sample of 1 million nodes in Facebook (less than 1% of the entire graph) overestimates the average node degree by 240%.[2]
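A minimal sketch (not part of the original answer; the small graph and node labels are invented for illustration) of breadth-first search with a FIFO "open" queue and a set marking already-discovered nodes:

    #include <cstdio>
    #include <queue>
    #include <vector>

    // Breadth-first search from `start`, returning the order in which nodes are visited.
    std::vector<int> bfs(const std::vector<std::vector<int>>& adj, int start)
    {
        std::vector<bool> discovered(adj.size(), false);   // so nodes are not enqueued twice
        std::vector<int> order;
        std::queue<int> open;                               // FIFO frontier
        open.push(start);
        discovered[start] = true;
        while (!open.empty()) {
            int u = open.front();
            open.pop();
            order.push_back(u);
            for (int v : adj[u])                            // expand all children of u
                if (!discovered[v]) {
                    discovered[v] = true;
                    open.push(v);
                }
        }
        return order;
    }

    int main(void)
    {
        // 0 -- 1, 0 -- 2, 1 -- 3, 2 -- 3 (a small undirected example graph)
        std::vector<std::vector<int>> adj = {{1, 2}, {0, 3}, {0, 3}, {1, 2}};
        for (int u : bfs(adj, 0)) printf("%d ", u);         // prints: 0 1 2 3
        printf("\n");
        return 0;
    }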

(i) Best First Search has some knowledge of the problem space. Best-first search is a search algorithm which explores a graph by expanding the most promising node, chosen according to a specified rule. Best-first search estimates the promise of a node n by a heuristic evaluation function f(n) which, in general, may depend on the description of n, the description of the goal, the information gathered by the search up to that point, and, most important, on any extra knowledge about the problem domain. The heuristic attempts to predict how close the end of a path is to a solution, so that paths which are judged to be closer to a solution are extended first.
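A corresponding sketch (not from the original answer; the graph and heuristic values are invented) showing that best-first search differs from breadth-first search only in replacing the FIFO queue with a priority queue ordered by the heuristic f(n):

    #include <cstdio>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    // Greedy best-first search: always expand the open node with the smallest
    // heuristic value h[n] (the "most promising" node).
    std::vector<int> bestFirst(const std::vector<std::vector<int>>& adj,
                               const std::vector<int>& h, int start)
    {
        typedef std::pair<int, int> Entry;             // (heuristic value, node)
        std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> open;
        std::vector<bool> seen(adj.size(), false);
        std::vector<int> order;
        open.push({h[start], start});
        seen[start] = true;
        while (!open.empty()) {
            int u = open.top().second;
            open.pop();
            order.push_back(u);
            for (int v : adj[u])
                if (!seen[v]) {
                    seen[v] = true;
                    open.push({h[v], v});
                }
        }
        return order;
    }

    int main(void)
    {
        std::vector<std::vector<int>> adj = {{1, 2}, {0, 3}, {0, 3}, {1, 2}};
        std::vector<int> h = {3, 2, 1, 0};             // invented heuristic: node 3 is the goal
        for (int u : bestFirst(adj, h, 0)) printf("%d ", u);   // prints: 0 2 3 1
        printf("\n");
        return 0;
    }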

(ii) Breadth first search is guaranteed to find the correct path even if a wrong path is selected first, since it explores all paths until the target is found. Searching takes place as follows: first all nodes one edge away from the start node are visited, then all nodes two edges away, and so on until all nodes are visited. This way, we find the path from the start to the goal with the minimum number of traversed edges. Another way to word it is this: visit the neighbour nodes, then the neighbours' neighbour nodes, and so on, until the goal node is found. An example of a breadth-first search is one in which the nodes are numbered in the order they are visited.

(b) Describe the difference between a Deterministic Finite Automaton and a Non-Deterministic Finite Automaton. In general, which one is expected to have a smaller number of states?
Ans:

A finite-state machine (FSM) or finite-state automaton (plural: automata), or simply a state machine, is a mathematical abstraction sometimes used to design digital logic or computer programs. It is a behaviour model composed of a finite number of states, transitions between those states, and actions, similar to a flow graph in which one can inspect the way logic runs when certain conditions are met. It has finite internal memory; an input feature that reads symbols in a sequence, one at a time, without going backward; and an output feature, which may be in the form of a user interface once the model is implemented. The operation of an FSM begins from one of the states (called a start state), goes through transitions depending on input to different states, and can end in any of those available; however, only a certain set of states marks a successful flow of operation (called accept states). Finite-state machines can solve a large number of problems, among which are electronic design automation, communication protocol design, parsing and other engineering applications. In biology and artificial intelligence research, state machines or hierarchies

of state machines are sometimes used to describe neurological systems, and in linguistics to describe the grammars of natural languages.

A nondeterministic finite state automaton is a tuple A = <Q, Sigma, Delta, q0, F>, where Q, Sigma, q0 and F are as in the definition of deterministic finite state automata, and Delta, a subset of Q x (Sigma u {e}) x Q, is a transition relation; here e (sometimes denoted Lambda) is the special empty-string symbol. That is, in each state we may have not exactly one transition defined on each symbol, and we may also have empty transitions, i.e. transitions that change the state without reading a symbol at all. Alternatively, these automata can be defined to have just a single accepting state, or not to have any, but instead to accept when the automaton would block.

A Deterministic Finite Automaton (DFA for short) is a computational device that will either accept or reject a given string. Each state in a DFA determines the next state, which can only be reached if the next symbol to be processed lies on the edge connecting the current state to the next state. In books on computational theory, a DFA is written as a tuple, where each item in the tuple denotes the set of states, the set of symbols the DFA works with (usually a couple of letters or numbers to illustrate the concept), a transition function, and the start and accept states.

A deterministic finite automaton will have a single possible move for a given input symbol in a given state; the behaviour is deterministic because you can always tell what the next state will be. A nondeterministic finite automaton will have at least one input which causes a "choice" to be made during a state transition: unlike a DFA, one input can lead to multiple possible next states. In general, the NFA is expected to have fewer states. Every NFA can be converted to an equivalent DFA by the subset construction, but the resulting DFA may have up to 2^n states for an NFA with n states, so an NFA description of a language can be exponentially more compact.

(c)

Explain "TURING THESIS" in detail.

Ans: In computability theory, the Church-Turing thesis (also known as the Church-Turing conjecture, Church's thesis, Church's conjecture, and Turing's thesis) is a combined hypothesis ("thesis") about the nature of functions whose values are effectively calculable, i.e. computable. In simple terms, it states that "everything computable is computable by a Turing machine." Several attempts were made in the first half of the 20th century to formalize the notion of computability:

American mathematician Alonzo Church created a method for defining functions called the lambda-calculus; British mathematician Alan Turing created a theoretical model for a machine, now called a universal Turing machine, that could carry out calculations from inputs; and Church, along with mathematician Stephen Kleene and logician J.B. Rosser, created a formal definition of a class of functions whose values could be calculated by recursion. All three computational processes (recursion, the lambda-calculus, and the Turing machine) were shown to be equivalent: all three approaches define the same class of functions.[1][2] This has led mathematicians and computer scientists to believe that the concept of computability is accurately characterized by these three equivalent processes. Informally the Church-Turing thesis states that if some method (algorithm) exists to carry out a calculation, then the same calculation can also be carried out by a Turing machine (as well as by a recursively definable function, and by a function of the lambda-calculus). The Church-Turing thesis is a statement that characterizes the nature of computation and cannot be formally proven. Even though the three processes mentioned above proved to be equivalent, the fundamental premise behind the thesis, the notion of what it means for a function to be "effectively calculable" (computable), is "a somewhat vague intuitive one".[3] Thus, the "thesis" remains a hypothesis.[3] Despite the fact that it cannot be formally proven, the Church-Turing thesis now has near-universal acceptance.

Question 4:
(a) Write a randomized algorithm to find an order statistic in a set of n elements (Select). (8 Marks)
(b) Write a recursive procedure to compute the factorial of a number. (5 Marks)
(c) Design a Turing Machine that increments a binary number which is stored on the input tape. (7 Marks)

Ans: (a)

Definition: A randomized algorithm is an algorithm that can make calls to a random number generator during the execution of the algorithm. These calls will be of the form x := Random(a, b), where a, b are integers, a <= b.

A call to Random(a, b) returns an integer in [a, b], with every integer in the range being equally likely to occur. Successive calls to Random( ) are assumed to be mutually independent.

Algorithm:

Randomize-in-place(A)
Input: An array A[1..n] of n elements.
Output: A rearrangement of the elements of array A, with every permutation of the n elements equally likely to occur.
for i := 1 to n do
    swap A[i] and A[Random(i, n)]
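A runnable sketch of the same procedure (not part of the original answer), using rand() to stand in for the Random(a, b) primitive defined above:

    #include <cstdio>
    #include <cstdlib>
    #include <ctime>
    #include <algorithm>
    using std::swap;

    // Returns a (roughly) uniform integer in [a, b], standing in for Random(a, b).
    int Random(int a, int b)
    {
        return a + rand() % (b - a + 1);
    }

    // Randomize-in-place: after the loop, every permutation of A[0..n-1] is
    // (approximately) equally likely.
    void RandomizeInPlace(int A[], int n)
    {
        for (int i = 0; i < n; ++i)
            swap(A[i], A[Random(i, n - 1)]);
    }

    int main(void)
    {
        int A[] = {1, 2, 3, 4, 5};
        srand((unsigned) time(NULL));
        RandomizeInPlace(A, 5);
        for (int i = 0; i < 5; ++i) printf("%d ", A[i]);   // one random permutation of 1..5
        printf("\n");
        return 0;
    }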

(b)

FUNCTION FACTORIAL (N: INTEGER): INTEGER;
BEGIN
  IF N <= 0 THEN
    FACTORIAL := 1
  ELSE
    FACTORIAL := N * FACTORIAL(N - 1)
END;

(c) Design a Turing Machine that increments a binary number which is stored on the input tape. (7 Marks)

The binary increment function adds one to a binary number. The x state takes us to the rightmost digit; then we switch to the i state: if a 0 is read we change it to a 1 and halt, and if a 1 is read we change it to a 0 and "carry one" (move left and repeat). Be sure to test the machine on the input 111. Examples of binary increment: 1001 -> 1010, 111 -> 1000. The x state requires three different clauses.

R: move one position to the right.
L: move one position to the left.
HALT: halt.

Each quintuple is of the form (current state, symbol read, symbol written, move, next state), where the move is R, L, or 0 (do not move):

x,1,1,R,x
x,0,0,R,x
x, , ,L,i
i,0,1,0,HALT
i,1,0,L,i
i, ,1,0,HALT
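A tiny simulator (not part of the original answer; the string-based tape and function names are only for illustration) that runs the transition table above on the sample inputs:

    #include <cstdio>
    #include <string>

    // Simulate the increment machine: the tape is a string of '0', '1' and ' '
    // (blank); the head starts on the leftmost digit in state x.
    std::string increment(std::string tape)
    {
        tape = " " + tape + " ";           // blanks on both sides of the input
        int head = 1;                       // leftmost digit
        char state = 'x';
        while (state != 'H') {              // 'H' stands for HALT
            char c = tape[head];
            if (state == 'x') {                                    // x: walk to the rightmost digit
                if (c == '0' || c == '1') { ++head; }              // x,0,0,R,x and x,1,1,R,x
                else { --head; state = 'i'; }                      // x,blank,blank,L,i
            } else {                                               // i: add one, propagating the carry
                if (c == '0') { tape[head] = '1'; state = 'H'; }   // i,0,1,0,HALT
                else if (c == '1') { tape[head] = '0'; --head; }   // i,1,0,L,i
                else { tape[head] = '1'; state = 'H'; }            // i,blank,1,0,HALT
            }
        }
        size_t first = tape.find_first_not_of(' ');                // trim the surrounding blanks
        size_t last = tape.find_last_not_of(' ');
        return tape.substr(first, last - first + 1);
    }

    int main(void)
    {
        printf("%s\n", increment("1001").c_str());   // prints 1010
        printf("%s\n", increment("111").c_str());    // prints 1000
        return 0;
    }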
