Standard Template Library
o Containers: class templates for holding collections of data.
o Algorithms: generic, off-the-shelf function templates for operating on containers.
o Iterators: generalized pointers that allow algorithms to operate on almost any container.
STL Containers:
o Sequential: deque, list, vector
o Associative: map, multimap, multiset, set
o Adapters: priority_queue, queue, stack
o All STL containers use copy semantics.
Vector Container: a type-independent pattern for a generalized array class. It is self-contained and its capacity can expand: the capacity doubles whenever more space is needed.
Iterator: can point to an element, access the value within that element, and move from one element to another. (Acts like a pointer but fails the duck test.) Basic operators can be applied to iterators: increment, decrement, dereferencing, assignment, addition, subtraction, and the subscript operator.
Huffman Codes
Common character codes such as ASCII and EBCDIC use the same size data structure for all characters (8 bits per character).
Variable-Length Codes: can produce shorter messages than fixed-length codes (on average, when applied to many messages with given character probabilities). The expected message length per character is the sum, over all characters, of the product of each character's code length and its probability.
Immediate Decodability: no sequence of bits that represents a character is a prefix of a longer sequence for another character. (A message can be decoded without waiting for the remaining bits.) Codes that are immediately decodable are called prefix codes: no valid code symbol is a prefix of another valid code symbol.
Optimal Codes: immediately decodable, and the average message length over a number of messages is minimal.
AVL Trees
When the tree is nicely balanced, search time is O(log2 n). If nodes are entered in increasing order, the BST degenerates into a linked list with search time O(n). A height-balanced tree:
o Tree is rebalanced after additions and deletions make it skewed beyond a certain limit.
o Search time is O(log2 n).
o Named after its two inventors: G.M. Adelson-Velskii and E.M. Landis.
The balance factor of a node is defined as the height of its left subtree minus the height of its right subtree.
o A balance factor with absolute value greater than 1 is unacceptable for an AVL tree.
AVL Tree Definition:
o Binary Search Tree
o Balance factor of each node is 0, 1, or -1.
o The key is to rebalance the tree whenever an insertion or a deletion creates a balance factor with absolute value greater than 1 at any node.
o The differences between an AVL tree and the Binary Search Tree that we studied previously are purely internal.
On AVL trees, insertions may require adjustment of the tree to maintain balance. Inserting a new item into an AVL tree can result in a subtree having a balance factor with absolute value greater than 1, but not more than 2. Inserting a new item does not necessarily increase the balance factor:
o May leave it unchanged
o May decrease it
If the balance factor is unacceptable after an insertion, it can be fixed by one of four possible rebalancing rotations.
Rotations:
o AVL tree rotations rearrange links so as to reduce the height of the larger subtree at an unbalanced node.
o They always preserve the BST invariants:
   Everything in a node's left subtree is less than or equal to it.
   Everything in a node's right subtree is greater than or equal to it.
o When an unacceptable imbalance is produced, we always do a rotation at the first ancestor of the added node having an unacceptable imbalance.
o We always rotate that node away from the side with the larger subtree height. Its child with the larger subtree height takes its place, and it becomes a new child of that child.
o If the first two steps from the first unbalanced ancestor to the added node were in the same direction, a single rotation restores balance. Otherwise, a double rotation is necessary.
Rebalancing Rotations:
o Single Right Rotation: use when the first unacceptably unbalanced ancestor has a balance factor of +2 and the inserted item is in the left subtree of its left child. Can be done on any node having a left child. Let A = nearest out-of-balance ancestor of the inserted item and B = left child of A.
   Set the link from the parent of A to point to B.
   Set the left link of A to point to the right child of B.
   Set the right link of B to point to A.
Single Left Rotation: use when the first unacceptably unbalanced ancestor has a balance factor of -2 and the inserted item is in the right subtree of its right child. Can be done on any node having a right child. Let A = nearest out-of-balance ancestor of the inserted item and B = right child of A.
   Set the link from the parent of A to point to B.
   Set the right link of A to point to the left child of B.
   Set the left link of B to point to A.
Left-Right Rotation: use when the first unacceptably unbalanced ancestor has a balance factor of +2 and the inserted item is in the right subtree of its left child. Let A = nearest unbalanced ancestor of the inserted item, B = left child of A, C = right child of B; the new item was added in C's subtree.
First: left rotation of B and C
o Set the left link of A to point to C.
o Set the right link of B equal to the left link of C.
o Set the left link of C to point to B.
Last: right rotation of A and C
o Reset the link from the parent of A to point to C.
o Set the left link of A to the right link of C.
o Set the right link of C to point to A.
Right-Left Rotation: use when the first unacceptably unbalanced ancestor has a balance factor of -2 and the inserted item is in the left subtree of its right child. Symmetrical with Left-Right (replace right with left and vice versa).
Observations:
o When we add a node to an AVL tree, the parent node will either be a leaf or have a single child.
   If the parent node is a leaf, its balance factor prior to the addition is 0. After the addition, the balance factor's absolute value will be 1. Ancestor nodes' balance factors may increase, decrease, or remain the same (rebalancing may or may not be necessary).
   If the parent node has a single child, its balance factor's absolute value prior to the addition will be 1. After the addition, it will be 0. Ancestor nodes' balance factors will not change (rebalancing will not be necessary).
o Adding a node always changes the balance factor of the parent of the new node (absolute value 0 to 1, or 1 to 0).
o The parent node will never become unbalanced due to an addition.
The grandparent is the first node (going up the tree) that can become unbalanced due to an addition.
o A node will become unbalanced due to an addition if all of the following conditions are true:
   It already had a balance factor of +1 or -1.
   In searching for the point to do the addition, we descended into the subtree that already had the greater height (the left subtree if the balance factor was +1, the right subtree if it was -1).
   Adding the node increased the height of that subtree.
So we need to know whether adding a node to a subtree increased the height of that subtree.
Summary: adding nodes to a binary search tree can result in an unbalanced tree. We can restore balance with one of four kinds of rotation: Single Left, Single Right, Left-Right, Right-Left.
2-3-4 Trees
An m-node in a search tree stores m-1 data values and has links to m subtrees.
Definition:
o Each node stores at most 3 data values.
o Each internal node is a 2-node, a 3-node, or a 4-node.
o All the leaves are on the same level.
Insertion Algorithm:
If the tree is empty:
   Create a 2-node containing the new item; it is the root of the tree.
Else:
   Find the leaf node where the item should be inserted by repeatedly comparing the item with the values in a node and following the appropriate link to a child.
   If there is room in the leaf node:
      Add the item.
   Else (the leaf already holds 3 values, so it becomes a 4-node):
      Split the 4-node into two 2-nodes, one storing the value less than the median value, the other storing the value greater than the median.
      Move the median value up to the parent, which takes these two 2-nodes as children.
      If the parent has no room for the median, split that 4-node in the same manner, continuing until either a parent is found with room or we reach the root.
If the root is a 4-node, split it into two 2-nodes and create a new parent 2-node as the new root.
Red-Black Trees
A way of representing 2-3-4 trees using only 2-nodes.
Defined as a BST which:
o Has two kinds of links (red and black).
o Every path from the root to a leaf node has the same number of black links.
o No path from the root to a leaf node has two consecutive red links.
B-Trees
Useful in external searching:
o Data stored in secondary memory.
o Each node is a block on disk.
o Typically the data in a node is really a pointer to the actual record.
A B-tree of order m has the properties:
o The root has at least two subtrees unless it is a leaf.
o Each node stores at most m-1 data values and has at most m links to subtrees.
o Each internal node other than the root has at least ceiling(m/2) links to subtrees, and so stores at least ceiling(m/2) - 1 data values.
o All leaves are on the same level.
The best performance for disk storage has been found with values 50 <= m <= 400.
Directed Graphs
Defined as:
o A finite set of elements called vertices or nodes that can hold values.
o A finite set of directed arcs or edges that connect pairs of vertices.
Operations include:
o Constructors
o Insert node, edge
o Delete node, edge
o Search for a value in a node, starting from a given node.
A complete digraph has an edge between each pair of vertices (in each direction); N nodes will have N*(N-1) edges.
Can be represented as an adjacency matrix (an n x n matrix, call it adj):
o adj[i][j] is 1 (true) if vertex j is adjacent to vertex i (there is an edge from i to j), 0 (false) otherwise.
Terminology:
o Out-degree of the ith vertex (node): the number of arcs emanating from that node; the sum of the 1s in row i.
o In-degree of the jth vertex (node): the number of arcs coming into that node; the sum of the 1s in column j.
Deficiencies in the adjacency matrix representation:
o Data must be stored in a separate matrix.
o When there are few edges, the matrix is sparse (wasted space).
Adjacency List Representation:
o Better to use an array of pointers to linked row-lists.
Searching a graph:
o Depth-First Search:
   Start from a given vertex v.
   Visit the first neighbor w of v.
   Then visit the first neighbor of w that has not already been visited.
   Continue the descent until we reach a node with no unvisited neighbors.
   When there are no unvisited neighbors: back up to the last visited node and visit the next unvisited neighbor of that node.
   Same as pre-order traversal of a tree, except we have to keep track of nodes visited and avoid going back to them.
   Uses backtracking to return to vertices that were seen earlier and were already processed or skipped over on an earlier pass.
   Recursion is a natural technique for this task.
o Breadth-First Search:
   At each point in the search, visit all previously unvisited neighbors of the current node before advancing to their neighbors.
   Defines a tree consisting of the nodes reachable from the starting node.
   Algorithm: while visiting each node on a given level, store its ID so that we can return to it after completing this level (so that nodes adjacent to it can be visited). The first node visited on a given level should be the first node to which we return upon completion of that level (implies use of a queue).
   Pseudocode:
      Visit the start vertex.
      Initialize the queue to contain only the start vertex.
      While the queue is not empty:
         Remove a vertex v from the queue.
         For all vertices w adjacent to v:
            If w has not been visited:
               Visit w
               Add w to the queue
      End while
P vs NP Problems
o Nondeterministic Polynomial Problems (NP):
   Problems for which a solution can be guessed, then checked with an algorithm requiring computing time O(P(n)) for some polynomial P(n). (Think of going down all paths concurrently.)
   The time for a real algorithm on a real computer may grow exponentially with n (intractable problems).
o Deterministic Polynomial Problems (P):
   Can be solved by algorithms in polynomial time.
o NP-Complete Problems:
   A problem that any NP problem can be mapped into.
   If a polynomial-time algorithm that solves any one of these problems can be found, then the existence of polynomial-time algorithms for all NP problems is guaranteed. (Many graph problems are known to be NP-complete.)
Shortest Path Algorithm:
o Given a directed graph with nodes 1 to n, a starting node, start, and a destination node, dest:
   Let dist[] be an array of ints; dist[v] will hold the distance from start to node v.
   Let pred[] be an array of node IDs; pred[v] will be the predecessor of node v on a shortest path from start to dest.
   Initialize dist[start] to 0 and dist[v] to infinity for all other nodes.
   Initialize a vertex queue to contain just start.
   While dest has not been visited and the vertex queue is not empty:
      Remove the front item from the vertex queue as v.
      For each node w adjacent to v:
         If dist[w] is infinity (w has not been visited):
            Set dist[w] to dist[v] + 1
            Set pred[w] to v
            Add w to the vertex queue
   If dist[dest] is infinity, report failure.
   Else:
      Initialize a stack with dest.
      Initialize v as dest.
      Do the following until v is start:
         Set v to pred[v]
         Push v onto the stack
      The stack now holds a shortest path from start to dest.
Dijkstra's Algorithm (using a weighted graph)
Algorithm:
o Given a starting node and a destination node.
o Push outward from the starting node.
o Keep track of the shortest total distance seen so far to each node, and the predecessor of that node on the shortest path seen so far.
o Initially, if a node is adjacent to the start:
   The shortest total distance seen so far is the weight of the link.
   The predecessor on the best path is the start.
o If a node is not adjacent:
   The shortest distance seen so far is infinity (INT_MAX).
o On each step of the iteration, the shortest total distance to one more node will be determined, along with its predecessor on the best path to it from the starting node.
o Keep track of which nodes we know the shortest distance to (a Boolean array indexed by node ID).
o At each step of the iteration:
   Determine a node with the smallest best total distance seen so far among the nodes for which the best total distance has not yet been determined; call it node N.
   The best distance so far for node N is the actual best distance, and the predecessor for which that distance was determined is the predecessor on the best path. The best path to that node is now known.
o Update best distances: for each node i for which the best path has not yet been determined, check whether node N provides a better path than the best seen so far. Is totalDistance[N] + distance(N, i) less than the best total distance seen so far for node i? If so, make that the new best total distance so far and make N the predecessor.
o Continue the iteration until the actual shortest total distance for the destination has been determined.
o We then know the shortest path length from start to destination, and the best path can be recovered from the predecessors.