DS 1

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 16

MODULE 1

Elementary data organization


 Data are simply values or sets of values.
 A single unit of value is called a Data item.
 Group Items: Data items which have subordinate data items are called Group item, for
example, name of a student can have first name and the last name.
 An Entity is something that has certain attributes which may be assigned values.
 An entity with similar attributes is called an Entity set.
Eg:-
Entity : Employee
Attribute : Name , Age phone
Values : "ABC", 42, 9847092568
Entity set : All employees in an organization.
 Meaningful or processed data is called information.
 Record: Record can be defined as the collection of various data items, for example, if we
talk about the student entity, then his name, address, course and marks can be grouped
together to form the record for the student.
 File: A File is a collection of various records of one type of entity, for example, if there
are 60 employees in a Bank, then there will be 60 records in the related file where each
record contains the data about each employee.
Based on their length, records may be classified into two. They are,
Fixed-length record: All record contains the same data items with the same amount of space
assigned to each item.
Variable length record: Records may contain different length data items.

Definition of Data Structure


 Data structure is a particular way of organizing data in a computer so that it can be used
effectively.
 Data structure determines the arrangement of data in a computer’s memory.
 Data Structures are widely used in almost every aspect of Computer Science i.e. Operating
System, Compiler Design, Artificial intelligence, Graphics and many more.

What is the difference between a data type and a data structure?

Data types Data structure


A data type describes pieces of data that all A Data structure is a collection of data of different
share a common property data types. This collection of data can be
represented using an object and can be used
throughout the program.
For example an integer data type describes For example tree type data structures often allow
every integer that the computer can handle. for efficient searching algorithms.
Values can directly be assigned to the data The data is assigned to the data structure object
type variables. using some set of algorithms and operations like
push, pop and so on.
`There is no problem in the time complexity. When we deal with a data structure object, time
complexity plays an important role.
The examples of data type are int, float, char. The examples of data structure are stack, queue,
tree, graph.
Categories of Data Structure
Data structures are generally classified into primitive and non-primitive data structures.

Primitive data structures


These are the structures which are supported at the machine level, they can be used to make non-
primitive data structures. These are integral and are pure in form. They have predefined
behaviour and specifications. Primitive data structure includes Integer, Real, Character and
Boolean data types.
Non primitive data structures
The non-primitive data structures cannot be performed without the primitive data structures.
Although, they too are provided by the system itself yet they are derived data structures and
cannot be formed without using the primitive data structures. Non-primitive data structures are
again classified as linear and non-linear data types.
 A data structure is said to be linear if its elements form a sequence where the elements
are attached to its previous and next adjacent.
In linear data structure, single level is involved. Therefore, we can traverse all the
elements in single run only.
Linear data structures are easy to implement because computer memory is arranged in a
linear way.
Data structures like Array, Stack, Queue and linked list organizes data in linear order.
 A non-linear data structure is one in which its elements are not arranged in sequence.
In a non-linear data structure, single level is not involved. Therefore, we can’t traverse all
the elements in single run only.
Non-linear data structures are not easy to implement in comparison to linear data
structure.
Trees and graphs are widely used non-linear data structures. It represents a hierarchical
relationship between individual data elements.
Arrays:
 A linear array or a one-dimensional array is the simplest type of data structure. It is a list
of a finite number, n, of data elements of a similar type.
 These elements are represented by a set of consecutive numbers respectively. If A is the
array name, then the array elements are represented as any of the following notations.
a1, a2, …, aa
A(1), A(2), …, A(n)
A[1], A[2], …, A[n]

A Two-dimensional array is a collection of similar data elements where each element is

referenced by two subscripts.

Stack:
Stack, LIFO(Last In First Out) system, is a linear data structure in which insertion(PUSH) and
deletion(POP) are restricted to one endpoint called TOP.

Queue:
A queue is a FIFO(First In First Out) system, in which insertion(ENQUEUE) can take place only
at an end called REAR and deletion(DEQUEUE) can take place only at another end called FRONT.
Linked lists:
A linked list is a collection of data elements whose order is not given by their physical location
in memory. It consists of nodes that contain data and link to the next node. The structure allows
easiness in insertion and deletion operations. The last nodes are linked to NULL and signify the
end of the list

Trees:
Tree is a non linear data structure. Some data contains a hierarchical relationship between
various elements. These data are represented in the form of a Rooted tree graph or simply Tree.
Here node A is called the root of the free.

Graphs:
Sometimes data contain relationships that are not hierarchical in nature. This type of relationship
can be expressed in the form of a graph data structure.

Data structure operations


The data in the data structures are processed by certain operations.
 Traversing: Visiting each data item exactly once. so that items in the records can be
accessed.
 Searching: Finding the location of the record with a given key or value or finding all
records which satisfy given conditions.
 Inserting: Adding a new data item in the given collection of data items.
 Deleting: Removing an existing data item from the given collection of data items.
 Sorting: Arranging records in some logical or numerical order. (Eg: Alphabetic order)
 Merging: Combing the data items of two sorted files into single file in the sorted form.

Application of data structures


Different types of data structures are used for different kinds of purposes.
Applications of Arrays:
 To implement mathematical vectors and matrices.
 To model sets or collections in computer programming.
 To implement other data structures such as lists, queues, stacks, and heaps.
 Sometimes used to emulate in-program dynamic memory allocation.
Applications of Linked lists:
 To implement other data structures such as queues, stacks,trees…etc
 Used for dynamic memory allocation.
 For manipulating polynomials, representing sparse matrices…etc
 A doubly linked list can be used in navigation systems.
 The doubly linked list is also used by various applications to implement Undo and Redo
functionality.
 Advanced data structures like Fibonacci Heap are implemented using circular doubly
linked lists
Applications of Stacks:
The LIFO property of Stack can be useful in the following applications.
 Evaluating Expressions.
 Converting between expressions.
 Backtracking.
 Parsing context-free languages.
 Recursion removal.
 Tree and graph traversal.
Applications of Queues:
The following applications require FIFO storage and are implemented using Queues
 Manage resource sharing such as CPU scheduling, disk scheduling..etc
 When data is sent and received between two processes not necessarily at the same rate.
Applications of Trees:
 To search for an element in a set quickly, Binary Search Trees(BSTs) are used.
 Heap sort is done by a kind of tree called a Heap.
 Tries are modified versions of trees used to store routing information in routers.
 A compiler uses a syntax tree to parse the program you write.
 Shortest path trees and spanning Trees are used in bridges and routers.
Applications of Graphs:
 Transportation networks like the one used by Google Maps.
 Representation of molecular structure.
 Finding the shortest path.
 Airline network.
 Social networks.
Algorithms
Algorithm is a step-by-step procedure, which defines a set of instructions to be executed in a
certain order to get the desired output.
Algorithms are generally created independent of underlying languages, i.e. an algorithm can be
implemented in more than one programming language.
From the data structure point of view, following are some important categories of algorithms
 Search − Algorithm to search an item in a data structure.
 Sort − Algorithm to sort items in a certain order.
 Insert − Algorithm to insert item in a data structure.
 Update − Algorithm to update an existing item in a data structure.
 Delete − Algorithm to delete an existing item from a data structure.
Characteristics of an Algorithm
Not all procedures can be called an algorithm. An algorithm should have the following
characteristics
 Unambiguous − Algorithm should be clear and unambiguous. Each of its steps (or phases),
and their inputs/outputs should be clear and must lead to only one meaning.
 Input − An algorithm should have 0 or more well-defined inputs.
 Output − An algorithm should have 1 or more well-defined outputs, and should match the
desired output.
 Finiteness − Algorithms must terminate after a finite number of steps.
 Feasibility − Should be feasible with the available resources.
 Independent − An algorithm should have step-by-step directions, which should be
independent of any programming code.
Example: Design an algorithm to add two numbers and display the result.
Step 1 − START ADD
Step 2 − get values of a & b
Step 3 − c ← a + b
Step 4 − display c
Step 5 − STOP

Algorithm complexity
 Algorithmic complexity is a measure of how long an algorithm takes to complete a task
or solve a problem.
 It's a numerical function that calculates the amount of time an algorithm takes to produce
a result as a function of the size of the input.
 Algorithmic complexity is calculated asymptotically as n approaches infinity. This means
that an algorithm should compute the result within a finite and practical time bound even
for large values of n.
 Algorithm complexity is concerned about how fast or slow a particular algorithm
performs.
 It evaluates the order of count of operations executed by an algorithm as a function of
input data size.
 Algorithmic complexity can be divided into two types: time complexity and space
complexity. The most common measure of complexity is time complexity.
Time Complexity:
Time taken by the algorithm to solve the problem. It is measured by calculating the iteration
of loops, number of comparisons etc.
 Time complexity is a function describing the amount of time an algorithm takes in
terms of the amount of input to the algorithm.
 “Time” can mean the number of memory accesses performed, the number of
comparisons between integers, the number of times some inner loop is executed, or
some other natural unit related to the amount of real time the algorithm will take.
Space Complexity:
Space taken by the algorithm to solve the problem. It includes space used by necessary input
variables and any extra space (excluding the space taken by inputs) that is used by the
algorithm.
 Space complexity is a function describing the amount of memory(space)an
algorithm takes in terms of the amount of input to the algorithm.
 Space complexity is sometimes ignored because the space used is minimal and/ or
obvious, but sometimes it becomes an issue as time.

Different cases in complexity


 Worst case: Worst case is the function which performs the maximum number of steps on
input data of size n. we calculate the upper bound on the running time of an algorithm
 Average case: Average case is the function which performs an average number of steps
on input data of n elements.
 Best case: Best case is the function which performs the minimum number of steps on
input data of n elements. we calculate the lower bound on the running time of an
algorithm

Example: linear search algorithm, the best-case time complexity is O(1). The average case time
complexity is O(n). The worst-case complexity is O(n). The best case occurs when the search
item is at the beginning of the list. The worst case occurs when the search item is at the end of
the list or not at all.
space-time trade-off
In computer science, a space-time or time-memory trade-off is a way of solving a problem or
calculation in less time by using more storage space (or memory), or by solving a problem in
very little space by spending a long time.

Asymptotic Notation
 Asymptotic notation is a mathematical notation that describes the efficiency of an
algorithm as the input size grows.
 It is used to compare the performance of different algorithms on different inputs.
 it can also be used to predict the performance of an algorithm on a given input size.
There are three main types of asymptotic notation:
 Big O notation
It describes the worst-case running time of an algorithm. It tells you how much time the
algorithm will take in the worst possible scenario. For example, if an algorithm has a
time complexity of O(n^2), it means that the algorithm will take at most n^2 steps to
complete.
 Omega Notation:
It describes the best-case running time of an algorithm. It tells you how little time the
algorithm will take in the best possible scenario. For example, if an algorithm has a
time complexity of Omega(n), it means that the algorithm will take at least n steps to
complete.
 Theta Notation:
It describes the average-case running time of an algorithm. It tells you how much time
the algorithm will take on average. For example, if an algorithm has a time complexity
of Theta(n^2), it means that the algorithm will take on average n^2 steps to complete.

When choosing which asymptotic notation to use, it is important to consider the type of data
structure that is being used. For example, if the data structure is a linked list, then the asymptotic
notation for the worst-case running time would be O(n), because the worst-case scenario is that
the algorithm has to traverse the entire linked list.
Asymptotic notation can be used to compare the performance of different algorithms on different
inputs. For example, if we have two algorithms that both sort a list of numbers:
 Algorithm A:
This algorithm has a time complexity of O(n). This means that the algorithm will take at
most n steps to complete, regardless of the size of the input.
 Algorithm B:
This algorithm has a time complexity of O(n^2). This means that the algorithm will take at
most n^2 steps to complete, regardless of the size of the input.
In this case, Algorithm A is faster than Algorithm B because it has a lower time complexity.

Introduction to strings
In computer terminology, a sequence of characters is called a string.
 Strings are represented by enclosing inside quotation marks.
 A string with length 0 is called an empty string.
 Let A and B be two strings. A string consisting of characters in A followed by characters
in B is called concatenation of A and B. It is often represented as A//B.
Suppose string A = ‘Hello’ and B = ‘World’ then A//B will be ‘HelloWorld’.
 A string Q is called a substring of a string S, if there exist strings P and R such that
S=P//Q//R.
Suppose S=”Evening” then ‘Ev’, ‘ni’, ‘ng’… are the substrings of S.

String storage
Generally, three types of structures are used to store strings.
 Fixed length structure: In a fixed-length structure, each line of print is viewed as a
record. All records have the same length.
 Variable-length storage: Variable length strings can be stored in memory cells either by
using a marker like $$ or \0 to represent the end of the string or by listing the length of
the string as an additional element in a pointer array.
 Linked Storage: We use a linked list to store strings. It helps to easily delete, change,
and insert words, sentences even paragraphs in the text. A linked list is an ordered
sequence of nodes, where each node contains a link which has the address of the next
node in the list. Here each node is assigned one character or a fixed number of characters
String Operations
Substrings
A substring is the basic unit of access in a string. Unlike other array elements, substrings have
their own meaning. To access a substring we need the following information,
 Name of the string.
 Position of the first character of the substring in the given string.
 length of the substring.
Eg. if string A=’WELCOME’, then after SUBSTRING(SUB, A,3,2), the value of SUB will
be ‘LC’.

Indexing
Indexing operation is used to find the position of the first appearance of a string pattern in a
given string. It is also called pattern matching. INDEX operation returns a 0 if there is no such
pattern in that string.
Eg. if A=’MORNING’ then INDEX(A,’NI’) will return, 4, the first appearance of pattern ‘NI’ in
the given string ‘MORNING’.

Concatenation
if A and B are two strings, then the concatenation of two strings, A//B is the string that consisting
of the characters of A followed by the characters of B.
Eg. If A=’GOOD’ and B=’MORNING’ then A//B will be ‘GOODMORNING’

Length
The number of characters in a string is called the length of that string.
Eg. If A=”DECEMBER” then LENGTH(A) will return 8.

Text Processing Operations


The processing of printed matter such as letters and articles are called word processing. Basic
operations associated with word processing are,
 Replacement
 Insertion
 Deletion
Word processing operations can be done using string operations.

Insertion : INSERT(TEXT,POSITION,STRING) operation inserts STRING in TEXT so that


STRING begins in POSITION.For example INSERT(‘ABCDE’,2,’PQR’) = ‘APQRBCDE

Deletion: DELETE(TEXT,POSITION,LENGTH) delete the substring which begins at


POSITION and has length LENGTH.
For example DELETE(‘ABCDEF’,2,3) = ‘AEF’

Replacing: REPLACE(TEXT,PATTERN1,PATTERN2) replaces the first occurrence of


PATTERN1 by PATTERNP2 in TEXT.
For example REPLACE(‘PQRSTU’,’RS’,’X’) = PQXTU

Pattern matching algorithms


Pattern matching
Pattern matching algorithms are the algorithms that are designed to search for specific patterns
within the larger dataset. Commonly used algorithms are:
 Naive string matching algorithm:
This algorithm compares the pattern character by character to the text. It is the simplest
algorithm, but it is also the slowest.
 Knuth-Morris-Pratt (KMP) algorithm:
This algorithm improves upon the naive string matching algorithm by utilizing
information from previous comparisons to avoid unnecessary character comparisons. It
is faster than the naive string matching algorithm, especially for large patterns.
 Boyer-Moore algorithm:
Scans the pattern from right to left using pre-computed tables to determine the next
possible match location.
Applications of pattern matching algorithms
 Finding a phrase in a large text.
 Finding specific shapes within an image.
 Distinguish sound patterns in the spoken language.
 Searching all instances of a particular gene within the largest set of DNA sequences.
 Pattern matching helps us in information retrieval from complex data sets for
implementing fields such as data science, machine learning and bioinformatics.
Naïve Pattern Matching Algorithm
 Simplest method
 It checks for all characters of the main string to the pattern.
 Helpful for smaller text.
 Does not need any pre processing phrases.
 We can find the substring by checking once for the string.
 Does not occupy extra space to perform the operation.
 Time complexity- O(mxn) where m= size of pattern and n=size of main string.
Algorithm:
PAT and TEXT are two strings with length R and S respectively. This algorithm finds
INDEX(P)
ARRAYS
 Array is a container which can hold a fixed number of items and these items should be of
the same type.
 Most of the data structures make use of arrays to implement their algorithms.
 Arrays can be one dimensional or multidimensional.
 Arrays store the entries sequentially. Elements in an array are stored in continuous
locations and are identified using the location of the first element of the array.
 Arrays can hold any type of data, which includes string, integers, Boolean, and so on.
Multi-Dimensional Array
 When the number of dimensions specified is more than one, then it is called as a multi-
dimensional array.
 Multidimensional arrays include 2D arrays and 3D arrays. A two-dimensional array will
be accessed by using the subscript of row and column index.
 In the two-dimensional array the first index specifies the number of rows and the second
index specifies the number of columns. Similarly, in a three-dimensional array, there will
be three dimensions.

Representation of linear array in memory


Arrays can be declared in various ways in different languages. For illustration, let's take C array
declaration.

From the figure Index starts with 0. Array length is 10 which means it can store 10 elements.
Each element can be accessed via its index. For example, we can fetch an element at index 6 as
27.

Representing Two-Dimensional Arrays in Memory


Let M be a two-dimensional a x b array. Although, M is pictured as a rectangular array of
elements with a rows and b columns, the array will be represented in the memory as a block of a
x b sequential memory locations. Specifically, the programming language will store the array M
either column wise or row wise. When the programming language stores an array column wise, it
is known as column-major order and when it stores row wise it is known as row-major order.

Basic Operations
 Traverse − print all the array elements one by one.
 Insertion − Adds an element at the given index.
 Deletion − Deletes an element at the given index.
 Search − Searches an element using the given index or by the value.

Traverse Operation
This operation is to traverse through the elements of an array.
Example: Following program traverses and prints the elements of an array:
#include int main()
{
int age[5]={20,21,19,18,25};
int i;
for(i=0;i<a[i];i++)
printf("%d ", age[i]);
return 0;
}
Output: 20 21 19 18 25
Algorithm:
ARR is the array and UB is the limit of the array.
TRAVERSE(ARR,UB)
1. Set I =0.
2. Repeat steps 3 and 4 while I<UB.
3. Print ARR[i].
4. Set I=I+1
5. Stop.

Insertion operation
Given an array arr of size n, this article tells how to insert an element x in this array arr at a
specific position pos.
Approach:
1. First get the element to be inserted, say x
2. Then get the position at which this element is to be inserted, say pos
3. Then shift the array elements from this position to one position forward, and do this for all the
other elements next to pos.
4. Insert the element x now at the position pos, as this is now empty.

Algorithm
INSERT (arr, n, pos, x)
1. Set i=n
2. Repeat steps 3 and 4 while i>=pos
3. Set arr[i+1]=arr[i]
4. Set i=i-1
5. Set arr[pos]=num
6. Set n=n+1
7. Exit.
Deletion operation
 A user will enter the position at which the array element deletion is required. Deleting an
element does not affect the size of the array.
 It also checks whether deletion is possible or not, for example, if an array contains five
elements and user wants to delete the element at the sixth position, it isn't possible.
 we need to shift array elements which are after the element to be deleted, it is very
inefficient if the size of the array is large or we need to remove elements from an array
repeatedly
Algorithm
Arr is the array with n elements. And pos is a positive integer such that pos<n. This
algorithm deletes the element at a position pos from arr.
DELETE(arr, n, pos, num)
1. set num=arr[pos]
2. set I=pos
3. repeat steps 4 and 5 while i<=n-1
4. arr[i]=arr[i+1]
5. Set i=i+1
6. Set n=n-1
7. Exit.

Sparse Matrix
A matrix is a two-dimensional data object made of m rows and n columns, therefore having
total m x n values. If most of the elements of the matrix have 0 value, then it is called a sparse
matrix.
Why to use Sparse Matrix instead of simple matrix ?
 Storage: There are lesser non-zero elements than zeros and thus lesser memory can be used
to store only those elements.
 Computing time: Computing time can be saved by logically designing a data structure
traversing only non-zero elements.
Example:
00304
00570
00000
02600
Representing a sparse matrix by a 2D array leads to wastage of lots of memory as zeroes in the
matrix are of no use in most of the cases. So, instead of storing zeroes with non-zero elements,
we only store non-zero elements. This means storing non-zero elements with triples- (Row,
Column, value).
Sparse Matrix Representations can be done in many ways following are two common
representations:
1. Array representation
2. Linked list representation
Method 1: Using Arrays:
2D array is used to represent a sparse matrix in which there are three rows named as
 Row: Index of row, where non-zero element is located
 Column: Index of column, where non-zero element is located
 Value: Value of the non zero element located at index – (row,column)

Algorithm
1. Start
2. Set I=0
3. Repeat steps 4 to 7 while I<N
4. Val[I]= Read value of the non-zero element
5. Row[I]= Read row number of the non-zero element
6. Col[I]= Read column number of the non-zero element
7. Set I=I+1
8. Set I=j=0 // display the sparse matrix
9. Repeat steps 10 to 17 while I<p
10. Repeat steps 11 to 16 while j<q
11. Set k=flag=0
12. Repeat steps 13 to 14 while k<N
13. If val[k]!= 0 and Row[k]=I and Col[k]=j then
14. Display val[k]
15. Flag=1
[end of if]
16. Set k=k+1
[end of step 12 loop]
17. If flag=0 then display 0
18. Set j=j+1
[end of step 10 loop]
19.Set I=I+1
[end of step 9 loop]
20.stop
#include<stdio.h>
int main()
{
// Assume 4x5 sparse matrix
int sparseMatrix[4][5] =
{
{0 , 0 , 3 , 0 , 4 },
{0 , 0 , 5 , 7 , 0 },
{0 , 0 , 0 , 0 , 0 },
{0 , 2 , 6 , 0 , 0 }
};
int size = 0;
for (int i = 0; i < 4; i++)
for (int j = 0; j < 5; j++)
if (sparseMatrix[i][j] != 0)
size++;

// number of columns in compactMatrix (size) must be equal to number of


non - zero elements in sparseMatrix
int compactMatrix[3][size];
// Making of new matrix
int k = 0;
for (int i = 0; i < 4; i++)
for (int j = 0; j < 5; j++)
if (sparseMatrix[i][j] != 0)
{
compactMatrix[0][k] = i;
compactMatrix[1][k] = j;
compactMatrix[2][k] = sparseMatrix[i][j];
k++;
}

for (int i=0; i<3; i++)


{
for (int j=0; j<size; j++)
printf("%d ", compactMatrix[i][j]);

printf("\n");
}
return 0;
}

Output
001133
242312
345726
Time Complexity: O(NM), where N is the number of rows in the sparse matrix, and M is the
number of columns in the sparse matrix.

Parallel arrays
A parallel array is a structure that contains multiple arrays. Each of these arrays are of the same
size and the array elements are related to each other. All the elements in a parallel array represent
a common entity. An example of parallel arrays is as follows –
employee_name = { Harry, Sally, Mark, Frank, Judy }
employee_salary = {10000, 5000, 20000, 12000, 5000}

In the above example, the name and salary of 5 different employees is stored in 2 arrays.

You might also like