CSC462 Artificial Intelligence Lab Manual Version3
CSC462
Lab Manual
Name
Registration Number
Class
Instructor’s Name
Introduction
The "Artificial Intelligence Lab Manual" is a comprehensive resource designed to
accompany a lab on artificial intelligence. This lab manual aims to provide students
with practical, hands-on experience in the exciting field of AI. Through a series of
guided exercises and small projects, students will explore core AI concepts,
algorithms, and tools, enabling them to develop a strong foundation in this rapidly
evolving discipline. In order to achieve the set goal, the lab will be divided into three
main modules.
The labs constitute 25% of the total marks for this course. During the labs you will
work individually. You will be graded for this and the ‘In-Lab’ tasks during the
in-lab viva. You will complete the ‘Post-Lab’ section of each lab before coming to
the next week’s lab.
You are not allowed to wander in the lab or consult other groups when performing
experiments. Similarly, the lab reports must contain original efforts. CUI has a
zero tolerance anti-plagiarism policy.
Apart from these weekly labs, you will complete a semester project in groups of
three students. The Lab Mid will be conducted during the lab session. The Final
Project / Final Exam will be graded as the Lab Terminal. The grading policy is
already discussed in the Course Description File.
Acknowledgement
The labs for CSC462 Artificial Intelligence were designed by Ms. Sadaf Iqbal as a
first version. Afterwards, Dr. Atif Shakeel and Dr. Asfar Yaqub completed the
manual. The third version was completed by Dr. Ali Raza Shahid, who restructured
the labs to help students apply advanced AI learning methods in the semester
project.
History of Revision
Lab # Description
Lab 1 Python Introduction – Data Structures and Loops
Lab 2 Python Introduction – Functions, Tuples, Lists and Dictionaries
Lab 3 Object Oriented Design & Breadth First Search Algorithm in Python
Lab 4 Depth First Search for Graph and Tree Traversal
Lab 5 Greedy Best First Search / A* Search for Graphs
Lab 6 Hill Climb Search Algorithm Implementation
Lab 7 Genetic Algorithm
Lab 8 Constraint Satisfaction Problem
Lab 9 Linear Regression and Gradient descent
Lab 10 Logistic Regression / Classification in Machine learning
Lab 11 Decision Tree Classification / KNN Classification
Lab 12 Convolution Neural Networks
Objectives:
• Easy to use, making it simple to get your first programs working.
• Easy to learn, making it an excellent choice for beginners.
• Runs everywhere, including Mac OS X, Windows, Linux and Unix.
Pre-Lab Reading 1:
https://www.geeksforgeeks.org/how-to-install-jupyter-notebook-in-windows/
https://colab.research.google.com
Pre-Lab Reading 2:
As a pre-lab reading activity, read Chapters 3-5 from the book “Introduction to Programming
Using Python by Y. Liang (Pearson, 2013)” to gain an insight about python programming and
its fundamentals.
Programming is simply the act of entering instructions for the computer to perform. These
instructions might crunch some numbers, modify text, look up information in files, or
communicate with other computers over the Internet. All programs use basic instructions as
building blocks. Here are a few of the most common ones, in English:
You can combine these building blocks to implement more intricate decisions, too. Programming
is a creative task, somewhat like constructing a castle out of LEGO bricks. You start with a basic
idea of what you want your castle to look like and inventory your available blocks. Then you
start building. Once you’ve finished building your program, you can pretty up your code just like
you would your castle.
Python refers to the Python programming language (with syntax rules for writing what is
considered valid Python code) and the Python interpreter software that reads source code (written
in the Python language) and performs its instructions. The name Python comes from the surreal
British comedy group Monty Python, not from the snake. Python programmers are affectionately
called Pythonistas.
a = 0
while a < 10:
    a = a + 1
    print(a)
▪ The 'for' loop
for i in range(1, 5):
    print(i)
else:
    print('The for loop is over')
In-Lab Task 1:
In-Lab Task 2:
1. Python recognizes strings through quotes (single or double). Anything inside quotes is a
string to the Python interpreter.
2. Type hello, press Enter. An error message will be displayed as Python interpreter does
not understand this as a string.
3. Type ‘hello’ and press Enter. Hello will be displayed.
4. Type ‘Quote me on this!’ and press Enter. Same string will be displayed.
5. Type “What’s your name?” and press Enter. What’s your name? will be printed on the
screen.
6. You can specify multi-line strings using triple quotes – """ or '''. Type the following text
and press Enter.
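The quoting rules above can also be tried in a script; a small sketch (the variable names are our own):

```python
s1 = 'hello'                        # single quotes
s2 = "What's your name?"            # double quotes keep the apostrophe
s3 = """This is a multi-line
string written with triple quotes."""
print(s1)
print(s2)
print(s3)
```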
In-Lab Task 3:
1. The interpreter acts as a simple calculator: you can type an expression at it and it will
write the value. Expression syntax is straightforward: the operators + (addition), -
(subtraction), * (multiplication) and / (division) work just like in most other languages.
Parentheses (()) can be used for grouping.
2. Type in 2 + 2 and press Enter. Python will output its result i.e. 4.
3. Try following expressions in the shell.
a. 50 – 4
b. 23.5 – 2.0
c. 23 – 18.5
d. 5 * 6
e. 2.5 * 10
f. 2.5 * 2.5
g. 28 / 4 (note: division returns a floating point number)
h. 26 / 4
i. 23.4 / 3.1
In-Lab Task 4:
Get an integer answer from division operation. Also get remainder of a division operation
in the output.
1. Division (/) always gives a float answer. There are different ways to get a whole number
as an answer.
2. Use // operator:
a. Type 28 // 4 and press Enter. The answer is a whole number.
b. Type 26 // 4 and press Enter.
3. Use the int()/float() cast operators: these convert between the two interchangeable types.
a. Type int(26/4) and press Enter. The answer is a whole number.
b. Type int(28/4), Type float(28/4); press Enter.
4. The modulus (or mod) % operator is used to get the remainder as an output (division /
operator returns the quotient).
a. Type 28 % 4. Press Enter. 0 will be the result.
b. Type 26 % 4. Press Enter. 2 will be the result.
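The three ways of getting whole numbers and remainders above can be summarized in one short script:

```python
print(28 // 4)        # floor division gives a whole number: 7
print(26 // 4)        # 6 (the fractional part is dropped)
print(int(26 / 4))    # casting truncates 6.5 to 6
print(float(28 / 4))  # 7.0
print(28 % 4)         # remainder: 0
print(26 % 4)         # remainder: 2
```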
In-Lab Task 5:
1. The multiplication (*) operator can be used for calculating powers of a number.
However, if the power is big, the task will be tedious. For calculating powers of a
number, Python uses ** operator.
2. Type following and obtain the results of above expressions.
a. 4 ** 3
b. 4 ** 10
c. 4 ** 29
d. 4 ** 150
e. 4 ** 1000
In-Lab Task 6:
Write following math expressions. Solve them by hand using operators’ precedence. Calculate their
answers using Python. Match the results.
The table below shows the operator precedence in Python (from Highest to Lowest).
In-Lab Task 7:
In Python, as in many other programming languages, you can get user input. In Python the input()
function asks for keyboard input from the user. The input function prompts with text if a parameter is
given. The function reads input from the keyboard, converts it to a string and removes the newline
(Enter). Type and experiment with the script below.
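A minimal sketch of working with user input; the function greet and the prompt text are our own illustration. Note that input() always returns a string, so numeric input must be converted with int() or float():

```python
def greet(name):
    # build a greeting from whatever the user typed
    return "Hello, " + name + "!"

# In the lab you would read the name from the keyboard:
#   name = input("Enter your name: ")
print(greet("Ada"))   # Hello, Ada!
```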
In-Lab Task 9:
Let us take an integer from user as input and check whether the given value is even or not.
1. Create a new Python file from Python Shell and type the following code.
2. Run the code by pressing F5.
Let us modify the code to take an integer from user as input and check whether the given value is
even or odd. If the given value is not even then it means that it will be odd. So here we need to use
an if-else statement as demonstrated below.
1. Create a new Python file from Python Shell and type the following code.
2. Run the code.
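One possible version of the even/odd check, written as a function so it can be tried with fixed values (in the lab you would read the value with int(input())):

```python
def even_or_odd(n):
    # the if branch runs for even values, the else branch otherwise
    if n % 2 == 0:
        return "Even"
    else:
        return "Odd"

print(even_or_odd(10))  # Even
print(even_or_odd(7))   # Odd
```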
Calculate the sum of all the values between 0-10 using while loop.
1. Create a new Python file from Python Shell and type the following code.
2. Run the code
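A sketch of the while-loop summation (the variable names are our own):

```python
total = 0
i = 0
while i <= 10:        # visit every value from 0 to 10
    total = total + i
    i = i + 1
print(total)          # 55
```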
Accept 5 integer values from user and display their sum. Draw flowchart before coding in
python.
1. Create a new Python file from Python Shell and type the following code.
2. Run the code.
Write a Python code to keep accepting integer values from user until 0 is entered. Display sum
of the given values.
1. Create a new Python file from Python Shell and type the following code.
2. Run the code.
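One way to structure the sentinel-controlled loop; here the keyboard input is simulated with a list so the sketch runs on its own (replace the list with repeated int(input()) calls in the lab):

```python
def sum_until_zero(values):
    total = 0
    for v in values:
        if v == 0:        # 0 is the sentinel: stop accepting values
            break
        total += v
    return total

print(sum_until_zero([4, 7, 9, 0, 5]))  # 20 -- the 5 after the 0 is ignored
```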
Write a Python code to accept an integer value from user and check that whether the given
value is prime number or not.
Solution:
1. Create a new Python file from Python Shell and type the following code.
2. Run the code.
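A possible prime-checking sketch using trial division up to the square root (in the lab, read n with int(input())):

```python
def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:      # divisors above sqrt(n) come in pairs
        if n % d == 0:
            return False   # found a divisor, so n is not prime
        d += 1
    return True

print(is_prime(13))  # True
print(is_prime(15))  # False
```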
Write a Python code to accept marks of a student from 1-100 and display the grade according
to the following formula.
Post-Lab Task 2:
Write a program that takes a number from user and calculate the factorial of that number.
Post-Lab Task 3:
Fibonacci series is that when you add the previous two numbers the next number is formed.
You have to start from 0 and 1.
E.g., 0+1=1 → 1+1=2 → 1+2=3 → 2+3=5 → 3+5=8 → 5+8=13
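The rule above translates directly into code; one possible sketch that builds the first count numbers of the series:

```python
def fibonacci(count):
    # start from 0 and 1; each next number is the sum of the previous two
    series = [0, 1]
    while len(series) < count:
        series.append(series[-1] + series[-2])
    return series[:count]

print(fibonacci(8))  # [0, 1, 1, 2, 3, 5, 8, 13]
```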
Lab Assessment
Pre-Lab            /1
In-Lab             /5
Post-Lab           /4
Total              /10
Data Analysis      /4
Data Presentation  /4
Writing Style      /4
Objectives:
Pre-Lab Reading 1:
As a pre-lab activity, read Chapters 6, 10 and 14 from the book “Introduction to Programming Using
Python - Y. Liang (Pearson, 2013)” to gain an insight about python programming and its
fundamentals.
Python provides different types of data structures as sequences. A sequence holds
more than one value, and each value has its own index. The first value has index 0 in
Python, the second value has index 1, and so on. These indices are used to access a
particular value in the sequence.
Python offers different types of sequences. Lists are the most important type of sequence used
in Python. A list is a collection of objects of the same or different types. These objects are
separated by commas and enclosed in square brackets. The following activities show
how lists are used in Python. A tuple is a sequence of immutable Python objects. Tuples are
sequences, just like lists. The differences between tuples and lists are that tuples cannot be changed,
unlike lists, and that tuples use parentheses whereas lists use square brackets. The data type "set", which
is a collection type, has been part of Python since version 2.4. A set contains an unordered collection
of unique and immutable objects. The set data type is, as the name implies, a Python implementation
of sets as they are known from mathematics. This explains why sets, unlike lists or tuples, can't
have multiple occurrences of the same element. A dictionary is a collection which is unordered,
changeable and indexed. In Python, dictionaries are written with curly brackets, and they have keys
and values.
function_name(parameters)
a = multiplybytwo(70)
# a is now 140
▪ Define a Function
def function_name(parameter_1, parameter_2):
    # this is the code in the function
▪ range() Function:
If you do need to iterate over a sequence of numbers, the built-in function range() comes in
handy. It generates arithmetic progressions (in Python 3 it returns a range object, which you
can convert to a list):
>>> list(range(5, 10))
[5, 6, 7, 8, 9]
It is possible to let the range start at another number, or to specify a different increment
(even negative; sometimes this is called the ‘step’):
>>> list(range(0, 10, 3))
[0, 3, 6, 9]
Lists are what they seem - a list of values. Each one of them is numbered, starting from zero.
You can remove values from the list, and add new values to the end. Example: Your many cats'
names. Compound data types, used to group together other values. The most versatile is the list,
which can be written as a list of comma-separated values (items) between square brackets. List
items need not all have the same type
cats = ['Tom', 'Snappy', 'Kitty', 'Jessie', 'Chester']
print(cats[2])
cats.append('Catherine')
Compound datatype:
>>> a = ['spam', 'eggs', 100, 1234]
>>> a[1:-1]
['eggs', 100]
>>> a[:2] + ['bacon', 2*2]
['spam', 'eggs', 'bacon', 4]
>>> 3*a[:3] + ['Boo!']
['spam', 'eggs', 100, 'spam', 'eggs', 100, 'spam', 'eggs', 100, 'Boo!']
>>> a= ['spam', 'eggs', 100, 1234]
>>> a[2] = a[2] + 23
>>> a
['spam', 'eggs', 123, 1234]
Replace some items:
... a[0:2] = [1, 12]
>>> a
[1, 12, 123, 1234]
Remove some:
... a[0:2] = []
>>> a
[123, 1234]
Length of list:
>>> a = ['a', 'b', 'c', 'd']
>>> len(a)
4
Nest lists:
>>> q = [2, 3]
>>> p = [1, q, 4]
>>> len(p)
3
>>> p[1]
[2, 3]
Functions of lists (you may also explore the online python documentation for more
details):
list.append(x): Add an item to the end of the list; equivalent to a[len(a):] = [x].
list.extend(L): Extend the list by appending all the items in the given list;equivalent to a[len(a):]
= L.
list.insert(i, x): Insert an item at a given position. The first argument is the index of the element
before which to insert, so a.insert(0, x) inserts at the front of the list, and a.insert(len(a), x) is
equivalent to a.append(x).
list.remove(x): Remove the first item from the list whose value is x. It is an error if there is no
such item.
list.pop([i]): Remove the item at the given position in the list, and return it. If no index is
specified, a.pop() removes and returns the last item in the list.
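A quick demonstration of the list methods listed above:

```python
a = [1, 2, 3]
a.append(4)          # a is now [1, 2, 3, 4]
a.extend([5, 6])     # a is now [1, 2, 3, 4, 5, 6]
a.insert(0, 0)       # a is now [0, 1, 2, 3, 4, 5, 6]
a.remove(3)          # removes the first 3: [0, 1, 2, 4, 5, 6]
last = a.pop()       # removes and returns the last item, 6
print(a)             # [0, 1, 2, 4, 5]
print(last)          # 6
```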
months = ('January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September',
'October', 'November', 'December')
index Value
0 Jan
1 Feb
2 Mar
3 April
4 May
5 Jun
6 Jul
7 Aug
8 Sep
9 Oct
10 Nov
11 Dec
>>> fruit = 'apple'
>>> 'p' in fruit
True
>>> 'z' in fruit
False
In Python, the word is called a 'key', and the definition a 'value'. The values in a dictionary
aren't numbered and aren't in any specific order; instead, the keys play the role that indices
play in a list. You can add, remove, and modify the values in dictionaries. Example: a
telephone book.
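A minimal telephone-book dictionary (the names and numbers here are made up):

```python
phonebook = {'Alice': '555-0101', 'Bob': '555-0102'}
phonebook['Carol'] = '555-0103'   # add a new key/value pair
phonebook['Bob'] = '555-0199'     # modify an existing value
del phonebook['Alice']            # remove an entry
print(phonebook['Bob'])           # 555-0199
print('Alice' in phonebook)       # False
```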
In-Lab Task 1:
Use loops to accept 5 values from user and store them in a list. Display all the values (objects) of
the list.
In-Lab Task 2:
Repeat the above code by accepting 5 integer values from user. Store these values in a list and
display the sum of given values.
You will get the following output.
In-Lab Task 3:
Accept 5 integer values from user. Store these values in a list and display the list in ascending
order.
In-Lab Task 4:
In-Lab Task 5:
Write a Python code to accept a list from user and find a required element in it.
In-Lab Task 6:
Write a function called say_hello that takes in a person’s name as a parameter. The function should
print a greeting message with the person’s name. Then call the function three times with three
different names.
In-Lab Task 7:
For example: "dad" is the same in forward or reverse direction. Another example is
"aibohphobia" which literally means, an irritable fear of palindromes.
Write a function in python that receives a string and returns True if that string is a palindrome and
False otherwise. Remember that difference between upper and lower case characters are ignored
during this determination.
In-Lab Task 8:
Imagine two matrices given in the form of 2D lists as under;
a = [[1, 0, 0], [0, 1, 0], [0, 0, 1] ]
b = [[1, 2, 3], [4, 5, 6], [7, 8, 9] ]
Write a python code that finds another matrix/2D list C that is the product of a and b, i.e.,
C = a * b
In-Lab Task 9:
A closed polygon with N sides can be represented as a list of tuples of N connected coordinates, i.e.,
[ (x1,y1), (x2,y2), (x3,y3), . . . , (xN,yN) ]. A sample polygon with 6 sides (N=6) is shown below.
Write a python function that takes a list of N tuples as input and returns the perimeter of the polygon.
Remember that your code should work for any value of N.
Imagine two sets A and B containing numbers. Without using built-in set functionalities, write
your own function that receives two such sets and returns another set C which is a symmetric
difference of the two input sets. (A symmetric difference between A and B will return a set C
which contains only those items that appear in one of A or B. Any items that appear in both
sets are not included in C). Now compare the output of your function with the following built-
in functions/operators.
• A.symmetric_difference(B)
• B.symmetric_difference(A)
• A^B
• B^A
Create a Python program that contains a dictionary of names and phone numbers. Use a tuple of
separate first and last name values for the key field. Initialize the dictionary with at least three
names and numbers. Ask the user to search for a phone number by entering a first and last name.
Display the matching number if found, or a message if not found.
Post-Lab Task 1:
Create two lists based on the user values. Merge both the lists and display in sorted order.
Post-Lab Task 2:
Repeat the above activity to find the smallest and largest element of the list. (Suppose all the
elements are integer values)
Post-Lab Task 3:
The derivative of a function 𝑓(𝑥) is a measurement of how quickly the function 𝑓 changes with
respect to change in its domain 𝑥. This measurement can be approximated by the following
relation,
    f'(x) ≈ (f(x + h) − f(x)) / h
where h represents a small increment in x. You have to prove the following relation,
    d/dx sin(x) = cos(x)
Imagine x being a list that goes from −pi to pi with an increment of 0.001. You can approximate
the derivative by using the following approximation,
    d/dx sin(x) ≈ (sin(x + h) − sin(x)) / h
In your case, assume h = 0.001. That is, at each point in x, compute the right hand side of the
above equation and compare whether the output value is equivalent to cos(x). Also print the
corresponding values of d/dx sin(x) and cos(x) for every point. Type ‘’from math import *’’ at the
start of your program to use predefined values of pi, and sin and cos functions. What happens if
you increase the interval h from 0.001 to 0.01 and then to 0.1?
Moreover, also plot your results using the matplotlib library (Hint: See online documentation and
examples)
Lab Assessment
Pre-Lab            /1
In-Lab             /5
Post-Lab           /4
Total              /10
Data Analysis      /4
Data Presentation  /4
Writing Style      /4
LAB #3: Object Oriented Design & Breadth First Search Algorithm in
Python
Objectives
This lab will introduce students to object-oriented programming and search problems. We
will first start by representing problems in terms of a state space graph using classes. Given a
state space graph, starting state and a goal state, students will then perform a basic Breadth First
Search solution within that graph. The output will be a set of actions or a path that will begin
from initial state/node and end in the goal node.
Pre-Lab Reading 1:
The word 'class' can be used when describing the code where the class is defined.
• A class is like a
– Prototype
– Blue-print
– An object creator
• A class defines potential objects
– What their structure will be
– What they will be able to do
• Objects are instances of a class
– An object is a container of data: attributes
– An object has associated functions: methods
Syntax:
# Defining a class
class class_name:
    [statement 1]
    [statement 2]
    [statement 3]
    [etc]
Example1:
class MyClass:
    i = 12345
    def f(self):
        return 'hello world'

x = MyClass()
print(x.i)
print(x.f())
Example2:
class Complex:
    def __init__(self, realpart, imagpart):
        self.r = realpart
        self.i = imagpart

x = Complex(3.0, -4.5)
print(x.r, " ", x.i)
Example3:
class Shape:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    description = "This shape has not been described yet"
    author = "Nobody has claimed to make this shape yet"
    def area(self):
        return self.x * self.y
    def perimeter(self):
        return 2 * self.x + 2 * self.y
    def describe(self, text):
        self.description = text
    def authorName(self, text):
        self.author = text
    def scaleSize(self, scale):
        self.x = self.x * scale
        self.y = self.y * scale

a = Shape(3, 4)
print(a.area())
Inheritance Example:
class Square(Shape):
    def __init__(self, x):
        self.x = x
        self.y = x

class DoubleSquare(Square):
    def __init__(self, y):
        self.x = 2 * y
        self.y = y
    def perimeter(self):
        return 2 * self.x + 3 * self.y
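To see the inheritance in action, the classes above can be exercised like this (the definitions are repeated so the snippet runs on its own):

```python
class Shape:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def area(self):
        return self.x * self.y
    def perimeter(self):
        return 2 * self.x + 2 * self.y

class Square(Shape):
    def __init__(self, x):
        self.x = x
        self.y = x

class DoubleSquare(Square):
    def __init__(self, y):
        self.x = 2 * y
        self.y = y
    def perimeter(self):          # overrides the parent's method
        return 2 * self.x + 3 * self.y

s = Square(5)
print(s.area())        # 25, area() is inherited from Shape
d = DoubleSquare(3)    # x = 6, y = 3
print(d.perimeter())   # overridden: 2*6 + 3*3 = 21
```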
Create a class named basic_calc with the following attributes and methods: two integers (values
are passed at instance creation) and different methods such as addition, subtraction, division and
multiplication.
Create another class inherited from basic_calc, named s_calc, which should have the following
additional methods: factorial, x_power_y, log, ln, etc.
class child_class(parent_class):
    def __init__(self, x):
        # it will override the __init__ function from the parent class
        # additional methods can be defined
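One possible sketch of the two calculator classes described above; the exact method names (addition, x_power_y, ...) are our reading of the task, not a fixed specification:

```python
import math

class basic_calc:
    def __init__(self, a, b):       # values passed at instance creation
        self.a = a
        self.b = b
    def addition(self):
        return self.a + self.b
    def subtraction(self):
        return self.a - self.b
    def multiplication(self):
        return self.a * self.b
    def division(self):
        return self.a / self.b

class s_calc(basic_calc):           # inherits all the basic methods
    def factorial(self):
        return math.factorial(self.a)
    def x_power_y(self):
        return self.a ** self.b
    def log(self):
        return math.log10(self.a)
    def ln(self):
        return math.log(self.a)

c = s_calc(4, 2)
print(c.addition())   # 6
print(c.x_power_y())  # 16
print(c.factorial())  # 24
```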
As pre-lab activity, read Chapter 3 from the book (Artificial Intelligence, A Modern Approach by
Peter Norvig, 3rd edition) to know the basics of search algorithms.
Graph Theory
Graph theory and in particular the graph ADT (abstract data-type) is widely explored and
implemented in the field of Computer Science and Mathematics. Consisting of vertices (nodes)
and the edges (optionally directed/weighted) that connect them, the data-structure is effectively
able to represent and solve many problem domains. One of the most popular areas of algorithm
design within this space is the problem of checking for the existence or (shortest) path between
two or more vertices in the graph. Properties such as edge weighting and direction are two such
factors that the algorithm designer can take into consideration. In this lab we will explore
two of the simpler available algorithms, Depth-First and Breadth-First search, to achieve the
goals highlighted below:
And in the case of BFS, return the shortest path (length measured by number of path edges).
In our illustration, which is a pictorial representation of a graph, the node "a" is connected
with the node "c", but "a" is not connected with "b". The connecting line between two nodes is
called an edge. If the edges between the nodes are undirected, the graph is called an undirected
graph. If an edge is directed from one vertex (node) to another, the graph is called a directed
graph. A directed edge is called an arc.
Python has no built-in data type or class for graphs, but it is easy to implement them in Python.
One data type is ideal for representing graphs in Python, i.e. dictionaries.
The keys of the dictionary above are the nodes of our graph. The corresponding values are lists
of the nodes which are connected by an edge. There is no simpler and more elegant way to
represent a graph.
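For instance, a graph like the one in the illustration can be written as a dictionary of adjacency lists; the exact edge set below is only indicative of the idea ("a" is connected to "c" but not to "b", and "f" is isolated):

```python
# an undirected graph as a dictionary of adjacency lists
graph = {
    'a': ['c'],
    'b': ['c', 'd'],
    'c': ['a', 'b', 'd', 'e'],
    'd': ['b', 'c'],
    'e': ['c'],
    'f': []          # an isolated vertex has an empty list
}
print(graph['a'])    # the nodes connected to 'a' by an edge
```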
Paths in Graph
We want to find now the shortest path from one node to another node.
▪ Adjacent vertices:
Two vertices are adjacent when they are both incident to a common edge.
▪ Path in an undirected Graph:
A path in an undirected graph is a sequence of vertices P = ( v1, v2, ..., vn ) ∈ V x V x
... x V such that vi is adjacent to v{i+1} for 1 ≤ i < n. Such a path P is called a path of
length n from v1 to vn.
▪ Simple Path:
A path with no repeated vertices is called a simple path.
Degree of a Graph
The degree of a vertex v in a graph is the number of edges connecting it, with loops counted
twice. The degree of a vertex v is denoted deg(v). The maximum degree of a graph G, denoted
by Δ(G), and the minimum degree of a graph, denoted by δ(G), are the maximum and minimum
degree of its vertices.
In the graph on the right side, the maximum degree is 5 at vertex c and the minimum degree is 0,
i.e., the isolated vertex f.
Pre-Lab Reading Task 5:
Sometimes, very different-sounding problems turn out to be similar when you think about
how to solve them. What do Pac-Man, the royal family of Britain, and driving to Orlando have
in common? They all involve route-finding or path-search problems:
How is the current Prince William related to King William III, who endowed the College of
William and Mary in 1693?
What's the best way to drive from Dallas, Texas to Orlando, Florida?
We have to be given some information to answer any of these questions. This information about
each question can be represented in terms of graph. This lab will enable students to represent
problems in terms of graph and then finding a solution in terms of a path using breadth first search
(BFS).
In-Lab Task 1:
Consider a toy problem that can be represented as a following graph. How would you represent this
graph in python?
Solution:
Remember a node of a tree can be represented by four attributes. We consider node as a class
having four attributes namely,
1- State of the node
2- Parent of the node
3- Actions applicable to that node
4- The total path cost of that node starting from the initial state until that particular node is
reached
We can now implement this class in a dictionary. This dictionary will represent our state space
graph. As we will traverse through the graph, we will keep updating parent and cost of each node.
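A sketch of that class and dictionary; the attribute names and the example edges are our own choice:

```python
class Node:
    def __init__(self, state, parent, actions, totalCost):
        self.state = state          # 1- state of the node
        self.parent = parent        # 2- parent of the node
        self.actions = actions      # 3- actions applicable to the node
        self.totalCost = totalCost  # 4- path cost from the initial state

# a hypothetical state space graph built from Node objects
graph = {
    'A': Node('A', None, ['B', 'C'], 0),
    'B': Node('B', None, ['A', 'D'], 0),
    'C': Node('C', None, ['A', 'F'], 0),
    'D': Node('D', None, ['B'], 0),
    'F': Node('F', None, ['C'], 0),
}
print(graph['A'].actions)   # ['B', 'C']
```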
In-Lab Task 2:
For the graph in previous activity, imagine node A as starting node and your goal is to reach F.
Keeping breadth first search in mind, describe a sequence of actions that you must take to reach that
goal state.
Solution:
Remember that in theory class, we discussed the following implementation of breadth first search.
What follows is a Python implementation of the above mentioned algorithm,
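A compact sketch of that implementation; for brevity the graph here is a plain adjacency dictionary rather than the Node class, and the example graph is hypothetical:

```python
from collections import deque

def bfs(graph, initial, goal):
    frontier = deque([initial])   # FIFO queue of states to expand
    parent = {initial: None}      # also serves as the "seen" set
    while frontier:
        current = frontier.popleft()
        if current == goal:
            path = []             # reconstruct by walking back through parents
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for child in graph[current]:
            if child not in parent:
                parent[child] = current
                frontier.append(child)
    return None                   # no path exists

graph = {'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'F'],
         'D': ['B'], 'F': ['C']}
print(bfs(graph, 'A', 'F'))   # ['A', 'C', 'F']
```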
In-Lab Task 3:
Change initial state to D and set goal state as C. What will be resulting path of BFS search?
In-Lab Task 4:
Imagine going from Arad to Bucharest in the following map. Implement a BFS to find the
corresponding path.
Post-Lab Task 2:
Consider a maze as shown below. Each empty tile represents a separate node in the graph. There
are maximum of four possible actions i.e., to move up, down, left or right on any given tile/node.
Using BFS, find out how to get out of the maze if you’re in the start position depicted below.
CRITICAL ANALYSIS/CONCLUSION
Lab Assessment
Pre-Lab            /1
In-Lab             /5
Post-Lab           /4
Total              /10
Data Analysis      /4
Data Presentation  /4
Writing Style      /4
This lab will introduce students to search problems. We will first start by representing problems in
terms of state space graph. Given a state space graph, starting state and a goal state, students will
then perform a basic Depth First Search solution within that graph. The output will be a set of
actions or a path that will begin from initial state/node and end in the goal node. Students will also
see the working of Uniform Cost Search.
As pre-lab activity, read Chapter 3 from the book (Artificial Intelligence, A Modern Approach by
Peter Norvig, 3rd edition) to know the basics of search algorithms.
In the previous lab we studied breadth first search, which is the most naïve way of searching
trees. The path returned by breadth first search is not optimal when edge costs differ. Uniform
cost search always returns the optimal path. In this lab students will be able to implement a
uniform cost search approach. Students will also be able to apply depth first search to their
problems.
In-Lab Task 1:
Consider a toy problem that can be represented as a following graph. How would you represent this
graph in python?
Remember a node of a tree can be represented by four attributes. We consider node as a class
having four attributes namely,
1. State of the node
2. Parent of the node
3. Actions applicable to that node (In the book/theory class, this attribute referred to the
parent action that generated current node, however in this lab we will use this
attribute to refer to all possible children states that are achievable from current state).
4. The total path cost of that node starting from the initial state until that particular node
is reached
We can now implement this class in a dictionary. This dictionary will represent our state space
graph. As we will traverse through the graph, we will keep updating parent and cost of each node.
In-Lab Task 2:
For the graph in previous activity, imagine node A as starting node and your goal is to reach F.
Keeping depth first search in mind, describe a sequence of actions that you must take to reach
that goal state.
Solution: Remember that we can implement depth first search simply by using LIFO approach
instead of FIFO that was used in breadth first search. Additionally we also don’t keep leaf nodes
(nodes without children) in explored set.
Now the function definition is complete which can be called as follows,
Notice the difference in two portions of the code between breadth first search and depth first
search. In the first we just pop out the last entry from the queue, and in the 2nd difference we
delete leaf nodes from the graph.
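A minimal DFS sketch that highlights exactly that first difference: the frontier is used as a LIFO stack (pop the last entry) instead of a FIFO queue. The leaf-node deletion detail is omitted here, and the example graph is hypothetical:

```python
def dfs(graph, initial, goal):
    frontier = [initial]          # LIFO stack instead of a FIFO queue
    parent = {initial: None}
    while frontier:
        current = frontier.pop()  # pop the LAST entry
        if current == goal:
            path = []             # reconstruct by walking back through parents
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for child in graph[current]:
            if child not in parent:
                parent[child] = current
                frontier.append(child)
    return None

graph = {'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'F'],
         'D': ['B'], 'F': ['C']}
print(dfs(graph, 'A', 'F'))
```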
In-Lab Task 3:
Change initial state to D and set goal state as C. What will be resulting path of BFS search?
What will be the sequence of nodes explored?
In-Lab Task 4:
Imagine the same tree but this time we also mention the cost of each edge.
Implement a uniform cost solution to find the path from C to B.
Solution:
First we modify our graph structure so that in Node.actions is an array of tuples where each tuple
contains a vertex and its associated weight.
We also modify the frontier format which will now be a dictionary. This dictionary will contain
each node (the state of the node will act as a key and its parent and accumulated cost from the
initial state will be two attributes of a particular key). We now define a function which will give the
node/key for which the cost is minimum. This will be implementation of pop method from the
priority queue.
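A sketch of uniform cost search following that frontier format; find_min plays the role of the priority-queue pop, and the example weighted graph is hypothetical:

```python
def find_min(frontier):
    # return the state in the frontier with the smallest accumulated cost
    return min(frontier, key=lambda state: frontier[state][1])

def ucs(graph, initial, goal):
    # graph maps a state to a list of (neighbour, edge_weight) tuples
    frontier = {initial: (None, 0)}   # state -> (parent, accumulated cost)
    explored = {}
    while frontier:
        state = find_min(frontier)
        parent_state, cost = frontier.pop(state)
        explored[state] = (parent_state, cost)
        if state == goal:
            path = [state]            # walk back through the parents
            while explored[path[-1]][0] is not None:
                path.append(explored[path[-1]][0])
            return path[::-1], cost
        for child, weight in graph[state]:
            new_cost = cost + weight
            if child in explored:
                continue
            # keep only the cheapest known route to each frontier state
            if child not in frontier or frontier[child][1] > new_cost:
                frontier[child] = (state, new_cost)
    return None

graph = {'C': [('A', 2), ('F', 5)], 'A': [('C', 2), ('B', 9)],
         'F': [('C', 5), ('B', 1)], 'B': [('A', 9), ('F', 1)]}
print(ucs(graph, 'C', 'B'))   # (['C', 'F', 'B'], 6)
```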
In-Lab Task 5:
Imagine going from Arad to Bucharest in the following map. Your goal is to minimize the distance
mentioned in the map during your travel. Implement a uniform cost search to find the
corresponding path.
Post-Lab Task 1:
Create a graph and then set initial and goal states such that the number of nodes visited for BFS is
smaller than that in DFS. Now modify the initial and goal state such that the number of nodes
visited for BFS is larger than that in DFS.
CRITICAL ANALYSIS/CONCLUSION
Lab Assessment
Pre-Lab            /1
In-Lab             /5
Post-Lab           /4
Total              /10
Data Analysis      /4
Data Presentation  /4
Writing Style      /4
This lab will introduce students to heuristic and local search methods. In particular we will implement
a particular variant of heuristic search known as A* search.
• How to represent problems in terms of state space graph
• How to find a solution of those problems using A* search.
Pre-Lab Reading 1:
As pre-lab activity, read Chapter 3 from the book (Artificial Intelligence, A Modern Approach by
Peter Norvig, 3rd edition) to know the basics of search algorithms.
So far we have studied uninformed search strategies. There can also be occasions where we are
given some extra information related to a particular goal in the form of heuristics. We can use such
heuristics in our search strategy. A particular form of this strategy, known as A* search, uses the
total cost of a solution via a particular node as that node’s evaluation criterion. We will see its
implementation in detail. The A* search algorithm is one of the best and most popular techniques
used in path-finding and graph traversals.
Informally speaking, the A* search algorithm, unlike other traversal techniques, has “brains”:
it is a smart algorithm, and this separates it from the conventional algorithms. This will become
clear in the sections below. It is also worth mentioning that many games and web-based maps use
this algorithm to find the shortest path very efficiently (with approximation).
Explanation
Consider a square grid having many obstacles, where we are given a starting cell and a target cell.
We want to reach the target cell (if possible) from the starting cell as quickly as possible. Here the
A* search algorithm comes to the rescue. What the A* search algorithm does is that at each step it
picks the node according to a value ‘f’, which is a parameter equal to the sum of two other
parameters, ‘g’ and ‘h’. At each step it picks the node/cell having the lowest ‘f’ and processes that
node/cell. We define ‘g’ and ‘h’ as simply as possible below:
g = the movement cost to move from the starting point to a given square on the grid, following
the path generated to get there.
h = the estimated movement cost to move from that given square on the grid to the final
destination. This is often referred to as the heuristic, which is nothing but a kind of smart guess.
We really don’t know the actual distance until we find the path, because all sorts of things can be
in the way (walls, water, etc.). There can be many ways to calculate this ‘h’, which are discussed
in the later sections.
Algorithm
We create two lists – Open List and Closed List (just like Dijkstra Algorithm)
// A* Search Algorithm
1. Initialize the open list
2. Initialize the closed list; put the starting node on the open list
   (you can leave its f at zero)
3. while the open list is not empty
   a) find the node with the least f on the open list, call it "q"
   b) pop q off the open list
   c) generate q's 8 successors and set their parents to q
   d) for each successor
      i)   if successor is the goal, stop search
      ii)  else, compute both g and h for successor
           successor.g = q.g + distance between successor and q
           successor.h = distance from goal to successor
           (this can be done many ways; we will discuss three heuristics:
           Manhattan, Diagonal and Euclidean)
           successor.f = successor.g + successor.h
      iii) if a node with the same position as successor is in the OPEN
           list with a lower f than successor, skip this successor
      iv)  if a node with the same position as successor is in the CLOSED
           list with a lower f than successor, skip this successor;
           otherwise, add the node to the open list
      end (for loop)
   e) push q on the closed list
end (while loop)
So suppose, as in the figure below, that we want to reach the target cell from the source cell; the A*
search algorithm would then follow the path shown. Note that the figure was made considering
Euclidean distance as the heuristic.
1) Manhattan Distance-
• It is nothing but the sum of the absolute values of the differences
in the goal's x and y coordinates and the current cell's x and y
coordinates, i.e.,
dx = abs(current_cell.x – goal.x)
dy = abs(current_cell.y – goal.y)
h = D * (dx + dy)
• When to use this heuristic? – When we are allowed to move in four
directions only (right, left, top, bottom).
2) Diagonal Distance-
• It combines the absolute values of the differences
in the goal's x and y coordinates and the current
cell's x and y coordinates, i.e.,
dx = abs(current_cell.x – goal.x)
dy = abs(current_cell.y – goal.y)
h = D * (dx + dy) + (D2 - 2 * D) * min(dx, dy)
where D is the cost of a straight move between nodes (usually = 1) and D2 is the cost of a diagonal
move between nodes (usually = sqrt(2)).
• When to use this heuristic? – When we are allowed to move in eight directions only
(similar to a move of a King in Chess)
The Diagonal Distance Heuristics is shown by the below figure (assume red spot as source cell and
green spot as target cell).
3) Euclidean Distance-
• As is clear from its name, it is nothing but the
distance between the current cell and the goal
cell computed using the distance formula
h = sqrt( (current_cell.x – goal.x)^2 +
(current_cell.y – goal.y)^2 )
• When to use this heuristic? – When we are
allowed to move in any direction.
The Euclidean Distance heuristic is shown in the figure
below (assume the red spot is the source cell and the green
spot the target cell).
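The three heuristics above can be sketched as plain functions. This is a minimal illustration, assuming cells are given as (x, y) tuples:

```python
import math

def manhattan(cell, goal):
    # For movement restricted to 4 directions.
    return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

def diagonal(cell, goal, D=1, D2=math.sqrt(2)):
    # For 8-direction movement (like a chess king's moves).
    dx, dy = abs(cell[0] - goal[0]), abs(cell[1] - goal[1])
    return D * (dx + dy) + (D2 - 2 * D) * min(dx, dy)

def euclidean(cell, goal):
    # For movement in any direction.
    return math.hypot(cell[0] - goal[0], cell[1] - goal[1])
```

For a cell at (0, 0) and a goal at (3, 4), these give 7, 1 + 3·sqrt(2) ≈ 5.24, and 5 respectively, illustrating that Manhattan ≥ Diagonal ≥ Euclidean for the same pair of cells.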
Relation (Similarity and Differences) with other algorithms- Dijkstra is a special case of A*
Search Algorithm, where h = 0 for all nodes.
In-Lab Task 1:
Consider a maze as shown below. Each empty tile represents a separate node in the graph, while the
walls are represented by blue tiles. Your starting node is A and the goal is to reach Y. Implement an
A* search to find the resulting path.
Solution:
Since A* search needs a heuristic function that specifies the distance of each node from the goal, you
can use the Euclidean distance between a particular node and the goal node as that node's heuristic
value.
But first we need to modify the structure of the Node class so that it also includes the heuristic
distance of that node.
Next we modify the structure of the dictionary that stores our graph: at each node, we also record
the node's coordinate location. For the implementation we follow the uniform cost implementation,
except that the total cost of a node includes both the cost of reaching that node and the estimated
distance from that node to the goal.
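The described approach can be sketched as follows. This is a minimal illustration; the graph, coordinates and step costs below are invented for demonstration and are not the maze from the task:

```python
import heapq, math

def astar(graph, coords, start, goal):
    """graph: {node: {neighbor: step_cost}}; coords: {node: (x, y)}.
    h(n) is the Euclidean distance from n to the goal (the heuristic)."""
    h = lambda n: math.dist(coords[n], coords[goal])
    frontier = [(h(start), 0, start, [start])]    # entries: (f, g, node, path)
    explored = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in explored:
            continue
        explored.add(node)
        for nbr, cost in graph[node].items():
            if nbr not in explored:
                # f = g + h: cost so far plus estimated distance to the goal
                heapq.heappush(frontier,
                               (g + cost + h(nbr), g + cost, nbr, path + [nbr]))
    return None

# Hypothetical graph for demonstration (not the maze from the task).
graph = {'A': {'B': 1, 'C': 4}, 'B': {'D': 2}, 'C': {'D': 1}, 'D': {}}
coords = {'A': (0, 0), 'B': (1, 0), 'C': (0, 1), 'D': (2, 0)}
path = astar(graph, coords, 'A', 'D')
```

Here the cheaper route A→B→D (cost 3) is preferred over A→C→D (cost 5), and the heuristic steers the search toward the goal without ever overestimating the remaining distance.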
In-Lab Task 2:
Lab Assessment
Pre-Lab: /1
In-Lab: /5
Post-Lab: /4
Data Analysis: /4
Data Presentation: /4
Writing Style: /4
Total: /10
Objective:
This lab will introduce students to search problems. We will first start by representing problems in
terms of a state space graph. Given a state space graph, a starting state and a goal state, students will
then perform a basic hill climbing search within that graph. The output will be a set of actions, or a
path, that begins at the initial state/node and ends at the goal node. In computer science, hill climbing
is a mathematical optimization technique belonging to the family of local search. It is relatively
simple to implement, making it a popular first choice. Although more advanced algorithms may give
better results, in some situations hill climbing works just as well.
Pre-Lab Reading 1:
As pre-lab activity, read Chapter 3 from the book (Artificial Intelligence: A Modern Approach by
Stuart Russell and Peter Norvig, 3rd edition) to learn the basics of search algorithms.
Hill climbing can be used to solve problems that have many solutions, some of which are better than
others. It starts with a random (potentially poor) solution, and iteratively makes small changes to the
solution, each time improving it a little. When the algorithm cannot see any improvement anymore, it
terminates. Ideally, at that point the current solution is close to optimal, but it is not guaranteed that hill
climbing will ever come close to the optimal solution.
For example, hill climbing can be applied to the traveling salesman problem. It is easy to find a
solution that visits all the cities but will be very poor compared to the optimal solution. The algorithm
starts with such a solution and makes small improvements to it, such as switching the order in which
two cities are visited. Eventually, a much better route is obtained. Hill climbing is used widely in
artificial intelligence, for reaching a goal state from a starting node. Choice of next node and starting
node can be varied to give a list of related algorithms.
Hill climbing is a heuristic search used for mathematical optimization problems in the field of
Artificial Intelligence. Given a large set of inputs and a good heuristic function, it tries to find a
sufficiently good solution to the problem. This solution may not be the global optimum.
• ‘Heuristic search’ means that this search algorithm may not find the optimal solution to
the problem. However, it will give a good solution in a reasonable time.
• A heuristic function is a function that will rank all the possible alternatives at any
branching step in the search algorithm based on the available information. It helps the
algorithm to select the best route out of possible routes.
A. Variant of generate and test algorithm: Hill climbing is a variant of the generate and test
algorithm, which is as follows:
1. Generate a possible solution.
2. Test whether it is a goal state.
3. If a solution has been found, quit; otherwise, go to step 1.
Hence, we call hill climbing a variant of the generate and test algorithm, as it takes feedback from
the test procedure. This feedback is then utilized by the generator in deciding the next move in the
search space.
B. Uses the Greedy approach: At any point in state space, the search moves in that direction only
which optimizes the cost of function with the hope of finding the optimal solution at the end.
1. Simple Hill climbing: It examines the neighboring nodes one by one and selects the first
neighboring node which optimizes the current cost as the next node.
Algorithm:
Step 1 : Evaluate the initial state. If it is a goal state then stop and return success. Otherwise, make
initial state as current state.
Step 2 : Loop until the solution state is found or there are no new operators present which can be
applied to the current state.
a) Select an operator that has not yet been applied to the current state and apply it to produce a
new state.
b) Perform these steps to evaluate the new state:
i. If the new state is a goal state, then stop and return success.
ii. If it is better than the current state, then make it the current state and proceed further.
iii. If it is not better than the current state, then continue in the loop until a solution is
found.
Step 3 : Exit.
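The steps above can be sketched as a generic routine. The neighbors and value functions are assumed interfaces supplied by the problem, not part of the original algorithm text:

```python
def simple_hill_climbing(start, neighbors, value):
    """Move to the first neighbor that improves on the current state (Step 2);
    stop when no neighbor is better (Step 3)."""
    current = start
    while True:
        for s in neighbors(current):
            if value(s) > value(current):
                current = s          # first improving neighbor becomes current
                break
        else:
            return current           # no improving move: local (or global) maximum

# Toy usage: maximize -(x - 3)^2 over the integers, starting at 0.
best = simple_hill_climbing(0, lambda x: [x - 1, x + 1], lambda x: -(x - 3) ** 2)
```

On this toy objective the search climbs 0 → 1 → 2 → 3 and stops at x = 3, the maximum.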
2. Steepest-Ascent Hill climbing: It first examines all the neighboring nodes and then selects
the node closest to the solution state as of the next node.
Algorithm:
Step 1 : Evaluate the initial state. If it is a goal state then stop and return success. Otherwise, make
initial state as current state.
Step 2 : Repeat these steps until a solution is found or current state does not change
a) Select an operator that has not yet been applied to the current state.
b) Initialize a new 'best state' equal to the current state, and apply the operator to produce a new
state.
c) Perform these steps to evaluate the new state:
i. If the new state is a goal state, then stop and return success.
ii. If it is better than the best state, then make it the best state; else continue the loop with
another new state.
d) Make the best state the current state and go to Step 2, part b).
Step 3 : Exit
3. Stochastic hill climbing: It does not examine all the neighboring nodes before deciding
which node to select. It just selects a neighboring node at random and decides (based on the amount
of improvement in that neighbor) whether to move to that neighbor or to examine another.
Algorithm:
Step 1: Evaluate the initial state. If it is a goal state then stop and return success. Otherwise,
make the initial state the current state.
Step 2: Repeat these steps until a solution is found or the current state does not change.
a) Apply the successor function to the current state and generate all the neighbor states.
b) Among the generated neighbor states that are better than the current state, choose one
randomly (or based on some probability function).
c) If the chosen state is the goal state, then return success; else make it the current state and
repeat from Step 2, part a).
Step 3: Exit.
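A minimal sketch of this variant, seeded for reproducibility (the neighbors and value functions are assumed problem-supplied interfaces):

```python
import random

def stochastic_hill_climbing(start, neighbors, value, seed=0):
    """Choose a random improving neighbor each step; stop when none exists."""
    rng = random.Random(seed)
    current = start
    while True:
        better = [s for s in neighbors(current) if value(s) > value(current)]
        if not better:
            return current           # local (or global) maximum
        current = rng.choice(better)

# Toy usage: maximize -(x - 3)^2 over the integers, starting at 0.
best = stochastic_hill_climbing(0, lambda x: [x - 1, x + 1],
                                lambda x: -(x - 3) ** 2)
```

Because improvement is required before a move is accepted, randomness here only changes which improving neighbor is taken, not whether the search climbs.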
In-Lab Task 1:
For the given graph, imagine node A as the starting node; your goal is to reach Y. Apply hill
climbing and see how close you can get to your destination.
Solution:
Instead of maintaining a fringe or a frontier to save the nodes that are to be explored, hill climbing just
explores the best child of a given node, then explores the best grandchild of a particular child and so on
and so forth.
This will give ['F', 'H', 'I', 'J'] as the solution, which means that the algorithm gets
stuck at J and doesn't go further towards Y, since the distance of G from the goal is greater than
that of J. Remember that the path is not important in local search. The solution is your last node
(currently that is J, which happens to be a local maximum).
In-Lab Task 2:
Post-Lab Task 1:
In ‘In-Lab Task 1’, it was observed that the basic hill climbing algorithm did not provide a
complete solution. Therefore, use the Steepest-Ascent hill climbing algorithm from the Pre-Lab
Reading to implement a method for finding a better solution.
Post-Lab Task 2:
In ‘In-Lab Task 2’, it was observed that the basic hill climbing algorithm did not yield a
complete solution. Therefore, use the Stochastic Hill Climbing algorithm from the Pre-Lab
Reading to implement a method for achieving a better solution. Additionally, alter the
initial and goal states to explore alternatives and observe the introduced randomness.
Note: The instructor can design graded lab activities according to the level of difficulty and
complexity of the solved lab activities. The lab tasks assigned by the instructor should be
evaluated in the same lab.
CRITICAL ANALYSIS/CONCLUSION
Lab Assessment
Pre-Lab: /1
In-Lab: /5
Post-Lab: /4
Data Analysis: /4
Data Presentation: /4
Writing Style: /4
Total: /10
Objective:
This lab will introduce students to genetic algorithms. Students will get the opportunity to go into
the details of genetic computation concepts, including crossover, mutation and survivor selection.
This lab will also introduce students to different schemes of chromosome encoding.
As pre-lab activity, read Chapter 4 from the book (Artificial Intelligence: A Modern Approach by
Stuart Russell and Peter Norvig, 4th edition) to learn the basics of genetic algorithms.
In the previous lab we implemented hill climbing and saw that it can get stuck at a local maximum.
A possible way to avoid local maxima is to use genetic algorithms.
Genetic Algorithms (GAs) are adaptive heuristic search algorithms that belong to the larger family
of evolutionary algorithms. Genetic algorithms are based on the ideas of natural selection and
genetics. They are an intelligent exploitation of random search, provided with historical data to
direct the search into regions of better performance in the solution space. They are commonly used
to generate high-quality solutions for optimization and search problems.
Genetic algorithms simulate the process of natural selection: species that can adapt to changes in
their environment survive, reproduce and pass on to the next generation. In simple words, they
simulate "survival of the fittest" among individuals of consecutive generations while solving a
problem. Each generation consists of a population of individuals, and each individual represents a
point in the search space and a possible solution. Each individual is represented as a string of
characters/integers/floats/bits. This string is analogous to a chromosome.
Search space
The population of individuals is maintained within the search space. Each individual represents a
solution in the search space for the given problem. Each individual is coded as a finite-length vector
(analogous to a chromosome) of components. These variable components are analogous to genes.
Thus a chromosome (individual) is composed of several genes (variable components).
Fitness Score
A fitness score is given to each individual, showing the ability of that individual to "compete".
Individuals having an optimal (or near-optimal) fitness score are sought.
The GA maintains a population of n individuals (chromosomes/solutions) along with their fitness
scores. Individuals with better fitness scores are given more chances to reproduce than others: they
are selected to mate and produce better offspring by combining the chromosomes of the parents.
Since the population size is static, room has to be created for the new arrivals, so some individuals
die and are replaced by newcomers, eventually creating a new generation once all the mating
opportunities of the old population are exhausted. It is hoped that over successive generations better
solutions will arrive while the least fit die out.
Each new generation has, on average, more "good genes" than the individuals (solutions) of
previous generations; thus each new generation has better "partial solutions" than previous
generations. Once the offspring produced show no significant difference from the offspring
produced by previous populations, the population has converged, and the algorithm is said to have
converged to a set of solutions for the problem.
1) Selection Operator: The idea is to give preference to individuals with good fitness scores and
allow them to pass their genes to successive generations.
2) Crossover Operator: This represents mating between individuals. Two individuals are selected
using the selection operator, and crossover sites are chosen randomly. The genes at these crossover
sites are then exchanged, creating a completely new individual (offspring). For example:
3) Mutation Operator: The key idea is to insert random genes in offspring to maintain diversity
in the population and avoid premature convergence. For example:
Algorithm
The whole method can be summarized as:
1) Randomly initialize populations p
2) Determine fitness of population
3) Until convergence repeat:
a) Select parents from population
b) Crossover and generate new population
c) Perform mutation on new population
d) Calculate fitness for new population
In-Lab Task 1:
Consider the maxOne problem, where the goal is to arrange a string of L bits into all ones. At first
the solution may seem trivial, i.e., for L=8 the solution is [1, 1, 1, 1, 1, 1, 1, 1]. Nevertheless, we
shall see how many iterations it takes an instance of a genetic algorithm to find the solution.
Solution:
We start by looking at the GA class which is instantiated by providing individualSize (length of
chromosome or L in our case) and populationSize (how many survivors will we retain in each
generation). The instance variable population is a dictionary of items where each item contains a bit
array of an individual and its heuristic distance from the goal. The variable totalFitness maintains the
total heuristic value of each generation/population.
Whenever we update current population, we also update the heuristic value of each
individual/state/chromosome. This is done using method updatePopulationFitness(). The individual
fitness/heuristic value is calculated as simply the number of ones in an array of an individual. The goal
is to maximize this value generation after generation.
We now focus on how to select parents for reproduction. This is done using roulette wheel
implementation. We first determine the size of the roulette wheel i.e., how many values will it store.
This is simply set as 5 times the size of population, e.g., if we consider that we will retain 10 survivors
at each generation then the size of the wheel is 50. We now have to fill these 50 values based upon the
fitness value of each of those 10 individuals. This is done by calculating the probability of occurrence
(h/sum(h)) of each individual. The variable individualLength determines how many values out of those
50 belong to that specific individual. Finally we generate random locations from roulette wheel
populationSize times and select individuals based upon that. Those individuals now replace the original
population.
We now come to the method generateChildren(), which generates children via crossover. A
crossover probability is provided as input. For a population size of 8 and a crossover probability of
0.8, for instance, we do crossover in only 80% of the 4 pairs, i.e., we do crossover with 3 pairs and
pass one pair unchanged to the next step (which is mutation). After doing crossover, we also update
the fitness value of each child (although this shouldn't be required, since we already call the
updatePopulationFitness method at the end).
The next step is to mutate the population (not individual child) based upon a certain mutation
probability provided as an input. For example, for individualSize of 5 bits and populationSize of 8 and
a probability of 0.05, we need to swap 5% of those 8*5=40 bits, i.e., round(0.05*40) bits will be
swapped.
We now focus on our main function that will initialize the class. Whenever we find an individual in a
generation which has ideal heuristic value, we terminate the algorithm.
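The pieces described above can be condensed into the following sketch. The function and variable names here are illustrative, not the exact lab class, and the parameter values are assumptions matching the description (roulette-wheel selection, single-point crossover, bit-flip mutation):

```python
import random

def fitness(individual):
    return sum(individual)              # number of ones in the bit array

def roulette_select(population, rng):
    # Fitness-proportionate ("roulette wheel") parent selection.
    total = sum(fitness(i) for i in population)
    pick, acc = rng.uniform(0, total), 0.0
    for individual in population:
        acc += fitness(individual)
        if acc >= pick:
            return individual
    return population[-1]

def max_one_ga(individual_size=8, population_size=10, crossover_p=0.8,
               mutation_p=0.05, max_generations=200, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(individual_size)]
                  for _ in range(population_size)]
    for generation in range(max_generations):
        if any(fitness(i) == individual_size for i in population):
            break                       # an ideal individual was found
        children = []
        while len(children) < population_size:
            p1 = roulette_select(population, rng)
            p2 = roulette_select(population, rng)
            if rng.random() < crossover_p:          # single-point crossover
                cut = rng.randrange(1, individual_size)
                children += [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
            else:
                children += [p1[:], p2[:]]
        population = children[:population_size]
        for individual in population:               # bit-flip mutation
            for j in range(individual_size):
                if rng.random() < mutation_p:
                    individual[j] ^= 1
    return max(population, key=fitness), generation

best, generations = max_one_ga()
```

Running this repeatedly with different seeds shows how the number of generations needed varies with the randomness of selection, crossover and mutation.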
In-Lab Task 2:
Imagine an 8 queen problem, where the goal is to place 8 queens on an 8 X 8 board such that no two
queens are on the same row or column or diagonal. (Before proceeding, kindly refer to lectures). A
sample state is shown below.
Solution:
We start by encoding the state in the form of 8 characters. We then define a heuristic
function that evaluates how close a given state is to the goal. For this function we count,
for each queen, the number of queens sharing its row or diagonal, and then sum this
measure over all 8 queens. What follows is the function definition,
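A possible definition of this heuristic, assuming the state is encoded as a list where state[col] gives the row of the queen in that column (so column clashes are impossible by construction):

```python
def attacking_pairs(state):
    """Number of attacking queen pairs: same row or same diagonal.
    state[col] is the row of the queen in that column (assumed encoding)."""
    h, n = 0, len(state)
    for i in range(n):
        for j in range(i + 1, n):
            if state[i] == state[j]:                          # same row
                h += 1
            elif abs(state[i] - state[j]) == abs(i - j):      # same diagonal
                h += 1
    return h
```

A valid 8-queens placement such as [0, 4, 7, 5, 2, 6, 1, 3] scores 0, while placing all queens in one row scores 28 (every one of the 8·7/2 pairs attacks).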
We now calculate fitness of whole population and after that initialize the
population.
We now define the survivor selection method. Here we invert the probability of
selection, i.e., a state with a lower heuristic value is given a higher probability
of selection.
Post-Lab Task 1:
The traveling salesman problem is an optimization problem where there is a finite number
of cities, and the cost of travel between each pair of cities is known. The goal is to find an
ordered set of all the cities for the salesman to visit such that the cost, or total distance
traveled by the salesman, is minimized.
The following shows some cities in the US and the optimal route for the salesman to
follow to minimize the total distance.
4th city first, then 9th city and so on. The heuristic value of this individual will be the
total distance covered by the salesman, or the sum of the distances between consecutive
pairs of those 15 points in that individual. Likewise, generate a population of 45
individuals (possibly by randomly permuting this set). Keep in mind that the salesman
is not allowed to visit any city twice, i.e., the list must not contain any point multiple
times; use the set() function in Python for this purpose. Observe how the total fitness
value of the population decreases with each generation. Note that in this case we don't
know the minimum heuristic value of the ideal solution. In fact, we don't have an ideal
solution beforehand, i.e., we don't know what the ideal sequence of cities looks like
(unlike other problems where we already know the solution beforehand, e.g., how 4
queens should be placed to obtain a heuristic value of 0, or that all bits should be 1 to
obtain the ideal heuristic value).
CRITICAL ANALYSIS/CONCLUSION
Lab Assessment
Pre-Lab: /1
In-Lab: /5
Post-Lab: /4
Data Analysis: /4
Data Presentation: /4
Writing Style: /4
Total: /10
As pre-lab activity, read Chapter 5 from the book (Artificial Intelligence: A Modern Approach by
Stuart Russell and Peter Norvig, 4th edition) to learn the basics of constraint satisfaction problems.
Constraint satisfaction is a technique where a problem is solved when its values satisfy certain
constraints or rules of the problem. This type of technique leads to a deeper understanding of the
problem structure as well as its complexity.
A constraint satisfaction problem has three main elements: variables, domains, and constraints.
Domains are the spaces of values in which the variables reside, subject to the problem-specific
constraints. A constraint consists of a pair {scope, rel}: the scope is a tuple of the variables that
participate in the constraint, and rel is a relation listing the values those variables can take to satisfy
the constraint.
• Discrete Domain: a domain whose values are distinct and countable; it may be infinite (for
example, a variable ranging over all integers or all possible start times).
• Finite Domain: a discrete domain containing a finite number of values for a specific variable;
most classical CSPs (such as map coloring) use finite domains.
Some special types of solution algorithms are used to solve the following types of constraints:
• Linear Constraints: These type of constraints are commonly used in linear programming
where each variable containing an integer value exists in linear form only.
• Non-linear Constraints: These type of constraints are used in non-linear programming where
each variable (an integer value) exists in a non-linear form.
Constraint Propagation
In local state-spaces, there is only one choice, i.e., to search for a solution. But in a CSP, we have
two choices: we can search for a solution, or we can do a special kind of inference called constraint
propagation.
Constraint propagation is a special type of inference which helps in reducing the legal number of
values for the variables. The idea behind constraint propagation is local consistency.
In local consistency, variables are treated as nodes, and each binary constraint is treated as an arc in
the given problem. There are following local consistencies which are discussed below:
• Node Consistency: A single variable is said to be node consistent if all the values in the
variable’s domain satisfy the unary constraints on the variables.
• Arc Consistency: A variable is arc consistent if every value in its domain satisfies the binary
constraints of the variables.
• Path Consistency: a pair of variables is path consistent with respect to a third variable if every
consistent assignment to the pair can be extended to that third variable while satisfying all the
binary constraints. It is similar to arc consistency, lifted to pairs of variables.
• k-consistency: This type of consistency is used to define the notion of stronger forms of
propagation. Here, we examine the k-consistency of the variables.
In-Lab Task 1:
Imagine you have a map of Australia that you want to color by state/territory (which we’ll
collectively call “regions”). No two adjacent regions should share a color. Can you color the regions
with only three different colors?
The answer is yes. Try it out on your own (the easiest way is to print out a map of Australia with a
white background). As human beings, we can quickly figure out the solution by inspection and a
little trial and error. It’s a trivial problem and a great first problem for our backtracking constraint
satisfaction solver. The problem is illustrated in figure:
To model the problem as a CSP, we need to define the variables, domains, and constraints. The
variables are the seven regions of Australia (at least the seven that we’ll restrict ourselves to):
Western Australia; Northern Territory; South Australia; Queensland; New South Wales; Victoria;
and Tasmania. In our CSP, they can be modeled with strings. The domain of each variable is the
three different colors that can possibly be assigned (we’ll use red, green, and blue). The constraints
are the tricky part. No two adjacent regions can be colored with the same color, and our constraints
are dependent on which regions border one another. We can use binary constraints (constraints
between two variables). Every two regions that share a border also share a binary constraint
indicating they can’t be assigned the same color.
To implement these binary constraints in code, we need to subclass the Constraint class.
The MapColoringConstraint subclass takes two variables in its constructor (therefore being a binary
constraint): the two regions that share a border. Its overridden satisfied() method checks whether the
two regions both have a domain value (color) assigned to them; if either doesn't, the constraint is
trivially satisfied until they do (there can't be a conflict when one doesn't yet have a color). It then
checks whether the two regions are assigned the same color (if they are, there is obviously a
conflict, and the constraint isn't satisfied).
The class is presented here in its entirety. MapColoringConstraint isn’t generic in terms of type hinting,
but it subclasses a parameterized version of the generic class Constraint that indicates both variables
and domains are of type str.
Now that we have a way of implementing the constraints between regions, fleshing out the Australian
map-coloring problem with our CSP solver is a matter of filling in domains and variables, and then
adding constraints.
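The same setup can be sketched with plain functions instead of the book's generic Constraint class. The border list and the backtracking search below are minimal illustrations of the idea, not the full CSP framework the text describes:

```python
def map_coloring_csp():
    variables = ["Western Australia", "Northern Territory", "South Australia",
                 "Queensland", "New South Wales", "Victoria", "Tasmania"]
    colors = ["red", "green", "blue"]
    borders = [("Western Australia", "Northern Territory"),
               ("Western Australia", "South Australia"),
               ("Northern Territory", "South Australia"),
               ("Northern Territory", "Queensland"),
               ("South Australia", "Queensland"),
               ("South Australia", "New South Wales"),
               ("South Australia", "Victoria"),
               ("Queensland", "New South Wales"),
               ("New South Wales", "Victoria"),
               ("Victoria", "Tasmania")]

    def consistent(region, color, assignment):
        # Binary constraint: bordering regions must not share a color.
        return all(assignment.get(other) != color
                   for a, b in borders
                   for other in ((b,) if a == region else
                                 (a,) if b == region else ()))

    def backtrack(assignment):
        if len(assignment) == len(variables):
            return assignment                     # every region is colored
        region = next(v for v in variables if v not in assignment)
        for color in colors:
            if consistent(region, color, assignment):
                result = backtrack({**assignment, region: color})
                if result is not None:
                    return result
        return None                               # dead end: backtrack

    return backtrack({})

solution = map_coloring_csp()
```

Printing the returned dictionary shows one valid 3-coloring; swapping the color order produces different but equally valid assignments.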
In-Lab Task 2:
In a four queens problem, the board is a four-by-four grid of squares. A queen is a chess piece that
can move on the chessboard any number of squares along any row, column, or diagonal. A queen is
attacking another piece if, in a single move, it can move to the square the piece is on without
jumping over any other piece; in other words, if the other piece is in the line of sight of the queen,
it is attacked by it. The four queens problem poses the question of how four queens can be placed
on the board without any queen attacking another queen. The problem is illustrated in the figure
below. Solve the given problem by modelling it as a CSP.
Post-Lab Task 1:
Use instances of In-Lab Task 2 to try other sizes of chess board larger than 12x12.
In this lab students will learn the basics of linear regression and the concept of gradient
descent, with its implementation in the Python programming language.
Pre-Lab Reading 1:
Given below are the basic assumptions that a linear regression model makes regarding a dataset
on which it is applied:
Linear relationship: Relationship between response and feature variables should be linear. The
linearity assumption can be tested using scatter plots. As shown below, 1st figure represents
linearly related variables whereas variables in the 2nd and 3rd figures are most likely non-linear.
So, 1st figure will give better predictions using linear regression.
Homoscedasticity: Homoscedasticity describes a situation in which the error term (that is, the
“noise” or random disturbance in the relationship between the independent variables and the
dependent variable) is the same across all values of the independent variables. As shown below,
figure 1 has homoscedasticity while figure 2 has heteroscedasticity.
Trend lines: A trend line represents the variation in quantitative data with the passage of time
(like GDP, oil prices, etc.). These trends usually follow a linear relationship. Hence, linear
regression can be applied to predict future values. However, this method suffers from a lack of
scientific validity in cases where other potential changes can affect the data.
Economics: Linear regression is the predominant empirical tool in economics. For example, it is
used to predict consumer spending, fixed investment spending, inventory investment, purchases
of a country’s exports, spending on imports, the demand to hold liquid assets, labor demand, and
labor supply.
Finance: The capital asset pricing model uses linear regression to analyze and quantify the
systematic risks of an investment.
Now, the task is to find a line that fits best in the above scatter plot so that we can predict the
response for any new feature value (i.e., a value of x not present in the dataset).
Here, the regression line is
h(x) = b_0 + b_1 * x
where b_0 and b_1 are the regression coefficients and represent the y-intercept and slope of the
regression line respectively.
To create our model, we must "learn" or estimate the values of the regression coefficients b_0 and
b_1. And once we've estimated these coefficients, we can use the model to predict responses!
Now consider the cost function
J(b_0, b_1) = (1 / 2n) * sum_{i=1..n} (y_i - h(x_i))^2
and our task is to find the values of b_0 and b_1 for which J(b_0, b_1) is minimum!
Without going into the mathematical details, we present the result here:
b_1 = SS_xy / SS_xx and b_0 = mean(y) - b_1 * mean(x)
where SS_xy = sum (x_i - mean(x)) * (y_i - mean(y)) and SS_xx = sum (x_i - mean(x))^2.
Pre-Lab Reading 2:
Multiple linear regression attempts to model the relationship between two or more features and a
response by fitting a linear equation to the observed data. Clearly, it is nothing but an extension
of simple linear regression. Consider a dataset with p features(or independent variables) and one
response(or dependent variable). Also, the dataset contains n rows/observations. We define:
X (feature matrix) = a matrix of size n X p where x_{ij} denotes the value of the jth feature for the
ith observation.
y (response vector) = a vector of size n where y_{i} denotes the value of the response for the ith
observation.
The model takes the form
h(x_i) = b_0 + b_1 * x_{i1} + b_2 * x_{i2} + ... + b_p * x_{ip}
where h(x_i) is the predicted response value for the ith observation and b_0, b_1, ..., b_p are the
regression coefficients.
We can generalize our linear model a little bit more by adding a leading column of ones to the
feature matrix X (so that x_{i0} = 1 and the intercept b_0 is absorbed into the coefficient vector),
giving h(x_i) = sum_{j=0..p} b_j * x_{ij}.
As already explained, the Least Squares method determines the coefficient vector b' for which the
total residual error is minimized:
b' = (X'X)^-1 X'y
where ' represents the transpose of the matrix while -1 represents the matrix inverse.
Knowing the least squares estimates b', the multiple linear regression model can now be
estimated as y' = Xb'.
Gradient Descent
Gradient descent is the name for a generic class of computer algorithms that minimize a
function. These algorithms achieve this end by starting with initial parameter values and
iteratively moving towards a set of parameter values that minimize some cost function or metric
(that's the "descent" part). The movement toward the best fit is achieved by taking the derivative
with respect to the variable or variables involved and stepping in the direction of the steepest
negative gradient (that's the "gradient" part).
Gradient descent is an important concept in computer science, and an illustrative example of why
CS has in some ways overtaken statistics in importance when it comes to machine learning: it is a
general-purpose tool that can be used to "brute force" an optimal solution in a wide range of
scenarios. It lacks the elegance of a closed-form statistical solution, but it applies in many settings
where no such solution exists.
Ordinary linear regression is a good and simple way of demonstrating how gradient descent works.
We start with some error function. We could use any metric we want, but in OLS the obvious one
is the residual sum of squares.
Given a sequence of points, y_i, and the sequence of points predicted by our model, y'_i, the RSS is:
RSS = sum_i (y_i - y'_i)^2
Our objective is to minimize this value. Inserting our linear regression model for the y'_i
predictions, and assuming (for the sake of simplicity) that we are doing regression on only one
variable, we get:
RSS = sum_i (y_i - (m * x_i + b))^2
where b is the intercept and m is the slope of the line of best fit.
Now we need to take the gradient. Since this is an equation of two variables (b and m), the gradient
will consist of two partial derivatives. Hence the gradient is:
dRSS/db = -2 * sum_i (y_i - (m * x_i + b))
dRSS/dm = -2 * sum_i x_i * (y_i - (m * x_i + b))
To solve, take a step in the negative gradient direction every iteration. Eventually we will have
something that converges.
In-Lab Task 1:
Implement the technique from Pre-Lab Reading Task 1 in a simple Python program, using a small
dataset.
Solution:
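The manual leaves this solution blank; one possible sketch (the dataset values here are illustrative, not prescribed by the lab) fits a simple least-squares line to a small dataset:

```python
import numpy as np

# Small illustrative dataset
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Closed-form least-squares estimates for y = m*x + b
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - m * x_mean

print(f"slope={m:.2f}, intercept={b:.2f}")  # slope=0.60, intercept=2.20
```

The same slope and intercept can be checked against gradient descent later in this lab; both should converge to the same line on this data.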
In-Lab Task 2:
Implement multiple linear regression on the Boston house pricing dataset using the
Scikit-learn library (Pre-Lab Reading 2).
Solution:
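A hedged sketch of the workflow follows. Note that `load_boston` was removed from recent scikit-learn releases, so this sketch substitutes a synthetic regression dataset of the same shape; once the Boston data is loaded into `X` and `y` (e.g., from a CSV), the fitting and scoring calls are identical.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import explained_variance_score

# Synthetic stand-in: 506 samples, 13 features, like the Boston data
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# Fit multiple linear regression and score it on held-out data
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Explained Variance Score:", explained_variance_score(y_test, y_pred))
```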
Output:
In the above example, we determine the accuracy score using the Explained Variance Score.
We define:
explained_variance(y, y′) = 1 − Var(y − y′) / Var(y)
where y′ is the estimated target output, y the corresponding (correct) target output, and Var is
the variance, the square of the standard deviation.
After building your linear regression model, you decide to investigate the impact of individual
features on the model's predictions. Choose one significant feature from Boston house pricing
dataset and answer the following questions:
• Choose a significant feature (e.g., "RM" for average number of rooms per dwelling).
• Investigate the impact of its coefficient on predictions.
• Evaluate model performance after removing this feature.
• Provide practical insights into the real-world implications of this feature's relationship
with house prices.
This question encourages you to delve into the specifics of a feature's role in the linear regression
model, assess its significance, and consider the practical implications of the relationship between
the chosen feature and the target variable. You may also use different data plotting methods for
visualizing the results, e.g., a scatter plot, a regression line visualization, a residual plot, or the
feature removal impact.
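One way to structure this investigation is sketched below on synthetic data (the Boston feature names like "RM" are dataset-specific, so column 0 stands in for the chosen feature here): inspect the coefficient, then refit without the feature and compare scores.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import explained_variance_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data; column 0 plays the role of "RM"
X, y = make_regression(n_samples=400, n_features=5, noise=5.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

full = LinearRegression().fit(X_train, y_train)
print("coefficient of feature 0:", full.coef_[0])
score_full = explained_variance_score(y_test, full.predict(X_test))

# Drop the feature and refit to measure its impact on performance
reduced = LinearRegression().fit(np.delete(X_train, 0, axis=1), y_train)
score_reduced = explained_variance_score(
    y_test, reduced.predict(np.delete(X_test, 0, axis=1)))
print("with feature:", score_full, " without feature:", score_reduced)
```

A large positive coefficient with a clear drop in score after removal suggests the feature carries real predictive weight; for "RM", that would match the intuition that more rooms raise house prices.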
In-Lab Task 3:
Let's implement gradient descent and test it (note that for the implementation we'll actually use
MSE, the mean squared error. MSE is just RSS divided by the number of points, n. We do that because
it leads to "nicer" numbers, as RSS can be a really big number).
Solution:
import numpy as np

class GradientDescentLinearRegression:
    def __init__(self, learning_rate=0.01, iterations=1000):
        self.learning_rate, self.iterations = learning_rate, iterations

    def fit(self, X, y):
        b, m = 0.0, 5.0  # arbitrary starting parameters
        n = X.shape[0]
        for _ in range(self.iterations):
            # Partial derivatives of MSE with respect to b and m
            b_gradient = -2 * np.sum(y - (m*X + b)) / n
            m_gradient = -2 * np.sum(X*(y - (m*X + b))) / n
            # Step in the negative gradient direction
            b = b - (self.learning_rate * b_gradient)
            m = m - (self.learning_rate * m_gradient)
        self.m, self.b = m, b

    def predict(self, X):
        return self.m * X + self.b
That's all we need! OK, let's see how this performs on some example data. We'll generate a cloud
of points that is normally distributed around the line y=x, and see what our algorithm does.
import matplotlib.pyplot as plt

np.random.seed(42)
X = np.array(sorted(list(range(5))*20)) + np.random.normal(size=100, scale=0.5)
y = np.array(sorted(list(range(5))*20)) + np.random.normal(size=100, scale=0.25)

clf = GradientDescentLinearRegression()
clf.fit(X, y)

plt.scatter(X, y, color='black')
plt.plot(X, clf.predict(X))
plt.gca().set_title("Gradient Descent Linear Regressor")
Our model's solution is very close to the ideal solution of m=1 and b=0. Printing the values
obtained by our implementation confirms this.
The biggest advantage gradient descent has is that it requires no knowledge whatsoever of the
fundamentals of the model. We can apply the classifier we've built without knowing anything
about linear regression. In particular, we don't need to know that linear regression has a closed-
form solution, or what that solution looks like, or how to derive it. Instead we just pick a metric,
compute its derivative, and then use a computer to brute-force a solution.
Objective:
Logistic Regression is a Machine Learning classification algorithm that is used to predict the
probability of a categorical dependent variable. In logistic regression, the dependent variable is a
binary variable that contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.). In other
words, the logistic regression model predicts P(Y=1) as a function of X.
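The probability P(Y=1) comes from applying the logistic (sigmoid) function to a linear combination of the features. A minimal sketch (the toy data below is my own, not the bank dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))  # 0.5: the decision boundary

# Toy 1-D classification: small x -> class 0, large x -> class 1
X = np.array([[0], [1], [2], [3], [4], [5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.5], [4.5]]))      # predicted classes
print(clf.predict_proba([[0.5], [4.5]]))  # P(Y=0), P(Y=1) for each input
```

`predict_proba` returns the modelled probabilities, while `predict` thresholds them at 0.5, which is exactly the P(Y=1) interpretation described above.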
In-lab Task 1:
Dataset
The dataset comes from the UCI Machine Learning repository, and it
is related to direct marketing campaigns (phone calls) of a Portuguese
banking institution. The classification goal is to predict whether the
client will subscribe (1/0) to a term deposit (variable y). The dataset
can be downloaded from here.
Code:
import pandas as pd
import numpy as np
from sklearn import preprocessing
import matplotlib.pyplot as plt
plt.rc("font", size=14)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import seaborn as sns
sns.set(style="white")
sns.set(style="whitegrid", color_codes=True)
The dataset provides the bank customers’ information. It includes 41,188 records and 21 fields.
Input variables
1. age (numeric)
9. month: last contact month of year (categorical: “jan”, “feb”, “mar”, …, “nov”,
“dec”)
10. day_of_week: last contact day of the week (categorical: “mon”, “tue”, “wed”, “thu”,
“fri”)
11. duration: last contact duration, in seconds (numeric). Important note: this attribute
highly affects the output target (e.g., if duration=0 then y=’no’). The duration is not
known before a call is performed, also, after the end of the call, y is obviously
known. Thus, this input should only be included for benchmark purposes and should
be discarded if the intention is to have a realistic predictive model.
12. campaign: number of contacts performed during this campaign and for this client
(numeric, includes last contact)
13. pdays: number of days that passed by after the client was last contacted from a
previous campaign (numeric; 999 means client was not previously contacted)
14. previous: number of contacts performed before this campaign and for this client
(numeric)
y — has the client subscribed to a term deposit? (binary: "1" means "Yes", "0" means "No")
The education column of the dataset has many categories, and we need to reduce the categories
for better modelling. The education column has the following categories:
Let us group “basic.4y”, “basic.9y” and “basic.6y” together and call them “basic”.
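A sketch of that grouping with `np.where` (a tiny stand-in frame is used here; the real dataset has 41,188 rows):

```python
import numpy as np
import pandas as pd

# Tiny stand-in frame with a few of the education levels
data = pd.DataFrame({'education': ['basic.4y', 'high.school', 'basic.6y',
                                   'basic.9y', 'university.degree']})

# Collapse the three "basic.*" levels into a single "basic" category
data['education'] = np.where(
    data['education'].isin(['basic.4y', 'basic.6y', 'basic.9y']),
    'basic', data['education'])
print(data['education'].unique())
```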
count_no_sub = len(data[data['y']==0])
count_sub = len(data[data['y']==1])
pct_of_no_sub = count_no_sub/(count_no_sub+count_sub)
print("percentage of no subscription is", pct_of_no_sub*100)
pct_of_sub = count_sub/(count_no_sub+count_sub)
print("percentage of subscription", pct_of_sub*100)
Our classes are imbalanced, and the ratio of no-subscription to subscription instances is 89:11.
Observations:
• The average age of customers who bought the term deposit is higher than that of the
customers who didn’t.
• The pdays (days since the customer was last contacted) is understandably lower for
the customers who bought it. The lower the pdays, the better the memory of the last
call and hence the better chances of a sale.
We can calculate categorical means for other categorical variables such as education and marital
status to get a more detailed sense of our data.
%matplotlib inline
pd.crosstab(data.job,data.y).plot(kind='bar')
plt.title('Purchase Frequency for Job Title')
plt.xlabel('Job')
plt.ylabel('Frequency of Purchase')
plt.savefig('purchase_fre_job')
The frequency of purchase of the deposit depends a great deal on the job title. Thus, the job title
can be a good predictor of the outcome variable.
table=pd.crosstab(data.marital,data.y)
table.div(table.sum(1).astype(float), axis=0).plot(kind='bar', stacked=True)
plt.title('Stacked Bar Chart of Marital Status vs Purchase')
plt.xlabel('Marital Status')
plt.ylabel('Proportion of Customers')
plt.savefig('marital_vs_pur_stack')
The marital status does not seem a strong predictor for the outcome variable.
table=pd.crosstab(data.education,data.y)
table.div(table.sum(1).astype(float), axis=0).plot(kind='bar', stacked=True)
plt.title('Stacked Bar Chart of Education vs Purchase')
plt.xlabel('Education')
plt.ylabel('Proportion of Customers')
plt.savefig('edu_vs_pur_stack')
pd.crosstab(data.day_of_week,data.y).plot(kind='bar')
plt.title('Purchase Frequency for Day of Week')
plt.xlabel('Day of Week')
plt.ylabel('Frequency of Purchase')
plt.savefig('pur_dayofweek_bar')
pd.crosstab(data.month,data.y).plot(kind='bar')
plt.title('Purchase Frequency for Month')
plt.xlabel('Month')
plt.ylabel('Frequency of Purchase')
plt.savefig('pur_fre_month_bar')
data.age.hist()
plt.title('Histogram of Age')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.savefig('hist_age')
Most of the customers of the bank in this dataset are in the age range of 30–40.
pd.crosstab(data.poutcome,data.y).plot(kind='bar')
plt.title('Purchase Frequency for Poutcome')
plt.xlabel('Poutcome')
plt.ylabel('Frequency of Purchase')
plt.savefig('pur_fre_pout_bar')
cat_vars=['job','marital','education','default','housing','loan','contact','month','day_of_week','poutcome']
for var in cat_vars:
    # One-hot encode each categorical column and join it back onto the frame
    cat_list = pd.get_dummies(data[var], prefix=var)
    data1 = data.join(cat_list)
    data = data1
data_vars=data.columns.values.tolist()
to_keep=[i for i in data_vars if i not in cat_vars]
With our training data created, I'll up-sample the minority class using the SMOTE (Synthetic
Minority Oversampling Technique) algorithm.
Now we have perfectly balanced data! You may have noticed that I over-sampled only on the
training data, because by oversampling only on the training data, none of the information in the
test data is being used to create synthetic observations; therefore, no information will bleed from
the test data into the model training.
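SMOTE lives in the external imbalanced-learn package; the balancing idea can be sketched with plain scikit-learn by randomly re-sampling the minority class (SMOTE instead synthesizes new points between minority-class neighbours, but the resulting class balance is the same):

```python
import numpy as np
from sklearn.utils import resample

# Imbalanced toy training set: 90 majority (class 0), 10 minority (class 1)
X = np.arange(100).reshape(-1, 1).astype(float)
y = np.array([0] * 90 + [1] * 10)

X_min, X_maj = X[y == 1], X[y == 0]
# Up-sample the minority class (with replacement) to the majority size
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)

X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.array([0] * len(X_maj) + [1] * len(X_min_up))
print(np.bincount(y_bal))  # classes are now balanced: [90 90]
```

As the text stresses, this re-sampling must be applied to the training split only, never to the test set.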
Recursive Feature Elimination (RFE) is based on the idea to repeatedly construct a model and
choose either the best or worst performing feature, setting the feature aside and then repeating
the process with the rest of the features. This process is applied until all features in the dataset
are exhausted. The goal of RFE is to select features by recursively considering smaller and
smaller sets of features.
data_final_vars = data_final.columns.values.tolist()
y = ['y']
X = [i for i in data_final_vars if i not in y]

from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression()
rfe = RFE(logreg, n_features_to_select=20)
rfe = rfe.fit(os_data_X, os_data_y.values.ravel())
print(rfe.support_)
print(rfe.ranking_)
import statsmodels.api as sm
logit_model=sm.Logit(y,X)
result=logit_model.fit()
print(result.summary2())
The p-values for most of the variables are smaller than 0.05, except for four variables; therefore, we
will remove those four variables and fit the model again.
Confusion Matrix
from sklearn.metrics import confusion_matrix
confusion_matrix = confusion_matrix(y_test, y_pred)
print(confusion_matrix)
[[6124 1542]
[2505 5170]]
The result is telling us that we have 6124+5170 correct predictions and 2505+1542 incorrect
predictions.
The precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number
of false positives. The precision is intuitively the ability of the classifier to not label a sample as
positive if it is negative.
The recall is the ratio tp / (tp + fn) where tp is the number of true positives and fn the number of
false negatives. The recall is intuitively the ability of the classifier to find all the positive
samples.
The F-beta score can be interpreted as a weighted harmonic mean of the precision and recall,
where an F-beta score reaches its best value at 1 and worst score at 0.
The F-beta score weights the recall more than the precision by a factor of beta; beta = 1.0 means
recall and precision are equally important.
Interpretation: of the entire test set, 74% of the promoted term deposits were deposits the
customers liked; equivalently, 74% of the customers' preferred term deposits were among those
promoted.
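Those definitions are easy to verify on a tiny hand-checkable example (labels chosen so the ratios work out cleanly):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]  # tp=2, fp=1, fn=0

print(precision_score(y_true, y_pred))  # tp/(tp+fp) = 2/3
print(recall_score(y_true, y_pred))     # tp/(tp+fn) = 2/2 = 1.0
print(f1_score(y_true, y_pred))         # harmonic mean of the two = 0.8
```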
CRITICAL ANALYSIS/CONCLUSION
Lab Assessment
Pre-Lab: /1
In-Lab: /5
Data Analysis: /4
(Subtotal: /10)
Post-Lab (Data Presentation, Writing Style): /4
(Subtotal: /4)
Objective:
• Understand where Decision Trees fit into the larger picture of this class and other models
• Understand what Decision Trees are and why we would care to use them
• Understand how decision trees work
• Feel comfortable running sklearn's implementation of a decision tree
• Understand the concepts of bagging and random forests
Decision trees are a type of supervised machine learning algorithm used for classification and
regression tasks. They make decisions by asking a series of questions and branching based on the
answers. Example: Think of a decision tree as a flowchart. For instance, in a weather prediction
scenario, the first node might ask, "Is the humidity high?" with branches leading to "Yes" or
"No." For further details see the scikit-learn documentation https://scikit-
learn.org/stable/modules/tree.html.
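A minimal run of sklearn's implementation, using the bundled iris data just to show the API (the dataset choice is mine; any labelled table works the same way):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_depth bounds how many yes/no questions the tree may ask per prediction
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
```

The fitted tree can also be inspected with `sklearn.tree.plot_tree`, which draws exactly the flowchart of questions described above.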
Say we have input data 𝑋=(𝑋1,𝑋2,...,𝑋𝑛) and corresponding class labels 𝑌=(𝑌1,𝑌2,...,𝑌𝑛),
where 𝑛 represents the number of observations/instances (i.e., unique samples). Much of statistical
learning concerns trying to model the relationship between our data's 𝑋 and 𝑌. In particular, we
assert that the 𝑌's were produced/generated by some underlying function 𝑓(𝑋), and that there is
inevitably some noise and systematic, implicit bias and error 𝜖 that cannot be captured by any
𝑓(𝑋). Thus, we have:
𝑌=𝑓(𝑋)+𝜖
Statistical learning concerns either prediction or inference:
Prediction: concerns trying to learn a function 𝑓̂(𝑋) that is as close as possible to the true function
𝑓(𝑋). This allows us to estimate 𝑌 values for any new input data 𝑋.
Inference: concerns trying to understand/model the relationship between 𝑋 and 𝑌, effectively
learning how the data was generated.
Independent of this, if you have access to gold truth labels 𝑌, and you make use of them for your
modelling, then you are working on a supervised learning task. If you do not have or make use of
𝑌 values, and you are only concerned with the input data 𝑋, you are working on an unsupervised
learning task.
In Lab Task 1:
Load the required libraries
# imports
%matplotlib inline
import numpy as np
import scipy as sp
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import pandas as pd
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)