Session22 To 24 PYTHON COLAB

Session 22 - 24
Google CoLAB & PYTHON

Google Colab
Why Google Colab?
• Collaboration is baked into the product.
• It runs on Google servers, so you don’t need to install anything.
• Notebooks are saved to your Google Drive account.
• In addition to its CPU and GPU instances, Colab supports the Tensor Processing Unit (TPU), which matters for deep learning workloads.
Hardware accelerator
Creating Folder on Google Drive
Creating New Colab Notebook
Create a new notebook via Right click > More > Colaboratory
Notebook Settings
Edit > Notebook settings or Runtime > Change runtime type, then select GPU as the hardware accelerator
Open python notebook in different ways
Bar Graph
# BAR GRAPH
import matplotlib.pyplot as plt
x = ['A', 'B', 'C', 'D', 'E']
y = [22, 9, 40, 27, 55]
plt.bar(x, y)
plt.show()
Prime Numbers from n1 to n2
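n1 = int(input("Enter limit for Prime numbers From :"))
n2 = int(input("Enter limit for Prime numbers upto :"))
print("Prime Numbers between", n1, "and", n2, "are :")
for n in range(n1, n2 + 1):          # n2 + 1 so that n2 itself is checked
    divisors = 0
    for i in range(1, n + 1):        # count the divisors of n
        if n % i == 0:
            divisors += 1
    if divisors == 2:                # a prime has exactly two divisors: 1 and n
        print(n)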
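The divisor-counting loop above tries every number up to n. A common refinement, sketched here with two hypothetical helper names (is_prime and primes_between are illustrations, not part of the original slides), only tries divisors up to the square root of n and stops at the first one found.

```python
import math

def is_prime(n):
    """Return True if n is a prime number."""
    if n < 2:
        return False                      # 0, 1 and negatives are not prime
    for d in range(2, math.isqrt(n) + 1): # divisors above sqrt(n) are not needed
        if n % d == 0:
            return False                  # found a divisor, so n is composite
    return True

def primes_between(n1, n2):
    """List the primes in the inclusive range n1..n2."""
    return [n for n in range(n1, n2 + 1) if is_prime(n)]

print(primes_between(1, 30))
```

Wrapping the test in a function keeps the input() prompts separate from the logic, so the same check can be reused in other cells of the notebook.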
Cricket Data Analysis
# Reading an excel file using Python
import xlrd
# Give the location of the file
loc = ("Cricket_data.xls")
# To open Workbook
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)
m1=[]
m2=[]
a=[]
b=[]
c=[]
#data reading from excel to temporary memory
for i in range(1,sheet.nrows,+1):
    k0=(sheet.cell_value(i,0))
    m1.append(k0)
for i in range(1,sheet.ncols,+1):
    k1=(sheet.cell_value(0,i))
    m2.append(k1)
# Question 1 - Highest Run and Players Name
for i in range(1,sheet.nrows,+1):
    k2=int(sheet.cell_value(i,3))
    a.append(k2)
largest2=max(a)
for j in range (0,len(a),+1):
    if (largest2==a[j]):
        t2=j
print("Highest run scorer is ", m1[t2], "and score is ", largest2 )
#Question 2-Highest Wicket And Players Name
for i in range(1,sheet.nrows,+1):
    k3=int(sheet.cell_value(i,4))
    b.append(k3)
largest2=max(b)
for j in range (0,len(b),+1):
    if (largest2==b[j]):
        t2=j
print("Maximum Wicket Taker is ", m1[t2], "and wicket taken is", largest2 )
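The two questions above follow the same pattern: find the row with the largest value in one column and report the player name from column 0. A minimal sketch of that pattern without the spreadsheet, assuming made-up rows (the player names and numbers below are hypothetical, as is the helper name top_by):

```python
# Hypothetical rows laid out like the sheet: name in column 0,
# runs in column 3, wickets in column 4.
rows = [
    ("Player A", 10, 9, 420, 3),
    ("Player B", 12, 12, 515, 0),
    ("Player C", 8, 5, 130, 17),
]

def top_by(rows, column):
    """Return (name, value) for the row with the largest value in `column`."""
    best = max(rows, key=lambda row: row[column])
    return best[0], best[column]

name, runs = top_by(rows, 3)       # column 3 holds runs
print("Highest run scorer is", name, "and score is", runs)
name, wickets = top_by(rows, 4)    # column 4 holds wickets
print("Maximum wicket taker is", name, "and wickets taken is", wickets)
```

Using max() with a key function replaces the separate "collect, find the maximum, search for its index" steps with a single call.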
ABA Review
PYTHON
Session - 23
For Self Reading

https://www.w3schools.com/python/
List of Chapters
•Chapter 1: Basics
•Chapter 2: Conditionals
•Chapter 3: Functions
•Chapter 4: Iteration
•Chapter 5: Strings
•Chapter 6: Collection Data Types
•Chapter 7: Advanced Functions
•Chapter 8: Exception Handling
•Chapter 9: Python Modules
•Chapter 10: Files
•Chapter 11: Documentation
Why Python?
 High-level language, can do a lot with relatively little
code
 Fairly popular among high-level languages
 Robust support for object-oriented programming
 Support for integration with other languages

Chapter 1: Basics
 Running python programs
 Variables
 Printing
 Operators
 Input
 Comments
 Scope
Hello, world!

Let's get started! Here's an example of Python statements typed at the
interactive prompt:
>>> print("Hello, world")
Hello, world
>>> x = 3
>>> print('hello:', x, x**2, x**3)
hello: 3 9 27

Easy examples
 Example basics.py
print(1+3)
pi = 3.1415926
print(pi)
message = "Hello, world"
print(message)
Output:
4
3.1415926
Hello, world
Variable types - Examples
 Example types.py:
pi = 3.1415926
message = "Hello, world"
i = 2+2
print(type(pi))
print(type(message))
print(type(i))
Output:
<class 'float'>
<class 'str'>
<class 'int'>
Operators
• + addition
• - subtraction
• * multiplication
• / division
• // floor (integer) division
• ** exponentiation
• % modulus (remainder after division)
• Comparison operators
Operators
 Example operators.py
print(2*2)
print(2**3)
print(10%3)
print(1.0/2.0)
print(1//2)
Output:
4
8
1
0.5
0
• Note the difference between true division (/) and floor
division (//) in the last two lines
Type conversion

int(), float(), str(), and bool() convert to integer, floating point,
string, and boolean (True or False) types, respectively

Example typeconv.py:
print(1.0/2.0)
print(1//2)
print(float(1)/float(2))
print(int(3.1415926))
print(str(3.1415926))
print(bool(1))
print(bool(0))
Output:
0.5
0
0.5
3
3.1415926
True
False
Chapter 2: Conditionals

True and False booleans

Comparison and Logical Operators

if, elif, and else statements
Comparison operators

== : is equal to?

!= : not equal to

> : greater than

< : less than

>= : greater than or equal to

<= : less than or equal to

is : do two references refer to the same object?
(See Chapter 6)
Logical operators

and, or, not
>>> 2+2==5 or 1+1==2

True
>>> 2+2==5 and 1+1==2
False
>>> not(2+2==5) and 1+1==2
True

Note: We do NOT use &&, ||, !, as in C!
If statements

Example ifelse.py
if (1+1==2):
    print("1+1==2")
    print("I always thought so!")
else:
    print("My understanding of math must be faulty!")

Simple one-line if:
if (1+1==2): print("I can add!")
elif statement

Equivalent of “else if” in C

Example elif.py:
x=3
if (x == 1):
    print("one")
elif (x == 2):
    print("two")
else:
    print("many")
Chapter 3: Functions

Defining functions

Return values

Local variables

Built-in functions

Functions of functions

Passing lists, dictionaries, and keywords to functions
Functions

Define them in the file above the point they're used

Body of the function should be indented consistently
(4 spaces is typical in Python)

Example: square.py
def square(n):
    return n*n
print("The square of 3 is", square(3))
Output:
The square of 3 is 9
Function variables are local

Variables declared in a function do not exist outside that
function

Example square2.py
def square(n):
    m = n*n
    return m
print("The square of 3 is", square(3))
print(m)
Output:
The square of 3 is 9
File "./square2.py", line 5, in <module>
print(m)
NameError: name 'm' is not defined
Scope

Variables assigned within a function are local to that
function call

Variables assigned at the top level of a module are global to
that module; “global” always means module-wide, never program-wide
• Within a function, Python resolves a name by looking first in the
function's local scope, then in any enclosing function (def)
scopes, then in the module's global scope, and finally in the
list of built-in names. A function can read a global variable
freely, but the variable must be declared global inside the
function before the function can rebind it.
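The lookup order described above (local, enclosing, global, built-in) can be seen in a few lines. This is a small sketch; the names counter, read_only, increment, outer and inner are made up for the illustration.

```python
counter = 0                # global (module-level) name

def read_only():
    return counter         # reading a global needs no declaration

def increment():
    global counter         # required in order to rebind the global name
    counter += 1

def outer():
    total = 10             # local to outer, "enclosing" for inner
    def inner():
        nonlocal total     # rebind the enclosing function's variable
        total += 1
        return total
    return inner()

increment()
print(read_only())         # 1
print(outer())             # 11
print(len("abc"))          # 3; len is found last, among the built-in names
```

Without the global statement, the line counter += 1 would raise UnboundLocalError, because the assignment makes counter local to increment.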
Multiple return values
 Can return multiple values by packaging them into a
tuple
def onetwothree(x):
    return x*1, x*2, x*3
print(onetwothree(3))
Output:
(3, 6, 9)
Built-in Functions

Several useful built-in functions. Example math.py
print(pow(2,3))
print(abs(-14))
print(max(1,-5,3,0))
Output:
8
14
3
Chapter 4: Iteration

while loops

for loops

range function

Flow control within loops: break, continue, pass, and
the “loop else”
while

Example while.py
i=1
while i < 4:
    print(i)
    i += 1
Output:
1
2
3
for

Example for.py
for i in range(3):
    print(i, end=" ")
Output:
0 1 2

range(n) yields the integers from 0 to n-1.
range(0,10,2) yields 0, 2, 4, 6, 8 (use list(range(0,10,2)) to get an actual list)
Flow control within loops

General structure of a loop:
while <test> (or for <item> in <object>):
    <statements within loop>
    if <test1>: break       # exit loop now
    if <test2>: continue    # go to top of loop now
    if <test3>: pass        # does nothing!
else:
    <other statements>      # runs only if the loop ended without
                            # hitting a break
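The “loop else” is the least familiar part of this structure. A minimal sketch, using a hypothetical helper named smallest_factor, shows the else block running only when the loop finishes without hitting break:

```python
def smallest_factor(n):
    """Return the smallest factor of n greater than 1, or None if there is none below n."""
    for d in range(2, n):
        if n % d == 0:
            found = d
            break            # leave the loop early; the else block is skipped
    else:
        found = None         # runs only when the loop ended without a break
    return found

print(smallest_factor(15))   # 3
print(smallest_factor(13))   # None
```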
Parallel traversals
 If we want to go through 2 lists (more later) in
parallel, can use zip:
A = [1, 2, 3]
B = [4, 5, 6]
for (a, b) in zip(A, B):
    print(a, "*", b, "=", a*b)
Output:
1 * 4 = 4
2 * 5 = 10
3 * 6 = 18
Chapter 5: Strings

String basics

Escape sequences

Slices

Block quotes

Formatting

String methods
String basics

Strings can be delimited by single or double quotes

Python uses Unicode, so strings are not limited to ASCII
characters

An empty string is denoted by having nothing between string
delimiters (e.g., '')

Can access elements of strings with [], with indexing starting
from zero:
>>> "snakes"[3]
'k'

Note: can't go the other way; "snakes"[3] = 'p' raises a TypeError,
because strings are immutable

a[-1] gets the last element of string a (negative indices work
through the string backwards from the end)

Strings like a = r'c:\home\temp.dat' (starting with an r
character before the delimiters) are “raw” strings: backslashes are
taken literally instead of starting escape sequences
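Indexing, negative indices, slices and raw strings can be tried together in a short sketch. The variable names s and path are only illustrative.

```python
s = "snakes"
print(s[0])                   # s  (indexing starts at zero)
print(s[-1])                  # s  (last character)
print(s[1:4])                 # nak  (slice: positions 1, 2 and 3)
print(s[:3] + "p" + s[4:])    # snapes  (a new string; s itself is unchanged)

path = r'c:\home\temp.dat'    # raw string: \h and \t stay as two characters each
print(len(path))              # 16
print('\t' == r'\t')          # False: a tab character vs. backslash followed by t
```

Because strings are immutable, "changing" one character means building a new string from slices, as in the fourth line.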
More string basics

Type conversion:
>>> int("42")
42
>>> str(20.4)
'20.4'

Compare strings with the is-equal operator, == (it compares
contents, like std::string in C++):
>>> a = "hello"
>>> b = "hello"
>>> a == b
True

Concatenate strings with +:
>>> location = "Chattanooga " + "Tennessee"
>>> location
'Chattanooga Tennessee'
String methods

Strings are objects with many built-in methods.
Methods that produce a modified string return a new string,
which must be assigned to a name (since strings are immutable,
they cannot be changed in place).

S.capitalize()

S.center(width)

S.count(substring [, start-idx [, end-idx]])

S.find(substring [, start [, end]])

S.isalpha(), S.isdigit(), S.islower(), S.isspace(),
S.isupper()

S.join(sequence)

And many more!
replace method

Doesn't really replace (strings are immutable) but
makes a new string with the replacement performed:
>>> a = "abcdefg"
>>> b = a.replace('c', 'C')
>>> b
'abCdefg'
>>> a
'abcdefg'
Regular Expressions
• Regular expressions are a way to do pattern-
matching. The basic concept (and most of the
syntax of the actual regular expression) is the same
in Java or Perl
Regular Expression Syntax
• Common regular expression syntax:
. Matches any char but newline (by default)
^ Matches the start of a string
$ Matches the end of a string
* Any number of what comes before this
+ One or more of what comes before this
| Or
\w Any alphanumeric character
\d Any digit
\s Any whitespace character
(Note: \W matches NON-alphanumeric, \D NON digits, etc)
[aeiou] matches any of a, e, i, o, u
junk Matches the string 'junk'
Match Object Functions
• re.search() and re.match() return a match object (or None when
nothing matches). This object has some useful methods:
group(): return the matched string
start(): starting position of the match
end(): ending position of the match
span(): tuple containing the (start, end) positions of
the match
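A short sketch of these match-object methods with the standard re module; the sample text is made up for the illustration.

```python
import re

text = "Order 66 shipped on day 12"    # hypothetical sample text

m = re.search(r"\d+", text)            # first run of digits anywhere in the string
print(m.group())                       # 66
print(m.start(), m.end())              # 6 8
print(m.span())                        # (6, 8)

# match() only tries at the start of the string, so it finds nothing here
print(re.match(r"\d+", text))          # None

# findall() returns every non-overlapping match as a list of strings
print(re.findall(r"\d+", text))        # ['66', '12']
```

Since search() and match() return None when nothing matches, it is safest to test the result (if m: ...) before calling group().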
Chapter 6: Collection Data Types

Tuples

Lists

Dictionaries
Tuples

Tuples are a collection of data items. They may be of
different types. Tuples are immutable like strings.
Lists are like tuples but are mutable.
>>> "Tony", "Pat", "Stewart"
('Tony', 'Pat', 'Stewart')
Python displays tuples in (); we can type the parentheses too,
but if we have only one item, we need to use a
comma to indicate it's a tuple: ("Tony",).

An empty tuple is denoted by ()

Need to enclose tuple in () if we want to pass it all
together as one argument to a function
Lists

Like tuples, but mutable, and designated by square
brackets instead of parentheses:
>>> [1, 3, 5, 7, 11]
[1, 3, 5, 7, 11]
>>> [0, 1, 'boom']
[0, 1, 'boom']

An empty list is []

Append an item:
>>> x = [1, 2, 3]
>>> x.append("done")
>>> print(x)
[1, 2, 3, 'done']
Lists and Tuples Contain Object References

Lists and tuples contain object references. Since lists
and tuples are also objects, they can be nested
>>> a=[0,1,2]
>>> b=[a,3,4]
>>> print(b)
[[0, 1, 2], 3, 4]
>>> print(b[0][1])
1
>>> print(b[1][0])
... TypeError: 'int' object is not subscriptable
Dictionaries

Collections where items are accessed by a key, not by
position (since Python 3.7 a dictionary remembers insertion
order, but it still cannot be indexed by position)

Like a hash in Perl

Collection of arbitrary objects; use object references
like lists

Nestable

Can grow and shrink in place like lists

Concatenation, slicing, and other operations that
depend on the order of elements do not work on
dictionaries
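The properties listed above can be tried in a few lines. This is a sketch with made-up data; the names runs and nested are hypothetical.

```python
runs = {"Tony": 42, "Pat": 17}        # keys map to values
runs["Stewart"] = 63                  # grow in place
del runs["Pat"]                       # shrink in place

print(runs["Tony"])                   # 42  (access by key)
print("Pat" in runs)                  # False
print(runs.get("Pat", 0))             # 0  (default instead of a KeyError)

for name, score in runs.items():      # iterate over key/value pairs
    print(name, score)

nested = {"team": {"captain": "Tony", "scores": [42, 63]}}   # dictionaries nest
print(nested["team"]["scores"][1])    # 63
```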
The “is” operator

Python “variables” are really object references. The
“is” operator checks to see if these references refer
to the same object (note: could have two identical
objects which are not the same object...)

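CPython caches small integers and some strings, so references to
equal constants often turn out to be the same object, but this is
an implementation detail and should not be relied on. Two equal
mutable objects created separately are always distinct objects.
Use == to compare values and keep “is” for identity checks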
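The difference between equality and identity is easiest to see with mutable objects. A minimal sketch (the names a, b, c and x are arbitrary):

```python
a = [1, 2, 3]
b = [1, 2, 3]      # equal contents, but a separate list object
c = a              # a second reference to the same list object

print(a == b)      # True:  same value
print(a is b)      # False: different objects
print(a is c)      # True:  same object

c.append(4)        # changing the object through one reference...
print(a)           # [1, 2, 3, 4]  ...is visible through the other

x = None
print(x is None)   # True: the usual idiom for testing against None
```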
“in” operator

For collection data types, the “in” operator
determines whether something is a member of the
collection (and “not in” tests if not a member):
>>> team = ("David", "Robert", "Paul")
>>> "Howell" in team
False
>>> "Stewart" not in team
True
Chapter 7: Advanced Functions
 Passing lists and keyword dictionaries to functions
 Lambda functions
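• apply() (Python 2 only; in Python 3 call f(*args, **kwargs) directly)
• map()
• filter()
• reduce() (lives in the functools module in Python 3)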
 List comprehensions
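The chapter topics can be illustrated together in Python 3. This is a sketch; the function name describe and the sample data are hypothetical.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

square = lambda n: n * n                               # lambda: a small anonymous function
squares = list(map(square, numbers))                   # [1, 4, 9, 16, 25]
evens = list(filter(lambda n: n % 2 == 0, numbers))    # [2, 4]
total = reduce(lambda acc, n: acc + n, numbers, 0)     # 15
odd_squares = [n * n for n in numbers if n % 2]        # list comprehension: [1, 9, 25]

def describe(name, *args, **kwargs):
    """Collect extra positional arguments in args and keyword arguments in kwargs."""
    return name, args, kwargs

positional = [1, 2]
keywords = {"unit": "cm"}
# Unpacking with * and ** passes a list and a dictionary as separate arguments
print(describe("box", *positional, **keywords))        # ('box', (1, 2), {'unit': 'cm'})
```

In Python 3, map() and filter() return lazy iterators, which is why they are wrapped in list() here.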
Chapter 8: Exception Handling

Basics of exception handling
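A minimal sketch of the try / except / else / finally structure; the names safe_divide and log are made up for the illustration.

```python
log = []    # records what happened, so the flow can be inspected afterwards

def safe_divide(a, b):
    """Return a / b, or None when b is zero."""
    try:
        result = a / b                 # the statement that may fail
    except ZeroDivisionError:
        log.append("division by zero") # runs only if that exception was raised
        return None
    else:
        return result                  # runs only if no exception was raised
    finally:
        log.append("done")             # always runs, even after a return

print(safe_divide(1, 2))    # 0.5
print(safe_divide(1, 0))    # None

try:
    int("abc")                         # int() raises ValueError for bad input
except ValueError as err:
    print("could not convert:", err)
```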
Chapter 9: Python Modules

Basics of modules

Import and from … import statements

Changing data in modules

Reloading modules

Module packages

__name__ and __main__

Import as statement
Module basics
 Each file in Python is considered a module. Everything within
the file is encapsulated within a namespace (which is the
name of the file)
 To access code in another module (file), import that file, and
then access the functions or data of that module by prefixing
with the name of the module, followed by a period
 To import a module:
import sys
(note: no file suffix)
 Can import user-defined modules or some “standard” modules
like sys and random
 Any python program needs one “top level” file which imports
any other needed modules
Python standard library
• There are over 200 modules in the Standard Library
 Consult the Python Library Reference Manual,
included in the Python installation and/or available
at http://www.python.org
What import does

An import statement does three things:
- Finds the file for the given module
- Compiles it to bytecode
- Runs the module's code to build any objects (top-level
code, e.g., variable initialization)

The module name is only a simple name; Python uses a
module search path to find it. It will search: (a) the
directory of the top-level file, (b) directories in the
environment variable PYTHONPATH, (c) standard
directories, and (d) directories listed in any .pth files (one
directory per line in a plain text file); the path can be listed
by printing sys.path
The sys module

Printing the command-line arguments,
print-argv.pl
import sys
cmd_options = sys.argv
i = 0
for cmd in cmd_options:
    print("Argument", i, "=", cmd)
    i += 1
output:
localhost(Chapter8)% ./print-argv.pl test1 test2
Argument 0 = ./print-argv.pl
Argument 1 = test1
Argument 2 = test2
The random module

import random
guess = random.randint(1,100)
print(guess)
dinner = random.choice(["meatloaf", "pizza",
                        "chicken pot pie"])
print(dinner)
Chapter 10: Files

Basic file operations
Opening a file

open(filename, mode)
where filename is a Python string, and mode is a
Python string, 'r' for reading, 'w' for writing, or 'a' for
append

Examples:
outfile = open('output.dat', 'w')

infile = open('input.dat', 'r')

Basic operations
output = open('output.dat', 'w')
infile = open('input.dat', 'r') # not named input, which is a built-in function
A = infile.read() # read whole file into string
A = infile.read(N) # read N characters
A = infile.readline() # read next line
A = infile.readlines() # read file into list of
# strings
output.write(A) # Write string A into file
output.writelines(A) # Write list of strings
output.close() # Close a file
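The same operations are usually written with a with block, which closes the file automatically even if an error occurs. A sketch, assuming a throwaway file in a temporary directory so that nothing in the working folder is overwritten:

```python
import os
import tempfile

# A scratch location chosen for this illustration only
path = os.path.join(tempfile.mkdtemp(), "output.dat")

with open(path, "w") as outfile:                 # 'w' creates or truncates the file
    outfile.write("first line\n")
    outfile.writelines(["second line\n", "third line\n"])

with open(path, "r") as infile:
    first = infile.readline()                    # reads the next line, newline included
    rest = infile.readlines()                    # reads the remaining lines into a list

with open(path, "a") as outfile:                 # 'a' appends to the end
    outfile.write("fourth line\n")

with open(path) as infile:                       # mode defaults to 'r'
    whole = infile.read()                        # whole file as one string

print(first, rest)
print(whole.count("\n"))                         # 4
```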
Redirecting stdout
• print() normally writes to stdout (“standard
output,” i.e., the screen)
• stdout can be redirected to a file:
import sys
sys.stdout = open('output.txt', 'w')
print(message) # will show up in output.txt
• Alternatively, to print just some stuff to a file:

print(message, file=logfile) # if logfile is open for writing
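Reassigning sys.stdout stays in effect until it is restored. A gentler alternative, sketched here with an in-memory io.StringIO buffer standing in for a log file (an assumption made so the example needs no file on disk), is contextlib.redirect_stdout, which restores stdout automatically:

```python
import contextlib
import io

buffer = io.StringIO()                      # stands in for an open log file

print("written directly", file=buffer)      # send a single message to the "file"

with contextlib.redirect_stdout(buffer):    # redirect stdout only inside this block
    print("written via redirect")

print("back on the screen")                 # stdout is restored here
captured = buffer.getvalue()
print(repr(captured))
```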
Chapter 11: Documentation

Comments

dir

Documentation strings
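The three topics fit in one small sketch; the function area is a made-up example.

```python
def area(width, height):
    """Return the area of a rectangle with the given width and height."""
    # A comment is ignored by Python; the docstring above is stored on the function.
    return width * height

print(area.__doc__)             # the documentation string
print("upper" in dir(str))      # dir() lists the attribute names of an object: True
print(area(2, 3))               # 6
```

In a notebook, help(area) prints the same docstring together with the function signature.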
PYTHON EXAMPLES
• 1. Bike Sharing
• 2. Twitter Sentiment Analysis
https://www.onlinegdb.com/online_python_compiler
Exercise 1 - Bike Sharing (1) - Startup Example
• Bike sharing systems are a new generation of bike rentals in which the
whole process of membership, rental and return has become automatic.
Through these systems, a user can easily rent a bike at one
position and return it at another. Currently, there are over 500
bike-sharing programs around the world, comprising over 500
thousand bicycles. There is great interest in these systems because of their
important role in traffic, environmental and health issues.
• Apart from the interesting real-world applications of bike sharing systems, the
characteristics of the data they generate make them attractive for
research. Unlike other transport services such as bus or subway, the
duration of travel and the departure and arrival positions are explicitly recorded in these
systems. This feature turns a bike sharing system into a virtual sensor network
that can be used for sensing mobility in the city. Hence, it is expected that most of
the important events in the city could be detected by monitoring these data.
Example 1 – Bike Sharing (2)
• Overview & Python Coding
• https://medium.com/@wilamelima/analysing-bike-sharing-trends-with-python-a9f574c596b9
• Data Set Description

• https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset
• PYTHON online interpreter (another & easy one)

• https://repl.it/languages/python3
Exercise 2 – Twitter Sentiment Analysis
For Beginners :
https://towardsdatascience.com/a-beginners-guide-to-sentiment-analysis-in-python-95e354ea84f6
For Advanced Users:

https://www.datacamp.com/community/tutorials/simplifying-sentiment-analysis-python
Applied Business Analytics
Review
Example 1 – Soft Drink Preferences
• A group of 25 people was surveyed to find their soft drink
preferences (1 – Pepsi, 2 – Bovonto, 3 – Coke, 4 – Limca)
• R Coding – Input from Keyboard
# Sample data - 3 4 1 1 3 4 3 3 1 3 2 1 2 1 2 3 2 3 1 1 1 1 4 3 1
soft<-scan()
barplot(table(soft),xlab="Soft Drink",ylab="Frequency of Preferences", main="Soft Drink Preferences",col="white")
barplot(table(soft)/length(soft),xlab="Soft Drink",ylab="Freq. of Preferences",main="Soft Drink Preferences",col="gray70")
09/05/2024 ABA 78
Example 2 – Students enrolled in a University 2015-2019
• R Coding
nos <- c(2810,890,540,3542,1363,471,4301,1663,652,5362,2071,895,6593,2752,1113)
year.stud <- matrix(nos,byrow=T,ncol=3) # Vector to Matrix
rownames(year.stud) <- c("2015","2016","2017","2018","2019")
colnames(year.stud) <- c("Commerce","Science","Humanity")
barplot(t(year.stud),main="Students Enrolment in AU",beside=F)
Example 3 – Correlation of Asian Paints closing price with other variables
• DATA SET : Asian paints.csv
cp<-read.csv(file.choose())
summary(cp)
cor(cp$close, cp$OPEN)
cor(cp$close, cp$HIGH)
cor(cp$close, cp$LOW)
cor(cp$close, cp$vwap)
par(mfrow=c(2,2))
plot(cp$close, cp$OPEN)
plot(cp$close, cp$HIGH)
plot(cp$close, cp$LOW)
plot(cp$close, cp$vwap)
plot(cp)
• Which variables have the highest positive correlation? Why?
• Is there any negative correlation between the variables? Why?
• Which variables have the lowest positive correlation? Why?
Example 4 : Correlation of POC between ITES (INFY & TCS)
• CSV files : INFY_2122_FY_POC.csv & TCS_2122_FY_POC
INFY=read.csv(file.choose())
TCS=read.csv(file.choose())
cor(INFY$closePOC, TCS$closePOC)
par(mfrow=c(2,2))
boxplot(INFY$closePOC, main="INFOSYS")
boxplot(TCS$closePOC, main="TCS")
hist(INFY$closePOC)
hist(TCS$closePOC)
• What is the correlation value of TCSPOC & INFYPOC?
• Who is performing better (based on one year of data)?
• What are the outliers communicating?
• Are a boxplot and a histogram the same?
Example 5 – Wilcox Test
# Data in two numeric vectors

women_weight <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5)
men_weight <- c(67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4)
# Create a data frame
my_data <- data.frame(group = rep(c("Woman", "Man"), each = 9), weight =
c(women_weight, men_weight))
#Question : Is there any significant difference between women and men weights?
# Compute two-samples Wilcoxon test
res <- wilcox.test(weight ~ group, data = my_data, exact = FALSE)
print(res)
# INFERENCE The p-value of the test is 0.02712, which is less than the significance level (0.05).
# We can conclude that men’s median weight is significantly different from women’s median weight
# if you want to test whether the median men’s weight is less than the median women’s weight,
wilcox.test(weight ~ group, data = my_data, exact = FALSE, alternative = "less")
#Or, if you want to test whether the median men’s weight is greater than the median women’s weight,
wilcox.test(weight ~ group, data = my_data, exact = FALSE, alternative =
"greater")
boxplot(men_weight,women_weight, xlab = "Gender", ylab="Weight", names=c("Men","Women"))
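To see where the reported p-value comes from, the test can be reproduced by hand. The sketch below is written in Python (an illustration, not the course's R code) and assumes the normal approximation with tie and continuity corrections, which is what exact = FALSE requests in R; the function name rank_sum_test is made up.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation with tie and
    continuity corrections). Returns (W, p_value); W is the statistic for x."""
    pooled = sorted(x + y)
    n = len(pooled)
    ranks, tie_term = {}, 0
    i = 0
    while i < n:                              # walk over runs of equal values
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2    # midrank of positions i+1 .. j
        t = j - i
        tie_term += t ** 3 - t                # contribution to the tie correction
        i = j
    n1, n2 = len(x), len(y)
    w = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    diff = w - mean
    correction = 0.5 if diff > 0 else -0.5 if diff < 0 else 0.0
    z = (diff - correction) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided tail probability
    return w, p

women_weight = [38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5]
men_weight = [67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4]
w, p = rank_sum_test(men_weight, women_weight)   # "Man" is the first group level in R
print(w, round(p, 5))
```

The men's ranks sum to 111, giving W = 111 - 45 = 66, and the resulting p-value agrees with the 0.02712 quoted in the inference comment.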
Example 6 - Linear Modeling
df1<-read.csv(file.choose())
#Visualization
boxplot(df1[3:9])
boxplot(df1[11:13])
boxplot(df1$POC1)
#Simple Linear Regression, DV: Closing Price, IV: Opening Price
reg1=lm(df1$close~df1$OPEN)
summary(reg1)
reg2=lm(df1$close~df1$LOW)
summary(reg2)
reg3=lm(df1$close~df1$HIGH)
summary(reg3)
reg4=lm(df1$close~df1$ltp)
summary(reg4)
reg5=lm(df1$close~df1$vwap)
summary(reg5)
Example 7 Regression using R
CSV - tcsnifty50POC_1year
data1=read.csv(file.choose())
reg1=lm(data1$tcsPOC~data1$NIFTYPOC)
summary(reg1)
Example 8 - Sales and Advertisement
• We’ll use the marketing data set [datarium package], which contains the impact of the
amount of money spent on three advertising media (youtube, facebook and
newspaper) on sales.
• sales = b0 + b1*youtube + b2*facebook + b3*newspaper
install.packages("datarium")
library(datarium)
data("marketing", package = "datarium")
head(marketing, 4)
Some concepts in R for Module II – (1)
Calculating Probability for Continuous distributions
Prefix the R name with d (density), p (cumulative probability), q (quantile) or r (random numbers), e.g. pnorm(), qnorm()
Continuous Distribution   R name    Parameters
Beta                      beta      shape1, shape2
Cauchy                    cauchy    location, scale
Chi-Squared               chisq     df
Exponential               exp       rate
F                         f         df1, df2
Gamma                     gamma     shape, rate or scale
Log-Normal                lnorm     meanlog, sdlog
Logistic                  logis     location, scale
Normal                    norm      mean, sd
Student’s t               t         df
Uniform                   unif      min, max
Weibull                   weibull   shape, scale
Wilcoxon rank sum         wilcox    m, n
Some concepts in R for Module II - (2)
Calculating Probability for Discrete distributions
Discrete Distribution   R name    Parameters
Binomial                binom     size (n), prob (p)
Geometric               geom      prob (p)
Hypergeometric          hyper     m, n, k
Negative Binomial       nbinom    size, prob or mu
Poisson                 pois      lambda (mean)
General Statistics (1)
• Null Hypothesis (Ho)
• Nothing happened: the mean was unchanged, the treatment has no effect,
the model did not improve
• Alternate Hypothesis (Ha)
• Something happened: the mean rose, the treatment improved the patient’s
health, the model fit better
• Assume Ho is TRUE
• t-statistic
• P-value
• Small (P < α): strong evidence against Ho, i.e. reject Ho
• Not small (P >= α): retain Ho (fail to reject Ho)
• Example: P < 0.05, so reject Ho
• α = 0.05 corresponds to a 95% confidence level: (100 - 95)/100 = 0.05
• High Risk Applications
α = 0.01 corresponds to a 99% confidence level: (100 - 99)/100 = 0.01
α = 0.001 corresponds to a 99.9% confidence level: (100 - 99.9)/100 = 0.001
Testing mean of the sample – (t-test – small sample, n<30)
• You have a sample from a population; given this sample, you
want to know whether the mean of the population could reasonably
be “m”
• t.test makes inferences about a population mean from the
sample.
• t test: ask whether the population mean could be 95
x<-rnorm(50,mean=100,sd=15)
t.test(x,mu=95) # reject Ho if the reported p-value is < 0.05
plot(x)
• If P < α (small), it is unlikely (based on the sample data)
that 95 could be the mean of the population, i.e. reject Ho
• Run the above 3 lines multiple times and check the answer; rnorm() draws a new random sample on every run
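The statistic that t.test reports is the distance between the sample mean and the hypothesised mean, measured in standard errors. A sketch of that calculation in Python (an illustration alongside the R code; the function name t_statistic and the seed are arbitrary choices). Turning t into a p-value needs the t distribution, which is not in Python's standard library, so that step is left to a statistics package.

```python
import math
import random
import statistics

def t_statistic(sample, mu):
    """One-sample t statistic: (sample mean - mu) / (s / sqrt(n))."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)              # sample standard deviation (divides by n - 1)
    return (mean - mu) / (s / math.sqrt(n))

random.seed(1)                                # arbitrary seed, only for repeatability
x = [random.gauss(100, 15) for _ in range(50)]  # plays the role of rnorm(50, 100, 15)
t = t_statistic(x, 95)
print("t =", round(t, 3), "with", len(x) - 1, "degrees of freedom")
```

A large |t| means the sample mean lies many standard errors away from 95, which is what produces a small p-value.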
Testing for Normality
• You want a statistical test to determine whether your data
sample is from a normally distributed population.
shapiro.test(x)
plot(x)
• Shapiro-Wilk test:
• Null hypothesis: the data are normally distributed
• Alternative hypothesis: the data are not normally distributed
• P < α: reject the null hypothesis; the population is likely not
normally distributed.
• P > α: a large p-value suggests the underlying population could be
normally distributed.
Confidence Interval for a Median - WILCOX
• The procedure for calculating the confidence interval for a mean is well defined
and widely known. The same is not true for the median. The Wilcoxon signed rank
test is a standard procedure for this.
Comparing the locations of two samples nonparametrically
• You want to know: is one population shifted to the left or right compared with the
other?
• wilcox.test tells us whether the central locations of the two populations are
significantly different or, equivalently, whether their relative frequencies are
different
• Example: We randomly select a group of employees and ask each one to
complete the same task under 2 different circumstances (favourable and
unfavourable conditions). We measure their completion times and check whether they
are significantly different.
Example 9 – Wilcox Test
# Data in two numeric vectors
women_weight <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5)
men_weight <- c(67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4)
# Create a data frame
my_data <- data.frame(group = rep(c("Woman", "Man"), each = 9), weight =
c(women_weight, men_weight))
#Question : Is there any significant difference between women and men weights?
# Compute two-samples Wilcoxon test
res <- wilcox.test(weight ~ group, data = my_data, exact = FALSE)
print(res)
# INFERENCE The p-value of the test is 0.02712, which is less than the significance level (0.05).
# We can conclude that men’s median weight is significantly different from women’s median weight
# if you want to test whether the median men’s weight is less than the median women’s weight,
wilcox.test(weight ~ group, data = my_data, exact = FALSE, alternative = "less")
#Or, if you want to test whether the median men’s weight is greater than the median women’s weight,
wilcox.test(weight ~ group, data = my_data, exact = FALSE, alternative =
"greater")
boxplot(men_weight,women_weight, xlab = "Gender", ylab="Weight", names=c("Men","Women"))
Regression -Base Model 1 & 2
[Figure: two panels of sample sketches, each showing curves for b<0, b=0, 0<b<1, b=1 and b>1 over x from 0.50 to 3.00]
Statistical Inferences
• Statistical Inferences
• Intercept
• Beta
• P-value
*** - Statistically significant at the 1 percent level
** - Statistically significant at the 5 percent level
* - Statistically significant at the 10 percent level
(R's summary() output prints its own legend: *** p < 0.001, ** p < 0.01, * p < 0.05, . p < 0.1)
• R2 Value
The higher the R2, the better the model fits: 0.70 and above is good,
0.50 to 0.70 is acceptable (application specific)
• Residual Error
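The intercept, beta and R2 that lm() reports can be computed by hand for a simple linear regression. A sketch in Python (an illustration, not the course's R code), using a hypothetical helper fit_line and made-up data:

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x.
    Returns (intercept, slope, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                              # beta
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot                # share of variance explained
    return intercept, slope, r_squared

# Made-up values, small enough to check by hand
intercept, slope, r2 = fit_line([0, 1, 2], [1, 1, 4])
print(intercept, slope, r2)                        # 0.5 1.5 0.75
```

Here the residual sum of squares is 1.5 and the total sum of squares is 6, so R2 = 1 - 1.5/6 = 0.75.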
Try it Exercise – Assignment 1
• Download Your Company & Sector
• Regression using POC (Minimum 3 SLR required)
• Plot all POCs
• Assignment content
• 1. Data set (% of change, cleaned)
• 2. LM Summary output
• 3. Linear Regression Equation(s)
• 4. Plot & Boxplot output
• 5. Inferences
Example 10 - MLR- HR
CSV File Name
Supervisor_Performance_data.csv
R Coding
Employee_Supervisor_Performance.r
Example 11 – All Tyre brands are equal
• Imagine that you are interested in understanding whether knowing the brand of a car
tyre can help you predict whether you will get more or less mileage before you
need to replace it.
• We’ll draw what is hopefully a random sample of 60 tyres from four different
manufacturers and use the mean mileage by brand to help inform our thinking.
• While we expect variation across our sample, we’re interested in whether the
differences between the tyre brands (the groups) are significantly larger than what
we would expect from random variation within the groups.
• Our null hypothesis is
H0: μ1 = μ2 = μ3 = μ4, i.e. μApollo = μBridgestone = μCEAT = μFalken
In words: “Tyre brand doesn’t matter in predicting the
mileage, i.e. all tyre brands are the same”.
The alternate hypothesis is that at least one of the tyre brand populations is different
from the other three.
09/05/2024 Data Analytics using R 97

ANOVA – A Real Time R Example (1)
• Step 1 : Read the file - tyre.csv # file.choose()
tyre<-read.csv(file.choose())
summary(tyre)
• Step 2 : Dependent Variable is Mileage & Independent Variable is Brands
- Does Mileage depend on Brand?
• Step 3 : Graph - Mileage distribution of all brands of tyres
boxplot(tyre$Mileage, horizontal = TRUE, main="Mileage distribution across all brands", col = "blue")
ANOVA – A Real Time R Example (2)
• Step 4 : Groupwise boxplot
boxplot(tyre$Mileage~tyre$Brands, main="Boxplot comparing Mileage of Four Brands of Tyre", col= rainbow(4), horizontal = TRUE)
• Step 5 : ONEWAY ANOVA Coding
tyres.aov<- aov(Mileage~Brands, tyre)
summary(tyres.aov)
• Step 6 : Inference
http://rpubs.com/ibecav/308410
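The F value in the aov summary compares the variation between the group means with the variation within the groups. A sketch of the calculation in Python (an illustration, not the course's R code), using a hypothetical helper one_way_anova_f and three tiny made-up groups instead of the tyre data:

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: observations around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Made-up groups, small enough to check by hand
f, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f, 6), df_b, df_w)    # 13.0 2 6
```

For these groups the between-group sum of squares is 26 and the within-group sum of squares is 6, so F = (26/2)/(6/6) = 13. A large F relative to the F distribution with (2, 6) degrees of freedom leads to rejecting the null hypothesis of equal means.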

ANOVA – A Real Time R Example (3)
Pairwise – t test
• Apollo VS Bridgestone
• Apollo VS CEAT
• Apollo VS Falken
• Bridgestone VS CEAT
• Bridgestone VS Falken and
• CEAT VS Falken.
pairwise.t.test(tyre$Mileage,tyre$Brands,p.adjust.method = "none")
• What is your understanding of the results?
TukeyHSD(tyres.aov, conf.level = 0.95)
• NOTE : We can see the 6 pairings. The diff column is the difference between the
means of the two brands listed. So the mean for Bridgestone is 3,019 miles less than
Apollo, and Falken is 5,845 miles more than Bridgestone, etc.

Example 12 – Detergent Sales, Discount and Location
• Location- KK Nagar & Anna Nagar

• Discount - No Discount, 10 % discount & 20 % discount
• Sales - in Kilograms
Assumption
Null hypothesis: Sales do not depend on discount or location.

Example 13 – POC is depending on Months
• Date
• Close
• No. of Trade
• ClosePOC
• NOTPOC
• Month
• Assumption – POC does not depend on Month

Example 14 – Learning & Employability
• CGPA
• GENDER
• COURSE
• CTC
• Assumption – CTC depending on CGPA, GENDER and Course

Example 15 – Case 1
data1=read.csv(file.choose())
#ONEWAY ANOVA Coding
a3<- aov(data1$HOEP~data1$Time)
summary(a3)
boxplot(data1$HOEP~data1$Time, xlab = "Hour",
ylab="HOEP", main="HOEP based on Time")

Example 16 – Alcohol consumption behaviour
E:\TSM_2022_2023\TSM_TEACHING\ABA\Session_PPTs\goggles.csv
Gender  Alcohol  Participant was
1       1        Male who consumed no alcohol
1       2        Male who consumed 2 pints
1       3        Male who consumed 4 pints
2       1        Female who consumed no alcohol
2       2        Female who consumed 2 pints
2       3        Female who consumed 4 pints

Equal opportunity in Employment
• Preemployment Test
• 1. Measure abilities that are directly related to the job under
consideration
• 2. Must not discriminate on the basis of race or national origin
• y – Job performance, x – Preemployment Test
• Model 1 (Pooled) :
• Model 2 (Minority):
• Model 2 (White) :

Employment on pretest

Inferences
• Model1 – Race distinction is ignored
• – is minimum required level of performance
• > for eligibility
• > for minority eligibility
• > for white eligibility
• If is – relaxation of pretest requirements for white
• If is – Tightening of pretest requirements for minorities

Adjusted R-Square
• When we add a new variable to a regression model, R-Square never decreases,
even if the variable explains very little of the dependent variable.
• Adjusted R-Square penalizes the number of predictors. If the adjusted R-Square
decreases after the addition of a new variable, the variable does not add enough
explanatory power to justify including it in the model.
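
As a worked illustration, the usual adjusted R-Square formula is 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The sketch below uses hypothetical R-Square values (not taken from any data set in these slides) to show adjusted R-Square falling when a weak predictor is added:

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-Square for a model with an intercept and n_predictors predictors."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical numbers: a third predictor raises R-Square only from 0.800 to 0.802.
before = adjusted_r_squared(0.800, n_obs=30, n_predictors=2)
after = adjusted_r_squared(0.802, n_obs=30, n_predictors=3)

print(round(before, 4), round(after, 4))
print("New variable worth keeping?", after > before)
```

Here R-Square went up slightly, yet adjusted R-Square went down, so the added variable would be judged not useful.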

Why MANOVA?
• There may be circumstances in which we are interested in several
dependent variables, and in these cases the simple ANOVA model is
inadequate.
• Why don’t we conduct a separate ANOVA for each dependent variable?
• The reason is that the more tests we conduct on the same data, the more we
inflate the family-wise error rate (a greater chance of making a Type I
error).
• MANOVA, by including all dependent variables in the same analysis, can
capture the relationship between outcome variables. Related to this point,
ANOVA can tell us only whether groups differ along a single dimension,
whereas MANOVA has the power to detect whether groups differ along
a combination of dimensions.
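
To see how quickly the family-wise error rate grows, the sketch below uses the standard formula 1 - (1 - alpha)^k. It assumes the k tests are independent, which is a simplification when the dependent variables are correlated; the function name is illustrative.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one Type I error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

# One test, two separate ANOVAs (two dependent variables), and six tests.
for k in (1, 2, 6):
    print(k, "tests ->", round(familywise_error_rate(0.05, k), 4))
```

With two separate ANOVAs at alpha = 0.05, the chance of at least one false positive is already close to 10%.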

Theory of MANOVA (1)
• In ANOVA, variances (systematic and unsystematic) are single
values. In MANOVA, these variances are contained in a matrix.
• Hypothesis SSCP (Sums of Squares and Cross Products): the matrix
that represents the systematic variance and is called hypothesis sum
of squares and cross-products matrix, denoted by H.
• Error SSCP: matrix that represents the unsystematic variance and is
called error sum of squares and cross-products matrix, denoted
by E.
• Total SSCP: total amount of variance, denoted by T.
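
The three matrices can be computed by hand for a small data set. The sketch below uses made-up scores for two dependent variables in three groups (the numbers and the group labels are hypothetical, not the OCD data) and checks that T = H + E:

```python
def mean_vector(rows):
    """Column means of a list of equal-length tuples."""
    n = len(rows)
    return [sum(row[k] for row in rows) / n for k in range(len(rows[0]))]

def outer(d):
    """Outer product d * d^T of a deviation vector."""
    return [[di * dj for dj in d] for di in d]

def add_into(total, m, weight=1.0):
    """Add weight * m to the matrix total, in place."""
    for i in range(len(total)):
        for j in range(len(total)):
            total[i][j] += weight * m[i][j]

# Hypothetical (DV1, DV2) scores per group.
groups = {
    "G1": [(5, 14), (5, 11), (4, 16)],
    "G2": [(4, 14), (4, 15), (1, 13)],
    "G3": [(4, 13), (5, 15), (5, 14)],
}
p = 2
all_rows = [row for rows in groups.values() for row in rows]
grand = mean_vector(all_rows)

T = [[0.0] * p for _ in range(p)]  # total SSCP
E = [[0.0] * p for _ in range(p)]  # error (within-group) SSCP
H = [[0.0] * p for _ in range(p)]  # hypothesis (between-group) SSCP

for row in all_rows:
    add_into(T, outer([row[k] - grand[k] for k in range(p)]))
for rows in groups.values():
    group_mean = mean_vector(rows)
    for row in rows:
        add_into(E, outer([row[k] - group_mean[k] for k in range(p)]))
    add_into(H, outer([group_mean[k] - grand[k] for k in range(p)]), weight=len(rows))

print("H =", H)
print("E =", E)
print("T =", T)
```

The diagonal entries are the ordinary ANOVA sums of squares for each dependent variable; the off-diagonal entries are the cross-products that capture the relationship between the two.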

Theory of MANOVA (2)
• Pillai-Bartlett Trace (also known as Pillai’s trace)
• Pillai’s trace is used as a test statistic in MANOVA. It is a non-negative
statistic ranging from 0 to s, where s is the smaller of the number of
dependent variables and the hypothesis degrees of freedom (so the upper
bound is 1 only when s = 1). Larger values mean that effects are
contributing more to the model; you should reject the null
hypothesis for large values.
• Pillai’s trace is often considered the most robust statistic
for general use, especially under departures from assumptions. Other
commonly used tests include: Hotelling’s T², Roy’s largest root and
Wilks’s lambda.
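
Pillai’s trace can be written as trace(H (H + E)^-1), using the hypothesis and error SSCP matrices. The sketch below evaluates it for two dependent variables; the H and E values are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul_2x2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def pillai_trace(h, e):
    """Pillai's trace V = trace(H (H + E)^-1) for 2x2 SSCP matrices."""
    t = [[h[i][j] + e[i][j] for j in range(2)] for i in range(2)]
    product = matmul_2x2(h, inverse_2x2(t))
    return product[0][0] + product[1][1]

# Hypothetical SSCP matrices (not from the OCD example).
H = [[2.0, 1.0], [1.0, 2.0]]
E = [[8.0, -1.0], [-1.0, 6.0]]

V = pillai_trace(H, E)
print("Pillai's trace:", round(V, 4))
```

In R, summary() on a manova object reports this statistic by default, together with its approximate F value and p value.
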
Example 17 : OCD - Obsessive Compulsive Disorder
• The effects of cognitive behavior therapy (CBT) on obsessive
compulsive disorder (OCD).
• Two dependent variables (DV1 and DV2) are considered: the
occurrence of obsession-related behaviors (Actions) and the
occurrence of obsession-related cognitions (Thoughts).
• OCD sufferers are grouped into three conditions: with cognitive
behavior therapy (CBT), with behavior therapy (BT), and with
no treatment (NT).

MANOVA using R
• ocdData <- read.csv(file.choose())
• # rename level label
• ocdData$Group<-factor(ocdData$Group, levels = c("CBT", "BT", "No Treatment Control"),
labels = c("CBT", "BT", "NT"))
• str(ocdData)
• outcome <- cbind(ocdData$Actions, ocdData$Thoughts)
• boxplot(outcome~ocdData$Group, main="Boxplot of comparing OCD",
col= rainbow(4), horizontal = TRUE, data=ocdData)
• ocdmodel1<-aov(outcome~Group,data=ocdData)
• summary(ocdmodel1)
• ocdModel2 <- manova(outcome ~ Group, data=ocdData)
• summary(ocdModel2, intercept=TRUE)
• # summary(ocdModel2, intercept=TRUE, test="Wilks")
• # summary(ocdModel2, intercept=TRUE, test="Hotelling")
• # summary(ocdModel2, intercept=TRUE, test="Roy")
AOV & MANOVA Output

Steps in MANOVA
1. Univariate ANOVA for DV1 (Actions)
2. Univariate ANOVA for DV2 (Thoughts)
3. The relationship between DVs: cross-products
4. The total SSCP matrix (T)
5. The residual (error) SSCP matrix (E)
6. The model (hypothesis) SSCP matrix (H)

MANOVA Result analysis
• For the Group variable, Pillai’s trace has p value 0.049, which
indicates a significant difference.
• ANOVA - The p values indicate that there was no significant
difference between therapy groups in terms of Thoughts (p=.136) and
Actions (p=.08).
• MANOVA - we know that therapy had a significant impact on OCD
based on MANOVA.
• The reason for the anomaly is simple: the MANOVA takes account of
the correlation between dependent variables, and so for these data it
has more power to detect group differences.
Try it exercise - MANOVA
• TCS four financial year equity data
• Date
• TCSCLOSEPOC
• TCSLTPPOC
• Month
• Assumption – (TCSCLOSEPOC+TCSLTPPOC) depends on Month

Assignment 2 - 10 marks
• Your own company (mid term question) using POC
• Assignment content
• 1. Equity Data set 2 years (cleaned)
• 2. COR, LM, GLM, ANOVA, MANOVA (you have the freedom to select
variables)
• 3. Output 1 - Plot, Boxplot, Histogram
• 4. Output 2 – Statistical summary
• 5. Inferences
• Last Date – 09/4/2023, 17.00
9 May 2024 V. Senthil 120

• Linear vs Logistic Regression – Read the Y-Axis
Example 18 - R Coding for Logistic Regression
# Logit Regression – Demo
# Data set – admission_data: admission (binary) depends on GRE, GPA & rank of the college
df <- read.csv(file.choose())
summary(df)
# Logit regression with glm(), RANK treated as numeric
logit1 <- glm(admitted ~ GRE+GPA+RANK,data=df,family="binomial")
summary(logit1)
# Logit regression with RANK treated as a categorical factor
df$RANK<- as.factor(df$RANK)
logit2 <- glm(admitted ~ GRE+GPA+RANK,data=df,family="binomial")
summary(logit2)
# Prediction
# Suppose a student has a GRE score of 790, a GPA of 3.8 and
# studied at a rank-1 college.
# Predict the chance of that student being admitted.
x <- data.frame(GRE=790,GPA=3.8,RANK=as.factor(1))
p<- predict(logit2,x) # default type="link": returns the log-odds, not a probability
print(p)
p<- predict(logit2,x,type="response") # returns the predicted probability
print(p)
• logreg1=glm(admit~GRE+GPA+RANK, family="binomial")
Prediction
• Suppose a student has a GRE score of 790, a GPA of 3.8 and
studied at a rank-1 college. We want to predict the chance of
that student being admitted.
x <- data.frame(GRE=790,GPA=3.8,RANK=as.factor(1))
p<- predict(logit2,x,type="response")
print(p)
• We see that there is a 70% chance that this student will be admitted.
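
The difference between predict() with and without type="response" is the inverse logit transformation: the default output is on the log-odds scale, and the response scale converts it to a probability. A minimal Python sketch of that conversion follows; the log-odds value 0.85 is a hypothetical input chosen for illustration, not the output of the fitted model.

```python
import math

def inverse_logit(log_odds):
    """Convert log-odds (the linear predictor) into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Log-odds of 0 means even odds, i.e. a probability of 0.5.
print(inverse_logit(0.0))

# Hypothetical log-odds value; it maps to a probability of roughly 0.70.
print(round(inverse_logit(0.85), 2))
```

Positive log-odds give probabilities above 0.5 and negative log-odds give probabilities below 0.5, which is why the Y-axis of a logistic curve stays between 0 and 1.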
SEM Basics(5)
• SEM = Reliability + Regression
• Structural equation models combine measurement models (e.g.,
reliability) with structural models (e.g., regression). The sem package,
developed by John Fox, allows for some basic structural equation
models. To use it, add the sem package by using the package
manager.
• Structural Equation Modeling may be thought of as regression
corrected for attenuation.
09/05/2024 Applied Business Analytics using R 125

What test ? When to apply ?

WHAT WE DID SO FAR ?
Evaluation parameter | Descriptions | Marks
Mini Project/Assignments & Quiz | Mini Project Report / Assignment & Quiz | 10
Case Presentation | HBR Case Presentation and Interactions | 10
Coursera Courses | One Coursera course with score card | 10
Mid Term Test | Mid-term examination weightage will be 50% of total internal marks. | 30
End term Marks | A written exam will be conducted at the end of the trimester. This will carry 40% of total marks | 40
Total marks | | 100

All the BEST & Thanks
