Exploratory Data Analytics


TOPIC

EXPLORATORY DATA ANALYTICS

Fundamentals of Business Analytics

Team Members
• GARVITA CHAURASIA
• ASMIT JAISWAL
• LAVANYA TANDON
• ASHAZ AHMAD KHAN

10/17/2024 Annual Review
CORRELATION

GARVITA CHAURASIA
WHAT IS CORRELATION?
Correlation in business analytics refers to a statistical technique used to
measure and analyze the strength and direction of the relationship between two
variables. It helps businesses understand how changes in one variable may
relate to changes in another, which can be valuable for decision-making and
strategy formulation.

TYPES OF CORRELATION
1. Positive Correlation: When both variables move in the same direction (e.g.,
as sales increase, profits increase).
2. Negative Correlation: When one variable increases while the other
decreases (e.g., as prices increase, customer demand decreases).
3. Zero Correlation: When there is no relationship between the two variables.
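These three cases can be sketched in a few lines of Python. The figures below are invented for illustration; `np.corrcoef` returns the Pearson correlation coefficient, which is close to +1, -1, or 0 for the three types respectively:

```python
import numpy as np

# Hypothetical monthly figures, invented purely for illustration.
sales   = np.array([10, 20, 30, 40, 50])   # units sold
profits = np.array([2, 4, 6, 8, 10])       # moves with sales    -> positive
prices  = np.array([5, 4, 3, 2, 1])        # moves against sales -> negative
zigzag  = np.array([1, 2, 1, 2, 1])        # no linear trend     -> zero

# np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is the pairwise r.
r_pos  = np.corrcoef(sales, profits)[0, 1]
r_neg  = np.corrcoef(sales, prices)[0, 1]
r_zero = np.corrcoef(sales, zigzag)[0, 1]

print(f"sales vs profits: r = {r_pos:+.2f}")   # positive correlation
print(f"sales vs prices:  r = {r_neg:+.2f}")   # negative correlation
print(f"sales vs zigzag:  r = {r_zero:+.2f}")  # zero correlation
```

Because the first two series are exactly linear in `sales`, r comes out as exactly +1 and -1; real business data would land somewhere in between.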

Uses of Correlation in
Business Analytics:
1. Risk Management: Assists in identifying risks by correlating variables such as market trends and business performance.
2. Sales Forecasting: Predicts future trends by analyzing the relationship between sales and various factors like customer preferences or market conditions.
3. Product Development: Correlation analysis helps in understanding how customer feedback might relate to product features or quality.
4. Operations and Efficiency: Identifies relationships between operational metrics (e.g., correlation between employee training and productivity).
5. Customer Analytics: Analyzes customer behavior by correlating metrics such as engagement or satisfaction with purchasing patterns.
EXAMPLE
A real-life example of correlation in business analytics can be seen in retail sales and advertising spend. A retail company wants to understand the impact of its advertising spend on sales, so it tracks monthly sales and advertising expenses for one year.

After performing a correlation analysis, the company finds that when it spends more on advertising, its sales tend to increase: when the advertising budget is $50,000 in a month, sales increase by 20%, and when the budget is $100,000, sales increase by 40%. This indicates a positive correlation between advertising spend and sales. This insight helps the company decide to increase its advertising budget during peak shopping seasons (e.g., holidays) to boost sales further.
ADDITIONAL EXAMPLES
1. Customer Satisfaction and Revenue: A company might analyze the correlation between customer satisfaction scores and revenue. If higher satisfaction scores are correlated with increased revenue, the company could focus on improving customer service to boost profits.

2. Temperature and Ice Cream Sales: An ice cream company may find that there is a strong positive correlation between the temperature and the number of ice cream sales. As temperatures rise in the summer, sales increase.

3. Interest Rates and Housing Prices: A bank may analyze the relationship between interest rates and housing prices. They might find a negative correlation, where lower interest rates are associated with higher housing prices.
Simple Linear Regression: A Powerful Tool for Prediction

Asmit Jaiswal & Lavanya Tandon
INTRODUCTION

We’re going to dive into the world of simple linear regression, one of the most fundamental techniques in statistics and data analysis. It is useful whether you are predicting future sales, analyzing the relationship between temperature and energy consumption, or studying the link between hours of study and exam scores.
The Concept of Simple Linear Regression:
At its core, simple linear regression tries to find the best-fitting straight line through the data points that represents the relationship between two variables. For example, imagine we have a scatterplot of data points showing how the number of hours studied (independent variable) relates to exam scores (dependent variable). Simple linear regression will help us find a line that best predicts exam scores based on hours of study.

But how do we quantify “best fitting”? In mathematical terms, this involves minimizing the sum of the squared differences between the actual data points and the values predicted by our line. This process is called the least squares method, and it forms the backbone of simple linear regression.
The Equation of Simple Linear
Regression
Mathematically, the equation for a simple linear regression line is expressed as:
y = β₀ + β₁x + ε
Where:
• y is the dependent variable (the outcome we are predicting).
• x is the independent variable (the predictor).
• β₀ is the y-intercept of the line—this represents the value of y when x is zero.

• β₁ is the slope of the line—this tells us how much y changes for a one-unit
change in x.
• ε is the error term, which accounts for the difference between the predicted
and actual values of y.
Understanding the Slope and Intercept
Let’s break it down. The slope (β₁) of the line is crucial because it tells us the strength and direction of the relationship between x and y. If the slope is positive, it means that as x increases, y also increases. If the slope is negative, y decreases as x increases. For instance, if we’re studying the relationship between hours studied and exam score, a positive slope would indicate that more hours of study lead to higher scores. The intercept (β₀) is the point where the regression line crosses the y-axis. In practical terms, it gives us the value of the dependent variable when the independent variable is zero.
The Assumptions of Simple Linear
Regression
For linear regression to provide reliable results, several key
assumptions need to be satisfied. Let’s go over these briefly:
1. Linearity: The relationship between the dependent and
independent variables should be linear. If the data doesn’t form a
straight line, linear regression might not be appropriate.
2. Independence of errors: The residuals (errors) should be
independent of each other.
3. Homoscedasticity: The residuals should have constant variance—
this means that the spread of errors is consistent across all levels
of the independent variable.
4. Normality of errors: The residuals should be approximately
normally distributed, particularly for making inferences about the
slope and intercept.

Violating these assumptions can lead to inaccurate predictions or
invalid results, so it’s crucial to assess them before relying on the model.
Fitting the Regression
Line
Now that we understand the equation and assumptions, let’s discuss how the regression line is fitted. The most common method is the least squares method. As I mentioned earlier, this method minimizes the sum of the squared differences between the actual values of y and the predicted values (those on the regression line). These squared differences are often called “residuals” or “errors.”

The objective of least squares is to find the values of β₀ and β₁ that minimize the total residual error. Once we find these values, we can use the regression line to make predictions.
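The closed-form least squares solution described above can be written out directly: β₁ is the sum of cross-deviations divided by the sum of squared x-deviations, and β₀ follows from the means. A minimal sketch (the hours/scores data are made up for illustration; real work would typically use numpy or scikit-learn):

```python
# Closed-form least squares for y = b0 + b1*x.
# The hours-studied / exam-score data below are invented for illustration.
hours  = [1, 2, 3, 4, 5, 6]
scores = [52, 58, 61, 67, 73, 78]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

# b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
num = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
den = sum((x - x_bar) ** 2 for x in hours)
b1 = num / den
b0 = y_bar - b1 * x_bar  # the fitted line always passes through (x_bar, y_bar)

def predict(x):
    return b0 + b1 * x

print(f"score = {b0:.2f} + {b1:.2f} * hours")
print(f"predicted score for 7 hours of study: {predict(7):.1f}")
```

A useful sanity check on any least squares fit with an intercept is that the residuals sum to (numerically) zero.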
Practical Applications of
Simple Linear Regression
Simple linear regression has a wide array of practical applications, which is why
it’s so popular in fields like economics, biology, engineering, and social sciences.
1. Sales Forecasting
Example: A retail company can use simple linear regression to predict future sales
based on advertising spending. The company collects data on past sales and the
corresponding advertising budgets, then fits a linear model to understand how
changes in the budget impact sales. This can help businesses optimize their
advertising expenditures for better sales outcomes.
2. Risk Assessment in Finance
Example: In finance, simple linear regression is often used to assess risk and return in investments. One common application is to model the relationship between a stock's returns and the overall market returns (market index). This is known as the Capital Asset Pricing Model (CAPM). By using linear regression, investors can estimate a stock's beta, which measures its volatility in relation to the market, helping in portfolio risk management.
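Beta in this setting is just the regression slope of stock returns on market returns, which equals the covariance of the two series divided by the market's variance. A minimal sketch (the monthly return figures are invented for illustration):

```python
# Estimate a stock's CAPM beta as cov(stock, market) / var(market),
# which is the least-squares slope of stock returns on market returns.
# These monthly returns (%) are hypothetical, for illustration only.
market = [1.0, -0.5, 2.0, 0.5, -1.0, 1.5]
stock  = [1.8, -1.2, 3.5, 0.6, -2.0, 2.4]

n = len(market)
m_bar = sum(market) / n
s_bar = sum(stock) / n

cov = sum((m - m_bar) * (s - s_bar) for m, s in zip(market, stock)) / (n - 1)
var = sum((m - m_bar) ** 2 for m in market) / (n - 1)

beta = cov / var
print(f"beta = {beta:.2f}")  # beta > 1: more volatile than the market
```

With these made-up numbers the stock amplifies market moves, so its beta comes out well above 1.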
3. Predicting House Prices
Example: Real estate professionals use simple linear regression to predict house
prices based on one important feature, such as square footage. By analyzing past
sales data, they can build a model that predicts the price of a house given its size,
which is useful for both buyers and sellers in understanding market trends.
4. Healthcare and Medicine
Example: In healthcare, simple linear regression can be used to predict patient
outcomes based on a single predictor variable. For instance, it can model the
relationship between body mass index (BMI) and blood pressure. This model could
help medical professionals predict blood pressure levels based on a patient’s BMI,
contributing to better health monitoring and personalized treatment plans.
5. Education Analytics
Example: In education, regression analysis can be used to predict student
performance. A university might want to understand how study hours relate to
exam scores. By applying simple linear regression, they can determine how much
time spent studying contributes to a student’s grades, offering insights into
effective study habits.
Limitations of Simple Linear
Regression
While linear regression is a powerful tool, it does have limitations. One major limitation is that it assumes a linear relationship between the independent and dependent variables. In reality, many relationships are nonlinear, and applying linear regression to such data can lead to misleading results.

Additionally, simple linear regression only examines the relationship between two variables. In many real-world scenarios, multiple factors influence the outcome. In such cases, a more complex model like multiple linear regression, which incorporates several predictors, would be necessary.

Moreover, outliers can significantly affect the regression line, potentially distorting results. It’s always a good idea to visualize the data and check for outliers before applying regression analysis.
Conclusion
To wrap things up, simple linear regression is a versatile and
powerful statistical method that allows us to model and predict
relationships between two variables. By fitting a straight line to
the data, we can quantify the strength and direction of this
relationship and make informed predictions. But like any tool, it
must be used appropriately. Ensuring that the assumptions of
linearity, homoscedasticity, and normality are met is crucial for
obtaining accurate results. And while simple linear regression is
useful in many cases, it may not always be the best choice,
especially when multiple factors are at play or when the
relationship between variables is nonlinear.

In today’s data-driven world, understanding the basics of regression analysis empowers us to make better decisions, analyze trends, and predict future outcomes. I hope this presentation has given you a solid foundation in simple linear regression.
Descriptive Statistics
Ashaz Ahmad Khan
Introduction to Exploratory Data Analysis (EDA)
Exploratory Data Analysis (EDA) is the process of examining
datasets to summarize their main characteristics, often using visual
techniques. It helps uncover patterns, spot anomalies, test
assumptions, and validate hypotheses. EDA uses descriptive
statistics and data visualization to provide insights for further, more
formal analysis.
What Are Descriptive Statistics?
Descriptive statistics summarize and organize data to provide an
overview of its key characteristics. These statistics include measures
of central tendency (mean, median, mode), variability (range,
variance, standard deviation), and distribution shape (skewness,
kurtosis). Descriptive statistics simplify complex datasets, making
them easier to interpret and understand.
Measures of Central Tendency
Mean (Arithmetic Average):
• The mean is calculated by summing all the data points and dividing by the number of points.
It is the most common measure used to determine the central value.
• Advantages: It incorporates all data points, providing a comprehensive measure.
• Limitations: It is sensitive to outliers, which can skew the result.
Median:
• The median is the middle value of a dataset when the data is arranged in order. If the dataset
has an even number of points, the median is the average of the two middle values.
• Advantages: It is not affected by outliers, making it more robust for skewed data.
• Limitations: It doesn’t consider all values in the dataset, only focusing on the middle point.
Mode:
• The mode is the value that appears most frequently in a dataset. It can be used for both
numerical and categorical data.
• Advantages: Useful in identifying the most common value, especially for categorical data.
• Limitations: A dataset can have multiple modes or none at all, which can make interpretation
more complex.
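The trade-offs above, especially the mean's sensitivity to outliers versus the robustness of the median and mode, are easy to see with Python's standard statistics module (the order values below are invented for illustration, with one deliberate outlier):

```python
import statistics as st

# Hypothetical order values, invented for illustration; 95 is an outlier.
orders = [12, 15, 15, 18, 20, 22, 95]

print("mean:  ", st.mean(orders))    # pulled far above most values by the 95
print("median:", st.median(orders))  # robust middle value, unaffected by 95
print("mode:  ", st.mode(orders))    # most frequent value
```

The single outlier drags the mean well above every typical order, while the median and mode stay representative of the bulk of the data.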
Measures of Variability
• Range: Difference between maximum and minimum values.
• Variance: Measures data spread by averaging squared
differences from the mean.
• Standard Deviation: The square root of variance, showing
data’s spread around the mean.
• Interquartile Range (IQR): The range between the first and
third quartile.

Measures of Shape
• Skewness: Indicates asymmetry in the data distribution.
• Kurtosis: Describes how heavy or light the tails are in the
data distribution.
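These measures of variability and shape can be computed with the standard statistics module (the daily sales figures are invented for illustration; `statistics.quantiles` needs Python 3.8+, and the skewness line uses the simple Pearson approximation rather than the moment-based definition):

```python
import statistics as st

# Hypothetical daily sales figures, invented for illustration.
data = [4, 7, 7, 8, 9, 10, 12, 15]

rng = max(data) - min(data)   # range: max minus min
var = st.variance(data)       # sample variance: mean squared deviation
sd  = st.stdev(data)          # standard deviation: square root of variance

# Quartiles via statistics.quantiles; IQR = Q3 - Q1.
q1, q2, q3 = st.quantiles(data, n=4)
iqr = q3 - q1

# A simple (Pearson) skewness estimate: 3 * (mean - median) / stdev.
skew = 3 * (st.mean(data) - st.median(data)) / sd  # > 0: right-skewed

print(f"range={rng}  variance={var:.2f}  stdev={sd:.2f}  IQR={iqr:.2f}  skew={skew:.2f}")
```

The long right tail (the 15) shows up as a positive skew estimate, matching the asymmetry described above.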
Measures of Relationships
• Correlation: Measures the strength and direction of the
relationship between two variables. It ranges from -1, indicating a
perfect negative relationship, to +1, indicating a perfect positive
relationship, with 0 indicating no relationship.
• Covariance: Indicates the directional relationship between
two variables. Positive covariance indicates a direct relationship,
while negative covariance indicates an inverse relationship between
the variables.
Role in EDA
In EDA, descriptive statistics provide a quick, insightful summary of
data, helping to identify trends, patterns, and outliers. They simplify
large datasets by calculating central tendencies, variability, and
relationships. This allows analysts to validate assumptions, detect
anomalies, and guide further analysis or data-driven decision-making.
Predictive Analytics

Armaan Chopra
Introduction
• Predictive analytics is a branch of advanced analytics that uses historical data,
statistical algorithms, and machine learning techniques to predict future outcomes. It
analyzes past data patterns to forecast future trends and behaviors, helping
organizations make informed decisions. Here's a breakdown of the key aspects.
• Key Components:
• Data: Historical data is the foundation of predictive analytics. It can
be structured (e.g., databases) or unstructured (e.g., text, images).
• Algorithms: These include statistical methods like regression analysis, machine
learning models, and time series analysis, among others.
• Modeling: Predictive models are built using data and algorithms to forecast future
outcomes. Common models include decision trees, neural networks, and regression
models.
• Validation: Models are tested on new or hold-out data to verify their accuracy before
being deployed.
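The validation step above can be sketched as a simple hold-out split: fit the model on the earlier portion of the history and measure its error on the portion it never saw. This is a minimal sketch with invented data and a hand-rolled linear fit; real projects would typically use a library such as scikit-learn:

```python
# Hold-out validation sketch: fit on the first 80% of the history,
# then measure error on the remaining 20%. Data invented for illustration.
xs = list(range(10))             # e.g., time periods 0..9
ys = [2 * x + 1 for x in xs]     # a perfectly linear "history"

split = int(len(xs) * 0.8)
x_train, x_test = xs[:split], xs[split:]
y_train, y_test = ys[:split], ys[split:]

# Fit y = b0 + b1*x on the training portion (closed-form least squares).
n = len(x_train)
xb = sum(x_train) / n
yb = sum(y_train) / n
b1 = (sum((x - xb) * (y - yb) for x, y in zip(x_train, y_train))
      / sum((x - xb) ** 2 for x in x_train))
b0 = yb - b1 * xb

# Mean absolute error on the hold-out set: accuracy checked before deployment.
mae = sum(abs((b0 + b1 * x) - y) for x, y in zip(x_test, y_test)) / len(x_test)
print(f"hold-out MAE = {mae:.4f}")
```

Because the toy history is exactly linear, the hold-out error is zero; on real data a nonzero hold-out error (and especially a much larger error than on the training data) is the warning sign of overfitting discussed later.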

Techniques & Application
Techniques:
Regression Analysis: Identifies relationships between variables and predicts continuous
outcomes.
Classification: Predicts discrete outcomes (e.g., “yes” or “no” decisions) using methods
like decision trees or logistic regression.
Clustering: Groups data points into clusters based on similarities, often used for market
segmentation.
Time Series Analysis: Analyzes sequential data (e.g., stock prices) to forecast future
values.
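As a minimal time-series example, a trailing moving average forecasts the next value from the average of recent history; the price series below is invented for illustration, and serious forecasting would use models such as ARIMA or exponential smoothing:

```python
# Naive time-series forecast via a trailing moving average.
# The price series is hypothetical, for illustration only.
prices = [100, 102, 101, 105, 107, 106, 110, 112]

window = 3
forecast = sum(prices[-window:]) / window  # average of the last 3 observations
print(f"next-period forecast: {forecast:.2f}")
```

A wider window smooths out noise but reacts more slowly to genuine trend changes, which is the basic trade-off in this family of methods.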
Applications:
Business: Demand forecasting, customer retention, pricing optimization.
Healthcare: Predicting disease outbreaks, personalized treatment plans.
Finance: Credit scoring, fraud detection, risk management.
Benefits & Challenges
Benefits:
• Better decision-making: By forecasting future trends, companies can make proactive decisions.
• Cost savings: Predicting potential risks or inefficiencies helps reduce unnecessary expenses.
• Customer insights: It enhances understanding of customer behavior, allowing for better-targeted marketing efforts.

Challenges:
• Data quality: The accuracy of predictions depends on clean, reliable data.
• Model complexity: Complex models may require significant computing resources and expertise.
• Overfitting: Models can sometimes perform well on historical data but poorly on new data if overfitted.

Predictive analytics plays a crucial role in various industries by helping anticipate trends, optimize operations, and create more personalized experiences.
Conclusion
Exploratory Data Analysis (EDA) is a critical step in the data analysis process that allows analysts to gain valuable insights, detect patterns, and identify potential issues in a dataset before applying more advanced statistical models or machine learning techniques. EDA involves the use of descriptive statistics, data visualization, and simple analytical tools to summarize key characteristics of the data, such as central tendencies, spread, and relationships between variables. By understanding the underlying structure of the data, EDA helps inform subsequent modeling and decision-making.
Thank you!
By,
Lavanya Tandon
Ashaz Ahmad Khan
Garvita Chaurasia
Asmit Jaiswal
Armaan Chopra
BBA-B 3rd Sem.

