DAV Exp.1-8 Output


Experiment No.

Aim: Getting introduced to data analytics libraries in Python and R.

Theory:

Top 8 Python Libraries for Data Visualization:

1. Matplotlib
Matplotlib is a data visualization and 2-D plotting library for Python. It was
initially released in 2003 and is the most popular and widely used plotting library
in the Python community. It comes with an interactive environment that works across
multiple platforms. Matplotlib can be used in Python scripts, the Python and IPython
shells, the Jupyter notebook, web application servers, etc. It can also embed plots
into applications using GUI toolkits such as Tkinter, GTK+, wxPython, and Qt.
You can use Matplotlib to create line plots, bar charts, pie charts, histograms,
scatterplots, error charts, power spectra, stemplots, and many other chart types.
Its Pyplot module provides a MATLAB-like interface while being free and open
source.

2. Plotly
Plotly is a free open-source graphing library that can be used to build data
visualizations. Plotly (plotly.py) is built on top of the Plotly JavaScript library
(plotly.js) and can be used to create web-based data visualizations that can be
displayed in Jupyter notebooks, embedded in web applications using Dash, or saved
as individual HTML files. Plotly provides more than 40 unique chart types, such as
scatter plots, histograms, line charts, bar charts, pie charts, error bars, box plots,
multiple axes, sparklines, dendrograms, and 3-D charts. Plotly also provides
contour plots, which are not that common in other data visualization libraries. In
addition to all this, Plotly can be used offline with no internet connection.

3. Seaborn
Seaborn is a Python data visualization library that is based on Matplotlib and
closely integrated with the NumPy and pandas data structures. Seaborn has various
dataset-oriented plotting functions that operate on data frames and arrays
containing whole datasets. It internally performs the necessary statistical
aggregation and mapping to create the informative plots that the user wants.
It is a high-level interface for creating attractive and informative statistical
graphics that are integral to exploring and understanding data. Seaborn graphics
can include bar charts, histograms, scatterplots, error bars, etc. Seaborn also
has various tools for choosing color palettes that can reveal patterns in the data.
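
As a rough illustration of the "statistical aggregation" step, the sketch below groups raw observations by category and reduces each group to a mean and a standard error, which is the kind of summary a bar plot of group means would draw. It uses only the standard library; the records and the helper name aggregate_by_category are made up for this example and are not part of Seaborn's API.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Hypothetical raw observations: (category, measurement) pairs.
records = [
    ("A", 4.0), ("A", 6.0), ("A", 5.0),
    ("B", 10.0), ("B", 14.0),
    ("C", 7.0), ("C", 7.0), ("C", 10.0),
]

def aggregate_by_category(rows):
    """Return {category: (mean, standard error)} for each group of rows."""
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    summary = {}
    for category, values in groups.items():
        # The standard error needs at least two values in the group.
        se = stdev(values) / sqrt(len(values)) if len(values) > 1 else 0.0
        summary[category] = (mean(values), se)
    return summary

summary = aggregate_by_category(records)
for category, (avg, se) in sorted(summary.items()):
    print(f"{category}: mean={avg:.2f}, standard error={se:.2f}")
```

A plotting library would then draw one bar per category at the mean, with an error bar derived from the spread of each group.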

4. GGplot
Ggplot is a Python data visualization library that is based on ggplot2, which was
created for the programming language R. Ggplot can create data visualizations such
as bar charts, pie charts, histograms, scatterplots, error charts, etc. using a
high-level API. It also allows you to add different types of data
visualization components or layers in a single visualization. Once ggplot has been
told which variables to map to which aesthetics in the plot, it does the rest of the
work so that the user can focus on interpreting the visualizations and spend less
time creating them. But this also means that it is not possible to create highly
customized graphics in ggplot. Ggplot is also deeply connected with pandas, so it
is best to keep the data in DataFrames.

5. Altair
Altair is a statistical data visualization library in Python. It is based on Vega and
Vega-Lite, which are declarative languages for creating, saving, and sharing
interactive data visualization designs. Altair can be used to create data
visualizations such as bar charts, pie charts, histograms, scatterplots, error
charts, power spectra, stemplots, etc. using a minimal amount of code. Altair's
dependencies include Python 3.6, entrypoints, jsonschema, NumPy, Pandas, and Toolz,
which are automatically installed with the Altair installation commands. You can
open Jupyter Notebook or JupyterLab and execute any of the code to obtain the data
visualizations in Altair. Currently, the source for Altair is available on GitHub.
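
To make "declarative" concrete, the sketch below writes a small Vega-Lite bar chart specification by hand as a Python dictionary and serializes it to JSON. It only illustrates the kind of document such a library produces; it does not call Altair, and the data values are made up for this example.

```python
import json

# Hypothetical data: one record per bar.
values = [
    {"category": "A", "value": 20},
    {"category": "B", "value": 35},
    {"category": "C", "value": 30},
]

# A hand-written Vega-Lite specification: it declares what to draw
# (one bar per category) rather than how to draw it.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": values},
    "mark": "bar",
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "value", "type": "quantitative"},
    },
}

# Because the chart is plain data, it can be saved and shared as JSON.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The chart type is chosen by the "mark" entry and the mapping from columns to axes by "encoding", so changing the visualization means editing data rather than drawing code.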

6. Bokeh
Bokeh is a data visualization library that provides detailed graphics with a high
level of interactivity across various datasets, whether they are large or small.
Bokeh is based on The Grammar of Graphics like ggplot, but it is native to Python
while ggplot is based on ggplot2 from R. Data visualization experts can use Bokeh
to create various interactive plots for modern web browsers, which can be used
in interactive web applications, HTML documents, or JSON objects. Bokeh has 3
levels that can be used for creating visualizations. The first level focuses only on
creating the data plots quickly, the second level controls the basic building blocks
of the plot, and the third level provides full autonomy for creating the charts with
no pre-set defaults. The third level is suited to data analysts and IT professionals
who are well versed in the technical side of creating data visualizations.

7. Pygal
Pygal is a Python data visualization library that is made for creating sexy charts!
(According to their website!) While Pygal is similar to Plotly or Bokeh in that it
creates data visualization charts that can be embedded into web pages and accessed
using a web browser, a primary difference is that it outputs charts in the form of
SVGs, or Scalable Vector Graphics. SVGs ensure that you can view your charts
clearly without losing any quality even if you scale them. However, SVGs are only
practical with smaller datasets, as too many data points are difficult to render
and the charts can become sluggish.
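
The trade-off described above follows from how SVG stores a chart: every bar or point is an element with coordinates. The sketch below builds a tiny bar chart as SVG using only the standard library. It is not Pygal's API; the helper name bar_chart_svg and the styling are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def bar_chart_svg(values, bar_width=40, height=120):
    """Build a minimal SVG bar chart as a string (hypothetical helper)."""
    width = bar_width * len(values)
    svg = ET.Element(
        "svg",
        xmlns="http://www.w3.org/2000/svg",
        viewBox=f"0 0 {width} {height}",
    )
    top = max(values)
    for i, value in enumerate(values):
        # Scale each bar so the largest value fills the full height.
        bar_height = height * value / top
        ET.SubElement(
            svg,
            "rect",
            x=str(i * bar_width + 5),
            y=str(height - bar_height),
            width=str(bar_width - 10),
            height=str(bar_height),
            fill="steelblue",
        )
    return ET.tostring(svg, encoding="unicode")

svg_text = bar_chart_svg([20, 35, 30, 15])
print(svg_text)
```

Shapes are stored relative to the viewBox, so a browser can redraw them sharply at any size. Each data point adds one more element to the file, which is why very large datasets make SVG charts heavy.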

8. Geoplotlib
Most data visualization libraries don't provide much support for creating
maps or using geographical data, and that is why geoplotlib is such an important
Python library. It supports the creation of geographical maps in particular, with
many different types of maps available such as dot-density maps, choropleths,
symbol maps, etc. One thing to keep in mind is that it requires NumPy and pyglet as
prerequisites before installation, but that is not a big disadvantage, especially
if you want to create geographical maps, where geoplotlib is a strong option.

Conclusion:

In conclusion, all these Python libraries for data visualization are great options
for creating attractive and informative data visualizations. Each of them has its
strong points and advantages, so you can select the one that best fits your data
visualization project. For example, Matplotlib is extremely popular and well
suited to general 2-D plots, while Geoplotlib is uniquely suited to geographical
visualizations.

Experiment : 02

Code:
import matplotlib.pyplot as plt
from scipy import stats

x = [89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y = [21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]

# Fit a straight line to the points by least squares
slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    # Predicted y for a given x on the fitted line
    return slope * x + intercept

mymodel = list(map(myfunc, x))

plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
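
The slope and intercept used above are the ordinary least-squares estimates. As a minimal sketch of that calculation, the code below computes them from the same x and y lists using only the standard library. The helper name fit_line is an assumption for this example, and the other values returned by linregress (r, p, std_err) are not reproduced here.

```python
from statistics import mean

x = [89, 43, 36, 36, 95, 10, 66, 34, 38, 20, 26, 29, 48, 64, 6, 5, 36, 66, 72, 40]
y = [21, 46, 3, 35, 67, 95, 53, 72, 58, 10, 26, 34, 90, 33, 38, 20, 56, 2, 47, 15]

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

slope, intercept = fit_line(x, y)
predictions = [slope * xi + intercept for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predictions)]
print(f"slope={slope:.4f}, intercept={intercept:.4f}")
```

Two properties make a quick sanity check: the residuals sum to zero, and the fitted line passes through the point (mean of x, mean of y).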

Output:
Experiment : 03

Code:

import pandas
from sklearn import linear_model

df = pandas.read_csv("data.csv")

# Independent variables (features) and dependent variable (target)
X = df[['Weight', 'Volume']]
y = df['CO2']

regr = linear_model.LinearRegression()
regr.fit(X, y)

#predict the CO2 emission of a car where the weight is 2300kg, and the volume is 1300ccm:
predictedCO2 = regr.predict([[2300, 1300]])

print(predictedCO2)

Output:

[107.2087328]
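
Multiple linear regression picks the intercept and coefficients that minimise the squared prediction error. The sketch below solves that problem through the normal equations in pure Python. The file data.csv is not available here, so the rows are hypothetical values generated from a known formula; the helper names are assumptions, and scikit-learn itself uses a more robust numerical solver than this.

```python
def solve_linear_system(a, b):
    """Solve a @ coeffs = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (m[r][n] - known) / m[r][r]
    return coeffs

def fit_linear_model(rows, targets):
    """Return [intercept, coef_1, ..., coef_k] minimising squared error."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 = intercept
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical (weight, volume) rows; targets follow
# CO2 = 80 + 0.01 * weight + 0.005 * volume exactly.
rows = [(800, 1000), (1100, 1200), (1300, 1600), (1500, 2000), (1700, 1500)]
targets = [80 + 0.01 * w + 0.005 * v for w, v in rows]

intercept, coef_weight, coef_volume = fit_linear_model(rows, targets)
predicted = intercept + coef_weight * 2300 + coef_volume * 1300
print(round(predicted, 3))
```

Because the targets were generated without noise, the fit recovers the generating coefficients; with real data the coefficients are the best compromise across all rows.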
4/9/24, 10:23 AM Time Series Analysis - Colaboratory

Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Load the dataset
# For demonstration purposes, let's generate a synthetic time series data
# You can replace this with your own dataset
date_range = pd.date_range(start='2022-01-01', end='2023-12-31')
data = np.random.randn(len(date_range))
ts = pd.Series(data, index=date_range)

# Basic exploration of the time series data
print(ts.head())      # Print first few rows of the time series data
print(ts.describe())  # Summary statistics of the time series data

# Visualize the time series data
ts.plot(figsize=(12, 6))
plt.title('Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.show()

# Decompose the time series into trend, seasonality, and residuals
decomposition = sm.tsa.seasonal_decompose(ts, model='additive')
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid

# Visualize the decomposition
plt.figure(figsize=(12, 8))
plt.subplot(411)
plt.plot(ts, label='Original')
plt.legend(loc='best')
plt.subplot(412)
plt.plot(trend, label='Trend')
plt.legend(loc='best')
plt.subplot(413)
plt.plot(seasonal, label='Seasonality')
plt.legend(loc='best')
plt.subplot(414)
plt.plot(residual, label='Residuals')
plt.legend(loc='best')
plt.tight_layout()
plt.show()

# Perform any further analysis or modeling as required
# For example, you can fit an ARIMA model to the time series data

# ARIMA model fitting
# Example:
# model = sm.tsa.ARIMA(ts, order=(p, d, q))
# fitted_model = model.fit()

# Generate forecast using the fitted ARIMA model
# forecast = fitted_model.forecast(steps=10)

# Visualize the forecast along with the original data
# plt.plot(ts, label='Original Data')
# plt.plot(forecast, label='Forecast', color='red')
# plt.legend()
# plt.show()

Output:
2022-01-01 -0.663531
2022-01-02 1.133158
2022-01-03 0.083936
2022-01-04 -0.224117
2022-01-05 0.485578
Freq: D, dtype: float64
count 730.000000
mean -0.024024
std 1.001761
min -2.896642
25% -0.673117
50% -0.021265
75% 0.622095
max 3.277661
dtype: float64
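
The additive model used in the code above treats each observation as trend + seasonality + residual. As a rough sketch of how such a split can be computed, the code below uses a centred moving average for the trend and per-position averages for the seasonal part. It is a simplified pure-Python version for an odd period only, not the statsmodels implementation; the helper name, the period of 7, and the sample series are assumptions for this example.

```python
from statistics import mean

def additive_decompose(series, period):
    """Split series into trend, seasonal and residual parts (odd period only)."""
    half = period // 2
    n = len(series)
    # Trend: centred moving average; undefined (None) near the two ends.
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = mean(series[i - half:i + half + 1])
    # Seasonal: average detrended value at each position in the cycle,
    # shifted so the seasonal effects average to zero over one period.
    detrended = [(i, series[i] - trend[i]) for i in range(n) if trend[i] is not None]
    by_pos = [mean([v for i, v in detrended if i % period == pos])
              for pos in range(period)]
    offset = mean(by_pos)
    seasonal = [by_pos[i % period] - offset for i in range(n)]
    # Residual: whatever trend and seasonality do not explain.
    residual = [None if trend[i] is None else series[i] - trend[i] - seasonal[i]
                for i in range(n)]
    return trend, seasonal, residual

# Hypothetical weekly pattern (sums to zero) on top of a flat level of 10.
weekly = [3, 1, 0, -1, -2, -2, 1]
series = [10 + weekly[i % 7] for i in range(28)]
trend, seasonal, residual = additive_decompose(series, period=7)
print(trend[3], seasonal[:7], residual[3])
```

For the purely random series generated in this experiment there is no real trend or seasonality, so almost all of the variation ends up in the residual component.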

4/9/24, 10:51 AM Implementation of ARIMA Model - Colaboratory

Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Load the dataset
# For demonstration purposes, let's generate a synthetic time series data
# You can replace this with your own dataset
date_range = pd.date_range(start='2022-01-01', end='2023-12-31')
data = np.random.randn(len(date_range))
ts = pd.Series(data, index=date_range)

# Fit an ARIMA model
# ARIMA parameters: p (AR), d (difference), q (MA)
# Example: ARIMA(1, 1, 1)
p = 1  # AR order
d = 1  # Difference order (1 for first-order differencing)
q = 1  # MA order
model = sm.tsa.ARIMA(ts, order=(p, d, q))

# Fit the model
fitted_model = model.fit()

# Generate forecast for the next 10 time steps
forecast_steps = 10
forecast_index = pd.date_range(start=ts.index[-1] + pd.Timedelta(days=1), periods=forecast_steps)
forecast = fitted_model.forecast(steps=forecast_steps)

# Visualize the original data and forecast
plt.figure(figsize=(12, 6))
plt.plot(ts, label='Original Data')
plt.plot(forecast_index, forecast, label='Forecast', color='red')
plt.title('ARIMA Forecast')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()

# Print summary of the fitted model
print(fitted_model.summary())


Output:

                               SARIMAX Results
==============================================================================
Dep. Variable:                      y   No. Observations:                  730
Model:                 ARIMA(1, 1, 1)   Log Likelihood               -1024.323
Date:                Tue, 09 Apr 2024   AIC                           2054.646
Time:                        05:20:47   BIC                           2068.421
Sample:                    01-01-2022   HQIC                          2059.960
                         - 12-31-2023
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
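
The d = 1 setting in the model means that the AR and MA terms are fitted to the first differences of the series, and forecasts are then accumulated back to the original scale. A minimal sketch of that differencing step and its inverse is given below; the helper names and the sample series are assumptions for this example, not part of statsmodels.

```python
def difference(series):
    """First-order differencing: y'[t] = y[t] - y[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]

def undifference(first_value, diffs):
    """Invert first-order differencing by cumulative summation."""
    values = [first_value]
    for d in diffs:
        values.append(values[-1] + d)
    return values

# Hypothetical series with a steady upward drift.
series = [5, 7, 10, 12, 15, 17]
diffs = difference(series)
restored = undifference(series[0], diffs)
print(diffs)
print(restored)
```

Differencing removes the drift, leaving a series that fluctuates around a constant level, which is the form the AR and MA parts of the model assume.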

Code:
# Python code
import numpy as np
import matplotlib.pyplot as plt

# Experiment 1: Line Plot


x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.figure(figsize=(8, 5))
plt.plot(x, y)
plt.title('Experiment 1: Line Plot')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True)
plt.show()

# Experiment 2: Scatter Plot


np.random.seed(0)
x = np.random.rand(50)
y = np.random.rand(50)
colors = np.random.rand(50)
sizes = 1000 * np.random.rand(50)
plt.figure(figsize=(8, 5))
plt.scatter(x, y, c=colors, s=sizes, alpha=0.5)
plt.title('Experiment 2: Scatter Plot')
plt.xlabel('X')
plt.ylabel('Y')
plt.colorbar(label='Color')
plt.grid(True)
plt.show()

# Experiment 3: Histogram
data = np.random.randn(1000)
plt.figure(figsize=(8, 5))
plt.hist(data, bins=30, edgecolor='black')
plt.title('Experiment 3: Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()

# Experiment 4: Bar Plot


categories = ['A', 'B', 'C', 'D']
values = [20, 35, 30, 15]

plt.figure(figsize=(8, 5))
plt.bar(categories, values, color='skyblue')
plt.title('Experiment 4: Bar Plot')
plt.xlabel('Category')
plt.ylabel('Value')
plt.grid(axis='y')
plt.show()

Output:
Code:
# Python code using Plotly library
import plotly.graph_objs as go
import plotly.express as px
import numpy as np

# Create sample data


x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

# Line plot
trace1 = go.Scatter(x=x, y=y1, mode='lines', name='sin(x)')
trace2 = go.Scatter(x=x, y=y2, mode='lines', name='cos(x)')

layout = go.Layout(title='Sin(x) and Cos(x) Functions',
                   xaxis=dict(title='x'),
                   yaxis=dict(title='y'))

fig = go.Figure(data=[trace1, trace2], layout=layout)

# Scatter plot
df = px.data.iris()
fig2 = px.scatter(df, x='sepal_width', y='sepal_length', color='species', title='Iris Dataset')

# Display plots
fig.show()
fig2.show()
Output:
Code:
# R code
library(ggplot2)

# Create a sample data frame


data <- data.frame(
Category = c('A', 'B', 'C', 'D', 'E'),
Value1 = c(10, 15, 20, 25, 30),
Value2 = c(25, 20, 15, 10, 5)
)

# Display the data frame


print("DataFrame:")
print(data)

# Plotting a bar chart


bar_plot <- ggplot(data, aes(x = Category, y = Value1)) +
geom_bar(stat = 'identity', fill = 'skyblue') +
labs(title = 'Bar Chart', x = 'Category', y = 'Value') +
theme_minimal()

print(bar_plot)

# Plotting a line chart


# group = 1 is needed so that geom_line connects points across the discrete x axis
line_plot <- ggplot(data, aes(x = Category, group = 1)) +
  geom_line(aes(y = Value1), color = 'blue', linetype = 'solid') +
  geom_line(aes(y = Value2), color = 'red', linetype = 'dashed') +
  labs(title = 'Line Chart', x = 'Category', y = 'Value') +
  theme_minimal()

print(line_plot)

# Plotting a scatter plot
scatter_plot <- ggplot(data, aes(x = Value1, y = Value2)) +
geom_point(color = 'green', size = 3) +
labs(title = 'Scatter Plot', x = 'Value1', y = 'Value2') +
theme_minimal()

print(scatter_plot)

Output:
