Data Exploration With SQL & Python

Download as pdf or txt
Download as pdf or txt
You are on page 1of 4

11/30/24, 4:45 PM Data Exploration With SQL & Python

Data Exploration With SQL &


Python
Agenda:
1. Previous Session Practice Questions
2. Data Exploration with SQL & Python
3. Connect SQL with Python
3.1. SQLAlchemy
3.2. SQLite
3.3. PandaSQL

1. Previous Session Practice Questions


Practice: Get the average days between two consecutive transactions for each
customer. (Solve this question using CTE , write subquery as temporary table)

CREATE TABLE transactions ( transaction_id SERIAL PRIMARY KEY,


customer_id INT, amount DECIMAL(10, 2), transaction_date DATE ); INSERT
INTO transactions (customer_id, amount, transaction_date) VALUES (101,
50.00, '2024-04-01'), (102, 75.00, '2024-04-01'), (103, 100.00, '2024-04-
02'), (103, 100.00, '2024-04-05'), (104, 25.00, '2024-04-02'), (101,
150.00, '2024-04-03'), (102, 120.00, '2024-04-03'), (103, 90.00, '2024-
04-04'), (104, 110.00, '2024-04-04'), (104, 110.00, '2024-04-05'), (104,
110.00, '2024-04-06'), (104, 110.00, '2024-04-07'), (104, 120.00, '2024-
04-08'), (104, 120.00, '2024-04-09');

https://almabetter.notion.site/Data-Exploration-With-SQL-Python-11b1a19a56178038adbddb95a1a9d988 1/4
11/30/24, 4:45 PM Data Exploration With SQL & Python

select customer_id,round(avg(day_diff)) as avg_day_diff from (select *,


lag(transaction_date,1) over(partition by customer_id order by
transaction_date asc) as prev_trans_date, (transaction_date -
lag(transaction_date,1) over(partition by customer_id order by
transaction_date asc)) as day_diff from transactions) group by
customer_id

with transaction_with_lag_day as ( select *, lag(transaction_date,1)


over(partition by customer_id order by transaction_date asc) as
prev_trans_date from transactions ), transaction_with_lag_day_diff as (
select * , (transaction_date - prev_trans_date) as day_diff from
transaction_with_lag_day ) select customer_id, round(avg(day_diff)) as
avg_day_diff from transaction_with_lag_day_diff group by customer_id

Heading 3

2. Data Exploration with SQL & Python


Data exploration using SQL and Python offers a powerful combination for delving
into the intricacies of your datasets.
Leveraging SQL, you can efficiently query large volumes of structured data, perform
aggregations, and extract relevant subsets.
Python, on the other hand, provides a versatile environment for further analysis,
visualization, and statistical modelling. By integrating these two tools, you can
seamlessly transition from data retrieval and preprocessing with SQL to in-depth
analysis and visualization with Python libraries such as Pandas, Matplotlib, and
Seaborn.

https://almabetter.notion.site/Data-Exploration-With-SQL-Python-11b1a19a56178038adbddb95a1a9d988 2/4
11/30/24, 4:45 PM Data Exploration With SQL & Python

This synergy enables you to uncover patterns, identify trends, and derive actionable
insights from your data, empowering data-driven decision-making across various
domains and industries.

-- End point creation : -- Format & Syntax:


mysql+pymysql://username:password@host:port/database -- host: learning-
activity-rr.cejogcrmn6il.ap-south-1.rds.amazonaws.com -- port: 3306 --
Database: assignment -- username: almafolk -- password: 8l39zk60 --
Access: Read-only -- Final End Point : --
mysql+pymysql://almafolk:[email protected]
south-1.rds.amazonaws.com:3306/assignment

3. Connect SQL with Python


SQLAlchemy is a Python SQL toolkit and Object-Relational Mapping (ORM) library
that provides a flexible and efficient way to work with SQL databases. It allows you
to interact with databases using Python code, offering an abstraction layer over the
database engine to facilitate database operations such as querying, inserting,
updating, and deleting records.
SQLite is a lightweight, serverless, self-contained, and open-source SQL database
engine that is widely used for local storage and small-scale applications. It doesn't
require a separate server process and can be easily integrated into Python
applications using libraries like SQLite3 or SQLAlchemy.
PandaSQL is a library that enables SQL queries directly on pandas DataFrames. It
provides a SQL-like interface for filtering, aggregating, and manipulating data within
DataFrames, allowing users to leverage their SQL skills while working with pandas
DataFrames in Python. This library can be particularly useful for analysts and data
scientists familiar with SQL who want to apply similar operations to pandas
DataFrames without switching to SQL syntax.
Google Colab :: Python + SQL
https://colab.research.google.com/drive/15JvXgBfQlTIHL4My4aaNG5Kiyy9x1dWd?
usp=sharing

https://almabetter.notion.site/Data-Exploration-With-SQL-Python-11b1a19a56178038adbddb95a1a9d988 3/4
11/30/24, 4:45 PM Data Exploration With SQL & Python

https://almabetter.notion.site/Data-Exploration-With-SQL-Python-11b1a19a56178038adbddb95a1a9d988 4/4

You might also like