Data Exploration With SQL & Python

11/30/24, 4:45 PM Data Exploration With SQL & Python
Data Exploration With SQL &

Python
Agenda:
1. Previous Session Practice Questions
2. Data Exploration with SQL & Python
3. Connect SQL with Python
3.1. SQLAlchemy
3.2. SQLite
3.3. PandaSQL
1. Previous Session Practice Questions

Practice: Get the average days between two consecutive transactions for each
customer. (Solve this question using CTE , write subquery as temporary table)
CREATE TABLE transactions ( transaction_id SERIAL PRIMARY KEY,

customer_id INT, amount DECIMAL(10, 2), transaction_date DATE ); INSERT
INTO transactions (customer_id, amount, transaction_date) VALUES (101,
50.00, '2024-04-01'), (102, 75.00, '2024-04-01'), (103, 100.00, '2024-04-
02'), (103, 100.00, '2024-04-05'), (104, 25.00, '2024-04-02'), (101,
150.00, '2024-04-03'), (102, 120.00, '2024-04-03'), (103, 90.00, '2024-
04-04'), (104, 110.00, '2024-04-04'), (104, 110.00, '2024-04-05'), (104,
110.00, '2024-04-06'), (104, 110.00, '2024-04-07'), (104, 120.00, '2024-
04-08'), (104, 120.00, '2024-04-09');
https://almabetter.notion.site/Data-Exploration-With-SQL-Python-11b1a19a56178038adbddb95a1a9d988 1/4
select customer_id,round(avg(day_diff)) as avg_day_diff from (select *,

lag(transaction_date,1) over(partition by customer_id order by
transaction_date asc) as prev_trans_date, (transaction_date -
lag(transaction_date,1) over(partition by customer_id order by
transaction_date asc)) as day_diff from transactions) group by
customer_id
with transaction_with_lag_day as ( select *, lag(transaction_date,1)

over(partition by customer_id order by transaction_date asc) as
prev_trans_date from transactions ), transaction_with_lag_day_diff as (
select * , (transaction_date - prev_trans_date) as day_diff from
transaction_with_lag_day ) select customer_id, round(avg(day_diff)) as
avg_day_diff from transaction_with_lag_day_diff group by customer_id
Heading 3
2. Data Exploration with SQL & Python

Data exploration using SQL and Python offers a powerful combination for delving
into the intricacies of your datasets.
Leveraging SQL, you can efficiently query large volumes of structured data, perform
aggregations, and extract relevant subsets.
Python, on the other hand, provides a versatile environment for further analysis,
visualization, and statistical modelling. By integrating these two tools, you can
seamlessly transition from data retrieval and preprocessing with SQL to in-depth
analysis and visualization with Python libraries such as Pandas, Matplotlib, and
Seaborn.
This synergy enables you to uncover patterns, identify trends, and derive actionable
insights from your data, empowering data-driven decision-making across various
domains and industries.
-- End point creation : -- Format & Syntax:

mysql+pymysql://username:password@host:port/database -- host: learning-
activity-rr.cejogcrmn6il.ap-south-1.rds.amazonaws.com -- port: 3306 --
Database: assignment -- username: almafolk -- password: 8l39zk60 --
Access: Read-only -- Final End Point : --
mysql+pymysql://almafolk:[email protected]
south-1.rds.amazonaws.com:3306/assignment
3. Connect SQL with Python

SQLAlchemy is a Python SQL toolkit and Object-Relational Mapping (ORM) library
that provides a flexible and efficient way to work with SQL databases. It allows you
to interact with databases using Python code, offering an abstraction layer over the
database engine to facilitate database operations such as querying, inserting,
updating, and deleting records.
SQLite is a lightweight, serverless, self-contained, and open-source SQL database
engine that is widely used for local storage and small-scale applications. It doesn't
require a separate server process and can be easily integrated into Python
applications using libraries like SQLite3 or SQLAlchemy.
PandaSQL is a library that enables SQL queries directly on pandas DataFrames. It
provides a SQL-like interface for filtering, aggregating, and manipulating data within
DataFrames, allowing users to leverage their SQL skills while working with pandas
DataFrames in Python. This library can be particularly useful for analysts and data
scientists familiar with SQL who want to apply similar operations to pandas
DataFrames without switching to SQL syntax.
Google Colab :: Python + SQL
https://colab.research.google.com/drive/15JvXgBfQlTIHL4My4aaNG5Kiyy9x1dWd?
usp=sharing

Data Exploration With SQL & Python

Uploaded by

Copyright:

Available Formats

Data Exploration With SQL & Python

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Data Exploration With SQL & Python

Uploaded by

Copyright:

Available Formats

11/30/24, 4:45 PM Data Exploration With SQL & Python

Data Exploration With SQL &

1. Previous Session Practice Questions

CREATE TABLE transactions ( transaction_id SERIAL PRIMARY KEY,

select customer_id,round(avg(day_diff)) as avg_day_diff from (select *,

with transaction_with_lag_day as ( select *, lag(transaction_date,1)

2. Data Exploration with SQL & Python

-- End point creation : -- Format & Syntax:

3. Connect SQL with Python

You might also like