Targets QL

Download as pdf or txt
Download as pdf or txt
You are on page 1of 22

Business Case: Target SQL

Contents
Business Case: Target SQL ......................................................................................................................................... 1
1. Basic EDA:....................................................................................................................................................... 2
1.1. Structure, characteristics, and Data type of columns in dataset ........................................................ 2
1.2. Time period of dataset: ......................................................................................................................... 5
1.3. Cities and States of customers ............................................................................................................. 5
2. In-depth Exploration...................................................................................................................................... 6
2.1. Is there a growing trend on e-commerce in Brazil? How can we describe a complete scenario?
Can we see some seasonality with peaks at specific months? ....................................................................... 6
2.2. What time do Brazilian customers tend to buy (Dawn, Morning, Afternoon or Night)? .................. 8
3. Evolution of E-commerce orders in the Brazil region: ................................................................................. 9
3.1. Get month on month orders by states................................................................................................. 9
3.2. Distribution of customers across the states in Brazil .......................................................................... 9
4. Impact on Economy: Analyze the money movement by e-commerce by looking at order prices, freight,
and others. ........................................................................................................................................................... 10
4.1. Get % increase in cost of orders from 2017 to 2018 (include months between Jan to Aug only) -
You can use “payment_value” column in payments table ............................................................................ 10
4.2. Mean & Sum of price and freight value by customer state .............................................................. 11
5. Analysis on sales, freight, and delivery time .............................................................................................. 12
5.1. Calculate days between purchasing, delivering and estimated delivery ......................................... 12
5.2. Find time_to_delivery & diff_estimated_delivery. Using (time_to_delivery =
order_purchase_timestamp-order_delivered_customer_date and diff_estimated_delivery =
order_estimated_delivery_date-order_delivered_customer_date) ............................................................ 13
5.3. Group data by state, take mean of freight_value, time_to_delivery, diff_estimated_delivery ..... 13
5.4. Top 5 states with highest/lowest average freight value - sort in desc/asc limit 5 ........................... 14
5.5. Top 5 states with highest/lowest average time to delivery .............................................................. 14
5.6. Top 5 states where delivery is fast/ not so fast compared to estimated date ................................ 15
6. Payment type analysis: ................................................................................................................................ 16
6.1. Month over Month count of orders for different payment types .................................................... 16
6.2. Count of orders based on the no. of payment installments ............................................................. 18
7. Insights ......................................................................................................................................................... 18
8. Recommendations....................................................................................................................................... 22
1. Basic EDA:

1.1. Structure, characteristics, and Data type of columns in dataset

➢ Customers Table

Number of Rows: 99,441


Primary Key: customer_id
Field name Type Number of Unique Items
customer_id STRING 99441
customer_unique_id STRING 96096
customer_zip_code_prefix INTEGER 14994
customer_city STRING 4119
customer_state STRING 27

Count for Customer’s with respect to States

➢ Geolocation Table

Number of Rows: 1,000,163


FOREIGN Key: zip_code_prefix

Field name Type Number of Unique Items


geolocation_zip_code_prefix INTEGER 19015
geolocation_lat FLOAT 717360
geolocation_lng FLOAT 717613
geolocation_city STRING 8011
geolocation_state STRING 27

➢ Order items Table


Number of Rows: 112,650
FOREIGN Key: order_item_id

Field name Type Number of Unique Items


order_id STRING 98666
order_item_id INTEGER 21
product_id STRING 32951
seller_id STRING 3095
shipping_limit_date TIMESTAMP 93318
price FLOAT 5968
freight_value FLOAT 6999

➢ Order Review Table


Number of Rows: 99,224
FOREIGN Key: review_id
Field name Type Number of Unique Items
review_id STRING 98410
order_id STRING 98673
review_score INTEGER 5
review_comment_title STRING 3365
review_creation_date TIMESTAMP 636
review_answer_timestamp TIMESTAMP 88867

Count of Order’s with each rating

➢ Orders Table
Number of Rows: 99,441
Primary Key: order_id
Field name Type Number of Unique Items
order_id STRING 99441
customer_id STRING 99441
order_status STRING 8
order_purchase_timestamp TIMESTAMP 98875
order_approved_at TIMESTAMP 90733
order_delivered_carrier_date TIMESTAMP 81018
order_delivered_customer_date TIMESTAMP 95664
order_estimated_delivery_date TIMESTAMP 459

➢ Payments Table
Number of Rows: 103,886
FOREIGN Key: order_id
Field name Type Number of Unique Items
order_id STRING 99440
payment_sequential INTEGER 29
payment_type STRING 5
payment_installments INTEGER 24
payment_value FLOAT 29077

Count of Order’s with respect to Payment type

➢ Products Table
Number of Rows: 32,951
Primary Key: product_id
Field name Type Number of Unique Items
product_id STRING 32951
product_category STRING 73
product_name_length INTEGER 66
product_description_length INTEGER 2960
product_photos_qty INTEGER 19
product_weight_g INTEGER 2204
product_length_cm INTEGER 99
product_height_cm INTEGER 102
product_width_cm INTEGER 95

➢ Sellers Table
Number of Rows: 3,095
Primary Key: seller_id

Field name Type Number of Unique Items


seller_id STRING 3095
seller_zip_code_prefix INTEGER 2246
seller_city STRING 611
seller_state STRING 23

1.2. Time period of dataset:

As mentioned in Context: “This business case has information of 100k orders from 2016 to 2018 made at
Target in Brazil.” It Covers Orders of 2016 to 2018 time-period

Query

SELECT min(order_purchase_timestamp) as MinimumDate,


max(order_purchase_timestamp) as MaximumDate FROM `map-it-266510.1234.orders`

Response

1.3. Cities and States of customers

Query

SELECT customer_state, customer_city, count(customer_id) as customer_count FROM `map-it-


266510.1234.customers` group by customer_state, customer_city

Response
Below SQL Query calculates total count of cities and States of Customers

Query

SELECT count( DISTINCT customer_state ) as states_count FROM `map-it-


266510.1234.customers`
SELECT count( DISTINCT customer_city ) as cities_count FROM `map-it-
266510.1234.customers`

Response

2. In-depth Exploration

2.1. Is there a growing trend on e-commerce in Brazil? How can we describe a


complete scenario? Can we see some seasonality with peaks at specific
months?

Query

select * from (
SELECT
EXTRACT(YEAR FROM order_purchase_timestamp) as year,
EXTRACT(MONTH FROM order_purchase_timestamp) as month,
count(order_id) as order_count
FROM `map-it-266510.1234.orders`
group by EXTRACT(YEAR FROM order_purchase_timestamp), EXTRACT(MONTH FROM order_purchase_t
imestamp)
) monthly_ordercounts
order by monthly_ordercounts.year, monthly_ordercounts.month
Response

Above grouped by Yearly & Monthly Query’s response shows


significant growth of orders-count across time-period,

Hence, it reflects certain growth trend in e-commerce in Brazil

Seasonal peak is observed in the months of


November 2017 and January 2018.

Here’s Months with Highest Order Counts.


2.2. What time do Brazilian customers tend to buy (Dawn, Morning, Afternoon or
Night)?

Query

select *,
CASE
when hourly_ordercounts.hour_of_day < 6 and hourly_ordercounts.hour_of_day >= 4
then 'DAWN 4am-5:59am'
when hourly_ordercounts.hour_of_day < 12 and hourly_ordercounts.hour_of_day >= 6
then 'MORNING 6am-11:59am'
when hourly_ordercounts.hour_of_day < 18 and hourly_ordercounts.hour_of_day >= 12
then 'AFTERNOON 12pm-17:59pm'
else 'NIGHT 18pm-3:59am'
END as period
from (
SELECT
EXTRACT(HOUR FROM order_purchase_timestamp) as hour_of_day,
count(order_id) as order_count
FROM `map-it-266510.1234.orders`
group by EXTRACT(HOUR FROM order_purchase_timestamp)
) hourly_ordercounts

order by hourly_ordercounts.order_count desc

Response
Above Response from SQL Query, clearly shows Afternoon period of day has maximum purchases for Brazilian
customers.
3. Evolution of E-commerce orders in the Brazil region:

3.1. Get month on month orders by states

Query

SELECT * from (
SELECT
cust.customer_state as STATE,
EXTRACT(YEAR FROM order_purchase_timestamp) as year,
EXTRACT(MONTH FROM order_purchase_timestamp) as month,
count(order_id) as total_orders
FROM `map-it-266510.1234.orders` ord
left join `map-it-266510.1234.customers` cust on cust.customer_id = ord.customer_id
group by cust.customer_state, EXTRACT(YEAR FROM order_purchase_timestamp),
EXTRACT(MONTH FROM order_purchase_timestamp)
) sqlq
order by sqlq.year, sqlq.month, sqlq.STATE

Response

3.2. Distribution of customers across the states in Brazil

Query

SELECT customer_state as state,


count(customer_id) as total_customers
FROM `map-it-266510.1234.customers`
group by customer_state
order by total_customers desc

Response

4. Impact on Economy: Analyze the money movement by e-commerce by looking at


order prices, freight, and others.

4.1. Get % increase in cost of orders from 2017 to 2018 (include months between
Jan to Aug only) - You can use “payment_value” column in payments table

Query output of WITH Clause (Just for Reference)

with yearly_payment as (
SELECT EXTRACT(YEAR FROM order_purchase_timestamp) as y,
sum(p.payment_value) as total_cost
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.payments` p on p.order_id = o.order_id
where EXTRACT(MONTH FROM order_purchase_timestamp) < 9 --(Only Jan to Aug)
group by EXTRACT(YEAR FROM order_purchase_timestamp)
)

SELECT ytp.y, ROUND(ytp.total_cost, 2) AS total_order_cost


FROM yearly_payment ytp

Response
Query – Complete

with yearly_payment as (
SELECT EXTRACT(YEAR FROM order_purchase_timestamp) as y,
sum(p.payment_value) as total_cost
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.payments` p on p.order_id = o.order_id
where EXTRACT(MONTH FROM order_purchase_timestamp) < 9 --(Only Jan to Aug)
group by EXTRACT(YEAR FROM order_purchase_timestamp)
)

SELECT ROUND(((ytp1.total_cost - ytp2.total_cost)/ytp1.total_cost)*100, 2) AS increasePer


centage
FROM yearly_payment ytp1 CROSS JOIN yearly_payment ytp2
where ytp1.y = 2018 and ytp2.y = 2017

Response

4.2. Mean & Sum of price and freight value by customer state

Query

SELECT c.customer_state, ROUND(AVG(price),2) as mean_price, ROUND(AVG(freight_value),2) a


s mean_freight,
ROUND(SUM(price),2) as sum_price, ROUND(SUM(freight_value),2) as sum_freight,
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.order_items` oi on oi.order_id = o.order_id
inner join `map-it-266510.1234.customers` c on c.customer_id = o.customer_id
group by c.customer_state

Response
5. Analysis on sales, freight, and delivery time

5.1. Calculate days between purchasing, delivering and estimated delivery

Query

SELECT order_id, order_status, DATE_DIFF(order_delivered_customer_date, order_purchase_t


imestamp, DAY) days_btw_delivery_and_order,
DATE_DIFF(order_estimated_delivery_date, order_delivered_customer_date, DAY) days_btw_es
tm_and_delivery,
DATE_DIFF(order_estimated_delivery_date, order_purchase_timestamp, DAY) days_btw_estm_an
d_purchase
FROM `map-it-266510.1234.orders`
where order_status in ('delivered')

Response
5.2. Find time_to_delivery & diff_estimated_delivery. Using (time_to_delivery =
order_purchase_timestamp-order_delivered_customer_date and
diff_estimated_delivery = order_estimated_delivery_date-
order_delivered_customer_date)

Query

SELECT order_id,
DATE_DIFF(order_purchase_timestamp,order_delivered_customer_date, DAY) time_to_delivery,
DATE_DIFF(order_estimated_delivery_date,order_delivered_customer_date, DAY) diff_estimate
d_delivery
FROM `map-it-266510.1234.orders`
where order_status in ('delivered')
LIMIT 10

Response

5.3. Group data by state, take mean of freight_value, time_to_delivery,


diff_estimated_delivery

Query

SELECT c.customer_state,
ROUND(AVG(DATE_DIFF(order_purchase_timestamp,order_delivered_customer_date, DAY)),2) avgt
ime_to_delivery,
ROUND(AVG(DATE_DIFF(order_estimated_delivery_date,order_delivered_customer_date, DAY)),2)
avgdiff_estimated_delivery,
ROUND(AVG(oi.freight_value),2) avgfreight
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.order_items` oi on oi.order_id = o.order_id
inner join `map-it-266510.1234.customers` c on c.customer_id = o.customer_id
where order_status in ('delivered')
group by c.customer_state
Response

5.4. Top 5 states with highest/lowest average freight value - sort in desc/asc limit 5

Query

with avg_value as (
SELECT c.customer_state, AVG(oi.freight_value) avgfreight
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.order_items` oi on oi.order_id = o.order_id
inner join `map-it-266510.1234.customers` c on c.customer_id = o.customer_id
group by c.customer_state
)

select avg_value.customer_state from avg_value order by avg_value.avgfreight limit 5


select avg_value.customer_state from avg_value order by avg_value.avgfreight desc limit 5

Response

Above Command has 2 SQL Queries, which gives First Table (Image on Left) as ASCENDING order of Avg
Freight value and Second Table (Image on Right) as DESCENDING order.

5.5. Top 5 states with highest/lowest average time to delivery

Query
with avg_value as (
SELECT c.customer_state, ROUND(AVG(DATE_DIFF(order_delivered_customer_date, order_purch
ase_timestamp, DAY)),2) avgtime_to_delivery
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.order_items` oi on oi.order_id = o.order_id
inner join `map-it-266510.1234.customers` c on c.customer_id = o.customer_id
group by c.customer_state
)

select avg_value.customer_state from avg_value order by avg_value.avgtime_to_delivery lim


it 5 --(Lowest avg delivery time)
select avg_value.customer_state from avg_value order by avg_value.avgtime_to_delivery des
c limit 5 --(Highest avg delivery time)

Response

Above Command has 2 SQL Queries, which gives First Table (Image on Left) as ASCENDING order (LOWEST
Avg Time of delivery) and Second Table (Image on Right) as DESCENDING order (Highest Avg Time of delivery)

5.6. Top 5 states where delivery is fast/ not so fast compared to estimated date

Query

with avg_value as (
SELECT c.customer_state, ROUND(AVG(DATE_DIFF(order_estimated_delivery_date, order_purch
ase_timestamp, DAY)),3) avg_esttime
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.order_items` oi on oi.order_id = o.order_id
inner join `map-it-266510.1234.customers` c on c.customer_id = o.customer_id
group by c.customer_state
)

select * from avg_value order by avg_value.avg_esttime limit 5 -


- Fastest 5 States for delivery
select * from avg_value order by avg_value.avg_esttime desc limit 5 -
- Slowest 5 States for delivery

Response

Above Command has 2 SQL Queries, which gives First Table (Image on Left) as ASCENDING order (FASTEST
Est. Avg delivery time) and Second Table (Image on Right) as DESCENDING order (SLOWEST Est. Avg delivery
time)

6. Payment type analysis:

6.1. Month over Month count of orders for different payment types

Query

select * from (
select
EXTRACT(YEAR FROM o.order_purchase_timestamp) as year,
EXTRACT(MONTH FROM o.order_purchase_timestamp) as month,
p.payment_type,
count(o.order_id) as orders
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.payments` p on o.order_id = p.order_id
group by year, month, p.payment_type
) pay
order by pay.year, pay.month, pay.payment_type

Response

SQL Query results year’s each monthly’s total order count based on each payment type
Query – To find monthly Increase of order count wrt earlier month of same Payment Type

select *,
monthpay.orders-
monthpay.earlierMonthCount_withSamePayment as month_month_growth_onSamePayment
from (
select *,
LAG(pay.orders, 1) OVER(partition by pay.payment_type order by pay.year, pay.month, pay
.payment_type) as earlierMonthCount_withSamePayment from (
select p.payment_type, EXTRACT(YEAR FROM o.order_purchase_timestamp) as year, EXTRACT
(MONTH FROM o.order_purchase_timestamp) as month,
count(o.order_id) as orders
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.payments` p on o.order_id = p.order_id
group by year, month, p.payment_type
) pay
order by pay.year, pay.month, pay.payment_type
) monthpay

Response
6.2. Count of orders based on the no. of payment installments

Query

select * from (
select p.payment_installments,
count(o.order_id) as orders
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.payments` p on o.order_id = p.order_id
group by p.payment_installments
) pay
order by pay.payment_installments

Response

7. Insights

➢ The Winter Period and AFTERNOON time of the days is observed to be Peak times for Sales.
➢ States SP, RJ and MG are Observed to be having huge product Income sales (sorted by sum_price
desc)
➢ 97% of orders are delivered and less than 1% are cancelled or being unavailable, which indeed
shows up a great number.

➢ There’s almost 6+ States in Brazil, where sellers are not originated/lesser in number, but orders
triggered from that individual state are huge (>500)

Query

select cs.c_state, cs.customer_count, sl.seller_count, ord.order_count from


( select count(c.customer_id) as customer_count, c.customer_state as c_state
FROM `map-it-266510.1234.customers` c
group by c.customer_state
) cs
left join
( select count(s.seller_id) as seller_count, s.seller_state as s_state
FROM `map-it-266510.1234.sellers` s
group by s.seller_state
) sl on cs.c_state = sl.s_state
left join
( select count(o.order_id) as order_count, c.customer_state as o_state
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.customers` c on c.customer_id = o.customer_id
group by o_state
) ord on cs.c_state = ord.o_state
where sl.seller_count < 10 and ord.order_count > 500

Response
➢ Insights based on Estimated and Delivery Dates: it’s observed that there’s average of ~10 days
difference between estimated delivery date and actual delivery date, and maximum delay was
around 188 days from estimated date.

Query

SELECT
MIN(DATE_DIFF(order_estimated_delivery_date, order_delivered_customer_date, DAY)) delaye
d_days_btw_estm_and_delivery,
MAX(DATE_DIFF(order_estimated_delivery_date, order_delivered_customer_date, DAY)) early_
reached_days_btw_estm_and_delivery,
AVG(DATE_DIFF(order_estimated_delivery_date, order_delivered_customer_date, DAY)) avg_da
ys_btw_estm_and_delivery
FROM `map-it-266510.1234.orders`
where order_status in ('delivered')

Response

➢ Insights based on Payment type: it’s states that ~74% of orders are placed using credit cards

Query

with paycount as (
select p.payment_type,
count(o.order_id) as orders
FROM `map-it-266510.1234.orders` o
left join `map-it-266510.1234.payments` p on o.order_id = p.order_id
group by p.payment_type
)
select paycount.payment_type, paycount.orders, (paycount.orders/(sum(paycount.orders)over
(partition by null)))*100 as percent_orders from paycount

Response
➢ Insights based on Reviews: it’s observed that ~77% of orders are given with review score more
than 3

Query

with scored as (
SELECT r.review_score as score, count(o.order_id) as orders
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.order_reviews` r on r.order_id = o.order_id
group by r.review_score
)
select scored.score, scored.orders, (scored.orders/(sum(scored.orders)over(partition by n
ull)))*100 as percent_orders from scored

Response

➢ Insights based on Products: it’s observed that bed table, Beauty Products and Sports has been
leading product category for given time period.
8. Recommendations

Based on Above Insights and Observations:

➢ Winter Period and/with Afternoon time of the day has been Peak period for e-commerce,
Promotional Sales at remaining/other Seasonal period would improve uniformity in Retail
Income across the year.
➢ Sellers Count in States with Higher demand could be improved, to enhance business scale and
reduce product delivery time.
➢ Both Cancelled and Unavailable items, combined are also less than 1.5% of total orders, which
shows up good State of business, but It’s Observed that approx. 23% of Purchases are given with
review score of less than 4,
For which Target can try to understand/communicate for possible reasons for Enhancing Overall
Quality and business relationship (as mentioned almost 19583 reviews given have score <4, but
don’t have any reasons/comments mentioned)
➢ Estimated delivery dates has major differences with actual product delivery dates, Improvement
in accuracy of estimation would help in better customers relationship and lesser status-
enquiries/turn-around time
➢ Major Purchase source is observed to be Credit card, Hence Providing Offers/Promotions with
Credit Payment could help in attracting more business.

You might also like