Targets QL
Targets QL
Targets QL
Contents
Business Case: Target SQL ......................................................................................................................................... 1
1. Basic EDA:....................................................................................................................................................... 2
1.1. Structure, characteristics, and Data type of columns in dataset ........................................................ 2
1.2. Time period of dataset: ......................................................................................................................... 5
1.3. Cities and States of customers ............................................................................................................. 5
2. In-depth Exploration...................................................................................................................................... 6
2.1. Is there a growing trend on e-commerce in Brazil? How can we describe a complete scenario?
Can we see some seasonality with peaks at specific months? ....................................................................... 6
2.2. What time do Brazilian customers tend to buy (Dawn, Morning, Afternoon or Night)? .................. 8
3. Evolution of E-commerce orders in the Brazil region: ................................................................................. 9
3.1. Get month on month orders by states................................................................................................. 9
3.2. Distribution of customers across the states in Brazil .......................................................................... 9
4. Impact on Economy: Analyze the money movement by e-commerce by looking at order prices, freight,
and others. ........................................................................................................................................................... 10
4.1. Get % increase in cost of orders from 2017 to 2018 (include months between Jan to Aug only) -
You can use “payment_value” column in payments table ............................................................................ 10
4.2. Mean & Sum of price and freight value by customer state .............................................................. 11
5. Analysis on sales, freight, and delivery time .............................................................................................. 12
5.1. Calculate days between purchasing, delivering and estimated delivery ......................................... 12
5.2. Find time_to_delivery & diff_estimated_delivery. Using (time_to_delivery =
order_purchase_timestamp-order_delivered_customer_date and diff_estimated_delivery =
order_estimated_delivery_date-order_delivered_customer_date) ............................................................ 13
5.3. Group data by state, take mean of freight_value, time_to_delivery, diff_estimated_delivery ..... 13
5.4. Top 5 states with highest/lowest average freight value - sort in desc/asc limit 5 ........................... 14
5.5. Top 5 states with highest/lowest average time to delivery .............................................................. 14
5.6. Top 5 states where delivery is fast/ not so fast compared to estimated date ................................ 15
6. Payment type analysis: ................................................................................................................................ 16
6.1. Month over Month count of orders for different payment types .................................................... 16
6.2. Count of orders based on the no. of payment installments ............................................................. 18
7. Insights ......................................................................................................................................................... 18
8. Recommendations....................................................................................................................................... 22
1. Basic EDA:
➢ Customers Table
➢ Geolocation Table
➢ Orders Table
Number of Rows: 99,441
Primary Key: order_id
Field name Type Number of Unique Items
order_id STRING 99441
customer_id STRING 99441
order_status STRING 8
order_purchase_timestamp TIMESTAMP 98875
order_approved_at TIMESTAMP 90733
order_delivered_carrier_date TIMESTAMP 81018
order_delivered_customer_date TIMESTAMP 95664
order_estimated_delivery_date TIMESTAMP 459
➢ Payments Table
Number of Rows: 103,886
FOREIGN Key: order_id
Field name Type Number of Unique Items
order_id STRING 99440
payment_sequential INTEGER 29
payment_type STRING 5
payment_installments INTEGER 24
payment_value FLOAT 29077
➢ Products Table
Number of Rows: 32,951
Primary Key: product_id
Field name Type Number of Unique Items
product_id STRING 32951
product_category STRING 73
product_name_length INTEGER 66
product_description_length INTEGER 2960
product_photos_qty INTEGER 19
product_weight_g INTEGER 2204
product_length_cm INTEGER 99
product_height_cm INTEGER 102
product_width_cm INTEGER 95
➢ Sellers Table
Number of Rows: 3,095
Primary Key: seller_id
As mentioned in Context: “This business case has information of 100k orders from 2016 to 2018 made at
Target in Brazil.” It Covers Orders of 2016 to 2018 time-period
Query
Response
Query
Response
Below SQL Query calculates total count of cities and States of Customers
Query
Response
2. In-depth Exploration
Query
select * from (
SELECT
EXTRACT(YEAR FROM order_purchase_timestamp) as year,
EXTRACT(MONTH FROM order_purchase_timestamp) as month,
count(order_id) as order_count
FROM `map-it-266510.1234.orders`
group by EXTRACT(YEAR FROM order_purchase_timestamp), EXTRACT(MONTH FROM order_purchase_t
imestamp)
) monthly_ordercounts
order by monthly_ordercounts.year, monthly_ordercounts.month
Response
Query
select *,
CASE
when hourly_ordercounts.hour_of_day < 6 and hourly_ordercounts.hour_of_day >= 4
then 'DAWN 4am-5:59am'
when hourly_ordercounts.hour_of_day < 12 and hourly_ordercounts.hour_of_day >= 6
then 'MORNING 6am-11:59am'
when hourly_ordercounts.hour_of_day < 18 and hourly_ordercounts.hour_of_day >= 12
then 'AFTERNOON 12pm-17:59pm'
else 'NIGHT 18pm-3:59am'
END as period
from (
SELECT
EXTRACT(HOUR FROM order_purchase_timestamp) as hour_of_day,
count(order_id) as order_count
FROM `map-it-266510.1234.orders`
group by EXTRACT(HOUR FROM order_purchase_timestamp)
) hourly_ordercounts
Response
Above Response from SQL Query, clearly shows Afternoon period of day has maximum purchases for Brazilian
customers.
3. Evolution of E-commerce orders in the Brazil region:
Query
SELECT * from (
SELECT
cust.customer_state as STATE,
EXTRACT(YEAR FROM order_purchase_timestamp) as year,
EXTRACT(MONTH FROM order_purchase_timestamp) as month,
count(order_id) as total_orders
FROM `map-it-266510.1234.orders` ord
left join `map-it-266510.1234.customers` cust on cust.customer_id = ord.customer_id
group by cust.customer_state, EXTRACT(YEAR FROM order_purchase_timestamp),
EXTRACT(MONTH FROM order_purchase_timestamp)
) sqlq
order by sqlq.year, sqlq.month, sqlq.STATE
Response
Query
Response
4.1. Get % increase in cost of orders from 2017 to 2018 (include months between
Jan to Aug only) - You can use “payment_value” column in payments table
with yearly_payment as (
SELECT EXTRACT(YEAR FROM order_purchase_timestamp) as y,
sum(p.payment_value) as total_cost
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.payments` p on p.order_id = o.order_id
where EXTRACT(MONTH FROM order_purchase_timestamp) < 9 --(Only Jan to Aug)
group by EXTRACT(YEAR FROM order_purchase_timestamp)
)
Response
Query – Complete
with yearly_payment as (
SELECT EXTRACT(YEAR FROM order_purchase_timestamp) as y,
sum(p.payment_value) as total_cost
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.payments` p on p.order_id = o.order_id
where EXTRACT(MONTH FROM order_purchase_timestamp) < 9 --(Only Jan to Aug)
group by EXTRACT(YEAR FROM order_purchase_timestamp)
)
Response
4.2. Mean & Sum of price and freight value by customer state
Query
Response
5. Analysis on sales, freight, and delivery time
Query
Response
5.2. Find time_to_delivery & diff_estimated_delivery. Using (time_to_delivery =
order_purchase_timestamp-order_delivered_customer_date and
diff_estimated_delivery = order_estimated_delivery_date-
order_delivered_customer_date)
Query
SELECT order_id,
DATE_DIFF(order_purchase_timestamp,order_delivered_customer_date, DAY) time_to_delivery,
DATE_DIFF(order_estimated_delivery_date,order_delivered_customer_date, DAY) diff_estimate
d_delivery
FROM `map-it-266510.1234.orders`
where order_status in ('delivered')
LIMIT 10
Response
Query
SELECT c.customer_state,
ROUND(AVG(DATE_DIFF(order_purchase_timestamp,order_delivered_customer_date, DAY)),2) avgt
ime_to_delivery,
ROUND(AVG(DATE_DIFF(order_estimated_delivery_date,order_delivered_customer_date, DAY)),2)
avgdiff_estimated_delivery,
ROUND(AVG(oi.freight_value),2) avgfreight
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.order_items` oi on oi.order_id = o.order_id
inner join `map-it-266510.1234.customers` c on c.customer_id = o.customer_id
where order_status in ('delivered')
group by c.customer_state
Response
5.4. Top 5 states with highest/lowest average freight value - sort in desc/asc limit 5
Query
with avg_value as (
SELECT c.customer_state, AVG(oi.freight_value) avgfreight
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.order_items` oi on oi.order_id = o.order_id
inner join `map-it-266510.1234.customers` c on c.customer_id = o.customer_id
group by c.customer_state
)
Response
Above Command has 2 SQL Queries, which gives First Table (Image on Left) as ASCENDING order of Avg
Freight value and Second Table (Image on Right) as DESCENDING order.
Query
with avg_value as (
SELECT c.customer_state, ROUND(AVG(DATE_DIFF(order_delivered_customer_date, order_purch
ase_timestamp, DAY)),2) avgtime_to_delivery
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.order_items` oi on oi.order_id = o.order_id
inner join `map-it-266510.1234.customers` c on c.customer_id = o.customer_id
group by c.customer_state
)
Response
Above Command has 2 SQL Queries, which gives First Table (Image on Left) as ASCENDING order (LOWEST
Avg Time of delivery) and Second Table (Image on Right) as DESCENDING order (Highest Avg Time of delivery)
5.6. Top 5 states where delivery is fast/ not so fast compared to estimated date
Query
with avg_value as (
SELECT c.customer_state, ROUND(AVG(DATE_DIFF(order_estimated_delivery_date, order_purch
ase_timestamp, DAY)),3) avg_esttime
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.order_items` oi on oi.order_id = o.order_id
inner join `map-it-266510.1234.customers` c on c.customer_id = o.customer_id
group by c.customer_state
)
Response
Above Command has 2 SQL Queries, which gives First Table (Image on Left) as ASCENDING order (FASTEST
Est. Avg delivery time) and Second Table (Image on Right) as DESCENDING order (SLOWEST Est. Avg delivery
time)
6.1. Month over Month count of orders for different payment types
Query
select * from (
select
EXTRACT(YEAR FROM o.order_purchase_timestamp) as year,
EXTRACT(MONTH FROM o.order_purchase_timestamp) as month,
p.payment_type,
count(o.order_id) as orders
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.payments` p on o.order_id = p.order_id
group by year, month, p.payment_type
) pay
order by pay.year, pay.month, pay.payment_type
Response
SQL Query results year’s each monthly’s total order count based on each payment type
Query – To find monthly Increase of order count wrt earlier month of same Payment Type
select *,
monthpay.orders-
monthpay.earlierMonthCount_withSamePayment as month_month_growth_onSamePayment
from (
select *,
LAG(pay.orders, 1) OVER(partition by pay.payment_type order by pay.year, pay.month, pay
.payment_type) as earlierMonthCount_withSamePayment from (
select p.payment_type, EXTRACT(YEAR FROM o.order_purchase_timestamp) as year, EXTRACT
(MONTH FROM o.order_purchase_timestamp) as month,
count(o.order_id) as orders
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.payments` p on o.order_id = p.order_id
group by year, month, p.payment_type
) pay
order by pay.year, pay.month, pay.payment_type
) monthpay
Response
6.2. Count of orders based on the no. of payment installments
Query
select * from (
select p.payment_installments,
count(o.order_id) as orders
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.payments` p on o.order_id = p.order_id
group by p.payment_installments
) pay
order by pay.payment_installments
Response
7. Insights
➢ The Winter Period and AFTERNOON time of the days is observed to be Peak times for Sales.
➢ States SP, RJ and MG are Observed to be having huge product Income sales (sorted by sum_price
desc)
➢ 97% of orders are delivered and less than 1% are cancelled or being unavailable, which indeed
shows up a great number.
➢ There’s almost 6+ States in Brazil, where sellers are not originated/lesser in number, but orders
triggered from that individual state are huge (>500)
Query
Response
➢ Insights based on Estimated and Delivery Dates: it’s observed that there’s average of ~10 days
difference between estimated delivery date and actual delivery date, and maximum delay was
around 188 days from estimated date.
Query
SELECT
MIN(DATE_DIFF(order_estimated_delivery_date, order_delivered_customer_date, DAY)) delaye
d_days_btw_estm_and_delivery,
MAX(DATE_DIFF(order_estimated_delivery_date, order_delivered_customer_date, DAY)) early_
reached_days_btw_estm_and_delivery,
AVG(DATE_DIFF(order_estimated_delivery_date, order_delivered_customer_date, DAY)) avg_da
ys_btw_estm_and_delivery
FROM `map-it-266510.1234.orders`
where order_status in ('delivered')
Response
➢ Insights based on Payment type: it’s states that ~74% of orders are placed using credit cards
Query
with paycount as (
select p.payment_type,
count(o.order_id) as orders
FROM `map-it-266510.1234.orders` o
left join `map-it-266510.1234.payments` p on o.order_id = p.order_id
group by p.payment_type
)
select paycount.payment_type, paycount.orders, (paycount.orders/(sum(paycount.orders)over
(partition by null)))*100 as percent_orders from paycount
Response
➢ Insights based on Reviews: it’s observed that ~77% of orders are given with review score more
than 3
Query
with scored as (
SELECT r.review_score as score, count(o.order_id) as orders
FROM `map-it-266510.1234.orders` o
inner join `map-it-266510.1234.order_reviews` r on r.order_id = o.order_id
group by r.review_score
)
select scored.score, scored.orders, (scored.orders/(sum(scored.orders)over(partition by n
ull)))*100 as percent_orders from scored
Response
➢ Insights based on Products: it’s observed that bed table, Beauty Products and Sports has been
leading product category for given time period.
8. Recommendations
➢ Winter Period and/with Afternoon time of the day has been Peak period for e-commerce,
Promotional Sales at remaining/other Seasonal period would improve uniformity in Retail
Income across the year.
➢ Sellers Count in States with Higher demand could be improved, to enhance business scale and
reduce product delivery time.
➢ Both Cancelled and Unavailable items, combined are also less than 1.5% of total orders, which
shows up good State of business, but It’s Observed that approx. 23% of Purchases are given with
review score of less than 4,
For which Target can try to understand/communicate for possible reasons for Enhancing Overall
Quality and business relationship (as mentioned almost 19583 reviews given have score <4, but
don’t have any reasons/comments mentioned)
➢ Estimated delivery dates has major differences with actual product delivery dates, Improvement
in accuracy of estimation would help in better customers relationship and lesser status-
enquiries/turn-around time
➢ Major Purchase source is observed to be Credit card, Hence Providing Offers/Promotions with
Credit Payment could help in attracting more business.