ANZ Virtual Internship Module Model Answer For Task 1
library(stringr)
library(lubridate)
library(tidyverse)
library(modelr)
library(sp)
library(leaflet)
library(geosphere)
library(knitr)
library(rpart)
df = read.csv("data/DSynth_Output_100c_3m_v3.csv")
The range of each feature should also be examined. Doing so shows that one customer appears to reside
outside Australia.
# examine the summary of the dataset
summary(df)
str(df)
# the dataset only contains records for 91 days; one day is missing
# (this check assumes df$date has already been converted to class Date)
DateRange <- seq(min(df$date), max(df$date), by = 1)
DateRange[!DateRange %in% df$date] # 2018-08-16 transactions are missing
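The missing-day check relies on the date column being of class Date rather than the character type that read.csv returns. A minimal sketch on toy data follows; the ISO date format and the three dates used here are assumptions for illustration, not values taken from the file.

```r
# toy dates in ISO format (an assumption; the real file may use another format)
dates <- as.Date(c("2018-08-14", "2018-08-15", "2018-08-17"))

# build the full daily sequence between the first and last observed day
full_range <- seq(min(dates), max(dates), by = 1)

# keep the days that never appear in the observed dates
missing_days <- full_range[!full_range %in% dates]
print(missing_days)
```

If the raw column uses another layout, the format argument of as.Date would need to match it before the comparison is meaningful.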
# split customer & merchant long_lat into individual columns for analysis
dfloc <- df[, c("long_lat", "merchant_long_lat")]
dfloc <- dfloc %>% separate("long_lat", c("c_long", "c_lat"), sep = ' ')
dfloc <- dfloc %>% separate("merchant_long_lat", c("m_long", "m_lat"), sep = ' ')
dfloc <- data.frame(sapply(dfloc, as.numeric))
df <- cbind(df, dfloc)
Location information suggests that one customer resides outside Australia. However, all of his/her
transactions occurred within Australia, so these records are kept for further analysis.
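One way to flag such customers is a simple bounding-box test on the split coordinates. The sketch below uses toy values and an approximate bounding box for Australia; the box limits, the customer IDs, and the coordinates are all assumptions for illustration.

```r
# toy coordinates (hypothetical values, not taken from the ANZ data)
cust <- data.frame(
  customer_id = c("CUS-A", "CUS-B", "CUS-C"),
  c_long = c(153.41, 144.96, 255.00),
  c_lat  = c(-27.95, -37.82, -573.00)
)

# approximate bounding box for Australia (an assumption for illustration)
in_au <- cust$c_long >= 113 & cust$c_long <= 154 &
         cust$c_lat  >= -44 & cust$c_lat  <= -10

# a latitude must lie in [-90, 90]; anything else points to a storage error
valid_lat <- abs(cust$c_lat) <= 90

# customers whose stored location falls outside the box
print(cust[!in_au, ])
```

A bounding box is coarse, but it is enough to separate a clearly invalid coordinate from a plausible one before deciding whether to keep the records.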
summary(df_csmp)
hist(df2$mon_avg_vol,
     xlab = 'Monthly transaction volume', ylab = 'No. of customers',
     main = "Histogram of customers' monthly transaction volume")
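The construction of the monthly average is not shown here. As a hedged sketch, one possible definition is each customer's transaction count divided by the number of months covered; the three-month window, the toy customer IDs, and the name df2_sketch are assumptions for illustration.

```r
# toy transaction log: one entry per transaction (hypothetical customer IDs)
cust <- c("A", "A", "A", "B", "B", "B", "B", "B", "B")

# number of months covered by the data (assumed to be three)
n_months <- 3

# transactions per customer, divided by the number of months
counts <- table(cust)
df2_sketch <- data.frame(
  customer_id = names(counts),
  mon_avg_vol = as.numeric(counts) / n_months
)
print(df2_sketch)
```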
1.4 Segment the dataset by transaction date and time.
ggplot(df4,aes(x=hour,y=trans_vol_per_hr))+geom_point()+geom_line(aes(group = 1))+
ggtitle('Average transaction volume by hour') +
labs(x='Hour',y='Transaction volume') + expand_limits( y = 0)
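The table behind this plot is not shown. A minimal sketch of one way to derive it, assuming the average is the number of transactions in each hour of the day divided by the number of days observed, is given below; the timestamps are toy values and the name hourly is hypothetical.

```r
# toy transaction timestamps (hypothetical; not taken from the ANZ data)
ts <- as.POSIXct(c("2018-08-01 09:15:00", "2018-08-01 09:40:00",
                   "2018-08-01 17:05:00", "2018-08-02 09:30:00"),
                 tz = "UTC")

# hour of day for each transaction
hour <- as.integer(format(ts, "%H", tz = "UTC"))

# number of distinct days observed
n_days <- length(unique(as.Date(ts, tz = "UTC")))

# transactions per hour of day, averaged over the observed days
counts <- table(hour)
hourly <- data.frame(
  hour = as.integer(names(counts)),
  trans_vol_per_hr = as.numeric(counts) / n_days
)
print(hourly)
```

Hours with no transactions are absent from this table, so a plot built from it would skip them unless they are filled in with zeros first.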
1.5 Challenge: exploring location information
We could first look at the distribution of the distance between a customer and the merchants he/she trades with.
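The distance computation itself is not shown. The geosphere package loaded earlier provides distHaversine for great-circle distances; the dependency-free sketch below applies the same haversine formula in base R. The function name haversine_m and the coordinates are hypothetical, and the radius of 6378137 m is an assumed Earth radius.

```r
# great-circle distance in metres between two sets of long/lat points
# (haversine_m is a hypothetical helper; r is an assumed Earth radius)
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# toy customer and merchant coordinates (hypothetical values)
d <- haversine_m(lon1 = c(153.41, 144.96), lat1 = c(-27.95, -37.82),
                 lon2 = c(153.38, 144.96), lat2 = c(-27.99, -37.82))
print(d)
```

Because the function is vectorised, it can be applied to whole columns such as c_long, c_lat, m_long and m_lat, and the result passed to hist to view the distribution.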
# exclude the single foreign customer whose location information was incorrectly stored
# (i.e. latitude 573)
### This function takes in a customer ID and plots the location of the customer and all
### merchants he/she has traded with.
merch_dist(id ='CUS-51506836' )