Social Media Tourism Project
PROJECT
JASMEEN KAUR
PROBLEM STATEMENT
An aviation company that provides domestic as well as international trips to its customers now wants to apply a
targeted approach instead of reaching out to every customer. This time they want to do it digitally instead of by
telecalling. Hence they have collaborated with a social networking platform, so they can learn the digital and social
behavior of the customers and show digital advertisements on the user pages of targeted customers who have a high
propensity to take up the product. The propensity to buy tickets differs across login devices. Hence, you have to
create 2 separate models, one for Laptop and one for Mobile. [Anything which is not a laptop can be considered as
mobile phone usage.] The advertisements on the digital platform are a bit expensive; hence, you need to be very
accurate while creating the models.
DATA REPORT
TOTAL DATA
(11760, 17), i.e. 11760 rows and 17 columns
From the given data we can observe that some of the columns are of object type. This means they contain stray
characters, which is bad data: we have to clean these columns and convert them to int/float. Some features
have missing values, which we have to treat as well. The data also contains "*", which we have to either convert
into missing values or replace with the mode. We will also drop the user id feature.
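The checks described above can be sketched in pandas as follows. This is only an illustration on a tiny made-up frame: the "UserID" column name and all values are assumptions, not taken from the actual dataset.

```python
import numpy as np
import pandas as pd

# Tiny stand-in frame; the "UserID" name and every value here are made up.
df = pd.DataFrame({
    "UserID": [1, 2, 3, 4],
    "yearly_avg_Outstation_checkins": ["1", "*", "24", None],
    "preferred_location_type": ["Beach", "Financial", None, "Beach"],
})

# Inspect the column types: numeric features stored as text need cleaning.
print(df.dtypes)

# Treat the "*" placeholder as missing, then report the percentage of
# missing values per column.
df = df.replace("*", np.nan)
missing_pct = df.isna().mean().mul(100).round(2)
print(missing_pct)

# The identifier carries no predictive signal, so drop it.
df = df.drop(columns=["UserID"])
```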
DATA PRE-PROCESSING
In the column “preferred_location_type” we can see that "Tours and Travel" is repeated with a slightly different
spelling, "Tours Travel"; we have to clean this and merge both into a single category.
In the column “yearly_avg_Outstation_checkins” we have "*" in the data; we have to either convert this into
missing values or replace it with the mode.
For such features, data cleansing is needed.
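A minimal sketch of these two cleaning steps, assuming the two spellings are exactly "Tours and Travel" and "Tours Travel" as quoted above. The sample rows are made up, and both options for "*" are shown side by side.

```python
import pandas as pd

# Made-up sample rows; only the two column names come from the report.
df = pd.DataFrame({
    "preferred_location_type": ["Tours and Travel", "Tours Travel", "Beach", "Beach"],
    "yearly_avg_Outstation_checkins": ["1", "*", "24", "1"],
})

# Merge the variant spelling into a single category.
df["preferred_location_type"] = df["preferred_location_type"].replace(
    {"Tours Travel": "Tours and Travel"}
)

# Option 1: convert to numeric; anything non-numeric such as "*" becomes NaN.
checkins = pd.to_numeric(df["yearly_avg_Outstation_checkins"], errors="coerce")

# Option 2: additionally replace the resulting gaps with the mode.
checkins_filled = checkins.fillna(checkins.mode().iloc[0])
```

Converting "*" to NaN first keeps the choice of imputation open for the missing-value step; filling with the mode right away is simpler but fixes that decision early.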
AFTER PRE-PROCESSING THE DATA
preferred_location_type
BEFORE
AFTER
yearly_avg_Outstation_checkins
Since the propensity to buy tickets differs across login devices, we will create 2 separate models, one for Laptop and one for Mobile.
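The split can be sketched as below. The column names "preferred_device" and "Taken_product" and the device labels are hypothetical; the only rule taken from the problem statement is that anything which is not a laptop counts as mobile.

```python
import pandas as pd

# Made-up rows; column names and device labels are assumptions.
df = pd.DataFrame({
    "preferred_device": ["Laptop", "iOS", "Android", "Laptop", "Tab"],
    "Taken_product": ["Yes", "No", "No", "No", "Yes"],
})

# Normalise case and whitespace before comparing against "laptop".
is_laptop = df["preferred_device"].str.strip().str.lower().eq("laptop")

# Everything that is not a laptop goes into the mobile subset.
laptop_df = df[is_laptop].copy()
mobile_df = df[~is_laptop].copy()
```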
TREATING MISSING VALUES
Based on the percentage of missing values we can use different imputation techniques. If the share of missing values
is small, we can impute with a simple imputer such as the mean, median or mode. If the share is larger, we need more
advanced techniques such as KNN imputation.
As the maximum share of missing values is less than 5%, we can impute them with a simple imputer. In our dataset the
missing values occur in float and object columns: 4 float columns, which we impute with the median, and 3 object
columns, which we impute with the mode.
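A minimal sketch of median and mode imputation in pandas. The column names "num_a" and "cat_a" and the values are placeholders, not the actual features.

```python
import numpy as np
import pandas as pd

# Placeholder frame with one numeric and one categorical column.
df = pd.DataFrame({
    "num_a": [1.0, np.nan, 3.0, 100.0],
    "cat_a": ["x", None, "x", "y"],
})

num_cols = df.select_dtypes(include="number").columns
cat_cols = df.columns.difference(num_cols)

# Numeric columns: fill gaps with the column median (robust to outliers).
df[num_cols] = df[num_cols].fillna(df[num_cols].median())

# Categorical columns: fill gaps with the most frequent level.
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])
```

The median is preferred over the mean here because a single extreme value (100.0 in the sample) would otherwise pull the filled value upwards.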
CHECKING OUTLIERS
Here we can observe that 2 features contain outliers, so we will use the interquartile range (IQR) to treat them.
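One common way to treat outliers with the IQR is to cap values at the whisker limits. The sketch below assumes capping (rather than dropping rows) and the usual 1.5 multiplier; the report does not state which variant was used, and the helper name is ours.

```python
import pandas as pd

def cap_outliers_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to those limits."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

# Made-up values with one obvious outlier.
values = pd.Series([10, 12, 11, 13, 12, 300])
capped = cap_outliers_iqr(values)
print(capped.tolist())
```

Capping keeps all rows, which matters when the positive class is already rare.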
CHECKING SKEWNESS
DATA VISUALIZATION: UNIVARIATE ANALYSIS
Numeric Data
Categorical Data
For a categorical variable we are interested in the frequencies of its levels. We inspect these frequencies
with seaborn count plots, which show the number of observations in each category.
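The numbers behind a count plot can also be read directly with pandas. A small sketch, with a hypothetical column name "following_company_page" and made-up values:

```python
import pandas as pd

# Made-up rows; the column name is an assumption.
df = pd.DataFrame({"following_company_page": ["No", "No", "Yes", "No"]})

# Absolute frequency of each level (what a count plot draws as bar heights).
counts = df["following_company_page"].value_counts()

# Relative frequency, useful for judging class imbalance.
shares = df["following_company_page"].value_counts(normalize=True)

print(counts)
print(shares)
```

With seaborn installed, the same counts can be drawn with sns.countplot(data=df, x="following_company_page").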
We can see here that the probability of buying a ticket for next month is low.
We can see here that people prefer booking from mobile phones.
We can observe here that the most visited locations are beach and financial, and the least visited is hill station.
We can observe that users mostly travel with 3 or 4 family members.
We can observe here that most users do not follow the company page.
BIVARIATE ANALYSIS
Here we can observe that people who don’t follow the company page have a higher average number of views on the
company page than people who follow it.
Here we can observe that users who have travelled out since their last outstation trip have a higher probability of taking the product.
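The first comparison above is a group-wise mean. A minimal sketch, where the column names "following_company_page" and "views_on_company_page" and the values are assumptions for illustration only:

```python
import pandas as pd

# Made-up rows; both column names are hypothetical.
df = pd.DataFrame({
    "following_company_page": ["Yes", "No", "No", "Yes"],
    "views_on_company_page": [2, 10, 6, 4],
})

# Average page views for followers versus non-followers.
avg_views = df.groupby("following_company_page")["views_on_company_page"].mean()
print(avg_views)
```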
PAIRPLOT
CORRELATION HEATMAP
BUSINESS INSIGHTS
We observed that users mostly travel in groups of 3 or 4, so I would recommend that the company make offers for
users travelling in groups of 3 or 4, so that we can retain most customers.
We can observe that the most visited locations are beach and financial and the least visited is hill station, so the
company should provide offers and discounts based on the most common locations.
We have observed that the data is heavily imbalanced, and we will use SMOTE to treat it.
We also observe that people who don’t follow the company page have a higher average number of views on the company
page than people who follow it. This suggests our social media team is not effective at building an online presence,
so I would recommend running social media campaigns to grab the attention of the social media audience, as it
clearly impacts business.
Since the probability of buying a ticket online next month is low, the company should advertise more on social
media, choosing the platforms that are used most by the public.
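For the SMOTE step mentioned above, the core idea is to create synthetic minority rows by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified NumPy illustration of that idea on made-up points; the function name is ours, and in practice a library implementation such as imbalanced-learn's SMOTE would normally be used, applied to the training split only.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows from the minority-class rows X_min."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other rows

    # Pairwise distances among minority rows; a row is never its own neighbour.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base row and one of its k nearest neighbours for each new sample.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Place the new point at a random position on the segment between them.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Made-up minority-class points with two numeric features.
X_minority = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [2.5, 2.5]])
synthetic = smote_sketch(X_minority, n_new=6, k=2)
print(synthetic.shape)
```

Because every synthetic row lies between two real minority rows, the new points stay inside the region the minority class already occupies instead of duplicating existing rows.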