Project report_merged
on
Travel Budget Prediction (Data Science Project)
Problem Statement:
Develop an intelligent system to accurately predict total trip expenses for travelers, addressing
the challenges of variable costs across destinations, transportation modes, accommodations,
activities, and seasonal factors.
The primary objective is to create a reliable tool that helps travelers estimate total trip costs,
facilitating more informed decision-making and financial planning for travel experiences.
The topic of Travel budget prediction is chosen because it addresses a significant real-world
problem faced by travellers. With rising costs and increasing complexity in travel planning,
accurate cost estimation becomes crucial for budgeting and decision-making. It helps travellers
plan their trips more effectively by providing realistic budget estimates.
The project offers valuable learning experiences in data analysis, model development, and
communication of complex analytical findings.
Objective:
To develop an intelligent system capable of accurately predicting total trip expenses for
travellers, leveraging machine learning algorithms and incorporating various travel-related
factors, with the aim of improving financial planning and decision-making throughout the
travel experience.
1. Specific: Develops an intelligent system for predicting trip expenses.
2. Relevant: Addresses a real-world problem faced by travelers.
This objective provides a clear direction for the project while allowing flexibility in the
approach and implementation details. It emphasizes the core goal of creating a useful tool for
travellers.
Scope of the Project:
• Data Collection and Preprocessing: We will collect data from multiple sources on
tourist places, including destination, best time to visit, length of stay, season, travel
mode, and travel style, among others. The dataset will be cleaned, preprocessed, and
standardized before it is used to predict cost.
• Machine Learning: The cost prediction system will be built using machine learning
algorithms such as Linear Regression, Random Forest Regressor, and Gradient Boosting.
The models will predict the overall trip cost from the inputs supplied to them.
• Web Interface: A website will be developed where users can interact with the
prediction system. It will present a form that asks for the trip details and then
displays the predicted cost.
Methodology:
• Data Collection: Gather datasets on tourist places and the expenses associated with
visiting them.
• Data Cleaning and Preprocessing: Remove unnecessary columns, handle missing
data, and encode categorical data so that it is usable by machine learning models.
• Model Building: Use machine learning algorithms such as Linear Regression, Random
Forest Regressor, and Gradient Boosting.
• Power BI Dashboard: Build an interactive dashboard that surfaces insights and lets
users explore the data as required.
• Web Interface: Build a web interface through which users can interact with the
prediction system.
Expected Outcomes
The machine learning model is expected to accurately predict the overall expenses of a trip
based on various input factors. Specifically:
1. Predict total trip cost: Estimate the complete expenditure for a journey.
2. Handle multiple inputs: Process various travel-related parameters such as destination,
duration, mode of transport, accommodation type, etc.
3. Provide personalized estimates: Tailor predictions based on individual traveller
preferences and past behaviours.
4. Account for dynamic factors: Incorporate real-time data on pricing fluctuations and
seasonal changes.
5. Offer actionable insights: Help users make informed decisions about their travel plans
and budgets.
The model aims to deliver accurate, personalized, and timely expense predictions,
enhancing travellers’ ability to plan and budget for trips effectively.
Conclusion:
This project aims to revolutionize travel planning by developing an intelligent system that
accurately predicts total trip expenses. By leveraging machine learning algorithms and
integrating various travel-related factors, our tool will empower travelers to make informed
decisions about their journeys. The expected outcome is a user-friendly web application that
provides personalized, real-time expense predictions, ultimately enhancing travelers' ability to
budget and plan their trips effectively.
Chapter 1: Objective & Scope of the Project
i. Objective:
Develop an intelligent travel expense prediction system that leverages machine learning
algorithms to accurately forecast total trip costs based on key factors such as number of
travellers, type of travel, destination, and transportation mode. Additionally, suggest popular
attractions and activities at the chosen location, aiming to enhance financial planning and
decision-making throughout the entire travel experience.
a. Implement machine learning models to predict trip expenses with high accuracy.
b. Incorporate real-time data feeds for dynamic pricing and availability updates.
Chapter 2: Theoretical Background & Definition of
Problem
i. Theoretical Background:
This objective encompasses both the expense prediction aspect and the recommendation
feature, utilizing machine learning to create a comprehensive tool for travellers to plan
their trips effectively and enjoyably.
Theoretical Framework
a. Predictive Analytics: This field focuses on using statistical models and machine
learning algorithms to forecast future events or behaviours. In our context,
predictive analytics will be applied to estimate travel expenses based on historical
data and current trends.
d. Decision Support Systems: These systems combine data, models, and algorithms to
support decision-making processes. Our travel planning tool embodies this concept
by providing travellers with comprehensive information to inform their choices.
algorithms and data science techniques, our project aims to revolutionize travel planning by
providing accurate, personalized, and comprehensive support for travellers
Develop an intelligent system to accurately predict total trip expenses for travellers,
addressing the challenges of variable costs across destinations, transportation modes,
accommodations, activities, and seasonal factors. The primary objective is to create a
reliable tool that helps travellers estimate total trip costs, facilitating more informed
decision-making and financial planning for travel experiences.
This system aims to overcome common travel and expense management challenges faced
by businesses and individuals alike, providing accurate predictions and personalized
recommendations to enhance the overall travel experience.
Chapter 3: System Analysis & Design vis-a-vis User
Requirements
i. System analysis
The intelligent travel expense prediction system is designed to provide accurate estimates
of total trip costs for travellers. This system combines advanced machine learning
algorithms with comprehensive data analysis to deliver personalized and adaptive
predictions.
a. Functional Requirements
• Performance:
▪ Optimize for quick loading times and smooth user experience.
• Reliability:
▪ Develop a reliable, error-free system so that predictions are accurate.
ii. System design
a. Data Collection
• Sources:
• Process:
▪ Collect data manually from different websites.
b. Data Cleaning and Preprocessing
• Tools: Python libraries, specifically pandas, NumPy, and scikit-learn.
▪ Remove unnecessary columns: We began by identifying and removing
irrelevant data columns.
▪ Handle missing values: Used SimpleImputer with mean or median
imputation to manage missing values.
▪ Encode categorical variables: Applied OneHotEncoder to convert
categorical features into numerical form, essential for model compatibility.
▪ Normalize numerical features: Used StandardScaler to standardize
numerical features, ensuring consistent scaling across features.
▪ Outlier removal: Addressed outliers by employing statistical methods to
minimize data noise.
c. Model Building
• ML Algorithms:
▪ Random Forest Regressor: An ensemble method chosen for its effectiveness
with complex data and its resistance to overfitting.
▪ Gradient Boosting: Evaluated algorithms such as XGBoost and LightGBM
for enhanced accuracy, especially useful in this context due to their
adaptability with imbalanced data.
▪ Pipeline Model: Built using Pipeline and make_pipeline to automate
preprocessing steps and model training within a seamless workflow.
• Feature Selection: Selected features based on correlation metrics, helping isolate the
most impactful variables.
• Evaluation Metrics:
d. Web Interface
• Tools: HTML, CSS, and JavaScript for front-end development; Flask for back-end
integration with Python.
• Components:
e. Flask Integration
• Purpose: Flask serves as the core web framework connecting the user interface with
the Python-based model.
Chapter 4: System Planning (PERT Chart)
PERT Chart for Time Management
Purpose: Ensure efficient time management by mapping out tasks, dependencies, and timelines
using a PERT (Program Evaluation and Review Technique) chart. This approach helps in
identifying the critical path and allocating resources effectively.
Tools:
Microsoft Project or other PERT software: For creating a detailed schedule, estimating task
durations, and managing dependencies.
Figure No. 1
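To illustrate how a PERT chart yields a critical path, the sketch below computes the PERT expected duration of each task, (optimistic + 4 × most likely + pessimistic) / 6, and then follows the longest chain of dependencies. The phase names follow this chapter, but the three-point estimates (in days) and the dependency links are hypothetical values chosen for illustration, not the project's actual schedule.

```python
from functools import lru_cache

# Hypothetical three-point estimates in days: (optimistic, most likely, pessimistic)
ESTIMATES = {
    "requirements": (2, 3, 4),
    "data_collection": (5, 7, 15),
    "preprocessing": (3, 4, 5),
    "system_design": (2, 3, 10),
    "implementation": (6, 8, 10),
    "testing": (2, 3, 4),
    "deployment": (1, 1, 1),
}

# Hypothetical dependencies: task -> tasks that must finish first
PREDECESSORS = {
    "requirements": [],
    "data_collection": ["requirements"],
    "system_design": ["requirements"],
    "preprocessing": ["data_collection"],
    "implementation": ["preprocessing", "system_design"],
    "testing": ["implementation"],
    "deployment": ["testing"],
}


def expected_time(task):
    """PERT expected duration from the three-point estimate."""
    optimistic, likely, pessimistic = ESTIMATES[task]
    return (optimistic + 4 * likely + pessimistic) / 6


@lru_cache(maxsize=None)
def earliest_finish(task):
    """Earliest finish = latest predecessor finish + own expected duration."""
    start = max((earliest_finish(p) for p in PREDECESSORS[task]), default=0.0)
    return start + expected_time(task)


def critical_path(last_task):
    """Walk backwards, always following the predecessor that finishes last."""
    path = [last_task]
    while PREDECESSORS[path[-1]]:
        path.append(max(PREDECESSORS[path[-1]], key=earliest_finish))
    return list(reversed(path))


path = critical_path("deployment")
duration = earliest_finish("deployment")
print(" -> ".join(path), f"({duration:.1f} days)")
```

With these made-up estimates, system design is off the critical path, so a delay there would not extend the schedule until its slack is used up.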
i. Requirement Gathering and Analysis
a. Purpose: Identify and document project goals, scope, and functionalities to ensure all
stakeholders are aligned. This includes defining the key features like cost prediction,
real-time data updates, and personalized recommendations for attractions and activities.
b. Key Tools:
• Excel: Used for initial documentation of requirements, data schemas, and task
prioritization.
• Power BI: Helps visualize preliminary insights into the cost factors influencing
travel budgets, helping identify patterns in travel costs and user preferences.
ii. Data Collection
a. Purpose: Gather relevant travel data from multiple sources, including information on
destinations, transportation modes, seasons, and popular attractions. Collecting real-
time data, where possible, for dynamic pricing and availability enhances the model’s
accuracy.
b. Key Sources:
• Flight and Hotel Booking Platforms (e.g., Goibibo): For transportation and
accommodation costs.
c. Key Tools:
iii. Data Cleaning and Preprocessing
a. Purpose: Prepare data for accurate modeling by handling missing values, encoding
categorical variables, normalizing numerical data, and removing outliers.
b. Process:
• Data Cleaning: Remove irrelevant columns and handle missing values using
techniques like mean or median imputation.
• Feature Encoding: Convert categorical variables to numerical ones using
OneHotEncoder.
• Normalization: Standardize numerical features using StandardScaler.
• Outlier Removal: Employ statistical methods to manage outliers, reducing data
noise.
c. Key Tools:
• Python libraries: pandas and NumPy for data cleaning; scikit-learn for feature
encoding, normalization, and outlier handling.
iv. System Design
a. Purpose: Establish a robust system architecture for seamless interaction between the
prediction model, recommendation engine, and user interface.
b. Design Components:
• Data Input Module: Collects user input on travel details like destination and
duration.
• Prediction Engine: Incorporates machine learning models for expense
forecasting.
• Recommendation System: Provides personalized activity suggestions based on
destination.
• Reporting and Visualization: Generates detailed reports and visualizes
predictions for easy understanding.
c. Tools Used:
• Flask: Acts as the backend framework, enabling interaction between the web
interface and the model.
v. Implementation
a. Purpose: Build and train machine learning models and integrate the models with the
web interface.
b. ML Models:
• Random Forest Regressor: Used to improve prediction accuracy by leveraging
ensemble methods.
d. Web Interface:
e. Backend API:
• Flask routes handle HTTP requests, allowing users to input data and receive
predictions.
vi. Testing
a. Purpose: Ensure model accuracy, validate the user interface, and verify that predictions
align with expectations across various scenarios.
b. Types of Testing:
• Model Evaluation: Use metrics like R², MAE, and RMSE to assess
prediction accuracy.
• System Testing: Validate integration between frontend and backend,
ensuring smooth data flow and accurate predictions.
• User Testing: Gather feedback from potential users to refine the web
interface and ensure usability.
c. Tools Used:
vii. Deployment and Maintenance
a. Purpose: Launch the system for public use and monitor its performance to ensure
continuous, reliable operation.
b. Deployment Process:
c. Maintenance Activities:
• Model Retraining: Update the model periodically with new data to maintain
accuracy.
• Bug Fixes and Updates: Regularly check for any issues, deploying patches
as necessary.
d. Tools Used:
Chapter 5: Methodology adopted, System
Implementation & Details of Hardware & Software
used, System Maintenance & Evaluation
i. Methodology
a. Data Collection
• Sources:
▪ Tourism Boards' Websites (e.g., MyHolidays): Official websites from various
tourism boards are used to gather data about destinations, average travel
costs, and local attractions.
▪ Flight and Hotel Booking Sites (e.g., Goibibo): Data from booking platforms
help gather real-time price trends, travel patterns, and accommodation costs.
• Process:
▪ Manual Collection: Data is collected by scraping these websites or manually
extracting information from available datasets (e.g., CSV, Excel).
▪ Data Quality Checks: Collected data undergoes initial quality checks for
completeness and accuracy before proceeding to the next stages of cleaning
and processing.
b. Data Cleaning and Preprocessing
• Tools:
▪ Python libraries such as pandas, NumPy, and scikit-learn are used for data
manipulation, transformation, and preprocessing tasks.
• Process:
▪ Removing Unnecessary Columns: Irrelevant columns, such as duplicates,
redundant information, or those that do not contribute to prediction (e.g.,
user IDs, unhelpful metadata), are removed.
▪ Handling Missing Values: Missing data is handled using imputation
strategies. Mean or Median Imputation is applied for numerical data, while
the most frequent category is used for categorical variables.
▪ Encoding Categorical Variables:
o One-Hot Encoding: Applied to variables such as destination or travel
mode that do not have an inherent order.
o Label Encoding: Used when a categorical variable has an ordinal
relationship (e.g., travel style, when its categories are ordered).
▪ Data Validation Checks: Date and price ranges are validated to ensure inputs
are realistic and consistent (e.g., ensuring that travel dates fall within
reasonable bounds, and costs are within expected ranges).
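A minimal sketch of such validation checks is given below. The field names (start_date, end_date, total_cost) and the numeric bounds are hypothetical, since the report does not state the limits that were actually applied.

```python
from datetime import date

# Hypothetical bounds; the report does not give the actual limits
MIN_COST, MAX_COST = 500.0, 1_000_000.0
MAX_STAY_DAYS = 60


def validate_trip(record):
    """Return a list of problems found in one raw trip record (empty if valid)."""
    problems = []
    start, end = record["start_date"], record["end_date"]
    if end < start:
        problems.append("end date precedes start date")
    elif (end - start).days > MAX_STAY_DAYS:
        problems.append("length of stay exceeds the allowed maximum")
    if not MIN_COST <= record["total_cost"] <= MAX_COST:
        problems.append("total cost outside the expected range")
    return problems


good = {"start_date": date(2024, 1, 10), "end_date": date(2024, 1, 15), "total_cost": 25_000.0}
bad = {"start_date": date(2024, 3, 5), "end_date": date(2024, 3, 1), "total_cost": 50.0}
print(validate_trip(good))
print(validate_trip(bad))
```

Returning a list of problems rather than raising on the first failure makes it easier to report every issue in a record at once during cleaning.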
• Feature Selection:
• Hyperparameter Tuning:
▪ GridSearchCV: Used to find the optimal hyperparameters by exhaustively
searching through a predefined set of hyperparameters.
▪ RandomizedSearchCV: Allows for faster tuning by sampling a random
subset of hyperparameters, particularly useful for models with many
hyperparameters.
• Ensemble Methods:
• Model Evaluation:
▪ Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and
R-Squared are used to evaluate the performance of the models and select
the best model based on these metrics.
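For reference, the three metrics can be computed directly from their definitions: MAE is the mean of |y − ŷ|, RMSE is the square root of the mean of (y − ŷ)², and R² is 1 − SS_res / SS_tot. The sketch below uses NumPy and a handful of made-up trip costs; it illustrates the formulas and is not the project's evaluation code.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE and R-squared from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    ss_res = np.sum(errors ** 2)                     # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    return {"mae": mae, "rmse": rmse, "r2": 1.0 - ss_res / ss_tot}


# Made-up trip costs, purely for illustration
actual = [10000, 20000, 30000, 40000]
predicted = [12000, 18000, 33000, 39000]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

MAE and RMSE are expressed in the same currency units as the trip cost, while R² is unitless, which is why the three are usually reported together.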
d. Web Interface
• Tools and Technologies:
▪ HTML: For building the structure of the web pages, including forms for
input and output display.
▪ CSS: Used to style the web interface and ensure it is visually appealing
and user-friendly.
▪ pandas, numpy: These libraries are used for data processing tasks on the
backend, such as preprocessing user inputs for the prediction model.
• Components:
▪ Trip Planning Form: A user-friendly form with autocomplete features to
help users input their travel details such as destination, travel
style, season, and mode of travel.
▪ Cost Prediction Widget: A tool integrated into the interface to predict the
travel budget based on user inputs.
The backend of the system is developed in Python, utilizing various libraries for data
manipulation, preprocessing, and machine learning model construction. The model
predicts travel budgets based on user inputs like destination, length of stay, trip type,
and mode of travel.
• Data Handling:
▪ pandas and NumPy are used to manage and process data. pandas handles
DataFrames and performs data loading, cleaning, and transformation,
while NumPy aids in numerical computations.
• Data Preprocessing Pipeline:
▪ Preprocessing is essential to handle missing values, encode categorical
variables, and standardize numerical inputs.
▪ A ColumnTransformer is used to apply different transformations to
different types of columns:
▪ StandardScaler: Standardizes numerical features (e.g., length of
stay, travel cost).
These transformations are combined in a Pipeline, ensuring that preprocessing steps are
consistently applied during model training and evaluation.
The frontend interface is built using a combination of HTML, CSS, and JavaScript,
providing an interactive platform where users can input travel details and view predicted
budgets.
• HTML:
HTML forms capture user inputs for destination, length of stay, trip type, and mode of
travel. Each input field is designed to ensure data collection consistency.
• CSS:
CSS styling is applied to make the interface visually appealing and user-friendly, with
clear form layout and styling that enhances usability.
• JavaScript:
JavaScript handles front-end interactivity, such as validating input fields and making
asynchronous requests to the backend for budget predictions without reloading the page.
Flask acts as the glue between the backend and frontend, handling requests and
responses:
• Routing:
Flask defines routes for the main prediction page and API endpoints that handle user input
and return predictions.
When a user submits travel information, Flask processes this data, applies the trained model
to predict the travel budget, and returns the result to the frontend.
• Response Formatting:
Flask structures the response to provide budget predictions to the frontend in a user-friendly
format, enabling seamless interaction.
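The request and response flow described above can be sketched as a small JSON endpoint that front-end JavaScript could call asynchronously. The route path /api/predict, the predict_budget stub, and its constant are hypothetical stand-ins (the stub replaces the trained model); only the five field names are taken from the project's model inputs.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Field names match the columns the trained pipeline expects
REQUIRED_FIELDS = ["Destination", "Length of stay", "Season", "Travel mode", "Travel style"]


def predict_budget(trip):
    """Hypothetical stub standing in for model.predict(); the constant is invented."""
    return 2500.0 * float(trip["Length of stay"])


@app.route("/api/predict", methods=["POST"])  # hypothetical route path
def api_predict():
    trip = request.get_json(silent=True) or {}
    missing = [field for field in REQUIRED_FIELDS if field not in trip]
    if missing:
        # Tell the front end which inputs are absent
        return jsonify({"error": "missing fields", "fields": missing}), 400
    return jsonify({"predicted_budget": round(predict_budget(trip), 2)})


# Exercise the endpoint with Flask's test client, without starting a server
client = app.test_client()
ok = client.post("/api/predict", json={
    "Destination": "Goa", "Length of stay": 4, "Season": "Winter",
    "Travel mode": "Flight", "Travel style": "Budget",
})
bad = client.post("/api/predict", json={"Destination": "Goa"})
print(ok.status_code, ok.get_json())
print(bad.status_code, bad.get_json())
```

Returning JSON with an explicit status code lets the page update the result in place and show a field-level error message when inputs are incomplete.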
To ensure that the application remains robust and efficient, ongoing development and
maintenance practices are followed.
• Development Cycle:
▪ The system is developed incrementally, with regular testing of new
features to ensure smooth functionality and minimal disruptions.
• Maintenance:
▪ Regular updates are applied to libraries and dependencies to avoid
compatibility issues.
▪ Monitoring and logging are implemented to track system usage and
identify any bottlenecks or failures, allowing for timely troubleshooting
and performance optimization.
▪ Model retraining is scheduled periodically to ensure that predictions
remain accurate as new data becomes available.
▪ Python:
▪ Libraries Used:
▪ Flask for web framework to serve the model and create the web
interface.
Machine Learning Libraries:
▪ scikit-learn:
▪ Algorithms Used:
▪ pandas:
▪ NumPy:
▪ Excel:
▪ Purpose: Excel is used for data storage, especially for smaller
datasets or for manually curated datasets. It is also used as an
intermediate storage format for exchanging data with other
systems or stakeholders.
▪ Role: Excel files are used to store the raw data before processing
and analysis and for visualizing results when needed.
• Web Frameworks:
▪ Flask:
▪ Interactive Interface:
• Data Visualization:
with pandas DataFrames. Seaborn includes themes and colour palettes
that make plots more visually appealing by default.
b. Hardware components
• Development Workstation:
▪ Processor: RYZEN 5 or higher
▪ RAM: 16GB
▪ Storage: 474GB SSD
▪ Operating System: Windows 11
Chapter 6: Detailed life cycle of the project
i. ERD and DFD of the project
• Data Flow: User provides details (destination, travel mode, Travel style).
• External Data Sources: Collects data from tourism boards, travel review
sites, and booking sites.
• Data Flow: Uses user input and processed data to predict trip costs.
• Sub-Processes:
d. Recommendation Engine:
• Sub-Processes:
▪ Generate Itinerary Suggestions: Creates itinerary suggestions based
on predicted costs and user preferences.
e. Display Results
• Data Flow: Shows the predicted budget and recommendations to the user.
Figure No. 2
Entity-Relationship Diagram (ERD) for Travel budget prediction:
a. User:
• Attributes: Name
• The User interacts with the web interface by providing input such as travel
preferences and trip details.
b. Web Interface:
• The Web Interface has a Home Page that collects data from the user. It has an
input field where users enter their travel details and preferences.
• The Web Interface takes this input and sends it to the Travel Budget
Prediction Model for processing.
• This entity represents the overall predictive system with three main components:
▪ Data Preprocessing: This component ensures that user inputs are clean,
standardized, and formatted for accurate prediction. It involves:
Relationships:
• This relationship represents the interaction between the User and the Web
Interface where users input travel-related data. This input includes factors like
destination, duration, length of stay, and travel preferences.
• The Web Interface forwards the user-provided input to the Travel Budget
Prediction Model for analysis. This interaction triggers the model’s data
pipeline, beginning with data preprocessing.
• After processing, the Travel Budget Prediction Model generates predictions and
recommendations as output, which is returned to the Web Interface for display
to the User.
Figure No. 3
ii. Input and Output Screen Design
a. Input Screen (Home Page)
Header:
• Instructions: Brief guidance like “Enter your trip details below to get an
estimated budget and personalized recommendations.”
• Form Fields:
Buttons:
• Submit: “Submit Trip details” button, which sends data to the prediction model
Design Style:
Header:
• Estimated Total Cost:
Additional Information:
• Recommendation Panel:
Design Style:
Additional Enhancements
Footer:
• Data Preprocessing:
• Categorical Encoding:
• Normalization:
• Outlier Removal:
▪ Identify and address outliers using statistical methods like the Interquartile
Range (IQR) method or Z-scores.
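As a sketch of the IQR method, the snippet below keeps only values inside [Q1 − 1.5 × IQR, Q3 + 1.5 × IQR]. The multiplier 1.5 is the conventional default rather than a value stated in the report, and the cost figures are made up.

```python
import numpy as np


def iqr_mask(values, k=1.5):
    """Boolean mask that is True for values inside the IQR fences."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values >= q1 - k * iqr) & (values <= q3 + k * iqr)


# Made-up trip costs with one obviously extreme entry
costs = np.array([12000, 15000, 14000, 13000, 16000, 15500, 250000])
kept = costs[iqr_mask(costs)]
print(kept)
```

Applying the mask to the full DataFrame rather than to the cost column alone would drop the whole row, which is usually what is wanted before training.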
b. Model Building
• Baseline Model:
• Ensemble Model:
▪ Model Pipeline:
▪ Hyperparameter Tuning:
• Model Evaluation:
▪ Metrics:
c. Visualization
• Matplotlib: A versatile, low-level library offering full control over plot details,
ideal for customized visualizations. It supports a wide range of basic plots (e.g.,
line, bar, scatter, histograms) and works with both NumPy arrays and pandas
DataFrames.
• Frontend Development:
• Backend Integration:
▪ User Input: Accept details such as destination, travel style, and number
of travellers via forms on the website.
e. Flask Integration
▪ Flask will handle the routes to facilitate the interaction between the
frontend (HTML/CSS/JavaScript) and the backend (Python model).
• Dynamic Updates:
▪ As users input their details (e.g., destination, travel dates, etc.), the
system will dynamically predict the estimated trip cost and suggest
personalized activities or attractions.
• Unit Testing:
▪ Test Data Preprocessing: Validate that the data preprocessing steps (like
handling missing values, encoding, and scaling) are applied correctly.
▪ Test Machine Learning Model: Ensure that the model’s prediction
function returns outputs in the expected format (e.g., numeric, valid
range).
• Integration Testing:
▪ End-to-End Workflow Testing: Test the flow from data input to cost
prediction output, ensuring all components (model, preprocessing
pipeline, recommendation engine) work together.
▪ Example Test: Ensure that when the user inputs a trip’s details,
the model generates a cost prediction and corresponding
recommendations.
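A unit test for the prediction format might look like the sketch below, written with Python's built-in unittest module. The predict_cost stub, its input key, and the plausible range are hypothetical; in the real system the stub would be replaced by a call to the trained pipeline.

```python
import math
import unittest


def predict_cost(trip):
    """Hypothetical stub standing in for the trained pipeline's predict call."""
    return 3000.0 * trip["length_of_stay"]


def is_valid_prediction(value, lower=0.0, upper=1_000_000.0):
    """Check that a prediction is a finite number inside a plausible range."""
    return isinstance(value, (int, float)) and math.isfinite(value) and lower <= value <= upper


class PredictionFormatTest(unittest.TestCase):
    def test_prediction_is_numeric_and_in_range(self):
        self.assertTrue(is_valid_prediction(predict_cost({"length_of_stay": 5})))

    def test_rejects_nan_and_negative_values(self):
        self.assertFalse(is_valid_prediction(float("nan")))
        self.assertFalse(is_valid_prediction(-1.0))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(PredictionFormatTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```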
b. System Testing
• End-to-End Testing:
▪ Test User Input: Verify that the user can input the trip details into the
system without issues.
▪ Example Test: Ensure that the form for entering trip details
accepts valid inputs, displays error messages for invalid inputs,
and passes the data to the backend.
▪ Test Prediction Output: Ensure that the output displayed to the user is
meaningful and accurate.
▪ Example Test: After inputting the trip data, ensure the predicted
cost is displayed correctly with relevant activity
recommendations.
▪ Example Test: Ask users to input different trip scenarios and
check if the predicted costs and activity recommendations make
sense.
c. Performance Testing
• Load Testing: Evaluate how the system performs under heavy user load
(multiple users inputting data simultaneously).
• Stress Testing: Test the system's behavior when the input data exceeds normal
usage (e.g., very large datasets or extreme trip scenarios).
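The idea behind a load test can be sketched in-process with a thread pool that issues many simulated requests at once and records their latencies. The handle_request stub and its 10 ms sleep are hypothetical; a real load test would send HTTP requests to the deployed Flask application, typically with a dedicated load-testing tool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(trip):
    """Hypothetical stub for one prediction request; the sleep imitates model latency."""
    time.sleep(0.01)
    return 2500.0 * trip["length_of_stay"]


def run_load_test(n_requests=50, n_workers=10):
    """Send n_requests simulated requests using n_workers concurrent threads."""
    trips = [{"length_of_stay": 1 + i % 7} for i in range(n_requests)]

    def timed(trip):
        started = time.perf_counter()
        handle_request(trip)
        return time.perf_counter() - started

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        latencies = sorted(pool.map(timed, trips))
    return {
        "requests": len(latencies),
        "median_s": latencies[len(latencies) // 2],
        "max_s": latencies[-1],
    }


report = run_load_test()
print(report)
```

Raising n_requests and n_workers well beyond expected traffic turns the same harness into a simple stress test.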
d. Model Evaluation
• Cross-Validation:
▪ Test how the model performs across different subsets of the data.
• Train-Test Split:
▪ Split the dataset into training (e.g., 80%) and testing (e.g., 20%) sets to
evaluate the model's generalization ability.
▪ Test Metrics: Calculate R², MAE, and RMSE on the test set to evaluate
model accuracy.
• Hyperparameter Tuning:
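The report does not list the tuning settings at this point. As a sketch of how cross-validation, a held-out test split, and randomized hyperparameter search fit together, the example below runs scikit-learn's RandomizedSearchCV with 5-fold cross-validation on synthetic data. The data, the parameter ranges, and n_iter are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

rng = np.random.default_rng(42)
# Synthetic stand-in data: cost grows with two numeric features, plus noise
X = rng.uniform(1, 14, size=(200, 2))
y = 3000 * X[:, 0] + 500 * X[:, 1] + rng.normal(0, 1000, size=200)

# Hold out 20% of the rows for the final check
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions={
        "n_estimators": [50, 100, 200],
        "max_depth": [None, 5, 10, 20],
        "max_features": [1.0, "sqrt", "log2"],
    },
    n_iter=5,          # sample 5 of the 36 possible combinations
    cv=5,              # score each candidate with 5-fold cross-validation
    scoring="r2",
    random_state=42,
)
search.fit(X_train, y_train)

# The best candidate is refit on the full training set and scored on unseen rows
test_r2 = search.score(X_test, y_test)
print(search.best_params_, round(test_r2, 3))
```

Only the training split takes part in the search, so the test-set R² remains an honest estimate of how the tuned model generalizes.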
e. Post-Testing Activities
▪ Work on fixes and re-test the areas where issues were found.
• Retesting:
▪ After implementing fixes or changes, retest the relevant parts of the
system to ensure the fixes work and no new issues have been introduced.
• Regression Testing:
▪ Test the system after each modification or update to ensure that the
changes haven’t negatively impacted existing functionality.
The objective of testing is to ensure that the Travel Budget Prediction System works
as expected. Specifically, we aim to verify that the system:
b. Scope of Testing
• Functional Testing: Verifying the core features of the application, such as cost
prediction, recommendation engine, and user inputs.
• Performance Testing: Load and stress tests to ensure system performance under
high traffic.
• Model Evaluation: Validate that the machine learning models predict costs
accurately.
• User Acceptance Testing (UAT): Gather feedback from users to ensure the
system meets their expectations.
Chapter 7: Coding and Screenshots of the project
Front-End (HTML, CSS, JavaScript)
<html>
<head>
<title>Travel Budget Prediction</title>
</head>
<body>
<div class="navbar">
<img src="{{ url_for('static', filename='LOGO1.png') }}" alt="Travel Budget Prediction
Logo" class="logo">
<div class="nav-links">
<a href="#home">Home</a>
<a href="/about">About</a>
<a href="/contact">Contact</a>
</div>
</div>
<div class="container">
<header>
<h1>Plan Your Trip</h1>
<p>Plan your dream trip's expense with ease!</p><br><br>
<p>Enter your trip details below to get an estimated budget and personalized
recommendations.</p><br>
</header>
<form action="{{ url_for('form')}}" method="post">
<label for="destination">Destination:</label>
<select id="destination" name="Destination" required>
<option value="">Select Destination</option>
<option value="Kashmir">Kashmir</option>
<option value="Rajasthan">Rajasthan</option>
<option value="Sikkim">Sikkim</option>
<option value="Kerala">Kerala</option>
<option value="Manali">Manali</option>
<option value="Coorg">Coorg</option>
<option value="Leh Ladakh">Leh Ladakh</option>
<option value="Varanasi">Varanasi</option>
<option value="Kutch">Kutch</option>
<option value="Goa">Goa</option>
<option value="Tawang">Tawang</option>
<option value="Andaman and Nicobar Islands">
Andaman and Nicobar Islands
</option>
<option value="Rishikesh">Rishikesh</option>
<option value="Khajuraho Temples">Khajuraho Temples</option>
<option value="Madurai">Madurai</option>
<option value="Auli">Auli</option>
<option value="Pondicherry">Pondicherry</option>
<option value="Mcleodganj">Mcleodganj</option>
</select>
Figure No. 4
Figure No. 5
Figure No. 6
Figure No. 7
Back-End (Flask, Python)
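from flask import Flask, render_template, request
import joblib
import pandas as pd
import warnings

warnings.filterwarnings("ignore")

app = Flask(__name__)
# Load the trained pipeline saved by the training script
model = joblib.load('model.pkl')

@app.route('/')
def hello_world():
    return render_template('web.html')

@app.route('/web2')
def web2():
    return render_template('web2.html')

@app.route('/about')
def about():
    return render_template('about.html')

@app.route('/contact')
def contact():
    return render_template('contact.html')

# The route decorator is missing from the listing. The path '/form' is an
# assumption; the HTML form posts to url_for('form').
@app.route('/form', methods=['POST'])
def form():
    Destination = request.form.get('Destination')
    season = request.form.get('Season')
    # The listing omits the next three reads; these form field names are assumptions
    length_of_stay = request.form.get('Length of stay', type=int)
    Travel_mode = request.form.get('Travel mode')
    Travel_style = request.form.get('Travel style')
    # Column names must match those used when the pipeline was trained
    data = pd.DataFrame({
        'Destination': [Destination],
        'Length of stay': [length_of_stay],
        'Season': [season],
        'Travel mode': [Travel_mode],
        'Travel style': [Travel_style]
    })
    prediction = model.predict(data)
    # Passing the result as `prediction` assumes web2.html displays that variable
    return render_template('web2.html', prediction=round(float(prediction[0]), 2))

if __name__ == '__main__':
    app.run(debug=True)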
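import warnings

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

warnings.filterwarnings("ignore")

# NOTE: the listing omits the data-loading step. `Data` is the DataFrame of
# collected trip records, and X / y are the feature columns and the trip-cost
# target taken from it.
Data.head()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

# Numerical features: fill missing values with the mean, then standardize
numerical_pipeline = Pipeline(steps=[
    ("imputational mean", SimpleImputer(missing_values=np.nan, strategy="mean")),
    ("scaler", StandardScaler())
])

# Categorical features: fill missing values with "missing", then one-hot encode
categorical_value = Pipeline(steps=[
    ("imputational constant", SimpleImputer(fill_value="missing", strategy="constant")),
    ("onehot", OneHotEncoder(handle_unknown="ignore"))
])

preprocessor = ColumnTransformer([
    ("categorical", categorical_value, ['Destination', 'Season', 'Travel mode', 'Travel style']),
    ("numerical", numerical_pipeline, ['Length of stay'])
])

# Baseline pipeline with default Random Forest settings
pipe = Pipeline([
    ("preprocessor", preprocessor), ("regressor", RandomForestRegressor())
])
pipe.fit(X_train, y_train)
predict = pipe.predict(X_test)

# Grid search over Random Forest hyperparameters with 5-fold cross-validation
param = {
    "regressor__n_estimators": [10, 50, 100, 200, 300, 400],
    # 1.0 (all features) replaces "auto", which recent scikit-learn releases no longer accept
    "regressor__max_features": [1.0, "sqrt", "log2"],
    "regressor__max_depth": [None, 5, 10, 20, 30, 40]
}
grid_search = GridSearchCV(pipe, param_grid=param, n_jobs=-1, cv=5)
grid_search.fit(X_train, y_train)
grid_search.best_params_

# Final pipeline with the selected hyperparameters
pipe = Pipeline([
    ("preprocessor", preprocessor),
    ("regressor", RandomForestRegressor(max_depth=30, max_features='log2', n_estimators=100))
])
pipe.fit(X_train, y_train)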
Figure No. 8
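from sklearn.metrics import r2_score

# Evaluate the tuned pipeline on the held-out test set
predict1 = pipe.predict(X_test)
score = r2_score(y_test, predict1)
print(score)

# Persist the fitted pipeline and confirm that it reloads
joblib.dump(pipe, 'model.pkl')
model = joblib.load('model.pkl')
print(type(model))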
Chapter 8: Conclusion and Future Scope
Conclusion
The Travel Budget Prediction System successfully meets its objectives by providing accurate
travel cost predictions and personalized recommendations for activities and destinations. The
system utilizes machine learning algorithms, such as Random Forest Regressor and Gradient
Boosting, to predict travel expenses based on user inputs like destination, length of stay,
season, travel mode, and travel style. Through comprehensive data preprocessing, model
building, and evaluation, the system consistently delivers predictions that are within an
acceptable error margin.
The integration of a recommendation system further enhances the travel planning experience,
offering users suggestions for attractions and activities tailored to their preferences. The
system's web interface, built using Flask, provides an intuitive user experience for easy
interaction with the predictive model. Additionally, the system has demonstrated scalability,
performing well under moderate user load conditions.
Overall, the Travel Budget Prediction System serves as an effective tool for helping users plan
their travel expenses and activities, improving financial decision-making and enhancing the
overall travel experience. The system has been rigorously tested and found to be reliable,
accurate, and user-friendly, ready for deployment in real-world scenarios.
Future Scope
While the Travel Budget Prediction System is functional and provides accurate predictions
and recommendations, there are several areas where the system could be enhanced in the future:
• Future versions of the system could integrate real-time data feeds from booking
platforms (e.g., flight and hotel bookings) to dynamically update travel costs
based on current prices. This would allow the system to reflect more accurate
predictions, especially as prices fluctuate due to demand, seasonal factors, or
last-minute deals.
• The system could incorporate a broader range of transportation options,
including trains, buses, and ride-sharing services. Predicting costs across
multiple modes of transport would enhance the system’s applicability to a wider
audience.
• By incorporating user preferences, travel history, and social media data, the
system could offer even more personalized recommendations for both
destinations and activities. For example, analyzing past trips could allow the
system to suggest new locations based on similar preferences.
• A mobile app version of the system could be developed, providing users with
on-the-go access to travel predictions, cost estimates, and recommendations.
This would increase user engagement and make the system more accessible.
• Incorporating more data sources, such as local weather data, festivals, and events
at the destination, could further improve predictions, as users’ travel costs may
vary based on these factors.
• Implementing voice recognition technology for input could make the system
more interactive and accessible for users, especially for those with disabilities
or those who prefer hands-free interaction.
• By analysing long-term travel data, the system could provide predictive insights
into future trends, such as popular travel destinations, emerging budget travel
options, or changes in travel pricing patterns.
Chapter 9: References
i. Websites and Online Resources: