Progress
Report on
FUEL EFFICIENCY PREDICTION (ML)
Submitted in partial fulfillment of the
requirements for the award of the
BACHELOR OF TECHNOLOGY
Degree in
COMPUTER SCIENCE &
ENGINEERING
2023-24
Under the Guidance of: Submitted By:
Mr. Vineet Srivastava Ahmed Ali (2000330100028)
Abhay Pratap Singh
(2000330100007)
Abhishek Kumar Yadav
(2000330100017)
Anup Kumar Gupta
(2000330100174)
SUPERVISOR
TABLE OF CONTENTS
LIST OF TABLES
CHAPTER NO. TABLE NO. TITLE PAGE NO.
LIST OF FIGURES
CHAPTER NO. TITLE PAGE NO.
1 FIGURE 1.1 ML Model 04
1 FIGURE 1.2 RANDOM FOREST 06
1 FIGURE 1.3 SVM 07
1 FIGURE 1.4 SVM GRAPH 08
1 FIGURE 1.5 NLP 09
4 FIGURE 4.1 WATERFALL MODEL 14
4 FIGURE 4.2 RAD MODEL 15
4 FIGURE 4.3 SPIRAL MODEL 16
4 FIGURE 4.4 INCREMENTAL MODEL 16
5 FIGURE 5.1 APPLICATION ARCHITECTURE 19
5 FIGURE 5.2 ER DIAGRAM 21
5 FIGURE 5.3 USE CASE DIAGRAM 22
5 FIGURE 5.4 CLASS DIAGRAM 23
CHAPTER 1
INTRODUCTION
Fuel efficiency prediction refers to the process of estimating or forecasting the fuel efficiency of a
vehicle based on various factors and variables. It plays a crucial role in optimizing fuel consumption,
reducing environmental impact, and improving overall vehicle performance. As the automotive
industry continues to advance, integrating predictive models for fuel efficiency has become
increasingly important.
Predicting fuel efficiency involves analyzing multiple parameters that influence how efficiently a
vehicle utilizes fuel. Some of the key factors include engine specifications, vehicle weight,
aerodynamics, driving conditions, and maintenance status. Advanced technologies such as machine
learning and data analytics have been instrumental in developing accurate and reliable models for
predicting fuel efficiency.
The goal of fuel efficiency prediction is to assist drivers, fleet managers, and automotive engineers in
making informed decisions to enhance fuel economy. By understanding the potential impact of
different variables on fuel efficiency, it becomes possible to optimize driving behaviors, maintenance
schedules, and even vehicle design.
Improving the fuel efficiency of heavy-duty trucks can benefit not only the automotive and
transportation industries but also the national economy and the global environment.
Several methods are employed in fuel efficiency prediction, ranging from traditional statistical
models to more sophisticated artificial intelligence algorithms. Real-time monitoring systems,
integrated with sensors and GPS technology, can provide instantaneous feedback to drivers, helping
them adopt fuel-efficient driving habits.
In the context of environmental sustainability and rising fuel costs, fuel efficiency prediction
contributes to a more eco-friendly and cost-effective transportation system. Governments,
manufacturers, and consumers are increasingly recognizing the importance of such predictive models
in shaping the future of mobility.
In conclusion, fuel efficiency prediction is a valuable tool in the modern automotive landscape,
offering insights that contribute to a more sustainable and economical use of fuel resources. As
technology continues to advance, we can expect further refinements and innovations in the field of
fuel efficiency prediction, ultimately benefitting both individuals and the broader community.
The problem statement for fuel efficiency prediction typically revolves around developing
accurate models that can forecast a vehicle's fuel efficiency under various conditions.
With the ever-increasing demand for sustainable and cost-effective transportation, predicting and
optimizing fuel efficiency in vehicles has become a critical concern. The need to reduce
greenhouse gas emissions, combat rising fuel costs, and enhance overall energy sustainability
underscores the importance of developing robust models for fuel efficiency prediction.
Vehicles operate in diverse conditions influenced by factors such as driving patterns, road
conditions, weather, and maintenance status. Creating a model that can consider the complex
interplay of these variables is a key challenge.
Fuel efficiency is not a static parameter; it varies with time and usage. Developing a model that
adapts to real-time data and dynamic driving scenarios is crucial for providing accurate
predictions.
Obtaining comprehensive and high-quality datasets encompassing various driving conditions and
vehicle specifications can be challenging. Ensuring the model's reliability requires addressing
data limitations and potential biases.
Incorporating emerging technologies such as machine learning, data analytics, and real-time
monitoring systems poses technical challenges. Balancing model complexity with practical
implementation is vital for widespread adoption.
Driver behavior significantly influences fuel efficiency. Predictive models must account for the
human factor, understanding how drivers' decisions impact fuel consumption and incorporating
this aspect into the prediction process.
The objective of this work is to study fuel consumption and maintenance and repairs, the two
factors that influence the total cost of ownership of heavy-duty vehicles, using machine
learning. Machine learning, a branch of artificial intelligence, uses algorithms and neural
network models to progressively improve performance. These models apply historical data to
learn the patterns in heavy-duty vehicles (HDVs) so that they can predict outputs for new data
whose classification is unknown, without resorting to on-road testing or heavy equipment.
The main concepts covered in this report are developing and using machine learning algorithms
to model real-world on-road heavy-duty vehicle data.
The transportation sector is one of the major contributors to greenhouse gas emissions,
contributing about 27% (shown in Figure 1.1) of overall emissions in the United States. Among
transportation-sector emissions, medium- and heavy-duty vehicles produce 26% as per 2020
reports, even though they constitute only 4% of vehicles on the road [1]. The increasing
greenhouse gas emissions result in global warming, adversely impacting human health, the
environment, and the economy.
• Scope
The scope of fuel efficiency prediction is continually expanding as technology advances and
industries strive to achieve sustainability goals, reduce costs, and minimize environmental
impact. Advances in data science, machine learning, and sensor technologies contribute to more
accurate and sophisticated fuel efficiency prediction models.
The scope of fuel efficiency prediction is broad and encompasses various aspects related to the
optimization and forecasting of fuel consumption in different systems.
Predicting fuel efficiency is crucial in the design and engineering of vehicles. Manufacturers aim
to develop cars, trucks, and other vehicles that are not only powerful but also fuel-efficient.
Predicting fuel efficiency is essential for companies managing vehicle fleets. Optimizing routes,
maintenance schedules, and driver behavior can contribute to fuel savings.
Predicting fuel consumption is critical in the transportation of goods and people. Airlines and
shipping companies use fuel efficiency models to plan flight paths, optimize cargo loads, and
enhance overall operational efficiency.
Predicting fuel efficiency is vital for public transportation systems, such as buses and trains, to
optimize routes and schedules, reducing energy consumption and costs.
Fuel efficiency prediction is closely linked to the reduction of greenhouse gas emissions.
Governments and organizations use these predictions to implement policies and technologies
aimed at lowering environmental impact.
Advanced analytics and machine learning models are increasingly used to predict fuel efficiency
based on historical data, real-time information, and various influencing factors.
• Existing Software
Machine Learning is the field of study that gives computers the capability to learn without being
explicitly programmed. ML is one of the most exciting technologies one could come across. As
is evident from the name, it gives computers the capability that makes them more similar to
humans: the ability to learn. Machine learning is actively used today, perhaps in many more
places than one would expect.
Figure 1.1
Unsupervised learning approaches find applications across diverse domains. Anomaly
detection becomes achievable by identifying unusual patterns or outliers within datasets,
enhancing the capacity to detect irregularities and potential issues. Data compression, a process
of reducing dimensionality while retaining essential information, proves instrumental in
managing and analyzing large datasets efficiently. Market Basket Analysis leverages
unsupervised learning to uncover associations and patterns in customer purchasing behavior,
facilitating targeted marketing strategies. Moreover, feature learning, the automatic discovery of
useful features from raw data, empowers algorithms to uncover intrinsic characteristics without
explicit labelling.
Unlike supervised learning, where objectives and metrics are predefined, assessing the
effectiveness of unsupervised learning is often subjective. The success of these algorithms is
gauged by the relevance of the patterns they unveil and their alignment with specific problem
domains or business goals.
Real-world applications of unsupervised learning are diverse, ranging from customer
segmentation in marketing to image and speech recognition, and even preprocessing data for
subsequent supervised learning tasks. In these applications, unsupervised learning serves as a
powerful tool for uncovering hidden structures and relationships within data, driving meaningful
insights and informed decision-making.
The Naïve Bayesian classifier works as follows: suppose there exists a set of training data, D,
in which each tuple is represented by an n-dimensional feature vector, X = (x1, x2, …, xn),
indicating n measurements made on the tuple from n attributes or features. Assume that there are
m classes, C1, C2, …, Cm. Given a tuple X, the classifier predicts that X belongs to Ci if
and only if P(Ci|X) > P(Cj|X) for all 1 ≤ j ≤ m, j ≠ i.
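As a minimal sketch of this decision rule (pure Python, with hypothetical categorical features, not the dataset used in this project), the class probabilities can be estimated from simple frequency counts with Laplace smoothing:

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Estimate class priors P(Ci) and per-feature counts for P(xk | Ci)."""
    priors = Counter(labels)
    cond = defaultdict(Counter)  # (class, feature index) -> value counts
    for x, c in zip(rows, labels):
        for k, v in enumerate(x):
            cond[(c, k)][v] += 1
    return priors, cond

def predict(priors, cond, x):
    """Pick the class Ci maximizing P(Ci) * prod_k P(xk | Ci)."""
    total = sum(priors.values())
    best, best_p = None, -1.0
    for c, n in priors.items():
        p = n / total
        for k, v in enumerate(x):
            counts = cond[(c, k)]
            # Laplace smoothing avoids zero probabilities for unseen values
            p *= (counts[v] + 1) / (sum(counts.values()) + len(counts) + 1)
        if p > best_p:
            best, best_p = c, p
    return best

# Hypothetical toy tuples: (engine size, weight class) -> efficiency class
rows = [("small", "light"), ("small", "light"), ("large", "heavy"), ("large", "heavy")]
labels = ["efficient", "efficient", "inefficient", "inefficient"]
priors, cond = train_naive_bayes(rows, labels)
print(predict(priors, cond, ("small", "light")))  # prints "efficient"
```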
The random forest classifier was chosen for its superior accuracy over a single decision tree. It
is essentially an ensemble method based on bagging. The classifier works as follows: given D,
it first creates k bootstrap samples of D, each denoted Di. Each Di has the same number of
tuples as D, sampled with replacement from D. Sampling with replacement means that some of
the original tuples of D may not be included in a given Di, whereas others may occur more than
once. The classifier then constructs a decision tree on each Di. As a result, a "forest" that
consists of k decision trees is formed.
To classify an unknown tuple, X, each tree returns its class prediction counting as one vote. The
final decision of X’s class is assigned to the one that has the most votes. The decision tree
algorithm implemented in scikit-learn is CART (Classification and Regression Trees). CART
uses Gini index for its tree induction.
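The two mechanisms described above, bootstrap sampling and majority voting, can be sketched in a few lines of plain Python (an illustration only, not the project's actual implementation):

```python
import random

def bootstrap_sample(D, rng):
    """Draw len(D) tuples from D with replacement (one Di in the text)."""
    return [rng.choice(D) for _ in range(len(D))]

def majority_vote(votes):
    """Final class of X = the class receiving the most tree votes."""
    return max(set(votes), key=votes.count)

rng = random.Random(0)
D = list(range(10))            # stand-in for the training tuples of D
Di = bootstrap_sample(D, rng)  # same size as D; some tuples may repeat
print(len(Di))                 # 10
print(majority_vote(["A", "B", "A"]))  # prints "A"
```

In practice scikit-learn's `RandomForestClassifier` performs both steps internally; the sketch only makes the bagging idea explicit.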
Support vector machines work well when there is a clear margin of separation between classes.
They are effective in high-dimensional spaces, including cases where the number of dimensions
exceeds the number of samples.
Figure – 1.3 SVM
• Logistic Regression:
Logistic regression predicts the probability of an outcome that can only have two values (i.e. a
dichotomy). The prediction is based on the use of one or several predictors (numerical and
categorical). A linear regression is not appropriate for predicting the value of a binary variable
for two reasons:
• A linear regression will predict values outside the acceptable range (e.g. predicting
probabilities outside the range 0 to 1)
• Since the dichotomous experiments can only have one of two possible values for
each experiment, the residuals will not be normally distributed about the predicted line.
Natural language processing (NLP) is a subfield of Artificial Intelligence (AI). It is a widely
used technology for personal assistants across various business fields. It takes the speech or text
provided by the user, breaks it down for proper understanding, and processes it accordingly.
This is a recent and effective approach, which is why it is in high demand in today's market.
NLP is a growing field in which many transitions, such as compatibility with smart devices and
interactive conversation with a human, have already been made possible. Knowledge
representation, logical reasoning, and constraint satisfaction were the emphasis of early AI
applications in NLP, applied first to semantics and later to grammar.
NLP is used in a wide range of applications, including machine translation, sentiment analysis,
speech recognition, chatbots, and text classification. Some common techniques used in NLP
include:
Tokenization: the process of breaking text into individual words or phrases.
Part-of-speech tagging: the process of labelling each word in a sentence with its grammatical part
of speech.
Named entity recognition: the process of identifying and categorizing named entities, such as
people, places, and organizations, in text.
Sentiment analysis: the process of determining the sentiment of a piece of text, such as whether it
is positive, negative, or neutral.
Machine translation: the process of automatically translating text from one language to another.
Text classification: the process of categorizing text into predefined categories or topics.
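Two of these techniques, tokenization and a lexicon-based form of sentiment analysis, can be sketched without any external library (the word lists below are hypothetical, chosen only for illustration):

```python
import re
from collections import Counter

def tokenize(text):
    """Break text into lowercase word tokens with a simple regex."""
    return re.findall(r"[a-z']+", text.lower())

# Toy sentiment lexicon (illustrative words, not a real resource)
POSITIVE = {"good", "great", "efficient"}
NEGATIVE = {"bad", "poor", "wasteful"}

def sentiment(text):
    """Classify text as positive / negative / neutral by lexicon counts."""
    counts = Counter(tokenize(text))
    score = sum(counts[w] for w in POSITIVE) - sum(counts[w] for w in NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(tokenize("This engine is great!"))  # ['this', 'engine', 'is', 'great']
print(sentiment("great and efficient"))   # prints "positive"
```

Production systems would instead use a toolkit such as NLTK (listed in Chapter 3) for tokenization, tagging, and classification.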
CHAPTER – 2
BACKGROUND AND RELATED WORK
• Recent Papers On Fuel efficiency prediction
TABLE 2.1
The table presents a comprehensive overview of various research papers, each entry providing
crucial information about the author, publication year, and a succinct description of the
methodology employed in the respective studies. The first column lists serial numbers; the
second systematically lists the authors' names, showcasing the diverse range of scholars
contributing to the field. The third column features the
publication year, offering a chronological perspective on the temporal evolution of research
within the given subject area. This temporal dimension aids in understanding the historical
context and the progression of ideas over time. The fourth column of the table is dedicated to
concise descriptions of the methodologies employed in each research paper. This includes an
elucidation of the research design, data collection methods, and analytical tools used, affording
readers a nuanced understanding of the empirical approaches adopted by different authors. Such
a detailed table not only facilitates a quick scan of essential bibliographic details but also serves
as a valuable resource for researchers and academics seeking to compare and contrast the
methodological nuances across a spectrum of scholarly works in the field.
CHAPTER 3
HARDWARE AND SOFTWARE REQUIREMENTS
• Hardware Requirements
• Core i5 or above processor
• 8+ GB RAM
• At least 10 GB of Hard disk free space
• Software Required:
• Python 3: Python is an interpreted, object-oriented, high-level programming language
with dynamic semantics. Its high-level built in data structures, combined with dynamic typing
and dynamic binding, make it very attractive for Rapid Application Development, as well as for
use as a scripting or glue language to connect existing components together. Python's simple,
easy to learn syntax emphasizes readability and therefore reduces the cost of program
maintenance.
• NLTK Toolkit: The Natural Language Toolkit (NLTK) is a platform used for building
Python programs that work with human language data for statistical natural language
processing (NLP). It contains text processing libraries for tokenization, parsing, classification,
stemming, tagging, and semantic reasoning.
• TensorFlow & IBM Caffe
CHAPTER 4
SDLC METHODOLOGIES
The agile methodology was used. This is because the agile methodology is more adaptable and
can accommodate changes more easily. It is also more user-centric, which is important in this
case because the system is being developed for the users. Agile is an iterative approach to project
management and software development that enables teams to deliver value to customers faster
and with fewer headaches. An agile team delivers work in small but consumable increments
rather than betting everything on a "big bang" launch. Continuous evaluation of requirements,
plans, and results provides teams with a natural mechanism for responding to change quickly.
The following SDLC models are proposed:
• SDLC Models
• Waterfall Model
The waterfall is a widely used SDLC model. The waterfall model is a sequential software
development model in which development is seen as flowing steadily downwards (like a
waterfall) through the steps of requirements analysis, design, implementation, testing
(validation), integration, and maintenance. Certification techniques must be used at the end of
each phase to identify its end and the start of the next; verification and validation typically do
this by ensuring that the stage's output is consistent with its input (which is the output of the
previous stage) and with the overall requirements of the system.
The Rapid Application Development (RAD) process is an adaptation of the waterfall model that
aims to develop software in a short period of time. The RAD model is based on the idea that by
using focus groups to gather system requirements, a better system can be developed in less time.
• Business Modeling
• Data Modeling
• Process Modeling
• Application Generation
• Testing and Turnover
The spiral model is a risk-driven process model. This SDLC model assists the team in
implementing elements of one or more process models, such as the waterfall and incremental
models. The spiral technique is a hybrid of rapid prototyping and concurrent design and
development. Each spiral cycle begins with the identification of the cycle's objectives, the
various alternatives for achieving the goals, and the constraints that exist. This is the cycle's
first quadrant (upper-left quadrant).
The cycle then proceeds to evaluate these various alternatives in light of the objectives and
constraints. The focus of evaluation in this step is on the project's risk perception.
The incremental model does not stand alone. It must be a series of waterfall cycles. At the start
of the project, the requirements are divided into groups. The SDLC model is used to develop
software for each group. The SDLC process is repeated, with each release introducing new
features until all requirements are met. Each cycle in this method serves as the maintenance
phase for the previous software release. The incremental model has been modified to allow
development cycles to overlap. The following cycle may begin before the previous cycle is
completed.
CHAPTER 5
APPLICATION ARCHITECTURE
The application and architecture of fuel efficiency prediction involve the use of advanced
technologies, data analytics, and modeling techniques to optimize fuel consumption in various
domains.
Predicting fuel efficiency applies to automobiles, trucks, and other vehicles.
• Features: vehicle speed, engine efficiency, driving patterns, road conditions, and weather.
• Benefits: optimized fuel consumption, reduced emissions, and enhanced overall vehicle
performance.
• Sources: vehicle sensors, GPS data, weather data, historical performance data, and telematics
devices.
• Integration: collect and aggregate data from diverse sources for comprehensive analysis.
• Cleaning: remove outliers and handle missing or inaccurate data.
• Normalization: standardize data to ensure consistency and comparability.
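One simple way to handle missing entries during cleaning is mean imputation; the sketch below (pure Python, with a hypothetical fuel-rate column) illustrates the idea:

```python
def impute_missing(values):
    """Replace missing (None) entries in a numeric column with the column mean."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]

# Hypothetical fuel-rate readings with one gap
readings = [7.0, 9.0, None, 8.0]
print(impute_missing(readings))  # [7.0, 9.0, 8.0, 8.0]
```

More sophisticated strategies (interpolation, model-based imputation) may be preferable when gaps are frequent.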
Implementing an effective fuel efficiency prediction system involves a multidisciplinary
approach, combining expertise in data science, domain knowledge, and technology integration.
The architecture should be flexible to accommodate different use cases and adapt to evolving
conditions in the operational environment. Continuous improvement through feedback and
adaptation is essential for maintaining the accuracy and effectiveness of fuel efficiency
prediction models.
• Empirical Models: These models are based on observed data and relationships between
input features and fuel efficiency.
• Statistical Models: Statistical techniques, including regression analysis, are employed
to identify correlations between input variables and fuel efficiency.
• Hybrid Models: Combine empirical, statistical, and physics-based approaches to
leverage the strengths of each model type.
• Dynamic or Real-Time Models: Models that continuously update predictions based on
real-time data, allowing for dynamic adjustments. This is the most challenging approach
to implement.
Some Applications of Fuel Efficiency Prediction as follows.
• Automotive Industry
• Fleet Management
• Aviation and Aerospace
• Shipping and Maritime Industry
• Public Transportation
• Industrial Processes
• Power Generation
• Phases of project
The process of fuel efficiency prediction involves several phases, from data
collection to model deployment.
Phase 1: Planning Phase: Data collection and Data Preprocessing.
Gather relevant data for the prediction task. This data can be collected from various sources
such as vehicle sensors and weather stations; in our case we used a dataset from Kaggle.com.
Clean and prepare the collected data for analysis. Normalize or standardize data for consistency.
Handle missing or erroneous data. Convert data into a suitable format for analysis.
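The normalization step mentioned above is commonly done with z-score standardization (zero mean, unit variance); a minimal sketch with a hypothetical vehicle-weight column:

```python
def standardize(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

# Hypothetical vehicle weights in kilograms
weights_kg = [1200.0, 1400.0, 1600.0]
print(standardize(weights_kg))  # values centered on 0, the middle one exactly 0.0
```

Standardizing puts features measured in different units (kg, km/h, hp) on a comparable scale before modeling.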
Phase 2: Development Phase: Exploratory Data Analysis and feature extraction.
Identify and select relevant features (input variables) that influence fuel efficiency. Analyze
correlations between variables. Transform and create new features as needed.
Explore the data to gain insights and understand patterns. Visualize data distributions and
relationships. Identify outliers or anomalies.
Phase 3: Model Development.
The third phase applies machine learning techniques to predict fuel efficiency. Common
approaches include supervised learning with labeled datasets, unsupervised learning, or deep
learning methods such as recurrent neural networks (RNNs) and transformers.
Phase 4: Model training and testing.
Train models on historical data. The model learns to identify patterns and associations between
features and labels during this phase.
Evaluate the performance of the trained model on validation data to ensure that it generalizes
well to new, unseen data. Fine-tune the model as needed and then test its performance on a
separate test dataset.
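The train/validation/test workflow described in this phase can be sketched as follows (pure Python, with illustrative split ratios and a mean-absolute-error metric; the project's actual ratios and metric may differ):

```python
import random

def split(data, train=0.6, val=0.2, seed=0):
    """Shuffle and split data into train / validation / test portions."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    a = int(len(rows) * train)
    b = a + int(len(rows) * val)
    return rows[:a], rows[a:b], rows[b:]

def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between actual and predicted fuel efficiency."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

data = list(range(10))            # stand-in for labeled records
tr, va, te = split(data)
print(len(tr), len(va), len(te))  # 6 2 2
print(mean_absolute_error([10, 12], [11, 11]))  # 1.0
```

The validation portion guides fine-tuning; the held-out test portion is touched only once, for the final performance estimate.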
Phase 5: Result analysis and feedback.
Analyze the results obtained from the fuel efficiency prediction. This may include generating
summaries from vehicle and sensor data or other forms of reporting to gain insight into the
efficiency distribution within the analyzed data.
Collect feedback on the performance of the fuel efficiency prediction. If necessary, iterate on the
model, retrain it with additional data, or adjust parameters to improve its accuracy and
effectiveness.
5.2 ER DIAGRAM
Creating an Entity-Relationship (ER) diagram for fuel efficiency involves identifying the main
entities, their attributes, and the relationships between them. However, it's important to note that
fuel efficiency prediction is more of a process or a set of techniques rather than a traditional
database scenario. Therefore, the ER diagram for fuel efficiency prediction might be more
conceptual rather than directly mapping to a database schema.
CHAPTER 6
REFERENCES
[1] H. Wang, "Energy consumption in transport: an assessment of changing trend, influencing
factors and consumption forecast," Journal of Chongqing University of Technology (Social
Science), vol. 7, 2017.
[2] J. N. Barkenbus, "Eco-driving: an overlooked climate change initiative," Energy Policy,
vol. 38, no. 2, pp. 762–769, 2010.
[3] T. Hiraoka, Y. Terakado, S. Matsumoto, and S. Yamabe, "Quantitative evaluation of eco-
driving on fuel consumption based on driving simulator experiments," in Proceedings of the
16th ITS World Congress and Exhibition on Intelligent Transport Systems and Services,
Stockholm, Sweden, September 2009.
[4] K. Ahn and H. Rakha, "The effects of route choice decisions on vehicle energy
consumption and emissions," Transportation Research Part D: Transport and Environment,
vol. 13, no. 3, pp. 151–167, 2008.
[5] K. Hu, J. Wu, and M. Liu, "Modelling of EVs energy consumption from perspective of
field test data and driving style questionnaires," Journal of System Simulation, vol. 30, no. 11,
pp. 83–91, 2018.
[6] Z. Xu, T. Wei, S. Easa, X. Zhao, and X. Qu, "Modeling relationship between truck fuel
consumption and driving behavior using data from internet of vehicles," Computer-Aided Civil
and Infrastructure Engineering, vol. 33, no. 3, pp. 209–219, 2018.
[7] X.-h. Zhao, Y. Yao, Y.-p. Wu, C. Chen, and J. Rong, "Prediction model of driving energy
consumption based on PCA and BP network," Journal of Transportation Systems Engineering
and Information Technology, vol. 5, pp. 185–191, 2016.
[8] D. A. Johnson and M. M. Trivedi, "Driving style recognition using a smartphone as a
sensor platform," in Proceedings of the 2011 14th International IEEE Conference on Intelligent
Transportation Systems (ITSC), pp. 1609–1615, Toronto, Canada, October 2011.
[9] G. Guido, A. Vitale, V. Astarita, F. Saccomanno, V. P. Giofré, and V. Gallelli, "Estimation
of safety performance measures from smartphone sensors," Procedia—Social and Behavioral
Sciences, vol. 54, pp. 1095–1103, 2012.
[10] W. J. Zhang, S. X. Yu, Y. F. Peng, Z. J. Cheng, and C. Wang, "Driving habits analysis on
vehicle data using error back-propagation neural network algorithm," in Computing, Control,
Information and Education Engineering, vol. 55, CRC Press, Guilin, China, 2015.
CHAPTER 7
PROJECT MODULES DESIGN
1. Data Collection Module:
Gather relevant data regarding vehicles, including engine specifications, weight,
aerodynamics, fuel type, etc.
Cleanse the data by handling missing values, outliers, and inconsistencies.
Convert and format data into a suitable structure for analysis.
2. Feature Engineering:
Identify key features that significantly influence fuel efficiency (engine size, vehicle
weight, horsepower, aerodynamics, etc.).
Create new features through transformations, scaling, or combining existing ones.
Perform dimensionality reduction techniques if required.
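As a small illustration of the feature-creation step (the column names and the derived ratio are hypothetical examples, not the project's final feature set), a power-to-weight ratio can be added by combining existing columns:

```python
def add_power_to_weight(records):
    """Derive a power-to-weight ratio feature from existing columns."""
    for r in records:
        r["power_to_weight"] = r["horsepower"] / r["weight_kg"]
    return records

# Hypothetical vehicle record
cars = [{"horsepower": 150.0, "weight_kg": 1500.0}]
print(add_power_to_weight(cars)[0]["power_to_weight"])  # 0.1
```

Ratios like this often carry more predictive signal for fuel efficiency than either raw column alone.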