Analysis of Mine Haul Truck Fuel Consumption Report


University of the Witwatersrand,

Johannesburg
School of Mining Engineering
Course: Digital Technologies and Mine Data Analytics
Analysis of Mine Haul Truck Fuel Consumption
Authors:
Vangile Thabethe
Kamogelo Oliver
Tukiso Tsomakae
Patience Purazi
DECLARATION:

We declare that this report is our own, unaided work. We have read the University Policy
on Plagiarism and hereby confirm that there is no plagiarism in this report. We also
confirm that there is no copying nor is there any copyright infringement. We willingly
submit to any investigation in this regard by the School of Mining Engineering and we
undertake to abide by the decision of any such investigation.

Student Name       Student Number   Section                                              Signature
Vangile Thabethe   2583406          Abstract, Results and Analysis                       V.T
Tukiso Tsomakae    2332715          Literature Review, Results and Analysis              T.T
Patience Purazi    2585216          Introduction and Methodology                         P.P
Kamogelo Oliver    2427104          Results and Analysis, Conclusion and Recommendation  K.O

Abstract
This report comprises sections dedicated to the delineation of data, the detection and
elimination of discrepancies, exploratory analysis, and the fitting of a multiple linear regression
model to data collected from a hypothetical company, ABC. The regression was carried out on
Anaconda, a data science platform that distributes Python and its scientific libraries. Various
factors, including payload, queuing time (and other cycle times), engine power, rolling resistance
and grade resistance, were examined to evaluate their impact on the fuel consumption of mine
haul trucks. The assumptions of linearity, independence of observations, homoscedasticity and
normality were gauged (using descriptive statistics of the quantitative data) to validate the
suitability of the model. The model revealed that only two predictors, payload and queuing time,
posed no multicollinearity concerns. The assessment of those factors indicates how truck choice
can influence fuel consumption and increase operational costs. The report further provides
visualisations to aid analysts with data exploration, model selection and interpretability. The
value of the study lies in simplifying complex problems within the operational environment to
alleviate the fuel costs incurred by company ABC.

Acknowledgements
University of the Witwatersrand,
School of Mining Engineering,
Course instructors: Mr M. Mabala, Ms M. Madahana

Table of Contents
1. Introduction
2. Literature review
   2.1. Case Study 1: Application of Machine Learning Techniques to Predict Haul Truck Fuel Consumption in Open-Pit Mines
   2.2. Case Study 2: Energy Efficiency Improvement in Surface Mining
   2.3. Case Study 3: Fuel Optimization in Mining Trucks using Machine Learning
   2.4. Case Study 4: Determinants of Fuel Consumption in Mining Trucks
3. Methodology
   3.1. Applied Machine Learning Models and Parameters
   3.2. Data Processing Procedures
4. Results and Analysis
   4.1. Data cleaning
   4.2. Exploratory data analysis
      4.2.1. Integrated exploratory data analysis for all truck types
   4.3. Multiple linear regression
      4.3.1. Dummy variable encoding for categorical data
      4.3.2. Checking for the assumptions of linearity and non-multicollinearity
      4.3.3. Fitting the multiple linear regression model
      4.3.4. Evaluation and validation of the model
Recommendations and Conclusions
References

List of tables
Table 1: Analysis of measures of central tendency and dispersion
Table 2: Measures of central tendency of the different truck types
Table 3: Measures of dispersion of the different truck types
Table 4: Variance inflation factor
Table 5: Dummy variable encoding for categorical data

List of figures
Figure 1: Histogram of fuel consumption by truck type
Figure 2: Boxplot of the fuel consumption of each truck type
Figure 3: Durbin-Watson code
Figure 4: Autocorrelation plot of residuals
Figure 5: Residual vs predicted values plot
Figure 6: Quantile plot of residuals

List of abbreviations
EDA   Exploratory data analysis
MLR   Multiple linear regression

1. Introduction
The optimisation and reduction of hefty mine operating costs depend largely on the efficiency
of the haulage system used. Mine haulage systems constitute a vital component of mining
activities, serving the crucial function of conveying extracted material from the development
end to various facilities for processing. ABC Mining Company aims to reduce its operational
expenses and environmental disturbances by investigating the fuel consumed during its mining
operations. The company collected data on various parameters that impact fuel consumption.
Data scrubbing, delineation, and analysis will aid the company in making informed decisions
about fuel optimisation and efficiency. The prime objective is to reduce the cost margins incurred
while conserving the environment by minimising carbon emissions and contributing to
environmental sustainability.
Several factors were analysed, including payload, loading time, travelling time, queuing time,
dumping time, truck type and engine power, to understand how they affect fuel consumption.
These parameters were analysed for three truck types with different engine powers: the BL753,
SF319, and TR100.
This report covers data cleaning, exploratory data analysis and multiple linear regression using
Python, a programming language, within Anaconda, a data science platform.

2. Literature review
2.1. Case Study 1: Application of Machine Learning Techniques to Predict Haul Truck Fuel
Consumption in Open-Pit Mines

The objective of this study is to create a model for assessing the diesel fuel consumption of
mining haul trucks. To accomplish this, various machine learning techniques were employed,
namely multiple linear regression, random forest, artificial neural network, support vector
machine, and kernel nearest neighbour. These methods were used to make predictions about
the fuel consumption of haul trucks based on several independent variables, including payload,
total resistance, and actual speed. The results of the study reveal that the artificial neural
network outperformed the other models in terms of accuracy. In contrast, the multiple linear
regression model displayed the least favourable performance across all statistical metrics. As a
final step, a sensitivity analysis was conducted to determine the significance of the independent
variables (Alamdari, Basiri, Mousavi and Soofastaei, 2022).

2.2. Case Study 2: Energy Efficiency Improvement in Surface Mining

This case study provides an overview of energy efficiency within the mining industry,
specifically focusing on the impact of fuel consumption in mining hauling operations. Diesel
consumption is singled out as a critical cost factor with substantial environmental implications
in surface mining. The objective of this research is to create an advanced data analytics model
aimed at estimating the energy efficiency of haul trucks used in surface mining, ultimately
aiming to reduce operational expenses (Soofastaei and Fouladgar, 2022).

The prediction of truck fuel consumption hinges on several key factors, namely total resistance,
truck payload, and truck speed. To achieve this prediction, a comprehensive analysis
framework is constructed. This framework is based on the development of a fitness function
derived from a model that links fuel consumption with its influencing factors. Subsequently,
the model is trained and validated using real data collected from significant surface mines in
Australia through field research. Ultimately, an artificial neural network is chosen as the tool
for predicting haul truck fuel consumption (Soofastaei and Fouladgar, 2022).

2.3. Case Study 3: Fuel Optimization in Mining Trucks using Machine Learning


This case study employs a holistic approach that combines data exploratory analysis and
machine learning techniques to assess the fuel efficiency of trucks and estimate potential cost
savings. The fuel consumption predictions are derived from extensive data related to tracked
truck dispatch cycles and refueling events. Additionally, an in-depth statistical analysis has
been conducted to pinpoint the key factors that impact fuel consumption, considering the
correlations between the independent variables. Various machine learning models, such as
XGBoost, Support Vector Machines (SVM), and Neural Networks, have been assessed for their
performance in this study (Terpstra et al., 2021). After a comprehensive analysis, the following
actionable recommendations were provided:

• Maintenance Optimization: The maintenance schedule for haul trucks should be
optimized to ensure that the engines and other key components are in peak condition.
Regular maintenance, such as cleaning air filters and optimizing tire pressure, can
significantly reduce fuel consumption.
• Monitoring and Operator driving style: The study recommended implementing a driver
training program focused on fuel-efficient driving techniques. In-cab monitoring
systems, such as telematics and real-time feedback tools, should be installed to provide
drivers with instant feedback on their driving behaviour, allowing them to make real-
time adjustments.
2.4. Case Study 4: Determinants of Fuel Consumption in Mining Trucks

This case study endeavour seeks to forecast the fuel consumption for each operational cycle of
heavy mining dump trucks within a specific mining site, where these cycles predominantly
involve loading, hauling, and dumping activities. The primary goal of this study is to provide
insights into estimating the fuel expenditures associated with different operational scenarios.
The aim is to identify scenarios that minimize both fuel consumption and the emission of
greenhouse gases. The study employs a combination of statistical techniques, including partial
least squares regression (PLSR) and autoregressive integrated moving average (ARIMA), to
make predictions regarding fuel consumption, drawing on the patterns observed during cyclic
activities (Dindarloo and Siami-Irdemoosa, 2016). The study provided the following actionable
recommendations:

• Route Optimization: The study emphasized the importance of optimizing haul truck
routes to minimize the distance travelled and reduce fuel consumption. This involves
the use of advanced GPS and route planning software to select the most efficient paths
for each haul.
• Data-driven improvement: The study recommended using fuel consumption data and
analytics to identify inefficiencies and develop strategies for improvement.
• Fuel Quality Control: Ensuring the quality of the fuel used in haul trucks is essential.
The case study suggested implementing a rigorous fuel quality control system to
prevent contamination and degradation of fuel, which can negatively impact fuel
efficiency.

3. Methodology
3.1. Applied Machine Learning Models and Parameters
Machine learning is a branch of artificial intelligence dedicated to crafting algorithms and
statistical models that enable computer systems to analyse and make inferences from acquired
data. The quintessence of machine learning is embedded in its ability to leverage data for pattern
identification and prediction. The data (in CSV format) collected by the ABC mining company
to examine how parameters affect fuel consumption was analysed using Python, a programming
language distributed with Anaconda. Anaconda is an open-source platform that permits the
execution of code through the importation of libraries into Jupyter Notebook. Various libraries
fulfilled different functions: for instance, matplotlib aided in plotting and visualisation, while
pandas, SciPy, NumPy, Seaborn and scikit-learn enabled the processing of data into
comprehensible information and the fitting of models.
3.2. Data processing procedures
Data delineation: Refers to the process of describing data by highlighting the characteristics
present in the distribution. The data shape/ structure was established.
Data scrubbing: Refers to the detection of errors and discrepancies, removal of duplicates,
outlier handling and normalization of data. Data scrubbing was conducted to ensure inferences
are drawn from accurate data.
Exploratory data analysis: EDA enables data professionals to identify challenges and guide in
data modelling. EDA was effectively implemented to counteract data inconsistencies.

Application of suitable model: The multiple linear regression model was found fit to model the
data. Assumptions made by the model, including linearity and independence of observations,
were validated by analysing the attained data.

4. Results and Analysis


4.1. Data cleaning
Data cleaning is an important step in the data analysis process. It involves identifying and
rectifying errors, inaccuracies, and inconsistencies in the dataset to ensure that the data is
reliable and accurate for further analysis (Nongthombam and Sharma, 2021).
Justification for using clean data:
1. Detection and diagnosis of errors, anomalies and discrepancies:
   a. Missing values
   b. Inaccurately captured data
2. Data quality is a prerequisite for valid data analysis and quality assurance.
3. Using uncleaned data could lead to incorrect insights and conclusions, whereas
   using cleaned data provides a solid foundation for making informed decisions,
   generating insights, and drawing meaningful conclusions (Dasari and Varma, 2022).

In this report, data cleaning involved the following steps:


1. Importing important libraries and the dataset.
2. Exploration of the dataset. When data is explored, it is easier to understand its
structure and the datatypes. Understanding the structure and datatypes can help in
identifying potential errors and inaccuracies.
3. Checking and handling rows with missing values. Five rows with missing values
were identified and deleted. This decision was made because this manipulation
would not significantly misrepresent the overall data as only 5 out of a total of 1168
rows were deleted.
4. Checking and handling rows with duplicates. Duplicated data affects the
distribution of the data; therefore, the duplicated rows were deleted in order to
maintain the integrity of the data.
5. Checking and handling outliers. Identifying outliers helps in detecting extreme
datapoints that do not make logical sense. In this report, a row with a negative fuel
consumption was identified and deleted.
6. Checking and handling negative values. Although the row with negative fuel
consumption had already been deleted, it was still important to check for negative
values separately. This helps identify and handle negative values in variables that
cannot logically be negative.
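The cleaning steps above can be sketched in pandas as follows. The column names and toy values here are assumptions for illustration only; the actual ABC dataset (1168 rows) and its headers are defined in the submitted code.

```python
import pandas as pd

# Toy stand-in for the raw ABC dataset; column names are assumed.
df = pd.DataFrame({
    "Truck type": ["BL753", "SF319", "SF319", "TR100", "TR100"],
    "Payload": [180.0, 150.0, 150.0, None, 120.0],
    "Fuel consumption": [170.2, 143.1, 143.1, 118.9, -5.0],
})

df = df.dropna()                         # step 3: delete rows with missing values
df = df.drop_duplicates()                # step 4: delete duplicated rows
df = df[df["Fuel consumption"] > 0]      # steps 5-6: delete illogical negative values

df.to_csv("cleaned_data.csv", index=False)  # save the cleaned data for later steps
print(len(df))
```

On this toy input, the five raw rows reduce to two valid rows after the three filters.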
The cleaned data was saved and used for further analysis, visualization, and creation of the
MLR model. It could be used because the data cleaning process did not significantly alter the
distribution of the data. The following comparison of fuel consumption statistics supports the
insignificance of the difference:

Table 1: Analysis of measures of central tendency and dispersion

Measure     Uncleaned data   Cleaned data
Mean        152.06           151.02
Mode        120.65           120.65
Median      143.15           143.15
Skewness    5.95             1.08

The skewness of the uncleaned data (5.95) and cleaned data (1.08) shows that both datasets
are skewed to the right. Furthermore, the means (152.06 ≈ 151.02), modes (120.65 = 120.65)
and medians (143.15 = 143.15) of the two datasets are roughly the same. Therefore, it was
concluded that cleaning did not affect the overall distribution of the data.
The data cleaning process was carried out in a Jupyter Notebook using the Python programming
language. The code was submitted together with the report and can be used as a reference.

4.2. Exploratory data analysis


The application and purpose of exploratory data analysis:
1. Detecting discrepancies and anomalies.
2. Enabling insightful data analysis.
3. Establishing correlations.

4.2.1. Integrated Exploratory data analysis for all truck types.


Descriptive statistics refers to collecting, summarizing, and analysing data that is subject to
random variation and representative of the population. Unlike inferential statistics, descriptive
statistics does not equip the analyst to make inferences about the population.
Descriptive statistics encompasses two pillars:
a. Measures of central tendency which include but not limited to the Mean, Mode and
Median were calculated to evaluate the three truck types based on fuel consumption:

Table 2: Measures of central tendency of the different truck types

Truck type   Mean     Mode     Median (50% of data reached)
BL753        179.56   143.15   170.15
SF319        153.43   120.65   143.05
TR100        127.56   108.35   118.85

Interpretation and analysis:
BL753: The mean of the BL753 truck is higher than the median value, which implies that the
distribution is positively skewed (to the right). However, the data also approximates symmetry,
because the mean and median do not vary greatly in value (difference ≤ 10), suggesting
near-normality. The mode attained was 143.15; it represents the most frequently occurring value
within the dataset.
SF319: The mean is greater than the median, signalling a positively skewed distribution. The
difference between the mean and the median is of little significance, implying that the dataset
approximates normality. The attained mode was 120.65.
TR100: The dataset is skewed to the right because the mean is higher than the median.
The data also approximates normality, as the difference between the mean and median is ≤ 10.

b. Measures of dispersion, including but not limited to: variance, standard deviation,
coefficient of variation, range, skewness and kurtosis. Observed measures for the
three trucks:

Table 3: Measures of dispersion of the different truck types

Truck type   Range    Standard deviation   Skewness
BL753        172.85   35.23                Right (positive)
SF319        169.65   31.08                Right (positive)
TR100        108.35   27.16                Right (positive)

Interpretation and analysis:


BL753: The range is significantly greater than the standard deviation thus signalling the
presence of outliers or values that are altering the value of the range. The established skewness
indicates the shape of the distribution.
SF319: The range is notably larger than the standard deviation, indicating the presence of
outliers that stretch the distribution's right tail. The observed skewness provides insights into
the distribution's shape.
TR100: The range exceeds the standard deviation, indicating the impact of outliers which
influence the extent of the range. The detected skewness offers valuable inferences about the
form of data.
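The grouped statistics behind Tables 2 and 3 can be reproduced with a pandas groupby. The sample below uses a handful of illustrative values only; the report's tables were computed from the full cleaned dataset.

```python
import pandas as pd

# Illustrative sample; not the actual ABC data.
df = pd.DataFrame({
    "Truck type": ["BL753"] * 3 + ["SF319"] * 3 + ["TR100"] * 3,
    "Fuel consumption": [150.2, 170.1, 218.4,
                         130.5, 143.0, 186.8,
                         105.3, 118.8, 158.6],
})

grouped = df.groupby("Truck type")["Fuel consumption"]
stats = grouped.agg(
    mean="mean",
    median="median",
    std="std",
    range=lambda s: s.max() - s.min(),   # range = max - min
    skew="skew",                         # positive sign = right skew
)
print(stats.round(2))
```

With each truck's mean above its median, the computed skew is positive, matching the right-skew reported in Table 3.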

Appropriate visualizations:
Histogram:
Justification of the suitability of a histogram:

• Distribution type: A histogram clearly indicates the distribution of the data.
Checking for normality is relatively simplified.
• Data evaluation: The interpretation and evaluation of data is simpler. Histograms
highlight the necessity of processing the data. The presence of outliers is evident.

Figure 1: Histogram of fuel consumption by truck type

Interpretation and analysis:


BL753:
The distribution is skewed to the right (it contains a peak on the left side with a tail extending
to the right). The frequency range 125 < frequency < 150 corresponds to a fuel consumption
range of 100 < fuel consumption (L/cycle) < 150.

SF319: The distribution, like that of the BL753, is skewed to the right.
The frequency range 125 < frequency < 150 corresponds to a fuel consumption range of
100 < fuel consumption (L/cycle) < 120.

TR100: The distribution is skewed to the right (it contains a peak on the left side with a tail
extending to the right). The frequency range 75 < frequency < 100 corresponds to a fuel
consumption range of 140 < fuel consumption (L/cycle) ≤ 150.

Boxplot: Boxplots are classified as exploratory charts relevant for extracting and presenting
data in a meaningful way.

Figure 2: Boxplot of the fuel consumption of each truck type

Interpretation and analysis of Boxplot:


BL753: The boxplot displays the highest minimum, median and maximum values in comparison
to truck types TR100 and SF319, implying that at any given time truck type BL753 consumes
the most fuel.
SF319: The minimum value, median and maximum values are in between that of truck type
TR100 and BL753. The fuel consumption averages between the other two truck types.
TR100: The TR100 truck has the lowest minimum, median, and maximum value which
indicates the least consumption of fuel.
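A sketch of how Figures 1 and 2 might be produced with matplotlib and pandas. The data here is synthetic right-skewed noise centred near each truck's reported mean; the actual figures were generated from the cleaned dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for script use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic, right-skewed fuel data per truck type (illustration only).
df = pd.DataFrame({
    "Truck type": np.repeat(["BL753", "SF319", "TR100"], 300),
    "Fuel consumption": np.concatenate([
        150 + 35 * rng.gamma(2, 0.5, 300),
        125 + 31 * rng.gamma(2, 0.5, 300),
        100 + 27 * rng.gamma(2, 0.5, 300),
    ]),
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
for truck, group in df.groupby("Truck type"):
    ax1.hist(group["Fuel consumption"], bins=20, alpha=0.5, label=truck)
ax1.set_xlabel("Fuel consumption (L/cycle)")
ax1.set_ylabel("Frequency")
ax1.legend()

df.boxplot(column="Fuel consumption", by="Truck type", ax=ax2)
ax2.set_ylabel("Fuel consumption (L/cycle)")
fig.savefig("fuel_consumption_plots.png")
```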
Furthermore, the following factors influence fuel consumption:
1. Payload variance:
Payload variance refers to the difference in the mass loaded onto a truck per cycle.
Overloaded trucks consume more fuel due to the higher engine power required. Payload
should be optimised to minimise fuel consumption and increase the sustainability of
mine assets.
2. Haul road profile design and maintenance:
• Grade resistance: Grade resistance refers to the resistance encountered by
haul trucks when conveying material and minerals from the development
end to processing plants. The gradient of the road, friction and traction
contribute to increased fuel use. The power required to traverse
poorly designed haul road profiles increases fuel consumption and decreases
operational efficiency. The effects of grade resistance can be alleviated by
optimising haul road design, conducting predictive maintenance on tyre
condition, and monitoring road conditions.

• Rolling resistance: The force that counteracts the motion of a truck's tyres,
arising from tyre deformation. Rolling resistance increases the amount of
energy required to propel a truck: as haul truck tyres encounter more rolling
resistance, more energy is required to maintain a given speed, which leads
to an increase in fuel consumption. To mitigate the effects of rolling
resistance, low rolling resistance tyres and proper tyre inflation are
recommended.
3. Engine power: Engine power, also referred to as horsepower, is the rate at which an
engine can perform work over time. The higher the engine power, the more fuel is
consumed over the same distance, because more fuel is converted to energy for a higher
energy output.
4. Travelling time: Travelling time, queuing time, and loading and dumping cycles all
influence fuel consumption. The longer the cycle, the more fuel is consumed.

4.3. Multiple linear regression


4.3.1. Dummy variable encoding for categorical data
The cleaned data was used to create the MLR model. The first step is the conversion of the
categorical data in the cleaned dataset into numerical data. This was done using the dummy
variable encoding method. This conversion is important for an MLR model because it makes
categorical data usable in the model as numeric data. The Python code that performs the
conversion is shown in Appendix A.
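A minimal sketch of dummy variable encoding with pandas. The column names are assumptions, and `drop_first=True` is one common convention for avoiding redundant dummies; the report's actual code is in Appendix A.

```python
import pandas as pd

# Hypothetical columns standing in for the cleaned dataset.
df = pd.DataFrame({
    "Payload": [180.0, 150.0, 120.0, 152.0],
    "Truck type": ["BL753", "SF319", "TR100", "SF319"],
})

# drop_first=True yields k-1 dummy columns for k categories,
# avoiding the "dummy variable trap" (perfect multicollinearity).
encoded = pd.get_dummies(df, columns=["Truck type"], drop_first=True)
print(encoded.columns.tolist())
```

For the three truck types, this produces two dummy columns alongside the numeric predictors.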
4.3.2. Checking for the assumptions of linearity and non-multicollinearity
The predictor variables used in a MLR model must have a linear relationship with the response
variable. In addition, the predictors must not show multicollinearity when compared to each
other.
Linearity assumption:
Linearity is assumed in a multiple linear regression model between the dependent variable and
the various independent variables. Two methods were used to check for linearity, namely
Pearson correlation and scatter plot matrices. Pearson correlation coefficients, along with their
associated p-values, were also used to assess multicollinearity among the predictors. The null
hypothesis of a Pearson correlation test is that there is no linear correlation between the two
variables. Generally, strong correlations between predictors with p-values < 0.05 imply
statistically significant relationships and thus indicate multicollinearity, whereas p-values > 0.05
signal less evidence of multicollinearity.
These interpretations assist in the selection of predictor variables that contribute distinct
information, ensuring a more reliable interpretation of the MLR model. The correlation matrix
in Appendix 2 shows that payload has a high positive linear correlation of 0.8 with the response,
whereas queuing time, loading time, dumping time, and travelling time show moderate positive
linear correlations. Similarly, the scatter plot matrix in Appendix 3 shows that payload,
travelling time, loading time, dumping time and queuing time have a linear correlation with the
response. Hence, engine power and truck type have been excluded from the MLR model as they
show no significant evidence of linearity.
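The Pearson test described above can be sketched with SciPy on synthetic data. The slope 0.53 is borrowed from the report's fitted equation purely to build an illustrative linear relationship; the printed r and p values do not reproduce the report's correlation matrix.

```python
import numpy as np
from scipy import stats

# Synthetic predictor linearly related to fuel consumption, plus noise.
rng = np.random.default_rng(1)
payload = rng.uniform(100, 200, 200)
fuel = 56.6 + 0.53 * payload + rng.normal(0, 10, 200)

# Null hypothesis: no linear correlation between the two variables.
r, p_value = stats.pearsonr(payload, fuel)
print(f"r = {r:.2f}, p = {p_value:.3g}")
```

A strong positive r with p < 0.05 rejects the null hypothesis of no linear correlation.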
Multicollinearity was assessed with appropriate methods:
The variance inflation factor is a tool aimed at evaluating multicollinearity. The factor measures
the extent to which multicollinearity increases the variance of a regression coefficient. A VIF
of 1 indicates the absence of multicollinearity. Generally, VIF values below 5 are acceptable,
whilst values greater than 5 indicate a substantial degree of multicollinearity. Variables with
elevated VIF values may be excluded from the model to reduce multicollinearity, resulting in
more reliable regression outcome.
The variance inflation factor (VIF) was therefore computed for each candidate predictor to
measure the effect of multicollinearity on variance. The table below displays the obtained values.
Table 4: Variance inflation factor

Variable          Variance inflation factor (VIF)
Payload           1.015574
Travelling time   140.608244
Queuing time      1.106238
Loading time      97.489346
Dumping time      74.387576

Analysis:
A high variance inflation factor raises concerns about multicollinearity. Payload and queuing
time have VIFs of about 1, indicating non-multicollinearity, and therefore pose no concern. In
contrast, travelling time, loading time, and dumping time have VIFs far greater than 5, indicating
multicollinearity among those factors. Hence, only payload and queuing time have been used
in the model.
4.3.3. Fitting the multiple linear regression model
The simplified multiple linear regression model includes only the most significant predictor
variables. There are numerous advantages to implementing the model this way, some of which
are:
• Competitive predictive performance
• A more comprehensible model with reduced complexity
Evaluation of an MLR model requires the following parameters: the R-squared, the mean
squared error and the mean absolute percentage error.
The R-squared measures how much of the variance in the response variable is attributable to
the predictors in the model. Its value ranges between 0 and 1 and quantifies how well the
independent variables explain the variance in the response. The mean squared error (MSE)
quantifies the error in the model's predictions; however, the MSE is difficult to interpret as it
is expressed in squared units. Therefore, the root mean squared error (RMSE) is calculated to
obtain an error in the same units as the response. Thereafter, the mean absolute percentage
error (MAPE) is calculated; the MAPE is the percentage error between the predicted and actual
values. The R-squared, MSE, RMSE and MAPE were calculated using Python; the code is
attached in Appendix 6. The results are as follows:

R-squared = 0.83
Mean Squared Error (MSE) = 231.72 (L/cycle)²
Root Mean Squared Error (RMSE) = 15.22 L/cycle
Mean Absolute Percentage Error (MAPE) = 8.43 %

An R-squared value of 0.83 suggests that about 83% of the variation in fuel consumption is
explained by payload and queuing time, signifying a high degree of success in explaining the
variability in the response variable. An RMSE value of 15.22 L/cycle signifies that, on average,
the predictions err by around 15.22 L/cycle relative to the actual values. A MAPE value of
8.43 % indicates that, on average, the predictions exhibit an absolute percentage error of roughly
8.43 % relative to the actual values.
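A hedged sketch of fitting and evaluating such a model with scikit-learn. The data is simulated around the report's fitted coefficients, so the printed metrics illustrate the workflow but will not reproduce the report's values exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (mean_absolute_percentage_error,
                             mean_squared_error, r2_score)

# Simulated data built around the report's fitted coefficients.
rng = np.random.default_rng(3)
payload = rng.uniform(100, 200, 500)
queuing = rng.uniform(0, 20, 500)
fuel = 56.55 + 0.53 * payload + 0.21 * queuing + rng.normal(0, 10, 500)

X = np.column_stack([payload, queuing])
model = LinearRegression().fit(X, fuel)
pred = model.predict(X)

r2 = r2_score(fuel, pred)
mse = mean_squared_error(fuel, pred)
rmse = np.sqrt(mse)                                    # error in L/cycle
mape = mean_absolute_percentage_error(fuel, pred) * 100
print(f"R^2 = {r2:.2f}, MSE = {mse:.1f}, RMSE = {rmse:.1f}, MAPE = {mape:.1f}%")
```

With enough data, the fitted coefficients recover the slopes used to generate the simulation, which is a useful sanity check on the pipeline.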
The linear relationship is interpreted using the equation y = mx + c, where:
• y: Indicates a dependent variable that the equation is modelling and predicting such as
fuel consumption.
• x: Independent variable also referred to as the predictor. The variable influences the
outcome of the dependent variable.
• m: The Gradient indicates how much the predicted value changes per unit of change in
the predictor variable. A positive value signifies direct proportionality whilst a negative
gradient indicates inverse proportionality.
• c: The y-intercept represents the value of y when x is zero.
The obtained multiple linear regression equation:

Y (fuel consumption) = 56.55 + (0.53 × payload) + (0.21 × queuing time)

Analysis: Each coefficient indicates how much the estimated fuel consumption changes with a
one-unit increase in that predictor, holding the other variables constant. The payload
coefficient of 0.53 means that fuel consumption increases by about 0.53 L/cycle for each
additional unit of payload, all other variables remaining constant. Likewise, the queuing time
coefficient of 0.21 means that fuel consumption increases by about 0.21 L/cycle for each
additional unit of queuing time.
When the values of payload and queuing time are zero, the intercept represents the estimated
fuel consumption; in this scenario, when all other influences are null, the estimated fuel
consumption is approximately 56.55 L/cycle.
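The fitted equation can be applied directly to estimate fuel consumption for a new cycle; a minimal sketch, with the coefficient values taken from the equation above:

```python
# Coefficients from the fitted multiple linear regression model
INTERCEPT = 56.55   # baseline fuel consumption, L/cycle
B_PAYLOAD = 0.53    # L/cycle per unit of payload
B_QUEUE = 0.21      # L/cycle per unit of queuing time

def predict_fuel(payload, queuing_time):
    """Predict fuel consumption (L/cycle) from payload and queuing time."""
    return INTERCEPT + B_PAYLOAD * payload + B_QUEUE * queuing_time
```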
4.3.4. Evaluation and validation of the MLR model
Residual variance was used to analyse the dispersion and variability of the residuals.
Residuals normality
Residuals in regression analysis are the differences between observed and predicted values.
This report used a Quantile-Quantile (Q-Q) plot to assess the normality of the residuals. A
Q-Q plot is a graphical tool that compares the quantiles of the observed residuals to the
quantiles of a theoretical normal distribution; see the Q-Q plot attached in Appendix A.
A Q-Q plot assesses normality for the following reasons:
• The straight line in the plot represents the normal distribution. The x-axis shows the
quantiles of the theoretical normal distribution, while the y-axis shows the quantiles
of the observed residuals.
• Residuals that follow a linear pattern along this line indicate normality.
• Dispersion of the Q-Q points away from the line indicates divergence from
normality, as the points do not conform to a linear pattern.
• Skewness and heavy tails can also be detected with this method.
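A Q-Q plot can be generated with `scipy.stats.probplot`; the sketch below uses synthetic normal data as a stand-in for the model residuals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
residuals = rng.normal(0.0, 1.0, size=200)  # stand-in for model residuals

# probplot pairs each ordered residual with the matching quantile of a
# theoretical normal distribution and fits a straight line through them.
(theo_q, obs_q), (slope, intercept, r) = stats.probplot(residuals, dist="norm")

# For normally distributed residuals the points track the line closely,
# so the correlation coefficient r of the line fit is close to 1.
```

Passing `plot=plt` (with matplotlib imported as `plt`) draws the plot directly instead of only returning the quantiles.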
Uniform residual variability
Homoscedasticity is an assumption of linear regression that the dispersion of residuals
remains constant across the entire range of predictor values. Residuals were examined
through scatter plots. The test revealed a low residual variance, suggesting that the model is
suitable for the sampled data; a high residual variance would have indicated that the data
deviate from the proposed model. Scatter plots are useful here because their patterns suggest
either heteroscedasticity or homoscedasticity: heteroscedasticity shows a funnel-shaped
pattern, while homoscedasticity shows consistent variability.
Analysis: The scatter plot of residuals versus predicted values displays no discernible pattern
or trend. This indicates homoscedasticity, which implies that this assumption of the linear
regression model is satisfied. The plot is shown in Appendix A.
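Alongside the scatter plot, one simple numerical check of homoscedasticity is to compare the residual variance in the lower and upper halves of the predicted range (a rough heuristic sketch, not a formal test):

```python
import numpy as np

def variance_ratio(y_pred, residuals):
    """Ratio of residual variance in the upper half of the predicted
    range to that in the lower half. A ratio near 1 suggests constant
    spread (homoscedasticity); a large ratio hints at a funnel shape."""
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    order = np.argsort(y_pred)          # sort residuals by predicted value
    half = len(order) // 2
    low = residuals[order[:half]]
    high = residuals[order[half:]]
    return float(np.var(high) / np.var(low))
```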
Autocorrelation plot analysis and Durbin-Watson statistic
An autocorrelation plot of the residuals was obtained as follows.
Calculation of residuals:
Residuals = Y_observed − Y_predicted
The partial autocorrelation coefficients are calculated from the residual values; the outcome
displays the relation between the residuals and their lags. The coefficients are then
standardised by converting them into z-scores.
Interpretation: The autocorrelation plot of the residuals indicates that the residuals
approximate normality, as they are closely aligned with the straight line. In addition,
normality was confirmed with the Anderson-Darling statistical test. At different lags the
residuals oscillate within the range:
−0.25 < Residuals < 0.25
The deviations are minimal and not significant. The absence of autocorrelation is therefore
validated.
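The lag-k autocorrelation of the residuals can be computed directly; a sketch (for independent residuals the coefficients should stay within roughly ±0.25, as observed above):

```python
import numpy as np

def lag_autocorr(residuals, lag):
    """Sample autocorrelation of the residuals at a given lag."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()                    # centre the residuals
    return float(np.dot(r[:-lag], r[lag:]) / np.dot(r, r))
```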
The Durbin-Watson test is a statistical tool used to identify autocorrelation in the residuals of
a regression analysis. Serial correlation, also referred to as autocorrelation, occurs when a
dataset is correlated with its own previous values, producing a pattern of mutual influence
between consecutive data points.
Analysis: In this report the Durbin-Watson statistic was combined with the autocorrelation
plot to analyse autocorrelation. The Durbin-Watson statistic is interpreted as follows: a value
of 2 indicates no autocorrelation, a value below 2 signals positive autocorrelation, and a
value above 2 indicates negative autocorrelation. The attained statistic:
Durbin-Watson statistic = 2.02
The result indicates the absence of autocorrelation in the residuals, implying that the
assumption of independence within the data is validated. A demonstration of the outcome is
attached in Appendix A.
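The Durbin-Watson statistic itself is straightforward to compute from the residuals; a minimal sketch equivalent to `statsmodels.stats.stattools.durbin_watson`:

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: about 2 indicates no autocorrelation,
    values below 2 positive autocorrelation, above 2 negative."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))
```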
Overall Model Validation:
The model displays a high R-squared value of 0.83. Moreover, the model has a Mean Absolute
Percentage Error of 8.43%, which indicates excellent accuracy, as the value is below 10%.
Furthermore, the model meets the assumptions of normality, homoscedasticity and absence of
autocorrelation for the residuals. Therefore, the model meets the essential validation criteria.

Recommendations and conclusions


In this project, a multiple linear regression model was created, evaluated and validated using a
given dataset as a basis. The objective was to understand the relationships between the
predictors and the response variable, and to develop a model that could be used for predicting
fuel consumption. The requirements of the project included investigating the fuel consumption
efficiency of each truck. The initial stages of the project involved cleaning the given data and
using it for an exploratory data analysis (EDA). The key findings of the EDA showed that truck
type TR100 has the best fuel consumption efficiency. Thereafter, a multiple linear regression
model was initiated by checking which independent variables satisfy the assumptions of
linearity and non-multicollinearity. Key findings of this stage showed that payload and
queuing time were the only variables that satisfied the assumptions; hence, these two
variables were used as predictors in the model. The model was then fitted and analysed. The
key findings
in the model fitting stage showed that payload has the most influence on fuel consumption,
followed by queuing time. The final stage of the project involved the evaluation and validation
of the model. At this stage, it was discovered that the model has excellent accuracy and it meets
the essential validation criteria. In conclusion, the multiple linear regression model represents
a valuable asset for decision-making and problem-solving related to the fuel consumption of
mine haul trucks. Although the model is well-constructed, regularly updating and extending the
dataset as more information becomes accessible is crucial. Increasing the dataset has the
potential to improve the predictive capabilities of the model. Additionally, it is advisable to
establish a monitoring system to continually evaluate the model's performance over time.
Consistently appraising the model's accuracy can help maintain its relevance and effectiveness.
Furthermore, implementing the k-fold cross-validation technique is also recommended for
assessing the model's stability and its ability to generalize.
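The recommended k-fold cross-validation could be sketched as follows (in practice `sklearn.model_selection.KFold` provides this directly; the helper below is a hypothetical minimal version):

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k folds.
    Each fold serves once as the validation set while the
    remaining folds are used to refit the regression model."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    return np.array_split(idx, k)
```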

References

Alamdari, S., Basiri, M.H., Mousavi, A. and Soofastaei, A., 2022. Application of machine
learning techniques to predict haul truck fuel consumption in open-pit mines. Journal of
Mining and Environment, 13(1), pp.69-85.

Camizuli, E. and Carranza, E.J., 2018. Exploratory data analysis (EDA). The encyclopedia of
archaeological sciences, pp.1-7.
Dasari, D. and Varma, P.S., 2022. Data cleaning techniques using Python. Technology, 1(1),
pp.11-21.

Nongthombam, K. and Sharma, D., 2021. Data analysis using Python. International Journal
of Engineering Research & Technology (IJERT), 10(7).

Dindarloo, S.R. and Siami-Irdemoosa, E., 2016. Determinants of fuel consumption in mining
trucks. Energy, 112, pp.232-240.

Sahoo, K., Samal, A.K., Pramanik, J. and Pani, S.K., 2019. Exploratory data analysis using
Python. International Journal of Innovative Technology and Exploring Engineering, 8(12),
pp.4727-4735.
Soofastaei, A. and Fouladgar, M., 2022. Energy Efficiency Improvement in Surface
Mining. Energy Recovery.

Terpstra, V.J., Lara-Yejas, O., Mokhtari, K., Santa Cruz, J.H. and de Mattos, M.P., 2021. Fuel
Optimization in Mining Trucks using Machine Learning. In IIE Annual Conference.
Proceedings (pp. 369-374). Institute of Industrial and Systems Engineers (IISE).

Van den Broeck, J. et al., 2005. Data cleaning: Detecting, diagnosing, and editing data
abnormalities. PLoS Medicine, 2(10). doi:10.1371/journal.pmed.0020267.

Appendix A [Multi- Linear regression]

Figure 3: Durbin Watson Code

Figure 4: Autocorrelation plot of residuals

Figure 5: Residual vs Predicted values plot

Figure 6: Quantile plot of residuals

Table 5: Dummy variable encoding for categorical data
