Analysis of Mine Haul Truck Fuel Consumption Report
Johannesburg
School of Mining Engineering
Course: Digital Technologies and Mine Data Analytics
Analysis of Mine Haul Truck Fuel Consumption
Authors:
Vangile Thabethe
Kamogelo Oliver
Tukiso Tsomakae
Patience Purazi
DECLARATION:
We declare that this report is our own, unaided work. We have read the University Policy
on Plagiarism and hereby confirm that there is no plagiarism in this report. We also
confirm that there is no copying nor is there any copyright infringement. We willingly
submit to any investigation in this regard by the School of Mining Engineering and we
undertake to abide by the decision of any such investigation.
Abstract
This report covers the delineation of the data, the detection and elimination of discrepancies,
exploratory analysis, and the fitting of a multi-linear regression model to data collected by a
hypothetical company, ABC. The regression was carried out in Python using Anaconda, a data
science platform that distributes Python together with commonly used libraries. Several factors,
including payload, queuing time (cycle times), engine power, rolling resistance and grade
resistance, were evaluated for their impact on fuel consumption in mine haul trucks. The
assumptions of linearity, independence of observations, homoscedasticity and normality were
assessed (using descriptive statistics of the quantitative data) to validate the suitability of
the model. The model revealed that only two predictors, payload and queuing time, raised no
multicollinearity concerns. The assessment of those factors indicated how truck choice can
influence fuel consumption and increase operational costs. The report further provides
visualisations to support data exploration, model selection and interpretability. The study is
motivated by the need to simplify complex problems within the operational environment in order
to reduce the fuel costs incurred by company ABC.
Acknowledgements
University of the Witwatersrand,
School of Mining Engineering,
Course instructors: Mr M. Mabala, Ms M. Madahana
Table of Contents
1. Introduction
2. Literature review
2.1. Case Study 1: Application of Machine Learning Techniques to Predict Haul Truck Fuel Consumption in Open-Pit Mines
2.2. Case Study 2: Energy efficiency improvement in surface Mining
2.3. Case Study 3: Fuel optimization in Mining trucks using machine learning
2.4. Case Study 4: Determinants of fuel consumption in Mining trucks
3. Methodology
3.1. Applied Machine Learning models and parameters
3.2. Data processing procedures
4. Results and Analysis
4.1. Data cleaning
4.2. Exploratory data analysis
4.2.1. Integrated Exploratory data analysis for all truck types
4.3. Multiple linear regression
4.3.1. Dummy variable encoding for categorical data
4.3.2. Checking for the assumptions of linearity and non-multicollinearity
4.3.3. Fitting Multi-linear regression model
4.3.4. Evaluation and validation of model
Recommendations and Conclusions
References
List of tables
Table 1: Analysis of measures of central tendency and dispersion
Table 2: Measures of central tendency of the different truck types
Table 3: Measures of dispersion of the different truck types
Table 4: Variance inflation factor
Table 5: Dummy variable encoding for categorical data
List of figures
Figure 1: Histogram of fuel consumption by truck type
Figure 2: Boxplot of the fuel consumption of each truck type
Figure 3: Durbin Watson Code
Figure 4: Autocorrelation plot of residuals
Figure 5: Residual vs Predicted values plot
Figure 6: Quantile plot of residuals
List of abbreviations
EDA Exploratory data analysis
MLR Multi-linear regression
1. Introduction
Reducing mine operating costs depends largely on the efficiency of the haulage system. Mine
haulage systems are a vital component of mining activities, conveying extracted material from
the development end to various facilities for processing. ABC Mining Company aims to reduce its
operational expenses and environmental impact by investigating the fuel consumed during mining
operations. The company collected data on various parameters that affect fuel consumption.
Data scrubbing, delineation and analysis will help the company make informed decisions about
fuel optimisation and efficiency. The prime objective is to reduce the costs incurred while
conserving the environment by minimising carbon emissions and contributing to environmental
sustainability.
Several factors were analysed to understand how they affect fuel consumption: payload, loading
time, travelling time, queuing time, dumping time, truck type and engine power. These
parameters were analysed for three truck types with different engine power: BL753, SF319 and
TR100.
This report aims to conduct data cleaning, exploratory data analysis and multi-linear regression
using Python, a programming language, and Anaconda, a data science platform.
2. Literature review
2.1. Case study 1: Application of Machine Learning Techniques to Predict Haul Truck Fuel
Consumption in Open-Pit Mines
The objective of this study is to create a model for assessing the diesel fuel consumption of
mining haul trucks. To accomplish this, various machine learning techniques were employed,
namely multiple linear regression, random forest, artificial neural network, support vector
machine, and kernel nearest neighbour. These methods were used to make predictions about
the fuel consumption of haul trucks based on several independent variables, including payload,
total resistance, and actual speed. The results of the study reveal that the artificial neural
network outperformed the other models in terms of accuracy. In contrast, the multiple linear
regression model displayed the least favourable performance across all statistical metrics. As a
final step, a sensitivity analysis was conducted to determine the significance of the independent
variables (Alamdari, Basiri, Mousavi and Soofastaei, 2022).
2.2. Case study 2: Energy Efficiency Improvement in Surface Mining
This case study provides an overview of energy efficiency within the mining industry,
specifically focusing on the impact of fuel consumption in mining hauling operations. Diesel
consumption is singled out as a critical cost factor with substantial environmental implications
in surface mining. The objective of this research is to create an advanced data analytics model
for estimating the energy efficiency of haul trucks used in surface mining, with the ultimate
aim of reducing operational expenses (Soofastaei and Fouladgar, 2022).
The prediction of truck fuel consumption hinges on several key factors, namely total resistance,
truck payload, and truck speed. To achieve this prediction, a comprehensive analysis
framework is constructed. This framework is based on the development of a fitness function
derived from a model that links fuel consumption with its influencing factors. Subsequently,
the model is trained and validated using real data collected from significant surface mines in
Australia through field research. Ultimately, an artificial neural network is chosen as the tool
for predicting haul truck fuel consumption (Soofastaei and Fouladgar, 2022).
2.4. Case study 4: Determinants of Fuel Consumption in Mining Trucks
This case study seeks to forecast the fuel consumption for each operational cycle of heavy
mining dump trucks within a specific mining site, where these cycles predominantly involve
loading, hauling and dumping activities. The primary goal of this study is to provide insights
into estimating the fuel expenditures associated with different operational scenarios. The aim
is to identify scenarios that minimize both fuel consumption and the emission of greenhouse
gases. The study employs a combination of statistical techniques, including partial least
squares regression (PLSR) and autoregressive integrated moving average (ARIMA), to make
predictions regarding fuel consumption, drawing on the patterns observed during cyclic
activities (Dindarloo and Siami-Irdemoosa, 2016). The study provided the following actionable
recommendations:
• Route Optimization: The study emphasized the importance of optimizing haul truck
routes to minimize the distance travelled and reduce fuel consumption. This involves
the use of advanced GPS and route planning software to select the most efficient paths
for each haul.
• The study recommended using fuel consumption data and analytics to identify
inefficiencies and develop strategies for improvement.
• Fuel Quality Control: Ensuring the quality of the fuel used in haul trucks is essential.
The case study suggested implementing a rigorous fuel quality control system to
prevent contamination and degradation of fuel, which can negatively impact fuel
efficiency.
3. Methodology
3.1. Applied Machine Learning Models and Parameters
Machine learning is a branch of artificial intelligence dedicated to developing algorithms and
statistical models that enable computer systems to analyse data and make inferences from it.
The essence of machine learning lies in its ability to leverage data for pattern identification
and prediction. The data collected by the ABC mining company (in CSV format) to examine how the
parameters affect fuel consumption was analysed using Python, a programming language distributed
with Anaconda. Anaconda is an open-source platform that allows code to be executed in Jupyter
Notebook with imported libraries. The libraries fulfilled different functions: matplotlib and
Seaborn were used for plotting and visualisation, scikit-learn provided machine learning
utilities, and pandas, SciPy and NumPy enabled the processing of data into comprehensible
information.
3.2. Data processing procedures
Data delineation: The process of describing data by highlighting the characteristics of its
distribution. The shape and structure of the data were established.
Data scrubbing: The detection of errors and discrepancies, removal of duplicates, handling of
outliers and normalisation of data. Data scrubbing was conducted to ensure that inferences are
drawn from accurate data.
Exploratory data analysis: EDA enables data professionals to identify challenges and guides
data modelling. EDA was implemented to identify and counteract data inconsistencies.
Application of suitable model: The multi-linear regression model was found fit to model the
data. The assumptions made by the model, which include linearity and independence of
observations, were validated by analysing the data.
Table 1: Analysis of measures of central tendency and dispersion
The skewness of the uncleaned data (5.95) and of the cleaned data (1.08) shows that both
datasets are skewed to the right. Furthermore, the means (152.06 ≈ 151.02), modes
(120.65 = 120.65) and medians (143.15 = 143.15) of the two datasets are roughly the same.
Therefore, it was concluded that cleaning the data did not affect the overall distribution of
the uncleaned data.
The data cleaning process was carried out in Jupyter Notebook using the Python programming
language. The code was submitted together with the report and can be used as a reference.
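The comparison between uncleaned and cleaned data can be illustrated with a minimal sketch. The data below are synthetic, and the column names (`cycle_id`, `fuel_l_per_cycle`), the injected duplicates and the 1.5 × IQR outlier rule are assumptions made for illustration only; they are not taken from the submitted notebook.

```python
import numpy as np
import pandas as pd


def summarise(series: pd.Series) -> dict:
    """Return the central tendency and skewness measures for one column."""
    return {
        "mean": series.mean(),
        "median": series.median(),
        "mode": series.mode().iloc[0],  # first mode if there are several
        "skewness": series.skew(),
    }


# Synthetic stand-in for the fuel data (hypothetical values, not the ABC dataset).
rng = np.random.default_rng(0)
clean_part = pd.DataFrame({
    "cycle_id": np.arange(500),
    "fuel_l_per_cycle": np.round(rng.normal(150, 20, 500), 2),
})
outliers = pd.DataFrame({"cycle_id": [500, 501], "fuel_l_per_cycle": [900.0, 1200.0]})
# Uncleaned data: 10 duplicated rows and 2 extreme outliers added on purpose.
raw = pd.concat([clean_part, clean_part.iloc[:10], outliers], ignore_index=True)

# Step 1: remove exact duplicate rows.
deduplicated = raw.drop_duplicates()

# Step 2: remove values outside the 1.5 x IQR fences (assumed outlier rule).
fuel = deduplicated["fuel_l_per_cycle"]
q1, q3 = fuel.quantile(0.25), fuel.quantile(0.75)
iqr = q3 - q1
cleaned = deduplicated[fuel.between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Side-by-side comparison of the two datasets.
comparison = pd.DataFrame({
    "uncleaned": summarise(raw["fuel_l_per_cycle"]),
    "cleaned": summarise(cleaned["fuel_l_per_cycle"]),
})
print(comparison.round(2))
```

In this sketch the skewness drops sharply after cleaning while the median barely moves, which is the same kind of evidence the paragraph above relies on.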
Interpretation and analysis:
BL753: The mean of the BL753 truck is higher than the median, which implies that the
distribution is positively skewed (to the right). However, the data approximates symmetry
because the mean and median do not differ greatly (difference ≤ 10), thus suggesting
normality. The mode was 143.15, which is the most frequently occurring value in the dataset.
SF319: The mean is greater than the median, signalling a positively skewed distribution. The
difference between the mean and the median is negligible, implying that the dataset
approximates normality. The mode was 120.65.
TR100: The dataset is skewed to the right because the mean is higher than the median. The data
also approximates normality as the difference between the mean and mode is ≤ 10.
b. Measures of dispersion
These include, but are not limited to, variance, standard deviation, coefficient of variation,
range, skewness and kurtosis. Observed measures of the three trucks:
Appropriate visualizations:
Histogram:
Justification of the suitability of a histogram:
• Distribution type: A histogram clearly indicates the distribution of the data, which
simplifies checking for normality.
• Data evaluation: The interpretation and evaluation of data is simpler. Histograms
highlight the need for further processing of the data, and the presence of outliers is evident.
SF319: The distribution, similarly to that of the BL753, is skewed to the right. The frequency
range 125 < frequency < 150 corresponds to the fuel consumption range:
100 < fuel consumption (L/cycle) < 120
TR100: The distribution is skewed to the right (it has a peak on the left side of the
distribution with a tail extending to the right). The frequency range 75 < frequency < 100
corresponds to the fuel consumption range:
140 < fuel consumption (L/cycle) ≤ 150
Boxplot: Boxplots are exploratory charts that extract and present data in a meaningful way.
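The quantities that a boxplot displays can be computed directly, as in the hedged sketch below. The input values are hypothetical, and the 1.5 × IQR whisker rule is the common plotting default assumed here rather than a setting confirmed by the report.

```python
import numpy as np


def boxplot_summary(values):
    """Compute the statistics a boxplot draws for one group of observations."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations that lie inside the fences.
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": values[(values < lower_fence) | (values > upper_fence)],
    }


# Hypothetical fuel consumption values (L/cycle) for one truck type.
example = [100, 110, 120, 130, 140, 150, 400]
summary = boxplot_summary(example)
print(summary)
```

Applying the function to each truck type separately gives the numbers behind a grouped boxplot; here the value 400 falls outside the upper fence and would be drawn as an outlier.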
operational efficiency. The effects of grade resistance can be alleviated
by optimising haul road design, conducting predictive maintenance on
tyre condition and monitoring road conditions.
• Rolling resistance: The force that counteracts the motion of a truck tyre. The
force is a consequence of tyre deformation. Rolling resistance increases the
amount of energy required to propel a truck. As haul truck tyres encounter
more rolling resistance, more energy is required to maintain a given speed.
This leads to an increase in fuel consumption. To reduce the effects of
rolling resistance, using low rolling resistance tyres and maintaining proper
tyre inflation are recommended.
3. Engine power: Engine power, also referred to as horsepower, is the rate at which
an engine can perform work over time. The higher the engine power, the more fuel is
consumed over the same distance. The higher the energy output, the more fuel is
converted to energy.
4. Travelling time: Travelling time, queuing time, and loading and dumping cycles
influence fuel consumption. The longer the cycle, the more fuel is consumed.
Hence, engine power and truck type have been excluded from the MLR model as they show
no significant evidence of linearity.
Multicollinearity assessed with appropriate methods.
The variance inflation factor (VIF) is a tool for evaluating multicollinearity. It measures
the extent to which multicollinearity increases the variance of a regression coefficient. A VIF
of 1 indicates the absence of multicollinearity. Generally, VIF values below 5 are acceptable,
whilst values greater than 5 indicate a substantial degree of multicollinearity. Variables with
elevated VIF values may be excluded from the model to reduce multicollinearity, resulting in
more reliable regression outcomes.
The table below displays the VIF values obtained.
Table 4: Variance inflation factor
Analysis:
A high variance inflation factor raises concerns about multicollinearity. The VIFs table shows
that payload and queuing time have VIFs of about 1, indicating no multicollinearity. In
contrast, travelling time, loading time and dumping time have VIFs greater than 5, which
indicates multicollinearity among those factors. Hence, only payload and queuing time have
been used in the model.
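The VIF screening described above can be sketched as follows. Each predictor is regressed on the remaining predictors and its VIF is 1 / (1 − R²) of that auxiliary regression. The data are synthetic and deliberately constructed so that the three time variables are collinear; the variable names and magnitudes are illustrative assumptions, not the ABC data.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X, computed as 1 / (1 - R^2)."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    vifs = []
    for j in range(n_cols):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Auxiliary regression of predictor j on the other predictors.
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = residuals @ residuals
        ss_tot = np.sum((target - target.mean()) ** 2)
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)


# Synthetic predictors (hypothetical): payload and queuing time are independent,
# while loading and dumping time are built from travelling time.
rng = np.random.default_rng(1)
n = 500
payload = rng.normal(200, 20, n)
queuing = rng.normal(5, 1, n)
travelling = rng.normal(20, 3, n)
loading = 0.2 * travelling + rng.normal(0, 0.1, n)
dumping = 0.1 * travelling + rng.normal(0, 0.05, n)

names = ["payload", "queuing", "travelling", "loading", "dumping"]
vifs = variance_inflation_factors(
    np.column_stack([payload, queuing, travelling, loading, dumping])
)
for name, vif in zip(names, vifs):
    print(f"{name}: {vif:.2f}")
```

With this construction the first two VIFs stay close to 1 and the last three exceed the threshold of 5, mirroring the pattern reported in the analysis.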
4.3.3. Fitting model – Multi Linear regression
The simplified multi-linear regression model includes solely the most significant predictor
variables. There are several advantages to implementing the simplified model, some of which
are:
• Competitive predictive performance
• A comprehensible model with reduced complexity
Variance
Evaluation of an MLR model requires the following parameters: the R-squared, the mean squared
error and the mean absolute percentage error.
The R-squared is a measure of how much of the variance in the response variable is explained
by the predictors in the model. The R-squared value ranges between 0 and 1. The value
quantifies how well the independent variables explain the variance in the response. The mean
squared error (MSE) quantifies the error in the predictions of the model. However, the MSE is
difficult to interpret as it is expressed in squared units. Therefore, the root mean squared
error (RMSE) is calculated to obtain an error in the same units as the response. Thereafter,
the mean absolute percentage error (MAPE) is calculated. The MAPE is the average absolute
percentage error between the predicted and actual values.
1.4.1. Calculations of evaluation parameters
The R-squared, MSE, RMSE and MAPE were calculated using Python; the code is attached in
appendix 6. The results are as follows:
R-squared = 0.83
Mean Squared Error (MSE) = 231.72 (L/cycle)²
Root Mean Squared Error (RMSE) = 15.22 L/cycle
Mean Absolute Percentage Error (MAPE) = 8.43 %
1.4.2.
An R-squared value of 0.83 suggests that about 83% of the variation in fuel consumption is
explained by payload and queuing time, indicating that the model accounts for most of the
variability in the response variable. An RMSE value of 15.22 L/cycle signifies that the
predictions typically deviate from the actual values by around 15.22 L/cycle. A MAPE value of
8.43 % indicates that, on average, the predictions exhibit an absolute percentage error of
roughly 8.43 % in relation to the actual values.
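The four evaluation parameters can be computed from observed and predicted values as in the sketch below. The function name and the three example values are hypothetical and only demonstrate the formulas; they are not the appendix code or the report's data.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R-squared, MSE, RMSE and MAPE (in percent) for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = np.mean(errors ** 2)
    rmse = np.sqrt(mse)
    # MAPE is undefined when an observed value is zero.
    mape = np.mean(np.abs(errors / y_true)) * 100
    ss_res = np.sum(errors ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    return {"r_squared": r_squared, "mse": mse, "rmse": rmse, "mape": mape}


# Hypothetical observed and predicted fuel consumption values (L/cycle).
observed = [100.0, 150.0, 200.0]
predicted = [110.0, 140.0, 210.0]
metrics = regression_metrics(observed, predicted)
print({name: round(value, 2) for name, value in metrics.items()})
```

As a consistency check on the reported results, the RMSE is the square root of the MSE: √231.72 ≈ 15.22 L/cycle, which agrees with the values listed above.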
The linear relationship is interpreted using the equation y = mx + c, where:
• y: The dependent variable that the equation models and predicts, such as fuel
consumption.
• x: The independent variable, also referred to as the predictor. This variable influences
the outcome of the dependent variable.
• m: The gradient indicates how much the predicted value changes per unit change in the
predictor variable. A positive gradient signifies that y increases with x, whilst a
negative gradient indicates that y decreases as x increases.
• c: The y-intercept represents the value of y when x is zero.
The obtained linear regression equation
Analysis: Each coefficient indicates the contribution of a particular factor to fuel
consumption. Payload contributes 53% to fuel consumption, which indicates how the estimated
fuel consumption changes with a one-unit increase in payload if all other variables remain
constant. Queuing time contributes 21% to fuel consumption, which indicates how the estimated
fuel consumption changes with a one-unit increase in queuing time if all other variables
remain constant. The intercept represents the estimated fuel consumption when the values of
payload and queuing time are zero. In this scenario, when all other influences are null, the
estimated fuel consumption is approximately 56.55 L/cycle.
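A two-predictor model of this form can be fitted by ordinary least squares, as in the minimal sketch below. The data are synthetic, and the generating values (intercept 50, slopes 0.5 and 2.0, noise of 5 L/cycle) are arbitrary assumptions chosen for illustration; they are not the coefficients obtained in this report.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Fit y = intercept + X @ slopes by ordinary least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # column of ones for the intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]


# Synthetic data (hypothetical): payload in tonnes, queuing time in minutes.
rng = np.random.default_rng(42)
payload = rng.uniform(150, 250, 300)
queuing = rng.uniform(0, 15, 300)
fuel = 50.0 + 0.5 * payload + 2.0 * queuing + rng.normal(0, 5, 300)

predictors = np.column_stack([payload, queuing])
intercept, slopes = fit_linear_regression(predictors, fuel)
predicted = intercept + predictors @ slopes

print(f"intercept = {intercept:.2f}")
print(f"payload slope = {slopes[0]:.3f}, queuing slope = {slopes[1]:.3f}")
```

Each fitted slope is read as the change in estimated fuel consumption for a one-unit increase in that predictor with the other held constant, and the intercept is the estimate when both predictors are zero.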
4.3.4. Evaluation and validation of MLR Model
Residual variance was used to analyse the dispersion and variability of the residuals.
Residuals Normality
Residuals in regression analysis are the differences between observed and predicted
values. This report used the quantile-quantile (Q-Q) plot to assess the normality of the
residuals. A Q-Q plot is a graphical tool that compares the quantiles of the observed
residuals with the quantiles of a theoretical normal distribution; see the Q-Q plot in
appendix A.
There are several reasons why a Q-Q plot is suitable for validating normality:
• The line in the plot represents the normal distribution. The x-axis represents the
quantiles of this theoretical normal distribution, while the y-axis represents the
quantiles of the observed residuals.
• Residuals that closely follow the line indicate normality.
• Points that are dispersed away from the line indicate divergence from normality, as
they do not conform to a linear pattern.
• Skewness and heavy tails may be detected through this method.
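The points of a Q-Q plot can be computed without a plotting library, as sketched below. The plotting positions (i − 0.5)/n and the use of the correlation between the two sets of quantiles as a numeric summary are assumptions of this sketch, not the method used in the report; the residuals are synthetic.

```python
import numpy as np
from statistics import NormalDist


def qq_points(residuals):
    """Return (theoretical normal quantiles, sorted observed residuals)."""
    observed = np.sort(np.asarray(residuals, dtype=float))
    n = len(observed)
    probabilities = (np.arange(1, n + 1) - 0.5) / n  # assumed plotting positions
    standard_normal = NormalDist()
    theoretical = np.array([standard_normal.inv_cdf(p) for p in probabilities])
    return theoretical, observed


def qq_correlation(residuals):
    """Correlation between theoretical and observed quantiles (1.0 = straight line)."""
    theoretical, observed = qq_points(residuals)
    return np.corrcoef(theoretical, observed)[0, 1]


# Synthetic residuals (hypothetical): one normal sample, one right-skewed sample.
rng = np.random.default_rng(5)
normal_residuals = rng.normal(0, 15, 500)
skewed_residuals = rng.exponential(15, 500) - 15

normal_r = qq_correlation(normal_residuals)
skewed_r = qq_correlation(skewed_residuals)
print(f"normal sample: r = {normal_r:.3f}")
print(f"skewed sample: r = {skewed_r:.3f}")
```

A correlation close to 1 corresponds to points lying along the reference line, while the skewed sample bends away from it and yields a lower value.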
Uniform residual variability
Homoscedasticity is an assumption of linear regression stating that the dispersion of the
residuals remains constant across the entire range of predictor values. Residuals are
examined through scatter plots. The test revealed a low residual variance, which suggests
that the model is suitable for the sampled data. A high residual variance would have revealed
a deviation of the data from the proposed model. Scatter plots are useful because their
patterns suggest either heteroscedasticity or homoscedasticity: heteroscedasticity shows a
funnel-shaped pattern, while homoscedasticity shows consistent variability.
Analysis: The scatterplot of residuals versus predicted values shows no discernible pattern
or trend. The outcome indicates homoscedasticity, which implies that this assumption of the
linear regression model is validated. The plot is displayed in appendix A.
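A simple numeric companion to the residual scatterplot is to compare the spread of the residuals for low and high predicted values. This split-half ratio is an illustrative check assumed here, not the procedure used in the report, and the data are synthetic.

```python
import numpy as np


def spread_ratio(predicted, residuals):
    """Ratio of residual standard deviations: upper half / lower half of predictions."""
    predicted = np.asarray(predicted, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    order = np.argsort(predicted)
    half = len(order) // 2
    low = residuals[order[:half]]
    high = residuals[order[half:]]
    return high.std(ddof=1) / low.std(ddof=1)


# Synthetic predictions and residuals (hypothetical values).
rng = np.random.default_rng(7)
predicted = rng.uniform(100, 200, 400)
constant_spread = rng.normal(0, 10, 400)                     # homoscedastic
funnel_spread = rng.normal(0, 0.2 * (predicted - 100) + 1)   # spread grows with prediction

homoscedastic_ratio = spread_ratio(predicted, constant_spread)
funnel_ratio = spread_ratio(predicted, funnel_spread)
print(f"constant spread: ratio = {homoscedastic_ratio:.2f}")
print(f"funnel shape:    ratio = {funnel_ratio:.2f}")
```

A ratio near 1 is consistent with homoscedasticity, whereas a ratio well above 1 corresponds to the funnel-shaped pattern described above.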
Autocorrelation plot analysis and Durbin-Watson statistic
An autocorrelation plot of the residuals was obtained through the following methodology:
Calculation of residuals:
Residuals = Y_observed − Y_predicted
The partial autocorrelation coefficients are calculated from the residual values. The
outcome displays the relation between lags and residuals. The coefficients are further
standardised by converting them into z-scores.
Interpretation: The autocorrelation plot of residuals indicates that the residuals
approximate normality as they are closely aligned to the straight line. In addition,
normality was confirmed with a statistical test known as the Anderson-Darling test. At
different lags the residuals oscillate within the range stipulated below:
−0.25 < Residuals < 0.25
The deviations are minimal and are of no significance. The absence of autocorrelation is
therefore validated.
The Durbin-Watson statistic is a statistical tool applied to identify autocorrelation within
the residuals of a regression analysis. Serial correlation, also referred to as
autocorrelation, is a statistical concept in which a dataset is correlated with its own
previous values. This produces a pattern of mutual influence between consecutive data points.
Analysis: In this report the Durbin-Watson statistic, together with the autocorrelation
plot, was used to analyse autocorrelation. The Durbin-Watson statistic ranges from 0 to 4 and
is classified as follows: a value of 2 indicates no autocorrelation, a value less than 2
signals positive autocorrelation and a value above 2 indicates negative autocorrelation.
The attained Durbin-Watson statistic:
Durbin-Watson statistic = 2.02
The result indicates the absence of autocorrelation in the residuals. This implies that the
assumption of independence within the data is validated. A demonstration of the outcome is
attached in appendix A.
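The Durbin-Watson statistic and a lag autocorrelation can be computed from the residuals as sketched below. The statistic is the sum of squared successive differences divided by the sum of squared residuals. The residual series are synthetic, and the function shown computes the ordinary (not partial) autocorrelation, which is a simplification assumed for this sketch.

```python
import numpy as np


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    r = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(r) ** 2) / np.sum(r ** 2)


def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at a positive integer lag."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    return np.sum(r[lag:] * r[:-lag]) / np.sum(r ** 2)


# Synthetic residuals (hypothetical): independent noise and a positively correlated series.
rng = np.random.default_rng(3)
independent = rng.normal(0, 1, 1000)
correlated = np.zeros(1000)
for t in range(1, 1000):
    correlated[t] = 0.8 * correlated[t - 1] + independent[t]

dw_independent = durbin_watson(independent)
dw_correlated = durbin_watson(correlated)
print(f"independent residuals: DW = {dw_independent:.2f}")
print(f"correlated residuals:  DW = {dw_correlated:.2f}")
print(f"lag-1 autocorrelation (independent) = {autocorrelation(independent, 1):.3f}")
```

Independent residuals give a value near 2, while the positively correlated series gives a value well below 2, in line with the classification above.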
Overall Model Validation:
The model displays a high R-squared value of 0.83. Moreover, the model has a mean absolute
percentage error of 8.43%, which indicates excellent accuracy as the value is less than 10%.
Furthermore, the model meets the assumptions of normality, homoscedasticity and absence of
autocorrelation for the residuals. Therefore, the model meets the essential validation
criteria.
a valuable asset for decision-making and problem-solving related to fuel consumption of mine
haul trucks. Although the model is well constructed, it is crucial to update and extend the
dataset regularly as more information becomes accessible. Increasing the size of the dataset
has the potential to improve the predictive capabilities of the model. Additionally, it is
advisable to establish a monitoring system to evaluate the model's performance continually
over time. Consistently appraising the model's accuracy can help maintain its relevance and
effectiveness. Furthermore, implementing the k-fold cross-validation technique is recommended
for assessing the model's stability and its ability to generalize.
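The recommended k-fold cross-validation can be sketched as follows: the data are split into k folds, the model is fitted on k − 1 folds and its RMSE is measured on the held-out fold, in turn for every fold. The data, the choice of k = 5 and the function name are assumptions for illustration only.

```python
import numpy as np


def k_fold_rmse(X, y, k=5, seed=0):
    """Return the held-out RMSE of a least-squares linear model for each of k folds."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)  # shuffled, near-equal folds
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the training folds only.
        design_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(design_train, y[train_idx], rcond=None)
        # Evaluate on the held-out fold.
        design_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        errors = y[test_idx] - design_test @ coef
        scores.append(np.sqrt(np.mean(errors ** 2)))
    return np.array(scores)


# Synthetic data (hypothetical relation and noise level, not the ABC dataset).
rng = np.random.default_rng(11)
payload = rng.uniform(150, 250, 300)
queuing = rng.uniform(0, 15, 300)
fuel = 50.0 + 0.5 * payload + 2.0 * queuing + rng.normal(0, 5, 300)

scores = k_fold_rmse(np.column_stack([payload, queuing]), fuel, k=5)
print("RMSE per fold:", np.round(scores, 2))
print(f"mean = {scores.mean():.2f}, std = {scores.std(ddof=1):.2f}")
```

Similar RMSE values across folds suggest a stable model, whereas a large spread between folds would indicate sensitivity to the particular sample used for fitting.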
References
Alamdari, S., Basiri, M.H., Mousavi, A. and Soofastaei, A., 2022. Application of machine
learning techniques to predict haul truck fuel consumption in open-pit mines. Journal of
Mining and Environment, 13(1), pp.69-85.
Camizuli, E. and Carranza, E.J., 2018. Exploratory data analysis (EDA). The encyclopedia of
archaeological sciences, pp.1-7.
Dasari, D. and Varma, P.S., 2022. Data Cleaning Techniques Using Python. Technology, 1(1),
pp.11-21.
Dindarloo, S.R. and Siami-Irdemoosa, E., 2016. Determinants of fuel consumption in mining
trucks. Energy, 112, pp.232-240.
Nongthombam, K. and Sharma, D., 2021. Data Analysis using Python. International
Journal of Engineering Research & Technology (IJERT), 10(7).
Sahoo, K., Samal, A.K., Pramanik, J. and Pani, S.K., 2019. Exploratory data analysis using
Python. International Journal of Innovative Technology and Exploring Engineering, 8(12),
pp.4727-4735.
Soofastaei, A. and Fouladgar, M., 2022. Energy Efficiency Improvement in Surface
Mining. Energy Recovery.
Terpstra, V.J., Lara-Yejas, O., Mokhtari, K., Santa Cruz, J.H. and de Mattos, M.P., 2021. Fuel
Optimization in Mining Trucks using Machine Learning. In IIE Annual Conference.
Proceedings (pp. 369-374). Institute of Industrial and Systems Engineers (IISE).
Van den Broeck, J. et al., 2005. Data cleaning: Detecting, diagnosing, and editing data
abnormalities. PLoS Medicine, 2(10). doi:10.1371/journal.pmed.0020267.
Appendix A [Multi-linear regression]
Figure 5: Residual vs Predicted values plot
Table 5: Dummy variable encoding for categorical data