Group D
Impact of GIS Integration using ML and AI in Agriculture
Report by Group D
Nandani Tripathi (23BCY10180)
Dheeraj Saraswat (23BCY10146)
Arav Achari (23BCE11788)
Yuvraj Singh (23BCY10138)
Yohaan Gambhir (23BCE11699)
Impact of GIS Integration using ML and AI in Agriculture
Introduction
The integration of Geographic Information Systems (GIS) with Machine Learning (ML) and Artificial Intelligence
(AI) has transformative implications for modernizing agriculture. This report explores the methodologies employed
and the datasets used, and evaluates the impact of GIS integrated with ML and AI on agricultural practices.
Technological innovations over recent centuries have enabled us to significantly boost agricultural production
to feed a rapidly growing global population. Although advances in digital technologies have triggered the onset of
the fourth agricultural revolution, we still face several challenges, such as limited cropland, diminishing water
resources, and climate change, which underscore the need for unprecedented measures to make agriculture resilient
enough to support the world population. Geographic information systems (GIS), along with partner technologies such
as remote sensing, the Global Positioning System (GPS), artificial intelligence, computational systems, and data
analytics, play a pivotal role in monitoring crops and in implementing optimal, targeted management practices to
improve crop productivity.
As the world population is projected to approach 10 billion by 2050, we will need to produce about 50% more food
than in 2013 to meet global demand. This goal must be met despite climate change, the limited scope for expanding
arable land, and dwindling water resources. Future food production must also incorporate sustainable cropland
management practices that preserve soil health, conserve water resources, and protect biodiversity. Given these
challenges and constraints, there is an unprecedented need to monitor crop growth and health and to intervene in a
timely manner, maintaining or improving crop productivity while reducing the wastage of inputs and resources.
Advances in sensors, communication technologies, computational systems, and powerful data analytics are making these
tasks possible. Technologies that use agricultural inputs efficiently and reduce environmental losses while
increasing production sustainably are of great value for achieving food security.
Study Area
The land use/land cover (LULC) maps were developed for the whole of India, covering a total geographical extent
of 3,287,469 km². The physical extent of India considered for mapping LULC in this study lies between 6.5° N and
38° N latitude and between 67.5° E and 97.4° E longitude.
Uttar Pradesh is the largest wheat-producing state in India. For the present study, we selected the Maharajganj
district in Uttar Pradesh, which lies between 26°53′20″ N and 27°28′37″ N latitude and between 83°07′03″ E and
83°56′30″ E longitude, as a case study.
Figure . Location of the study area. The upper-right inset shows the Union of India, with the state of Uttar
Pradesh in yellow. The lower-right inset shows the districts of Uttar Pradesh (UP), with the study area,
Maharajganj, in red. The left inset is the Sentinel-2A image of the study area. Map coordinates are in the
UTM coordinate system on the WGS 84 datum (northern hemisphere).
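The district's bounding coordinates are given in degrees, minutes, and seconds, whereas most GIS and raster tools expect decimal degrees. The minimal sketch below performs that conversion; the function name `dms_to_decimal` is illustrative, and the minimum longitude is read as 83°07′03″ E, an assumption based on the district's location.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere="N"):
    """Convert a degrees-minutes-seconds angle to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # By convention, southern latitudes and western longitudes are negative.
    return -value if hemisphere in ("S", "W") else value


# Bounding box of Maharajganj district (minimum longitude assumed to be 83°07'03" E).
lat_min = dms_to_decimal(26, 53, 20, "N")
lat_max = dms_to_decimal(27, 28, 37, "N")
lon_min = dms_to_decimal(83, 7, 3, "E")
lon_max = dms_to_decimal(83, 56, 30, "E")

print(f"Latitude:  {lat_min:.4f} to {lat_max:.4f} N")
print(f"Longitude: {lon_min:.4f} to {lon_max:.4f} E")
```

Rounded to two decimals, this gives roughly 26.89° to 27.48° N and 83.12° to 83.94° E.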
Decision Trees:
- Purpose: Decision trees are employed for land-use classification based on GIS data.
- Characteristics: The simplicity and interpretable decision-making process make decision trees suitable for
understanding the allocation of land for different agricultural uses.
- Advantages: Facilitates transparent decision-making and is robust against noise in the data.
- Disadvantages: May struggle with complex, multifaceted GIS data, and there is a risk of overfitting.
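A decision tree splits pixels on feature thresholds until each leaf is dominated by one land-use class. The pure-Python sketch below grows a small CART-style tree using Gini impurity; the two features (NDVI and slope in degrees), the class names, and the toy training samples are hypothetical, not data from this study. In practice a library implementation such as scikit-learn's `DecisionTreeClassifier` would normally be used.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Return (weighted_gini, feature, threshold) for the best binary split, or None."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow a tree recursively; internal nodes are (feature, threshold, left, right)
    tuples and leaves are the majority class label."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    _, j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return (
        j,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    )


def predict(tree, row):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


# Hypothetical training pixels: [NDVI, slope in degrees] -> land-use class.
X = [[0.65, 2], [0.70, 3], [0.60, 1], [0.80, 15], [0.85, 20], [0.82, 18], [-0.10, 0], [0.00, 0]]
y = ["cropland"] * 3 + ["forest"] * 3 + ["water"] * 2
tree = build_tree(X, y, max_depth=3)
print(predict(tree, [0.68, 2]), predict(tree, [0.90, 25]), predict(tree, [-0.05, 0]))
```

Limiting `max_depth` is one simple guard against the overfitting risk listed above, at the cost of coarser decision boundaries.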
Neural Networks:
- Purpose: Neural networks, particularly Long Short-Term Memory (LSTM) networks, are utilized for time-series
analysis of agricultural data.
- Characteristics: Deep learning is employed for trend prediction over time, capturing intricate temporal patterns
in the data.
- Advantages: Effective in forecasting crop yields, detecting trends, and handling non-linear relationships in the
data.
- Disadvantages: Requires substantial computational resources and technical expertise for implementation.
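The sketch below shows what an LSTM cell computes at each time step: input, forget, and output gates plus a candidate state that together update a cell memory. It is a pure-Python forward pass with a scalar input (for example, one pixel's NDVI per month); the weights are random and untrained, and the class name `LSTMCell` and the NDVI values are hypothetical. Actual yield forecasting would train such a network on historical series, for instance with PyTorch's `torch.nn.LSTM` followed by an output layer.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class LSTMCell:
    """Forward pass of an LSTM cell with a scalar input and `hidden` units (untrained)."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, hidden, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.w = {}  # input-to-gate weights, one per unit
        self.u = {}  # hidden-to-gate (recurrent) weights, hidden x hidden
        self.b = {}  # biases
        for gate in self.GATES:
            self.w[gate] = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
            self.u[gate] = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)]
            # A positive forget-gate bias is a common initialisation choice.
            self.b[gate] = [1.0 if gate == "forget" else 0.0] * hidden

    def _pre(self, gate, x, h):
        """Pre-activation w*x + U*h + b for every unit of one gate."""
        return [
            self.w[gate][k] * x
            + sum(self.u[gate][k][m] * h[m] for m in range(self.hidden))
            + self.b[gate][k]
            for k in range(self.hidden)
        ]

    def step(self, x, h, c):
        i = [sigmoid(z) for z in self._pre("input", x, h)]
        f = [sigmoid(z) for z in self._pre("forget", x, h)]
        o = [sigmoid(z) for z in self._pre("output", x, h)]
        g = [math.tanh(z) for z in self._pre("candidate", x, h)]
        # The cell state keeps a gated memory; the hidden state is its gated, squashed view.
        c_new = [f[k] * c[k] + i[k] * g[k] for k in range(self.hidden)]
        h_new = [o[k] * math.tanh(c_new[k]) for k in range(self.hidden)]
        return h_new, c_new

    def run(self, sequence):
        """Feed a sequence through the cell and return the final hidden state."""
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


# Hypothetical monthly NDVI values for one wheat pixel across the season.
ndvi_series = [0.21, 0.38, 0.62, 0.78, 0.70]
cell = LSTMCell(hidden=4, seed=0)
h = cell.run(ndvi_series)
print([round(v, 3) for v in h])
```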
Clustering Algorithms:
- Purpose: Unsupervised learning techniques like K-Means clustering aid in identifying spatial patterns in crop
distribution.
- Characteristics: Clustering algorithms help in grouping similar spatial entities, allowing for targeted
interventions in specific agricultural regions.
- Advantages: Identifies spatial patterns and guides precision agriculture practices.
- Disadvantages: Challenges may arise when dealing with diverse data types and noise in the spatial
distribution.
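K-Means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its members. The sketch below clusters hypothetical field locations (easting and northing in km); to keep the example deterministic it uses a farthest-point initialisation, swapped in here instead of the random or k-means++ seeding that libraries typically use.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: squared_distance(point, centroids[j]))


def farthest_point_init(points, k):
    """Start with the first point, then repeatedly add the point farthest
    from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break  # converged
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical field centroids (km) forming two spatial groups.
fields = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(fields, k=2)
print(labels, centroids)
```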
Datasets:
Satellite Imagery:
- Use: High-resolution satellite imagery is crucial for land cover classification, monitoring crop health, and
identifying changes in agricultural landscapes over time.
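A common first step in monitoring crop health from satellite imagery is the Normalized Difference Vegetation Index, NDVI = (NIR − Red) / (NIR + Red), which is also the input to the fPAR calculation later in this report. For Sentinel-2, the near-infrared and red bands are B8 and B4, both at 10 m resolution. The sketch below computes NDVI per pixel on plain nested lists; the reflectance values are hypothetical.

```python
import math


def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red) for two equally sized 2-D grids.
    Pixels where NIR + Red == 0 are returned as NaN (no valid reflectance)."""
    result = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            row.append((n - r) / total if total != 0 else math.nan)
        result.append(row)
    return result


# Hypothetical surface reflectance for a 2 x 3 patch (Sentinel-2 B8 = NIR, B4 = red).
nir_b8 = [[0.45, 0.30, 0.05], [0.40, 0.00, 0.35]]
red_b4 = [[0.05, 0.10, 0.04], [0.08, 0.00, 0.07]]

for row in ndvi(nir_b8, red_b4):
    print([round(v, 3) for v in row])
```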
Topographical Data:
- Use: GIS-derived topographical data assists in understanding the terrain, enabling precision agriculture
practices and water resource management.
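Slope is one of the most basic terrain derivatives used in such analyses, for example when planning drainage or irrigation. The sketch below estimates slope at one interior cell of a digital elevation model (DEM) with Horn's 3 × 3 finite-difference method, a common choice in GIS software. The 30 m cell size and the elevation values are assumptions for illustration only.

```python
import math


def horn_slope_degrees(dem, row, col, cell_size):
    """Slope (degrees) at an interior DEM cell using Horn's 3 x 3 finite differences.

    Window layout around (row, col):
        a b c
        d e f
        g h i
    """
    a, b, c = dem[row - 1][col - 1], dem[row - 1][col], dem[row - 1][col + 1]
    d, f = dem[row][col - 1], dem[row][col + 1]
    g, h, i = dem[row + 1][col - 1], dem[row + 1][col], dem[row + 1][col + 1]
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


# Hypothetical 30 m DEM tile whose elevation rises 3 m per cell from west to east.
dem = [
    [100.0, 103.0, 106.0],
    [100.0, 103.0, 106.0],
    [100.0, 103.0, 106.0],
]
print(round(horn_slope_degrees(dem, 1, 1, cell_size=30.0), 2))
```

Here the gradient is 3 m per 30 m, i.e. 0.1, so the slope is atan(0.1), about 5.71°.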
Cons:
- Data Complexity: GIS data can be complex and multifaceted, posing challenges for ML algorithms,
especially when dealing with diverse data types.
- Technical Expertise: Implementing GIS-integrated ML solutions may require specialized skills, limiting
accessibility for smallholder farmers and less technologically advanced regions.
- Data Privacy Concerns: Integration of AI with GIS raises concerns about data privacy, requiring robust
policies and practices to safeguard sensitive agricultural information.
The integration of GIS with ML and AI presents a revolutionary approach to modernizing agriculture.
While the benefits are substantial, addressing challenges related to data complexity, technical expertise, and
privacy concerns is crucial for widespread adoption and success.
Mapping different land use classes is an essential component of land management and planning. The two
dominant land cover classes, forest and agriculture, support the economies of developing countries, including
India, yet they are often under extreme anthropogenic pressure. Climatic variability and climate change place
additional stress on these resources.
This section first discusses the wheat acreage results obtained with the support vector machine (SVM) and
random forest (RF) classifiers, followed by the estimation of wheat yield.
Before applying SVM and RF to classify wheat areas, the cropland area was extracted from the study
area for all growth stages of the wheat crop. Figure 4a is the Sentinel-2 image of the study area for
March (flowering and early ripening stage). Figure 4b is a binary map of the agriculture-only
classification for the same date, with agriculture shown in yellow. SVM and RF were then applied
to this dynamic cropland mask (binary map).
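Restricting classification to a cropland mask and then converting class pixel counts into area can be sketched as follows. The NDVI threshold rule is a hypothetical stand-in for the trained SVM or RF models, and the pixel area assumes Sentinel-2's 10 m bands (10 m × 10 m = 100 m² = 0.01 ha); the function names are illustrative.

```python
# Area of one 10 m x 10 m Sentinel-2 pixel: 100 m^2 = 0.01 ha.
PIXEL_AREA_HA = 10 * 10 / 10_000


def classify_within_mask(feature_map, cropland_mask, classify):
    """Run `classify` only on pixels flagged as cropland; non-cropland pixels become None."""
    return [
        [classify(value) if in_crop else None for value, in_crop in zip(f_row, m_row)]
        for f_row, m_row in zip(feature_map, cropland_mask)
    ]


def class_area_ha(label_map, target, pixel_area_ha=PIXEL_AREA_HA):
    """Acreage of one class = pixel count x area per pixel."""
    count = sum(1 for row in label_map for label in row if label == target)
    return count * pixel_area_ha


def ndvi_threshold_classifier(value):
    """Hypothetical stand-in for a trained SVM or RF model."""
    return "wheat" if value >= 0.6 else "other_crop"


ndvi_map = [[0.72, 0.65, 0.30], [0.81, 0.20, 0.55], [0.10, 0.68, 0.75]]
cropland_mask = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]  # 1 = cropland, 0 = non-cropland

labels = classify_within_mask(ndvi_map, cropland_mask, ndvi_threshold_classifier)
print(labels)
print(f"Wheat area: {class_area_ha(labels, 'wheat'):.2f} ha")
```

At 0.01 ha per pixel, the SVM wheat area of 148,866 ha reported in the conclusions would correspond to about 14.9 million 10 m pixels.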
Wheat Yield Estimation Using the CASA Model:
Although the study analyzes the spread of wheat yield over the whole growing season, the fPAR, NPP, and
light-use efficiency parameters of the study area had to be parameterized to improve the CASA model results
[64]. Calculating the maximum and minimum NDVI values is an essential first step in evaluating fPAR. The
NDVI_min and NDVI_max values used in the fPAR calculation are −0.0126 and 0.839, respectively. Figure 7
shows the fPAR results. The mean fPAR was lowest in January 2021 (0.71) and highest in February 2021
(0.78). This rise reflects the increase in wheat photosynthetic activity and the corresponding growth of
the crop, which supports the suitability of the study area for wheat cultivation.
Figure 7. Spatial distribution of wheat fPAR (fraction of absorbed photosynthetically active radiation)
for (a) January, (b) February, and (c) March.
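The fPAR and NPP steps of a CASA-type model can be sketched as below, using the NDVI_min and NDVI_max values reported above. This shows only the NDVI-based linear fPAR form (some CASA variants also average it with a simple-ratio-based estimate). The fPAR bounds (0.001 and 0.95), the 0.5 PAR fraction, and the example inputs are assumptions, and the light-use efficiency is passed in directly rather than derived from temperature and water-stress scalars.

```python
# NDVI range reported for the study area.
NDVI_MIN = -0.0126
NDVI_MAX = 0.839
# Assumed fPAR bounds; values of this kind are common in CASA applications.
FPAR_MIN = 0.001
FPAR_MAX = 0.95
# Assumed fraction of incoming solar radiation that is photosynthetically active.
PAR_FRACTION = 0.5


def fpar_from_ndvi(ndvi):
    """Linearly rescale NDVI (clipped to the observed range) into [FPAR_MIN, FPAR_MAX]."""
    ndvi = min(max(ndvi, NDVI_MIN), NDVI_MAX)
    scaled = (ndvi - NDVI_MIN) / (NDVI_MAX - NDVI_MIN)
    return scaled * (FPAR_MAX - FPAR_MIN) + FPAR_MIN


def monthly_npp(solar_mj_m2, ndvi, epsilon_gc_per_mj):
    """NPP (gC/m^2) = APAR x light-use efficiency, with APAR = SOL x fPAR x PAR_FRACTION.

    `epsilon_gc_per_mj` is the actual light-use efficiency; in the full CASA model it is
    the maximum efficiency scaled by temperature and water-stress factors.
    """
    apar = solar_mj_m2 * fpar_from_ndvi(ndvi) * PAR_FRACTION
    return apar * epsilon_gc_per_mj


# Hypothetical inputs for one pixel and one month.
print(round(fpar_from_ndvi(0.65), 3))
print(round(monthly_npp(solar_mj_m2=450.0, ndvi=0.65, epsilon_gc_per_mj=0.6), 1))
```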
The results showed that in the 2020–2021 growing season, all the districts of Uttar Pradesh had
similar wheat growth trends. To assess the accuracy of the yield estimates from the CASA model and
the NPP-yield conversion, we verified the results against 30 crop cutting experiment (CCE) data
points. The value of each pixel containing a CCE point was averaged with its eight neighboring
pixels to obtain the predicted yield for that CCE point [67]. Figure 10 plots the observed yield
against the estimated yield; the coefficient of determination was R² = 0.5544, and the root mean
square error (RMSE) was 3.361 Q/ha (Table 3). This analysis revealed a mean error of −0.56 t ha⁻¹
and a mean relative error of −4.61%, showing promising accuracy for assessing regional wheat yield
with this method. Moreover, the Pearson’s correlation coefficient between the two is 0.74, which
indicates a good correlation between the modeled and observed wheat yield. In 2020–2021, the
estimated yield of wheat in India was approximately 35 quintals per hectare, which is in close
agreement with the results of the current study.
Figure 10. Predicted versus observed wheat yield for the entire study area for the 2020–2021
growing season. The Pearson’s correlation coefficient between the two is 0.74, depicting a good
correlation between the modeled and observed wheat yield.
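The validation steps above (a 3 × 3 window mean around each CCE pixel, then agreement statistics) can be sketched as follows. Errors are defined as predicted minus observed, which is an assumed sign convention. R² is computed here as 1 − SS_res/SS_tot against the 1:1 line; if R² is instead taken from a fitted regression line it equals the square of Pearson’s r, so the two definitions can differ (0.74² ≈ 0.548). Yields in Q/ha convert to t/ha by dividing by 10, since 1 quintal = 100 kg. Function names are illustrative.

```python
import math


def window_mean(grid, row, col):
    """Mean of the pixel at (row, col) and its eight neighbours (3 x 3 window).
    Assumes (row, col) is not on the raster edge."""
    values = [grid[r][c] for r in range(row - 1, row + 2) for c in range(col - 1, col + 2)]
    return sum(values) / len(values)


def validation_metrics(observed, predicted):
    """Agreement statistics; errors are defined as predicted minus observed."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    ss_res = sum(e * e for e in errors)
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    covariance = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    return {
        "r2": 1.0 - ss_res / ss_obs,
        "rmse": math.sqrt(ss_res / n),
        "mean_error": sum(errors) / n,
        "mean_relative_error_pct": 100.0 * sum(e / o for e, o in zip(errors, observed)) / n,
        "pearson_r": covariance / math.sqrt(ss_obs * ss_pred),
    }


# Hypothetical CCE yields and model predictions in Q/ha.
observed_q_ha = [32.0, 36.5, 30.2, 38.1, 34.4]
predicted_q_ha = [30.5, 35.0, 31.8, 36.2, 33.9]
print(validation_metrics(observed_q_ha, predicted_q_ha))
```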
Conclusions
This study involved wheat acreage estimation using different machine-learning classification
algorithms and the subsequent calculation of wheat yield using the CASA model. It again demonstrates
the utility of Sentinel-2 remote sensing data for acreage estimation, as the results agreed well
with the observational data. The wheat crop area estimated with the SVM and RF classifiers is
148,866 ha and 146,499 ha, respectively. Of the two methods tested for classifying the Sentinel-2
data, RF had the higher mapping accuracy. The Sentinel-2A acreage data product was used as input to
the CASA model to assess net primary production (NPP) during the 2020–2021 wheat growing season,
and the NPP-yield conversion model was then used to calculate winter wheat yield at a regional
scale. The observed and calculated wheat yields gave R² = 0.554, with an RMSE of 3.36 Q/ha and a
relative deviation error of −4.61%. These results show that the updated CASA model, integrated with
the NPP-yield conversion model, can provide reliable regional-scale estimates of wheat yield from
satellite-based remote sensing and biophysical modeling approaches.
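The report does not state the exact form of its NPP-yield conversion, so the sketch below uses a simplified, assumed chain: seasonal NPP in carbon units is converted to dry biomass, multiplied by a harvest index to get dry grain, and adjusted to a reference grain moisture. The carbon fraction (0.45), grain moisture (12.5%), and harvest index in the example are hypothetical parameters, and some formulations also restrict NPP to its above-ground share. The unit step uses 1 g/m² = 10 kg/ha = 0.1 Q/ha.

```python
GRAMS_PER_M2_TO_Q_PER_HA = 0.1  # 1 g/m^2 = 10 kg/ha = 0.1 quintal/ha


def npp_to_grain_yield_q_ha(seasonal_npp_gc_m2, harvest_index,
                            carbon_fraction=0.45, grain_moisture=0.125):
    """Convert seasonal NPP (gC/m^2) to grain yield (Q/ha) under an assumed chain:
    carbon -> dry biomass -> dry grain -> grain at a reference moisture content."""
    dry_biomass = seasonal_npp_gc_m2 / carbon_fraction  # g dry matter per m^2
    dry_grain = dry_biomass * harvest_index             # g dry grain per m^2
    grain = dry_grain / (1.0 - grain_moisture)          # g grain per m^2 at reference moisture
    return grain * GRAMS_PER_M2_TO_Q_PER_HA


# Hypothetical seasonal NPP for one pixel and an assumed harvest index.
print(round(npp_to_grain_yield_q_ha(350.0, harvest_index=0.4), 1))
```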