
National Aeronautics and Space Administration

Large Scale Applications of Machine Learning using Remote Sensing for Building Agriculture Solutions
Part 1: Data Preparation of Imagery & Labels for Large-Scale ML Modeling
John Just (Deere & Co., Iowa State University), Erik Sorensen (Deere & Co.)

March 5, 2024
About ARSET

• ARSET provides accessible, relevant, and cost-free training on remote sensing satellites, sensors, methods, and tools.
• Trainings include a variety of applications of satellite data and are tailored to audiences with a variety of experience levels.

Earth Science Applied Sciences: Agriculture, Capacity Building, Climate & Resilience, Disasters, Ecological Conservation, Health & Air Quality, Water Resources

NASA ARSET – Large Scale Applications of Machine Learning using Remote Sensing for Building Agriculture Solutions 3
About ARSET Trainings

• Online or in-person
• Live and instructor-led or asynchronous and self-paced
• Cost-free
• Bilingual and multilingual options
• Only use open-source software and data
• Accommodate differing levels of expertise

• Visit the ARSET website to learn more.

Large Scale Applications of Machine Learning using Remote Sensing for Building Agriculture Solutions
Overview
Motivation for Training

• Timely and accurate in-season crop maps at local to regional scales are crucial for agricultural decision-making and management.
• Irregularly-spaced time series are common with optical satellite images.
• Training robust models on remote sensing data often requires very large datasets, but processing and training at that scale are complex.
• The Cropland Data Layer (CDL, USDA–NASS) is released to the public only a few months after the end of the growing season, and it gives estimates of the types of crops but not their sequence or timing (e.g., for double crops).

[Figure: Montage of images shows differences in field geometry and size in different parts of the world. Image credit: NASA (Instrument: Terra – ASTER)]

Training Learning Objectives

By the end of this training series, participants will be able to:

• Use recommended techniques to download and process remote sensing data from Sentinel-2 and
the Cropland Data Layer (CDL) at large scale (> 5GB) with cloud tools (Amazon Web Services [AWS]
Simple Storage Service [S3], Databricks, Spark/PySpark, Parquet)
• Produce interactive plots of maps, tables, time series, etc. for investigation & verification of data and
models
• Filter data from both the measured (satellite images) and target (CDL) domains to serve modeling
objectives based on quality factors, land classification, area of interest (AOI) overlap, and
geographical location.
• Build training pipelines in TensorFlow to train machine learning algorithms on large scale remote
sensing/geospatial datasets for agricultural monitoring
• Utilize random sampling techniques to build robustness into a predictive algorithm while avoiding
information leakage across training/validation/testing splits

Prerequisites

• Fundamentals of Remote Sensing

• Crop Classification with Time Series, Part 2

• Sign up for and access Databricks Community Edition

Training Outline

• Part 1: Data Preparation of Imagery & Labels for Large-Scale ML Modeling (March 05, 2024)
• Part 2: Data Loaders for Training ML Models on Irregularly-Spaced Time-Series of Imagery (March 12, 2024)
• Part 3: Training & Testing ML Models for Irregularly-Spaced Time Series of Imagery (March 19, 2024)

Homework
Opens March 19 – Due April 1 – Posted on Training Webpage

A certificate of completion will be awarded to those who attend all live sessions and
complete the homework assignment(s) before the given due date.
How to Ask Questions

• Please put your questions in the Questions box and we will address them at the
end of the webinar.
• Feel free to enter your questions as we go. We will try to get to all the questions
during the Q&A session after the webinar.
• The remainder of the questions will be answered in the Q&A document, which will
be posted to the training website about a week after the training.

Part 1 – Trainers

• John Just, Principal Data Scientist, John Deere
• Erik Sorensen, Senior Data Scientist, John Deere

Large Scale Applications of Machine Learning using Remote Sensing for Building Agriculture Solutions
Part 1: Data Preparation of Imagery & Labels for Large-Scale ML Modeling
Part 1 Objectives

By the end of Part 1, participants will be able to programmatically:

• Submit lists of boundaries to the NASS API and retrieve the corresponding CDL rasters.
• Subsample and visualize retrieved data from CDL with interactive spatial images and
other statistical plots.
• Obtain Sentinel-2 raster files for a given area and timeframe corresponding to the
retrieved CDL data and manipulate the Sentinel-2 rasters into tables in preparation for
analysis and model training.
• Verify correct processing of data via various interactive plots (e.g. time series of pixels of
various land covers).

Part 1 Section 1:
Irregularly Spaced Time Series Modeling
Irregularly Spaced Time Series Modeling

Irregular spacing is common due to orbital geometry, variations in exact orbit timing/geolocation/image extents, and atmospheric disturbances due to clouds/smoke or other random events.

[Figure: NDVI for two fields from Sentinel-2, showing irregular spacing/timing. Fields A & B are 2 km apart, but A has 2x the scenes (coverage) due to orbit path overlap. Points not fit by the lines are scenes with cloud cover.]

Motivation for this Example

We propose a real-time prediction of the Cropland Data Layer (CDL) as the working example for this tutorial.

• The CDL algorithm is already using the same (or similar) satellite
data sources and irregular spacing/timing to make predictions
(documented example of success)
• Labels are readily available via API calls (highly
scalable/available).
• Accuracy is well studied & documented
• Resulting code & methods are highly transferable to other
problems/use cases

Predicting the CDL in Real-Time
According to the CDL FAQs:
• “The CDL Program uses medium spatial resolution (30 meter) satellite imagery because it’s too costly to use higher resolution satellites to perform crop acreage estimation over large areas.”
• The CDL is considered confidential and market sensitive during the growing season and cannot be released until after the official NASS year-end county area estimates are published in late January/early February following the end of the typical US growing season.
• The CDL only gives estimates of the types of crops, but not their sequence or timing (e.g., for double crops).

However… do we really need to wait until the following year for accurate estimates?

[Figure callout: “By early July we probably are pretty confident in the crop type just from NDVI in this area (probably even more so by using multispectral time series). By late August we are really confident (6 months prior to when the CDL is typically released).”]

Generalizable Approach
• Robust models require large-scale data management tools and approaches.
Leveraging multi-core and multi-machine parallel computing is a necessary
step to scale. We demonstrate these tools & approaches with this series.
• Note that a similar approach to crop modeling with the time series of imagery could be used for estimating crop health or other time-dependent factors as well (simply substitute the label/target).

[Figure callout: “By early Spring we are pretty confident in the crop type for areas that planted winter crops and will likely have a good estimate of double cropping by early Fall.” Panels: Winter Wheat, Soybeans]

Irregularly-Spaced Time-Series Modeling

• There’s a dearth of statistical theory around unevenly-spaced time series, and thus few out-of-the-box methods apply directly to such situations.
– The most common solution is to transform the data into a regularly-spaced time series, then apply standard methods, e.g., interpolation or interval binning (the latter being what the CDL algorithm does).
– Note that the data format resulting from this Part 1 demo supports any modeling approach.
– For simplicity, we follow an approach similar to the CDL (binned intervals) for the data loaders and model training in Parts 2 & 3 of this demo.
• Newer ML sequence models such as transformers (“self-attention”) accept positional encodings of inputs/outputs and learn meaningful absolute and relative positional information about inputs (and outputs). This could facilitate direct modeling of unevenly-spaced satellite data, but due to the increased complexity we do not incorporate it here.
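The interval binning and interpolation mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the CDL's or this series' actual implementation: the 14-day bin width, the 26 bins per year, and the function names `bin_time_series` and `interpolate_daily` are hypothetical choices matching the "one image every two weeks" cadence discussed in this training.

```python
import numpy as np

def bin_time_series(doy, values, bin_days=14, n_bins=26):
    """Average irregularly spaced observations into fixed-width day-of-year bins.

    doy:    1-D array with the day of year (1..366) of each observation
    values: array of shape (n_obs,) or (n_obs, n_bands), e.g. NDVI or band values
    Returns shape (n_bins,) or (n_bins, n_bands); bins without data are NaN.
    """
    doy = np.asarray(doy)
    values = np.asarray(values, dtype=float)
    # Bin index per observation; the last days of December fall into the last bin
    bin_idx = np.clip((doy - 1) // bin_days, 0, n_bins - 1)
    out = np.full((n_bins,) + values.shape[1:], np.nan)
    for b in range(n_bins):
        in_bin = bin_idx == b
        if in_bin.any():
            out[b] = values[in_bin].mean(axis=0)
    return out

def interpolate_daily(doy, values, first_day=1, last_day=365):
    """Linearly interpolate a 1-D series onto a regular daily grid.

    Assumes at most one observation per day (deduplicate first); days outside
    the observed range take the nearest observed value.
    """
    order = np.argsort(doy)
    grid = np.arange(first_day, last_day + 1)
    daily = np.interp(grid, np.asarray(doy)[order], np.asarray(values, dtype=float)[order])
    return grid, daily

# Illustrative NDVI observations for one pixel (made-up values)
doy = np.array([5, 12, 20, 33, 35, 60])
ndvi = np.array([0.20, 0.22, 0.25, 0.40, 0.42, 0.60])
binned = bin_time_series(doy, ndvi)  # bin 3 (days 43-56) has no data -> NaN
grid, daily = interpolate_daily(doy, ndvi)
```

Binning keeps gaps explicit (NaN), so a model or data loader can mask them, whereas interpolation fills every day but can invent values across long cloudy stretches.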

Part 1 Section 1:
Cropland Data Layer (CDL)
Cropland Data Layer (Contiguous United States)

The best place to find information about it is the USDA NASS CDL FAQs & metadata.
Some relevant info:
• Model: decision tree classifier (handles missing, non-continuous, non-normal, and nonlinear data; efficient computation). Probabilistic output (argmax gives the class).
• Input: Landsat 8 and 9 OLI/TIRS, ISRO ResourceSat-2 LISS-3, and ESA SENTINEL-2A and -2B. Imagery is downloaded daily with the objective of obtaining at least one cloud-free usable image every two weeks throughout the growing season.
• Ground truth: FSA Common Land Unit (CLU) for ag/crops & National Land Cover Database (NLCD) for non-ag areas
• Accuracy: Generally, 85% to 95% correct for the major crop-specific land cover categories.
• Resolution: 30 m

CDL Accuracy
As noted earlier, the CDL accuracy is well studied and documented. Below is an excerpt from the USDA
quality checks for Arkansas 2022.
• Reports for all states and years can be found in the USDA NASS Cropland metadata.
• Some crops like sorghum (in this case) may have low accuracy, but they also represent a tiny
proportion of the farmland. Training a model specifically focused on certain classes could boost
accuracy for those classes, at the expense of others.

USDA National Agricultural Statistics Service, 2022 Arkansas Cropland Data Layer
STATEWIDE AGRICULTURAL ACCURACY REPORT

Crop-specific covers only | *Correct | Accuracy | Error | Kappa
OVERALL ACCURACY** | 482475 | 87.30% | 12.70% | 0.817

Cover Type | *Correct Pixels | Producer's Accuracy | Omission Error | Kappa | User's Accuracy | Commission Error | Cond'l Kappa
Corn | 55159 | 86.40% | 13.60% | 0.855 | 94.10% | 5.90% | 0.937
Cotton | 50682 | 88.00% | 12.00% | 0.873 | 93.20% | 6.80% | 0.928
Rice | 87048 | 90.30% | 9.70% | 0.893 | 96.10% | 3.90% | 0.957
Sorghum | 156 | 22.70% | 77.30% | 0.227 | 77.20% | 22.80% | 0.772
Soybeans | 254301 | 93.60% | 6.40% | 0.91 | 89.60% | 10.40% | 0.857

*Correct Pixels represents the total number of independent validation pixels correctly identified in the error matrix.
**The Overall Accuracy represents only the FSA row crops and annual fruit and vegetables
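The columns in this report follow standard error-matrix definitions: producer's accuracy is the share of reference pixels of a class that were mapped correctly (omission error = 1 − producer's accuracy), and user's accuracy is the share of pixels mapped as a class that are correct (commission error = 1 − user's accuracy). Below is a minimal NumPy sketch, assuming an error matrix with reference classes in rows and mapped classes in columns; the function name is hypothetical, and it computes the overall Cohen's kappa but does not attempt to reproduce NASS's per-class kappa columns.

```python
import numpy as np

def accuracy_report(cm):
    """Accuracy statistics from an error (confusion) matrix.

    cm[i, j] = number of validation pixels whose reference class is i and whose
    mapped (predicted) class is j. Assumes every class has at least one reference
    and one mapped pixel (otherwise the per-class ratios divide by zero).
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    correct = np.diag(cm)
    reference_totals = cm.sum(axis=1)  # row sums
    mapped_totals = cm.sum(axis=0)     # column sums

    producers = correct / reference_totals
    users = correct / mapped_totals
    overall = correct.sum() / n
    # Chance agreement expected from the row and column marginals
    expected = (reference_totals * mapped_totals).sum() / n**2
    kappa = (overall - expected) / (1 - expected)
    return {
        "overall_accuracy": overall,
        "kappa": kappa,
        "producers_accuracy": producers,
        "omission_error": 1 - producers,
        "users_accuracy": users,
        "commission_error": 1 - users,
    }

# Toy two-class example (made-up counts)
report = accuracy_report([[45, 5],
                          [10, 40]])
```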

Rough calculations of data size for CDL on Contiguous US
Even though only about 20% of the US is used specifically for cropland, any given model must run across the entire US area to classify the land.
Assuming:
• Sentinel-2 is used for land classification at 10 m resolution
• 1 cloud-free image every 2 weeks (26 images total per 10 m × 10 m, i.e., 100 m², pixel)
• 12 bands at 2 bytes (16 bits) per band value
• ~7,500,000 km² of land

This amounts to ≈44 TB of post-processed data to run a predictive model. While not trivial, 10 TB drives cost only $200. Using 30 m resolution drops this to ~5 TB to work with.
If only using Sentinel-2, roughly 28 TB of raster data must be downloaded and processed over that land, since Sentinel-2 tiles are 110 × 110 km and have 12 bands, amounting to roughly 640 MB per scene, with a new scene every 5 days.
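The arithmetic behind these estimates can be reproduced in a few lines of Python. This sketch uses decimal units (1 TB = 10^12 bytes) and the assumptions listed above; it additionally assumes one year of 5-day revisits for the raw download and ignores tile overlap. With those inputs it gives about 46.8 TB at 10 m, about 5.2 TB at 30 m, and about 29 TB of raw tiles, the same ballpark as the rounded figures on this slide (exact values depend on unit convention and rounding).

```python
# Rough data-volume estimate for the contiguous US (decimal units: 1 TB = 1e12 bytes)
LAND_KM2 = 7_500_000      # ~7.5 million km^2 of land
IMAGES_PER_YEAR = 26      # one cloud-free image every 2 weeks
BANDS = 12                # Sentinel-2 bands used
BYTES_PER_VALUE = 2       # 16-bit values

def processed_bytes(resolution_m):
    """Bytes needed to hold every band of every image for every pixel."""
    pixels_per_km2 = (1000 / resolution_m) ** 2
    return LAND_KM2 * pixels_per_km2 * IMAGES_PER_YEAR * BANDS * BYTES_PER_VALUE

def raw_download_bytes(tile_km=110, revisit_days=5, mb_per_scene=640, days=365):
    """Bytes of Sentinel-2 tiles downloaded over one year (assumption; ignores tile overlap)."""
    n_tiles = LAND_KM2 / tile_km**2
    scenes_per_tile = days / revisit_days
    return n_tiles * scenes_per_tile * mb_per_scene * 1e6

for res in (10, 30):
    print(f"{res} m: {processed_bytes(res) / 1e12:.1f} TB")
print(f"raw tiles: {raw_download_bytes() / 1e12:.1f} TB")
```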

Part 1 Section 2:
Sentinel-2 Optical Data
Factors Affecting Quality & Temporal Spacing of Satellite Data
Example factors affecting data:
• Irregularity
– Orbit path overlap – closer to the poles, overlap leads to increased coverage
– Orbit path/image capture repeatability – the exact position of the image can vary east/west, leaving areas at the edges of scenes with more uncertain coverage.
– Thick cloud cover – when present and identified (and thus ignored), there is a gap in coverage.
– SCL errors – when we ignore data from scenes because of an SCL category that is wrong, we introduce unnecessary gaps in coverage.
• Quality
– Thin cloud haze – cloud cover isn’t Boolean; it’s a gradient. Thin haze is sometimes hard to identify (it alters the reflectance values and may not be caught).
– Tiling system overlap – at the edge of tiles (which is how the data is stored & queried) there is
overlap and slight differences between the values for the same scene and location in
different tiles.
– Geolocation/georeferencing – location of pixels can be incorrect and vary by more than the
size of a pixel (resulting in wrong information for a point location)

Sentinel-2 Orbit Path Overlap

Nominal availability from Sentinel-2 is 1 image every 5 days. However, overlap between adjacent orbits increases farther from the equator. Thus, certain parts of the US on the borders of orbits get up to 2x coverage per satellite, resulting in intervals of 2 or 3 days instead of 5.

[Figure: map of Sentinel-2 revisit coverage; legend: Number of Tracks]

Reference: https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi/revisit-coverage
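One quick way to see this effect for a given location is to compute the gaps between consecutive acquisition dates returned by an imagery search. A minimal sketch using only the Python standard library; the function name and the dates in the example are illustrative, not real Sentinel-2 acquisitions.

```python
from datetime import date

def revisit_intervals(acq_dates):
    """Days between consecutive acquisitions, after collapsing same-day duplicates
    (e.g., the same scene delivered in overlapping tiles)."""
    days = sorted(set(acq_dates))
    return [(later - earlier).days for earlier, later in zip(days, days[1:])]

# Illustrative location on the border of two orbits (made-up dates)
dates = [date(2021, 7, 1), date(2021, 7, 4), date(2021, 7, 4),
         date(2021, 7, 6), date(2021, 7, 9), date(2021, 7, 11)]
print(revisit_intervals(dates))  # [3, 2, 3, 2]
```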

Orbit Path Repeatability

[Figure: each diagonal line is the west edge of the same orbit path on different dates.]

Within a distance of 1 km, the same nominal orbit path has several actual orbit path variations, which can again affect the availability of imagery.
Sentinel-2 Tile System Overview
Some considerations to be aware of when processing S2 data for analysis & modeling:
• Scenes captured by Sentinel-2 are processed and made available in a unique tiling system that is a slightly modified version of the Military Grid Reference System (MGRS).

The tiles overlap, so values from the same scene can appear in up to four different tiles. There are also “joints” in the grid that tie it together, because a grid is being overlaid on a sphere.

For whatever reason (perhaps slightly different referencing or calibration for each tile), band values for the same scene/location in different tiles can be slightly different (“false” differences).

[Figure: Sentinel-2 tiling grid around St. Louis, MO (reference city), with a grid “joint” labeled.]
Reference: https://maps.eatlas.org.au/
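Because the grid is derived from MGRS, each Sentinel-2 tile ID encodes a UTM zone, a latitude band, and a 100 km grid square (e.g., `15TVG`), and each tile's rasters are projected in that zone's WGS 84 / UTM CRS. Tiles on either side of a UTM zone boundary therefore use different CRSs, so their pixels have to be brought to a common CRS before values are compared. A minimal sketch for parsing tile IDs; the function name and the optional leading `T` (as seen in some product names) are assumptions.

```python
import re

# <zone 1-60><latitude band C-X, no I/O><two-letter 100 km square>, optional leading "T"
TILE_PATTERN = re.compile(r"^T?(\d{1,2})([C-HJ-NP-X])([A-HJ-NP-Z]{2})$")

def parse_s2_tile(tile_id):
    """Split a Sentinel-2 tile ID into its MGRS parts and the UTM EPSG code."""
    match = TILE_PATTERN.match(tile_id.strip().upper())
    if match is None:
        raise ValueError(f"not a Sentinel-2 tile ID: {tile_id!r}")
    zone, band, square = int(match.group(1)), match.group(2), match.group(3)
    northern = band >= "N"  # latitude bands N-X are north of the equator
    epsg = (32600 if northern else 32700) + zone  # WGS 84 / UTM north or south
    return {"utm_zone": zone, "latitude_band": band, "grid_square": square, "epsg": epsg}

print(parse_s2_tile("15TVG"))
# {'utm_zone': 15, 'latitude_band': 'T', 'grid_square': 'VG', 'epsg': 32615}
```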
Sentinel-2 Orbit Paths VS Tiling Grid

Sentinel-2 scenes are much larger than tiles (swath width of 290 km vs. 110 km width of a tile). The overlap at the edges of orbit paths can thus cover an entire tile at certain latitudes.

[Figure: Sentinel-2 orbit paths over the tiling grid, with a tile “joint” labeled.]
Reference: Sentinelhub EO Browser
Scene Classification Layer (SCL)
The SCL band is useful for rapid identification of data of interest. While not a proper land-cover classifier like the CDL, it facilitates rapid classification of per-scene pixels into 12 (mostly potentially transient) categories. Some highlights:
• The most common use case is identification of cloud cover, and a separate cloud mask with probabilities is also available (using the Sen2Cor algorithm). In this demo we also use SCL to identify vegetation.
• 60 m resolution, using a single pixel from a single scene for prediction
• Can be error-prone

[Figure: “Algorithms” comparison of cloud and cirrus cloud detection rates, and of land, water, snow, and shadow misclassification rates as clouds, as determined using 108 Sentinel-2 scenes hand-labeled by Hollstein et al. (Reference)]
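A minimal NumPy sketch of how SCL values can be turned into per-pixel masks. The class codes below are the standard Sen2Cor SCL values (class names vary slightly between processing baselines); which classes count as unusable here is an assumption for illustration, not the exact filter used in this demo's code.

```python
import numpy as np

# Sen2Cor Scene Classification Layer (SCL) codes for Sentinel-2 L2A
SCL_CLASSES = {
    0: "No data",
    1: "Saturated or defective",
    2: "Dark area pixels / shadows",
    3: "Cloud shadows",
    4: "Vegetation",
    5: "Not vegetated",
    6: "Water",
    7: "Unclassified",
    8: "Cloud medium probability",
    9: "Cloud high probability",
    10: "Thin cirrus",
    11: "Snow or ice",
}

# Assumed filter: drop no-data, defective, cloud-shadow, cloud and cirrus pixels
UNUSABLE = [0, 1, 3, 8, 9, 10]
VEGETATION = 4

def usable_mask(scl):
    """True where the SCL value is not in the UNUSABLE set."""
    return np.isin(scl, UNUSABLE, invert=True)

def vegetation_mask(scl):
    """True where SCL labels the pixel as vegetation."""
    return np.asarray(scl) == VEGETATION

scl = np.array([[4, 9],
                [3, 5]])
print(usable_mask(scl))
```

Because SCL is coarser than the 10 m bands, its mask has to be resampled (e.g., nearest neighbour) onto the band grid before being applied.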

Common Issues/Limitations
Pixel geolocation/georeferencing can be inconsistent, and default scene classification labels from providers (e.g., the SCL layer from Sentinel-2) aren’t always accurate.

[Figure panels: Inconsistent geolocation (10–15 m disagreement between subsequent images 5 days apart); Poor scene classification, mislabeling clouds as bare ground and water as snow due to cloud haze, etc.]

Part 1 Section 3:
Databricks Procedural Demo (Run the code)
Databricks Community Edition Overview
Link to instructions to sign up for Databricks Community Edition
• Jupyter Notebook-style coding
• Databricks Community Edition allows up to 10 GB of persistent storage on the “FileStore”
– Can store generic files, tables, and code.
– Notebooks are stored in the “workspace” area
• Can spin up small instances with 2 CPUs, 15 GB RAM, 130 GB local storage, and Spark enabled out of the box
• Anything stored on the local machine is lost when the instance shuts down
• Running notebook code for longer than ~60 minutes will cause the node to shut down. However, as long as you are interacting with the notebook (writing and running code manually), it usually stays up longer

Demo Information and Notes
Available materials for this demo:
• Three data processing scripts for Part 1 (CDL acquisition, Sentinel-2 acquisition, final data manipulation)
• Generalization: The CDL table could be any ground truth label + point location + timeframe, and the rest of the data acquisition & modeling can remain the same, e.g., crop health, vegetative stage, or other types of land cover classes.
• Any systematic error in the CDL will likely propagate to models trained on it.
• To get the data as it would exist after running the scripts in this training demonstration, download the zip files located with the other training materials.
How to download the resulting files from your Databricks account FileStore:
• This is somewhat unintuitive. To download, navigate to the path of YOUR file using the format below.
– https://community.cloud.databricks.com/files/path/to/folder/filename.extension
• ‘path/to/folder’ is the directory path where your file exists on the FileStore.
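The download format above can be generated from a FileStore path with a small helper. This is a sketch that assumes the file lives under `/FileStore` (the part after `FileStore/` becomes `path/to/folder/filename.extension`); the function name is hypothetical, and the Community Edition host is taken from the URL format above.

```python
COMMUNITY_HOST = "https://community.cloud.databricks.com"

def filestore_download_url(path, host=COMMUNITY_HOST):
    """Map a DBFS FileStore path to the browser download URL format."""
    for prefix in ("dbfs:/FileStore/", "/dbfs/FileStore/", "/FileStore/"):
        if path.startswith(prefix):
            return f"{host}/files/{path[len(prefix):]}"
    raise ValueError("only files stored under /FileStore can be downloaded this way")

print(filestore_download_url("dbfs:/FileStore/results/cdl_sample.zip"))
# https://community.cloud.databricks.com/files/results/cdl_sample.zip
```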

Code Steps

Strategy for processing and storing this data:


• Step 1: Define areas of interest (AOIs)
• Step 2: Acquire corresponding CDL data
• Step 3: Search and filter for available satellite data
• Step 4: Acquire corresponding satellite data
• Step 5: Rearrange the Parquet file of satellite band values into a single row per pixel location & season, with columns for time series components in the form of lists of values (e.g., band values, scene dates) to support modeling
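Step 5 is essentially a group-by that collects each pixel's scenes into lists. The demo does this with Spark on Databricks; the sketch below shows the same reshaping with pandas on a small in-memory table, assuming hypothetical column names (`pixel_id`, `year`, `scene_date`, `tile`, `scl`, `bands`) rather than the demo's actual schema.

```python
import pandas as pd

def to_one_row_per_pixel_season(scenes):
    """Collapse a long table (one row per pixel per scene) into one row per
    pixel/year whose list columns are ordered by scene date."""
    ordered = scenes.sort_values(["pixel_id", "year", "scene_date"])
    return (
        ordered.groupby(["pixel_id", "year"])
        .agg(
            scene_dates=("scene_date", list),
            tiles=("tile", list),
            scl_vals=("scl", list),
            bands=("bands", list),  # list of per-scene band lists
        )
        .reset_index()
    )

scenes = pd.DataFrame({
    "pixel_id": [1, 1, 2],
    "year": [2021, 2021, 2021],
    "scene_date": ["2021-07-06", "2021-07-01", "2021-07-01"],
    "tile": ["15TVG", "15TVG", "15TVG"],
    "scl": [4, 4, 5],
    "bands": [[1200] * 12, [1100] * 12, [900] * 12],
})
wide = to_one_row_per_pixel_season(scenes)
```

Sorting before grouping keeps every list column aligned in time; in Spark, `collect_list` does not guarantee order on its own, so an explicit sort matters there as well.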

APIs Brief Overview
In place of manually retrieving data, APIs make data acquisition & processing
significantly more scalable by providing a consistent interface to search for and
retrieve large amounts of data via web requests. We primarily rely on two APIs in this
demo for data acquisition.
• CDL API from NASS geo data (link)
• AWS STAC API for Sentinel-2 imagery searches.
– Sentinel-2 image raster data can be downloaded via web URL download links that we can access directly once they are known from the imagery search. Since these files are large and slow down processing, we do our best to minimize downloading any superfluous or low-impact scenes.
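As an illustration of the CDL API, the sketch below builds a request to the CropScape `GetCDLFile` web service and extracts the GeoTIFF download link from the XML reply. The endpoint URL, the `year`/`bbox` parameters, and the `<returnURL>` element reflect our understanding of the NASS CropScape developer guide and should be checked against it; the function names are hypothetical, and the bounds in the example are placeholders in the (left, bottom, right, top) EPSG:5070 format the nassgeodata API requires.

```python
import re
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the NASS CropScape developer guide
CDL_SERVICE = "https://nassgeodata.gmu.edu/axis2/services/CDLService/GetCDLFile"

def build_cdl_request_url(year, bounds):
    """bounds = (left, bottom, right, top) in EPSG:5070 meters."""
    bbox = ",".join(str(int(v)) for v in bounds)
    query = urllib.parse.urlencode({"year": year, "bbox": bbox}, safe=",")
    return f"{CDL_SERVICE}?{query}"

def parse_return_url(xml_text):
    """Extract the GeoTIFF link from the service's XML response (assumed <returnURL> tag)."""
    match = re.search(r"<returnURL>\s*(.*?)\s*</returnURL>", xml_text)
    if match is None:
        raise ValueError("no <returnURL> element in response")
    return match.group(1)

def fetch_cdl_geotiff(year, bounds, out_path):
    """Network call: request the clipped CDL raster and save it locally."""
    with urllib.request.urlopen(build_cdl_request_url(year, bounds)) as resp:
        tif_url = parse_return_url(resp.read().decode("utf-8"))
    urllib.request.urlretrieve(tif_url, out_path)
    return out_path

# Placeholder bounds, not one of the demo's AOIs
print(build_cdl_request_url(2021, (130783, 2203171, 153923, 2217961)))
```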

Area of Interest (AOI) & Boundary Creation
Typically, the only step needed before this part is to define AOIs. For this work, we used the nassgeodata web GUI to draw 7 boxes and export them as ESRI shapefiles, then converted them to bounds (left, bottom, right, top) in the EPSG:5070 CRS (as required by the nassgeodata API). The bounds are already provided in the CDL acquisition code.
Example Python code to get bounds from [zipped] ESRI shapefiles from NASSGEO:
```python
import geopandas as gpd
from pyproj import CRS
import pandas as pd

root_path = 'C:/Users/myname/Downloads/'
# List of file paths for ESRI shapefile boundaries (exported into zips from nassgeo)
paths = [root_path + 'CDL_12345.zip', root_path + 'CDL_6789.zip']
gdf_list = []  # Create a list to hold the GeoPandas dataframes
# Read each shapefile into a GeoPandas dataframe and append it to the list
for path in paths:
    gdf = gpd.read_file("zip://" + path)
    gdf_list.append(gdf)
# Concatenate all the dataframes into a single GeoPandas dataframe
combined_gdf = gpd.GeoDataFrame(pd.concat(gdf_list, ignore_index=True))
target_crs = CRS("EPSG:5070")  # Define the target CRS (EPSG:5070)
gdf_5070 = combined_gdf.to_crs(target_crs)  # Convert the GeoDataFrame to the target CRS
# Print each boundary as "left,bottom,right,top" in integer meters
print(gdf_5070.bounds.apply(lambda row: ','.join(map(str, map(int, row))),
                            axis=1).to_string(index=False))
```

CDL Acquisition Code Summary
The result of this first part is a spatially down-sampled version of the CDL for the user-specified AOIs and years.
• This code executes quite rapidly (a few minutes) and produces a Parquet table with a single 30 m pixel/year per row (with the associated CDL estimate).
• The table below summarizes the top 5 CDL categories across all AOIs and the % of the entire dataset per year that each CDL category represents. E.g., in 2021, Soybeans represented ~36.6% of the land cover.
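The down-sampling step can be as simple as drawing a random subset of pixels from each CDL raster and recording their center coordinates. This sketch assumes the raster is already loaded as a NumPy array of CDL codes with a known upper-left corner (`left`, `top`) in EPSG:5070 and 30 m pixels; the function names and the random (rather than regular-grid) sampling are illustrative assumptions, not necessarily what the demo code does. The second function produces per-year top-k category shares of the kind summarized in the table.

```python
import numpy as np
import pandas as pd

def sample_cdl_pixels(cdl, left, top, year, n, pixel_size=30.0, seed=0):
    """Randomly sample n distinct pixels from a CDL raster (2-D array of class codes)."""
    rng = np.random.default_rng(seed)
    flat = rng.choice(cdl.size, size=n, replace=False)
    rows, cols = np.unravel_index(flat, cdl.shape)
    return pd.DataFrame({
        "year": year,
        "x": left + (cols + 0.5) * pixel_size,  # pixel-center coordinates
        "y": top - (rows + 0.5) * pixel_size,
        "cdl_code": cdl[rows, cols],
    })

def top_category_share(samples, k=5):
    """Percent of sampled pixels in each of the k most common CDL codes, per year."""
    pct = samples.groupby("year")["cdl_code"].value_counts(normalize=True) * 100
    return pct.groupby(level="year").head(k)

# Synthetic 100 x 100 raster: CDL code 1 (corn) everywhere, code 5 (soybeans) in the top 60 rows
cdl = np.ones((100, 100), dtype=np.uint8)
cdl[:60] = 5
samples = sample_cdl_pixels(cdl, left=130_000.0, top=2_220_000.0, year=2021, n=1000)
shares = top_category_share(samples)
```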

Sentinel-2 Acquisition Code Results
Summary: For each pixel/year from the CDL acquisition code, this code acquires the associated Sentinel-2 data for that entire year and saves it in a Parquet table. Note that this part takes a long time to execute; it will time out the free version of Databricks after an hour.
“Duplicate” data have the same date & tile but, perhaps due to updated processing, slightly different values. These are left in the data for this work and removed in the data loader, but it is probably best to keep only the one with the largest number at the end of the tile (latest processing?).
When duplicate values are due to tile overlap, choosing one randomly can be fine.

…123 rows total available for this particular pixel/year. Each row includes the
band values for that location from a single scene/date.
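The deduplication strategy suggested above could look like the pandas sketch below. It assumes hypothetical columns `pixel_id`, `scene_date`, `tile`, and `proc_num`, where `proc_num` stands for the processing number at the end of the tile identifier; keeping the largest `proc_num` follows the slide's own tentative guess that it marks the latest processing.

```python
import pandas as pd

def drop_duplicate_scenes(scenes, seed=0):
    # 1) Same pixel, date and tile (reprocessed scene): keep the highest processing number
    latest = (scenes.sort_values("proc_num")
                    .drop_duplicates(["pixel_id", "scene_date", "tile"], keep="last"))
    # 2) Same pixel and date in overlapping tiles: keep one row chosen at random
    shuffled = latest.sample(frac=1, random_state=seed)
    return (shuffled.drop_duplicates(["pixel_id", "scene_date"], keep="first")
                    .sort_values(["pixel_id", "scene_date"])
                    .reset_index(drop=True))

scenes = pd.DataFrame({
    "pixel_id":   [1, 1, 1, 1],
    "scene_date": ["2021-07-01", "2021-07-01", "2021-07-01", "2021-07-06"],
    "tile":       ["15TVG", "15TVG", "15TWG", "15TVG"],
    "proc_num":   [0, 1, 0, 0],
    "B04":        [1000, 1004, 998, 1100],
})
deduped = drop_duplicate_scenes(scenes)
```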

Sentinel-2 Acquisition Code (Plotted)
[Figure: Example data from a single pixel/year of corn, including duplicated data. Annotations: example “duplicate” data (slightly different band values for the same scene); a clear error in classification (happens only once, during the middle of the growing season); another clear error in classification (happens only once, and this pixel is in a corn field, so there are no objects nearby casting a one-time shadow).]
Final Data Manipulation
A final rapid manipulation combines all available scenes into a single row for each pixel/year.
• Several columns (bands, tiles, img dates, scl_vals) are lists of values from the scenes for each row.
• E.g., a pixel with 123 scenes would have 123 × 12 = 1,476 values in the list in the “bands” column.
• Lists of values are converted to binary strings for efficient storage (eliminating the “list” datatype, and the commas, for every column except “tiles”).

[Figure panels: Final table; Example decoded binary columns]

Each row is a single 30 m CDL pixel from 1 year.
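Packing list columns into binary strings can be done with NumPy's `tobytes`/`frombuffer`. A minimal sketch, assuming band values fit in unsigned 16-bit integers and dates are stored as day counts since 1970-01-01; the fixed little-endian dtypes (`<u2`, `<i4`) keep the bytes portable across machines. The function names are hypothetical, not the demo's actual helpers.

```python
import numpy as np

N_BANDS = 12

def encode_bands(per_scene_bands):
    """(n_scenes x 12) band values -> bytes (2 bytes per value, little-endian)."""
    return np.asarray(per_scene_bands, dtype="<u2").tobytes()

def decode_bands(blob, n_bands=N_BANDS):
    """Bytes -> array of shape (n_scenes, n_bands)."""
    return np.frombuffer(blob, dtype="<u2").reshape(-1, n_bands)

def encode_dates(dates):
    """ISO date strings -> bytes of int32 day counts since 1970-01-01."""
    days = np.asarray(dates, dtype="datetime64[D]").astype("int64")
    return days.astype("<i4").tobytes()

def decode_dates(blob):
    """Bytes -> array of datetime64[D]."""
    return np.frombuffer(blob, dtype="<i4").astype("int64").astype("datetime64[D]")

bands = np.arange(123 * N_BANDS).reshape(123, N_BANDS)  # 123 scenes x 12 bands
blob = encode_bands(bands)
print(len(blob))  # 2952 bytes = 123 * 12 values * 2 bytes
```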

Part 1:
Summary
Summary
• APIs allow us to automate and scale very large data processing pipelines in
preparation for analysis and model building.
• Storing data in Parquet format and using Spark/Databricks to query/pivot or
manipulate the data enables rapid investigation and transformation
– The Parquet format has useful abstractions like partitions, which are stored as directories
• A convenient form for modeling time-series imagery data involves storing in
parquet table format, with each row representing a pixel for a given time interval
and having columns of:
– Band values, scene dates, scene classification values over that time interval
– Scalars for lat, lon representing the center point of the pixel (could substitute
an Uber H3 hex or Google S2 cell instead)
– A prediction target (ground truth)

Looking Ahead to Part 2

• Process data to prepare for model training using TensorFlow


• Properly split the data into train/val/test splits to avoid “data-leakage”
• Convert the irregularly-spaced time-series imagery into bucketed time-series to
prepare for model training
• Modify the CDL labels to align with our training goals

Homework and Certificates

• Homework:
– One homework assignment
– Opens on March 19
– Access from the training webpage
– Answers must be submitted via Google Forms
– Due by April 1

• Certificate of Completion:
– Attend all three live webinars (attendance is recorded automatically)
– Complete the homework assignment by the deadline
– You will receive a certificate via email approximately two months after
completion of the course.

Contact Information

Trainers:
• John Just (John Deere)
  [email protected]
• Erik Sorensen
  [email protected]
• Sean McCartney
  [email protected]

• ARSET Website
• Follow us on X (formerly Twitter)! @NASAARSET
• ARSET YouTube

Visit our Sister Programs:
• DEVELOP
• SERVIR

Questions?

• Please enter your questions in the Q&A box. We will answer them in the order they were received.
• We will post the Q&A to the training website following the conclusion of the webinar.

https://earthobservatory.nasa.gov/images/6034/pothole-lakes-in-siberia

Thank You!

