Interapro 2019 Bids


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/331524023

ADVANCES IN INTERACTIVE PROCESSING AND VISUALISATION WITH


JUPYTERLAB ON THE JRC BIG DATA PLATFORM (JEODPP)

Conference Paper · February 2019



All content following this page was uploaded by Davide De Marchi on 05 March 2019.



ADVANCES IN INTERACTIVE PROCESSING AND VISUALISATION WITH JUPYTERLAB
ON THE JRC BIG DATA PLATFORM (JEODPP)

Davide De Marchi and Pierre Soille

European Commission, Joint Research Centre (JRC)


Directorate I. Competences, Unit I.3 Text Data Mining, via Fermi 2749, 21027 Ispra (VA), Italy

ABSTRACT

The JRC Big Data Platform (JEODPP) is serving JRC projects and their partners for any big data application with emphasis on geospatial data. It has recently evolved into a multi-petabyte scale platform offering advanced web-enabled services for container-based batch processing, remote desktop, and interactive analysis and visualization. This latter service, based on Jupyter, has recently unfolded into a complete and powerful prototyping environment. This paper describes the most significant advances in this area.

Index Terms: deferred processing, Sentinel, Copernicus, visualisation, Jupyter, JupyterLab, IPython

1. INTRODUCTION

The results presented in this paper are implemented on the JRC Big Data Platform (JEODPP) [4]. This platform serves the needs of JRC policy support activities requiring big data capabilities with emphasis on geospatial data as well as any data sources associated with a geolocation (news articles, official statistics, pictures, etc.). The platform is implemented on a commodity hardware solution scalable to the multi-petabyte scale. This is achieved through distributed storage (CERN EOS) coupled with a cluster of computing nodes for distributed computing. The JEODPP platform can be viewed as a three-layer pyramid with a multi-petabyte scale storage and processing basis. The first layer accommodates massive batch processing. The second layer provides a remote desktop environment with all software needed for further developing legacy applications. The tip of the pyramid (third layer) provides interactive visualization and analysis in a web-based environment integrated in a Jupyter notebook [1].

This paper concentrates on the advances regarding the interactive analysis and visualization layer. The following aspects are detailed: evolution from Jupyter to JupyterLab, availability of new data collections, the possibility to execute arbitrary Python code, and applications for users without programming capabilities exploiting the temporal dimension of geospatial data cubes.

2. FROM JUPYTER TO JUPYTERLAB

JupyterLab¹ is an evolution of Jupyter released to users in February 2018. It provides a series of features improving the user experience, such as the use of text editors, terminals, data file viewers, and other custom components side by side with notebooks in a tabbed work area [2]. In particular, it makes it possible to redirect the map view area to a side map.

¹ JupyterLab: https://jupyterlab.readthedocs.io

3. DATA COLLECTIONS

Since its inception, the JEODPP platform has been characterized by providing users with a wide variety of raster and vector geospatial datasets. Complex raster collections such as those originating from the Sentinel-1 and Sentinel-2 satellites are available alongside continental or global open-source DEMs (EU-DEM, SRTM, GEBCO, MERIT, etc.). They can be interactively visualised together with vector datasets such as NATURA2000, EFFIS forest fires, administrative territorial units, and land use-land cover at European level (Corine Land Cover and Urban Atlas). New datasets are continuously integrated in the interactive mode. This is for instance the case of the new global DEM at a resolution of 30 meters produced by the Japan Aerospace Exploration Agency (JAXA) and called "ALOS World 3D AW3D30" [6]. The deferred mode processing functions of the JEODPP APIs make it possible to obtain high-impact visualizations such as that presented in Fig. 1, which displays the coloured hill-shading of the ALOS DEM with a custom colour scheme.

Other datasets recently made available in the interactive mode are the Sentinel-1 global mosaic [5] and a cloud-free Sentinel-2 global mosaic calculated from images acquired in 2017 [3]. No less important is the availability of many basemaps that can be used as background of views, ranging from the classic OpenStreetMap and OpenTopoMap up to MODIS data with daily granularity, high resolution aerial images, and maps in neutral colors that better enhance the geographic content superimposed on them. The selection of the basemap to use or of the dataset to be displayed takes place in a very simple way that allows the user to view the datasets
available in a tree structure, easily navigable and searchable using keywords as well as the auto-completion functions available in the Python language. Finally, place names originating from CartoDB² can be overlaid on the displayed layer.

² CartoDB: https://carto.com

Fig. 1: ALOS Global Digital Surface Model "ALOS World 3D 30m (AW3D30)" rendered on-the-fly on the JEODPP with a colored hill-shading whose parameters are user-defined.

4. EMBEDDING PYTHON CODE IN THE TILE ENGINE

At the base of the interactive component of JEODPP there is a library developed in C++ that represents the real heart of the Tile Engine, a highly parallelized component that creates, in real time and in deferred mode, the raster tiles to send to the visualization based on the ipyleaflet widget³.

³ Ipyleaflet: https://github.com/jupyter-widgets/ipyleaflet

After selecting the datasets to be displayed, processing chains transforming and processing the input data in order to extract the desired information are defined. The basic elements of these chains are a series of processing steps that have been implemented within the Tile Engine through the integration of open source processing libraries or libraries developed over the years at the JRC. These libraries (mialib and pktools, among others) provide the main functions necessary for the processing of geographical data and allow for the creation of complex processing chains based on morphological operators, image classification and segmentation functions, band arithmetic, and image filtering in space and time.

Although the list of processing steps is rather extensive and comprehensive, the need has emerged for adding user-defined functions. In a language like Python, dozens of very efficient libraries manipulating images are available (e.g., Numpy, Scipy nd-image, Scikit-image, Python Imaging Library, and OpenCV). This need is addressed by a special processing step, called execute, which allows the user to send to the Tile Engine, which operates server-side, any function written in Python, potentially using such libraries.

Thanks to the Python inspect module, the source lines of the user function are read and sent to the C++ Tile Engine server, where a Python on-the-fly interpreter is instantiated. The code is then executed within the interpreter at each tile request. The function can freely modify the input image pixels to pass them to the next step of the processing chain.

As an example, Fig. 2 shows an application of the execute processing chain to the detection of stubble burning (the deliberate setting fire to the straw stubble that remains after wheat and other grains have been harvested) from Sentinel-2 images, based on a simple algorithm that takes as inputs the bands B04, B06, B08 and B11 (Short Wave Infra Red band).

Fig. 2: (Top left) Informal description of the Stubble Burning detection algorithm. (Top right) Python implementation of the algorithm using Numpy methods. (Bottom) Multi-band processing chain containing the execute step calling the custom Python function.

This new development is opening many new scenarios to the JEODPP users, who gain a completely new flexibility in the analysis and processing of geospatial datasets. Moreover, it will make it possible, in the near future, to benefit from the many Machine Learning libraries available in Python that could be injected inside the server-side processing chain to extract valuable information from EO data using artificial intelligence techniques.

4.1. Widget enabled applications

Researchers or scientists who possess programming capabilities can easily start using Python to interact with the viewing and analysis environment within Jupyter notebooks. Despite this, the need has emerged to create simpler tools that would allow users to work with geographic datasets through a graphical user interface. This is even more true for manager-level users and policy officers having no or very little programming knowledge. For these users, an interface exploiting the analysis/viewing capabilities without the need to write code is needed. Within the Jupyter world, this can be effectively and efficiently enabled by using the ipywidgets suite⁴ for the creation of user interfaces, together with components such as Bqplot⁵ for charting functions and Qgrid⁶ for displaying alphanumeric data in rows with intuitive scrolling, sorting, and filtering controls.

⁴ ipywidgets: https://ipywidgets.readthedocs.io/en/stable/
⁵ Bqplot: https://github.com/bloomberg/bqplot
⁶ Qgrid: https://qgrid.readthedocs.io/en/latest

The first fully fledged application implemented, called the S2-Explorer, is devoted to easy browsing and searching of Sentinel-2 products. It consists of a tabbed interface containing many functions: searching for products covering the current view extents, displaying one or multiple products on the map, selecting among many possible RGB band compositions, applying local stretching to the product visualization, and filtering the searched products on multiple criteria (for instance cloud cover, product type, or acquisition dates). Input capabilities allow the users of S2-Explorer to easily add a custom vector shapefile to the map and use it to select the Sentinel products covering its features. Many export functions are available, e.g. export of the list of the selected Sentinel-2 products to be used for batch processing operations, export of the map view to a high resolution TIFF file, and creation of an animation video that displays all the filtered products one after the other. The measure and draw tabs can be used to measure distances and areas on the map and to create vector features, with the possibility to save them in a vector format. Figure 3 shows a snapshot of the S2-Explorer in action.

Fig. 3: S2-Explorer notebook in action. Loading of a custom vector shapefile on the map, search of the Sentinel-2 products that cover it and filtering on cloud cover percentage.

5. EXPLOITING THE TEMPORAL DIMENSION

The Copernicus constellation of EO satellites is characterized by the high temporal resolution of its acquisitions. As an example, after the launch of the Sentinel-2B satellite, all emerged lands are covered by a new optical product at least every five days. This opens the way to many applications where the exploitation of the short revisit time allows for important analytical results, whether for agricultural, forest, or disaster monitoring applications. The JEODPP S2-Explorer leverages the temporal dimension of Sentinel products by providing a series of analysis tools based on multi-temporal data.

5.1. Temporal profiles

One of the tabs of the S2-Explorer tool is dedicated to the extraction of multi-temporal information over the pixels covered by polygonal features. After having searched and filtered the Sentinel-2 products, users can ask the system to calculate, for each of the products, the mean value of an index (e.g., NDVI, EVI, or SAVI) or of a band, together with its standard deviation, and plot both values on a temporal graph where the line series represents the mean value inside the polygon and the vertical bars the homogeneity measured by the standard deviation (see Fig. 4). This function has many applications and can also be activated on a custom edited polygon. It should be noted that the extraction is done server-side on the highly parallel cluster and gives the results in a few seconds even in case of dozens or hundreds of input products. Once the desired measurement is determined, it can be scaled to datasets of any size via the batch processing service.

Fig. 4: Extraction of the multi-temporal NDVI profile over the pixels inside a polygon by the S2-Explorer tool.

5.2. Easy comparison using the split map control

The ability to easily compare two different views of the same area, whether based on two different datasets or on two different processing chains applied to the same data, is a fundamental capability of any geospatial data viewer. It can be used for comparing two complementary datasets (like a DEM and an EO image, or a basemap and a vector dataset, etc.) or two
acquisition dates from the same sensor. This is implemented thanks to a new function available on ipyleaflet: the split map control. When activated, the tool displays a vertical line at the center of the map and the two datasets on each side of the line. The user can move the line horizontally to quickly compare the left and right map on the same geographic area. This is illustrated in Fig. 5 in the case of forest fires by comparing the last Sentinel-2 product acquired before the event with the first acquired after it.

Fig. 5: Temporal comparison of two SWIR RGB compositions, from Sentinel-2 images acquired before and after the 2018 Mati fires in Greece.

5.3. Georeferenced temporal video

Recent versions of ipyleaflet allow for the overlay of videos on top of the map. We are using this new function to create georeferenced videos starting from the multi-temporal Sentinel-2 products acquired over the same area. The videos can be played directly over the map (even while zooming and panning) and give an interesting insight into the evolution of the land over time, with possible applications in agriculture (like the evaluation of harvesting times) or in forest monitoring (e.g. control of illegal deforestation activities).

Fig. 6: Video overlay of a georeferenced temporal video showing one after the other all the Sentinel-2 images acquired over an area within a selected time period.

6. OUTLOOK

Batch processing and remote desktop capabilities are fundamental to any platform aiming at extracting insights from massive datasets. Interactive analysis and visualization are also essential to leverage the increasing number and diversity of the available datasets. They help users to discover new datasets and combine them for prototyping new information extraction workflows. In this respect, the possibility to execute arbitrary code opens an avenue for countless possibilities and in particular has unfolded the interactive analysis and visualisation layer into a complete and powerful prototyping environment. In addition, the code used in the interactive mode is decoupled from the complexity of the visualisation engine so that it can be directly used for batch processing, for which precise analysis can be performed in any desired projection. On the other hand, higher level interfaces based on widgets are increasing the outreach and impact of geospatial data by attracting users with no programming skills. Future developments of the JRC Big Data Platform include the integration of machine learning in all layers of the platform while expanding the variety of the available datasets, including those not related directly to geospatial data, in support of decision and policy making.

7. REFERENCES

[1] De Marchi, D. et al. "Interactive visualisation and analysis of geospatial data with Jupyter". In: Proc. of the BiDS'17. 2017, pp. 71–74. DOI: 10.2760/383579.
[2] Granger, B. and Grout, J. "JupyterLab: Building Blocks for Interactive Computing". Slides of presentation made at SciPy'2016. 2016. URL: http://archive.ipython.org/media/SciPy2016JupyterLab.pdf.
[3] Kempeneers, P. and Soille, P. "Optimizing Sentinel-2 image selection in a Big Data context". Big Earth Data 1.1–2 (2017), pp. 145–158. DOI: 10.1080/20964471.2017.1407489.
[4] Soille, P. et al. "A Versatile Data-Intensive Computing Platform for Information Retrieval from Big Geospatial Data". Future Generation Computer Systems 81.4 (Apr. 2018), pp. 30–40. DOI: 10.1016/j.future.2017.11.007.
[5] Syrris, V. et al. "Mosaicking Copernicus Sentinel-1 data at global scale". IEEE Transactions on Big Data (2018). DOI: 10.1109/TBDATA.2018.2846265.
[6] Takaku, J. and Tadono, T. "Quality updates of AW3D global DSM generated from ALOS PRISM". In: 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). July 2017, pp. 5666–5669. DOI: 10.1109/IGARSS.2017.8128293.
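
The temporal profile computation described in Section 5.1 can be illustrated with a minimal sketch. It assumes that NDVI is computed per pixel as (NIR - red) / (NIR + red), which for Sentinel-2 corresponds to bands B08 and B04. The function names, the input layout, the sample reflectances, and the choice of the population standard deviation are assumptions made for illustration only; they are not the JEODPP API, which performs this extraction server-side.

```python
from statistics import mean, pstdev


def ndvi(red, nir):
    """Normalised Difference Vegetation Index for one pixel."""
    total = nir + red
    # Guard against a zero denominator (e.g. no-data pixels).
    return (nir - red) / total if total else 0.0


def temporal_profile(products):
    """Return (date, mean NDVI, standard deviation) per product, sorted by date.

    `products` is a hypothetical layout: it maps an acquisition date to the
    (red, nir) reflectance pairs of the pixels falling inside the polygon.
    """
    profile = []
    for date in sorted(products):
        values = [ndvi(red, nir) for red, nir in products[date]]
        profile.append((date, mean(values), pstdev(values)))
    return profile


# Made-up reflectances for two acquisition dates (illustration only).
sample = {
    "2018-07-10": [(0.10, 0.50), (0.10, 0.30)],
    "2018-05-01": [(0.20, 0.20), (0.20, 0.20)],
}

for date, avg, std in temporal_profile(sample):
    print(f"{date}  mean={avg:.3f}  std={std:.3f}")
```

Each tuple corresponds to one point of the temporal graph: the mean gives the line series and the standard deviation the length of the vertical bar, so a homogeneous polygon yields short bars.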
