Computer Vision News - October 2021
In this issue:

Computer Vision News
- Computer Vision Research: Removing Diffraction Image Artifacts in Under-Display Camera via Dynamic Skip Connection Network
- Coding Workshop: Creating a multi-object tracking model using a pre-trained RNN
- Spleenlab: AI Systems for Autonomous Mobility
- RePAIR Pompeii: AI for Archeology

Medical Imaging News
- Best of MICCAI 2021
- Transformers in Medical Imaging, by NVIDIA and MONAI
- ICCV Workshop Preview: Computer Vision for Automated Medical Diagnosis

From the Best of MICCAI 2021 cover feature on UNETR: “…encoder to increase the model’s capability for learning long-range dependencies and effectively capturing global contextual representation at multiple scales. For instance, in the multi-organ segmentation task, UNETR can accurately segment organs with complex shapes (e.g. adrenal glands) and low contrast (e.g. portal veins)…” See the feature inside for qualitative comparisons between UNETR and other CNN-based and transformer-based segmentation models, and for its results on volumetric tasks such as multi-organ segmentation using the Multi Atlas Labeling Beyond The Cranial Vault (BTCV) dataset.
Computer Vision Research
Removing Diffraction Image Artifacts in Under-Display
Camera via Dynamic Skip Connection Network
by Marica Muffoletto
Dear readers, welcome back to a new issue of Computer Vision
News full of great research!
This month we are reviewing a paper from CVPR 2021, Removing
Diffraction Image Artifacts in Under-Display Camera via
Dynamic Skip Connection Network, written by Ruicheng Feng,
Chongyi Li, Huaijin Chen, Shuai Li, Chen Change Loy, Jinwei Gu.
We are indebted to the authors for allowing us to use their
images to illustrate this review. Their paper can be found here.
If you like to play around with the quality of cameras in your tech devices, this
might become your favourite CVPR paper of the year. The subject of this research
is indeed how to remove image artifacts in the newly defined imaging system
called Under-Display Camera (UDC), which is employed in some smartphones,
TVs for videoconferencing, laptops, and tablets. UDC introduces a new class of
complex image degradation problems (strong flare, haze, blur, and noise), which
still need to be satisfactorily dealt with by the computer vision community. A typical
UDC system consists of a camera module placed underneath and closely
attached to the semi-transparent Organic Light-Emitting Diode (OLED) display.
The first contribution consists in the formulation of an image formation model for
UDC systems which considers dynamic range and saturation and could simulate
more complex and realistic degradation compared to the State-of-the-Art.
ŷ = φ[C(x ∗ k + n)]

where x represents the real scene irradiance that has a high dynamic range (HDR), k
is the known convolution kernel (PSF), ∗ denotes the 2D convolution operator, and
n models the camera noise. C(·) is a clipping operation with a set threshold and φ(·)
is a non-linear tone mapping. These two elements add a saturation effect derived
from the limited dynamic range of digital sensors and make the model closer to the
human perception of the scene.
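As a minimal sketch of this formation model in code (assuming a single-channel HDR image, Gaussian noise, and a simple gamma curve standing in for φ, none of which are necessarily the paper's exact choices):

import numpy as np
from scipy.signal import fftconvolve

def simulate_udc(x, psf, noise_std=0.01, sat_level=1.0, gamma=2.2):
    # y_hat = phi[C(x * k + n)]: convolve the HDR scene with the PSF,
    # add sensor noise, clip at the saturation threshold, then tone-map.
    blurred = fftconvolve(x, psf, mode="same")                   # x * k
    noisy = blurred + np.random.normal(0.0, noise_std, x.shape)  # + n
    clipped = np.clip(noisy, 0.0, sat_level)                     # C(.)
    return (clipped / sat_level) ** (1.0 / gamma)                # phi(.)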
The second element of the paper is the definition of the PSF. This can be simulated,
but the authors found that the real, measured PSF differs slightly in colour and
contrast due to model approximations and manufacturing imperfections.
Hence, the authors measure the real-world PSF by placing a white point light
source 1 meter away from the OLED display. The PSF is used as part of a
model-based data synthesis pipeline to generate realistic degraded images.
To do this, the objects considered are real scenes with high dynamic range. This is
essential because 1) the spike-shaped sidelobes (typical of the PSF) can be amplified
to be visible (flares) in the degraded image, and 2) due to the high dynamic range
of the input scene, the digital sensor (usually 10-bit) will inevitably get saturated in
real applications, resulting in an information loss.
Hence, images captured by UDC systems in real scenes will exhibit structured flares
near strong light sources. The previous imaging system, however, cannot model
this degradation, because it captures images displayed on an LCD monitor, which
commonly has limited dynamic range.
This is shown below, where it is demonstrated that the real HDR scene captured by the
UDC device (b) shows flare effects near strong light sources, while for the monitor-
generated LDR scene (c) with a limited dynamic range, these are no longer visible.
The restoration itself happens within a main restoration branch, which builds upon
an encoder-decoder architecture with skip connections to restore the degraded
images. The
extracted features from the encoder are fed into DISCNet which transforms them
into R1, R2, R3. These are then reconstructed back to the final tone-mapped sharp
images.
The network is fed with condition maps of size H × W × (b + C), where b stands for the
kernel code (a b-dimensional vector of the PSF, dimensionally reduced by Principal
Component Analysis) and H, W, and C represent the height, width, and number of
channels of the degraded images, respectively.
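A hypothetical sketch of assembling such a condition map in PyTorch (channels-first, so (b + C) becomes the channel dimension; the function and variable names are ours, not the paper's):

import torch

def build_condition_map(degraded, kernel_code):
    # degraded: (N, C, H, W) degraded image batch
    # kernel_code: (b,) PCA-reduced PSF vector
    n, c, h, w = degraded.shape
    b = kernel_code.numel()
    # Tile the b-dimensional kernel code over the spatial grid: (N, b, H, W)
    code_map = kernel_code.view(1, b, 1, 1).expand(n, b, h, w)
    # Concatenate along the channel axis -> (N, b + C, H, W)
    return torch.cat([code_map, degraded], dim=1)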
Given the condition maps as input, the condition encoder extracts scale-specific
feature maps H1, H2, H3 using 3 blocks like the encoder of the restoration branch.
This manages to recover saturated information from nearby low-light regions in the
degraded images with spatial variability. Then, the extracted features at different
scales are fed into their corresponding filter generators, where each comprises a
3 × 3 convolution layer, two residual blocks, and a 1 × 1 convolution layer to expand
feature dimension. The predicted filters are then passed into a dynamic
convolution element which finally refines the features and casts them into the
main restoration branch.
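A hypothetical PyTorch sketch of one such filter generator (channel widths and module names are assumptions; only the 3 × 3 conv, two residual blocks, and 1 × 1 conv structure comes from the paper):

import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class FilterGenerator(nn.Module):
    # 3x3 conv -> two residual blocks -> 1x1 conv that expands the
    # feature dimension to one k x k filter per spatial location.
    def __init__(self, in_ch, mid_ch=64, k=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, padding=1),
            ResBlock(mid_ch),
            ResBlock(mid_ch),
            nn.Conv2d(mid_ch, k * k, 1),
        )

    def forward(self, cond_feat):
        return self.net(cond_feat)  # (N, k*k, H, W): per-pixel dynamic filters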
Similarly, comparisons with representative methods are shown for the real images
below. The network proposed by the authors manages to remove diffraction image
effects, while introducing the fewest camera artifacts. Since the ground-
truth images are inaccessible, another comparison is made with the camera output
of a ZTE phone.
PyTorch code and data from this paper are available on GitHub. Best of luck with
making your images look great!
See you all next month!
Coding Workshop: Creating a multi-object tracking
model using a pre-trained RNN
This month’s article is about using a pre-trained network (you can find the
implementation on the MOT Challenge GitHub).
MOT Challenge
There has been remarkable progress in recent years on object detection and
association, the two core components of multi-object tracking. The main focus,
though, has been on improving each network individually. In this example (see the
original paper to read more about the topic), the proposed architecture (a
pre-trained network) combines the two tasks in a single network to improve the
inference speed.
Older research has shown degraded results in this combined network, mainly
because the association branch is not appropriately learned. Here, after discovering
the reasons behind the failure, a simple baseline was presented to address the
problems. It was shown to remarkably outperform the state of the art on
the MOT challenge datasets at 30 FPS. Hopefully, this baseline can inspire and
help evaluate new ideas in this field. Now let’s see the implementation of the
pre-trained network!
import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt

# Modules from the root of the Mask R-CNN repository that the
# rest of this walkthrough relies on
import coco
import utils
import model as modellib
import visualize

%matplotlib inline
Configuration files
We'll be using a model trained on the MS-COCO dataset. The configurations of this
model are in the CocoConfig class in coco.py.
For inference, modify the configurations a bit to fit the task. To do so, sub-class
the CocoConfig class and override the attributes you need to change.
class InferenceConfig(coco.CocoConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = InferenceConfig()
config.display()
Configurations:
BACKBONE_SHAPES [[256 256]
[128 128]
[ 64 64]
[ 32 32]
[ 16 16]]
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 1
BBOX_STD_DEV [ 0.1 0.1 0.2 0.2]
DETECTION_MAX_INSTANCES 100
DETECTION_MIN_CONFIDENCE 0.5
DETECTION_NMS_THRESHOLD 0.3
GPU_COUNT 1
IMAGES_PER_GPU 1
IMAGE_MAX_DIM 1024
IMAGE_MIN_DIM 800
IMAGE_PADDING True
IMAGE_SHAPE [1024 1024 3]
LEARNING_MOMENTUM 0.9
LEARNING_RATE 0.002
MASK_POOL_SIZE 14
MASK_SHAPE [28, 28]
MAX_GT_INSTANCES 100
MEAN_PIXEL [ 123.7 116.8 103.9]
MINI_MASK_SHAPE (56, 56)
NAME coco
NUM_CLASSES 81
POOL_SIZE 7
POST_NMS_ROIS_INFERENCE 1000
POST_NMS_ROIS_TRAINING 2000
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE 2
RPN_BBOX_STD_DEV [ 0.1 0.1 0.2 0.2]
RPN_TRAIN_ANCHORS_PER_IMAGE 256
STEPS_PER_EPOCH 1000
Create Model and Load Trained Weights
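A minimal sketch of this step, following the conventions of the Mask R-CNN demo this walkthrough is based on (ROOT_DIR, MODEL_DIR, and the mask_rcnn_coco.h5 weights path are assumptions you may need to adapt):

# Root directory of the repository, log directory, and weights path
ROOT_DIR = os.getcwd()
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")

# Download the COCO-trained weights if they are not present yet
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

# Create a model object in inference mode and load the trained weights
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)
model.load_weights(COCO_MODEL_PATH, by_name=True)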
Class Names
The model classifies objects and returns class IDs, which are integer values that
identify each class. Some datasets assign integer values to their classes and some
don't. For example, in the MS-COCO dataset, the 'person' class is 1 and 'teddy bear'
is 88. The IDs are often sequential, but not always. The COCO dataset, for example,
has classes associated with class IDs 70 and 72, but not 71.
To get the list of class names, you'd load the dataset and then use the
class_names property, like this:
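A hedged sketch of that lookup (COCO_DIR, the path to a local copy of the dataset, is an assumption):

# Load the COCO dataset just to read its class names
dataset = coco.CocoDataset()
dataset.load_coco(COCO_DIR, "train")
dataset.prepare()
print(dataset.class_names)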
You won't need to download the COCO dataset just to run this demo! We are including
the list of class names below. The index of the class name in the list represents its ID
(first class is 0, second is 1, third is 2, ...etc.)
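Here is that list, as it appears in the Mask R-CNN demo for the MS-COCO classes ('BG' is the background class at index 0):

class_names = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
               'bus', 'train', 'truck', 'boat', 'traffic light',
               'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird',
               'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear',
               'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
               'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
               'kite', 'baseball bat', 'baseball glove', 'skateboard',
               'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
               'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
               'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog',
               'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
               'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse',
               'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
               'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase',
               'scissors', 'teddy bear', 'hair drier', 'toothbrush']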
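Before running detection we need an image; as in the Mask R-CNN demo, we can pick a random one from the repository's images folder (IMAGE_DIR is an assumption):

# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR, "images")

# Load a random image from the images folder
file_names = next(os.walk(IMAGE_DIR))[2]
image = skimage.io.imread(os.path.join(IMAGE_DIR, random.choice(file_names)))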
# Run detection
results = model.detect([image], verbose=1)

# Visualize results
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                            class_names, r['scores'])
Processing 1 images
image          shape: (476, 640, 3)       min:    0.00000  max:  255.00000
molded_images  shape: (1, 1024, 1024, 3)  min: -123.70000  max:  120.30000
image_metas    shape: (1, 89)             min:    0.00000  max: 1024.00000
Wrapping up!
I hope that I inspired you to look more into the world of pre-trained networks and
discover how easy it is to implement a new network, using a pre-trained model!
As always, please let me know if you have any questions or suggestions for the
article! It would be great to hear more from you and what tools you would like to
see presented; feel free to reach out to me on any of our social media channels!
Spleenlab: AI Systems for Autonomous Mobility
Spleenlab.ai is a pure AI software company, specializing in safe and intelligent AI systems for autonomous mobility, including unmanned aerial vehicles (UAVs) and autonomous driving. Their vision is to be an AI software supplier for air taxis, which are a composition of both UAVs and autonomous vehicles. Stefan Milz is Head of R&D, Managing Director and Founder of Spleenlab GmbH. He received his PhD in Physics from the Technical University of Munich and has a strong history in professional software development and automotive. Stefan tells us more about Spleenlab.

To date, AI research has tended to focus on achieving higher accuracy, better models, and more trustworthy or explainable AI. But for Spleenlab, working in the field of autonomous mobility, safety is king.

The domains of UAVs, autonomous driving, and airborne systems are all regulated by entities such as EASA in Europe and the FAA in the United States. There are important industrial standards such as ISO 26262 in the automotive domain and DO-178 for software in airborne systems, as well as new standards emerging in the UAV domain. Systems are expected to have a fully deterministic behaviour.

“We believe that people need extreme safety,” Stefan tells us. “It’s important to foresee what state your system can achieve in an automated scene. We want to have a transparent AI,” but Stefan says a new standard that explains its deepest behaviors is still many years away, while models are being sold and deployed in the real world now. That is why Spleenlab is focused first and foremost on safety by design.

Landing prediction

Uncertainty is bad. People want to know they are safe in their cars and that drones will not hit other aircraft. But why is there uncertainty in the system? The models are trained on data and validated on data, with millions of parameters in the system that are deterministic, but some methods at inference time, like dropout and pruning, make the system statistical. A statistical system always has uncertainty.

“Take autonomous driving, for example,” Stefan says.

Take the example of an automated drone on a package delivery which encounters an emergency. Its battery has dropped to a critical level. The system has a rest range of 100 meters. There is no known emergency landing point in the area. What does it do?
“A computer vision model can predict some good landing spots,” Stefan explains.

“The best method to choose is an AI model. Semantic segmentation can semantically parse the scene and find the best spots. We know the model is 99 per cent accurate, but we still don’t want to hit anyone on the head, so we take those landing spots and put a safety goal around them. We use a second sensor from either the thermal imager or a LiDAR sensor and we look at that predicted spot and validate if it is really safe. With the thermal imager we can validate it in the sense of, are there people, animals, or vehicles there? Those are the most critical things we want to avoid. With the LiDAR sensor, we can estimate if the scene is geometrically flat and if there is any dynamic object in sight. With two deterministic passes to validate it, we can then say, okay, this spot is safe.”

This idea is not fully new in the automated domain, but it has not often been combined with AI. The model must be validated with a large amount of labeled data, which is expensive. Deploying a model in a different domain, a different country for example, requires domain adaptation, which significantly lowers the cost.

“We have a simulation engine where all the flights and the labels can be generated,” Stefan tells us.

“We also have a lot of labeled data from companies and are working on how we can transfer this data to different domains. We call it cross-domain AI. That is our core product. We have been working on a pure software product called Visionairy perception software, which has several features for UAVs and aviation.”
Other generic products the team are working on for the UAV market include a detect and avoid function for detecting manned aircraft, which heavily needs a vision component, so it is not solved yet. They also have emergency landing functionality and tracking functions in the pipeline, and are working on future architecture together with big air taxi manufacturers like Volocopter and DLR.

There have been some high-profile concerns about the use of AI in everyday life, including fears that the technology could be exploited by bad actors. Does Stefan think this could be an issue for Spleenlab?

“At the moment, we are in a very early stage. We don’t see any potential issues yet, but it’s an important question that we have to keep asking ourselves.”

Stefan tells us the most difficult part of this work is the airborne certification and the proof of safety in terms of flight hours or driving hours. Even with safety by design or a deterministic approach, that is still necessary.

Many companies claim to have the best drone or UAV on the market, and that they will solve problems like package delivery in the near future, but there is one very important thing that is not solved yet.

“It’s the positioning system,” Stefan points out.

“You need a GPS redundant positioning system, which is safe, but if you want simply to fly near a population then you have to validate your system with a single point of failure, which has the probability of one divided by 10,000. This is the number of flight hours you have to do, and you have to do a strong validation of your system. This is a big opportunity for us because we know there are many companies who want to do that, but it’s not solved yet. I believe there is a long way to go before we see package delivery outside of a test field, so we want to solve an easier approach first – pipeline inspection where no population is nearby. You still have to show that your system is safe if you want to fly far with the pilot out of the loop. That is called BVLOS – beyond visual line of sight. This is not yet solved for certification for most of the use cases in North America and Europe.”

Spleenlab have some very exciting plans for next year and beyond. As a software company, they collaborate with manufacturers to bring AI to their products, and at the beginning of next year will be launching their simple follow me functions up in the air with drone manufacturer Quantum-Systems.

By the end of next year, Stefan says he hopes to see Spleenlab’s detect and avoid system in drones. They are also working on automatic inspection of cell towers, with the AI looking for the cell tower, flying a drone around it while collecting inspection data, and then bringing it back. This will save money for customers who want to automate the inspection process.

Spleenlab are currently 15 people, and they are hiring.

“We are looking for computer vision engineers, deep learning engineers, and PhD applicants with a focus on SLAM perception and sensor processing. Come join us!”
RePAIR Pompeii: AI for Archeology

Marcello and his team are not the first to attempt to reconstruct frescoes, but the key differences between this and past approaches are the sheer size of the frescoes, and the fact that ultimately, they are going to physically reconstruct them.

“We will build a robot and use soft robotics,” Marcello explains.

“Archaeologists were initially terrified by the idea of a robot holding these pieces because if they break one, it risks losing something very precious. So, once the puzzle has been solved, the information is given as input to the robotic platform and the robot uses soft-hand technologies to take and manipulate the pieces very delicately. You can control the pressure and make sure that it is just right for the piece. We’re going to use the experts in a kind of interactive way, to tell us whether the solution proposed by the robot is plausible or not.”

Finally, the robot will put the pieces together, but just next to each other. Expert restorers at Pompeii will take charge from there. It is a very delicate process to put the pieces physically back together, using techniques that cannot be incorporated into a simple robotic system.
The team will soon be launching a dedicated website that describes the work. It is a Horizon 2020 project, under the Future and Emerging Technologies (FET) Open program, which is an extremely selective call that gives very few proposals the green light. Marcello and the University of Venice are the project coordinators, with other partners including the Ben-Gurion University of the Negev in Israel, the Italian Institute of Technology in Italy, the University of Bonn in Germany, Instituto Superior Técnico in Portugal, and the Archaeological Park of Pompeii, which is one of the biggest archaeological parks in the world.

“We are introducing a revolutionary technology in archaeology,” Marcello says proudly.

“My archaeologist friends tell me that if we succeed, as we hope, then this will be a huge breakthrough in their field. When you have an object that has broken into thousands or even tens of thousands of pieces, it’s just hopeless to think that any human team can solve it. Actually, in Pompeii they did try, but had to give up in the end.”

Marcello hopes the technology they propose will be able to be used by other museums with broken frescoes, as well as exported to other domains.

“There are other problems, such as reconstructing papyri, vases, or other kinds of broken artifacts,” he adds. “We hope our technology will turn out to be useful when the scale of the problem is unmanageable by humans!”
Hidden Stories of the Heart
To all readers of the magazine who live in London, this is a unique call if
you are looking for interesting plans for the second weekend of October!
Come join research students Marica (@maricaS8, King’s College London), Sophie (www.richtersophie.com, Royal College of Arts) and Elizabeth (@elizabetho157, Royal College of Arts), at the Science Museum on the 9th-10th of October to explore the new installation Hidden Stories of the Heart.
SUBSCRIBE!

Did you enjoy reading Computer Vision News? Would you like to receive it every month? Fill the Subscription Form - it takes less than 1 minute! Join thousands of AI professionals who receive Computer Vision News as soon as we publish it. FREE SUBSCRIPTION (click here, it's free). You can also visit our archive to find new and old issues as well. We hate SPAM and promise to keep your email address safe, always!

Upcoming events:
- WSCG 2021 - Plzen, Czech Republic (Virtual) - May 18-22
- TCT - Orlando, FL and online - 4-6 Nov
- SIPAIM 2020 - Campinas, Brazil - 17-19 Nov
- AM Medical Days - Berlin (Germany) and Virtual - 22-23 Nov
- BMVC 2021 - Online - 22-25 Nov
Point cloud of key points on a bladder model and the video camera's route
This unique tool can significantly shorten the cystoscopy, as well as increase levels of confidence in the procedural findings. It makes the repetitive bladder scans redundant and provides an image with better accuracy which makes lesion detection easier, as it increases the field of view compared to the narrow cystoscopy view.

Another benefit of this module is that it does not require long run-time, allowing fast and easy implementation in the clinic.

This module was designed to suit the cystoscopy procedure. It can be tailored to any other procedure which uses a scope to scan inner lumens in the body, e.g. gastroscopy, with some alterations. It can also be combined with other computer vision tools such as automatic lesion detection to improve procedural outcomes.

Overall, this is a tool which utilizes advanced computer vision techniques to significantly improve bladder cancer healthcare. More projects in AI for urology here.
Best of MICCAI 2021
Example MRI scan of a CRLM patient who had multifocal cancer. Existing
biomarkers only use information from the largest lesion (marked in red)
while ignoring information from other lesions (marked in orange).
The multiple instance learning technique used in this work comes from
digital pathology, where you have very large whole-slide images that are
too big for the computer to deal with, so you crop them into smaller tiles
and learn the features. Then at the end you aggregate the information from
all tiles to get an understanding of the whole-slide image. This perfectly fits
the problem here, which is to predict a patient’s survival based on a
number of tumors.
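To make the aggregation idea concrete, here is a minimal, hypothetical sketch of a MIL head that pools tile-level features into one bag-level prediction; the actual paper may well use a different (e.g. attention-based) aggregation:

import torch
import torch.nn as nn

class MeanPoolMIL(nn.Module):
    # Embed each tile (instance), average the embeddings into one bag
    # embedding, and predict a single bag-level output, e.g. a risk score.
    def __init__(self, in_dim=512, hidden=128):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, 1)

    def forward(self, tiles):    # tiles: (num_tiles, in_dim) features
        h = self.embed(tiles)    # per-instance embeddings
        bag = h.mean(dim=0)      # aggregate over all tiles / lesions
        return self.head(bag)    # one prediction for the whole bag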
Jianan tells us one of the biggest challenges has been collecting data.
“When we look at survival of patients, we collect patient data that has had
some kind of treatment,” he tells us. “You might be surprised, but 80 per
cent of colorectal cancer liver metastasis patients, which I studied for my
paper, can’t receive liver resection and that is the only curative treatment
for them. That’s why we don’t have a lot of data on them. Existing datasets
include mostly unifocal patients because we know how to treat them, but
it’s difficult to collect a large database with multifocal cancer patients. We
don’t know how multiple tumors affect patient survival or how aggressive a
tumor is.”
Thinking about next steps, Jianan says there are many ways he hopes to
take this work forward, including by validating its findings using
independent datasets, and extending it to other diseases and imaging
modalities.
Jianan received his Bachelor’s and Master’s degree in communications
engineering and artificial intelligence, respectively, before switching to
medical imaging with machine learning for his PhD.
“I wanted to do something meaningful with my research that could affect
real people,” he reveals. “I think cancer research is very important.”
Finally, we are keen to know how Jianan has found working with Anne
Martel, a good friend of this magazine.
Teodora Popordanoska is a PhD student at KU Leuven in Belgium, under the supervision of Matthew Blaschko. MICCAI this year is an extra special occasion for Teodora as she has had her first accepted conference paper! Her work investigates the relationship between model calibration and unbiased volume estimation. She spoke to us ahead of her poster session.
The second part of the title refers to volume estimation. In medical image analysis, the segmentation of an image is usually performed with a neural network and is mostly used to calculate certain biomarkers. In the medical domain, the volume of a tumor, organ, or lesion are important biomarkers. From a segmentation, one can obtain the volume by summing up the probability scores for each voxel. An important quantity in this case is the bias of the volume estimate. Ideally, to have the true volume, there would be zero bias.
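As a minimal sketch, the soft volume estimate described above can be computed directly from the network's per-voxel probabilities (the function name and the voxel-volume parameter are illustrative):

import torch

def expected_volume(probs, voxel_volume_mm3):
    # probs: (D, H, W) per-voxel foreground probabilities from the network.
    # The soft volume estimate is the sum of probabilities times the
    # physical volume of a single voxel.
    return probs.sum() * voxel_volume_mm3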
“The main theoretical result from this paper is that the absolute value of the bias is upper bounded by the calibration error,” Teodora clarifies.

“If we’re optimizing calibration, we’re also simultaneously reducing the bias of the volume estimate. If the calibration error is zero, then it means that
we had an unbiased volume estimate, and we get the true volume. In this
sense, the result is not specific to the architecture of the model or the type
of organ or tumor or whatever we’re measuring the volume of.”
The result is a fundamental mathematical result that has been empirically
validated on two datasets and 18 models, trained with several loss
functions and calibrated with multiple calibration strategies.
Designing a calibrated model by itself is not a trivial task, so the team are
currently working on developing a new method of training well-calibrated
models.
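For readers who want to experiment, here is a standard binned expected calibration error (ECE) estimator for voxel-wise predictions; the exact calibration metric and estimator used in the paper may differ:

import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    # Binned ECE for binary (foreground/background) voxel predictions:
    # in each confidence bin, compare the mean predicted probability with
    # the empirical foreground frequency, weighted by the bin size.
    probs, labels = probs.ravel(), labels.ravel()
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & (probs < hi)
        if mask.any():
            ece += mask.mean() * abs(probs[mask].mean() - labels[mask].mean())
    return ece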
Teodora is the third Macedonian that we have interviewed over the years,
following her friend Ivona Najdenkoska just yesterday and Jelena Frtunikj
at CVPR19, as part of our Women in Science series.
“Macedonia is a very small but beautiful country,” she tells us. “The people
are warm and nice. I’m always glad to go back there!”
Teodora reveals the most challenging part of this work for her was
discussing the clinical relevance of the theoretical results.
“This is my first time working with medical data, but thankfully all my co-
authors have plenty of experience in this area and in the medical domain, so
it worked out great,” she adds.
“In medical applications, we want the models to be trustworthy and we
want to have unbiased volume estimates. With our work, we are showing
that focusing on model calibration is sufficient and calibration error is in
fact a superior model selection criterion also with respect to volume bias.”
Interview: Mert Sabuncu (Best of MICCAI 2021)

Mert, can you tell us about what you do in the Sabuncu Lab at Cornell?

Our research group focuses on developing computational methods, i.e., algorithms, for medical imaging problems, including anything from MR acquisition to downstream applications using medical imaging data in the context of clinical workflows.

How did you come to work in the field of medical imaging?

I did my PhD at Princeton University in electrical engineering. Princeton doesn’t have a medical school and when I first started my PhD, I wasn’t sure what I wanted to work on. I knew that I wanted to do something that involved image processing, and in the early 2000s, there was some excitement around machine learning, so I started learning about that. While kicking around and thinking about it, I ran into a bunch of neuroscientists at Princeton who did functional MRI research. One of the group, James Haxby – who is now at Dartmouth College – introduced me to some image analysis problems in fMRI. That was my first experience with medical imaging. Then I had the opportunity during grad school to intern at Siemens Corporate Research - Siemens Healthineers now - which is also located in Princeton, New Jersey. I worked there over the summer and during the year I would collaborate with them. They have a heavy focus on medical imaging.

What convinced you that this focus was the right one for you?

I’m a strong believer that the research we do should have a real-world impact. I’m very pragmatic in
that sense. I view the healthcare and biomedicine domain as a very interesting area where we can make a strong impact and hopefully have a positive influence on many people’s lives. In my family, I have been surrounded by doctors and medical researchers from a very young age, and I guess that’s what first attracted me to biomedical research.

Do you feel that the work that you and your team are doing, and the work that the MICCAI community is doing, is having that impact?

I think that as a community what we do at MICCAI has a big impact. I also think that the research that comes out of my group has an impact in sometimes very non-obvious ways. It’s important to remind oneself that what we do can feel a little bit like basic research. A little bit removed from real-world applications. There are still some open questions that we need to work on before we can take our technologies and apply them to real-world problems. That said, as I move along my career trajectory, I’m hoping I will increase my emphasis on real-world translation. That’s why nowadays I’m focusing on real clinical collaborations and understanding clinical workflows. I’m hoping to put more effort into translating our technologies into the clinic, either by commercialization efforts, or at least trying to implement things in an academic hospital setting.

Why does it take so long sometimes to translate MICCAI research into the clinic?

There is always a gap. Whatever field you’re in, especially in the academic setting, there’s going to be a gap between the research and the way it impacts people’s lives in the real world. The healthcare world has a lot of stakeholders, including patients, insurance companies, regulatory bodies, hospitals, doctors, and researchers. Having them all aligned and getting through all their needs and requirements is a big challenge. Taking an idea from a concept to a real product that can be used in a clinical workflow on patients, there are a lot of obstacles along the way.

The other challenge is as researchers on the algorithmic side, we often focus on toy problems. That is a good starting point but can be distracting in terms of what matters in the real world. We spend a lot of our bandwidth on these “artificial problems” and make good progress, but to take those breakthroughs and translate them into the real world, there are other challenges that we aren’t focusing on. There are obviously exceptions, but there’s not a lot of incentive, at least in the academic world, to move along those steps. A lot of those incentives are in the more basic research.

Do you think today’s CLINICCAI, MICCAI’s first clinical day, is a step in the right direction?

I think these types of ideas and especially communities like MICCAI make a difference. They enable different groups of people to communicate with each other to…
Women in Science (Best of MICCAI 2021)
Doctors say you should go during the morning and avoid it from noon to 4 PM, but most people don’t do that. Here, people spend the entire day at the beach.

We like the beach!

We do! We are Portuguese!

I love Albufeira. What brought you to this field?

That’s a very interesting question. I have a biomedical engineering degree. I studied many engineering topics, from chemistry and physics to mathematics. I didn’t know what I wanted to do, I just wanted to go into research. In the last year of my degree, I did a course on machine learning, and I fell in love with it. It was just perfect for me! Then I was looking for a Master’s thesis on this topic. This thesis about skin cancer just popped out. It was perfect! I combined the biomedical part with machine learning. Let’s give it a try! Then I realized that this is very challenging. Skin imaging is in between real imaging, that you find in different computer vision tasks, and medical imaging, which are 2-D images. You have challenges from the two sides. That combined with machine learning just convinced me to keep working on this!

Did you meet with some doctors before you started?

Yes, the co-supervisor of my Master’s thesis was a dermatologist. I met doctors and actually I still work with doctors. We need them to keep working on these things, not only for the data but for everything, for feedback, to understand if we are going in the right direction, or if we need to change a little bit.
Sometimes we have different ideas.

You told us only half of the story - the half of the story about the research you are doing. I know that you are also teaching. Can you tell us a little bit about the second part of the story?

It’s funny because I just came out of a meeting to prepare for the next course on machine learning that starts in two weeks. At the moment, we have more than 300 students in this course. As you can imagine, it has changed from 50 students ten years ago to 300 now. A lot of people are interested in these topics. We have to let them know, for example, about these recent deep learning models that everyone is hearing about. We have to contextualize these for the students. That’s mainly what I do. I teach machine learning at the university. I also supervise students. I supervise several Master’s students in different computer vision topics and in robotics as well, mostly using machine learning. We try to challenge them to use recent methods.

You have supervised more female students than male students. How did this come about?

To be honest, I don’t know. I don’t know if it’s because I am a female supervisor. Sometimes I know that it plays a role, although my PhD and Master’s supervisor was male. We share the students sometimes. What is changing is the way women approach engineering. When I took the machine learning course 10 years ago… no 11 years ago, there were six female students out of about 60 students. It’s less than one tenth of the students. Now, we have…

Let’s say that I’m your little brother, and I want to start teaching. What advice would you give me?

The first thing you need to pay attention to is first of all, you need to prepare the classes. It’s a mistake to imagine that you just go in there. If you have a background in machine learning, you can’t just go there and rely on that background. You have to prepare. The second thing that you definitely need to do is guide the students. That will make a big difference in the way you teach. As I try to engage with the students, to make them talk to me, tell me their difficulties and if they understand. That allows me to conduct the class.

How do you engage them? It’s difficult to engage people!

Yes! Especially in this Covid time. Everyone is just taking classes online.

Share your secret with us please!

[laughs] In these 30-student classrooms, at a certain point, you start to know their names. You can try to call them by their names. That’s something I did for the Zoom classes. I remember some students were more talkative than others. I remember saying: “I want to hear a different voice.” I asked them to try to talk to me. Just looking at their faces, I don’t understand if they are following what I’m saying or not. If you are solving a problem, try to make them solve a problem with you. It’s hard. Like I said, it depends on the shift. Some students are very proactive, some aren’t. Some are afraid to fail.
Potential of Transformers for 3D Medical Image Segmentation (by NVIDIA and MONAI)

Introduction
Recently, transformer-based models have gained a lot of traction in natural
language processing and computer vision due to their capability of learning
pre-text tasks, their scalability, and their better modeling of long-range
dependencies in sequences of input data. In computer vision, vision transformers
and their variants have achieved state-of-the-art performance by large-scale
pretraining and fine-tuning on downstream tasks such as classification, detection
and segmentation. Specifically, input images are encoded as a sequence of 1D
patch embeddings, and self-attention modules learn a weighted sum of values
that are calculated from hidden layers. As a result, this flexible formulation
allows us to effectively learn long-range information. This warrants the question:
what is the potential of transformer-based networks in medical imaging for 3D
segmentation?
Novel proposed methodologies that leverage transformer-based or hybrid
(CNN+transformer) approaches have demonstrated promising results in medical
image segmentation for different applications. In this article, we will deep dive
into one such network architecture (UNETR) and will also evaluate other
transformer-based approaches in medical imaging (TransUNet & CoTr).
1. UNETR
NVIDIA researchers have proposed to leverage the power of transformers for
volumetric (3D) medical image segmentation and introduce a novel architecture
dubbed UNEt TRansformers (UNETR). UNETR employs a pure vision
transformer as the encoder to learn sequence representations of the input
volume and effectively capture the global multi-scale information, while also
following the successful U-shaped network design for the encoder and decoder.
Why UNETR: Although Convolutional Neural Network (CNN)-based approaches
have powerful representation learning capabilities, their performance in learning
long-range dependencies is limited by their localized receptive fields. As a result,
such a deficiency in capturing multi-scale contextual information leads to sub-
optimal segmentation of structures with various shapes and scales.
Qualitative comparison of different baselines. UNETR has a significantly better segmentation accuracy for left and right adrenal glands, and UNETR is the only model to correctly detect branches of the adrenal glands.

Comparison of number of parameters, FLOPs and averaged inference time for various models in BTCV using a sliding window approach.

Overview of UNETR architecture. A 3D input volume is divided into a sequence of uniform non-overlapping patches and projected into an embedding space using a linear layer. The sequence is added with a position embedding and used as an input to a transformer model. The encoded representations of different layers in the transformer are extracted and merged with a decoder via skip connections to predict the final segmentation.
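A minimal sketch of the patch-embedding step the caption describes, assuming a 96³ input and 16³ patches (a strided Conv3d is a common way to implement the per-patch linear projection; the official UNETR code may differ):

import torch
import torch.nn as nn

class PatchEmbedding3D(nn.Module):
    # Split a 3D volume into non-overlapping patches, linearly project each
    # flattened patch into an embedding, and add a learnable position
    # embedding. A strided Conv3d is equivalent to the per-patch linear layer.
    def __init__(self, patch=16, in_ch=1, dim=768, img_size=96):
        super().__init__()
        n_patches = (img_size // patch) ** 3
        self.proj = nn.Conv3d(in_ch, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))

    def forward(self, x):                  # x: (N, C, D, H, W)
        z = self.proj(x)                   # (N, dim, D/p, H/p, W/p)
        z = z.flatten(2).transpose(1, 2)   # (N, n_patches, dim)
        return z + self.pos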
In the spirit of open innovation and to accelerate research in this emerging
field, NVIDIA has open-sourced UNETR via the MONAI GitHub public repository. In
addition, a standalone UNETR implementation is available in the MONAI research
contributions repository. Furthermore, two UNETR tutorials (pure MONAI and
MONAI + PyTorch Lightning) for multi-organ segmentation using the BTCV dataset
are available in the MONAI tutorials repository for researchers to further explore
this methodology in practice.
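A minimal usage sketch with the MONAI implementation (assuming MONAI >= 0.7; argument names and defaults may differ slightly across versions, so check the MONAI documentation):

import torch
from monai.networks.nets import UNETR

model = UNETR(
    in_channels=1,
    out_channels=14,        # e.g. 13 BTCV organs + background
    img_size=(96, 96, 96),  # size of the input sub-volume
    feature_size=16,
)
logits = model(torch.randn(1, 1, 96, 96, 96))  # -> (1, 14, 96, 96, 96)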
Two notable approaches that have leveraged transformers for medical image
segmentation are TransUNet and CoTr. These approaches will be discussed in
detail in the following sections.
2. TransUNet
TransUNet is a 2D hybrid CNN-Transformer segmentation model that leverages a
vision transformer (ViT) as a standalone layer in the encoder of a UNet
architecture. Specifically, TransUNet uses a CNN as a feature extractor to
generate feature maps as input to the ViT model in the bottleneck of the
architecture. The ViT model uses self-attention layers to effectively process the
extracted feature maps, which are then fed into the decoder for computing the final
segmentation output. TransUNet has achieved comparable performance on the
tasks of multi-organ segmentation using the BTCV dataset as well as the Automated
Cardiac Diagnosis Challenge (ACDC) for automated cardiac segmentation.
Here is the paper explaining the architecture and the approach in further detail,
while the code and models are available here.
Unlike the 2D TransUNet model, CoTr and UNETR utilize volumetric inputs and
hence can benefit from the spatial context of data. UNETR and TransUNet both use the
transformer layers of the ViT model whereas CoTr leverages a deformable
transformer layer that narrows down the self-attention to a small set of key
positions in an input sequence. In addition, each of these models utilizes the
transformer layers differently in their architecture. TransUNet uses the
transformer layers in the bottleneck of a UNet, while CoTr utilizes them in
between the CNN encoder and decoder by connecting them in different
scales via skip connections. On the other hand, UNETR uses the transformer
layers as the encoder of a U-shaped architecture and generates input
sequences by directly utilizing the tokenized patches. The transformer
layers of UNETR are connected to a CNN decoder via skip connections in
multiple scales.
Conclusion
Convolutional neural networks (CNNs) have been the de facto standard for
3D medical image segmentation so far. However, Transformers have the
potential to bring a fundamental paradigm shift with their strong innate
self-attention mechanisms and hold the potential to serve as strong
encoders for medical image segmentation tasks. The pre-trained
embedding can then be adapted for various downstream tasks (e.g.,
segmentation, classification, and detection). In the years to come, we will see
new breakthroughs powered by Transformers for medical imaging - the
future is exciting, so we should brace ourselves.
Startup Village (Best of MICCAI 2021)
The results of his work show the feasibility of using mobile devices to improve
neurosurgical processes. Augmented reality enables surgeons to focus on the
surgical field while getting intuitive guidance information. Mobile devices
also allow for easy interaction with the neuronavigation system thus enabling
surgeons to directly interact with systems in the operating room to improve
accuracy and streamline procedures. To encourage further research and
accelerate the pace of innovation, Étienne released the developed application
under an open source license, making it accessible to others to reuse and
keep improving upon.
On the left: screenshot of the system when used in standard IGNS mode.
On the right: screenshot of the system when used in augmented reality mode.
ICCV Workshop Preview: Computer Vision for Automated Medical Diagnosis
Yuyin Zhou is a postdoctoral researcher
at Stanford University, working with
Matthew Lungren and Lei Xing on
medical image analysis and other
related machine learning problems.
She is also co-organizing the very
first Computer Vision for Automated
Medical Diagnosis workshop at this
year’s ICCV. With the conference only
a few weeks away, Yuyin is here to tell
us what to expect.
Machine learning has been a helpful tool for doctors in dealing with different medical imaging problems for some time now. It can also support disease diagnosis and treatment planning. Over the past few years, there has been a great deal of progress made in this area because of huge advancements in computer vision and artificial intelligence techniques. Problems such as medical image registration, structure detection, and tissue and organ segmentation have achieved state-of-the-art performance, while many new medical devices have been developed in conjunction with industry.

In spite of this, the safe and reliable adoption of such technologies in hospitals in the real world remains limited, and many problems, such as cancer diagnosis, are still not solved.

“This is the key reason why we have created this workshop,” Yuyin reveals.
Their first meeting at ICCV later this month already has a stellar line-up of speakers on board, including Russell Taylor from Johns Hopkins University, who will be discussing Autonomy and Semi-Autonomous Behavior in Surgical Robot Systems.

“I think this topic is one of the most important and has not been addressed enough in the computer vision community,” Yuyin tells us.

“We’ve also got Demetri Terzopoulos…”

Other confirmed speakers include Yizhou Yu – How should machines analyze medical images to aid diagnosis? – and Lena Maier-Hein – Statistics meets machine learning in biomedical image analysis. There will also be 10 engaging oral talks from UCLA, University of Oxford, Google, and more.
This is not the first medical computer vision workshop – CVPR has had a similar event for a number of years now. The community recognizes and understands the importance of the topic, so the foundations have been laid for this new meeting at ICCV to be a success this year and for many more to come. However, the team do not intend to create a carbon copy of another event. They plan to discuss topics which haven’t been covered before. But with so much on the menu at ICCV this year, including another medical workshop, how should attendees choose between this one and everything else on offer?

“What really sets our workshop apart from others is that we are focusing on the general challenges in the medical computer vision arena,” Yuyin points out.

“We’re going to give a more holistic and complete view of this field and we want to discuss things from broader perspectives – not just medical computer vision like CT and MRI, but also NLP, medical robots, surgical planning, and how to better adapt existing computer vision and machine learning expertise into all of these different medical problems.”

The workshop is very close to Yuyin’s own body of work. During her PhD career at Johns Hopkins University, the team have been working on the Felix Project, which aims to detect pancreatic cancer earlier.

“We started from pancreas segmentation and went deeper into pancreatic tumor segmentation and detection problems,” Yuyin explains.