
Hindawi

Journal of Healthcare Engineering


Volume 2020, Article ID 7156408, 13 pages
https://doi.org/10.1155/2020/7156408

Research Article
Retinal Mosaicking with Vascular Bifurcations Detected on Vessel
Mask by a Convolutional Network

Xiuxia Feng,1,2 Guangwei Cai,1 Xiaofang Gou,1 Zhaoqiang Yun,1 Wenhui Wang,2 and Wei Yang1

1 Guangdong Provincial Key Laboratory of Medical Image Processing, School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
2 The Information Department, The Sixth Affiliated Hospital (Gastrointestinal & Anal Hospital), Sun Yat-sen University, Guangzhou 510515, China

Correspondence should be addressed to Wenhui Wang; [email protected] and Wei Yang; [email protected]

Received 1 July 2019; Revised 27 October 2019; Accepted 12 December 2019; Published 9 January 2020

Academic Editor: Niccolò Camarlinghi

Copyright © 2020 Xiuxia Feng et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Mosaicking of retinal images is potentially useful for ophthalmologists and computer-aided diagnostic schemes. Vascular bifurcations can be used as features for matching and stitching of retinal images. A fully convolutional network model is employed to segment the vascular structures in retinal images in order to detect vascular bifurcations. Bifurcations are then extracted as feature points on the vascular mask by a robust and efficient approach, and the transformation parameters for stitching are estimated from the correspondence of vascular bifurcations. The proposed feature detection and mosaicking method is evaluated on 62 retinal images from 14 different eyes. The proposed method achieves a considerably higher average recall rate of matching for paired images than speeded-up robust features (SURF) and the scale-invariant feature transform (SIFT), and its running time is lower than that of the other methods. The results produced by the proposed method are superior to those of AutoStitch, the Photomerge function in Photoshop CS6, and ICE, demonstrating that accurate matching of detected vascular bifurcations can lead to high-quality mosaics of retinal images.

1. Introduction

Retinal images are crucial for ophthalmologists in diagnosing a range of diseases. Fundus lesions caused by ocular and systemic diseases, such as diabetes, hypertension, macular lesions, fundus arteriosclerosis, and retinopathy, can appear in the retina [1]. Patients can receive timely and appropriate treatment if specialists examine and diagnose these diseases early. At present, mosaic images are extensively used to provide a comprehensive view of the retina and to assist ophthalmologists during laser surgery or other procedures. However, the retinal image captured by a fundus camera or scanning laser ophthalmoscope covers only a local area of the eye. Retinal images captured over different fields of view must therefore be stitched together to form a mosaic image that meets the needs of whole-fundus analysis in research and clinical diagnosis.

Numerous retinal image registration and stitching methods have been proposed in the literature. Can et al. [2-4] introduced a layered mosaicking algorithm that uses bifurcations as features of the vascular structure; one disadvantage of this algorithm is that the landmarks are occasionally indistinguishable. Ryan et al. [5] also studied the registration of retinal images using landmark correspondence, but this approach obtains remarkably few matching point pairs. The landmark matching formulation of [1] aligns retinal images by enforcing sparsity in the correspondence matrix; however, it requires a complicated computational process at high cost.

In 2010, Kwon and Ha [6] introduced panoramic video technology based on extracting important information from the image. The scale-invariant feature transform (SIFT) is one of the most robust and widely used methods. However, identifying corresponding points becomes difficult in the
case of changing illumination or between two surfaces with similar intensity, because SIFT extracts features using only gray-level information. Brown and Lowe [7] used invariant features for automatic mosaicking of natural images and adopted multiresolution image fusion. Alomran and Chai [8] presented an image stitching algorithm based on speeded-up robust features (SURF), but the algorithm is restricted to image sets without exposure differences or extremely high lens distortion.

The disadvantages of methods based on SIFT or SURF include detection points clustered on the edges, uneven spatial distribution, and insufficient effective information to describe the image. Matching of bifurcations is more appealing than intensity-based image registration. These previous methods are limited to images containing clearly visible vascular structures. However, the vascular structures in retinal images are often blurred when the fundus bleeds or becomes tumorous. Thus, detecting robust features in such retinal images for feature matching is difficult. At the same time, indistinguishable features may result in matching ambiguities.

This work aims to improve the robustness of feature detection for indistinct vascular structures in retinal images. Numerous publications focus on using convolutional neural networks (CNNs) to segment vascular structures. Jiang and Tan [9] proposed conditional deep convolutional generative adversarial networks to segment retinal vessels, using skip connections between the convolutional layers and the deconvolution layers so that low-level information is not lost. However, only a small number of images were available to train the network, so the accuracy of vessel segmentation and the robustness of the network could not be verified well. Alom et al. [10] proposed a Recurrent Convolutional Neural Network (RCNN) based on U-Net, as well as a Recurrent Residual Convolutional Neural Network (RRCNN) based on U-Net; 190,000 patches randomly selected from 20 images are used as the network input, which is a complicated computational process. Hu et al. [11] introduced a retinal vessel segmentation method based on a CNN and fully connected conditional random fields. In this paper, we first employ a CNN segmentation model [12] to segment the vascular structures in retinal images. Then, an effective approach based on morphology operations on the vascular structures is proposed to detect bifurcations as features for matching and stitching. Figure 1 shows an example of a mosaic image produced by the proposed method; our method for feature detection and matching benefits from previous work on retinal images.

Figure 1: Example of a retinal mosaic image produced by our method. (a) Retinal images and (b) mosaic produced by our method.

The rest of the paper is organized as follows. Section 2 describes the framework and details of our proposed method. Section 3 provides experimental results. Section 4 discusses the results.

2. Method

2.1. Overview. This work aims to develop a practical and useful method for detecting robust and sufficient features in images with indistinct vascular structures and constructing an automatic mosaic of multiple retinal images. The image stitching process can be summarized by the steps shown in Figure 2. The novelty of the paper includes the following:

(1) The binary image of vascular structures, instead of the original image, is used for bifurcation detection.
(2) The vascular structures of the retinal image are segmented using a deep CNN model instead of traditional methods; a more accurate binary image of vascular structures can be obtained with the deep CNN model.
(3) Morphology operations are used to locate Y- or T-form bifurcation points in the binary image, which proves superior to existing detectors such as SIFT.
(4) Optimal transformation parameters for stitching are obtained by further refining the initial parameters; a dedicated coordinate mapping then transforms the image coordinates into a common coordinate system for stitching.

These steps are described as follows. (1) The vascular structures of the retinal image are precisely segmented using a deep CNN model, and the center lines of the vascular structures are extracted for subsequent processing. (2) Vascular bifurcations are detected as key points by morphology operations, and their descriptors are calculated. (3) The correspondences between retinal images are established with nearest/second-nearest neighbor ratio matching [13] by calculating the Euclidean distance between feature descriptors. (4) The transformation parameters used to warp images into a uniform coordinate system are estimated from the homography matrix, which is calculated from the correspondences between image pairs. (5) The retinal images and masks are warped into the uniform coordinate system, and the vertex coordinates of the images are calculated after this transformation. (6) Multiband blending is used to seamlessly fuse the retinal images into a mosaic image.

Figure 2: Flowchart of the proposed method for mosaic images (multiple retinal images, vascular structures, vascular centerlines, bifurcation points, feature matching, warping, blending, mosaic image).

2.2. Feature Detection

2.2.1. Segmentation of Vascular Structures. Classic methods for vascular structure segmentation involve hand-crafted filters, such as line detectors and vessel enhancement techniques [12, 14-16]. More recent techniques [17, 18] are available for segmenting the vascular structures of retinal images, but they exhibit limited performance.

Deep CNN architectures were originally designed for the classification of natural images; according to the literature [12], they are also very effective for vascular structure segmentation. Maninis et al. [12] proposed a CNN architecture based on the visual geometry group (VGG) network for retinal vessel segmentation.

A CNN model for vascular structure segmentation has already been trained on the DRIVE [19] and STARE [20] datasets (40 and 20 images, respectively). The trained CNN model is publicly available (all of the resources, including code and pretrained models, are available at http://www.vision.ee.ethz.ch/∼cvlsegmentation/driu/). Our experimental environment is Caffe installed and configured on Ubuntu 14.04 with an NVIDIA TitanX GPU. We download the original DRIU model and run it to segment the vascular structures. On the DRIVE data set, the precision-recall score between the obtained masks and the gold standard (0.822) is higher than that between the human-annotated result and the gold standard (0.791). This model is employed to segment the vascular structures of the retinal images in our experiments.

Given that the size of our retinal images is unsuitable for the CNN architecture, each image must be resampled before retinal vessel segmentation; the downsampled image size is 700 × 605 pixels. Figure 3(a) shows a retinal image, and Figure 3(b) shows the segmented retinal vascular structures obtained using the CNN model. We will further investigate how to improve the model so that the output is free of boundary artifacts.

Subsequently, a fast parallel thinning algorithm [21] is applied to extract the center lines of the retinal vessels segmented by the CNN model. This algorithm extracts the center lines of an image by iteratively removing contour points, except for points belonging to the skeleton. After several iterations, the final skeletons are obtained as the vascular centerlines (e.g., Figure 3(c)).

2.2.2. Bifurcation Detection and Descriptors. Bifurcations of the vascular centerlines are detected as matching features. Analyzing the geometrical structure of bifurcations is important for designing a scheme to solve this problem. From our observations, vascular branches and crossovers generally appear as Y- or T-forms, respectively. Hence, we use morphological operations with a set of structural elements to locate bifurcation points. Multiple Y- or T-form structural elements are applied to erode the vascular centerline mask.
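As a rough, non-authoritative illustration of the centerline-extraction step from Section 2.2.1, the following Python sketch uses skimage's skeletonize (a Zhang-Suen-style thinning) as a stand-in for the parallel thinning algorithm of [21]; the input file name and the binarization threshold are placeholders, not values from the paper.

```python
import cv2
import numpy as np
from skimage.morphology import skeletonize

# Hypothetical input: a vessel probability map produced by the CNN,
# stored as an 8-bit grayscale image (placeholder file name).
prob = cv2.imread("vessel_probability.png", cv2.IMREAD_GRAYSCALE)

# Binarize the probability map (threshold value is illustrative only).
mask = prob > 127

# Thin the binary vessel mask down to one-pixel-wide centerlines.
# skeletonize() plays the role of the fast parallel thinning in [21].
centerlines = skeletonize(mask)

# Save the centerline image for the bifurcation-detection step.
cv2.imwrite("vessel_centerlines.png", centerlines.astype(np.uint8) * 255)
```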
Figure 3: Segmentation of vascular structures and extracted vascular centerlines. (a) Retinal image, (b) vascular structures of the retina, and (c) vascular centerlines.
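To make the erosion-based detection concrete, here is a minimal sketch assuming a binary centerline image as produced above. It probes the centerlines with two illustrative Y-form structural elements (not the full set of 14 shown in Figure 4), averages the eroded images in the spirit of equation (1) below, and suppresses duplicate detections in a 5 × 5 neighborhood as described in the text.

```python
import numpy as np
from scipy.ndimage import binary_erosion

def detect_bifurcations(centerlines: np.ndarray) -> list:
    """Locate Y-/T-form bifurcation candidates on a binary centerline image."""
    # Two illustrative 3x3 Y-form structural elements; the paper uses 14
    # Y- and T-form elements (Figure 4), which are not reproduced here.
    ses = [
        np.array([[1, 0, 1],
                  [0, 1, 0],
                  [0, 1, 0]], dtype=bool),
        np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 0, 0]], dtype=bool),
    ]
    # Erode the centerline image with every structural element and average
    # the results; this is the accumulation written recursively in (1).
    acc = np.zeros(centerlines.shape, dtype=float)
    for se in ses:
        acc += binary_erosion(centerlines, structure=se)
    acc /= len(ses)

    # Any pixel hit by at least one structural element is an initial detection.
    candidates = acc > 0

    # Keep a single point per bifurcation: once a candidate is accepted,
    # clear its 5x5 neighborhood so nearby duplicates are discarded.
    points = []
    work = candidates.copy()
    ys, xs = np.nonzero(candidates)
    for y, x in zip(ys, xs):
        if work[y, x]:
            points.append((x, y))
            work[max(0, y - 2):y + 3, max(0, x - 2):x + 3] = False
    return points
```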

The structural elements of size 3 × 3 are selected according to the literature [22]. The last four Y-form structural elements shown in Figure 4 cover most vascular bifurcations, as determined by comparing the results of eroding the same binary image with various structural elements in our experiments. In total, 14 structural elements are selected from a larger set so that eroding the vascular centerline images produces the highest number of correctly detected bifurcations and no falsely detected features. More criteria will be considered for selecting structural elements in future work.

Eroding the vascular centerline image with one of the structural elements (Figure 4) yields an eroded image containing numerous discrete points. A total of 14 such images are produced by erosion with the 14 structural elements, and these images must then be combined. We accumulate all eroded images with the following weights:

C1 = (1/2)(A Θ B1) + (1/2)(A Θ B2),
Ci = (i/(i+1)) C(i-1) + (1/(i+1))(A Θ B(i+1)), for i = 2, ..., 13,   (1)

where Θ denotes binary erosion, A is the centerline image, and B1, B2, ..., B14 are the 14 structural elements of size 3 × 3. For example, C1 is the average of the images eroded by B1 and B2, each with weight 1/2; C2 adds the image eroded by B3 to C1; and C13 is the final image, which combines all 14 eroded images with weight 1/14. In other words, the recursion simply computes the equal-weight average C13 = (1/14) Σ (A Θ Bi) over i = 1, ..., 14.

A binary image is obtained from this accumulation of eroded images, in which the white points (gray value 255) are the initially detected features. A vessel bifurcation with a cross-like shape may satisfy multiple Y-form structural elements, so several nearby pixels around one vessel bifurcation may be detected as features by different structural elements. Only one of these points is kept as the vessel bifurcation point in the final binary image: iterating through each pixel of the binary image, whenever a pixel with a gray value of 255 is encountered, it is retained and the gray values in its 5 × 5 neighborhood are set to 0. The remaining features are expected to correspond to vascular bifurcations. The coordinates of these features are marked on the vascular centerline image, as shown in Figure 5(a). These operations on the binary image only locate the positions of the features; feature matching is then conducted on the original retinal image. Figure 5(b) shows the detected features marked on the original retinal image.

Compared with the features detected on the original retinal image by the SIFT algorithm (Figure 5(c)), the SIFT features are mainly concentrated on the edges and are low in number. Similarly, only a few features are detected on the original retinal image by oriented FAST and rotated BRIEF (ORB) (Figure 5(d)). We applied different feature detection algorithms, including SIFT, ORB, and SURF, to 62 fundus images and calculated the average running time and the average number of detected features. As listed in Table 1, the SIFT algorithm is the slowest, while SURF yields the largest number of features. Although SURF acquires the maximum number of features, it is time consuming, and its features mainly concentrate in one area, which is easily noticed during feature matching in the next section. Trading off running time against the number of features, our feature detection algorithm is superior to the other feature detection algorithms.

Numerous bifurcations are produced by the feature detection. A descriptor is then constructed to describe the distribution of intensity within the neighborhood of each feature. Similar to the gradient information extracted by SIFT [23] and its variants, SURF [24] builds on the distribution of first-order Haar wavelet responses in the x and y directions rather than on the gradient, and it uses only 64 dimensions. SURF first calculates the Haar wavelet responses in the x and y directions within a circular neighbourhood of radius 6s around the interest point, where s is the scale at which the interest point was detected. The dominant orientation is estimated by calculating the sum of all responses within a sliding orientation window of size π/3. The horizontal and vertical responses within the window are summed, and the two sums yield a local orientation vector. The longest such vector over all windows defines the orientation of the interest point.
Figure 4: The 14 structural elements of size 3 × 3 (Y- and T-form), numbered (1)-(14).

Figure 5: (a) Features (in red) detected by our algorithm. (b) Features marked as green dots on the retinal image. (c) and (d) The few features detected by SIFT and ORB, respectively, which concentrate on the edges.
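For reference, a comparison like the one summarized in Table 1 can be reproduced along the following lines with OpenCV's built-in detectors. This is a hedged sketch, not the authors' benchmark code: SURF additionally requires the opencv-contrib package and is omitted here, and the image path is a placeholder.

```python
import glob
import time
import cv2

def average_detector_stats(images, detector):
    """Return (mean detection time in seconds, mean number of keypoints)."""
    times, counts = [], []
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        t0 = time.perf_counter()
        keypoints = detector.detect(gray, None)
        times.append(time.perf_counter() - t0)
        counts.append(len(keypoints))
    return sum(times) / len(times), sum(counts) / len(counts)

# Placeholder path to the fundus images.
images = [cv2.imread(p) for p in glob.glob("fundus/*.jpg")]

detectors = {"SIFT": cv2.SIFT_create(), "ORB": cv2.ORB_create()}
for name, det in detectors.items():
    t, n = average_detector_stats(images, det)
    print(f"{name}: {t:.3f} s per image, {n:.0f} keypoints on average")
```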

A square window centered on the feature is then constructed and oriented along the orientation obtained above. The size of the window is 20s. The region is split regularly into 4 × 4 square subregions, and for each subregion, Haar wavelet responses are computed at 5 × 5 regularly spaced sample points. The Haar wavelet responses in the horizontal and vertical directions (filter size 2s) are denoted dx and dy, respectively. The wavelet responses dx and dy are then summed over each subregion as Σdx and Σdy, and the sums of the absolute values of the responses, Σ|dx| and Σ|dy|, are also extracted to capture the polarity of the intensity changes. Hence, each subregion has a four-dimensional descriptor vector v = (Σdx, Σdy, Σ|dx|, Σ|dy|) for its underlying intensity structure. Concatenating the vectors of all 4 × 4 subregions results in a descriptor of length 64. Once the feature descriptors have been calculated, the correspondences between all image pairs can be established by calculating the Euclidean distances between these descriptors.

Table 1: Comparison of the running time and the average number of features for 62 images using different feature detection algorithms.

Algorithm                      ORB     SIFT    Ours    SURF
Running time (s)               0.152   0.741   0.141   0.255
Average number of features     65      98      256     585

2.3. Transformation Estimation

2.3.1. Bifurcation Matching. The matched candidate of each feature is determined by identifying its nearest neighbor among the features of the target image [23]. The nearest neighbor is the feature with the minimum Euclidean distance between descriptors. We take the descriptor of a bifurcation in image Ii and calculate its Euclidean distances to the descriptors of all features in another image Ij. An effective measure of match quality is obtained by comparing the distance to the closest neighbor with the distance to the second-closest neighbor, and matches with a distance ratio larger than 0.3 are rejected as possibly ambiguous [25]. One of the bifurcation matching parameters is customized: the confidence that two images come from the same mosaic. Its default value is 1.0; in our user interface, the confidence can be chosen freely between 0.5 and 1.5 to generate different mosaics.

Each pair of potentially matching images includes a set of feature matches that are geometrically consistent (inliers) and another set of inconsistent features (outliers) inside the area of overlap. Random sample consensus (RANSAC) [7, 26] can be used to select a set of inliers that are compatible with a homography between the images. Let X be a point on the retina with projections xi and xj in two images taken from different angles. A homography matrix H describes the transformation that connects xi and xj for any point X on the retina: xj = H xi. Inliers and outliers are separated by a distance threshold t: when ‖xj − H xi‖ > t, the matching point is considered an outlier. With t = 3 and 2000 iterations, a pure inlier sample is obtained with probability p = 0.99, and the optimal homography matrix H is obtained.

Figure 6 shows the correspondences between the features of an image pair. The first, second, third, and fourth rows show our algorithm, ORB, SIFT, and SURF, respectively. The number of matching points of ORB and SIFT is smaller than that of our algorithm. Although SURF yields more matching points, they concentrate mainly on certain areas (drawn with a blue circle); these points carry little information because they occupy only a small part of the retinal vessels.

2.3.2. Image Warping. The next step estimates the transformation between retinal image pairs from the matched bifurcations. Assuming that the camera rotates around the center of its optical axis, the transformation model of the image is estimated. This section first introduces how the homography matrix is used to calculate the parameters of the transformation model and then uses these parameters to transform the image coordinates into the same coordinate system and perform image stitching.

Assuming that the camera rotates around its central axis, the transformation model of the image is defined as follows:

T2(p; R, K) = R K^(-1) p.   (2)

Each image has a camera intrinsic parameter matrix K and an extrinsic (rotation) matrix R. K is the 3 × 3 intrinsic parameter matrix

K = [ fx  0   cx
      0   fy  cy
      0   0   1 ],

where f is the focal length of the camera and (cx, cy) are the coordinates of the image center. The focal length f is estimated from the homography [27] as

f0^2 = (h23^2 − h13^2) / (h11^2 + h12^2 − h21^2 − h22^2),   if h11^2 + h12^2 ≠ h21^2 + h22^2,
or
f0^2 = −(h13 h23) / (h11 h21 + h12 h22),   if h11 h21 ≠ −h12 h22,   (3)

where h11, h12, ... are the elements of the homography matrix H. We compute f for every image pair in the image set and take the median of these estimates as the common focal length of all images.

R is a 3 × 3 extrinsic matrix, and the matrix R of the reference image is the identity. R can be roughly estimated from the intrinsic parameter matrices and the homography matrix: Rj = Ri Ki^(-1) Hij^(-1) Kj. Given the matrix R of one image, the intrinsic parameter matrices Ki and Kj of the two images, and the homography matrix H between them, the matrix R of the other image can thus be obtained. All camera parameters are then refined by bundle adjustment: the matched points of all image pairs are mapped into three-dimensional space, and the camera parameters that minimize the sum of squared matching errors are computed. The objective is defined as follows:

(R, K) = argmin Σ_(p∈P, q∈Q) ‖ Ri Ki^(-1) p − Rj Kj^(-1) q ‖^2,   (4)

where Ri Ki^(-1) = Rj Kj^(-1) Hij, and the matched points of an image pair are denoted P and Q. An initial correspondence set (pi, qi) ∈ C0 containing all matching points is established. The Levenberg-Marquardt algorithm [28] is used to solve this problem, yielding the optimal parameters R and K.
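The ratio-test matching and RANSAC-based homography estimation described in Section 2.3.1 can be sketched with OpenCV as follows. This is an illustrative, hedged example rather than the authors' implementation; the keypoint lists and descriptor arrays are placeholders, while the 0.3 ratio, t = 3 pixel threshold, 2000 iterations, and p = 0.99 follow the values quoted in the text.

```python
import numpy as np
import cv2

def match_and_estimate_homography(kp_i, desc_i, kp_j, desc_j):
    """Ratio-test matching followed by RANSAC homography estimation.

    kp_i/kp_j are lists of cv2.KeyPoint; desc_i/desc_j are float32 arrays.
    """
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(desc_i, desc_j, k=2)

    # Keep a match only when the nearest neighbor is much closer than the
    # second-nearest neighbor (distance ratio <= 0.3).
    good = [p[0] for p in knn
            if len(p) == 2 and p[0].distance <= 0.3 * p[1].distance]
    if len(good) < 4:
        return None, []

    src = np.float32([kp_i[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_j[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC rejects matches whose reprojection error exceeds t = 3 pixels.
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0,
                                 maxIters=2000, confidence=0.99)
    if H is None:
        return None, []
    inliers = [m for m, keep in zip(good, mask.ravel()) if keep]
    return H, inliers
```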
Figure 6: Matching results for the different algorithms (rows, top to bottom: ours, ORB, SIFT, SURF). (a) Initial matching result and (b) matching result after eliminating outliers with RANSAC.
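As a small numerical companion to equations (2)-(4), the sketch below estimates a focal length from a homography using the two cases of equation (3), takes the median over all pairs, and chains rotations as Rj = Ri Ki^(-1) Hij^(-1) Kj. It is a hedged illustration written directly from the formulas above, not the authors' code, and it omits the final bundle adjustment.

```python
import numpy as np

def focal_from_homography(H):
    """Focal-length estimate f0 from a 3x3 homography, following equation (3).

    Returns None when neither case of (3) yields a positive f0^2.
    """
    h11, h12, h13 = H[0]
    h21, h22, h23 = H[1]
    d1 = h11**2 + h12**2 - h21**2 - h22**2
    if abs(d1) > 1e-12:
        f2 = (h23**2 - h13**2) / d1
        if f2 > 0:
            return float(np.sqrt(f2))
    d2 = h11 * h21 + h12 * h22
    if abs(d2) > 1e-12:
        f2 = -(h13 * h23) / d2
        if f2 > 0:
            return float(np.sqrt(f2))
    return None

def intrinsics(f, cx, cy):
    """Build the 3x3 intrinsic matrix K used in equation (2)."""
    return np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])

def rotation_from_homography(R_i, K_i, K_j, H_ij):
    """Rj = Ri Ki^-1 Hij^-1 Kj, used to initialize the rotation of image j."""
    return R_i @ np.linalg.inv(K_i) @ np.linalg.inv(H_ij) @ K_j

# Usage sketch: `homographies` is a placeholder list of pairwise homographies.
# estimates = [focal_from_homography(H) for H in homographies]
# f = np.median([e for e in estimates if e is not None])
```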

The parameters of the transformation model are used to transform the coordinates of all stitched images into the same coordinate system. The coordinate transformation is performed on each image in the following steps:

(1) A point (x, y, z = 1) of the initial image T is projected to the common coordinates (x′, y′, z′). According to K and R, the relation is

[x′, y′, z′]^T = R K^(-1) [x, y, z]^T.   (5)

(2) The coordinates from step (1) are mapped to the spherical coordinate system (u, w, v) with

u = atan(x′/z′),  w = y′ / sqrt(x′^2 + y′^2 + z′^2),  v = π − acos(w).

(3) The coordinates are backprojected onto the warped image. First, the coordinates (x′, y′, z′) are recovered by the inverse of the spherical mapping in step (2):

x′ = sin v · sin u,  y′ = −cos v,  z′ = sin v · cos u.

Then the coordinates of the warped image are obtained by backprojection:

[x, y, z]^T = K R^(-1) [x′, y′, z′]^T.   (6)

Finally, the coordinates of the initial image T are transformed into the same coordinate system.
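A compact, hedged sketch of steps (1)-(3), written directly from equations (5) and (6) above: it maps a single pixel to the spherical coordinates (u, v) and back, while a full warper would additionally loop over the output raster and interpolate.

```python
import numpy as np

def project_to_sphere(x, y, K, R):
    """Steps (1)-(2): pixel (x, y) of the source image -> spherical (u, v)."""
    p = np.array([x, y, 1.0])
    xp, yp, zp = R @ np.linalg.inv(K) @ p              # equation (5)
    u = np.arctan2(xp, zp)                             # atan(x'/z')
    w = yp / np.sqrt(xp**2 + yp**2 + zp**2)
    v = np.pi - np.arccos(w)
    return u, v

def backproject_from_sphere(u, v, K, R):
    """Step (3): spherical (u, v) -> pixel coordinates in the warped frame."""
    xp = np.sin(v) * np.sin(u)
    yp = -np.cos(v)
    zp = np.sin(v) * np.cos(u)
    x, y, z = K @ np.linalg.inv(R) @ np.array([xp, yp, zp])   # equation (6)
    return x / z, y / z                                # normalize homogeneous coords
```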
2.3.3. Image Blending. When all images have been warped into the same coordinate system, they are blended to construct a mosaic image with invisible image boundaries [29]. Ideally, every sample (pixel) at the same location should have the same intensity in all images contributing to the mosaic, but in practice this is not the case. Even after gain compensation, several image edges remain visible due to causes such as vignetting (intensity decreasing toward the edge of the image), misregistration in the mosaicking procedure, and radial distortion [7]. In 2D, the multiband blending of Burt and Adelson [30] has proven particularly effective for image mosaicking without blurring or ghosting artifacts. Suppose images A and B are to be blended. The method consists of the following steps:

(1) A Gaussian pyramid is constructed for each of A and B: GA0, GA1, GA2, GA3, GA4 and GB0, GB1, GB2, GB3, GB4. Constructing the Gaussian pyramid involves two steps, Gaussian smoothing and downsampling; we use 5 pyramid levels and a Gaussian kernel of size 5 × 5. The frequency band decreases by a factor of 1/8 between adjacent levels, and the image width and height are halved at each level.
(2) A Laplacian pyramid is constructed for each image by subtracting adjacent Gaussian pyramid levels: LA0, LA1, LA2, LA3, LA4 and LB0, LB1, LB2, LB3, LB4.
(3) The Laplacian pyramids are blended level by level using the mask, giving LC0, LC1, LC2, LC3, and LC4.
(4) The blended Laplacian levels of the upper layers are upsampled to the resolution of LC0.
(5) The images obtained in step (4) are superimposed to obtain the final output image C.

Figure 7 compares four sets of mosaic images before and after image blending. The first row shows mosaic images with visible image boundaries (blue arrows). The second row shows the mosaic images after multiband blending: the boundaries of the overlapping areas transition naturally, and the quality of the mosaic images is greatly improved.

2.4. Evaluation Metrics. The root mean square error (RMSE) [31] is used as a standard statistical metric to measure the matching performance of our algorithm. Suppose n samples are used to calculate the matching errors (ek, k = 1, 2, ..., n). The RMSE over the data set is calculated as

RMSE = sqrt( (1/n) Σ_(k=1..n) ek^2 ).   (7)

Here the errors ek = Ri Ki^(-1) pk − Rj Kj^(-1) qk (with Ri, Ki, Rj, Kj as in equation (4)), k = 1, 2, ..., n, can be arranged in an n-dimensional vector, where n denotes the total number of matching points between the two images. For each matching point (pk, qk) ∈ C0, the RMSE of the matching points of the two images is calculated according to equation (7). A low RMSE value indicates desirable matching.

The feature matching evaluation also uses the recall rate [13, 32], a well-known performance metric; using the recall rate to evaluate matching performance is mainly inspired by reference [32]. The recall rate is defined as

Recall = retrieved_correct_matches / all_correct_matches,   (8)

where retrieved_correct_matches denotes the number of correct matching points obtained by our matching algorithm, and all_correct_matches are the matches manually marked by experts in the reference and target images.

3. Experiments

3.1. Image Data. We implemented the proposed method in C++ with the OpenCV library. The experiments were performed on a PC with 8 GB RAM and an Intel i5-6500 CPU (3.20 GHz). The data set, which includes 62 retinal images, was collected from a hospital in Guangdong, China. The images are obtained from 14 different eyes, with 2-9 images acquired per eye, and are stored in JPG format. The original image sizes in the data set are inconsistent, including 2304 × 1728 and 1600 × 1200 pixels. A total of 190 image pairs with overlaps are used to evaluate the performance of feature matching; Table 2 provides a detailed description of the overlaps between image pairs. In the experiments, several stitching software packages were compared with the proposed method: AutoStitch, the Photomerge function in Photoshop CS6, and ICE (Image Composite Editor).

3.2. Comparison of Feature Matching Methods. This experiment investigates the effect of the overlap ratio of image pairs on the matching error. Table 2 lists 10 pairs of retinal images from the data set and shows the matching errors for different overlap ratios. In general, desirable matching can be achieved when the overlap ratio of an image pair is high. The number and spatial distribution of matched features are important factors that affect the registration results: if an image pair has a high overlap ratio but only a few detected vascular bifurcations, or unevenly distributed bifurcations, a high matching error is obtained. The average root mean square error over all image pairs is 1.82 pixels. To measure the performance of the matching algorithm more accurately, we plan to compare the registered image coordinates with ground-truth coordinates on more manually annotated samples.

Table 3 lists the quantitative matching performance of the different features. The numbers of matched points for ORB, SIFT, SURF, and our method are 29, 17, 42, and 62, respectively. Although the number of matched points obtained by SURF is close to that of our matching method, SURF needs much more time (0.689 s) than our proposed method (0.249 s). ORB and SIFT consume less time but produce fewer matched points than our algorithm. Our method finds more correct matches than approaches such as SIFT or SURF.
Figure 7: Mosaicking before and after the application of the image blending method. The first row shows the mosaics before image blending, and the second row shows the results after image blending.
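As an illustration of the multiband blending steps listed in Section 2.3.3, the following hedged OpenCV sketch blends two pre-aligned images A and B with a mask; the 5-level depth matches the text, while the file names and the mask construction are placeholders, not the authors' pipeline.

```python
import cv2
import numpy as np

def build_gaussian_pyramid(img, levels=5):
    pyr = [img.astype(np.float32)]
    for _ in range(levels - 1):
        pyr.append(cv2.pyrDown(pyr[-1]))
    return pyr

def build_laplacian_pyramid(gauss):
    lap = []
    for i in range(len(gauss) - 1):
        up = cv2.pyrUp(gauss[i + 1],
                       dstsize=(gauss[i].shape[1], gauss[i].shape[0]))
        lap.append(gauss[i] - up)   # band-pass detail at level i
    lap.append(gauss[-1])           # coarsest level keeps the low-pass residual
    return lap

def multiband_blend(A, B, mask, levels=5):
    """Blend A and B (same size) with a float mask in [0, 1] (1 -> take A)."""
    gA = build_gaussian_pyramid(A, levels)
    gB = build_gaussian_pyramid(B, levels)
    gM = build_gaussian_pyramid(mask, levels)
    lA, lB = build_laplacian_pyramid(gA), build_laplacian_pyramid(gB)

    # Blend each Laplacian level with the smoothed mask of the same level.
    blended = [m * a + (1.0 - m) * b for a, b, m in zip(lA, lB, gM)]

    # Collapse the pyramid: upsample and add the detail of each finer level.
    out = blended[-1]
    for lvl in reversed(blended[:-1]):
        out = cv2.pyrUp(out, dstsize=(lvl.shape[1], lvl.shape[0])) + lvl
    return np.clip(out, 0, 255).astype(np.uint8)

# Placeholder usage: A and B are two warped retinal images of equal size,
# and mask selects the region taken from A.
# A = cv2.imread("warped_A.png").astype(np.float32)
# B = cv2.imread("warped_B.png").astype(np.float32)
# mask = np.zeros(A.shape, np.float32); mask[:, :A.shape[1] // 2] = 1.0
# cv2.imwrite("blended.png", multiband_blend(A, B, mask))
```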

Table 2: Performance of feature matching with pairs of images within the data set.
Test sample Overlap (%) RMSE (pixel)
T01-02 24.9 2.85
T03-04 33.1 1.61
T05-06 36.6 2.63
T07-08 38.4 1.09
T09-10 39.1 1.59
T11-12 52.4 1.35
T13-14 57.3 1.20
T15-16 61.2 2.07
T17-18 61.3 1.66
T19-20 67.3 1.40
Mean value 47.16 1.745
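The RMSE column in Table 2 corresponds to equation (7), and the recall rates reported in Section 3.2 follow equation (8). A minimal numpy sketch with hypothetical inputs:

```python
import numpy as np

def rmse(errors):
    """Equation (7): root mean square of the per-match errors e_k."""
    e = np.asarray(errors, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))

def recall(retrieved_correct, all_correct):
    """Equation (8): fraction of expert-marked correct matches retrieved."""
    return retrieved_correct / all_correct if all_correct else 0.0

# Hypothetical example: residuals of five matched bifurcation pairs (pixels).
print(rmse([1.2, 0.8, 2.1, 1.5, 0.9]))               # about 1.4
print(recall(retrieved_correct=14, all_correct=20))   # 0.7
```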

Table 3: Average computation time and the average number of matching points for different algorithms.
Algorithm ORB SIFT Ours SURF
Running times (s) 0.149 0.259 0.249 0.689
Average number of matching points 29 17 62 42

Vascular bifurcations obtained by our method are more evenly distributed than the features obtained by SIFT, as shown in Figure 5. Likewise, the matching points obtained by our method are more evenly distributed than those of SIFT or SURF according to the matching results. In this case, having more matches leads to a better mosaic.

Figure 8 compares the recall rate of the different features on image pairs for five pairs of retinal images from the data set. As shown in Figure 8, the best recall rate is obtained with our registration method. Although the number of matching points obtained by SURF is close to that of our matching method, the average recall rate obtained by our matching method is considerably higher than that of SURF (0.70 versus 0.35). The matching recall rate produced by ORB is lower than that of SIFT (0.47 versus 0.57). Therefore, the method proposed in this paper performs better than the other algorithms according to these statistics.

3.3. Qualitative Evaluation. Figure 9 shows the mosaic images stitched by four different methods.
Figure 8: Recall rate of paired retinal images for different matching algorithms (x-axis: subject index 1-5; y-axis: recall rate; curves: ours, SIFT, ORB, SURF).

Figure 9: Results of four different stitching methods: (a) ICE; (b) Photoshop CS6; (c) AutoStitch; (d) proposed method.

Panels (a)-(d) show the results of ICE, the Photomerge function in Photoshop CS6, AutoStitch, and our method, respectively. Figures 9(a) and 9(b) both contain mismatched vascular structures, as indicated by the white arrows. In Figure 9(c), the boundary between the stitched images is visible and the image transition is unnatural. In contrast, the mosaic image produced by our method has the highest similarity to the original images.

Figure 9 gives only a rough comparison of the performance of the stitching software. To evaluate the quality of the mosaic images more precisely, the mosaics are magnified for comparison in Figures 10 and 11, which show two examples of mosaics produced by our method. In these figures, the left images are the original images captured from one retina at different angles.
Figure 10: Comparison of mosaic images. Left: retinal images I1-I9 captured from one retina. (a) Magnified source image from the sample data. (b) Mosaic produced by the proposed method. (c) Stitching result obtained by AutoStitch. Regions in red rectangles are zoomed in to display details; three vascular mismatching structures are indicated by white arrows in (c).

Figure 11: Detailed comparison of mosaic images produced by different methods. Left: retinal images I1-I6. (a) Original retinal image. (b) Mosaic image produced by the proposed method. (c) Stitching result obtained by AutoStitch. Two vascular mismatching structures and one blurred structure are indicated by white arrows in (c).

The proposed method provides visually appealing mosaic images. The mosaics in Figures 10(b) and 11(b) are constructed from nine and six images, respectively, and the retinal vessels in the images are accurately stitched. The mosaic images produced by our method and by AutoStitch are visually similar; however, AutoStitch produces mismatched vascular structures, as indicated by the white arrows in Figure 10(c), whereas the corresponding vascular structures are stitched accurately by the proposed method.

Our method also produces mosaic images with less blur, as shown in the zoomed-in regions in Figure 11(b). By contrast, evident blurring around the vessels can be observed in the mosaic image produced by AutoStitch (Figure 11(c)), while no evident blur appears around the vessels in the mosaic image produced by our method. The content of the mosaic images produced by our method is more consistent with the original retinal images than that produced by AutoStitch.

4. Discussion and Conclusion

We construct mosaics from a series of images of the human retina. Previous methods were limited to images containing clearly visible vascular structures; however, the vascular structures in a retinal image often become blurred when the fundus bleeds or becomes tumorous, so for all previous methods, detecting robust features in such a retinal image is difficult. This paper addresses the problem of robust feature detection on obscure vascular structures of retinal images.

Our paper employs a CNN model [12], which effectively obtains clear vascular structures, to segment the vascular structures of retinal images. Bifurcations in the vascular structures are then extracted as features, and the estimated transformations are used to warp the images into a uniform coordinate system for mosaic blending. Experimentally, we show mosaic examples and summarize numerical results such as the RMSE and recall rate. The proposed feature detection method is superior to other methods, such as ORB and SURF. The quality of the mosaic images produced by our method is superior to that of AutoStitch, the Photomerge function in Photoshop CS6, and ICE, which show uneven vascular mismatching. The quality of the mosaic images produced by our method is
especially valuable for retina change detection. Although visually appealing results can be produced with the proposed method, its clinical usefulness remains to be confirmed. We plan to improve the efficiency of the algorithm to achieve real-time stitching, and to improve the algorithm for precise registration of large images with different overlaps.

Data Availability

The data used to support the findings of this study are included within the article and are also available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by grants from the National Natural Science Foundation of China (no. 81771916) and the Guangdong Provincial Key Laboratory of Medical Image Processing (no. 2014B030301042).

References

[1] Y. Zheng, E. Daniel, A. A. Hunter et al., "Landmark matching based retinal image alignment by enforcing sparsity in correspondence matrix," Medical Image Analysis, vol. 18, no. 6, pp. 903-913, 2014.
[2] A. Can, C. V. Stewart, B. Roysam, and H. L. Tanenbaum, "A feature-based, robust, hierarchical algorithm for registering pairs of images of the curved human retina," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 347-364, 2002.
[3] A. Can, C. Stewart, B. Roysam, and H. Tanenbaum, "A feature-based technique for joint, linear estimation of high-order image-to-mosaic transformations: application to mosaicing the curved human retina," IEEE Computer Vision and Pattern Recognition, vol. 2, no. 3, pp. 585-591, 2000.
[4] A. Can, C. V. Stewart, B. Roysam, and H. L. Tanenbaum, "A feature-based technique for joint, linear estimation of high-order image-to-mosaic transformations: mosaicing the curved human retina," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 412-419, 2002.
[5] N. Ryan, C. Heneghan, and P. de Chazal, "Registration of digital retinal images using landmark correspondence by expectation maximization," Image and Vision Computing, vol. 22, no. 11, pp. 883-898, 2004.
[6] O.-S. Kwon and Y.-H. Ha, "Panoramic video using scale-invariant feature transform with embedded color-invariant values," IEEE Transactions on Consumer Electronics, vol. 56, no. 2, pp. 792-798, 2010.
[7] M. Brown and D. G. Lowe, "Automatic panoramic image stitching using invariant features," International Journal of Computer Vision, vol. 74, no. 1, pp. 59-73, 2007.
[8] M. Alomran and D. Chai, "Feature-based panoramic image stitching," in Proceedings of the 2016 14th International Conference on Control, Automation, Robotics and Vision, pp. 1-6, Phuket, Thailand, November 2017.
[9] Y. Jiang and N. Tan, "Retinal vessel segmentation based on conditional deep convolutional generative adversarial networks," 2018, https://arxiv.org/abs/1805.04224.
[10] M. Z. Alom, M. Hasan, C. Yakopcic, T. M. Taha, and V. K. Asari, "Recurrent residual convolutional neural network based on U-net (R2U-net) for medical image segmentation," 2018, https://arxiv.org/abs/1802.06955.
[11] K. Hu, Z. Zhang, X. Niu, Y. Zhang, and C. Cao, "Retinal vessel segmentation of color fundus images using multiscale convolutional neural network with an improved cross-entropy loss function," Neurocomputing, vol. 309, 2018.
[12] K. K. Maninis, J. Pont-Tuset, P. Arbeláez, and L. V. Gool, Deep Retinal Image Understanding, Springer International Publishing, Berlin, Germany, 2016.
[13] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615-1630, 2005.
[14] M. M. Fraz, P. Remagnino, A. Hoppe et al., "Blood vessel segmentation methodologies in retinal images—a survey," Computer Methods and Programs in Biomedicine, vol. 108, no. 1, pp. 407-433, 2012.
[15] J. V. B. Soares, J. J. G. Leandro, R. M. Cesar, H. F. Jelinek, and M. J. Cree, "Retinal vessel segmentation using the 2-D Gabor wavelet and supervised classification," IEEE Transactions on Medical Imaging, vol. 25, no. 9, pp. 1214-1222, 2006.
[16] B. Zhang, L. Zhang, L. Zhang, and F. Karray, "Retinal vessel extraction by matched filter with first-order derivative of Gaussian," Computers in Biology and Medicine, vol. 40, no. 4, pp. 438-445, 2010.
[17] B. Sheng, P. Li, S. Mo et al., "Retinal vessel segmentation using minimum spanning superpixel tree detector," IEEE Transactions on Cybernetics, vol. 49, no. 7, pp. 2707-2719, 2019.
[18] M. A. U. Khan, T. M. Khan, D. G. Bailey, and T. A. Soomro, "A generalized multi-scale line-detection method to boost retinal vessel segmentation sensitivity," Pattern Analysis and Applications, vol. 22, no. 3, pp. 1177-1196, 2019.
[19] J. Staal, M. D. Abramoff, M. Niemeijer, M. A. Viergever, and B. van Ginneken, "Ridge-based vessel segmentation in color images of the retina," IEEE Transactions on Medical Imaging, vol. 23, no. 4, pp. 501-509, 2004.
[20] A. Hoover, V. Kouznetsova, and M. Goldbaum, "Locating blood vessels in retinal images by piece-wise threshold probing of a matched filter response," IEEE Transactions on Medical Imaging, vol. 19, no. 3, pp. 203-210, 2000.
[21] T. Y. Zhang and C. Y. Suen, "A fast parallel algorithm for thinning digital patterns," Communications of the ACM, vol. 27, no. 3, pp. 236-239, 1984.
[22] R. C. Gonzalez and R. E. Woods, Digital Image Processing, Pearson Education, Bangalore, India, 3rd edition, 2009.
[23] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[24] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-up robust features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346-359, 2008.
[25] T. Lindeberg, "Scale invariant feature transform," Scholarpedia, vol. 7, no. 5, p. 10491, 2012.
[26] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, vol. 24, no. 6, pp. 381-395, 1981.
[27] H.-Y. Shum and R. Szeliski, "Construction of panoramic image mosaics with global and local alignment," Panoramic Vision, Springer, Berlin, Germany, pp. 227-268, 2001.
[28] K. Madsen, H. B. Nielsen, and O. Tingleff, Methods for Non-Linear Least Squares Problems, Technical University of Denmark, Kongens Lyngby, Denmark, 2nd edition, 2004.
[29] P. C. Cattin, H. Bay, L. Van Gool, and G. Székely, "Retina mosaicing using local features," in Proceedings of the International Conference on Medical Image Computing & Computer-Assisted Intervention, pp. 185-192, Copenhagen, Denmark, October 2006.
[30] P. J. Burt and E. H. Adelson, "A multiresolution spline with application to image mosaics," ACM Transactions on Graphics, vol. 2, no. 4, pp. 217-236, 1983.
[31] T. Chai and R. R. Draxler, "Root mean square error (RMSE) or mean absolute error (MAE)?—Arguments against avoiding RMSE in the literature," Geoscientific Model Development, vol. 7, no. 3, pp. 1247-1250, 2014.
[32] K. X. Deng, "Retinal image registration based on hyper-edge graph matching," Journal of Tsinghua University (Science and Technology), vol. 54, no. 5, pp. 568-574, 2014.