
DEEP RESIDUAL LEARNING FOR COMPRESSED SENSING MRI

Dongwook Lee, Jaejun Yoo and Jong Chul Ye

Bio Imaging and Signal Processing Lab., Dept. of Bio and Brain Engineering, KAIST

ABSTRACT

Compressed sensing (CS) enables a significant reduction of MR acquisition time with a performance guarantee. However, the computational complexity of CS is usually expensive. To address this, we propose a novel deep residual learning algorithm to reconstruct MR images from sparsely sampled k-space data. In particular, based on the observation that coherent aliasing artifacts from downsampled data have a topologically simpler structure than the original image data, we formulate the CS problem as a residual regression problem and propose a deep convolutional neural network (CNN) to learn the aliasing artifacts. Experimental results using single channel and multi channel MR data demonstrate that the proposed deep residual learning outperforms the existing CS and parallel imaging algorithms. Moreover, the computational time is faster by several orders of magnitude.

Index Terms— Compressed sensing MRI, deep learning, residual learning, CNN

1. INTRODUCTION

In MR acquisition, an efficient acceleration scheme is important. Nowadays, parallel MR imaging and compressed sensing MRI (CS-MRI) are the two most important tools for accelerating MR acquisition. Generalized Autocalibrating Partially Parallel Acquisitions (GRAPPA) is a representative parallel imaging technique that interpolates the missing k-space data by exploiting the diversity of coil sensitivity maps. CS-MRI reconstructs a high resolution image from randomly sampled k-space data by exploiting the sparsity of the data in a transform domain. CS algorithms are commonly formulated as penalized inverse problems that minimize the trade-off between a data fidelity term in k-space and a sparsity penalty in the image domain. In the recent annihilating filter-based low-rank Hankel matrix approach (ALOHA) [1], CS-MRI and parallel MRI can be unified as k-space interpolation [1]. One of the limitations of these CS algorithms is, however, that the computational complexity is usually high.

Recently, deep learning using CNNs has achieved tremendous success in classification problems [2] as well as regression problems [3]. The exponential expressivity under a given network complexity (e.g. VC dimension or Rademacher complexity [4]) has been attributed to its success [5, 6].

Wang et al. [7] were the first to apply deep learning to CS-MRI. They trained a deep neural network to map downsampled reconstruction images to fully sampled reconstructions. Then, they used the deep learning output either as an initialization or as a regularization term in classical CS approaches. A deep network architecture based on an unfolded iterative CS algorithm was also proposed [8]. Rather than using hand-crafted regularizers, the authors in [8] tried to learn a set of optimal regularizers using a reaction diffusion model.

Unlike these existing deep learning approaches for CS-MRI, this paper is mainly interested in deep residual learning [9]. As shown in Fig. 1, the main idea is to learn the aliasing artifacts rather than the aliasing-free fully sampled reconstruction. Once the aliasing artifacts are estimated, the aliasing-free image is obtained by subtracting the estimated aliasing artifacts. This network architecture is proposed based on our conjecture that aliasing artifacts from uniformly undersampled patterns may have a simpler topological structure, such that learning the residual is easier than learning the original aliasing-free images. Using persistent homology analysis from computational topology [10], we show that this conjecture is true. Accordingly, we investigate several architectures of residual learning and show that a deconvolution network with a contracting path, often called the U-net structure [11] for image segmentation, was the most effective in estimating the aliasing artifacts. Experimental results using single and multi channel CS-MRI show significant improvement in performance over the existing state-of-the-art MR reconstruction algorithms.

Grant sponsor: Korea Science and Engineering Foundation, Grant number NRF-2016R1A2B3008104.

Fig. 1. Residual learning of aliasing artifacts.

2. THEORY

2.1. Generalization bound for residual learning

As shown in Fig. 1, based on an input $X \in \mathcal{X}$ and a label $Y \in \mathcal{Y}$ generated by a distribution $\mathcal{D}$, we are interested in estimating a regression function $f : \mathcal{X} \to \mathcal{Y}$ in a functional space $\mathcal{F}$ that minimizes the risk $L(f) = E_{\mathcal{D}} \|Y - f(X)\|^2$. A major technical issue is, however, that the associated probability distribution $\mathcal{D}$ is unknown. Moreover, we only have access to a finite sequence of independent, identically distributed training data

$$S = \{(X_1, Y_1), \cdots, (X_n, Y_n)\}$$
such that only the empirical risk

$$\hat{L}_n(f) = \frac{1}{n} \sum_{i=1}^{n} \|Y_i - f(X_i)\|^2$$

is available. Direct minimization of the empirical risk is, however, problematic due to the potential of overfitting. To address this issue, statistical learning theory has been developed to bound the risk of a learning algorithm in terms of complexity measures (e.g. VC dimension and shatter coefficient) and the empirical risk. Rademacher complexity [4] is one of the most modern notions of complexity; it is distribution dependent and defined for any class of real-valued functions. Specifically, with probability $\geq 1-\delta$, for every function $f \in \mathcal{F}$,

$$L(f) \leq \underbrace{\hat{L}_n(f)}_{\text{empirical risk}} + \underbrace{2\hat{R}_n(\mathcal{F})}_{\text{complexity penalty}} + 3\sqrt{\frac{\ln(2/\delta)}{n}} \qquad (1)$$

where the empirical Rademacher complexity $\hat{R}_n(\mathcal{F})$ is defined to be

$$\hat{R}_n(\mathcal{F}) = E_\sigma\left[\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(X_i)\right],$$

where $\sigma_1, \cdots, \sigma_n$ are independent random variables uniformly chosen from $\{-1, 1\}$. Therefore, to reduce the risk, we need to minimize both the empirical risk (i.e. data fidelity) and the complexity term in (1) simultaneously. In a neural network, the value of the empirical risk is determined by the representation power of the network [5], whereas the complexity term is determined by the structure of the network [4].
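The complexity penalty in (1) can be made concrete numerically. The following is a minimal Python sketch, not part of the original paper, that estimates the empirical Rademacher complexity of a finite, hypothetical function class by Monte Carlo sampling of the sign variables $\sigma_i$; for a finite class, the supremum reduces to a maximum over the candidate functions.

```python
import numpy as np

def empirical_rademacher(preds, n_trials=1000, rng=None):
    """Monte Carlo estimate of the empirical Rademacher complexity.

    preds: (num_functions, n) array, where preds[j, i] = f_j(X_i) for a
           finite (hypothetical) function class {f_1, ..., f_m} evaluated
           on n fixed training inputs X_1, ..., X_n.
    """
    rng = np.random.default_rng(rng)
    m, n = preds.shape
    total = 0.0
    for _ in range(n_trials):
        sigma = rng.choice([-1.0, 1.0], size=n)  # Rademacher signs
        # Supremum over the (finite) class of the signed empirical average.
        total += np.max(preds @ sigma / n)
    return total / n_trials

# Toy usage: 50 random "functions" evaluated on 200 training points.
preds = np.random.default_rng(0).normal(size=(50, 200))
print(empirical_rademacher(preds))
```

For bounded function classes the estimate decays roughly as $O(1/\sqrt{n})$, which is why larger training sets tighten the bound in (1).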
2.2. Topological structure of residual manifold

There have been several studies explaining the benefit of depth in neural networks [5, 6]. In deep networks, the representation power grows exponentially with respect to the number of layers, while it grows at most polynomially in shallow ones [5]. Therefore, with the same amount of resources, theoretical results support that a deep architecture is preferred to a shallow one, and it increases the performance of the network by reducing the empirical risk in (1). Thus, if the manifold of the label $Y$ is simple enough to meet the representation power of a given network architecture, then the empirical risk as well as the risk upper bound can be reduced.

The complexity of a manifold is a topological concept, so it should be analyzed using topological tools. Here, we employed the recent computational topology tool called persistent homology [10]. Specifically, as shown in Fig. 2(a), we can infer the topology of the data by varying the distance measure $\epsilon$. As the allowable distance $\epsilon$ increases, point clouds merge together and finally become a single cluster. During this filtration process [10], the Betti numbers ($\beta_m$), which are the numbers of $m$-dimensional holes of a manifold, are calculated. Specifically, $\beta_0$ and $\beta_1$ are the numbers of connected components and cycles, respectively. Therefore, point clouds with high diversity will merge slowly, which is reflected as a slow decrease of the Betti numbers. This trend can be illustrated using the so-called barcode of Betti numbers [10], and persistent homology investigates the topology of a manifold using barcodes.

Fig. 2. (a) Point cloud data K of the true space Y and its configuration over the $\epsilon$-distance filtration. (b) Zero- and one-dimensional barcodes of the original (blue) and residual (red) of one channel MR data. Similar results were shown in four channel MR data.

To compare the topological structure of the residual (aliasing artifact) and the original image spaces, we calculated the Betti numbers using the JAVAPLEX toolbox (http://appliedtopology.github.io/javaplex). To generate a point cloud, each label image (either an aliasing artifact image or an aliasing-free MR image from fully sampled data) is regarded as a point in a high-dimensional space; we then calculated the Euclidean distance between each pair of points and normalized the values by the maximum distance. The topological complexity of the original and residual image spaces was compared through the change of the Betti numbers in Fig. 2(b). Indeed, $\beta_0$ and $\beta_1$ of the residual image manifold decreased faster to a single cluster, which indicates that the residual image manifold has a simpler topology than the original one. In the experimental results in Section 5, we confirmed that the prediction by the Betti numbers fully reflects the reconstruction performance.
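To make the filtration concrete, here is a minimal Python sketch, not the JAVAPLEX computation used in the paper, that tracks the $\beta_0$ barcode of a point cloud with a union-find structure: each image is one point, pairwise distances are normalized by the maximum, and connected components are counted as $\epsilon$ grows. ($\beta_1$ requires a genuine persistent homology library such as JAVAPLEX.)

```python
import numpy as np

def betti0_curve(images, eps_grid):
    """Track beta_0 (number of connected components) of the Vietoris-Rips
    filtration of a point cloud, where each image is one point."""
    X = np.stack([im.ravel() for im in images]).astype(float)
    n = len(X)
    # Pairwise Euclidean distances, normalized by the maximum distance.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    D /= D.max()

    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process edges in increasing distance order; merge clusters as eps grows.
    edges = sorted((D[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    curve, k = [], 0
    for eps in eps_grid:
        while k < len(edges) and edges[k][0] <= eps:
            _, i, j = edges[k]
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
            k += 1
        curve.append(len({find(i) for i in range(n)}))
    return curve  # a slow decrease indicates a topologically richer manifold

# Toy usage with random "images"; real inputs would be the residual or
# original label images described above.
imgs = np.random.default_rng(1).normal(size=(30, 16, 16))
print(betti0_curve(imgs, np.linspace(0, 1, 11)))
```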
3. RESIDUAL LEARNING ARCHITECTURE

To construct a residual learning architecture, we utilize convolution, batch normalization, the rectified linear unit (ReLU), and contracting path connections with concatenation [9, 11]. Specifically, let $x^l$ denote the $l$-th layer input, and let $w^l$, $b^l$ represent the weights and bias of the $l$-th convolution layer, respectively. Then the $l$-th layer of the network repeatedly performs the following operation: $f^l(x^l) := \sigma(BN(w^l \ast x^l + b^l))$, where $\sigma(\cdot)$ is the ReLU function and $BN$ is batch normalization. In the contracting path, we concatenate $x^l$ along the channel dimension [11].
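As an illustration of this layer operation, the following is a minimal PyTorch sketch, not the authors' MatConvNet implementation; the kernel size and channel counts are our assumptions.

```python
import torch
import torch.nn as nn

class ConvBNReLU(nn.Module):
    """One network layer: convolution, batch normalization, then ReLU,
    i.e. f(x) = ReLU(BN(w * x + b))."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

# Contracting-path connection: concatenate an earlier feature map with the
# current feature map along the channel dimension.
x_skip = torch.randn(1, 64, 128, 128)     # hypothetical encoder features
x_up = torch.randn(1, 64, 128, 128)       # hypothetical decoder features
x_cat = torch.cat([x_skip, x_up], dim=1)  # 128 channels after concatenation
print(ConvBNReLU(128, 64)(x_cat).shape)   # torch.Size([1, 64, 128, 128])
```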

Fig. 3. (a) Single-scale residual learning with a modified deconvolution network framework [12] with a symmetric contracting path. (b) Multi-scale residual learning with the U-net architecture.

Fig. 3 illustrates the several network configurations we have investigated for residual learning. Fig. 3(a) is single-scale residual learning with a modified deconvolution network framework [12]. Fig. 3(b) is multi-scale residual learning with additional pooling and unpooling (convolution transpose) layers on top of Fig. 3(a). To keep the number of network features similar to Fig. 3(a), the number of channels is doubled after each pooling layer. This architecture is often called the U-net [11]. In the following, we investigate the performance of each network configuration for residual learning.
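For the multi-scale variant of Fig. 3(b), the sketch below, again hypothetical PyTorch rather than the authors' code, shows one pooling/unpooling level with the channel doubling and contracting-path concatenation described above; it reuses the ConvBNReLU block from the previous sketch.

```python
import torch
import torch.nn as nn

class UNetLevel(nn.Module):
    """One U-net scale: pool, double the channels, process, then unpool
    (transposed convolution) and concatenate with the skipped features."""
    def __init__(self, ch):
        super().__init__()
        self.pool = nn.MaxPool2d(2)
        self.inner = ConvBNReLU(ch, 2 * ch)  # channels double after pooling
        self.unpool = nn.ConvTranspose2d(2 * ch, ch, kernel_size=2, stride=2)
        self.fuse = ConvBNReLU(2 * ch, ch)   # after skip concatenation

    def forward(self, x):
        y = self.unpool(self.inner(self.pool(x)))   # coarse-scale branch
        return self.fuse(torch.cat([x, y], dim=1))  # contracting path concat

x = torch.randn(1, 64, 128, 128)
print(UNetLevel(64)(x).shape)  # torch.Size([1, 64, 128, 128])
```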
4. MATERIALS AND METHODS

4.1. MR dataset

We used a brain MR image dataset consisting of 81 axial brain images from 9 subjects in total. The data were acquired in Cartesian coordinates with a 3T MR scanner with four Rx coils (Siemens, Verio). The following parameters were used for the SE and GRE scans: TR 3000-4000 ms, TE 4-20 ms, slice thickness 5 mm, 256×256 acquisition matrix, 4 coils, FOV 240×240, FA 90 degrees. We split the training and test sets by randomly choosing about 80% of the total images for training and about 20% for testing. For data augmentation, we generated 32 times more training samples by rotating, shearing and flipping the images. For single channel experiments, we chose one channel of the four channel data.
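As an illustration of this augmentation step, here is a minimal Python sketch; the specific angles, shear factors, and flip axes are our assumptions, since the paper only states that rotation, shearing, and flipping yield 32 times more samples.

```python
import numpy as np
from scipy.ndimage import affine_transform, rotate

def augment(img, angles=(0, 90, 180, 270), shears=(0.0, 0.1), flips=(False, True)):
    """Yield 4 x 2 x 2 x 2 = 32 variants of one image via rotation,
    shearing, and horizontal/vertical flipping (hypothetical parameters)."""
    for angle in angles:
        r = rotate(img, angle, reshape=False, order=1)
        for s in shears:
            # Shear along one axis with an affine transform.
            sh = affine_transform(r, np.array([[1.0, s], [0.0, 1.0]]), order=1)
            for fx in flips:
                a = sh[:, ::-1] if fx else sh  # optional horizontal flip
                yield a
                yield a[::-1, :]               # vertical flip variant

img = np.random.default_rng(2).normal(size=(256, 256))
print(len(list(augment(img))))  # 32
```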

4.2. Network training

The original k-space data were retrospectively downsampled by a factor of four with 13 ACS (autocalibration signal, 5 percent of the total PE) lines in the k-space center. As shown in Fig. 1, the residual is constructed as the difference between the reconstruction images from the fully sampled data and the downsampled data. During training, the residual images were used as the labels ($Y$), whereas the aliased images from the downsampled data were used as the input ($X$). Since MR images are complex valued and standard CNNs are real valued, we trained two residual networks: one for the magnitude and the other for the phase of the images. Both networks have the same residual learning structure (however, due to the page limit, we only show the magnitude results).

The network was implemented using the MatConvNet toolbox (ver. 20, http://www.vlfeat.org/matconvnet/) in a MATLAB 2015a environment (MathWorks, Natick). We used a GTX 1080 graphics processor and an i7-4770 CPU (3.40 GHz). The weights of the convolutional layers were initialized by a Gaussian random distribution with the Xavier method to achieve a proper scale. This helped to prevent the signal from exploding or vanishing in the early stage of learning. The stochastic gradient descent (SGD) method with momentum was used to train the weights of the network by minimizing the loss function. Training the network took about 9 hours.

To verify the performance of the network, for the multi-channel dataset, we compared the reconstruction results with those of GRAPPA. We also compared the ALOHA [1] reconstruction as the state-of-the-art CS algorithm for both single and 4-channel reconstructions.
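The construction of the input/label pairs can be sketched as follows; this minimal numpy example is our reading of the description above, and the exact sampling mask layout (every fourth phase encoding line plus 13 ACS lines in the center) is an assumption.

```python
import numpy as np

def make_training_pair(img, accel=4, n_acs=13):
    """Build (input X, label Y) for residual learning: X is the aliased
    image from retrospectively undersampled k-space, and Y is the
    difference between the fully sampled and aliased reconstructions."""
    k = np.fft.fftshift(np.fft.fft2(img))  # fully sampled k-space
    mask = np.zeros(img.shape, dtype=bool)
    mask[::accel, :] = True                # uniform x4 phase-encoding sampling
    c = img.shape[0] // 2
    mask[c - n_acs // 2 : c + n_acs // 2 + 1, :] = True  # 13 ACS center lines
    aliased = np.fft.ifft2(np.fft.ifftshift(k * mask))
    X = np.abs(aliased)                    # magnitude-network input
    Y = np.abs(img) - X                    # residual (aliasing artifact) label
    return X, Y

img = np.random.default_rng(3).normal(size=(256, 256))
X, Y = make_training_pair(img)
print(X.shape, Y.shape)  # (256, 256) (256, 256)
```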
5. RESULTS

In the single channel experiment with x4 acceleration (Fig. 4(b)), there is a significant amount of aliasing artifacts in the zero-filled reconstruction. Moreover, due to the coherent aliasing artifacts from uniform downsampling, most of the existing CS algorithms failed, and only ALOHA was somewhat successful, with slight remaining aliasing artifacts. However, the residual learning results clearly showed very accurate reconstruction, visually and quantitatively, by removing the coherent aliasing artifacts. In the four channel parallel imaging experiments in Fig. 4(c), GRAPPA shows strong aliasing artifacts due to the insufficient number of coils and ACS lines. The ALOHA reconstruction could remove most of the aliasing artifacts, but the results were not perfect due to the coherent sampling. However, the proposed method provided near perfect reconstruction.

Fig. 4. (a) Original image: (top) single channel, (bottom) 4-channel. (b) Single channel reconstruction results at x4 acceleration. (c) 4-channel reconstruction at x4 acceleration. The resulting normalized mean square error (NMSE) is displayed at the bottom of each figure.

In Fig. 5(a)(b), the convergence plots for a test data set from single and four channel reconstructions are illustrated. Among the various residual learning architectures, the multi-scale residual learning (Fig. 3(b)) provided the best reconstruction results. Moreover, residual learning was significantly better than direct image learning with the same U-net architecture, as shown in Fig. 5.

Fig. 5. NMSE convergence graph for test data. (a) Single channel reconstruction, (b) four channel reconstruction.

The reconstruction time of GRAPPA was about 30 seconds for a multi channel image under the aforementioned hardware setting. The reconstruction time for ALOHA was about 10 min for four channel and about 2 min for single channel data. The proposed network took less than 41 ms for a multi channel image and about 30 ms for a single channel image.

6. DISCUSSIONS AND CONCLUSIONS

In this article, we have presented an accelerated MRI reconstruction method for uniformly downsampled MR brain data using residual learning. By learning the aliasing artifacts, which have a simpler topology, the risk of the proposed residual learning can be reduced so that we can obtain more accurate MR images. The proposed method works on not only multi channel data but also single channel data. Even with severe coherent aliasing artifacts, the proposed residual learning successfully learns the aliasing artifacts, whereas the existing parallel imaging and CS reconstructions fail. Moreover, compared to existing algorithms, which require heavy computational cost, the proposed network produces the results almost instantly.

7. REFERENCES

[1] Kyong Hwan Jin, Dongwook Lee, and Jong Chul Ye, "A general framework for compressed sensing and parallel MRI using annihilating filter based low-rank Hankel matrix," IEEE Trans. on Computational Imaging, (in press), 2016.

[2] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[3] Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang, "Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising," arXiv preprint arXiv:1608.03981, 2016.

[4] Peter L Bartlett and Shahar Mendelson, "Rademacher and Gaussian complexities: Risk bounds and structural results," Journal of Machine Learning Research, vol. 3, no. Nov, pp. 463–482, 2002.

[5] Matus Telgarsky, "Benefits of depth in neural networks," arXiv preprint arXiv:1602.04485, 2016.

[6] Monica Bianchini and Franco Scarselli, "On the complexity of neural network classifiers: A comparison between shallow and deep architectures," IEEE Trans. on Neural Networks and Learning Systems, vol. 25, no. 8, pp. 1553–1565, 2014.

[7] Shanshan Wang, Zhenghang Su, Leslie Ying, Xi Peng, Shun Zhu, Feng Liang, Dagan Feng, and Dong Liang, "Accelerating magnetic resonance imaging via deep learning," in 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI). IEEE, 2016, pp. 514–517.

[8] K Hammernik, F Knoll, D Sodickson, and T Pock, "Learning a variational model for compressed sensing MRI reconstruction," in Proceedings of the International Society of Magnetic Resonance in Medicine (ISMRM), 2016.

[9] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep residual learning for image recognition," arXiv preprint arXiv:1512.03385, 2015.

[10] Herbert Edelsbrunner and John Harer, "Persistent homology - a survey," Contemporary Mathematics, vol. 453, pp. 257–282, 2008.

[11] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.

[12] Hyeonwoo Noh, Seunghoon Hong, and Bohyung Han, "Learning deconvolution network for semantic segmentation," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1520–1528.

