Unfolding a blurred image

A. N. Rajagopalan¹   Kuldeep Purohit¹   Anshul Shah²*
¹Indian Institute of Technology Madras, India
²University of Maryland, College Park

arXiv:2201.12010v1 [cs.CV] 28 Jan 2022

*Work done while at Indian Institute of Technology Madras, India.
Abstract
We present a solution for the goal of extracting a video
from a single motion blurred image to sequentially reconstruct the clear views of a scene as beheld by the camera during the time of exposure. We first learn motion
representation from sharp videos in an unsupervised manner through training of a convolutional recurrent video autoencoder network that performs a surrogate task of video
reconstruction. Once trained, it is employed for guided
training of a motion encoder for blurred images. This
network extracts embedded motion information from the
blurred image to generate a sharp video in conjunction with
the trained recurrent video decoder. As an intermediate
step, we also design an efficient architecture that enables
real-time single image deblurring and outperforms competing methods across all factors: accuracy, speed, and
compactness. Experiments on real scenes and standard
datasets demonstrate the superiority of our framework over
the state-of-the-art and its ability to generate a plausible
sequence of temporally consistent sharp frames.
1. Introduction

Recent works on future frame prediction reveal that direct intensity estimation leads to blurred predictions. Instead, if a frame is reconstructed based on the original image and corresponding transformations, both scene dynamics and invariant appearance can be preserved well. Based on this premise, [5, 51] and [16] model the task as a flow of image pixels. The methods [46, 48] generate a video from a single sharp image, but have a severe limitation in that they work only on the specific scene for which they are trained. All of these approaches work only on sharp images and videos. However, motion during exposure is known to cause severe degradation in the captured image quality due to the blur it induces. This is usually the case in low-light situations where the exposure time of each frame is high and in scenes where significant motion happens within the exposure time. In [42], it has been shown that standard network models used for vision tasks and trained only on high-quality images suffer a significant degradation in performance when applied to images degraded by blur.

Motion deblurring is a challenging problem in computer vision due to its ill-posed nature. Recent years have witnessed significant advances in deblurring [44, 25, 23, 2, 27, 28, 45, 26, 33, 32, 21, 22, 20, 43, 31, 30, 17, 29, 18]. Several methods [39, 24, 4, 37, 3, 9, 12, 13] have
been proposed to address this problem using hand-designed
priors as well as Convolutional Neural Networks (CNN)
[1, 35, 36] for recovering the latent image. A few methods [40, 6] have been proposed to remove heterogeneous
blur but they are limited in their capability to handle general
dynamic scenes. Most of these methods strongly rely on
the accuracy of the assumed image degradation model and
include intensive, sometimes heuristic, parameter-tuning
and expensive computations, factors which severely restrict
their accuracy and applicability in real-world scenarios. The
recent works of [19, 21, 14, 41] overcome these limitations
to some extent by learning to directly generate the latent
sharp image, without the need for blur kernel estimation.
We present a two-stage deep convolutional architecture
to carve out a video from a motion blurred image that is
applicable to non-uniform motion caused by individual or
combined effects of camera motion, object motion and arbitrary depth variations in the scene. We avoid overly simplified models to represent motion and hence refrain from
creating synthetic datasets for supervised training. The first
stage consists of training a video auto-encoder wherein the
encoder accepts a sequence of video frames to extract a latent motion representation while the decoder estimates the
same video by applying estimated motion trajectories to
a single sharp frame in a recurrent fashion. We use this
trained video decoder to guide the training of a CNN (which
we refer to as Blurred Image Encoder (BIE)) to extract the
same motion information from a blurred image as the video
encoder would from the image sequence corresponding to
that blurred image. For testing, we propose an efficient deblurring network to first estimate a sharp frame from the
given blurred image. The BIE is responsible for extracting
motion features from the blurred image. The video decoder
uses the outputs of the BIE and the deblurred sharp frame
to generate the video underlying the motion blurred image.
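At test time the pipeline therefore runs in three steps, which can be sketched as follows (a PyTorch-style sketch; the module signatures and the default frame count are illustrative assumptions rather than our exact implementation):

```python
import torch

def blurred_image_to_video(dm, bie, rvd, blurred, num_frames=9):
    """Test-time sketch: deblur the input, extract a motion embedding from the
    blurred image, and let the recurrent decoder unfold it into sharp frames."""
    with torch.no_grad():
        sharp = dm(blurred)                      # deblurring module (Sec. 2.4)
        motion = bie(blurred, sharp)             # blurred image encoder (Sec. 2.3)
        video = rvd(motion, sharp, num_frames)   # recurrent video decoder (Sec. 2.2)
    return video
```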
As the only other work of this kind, [8] very recently
proposed a method to estimate a video from a single blurred
image by training multiple neural networks to estimate the
underlying frames. In contrast, our architecture utilizes a
single recurrent neural network to generate the entire sequence. Our recurrent design implicitly addresses temporal
ambiguity to a large extent, since generation of any frame in
the sequence is naturally preconditioned on all the previous
frames. The approach of [8] is limited to small motion, owing to its architecture and training procedure. We estimate
pixel level motion instead of intensities which proves to be
an advantage for the task at hand, especially in cases with
large blur (which is an issue with [8]). Our deblurring architecture not only outperforms all existing deblurring methods
but is also smaller and significantly faster. In fact, separating the processes of content and motion estimation allows
our architecture to be used with any off-the-shelf deblurring
approach.
2. The Proposed Architecture

In our proposed video autoencoder, the encoder utilizes all the video frames to extract a latent representation, which is then fed to a decoder that estimates the frame sequence in a recurrent fashion. The Recurrent Video Encoder (RVE) reads N sharp frames x_{1..N}, one at each time-step. It returns a tensor at the last time-step, which is utilized as the motion representation of the image sequence. This tensor is used to initialize the first hidden state of another ConvLSTM-based network called the Recurrent Video Decoder (RVD), whose task is to recurrently estimate N optical flows. Since the RVE-RVD pair is trained using a reconstruction loss between the estimated frames x̂_{1..N} and the ground-truth frames x_{1..N}, the RVD must return the predicted video. To enable this, the (known) central frame of the video is acted upon by the flows predicted by the RVD. Specifically, the estimated flows are individually fed to a differentiable transformation layer to transform the central frame x_{⌊N/2⌋} and obtain the frames x̂_{1..N}. Once trained, we have an RVD which can estimate sequential motion flows, given a particular motion representation.

In addition, we introduce another network called the Blurred Image Encoder (BIE), whose task is to accept a blurred image x_B corresponding to the spatio-temporal average of the input frames x_{1..N} and return a motion encoding, which too can be used to generate a sharp video. To achieve this, we employ the already trained RVD to guide the training of the BIE so as to extract the same motion information from the blurred image as the RVE would from the corresponding image sequence. In other words, the weights are to be learnt such that BIE(x_B) ≈ RVE(x_{1..N}). We refrain from using the encoding returned by the RVE as a training target due to the lack of ground truth for the encoded representation. Instead, the BIE is trained such that the video predicted at the output of the RVD for the given x_B matches the ground-truth frames x_{1..N} as closely as possible. This ensures that the BIE learns to capture ordered motion information from which the RVD can return a realistic video. Directly training the BIE-RVD pair poses a challenge, since it requires learning to perform two tasks jointly: "video generation from motion representation" and "ambiguity-invariant motion extraction from a blurred image". Such training delivers below-par performance (see supplementary material).

The overall architecture of the proposed methodology is given in Fig. 1. It is fully convolutional, end-to-end differentiable and can be trained using unlabeled high frame-rate videos, without the need for optical flow supervision, which is challenging to produce at large scale. During testing, the central sharp frame is not available and is estimated using an independently trained deblurring module (DM). We now describe the design aspects of the different modules.
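Before detailing the individual modules, the two training stages can be summarized with the following minimal sketch. We assume PyTorch here; the module interfaces, tensor shapes and the placeholder L1 loss are illustrative assumptions rather than the exact settings of our implementation.

```python
import torch.nn.functional as F

def stage1_step(rve, rvd, frames, opt):
    """Stage 1: train the RVE-RVD pair to reconstruct the input video.
    frames: (B, N, C, H, W) sharp frames from a high frame-rate video."""
    n = frames.shape[1]
    center = frames[:, n // 2]              # known central sharp frame
    motion = rve(frames)                    # last hidden state of the RVE
    pred = rvd(motion, center, n)           # (B, N, C, H, W) reconstruction
    loss = F.l1_loss(pred, frames)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def stage2_step(bie, rvd, frames, opt):
    """Stage 2: guided training of the BIE through the trained RVD."""
    n = frames.shape[1]
    blurred = frames.mean(dim=1)            # x_B: temporal average of the frames
    center = frames[:, n // 2]
    pred = rvd(bie(blurred, center), center, n)
    # The actual objective combines an ordering-invariant reconstruction loss
    # with a spatial motion smoothness term (Sec. 2.3); plain L1 is a stand-in.
    loss = F.l1_loss(pred, frames)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```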
2.1. Recurrent Video Encoder (RVE)

At each time-step, a frame is fed to a convolutional encoder, which generates a feature map to be fed as input to the ConvLSTM cell. Interpreting the ConvLSTM's hidden states as a representation of motion, the kernel size of a ConvLSTM is correlated with the speed of the motion which it can capture. Since we need to extract motion taking
place within a single exposure at fine resolution, we choose
a kernel-size of 3 × 3. As can be seen in Fig. 2(a), the encoder block is made of 4 convolutional blocks with 3 × 3
filters. The first block is a conv layer with stride of 1 and
the rest contain a conv layer with stride of 2, followed by
a Resblock. The number of feature maps in the outputs of
these blocks are 16, 32, 64 and 128, respectively. A ConvLSTM cell operates on the features returned by the last block
and augments them with memory from previous time-steps. Overall, each module can be represented as h^{enc}_n = enc(h^{enc}_{n-1}, x_n), where h^{enc}_n is the encoder ConvLSTM state at time step n and x_n is the nth sharp frame of the video.
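A minimal sketch of such an encoder is shown below, assuming PyTorch. The channel progression (16, 32, 64, 128) and the 3 × 3 ConvLSTM kernel follow the text; the ResBlocks and the remaining hyper-parameters of our implementation are omitted for brevity.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell with a 3x3 kernel."""
    def __init__(self, in_ch, hid_ch, kernel_size=3):
        super().__init__()
        # A single convolution produces the input, forget, output and cell gates.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

class RVE(nn.Module):
    """Sketch of the recurrent video encoder: 4 conv blocks + ConvLSTM
    (the ResBlocks after the strided convolutions are omitted here)."""
    def __init__(self):
        super().__init__()
        chans = [3, 16, 32, 64, 128]
        layers = []
        for k, (cin, cout) in enumerate(zip(chans[:-1], chans[1:])):
            stride = 1 if k == 0 else 2
            layers += [nn.Conv2d(cin, cout, 3, stride=stride, padding=1),
                       nn.ReLU(inplace=True)]
        self.features = nn.Sequential(*layers)
        self.cell = ConvLSTMCell(128, 128)

    def forward(self, frames):                    # frames: (B, N, 3, H, W)
        n = frames.shape[1]
        feats = [self.features(frames[:, t]) for t in range(n)]
        state = (torch.zeros_like(feats[0]), torch.zeros_like(feats[0]))
        h = state[0]
        for t in range(n):                        # h^enc_n = enc(h^enc_{n-1}, x_n)
            h, state = self.cell(feats[t], state)
        return h                                  # motion representation
```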
2.2. Recurrent Video Decoder (RVD)
The task of RVD is to construct a sequence of frames
using the motion representation provided by RVE and the
(known) central frame x_{⌊N/2⌋} of the sequence. The RVD contains a flow encoder which utilizes a structure similar
to the RVE. Instead of accepting images, it accepts optical
flows. The flow encoding is fed to a ConvLSTM cell whose
first hidden state is initialized with the last hidden state
h^{enc}_N of the RVE. To estimate optical flows for a time-step,
the output of the ConvLSTM cell is passed to a Flow
decoder network (F_D). The flow estimated by F_D at each time-step is fed to a transformer module (T), which returns
the estimated frame x̂_n. The descriptions of F_D and T are provided below.

Figure 1. An overview of our video generation architecture during training. The first step involves training the RVE-RVD pair for the task of video reconstruction. This is followed by guided training of the BIE through the trained RVD.

Figure 2. Architectures of the RVE (a) and the BIE (b). The RVE is trained to extract a motion representation from a sequence of frames, while the BIE is trained to extract a motion representation from a blurred image and a sharp image.
Flow Decoder (F_D): Realizing that the flow at the current step is related to the previous one, we perform recurrence on optical flows for consecutive frames. The design of F_D is illustrated in Fig. 3. F_D accepts the output of the ConvLSTM unit at each time-step and generates a flow map.
For robust estimation, we further perform estimation of
flow at multiple scales using deconvolution (deconv) layers
which “unpool” the feature maps and increase the spatial
dimensions by a factor of 2. Inspired by [34], we make
use of skip connections between the layers of the flow encoder and F_D. All deconv operations use 4 × 4 filters and the
convolutional operations use 3 × 3 filters. The output of
the ConvLSTM cell is passed through a convolutional layer
to estimate the flow f_{n,1}. The cell output is also passed through a deconv layer before being concatenated with the upsampled f_{n,1} and the corresponding feature map coming from the encoder, to obtain a hybrid feature map at that scale. As shown in Fig. 3, this process is repeated 3 more times to obtain the flow maps at successively higher scales (f_{n,2...4}).
Figure 3. Our Recurrent Video Decoder (RVD). This module recurrently generates optical flows, which are used to warp the sharp frame. Flows are estimated at 4 different scales.
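The transformer module T can be realized as a standard differentiable backward-warping layer. A plausible sketch, assuming PyTorch's grid_sample, is given below; the actual layer in our implementation may differ in details such as padding and normalization.

```python
import torch
import torch.nn.functional as F

def warp(frame, flow):
    """Backward-warp `frame` (B, C, H, W) with a dense flow field (B, 2, H, W).
    Flow is expressed in pixels; channel 0 is the horizontal displacement."""
    b, _, h, w = frame.shape
    ys = torch.arange(h, device=frame.device).view(1, h, 1).expand(b, h, w)
    xs = torch.arange(w, device=frame.device).view(1, 1, w).expand(b, h, w)
    base = torch.stack((xs, ys), dim=1).float()       # (B, 2, H, W), x first
    coords = base + flow                              # displaced sampling locations
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    x_n = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    y_n = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((x_n, y_n), dim=3)             # (B, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)
```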
2.3. Blurred Image Encoder (BIE)
We make use of the trained encoder-decoder couplet to
solve the task of extracting video from a blurred image. We
advocate a novel strategy of utilizing spatio-temporal embeddings to guide the training of a CNN. The trained decoder has learnt to generate optical flow for all time-steps
from the encoder’s hidden state. We make use of this proxy
network to solve the task of blurred image to video generation.
The use of optical flow recurrence encourages our network to prefer temporally consistent sequences, which prevents it from returning arbitrarily ordered frames. However, directional ambiguity remains. For a scene with multiple objects,
the ambiguity becomes more pronounced as each object can
have its own independent motion. The BIE is connected
with the pre-trained RVD and the pair is trained (RVD is
fine-tuned) using a combination of ordering-invariant frame
reconstruction loss and spatial motion smoothness loss over
the RVD outputs (described later). No such ambiguity exists in the video autoencoder, since the RVD has to exactly reproduce the video which is fed to the RVE.

Method    | [49]  | [47]  | [40]  | [6]   | [19]  | [14]  | [41]  | Ours
PSNR (dB) | 21    | 24.6  | 24.5  | 26.4  | 28.9  | 27.2  | 30.10 | 30.58
SSIM      | 0.740 | 0.845 | 0.851 | 0.863 | 0.911 | 0.905 | 0.933 | 0.941
Time (s)  | 3800  | 700   | 1500  | 1200  | 6     | 0.8   | 0.4   | 0.02
Size (MB) | -     | -     | 54.1  | 41.2  | 300   | 45.6  | 27.5  | 17.9

Table 1. Performance comparison of our deblurring network with existing methods on the benchmark dataset [19].

Figure 4. An overview of our dense deblurring architecture, which we utilize to estimate the central sharp frame. It follows an encoder-decoder design with residual dense blocks, bottleneck blocks, and skip connections at 3 different sub-scales.
The BIE is implemented as a CNN which specializes in
extracting motion features from a blurred image (we experimentally found that feeding the central sharp frame along
with the blurred image improves its performance). The BIE
is tasked to extract the sequential motion in the image by
capturing local motion, e.g. at the smeared edges in the
image. Moreover, the generated encoding should be such
that the RVD can reconstruct motion trajectories. The BIE
has 7 convolutional layers with kernel sizes as shown in Fig. 2(b). Each layer (except the last) is followed by batch normalization and a leaky ReLU non-linearity.
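The precise form of the ordering-invariant reconstruction loss and the smoothness term is not reproduced here; the sketch below shows one plausible instantiation (assuming PyTorch), in which the prediction is scored against both temporal orderings of the ground truth and the smaller error is kept. Our actual formulation may differ.

```python
import torch

def ordering_invariant_loss(pred, target):
    """Reconstruction loss tolerant to directional ambiguity (sketch).
    pred, target: (B, N, C, H, W) frame sequences."""
    dims = (1, 2, 3, 4)
    fwd = (pred - target).abs().mean(dim=dims)                 # forward order
    bwd = (pred - target.flip(dims=[1])).abs().mean(dim=dims)  # reversed order
    return torch.minimum(fwd, bwd).mean()

def flow_smoothness(flows):
    """Spatial total-variation penalty on the predicted flows (B, N, 2, H, W)."""
    dx = (flows[..., :, 1:] - flows[..., :, :-1]).abs().mean()
    dy = (flows[..., 1:, :] - flows[..., :-1, :]).abs().mean()
    return dx + dy
```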
2.4. Deblurring Module (DM)
We propose an independent network for deblurring the
motion blurred observation. The estimated sharp frame is
fed to both BIE and RVD during testing.
Recent works on image restoration have proposed end-to-end trainable networks which require labeled pairs of degraded and sharp images. Among them, [19, 41] have
achieved promising results using multi-scale CNN composed of residual connections. We explore a more effective
network architecture which is inspired by prior methods that
use multi-level and multi-scale features. Our high-level design is similar to that of U-Net [34], which has been used
extensively for preserving global context information in various image-to-image tasks [7]. Based on the observation
that increasing the number of layers and the connections across them boosts feature extraction capability, the
encoder structure of our network utilizes a cascade of Residual Dense Blocks (RDB) [50] instead of convolutional layers. An RDB is a cascade of convolutional layers connected
through a rich set of residual and concatenation connections
which immensely improves feature extraction capability by
reusing features across multiple layers. Inclusion of such
connections maximizes information flow along the intermediate layers and results in better convergence. These units
efficiently learn deeper and more complex features than a
network with residual connections (which have been used
extensively in recent deblurring methods [19, 14, 41, 8]),
while requiring fewer parameters.
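A sketch of such a residual dense block is shown below (PyTorch assumed); the growth rate and depth are illustrative choices rather than the exact configuration of our encoder.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Sketch of a residual dense block in the spirit of [50].

    Each 3x3 conv sees the concatenation of all previous feature maps; a final
    1x1 conv fuses them and the result is added back to the block input."""
    def __init__(self, channels, growth=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        in_ch = channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(in_ch, growth, 3, padding=1),
                nn.ReLU(inplace=True)))
            in_ch += growth
        self.fuse = nn.Conv2d(in_ch, channels, 1)      # local feature fusion

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return x + self.fuse(torch.cat(feats, dim=1))  # local residual learning
```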
Our proposed deblurring architecture is depicted in Fig.
4. The decoder part of our network contains 3 pairs of upsampling blocks to gradually enlarge the spatial resolution
of feature maps. Each up-sampling block contains a bottleneck layer [10] followed by a deconvolution layer. Each
convolution layer (except the last) is followed by a non-linearity. Similar to U-Net, features corresponding to the
same dimension in encoder and decoder are merged with
the help of projection layers. The output of the final upsampling block is passed through two additional convolutional layers to reconstruct the output sharp image. Our
network uses an asymmetric encoder-decoder architecture, where the network capacity is higher owing to the dense connections.
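Each decoder up-sampling block can thus be sketched as a 1 × 1 bottleneck convolution followed by a stride-2 deconvolution that doubles the spatial resolution (channel widths here are placeholders, not our exact settings):

```python
import torch.nn as nn

class UpBlock(nn.Module):
    """Decoder up-sampling block: 1x1 bottleneck + stride-2 deconvolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.bottleneck = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.up = nn.ConvTranspose2d(out_ch, out_ch, kernel_size=4,
                                     stride=2, padding=1)   # doubles H and W
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.up(self.act(self.bottleneck(x))))
```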
3. Experiments
In this section, we carry out quantitative and qualitative
comparisons of our approach with state-of-the-art methods
for deblurring as well as video extraction tasks.
3.1. Results and Comparisons for Video Extraction
In Fig. 6, we present results on standard test blurred images
from the dataset of [19]. Note that some of them suffer
from significant blur. Fig. 6(a) shows an image of a planar scene which is blurred due to dominant camera motion.
Fig. 6(b) shows a 3D scene blurred due to camera motion.
Figs. 6(c-f) show results on blurred images with dynamic
object motion. Observe that the videos generated by our
approach are realistic and qualitatively consistent with the
blur and depth of the scene, even when the foreground incurs large motion. Our network is able to reconstruct videos
from blurred images with diverse motion and scene content.
Figure 5. Visual comparisons of deblurring results on the test dataset [19] (best viewed at high resolution): blurred image, blurred patch, Whyte et al. [47], Nah et al. [19], DeblurGAN [14], SRN [41], and ours.

Figure 6. Comparisons of our video extraction results with [8] on motion blurred images obtained from the test dataset of [19]. The first row shows the blurred images, while the second and third rows show videos generated by our method and [8], respectively. Videos can be viewed by clicking on the image when the document is opened in Adobe Reader.

In comparison, the results of [8] suffer from local errors in deblurring, inconsistent motion estimation, as well as color distortions. We have observed that, in general, the method of [8] fails in cases involving high blur, as direct image regression becomes difficult for large motion. In contrast, we divide the overall problem into two sub-tasks of deblurring and motion extraction. This simplifies learning and
yields improvement in deblurring quality as well as motion
estimation. The color issue in [8] can be attributed to the
design of their networks, wherein feature extraction and reconstruction branches are different for different color channels. Our method applies the same motion to each color
channel. By having a single recurrent network to generate
the video, our network can be directly trained to extract an even higher number of frames (> 9) without any design change
or additional parameters. In contrast, [8] requires training
of an additional network for each new pair of frames. Our
overall architecture is more compact (34 MB vs 70 MB)
and much faster (0.02s vs 0.45s for deblurring and 0.39s vs
1.10s for video generation) as compared to [8].
To perform quantitative comparisons with [8], we also
trained another version of our network on the restricted
case of blurred images produced by averaging 7 successive
sharp frames. For testing, 250 blurred images of resolution 1280 × 704 were created using the 11 test videos from
the dataset of [19]. We compared the videos estimated by
the two methods using the ambiguity-invariant loss function. The average error was found to be 49.06 for [8] and
44.12 for our method. Thus, even for the restricted case of
small blur, our method performs favorably. Repeating the
same experiment for 9 frames (i.e. for large blur from the
same test videos) led to an error of 48.24 for our method,
which is still less than the 7-frame error of [8]. We could
not compute the 9-frame error for [8] as their network is
rigidly designed for 7 frames only.
4. Conclusions
We introduced a new methodology for video generation
from a single blurred image. We proposed a spatio-temporal
video auto-encoder based on an end-to-end differentiable
architecture that learns motion representation from sharp
videos in a self-supervised manner. The network predicts
a sequence of optical flows and employs them to transform
a sharp central frame and return a smooth video. Using the
trained video decoder, we trained a blurred image encoder
to extract a representation from a single blurred image that mimics the representation returned by the video encoder.
This, when fed to the decoder, returns a plausible sharp video representing the action within the blurred image. We also proposed an efficient deblurring architecture composed of densely connected layers that yields state-of-the-art results. The potential of our work can be extended in a variety of directions, including blur-based segmentation, video deblurring, video interpolation, and action recognition. A refined and complete version of this work appeared in CVPR 2019.

Figure 7. Video generation from images blurred with global camera motion, from the datasets of [6], [11] and [15]. The first row shows the blurred images; the generated videos using our method are shown in the second row.

Figure 8. Video generation results on real motion blurred images from the dataset of [38]. The first row shows the blurred images; the second row contains the videos extracted with our method.
References

[1] A. Chakrabarti. A neural approach to blind motion deblurring, 2016. 1
[2] P. Chandramouli and A. Rajagopalan. Inferring image transformation and structure from motion-blurred images. In BMVC, pages 73–1, 2010. 1
[3] S. Cho and S. Lee. Fast motion deblurring. ACM Trans. Graph., 28(5):1–8, Dec. 2009. 1
[4] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. T. Freeman. Removing camera shake from a single photograph. ACM Trans. Graph., 25(3):787–794, Jul. 2006. 1
[5] J. Flynn, I. Neulander, J. Philbin, and N. Snavely. Deep stereo: Learning to predict new views from the world's imagery. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5515–5524, 2016. 1
[6] D. Gong, J. Yang, L. Liu, Y. Zhang, I. Reid, C. Shen, A. Van Den Hengel, and Q. Shi. From motion blur to motion flow: A deep learning solution for removing heterogeneous motion blur. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3806–3815, 2017. 1, 4, 6
[7] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks, 2018. 4
[8] M. Jin, G. Meishvili, and P. Favaro. Learning to extract a video sequence from a single motion-blurred image. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6334–6342, 2018. 2, 4, 5
[9] N. Joshi, R. Szeliski, and D. J. Kriegman. PSF estimation using sharp edge prediction. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2008. 1
[10] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, and Y. Bengio. The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1175–1183, 2017. 4
[11] R. Köhler, M. Hirsch, B. Mohler, B. Schölkopf, and S. Harmeling. Recording and playback of camera shake: Benchmarking blind deconvolution with a real-world database. In A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, and C. Schmid, editors, Computer Vision – ECCV 2012, pages 27–40, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg. 6
[12] D. Krishnan and R. Fergus. Fast image deconvolution using hyper-laplacian priors. NIPS’09, page 1033–1041, Red
Hook, NY, USA, 2009. Curran Associates Inc. 1
[13] D. Krishnan, T. Tay, and R. Fergus. Blind deconvolution
using a normalized sparsity measure. In CVPR 2011, pages
233–240, 2011. 1
[14] O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, and
J. Matas. Deblurgan: Blind motion deblurring using conditional adversarial networks, 2018. 1, 4, 5
[15] W.-S. Lai, J.-B. Huang, Z. Hu, N. Ahuja, and M.-H. Yang. A
comparative study for single image blind deblurring. In 2016
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1701–1709, 2016. 6
[16] Z. Liu, R. A. Yeh, X. Tang, Y. Liu, and A. Agarwala. Video
frame synthesis using deep voxel flow, 2017. 1
[17] M. Mohan, S. Girish, and A. Rajagopalan. Unconstrained
motion deblurring for dual-lens cameras. In Proceedings
of the IEEE/CVF International Conference on Computer Vision, pages 7870–7879, 2019. 1
[18] M. M. Mohan, G. Nithin, and A. Rajagopalan. Deep dynamic scene deblurring for unconstrained dual-lens cameras. IEEE Transactions on Image Processing, 30:4479–
4491, 2021. 1
[19] S. Nah, T. H. Kim, and K. M. Lee. Deep multi-scale convolutional neural network for dynamic scene deblurring. In 2017
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 257–265, 2017. 1, 4, 5
[20] T. Nimisha, A. Rajagopalan, and R. Aravind. Generating
high quality pan-shots from motion blurred videos. Computer Vision and Image Understanding, 171:20–33, 2018. 1
[21] T. M. Nimisha, A. K. Singh, and A. N. Rajagopalan. Blurinvariant deep learning for blind-deblurring. In 2017 IEEE
International Conference on Computer Vision (ICCV), pages
4762–4770, 2017. 1
[22] T. M. Nimisha, K. Sunil, and A. Rajagopalan. Unsupervised
class-specific deblurring. In Proceedings of the European
Conference on Computer Vision (ECCV), pages 353–369,
2018. 1
[23] J. Pan, Z. Hu, Z. Su, and M.-H. Yang. Deblurring text images via l0-regularized intensity and gradient prior. In 2014
IEEE Conference on Computer Vision and Pattern Recognition, pages 2901–2908, 2014. 1
[24] J. Pan, Z. Lin, Z. Su, and M.-H. Yang. Robust kernel estimation with outliers handling for image deblurring. In 2016
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2800–2808, 2016. 1
[25] J. Pan, D. Sun, H. Pfister, and M.-H. Yang. Deblurring
images via dark channel prior. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(10):2315–2328,
2018. 1
[26] C. Paramanand and A. Rajagopalan. Shape from sharp and
motion-blurred image pair. International journal of computer vision, 107(3):272–292, 2014. 1
[27] C. Paramanand and A. N. Rajagopalan. Depth from motion
and optical blur with an unscented kalman filter. IEEE Transactions on Image Processing, 21(5):2798–2811, 2011. 1
[28] C. Paramanand and A. N. Rajagopalan. Non-uniform motion
deblurring for bilayer scenes. In CVPR, pages 1115–1122,
2013. 1
[29] K. Purohit and A. Rajagopalan. Region-adaptive dense network for efficient motion deblurring. In Proceedings of the
AAAI Conference on Artificial Intelligence, volume 34, pages
11882–11889, 2020. 1
[30] K. Purohit, A. Shah, and A. Rajagopalan. Bringing alive
blurred moments. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages
6830–6839, 2019. 1
[31] K. Purohit, A. B. Shah, and A. Rajagopalan. Learning based
single image blur detection and segmentation. In 2018 25th
IEEE International Conference on Image Processing (ICIP),
pages 2202–2206. IEEE, 2018. 1
[32] M. P. Rao, A. Rajagopalan, and G. Seetharaman. Harnessing
motion blur to unveil splicing. IEEE transactions on information forensics and security, 9(4):583–595, 2014. 1
[33] M. P. Rao, A. Rajagopalan, and G. Seetharaman. Inferring
plane orientation from a single motion blurred image. In
ICPR, pages 2089–2094. IEEE, 2014. 1
[34] O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation, 2015.
3, 4
[35] C. J. Schuler, H. C. Burger, S. Harmeling, and B. Schölkopf.
A machine learning approach for non-blind image deconvolution. In 2013 IEEE Conference on Computer Vision and
Pattern Recognition, pages 1067–1074, 2013. 1
[36] C. J. Schuler, M. Hirsch, S. Harmeling, and B. Schölkopf.
Learning to deblur. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 38(7):1439–1451, 2016. 1
[37] Q. Shan, J. Jia, and A. Agarwala. High-quality motion
deblurring from a single image. ACM Trans. Graph.,
27(3):1–10, aug 2008. 1
[38] J. Shi, L. Xu, and J. Jia. Discriminative blur detection features. In 2014 IEEE Conference on Computer Vision and
Pattern Recognition, pages 2965–2972, 2014. 6
[39] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-k. Wong, and
W.-c. Woo. Convolutional lstm network: A machine learning
approach for precipitation nowcasting. In Proceedings of the
28th International Conference on Neural Information Processing Systems - Volume 1, NIPS’15, page 802–810, Cambridge, MA, USA, 2015. MIT Press. 1
[40] J. Sun, W. Cao, Z. Xu, and J. Ponce. Learning a convolutional neural network for non-uniform motion blur removal.
In 2015 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), pages 769–777, 2015. 1, 4
[41] X. Tao, H. Gao, X. Shen, J. Wang, and J. Jia. Scale-recurrent
network for deep image deblurring. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages
8174–8182, 2018. 1, 4, 5
[42] I. Vasiljevic, A. Chakrabarti, and G. Shakhnarovich. Examining the impact of blur on recognition by convolutional networks, 2017. 1
[43] S. Vasu, V. R. Maligireddy, and A. Rajagopalan. Non-blind
deblurring: Handling kernel uncertainty with cnns. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pages 3272–3281, 2018. 1
[44] S. Vasu and A. N. Rajagopalan. From local to global: Edge
profiles to camera motion in blurred images. In 2017 IEEE
Conference on Computer Vision and Pattern Recognition
(CVPR), pages 558–567, 2017. 1
[45] C. S. Vijay, C. Paramanand, A. N. Rajagopalan, and R. Chellappa. Non-uniform deblurring in hdr image reconstruction.
IEEE Transactions on Image Processing, 22(10):3739–3750,
2013. 1
[46] C. Vondrick, H. Pirsiavash, and A. Torralba. Generating
videos with scene dynamics, 2016. 1
[47] O. Whyte, J. Sivic, A. Zisserman, and J. Ponce. Non-uniform
deblurring for shaken images. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition,
pages 491–498, 2010. 4, 5
[48] L. Xu, S. Zheng, and J. Jia. Unnatural l0 sparse representation for natural image deblurring. In 2013 IEEE Conference
on Computer Vision and Pattern Recognition, pages 1107–
1114, 2013. 1
[49] L. Xu, S. Zheng, and J. Jia. Unnatural l0 sparse representation for natural image deblurring. In 2013 IEEE Conference
on Computer Vision and Pattern Recognition, pages 1107–
1114, 2013. 4
[50] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu. Residual dense network for image restoration. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 43(7):2480–
2495, 2021. 4
[51] T. Zhou, S. Tulsiani, W. Sun, J. Malik, and A. A. Efros. View
synthesis by appearance flow, 2017. 1