
Literature Review on Image Classification Architecture

Sowvik Sarker (id: 17-33228-1)a , Mainul Islam Mahi (id: 18-38468-2)a , MD.
Rezvi Khalid Hridoy (id: 18-38472-2)a , Abir Hassan (id: 18-39206-3)a
a
Department of Computer Sciences, American International University-Bangladesh

Abstract
Convolutional neural networks (CNNs) have been used to solve visual tasks
since the late 1980s. Despite a few dispersed applications, CNNs lay largely
unused until the mid-2000s, when advances in machine learning and the
availability of large amounts of labeled data, combined with better
algorithms, catapulted them to the leading edge of a neural network rebirth
that has seen quick advances since 2012. We also examine how networks can be
scaled up in ways that make the most of the additional computation, through
suitably factorized convolutions and strong regularization. This literature
review helps determine which image classification architecture to use in
different scenarios. We have studied five different image classification
architectures: U-Net, VGGNet-19, ResNet, DenseNet, and Inception V3, and we
also introduce some of their current trends and remaining challenges.
Keywords: Deep learning, Computer Vision, Object detection, NN, CNN

1. Introduction

In recent years, deep learning models have used multiple layers of nonlinear
processing to extract and transform features, as well as to analyze and
classify patterns. Image classification is a fundamental topic in computer
vision, described as the process of categorizing images into one of several
specified groups. It serves as the foundation for other computer vision
tasks such as localization, detection, and segmentation Karpathy et al.
[2016]. Since Krizhevsky's winning entry Krizhevsky et al. [2012] in the
2012 ImageNet competition Russakovsky et al. [2015], their network "AlexNet"
has been successfully applied to a wide range of computer vision tasks.
Since 2014, as a result of using deeper and larger networks, the quality of
network architectures has improved significantly Szegedy et al. [2016a].

Preprint submitted to Journal July 1, 2021


Olaf Ronneberger, Philipp Fischer, and Thomas Brox developed U-Net in the
paper Ronneberger et al. [2015] published in 2015. It is an updated version
of the fully convolutional network of Long et al. [2015]. The VGG network
was invented by Simonyan and Zisserman of the Visual Geometry Group at the
University of Oxford in 2014. This family of architectures achieved 2nd
place in the 2014 ImageNet classification competition. Notably, this model
is a simple linear chain of layers Simonyan and Zisserman [2015]. A
Microsoft research group introduced ResNet, a deep convolutional neural
network, in 2015, and it took first place in the ILSVRC 2015 classification
competition. ResNet is similar to VGGNet in its design Simonyan and
Zisserman [2014], although ResNet is around eight times deeper Too et al.
[2019]. The IEEE Conference on Computer Vision and Pattern Recognition
(CVPR) 2017 awarded Huang et al. [2017a] the Best Paper Award. DenseNet was
created specifically to address the vanishing gradient's effect on the
accuracy of high-level neural networks. Put simply, the information
evaporates before it reaches its destination because of the long journey
between the input and output layers. Inception V3 is a convolutional neural
network for assisting in image analysis. It started as a module for
GoogLeNet; the architecture has three editions, and Inception V3 is the
third. Trained on the ImageNet database of classified visual objects, it
helps to classify objects Simonyan and Zisserman [2015].

Modern GPU computing and parallel computing technologies have dramatically
boosted the ability to train CNN models, leading to their development in
academia and industry. Steinkraus et al. [2005] demonstrated the value of
utilizing GPUs for machine learning. With extremely large datasets like
ImageNet, we can now train networks with millions, or even billions, of
parameters reasonably quickly. Intraclass variation, occlusion, deformation,
and size variation are all examples of frequent computer vision problems.
Methods that work well for image classification are likely to work well for
other important computer vision tasks such as detection, localization, and
segmentation. In this paper we focus on the general principles, optimization
ideas, applications, and limitations of CNN image classification
architectures.

2. Literature review

2.1. U-Net
Ronneberger et al. [2015] proposed the U-Net model, so named because of its
U-shaped architecture. The left side of this model is called the contracting
path, and the right side is called the expansive path. Four concatenations
occur between the expansive path and its corresponding contracting path in
the network. The contracting path starts with a one-channel input image of
(572*572) pixels. The network then applies an unpadded convolution with a
kernel size of (3*3) two times in a row. As the convolution is unpadded, the
feature map shrinks to (570*570) pixels in the first step and then (568*568)
in the second. For these two convolutions, the channel number is set to 64.
The next step starts with down-sampling via (2*2) max-pooling, which halves
the feature map to (284*284) pixels, and again the unpadded convolution is
applied twice as in the previous step, but this time the channel number is
increased from 64 to 128. This continues three more times and finishes with
a (28*28) feature map of 1024 channels, which is the end of the contracting
path and the start of the expansive path. From this step, instead of max
pooling, up-convolution takes place, which is the opposite of max pooling.
The (2*2) up-convolution doubles the feature map from (28*28) to (56*56)
pixels while halving the channels to 512. Concatenation then joins the
corresponding contracting-path feature map with the expansive path, so the
channel count becomes 1024 (512+512). Along the expansive path, unpadded
convolution happens two more times with a (3*3) kernel, reducing the feature
map to (52*52) pixels and the channel number to 512. Next, up-convolution
and concatenation happen again, and the previous step repeats three times.
The network completes its expansive path with a 64-channel image of
(388*388) pixels. Finally, a (1*1) convolution reduces the channels from 64
to 2, so the model ultimately outputs a two-channel image of (388*388)
pixels.
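To make these mechanics concrete, the following is a minimal PyTorch-style
sketch of a shallow U-Net of our own (not code from Ronneberger et al.
[2015]; the original uses four pooling steps rather than the two shown here,
but the unpadded convolutions, channel doubling, and crop-and-concatenate
skip connections are the same):

import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two unpadded 3x3 convolutions; each trims 2 pixels from every side,
    # e.g. 572 -> 570 -> 568 for the first block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3), nn.ReLU(inplace=True),
    )

def center_crop(feat, target):
    # Crop a contracting-path map to the (smaller) expansive-path size.
    _, _, h, w = target.shape
    _, _, H, W = feat.shape
    top, left = (H - h) // 2, (W - w) // 2
    return feat[:, :, top:top + h, left:left + w]

class MiniUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.pool = nn.MaxPool2d(2)                  # 2x2 max-pooling
        self.down1 = double_conv(1, 64)              # 64 channels
        self.down2 = double_conv(64, 128)            # channels double
        self.bottom = double_conv(128, 256)
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)  # 2x2 up-conv
        self.dec2 = double_conv(256, 128)            # 128 skip + 128 up
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = double_conv(128, 64)
        self.head = nn.Conv2d(64, 2, kernel_size=1)  # 1x1 conv to 2 channels

    def forward(self, x):
        c1 = self.down1(x)                           # contracting path
        c2 = self.down2(self.pool(c1))
        b = self.bottom(self.pool(c2))
        u2 = self.up2(b)                             # expansive path
        d2 = self.dec2(torch.cat([center_crop(c2, u2), u2], dim=1))
        u1 = self.up1(d2)
        d1 = self.dec1(torch.cat([center_crop(c1, u1), u1], dim=1))
        return self.head(d1)                         # two-channel output

Because the convolutions are unpadded, the output map is smaller than the
input, just as in the (572*572) to (388*388) progression described above.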

2.2. VGGNet-19
The input to the VGG-based ConvNet is a (224*224) RGB image. A preprocessing
layer takes the RGB image, with pixel values in the range 0-255, and
subtracts the mean image value calculated over the entire ImageNet training
set. After preprocessing, the input images are passed through a stack of
convolutional layers. VGG-19 has 19 weight layers, consisting of 16
convolutional layers and 3 fully connected layers, along with 5 pooling
layers. In VGG-19 there are 2 fully connected layers with 4096 channels
each, followed by another fully connected layer with 1000 channels to
predict 1000 labels. The last fully connected layer uses a SoftMax layer for
classification purposes Simonyan and Zisserman [2015].
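The preprocessing step and the fully connected head described above can be
sketched as follows (our own illustrative PyTorch code; the per-channel
values are the commonly quoted ImageNet RGB means, and the (512, 7, 7) input
is the shape of VGG-19's final convolutional feature map for a (224*224)
input):

import torch
import torch.nn as nn

# Mean RGB values computed over the ImageNet training set, subtracted from
# each pixel before the weight layers.
IMAGENET_MEAN = torch.tensor([123.68, 116.779, 103.939]).view(1, 3, 1, 1)

def preprocess(batch):
    # batch: (N, 3, 224, 224) RGB images with pixel values in [0, 255].
    return batch - IMAGENET_MEAN

# The fully connected head: two 4096-channel layers, then 1000 channels for
# the 1000 ImageNet labels. (The 16 convolutional layers that produce the
# (512, 7, 7) feature map are omitted here for brevity.)
head = nn.Sequential(
    nn.Flatten(),                  # (N, 512, 7, 7) -> (N, 25088)
    nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(),
    nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(),
    nn.Linear(4096, 1000),         # one logit per label
)

logits = head(torch.randn(1, 512, 7, 7))
probs = torch.softmax(logits, dim=1)  # SoftMax for classification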

2.3. ResNet
The layers of this network are made up of many residual blocks, and the
operating principle is to optimize a residual function He et al. [2016].
This unique architecture allows for greater accuracy as layer depth
increases. The authors proposed residual mapping to accommodate the added
layers. If we designate the underlying mapping by H(x), then F(x) := H(x) -
x defines the residual mapping. The residual block is defined by y = F(x,
{Wi}) + x when the input x and the output y = H(x) have the same dimension,
and by y = F(x, {Wi}) + Ws x when a projection Ws is needed to match
dimensions. All convolutional layers in ResNet models use the same
convolutional window of size (3*3), and the number of filters rises with
network depth, from 64 to 512 (for ResNet-18 and ResNet-34) and from 64 to
2048 (for ResNet-50, ResNet-101, and ResNet-152). Only one max-pooling layer
with pooling size (3*3) and a stride of 2 is used in all models, applied
after the first layer. As a result, the reduction of input resolution
throughout the training phase is severely constrained. An average pooling
layer replaces fully connected layers at the end of all models. This
alternative has a few advantages. Firstly, there are no parameters to
optimize in this layer, so it helps reduce model complexity. Secondly, this
layer is more natural in enforcing correspondences between feature maps and
categories. The number of neurons in the output layer corresponds to the
number of categories in the ImageNet dataset, which is 1000. In addition, a
SoftMax activation function is used in this layer to calculate the
likelihood that the input belongs to each class.
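A short sketch of the residual block (our own illustrative PyTorch code)
shows how the equation above is realized; the projection Ws is implemented
as a (1*1) convolution and applied only when the input and output dimensions
differ:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # F(x, {Wi}): two 3x3 convolutions, the window size used
        # throughout ResNet.
        self.f = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1,
                      bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # Ws: identity when x and y share a dimension, otherwise a 1x1
        # convolution that projects x to the output shape.
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Conv2d(in_ch, out_ch, 1, stride=stride,
                                      bias=False)
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        # y = F(x, {Wi}) + Ws x
        return torch.relu(self.f(x) + self.shortcut(x))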

2.4. DenseNet
The vanishing gradient problem describes how gradients are not sufficiently
back-propagated to the network's earliest layers as networks grow deeper. As
one moves backward through the network, the gradients become smaller, and
the earliest layers lose their ability to learn basic low-level features.
Regardless of how networks are built, they all attempt to create channels
for information to travel between the initial and final layers. DenseNet
establishes direct paths between the network's layers. Each layer in a dense
block gets feature maps from all preceding layers and transfers its output
to all following layers, owing to the network's feed-forward structure.
Concatenation is used to join feature maps from different layers (unlike
ResNet, which sums them). Each dense layer consists of two convolutional
operations: a (1*1) CONV (a bottleneck that reduces feature depth/channel
count) and a (3*3) CONV (the standard conv operation for extracting
features). The growth rate (K=32 is used) is the number of channels output
by a dense layer (the 1*1 conv followed by the 3*3 conv). This means that a
dense layer (l) will get 32 features from its preceding dense layer (l-1).
Because 32 channels of features are concatenated and provided as input to
the following layer after each layer, this is referred to as the growth
rate. With the same number of parameters, the DenseNet model has a
considerably smaller validation error than the ResNet model. These tests
were carried out on both models with hyper-parameters that were more
suitable for ResNet. After rigorous hyper-parameter searches, the authors
claim that DenseNet would perform even better Huang et al. [2017b].
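A single dense layer can be sketched as follows (our own illustrative
PyTorch code; the (1*1) bottleneck width of 4K follows Huang et al.
[2017b]):

import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    def __init__(self, in_ch, growth_rate=32):
        super().__init__()
        # 1x1 bottleneck: reduces the (growing) input channel count to a
        # fixed width of 4*K before the spatial convolution.
        self.bottleneck = nn.Conv2d(in_ch, 4 * growth_rate, kernel_size=1)
        # 3x3 convolution: extracts features, emitting K new channels.
        self.conv = nn.Conv2d(4 * growth_rate, growth_rate,
                              kernel_size=3, padding=1)

    def forward(self, x):
        out = self.conv(torch.relu(self.bottleneck(torch.relu(x))))
        # Concatenate the K new channels onto all preceding feature maps.
        return torch.cat([x, out], dim=1)

# The channel count grows by K=32 per layer: 64 -> 96 -> 128 -> ...
x = torch.randn(1, 64, 56, 56)
print(DenseLayer(64)(x).shape)  # torch.Size([1, 96, 56, 56])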

2.5. Inception V3
The Inception V3 model was introduced to explore the Inception architecture
further. Inception V3 is a convolutional neural network architecture from
the Inception family that makes several improvements, including using label
smoothing, factorized (7*7) convolutions, and an auxiliary classifier to
propagate label information lower down the network. Convolutions with larger
spatial filters, such as (5*5) or (7*7), tend to be disproportionately
expensive in terms of computation. For example, a (5*5) convolution with n
filters over a grid with m filters is 25/9 = 2.78 times more computationally
expensive than a (3*3) convolution with the same number of filters. Of
course, a (5*5) filter can capture dependencies between signals and
activations of units in earlier layers, so reducing the geometric size of
the filters comes at a significant cost in expressiveness. However, we can
ask whether a (5*5) convolution could be replaced by a multi-layer network
with fewer parameters but the same input size and output depth. If we zoom
into the computation graph of the (5*5) convolution, we see that each output
looks like a small fully-connected network sliding over (5*5) tiles of its
input. Since we are constructing a vision network, it seems natural to
exploit translation invariance again and replace the fully connected
component with a two-layer convolutional architecture: the first layer is a
(3*3) convolution, and the second is a fully connected layer on top of the
(3*3) output grid of the first layer. Sliding this small network over the
input activation grid boils down to replacing the (5*5) convolution with two
layers of (3*3) convolutions Simonyan and Zisserman [2015].
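The 25/9 cost ratio quoted above can be verified directly by counting
parameters (our own illustrative PyTorch code; the channel count of 64 is an
arbitrary choice):

import torch
import torch.nn as nn

c = 64  # input and output channels (arbitrary)

five = nn.Conv2d(c, c, kernel_size=5, padding=2, bias=False)
stacked = nn.Sequential(  # same receptive field, same output depth
    nn.Conv2d(c, c, kernel_size=3, padding=1, bias=False),
    nn.ReLU(inplace=True),
    nn.Conv2d(c, c, kernel_size=3, padding=1, bias=False),
)

n5 = sum(p.numel() for p in five.parameters())     # c * c * 25
n3 = sum(p.numel() for p in stacked.parameters())  # 2 * c * c * 9
print(n5 / (c * c * 9))  # 25/9 = 2.78: a 5x5 layer vs a single 3x3 layer
print(n3 / n5)           # 18/25 = 0.72: the two-layer replacement is cheaper

x = torch.randn(1, c, 35, 35)
assert five(x).shape == stacked(x).shape  # identical output shape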

3. Discussion

Deep convolutional neural networks have succeeded at object recognition,
detection, and localization, as well as a variety of other computer vision
tasks. Despite all of the advancements demonstrated by the many proposed
designs, there has been little insight or logical reasoning about how they
attained state-of-the-art records, leaving additional improvements to
trial-and-error tactics. For example, one of our benchmark findings was that
the ResNet-152 architecture, which has 152 layers of depth, outperformed the
VGG-19 architecture Sachin [2016], which has only 19 layers. DenseNet
designs have been found to be the best in terms of parameter space
utilization, with up to 4x less parameter space when compared to the AlexNet
model and 10x less when compared to VGG-19 Muhammed et al. [2017]. U-Net has
a wide range of applications in biomedical image segmentation, including
brain and liver image segmentation. A (224*224) RGB image is used as the
input to the VGG-based ConvNet. A preprocessing layer takes the RGB image,
with pixel values in the range 0-255, and subtracts the mean image value
calculated over the entire ImageNet training set. After preprocessing, the
input images are passed through the weight layers. There are various uses of
VGGNet-19 in the field of medical science. The layers of ResNet are made up
of many residual blocks, and the operating principle is to optimize a
residual function. This unique architecture allows for greater accuracy as
layer depth increases. ResNet has vast usage in the field of agricultural
science. DenseNet establishes direct paths between the network's layers.
Each layer in a dense block gets feature maps from all preceding layers and
transfers its output to all following layers, owing to the network's
feed-forward structure Szegedy et al. [2016b]. Concatenation is used to join
feature maps from different layers (unlike ResNet, which sums them). From
the idea of ResNet, dense connections have inspired optimizations in many
other deep learning areas such as picture super-resolution, image
segmentation, medical diagnosis, and so on. The Inception V3 model was
introduced to explore the Inception architecture. Inception V3 is a
convolutional neural network architecture from the Inception family that
uses label smoothing, factorized (7*7) convolutions, and an auxiliary
classifier to transport label information down the network. Inception V3
enables health experts to take sample tests to determine systemic diseases
from patients; research has studied the analysis of systemic diseases
through digital image processing methods based on color analysis of nails.

Architecture    ImageNet Top-1 Error
U-Net           22.50%
VGGNet-19       27.30%
ResNet          21.66%
DenseNet        25.53%
Inception V3    21.90%

Table 1: Comparison of accuracy for different architectures on ImageNet

3.1. Applications


Using live images collected at various stages of the dragon fruit, a
ResNet-152 deep CNN-based model was constructed to determine the mellowness
of the dragon fruit Vijayakumar and Vinothkanna [2020]. A mobile application
was developed to detect early banana diseases, comparing ResNet-152 (99.20%
accuracy) and Inception V3 (95.41% accuracy) Sanga et al. [2020]. A medical
monitoring system was presented to minimize information loss in the
traditional pooling layer and layer-by-layer dimension reduction Duan et al.
[2021]. U-Net has a wide range of applications in biomedical image
segmentation, including brain and liver image segmentation. Variations of
the U-Net have also been used to reconstruct medical images Andersson et al.
[2019]. Sparse annotation-based dense volumetric segmentation was achieved
using a 3D U-Net Yao et al. [2018]. TernausNet combines U-Net with a VGG11
encoder for image segmentation Iglovikov and Shvets [2018]. Image-to-image
translation is used to estimate fluorescent stains Kandel et al. [2020].
VGG-19 can be used to detect melanoma thickness in skin cancer patients
Jaworek-Korjakowska et al. [2019]. It can also be used to classify Wilson
disease tissue using brain MRI Saba et al. [2020]. An improved version of
VGG-19 can be used to determine whether someone is wearing a mask Xiao et
al. [2020]. From the idea of ResNet, dense connections have inspired
optimizations in many other deep learning areas such as picture
super-resolution Rafi et al. [2019], image segmentation Zeng et al. [2019],
medical diagnosis Aldoj et al. [2020], Stawiaski [2018], and so on. This
work exhibits the architecture's proficiency in image categorization Huang
et al. [2020]. Inception V3 enables health experts to take sample tests to
determine systemic diseases from patients; research has studied the analysis
of systemic diseases through digital image processing methods based on color
analysis of nails. It enables the detection of diseases without painful
sampling. Terry's nail is one of the nail disorders that can indicate
systemic diseases such as liver cirrhosis Jaworek-Korjakowska et al. [2019].

3.2. Limitations


In the U-Net architecture, training takes a long time because there are so
many layers. For larger images, the GPU memory footprint is relatively
substantial. The diversity of features is lost due to the fixed receptive
field of the convolution kernel Su et al. [2021]. In polyp segmentation,
U-Net segments the polyp with low performance and a poor segmentation shape
Khanh et al. [2020]. VGG-19 has a very large number of weight parameters
Simonyan and Zisserman [2015]. The models are very heavy and take a lot of
space, which increases inference time Cheng and Zhou [2020], and training
takes a long time Saba et al. [2020]. The vanishing gradient is a big issue
Xiao et al. [2020]. Deeper networks mean higher test errors, which makes
VGG-19 vulnerable Jaworek-Korjakowska et al. [2019]. The dense shortcut
avoids the downside of demanding additional GPU resources while avoiding the
problem of declining representational capacity in ResNet Zhang et al.
[2021]. Follow-up research has looked into the identity shortcut's flaws.
The identity shortcut bypasses the residual blocks in order to retain
features, which may limit the network's representational power Zagoruyko and
Komodakis [2016]. The disadvantage of the identity shortcut is that it
produces the collapsing domain problem, which decreases the network's
learning capacity Zhang et al. [2020]. Excessive connections not only reduce
the computing and parameter efficiency of networks but also make them more
prone to overfitting in ResNet Liu and Zeng [2018]. Inception V3 is
generally designed by an iterative trial-and-error process, which requires a
large amount of labeled data during the training phase; in this model, a
huge number of neuron connections also brings heavy computational expense
Saba et al. [2020].

4. Conclusion

One of the most well-known tasks in computer vision is image classification:
given an image, classify it into one of several predefined categories. Image
classification is a classic problem because of its wide range of
applications. In the future, image classification systems may become an
important component of accessibility software, assisting people with vision
impairments in making sense of their environment. A literature review of
image classification architectures is presented in this paper. It traces
their growth and contribution to the deep learning renaissance during the
last few years. It focuses on this progress in particular by debating and
examining the designs, supervisory components, regularization processes,
optimization strategies, and computation. This paper does not categorize the
architectures in terms of popularity, performance in terms of GPU/CPU, or
experimental results with comparisons. These are the limitations of this
paper and could be addressed in a future update.

References
Karpathy, A., et al. Cs231n convolutional neural networks for visual recog-
nition. Neural networks 2016;1(1).

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al.
Imagenet large scale visual recognition challenge. International journal of
computer vision 2015;115(3):211–252.

Krizhevsky, A., Sutskever, I., Hinton, G.E.. Imagenet classification with
deep convolutional neural networks. Advances in neural information
processing systems 2012;25:1097–1105.

Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.. Rethinking
the inception architecture for computer vision. In: Proceedings of the
IEEE conference on computer vision and pattern recognition. 2016a, p.
2818–2826.

Ronneberger, O., Fischer, P., Brox, T.. U-net: Convolutional networks for
biomedical image segmentation. In: International Conference on Medical
image computing and computer-assisted intervention. Springer; 2015, p.
234–241.

Long, J., Shelhamer, E., Darrell, T.. Fully convolutional networks for se-
mantic segmentation. In: Proceedings of the IEEE conference on computer
vision and pattern recognition. 2015, p. 3431–3440.

Simonyan, K., Zisserman, A.. Very deep convolutional networks for large-
scale image recognition. 2015. arXiv:1409.1556.

Simonyan, K., Zisserman, A.. Very deep convolutional networks for large-
scale image recognition. arXiv preprint arXiv:1409.1556 2014;.

Too, E.C., Yujian, L., Njuki, S., Yingchun, L.. A comparative study of fine-
tuning deep learning models for plant disease identification. Computers
and Electronics in Agriculture 2019;161:272–279.

Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.. Densely
connected convolutional networks. In: Proceedings of the IEEE conference
on computer vision and pattern recognition. 2017a, p. 4700–4708.

Steinkraus, D., Buck, I., Simard, P.. Using gpus for machine learning
algorithms. In: Eighth International Conference on Document Analysis
and Recognition (ICDAR’05). IEEE; 2005, p. 1115–1120.

He, K., Zhang, X., Ren, S., Sun, J.. Deep residual learning for image
recognition. In: Proceedings of the IEEE conference on computer vision
and pattern recognition. 2016, p. 770–778.

Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.. Densely
connected convolutional networks. In: Proceedings of the IEEE conference
on computer vision and pattern recognition. 2017b, p. 4700–4708.

Sachin, P.. Convolutional neural networks for image classification and cap-
tioning. 2016.

Muhammed, M.A.E., Ahmed, A.A., Khalid, T.A.. Benchmark analysis of
popular imagenet classification deep cnn architectures. In: 2017 Interna-
tional Conference On Smart Technologies For Smart Nation (SmartTech-
Con). IEEE; 2017, p. 902–907.

Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.. Rethinking
the inception architecture for computer vision. In: Proceedings of the
IEEE conference on computer vision and pattern recognition. 2016b, p.
2818–2826.

Vijayakumar, T., Vinothkanna, M.R.. Mellowness detection of dragon
fruit using deep learning strategy. Journal of Innovative Image Processing
(JIIP) 2020;2(01):35–43.

Sanga, S., Mero, V., Machuve, D., Mwanganda, D.. Mobile-based
deep learning models for banana diseases detection. arXiv preprint
arXiv:2004.03718 2020;.

Duan, J., Shi, T., Zhou, H., Xuan, J., Wang, S.. A novel resnet-based
model structure and its applications in machine health monitoring. Journal
of Vibration and Control 2021;27(9-10):1036–1050.

Andersson, J., Ahlström, H., Kullberg, J.. Separation of water and fat sig-
nal in whole-body gradient echo scans using convolutional neural networks.
Magnetic resonance in medicine 2019;82(3):1177–1186.

Yao, W., Zeng, Z., Lian, C., Tang, H.. Pixel-wise regression using u-net
and its application on pansharpening. Neurocomputing 2018;312:364–371.

Iglovikov, V., Shvets, A.. Ternausnet: U-net with vgg11 encoder pre-trained
on imagenet for image segmentation. arXiv preprint arXiv:1801.05746
2018;.

Kandel, M.E., He, Y.R., Lee, Y.J., Chen, T.H.Y., Sullivan, K.M., Aydin,
O., et al. Phase imaging with computational specificity (pics) for measuring
dry mass changes in sub-cellular compartments. Nature communications
2020;11(1):1–10.

Jaworek-Korjakowska, J., Kleczek, P., Gorgon, M.. Melanoma thickness
prediction based on convolutional neural network with vgg-19 model trans-
fer learning. In: Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition Workshops. 2019, p. 0–0.

Saba, L., Agarwal, M., Sanagala, S.S., Gupta, S.K., Sinha, G., Johri,
A., et al. Brain mri-based wilson disease tissue classification: an optimised
deep transfer learning approach. Electronics Letters 2020;56(25):1395–
1398.

Xiao, J., Wang, J., Cao, S., Li, B.. Application of a novel and improved
vgg-19 network in the detection of workers wearing masks. In: Journal of
Physics: Conference Series; vol. 1518. IOP Publishing; 2020, p. 012041.

Rafi, A.M., Kamal, U., Hoque, R., Abrar, A., Das, S., Laganière, R.,
et al. Application of densenet in camera model identification and post-
processing detection. In: CVPR workshops. 2019, p. 19–28.

Zeng, X., Feng, G., Zhang, X.. Detection of double jpeg compres-
sion using modified densenet model. Multimedia Tools and Applications
2019;78(7):8183–8196.

Aldoj, N., Biavati, F., Michallek, F., Stober, S., Dewey, M.. Automatic
prostate and prostate zones segmentation of magnetic resonance images
using densenet-like u-net. Scientific reports 2020;10(1):1–17.

Stawiaski, J.. A pretrained densenet encoder for brain tumor segmentation.
In: International MICCAI Brainlesion Workshop. Springer; 2018, p. 105–
115.

Huang, S., Lee, F., Miao, R., Si, Q., Lu, C., Chen, Q.. A deep con-
volutional neural network architecture for interstitial lung disease pattern
classification. Medical & biological engineering & computing 2020;:1–13.

Su, R., Zhang, D., Liu, J., Cheng, C.. Msu-net: Multi-scale u-net for 2d
medical image segmentation. Frontiers in Genetics 2021;12:140.

Khanh, T.L.B., Dao, D.P., Ho, N.H., Yang, H.J., Baek, E.T., Lee, G.,
et al. Enhancing u-net with spatial-channel attention gate for abnormal tis-
sue segmentation in medical imaging. Applied Sciences 2020;10(17):5729.

Cheng, S., Zhou, G.. Facial expression recognition method based on im-
proved vgg convolutional neural network. International Journal of Pattern
Recognition and Artificial Intelligence 2020;34(07):2056003.

Zhang, C., Benz, P., Argaw, D.M., Lee, S., Kim, J., Rameau, F., et al.
Resnet or densenet? introducing dense shortcuts to resnet. In: Proceedings
of the IEEE/CVF Winter Conference on Applications of Computer Vision.
2021, p. 3550–3559.

Zagoruyko, S., Komodakis, N.. Wide residual networks. arXiv preprint
arXiv:1605.07146 2016;.

Zhang, C., Rameau, F., Kim, J., Argaw, D.M., Bazin, J.C., Kweon,
I.S.. Deepptz: Deep self-calibration for ptz cameras. In: Proceedings of
the IEEE/CVF Winter Conference on Applications of Computer Vision.
2020, p. 1041–1049.

Liu, W., Zeng, K.. Sparsenet: A sparse densenet for image classification.
arXiv preprint arXiv:1804.05340 2018;.

Name & ID                              Contribution
Sowvik Sarker, 17-33228-1              VGGNet-19, Discussion
Mainul Islam Mahi, 18-38468-2          Introduction, U-Net, ResNet
MD. Rezvi Khalid Hridoy, 18-38472-2    DenseNet, Abstract
Abir Hassan, 18-39206-3                Inception V3, Conclusion
