01 - Semantic Segmentation
Maziar Raissi
Assistant Professor
[email protected]
Fully Convolutional Networks for Semantic Segmentation
YouTube Playlist
Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE conference on computer
vision and pattern recognition. 2015.
Learning Deconvolution Network
for Semantic Segmentation YouTube Video
Instance-wise prediction!
$g_i \in \mathbb{R}^{W \times H \times C}$ → output score maps of the $i$-th proposal
$G_i$ → zero padded outside $g_i$
Batch Normalization
Pixel-wise class score map (before softmax)
Two-stage training: 1) ground-truth bounding boxes and 2) object proposals (≥ 0.5 in IoU)
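The instance-wise prediction step above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names, the `(top, left)` box convention, and the `(H, W, C)` array layout are hypothetical, and combining the zero-padded maps $G_i$ with a pixel-wise maximum is one aggregation choice assumed here.

```python
import numpy as np

def paste_proposal(g, box, image_shape):
    """Zero-pad proposal score maps g (h, w, C) into a full-image array G (H, W, C).

    box = (top, left) is a hypothetical convention for where the proposal sits.
    """
    top, left = box
    h, w, C = g.shape
    G = np.zeros((image_shape[0], image_shape[1], C), dtype=g.dtype)
    G[top:top + h, left:left + w, :] = g
    return G

def aggregate(proposal_maps, boxes, image_shape):
    """Combine the zero-padded maps G_i with a pixel-wise maximum (assumed choice)."""
    padded = [paste_proposal(g, b, image_shape) for g, b in zip(proposal_maps, boxes)]
    return np.max(np.stack(padded, axis=0), axis=0)

def softmax(scores, axis=-1):
    """Numerically stable softmax over the class axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

# Two toy proposals with C = 3 classes on a 4 x 4 image.
rng = np.random.default_rng(0)
g1 = rng.random((2, 2, 3))
g2 = rng.random((2, 3, 3))
score_map = aggregate([g1, g2], boxes=[(0, 0), (2, 1)], image_shape=(4, 4))
probs = softmax(score_map)  # class probabilities per pixel
```

The softmax is applied only after aggregation, which matches the note that the pixel-wise class score map is taken before softmax.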
Noh, Hyeonwoo, Seunghoon Hong, and Bohyung Han. "Learning deconvolution network for semantic segmentation." Proceedings of the IEEE international
conference on computer vision. 2015.
U-Net: Convolutional Networks for
Biomedical Image Segmentation YouTube Playlist
$$L = \sum_{(x,y) \in \Omega} w(x, y) \log p_{\ell(x,y)}(x, y)$$
$\ell : \Omega \to \{1, \ldots, K\}$, $(x, y) \mapsto \ell(x, y)$ → true label of each pixel
$$p_k(x, y) = \exp(a_k(x, y)) \Big/ \sum_{k'=1}^{K} \exp(a_{k'}(x, y))$$
$$w(x, y) = w_c(x, y) + w_0 \exp\left(-\frac{(d_1(x, y) + d_2(x, y))^2}{2\sigma^2}\right)$$
$w_c(x, y)$ → weight map to balance class frequencies
$w_0 = 10$ and $\sigma \approx 5$ pixels
$d_1(x, y)$ → distance to the border of the nearest cell
$d_2(x, y)$ → distance to the border of the second nearest cell
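A small NumPy sketch of the weight map and the weighted pixel-wise softmax objective defined above. Several things here are assumptions for illustration: the function names, the `(H, W, K)` array layout, labels indexed from 0 rather than 1, and the distance maps `d1`, `d2` being precomputed elsewhere. The objective is returned with a leading minus sign so that lower values are better, which is a sign convention chosen for this sketch.

```python
import numpy as np

def unet_weight_map(w_c, d1, d2, w0=10.0, sigma=5.0):
    """w(x, y) = w_c(x, y) + w0 * exp(-(d1 + d2)^2 / (2 sigma^2))."""
    return w_c + w0 * np.exp(-((d1 + d2) ** 2) / (2.0 * sigma ** 2))

def pixelwise_softmax(a):
    """a: activations of shape (H, W, K); returns p_k(x, y) with the same shape."""
    shifted = a - a.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def weighted_cross_entropy(a, labels, w):
    """-sum over pixels of w(x, y) * log p_{l(x, y)}(x, y).

    labels: integer array (H, W) with values in {0, ..., K-1} (0-based here).
    """
    p = pixelwise_softmax(a)
    rows, cols = np.indices(labels.shape)
    p_true = p[rows, cols, labels]  # probability assigned to the true class
    return -np.sum(w * np.log(p_true))

# Toy example: pixels on a thin border (d1 + d2 = 0) get the largest weight.
w_c = np.ones((2, 2))
d1 = np.array([[0.0, 20.0], [20.0, 0.0]])
d2 = np.array([[0.0, 20.0], [20.0, 0.0]])
w = unet_weight_map(w_c, d1, d2)
loss = weighted_cross_entropy(np.zeros((2, 2, 2)), np.array([[0, 1], [1, 0]]), w)
```

With $w_0 = 10$ and $\sigma \approx 5$, a pixel squeezed between two touching cells gets weight close to $w_c + 10$, while a pixel far from any second cell keeps roughly $w_c$.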
Essential data augmentation: shift and rotation invariances as well as robustness to deformations and gray value variations
$3 \times 3$ Conv
$$b_{x,y,\ell} = \mathrm{ReLU}\left(\sum_{i \in \{-1,0,1\}} \sum_{j \in \{-1,0,1\}} \sum_{k \in \{1,\ldots,K\}} w_{i,j,k,\ell}\, a_{x+i,y+j,k} + c_\ell\right)$$
$2 \times 2$ max pooling
$$b_{x,y,k} = \max_{i,j \in \{0,1\}} a_{2x+i,2y+j,k}$$
→ stride = 2
$2 \times 2$ up-conv
$$b_{2x+i,2y+j,\ell} = \mathrm{ReLU}\left(\sum_{k \in \{1,\ldots,K\}} w_{i,j,k,\ell}\, a_{x,y,k} + c_\ell\right) \quad \text{for } i, j \in \{0, 1\}$$
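The three layer types above translate almost index-for-index into NumPy. The sketch below is illustrative only: function names and the `(H, W, K)` layout are assumptions, the convolution is unpadded so the output loses one pixel on each side, and the filter offsets $i, j \in \{-1, 0, 1\}$ are stored at array positions `i + 1`, `j + 1`.

```python
import numpy as np

def conv3x3_relu(a, w, c):
    """Unpadded 3x3 convolution followed by ReLU.

    a: (H, W, K) input, w: (3, 3, K, L) filters, c: (L,) biases.
    Output (H-2, W-2, L); output index (x-1, y-1) corresponds to input centre (x, y).
    """
    H, W, _ = a.shape
    L = w.shape[-1]
    b = np.zeros((H - 2, W - 2, L))
    for x in range(1, H - 1):
        for y in range(1, W - 1):
            patch = a[x - 1:x + 2, y - 1:y + 2, :]          # a[x+i, y+j, k]
            b[x - 1, y - 1] = np.tensordot(patch, w, axes=3) + c
    return np.maximum(b, 0.0)

def maxpool2x2(a):
    """b[x, y, k] = max over i, j in {0, 1} of a[2x+i, 2y+j, k] (stride 2)."""
    H, W, K = a.shape
    a = a[:H - H % 2, :W - W % 2]                           # drop an odd last row/column
    return a.reshape(H // 2, 2, W // 2, 2, K).max(axis=(1, 3))

def upconv2x2_relu(a, w, c):
    """b[2x+i, 2y+j, l] = ReLU(sum_k w[i, j, k, l] * a[x, y, k] + c[l])."""
    H, W, _ = a.shape
    L = w.shape[-1]
    out = np.einsum('xyk,ijkl->xiyjl', a, w) + c            # axes (x, i, y, j, l)
    return np.maximum(out.reshape(2 * H, 2 * W, L), 0.0)

a = np.arange(16.0).reshape(4, 4, 1)
pooled = maxpool2x2(a)                                       # shape (2, 2, 1)
upsampled = upconv2x2_relu(pooled, np.ones((2, 2, 1, 1)), np.zeros(1))  # shape (4, 4, 1)
```

Pooling halves the resolution and the up-convolution doubles it again, which is why the two appear as mirror images on the contracting and expanding paths.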
Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." International Conference on Medical
image computing and computer-assisted intervention. Springer, Cham, 2015.
DeepLab: Semantic Image Segmentation with Deep Convolutional
Nets, Atrous Convolution, and Fully Connected CRFs YouTube Playlist
Three challenges in the application of DCNNs to semantic image segmentation: (1) reduced feature resolution, (2) existence of objects at multiple scales, and (3) reduced localization accuracy due to DCNN invariance.
Atrous Spatial Pyramid Pooling (ASPP)
Cityscapes
Atrous Convolution
Reduce the degree of signal downsampling due to
max-pooling and striding (from 32x down to 8x).
[Figure panels: PASCAL VOC, VGG-16; PASCAL VOC, ResNet-101]
$$E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)$$
$E(x)$ → energy of a label assignment $x$
$\psi_u(x_i)$ → inverse likelihood (i.e., cost) of pixel $i$ taking label $x_i$ (obtained from a CNN)
$\psi_p(x_i, x_j)$ → pairwise energy component
[Figure labels: softmax, $1 \times 1$ conv]
Conditional Random Fields (CRFs) encourage assigning similar labels to pixels with similar properties
→ derived from image features such as spatial location and RGB values
$\mu$ → label compatibility
$X = (X_1, X_2, \ldots, X_N)$ → vector
$I$ → global observation (image)
→ CRF parameters
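To make the energy $E(x)$ of a label assignment concrete, here is a brute-force NumPy sketch for a handful of pixels. Everything beyond the unary-plus-pairwise structure is an assumption for illustration: the unary cost is taken as the negative log of a CNN probability, the label compatibility is a Potts model (cost 1 when labels differ, 0 otherwise), and a single Gaussian kernel over per-pixel feature vectors stands in for the pairwise term. The names `crf_energy`, `weight`, and `bandwidth` are hypothetical.

```python
import numpy as np

def crf_energy(probs, labels, features, weight=1.0, bandwidth=1.0):
    """Energy of a label assignment: sum of unary costs plus pairwise costs over i < j.

    probs:    (N, C) per-pixel class probabilities (e.g. from a CNN softmax)
    labels:   (N,)   integer label assignment x
    features: (N, D) per-pixel feature vectors (e.g. position and RGB)
    """
    N = labels.shape[0]
    # Unary term: cost of pixel i taking label x_i (assumed to be -log probability).
    unary = -np.log(probs[np.arange(N), labels]).sum()
    # Pairwise term: Potts compatibility times a Gaussian kernel on feature distance.
    diff = features[:, None, :] - features[None, :, :]
    kernel = weight * np.exp(-(diff ** 2).sum(axis=-1) / (2.0 * bandwidth ** 2))
    compat = (labels[:, None] != labels[None, :]).astype(float)
    i, j = np.triu_indices(N, k=1)                 # all pairs with i < j
    pairwise = (compat[i, j] * kernel[i, j]).sum()
    return unary + pairwise

# Pixels 0 and 1 look alike; pixel 2 is far away in feature space.
probs = np.full((3, 2), 0.5)
features = np.array([[0.0], [0.0], [10.0]])
e_consistent = crf_energy(probs, np.array([0, 0, 1]), features)
e_inconsistent = crf_energy(probs, np.array([0, 1, 1]), features)
```

Splitting the two similar pixels across labels raises the energy by about one kernel unit, while separating the dissimilar pixel costs almost nothing, which is the behaviour the CRF is meant to encourage. The double loop over all pairs is quadratic in the number of pixels, so this form is only useful for tiny examples.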
Zheng, Shuai, et al. "Conditional random fields as recurrent neural networks." Proceedings of the IEEE international conference on computer vision. 2015.
Multi-scale Context Aggregation by Dilated Convolutions
YouTube Playlist
$F : \mathbb{Z}^2 \to \mathbb{R}$ → discrete function
$\Omega_r = [-r, r]^2 \cap \mathbb{Z}^2$
$k : \Omega_r \to \mathbb{R}$ → discrete filter of size $(2r + 1)^2$
$$(F * k)(p) = \sum_{t \in \Omega_r} F(p - t)\, k(t)$$
→ discrete convolution operator
$$(F *_\ell k)(p) = \sum_{t \in \Omega_r} F(p - \ell t)\, k(t)$$
→ $\ell$-dilated convolution
Size of the receptive field of each element in $F_{i+1}$ is $(2^{i+2} - 1) \times (2^{i+2} - 1)$.
Context network architecture
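The $\ell$-dilated convolution defined above can be written directly from its sum. The sketch below is a plain NumPy illustration under stated assumptions: `dilated_conv2d` is a hypothetical name, $F$ is a finite array treated as zero outside its support, and the filter offset $t \in \Omega_r$ is stored at array position `t + r`.

```python
import numpy as np

def dilated_conv2d(F, k, dilation=1):
    """(F *_l k)(p) = sum over t in Omega_r of F(p - l*t) * k(t).

    F: (H, W) array, treated as zero outside its borders.
    k: (2r+1, 2r+1) filter; k[ti + r, tj + r] holds k(t) for t = (ti, tj).
    """
    r = k.shape[0] // 2
    H, W = F.shape
    pad = dilation * r
    Fp = np.pad(F, pad)                            # zero padding on every side
    out = np.zeros((H, W), dtype=float)
    for ti in range(-r, r + 1):
        for tj in range(-r, r + 1):
            # Slice of the padded array that holds F(p - l*t) for every p.
            r0 = pad - dilation * ti
            c0 = pad - dilation * tj
            out += k[ti + r, tj + r] * Fp[r0:r0 + H, c0:c0 + W]
    return out

# A unit impulse makes the sampling pattern visible: the 3x3 filter
# values land on a grid with spacing equal to the dilation factor.
impulse = np.zeros((9, 9))
impulse[4, 4] = 1.0
k = np.arange(1.0, 10.0).reshape(3, 3)
response = dilated_conv2d(impulse, k, dilation=2)
```

Setting `dilation=1` recovers the ordinary discrete convolution; larger values spread the same nine weights over a wider window without adding parameters.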
Algorithme à trous (an algorithm for wavelet decomposition) uses dilated convolutions
Dilated convolutions support exponentially expanding receptive fields without losing resolution or coverage.
$k^b(t, a) = 1_{[t=0]} 1_{[a=b]}$ → identity initialization
$a$ → index of the input feature map
$b$ → index of the output feature map
→ identity initialization (Large)
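The exponential growth of the receptive field can be checked with a few lines of arithmetic. The sketch assumes the stacking rule $F_{i+1} = F_i *_{2^i} k_i$ with $3 \times 3$ filters ($r = 1$), i.e. dilation factors 1, 2, 4, ...; that rule and the function name are assumptions made for this illustration.

```python
def receptive_field_sizes(num_layers, r=1):
    """Side length of the receptive field of an element of F_1, ..., F_num_layers.

    Assumes layer i applies a (2r+1) x (2r+1) filter with dilation 2**i,
    which widens the receptive field by 2 * r * 2**i pixels.
    """
    sizes = []
    size = 1                        # an element of F_0 sees only itself
    for i in range(num_layers):
        size += 2 * r * (2 ** i)
        sizes.append(size)
    return sizes

sizes = receptive_field_sizes(5)
print(sizes)                        # [3, 7, 15, 31, 63]
```

Each entry equals $2^{i+2} - 1$ for $i = 0, 1, \ldots$, so the receptive field roughly doubles per layer while the number of parameters grows only linearly.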
Badrinarayanan, Vijay, Alex Kendall, and Roberto Cipolla. "Segnet: A deep convolutional encoder-decoder architecture for image segmentation." IEEE transactions on
pattern analysis and machine intelligence 39.12 (2017): 2481-2495.
Pyramid Scene Parsing Network
YouTube Playlist
Cityscapes dataset
Inconspicuous Classes
Zhao, Hengshuang, et al. "Pyramid scene parsing network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
Rethinking Atrous Convolution for
Semantic Image Segmentation YouTube Video
Chen, Liang-Chieh, et al. "Rethinking atrous convolution for semantic image segmentation." arXiv preprint arXiv:1706.05587 (2017).
What Uncertainties Do We Need in Bayesian
Deep Learning for Computer Vision? YouTube Video
→ dropout distribution
$p$ → dropout probability
$\theta$ → parameters of the simple distribution (weight matrices)
Aleatoric Uncertainty: noise inherent in the observations
– homoscedastic: constant for different inputs
– heteroscedastic: depends on the inputs to the model
Epistemic (Model) Uncertainty: can be explained away given enough data
Heteroscedastic Aleatoric Uncertainty (Regression)
$$-\log p(y_i \mid f^{\widehat{W}_i}(x_i)) \propto \frac{1}{2\hat{\sigma}_i^2} \|y_i - \hat{y}_i\|^2 + \frac{1}{2} \log \hat{\sigma}_i^2$$
→ learned loss attenuation
$[\hat{y}_i, \hat{\sigma}_i^2] = f^{\widehat{W}_i}(x_i)$
$\hat{y}_i$ → predictive mean
$\hat{\sigma}_i^2$ → predictive variance
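The heteroscedastic regression objective for this slide can be sketched as a per-sample loss in NumPy. The function name is hypothetical, and having the model output the log-variance rather than the variance itself is an assumption made here for numerical stability (it avoids dividing by a predicted value near zero).

```python
import numpy as np

def heteroscedastic_nll(y, y_hat, log_var):
    """Per-sample loss 1/(2 sigma^2) * ||y - y_hat||^2 + 1/2 * log sigma^2.

    y, y_hat: (N, D) targets and predictive means
    log_var:  (N,)   predicted log sigma^2 (assumed parameterisation)
    """
    sq_err = np.sum((y - y_hat) ** 2, axis=-1)
    return 0.5 * np.exp(-log_var) * sq_err + 0.5 * log_var

# Same residual, two different predicted variances.
y = np.array([[2.0, 0.0]])
y_hat = np.array([[0.0, 0.0]])
loss_unit_var = heteroscedastic_nll(y, y_hat, np.array([0.0]))          # sigma^2 = 1
loss_large_var = heteroscedastic_nll(y, y_hat, np.array([np.log(4.0)])) # sigma^2 = 4
```

For a hard example, predicting a larger variance shrinks the squared-error term (the loss attenuation), while the log term penalises claiming high variance everywhere; for a fixed residual the loss is smallest when the predicted variance equals the squared error.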
Epistemic Uncertainty in Bayesian Deep Learning
→ prior distribution over the weights of neural network
→ random output of a Bayesian Neural Network
→ model likelihood
→ dataset
→ posterior over the weights (Bayesian inference)
Heteroscedastic Aleatoric Uncertainty (Classification)
$$p(y_i \mid f^{\widehat{W}_i}(x_i)) = y_i^T\, \mathrm{softmax}(\hat{y}_i + \hat{\sigma}_i \epsilon_i), \quad \epsilon_i \sim \mathcal{N}(0, I)$$
$$p = \frac{1}{T} \sum_{t=1}^{T} \mathrm{softmax}(\hat{y}_t + \hat{\sigma}_t \epsilon_t), \quad \epsilon_t \sim \mathcal{N}(0, I)$$
→ uncertainty of probability vector $p$
Each datapoint and each pixel will have its own prediction and uncertainty!
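For classification, the averaged probability vector $p$ is a Monte Carlo estimate: corrupt the logits with Gaussian noise, apply softmax, and average over $T$ samples. The sketch below handles one pixel; the function names are hypothetical, the same logits and noise scale are reused for every sample (a simplification), and using the entropy of $p$ as the uncertainty measure is an assumption made for this illustration.

```python
import numpy as np

def mc_softmax(logits, sigma, num_samples=1000, seed=0):
    """Average of softmax(logits + sigma * eps_t) over T samples, eps_t ~ N(0, I).

    logits: (C,) predicted logits for one pixel
    sigma:  scalar or (C,) predicted noise scale
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((num_samples, logits.shape[-1]))
    corrupted = logits + sigma * eps                       # (T, C)
    corrupted = corrupted - corrupted.max(axis=-1, keepdims=True)
    probs = np.exp(corrupted)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs.mean(axis=0)

def entropy(p):
    """Entropy of a probability vector, used here as the uncertainty measure."""
    return -np.sum(p * np.log(p + 1e-12))

logits = np.array([5.0, 0.0, 0.0])
p_clean = mc_softmax(logits, sigma=0.0)   # reduces to a plain softmax
p_noisy = mc_softmax(logits, sigma=5.0)   # large predicted noise flattens p
```

Running the same computation independently at every pixel yields a per-pixel prediction together with a per-pixel uncertainty map.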
Kendall, Alex, and Yarin Gal. "What uncertainties do we need in bayesian deep learning for computer vision?." arXiv preprint arXiv:1703.04977 (2017).
RefineNet: Multi-Path Refinement Networks
for High-Resolution Semantic Segmentation YouTube Video
DeepLabv3+
Chen, Liang-Chieh, et al. "Encoder-decoder with atrous separable convolution for semantic image segmentation." Proceedings of the European conference on
computer vision (ECCV). 2018.
Dual Attention Network for Scene Segmentation
YouTube Video
Fully-Convolutional Network (FCN) based architectures: Limited receptive field!
Benefits of adding more layers would diminish rapidly once reaching certain depths!
SEgmentation TRansformer (SETR)
Each Transformer layer has a global receptive field.
Zheng, Sixiao, et al. "Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers." Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition. 2021.
Questions?
YouTube Playlist