Adaptive Convolutional Neural Network Loop Filter (ACNNLF)



Hujun Yin, Rongzhen Yang, Shoujiang Ma, Xiaoran Fang, Yue Yu

January 11, 2019


Adaptive Convolutional Neural Network Loop Filter
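[Figure: encoder block diagram. The input frame is split into CTUs; the prediction residual passes through transform/quantization, and the quantized coefficients go to the entropy coder, which produces the bitstreams. Inverse quantization/transform plus the intra or inter prediction gives the reconstructed frame, which passes through the in-loop filters DBF, SAO, ALF and ACNNLF. Prediction data and filter control data are also sent to the entropy coder.]
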
• ACNNLF as final stage of frame reconstruction in decoding process
• Simplified 2-layer CNN
• On-line training
• Multiple filters
Simple 2 Layer CNN

[Figure: stacked 6-channel input combining luma and chroma; panel labels: Luma, Chroma]
• Use both luma and chroma as input

• Input & Output CNN Channel Block Size: N = 16

• 2 layer CNN
• Layer 1: 1x1 with ReLU
• Layer 2: 3x3, no ReLU
• Channels: M=16
• 3 sets of ACNNLFs with total 3,282 parameters
• 692 (672w+20b) parameters for each luma filter
• 402 (384w+18b) parameters for each chroma filter
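
The parameter counts above can be reproduced with simple arithmetic. The sketch below assumes that each filter takes the 6 stacked channels as input, that layer 1 has M = 16 output channels, and that the luma filter outputs 4 channels while the chroma filter outputs 2. The slides do not state the output channel counts; these values are chosen because they reproduce the 672w+20b and 384w+18b figures. The function name is illustrative.

```python
def acnnlf_params(in_ch, mid_ch, out_ch):
    """Weights and biases of a 1x1 conv (in_ch -> mid_ch) followed by a 3x3 conv (mid_ch -> out_ch)."""
    weights = in_ch * mid_ch * 1 * 1 + mid_ch * out_ch * 3 * 3
    biases = mid_ch + out_ch
    return weights, biases


luma_w, luma_b = acnnlf_params(6, 16, 4)      # assumed: 4 luma output channels
chroma_w, chroma_b = acnnlf_params(6, 16, 2)  # assumed: 2 chroma output channels (U, V)

# 3 luma filters plus 3 chroma filters
total = 3 * (luma_w + luma_b) + 3 * (chroma_w + chroma_b)
print(luma_w, luma_b, chroma_w, chroma_b, total)
```

Under these assumptions the totals match the slide: 692 parameters per luma filter, 402 per chroma filter, 3,282 overall.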
Online Training (RA) – Data Collection

• The training data comes directly from the video sequence being compressed
• Training is done for each random access segment (RAS) using 8 frames of video data
• The 1st frame of the RAS and the 7 preceding frames (in encoding order) from the previous RAS are used
• Both the luma and chroma data of the original and reconstructed frames (before ACNNLF processing) are collected
• Each frame is partitioned into small image blocks of size 32x32 for training
• Can be applied to other types of video sequences
Training Procedure for Multiple CNNs

• Train the first ACNNLF with full training data set
• Apply the first trained ACNNLF to partition the training set to ACNNLF#1 gain set and the loss set
• Use the loss set to train ACNNLF#2
• Apply two ACNNLFs, re-partition the training data set to ACNNLF#1 gain set, ACNNLF#2 gain set and loss set
• Use the new loss set to train ACNNLF#3
• Apply the 3 trained ACNNLFs. Partition the data set into 4 subsets: subset with highest gain for ACNNLF#1, subset with highest gain for ACNNLF#2, subset with highest gain for ACNNLF#3 and subset with no gain for any ACNNLF
• Further train each ACNNLF with the subset on which it has the highest gain
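
The procedure above can be sketched with a toy stand-in for the CNN. In the sketch below each "filter" is just a constant offset added to the reconstructed samples, "training" picks the offset that minimizes the pooled L1 error (the median residual), and "gain" means a reduction in L1 error. These are simplifying assumptions to keep the example runnable, not the authors' training code, and the final step refits each filter rather than fine-tuning it.

```python
from statistics import median


def l1_error(block, offset=0.0):
    recon, orig = block
    return sum(abs(o - (r + offset)) for r, o in zip(recon, orig))


def train_offset(blocks):
    """Toy training: the constant offset that minimizes the pooled L1 error."""
    residuals = [o - r for recon, orig in blocks for r, o in zip(recon, orig)]
    return median(residuals)


def partition_by_gain(blocks, filters):
    """Per block: index of the filter with the largest error reduction, or -1 (loss set)."""
    assignment = []
    for block in blocks:
        base = l1_error(block)
        best_k, best_gain = -1, 0.0
        for k, offset in enumerate(filters):
            gain = base - l1_error(block, offset)
            if gain > best_gain:
                best_k, best_gain = k, gain
        assignment.append(best_k)
    return assignment


def train_cascade(blocks, num_filters=3):
    filters = []
    train_set = list(blocks)                 # filter #1 sees the full training set
    for _ in range(num_filters):
        if not train_set:
            break
        filters.append(train_offset(train_set))
        assignment = partition_by_gain(blocks, filters)
        # blocks that no current filter improves become the next training set
        train_set = [b for b, k in zip(blocks, assignment) if k == -1]
    # final pass: refit each filter on the subset where it has the highest gain
    assignment = partition_by_gain(blocks, filters)
    for k in range(len(filters)):
        subset = [b for b, a in zip(blocks, assignment) if a == k]
        if subset:
            filters[k] = train_offset(subset)
    return filters, partition_by_gain(blocks, filters)


def make_block(residual, n=4):
    recon = [100] * n
    return (recon, [r + residual for r in recon])


blocks = [make_block(3)] * 6 + [make_block(-2)] * 3 + [make_block(0)]
filters, assignment = train_cascade(blocks)
print(filters)
print(assignment)
```

In the toy data, the first filter learns the dominant distortion, the second learns the distortion of the blocks the first one hurts, and the block that is already error-free ends up in the "no gain" subset.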
Syntax Design
• In SPS
• ACNNLF enable flag
• ACNNLF block size (default 32)
• In Slice header (I frame only)
• ACNNLF present flag
• ACNNLF weights – 3 sets for luma, 3 sets for chroma
• In CTU
• ACNNLF luma index (2 bits)
• ACNNLF chroma index (2 bits)
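
A minimal sketch of how these syntax elements might be represented in software follows. All class, field and function names are hypothetical. The sketch assumes that index value 0 means "ACNNLF off" and values 1 to 3 select one of the three filters, which is one natural reading of a 2-bit index over 3 filters but is not stated in the slides. The 4-bit packing is for illustration only; an actual bitstream would code these elements through the entropy coder.

```python
from dataclasses import dataclass, field
from typing import List

NUM_FILTERS = 3  # three luma and three chroma filters per the slides


@dataclass
class SpsAcnnlf:
    enable_flag: bool = True
    block_size: int = 32  # default value from the slide


@dataclass
class SliceHeaderAcnnlf:
    present_flag: bool = False  # signaled for I frames only
    luma_weights: List[List[int]] = field(default_factory=list)    # 3 sets when present
    chroma_weights: List[List[int]] = field(default_factory=list)  # 3 sets when present


def pack_ctu_indices(luma_idx, chroma_idx):
    """Pack the two 2-bit CTU indices into one 4-bit value (luma in the high bits)."""
    for idx in (luma_idx, chroma_idx):
        if not 0 <= idx <= NUM_FILTERS:
            raise ValueError("index must be in 0..3 to fit in 2 bits")
    return (luma_idx << 2) | chroma_idx


def unpack_ctu_indices(packed):
    """Inverse of pack_ctu_indices: returns (luma_idx, chroma_idx)."""
    return (packed >> 2) & 0b11, packed & 0b11


packed = pack_ctu_indices(2, 0)  # assumed meaning: luma filter #2, chroma off
print(packed, unpack_ctu_indices(packed))
```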

Experiment Setup - Training
Information in Training Stage (Mandatory)
Basic settings:
learning rate: 0.0055
optimizer: ADAM
batch size: 128
epoch: 148*
loss function: L1
training GPU: GTX 1080 Ti
training time: 15 minutes*
framework: TensorFlow

• Online training based on video sequence itself
• QP values {22,27,32,37} are used
• Training time measured per RAS on A1/A2 video sequences
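
For reference, the settings above can be collected in a small configuration object, together with a plain-Python version of an L1 loss. This is only a sketch: the class and field names are hypothetical, and whether the original loss is averaged or summed over samples is not stated, so the mean used here is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    learning_rate: float = 0.0055
    optimizer: str = "adam"
    batch_size: int = 128
    epochs: int = 148
    loss: str = "l1"
    qps: tuple = (22, 27, 32, 37)


def l1_loss(predicted, target):
    """Mean absolute error between two equal-length sample sequences."""
    if len(predicted) != len(target):
        raise ValueError("length mismatch")
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(predicted)


cfg = TrainingConfig()
print(cfg)
print(l1_loss([100, 102, 98], [101, 102, 95]))
```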
Experiment Setup - Inference
Network details: one ACNNLF module called by VTM3.0
Total conv. layers: 2
Total FC layers: 0
Framework: TensorFlow
Param. num: 692x3 (luma), 402x3 (chroma)
GFLOPs: multiplication 264/pixel; add 265.5/pixel
Mem.P (MB): 0.0028
Mem.T (MB): 0.0448

• 8-bit fixed-point inference module implemented in TensorFlow
• VTM3.0 calls TensorFlow module for inference
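
The per-pixel operation counts quoted above can be reproduced under a few assumptions that the slides do not spell out: one luma and one chroma filter are applied per position of the stacked input, the luma filter has 4 output channels and the chroma filter 2, "per pixel" means per luma sample (4 luma samples per stacked position in 4:2:0), and each output sample takes one extra addition on top of one addition per multiplication. The extra addition would be consistent with adding the filter output to the reconstructed sample, but that reading is a guess.

```python
def conv_mults(in_ch, out_ch, kernel):
    """Multiplications per output position of a kernel x kernel conv layer."""
    return in_ch * out_ch * kernel * kernel


luma_mults = conv_mults(6, 16, 1) + conv_mults(16, 4, 3)    # 96 + 576
chroma_mults = conv_mults(6, 16, 1) + conv_mults(16, 2, 3)  # 96 + 288

luma_pixels_per_position = 4   # assumed 4:2:0 stacking
outputs_per_position = 4 + 2   # assumed: 4 luma + 2 chroma output samples

mults_per_pixel = (luma_mults + chroma_mults) / luma_pixels_per_position
# one add per multiplication (accumulation plus bias), plus one extra add per output sample
adds_per_pixel = mults_per_pixel + outputs_per_position / luma_pixels_per_position
print(mults_per_pixel, adds_per_pixel)
```

With these assumptions the sketch gives 264 multiplications and 265.5 additions per pixel, matching the figures in the table.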

Experimental Results – RA sequences
Random Access Main 10

           Y        U         V         EncT*   DecT*
Class A1   -2.37%   -1.34%    -2.77%    102%    529%
Class A2   -0.45%   -10.92%   -6.19%    99%     355%
Class B    -0.49%   -11.29%   -10.73%   101%    384%
Class C    0.12%    -3.31%    -1.62%    98%     254%
Class E
Overall    -0.70%   -7.10%    -5.80%    100%    361%

• Overall gain Y: -0.70%; U: -7.10%; V: -5.80%
• Class C has a slight luma (Y) loss of 0.12% due to overhead
• EncT and DecT need to be further optimized by integrating training/inference into the VTM software
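
As a consistency check, the Overall row can be approximately recomputed from the per-class rows. The sketch below assumes the usual number of sequences per class in the JVET common test conditions (3 for A1, 3 for A2, 5 for B, 4 for C), that the Y/U/V figures are averaged arithmetically over sequences, and that the run-time ratios are averaged geometrically. None of this is stated in the slides, and the per-class inputs are themselves rounded, so only approximate agreement can be expected.

```python
import math

# Sequences per class: assumed from the JVET common test conditions, not from the slides.
SEQ_COUNT = {"A1": 3, "A2": 3, "B": 5, "C": 4}

RESULTS = {  # class: (Y, U, V, EncT, DecT), all in percent
    "A1": (-2.37, -1.34, -2.77, 102, 529),
    "A2": (-0.45, -10.92, -6.19, 99, 355),
    "B": (-0.49, -11.29, -10.73, 101, 384),
    "C": (0.12, -3.31, -1.62, 98, 254),
}


def weighted_mean(col):
    """Sequence-count-weighted arithmetic mean of one column."""
    total = sum(SEQ_COUNT.values())
    return sum(SEQ_COUNT[c] * RESULTS[c][col] for c in RESULTS) / total


def weighted_geomean(col):
    """Sequence-count-weighted geometric mean of one column."""
    total = sum(SEQ_COUNT.values())
    log_sum = sum(SEQ_COUNT[c] * math.log(RESULTS[c][col]) for c in RESULTS)
    return math.exp(log_sum / total)


overall = [weighted_mean(i) for i in range(3)] + [weighted_geomean(i) for i in (3, 4)]
print([round(value, 2) for value in overall])
```

Under these assumptions the recomputed values round to the reported Overall row.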

Conclusion
• ACNNLF as final stage of frame reconstruction in decoding process
• Simplified 2-layer CNN – minimum deep neural network
• Fast on-line training – 15 min/RAS
• Multiple filters – best of 3 for each CTU
• Lower decoding complexity

• Overall gain Y: -0.70%; U: -7.10%; V: -5.80%


• Propose further study in a Core Experiment
