On Approximate Computing Techniques
Sparsh Mittal
Introduction to AC
Terminology
Scope and motivation for AC
Let us see where the scope for AC arises and what the motivations for using it are.
1. Inherent scope/need for AC
2. Error-resilience of programs and users
3. Efficiency optimization and quality configurability
Example of quality-effort tradeoff
Figure: maximum % errors in neural approximators for different nodes in hidden layer (Li et al. TCAD'15)
Challenges of AC
Limited application domain and gains of AC
Correctness issues
Tradeoff between quality and energy saving
Large overheads
ACT must work across the stack
User
Application
Algorithm
Language
Compiler
Architecture
Circuits
Devices
1. Error-injection (automatic)
Introduce errors in a variable and check the QoR.
QoR acceptable: variable is approximable
QoR not acceptable: variable is non-approximable
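The error-injection flow above can be sketched in a few lines. This is only an illustration under stated assumptions: the kernel `mean_of_squares`, the variable names `sq` and `count`, the noise model, and the 5% QoR threshold are all hypothetical and not taken from the slides.

```python
import random

def mean_of_squares(xs, inject=None, rng=None):
    """Toy kernel. `inject` names the single variable that receives errors."""
    acc = 0.0
    count = 0
    for x in xs:
        sq = x * x
        if inject == "sq":
            sq *= 1.0 + rng.uniform(-0.01, 0.01)  # 1% relative noise on a data value
        acc += sq
        count += 1
        if inject == "count":
            count += rng.randint(1, 3)            # corrupt the iteration counter
    return acc / count

def classify(variables, xs, max_rel_error=0.05, seed=0):
    """Inject errors into one variable at a time and check the QoR."""
    exact = mean_of_squares(xs)
    verdicts = {}
    for name in variables:
        rng = random.Random(seed)
        approx = mean_of_squares(xs, inject=name, rng=rng)
        rel_error = abs(approx - exact) / abs(exact)
        ok = rel_error <= max_rel_error           # assumed QoR criterion
        verdicts[name] = "approximable" if ok else "non-approximable"
    return verdicts

print(classify(["sq", "count"], [float(i) for i in range(1, 101)]))
```

In this toy setting the data value tolerates noise, while the counter does not, which mirrors the usual observation that control-related variables tend to be non-approximable.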
Monitoring and ensuring output quality
Output quality monitoring
Strategies for monitoring and ensuring QoR
Some quality metrics used for different approximable applications/kernels
After identifying approximable portions,
next step is
1. Using type qualifiers
@approx int a = ...;
int p; // precise by default
p = a; // not allowed (illegal): approximate to precise ✗
a = p; // allowed: precise to approximate ✓
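The flow rule in the snippet above (precise data may flow into approximate variables, but not the reverse) can be mimicked at run time. The sketch below is an assumption-laden illustration: the classes `Approx`, `ApproxInt`, and `PreciseInt` are hypothetical, and a real type-qualifier system would reject the illegal assignment at compile time rather than raise an exception.

```python
class Approx:
    """Tags a value as approximate (stand-in for an @approx qualifier)."""
    def __init__(self, value):
        self.value = value

class PreciseInt:
    """A precise variable: refuses data tagged as approximate."""
    def __init__(self, value=0):
        self.set(value)

    def set(self, value):
        if isinstance(value, Approx):
            raise TypeError("approximate value cannot flow into a precise variable")
        self.value = int(value)

class ApproxInt:
    """An approximate variable: accepts precise or approximate data."""
    def __init__(self, value=0):
        self.set(value)

    def set(self, value):
        self.value = value.value if isinstance(value, Approx) else int(value)

    def get(self):
        return Approx(self.value)  # reads stay tagged as approximate

a = ApproxInt(7)
p = PreciseInt(3)
a.set(p.value)          # a = p: allowed
try:
    p.set(a.get())      # p = a: rejected
except TypeError as exc:
    print("rejected:", exc)
```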
Approximation Strategies
Some approximation strategies
Load-value approximation (LVA)
[Figure: LVA between the core and main memory; surviving step label: "3. Get K_exact"]
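A minimal software sketch of the load-value approximation idea: on a cache miss the core continues with an estimated value, and the exact value fetched from main memory is used to train the estimator. The averaging predictor, the history length, and all names here are assumptions for illustration, not the design shown on the slide.

```python
from collections import deque

class LoadValueApproximator:
    """Predicts a load value as the mean of recently seen exact values."""
    def __init__(self, history=4):
        self.history = deque(maxlen=history)

    def predict(self):
        return sum(self.history) / len(self.history) if self.history else 0.0

    def train(self, exact):
        self.history.append(exact)

def load(addr, cache, memory, lva):
    if addr in cache:
        return cache[addr]      # hit: exact value
    guess = lva.predict()       # miss: the core proceeds with an estimate
    exact = memory[addr]        # the exact value arrives from main memory later
    lva.train(exact)            # and is used to train the approximator
    cache[addr] = exact
    return guess

memory = {0: 10.0, 1: 12.0, 2: 11.0}
cache, lva = {}, LoadValueApproximator()
print([load(a, cache, memory, lva) for a in (0, 1, 2, 0)])
```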
Loop perforation
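Loop perforation skips a subset of loop iterations to save work at the cost of accuracy. A minimal sketch, assuming a simple mean computation and a fixed skipping stride (both chosen here only for illustration):

```python
def mean_exact(xs):
    return sum(xs) / len(xs)

def mean_perforated(xs, stride=2):
    """Run only every `stride`-th iteration of the accumulation loop."""
    total = 0.0
    count = 0
    for i in range(0, len(xs), stride):
        total += xs[i]
        count += 1
    return total / count

xs = list(range(100))
print(mean_exact(xs), mean_perforated(xs, 2), mean_perforated(xs, 4))
```

With a stride of 2 the loop does half the work; the error depends on how uniform the data is, so the stride is a quality knob.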
Precision scaling (changing bit-width)
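Precision scaling can be illustrated by rounding values to a fixed-point grid with fewer fractional bits. The helper names and the choice of fixed-point rounding are assumptions made for this sketch.

```python
def quantize(x, frac_bits):
    """Round x to a fixed-point value with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def dot_scaled(a, b, frac_bits):
    """Dot product with both operands reduced to the given precision."""
    return sum(quantize(x, frac_bits) * quantize(y, frac_bits) for x, y in zip(a, b))

for bits in (4, 8, 12):
    print(bits, quantize(0.7, bits), abs(quantize(0.7, bits) - 0.7))
```

The rounding error of each operand is at most half of the grid step, that is, 2^-(frac_bits + 1), so each extra bit roughly halves the worst-case error.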
Precision Scaling
Figure: use of sampling in a MapReduce job. Only 4 map tasks (1, 2, 4, and 6) are executed, processing 10 input data items (Goiri et al. ASPLOS’15).
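The sampling idea in the caption above can be sketched as a sum that runs only some map tasks, reads only a sample of each task's input, and scales the result back up. The function name, the sampling ratio, and the scaling scheme are assumptions for illustration; the sketch also omits any error-bound estimation.

```python
import random

def approx_sum(blocks, run_tasks, sample_ratio, seed=0):
    """Estimate the sum over all blocks from a subset of tasks and items."""
    rng = random.Random(seed)
    partial = 0.0
    for t in run_tasks:
        block = blocks[t]
        k = max(1, int(len(block) * sample_ratio))
        sample = rng.sample(block, k)                 # input sampling within a task
        partial += sum(sample) * len(block) / k       # scale up within the task
    return partial * len(blocks) / len(run_tasks)     # scale up for dropped tasks

# Six map tasks of five items each; run tasks 1, 2, 4, and 6 (0-based 0, 1, 3, 5).
blocks = [[1.0] * 5 for _ in range(6)]
print(approx_sum(blocks, run_tasks=[0, 1, 3, 5], sample_ratio=0.5))
```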
Algorithm-level approximation strategies
Creating approximate program versions
[Figure: loop iterations (i+1, i+2, i+3, i+k+1, i+k+2, i+2k) laid out along a time axis]
Scalable-effort design
[Figure: traditional approach (input tasks, precise accelerator, commit) versus an approximate flow (input tasks, approx accelerator k, QoR check, precise accelerator, commit)]
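The labels in the figure above (approx accelerator, QoR check, precise accelerator, commit) suggest a check-and-fallback flow. A minimal sketch, assuming a square-root task, a linear approximation, and a 5% residual check, none of which come from the slides:

```python
import math

def run_with_qor_check(tasks, approx_fn, precise_fn, qor_ok):
    """Try the approximate path first; fall back to the precise one on a failed check."""
    results, fallbacks = [], 0
    for task in tasks:
        out = approx_fn(task)
        if not qor_ok(task, out):
            out = precise_fn(task)   # re-execute precisely
            fallbacks += 1
        results.append(out)          # commit
    return results, fallbacks

def approx_sqrt(x):
    return 0.5 + 0.5 * x             # tangent line of sqrt at x = 1

def residual_ok(x, y):
    return abs(y * y - x) / x <= 0.05  # cheap check, no precise result needed

results, fallbacks = run_with_qor_check([0.5, 1.0, 1.5, 4.0], approx_sqrt, math.sqrt, residual_ok)
print(results, fallbacks)
```

The check only pays off when it is much cheaper than the precise computation and when fallbacks are rare.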
DRAM refresh rate reduction
Programmer: critical/non-critical declaration (object level)
Runtime system: critical/non-critical allocation (page level)
OS: virtual to physical address mapping
Approximate DRAM: critical pages (normal refresh), non-critical pages (low refresh)
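The layering above can be sketched as an allocator that keeps critical and non-critical objects on separate pages, so that each page can be given its own refresh interval. The page size, the 1000 ms low-refresh interval, and the class name are assumptions for illustration; 64 ms is the usual DRAM refresh window. Objects larger than a page are not handled.

```python
PAGE_SIZE = 4096
REFRESH_MS = {"critical": 64, "non-critical": 1000}  # low-refresh value is assumed

class CriticalityAllocator:
    """Places objects on pages according to the programmer's declaration."""
    def __init__(self):
        self.pages = {"critical": [], "non-critical": []}  # bytes used per page

    def alloc(self, size, critical):
        kind = "critical" if critical else "non-critical"
        pages = self.pages[kind]
        if not pages or pages[-1] + size > PAGE_SIZE:
            pages.append(0)          # open a new page of this kind
        pages[-1] += size
        return kind, len(pages) - 1

def page_refreshes_per_second(allocator):
    return sum(len(p) * 1000 / REFRESH_MS[k] for k, p in allocator.pages.items())

alloc = CriticalityAllocator()
alloc.alloc(3000, critical=True)
alloc.alloc(3000, critical=False)
alloc.alloc(3000, critical=False)
print(page_refreshes_per_second(alloc))  # compare with 3 * 1000 / 64 for all-normal refresh
```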
Stencil pattern
Inexact reads/writes in NVMs
Inexact reads/writes in MLC NVMs
Neural-network based acceleration
Reducing branch divergence in SIMD
Branch divergence: threads in a SIMD group take different paths at an if/else, so the paths execute one after another.
(Grigorian et al. TACO'15, Sartori et al. TMM’13)
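One way to trade accuracy for less divergence is to force every lane to follow the path chosen by the majority. The sketch below simulates this on a list of lane values; the majority rule, the toy branch bodies, and the function names are assumptions for illustration and are not claimed to be the exact method of the cited papers.

```python
def simd_exact(values):
    """Each lane follows its own branch; divergent groups need two passes."""
    taken = [v >= 0 for v in values]
    passes = len(set(taken))                       # 1 if uniform, 2 if divergent
    out = [v * 2 if t else -v for v, t in zip(values, taken)]
    return out, passes

def simd_majority(values):
    """All lanes follow the majority branch: one pass, possibly wrong lanes."""
    taken = [v >= 0 for v in values]
    majority = sum(taken) * 2 >= len(taken)
    out = [v * 2 if majority else -v for v in values]
    return out, 1

print(simd_exact([1, 2, 3, -1]))
print(simd_majority([1, 2, 3, -1]))
```

Only the minority lane produces a different result, and the group finishes in a single pass.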
Neural-network based acceleration
[Figure: a program with its approximable portion marked]
1. Find an approximable program portion
2. Train an NN to mimic this
3. Implement the NN on a neural processing unit (NPU)
Credit: Sampson et al.
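Steps 1 and 2 above can be sketched with a tiny network trained to mimic a function. Everything specific here is an assumption for illustration: `math.sin` stands in for the approximable portion, and the network size, learning rate, and training loop are arbitrary. Step 3 (running the network on an NPU) is not modeled.

```python
import math
import random

def target(x):
    """Stand-in for the approximable program portion (hypothetical)."""
    return math.sin(x)

def make_net(hidden, rng):
    return {"w1": [rng.uniform(-1, 1) for _ in range(hidden)],
            "b1": [0.0] * hidden,
            "w2": [rng.uniform(-1, 1) for _ in range(hidden)],
            "b2": 0.0}

def forward(net, x):
    h = [math.tanh(w * x + b) for w, b in zip(net["w1"], net["b1"])]
    y = sum(w * hi for w, hi in zip(net["w2"], h)) + net["b2"]
    return y, h

def mse(net, xs):
    return sum((forward(net, x)[0] - target(x)) ** 2 for x in xs) / len(xs)

def train(net, xs, epochs=2000, lr=0.05):
    """Plain stochastic gradient descent on the squared error."""
    for _ in range(epochs):
        for x in xs:
            y, h = forward(net, x)
            err = y - target(x)
            for j in range(len(h)):
                grad_h = err * net["w2"][j] * (1 - h[j] ** 2)  # uses w2 before its update
                net["w2"][j] -= lr * err * h[j]
                net["w1"][j] -= lr * grad_h * x
                net["b1"][j] -= lr * grad_h
            net["b2"] -= lr * err
    return net

rng = random.Random(0)
xs = [i * (math.pi / 2) / 15 for i in range(16)]
net = make_net(4, rng)
before = mse(net, xs)
train(net, xs)
print(before, mse(net, xs))
```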
Neural Algorithmic Transformation
[Figure: a common intermediate representation targets the CPU; a neural representation targets the NPU. NPU implementations: digital (CPU, GPU, FPGA, ASIC) or analog (FPAA, ASIC)]
References