Online Learning for Adaptive Optimization of Heterogeneous SoCs

(Invited Paper)

Ganapati Bhat¹, Sumit K. Mandal¹, Ujjwal Gupta², Umit Y. Ogras¹
{gmbhat, skmandal, umit}@asu.edu, [email protected]
¹School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, USA
²Intel Corporation, Hillsboro, OR, USA
ABSTRACT
Energy efficiency and performance of heterogeneous multiprocessor systems-on-chip (SoC) depend critically on utilizing a diverse
set of processing elements and managing their power states dynamically. Dynamic resource management techniques typically rely on
power consumption and performance models to assess the impact of
dynamic decisions. Despite the importance of these decisions, many
existing approaches rely on fixed power and performance models
learned offline. This paper presents an online learning framework to
construct adaptive analytical models. We illustrate this framework
for modeling GPU frame processing time, GPU power consumption
and SoC power-temperature dynamics. Experiments on Intel Atom
E3826, Qualcomm Snapdragon 810, and Samsung Exynos 5422 SoCs
demonstrate that the proposed approach achieves less than 6% error
under dynamically varying workloads.
ACM Reference Format:
Ganapati Bhat, Sumit K. Mandal, Ujjwal Gupta, Umit Y. Ogras. 2018. Online Learning for Adaptive Optimization of Heterogeneous SoCs. In ICCAD '18, November 2018, San Diego, CA, USA. https://doi.org/10.1145/3240765.3243489
1 INTRODUCTION
Heterogeneous architectures are recognized as the primary instrument to bridge the energy efficiency of application-specific hardware with the programmability of general-purpose processors [21, 32]. Indeed, integrating custom accelerators and general-purpose cores delivers programmable SoCs with superior performance and significantly lower power footprint compared to homogeneous architectures. This capability has already been successfully illustrated by mobile platforms, which are used by more than half of the world's population to run a large variety of apps, such as phone calls, video conferencing, navigation, and games [38]. Hence, heterogeneous architectures can enable new application domains ranging from biomedical and environmental sensing to mobile applications and all the way up to big data analytics [34].
Harvesting the full potential of heterogeneous SoCs is challenged
by the tension between energy efficiency and development cost.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from
[email protected].
ICCAD ’18, November 5–8, 2018, San Diego, CA, USA
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-5950-4/18/11. . . $15.00
https://doi.org/10.1145/3240765.3243489
While form factors and specific system requirements vary, maximizing performance under tight energy and power-density constraints
is a common objective. However, application development, let alone aggressive optimization, is notoriously difficult and time-consuming when utilizing highly specialized accelerators. The
optimization problem is exacerbated by dynamic variations of application workloads and operating conditions [4, 6].
In practice, active applications are scheduled to the processing elements (PE) by the resource management techniques implemented
in OS kernels, as shown in Figure 1. At the same time, the sleep and active power states of the PEs are managed by the power management drivers [20]. These decisions are made in discrete time
intervals which are typically 50–100 ms long. Existing governors
implemented in commercial platforms typically rely on PE utilizations to make these decisions [24, 27]. However, utilization alone
does not reveal the sensitivity of the system power and performance
to the control knobs, such as the operating frequency and sleep
states. For example, we cannot quantify the change in the frame
processing time of a GPU as a function of the change in frequency
by only using the GPU utilization. In contrast, the sensitivity of
system power and performance to the control knobs, e.g., the partial
derivative of processing time to frequency, can help in determining the optimal actions, as illustrated in Figure 1. For instance, a
“Jacobian matrix” for heterogeneous systems can be constructed
by computing the sensitivity of system power and performance
to the frequency of individual resources. Then, these sensitivity
models can be used by adaptive resource and power management
techniques to achieve maximum energy efficiency.
Adaptive optimization techniques need to learn the sensitivity
models online, since the effectiveness of offline models is severely
limited. First, a significant amount of manual effort is required for
developing and maintaining offline models. Second, one cannot
account for emerging application workloads that are not available
at the time of the development. Even exploring different combinations of known applications at design time is costly. Finally, offline
models cannot capture runtime variations in workload and operating conditions, which are unknown at design time. Therefore, this
paper presents online techniques for learning models that can be
used by the dynamic resource and power management algorithms
to utilize the PEs in heterogeneous SoCs most effectively. The proposed framework is illustrated for modeling GPU frame processing
time, GPU power consumption and SoC power-temperature dynamics. The effectiveness of these models is demonstrated by running
dynamically varying workloads on Intel Atom E3826, Qualcomm
Snapdragon 810, and Samsung Exynos 5422 SoCs.
Figure 1: The use of the adaptive models (right-most block). Dynamic resource management techniques schedule applications to PEs, and manage the power states periodically in discrete time intervals. Adaptive models guide these decisions by expressing the sensitivity of power, performance and temperature metrics to the control knobs. For instance, the first row of the sensitivity matrix gives the partial derivative of frame processing time (t_F) with respect to the operating frequencies of the PEs (f_1, . . . , f_N).
This paper is a part of the ICCAD 2018 Special Session on "Managing Heterogeneous Many-cores for High-Performance and Energy-Efficiency". The other two papers of this special session are: "Dynamic Resource Management for Heterogeneous Many-Cores" [14]
and “Hybrid On-Chip Communication Architectures for Heterogeneous Manycore Systems” [18]. The rest of this paper is organized
as follows. Section 2 overviews the proposed online learning framework. Sections 3 and 4 illustrate this framework for GPU frame processing time and power consumption modeling, respectively. Section 5 presents an online analysis technique for power-temperature
dynamics using a multi-input multi-output model. Finally, Section 6
concludes the paper.
2 ADAPTIVE MODELING FRAMEWORK
The quintessential question in dynamic resource and power management is to quantify how the control knobs affect the metric
of interest. For instance, when the operating frequency of a PE is
increased, it is well known that the execution time will decrease,
while the power consumption increases. However, it is hard to take an action unless we can accurately predict by how much the execution time, performance, and other dependent metrics, such as energy and temperature, will change. Note that these quantities can be measured at runtime through sensors (e.g., current and
temperature sensors) or OS/Firmware instrumentation (e.g., power
meters and performance monitors). However, measurements at a
given configuration are not sufficient to take dynamic resource and
power management actions. We still need to predict the impact of
a potential action, such as increasing the CPU operating frequency,
before committing to it.
The core of the proposed adaptive modeling framework is the
analytical models for system performance, power consumption and
temperature, as shown in Figure 2. The analytical models express
these metrics in terms of the hardware configuration and the states
of the PEs. Hardware configuration includes the set of active processing elements and their operating frequencies. The definition
of the state depends on the PE. For example, a CPU core can be characterized by the number of instructions retired, memory accesses, cache hits, and active cycles in a given amount of time. Similarly, the state of the GPU can include the number of frames processed, pixels shaded, and utilization. These quantities can be obtained using performance monitors, such as SimplePerf [9], and by instrumenting the target OS/Firmware. Then, they are used to generate features which are fed to the analytical models. Feature generation can be as simple as normalizing the measured values within a given range, arithmetic operations (e.g., normalizing with the number of instructions), or more complex transformations. Finally, the analytical models
produce system performance, power consumption and temperature
estimates, as illustrated in Figure 2.
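As a concrete illustration, the feature-generation step described above can be sketched in a few lines. The counter names and normalization choices below are hypothetical examples of the transformations the text mentions, not the exact pipeline used in this work.

```python
def generate_features(counters, max_counts):
    """Turn raw performance-monitor readings into model features.

    `counters` and `max_counts` are hypothetical dictionaries of raw
    readings and their maximum possible values; the per-instruction
    rates and range scaling illustrate simple feature generation.
    """
    instr = counters["instructions_retired"]
    return {
        # Arithmetic normalization: events per retired instruction.
        "l2_refs_per_instr": counters["l2_references"] / instr,
        "l2_miss_per_instr": counters["l2_misses"] / instr,
        # Range normalization: scale the counter into [0, 1].
        "util": counters["active_cycles"] / max_counts["active_cycles"],
    }
```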
The analytical models can be used for “what/if” analysis by dynamic optimization techniques at runtime. For example, the impact
of increasing (or decreasing) the operating frequency of one of
the PEs can be predicted without actually changing the frequency.
However, it is hard to maintain the accuracy of open loop models
designed offline, as the workload changes at runtime. Therefore,
we propose to employ a closed loop approach that continuously
improves the models by taking advantage of the abundant data
collected at runtime. More precisely, a quantity predicted at time k for the next interval, such as the processing time t_F(k + 1), is compared to the actual value measured in the next interval. Then, the error between the predicted and measured values is used to improve the model using
adaptive algorithms, such as recursive least square estimation [35],
as illustrated in Figure 2. In this way, we maintain the accuracy by
adapting the analytical models to dynamically varying workloads.
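One iteration of this predict-measure-correct loop can be sketched as follows. The linear model and the LMS-style step size are illustrative assumptions for brevity [35]; Section 3 details the RLS variant.

```python
import numpy as np

def closed_loop_step(a, h, measured, lr=0.1):
    """One control interval of the closed-loop scheme: predict the metric
    from feature vector h with coefficients a, then correct a once the
    measurement from the next interval arrives. The step size lr is an
    illustrative LMS parameter, not a value from this work."""
    predicted = a @ h               # what-if prediction for this interval
    error = measured - predicted    # prediction error once measured
    a = a + lr * error * h          # LMS-style coefficient correction
    return a, predicted, error
```

Running this step in every control interval is what keeps the model tracking workload changes instead of drifting like a fixed offline model.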
This approach is broadly applicable to power consumption and
performance modeling. For illustration purposes, we describe its
application to GPU frame processing time and power consumption
in the following two sections.
3 ADAPTIVE GPU PERFORMANCE MODEL
Increased demand for graphics applications in mobile systems has led to tight integration of GPUs in SoCs [26]. Since the maximum achievable frame rate and power consumption depend critically on the GPU frequency, dynamic power management algorithms have to assess the performance sensitivity to the GPU frequency accurately. To address this need, a number of performance models have been proposed [7, 8, 19, 31]. However, these models do not generalize well to a larger set of workloads due to offline training and coarse-grain inputs, such as utilization. In contrast, the online learning framework presented in Section 2 can be used to construct a lightweight adaptive runtime GPU performance model with fine-grain inputs, as described next.

Figure 2: Overview of the proposed online learning framework. Raw performance monitors, such as PE utilizations, frequencies and memory access statistics, are collected through OS and firmware instrumentation. Features generated from this data are used to predict performance metrics, such as processing time, power consumption and temperature. Then, measured values are used to compute prediction errors, which are used to correct the model coefficients.
3.1 Model Templates for Frame Processing Time

The maximum achievable frame rate in interval k is given by the reciprocal of the frame processing time, which is a multivariate function of the GPU frequency and workload. Suppose that the frequency is denoted by f_k, and the workload is characterized by GPU performance counters x_i,k(f_k), 1 ≤ i ≤ N, where N is the number of performance counters. We can measure the frame processing time in the previous interval (k − 1) at runtime by instrumenting the GPU driver. Hence, the frame processing time in the next interval can be written as:

t_F,k(f_k, x_k(f_k)) = t_F,k−1(f_k−1, x_k−1(f_k−1)) + Δt_F,k(f_k, x_k(f_k))   (1)

where Δt_F,k(f_k, x_k(f_k)) is the change in frame processing time due to frequency and workload. Thus, we can find the frame processing time in interval k by measuring the previous value at the end of each interval and modeling Δt_F,k(f_k, x_k(f_k)). The change in frame processing time can be expressed as:

Δt_F,k(f_k, x_k(f_k)) ≈ a_0 t_F,k−1 (f_k−1/f_k − 1) + Σ_{i=1}^{N} a_i Δx_i,k(f_k)   (2)

where the coefficient a_0 denotes the sensitivity to the GPU frequency, and a_1, . . . , a_N are the sensitivities to performance counters x_1, . . . , x_N, respectively. Equation 2 enables us to determine the frequency sensitivity that can be used by dynamic power management algorithms. For example, a_0 ≈ 0 means that changing the frequency does not impact the performance, while a_0 ≈ 1 shows high frequency sensitivity. In this example, the terms t_F,k−1(f_k−1/f_k − 1) and Δx_i,k(f_k) ∀i form the feature set h_k. These features can be either selected online [10], or determined offline by using techniques such as Lasso regression [11].

3.2 Online Learning for Frame Processing Time

Parameters a ∈ R^{N+1} can be learned online using a variety of algorithms, such as recursive least squares (RLS) or least mean squares (LMS) [35]. For example, the update equations for the information form of RLS with exponential forgetting can be written as:

Correlation Matrix Update: R_k = λR_k−1 + h_k h_k^T   (3)
Prediction Error: e_k = Δt_F,k − a_k−1^T h_k   (4)
Correction Equation: a_k = a_k−1 + R_k^−1 h_k e_k   (5)

where λ ∈ (0, 1] is the exponential forgetting factor, R_k is the correlation matrix updated in each iteration, and e_k is the prediction error. The features Δx_i,k(f_k) ∀i used by the model can be selected online [10], or determined offline by solving an ℓ1-regularized cost function [11].

Figure 3: Adaptive frame time prediction result for Nenamark2 (A1) and Mobile-bench (A2) benchmarks running on Minnowboard with two distinct frequencies of 355 MHz and 511 MHz.

To illustrate this approach, we run the Nenamark2 and Mobile-bench [28, 30] applications sequentially on the Intel Minnowboard platform [17]. The predicted frame processing time follows the measured value closely, as shown in Figure 3. We observe a sudden change in frame time when the Nenamark2 benchmark completes and Mobile-bench starts at time t = 15 s. Our adaptive model is able to track the frame processing time accurately even during this transition. Similarly, we increase the frequency from 355 MHz to 511 MHz after Mobile-bench finishes at time t = 29 s. The model quickly adapts its coefficients to track fast changes in frame time, as shown in Figure 3. Overall, the proposed adaptive model yields a mean
absolute percentage error (MAPE) of only 4.1%. Furthermore, we
observe that the Nenamark2 application (A1) has higher frequency
sensitivity (larger a_0 coefficient in Equation 2) than Mobile-bench.
This is also evident in Figure 3, since the reduction in frame time
is larger for Nenamark2, when the GPU frequency increases from
355 MHz to 511 MHz. The frequency sensitivity information can
be utilized by dynamic power management algorithms [22]. For
example, Nenamark2 can run at a higher frequency to avoid performance loss, while the Mobile-bench application can run at a lower
frequency to save power without significant performance penalty.
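The RLS updates in Equations 3–5 can be sketched directly in code. The initialization R_0 = δI and the forgetting-factor value below are common conventions assumed here, not values specified in this work; in the frame-time model, h would hold t_F,k−1(f_k−1/f_k − 1) and the counter deltas, and the target would be Δt_F,k.

```python
import numpy as np

class RLSForgetting:
    """Information-form RLS with exponential forgetting (Equations 3-5).

    The initialization R_0 = delta * I is a standard convention to keep
    R invertible before enough samples arrive; delta and lam are
    illustrative values."""

    def __init__(self, n_features, lam=0.98, delta=1e-2):
        self.lam = lam                       # forgetting factor lambda in (0, 1]
        self.R = delta * np.eye(n_features)  # correlation matrix R_0
        self.a = np.zeros(n_features)        # coefficient vector a_0

    def update(self, h, target):
        """One interval: fold in features h and the measured target."""
        self.R = self.lam * self.R + np.outer(h, h)        # Eq. (3)
        e = target - self.a @ h                            # Eq. (4)
        self.a = self.a + np.linalg.solve(self.R, h) * e   # Eq. (5)
        return e
```

Solving R_k z = h_k instead of explicitly inverting R_k keeps the correction numerically stable; with λ < 1, old samples decay geometrically, which is what lets the coefficients follow the application transitions in Figure 3.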
4 ADAPTIVE POWER CONSUMPTION MODEL
This section illustrates the application of the proposed online learning framework to power consumption. Maintaining an accurate
power consumption model for the entire SoC and different power
domains is critical for two reasons. First, the power consumption
should not exceed the thermal design power to avoid temperature
violations [3, 29]. Second, the total power budget needs to be distributed among different processing elements optimally to avoid
performance bottlenecks [12, 19]. Thus, dynamic resource management techniques should have access to accurate power consumption
models that guide power management decisions.
4.1 Power Consumption Model Templates
The general model template used for power consumption of a core can be written as:

P = P_dynamic + P_leakage = αCV²f + V I_leakage   (6)

where α is the activity factor, C is the switching capacitance, V is the operating voltage, f is the operating frequency, and I_leakage is the leakage current [5, 39]. We can further express the leakage current in terms of the temperature and technology parameters as:

I_leakage = A_s (W/L) (kT/q)² e^{q(V_GS − V_th)/(nkT)} + I_gate = c_1 T² e^{c_2/T} + I_gate   (7)

where A_s is a technology-dependent constant, L and W are the channel length and width, k is the Boltzmann constant, T is the temperature, q is the electron charge, V_GS is the gate-to-source voltage, V_th is the threshold voltage, n is the sub-threshold swing coefficient, and I_gate is the gate leakage [25, 33]. The technology-dependent constants and other constants in Equation 7 can be combined to get parameters c_1 and c_2 [3].

Our goal is to estimate the power consumption at runtime, e.g., in a given control interval k. To achieve this goal, we first characterize the leakage power parameters offline. These parameters can be estimated by performing nonlinear regression using traces obtained at multiple temperatures [3]. Once the c_1, c_2, and I_gate parameters are estimated, the leakage current can be found at runtime by plugging the temperature into Equation 7. Thus, the total power consumption at time k can be written as:

P_k = α_k C V_k² f_k + V_k (c_1 T_k² e^{c_2/T_k} + I_gate)   (8)

Unlike the leakage power, the dynamic power component (the first term) depends heavily on the workload. Therefore, we model the switching activity α_k as a function of the workload, and employ the proposed online learning framework.

Table 1: Performance counters used in this work
GPU Capacity
GPU Utilization
Frame Count
CPU Cycles per Instruction
L2 References per Instruction
L2 Misses per Instruction
Branch Misses per Instruction
Per-Core CPU Utilization

4.2 Online Learning for Dynamic Power

The switching activity α_k is a function of the workload and frequency, similar to the frame processing time modeled in Section 3. Therefore, we first express it as α_k(f_k, x_k(f_k)), where x_i,k(f_k), 1 ≤ i ≤ N, denote the performance counters. For example, the CPU power consumption can be modeled by using the number of instructions retired, clock cycles, page walks, power state residencies, memory bus accesses, level-two cache accesses, operating frequency, and utilization [20, 31].

To align with Section 3, we illustrate the proposed approach for GPU power consumption modeling. The switching activity can be expressed as a function of the GPU frequency and the performance counters x_i,k listed in Table 1 as:

α_k(f_k, x_k(f_k)) ≈ Σ_{i=0}^{N} a_i x_i,k(f_k)   (9)

These counters are observed at runtime by instrumenting the drivers in the OS. The GPU capacity in Table 1 is found as:

GPU_capacity,k = GPU_utilization,k × (f_k / f_gpu,max)   (10)

where f_gpu,max is the maximum possible GPU frequency. We utilize this feature, since it is used by the default GPU drivers.
At runtime, we employ Equation 9 to approximate the switching
activity. Given the switching activity, the power consumption can
be predicted using Equation 8 for any frequency/voltage pair. These
predictions, along with performance models, can be used to make
power management decisions [11].
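Putting Equations 8–10 together, the runtime power prediction can be sketched as below. All numeric parameters and counter names are placeholders, not characterized values; in practice C, c_1, c_2, and I_gate would come from the offline characterization described in Section 4.1, and the activity coefficients from the online RLS updates.

```python
import math

def gpu_power(activity_coeffs, counters, C, V, f, T, c1, c2, I_gate, f_max):
    """Predict GPU power with Equations 8-10. Parameter values and
    counter names are illustrative placeholders for Table 1 entries
    and the offline-characterized constants."""
    # Eq. (10): derive the GPU-capacity feature from utilization and frequency.
    counters = dict(counters)
    counters["gpu_capacity"] = counters["gpu_utilization"] * f / f_max
    # Eq. (9): switching activity as a weighted sum of counter features.
    alpha = sum(activity_coeffs[name] * value for name, value in counters.items())
    # Eq. (8): dynamic power plus temperature-dependent leakage power.
    leakage_current = c1 * T**2 * math.exp(c2 / T) + I_gate
    return alpha * C * V**2 * f + V * leakage_current
```

Because the leakage term needs only the measured temperature, the online learning effort is spent entirely on the workload-dependent activity coefficients, which is what lets the model evaluate any candidate frequency/voltage pair.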
For illustration, we employ the Nexus 6P smartphone [16], which
allows controlling the CPU and GPU frequencies independently. We
run Angry Birds and Rendering Test applications back to back to
evaluate the power consumption model. More precisely, we employ Equations 8 and 9 to predict the power consumption in the next interval.
After the actual power consumption is measured, we compute the
prediction error. Then, we employ the RLS algorithm using Equations 3–5 to maintain the accuracy of the model. We
observe that the proposed adaptive model tracks the GPU power
consumption accurately throughout the 120 s experiment, as shown
in Figure 4. In particular, the power consumption drops sharply
around t = 20 s, when the execution of Angry Birds completes and
Rendering Test starts. The adaptive model successfully tracks the
measured power consumption with only one-sample overshoot at
the time of transition. Similarly, the model captures the variations
within each application, and achieves MAPE of only 5.5%.
Figure 4: Online learning and estimation of the GPU power consumption while running Angry Birds and a custom Rendering Test app on the Nexus 6P smartphone. The MAPE for this workload is 5.5%.

5 POWER–TEMPERATURE DYNAMICS

Increasing power density drives the chip temperature up through thermal resistance and capacitance networks [3, 15]. In turn, higher temperature leads to an exponential increase in the leakage current, as revealed by Equation 7. This relation gives rise to a positive feedback that continues until the power-temperature dynamics reach a stable steady state, or a thermal runaway occurs [2, 23]. If the chip temperature exceeds thermally safe limits, power management drivers reduce the operating frequencies or power down active PEs. However, these actions severely degrade the performance and quality of service (QoS) delivered to the user. Therefore, it is critical to maintain accurate thermal models, in addition to the performance and power models, to predict the impact of power management decisions before committing to them.

5.1 Temperature Model Template

Thermal modeling has recently received significant attention due to its importance in dynamic thermal and power management [1, 5, 15, 36, 37]. The power-temperature dynamics are described as:

C_t dT/dt = −G_t T + P   (11)

where C_t is the thermal capacitance, G_t is the thermal conductance, T is the temperature, and P is the power consumption. Since power management decisions are typically made at discrete time intervals, most studies discretize Equation 11 to obtain:

T[k + 1] = A T[k] + B P[k]   (12)

where T[k] and T[k + 1] are vectors that denote the temperature at each thermal hotspot in time intervals k and k + 1, respectively. The matrix A captures the impact of T[k] on T[k + 1]. Similarly, matrix B models the impact of each power source on each thermal hotspot [3]. These matrices can be characterized by starting with continuous-time models, such as those in HotSpot [15], and discretizing them [36]. However, detailed thermal network models are not available for most modern SoCs. Therefore, recent studies employ system identification methods to directly identify matrices A and B [1, 3]. System identification methods are effective even when detailed floorplan information is not available. After the A and B matrices are characterized, Equation 12 can be used to predict the temperature in the next control interval given the current temperature and power consumption. This information is useful to check if the temperature constraints will be violated in future control intervals. Dynamic resource management techniques can take appropriate actions with the help of these models [3].

5.2 Online Power-Temperature Analysis

Evaluating the longer-term behavior of the power-temperature dynamics is useful to analyze stability and predict potential thermal runaway. However, Equation 12 predicts the temperature only in the next control interval. While it can also be used iteratively to predict the temperature in future intervals, the prediction error increases with the prediction horizon [2, 3]. Hence, Equation 12 cannot be used alone to find the steady-state temperature at runtime.
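The iterative use of Equation 12, and the linear steady state it approaches when the power vector is held fixed and the leakage-temperature feedback is ignored, can be sketched as follows. The matrices below are small illustrative examples, not values identified from a real SoC.

```python
import numpy as np

def predict_temperature(A, B, T0, P, steps):
    """Iterate Equation 12, T[k+1] = A T[k] + B P[k], for a constant
    power vector P. With the spectral radius of A below 1, T approaches
    the linear steady state (I - A)^{-1} B P."""
    T = np.asarray(T0, dtype=float)
    for _ in range(steps):
        T = A @ T + B @ P
    return T

def linear_fixed_point(A, B, P):
    """Closed-form steady state of Eq. 12 for constant P (no leakage
    feedback): solve (I - A) T = B P."""
    return np.linalg.solve(np.eye(A.shape[0]) - A, B @ P)
```

This linear view is only a baseline: because the real P[k] itself grows with temperature through leakage, the steady state must be found from the coupled Equations 13 and 14 described next.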
Our goal is to predict the steady-state fixed point, which is defined as the power consumption and temperature to which the system will converge if the current operating conditions are maintained [2]. The fixed point is a function of the current operating frequency, voltage and temperature, as well as the workload. It can be calculated by solving the system of equations:

T_fix[k] = A T_fix[k] + B P[k]   (13)

where T_fix[k] is the fixed point temperature evaluated at time k. We have the same temperature on both sides of the equation, since the system is in a steady state when the fixed point is reached. The power consumption of each processing element in P[k] can be written as:

P_i[k] = α_i C_i V_i² f_i + V_i (c_1 T_i[k]² e^{c_2/T_i[k]} + I_gate), 1 ≤ i ≤ M   (14)
where M is the number of processing elements in the system.
Due to the nonlinear dependency between the power consumption and temperature, solving this set of equations is challenging at
runtime. Therefore, we constructed an efficient fixed point computation algorithm that can be implemented in power management
drivers [2]. Using this technique, we compute the power consumption and temperature fixed point at time k in less than 100 μs, as
detailed in [2]. Since the workload, frequency and voltage change in
every interval, the fixed point may change dynamically. Therefore,
we repeat this process in each control interval to maintain an up-to-date and accurate prediction. To illustrate this analysis, we run the
CRC32 benchmark on the Odroid-XU3 [13] board that includes a
Samsung Exynos-5422 SoC. When the benchmark starts running,
the board has a low power consumption of about 0.25 W, as shown
in Figure 5. At this time, the fixed point temperature is predicted
as only 50 ◦C, as shown by the red marker with a black border. In the next
few time intervals, the power consumption rises to about 1.8 W, as
shown by the red line in the figure. As a result of this increase, the
fixed point prediction is updated to 85 ◦C. As the benchmark continues to run, the temperature of the system rises due to the increase
in the leakage power consumption. The fixed point is continuously
updated as the workload and temperature vary. These fixed point predictions are shown using red markers with black borders in Figure 5.
We observe that the predictions are clustered around about 88 ◦ C
as the variation in the power consumption is only about 0.2 W. We
also see that the measured temperature at the end of the experiment
reaches about 87 ◦C, which is within 1 ◦C of the predicted fixed
point, thus showing the accuracy of our prediction algorithm.
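A simple way to compute such a fixed point numerically is to iterate between Equations 13 and 14, as sketched below. This plain fixed-point iteration is an illustrative stand-in for the driver-level algorithm of [2], and the example power function used in the usage note is hypothetical.

```python
import numpy as np

def power_temperature_fixed_point(A, B, power_fn, T0, tol=1e-6, max_iter=1000):
    """Solve Equations 13-14 by fixed-point iteration: alternate between
    evaluating the temperature-dependent power P(T) (Eq. 14) and the
    corresponding steady-state temperature (I - A)^{-1} B P (Eq. 13)
    until the temperature stops changing. A non-converging iteration
    signals a thermally unstable operating point."""
    I = np.eye(A.shape[0])
    T = np.asarray(T0, dtype=float)
    for _ in range(max_iter):
        P = power_fn(T)                        # Eq. (14): power at temperature T
        T_new = np.linalg.solve(I - A, B @ P)  # Eq. (13): T_fix = A T_fix + B P
        if np.max(np.abs(T_new - T)) < tol:
            return T_new
        T = T_new
    return T
```

Here `power_fn` would implement Equation 14 with the characterized c_1, c_2, and I_gate; because each iteration is a small linear solve, repeating it every control interval is cheap, consistent with the sub-100-µs budget reported above.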
Figure 5: Measurement, simulation and prediction of temperature while running the CRC32 benchmark on the Odroid-XU3 board that includes a Samsung Exynos-5422 SoC.
6 CONCLUSION
Heterogeneous SoCs have the capability to bridge the gap between the energy efficiency of application-specific hardware and general-purpose processors. In order to achieve this goal, dynamic resource
managers in heterogeneous SoCs need adaptive models for performance, power consumption and temperature of various processing
elements in the SoC. This paper presented a general methodology
for online learning of adaptive performance, power and thermal
models. Specifically, we illustrated online learning of GPU frame
processing time, GPU power consumption and power-temperature
dynamics of a SoC. Experiments on state-of-the-art industrial platforms show that the proposed approach is able to model the metrics
of interest with less than 6% modeling error.
Acknowledgements: This work was supported partially by National Science Foundation (NSF) grant CNS-1526562, Semiconductor Research Corporation (SRC) task 2721.001, and Strategic CAD
Labs, Intel Corporation.
REFERENCES
[1] F. Beneventi, A. Bartolini, A. Tilli, and L. Benini. An Effective Gray-Box Identification Procedure for Multicore Thermal Modeling. IEEE Trans. Comput.,
63(5):1097–1110, 2014.
[2] G. Bhat, S. Gumussoy, and U. Y. Ogras. Power-Temperature Stability and
Safety Analysis for Multiprocessor Systems. ACM Trans. Embedd. Comput. Syst.,
16(5s):145, 2017.
[3] G. Bhat, G. Singla, A. K. Unver, and U. Y. Ogras. Algorithmic Optimization of
Thermal and Power Management for Heterogeneous Mobile Platforms. IEEE
Trans. Very Large Scale Integr. (VLSI) Syst., 26(3):544–557, 2018.
[4] P. Bogdan, R. Marculescu, S. Jain, and R. T. Gavila. An Optimal Control Approach
to Power Management for Multi-Voltage and Frequency Islands Multiprocessor
Platforms under Highly Variable Workloads. In Proc. of the Int. Symp. on Networks
on Chip, pages 35–42, 2012.
[5] D. Brooks, R. P. Dick, R. Joseph, and L. Shang. Power, Thermal, and Reliability
Modeling in Nanometer-Scale Microprocessors. IEEE Micro, 27(3):49–62, 2007.
[6] E. Del Sozzo et al. Workload-aware Power Optimization Strategy for Asymmetric
Multiprocessors. In Proc. of the Conf. on Design, Autom. and Test in Europe, pages
531–534, 2016.
[7] B. Dietrich and S. Chakraborty. Lightweight Graphics Instrumentation for Game
State-Specific Power Management in Android. Multimedia Systems, 20(5):563–578,
2014.
[8] B. Dietrich et al. LMS-based Low-complexity Game Workload Prediction for
DVFS. In Proc. of the Int. Conf. on Comput. Design, pages 417–424, 2010.
[9] Google. Simpleperf. https://developer.android.com/ndk/guides/simpleperf, Accessed 08/18/2018.
[10] U. Gupta, M. Babu, R. Ayoub, M. Kishinevsky, F. Paterna, and U. Y. Ogras. STAFF:
Online Learning with Stabilized Adaptive Forgetting Factor and Feature Selection
Algorithm. In Proc. of Design Autom. Conf., page 6, 2018.
[11] U. Gupta et al. An Online Learning Methodology for Performance Modeling of
Graphics Processors. IEEE Trans. Comput., 2018. DOI:10.1109/TC.2018.2840710.
[12] U. Gupta et al. Dynamic Power Budgeting for Mobile Systems Running Graphics
Workloads. IEEE Trans. Multi-Scale Comput. Syst., 4(1):30–40, 2018.
[13] Hardkernel. Platforms, ODROID-XU3, 2017. http://www.hardkernel.com/main/products/prdt_info.php?g_code=G143452239825, Accessed 08/22/2018.
[14] J. Henkel, J. Teich, S. Wildermann, and H. Amrouch. Dynamic Resource Management for Heterogeneous Many-Cores. In Proc. of the Int. Conf. on Comput.-Aided
Design, Nov. 2018.
[15] W. Huang, S. Ghosh, S. Velusamy, K. Sankaranarayanan, K. Skadron, and M. R.
Stan. HotSpot: A Compact Thermal Modeling Methodology for Early-Stage VLSI
Design. IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 14(5):501–513, 2006.
[16] Huawei. Huawei Nexus 6P Smartphone, 2015. https://www.gsmarena.com/
huawei_nexus_6p-7588.php, Accessed 08/22/2018.
[17] Intel Corp. Minnowboard, 2016. http://www.minnowboard.org/, Accessed
08/22/2018.
[18] B. K. Joardar, J. R. Doppa, P. P. Pande, D. Marculescu, and R. Marculescu. Hybrid
On-Chip Communication Architectures for Heterogeneous Manycore Systems.
In Proc. of the Int. Conf. on Comput.-Aided Design, Nov. 2018.
[19] D. Kadjo, R. Ayoub, M. Kishinevsky, and P. V. Gratz. A Control-Theoretic Approach for Energy Efficient CPU-GPU Subsystem in Mobile Platforms. In Proc.
of the Design Autom. Conf., pages 62:1–62:6, 2015.
[20] D. Kadjo, U. Ogras, R. Ayoub, M. Kishinevsky, and P. Gratz. Towards Platform
Level Power Management in Mobile Systems. In IEEE Int. System-on-Chip Conf.,
pages 146–151, 2014.
[21] J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, and D. Shippy.
Introduction to the Cell multiprocessor. IBM J. of Res. and Develop., 49(4.5):589–
604, 2005.
[22] R. G. Kim et al. Imitation Learning for Dynamic VFI Control in Large-Scale
Manycore Systems. IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 25(9):2458–
2471, 2017.
[23] W. Liao and L. He. Coupled Power and Thermal Simulation with Active Cooling.
In Proc. of the Int. Workshop on Power-Aware Comput. Syst., pages 148–163, 2003.
[24] Linux Kernel. The Interactive Governor. https://android.googlesource.
com/kernel/common/+/a7827a2a60218b25f222b54f77ed38f57aebe08b/
Documentation/cpu-freq/governors.txt, Accessed 08/14/2018, 2016.
[25] Y. Liu, R. P. Dick, L. Shang, and H. Yang. Accurate Temperature-Dependent
Integrated Circuit Leakage Power Estimation is Easy. In Proc. of the Conf. on
Design, Autom. and Test in Europe, pages 1526–1531, 2007.
[26] X. Ma, Z. Deng, M. Dong, and L. Zhong. Characterizing the Performance and
Power Consumption of 3D Mobile Games. Computer, 46(4):76–82, 2013.
[27] T. S. Muthukaruppan, M. Pricopi, V. Venkataramani, T. Mitra, and S. Vishin.
Hierarchical Power Management for Asymmetric Multi-Core in Dark Silicon Era.
In Proc. of the Design Autom. Conf., pages 1–9, 2013.
[28] Nena Innovation. Nenamark2. https://nena.se/nenamark/ Accessed 08/18/2018,
2018.
[29] S. Pagani, H. Khdr, J.-J. Chen, M. Shafique, M. Li, and J. Henkel. Thermal Safe
Power (TSP): Efficient Power Budgeting for Heterogeneous Manycore Systems
in Dark Silicon. IEEE Trans. Comput., 66(1):147–162, 2017.
[30] D. Pandiyan, S.-Y. Lee, and C.-J. Wu. Performance, Energy Characterizations
and Architectural Implications of an Emerging Mobile Platform Benchmark
Suite-Mobilebench. In Int. Symp. Workload Characterization, pages 133–142,
2013.
[31] A. Pathania, A. E. Irimiea, A. Prakash, and T. Mitra. Power-Performance Modelling
of Mobile Gaming Workloads on Heterogeneous MPSoCs. In Proc. of the Design
Autom. Conf., pages 201:1–201:6, 2015.
[32] E. Rotem. Intel Architecture, Code Name Skylake Deep Dive: A New Architecture
to Manage Power Performance and Energy Efficiency. In Intel Dev. Forum, 2015.
[33] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand. Leakage Current Mechanisms and Leakage Reduction Techniques in Deep-Submicrometer CMOS Circuits.
Proc. IEEE, 91(2):305–327, 2003.
[34] M. Saecker and V. Markl. Big Data Analytics on Modern Hardware Architectures:
A Technology Survey. In Eur. Bus. Intell. Summer School, pages 125–149, 2012.
[35] A. H. Sayed. Fundamentals of Adaptive Filtering. John Wiley & Sons, 2003.
[36] S. Sharifi, D. Krishnaswamy, and T. S. Rosing. PROMETHEUS: A Proactive Method
for Thermal Management of Heterogeneous MPSoCs. IEEE Trans. Comput.-Aided
Design Integr. Circuits Syst., pages 1110–1123, 2013.
[37] K. Skadron, M. R. Stan, K. Sankaranarayanan, W. Huang, S. Velusamy, and D. Tarjan. Temperature-Aware Microarchitecture: Modeling and Implementation. ACM
Trans. Archit. Code Optim., 1(1):94–125, 2004.
[38] Statista. Number of Mobile Phone Users Worldwide From 2013 to 2019. https://www.statista.com/statistics/274774/forecast-of-mobile-phone-users-worldwide/, Accessed 08/22/2018.
[39] N. Weste and D. Harris. CMOS VLSI Design: A Circuits and Systems Perspective.
2010.