18.1 Hardware Performance Issues
Introduction
To keep increasing
performance, designers have implemented complex processor features
(pipelining, superscalar execution, simultaneous multithreading (SMT)) and
raised clock frequencies, leading to
higher power demands. Increasing cache memory helps control power
density, as memory transistors consume significantly less power than
logic. As a result, memory now occupies up to half of the chip area,
although much area still goes to logic.
By Amdahl's law, the achievable speedup is limited by the fraction of code
that must execute serially. For instance, with 10% serial code (f = 0.9),
a program running on eight cores would achieve only about a 4.7x speedup.
Additionally, parallel processing
introduces overhead from communication, task distribution, and cache
coherence, which can cause performance to peak and then degrade as
the number of cores increases.
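The 4.7x figure above follows directly from Amdahl's law; a minimal sketch of the calculation (the function name `amdahl_speedup` is my own, not from the text):

```python
def amdahl_speedup(f, n):
    """Amdahl's law: f is the parallelizable fraction, n the core count."""
    return 1.0 / ((1.0 - f) + f / n)

# 10% serial code (f = 0.9) on eight cores:
print(round(amdahl_speedup(0.9, 8), 1))  # 4.7
```

Note that as n grows with f fixed at 0.9, the speedup approaches 10x but never exceeds it, which is why reducing the serial fraction matters more than adding cores.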
However,
many applications can still take advantage of multicore systems.
Database management systems are one example where careful attention
to reducing serial portions of code allows for efficient multicore use.
Servers also benefit from multicore organization, as they handle numerous
independent transactions simultaneously.
In addition to general-purpose server software, several application types
benefit from scaling throughput with additional cores, including:
- Java applications: Java inherently supports threading, and the Java Virtual
Machine (JVM) is designed for multithreading, providing efficient
scheduling and memory management.
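The server workload described above, many independent transactions with no shared state, is the easiest case to scale across cores. A minimal sketch using a thread pool (the function `handle_transaction` and its workload are hypothetical placeholders):

```python
from concurrent.futures import ThreadPoolExecutor

def handle_transaction(tx_id):
    # Hypothetical per-transaction work; each transaction is
    # independent, so no synchronization is needed between them.
    return tx_id * 2

# One worker per core (4 assumed here); transactions run concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_transaction, range(8)))

print(results)
```

Because the transactions never interact, the serial fraction is near zero and throughput scales almost linearly with the number of workers.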
Valve, known for popular games and the Source engine, has enhanced the
Source engine to leverage multithreading for better scalability on
multicore processors from Intel and AMD. This upgrade improves the
performance of games like Half-Life 2.
Valve identified certain subsystems, such as sound mixing, that work well on
a single dedicated processor because they involve little interaction with
other subsystems and have no tight timing constraints. In contrast, scene
rendering can be split across multiple threads and benefits from parallel
processing.
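The two strategies above can be sketched side by side: one whole subsystem pinned to its own thread, while a divisible subsystem is split across a pool of workers. The functions `mix_audio` and `render_slice` are illustrative stand-ins, not Source engine APIs:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def mix_audio(frames):
    # Whole subsystem on one thread (stand-in for sound mixing).
    return sum(frames)

def render_slice(slice_id):
    # One slice of the scene per worker thread (stand-in for rendering).
    return f"slice-{slice_id}"

# Sound mixing runs on its own dedicated thread...
audio_result = []
audio_thread = threading.Thread(
    target=lambda: audio_result.append(mix_audio([1, 2, 3])))
audio_thread.start()

# ...while scene rendering is divided among several worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    rendered = list(pool.map(render_slice, range(4)))

audio_thread.join()
```

Mixing both styles in one engine is what gives a hybrid threading design its name: coarse-grained threads for self-contained subsystems, fine-grained parallelism where work divides naturally.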
1. Figure 18.6a: Early multicore designs, like the ARM11 MPCore, feature
individual L1 caches for each core, while L2 and higher-level caches are
unified.
4. Figure 18.6d: With increasing cache memory, systems like the Intel Core
i7 employ dedicated L1 and L2 caches alongside a shared L3 cache.
1. Reduced Miss Rates: once one core brings data into the shared cache,
subsequent accesses by other cores hit in the cache, improving effective
access speed.
Quiz
3. Which threading approach was found to provide the best scalability for
Valve's Source engine?
- A) Coarse-grained threading
- B) Fine-grained threading
- C) Hybrid threading
- D) Single-threaded execution