Intel Core i7 Processor


INTRODUCTION

A CPU or processor is a description of a class of logic machines that can execute computer programs. A multi-core processor is a processing system composed of two or more independent cores (CPUs). The cores are typically integrated onto a single integrated circuit die (known as a chip multiprocessor or CMP), or they may be integrated onto multiple dies in a single package. Intel Core i7 is a family of three Intel desktop processors, the first processors released using the Intel Nehalem microarchitecture and the successor to the Intel Core 2 family. All three models are quad-core processors; a quad-core processor consists of four cores. In earlier Intel quad-core designs, quad-core technology meant packaging two separate dual-core dies together in one CPU package, where dual-core means a CPU that includes two complete execution cores per physical processor. In that arrangement cores 1 and 2 share one memory cache and cores 3 and 4 share another, and the processor communicates with the rest of the platform using QPI (QuickPath Interconnect). Core i7 processors are 64-bit processors. In computer architecture, 64-bit integers, memory addresses, or other data units are those that are at most 64 bits (8 octets) wide. Also, 64-bit CPU and ALU architectures are those that are based on registers, address buses, or data buses of that size.

To understand the need for Core i7 processors, it helps to compare them with their immediate predecessors. The comparison can be summarized as follows: the Core i7 is a completely new architecture which is much faster and more efficient than the Core 2 Duo. Currently only the Core i7 920, 940 and 965 Extreme Edition versions are available. Of these, the Core i7 920 is available at just $284, which makes it a great buy; it offers better performance than almost all Core 2 Duo processors.

Fig 1.1: Logo of Core i7

FEATURES
1) Core i7 uses an LGA1366 socket
A CPU socket or CPU slot is an electrical component that attaches to a circuit board and is designed to house a CPU. It is a special type of IC socket designed for very high pin counts. A CPU socket provides many functions, including physical support for the CPU, easy replacement and reduced cost, and an electrical interface with both the CPU and the circuit board.

Core i7 uses an LGA1366 socket (Socket B), which is incompatible with previous sockets. LGA stands for Land Grid Array and is used as a physical interface for microprocessors of the Intel Pentium 4, Intel Xeon, Intel Core 2 and AMD Opteron families. Earlier sockets used a PGA (Pin Grid Array). In LGA there are no pins on the chip; instead there are pads of gold-plated copper that touch pins on the motherboard. LGA provides a larger contact point, allowing, for example, higher clock frequencies. It also allows higher pin densities and thus enables a more stable power supply to the chip.

Fig: LGA socket

2)On-die memory controller

The memory is directly connected to the processor and is divided into three channels. Each channel can support one or two DDR3 RAM modules, so motherboards for Core i7 have three or six RAM slots. DDR3 RAM is double data rate type 3 random access memory, a RAM technology used for high-speed storage of the working data of a computer or other digital electronic devices. The primary benefit of DDR3 is its ability to run its I/O bus at four times the speed of the memory cells contained in it, enabling faster bus speeds and higher throughput than earlier memory technologies. There is also a significant reduction in power consumption: DDR3 needs only 1.5 V compared to 1.8 V for DDR2.
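
As a rough illustration of what triple-channel DDR3 means for peak bandwidth, the short sketch below computes the theoretical figure. The module speed (DDR3-1066) and 64-bit channel width are assumptions made for the example, not numbers taken from the text above.

/* Rough peak-bandwidth estimate for a triple-channel DDR3 setup.
 * Assumptions (not from the text): DDR3-1066, i.e. about 1066 MT/s per
 * channel, 64-bit (8-byte) wide channels, all three channels populated. */
#include <stdio.h>

int main(void) {
    const double transfers_per_sec = 1066e6; /* DDR3-1066: ~1066 MT/s  */
    const double bytes_per_transfer = 8.0;   /* 64-bit channel width   */
    const int channels = 3;                  /* triple-channel config  */

    double per_channel = transfers_per_sec * bytes_per_transfer; /* ~8.5 GB/s  */
    double total = per_channel * channels;                       /* ~25.6 GB/s */

    printf("Per-channel peak:    %.1f GB/s\n", per_channel / 1e9);
    printf("Triple-channel peak: %.1f GB/s\n", total / 1e9);
    return 0;
}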

3)QPI (QuickPath Interconnect)

In Core i7 the FSB (Front Side Bus) is replaced by QPI. The FSB is the bus that carries data between the CPU and the rest of the hardware. The FSB had some advantages: it offered high flexibility and low cost, and several CPUs could share a single bus. But the FSB lacked speed; its major drawback was that the bus was too slow, which led to the advent of QPI. QPI is a point-to-point interconnect that carries data between the CPU and the rest of the hardware. QPI has a speed of 25.6 GB/s, roughly double that of the fastest FSB. QPI connects the CPU to the rest of the hardware by means of a chipset, which is a connection point for all the other buses in the system.
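
To see where the 25.6 GB/s figure and the "roughly double" comparison come from, here is a back-of-the-envelope calculation. The 6.4 GT/s QPI transfer rate with 2 data bytes per direction, and the 1600 MT/s FSB used for comparison, are commonly quoted figures assumed for this sketch rather than numbers stated in the text.

#include <stdio.h>

int main(void) {
    /* QPI: 6.4 GT/s, 16 data bits (2 bytes) per direction, bidirectional */
    double qpi = 6.4e9 * 2.0 * 2;          /* = 25.6 GB/s total        */
    /* Fastest FSB for comparison: 1600 MT/s on a 64-bit (8-byte) bus  */
    double fsb = 1600e6 * 8.0;             /* = 12.8 GB/s              */

    printf("QPI: %.1f GB/s, FSB: %.1f GB/s, ratio: %.1fx\n",
           qpi / 1e9, fsb / 1e9, qpi / fsb);
    return 0;
}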

A chipset is a group of ICs designed to work together and treated as a single product. In PCs using Intel Pentium processors, the term refers to a specific pair of chips on the motherboard: the northbridge and the southbridge. The northbridge links the CPU to very high-speed devices, especially main memory and graphics controllers, while the southbridge connects to lower-speed peripheral devices such as PCI.

A chipset is usually designed to work with a specific family of microprocessors. Because it controls the communication between the processor and external devices, it plays a vital role in determining system performance. The northbridge typically handles communication among the CPU, RAM, AGP, PCI and the southbridge. Some northbridges also contain an integrated video controller, known as the GMCH (Graphics and Memory Controller Hub) in Intel systems. The southbridge implements the slower capabilities of the motherboard and contains some integrated on-chip peripherals, such as Ethernet, USB and audio devices.

4)Hyper Threading

A thread is an operating system abstraction of an activity. The central aim of having multiple threads is to maximize the degree of concurrent execution between operations. Hyper-threading (officially Hyper-Threading Technology or HTT) is an Intel-proprietary technology used to improve parallelization of computations (doing multiple tasks at once) performed on PC microprocessors. A processor core with hyper-threading enabled is treated by the operating system as two processors instead of one: only one core is physically present, but the operating system sees two virtual processors and shares the workload between them. The Core i7 has four cores, and each core can process up to two threads simultaneously, so the processor appears to the OS as eight cores. The technology exploits the fact that, while a core is running, certain circuits inside it are idle and can be put to use. It is also called SMT (Simultaneous Multi-Threading) and increases processing throughput. Hyper-threading requires only that the operating system support multiple processors, but Intel recommends disabling HT when using operating systems that have not been optimized for the technology.
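
A quick way to observe hyper-threading from software is to ask the operating system how many logical processors it exposes. The minimal sketch below is Linux-specific and assumes the POSIX sysconf call is available; on a quad-core Core i7 with HT enabled it would typically report eight.

/* Minimal sketch (Linux-specific): ask the OS how many logical processors
 * it sees. On a quad-core Core i7 with Hyper-Threading enabled this is
 * typically 8; with HT disabled it is 4. */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    long logical = sysconf(_SC_NPROCESSORS_ONLN); /* logical (virtual) CPUs online */
    if (logical < 0) {
        perror("sysconf");
        return 1;
    }
    printf("Logical processors visible to the OS: %ld\n", logical);
    return 0;
}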

5)Memory Cache

A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access memory. The cache is a smaller, faster memory which stores copies of data from the most frequently used main-memory locations. On the cache side, Intel uses an individual L2 cache for each core and a shared L3 cache. Each L2 cache is 256 KB and the L3 cache is 8 MB. The L1 cache remains the same as in Core 2 Duo (64 KB per core: 32 KB for instructions and 32 KB for data). L1, L2 and L3 form a multi-level cache hierarchy. Multi-level caches generally operate by checking the smallest Level 1 (L1) cache first; if it hits, the processor proceeds at high speed. If the smaller cache misses, the next larger cache (L2) is checked, and so on, before external memory is accessed.

Core 2 Duo processors have only one L2 cache, shared between the two CPU cores, and Intel's earlier quad-core CPUs have two L2 caches, each shared by a group of two cores. The Core i7, in contrast, gives each core its own L2 cache, and all four cores share the L3 cache.
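
The lookup order described above can be sketched in code. The following is purely illustrative and is not Intel's implementation: the hit tests and cycle counts are invented placeholders, and only the "check the smallest level first, fall through to the next level on a miss" logic matters.

/* Illustrative sketch (not Intel's implementation) of multi-level cache
 * lookup order. Latencies and hit tests are made-up placeholders. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    const char *name;
    int latency_cycles;                   /* placeholder access cost   */
    bool (*contains)(unsigned long addr); /* hypothetical hit test     */
} cache_level_t;

static bool in_l1(unsigned long a) { return (a & 0xff) < 16;  }  /* dummy */
static bool in_l2(unsigned long a) { return (a & 0xff) < 64;  }  /* dummy */
static bool in_l3(unsigned long a) { return (a & 0xff) < 192; }  /* dummy */

int access_cost(unsigned long addr) {
    cache_level_t levels[] = {
        { "L1", 4,  in_l1 },
        { "L2", 11, in_l2 },
        { "L3", 39, in_l3 },
    };
    int total = 0;
    for (int i = 0; i < 3; i++) {
        total += levels[i].latency_cycles;   /* pay the lookup at this level */
        if (levels[i].contains(addr)) {
            printf("hit in %s after %d cycles\n", levels[i].name, total);
            return total;
        }
    }
    printf("miss in all caches, going to DRAM\n");
    return total + 200;                      /* placeholder DRAM latency */
}

int main(void) { access_cost(0x12345678UL); return 0; }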

6)Turbo Mode

The embedded power control unit also adds power sensors for each core, so the CPU knows how much power each core is consuming and how much heat is being dissipated. This allowed the addition of a Turbo Mode. Turbo Mode lets the CPU increase the clock rate of the active core(s). The idea itself is not new, but in the previous incarnation of this technology it could only be used when the other processing cores were idle. The new mode is a closed-loop system: the CPU constantly monitors its temperature and power consumption and overclocks the active cores until it reaches its maximum allowed TDP, based on the cooling system in use, which is configurable in the motherboard setup. For example, if you declare that your CPU cooler can dissipate 130 W, the CPU will increase (or reduce) its clock so that the power it currently dissipates matches what the cooler can handle. So if you replace the stock cooler with a better one, you have to enter the motherboard setup and configure the new cooler's TDP (the maximum amount of thermal power it can dissipate) to let Turbo Mode raise the CPU clock even further. Note that the CPU does not necessarily have to shut down unused cores to enable Turbo Mode, but since this dynamic overclocking technique is based on how much power headroom the current cooler leaves, shutting down unused cores reduces consumption and dissipation and therefore allows a higher overclock. Turbo Mode only affects the CPU cores; the memory controller and the memory itself are not affected.

7)Power Requirements

Transistors inside the CPU work as switches with two possible states: conductive (saturation mode), working as a closed switch, and non-conductive (cut-off mode), working as an open switch. The problem is that in the non-conductive state they should not allow any current to flow, yet a small amount still does. This current is called leakage, and if you add up all the leakage currents you get a significant amount of current (and thus power) being wasted and unnecessary heat being generated. One of the challenges in CPU design in recent years has been reducing leakage current.

Basically, the CPU can now have different voltages and frequencies for each core, for the units outside the cores, for the memory controller, for the cache and for the I/O units. On previous CPUs all cores had to run at the same clock rate, but on Nehalem-based CPUs each core can be programmed to run at a different clock rate to save power. The embedded power control unit can now switch off any of the CPU cores, a feature not available even on mobile Core 2 CPUs. In fact, the CPU can put any core into the C6 (deep power down) state independently of the state the remaining cores are in. This saves energy when the PC is running normally but one or more cores are idle and can be shut down.
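
To picture the closed loop described in the Turbo Mode section, here is an illustrative sketch. It is not Intel's algorithm: the power model, multiplier range and 130 W TDP are invented for the example. It only shows the idea of raising the clock multiplier of the active cores while the estimated package power stays under the configured TDP, and why shutting down idle cores leaves more headroom.

/* Illustrative closed-loop sketch of the Turbo Mode idea described above.
 * NOT Intel's algorithm: the power model, step size and TDP value are
 * invented; it only shows "raise the multiplier while estimated package
 * power stays under the configured TDP". */
#include <stdio.h>

/* Hypothetical power model: power grows with multiplier and active cores. */
static double estimate_power_w(int multiplier, int active_cores) {
    return 20.0 + 1.1 * multiplier * active_cores;
}

int pick_turbo_multiplier(double tdp_w, int active_cores, int base_mult, int max_mult) {
    int mult = base_mult;
    while (mult < max_mult &&
           estimate_power_w(mult + 1, active_cores) <= tdp_w) {
        mult++;  /* still below the configured TDP: allow one more speed step */
    }
    return mult;
}

int main(void) {
    const double tdp = 130.0;   /* cooler capacity configured in the motherboard setup */
    for (int cores = 1; cores <= 4; cores++)
        printf("%d active core(s): multiplier x%d\n",
               cores, pick_turbo_multiplier(tdp, cores, 20, 30));
    return 0;
}

Running the sketch shows the multiplier rising higher when fewer cores are active, which is the behaviour the text describes.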

SYSTEM ARCHITECTURE

Intel X58 is an Intel chip designed to connect Intel processors that have the Intel QuickPath Interconnect interface to peripheral devices. The processors that work with the Intel X58 use the Intel Nehalem microarchitecture and have an integrated memory controller, so the X58 does not need a memory interface. It communicates with the processor via QPI, with the southbridge via DMI, and with peripherals via PCI Express. The ICH10/ICH10R is an Intel chip used as the southbridge on these motherboards; like any southbridge, the ICH connects and controls peripheral devices. The BIOS (Basic Input/Output System, also known as the System BIOS) is boot firmware, designed to be the first code run by a PC when it is powered on.

INTEL NEHALEM ARCHITECTURE

FEATURES
1) Macro Fusion

Programs are written using x86 instructions (also called macro-ops or simply instructions), which are not directly understandable by the CPU execution units; they must first be decoded into microinstructions (also called micro-ops). Macro-fusion is the ability to translate two x86 instructions into just one microinstruction to be executed inside the CPU, improving performance and lowering power consumption, since the CPU executes one microinstruction instead of two. This scheme, however, only works for compare and conditional-branch instructions (i.e. CMP or TEST followed by a Jcc instruction). The Nehalem microarchitecture improves macro-fusion in two ways. First, it adds support for several branching instructions that could not be fused on Core 2 CPUs. Second, on Nehalem-based CPUs macro-fusion works in both 32- and 64-bit modes, while on Core 2 CPUs it only works when the CPU is in 32-bit mode.

2)Loop Stream Detector

The Loop Stream Detector is basically a small 18-instruction cache between the fetch and decode units of the CPU. When the CPU is running a loop (a part of a program that is repeated several times), it does not need to fetch the required instructions again from the L1 instruction cache: they are already close to the decode unit. In addition, the CPU turns off the fetch and branch prediction units while running a detected loop, saving some power. On Nehalem-based CPUs this small cache has been moved to after the decode unit, so instead of holding x86 instructions as on Core 2 CPUs it holds micro-ops (up to 28). This improves performance because, when running a loop, the CPU no longer needs to decode the instructions in the loop: they are already decoded inside this small cache. The CPU can also turn off the decode unit, in addition to the fetch and branch prediction units, when running a detected loop, saving even more power. The Nehalem architecture also adds one extra dispatch port and now has 12 execution units, so CPUs based on this architecture can have more microinstructions executing at the same time than previous CPUs.
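
To make the CMP-plus-Jcc pattern concrete, the small function below contains the kind of compare-and-branch conditions that compilers typically lower to a cmp/jcc pair, which is exactly the pair macro-fusion can combine into a single micro-op. The exact instructions emitted depend on the compiler, so treat this as an illustration rather than a guarantee.

/* Sketch of the code pattern macro-fusion targets: compare followed by a
 * conditional branch. The compiler usually turns each condition below into
 * a CMP (or TEST) plus a Jcc instruction. */
#include <stdio.h>

int sum_until(const int *data, int n, int limit) {
    int sum = 0;
    for (int i = 0; i < n; i++) {   /* i < n: compare + branch, fusion candidate */
        sum += data[i];
        if (sum >= limit)           /* another compare-and-branch pair */
            break;
    }
    return sum;
}

int main(void) {
    int data[] = {3, 7, 2, 9, 4};
    printf("%d\n", sum_until(data, 5, 15));
    return 0;
}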

3)Translation Lookaside Buffer And Branch Target Buffer

The Nehalem microarchitecture also adds two extra buffers: a second 512-entry Translation Lookaside Buffer (TLB) and a second Branch Target Buffer (BTB). The addition of these buffers increases CPU performance. The TLB is a table used by the virtual memory circuitry to translate between virtual and physical addresses. Virtual memory is a technique where the CPU simulates more RAM using a file on the hard drive, so the computer can continue operating even when there is not enough RAM available (data is taken from RAM, stored in this swap file, and the freed memory is reused). Branch prediction is a circuit that tries to guess the next steps of a program in advance, loading into the CPU the instructions it thinks will be needed next. When it guesses correctly, the CPU does not waste time loading these instructions from memory, as they are already inside the CPU. Increasing the size of the BTB (or adding a second one, as in Nehalem-based CPUs) allows this circuit to load even more instructions in advance, improving CPU performance.

4)SSE4 Instruction Set

Intel SSE4 consists of 54 instructions. A subset of 47 instructions is referred to as SSE4.1. SSE4.2, a second subset consisting of the remaining 7 instructions, is first available in the Core i7. SSE4 also contains instructions that execute operations which are not specific to multimedia applications.

Some of the instructions in SSE4.1 are:

PMULDQ - Packed signed multiplication of two of the four packed 32-bit integers (the 1st and 3rd of each set of four), giving two packed 64-bit results.
PMULLD - Packed signed multiplication of four packed 32-bit integers, giving four packed 32-bit results.

Some of the instructions in SSE4.2 are:

PCMPESTRI - Packed Compare Explicit Length Strings, Return Index
PCMPESTRM - Packed Compare Explicit Length Strings, Return Mask
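
For reference, the two SSE4.1 multiply instructions listed above are reachable from C through compiler intrinsics: _mm_mullo_epi32 maps to PMULLD and _mm_mul_epi32 maps to PMULDQ. The snippet below is a small sketch and assumes a compiler with SSE4.1 support enabled (e.g. gcc -msse4.1).

/* PMULLD / PMULDQ via SSE4.1 intrinsics. Compile with SSE4.1 enabled. */
#include <stdio.h>
#include <smmintrin.h>   /* SSE4.1 intrinsics */

int main(void) {
    __m128i a = _mm_set_epi32(4, 3, 2, 1);       /* four packed 32-bit ints */
    __m128i b = _mm_set_epi32(40, 30, 20, 10);

    /* PMULLD: 4 x 32-bit multiplies, keeping the low 32 bits of each product */
    __m128i lo = _mm_mullo_epi32(a, b);

    /* PMULDQ: multiply the 1st and 3rd 32-bit elements, giving 2 x 64-bit results */
    __m128i wide = _mm_mul_epi32(a, b);

    int r32[4]; long long r64[2];
    _mm_storeu_si128((__m128i *)r32, lo);
    _mm_storeu_si128((__m128i *)r64, wide);

    printf("PMULLD: %d %d %d %d\n", r32[0], r32[1], r32[2], r32[3]);
    printf("PMULDQ: %lld %lld\n", r64[0], r64[1]);
    return 0;
}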

WORKING
The instruction decoder has three decoder units that can each decode one simple instruction per cycle. A fourth decoder unit can decode one instruction every cycle, either a simple instruction or a complex instruction made up of several micro-ops. Instructions made up of more than four micro-ops are delivered from the MSROM. Up to four micro-ops can be delivered each cycle to the instruction decoder queue (IDQ). The IDQ delivers the micro-op stream to the allocation/renaming stage of the pipeline. The out-of-order engine supports up to 128 micro-ops in flight. Each micro-op must be allocated the following resources: an entry in the re-order buffer (ROB), an entry in the reservation station (RS), and a load/store buffer if a memory access is required. The allocator also renames the register file entry of each micro-op in flight. The input data associated with a micro-op are generally read either from the ROB or from the retired register file. The RS can dispatch up to six micro-ops in one cycle if they are ready to execute. The RS dispatches a micro-op through an issue port to a specific execution cluster; each cluster may contain a collection of integer/FP/SIMD execution units. The result from the execution unit executing a micro-op is written back to the register file, or forwarded through a bypass network to a micro-op in flight that needs the result. The Intel microarchitecture (Nehalem) can support a write-back throughput of one register file write per cycle per port. The bypass network consists of three domains: integer, FP and SIMD. Forwarding a result within the same bypass domain from a producer micro-op to a consumer micro-op is done efficiently in hardware without delay; forwarding a result across different bypass domains may be subject to additional bypass delays. These bypass delays may be visible to software in addition to the latency and throughput characteristics of the individual execution units. The Intel microarchitecture (Nehalem) contains an instruction cache, a first-level data cache and a second-level unified cache in each core. Each physical processor may contain several processor cores and a shared collection of subsystems referred to as the uncore. Specifically, in the Intel Core i7 processor the uncore provides a unified third-level cache shared by all cores in the physical processor, the Intel QuickPath Interconnect links and the associated logic. The L1 and L2 caches are write-back and non-inclusive. The shared L3 cache is write-back and inclusive, such that a cache line that exists in either the L1 data cache, the L1 instruction cache or the unified L2 cache also exists in L3. The L3 is designed to use this inclusive property to minimize snoop traffic between processor cores. The latency of an L3 access may vary as a function of the frequency ratio between the processor and the uncore sub-system.

The Intel microarchitecture (Nehalem) implements two levels of translation lookaside buffer (TLB). The first level consists of separate TLBs for data and code. The DTLB0 handles address translation for data accesses; it provides 64 entries for 4 KB pages and 32 entries for large pages. The ITLB provides 64 entries (per thread) for 4 KB pages and 7 entries (per thread) for large pages. The second-level TLB (STLB) handles both code and data accesses for 4 KB pages; it services 4 KB page translations that miss in the DTLB0 or ITLB. All entries are 4-way associative. The entry counts are:

STLB for 4 KB pages: 512 entries (services both data and instruction lookups)
DTLB0 for large pages: 32 entries
DTLB0 for 4 KB pages: 64 entries

A DTLB0 miss that hits in the STLB causes a penalty of 7 cycles. Software only pays this penalty if the DTLB0 is used in some dispatch cases. The delays associated with a miss to the STLB and the PMH are largely non-blocking.
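
As a quick worked example of what the 4 KB page entries above imply: a virtual address splits into a page number and a 12-bit offset, and the number of TLB entries multiplied by the page size gives the amount of memory the TLB can cover without a miss. The address used below is arbitrary.

/* Worked example for 4 KB pages: address split and TLB "reach". */
#include <stdio.h>

#define PAGE_SHIFT 12                      /* 4 KB page -> 12 offset bits */
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)    /* 4096 bytes */

int main(void) {
    unsigned long long vaddr = 0x7f1234567abcULL;        /* arbitrary virtual address */
    unsigned long long vpn    = vaddr >> PAGE_SHIFT;     /* virtual page number */
    unsigned long long offset = vaddr & (PAGE_SIZE - 1); /* offset within the page */

    printf("page number = 0x%llx, offset = 0x%llx\n", vpn, offset);

    /* 64 DTLB0 entries x 4 KB pages: data reachable without a first-level miss */
    printf("DTLB0 reach: %llu KB\n", 64 * PAGE_SIZE / 1024);   /* 256 KB  */
    /* 512 STLB entries x 4 KB pages */
    printf("STLB reach:  %llu KB\n", 512 * PAGE_SIZE / 1024);  /* 2048 KB */
    return 0;
}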

ADVANTAGES
The Intel Core i7 processor is said to be the fastest processor on the planet, so naturally its most important advantage is speed. Several factors contribute to this:

1) Hyper-Threading - the simultaneous execution of many threads, which improves parallel execution.
2) On-die memory controller - the DDR3 memory can run its I/O bus at four times the speed of the memory cells contained in it.
3) L3 cache - the L3 cache is designed to use its inclusive nature to minimize snoop traffic between processor cores.
4) QPI (QuickPath Interconnect) - the QPI has a speed of 25.6 GB/s, roughly double that of the FSB (Front Side Bus).
5) Macro-fusion - the ability to translate two x86 instructions into just one microinstruction to be executed inside the CPU, improving performance and lowering power consumption, since only one microinstruction is executed instead of two.

Similarly, the Loop Stream Detector, TLB and BTB also help to increase speed.

Another important advantage is the reduction in power consumption. The Intel Core i7 has a power control unit inside the CPU in order to better manage power, which also allows better cooling. The DDR3 RAM likewise uses only 1.5 V, compared to 1.8 V for DDR2.

DISADVANTAGES

As with anything else, the Core i7 processor has its demerits. The major disadvantages of the Core i7 are the following: it requires newer motherboards, it is sensitive to high voltages, and it does not support ECC memory. ECC stands for Error Correcting Code (or Circuit); ECC memory not only identifies errors but also corrects them.

CONCLUSION
In tests performed on leaked hardware, the Core i7 outperformed the then-fastest Core 2 Extreme processor. It has the advantages of high performance, high overclockability, quiet cooling and power efficiency. Its disadvantages include the requirement for newer motherboards, sensitivity to higher voltages and the lack of ECC memory support. The technology keeps improving as the need for faster and higher-end applications increases. An 8-core version is expected to be released soon.

References: Wikipedia, Intel
