Computer Gaming Hardware

Reviews, Ratings, and More ™

The new K10 architecture is based on the K8 (a.k.a. AMD64) architecture, with some enhancements. We therefore recommend that you read our Inside AMD64 Architecture tutorial before continuing with this one. By the way, AMD never released an architecture called K9; it jumped straight from K8 to K10.
As a reminder, the memory cache is a high-speed memory (static RAM, or SRAM) embedded inside the CPU and used to store data that the CPU may need. If the data required by the CPU isn’t located in the cache, the CPU must go all the way to the main RAM to get it, which slows it down, because the RAM is accessed at the CPU external clock rate. For example, on a 3 GHz AMD CPU, the memory cache is accessed at 3 GHz, but the RAM is accessed at 800 MHz (if you are using DDR2-800 memory modules) or less.
The more data the CPU fetches from the RAM per clock cycle, the faster the system will be. As we explained on the previous page, the CPU is a lot faster than the RAM, so the fewer times it needs to fetch data from memory, the better. Loading a lot of data at once reduces the number of these fetches.
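To put the idea of loading more data at once into numbers, here is a minimal sketch of the usual peak-bandwidth arithmetic. It assumes 64-bit memory channels and the DDR2-800 example used above (800 million transfers per second); the function and constant names are ours, chosen only for illustration, and the results are theoretical peaks, not measured speeds.

```python
# Peak theoretical bandwidth = bytes moved per transfer x transfers per second.
# Assumptions (not from a datasheet): 64-bit channels, DDR2-800 modules.

CHANNEL_WIDTH_BITS = 64
TRANSFERS_PER_SECOND = 800_000_000  # DDR2-800


def peak_bandwidth_mb_s(channels: int) -> float:
    """Return the peak theoretical bandwidth in MB/s (1 MB = 1,000,000 bytes)."""
    bytes_per_transfer = channels * CHANNEL_WIDTH_BITS // 8
    return bytes_per_transfer * TRANSFERS_PER_SECOND / 1_000_000


if __name__ == "__main__":
    print(peak_bandwidth_mb_s(1))  # one 64-bit channel: 6400.0
    print(peak_bandwidth_mb_s(2))  # two channels read in parallel: 12800.0
```

Doubling the number of channels doubles the amount of data delivered per transfer, which is why a second channel helps only when the extra data is actually useful to the CPU.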
The problem with dual-channel technology is that the second 64-bit piece of data, which is loaded together with the data originally requested, is necessarily the one stored at the following address. For example, if the CPU asks for data A, stored at address 1, the memory controller will automatically load both data A and data B, which is stored at address 2.
If the CPU has no use for data B, this second load is completely wasted, because the memory controller cannot use the parallel load to read data stored at any address other than the following one.
The memory controller used in the K10 architecture allows the CPU to load data stored at an address other than the next one. This independence increases CPU performance by not wasting memory loads. Figure 5 illustrates this feature with a CPU that wants to load data A and F. On the K8 architecture, illustrated on the left side, two data fetches are needed (and two of the pieces of data loaded are completely useless), while on the K10 architecture only one data fetch is needed.
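The difference between the two controllers can be sketched with a toy model that simply counts fetches. This is a simplification made for illustration, not a description of the real hardware: addresses are plain integers (A = 1, B = 2, ... F = 6), every fetch returns exactly two pieces of data, and the function names are hypothetical.

```python
def fetches_paired(wanted: list[int]) -> int:
    """K8-style sketch: each fetch returns an address plus the one right after it."""
    remaining = sorted(set(wanted))
    fetches = 0
    while remaining:
        first = remaining.pop(0)
        # The neighbouring address comes along with the fetch, but it is
        # only useful if the CPU actually wanted it.
        if remaining and remaining[0] == first + 1:
            remaining.pop(0)
        fetches += 1
    return fetches


def fetches_independent(wanted: list[int]) -> int:
    """K10-style sketch: each fetch returns any two addresses."""
    unique = len(set(wanted))
    return (unique + 1) // 2  # ceiling of unique / 2


if __name__ == "__main__":
    a, f = 1, 6  # data A and data F
    print(fetches_paired([a, f]))       # 2 fetches, B and the neighbour of F are wasted
    print(fetches_independent([a, f]))  # 1 fetch
```

When the wanted data happens to sit at consecutive addresses, both models need the same number of fetches; the independent controller only pays off when the addresses are scattered.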
Intel says that this shared architecture is better because, with the separate cache approach, at some point one core may run out of cache while the other has unused space in its own L2 memory cache. When this happens, the first core must fetch data from the main RAM, even though there was empty space in the second core’s L2 memory cache that could have been used to store the data and spare the first core from accessing the main RAM. So on a Core 2 Duo processor with a 4 MB L2 memory cache, one core may be using 3.5 MB while the other uses 512 KB (0.5 MB), in contrast to the fixed 50%-50% division used on other dual-core CPUs.
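The 3.5 MB / 512 KB example can be checked with a small sketch. It is a deliberately crude model: each core has a fixed amount of data it would like to keep in L2, and anything that does not fit is counted as overflow that has to come from the main RAM. Real caches work with lines, associativity, and replacement policies, none of which are modelled here, and the function name is hypothetical.

```python
def overflow_kb(demands_kb: list[int], total_cache_kb: int, shared: bool) -> int:
    """Return how many KB of the cores' data do not fit in the L2 cache."""
    if shared:
        # One pool: data overflows only if the cores together need more
        # than the whole cache.
        return max(0, sum(demands_kb) - total_cache_kb)
    # Fixed split: each core gets an equal slice and cannot borrow
    # unused space from the other core.
    per_core_kb = total_cache_kb // len(demands_kb)
    return sum(max(0, demand - per_core_kb) for demand in demands_kb)


if __name__ == "__main__":
    demands = [3584, 512]  # 3.5 MB and 512 KB, as in the example above
    print(overflow_kb(demands, 4096, shared=True))   # 0
    print(overflow_kb(demands, 4096, shared=False))  # 1536
```

With a fixed 50%-50% split, the first core is 1.5 MB short even though the second core leaves 1.5 MB of its own half unused, which is exactly the situation Intel's argument describes.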
On the other hand, current quad-core Intel CPUs such as the Core 2 Extreme QX and Core 2 Quad use two dual-core chips, meaning that this sharing only occurs between cores 1 and 2 and between cores 3 and 4. In the future, Intel plans to launch quad-core CPUs using a single chip. When this happens, the L2 cache will be shared among all four cores.

Published in: Hardware, I/O devices, processors
