Pentium 4 and the newer Celeron processors use Intel’s seventh-generation architecture, also called NetBurst. Figure 1 shows its overall layout. Don’t get scared: we will explain in detail what this diagram is about.
• The data path between the L2 memory cache (“L2 cache and control” on Figure 1) and the L1 data cache (“L1 D-Cache and D-TLB” on Figure 1) is 256 bits wide. On previous Intel processors this data path was only 64 bits wide, so this transfer can be up to four times faster than on previous-generation processors running at the same clock. The data path between the L2 memory cache and the pre-fetch unit (“BTB & I-TLB” on Figure 1), however, is still 64 bits wide.
• The L1 instruction cache was relocated. Instead of sitting before the fetch unit, it now sits after the decode unit and has a new name, “Trace Cache”. The trace cache can hold up to 12 K microinstructions. Since each microinstruction is 100 bits wide, the trace cache holds 150 KB (12 K x 100 / 8). One of the most common mistakes people make when commenting on the Pentium 4 architecture is saying that Pentium 4 has no instruction cache at all. That is not true: it is there, but with a different name and in a different location.
• Pentium 4 has 128 internal registers, while Intel’s 6th generation processors (like Pentium II and Pentium III) had only 40. These registers are in the Register Renaming Unit (a.k.a. RAT, Register Alias Table, shown as “Rename/Alloc” on Figure 1).
• Pentium 4 has five execution units working in parallel, plus two units for loading data from and storing data to RAM.
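The arithmetic behind two of the figures in the list above can be checked with a short Python sketch. The function and constant names are ours, chosen only for illustration, and we assume that "12 K" means 12 x 1024 entries and that 1 KB means 1024 bytes.

```python
# Figures quoted in the text; the names below are illustrative only.
TRACE_CACHE_ENTRIES = 12 * 1024   # "12 K" microinstructions (assuming K = 1024)
MICROINSTRUCTION_BITS = 100       # width of one microinstruction, in bits


def trace_cache_size_kb(entries=TRACE_CACHE_ENTRIES, bits=MICROINSTRUCTION_BITS):
    """Return the trace cache capacity in KB (assuming 1 KB = 1024 bytes)."""
    total_bytes = entries * bits / 8   # 8 bits per byte
    return total_bytes / 1024


def data_path_ratio(new_width_bits=256, old_width_bits=64):
    """Return how many times wider the new data path is than the old one."""
    return new_width_bits / old_width_bits


print(trace_cache_size_kb())   # 150.0
print(data_path_ratio())       # 4.0
```

The second value is only an upper bound on the speed-up: it compares the widths of the two data paths at the same clock and says nothing about how often the full width is actually used.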
A pipeline is the list of all stages a given instruction must go through in order to be fully executed. On 6th generation Intel processors, like Pentium III, the pipeline had 11 stages. Pentium 4 has 20 stages! So, on a Pentium 4 processor a given instruction takes many more clock cycles to be fully executed than on a Pentium III, for instance! If you take the newer 90 nm Pentium 4 processors, codenamed “Prescott”, the case is even worse because they use a 31-stage pipeline! Holy cow!
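To put the stage counts in perspective, here is a rough Python sketch that compares how long a single instruction would take to cross each pipeline. It is a simplification: we assume each stage takes exactly one clock cycle and nothing stalls, and the 2.0 GHz clock in the example is a hypothetical value picked only to make the comparison easy, not a figure from any particular processor.

```python
# Stage counts quoted in the text.
PIPELINE_STAGES = {
    "Pentium III": 11,
    "Pentium 4": 20,
    "Pentium 4 (Prescott)": 31,
}


def pipeline_latency_ns(stages, clock_ghz):
    """Time for one instruction to cross all stages, assuming one cycle per stage."""
    cycle_ns = 1.0 / clock_ghz   # duration of one clock cycle in nanoseconds
    return stages * cycle_ns


def relative_latency(stages, baseline_stages=PIPELINE_STAGES["Pentium III"]):
    """Latency relative to the baseline pipeline when both run at the same clock."""
    return stages / baseline_stages


# Hypothetical example: every processor running at the same 2.0 GHz clock.
for name, stages in PIPELINE_STAGES.items():
    latency = pipeline_latency_ns(stages, clock_ghz=2.0)
    print(f"{name}: {stages} stages, {latency:.1f} ns, "
          f"{relative_latency(stages):.2f}x Pentium III")
```

At the same clock, the ratio depends only on the number of stages, which is why a longer pipeline needs a higher clock just to keep the same per-instruction latency.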
Because of that, Intel has already announced that its 8th generation processors will use the Pentium M architecture, which is based on Intel’s 6th generation architecture (Pentium III architecture) and not on the NetBurst (Pentium 4) architecture. This architecture, called Core, can be studied in our Inside Core Microarchitecture tutorial.
Actually, that’s why ports 0 and 1 have more than one execution unit attached. If you pay attention, Intel put one fast unit on the same port together with at least one complex (and slow) unit. So, while the complex unit is busy processing data, the other unit can keep receiving microinstructions from its corresponding dispatch port. As we mentioned before, the idea is to keep all execution units busy all the time.
The two double-speed ALUs can each process two microinstructions per clock cycle. The other units need at least one clock cycle to process each microinstruction they receive. So, the Pentium 4 architecture is optimized for simple instructions.
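The difference in throughput can be illustrated with a small Python sketch. It computes a lower bound on the clock cycles one unit needs to work through a stream of simple microinstructions. We assume here that the figure of two microinstructions per clock cycle applies to each double-speed ALU and that microinstructions arrive without interruption; the function name is ours.

```python
import math


def min_cycles(n_microinstructions, per_cycle):
    """Lower bound on clock cycles one unit needs for n microinstructions.

    per_cycle is how many microinstructions the unit can process per clock
    cycle: 2 for a double-speed ALU (assumption stated above), at most 1 for
    the other units.
    """
    return math.ceil(n_microinstructions / per_cycle)


print(min_cycles(10, per_cycle=2))   # double-speed ALU: 5
print(min_cycles(10, per_cycle=1))   # unit needing a full cycle each: 10
```

For the other units this is the best case only, since the text says they need at least one clock cycle per microinstruction.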
