AMD Llano A-Series: Architecture Analysis - Floating Point Unit

Indice articoli


Floating Point Unit

Floating Point Unit (FPU) contains two components, the floating point scheduler and floating point execution units

 

Floating Point Scheduler

The scheduler is able to accept 3 macro-ops for cycle in any combination of supported instructions: x87 floating-point, 3DNow, MMX, SSE1/2/3/4a.

It manages the register renaming and has a buffer of three lines of 14 elements each, for a total of 42 elements (the previous Stars architecture had 3 queues of 12 elements for a total of 36 macro-ops: here longer queues mean more performance too).

It also runs the superforwarding, which is to forward the results of a read from memory operation to dependent operations at the same clock cycle, without waiting for writing to registers, as is the case for the regular forwarding that normally occurs in this unit and the integer unit, through the use of the result buses.

Also performs micro-ops forwarding and out of order execution. The scheduler communicates also with the ICU to retire the completed instructions, to manage the results of various floating point conversion instructions to integer and vice versa (the two integer and floating point units use a 64 bits bus to communicate) and to receive results nulling commands due to incorrect prediction of a jump.

 

Floating point execution unit (FPU)

The FPU handles all register operations of x87, 3DNow, MMX and SSE1/2/3/4a instruction sets.

The FPU consists of a stack renaming unit (because x87 does not work with registers but with a stack architecture of 8 elements, inherited from the first external x87 FPU), which allows translating stack access to an 8 registers access, a conventional register renaming unit, a scheduler, a register file, which contains the physical registers and execution units each capable of processing up to 128 bits per clock cycle. The figure shows a block diagram for the flow of data of the FPU.


 

008_pipe_FP


As is shown in Figure the FPU consists of 3 units (FADD, FMUL, and FSTORE).

The FADD unit is able to carry out any instructions of addition and subtraction, both integer SIMD (3DNow!, MMX and SSEn), both floating point, and the types of movement between registers, and shuffle (exchange) and SSEn extract.

The FMUL unit can execute all the instructions of multiplication, division, inverse, square root and transcendental, both integer SIMD (3DNow!, MMX and SSEn), both floating point, and the movement between registers and some simple types of shuffle and extract.

The FSTORE unit is used, alone or in tandem, in each instruction that requires a memory access and is also capable of movement between registers.

Corsair