AMD Llano A-Series: Architecture Analysis - AMD Turbo Core 2.0 and AMD System Monitor



AMD Turbo Core 2.0 and AMD System Monitor

TurboCore technology has evolved since its first implementation in Thuban. In that architecture, the clock was raised only under very specific conditions, i.e. when at least half of the cores were idle, and only with on/off logic, regardless of the TDP margin.

This is because the first implementation of TurboCore did not verify the actual power consumption in real time. The conditions for intervention and the size of the clock increase therefore had to be kept very conservative to avoid exceeding the TDP even in the worst case.


016_Turbo_Core

 

This all changed with TurboCore 2.0. First, a unit called APM (Advanced Power Management) was introduced. It digitally monitors 95 signals for each core, plus others scattered throughout the chip, to estimate the power dissipated over an interval of one hundredth of a second (10 ms) with an accuracy above 98% (i.e. an error margin below 2%).

In Thuban, the overclocking intervention was carried out in so-called open-loop mode: the effect of the increased clock on power consumption was not fed back, so the mechanism had to be very conservative. The TurboCore 2.0 intervention mode, by contrast, can be classified as closed-loop (with feedback), since it estimates the actual power draw against the TDP and is able to adjust accordingly.

The APM estimates power consumption without taking temperature into account and in a fully digital way: essentially, it measures the percentage of activity of the various units and, knowing the maximum consumption of each unit, derives the total consumption.
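The estimation described above can be sketched as a weighted sum. This is only an illustration of the idea, not AMD's implementation: the unit names, the wattages and the function `estimate_power` are hypothetical, and the real APM works from 95 signals per core whose weights are not given in this article.

```python
def estimate_power(activity, max_power_w):
    """Estimate total power as the sum of (activity fraction x maximum power) per unit.

    activity    -- dict mapping unit name to its activity fraction (0.0 to 1.0)
    max_power_w -- dict mapping unit name to its maximum power in watts
    """
    if activity.keys() != max_power_w.keys():
        raise ValueError("activity and max_power_w must describe the same units")
    total = 0.0
    for unit, fraction in activity.items():
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"activity of {unit!r} must be between 0 and 1")
        total += fraction * max_power_w[unit]
    return total


# Hypothetical units and figures, chosen only to make the arithmetic visible.
MAX_POWER_W = {"alu": 10.0, "fpu": 12.0, "l2_cache": 4.0}
sample_activity = {"alu": 0.5, "fpu": 0.25, "l2_cache": 1.0}

print(estimate_power(sample_activity, MAX_POWER_W))  # 5.0 + 3.0 + 4.0 = 12.0
```

Because the inputs are activity counts rather than analog sensor readings, the same workload always produces the same estimate, which is the reproducibility the article refers to.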

Compared with the measuring method of the Intel counterpart, which is analog and strongly temperature dependent, Llano shows a very reproducible clock pattern. It is potentially not optimal, however, because the safety margins of the overclocking intervention, though smaller, are still present and may prevent the chip from reaching the maximum TDP.

Another novelty is the way TurboCore raises the clock. In Thuban it was of the on/off type, whereas Llano uses a dithering algorithm: it calculates the percentage of a preset time interval during which the CPU should be clocked higher than the base state, so that the average power over the interval is as high as possible while remaining below the TDP. This also implies that the "effective" frequency of the CPU is an intermediate value between the default value and the Turbo value. That is, the CPU behaves as if it were running at the intermediate frequency.

Let's take an example with the top-of-the-range Thuban frequencies: a base frequency of 3.3GHz and a turbo frequency of 3.7GHz. Suppose that the maximum TDP is 100W, as for Llano, and that the workload makes the CPU draw 90W at 3.3GHz but 102W at 3.7GHz. If we raised the clock as Thuban does, even with the APM we could not increase it at all, because we know that we could not stay within the TDP. The dithering algorithm instead calculates the optimal percentage of time that the CPU should spend in the upper clock state. Suppose the calculation says that with 75% of the time in the turbo state the average power is 99W (0.25*90+0.75*102), just below the limit. The effective frequency of the CPU is then 3.3*0.25+3.7*0.75=3.6GHz.

This is a great advantage over the Thuban architecture and also over the Intel architecture, where the number of turbo states is limited and the clock must move in increments of 100 or 133MHz. With AMD's mechanism, by contrast, virtually the whole range of effective frequencies is available, from the base clock, at which the TDP is guaranteed not to be exceeded, up to the upper clock.
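The dithering arithmetic can be sketched as follows. The model is an assumption: it treats average power and effective frequency as plain time-weighted averages of the two states and ignores switching overhead; the article does not document the actual algorithm. The function names are hypothetical. With the example's figures (90W at 3.3GHz, 102W at 3.7GHz, 100W TDP), this model puts the largest turbo share that still fits the TDP at about 83%, for an effective frequency of about 3.63GHz.

```python
def average_power_w(p_base_w, p_turbo_w, duty):
    """Time-weighted average power for a given fraction of time spent in turbo."""
    return p_base_w * (1.0 - duty) + p_turbo_w * duty


def max_turbo_duty(p_base_w, p_turbo_w, tdp_w):
    """Largest fraction of time in turbo that keeps average power within the TDP."""
    if p_turbo_w <= tdp_w:
        return 1.0  # turbo fits the budget all the time
    if p_base_w >= tdp_w:
        return 0.0  # no headroom at all
    return (tdp_w - p_base_w) / (p_turbo_w - p_base_w)


def effective_frequency_ghz(f_base_ghz, f_turbo_ghz, duty):
    """Time-weighted average of the base and turbo clocks."""
    return f_base_ghz * (1.0 - duty) + f_turbo_ghz * duty


duty = max_turbo_duty(90.0, 102.0, 100.0)
print(f"turbo share: {duty:.1%}")
print(f"average power: {average_power_w(90.0, 102.0, duty):.1f} W")
print(f"effective clock: {effective_frequency_ghz(3.3, 3.7, duty):.2f} GHz")
```

An on/off scheme would have to choose a share of either 0 or 1 here, and since 102W exceeds the limit it would stay at the base clock.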

This makes it possible to fully exploit the capabilities of the cooling system and to select a higher Turbo frequency. Thuban was limited by the fact that it could only switch the CPU clock at a certain rate, without being able to modulate the time spent in the high clock state with a dithering algorithm.

As in Thuban, the higher-clock P-state is a hardware state invisible to the operating system, which sees the CPU as always being in its fastest state, but at the default frequency.

This makes it difficult to detect the actual frequency of the CPU, so some internal registers have been added. They belong to the family of so-called MSRs (Model Specific Registers), which are read with the RDMSR instruction rather than CPUID. They make it possible to calculate an average effective clock for the cores, because they automatically count the clock ticks seen by the CPU in a given interval.
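A minimal sketch of how an average effective clock can be derived from such counters. The article does not name the registers; the sketch assumes a pair like the x86 APERF/MPERF counters, where one counter advances at the actual core clock and the other at the fixed reference clock. The function name and the sample values are hypothetical, and reading the counters themselves (a privileged operation) is left out.

```python
def effective_clock_mhz(base_mhz, actual_start, actual_end,
                        reference_start, reference_end, width_bits=64):
    """Average effective clock between two samples of a counter pair.

    actual_*    -- samples of the counter that ticks at the real core clock
    reference_* -- samples of the counter that ticks at the base (reference) clock
    The masking handles a counter that wrapped around between the two samples.
    """
    mask = (1 << width_bits) - 1
    delta_actual = (actual_end - actual_start) & mask
    delta_reference = (reference_end - reference_start) & mask
    if delta_reference == 0:
        raise ValueError("reference counter did not advance between samples")
    return base_mhz * delta_actual / delta_reference


# Hypothetical samples: 10 ms at a 3300 MHz reference clock is 33,000,000 ticks;
# the core counted 36,000,000 ticks in the same interval.
print(effective_clock_mhz(3300, 1_000, 36_001_000, 5_000, 33_005_000))  # 3600.0
```

The result is an average over the sampling interval, so a core that dithers between two P-states reports an intermediate value rather than either of the two real clocks.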

In Llano only the CPU can raise its clock, although AMD has reserved the option of also raising the GPU clock in future models (and it seems that Llano's successor, Trinity, which is based on the Bulldozer architecture on the CPU side, may already implement it).


 

017_Turbo_Core

 

018_Turbo_Core

 

019_Turbo_Core

 

In the figures we can see the behavior of TurboCore in some typical scenarios. The GPU clock can only be reduced or switched off completely, but the GPU always takes precedence: its P-state depends only on the load applied to it, while the CPU's Turbo intervention is limited by the total chip TDP, GPU included.
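The precedence rule can be illustrated with a small budget calculation. All wattages and the function name are hypothetical and serve only to show the direction of the effect: whatever the GPU draws is subtracted first, and the CPU turbo gets what remains of the chip TDP.

```python
def cpu_turbo_headroom_w(tdp_w, gpu_power_w, cpu_base_power_w):
    """Power left for CPU turbo once the GPU and the CPU base state are served."""
    return max(0.0, tdp_w - gpu_power_w - cpu_base_power_w)


# Hypothetical GPU draw in three scenarios, with a 100 W chip TDP
# and a CPU that needs 60 W at its base clock.
scenarios = {
    "GPU idle": 5.0,
    "GPU moderately loaded": 20.0,
    "GPU fully loaded": 40.0,
}

for name, gpu_w in scenarios.items():
    headroom = cpu_turbo_headroom_w(100.0, gpu_w, 60.0)
    print(f"{name}: {headroom:.1f} W available for CPU turbo")
```

In the last scenario the headroom is zero, so under this model the CPU would stay at its base clock while the GPU runs at full load.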

So far, the temperature of the chip has not entered the turbo equation. There is, however, a protective mechanism that lowers the CPU clock below the default value if the temperature exceeds critical values.

Apart from a failure of the cooling system, the only way this can happen is in compute-intensive scenarios where both the CPU and the GPU are loaded to the maximum. Examples are programs developed specifically for stress testing, or OpenCL programs that push both the CPU and the GPU to their limits.

 

AMD System Monitor

AMD has developed monitoring software particularly suited to the new Bobcat and Llano APUs. It also works with older systems, detecting and monitoring the AMD components (CPU, GPU, APU and IGP) in the system.


020_AMD_System_Monitor

 

The application has two levels of detail, which can be selected independently for the computing elements (CPU, GPU, IGP and APU) and for the memory via the button labeled Details.

At the lower level of detail, the CPU section shows the usage of each core as a percentage. If an APU is present, the division of resources between CPU cores and GPU is also shown.

The GPU section shows the utilization percentage of the installed AMD GPUs (discrete cards, IGPs and the graphics of the APU).

The memory section shows how memory is distributed among the various types in the system (hardware reserved, in use, modified, standby) and free memory, each expressed as a percentage of total memory.

At the higher level of detail, the CPU section shows additional information for each core, including the approximate operating frequency.

The GPU section shows, in addition to the basic information, the frequencies of the GPU and video memory and the current speed of the GPU fan (if present).

The memory section shows, in addition to the basic information, further system details, including the frequency of the main RAM.
