Thursday, February 23, 2012

Samsung at ISSCC: Quad-core Exynos apps processor relies on skillful analog IC design.

Samsung's new 32nm Exynos application processor will be produced in both dual-core and quad-core configurations.
At the International Solid-State Circuits Conference this week, Samsung provided details of its next-generation Exynos mobile application processor in the paper "A 32nm High-k Metal Gate Application Processor with GHz Multi-Core CPU". Dr. Se-Hyun Yang, Principal Engineer for SoC Development at Samsung, delivered the presentation and demonstrated the device in a smartphone reference platform during the Industrial Demonstration Session (IDS) on Tuesday evening.

Exynos is built around either two or four ARM Cortex-A9 cores (ARMv7-A architecture), which Yang says can run at CPU clock rates from as low as 200MHz up to 1.5GHz. Each CPU core includes a full hardware vector floating-point unit and a 64-bit ARM NEON Single-Instruction Multiple Data (SIMD) multimedia engine. The cores share a 1MB L2 cache, with a Snoop Control Unit (SCU) managing communications between the cores and the memory subsystem. The Graphics Processing Unit (GPU) uses four pixel processors plus a geometry processor and 128KB of dedicated L2 cache, and supports OpenGL ES 1.1 and 2.0. The DRAM controller provides a 6.4GB/s (i.e., 16 bytes per clock at 400MHz) dual-port interleaved DRAM interface for LPDDR2, DDR2 and DDR3 memories.
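
As a quick check on the memory figure, the 6.4GB/s peak is consistent with 16 bytes moved per 400MHz clock. The sketch below assumes, without confirmation from the talk, that each of the two ports is 32 bits wide and transfers data on both clock edges; the function name and parameters are illustrative only.

```python
def peak_bandwidth_gbps(ports: int, bus_width_bits: int, clock_mhz: float,
                        transfers_per_clock: int = 2) -> float:
    """Peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=2 models double-data-rate signalling (an assumption).
    """
    bytes_per_clock = ports * (bus_width_bits // 8) * transfers_per_clock
    return bytes_per_clock * clock_mhz * 1e6 / 1e9


if __name__ == "__main__":
    # Assumed configuration: 2 ports x 32 bits x 2 transfers = 16 bytes per clock.
    print(f"{peak_bandwidth_gbps(ports=2, bus_width_bits=32, clock_mhz=400):.1f} GB/s")
```

Under the same assumptions, a single 16-bit port at 400MHz would deliver only a quarter of that (1.6GB/s), which is why the 6.4GB/s figure implies a much wider aggregate interface.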

Managing Power
Yang said that the migration to Samsung's 32nm High-K Metal Gate (HKMG) process, with 1/100th the gate leakage of the preceding 45nm polysilicon-gate process, was the first key to managing power on the new application processor. Process engineers can tune the HKMG process for either performance or low power. At the same leakage as the 45nm process, a 32nm HKMG design yields a 40% improvement in delay, said Yang; alternatively, at the same timing performance, leakage is reduced by a factor of 10.

In Exynos, each voltage domain has its own independent voltage and frequency settings, with additional power-gated sub-domains that allow blocks to be shut off completely. The processor has four major power domains: one for the CPU cores and L2 cache, one for the GPU, one for the memory interface, and one for the audio/video IP blocks. Each CPU core has its own power sub-domain, and the L2 cache is split into two sub-domains with retention registers. Each core can be turned on and off independently, and the L2 cache can be run with one of its two halves turned off. Each of the audio/video media IP blocks can also be turned off independently.
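
The domain hierarchy described above maps naturally onto a small tree. The sketch below is a hypothetical model, not Samsung's power-management firmware: the domain names, the media IP names (`video_codec`, `jpeg_codec`, `audio`) and the rule that gating a parent also gates its children are all illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator


class State(Enum):
    ON = "on"
    RETENTION = "retention"  # logic gated, stored contents kept
    OFF = "off"


@dataclass
class Domain:
    name: str
    state: State = State.ON
    supports_retention: bool = False
    children: list[Domain] = field(default_factory=list)

    def set_state(self, state: State) -> None:
        if state is State.RETENTION and not self.supports_retention:
            raise ValueError(f"{self.name} has no retention mode")
        self.state = state
        if state is State.OFF:
            # Assumed rule: gating a parent domain also removes power from its sub-domains.
            for child in self.children:
                child.set_state(State.OFF)

    def walk(self) -> Iterator[Domain]:
        yield self
        for child in self.children:
            yield from child.walk()


def build_soc_tree(num_cores: int = 4) -> Domain:
    cores = [Domain(f"core{i}") for i in range(num_cores)]
    l2_halves = [Domain(f"l2_half{i}", supports_retention=True) for i in range(2)]
    media = [Domain(n) for n in ("video_codec", "jpeg_codec", "audio")]  # hypothetical IP names
    return Domain("soc", children=[
        Domain("cpu_l2", children=cores + l2_halves),
        Domain("gpu"),
        Domain("memory_if"),
        Domain("media", children=media),
    ])


if __name__ == "__main__":
    soc = build_soc_tree()
    by_name = {d.name: d for d in soc.walk()}
    by_name["core3"].set_state(State.OFF)      # hot-unplug one core
    by_name["l2_half1"].set_state(State.OFF)   # run with half of the L2 cache
    by_name["video_codec"].set_state(State.OFF)
    for d in soc.walk():
        print(d.name, d.state.value)
```

Only the L2 halves accept `RETENTION` here, mirroring the retention registers mentioned for the L2 sub-domains; asking a core to retain state raises an error in this model.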

For the L1 caches, designers place the power-gating switches around the periphery at design time to minimize voltage drop on the supply rails. For the L2 cache, by contrast, Samsung integrates the power switches directly into the cache to minimize die area.

Yang stated that games are the most challenging applications for power management in mobile processors. In a dual-core example, he showed processor activity ramping up from a 20% load to 95%. Samsung uses both Dynamic Voltage and Frequency Scaling (DVFS) and power gating to manage the workload for such applications. With the "hot plug" capability of the Exynos processor, individual cores can be dynamically turned on or off as needed, down to a standby state with all cores off and the L2 cache in retention.
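
A governor combining DVFS with core hot-plugging can be sketched in a few lines. This is a hypothetical illustration, not Samsung's policy: the frequency table (only its 200MHz and 1.5GHz end points come from the talk), the 80%/30% thresholds, and the choice to exhaust frequency steps before plugging in another core are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical frequency table spanning the 200MHz-1.5GHz range quoted in the talk.
FREQ_STEPS_MHZ = [200, 500, 800, 1000, 1200, 1400, 1500]


@dataclass(frozen=True)
class CpuState:
    online_cores: int
    freq_index: int  # index into FREQ_STEPS_MHZ

    @property
    def freq_mhz(self) -> int:
        return FREQ_STEPS_MHZ[self.freq_index]


def governor_tick(state: CpuState, core_loads: list[float], max_cores: int,
                  up: float = 0.80, down: float = 0.30) -> CpuState:
    """One decision step: scale frequency first, hot-plug cores only at the table's ends."""
    if len(core_loads) != state.online_cores:
        raise ValueError("expected one load value per online core")
    avg = sum(core_loads) / len(core_loads)
    top = len(FREQ_STEPS_MHZ) - 1
    if avg > up:
        if state.freq_index < top:
            return CpuState(state.online_cores, state.freq_index + 1)
        if state.online_cores < max_cores:
            return CpuState(state.online_cores + 1, state.freq_index)  # hot-plug a core in
    elif avg < down:
        if state.freq_index > 0:
            return CpuState(state.online_cores, state.freq_index - 1)
        if state.online_cores > 1:
            return CpuState(state.online_cores - 1, state.freq_index)  # hot-unplug a core
    return state


if __name__ == "__main__":
    # A dual-core game ramp: load jumps from 20% to 95%, as in Yang's example.
    s = CpuState(online_cores=1, freq_index=0)
    for load in [0.20] * 2 + [0.95] * 8:
        s = governor_tick(s, [load] * s.online_cores, max_cores=2)
        print(f"load {load:.0%}: {s.online_cores} core(s) @ {s.freq_mhz}MHz")
```

Raising frequency before adding a core is only one possible ordering; a governor could just as reasonably bring a second core online first and run both at a lower operating point.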

Analog circuit design improves yield and lowers power
Whereas other designers of application processors are adopting a "Big-Little" architectural mix of high-performance and low-power cores to reduce power, Samsung has relied on analog circuit design to tune power and performance in the Exynos processors. By applying both positive and negative body biasing, Samsung can compensate for process variations and hence improve yield. Because faster silicon also leaks more, the direction of the bias depends on where a given die falls. If measurements from on-chip sensors distributed throughout an Exynos die show that the device represents a Slow-Slow (SS) process corner, positive bias is applied, up to the limit of the leakage specification. Conversely, samples from Fast-Fast (FF) process corners, which have high leakage, receive negative bias adjustments. During standby power-down modes, Samsung applies negative body biasing throughout the chip to extend battery life. Samsung's measurements showed that forward body biasing (FBB) on SS devices yielded an average performance improvement of 13.5%, while negative body biasing (NBB) on FF samples cut leakage by 21%.
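
The corner-dependent bias decision can be sketched as follows. Everything here is a hypothetical model: the `delay_ratio` sensor metric, the 5% corner tolerance, the 0.1V step, and the `leakage_at`/`delay_at` characterization functions are assumptions, and the 0.6V ceiling merely echoes the upper end of the range Yang mentioned during Q&A.

```python
from enum import Enum
from typing import Callable


class Corner(Enum):
    SS = "slow-slow"
    TT = "typical"
    FF = "fast-fast"


def classify_corner(delay_ratio: float, tol: float = 0.05) -> Corner:
    """delay_ratio: measured sensor delay divided by its nominal value (hypothetical metric)."""
    if delay_ratio > 1.0 + tol:
        return Corner.SS
    if delay_ratio < 1.0 - tol:
        return Corner.FF
    return Corner.TT


def choose_body_bias(delay_ratio: float,
                     leakage_at: Callable[[float], float],
                     delay_at: Callable[[float], float],
                     leakage_limit: float,
                     step_v: float = 0.1,
                     max_bias_v: float = 0.6) -> float:
    """Return a body-bias voltage: positive = forward (FBB), negative = NBB."""
    corner = classify_corner(delay_ratio)
    n_steps = round(max_bias_v / step_v)
    chosen = 0.0
    if corner is Corner.SS:
        # Slow die: forward-bias for speed, stopping before leakage exceeds spec.
        for n in range(1, n_steps + 1):
            if leakage_at(n * step_v) > leakage_limit:
                break
            chosen = n * step_v
    elif corner is Corner.FF:
        # Fast, leaky die: negative bias cuts leakage while timing is still met.
        for n in range(1, n_steps + 1):
            if delay_at(-n * step_v) > 1.0:
                break
            chosen = -n * step_v
    return chosen


if __name__ == "__main__":
    # Toy linear models, made up for illustration only.
    ss = choose_body_bias(1.10, leakage_at=lambda v: 1.0 + 2.0 * v,
                          delay_at=lambda v: 1.10 - 0.3 * v, leakage_limit=1.5)
    ff = choose_body_bias(0.90, leakage_at=lambda v: 1.0 + 2.0 * v,
                          delay_at=lambda v: 0.90 - 0.3 * v, leakage_limit=1.5)
    print(f"SS die: {ss:+.1f}V, FF die: {ff:+.1f}V")
```

Typical dies get no bias in this sketch, and the chip-wide negative bias used in standby is a separate mode that it does not model.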

Analog to bring down your high temperature
Yang said that a little-discussed issue for application processors, and one that becomes more critical as performance increases, is that they now run hotter. With the popularity of CPU-intensive applications such as 3D gaming, there is a real danger that an application processor could burn out, or at least reach a surface temperature within the confines of a mobile device high enough to impair reliability and usability. To address the issue, Samsung developed a Thermal Management Unit (TMU), which monitors thermal sensors throughout the Exynos chip to detect hotspots, applying thermal throttling through DVFS mechanisms or tripping a shutdown of the chip if necessary. A side benefit of thermal management is a further reduction in power dissipation.
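
The TMU behavior described above (watch the hottest sensor, throttle through DVFS, trip a shutdown as a last resort) can be sketched as a small decision function. The thresholds and the hysteresis band below are hypothetical values chosen for illustration; Samsung did not disclose its trip points.

```python
from enum import Enum

# Hypothetical thresholds in degrees Celsius, chosen for illustration only.
THROTTLE_C = 85.0
RELEASE_C = 80.0    # hysteresis band so throttling does not toggle every sample
SHUTDOWN_C = 110.0


class ThermalAction(Enum):
    NORMAL = "normal"
    THROTTLE = "throttle"    # ask DVFS to lower the operating point
    SHUTDOWN = "shutdown"    # trip: power the chip down


def tmu_decide(sensor_temps_c: list[float], currently_throttled: bool) -> ThermalAction:
    hottest = max(sensor_temps_c)  # act on the worst hotspot, not the average
    if hottest >= SHUTDOWN_C:
        return ThermalAction.SHUTDOWN
    if hottest >= THROTTLE_C:
        return ThermalAction.THROTTLE
    if currently_throttled and hottest > RELEASE_C:
        return ThermalAction.THROTTLE  # stay throttled until below the release point
    return ThermalAction.NORMAL


if __name__ == "__main__":
    throttled = False
    for temps in ([62.0, 70.5, 68.0], [71.0, 86.0, 75.0],
                  [70.0, 82.0, 74.0], [66.0, 78.0, 70.0]):
        action = tmu_decide(temps, throttled)
        throttled = action is ThermalAction.THROTTLE
        print(max(temps), action.value)
```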

In the IDS demo, Yang showed a head-to-head comparison between a dual-core 32nm Exynos and the previous 45nm dual-core chip running the same applications. The dynamic power consumption of each platform was monitored in real time to show the advantages of the new design. For 1080p video encoding, Samsung reported a 26% improvement in frame rate over the previous-generation design. In a set of battery-life benchmarks (idling on the home screen, playing a 720p movie, and rendering 3D graphics), the new chip achieved 34% to 50% longer battery life.
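
To put the battery-life numbers in power terms: if the battery stores the same energy in both phones and average power is steady over the test (both are assumptions, not stated in the demo), battery life is inversely proportional to average power. A 34% to 50% longer life therefore implies roughly a quarter to a third less average power draw.

```python
def implied_power_reduction(battery_life_gain: float) -> float:
    """Fractional drop in average power implied by a battery-life gain.

    Assumes the same battery energy in both runs, so life is inversely
    proportional to average power: P_new / P_old = 1 / (1 + gain).
    """
    return 1.0 - 1.0 / (1.0 + battery_life_gain)


if __name__ == "__main__":
    for gain in (0.34, 0.50):
        print(f"{gain:.0%} longer battery life -> "
              f"{implied_power_reduction(gain):.1%} lower average power")
```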

Facing your peers - an integral part of the ISSCC experience
The Samsung presentation drew the attention of a number of competitors, in both application processor design and process technology. An engineer from Texas Instruments was curious about the body biasing, asking whether deep N-wells were used to isolate the transistors from the substrate. Dr. Yang at first hesitated, citing company-imposed limits on what he could say, but when the questioner insisted, he revealed that deep N-wells are used for body biasing.

An engineer from Broadcom wanted more details of the body biasing, such as the range of bias voltages used. After persistent questioning, Yang stated that a range of 0.1V to 0.6V had been tested.

Another engineer, from NVIDIA, where the digital approach of a "4-PLUS-1" architecture is used to manage power and performance in Tegra 3 application processors, wanted to know how long it takes to switch a core from the inactive to the active state. The initial answer: "in the microsecond range." When the questioner pressed for a more precise answer ("a couple of microseconds, or tens of microseconds?"), Dr. Yang replied "more than a couple".

From TSMC came a question about the voltage variation used in the DVFS scheme. The operating range around the nominal 1.0V supply is just a few percent, while the frequency range, as stated at the beginning of the talk, spans 200MHz to 1.5GHz. Finally, an engineer from Intel asked about the chip size. The answer: "You will have to wait for the press release, but you will know soon." That announcement could come as soon as next week's Mobile World Congress in Barcelona.

At the IDS session, Dr. Yang said that the 32nm-versus-45nm comparison demo, which measured only the CPU and GPU, was an appropriate indicator because these are the major sources of power consumption in the device; other differences, such as in the IP blocks, were less significant, he said. The 34% to 50% gain in battery life with the 32nm Exynos is significant, but because the CPU dominates power consumption, a quad-core configuration would lose much of that advantage. We may therefore see the dual-core version in smartphones and the quad-core version primarily in devices with larger batteries, such as the Galaxy tablets.
