Thursday, April 4, 2013

Tale of the tape outs: ARM adds the Cortex-A57 to array of FinFET test chip projects

While the foundry industry is just beginning to develop chips in a 20nm process (the last node of conventional Moore's Law scaling), which adds the cost and complexity of Double-Patterning Technology (DPT) in order to achieve the smaller feature size, a race is on to catch up with Intel's more advanced 3D transistors, using FinFET technology. On Tuesday, April 2, ARM announced that they had completed a "tape out" of their highest performance next-generation processor, the Cortex-A57, targeting a 16nm FinFET process which TSMC currently has in early development. The Cortex-A57 supports ARM's new AArch64 64-bit architecture. Designers can use it as the "big" processor in ARM's big.LITTLE configuration, paired with the more efficient Cortex-A53, or build it in multiple quad configurations for standalone high-performance applications.

ARM's latest announcement follows their December 2012 press release with Samsung, in which they described a tape out for that foundry's 14nm FinFET process, based on the high-efficiency Cortex-A7 processor. ARM collaborated with Cadence Design Systems to develop the EDA tool flow for both projects. In February, GLOBALFOUNDRIES announced a projection of simulated performance, power and area for an experimental tape out for a dual-core ARM Cortex-A9 processor, based on that foundry's 14nm-XM process design kit (PDK). GLOBALFOUNDRIES said that they expect the 14nm implementation to be capable of a 61% higher speed than the same processor in 28nm-SLP technology. Alternatively, at the same clock frequency, simulations showed that the power consumed by the 14nm design could be lowered by 62% compared to 28nm. In August 2012, ARM extended their collaboration with GLOBALFOUNDRIES for the 20nm planar and future FinFET process. GLOBALFOUNDRIES is a member of the Common Platform Alliance with Samsung and IBM, which has a goal of collaborating on development of common process technology for all three companies.

Ron Moore, Director of Strategic Accounts Marketing at ARM, says that this first implementation of the Cortex-A57 on 16nm FinFET will help the company to optimize their Processor Optimization Packs (POPs), which provide licensees with Artisan Physical IP logic libraries and memory instances. Moore says that ARM and TSMC will take the 16nm test chip to fabrication. At this early stage in foundry FinFET development, ARM and Cadence were limited to a "0.1 revision" Process Design Kit (PDK), which designates the first attempt by the ecosystem partners to assemble a usable flow, more as a learning exercise than as a production-ready methodology. In the past, a "tape out" indicated that a design was signed off for production. Now, because of the complexity of advanced node process development, the first "tape outs" are only intended as test chips.

ARM has multiple objectives for this early collaboration, says Moore, starting with getting experience with how the Place and Route tools will work for their cores in a FinFET process. With the numerous new manufacturing steps that will be employed for FinFETs, ARM must evaluate the impact on power, performance, and area before being ready to hand off their IP to customers.

The 3-way collaboration of foundry, IP vendor, and EDA providers acts as a virtual Integrated Device Manufacturer (IDM). TSMC gets to test their process with a large functional portion of what will be a typical SoC, eventually getting feedback from the test chip silicon. The EDA vendors gain early access for learning the modifications they must make to have their tools ready along with the process.

Moore said that the team just aimed for a functional test chip at this stage, since it is too early to evaluate process corners. The 0.1 PDK will not be sufficient to discern performance. It will be at least a year before we begin to see real designs put into a FinFET process. Samsung has said that they are planning to offer risk production in their 14nm FinFET process by the end of 2013.

Monday, February 25, 2013

Broadcom, Maxim, Freescale and TI increase focus on small cells at Mobile World Congress

Small cells and heterogeneous networks continue to be hot topics at this year's Mobile World Congress (MWC), which kicked off today in Barcelona. We are seeing announcements from several chip vendors who are expanding their offerings in this space, with an increased focus on supporting wireless operator plans to ramp up deployment for LTE metro, enterprise and residential applications.

Broadcom has announced that they are sampling a new family of dual-mode 3G/LTE SoCs, the BCM617xx series. The BCM617xx family consists of three products: the BCM61760, designed for high-capacity metro cells; the BCM61750, for enterprise small cells; and the BCM61730, for residential use.

Greg Fischer, Broadcom’s VP and GM for Broadband Carrier Access, says that the BCM61760 can support 100 to 200 simultaneous LTE users, and 32 to 64 for 3G, depending on the combination of radio access technologies that an operator employs. The BCM61730 can support 8 to 16 users in a residential setting.

In LTE mode, the BCM617xx enables carrier aggregation, for a maximum total of 40MHz of channel bandwidth. The SoCs support LTE CAT-4, with 150Mbps Down Link (DL) and 50Mbps Up Link (UL) data rates, and 128 active users, or 3G data rates of 42Mbps DL and 11Mbps UL, with 32 simultaneous users. Both Frequency Division Duplexing (FDD) and Time Division Duplexing (TDD) modes are supported in the BCM61750 and BCM61760.

Broadcom has partnered with Radisys to support their 4G Trillium LTE TOTAL eNodeB software on the new 4G LTE small cell modems. The software integration provides a pre-integrated solution for implementing Radio Resource Management (RRM), Self-Organizing Network (SON), Operations/Administration and Maintenance (OAM) and 3GPP-compliant protocol stacks.

For the existing 3G WCDMA residential market, Broadcom is also introducing the BCM61630, which integrates the digital baseband processor and RF transceiver into a single SoC. The BCM61630 provides HSPA data rates at up to 21.6 Mbps. Broadcom is planning volume production for the BCM61630 later in the first half of 2013.

Eric Hayes, Broadcom VP of Marketing for Processor and Wireless Infrastructure, says that the company has also developed a multi-mode small cell platform that leverages the latest 28nm processors from their former NetLogic group, which will enable customers to build more complex systems for Multiple Radio Access Technology (multi-RAT) networks. At MWC, Broadcom is demonstrating a combined 3G/4G/"5G" solution, the "5G" being Broadcom's marketing label for IEEE 802.11ac WiFi, which operates in the 5 GHz band. Hayes says that the XLP-208 multicore processor can manage all radios and data traffic, and provides scalability for delivering edge-of-network revenue-generating services, such as content caching and ad insertion, as well as supporting traffic management for tuning backhaul links. Broadcom says that the development platform includes Broadcom’s complete portfolio of backhaul devices for optimized traffic management, including x-DSL, x-PON and wireless. Broadcom is sampling the platform now, and is planning to support volume production in Q2.


Thursday, February 21, 2013

Samsung details Exynos Octa at 60th ISSCC

Samsung discussed details of their Exynos Octa processor design at the 2013 IEEE International Solid State Circuits Conference

Samsung raised the bar for CPU core count in mobile application processors at the 2013 International Consumer Electronics Show (CES) in Las Vegas, with their announcement of the next-generation Exynos Octa. The Exynos Octa will incorporate ARM's big.LITTLE configuration times four, with one large quartet of high-performance Cortex-A15 cores trading off operation with a smaller quartet of power-saving Cortex-A7s. This week, at the 60th International Solid State Circuits Conference (ISSCC) in San Francisco, Samsung's lead designer provided some more detail on the new SoC, in a presentation of "28nm High-K Metal Gate heterogeneous quad-core CPUs for high-performance and energy-efficient mobile application processor".

Tuesday, February 12, 2013

ARM and Synopsys collaborate for Cortex-A57 virtual prototype

In late October last year, at ARM's TechCon event in Silicon Valley, ARM announced the first processor cores to be built with their next-generation 64-bit v8 architecture, the Cortex-A57 and Cortex-A53. Now, in order to enable early development of software in parallel with hardware development, Synopsys has announced the addition of the ARM v8 processors to their Virtualizer Development Kits (VDKs). Tom De Schutter, Senior Product Marketing Manager for System Level Solutions at Synopsys, says that the two companies have extended their collaboration agreement to include ARM's Fast Models for the v8 architecture.

Customers can use the VDK to boot operating systems, and to develop firmware and device drivers, prior to availability of silicon or FPGA hardware prototypes. The virtual models support analysis of multicore architectures, and provide developers with tools to optimize their code for maximum energy efficiency. Semiconductor companies can develop virtual models to provide to their customers for application development, without giving away any details of the chip design. For a complete SoC, users can combine Synopsys DesignWare IP models with the ARM core VDKs.

Synopsys and ARM are initially making Cortex-A57 virtual prototypes available, and their roadmap is to add the Cortex-A53 and support for the ARMv8 big.LITTLE methodology later. The emphasis at this point is still primarily on mobile device applications, with support for a Linux kernel and the Android operating system on the virtual platform. De Schutter says that some of Synopsys' customers are using the virtual models to develop Windows on ARM, but that is not an out-of-the-box solution. The VDK supports use of ARM's debugger, along with tools from Lauterbach and GNU.

Synopsys made no mention of development of VDKs for ARM server applications, and is initially targeting customers who will be migrating from the ARM v7 32-bit architecture. The company is doing some initial exploration of server-type applications, says De Schutter, such as utilizing Ethernet-connected VDKs to develop communications and network interfaces between multiple ARM-based processors.

Tensilica adds Image/Video Processor to DSP IP core offerings

Tensilica's IVP is a complete image and video processing subsystem

At the 2013 International Consumer Electronics Show (CES) in January, semiconductor companies engaged in a daily competition to one-up each other with announcements of their next-generation application processors. Advances in imaging and video capabilities were central to those announcements, as NVIDIA rolled out Tegra 4 with 72 GPU cores, along with a dedicated computational photography engine for High Dynamic Range (HDR) still image and video recording. Qualcomm followed with their announcement that the Snapdragon 800 Series will support UltraHD video. In their CES press conference, Intel demonstrated advances in User Interfaces (UI), including recognition of finger gestures and eye movement.

It is ironic that Texas Instruments predicted many of these developments three years ago, in a plenary talk ("Harnessing Technology to Advance the Next-Generation Mobile User-Experience") at the International Solid State Circuits Conference (ISSCC), and that they are now exiting the mobile device segment just as the applications are becoming a reality. Greg Delagi, the TI Senior VP who delivered that talk at ISSCC in February 2010, now says that his company will focus on the embedded market (such as "smart cars"), where embedded vision and graphics applications are also expected to grow rapidly but with less volatility and pricing pressure.

Tensilica introduces IVP

DSP silicon IP provider Tensilica is gearing up for these emerging opportunities, and will be demonstrating a new Image and Video Processor (IVP) at the Mobile World Congress (MWC) later this month. Chris Rowen, Tensilica’s founder and CTO, calls the IVP a "major new thrust" for his company, which previously focused on audio and baseband DSP cores. Rowen says that the targeted applications for IVP have common requirements for very high pixel rate and high levels of operations per pixel, along with the need for ease of programming to enable designers to change and upgrade image processing algorithms that continually evolve. The new imaging and video applications that the likes of Intel, Qualcomm and NVIDIA are planning to support require performance to improve at a faster rate than Moore's Law. For example, UltraHD video requires 4X the processing of 1080p video, while applications such as feature recognition or image tracking and identification demand 10X the horsepower of today's Image Signal Processors (ISP). According to Rowen, new architectures are required in order to meet the performance needs with fixed power budgets.
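The 4X figure for UltraHD can be checked from pixel counts alone. The following is a minimal sketch, assuming that UltraHD means 3840 x 2160, that 1080p means 1920 x 1080, and that processing load scales linearly with pixels per frame at an equal frame rate. The resolutions and the function names are illustrative assumptions, not figures quoted by Rowen.

```python
# Assumed frame sizes: the article does not state exact resolutions.
FULL_HD = (1920, 1080)   # 1080p
ULTRA_HD = (3840, 2160)  # common consumer UltraHD format (assumption)


def pixels_per_frame(resolution):
    """Number of pixels in one frame of the given (width, height)."""
    width, height = resolution
    return width * height


def relative_pixel_load(target, baseline):
    """Pixel-count ratio, assuming equal frame rate and equal work per pixel."""
    return pixels_per_frame(target) / pixels_per_frame(baseline)


uhd_vs_1080p = relative_pixel_load(ULTRA_HD, FULL_HD)
print(uhd_vs_1080p)
```

Under these assumptions the ratio covers only raw pixel throughput; the 10X figure for recognition and tracking reflects more operations per pixel and cannot be derived from resolution alone.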

Tensilica will be offering their IVP as a complete subsystem, including the synthesizable core, memory system, and an instruction set which supports 8-, 16-, and 32-bit pixel processing. The IVP architecture will be scalable by number of element engines and processors. The company's lead customers have had early access to the IVP since last year, and with the MWC announcement Tensilica is opening it up for licensing to a broader customer base. Tensilica will be competing with UK-based silicon IP provider Imagination Technologies, known for their PowerVR GPUs, who demonstrated their PowerVR vision ISP at CES, also targeting embedded vision and UltraHD video processing.

IVP Architecture and Software

A key component of any image processor is the memory subsystem. Moving massive amounts of data for frame-to-frame analysis can result in bottlenecks and processing latency. In their IVP, Tensilica allocates two 512-bit-wide memory ports to the core processor for access to local data RAM, and adds a μDMA controller with an additional 512-bit-wide port, for a total of 1,536 bits, or 192 bytes, of data access per cycle.
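The 192-byte figure follows directly from the three port widths. A minimal sketch of that arithmetic, using only the port counts and widths given above (the constant and function names are illustrative, not Tensilica terminology):

```python
PORT_WIDTH_BITS = 512  # width of each memory port, per the article
CORE_PORTS = 2         # ports from the core processor to local data RAM
DMA_PORTS = 1          # additional port used by the DMA controller
BITS_PER_BYTE = 8


def bytes_per_cycle(port_width_bits, port_count):
    """Peak data access per cycle, in bytes, across all ports."""
    return port_width_bits * port_count // BITS_PER_BYTE


total_bytes = bytes_per_cycle(PORT_WIDTH_BITS, CORE_PORTS + DMA_PORTS)
core_only_bytes = bytes_per_cycle(PORT_WIDTH_BITS, CORE_PORTS)
print(total_bytes, core_only_bytes)
```

This is a per-cycle peak. Sustained bandwidth would also depend on the clock frequency and on access patterns, neither of which is given here.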

The parallelism of the IVP Single-Instruction Multiple Data (SIMD) architecture allows for fetching of 16, 24, or 96 bits per cycle, followed by issuing up to 3 commands to the Xtensa control processor, or 4 pixel processing commands per cycle to a set of 32 element engines. Each element engine contains three 16b ALUs, a 16b x 16b multiply-add and a 16b variable shifter, along with dual 16b/8b load/store, and multiple register files. The end result is that the 32 element engines together are capable of up to 96 ALU operations, 32 multiplies, or 64 element loads per cycle. A memory data rotator takes care of allocating the 512b data slices per cycle to the proper element engine.
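The per-cycle peaks quoted above are consistent with multiplying the resources of one element engine by the 32 engines in the array. A minimal sketch of that calculation, assuming one operation per functional unit per cycle (an assumption; the names below are illustrative and not part of any Tensilica API):

```python
ELEMENT_ENGINES = 32         # engines in the SIMD array, per the article

# Resources inside a single element engine, per the article.
ALUS_PER_ENGINE = 3          # three 16b ALUs
MULTIPLIERS_PER_ENGINE = 1   # one 16b x 16b multiply-add
LOAD_UNITS_PER_ENGINE = 2    # dual 16b/8b load/store


def per_cycle_peaks(engines):
    """Array-wide peaks, assuming one operation per unit per cycle."""
    return {
        "alu_ops": engines * ALUS_PER_ENGINE,
        "multiplies": engines * MULTIPLIERS_PER_ENGINE,
        "element_loads": engines * LOAD_UNITS_PER_ENGINE,
    }


peaks = per_cycle_peaks(ELEMENT_ENGINES)
print(peaks)
```

Because the architecture is described as scalable by number of element engines, the same function gives the expected peaks for other hypothetical array sizes.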

Customers who license the Tensilica IVP package will get the synthesizable RTL and EDA tool scripts, along with a reference testbench and set of test cases, design tools, and a Software Developer Kit (SDK). Tensilica provides an Instruction Set Simulator (ISS), Fast Function Simulator (TurboXim), SystemC-based system modeling tools, and the Xenergy energy estimator. The SDK provides the Xplorer IDE GUI and a GNU Software Toolkit. The Xtensa C/C++ (XCC) Compiler includes the capability for auto-vectorization of developers' code. Tensilica includes a set of reference image processing applications and a library of operating system kernel functions.

Tensilica will demonstrate the IVP at MWC on an FPGA prototyping system, based on Xilinx Virtex-7 devices, which they will make available to customers in the coming months. The FPGA development kit includes an image sensor interface and FPGA Mezzanine Card (FMC) from Dream Chip Technologies. Tensilica has also lined up a set of image processing software partners for the IVP launch, including Morpho, Irida Labs, and Almalence.