Tuesday, November 13, 2012

TI combines ARM cores with DSP to accelerate applications in the cloud

Texas Instruments' new Keystone processors provide a combination of ARM Cortex-A15 cores and C66x DSP cores, along with a Secure Acceleration Pac for cryptography, a Packet Accelerator and a 1G/10G Ethernet switch, for accelerating cloud applications.

At this year's Mobile World Congress in February, Texas Instruments announced the semiconductor industry's first quad-core implementation of 28nm ARM Cortex-A15 cores in a small cell wireless base station on a chip, the TCI6636 SoC. For wireless applications, the Keystone II architecture combines TI's C66x DSP cores with ARM's most powerful application processor in their 32-bit ARMv7 instruction set architecture (ISA) family, along with hardware accelerators for 3G/4G baseband applications. Today, at Electronica in Munich, TI will announce that they have leveraged their ARM-DSP hybrid architecture for a new family of processors, targeting cloud-based (or client-server) applications and purpose-built servers.

Tom Flanagan, Director of Technical Strategy for Multicore Processors at TI, says that his company's goal is not to replace the x86 processors which dominate the server world, but to supplement those servers for compute-intensive applications that can benefit from TI's floating point DSPs. Some of the targeted applications are weather modeling, high-performance financial analysis, and embedded vision. The new SoCs integrate control and application processing with Ethernet switching and packet processing functions, which can provide a lower cost, lower power solution for industrial and enterprise servers. The WCDMA and LTE processing accelerators of TI's base station SoCs are removed for these markets.

TI is also targeting the network processor market, which is typically served by MIPS- and PowerPC-based devices from companies such as Cavium. Echoing some of ARM's recent comments regarding their push into the server market, Flanagan says that customers are looking for less proprietary solutions in their next generation systems, which can leverage the breadth of the ARM ecosystem. The new SoCs will provide a variety of ARM-DSP configurations. At the top of the line, the 66AK2H12 and 66AK2H06 pair quad (or dual) Cortex-A15s with eight (or four) C66x DSP cores, for high-performance cloud computing. Enterprise and industrial processors get one DSP core, along with a single (66AK2E02) or quad (66AK2E05) Cortex-A15. For low-power networking applications, TI removes the DSP while retaining their security and packet accelerators and Ethernet switching alongside two (AM5K2E02) or four (AM5K2E04) Cortex-A15 cores.

As in the TCI6636 base station on a chip, all devices in TI's new family of Keystone processors integrate 4MB of shared memory for the ARM cores, with 1MB of dedicated L2 cache per C66x DSP core. TI's multicore shared memory controller provides an additional 6MB of memory in the high-performance computing devices, and 2MB in the enterprise/industrial and networking processors.
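The memory figures above can be added up per device. The following is a minimal sketch that uses only the core counts and memory sizes quoted in this article, not datasheet values. The class name, field names and tier labels are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass

# Figures as quoted in the article (assumptions, not datasheet values).
ARM_SHARED_MB = 4   # shared memory for the ARM cores, same on every device
L2_PER_DSP_MB = 1   # dedicated L2 cache per C66x DSP core


@dataclass(frozen=True)
class KeystoneDevice:
    """One device in the new family; names and fields are illustrative."""
    name: str
    tier: str        # hypothetical label for the market segment
    arm_cores: int   # Cortex-A15 cores
    dsp_cores: int   # C66x DSP cores
    msmc_mb: int     # extra memory behind the multicore shared memory controller

    def on_chip_memory_mb(self) -> int:
        # ARM shared memory + per-DSP L2 + shared memory controller memory.
        return ARM_SHARED_MB + L2_PER_DSP_MB * self.dsp_cores + self.msmc_mb


DEVICES = [
    KeystoneDevice("66AK2H12", "high-performance computing", 4, 8, 6),
    KeystoneDevice("66AK2H06", "high-performance computing", 2, 4, 6),
    KeystoneDevice("66AK2E05", "enterprise/industrial", 4, 1, 2),
    KeystoneDevice("66AK2E02", "enterprise/industrial", 1, 1, 2),
    KeystoneDevice("AM5K2E04", "networking", 4, 0, 2),
    KeystoneDevice("AM5K2E02", "networking", 2, 0, 2),
]

# Total on-chip memory in MB, keyed by device name.
memory_by_device = {d.name: d.on_chip_memory_mb() for d in DEVICES}

if __name__ == "__main__":
    for d in DEVICES:
        print(f"{d.name:9} {d.arm_cores} x A15, {d.dsp_cores} x C66x, "
              f"{memory_by_device[d.name]} MB on chip ({d.tier})")
```

By this arithmetic, the 66AK2H12 carries 4 + 8 + 6 = 18MB of on-chip memory, while the DSP-less networking parts come to 4 + 0 + 2 = 6MB.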

Texas Instruments was one of the founding members of the Embedded Vision Alliance (EVA), and Flanagan emphasized video and image processing as prime applications for the new devices in client-server networks. He sees a resurgence in demand for virtual desktop applications, with the popularity of thin clients such as tablets, which can benefit from performing advanced video processing on a remote server. Video conferencing and machine vision for industrial automation are target applications for the mid-performance tier of processors, with more compute-intensive functions executed in the floating point DSP core. Networking applications don't need the DSP, and control plane functions can be executed directly by the dual or quad ARM cores. Flanagan says that applications for the lowest performance tier SoCs are in Small to Medium Business (SMB) routers and switches, and private cloud infrastructure, such as industrial sensor networks.

TI doesn't utilize an ARM architecture license to customize the processor in their Keystone SoCs. The modifications they make are around the processors, in their proprietary interconnect, a 256-bit fabric that doubles the data bandwidth of ARM's standard AMBA bus. TI has also developed their own low-latency memory controller. The Ethernet switch in the new processors can support 1G or 10G networking. For multicore software development, TI supports C/C++ programming and the OpenMP and OpenCL standards for heterogeneous processors.

The first devices in the family to be available will be the 66AK2H12/66AK2H06, with sample quantities in December. TI is planning to offer evaluation modules (EVMs) for approximately $900 in the second quarter of 2013. Flanagan says that customers can begin developing their applications on the most powerful processors, and migrate to the AM5K2Ex and 66AK2Ex when samples and EVMs become available later, in the second half of 2013.


While offering the new hybrid ARM-DSP devices in several different configurations, weighted more heavily on the DSP side for compute-intensive applications, or solely on the ARM side for lower power applications, TI will be competing with companies such as Analog Devices that have announced more specialized devices for some of the same applications. In their ADSP-BF608 and ADSP-BF609 Blackfin DSPs, ADI is offering a device with specialized hardware accelerators for video analytics. In the financial market, high-frequency trading is such a lucrative business, with brokers paying to shave milliseconds off their trading time, that several companies have developed specialized FPGA-based high-performance computers specifically for the purpose.

For gaming in the cloud, requiring accelerated graphics processing, NVIDIA CEO Jen-Hsun Huang announced his GeForce GRID initiative at the company's GPU Tech Conference (GTC) on May 15th. Huang proposes to virtualize GPUs the way that CPUs are virtualized today. At GTC, he also demonstrated virtual desktop applications, with a virtualized GPU enhancing Citrix XenDesktop. TI's Flanagan acknowledged that cloud gaming would still require a dedicated hardware GPU for graphics rendering. In his GTC keynote, Huang cautioned that GeForce GRID would be a fit for a "certain class of games", and not for enthusiasts. He was accompanied onstage for a demonstration by GAIKAI CEO David Perry. In August, GAIKAI was acquired by Sony. This was the same month that competitor OnLive shut its doors. Given the challenges of bandwidth, latency, and capped data plans from broadband providers, it remains to be seen if Jen-Hsun Huang's vision to "do for games what cable-TV did for videos" can become reality.
