Choosing between DSPs and FPGAs for high-performance signal processing

 

By Edward Young, Managing Director, CommAgility and Paul Moakes, Technical Director, CommAgility (www.commagility.com)

 

Traditionally, the DSP has been the solution of choice for all high-performance signal processing. It provided excellent performance at reasonable power and cost levels; there was a big community of experienced DSP engineers; a large base of off-the-shelf code; and great support from third party vendors.

Over the last few years, though, FPGA vendors, such as Altera and Xilinx, have substantially improved their chips and development tools for signal processing. This means that today the choice between DSP, FPGA or other technologies is not so easy – and in fact, a heterogeneous system combining both DSP and FPGA can often be the best solution.

FPGAs for signal processing

The number one benefit offered by FPGAs is their efficiency in concurrent applications by using multiple parallel processing blocks. Coupled with their flexibility to allow the embedded systems designer to tailor the device to match their application’s demands as closely as possible, FPGAs can achieve the highest possible throughput with low cost per channel.
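The point about parallel processing blocks can be illustrated with a minimal sketch of a multi-channel FIR filter. The two-tap averaging filter and the sample data are hypothetical values chosen only to keep the arithmetic easy to follow, and the function names are illustrative.

```python
def fir(samples, taps):
    """Direct-form FIR filter: y[n] = sum over k of taps[k] * x[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


def filter_channels(channels, taps):
    # No channel depends on another. A processor walks this loop one channel
    # at a time; an FPGA could instead instantiate one filter per channel.
    return [fir(ch, taps) for ch in channels]


taps = [0.5, 0.5]                              # hypothetical averaging filter
channels = [[1.0, 1.0, 1.0], [2.0, 0.0, 2.0]]  # hypothetical sample data
filtered = filter_channels(channels, taps)
print(filtered)  # [[0.5, 1.0, 1.0], [1.0, 1.0, 1.0]]
```

Because the outer loop carries no data dependency between channels, it is the part of the workload that maps naturally onto replicated hardware.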

The FPGA’s flexibility has traditionally come at an additional cost in power, because non-optimised solutions need a higher gate count and more silicon area than hard-wired architectures. However, 65-nm technologies and the use of equivalent ASIC technology for volume manufacture mean that FPGAs can be low-power in the lab, with power reduced further in volume.

The per-channel power of an FPGA may now be well below that of a DSP, even though its chip-level power dissipation is higher. DSPs typically consume 3-4W and FPGAs 7-10W, but FPGAs can handle 10 times the channel density.
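The per-channel argument can be checked with simple arithmetic. The sketch below takes the chip-level power ranges quoted above and normalises the DSP to one unit of channel capacity; the normalisation and the function name are illustrative assumptions, not measured data.

```python
def power_per_channel(chip_power_w, relative_channels):
    """Return chip power divided by relative channel capacity."""
    return chip_power_w / relative_channels


# Chip-level figures from the text. The DSP is normalised to 1 channel unit
# and the FPGA to 10 units (the "10 times the channel density" claim).
dsp_range = [power_per_channel(p, 1) for p in (3.0, 4.0)]
fpga_range = [power_per_channel(p, 10) for p in (7.0, 10.0)]

print("DSP  W per channel unit:", dsp_range)   # [3.0, 4.0]
print("FPGA W per channel unit:", fpga_range)  # [0.7, 1.0]
```

Under these assumptions the FPGA needs 0.7 to 1.0W per channel unit against 3 to 4W for the DSP, despite dissipating more at chip level.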

Recognition of the DSP’s advantages has led in recent years to FPGAs that incorporate DSP technology, for example the Xilinx Virtex-5 SXT devices. This enables the FPGA to take on DSP algorithmic processing for tasks which are not naturally parallel. Such “DSP-enabled” FPGAs have shown huge throughput advantages for certain types of signal processing, which has been reflected in their success in the high-end processing market. However, FPGAs are in general ill-suited to processing sequential conditional data flow.

DSP strengths and weaknesses

Year on year, high-performance DSPs continue to develop, with faster clock speeds and multi-core solutions. High-performance very long instruction word (VLIW) DSPs combine high clock rates with independent execution units to get the maximum speed.

Experienced developers have also built up a wealth of field proven application code to run on the DSP cores. DSP development cost is relatively low, and as a mature technology it can be argued that it has a lower risk and faster time-to-market than FPGAs and other signal processing technologies.

DSPs can be attractive for applications based on emerging standards, which often change frequently and rapidly. As DSP algorithms can be readily implemented in an accessible language such as C, it is easier to update the code to reflect changes in the standards as they occur. In addition, the complex nature of many of the signal processing algorithms in applications such as the latest wireless standards often makes them more suitable for implementation on a DSP. It is also much easier for a DSP to change the processing algorithm on the fly by calling a different software routine. Modern FPGAs can be reconfigured quickly, but doing so dynamically while continuing to process data is a complex and challenging task.
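Changing the algorithm by calling a different software routine can be sketched with a simple dispatch table. The two hard-decision demodulators below and their bit mappings are hypothetical, chosen only to show the mechanism, and are written in Python for brevity rather than in the C a DSP would normally run.

```python
def demod_bpsk(symbols):
    """Hard-decision BPSK: one bit per symbol from the sign of the real part."""
    return [1 if s.real < 0 else 0 for s in symbols]


def demod_qpsk(symbols):
    """Hard-decision QPSK: two bits per symbol from the signs of I and Q."""
    bits = []
    for s in symbols:
        bits.append(1 if s.real < 0 else 0)
        bits.append(1 if s.imag < 0 else 0)
    return bits


# Dispatch table: switching algorithm means selecting another routine.
DEMODULATORS = {"bpsk": demod_bpsk, "qpsk": demod_qpsk}


def process_burst(mode, symbols):
    """Run whichever demodulator the current burst calls for."""
    return DEMODULATORS[mode](symbols)


burst = [complex(1, 1), complex(-1, 1), complex(-1, -1)]
print(process_burst("bpsk", burst))  # [0, 1, 1]
print(process_burst("qpsk", burst))  # [0, 0, 1, 0, 1, 1]
```

The switch between bursts costs one table lookup, and data keeps flowing through the same processing loop while the algorithm changes.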

DSPs are also improving their power efficiency. Led by the demands of the hand-held market, some next-generation high-performance DSPs are incorporating power management techniques from their little brothers. This allows overall system power dissipation to be reduced during times of low traffic, or to prevent over-temperature. A power- and temperature-aware FPGA configuration could, of course, manage its clock domains in a similar way, but at the cost of greater development effort.
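A traffic- and temperature-aware policy of this kind might look like the sketch below. The clock rates, the load threshold and the temperature limit are all hypothetical values, not taken from any device datasheet, and a real device would apply the result through its own power management hardware.

```python
def select_clock_mhz(load, temp_c, full_mhz=1000, low_mhz=500,
                     low_load=0.25, max_temp_c=85.0):
    """Pick a core clock from traffic load (0.0 to 1.0) and die temperature."""
    if temp_c >= max_temp_c:   # protect against over-temperature first
        return low_mhz
    if load < low_load:        # light traffic: trade speed for power
        return low_mhz
    return full_mhz


for load, temp_c in [(0.9, 60.0), (0.1, 60.0), (0.9, 90.0)]:
    print(load, temp_c, select_clock_mhz(load, temp_c))
```

The temperature check comes first so that thermal protection always takes priority over throughput.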

However, the DSP is not particularly well suited to parallel processing tasks: multiple devices can be required for tasks which easily fit into a single FPGA.

Performance compared

It is worth looking at independent benchmarks, and in particular those published in 2007 by BDTI in an analysis of FPGAs in DSP applications (see www.bdti.com). The BDTI tests looked at cost/performance in a typical multi-channel communications application. The results are clear-cut, with the FPGA delivering a cost-per-channel figure more than 20 times better than that of the DSP.

The BDTI benchmark did use FPGA devices in a “highly parallel architecture… designed specifically for the application”, but even taking this into account the results are clear. This does not mean that FPGAs are necessarily best for all high-performance signal processing applications, but it certainly demonstrates that they can have clear performance advantages over DSPs in some circumstances.

One application that places high demands on signal processing hardware is wireless baseband processing. Taking for example the processing of WiMAX Orthogonal Frequency Division Multiple Access (OFDMA) channels, a pure DSP solution cannot match an FPGA in the bandwidth and number of channels it can process. Consequently the DSP solution may have an unacceptable cost and power per channel.

To improve DSP performance in specific algorithms, vendors have introduced hardware cores to handle some processing traditionally off-loaded to FPGAs. For example, TI’s TCI6482 DSP includes Viterbi and turbo decoder co-processors for 3GPP and 3GPP2, while the multi-core TCI6487 DSP also includes a direct Common Public Radio Interface (CPRI) / Open Base Station Architecture Initiative (OBSAI) interface which can be chained between DSPs.

Software and tools

The technology is only one aspect of reaching a successful design. The development tools available also play a vital role in ease of use and time-to-market.

Programming FPGAs is difficult, even after selecting a hardware-oriented language such as Verilog or VHDL. FPGA solutions can take an order of magnitude longer to code than DSP solutions, which increases both development cost and time-to-market.

C-based synthesis tools have yet to deliver the ease of use and performance of C-coded processor solutions. High-level representations such as Simulink block diagram synthesis are not yet widely adopted, and older FPGA synthesis methods persist, especially where maximum performance is required.

Another important factor is the availability of IP cores and software libraries geared to particular target applications, which are often provided by vendors. These can reduce the reliance on in-house development of complex algorithms using the vendor tools, and further reduce time-to-market.

The importance of interconnects

Let’s return to our earlier example of wireless baseband processing. When supporting multiple-input multiple-output (MIMO) systems with channels encoded using spread-spectrum techniques such as CDMA, data from all radio antennas has to be available to all baseband processing blocks. To achieve good performance, the key is an efficient low-latency interconnect.

Serial RapidIO (SRIO) is becoming increasingly popular in this kind of multi-antenna system, as it has a lower protocol overhead than Ethernet and, unlike PCI Express, supports multiple masters. SRIO’s multicast feature is also very important in distributed systems for this kind of application.
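The value of multicast when every baseband block needs data from every antenna can be seen by counting transfers. The sketch below is a simplified model: it counts only the packets each antenna interface must send per block of samples, assumes the switch fabric performs the replication in the multicast case, and uses hypothetical antenna and processor counts.

```python
def source_transfers(antennas, processors, multicast):
    """Count packets the antenna interfaces must send per block of samples."""
    if multicast:
        return antennas            # one packet each; the switch replicates it
    return antennas * processors   # one copy per destination


antennas, processors = 4, 4        # hypothetical system size
unicast_count = source_transfers(antennas, processors, multicast=False)
multicast_count = source_transfers(antennas, processors, multicast=True)
print(unicast_count, multicast_count)  # 16 4
```

In this model the load on each source link grows with the number of processing blocks under unicast, but stays constant under multicast.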

SRIO is also well-suited to the needs of other high-performance signal processing applications, including radar, imaging, and signals intelligence. Here also multicasting can be a useful feature, for example in video processing applications such as IPTV servers, where data is sent to multiple DSPs.

FPGA solutions can suffer when accommodating external interfaces. The number of logic elements taken to implement an SRIO interface today runs to several thousand, which comes at a premium in comparison with the DSP’s hard-wired interfaces. This point is not lost on the FPGA vendors: the Xilinx Virtex-5, for example, introduces a hard-core PCI Express interface. An elegant way to avoid this cost is to use an FPGA as a co-processor to a DSP, connected via the DSP’s external memory interface (EMIF) bus, which allows data to be DMA’d to and from the FPGA at little cost in logic elements or DSP processor overhead.

Hybrid Multi-processor Systems

From a design engineer’s point of view, the neck-and-neck development of FPGA and DSP technology is opening up new and better solutions for signal processing applications. There is no simple answer as to whether FPGAs or DSPs are superior, and for many applications the best approach is a hybrid system, including both technologies to provide a solution that is greater than the sum of its parts.

 

Figure 1: Typical blade-level subsystem

Figure 1 shows a typical blade-level subsystem, which includes four Texas Instruments DSPs and one Xilinx FPGA. In addition to EMIF connections from the DSPs to the FPGA, which allow co-processing with minimal overhead, it has a full SRIO architecture that can be used for radio data distribution and for low-latency direct memory access between devices, both on and off the card.

The scalability of the Advanced Mezzanine Card (AMC) form factor extends across the whole chassis, especially when systems are built using SRIO as the primary data transport. In either the Advanced Telecom Computing Architecture (ATCA) or MicroTCA chassis system, integrators have the option of mixing and matching DSP-centric and FPGA-centric blades to get the right balance of technology.

Protocols such as SRIO and standards such as AMC enable designers and system integrators to develop efficient hybrid systems, managing the balance at both the blade and the system level.

CommAgility’s strategy is to provide a scalable range of AdvancedMCs with varying numbers of FPGAs and DSPs. This includes the AMC-D4F1 (with four Texas Instruments TMS320C6455 DSPs and one Xilinx Virtex-4 FX series FPGA), and the soon-to-be-announced AMC-D1F3, which reverses the balance to include one DSP and three FPGAs. This allows developers to vary the technology used depending on their overall processing requirements, stage of application development and optimisation, and experience with the existing code base for DSPs and FPGAs. Using SRIO both on the card and in the chassis allows the various elements to be brought together; the AMC-D4F1 provides two high-speed 10Gbps links off the card, achieved using two 4x SRIO interfaces.
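One way to arrive at a 10Gbps figure for a 4x SRIO interface is sketched below. It assumes each lane runs at 3.125Gbaud with 8b/10b line coding, and that the quoted figure is the data rate after line coding; the article does not state either, so treat both as assumptions. Packet and protocol overheads are ignored.

```python
def srio_data_rate_gbps(lanes, lane_gbaud):
    """Data rate after 8b/10b line coding (8 data bits per 10 line bits)."""
    return lanes * lane_gbaud * 8 / 10


# Assumed lane rate of 3.125 Gbaud; not stated in the article.
per_link = srio_data_rate_gbps(lanes=4, lane_gbaud=3.125)
print(per_link)      # 10.0
print(2 * per_link)  # 20.0 across the two 4x links
```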

In MicroTCA, the imminent ratification of the AMC.4 specification for SRIO distribution from an AMC backplane will be the final piece in the system jigsaw. This has not prevented an SRIO AMC ecosystem from flourishing in preparation, with multiple vendors now providing SRIO support for both MicroTCA Hub Carriers and control and signal processing AMC cards.

Beyond DSPs and FPGAs

For high-performance signal processing applications there are, of course, other options. Used together, DSPs and FPGAs provide strong competition for all of these alternatives, but the alternatives are still worth considering.

One disruptive technology for signal processing is the development of massively parallel processors such as picoChip’s multiple-instruction multiple-data architectures. They do not require particularly high clock-speeds to outstrip the performance of DSPs.

However, configuring them requires the use of the vendor’s proprietary toolset. Developers used to off-the-shelf algorithms from a range of sources, both open source and commercial, may not wish to trust their project’s success to vendor fixes and enhancements.

Application Specific Integrated Circuits (ASICs) and Application-Specific Standard Products (ASSPs) are also well-suited to certain signal processing tasks, but the high up-front costs of these technologies rule them out in many cases. For high volume applications, they are certainly worth considering.

Conclusions

Recent developments in FPGA technology have overturned many long-held preconceptions about FPGA use, and have addressed many engineers’ concerns about power, cost and complexity. However, developing signal processing applications on FPGAs still requires significantly more effort than for DSPs, even with the high-level development tools and libraries available from FPGA vendors. Finding the right engineers with DSP and system-level experience to develop applications on FPGAs can also be tricky.

The key advantages of the DSP are reduced development time for new and complex algorithms, and flexibility to run many different algorithms. For the FPGA, its number one benefit is efficiency gains from parallel processing. In many applications, such as image processing and wireless baseband processing, there is a mixture of these repetitive, simple processing tasks that are best suited to an FPGA and more complex and less predictable tasks that are perhaps better handled with a DSP. This means that a hybrid system containing DSPs and FPGAs can often provide the best overall solution.

In the near future at least, FPGAs are unlikely to eliminate the need for DSPs. Additionally, as parallel processing blocks implemented in FPGAs become mainstream, they are increasingly likely to be integrated into the DSP vendors’ silicon. FPGAs and DSPs should therefore be seen as complementary, not competing, technologies.

 

Figure 2: Software view of a SRIO system using the RapidFET system management and analysis software. A combination of end-points and switches can be seen. The system contains various cards including the CommAgility AMC-D4F1 card.

Figure 3: N.A.T. management software view of a MicroTCA system including the CommAgility AMC-D4F1 card.

 

Figure 4: View of AMC-D4F1 baseboard showing TI DSP, Xilinx FPGA, Tundra SRIO switch, Broadcom Ethernet switch and various other components.

 

Figure 5: CommAgility’s AMC-D4F1 module (incorporating four TI DSPs and one Xilinx FPGA).