Welcome to the New World of Multi-Core Chips

OPEN PLATFORM

By Benjamin Freidlin

Things are about to change dramatically for software developers and application architects who are responsible for building high-performance systems in the financial world. For decades, developers could rely on a constant stream of incrementally improved processor speeds from Intel, Advanced Micro Devices (AMD), IBM, and Sun Microsystems to increase the performance of their software.

These processor improvements would use the same underlying instruction set and simply increase the clock speed, giving developers instant performance gains with little, if any, software refactoring required. If your in-memory message processing system could parse through 5,000 messages per second, throwing the latest Intel Pentium chip at it might increase that number by 10 to 15 percent, not counting external dependencies such as network traffic and memory latency.

Software developers have been living in a world that is highly abstracted from the underlying hardware. We've seen this in the trend over the past 10 years toward so-called managed languages such as Sun's Java and now Microsoft's .NET Framework. The implementers of these language runtimes have been able to shield developers from subtle but powerful processor innovations at the instruction level simply by leveraging those features dynamically through runtime compilation.

Single-Core Power Waning

While systems written in these newer languages can still benefit from the automatic use of instruction set improvements, increases in processor clock speed will no longer equate to automatic increases in application performance. This type of free, hardware-driven benefit will become less and less common in the coming years as the major chip designers make multi-core chips the norm. Multi-core chips have been available from AMD and Intel for the past year, specifically as dual-core offerings.

Multi-core chips essentially combine more than one CPU core on the same chip surface and, in the best case, surround the cores with an architecture that takes full advantage of this fact. A multi-core chip is a close cousin to a multi-processor setup, except that in a multi-processor box the cores sit on separate chips and are often separated by a less-than-optimal bus.

So why are chipmakers turning to multi-core chips, and what does it all mean for developers? In essence, the rationale boils down to the performance limitations of classical chip designs. Intel and AMD spent many years and billions of dollars on a design principle that relies on extremely long pipeline architectures. Part of this was driven by marketing, because everyone except semiconductor enthusiasts tends to equate higher clock rates, in the form of MHz and later GHz, with better performance. To keep selling chips that could scale to faster clock rates, designers had to rely on ever-longer instruction pipelines. The eventual consequence was that it became increasingly difficult to hide the flaws of these lengthy pipelines from the software running on them. Chip designers came up with many clever tricks but, at a certain point, there was little more they could do.

There were other constraints as well, such as the electrical signal leakage that occurs as transistors are packed ever closer together on the chip surface, and the widening gap between the rate at which the CPU executes instructions and the much slower improvement in the latency and throughput of main system memory. The latter problem constantly plagues developers facing workloads that operate on large sets of discontiguous data in real time. It is often the case that an algorithm, including those used for trade execution, is mathematically efficient on paper but, when executed on a computer, reveals that the memory system cannot feed the CPU fast enough, wasting a good portion of the power already inherent in many hardware platforms.
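
To make the problem concrete, here is a minimal Java sketch (an invented micro-benchmark, not drawn from any production system). Both loops perform identical arithmetic over the same array, but the strided traversal defeats the cache, so the CPU spends most of its time waiting on main memory; exact timings will vary by machine.

    // Both methods compute the same sum, but the memory access order differs.
    public class MemoryBound {
        static final int N = 4096;
        static final int[][] data = new int[N][N];

        static long sumRowMajor() {        // sequential access, cache-friendly
            long sum = 0;
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    sum += data[i][j];
            return sum;
        }

        static long sumColumnMajor() {     // strided access, cache-hostile
            long sum = 0;
            for (int j = 0; j < N; j++)
                for (int i = 0; i < N; i++)
                    sum += data[i][j];
            return sum;
        }

        public static void main(String[] args) {
            long t0 = System.nanoTime();
            long s1 = sumRowMajor();
            long t1 = System.nanoTime();
            long s2 = sumColumnMajor();
            long t2 = System.nanoTime();
            System.out.println("row-major:    " + ((t1 - t0) / 1000000) + " ms");
            System.out.println("column-major: " + ((t2 - t1) / 1000000) + " ms");
            System.out.println(s1 + s2);   // use the results so they aren't optimized away
        }
    }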

When these problems were assessed together, the prospects for classical superscalar, single-core chip architectures looked grim. Reality eventually sank in, first at AMD and later at Intel, and chip designers began to search for a better way to spend the growing transistor budget. They decided to scale back the clock rate of a single core by a small amount and make room for a second core on the same surface. They then implemented a fast bus between the two cores and the chip's caches, yielding a package that roughly doubles the available computing power.

But simply replacing a single-core CPU with a similarly clocked dual-core version will not make a difference to your application unless a number of factors are in place. First, the new multi-core chips directly benefit only applications that use more than one operating system thread. If you are running a single-threaded process alone on a server, a multi-core chip will have no direct effect on its performance.

The reason should be clear: the second core has nothing to do while the first core runs your application's single thread of execution, presuming the operating system has nothing of its own to attend to. This is where we as developers start paying for the lunches the semiconductor industry has been giving away for years. The onus is on us to go back to our code and refactor it to use at least one other thread. This task is not an easy one. Properly multithreaded application design requires exemplary discipline on the part of programmers, as well as a bit of luck.
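
As a minimal sketch of what that refactoring can look like, assume a message-processing loop whose messages can be handled independently of one another. The fragment below (class and method names are invented for illustration) hands the work to two worker threads pulling from a shared queue, so the second core finally has something to do:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class TwoWorkerPipeline {
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<String>();

        private final Runnable worker = new Runnable() {
            public void run() {
                try {
                    while (true) {
                        String msg = queue.take();   // blocks until work arrives
                        process(msg);                // each message is independent
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };

        void start() {
            new Thread(worker, "worker-1").start();
            new Thread(worker, "worker-2").start(); // gives the second core work
        }

        void submit(String msg) throws InterruptedException {
            queue.put(msg);
        }

        void process(String msg) {
            // parse and route the message here
        }
    }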

Out of Hibernation

For starters, some of the problems addressed by your application must be data independent; that is, the application must be able to break its job into pieces that can be executed in parallel. This basically means that the data one thread depends on must be completely independent of the data a second thread is working on. A good example is computer graphics. Each pixel in an image might need processing, and if each pixel can be processed independently of the one that precedes it and the one that follows it, then every pixel in a 1,024 x 768 image could, in principle, be processed simultaneously, if we had a chip with 786,432 cores. You'll want to find areas of your application where, at least momentarily, multiple threads of execution can operate on multiple pieces of unrelated data, even if that independence holds only for a brief, protected moment.
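
A minimal Java sketch of this kind of data decomposition, using an invented brighten operation on a flat pixel array, might look like the following; each thread works on its own slice, so no locking is needed while the threads run:

    public class ParallelPixels {
        // Splits the pixel array into slices and brightens each slice on its own thread.
        static void brightenAll(final int[] pixels, int threadCount)
                throws InterruptedException {
            Thread[] workers = new Thread[threadCount];
            final int chunk = pixels.length / threadCount;
            for (int t = 0; t < threadCount; t++) {
                final int from = t * chunk;
                final int to = (t == threadCount - 1) ? pixels.length : from + chunk;
                workers[t] = new Thread(new Runnable() {
                    public void run() {
                        for (int i = from; i < to; i++)
                            pixels[i] = brighten(pixels[i]); // depends on this pixel only
                    }
                });
                workers[t].start();
            }
            for (Thread w : workers)
                w.join();   // wait for every slice to finish
        }

        static int brighten(int pixel) {
            return Math.min(255, pixel + 16);
        }
    }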

This kind of independence is not always easy to identify and is uncommon in the financial software realm. That doesn't mean there isn't opportunity to leverage multi-core chips for financial applications of all types on Wall Street, especially in the areas of derivatives pricing, back-end order routing and real-time user interfaces. It means the job requires a little more creativity and, in some cases, a rethinking of how we architect our systems.

Even if your systems don't have large, independent sets of data, careful synchronization of threads and vigilant guarding of shared data can still yield powerful performance gains from a multi-core chip. And in cases where there are no such opportunities within an application itself, there are still indirect benefits, even for applications that are single-threaded. If you run more than one single-threaded process on a machine, and each process demands more than 50 percent of a single core's time, you can roughly double the throughput of each by moving them to a dual-core chip, since each process now gets a core to itself. One example is the trader's desktop environment. Even if a trader relies on a handful of single-threaded applications, as a collective whole they should become more responsive, because in theory they face less contention thanks to the additional core. The same is true of back-end processes that manage client connections.
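
Where the data can't be partitioned outright, shared state has to be guarded. Here is a simple, hypothetical Java sketch in that spirit: many order-handling threads update one position total, and the lock is held only for the brief moment the shared value is touched:

    public class PositionBook {
        private final Object lock = new Object();
        private long netPosition = 0;

        // Called concurrently by order-handling threads running on different cores.
        public void applyFill(long signedQuantity) {
            synchronized (lock) {           // keep the critical section short
                netPosition += signedQuantity;
            }
        }

        public long snapshot() {
            synchronized (lock) {
                return netPosition;
            }
        }
    }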

In the end, multi-core chips are a welcome development, even if they amount to the big chipmakers' detour away from substantial single-threaded performance enhancements. Don't forget that each core's performance will continue to increase as well, just not at the pace it used to, and with less overall effect. Multi-core chips offer a lot of potential performance at a great price.

Developers must take proper care when writing multithreaded applications. Multithreaded code can be especially difficult to debug, and even veteran developers will discover race conditions and deadlocks that have been hibernating in their code until the day they are unleashed on a true multi-core processor. As technologists, we need to be aware of these challenges while staying ready to harness the opportunities this new generation of hardware brings with it.
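
To see how such a bug hibernates, consider this deliberately broken Java sketch. The unsynchronized increment often appears to work on a single core, where the two threads rarely interleave mid-update, but on a multi-core chip it reliably loses updates because count++ is a read-modify-write sequence, not an atomic step:

    public class LatentRace {
        static int count = 0;   // shared and unguarded: the bug

        public static void main(String[] args) throws InterruptedException {
            Runnable bump = new Runnable() {
                public void run() {
                    for (int i = 0; i < 1000000; i++)
                        count++;            // racy read-modify-write
                }
            };
            Thread a = new Thread(bump);
            Thread b = new Thread(bump);
            a.start(); b.start();
            a.join(); b.join();
            System.out.println(count);      // frequently far less than 2,000,000
        }
    }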

Benjamin Freidlin is the chief technology officer (CTO) of Vizalytics, a Westchester, N.Y.-based technology vendor specializing in high-performance software and hardware solutions for both buy-side and institutional trading desks. Freidlin has 12 years of experience in high-performance computing and real-time financial technology and has served in many development and consulting roles. Prior to joining Vizalytics, Freidlin was with UBS Securities, where he contributed to the development of real-time trading systems. He is also a four-year veteran of Microsoft, where he was responsible for helping Fortune 500 firms adopt the company's latest technology. He has designed and implemented high-performance systems for JPMorgan, IBM Research and Salomon Smith Barney. He can be reached via e-mail at benjamin@vizalytics.com.
