The Hunt for Power
WATERS BRIEFING
It's a vexing question: How can investment firms increase their computing power and at the same time reduce the number of servers they use? This winter, Waters hosted a breakfast briefing entitled "Leveraging Greater Computing Power for Trading Applications" to discuss this challenge. The panel consisted of Larry Tabb, CEO and founder of The Tabb Group, a market research firm; Thanos Mitsolides, senior vice president, fixed-income derivatives, for Lehman Brothers; and Rick Jacobsen, managing director, financial services for Intel, which sponsored the event.
Waters:
What does managing 1,200 CPUs within fixed-income derivatives at Lehman Brothers entail? Does it involve upgrading and replacing the actual hardware?
Thanos Mitsolides, Lehman Brothers:
No, the hardware doesn't have to change. We keep using it until it falls apart. After about four or five years, it becomes a little difficult to maintain.
Waters:
Does it become obsolete?
Mitsolides:
No, it doesn't actually fall apart. It gets so slow that it is not worth using.
Waters:
Is there a mandate to reduce the number of servers in the back office?
Mitsolides:
Yes. After we tear down the silos between the business technologies and start combining the grids, which we expect to do over the next six months or so, we are going to start using one another's hardware to handle exceptions and to handle load. Let's say a business needs to order new hardware. The new hardware may take two months to come in. Until it arrives, that business can borrow from another business. That's certainly going to increase utilization, and it will allow us to keep less spare capacity around.
Waters:
But at the same time you must still increase the actual compute power. The number of boxes may decrease, but if you had 100 times the capacity, you'd still find a need for it.
Mitsolides:
Exactly. That happens in phases. We see growth of somewhere around 50 percent every year in the amount of power that we actually use. We always try to optimize, of course. You just cannot increase your data center by a factor of 10. It gets expensive. Each year, we increase by 50 percent, let's say, and that's the amount of capacity we can handle these days. We need better software before we can think about increasing our computing capacity more than that.
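The arithmetic behind that constraint is worth making explicit. As a rough, hypothetical illustration (not Lehman's actual figures): at 50 percent annual growth, capacity roughly doubles every 21 months, so the compounded effect approaches a 10x build-out within about six years anyway, just spread over phased budgets.

```python
# Hypothetical illustration: compounding 50% annual capacity growth.
capacity = 1.0  # normalized starting capacity
for year in range(1, 7):
    capacity *= 1.5
    print(f"Year {year}: {capacity:.1f}x starting capacity")
# Year 5 prints ~7.6x; Year 6 prints ~11.4x -- a phased path to the
# same order of magnitude as a one-time 10x data-center expansion.
```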
Waters:
Today's back office is a mix of different solutions, old mainframes next to newer servers, and multiple operating systems. What are the challenges of a mixed back office?
Larry Tabb, The Tabb Group:
It's a challenge. Traditionally, you have clusters, which tend to be similarly oriented hardware wrapped together in the same rack. That tends to work fairly well as a unit, and you can get the operating software from the person who sells the cluster. But the challenge is that when you start trying to share that capacity with other clusters or other hardware, you wind up with different operating systems, BIOSes, chipsets, and services. Then you really need to move from a cluster, which tends to be more tightly integrated, to more grid-oriented solutions, which tend to be more heterogeneous.
We are moving into an era where more and more things can be done in a less expensive chassis, but the issue is how to integrate and leverage it. How do we get the most out of it, and how do I do the things that I once did on a mainframe? You're going to need to invest in software to help manage that.
Rick Jacobsen, Intel:
The back office is typically made up of heterogeneous solutions, where you have all these different configurations and silos. When you start putting together the combinations of different OSes and different versions of the software, the numbers are staggering.
Waters:
What about CPU outsourcing? Is there such a demand for processing power that you would go to a third party and rent their CPUs?
Mitsolides:
Yes, there is. CPU outsourcing is interesting in a number of different ways. It depends on the price. If the price is really low, then we could even stop having our own local hardware. If the price is high, it is still interesting, because we can use it to handle emergencies or big requirements. One way or another, it's very likely we are going to use it.
The main obstacle to CPU outsourcing is that you need to use some kind of standard software for your grid. If your grid software is not standard, it is close to impossible to go to another vendor and borrow CPUs. We are having some discussions now to figure out how to make sure that the next time we need big performance, we can go and use outside CPUs easily.
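The portability Mitsolides describes depends on jobs being submitted through one standard interface rather than being wired to a particular grid. A minimal sketch of that idea, with all class and function names hypothetical:

```python
from abc import ABC, abstractmethod

class GridBackend(ABC):
    """One standard submission interface, so the same workload can
    run on the in-house grid or on rented, third-party CPUs."""

    @abstractmethod
    def submit(self, task: dict) -> str:
        """Submit a task; return a job ID."""

class InternalGrid(GridBackend):
    def submit(self, task: dict) -> str:
        return f"internal-{hash(str(task)) & 0xFFFF}"

class RentedGrid(GridBackend):
    """Stands in for a third-party CPU provider."""
    def submit(self, task: dict) -> str:
        return f"external-{hash(str(task)) & 0xFFFF}"

def dispatch(task: dict, internal: GridBackend, rented: GridBackend,
             internal_utilization: float) -> str:
    # Burst to the rented grid only when in-house capacity is saturated.
    backend = rented if internal_utilization > 0.9 else internal
    return backend.submit(task)

job_id = dispatch({"model": "swap-reval"}, InternalGrid(), RentedGrid(),
                  internal_utilization=0.95)
print(job_id)  # routed externally because utilization exceeds 90%
```

Without a standard layer like this, every burst to an outside vendor would mean rewriting job submission, which is the obstacle he names.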
Waters:
Is this for modeling a portfolio or in the event of a crisis in the back office? What would drive the need for this?
Mitsolides:
The biggest need would come from the traders. Let's say the traders say that their portfolio has unpredictable risk, and they need to calculate risk scenarios for 10 different cases of what would happen if interest rates moved up by 200 basis points. For all the combinations of this, they are going to see what happens to their risk, and they want to do that just for a month. So for that month your need for power is 10 times greater. What are you going to do? Just buy 10 times more hardware and then throw it away? No, you can't do that.
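The shape of that workload is what makes it grid-friendly: every (rate shock, scenario) combination is an independent revaluation that can be fanned out across CPUs. A minimal sketch, with a hypothetical stand-in for the actual portfolio revaluation:

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def revalue_portfolio(case):
    """Hypothetical stand-in for one full portfolio revaluation
    under a single (rate shock, scenario) combination."""
    shock_bp, scenario = case
    return -0.01 * shock_bp * (1 + 0.05 * scenario)

if __name__ == "__main__":
    # 10 scenario cases crossed with several rate shocks, e.g. +200bp.
    # Each combination is an independent task, which is why the
    # traders' one-month study multiplies the demand for capacity.
    cases = list(product([50, 100, 150, 200], range(10)))
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(revalue_portfolio, cases))
    print(f"{len(results)} independent revaluations")  # 40 tasks
```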
Tabb:
The complexity of this is very challenging. Just think of the simplest but most complicated need: all of a sudden, something happens in the market and I'm doing seven times the trading volume that I was doing yesterday. Where am I going to get that capacity from?
I need to reach out to the outsourcing agent, make an agreement, get my reference data out there, get the CPUs allocated, marshal it, control it, make sure it's encrypted, and so on. Then all of a sudden, volume drops back to normal tomorrow, or even in an hour, and I need to relinquish it. How do I do that? Unfortunately, we're not there yet.
Waters:
Are firms using a true follow-the-sun computing plan? Are they using the servers in London or Tokyo while New York isn't working, and vice versa?
Mitsolides:
We are doing that, to some extent.
Waters:
What is the biggest challenge with this method of grid computing?
Mitsolides:
You need better grid software. Just because there are fast machines in London doesn't mean you should send everything you have over there. Because of the network latency and the overhead of moving all of your data around, those fast machines are actually less useful than they seem at first. You need more powerful scheduling. You will also have a few support issues, such as the network going down or the hardware going down. But the main issue comes down to the grid software. I think there are a number of very good grid solutions at this point, but it is likely we are going to have a shake-up in the next five years.
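The trade-off Mitsolides describes is easy to quantify: a remote site only wins if its compute advantage outweighs the cost of shipping the data there. A rough sketch with hypothetical numbers:

```python
def site_time(cpu_rate, data_gb, bandwidth_gbps, latency_s, cpu_seconds):
    """Estimated wall-clock time to run a job at a site: data
    transfer cost plus compute time. Lower is better."""
    transfer = latency_s + (data_gb * 8) / bandwidth_gbps
    compute = cpu_seconds / cpu_rate
    return transfer + compute

# Hypothetical figures: London's machines are twice as fast, but
# shipping 500 GB of data across a 1 Gbps link dominates the job.
local = site_time(cpu_rate=1.0, data_gb=0, bandwidth_gbps=10,
                  latency_s=0.0, cpu_seconds=3600)
london = site_time(cpu_rate=2.0, data_gb=500, bandwidth_gbps=1,
                   latency_s=0.07, cpu_seconds=3600)
print(f"local: {local:.0f}s, London: {london:.0f}s")
# local: 3600s, London: ~5800s -- the "faster" site loses once data
# movement is priced in, which is why the scheduler must account for it.
```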
Waters:
Is Intel hearing more about that follow-the-sun grid computing problem?
Jacobsen:
Absolutely. Intel actually has a 100,000-CPU grid that we use for our own critical applications, and it's certainly a follow-the-sun concept: we use cycles in Israel when people there are at home, and cycles in the Americas at night, when people there are home and not using the machines for engineering. We use all those cycles to run test vectors against the billions of gates in our new products, and to simulate the product and software, before we try to manufacture it in our very expensive fabrication plants, where the chemical process [for making the chips] takes six weeks to run. So it's faster, and much, much cheaper.
It sounds like [Thanos Mitsolides] is looking at platform computing to move his workloads to take advantage of other CPUs, and the reason that capability is available today has a lot to do with this. Virtualization used to be only for the supercomputing class. It became available in software recently, but that was a fairly expensive solution that you had to buy on a per-socket basis. These very new technologies are just now coming to market, and they have exciting capabilities.
Waters:
It sounds like we're waiting for a revolution in grid software.
Mitsolides:
Yes. We expect that in two or three years, it will become very difficult to manage 5,000 to 10,000 CPUs with our current software.
Tabb:
The challenge is the scheduling software that distributes a lot of the information and brings it back together. As we start moving toward more of a utility model and trying to grab more machines, it's the scheduling. It's the data, marshalling everything, coordinating, and bringing it all back together. It's billing and metering. How do I figure out the number of blades that are down, or utilization rates? That's all software, and it gets extremely complicated. There are a lot of moving pieces.
Mitsolides:
Let's say you are sent a request with a million tasks to be done. You're supposed to finish in an hour. Two hours later, it's not done. What do you do? You need to have really good capabilities to track down the problem.
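The operational need here is visibility: not just "the job is late," but exactly which of the million tasks are outstanding. A minimal, hypothetical sketch of that kind of tracking:

```python
import time

class JobTracker:
    """Hypothetical sketch: track a million-task job against its
    deadline and name the stragglers, so a late job can be traced
    to a host, a dataset, or a model rather than a shrug."""

    def __init__(self, task_ids, deadline_s):
        self.pending = set(task_ids)
        self.deadline = time.time() + deadline_s

    def complete(self, task_id):
        self.pending.discard(task_id)

    def overdue_report(self, sample=5):
        if time.time() < self.deadline or not self.pending:
            return None
        return {"outstanding": len(self.pending),
                "examples": sorted(self.pending)[:sample]}

tracker = JobTracker(range(1_000_000), deadline_s=0.0)
for task_id in range(999_990):   # 10 tasks never report back
    tracker.complete(task_id)
print(tracker.overdue_report())
# {'outstanding': 10, 'examples': [999990, 999991, 999992, 999993, 999994]}
```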
Waters:
How do you assign the resources if the equities desk is banging the table and screaming for something? Does the fixed-income desk complain that the servers and CPUs are theirs and that the equities desk will have to wait to use them? Does territoriality play a role?
Mitsolides:
Yes. Certainly, we need to make sure people are comfortable. First, we're moving from the silo solutions into one solution, but most likely we're going to keep everyone using their own hardware. As soon as they're comfortable with the new scheduling software, we're going to start sharing a little bit, slowly. Once people are completely comfortable, we're going to introduce intelligent scheduling algorithms that re-allocate on the fly, without requiring manual permission from anyone.
Tabb:
There are a couple of firms experimenting with creating a quasi-CPU resource exchange, where the application bids for the resources. Depending on the priority of the project and the resources it needs, you might wind up with a Monte Carlo simulation for a portfolio that needs to get done now, so there would be a significant bid for the specific resources it needs. Then you might have a less important task that needs resources but can't afford to bid as high for them. We're seeing some folks playing with those ideas. That ties back to billing and metering for the different resources that are out there. It's a way of saying, "I have a high-priority project; I'm going to pay a lot for it." Because you've paid a lot for it, the chargeback on that allocation is going to be higher than for somebody who says, "Well, this isn't that important, so I can only afford to bid this smaller amount." But because you're bidding lower, it's going to take longer, or you're going to wind up not using the fastest or the most select machines. Those would go to the higher bidder.
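A minimal sketch of the quasi-exchange Tabb describes, where the highest bidder gets the fastest machines and the winning bid becomes the chargeback rate. All names and figures are illustrative, not any firm's actual system:

```python
def allocate(bids, machines):
    """Highest bid wins the fastest available blades; the bid
    itself becomes the chargeback rate for that allocation."""
    jobs = sorted(bids, key=lambda b: b["bid"], reverse=True)
    pool = sorted(machines, key=lambda m: m["speed"], reverse=True)
    allocations = []
    for job in jobs:
        take, pool = pool[:job["cpus"]], pool[job["cpus"]:]
        allocations.append((job["name"], [m["name"] for m in take], job["bid"]))
    return allocations

machines = [{"name": f"blade-{i}", "speed": s}
            for i, s in enumerate([3.0, 3.0, 3.0, 2.0, 2.0, 1.0])]
bids = [
    {"name": "monte-carlo-now", "bid": 90, "cpus": 3},  # urgent, bids high
    {"name": "overnight-batch", "bid": 10, "cpus": 3},  # can wait, bids low
]
for name, blades, charge in allocate(bids, machines):
    print(f"{name}: {blades}, charged {charge} per blade")
# The urgent job wins the 3.0GHz blades; the low bidder gets the
# slower leftovers -- exactly the trade-off Tabb outlines.
```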
Waters:
Would that happen within a single investment firm? If the equities department wants something, and they want it badly enough, they are going to pay for it.
Tabb:
Right. They are going to pay for it.
Waters:
What about the disaster recovery sites? There are these pristine buildings in New Jersey with terrific servers that basically sit idle until there's another disaster. Are those systems being used, or are they just gathering dust?
Mitsolides:
A few years ago, we used to let them lie idle. Of course, we used them in an emergency, and once we started using them, there was no way back. Currently, we use them as a rule, and if we lose one of our sites, the only impact is that performance drops by a factor of two. We cannot afford to keep them idle.
Tabb:
A couple of years ago, we looked at this and in effect predicted that the idea of letting this machinery lie idle is unaffordable. The whole idea of having disaster recovery centers is almost going to be a passé thought, because as you virtualize all of this hardware, and as your virtualization schemes move outside of one data center to multiple data centers, why would you have hardware sitting around doing nothing? If something goes wrong, the grid automatically shifts resources and moves data, so why are you spending money on this? All you really need is places for people to go, with monitors and desktops, so that they can work. The actual data center is almost a passé idea. We're not there yet, but five years from now, there probably won't be a need for this redundant hardware.
Waters:
Which factors are driving the need for grid computing? Is algorithmic trading playing a role?
Tabb:
Yes, and so is market data volume. When you look at the amount of data getting pushed through these machines, and the need to analyze large trades and break them up into smaller pieces, we're just creating more and more processing demand. We're seeing firms use grids to manage data, analytics, and time-series databases, to do stream processing, and to manage order routing.
Waters:
Do you think the need for high-powered computing will ever level off?
Tabb:
No, absolutely not. I think, as [Thanos Mitsolides] said, the more power that's available, the more firms are going to use it, and the more creative ways they're going to come up with to use it. Back when I used my first computer, I thought it was cool. It probably had 8KB of memory and a 30MB hard drive, and that was all we needed. Now, you want quad-core chips running at 3GHz each, thousands of gigabytes of RAM, storage, and more. That's not going to be enough, either. I think as processing becomes cheaper, connectivity becomes faster, and people use more memory and CPUs, you are going to have to figure out creative ways to make money from this.
Jacobsen:
We see more venues opening up every day, and the data coming from those venues is coming faster every day. For any given trade, the amount of data is getting richer each day. The need for historical data in the models is also getting bigger: more history is collected and used in models. So it's bigger, faster, richer, and coming at you from more places at once. That just means more memory, more compute power, and more process modeling.