From Grid to Great

SPECIAL FEATURE

Financial firms are still exploring the best ways to exploit the powerful and versatile grid technology. In the panel discussion entitled "Optimal Use of the Grid—Migrating and Managing Appropriate Applications," experts from top investment firms discussed the challenges of getting the most out of their grids. The panel consisted of Robert Porter, director, information technology, Dresdner Kleinwort; Adam Vile, head of grid and high-performance computing, Excelian Ltd; and Theo Gonciari, senior system architect, Bank of America. The panel was moderated by Octavio Marenzi, founder and CEO of consultancy Celent.

Moderator:

We have heard about which applications should be put on a grid, but which applications do not belong on a grid?

Adam Vile, Excelian Ltd:

It's interesting to note that you can put some applications on the grid without making them any faster. Grid computing can actually make some things slower. There are latency-intolerant applications that shouldn't be put on a grid.

The original intention of a grid was to take embarrassingly parallel calculations and distribute them as widely as possible. It was never intended to take whole applications and ship them off to be calculated somewhere else. Any advantage of distributing an application that can't be broken into small pieces, or a basket of such applications, has to be weighed against the latency of the distribution. That is the real problem: getting the data to the right place at the right time. Let's make it absolutely clear that grid isn't one-size-fits-all.

Robert Porter, Dresdner Kleinwort:

It depends on what sort of grids we're talking about. To start with the compute grid side, most applications are unsuitable for the grid. Those that are suitable actually fit into a fairly obvious sphere. It's about running scenario analyses, risk management and pricing very complicated products, such as Monte Carlo simulations.

Where analytics are involved, there's a good argument for those kinds of compute grids. Anything that requires a deterministic response time begins to be a struggle, and I would look at a different kind of solution in that area. If you need a dedicated resource for something, then you should give it a dedicated resource.

Vile:

Everything is appropriate for putting on the grid, purely because the grid has functions and features that are very useful. It depends on which area you are coming from. If you're coming from the infrastructure perspective and you want something scalable and resilient, with managed job submission, then the grid is the fabric, the infrastructure, that will let you do that. When you submit a job to the grid, it will complete the job.

On the other hand, if you wish a job to complete within a certain period of time, then there are some jobs for which you don't want to do that. One way of thinking about grid is as virtualizing a whole collection of machines into a single machine. You take hundreds or thousands of blades, put a single operating system on top, submit your job, and it will get executed somewhere and completed. As long as the services are available on the grid, the grid is a good fabric for doing that.

Moderator:

Bank of America is running four applications on its global grid. How do you handle scheduling when certain user groups compete with each other for grid resources? How do you prioritize?

Theo Gonciari, Bank of America:

When we put together the global grid within Bank of America, each of the applications on the grid came with its own pool of resources. The applications obviously have ownership of those resources, and therefore get priority on them.

At the same time, we've developed a global grid monitoring tool that allows us to determine the usage profiles of grid users across the applications. Using that tool we can tune the scheduling and experiment with certain scheduling configurations to get better utilization of resources.

To give you an example, we've managed to increase fourfold the number of calculations the exotics desk runs, while increasing the cost by only 20 percent, because of the amount of sharing we can do on the grid.
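The owner-first arrangement Gonciari describes, where each application has first claim on its own pool but idle capacity elsewhere can be borrowed, can be sketched roughly as follows. This is a minimal illustration; the pool names and the borrowing policy are hypothetical, not Bank of America's actual scheduler.

```python
# Toy owner-priority scheduler with borrowing of idle capacity.
# All names and the policy are hypothetical illustrations.

def assign(job, pools):
    """Place a job on its owner's pool first; otherwise borrow idle capacity."""
    home = pools[job["app"]]
    if home["free"] > 0:
        home["free"] -= 1
        return job["app"]                      # owner always has first claim
    # Borrow from whichever other pool currently has the most idle CPUs.
    donor = max((p for a, p in pools.items() if a != job["app"]),
                key=lambda p: p["free"], default=None)
    if donor and donor["free"] > 0:
        donor["free"] -= 1
        return donor["name"]
    return None                                # queue until capacity frees up

pools = {
    "exotics": {"name": "exotics", "free": 0},
    "rates":   {"name": "rates",   "free": 2},
}
print(assign({"app": "exotics"}, pools))  # exotics pool is full: borrows "rates"
```

A real scheduler would add preemption and return borrowed CPUs to the owner on demand, which is what makes the owner's SLA defensible.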

Moderator:

Any thoughts about managing the grid and different competing user groups?

Vile:

I think sharing is one of the hot topics. If you take the grid at large with its history, one of the reasons it was put in place was for universities that didn't have the compute resources to do the kind of calculations they wanted to do.

Trading desks and traders are used to buying servers, or having the IT department buy them. I've certainly been in organizations where they ask about the number of servers, and I answer, "You have X thousand, and that's it; those are the ones you use."

The technology is there and allows us to do it, but the politics is the issue. You can put three grids together and have lending and borrowing across the grids, or you can partition some of the grids so that only some can be lent and borrowed.

At the very top level, I think the key driver is to have a central team whose responsibility it is to agree SLAs with the business and to deliver to those SLAs. Until such time as they can prove they can deliver to those SLAs, the business side is always going to be a little bit nervous.

There are benefits, of course, because you can say, "Well, actually, you've got 1,000 CPUs now, but when you need them at 9 p.m., you're going to have 1,500, because the guys over in Asia don't need them." So, that's one sell.

The other thing is to say, "You don't have to buy servers," because servers cost thousands of dollars or pounds per year each to maintain. They all come directly out of the bottom line of the trading desk one way or the other. IT doesn't have to buy servers, necessarily.

They can rent CPU time, paying per use within the organization, not outside it. Say, "Sure, you want your SLA; you can have 1,000 CPUs for that. You'll be using them between 6 p.m. and 9 p.m., and it's going to cost you this much." What that means is that a trader can make a decision about pricing, valuing or doing a trade, because all they have to understand is how much it's going to cost to risk-manage that trade for however many years the trade runs. The trader can calculate the CPU hours they'll need to do that.
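As a back-of-the-envelope sketch of the calculation Vile has in mind, a trader could price in the compute cost of risk-managing a trade over its life. All figures below, including the internal per-CPU-hour rate, are hypothetical and not from the panel.

```python
# Hypothetical pay-per-use estimate of what a trade's nightly risk run
# would cost under an internal CPU-rental model.

def trade_compute_cost(cpus, hours_per_night, nights_per_year,
                       years, rate_per_cpu_hour):
    """Total internal charge for the CPU time a trade needs over its life."""
    cpu_hours = cpus * hours_per_night * nights_per_year * years
    return cpu_hours * rate_per_cpu_hour

# 1,000 CPUs in the 6 p.m.-9 p.m. window (3 hours), 250 business nights a
# year, a 5-year trade, at a notional $0.05 per CPU-hour.
print(f"${trade_compute_cost(1000, 3, 250, 5, 0.05):,.0f}")  # $187,500
```

The point of the exercise is that the number comes out small enough, and predictable enough, to be folded into the pricing of the trade itself.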

Moderator:

That sounds like you're suggesting running the grid as a profit center within the organization and renting to the different business groups.

Vile:

Why not?

Moderator:

How are these issues handled at Dresdner Kleinwort?

Porter:

We started off with the very optimistic viewpoint that we would have one grid, and quickly found there were very good reasons why we needed to segregate. The reasons weren't just political but also technical.

We ended up segregating the grid quite significantly, while trying to maintain an overall common architecture for the way things run on the grid, so that when we need to re-segregate it's only a configuration change. When the organization changes, as it does regularly, we're not bound into some silo we created for ourselves; we can reallocate resources.

The production grids are fairly segregated, with minimal lending agreements, but other use patterns have been emerging while we've been running the grid. Take the test grid: previously, as we scaled and scaled our derivatives applications, we never really thought, "Well, let's have an identical test grid over here." It just didn't happen. No trader was going to sign off on buying the equivalent 1,000 CPUs.

So the test grid is a free-for-all for developers. The infrastructure is maintained, but the actual usage is, "I'll deploy something; I'll run something completely ad hoc." That's an interesting sharing arrangement in itself. We need far fewer resources in a test grid to support the number of developers we have than we would if we replicated the production system.

Another use pattern that we established more recently is what we call the front-office grid. The front office doesn't really want a production grid, but they do want production quality. They want to make changes every day, release new applications every day, and have a lot more control over what's running on there. So we established a front-office grid that has a much lighter change-management scheme attached to it. That's been quite interesting.

The resources that you might apply to those various types of grids are changing as well. You're starting with your production grid, and it all has to be in the datacenter for regulatory reasons. All sorts of security people will make it very hard for you to run it outside a datacenter.

The front office and the test grid are all scavenged resources; they take resources from the business continuity centers and assorted desktops. Ironically, you get most of those resources at night, not during the day, so I hope that in the future we start allocating resources properly, in different ways to the different types of grids we move toward.

Moderator:

How do you feel about the "s" word: scavenging?

Vile:

I don't think scavenging is the right word. It's too negative. I think renaming it harvesting is important, because it is collecting resources together rather than perniciously stealing them from people.

I've only got one thing to say about scavenging, which is that you can't guarantee SLAs with scavenged resources. But if you're happy with a heterogeneous environment, where some nodes run slower than others, where some nodes won't be available to you, where people turn their machines off, or where someone is downloading and listening to MP3s or converting a JPEG just when you're trying to do your critical calculation, then harvesting resources is perfectly adequate.

Moderator:

If I were to ask you to look into the future, where do you see grids going? What kinds of tasks will be put on the grid?

Vile:

I see that within the next 18 months there will be a number of enterprise grids distributing work across shared resources. Now, maybe 18 months is too soon but certainly the technology exists now to enable us to do that.

The second thing that will happen to grid, of course, is what we've already heard about as "Grid 2." The people who invented grid in the first place have re-branded it and put out a new book that we can all buy. But essentially Grid 2 is the data grid: moving toward getting the data to the right place at the right time.

If you suddenly have an enterprise grid, or widen it to thousands of nodes, and you run a calculation on one node over there, you've got to get the data there. As databases are scaled, data caching will become more and more prevalent across our data grids and our infrastructure.

The third thing I see is the integration of acceleration technology into our grids. Grid is not high-performance computing; grid is high-throughput computing. It can enable higher performance by distributing work out, so that a user puts a request in at the top and the grid knows which node the job can execute on, whether that node has an accelerator attached, whether it's appropriate to use it, and what data the job requires.
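The data-grid point above, getting data to the node that needs it, is essentially a caching problem. A toy sketch of a node-local cache sitting in front of a slow central store illustrates why caching matters once work is spread across thousands of nodes; the fetch function and keys here are hypothetical stand-ins.

```python
# Toy node-local data cache in front of a slow central store.
# fetch_from_store stands in for an expensive remote read (hypothetical).

class NodeCache:
    def __init__(self, fetch_from_store):
        self._fetch = fetch_from_store
        self._local = {}
        self.remote_reads = 0

    def get(self, key):
        if key not in self._local:          # miss: pay the remote latency once
            self._local[key] = self._fetch(key)
            self.remote_reads += 1
        return self._local[key]             # hit: served from the node itself

cache = NodeCache(lambda k: f"marketdata:{k}")
cache.get("EURUSD"); cache.get("EURUSD"); cache.get("USDJPY")
print(cache.remote_reads)  # 2
```

Only the first read of each key crosses the network; repeated reads on the same node are served locally, which is the behavior a data grid generalizes across the whole farm.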

Gonciari:

We've already done a fair amount of sharing within the bank but I feel like it's going to increase. We're going to add more applications onto the grids and sharing between applications is going to be key going forward.

We have seen high demand for more intraday, or more real-time, calculations. One of the solutions we put out was for the exotics desk, pumping around 150 to 200 megabytes' worth of data back to the client within a minute. There's a lot of data going back and forth to get the trader results as soon as possible. So I think more intraday grids will definitely be in demand going forward.

Porter:

I expect the grid to become a commodity. At the moment it still takes quite a lot of effort to go and buy a grid, install it, get support and have the application guys work out how to use it. Just as relational databases are simply part of the landscape, I expect grids to be part of the landscape, as invisible as the network.

Perhaps in the longer term, compute grid and data grid will disappear and there will just be grid. Maybe somebody will come out with a product that does both or merges the products into doing both. That might change the marketplace.

I think it's already quite impressive that front-office people assume there are hundreds and hundreds of CPUs behind them when they hit a button. This wasn't really possible a few years ago; if it happened at all, it was largely developed in-house. I can imagine that in a few years there will be thousands of CPUs behind that button, getting them their numbers much faster.
