JPMorgan, Merrill Lynch Take Grids Global
LONDON—The ultimate goal of utility computing is the global grid, where all resources are shared and applied dynamically to meet application needs. Although many challenges remain, JPMorgan and Merrill Lynch are two banks striving to reach this goal.
"We have a global grid utility model across several datacenters in each region," says Mike Ryan, chief technologist of IB architecture at JPMorgan. The bank has built its own meta-scheduling across the WAN on top of the vendor grid schedulers that work on LANs. It has also built a common GUI that gives an aggregated, synthesized view of applications across datacenters and software vendors, so staff need not log into separate datacenters to monitor grid operations. "It's all in one place," says Ryan.
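The layering Ryan describes can be pictured with a minimal sketch. It is an illustration only, not JPMorgan's design: the class names, the slot counts and the "most free capacity wins" placement policy are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class LocalScheduler:
    """Hypothetical stand-in for a vendor scheduler running one datacenter's LAN."""
    datacenter: str
    total_slots: int
    used_slots: int = 0

    @property
    def free_slots(self) -> int:
        return self.total_slots - self.used_slots

    def submit(self, slots: int) -> None:
        if slots > self.free_slots:
            raise RuntimeError(f"{self.datacenter} has insufficient capacity")
        self.used_slots += slots


class MetaScheduler:
    """Picks a datacenter across the WAN, then defers to its local scheduler."""

    def __init__(self, schedulers: list[LocalScheduler]) -> None:
        self.schedulers = schedulers

    def submit(self, slots: int) -> str:
        # Assumed policy: place the job where the most slots are free.
        target = max(self.schedulers, key=lambda s: s.free_slots)
        target.submit(slots)
        return target.datacenter

    def overview(self) -> dict[str, int]:
        # One aggregated view of free capacity, in the spirit of the common GUI.
        return {s.datacenter: s.free_slots for s in self.schedulers}
```

A real meta-scheduler would also weigh latency, data locality and SLAs, which this sketch ignores.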
The consolidated view also supports JPMorgan's rigorous two-supplier policy, which covers hardware, storage, networking, grid middleware, data caching, and even operating systems. "For example, we use Platform Computing and Condor, the open-source platform, as our dual suppliers for grid scheduling," says Ryan.
JPMorgan also puts a thin software abstraction layer in front of each supplier product, so the programmer sees just one interface, which is then mapped onto the chosen proprietary interface. "This keeps vendors on their toes and adds flexibility," says Ryan.
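In code, such a layer often takes the shape of an adapter pattern. The sketch below is a generic illustration under that assumption. The interface, the adapter classes and the job-ID formats are hypothetical and do not reflect any supplier's real API.

```python
from abc import ABC, abstractmethod


class GridScheduler(ABC):
    """The single interface application programmers would code against (hypothetical)."""

    @abstractmethod
    def submit(self, command: str) -> str:
        """Submit a job and return an identifier for it."""


class SupplierAAdapter(GridScheduler):
    def __init__(self) -> None:
        self._count = 0

    def submit(self, command: str) -> str:
        # A real adapter would translate the call into supplier A's proprietary API.
        self._count += 1
        return f"A-{self._count}"


class SupplierBAdapter(GridScheduler):
    def __init__(self) -> None:
        self._count = 0

    def submit(self, command: str) -> str:
        # A real adapter would translate the call into supplier B's proprietary API.
        self._count += 1
        return f"B-{self._count}"


def make_scheduler(supplier: str) -> GridScheduler:
    # Swapping suppliers becomes a configuration choice, not an application rewrite.
    adapters = {"supplier_a": SupplierAAdapter, "supplier_b": SupplierBAdapter}
    return adapters[supplier]()
```

Application code calls only `GridScheduler.submit`, so either supplier can sit behind it.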
Merrill Lynch has similarly integrated its global grid resources and uses abstraction layers. "There is central management of the entire global grid with unified service-level-agreement (SLA) reporting and full recoverability from the loss of any node or even datacenter," says Juan Lando, director, grid center of expertise at Merrill Lynch. In this way the IT team can guarantee to business units a minimum set of resources and then share the rest dynamically. "That allows us to achieve very high utilization targets while meeting the SLAs," he adds.
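The "guaranteed minimum, then share the rest" idea can be sketched as a two-step allocation. This is a simplified illustration, not Merrill Lynch's policy engine. The function name, the node counts and the rule of giving spare nodes to the unit with the largest unmet demand are assumptions, and the sketch does not model reclaiming lent nodes when a unit's demand rises again.

```python
def allocate(capacity: int, guaranteed: dict[str, int], demand: dict[str, int]) -> dict[str, int]:
    """Serve each unit up to its guaranteed floor, then share spare nodes by unmet demand."""
    if sum(guaranteed.values()) > capacity:
        raise ValueError("guarantees exceed total capacity")

    # Step 1: each business unit receives what it asks for, up to its guaranteed minimum.
    allocation = {unit: min(demand.get(unit, 0), floor) for unit, floor in guaranteed.items()}
    spare = capacity - sum(allocation.values())

    # Step 2: hand out spare nodes one at a time to the unit with the largest unmet demand.
    while spare > 0:
        unmet = {unit: demand.get(unit, 0) - allocation[unit] for unit in allocation}
        unit = max(unmet, key=unmet.get)
        if unmet[unit] <= 0:
            break  # every unit is fully served; leftover nodes stay idle
        allocation[unit] += 1
        spare -= 1
    return allocation


# Illustrative figures only: a 100-node grid shared by three business units.
print(allocate(100, {"rates": 30, "equities": 30, "credit": 20},
               {"rates": 60, "equities": 10, "credit": 20}))
```

Here the rates desk borrows capacity the equities desk is not using, and no unit drops below the smaller of its demand and its guaranteed floor.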
Adam Vile, head of grid, high-performance computing and technical computing at consultancy Excelian, describes this virtualization as "trying in effect to recreate the mainframe, but with scalable, commodity hardware." This "mainframe," however, is distributed around the world and potentially across the supply chain. By adding the abstraction layers, the banks are able to harvest spare capacity wherever it may be. Even so, it is still not easy.
"Some applications do run across the global grid," says Merrill Lynch's Lando. "But we have to consider the latency requirements, caching strategies and SLAs quite carefully." For JPMorgan's Ryan, the key is to have enough liquidity in the global pool. "It will ensure you hit all of your SLAs, and give you enough capacity to help an application in need," he says.
Strategies for global grid management depend very much on the application mix as well. Most banks, including JPMorgan, use the grid mainly for complex risk and analytics applications, say industry experts.
"Stateless compute is key to us for the grid," says Ryan. "So, primarily we use the grid for analytics." Merrill Lynch is doing analytics but also uses the grid for transaction-based applications. "We run simple Java and .Net applications on the grid," says Lando. "Instead of running on an application server cluster, the grid is the application server and load balancer. It works really well."
Understanding Application Dynamics
Grid environments are very complex, and the demands of a broad mix of applications can be volatile. Transparent reporting and detailed service management are essential.
"For us, policy-driven, dynamic resource allocation is a key priority to meet SLA commitments," says Lando. "We post all our performance statistics on the Web by application and business unit, including SLA, return on investment (ROI) and total cost of ownership (TCO) scorecards, so users see the advantages."
The demand for advanced, interoperable tools was demonstrated recently when DataSynapse, a grid software provider, announced an alliance with tools supplier OpTier. Motti Tal, executive vice president for marketing and business development at OpTier, explains some of the challenges. "As banks move to componentized, SOA strategies, the complexity of operation IT management rises significantly," he says. "We are finding that shared components can have many different behaviors depending on the applications, users, and business transactions they are servicing. This can cause serious contention issues, such as significant complexity in problem diagnostics and challenges in capacity planning." Tal says it is a top-of-mind issue for cost-conscious CIOs who are trying to broaden their application inventory on the grid.
"Our alliance is really customer driven," says Kevin Kelly, director of business development, EMEA, at DataSynapse. "A number of our most advanced joint customers with the largest grid deployments have pressed us to combine our expertise to help them to optimize their enterprise architectures."
OpTier provides transaction management software to monitor business application flows at a transaction level including every instance of every service consumed by a user across an infrastructure. This gives insight into resource usage and response times to the application development teams and business activity to the infrastructure teams. "This helps organizations to do better capacity planning, tune their applications and rapidly sort out bottlenecks to meet SLAs," says Tal.
Initially the information will be available to a bank's IT personnel for capacity planning and defining provisioning policies. "Later, we shall focus on dynamic optimization of resource broker behavior," says Tal. "We already set and enforce transaction priorities on n-tier application architectures, and the extension to dynamic virtualized environments is natural."
However, no set of tools out of the box will solve the problems of multi-vendor global grids, say industry insiders. "Integrating the management and monitoring feeds from all of the components has to be done by the bank," says Vivake Gupta, managing director and co-founder of consultancy Lab49. "But you also have to look inside the grid at the applications and see what's running." He says he agrees that dynamic capacity planning is a big development area. "If you want to run multiple, cross-asset-class applications on a shared grid, it will be essential," he says.
Enforcing Standards
Flexible resource deployment depends crucially on enforcing standards. "The challenge is to ensure a common data environment, common software, and enough abstraction from the hardware that the differences between hosts won't matter when work gets scheduled in the global grid," says JPMorgan's Ryan. This is essential, he says, for JPMorgan's 10,000-node grid utility, which is rapidly growing. "Standardizing our compute backbone globally took a long time and a lot of discipline. Our CIO, Mike Ashworth, and CTO, Adrian Kunzle, had to work hard to drive through the utility model, but the economics were very compelling, and in the end they have been successful with this approach," he adds. While JPMorgan doubled the size of its grid in the past year alone, it added no system administrator headcount, which according to Ryan proves the model is the most economical.
"With only one common build and automated, policy-driven provisioning, we save on systems engineering and operational full-time equivalents," says Merrill Lynch's Lando.
At Merrill Lynch, the grid is based on three key principles: everything is virtualized and packaged for the grid, the bank insists on common grid software, and operating system builds are also rigorously standardized. "With few exceptions the whole grid is updated with new software versions together and regression tested," says Lando, emphasizing the high grid stability and faster debugging that result from such a controlled environment. "Without rigorous standards, a few errors that are hard to diagnose can kill you," he says.
For JPMorgan, virtualization is leveraged to allow a bit more flexibility on change management. The bank has deployed VMware to create virtual Windows servers in containers on top of Linux. "VMware also allows us to patch just one application instead of an entire cluster of grid applications," says Ryan. "In some cases, it's a pain to have to regression-test everything, especially when you're trying to solve a problem for another line of business. This is good." Virtualization provides another abstraction layer to facilitate the global grid.
Ensuring Control
In a global context, idle servers and desktops represent an enormous resource; however, both banks focus their grids on dedicated servers that they fully control. "We also have a desktop grid scavenging spare CPU cycles on thousands of computers, where we obviously have less control on machine availability," says JPMorgan's Ryan. "Therefore, we ensure the appropriate work goes to the appropriate class of machines." JPMorgan uses the desktop grid for work that is neither latency sensitive nor data intensive. The firm also uses it for less critical work like development and testing. Merrill Lynch, too, has typically used PC scavenging for quality assurance testing. However, now it has set up a global committee to help manage firm-wide scavenging control and increase utilization.
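The routing rule for desktop scavenging can be written as a simple predicate. The sketch below is an illustration built from the criteria in the paragraph above. The `Job` fields and class names are hypothetical, and treating "less critical" as a hard requirement for desktop work is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class MachineClass(Enum):
    DEDICATED_SERVER = "dedicated_server"
    DESKTOP_SCAVENGED = "desktop_scavenged"


@dataclass
class Job:
    name: str
    latency_sensitive: bool
    data_intensive: bool
    critical: bool


def route(job: Job) -> MachineClass:
    """Send only non-critical work that tolerates delay and moves little data to desktops."""
    if job.latency_sensitive or job.data_intensive or job.critical:
        # Availability of scavenged desktops is not under the bank's control.
        return MachineClass.DEDICATED_SERVER
    return MachineClass.DESKTOP_SCAVENGED
```

Under this rule a nightly test run could land on scavenged desktops, while an intraday risk calculation would stay on dedicated servers.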
To achieve a global grid, banks are exercising rigorous discipline, tracking detailed component behaviors and constantly tuning their infrastructure to yield impressive results. "We have a very solid operational model for our grid with tremendous up-time numbers," says Ryan.
Bob Giffords is an independent banking and technology analyst who can be reached at bob.giffords@btinternet.com.