Follow the Search Engines

Earlier this week, I had my first encounter with Apache Hadoop, the open-source framework for storing and processing unstructured data in parallel across clusters of servers. Yahoo currently uses the system to store much of its search engine's data, and the technology was modeled on systems developed and used by Google, namely the Google File System and MapReduce.
An industry contact of mine estimates that Yahoo has about 14 TB worth of data stored on its various servers.
The beauty of Hadoop is that it is, theoretically at least, infinitely scalable: it's just a matter of adding more servers to hold the additional data. The only catch is that it's designed for unstructured data, such as HTML text, which shouldn't come as a surprise considering who contributed to its design.
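Hadoop's storage layer, HDFS, gets this scalability by splitting files into fixed-size blocks and spreading replicated copies of each block across the servers in a cluster. The Python sketch below is a simplified model of that idea, not HDFS itself. It assumes a 128 MB block size (the default in Hadoop 2.x and later; earlier releases used 64 MB) and a replication factor of three, and it places replicas round-robin, whereas real HDFS placement is rack-aware. The function names and the 10 GB file are hypothetical.

```python
import math
from collections import Counter

BLOCK_SIZE_MB = 128  # assumption: HDFS default block size in Hadoop 2.x+
REPLICATION = 3      # assumption: HDFS default replication factor


def place_blocks(file_size_mb: int, nodes: list[str]) -> dict[int, list[str]]:
    """Split a file into fixed-size blocks and assign each block's replicas
    to distinct nodes, round-robin (simplified; real HDFS is rack-aware)."""
    if len(nodes) < REPLICATION:
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    placement = {}
    for block in range(n_blocks):
        # Consecutive nodes starting at the block's offset, so no two
        # replicas of the same block land on the same server.
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(REPLICATION)]
    return placement


def blocks_per_node(placement: dict[int, list[str]]) -> Counter:
    """Count how many block replicas each node ends up storing."""
    return Counter(node for replicas in placement.values() for node in replicas)


if __name__ == "__main__":
    file_mb = 10 * 1024  # a hypothetical 10 GB file
    for n in (4, 8):
        nodes = [f"node{i}" for i in range(n)]
        load = blocks_per_node(place_blocks(file_mb, nodes))
        print(f"{n} nodes -> {max(load.values())} replicas on the busiest node")
    # 4 nodes -> 60 replicas on the busiest node
    # 8 nodes -> 30 replicas on the busiest node
```

Doubling the cluster from four to eight servers halves the number of block replicas each one holds, which is the "just add servers" property in miniature.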
There are a number of startups, such as Hadapt, looking for ways to combine Hadoop's scalability with relational databases. Then there are some new firms, such as MapR, that are creating a proprietary version of Hadoop, according to industry gossip.
Although much of the data in the financial services industry is structured, the rising importance of unstructured data in trading should not be underestimated.
We've seen commercially available machine-readable news take off over the past five years, with news providers tagging their stories so that automated analytics can consume them more easily.
Now traders are looking to feed data from unstructured sources, such as government reports and court decisions, into their automated decision-making.
The one area I'm not hearing about, though, is search engine results. I can't think of anything that would rank higher in the decision-making process than sitting down with Google, Yahoo or Bing and literally seeing what the world is thinking about in real time. I'm sure Google has this down to a science and will continue to manage its own treasury head and shoulders above the competition. The question is whether the search giant will commercialize that capability or keep it to itself.
I'm not sure whether financial firms would go so far as to build their own mini-Googles internally to analyze what is happening on the web, but I have a feeling that at least five firms, if not more, are going down this road at the moment.
To handle this amount of unstructured data, Hadoop would seem to be the proper technology to adopt.
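Hadoop processes data with MapReduce: a map step runs in parallel over chunks of the input and emits key/value pairs, a shuffle groups those pairs by key, and a reduce step combines each group. The sketch below simulates that pattern in a single Python process on two made-up HTML snippets, counting words as a crude stand-in for the kind of web-text analysis described above. It is not Hadoop's Java API; the regex-based tag stripping, the sample pages and the function names are illustrative assumptions.

```python
import re
from collections import defaultdict
from itertools import chain

TAG_RE = re.compile(r"<[^>]+>")  # crude HTML tag stripper, illustration only
WORD_RE = re.compile(r"[a-z']+")


def map_phase(doc: str):
    """Mapper: strip HTML tags and emit a (word, 1) pair for every word."""
    text = TAG_RE.sub(" ", doc).lower()
    for word in WORD_RE.findall(text):
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all values emitted under the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(docs):
    # On a Hadoop cluster each map call would run on the node holding that
    # document's data; here everything runs sequentially in one process.
    return reduce_phase(shuffle(chain.from_iterable(map_phase(d) for d in docs)))


if __name__ == "__main__":
    pages = [
        "<html><body><p>Rates rise as markets slide</p></body></html>",
        "<div>Markets rally; rates steady</div>",
    ]
    counts = word_count(pages)
    print(counts["markets"], counts["rates"])  # 2 2
```

Because mappers never share state and reducers only see already-grouped values, each phase can be spread across as many servers as the cluster has, which is what lets the approach scale with the data.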
Copyright Infopro Digital Limited. All rights reserved.