Follow the Search Engines
Earlier this week, I had my first encounter with Apache Hadoop, the open-source framework for distributed storage and processing of unstructured data. Yahoo currently uses the system to store much of its search engine's data, and the technology was modeled on similar technology developed and used by Google.
An industry contact of mine estimates that Yahoo has about 14 TB worth of data stored on its various servers.
The beauty of Hadoop is that it is, theoretically at least, infinitely scalable: it's just a matter of adding more servers to hold the additional data. The only catch is that it's for unstructured data, such as HTML text, which shouldn't come as a surprise considering who contributed to its design.
There are a number of startups, like Hadapt, looking for ways to combine Hadoop's scalability with various relational databases. Then there are some new firms, such as MapR, that are creating a proprietary version of Hadoop, according to industry gossip.
Although much of the data in the financial services industry is structured, the rising importance of unstructured data in trading should not be underestimated.
We've seen the rise of commercially available machine-readable news over the past five years, where the news providers tag their stories to make it easier for automated analytics to consume them.
Now traders are looking to add data from unstructured content, such as government reports, court decisions and other content, to their automated decision-making.
The one source that I'm not hearing about, though, is search engine results. I can't envision anything that would be higher up in the decision-making process than sitting down to Google, Yahoo or Bing, and literally seeing what the world is thinking about in real time. I'm sure that Google has this down to a science and will continue to manage its own treasury head and shoulders above the competition. The question is whether the search giant will commercialize that offering or keep it to itself.
I'm not sure whether financial firms would go as far as to create their own mini-Googles internally to analyze what is happening on the web, but I have a feeling that there are at least five firms going down this road, if not more, at the moment.
To handle this amount of unstructured data, Hadoop would seem to be the proper technology to adopt.
Copyright Infopro Digital Limited. All rights reserved.