Michael Shashoua: Shadow Play
Michael Shashoua looks at dark data and suggests its value lies in the 'meaning' residing within large datasets.
The term "dark data" is becoming a buzzword in data management, much as big data rose to prominence five years ago. Like big data before it, the term has so far raised more questions than answers about what it actually means.
In a feature in the February issue of Inside Reference Data, Thomson Reuters' resident data management expert Tim Lind defined dark data as the work of making connections between existing pieces of data. Making the data accessible is only the first part of the problem; once that is solved, firms still have to work out what predictive value the data may have when placed in a fuller context.
That is similar to the struggle firms have had with big data. When big data comes up in financial services operations discussions, particularly in data management, it usually refers to what is being done with data, such as how firms mine larger data volumes.
Just as the industry shouldn't get caught up in big data hysteria, it should also be cautious about overreacting to dark data. The most important thing is to know exactly what data you have, how frequently you are getting it, and what you are trying to achieve with it.
"Dark data could be a more appropriate term than big data," says Norbert Boon, executive director of Flytxt, a consultancy in Amsterdam that works on big data analytics issues. "Big data is the structured and unstructured data one stores, for which there is no immediate purpose," he says. "Dark data is much more appropriate—it's the value hidden inside. Finding that is the challenge."
Customer Data Grows
Another mark against the relevance of big data is the possibility that the data generated in our industry is smaller in volume than that of other industries, such as telecoms. Increased activity around know-your-customer (KYC) data in the financial sector makes this assertion less certain, however. Swift, the global data and messaging services provider, launched its KYC Registry in December 2014, and the registry is still growing. It is already being upgraded with a new profile feature that collects activity data on correspondent firms, making its customer information more complete and better supporting risk and exposure management.
Arguably, this fits Lind's definition of dark data: connecting pieces of data that already exist but have not been linked in a way that shows a more complete picture of what is occurring in the market or in a firm.
Identifiers' Impact
Also on the subject of knowing what is going on within one's firm, the global legal entity identifier (LEI) initiative, which reached 340,000 registrations at the start of February this year, is likely eventually to reach three times that number, according to Bill Hodash, managing director of business development at the Depository Trust & Clearing Corporation. The DTCC and Swift operate the Global Markets Entity Identifier utility, which has handled about half of those registrations.
There may, however, be another spur to LEI growth: more regulatory action. In Europe, Mifid II and the Central Securities Depositories Regulation began fueling more LEI issuance in 2014, and the US Securities and Exchange Commission's amendments to Regulation SBSR in early 2015, which include LEI registration requirements for security-based swaps, are likely to raise registrations further still.
This suggests that the dark data and big data stories are far from finished. If the number of LEIs to be managed climbs well beyond the current expectation of about 1.5 million, firms could face higher data volumes and more pieces of information ending up in different silos.
Copyright Infopro Digital Limited. All rights reserved.