Big Data vs. Dark Data
Looking at what the buzzword terms really describe for financial data management

In a story last week, I looked at the use of the term "dark data" and what it means. That is a newer buzzword than "big data," which has been thrown around far more over the past several years.
Often, when "big data" is used in financial services operations discussions, particularly those about data management, it really refers to what is being done with data: how firms mine larger amounts of data, or how they organize it to get the most insight and value out of it.
So this raises the question of whether "big data" is really even an appropriate term. Increased activity around know-your-customer (KYC) data might lead one to think that type of data is rising to the level of "big data," but set against something like the phone call records of even a single major cell service provider, KYC data looks small in volume.
Is there a threshold that KYC data, or financial services industry data as a whole, needs to cross to truly be "big" and not just "medium"? Customer details, cross-referenced with extensive transaction data, might reach that scale in this industry.
Failing that, firms may be better off not getting caught up in "big data" hysteria. It is more effective to know exactly what data you have, how frequently you are getting it and what you are trying to achieve with it, namely the insights you are trying to generate from it.
In effect, dark data and big data are really about the same thing: data management. Dark data describes knowing where all the meaningful data is and how to aggregate it, while big data implies understanding the complete picture of what data is coming in, especially as the volume of that data grows.
To join a discussion on how "big data" should be defined, visit Inside Reference Data's LinkedIn discussion group.