Data quality (DQ?) comes in flavors. Some data is structured, collected under a system of validation routines and controls, so it’s possible to know what it should look like; a mature suite of statistical control tools addresses this. Some is structured Big Data collected in the wild, where the objective is to measure how much of the aggregate noise is due to random variation. Again, well-developed tools, such as ROC curves, exist. Unstructured Big Data is yielding to AI tools.
In contrast, there is the realm of data small enough that individual GIGObytes drive the confidence intervals of any statistical test to widths that make the test worthless. Here we pass from the realm of applied statistics into plumbing the secrets of the human heart. Call it data psychology. It’s where analysts in business units not organized around issues like Big Data spend most of their time.
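To make the small-data point concrete, here is a rough sketch (not a rigorous analysis) of how a single bad record widens a confidence interval. The numbers are made up for illustration, the helper names `mean_ci` and `ci_width` are hypothetical, and the interval uses a normal approximation; a t-based interval would be wider still at this sample size.

```python
import math
import statistics

def mean_ci(values, z=1.96):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return (m - z * se, m + z * se)

def ci_width(values):
    """Width of the approximate 95% interval."""
    lo, hi = mean_ci(values)
    return hi - lo

# Illustrative data only: eight readings near 10.
clean = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.0]
# The same sample with one mistyped entry (1000 instead of 10).
dirty = clean[:-1] + [1000.0]

# The same single bad record dropped into a sample 500 times larger.
big_clean = clean * 500
big_dirty = big_clean[:-1] + [1000.0]

for name, data in [("small clean", clean), ("small dirty", dirty),
                   ("big clean", big_clean), ("big dirty", big_dirty)]:
    print(f"{name:12s} n={len(data):5d} CI width={ci_width(data):.3f}")
```

In the small sample the one bad entry swamps everything else; in the large one it still does damage, but the interval stays narrow enough to be usable.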
Self-inflicted wounds cover the data that makes it into the analyst’s hands, ranging from simple Excel contamination to cognitive bias. But the biggest problem is failing to consider “what question is this data supposed to answer?” Posing the question more often than not yields the blank stare signifying “I don’t know what I’m really looking for and haven’t a clue how I can possibly expect to recognize it when I see it.”
That is where a good data analyst has the opportunity to contribute something useful, by triggering the thinking needed to pose the question. It’s an art, not a science.