All you've ever wanted to know about Data Lineage
Guillaume Bodet - CEO - Zeenea
Discover all of the secrets and best practices of Data Lineage by downloading our free guide: "All you've ever wanted to know about Data Lineage".
Overview
Data Quality usually refers to a company's ability to ensure the longevity of its data. At Zeenea (a data catalog provider), we believe Data Quality is ensured through the following nine dimensions, all essential to extracting value for your company:
🔸 Completeness
🔸 Accuracy
🔸 Validity
🔸 Uniqueness
🔸 Consistency
🔸 Timeliness
🔸 Traceability
🔸 Clarity
🔸 Availability
We will detail these dimensions with the help of a simple example in part one. We will then elaborate on how Data Quality management is an important challenge for organizations seeking to extract maximum value from their data.
We will also draw parallels between these Data Quality dimensions and the successive phases of risk management: identification, analysis, evaluation, and treatment. This will enable you to hone your risk management reflexes by tying each Data Quality improvement effort to a company objective (and evaluating the ROI on each quality dimension).
Once we have established the main features of an enterprise Data Quality management tool, we will detail how a Data Catalog - though not a Data Quality tool - can contribute towards Data Quality improvement (through the clarity, availability, and traceability dimensions mentioned above).
Overview
As CEO and Product Manager at Zeenea, a next-generation data catalog vendor, I often get to discuss metadata management expectations with our customers, prospects and partners.
In all our discussions, one topic comes up time and time again: Data Lineage.
As a concept, Data Lineage seems universal: whatever the sector of activity, any stakeholder in a data-driven organization needs to know the origin (upstream lineage) and the destination (downstream lineage) of the data they are handling or interpreting. And this need has important underlying motives.
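To make the upstream/downstream distinction concrete, here is a minimal sketch that models lineage as a directed graph and walks it in both directions. The dataset names and the helper functions (`build_index`, `upstream_lineage`, `downstream_lineage`) are hypothetical, chosen only for illustration; this is not how Zeenea or any particular catalog stores or computes lineage.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source dataset, dataset derived from it).
EDGES = [
    ("crm.customers", "staging.customers"),
    ("erp.orders", "staging.orders"),
    ("staging.customers", "marts.revenue_by_customer"),
    ("staging.orders", "marts.revenue_by_customer"),
    ("marts.revenue_by_customer", "dashboards.sales_kpi"),
]


def build_index(edges):
    """Index the edges in both directions for fast neighbor lookup."""
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)
        upstream[target].add(source)
    return upstream, downstream


def reachable(start, neighbors):
    """Breadth-first walk: every node reachable from `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


UPSTREAM, DOWNSTREAM = build_index(EDGES)


def upstream_lineage(dataset):
    """Where does this data come from?"""
    return reachable(dataset, UPSTREAM)


def downstream_lineage(dataset):
    """Where is this data used?"""
    return reachable(dataset, DOWNSTREAM)


if __name__ == "__main__":
    print(sorted(upstream_lineage("marts.revenue_by_customer")))
    print(sorted(downstream_lineage("staging.orders")))
```

In this toy model, the origin of a dashboard figure and the impact of changing a source table are the same traversal run in opposite directions.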
For a Data Catalog vendor, the ability to manage Data Lineage is an essential component of its offering. As is often the case, however, behind a simple and universal question lies a world of complexity that is difficult to grasp.
I believe there are several reasons for this complexity:
💡 The first reason is that while the question is always the same, the answers vary substantially from one person to another.
💡 The second reason is more operational: how can we capture, maintain, and update the huge mass of information that lineage represents?
💡 The third reason involves ergonomics: how can such a volume of information be rendered in a digestible and relevant way?
These are the aspects I will elaborate on in this document, pointing out along the way the approaches that we favor at Zeenea.