Was ist Data Lineage und warum ist sie so wichtig?

Verfolgen Sie, woher die Daten eines Unternehmens stammen und welchen Weg sie im System durchlaufen. Stellen Sie außerdem sicher, dass Geschäftsdaten jederzeit compliant und korrekt sind.

Data lineage is the story of an organization’s data from the source, through
all processes and changes, to storage or consumption. It provides a stepwise
record of how data arrived at its current form, including both transformations
made to the data and its journey through different business systems. A data
lineage is essentially a map that can provide information such as:

  • Wann die Daten erstellt wurden und ob Änderungen vorgenommen wurden
  • Welche Informationen die Daten enthalten
  • Wie die Daten verwendet werden
  • Woher die Daten ursprünglich stammen
  • Wer die Daten verwendet und die Schritte im Lebenszyklus genehmigt und ausgeführt hat

The entire data flow is mapped to understand, document, and visualize data in
all stages.

 

Warum sollte man die Datenherkunft verfolgen?

In most business settings, data is being amassed constantly. It trickles (or
gushes) in from a variety of sources such as inventory data, point of sale,
and Internet of Things (IoT) devices. How this data is cleansed, organized,
stored, and maintained is vital to an organization’s success.

Different roles have needs when understanding data lineage. IT teams are often
interested in technical data lineage, where operations, compliance, and
processes are important. For executives, business data lineage is vital,
allowing them to understand the role data plays in overall business processes
and assures them the data used when making critical business decisions is
accurate.

Nachverfolgte Daten lassen sich einfach prüfen

Any data-dependent decision relies heavily on the accuracy of the raw data.
Executives can act with confidence when they know that they have extracted the
insights from verified, authenticated data. When data isn’t tracked
meticulously, it becomes cumbersome, time-consuming, and expensive to verify
its accuracy. It’s also easier to spot anomalies in clean, structured data. An
ounce of prevention is indeed worth a pound of cure in tracking data and
maintaining its consistency.

In a business setting, this could mean that executives are confident signing
an audit report, knowing its data is accurate.

Prozessänderungen mit geringem Risiko implementieren

Organizations also need to identify errors in their data, and where these
problems originated. Locating issues allows them to make process changes that
specifically target the issue with a clear understanding of where it occurred
and what impact new processes changes will have downstream.

An example of this is when data lineage accurately shows all the people
involved in a chain of responsibility. It’s simple for an organization to find
where data is coming from, and how changes were introduced to ensure both the
trustworthiness of data and address change control.

Die Nachverfolgung von Daten ist für Compliance-Zwecke erforderlich

It’s important to document that any changes implemented were made by an
authorized entity and for a valid reason, especially to protect the
confidentiality and safety of sensitive data sets. In addition to noting who
made the change, it’s also important to record the process used to make the
change and run the update to maintain the integrity of data lineage.

In an organization, this means knowing which policies were applied when
completing a business process. No surprises, no room for error.

Einfache Datenmigration sicherstellen

The volume and types of data collected are vast, and this creates problems.
How is the data stored? Can all those who need information access it? Do these
storage methods work across software platforms, geography, and time zones? The
data lineage process helps the data remain platform agnostic, allowing system
migrations with certainty.

Ein Data Mapping Framework erstellen

Employees and other stakeholders need to be able to access appropriate levels
of data. With a broad view of metadata, data lineage creates a data mapping
foundation, assisting with this need.

Data lineage means that organizations know the data has come from a trusted
source, was transformed in accordance with best practices, and stored safely.

Auf welche kritischen Geschäftsbereiche wirkt sich Data Lineage aus?

Strategische datenabhängige Geschäftsentscheidungen

Good decision making is one of the primary reasons why validating data lineage
is so important. All units of a modern organization rely on data to make
strategic decisions: Marketing, supply chain management, manufacturing,
operations, sales, and customer support all need information and insights from
field research or operational data. Data lineage impacts all aspects of
business growth, including product and service development.

Compliance und Data Governance

Regulatory compliance and audits are an inevitable part of being in business.
Data lineage tracking is vital for all components of business associated with
compliance and maintaining accurate records of all accounts and events. Data
lineage improves risk management scenarios, ensures standardization of all
data handling, makes sure data processes follow company policies, and that
data meets all regulatory requirements. In many organizations, reporting
requirements include granular reporting data to support results. In finance sectors, important metrics and figures depicted in reports must be backed up
with data. Therefore, it’s critical that organizations can backtrack over the
entire history of any data transformation and provide explanations for any
query.

Komponenten der Data Lineage

Die Datenflüsse, die Teil der Data Lineage sind, kennzeichnen die Beziehung zwischen Daten und den folgenden Komponenten einer Organisation:

  • Datenanwendungen innerhalb eines Betriebs- oder Geschäftsprozesses
  • Various business roles and levels of authorization in creating, handling,
    accessing, deleting, or updating specific data sets
  • Netzwerksegmente
  • Sicherheitszuordnung
  • Andere IT-Systeme

Technische Vorteile der Pflege der Data Lineage

Schnelle Anpassung neuer Technologien

Data lineage tracking helps companies stay abreast of new technologies. Data
is not static in terms of its components or methods of collection. Lineage
tracking makes it possible to reconcile old and new data sets, combining and
recombining them, and maintaining them in a format that organizations can
still use to extract actionable insights from.

Bessere IT-Systeme und Datenportierung

Data migration from one storage system to another is inevitable in these times
of rapidly developing technologies. Data lineage tracking between source and
destination systems makes life easier for IT departments when moving data to
new servers or software.

Identifizierung von Compliance- oder Sicherheitsproblemen

During data processing, lineage helps to document and analyze specific
operations at every distinct stage to pinpoint errors or any compliance or
security violations.

Optimierung von Datenabfragen

Lineage can track query history such as users’ queries, filtering data, and
joining datasets. Data lineage should be performed on all queries plus
automated reports generated by data warehouses or databases for validation.
Lineage data can help users with optimizing queries to get the best results.

Data Lineage-Methoden

A few standard techniques are used to carry out data lineage on an
organization’s strategic, structured datasets. These include:

Musterbasierte Data Lineage

As the name suggests, this technique performs lineage investigation by
sweeping and looking for significant patterns in metadata. It assesses tables,
business reports, and columns within disparate datasets for similarities
indicative of redundancy. Having found highly similar columns with
corresponding values, it links them together in the data lineage chart to
account for the data in various stages of its life cycle. This technique does
not vary with database technology, plus, it can do the job irrespective of
algorithms or technological advancements. However, it cannot access data
processing logic if it is embedded in the program code. It can only crawl
metadata that is human-readable.

Data Lineage durch Parsen

This is a highly advanced method of performing data lineage, which
reverse-engineers data transformation logic to achieve end-to-end tracing of
the data. It requires an understanding of every programming language and tool
involved in transforming or altering the data, therefore, is extremely
in-depth and comprehensive.

Daten-Tagging

Data tagging is most effective in closed data systems, wherein there is
consistency in the tool used to transform data or move it. Data tagging works
on the assumption that a transformation tool or engine puts an identifiable
mark (a tag) on the data, which tracks it from beginning to end.

Eigenständige Data Lineage

As the name suggests, this format of data lineage works best within a
self-contained system or data environment which includes processing logic,
master data management, and storage. Such controlled environments include a
data lake which is a repository of all data across all steps of its life,
making data easy to access, albeit within the self-contained system’s
boundaries.

Data Lineage mit anderen Datenpraktiken kombinieren

Data lineage is one step in a solid data process. An organization needs a raft
of automated techniques, software, and practices to ensure good data
management. Each of these practices weave into data lineage to form a robust
framework.

For example, data classification is used to find data that is confidential,
critical, or needs some level of compliance. Data classification works with
data lineage by investigating the data’s lifecycle, finding integrity or
security issues, and helping to resolve them.

Kümmern Sie sich um die Grundlagen Ihrer Daten

Your data situation is never going to be any better unless you take steps to
resolve it. The amount of data collected, speed of processing, and data
legislation is only going to increase. You need to find a data management solution now. Alteryx has the answer, with powerful in-built data analytics
and management tools.

If you leave your data unprotected, disorganized, and without lineage
tracking, you’re leaving your organization open to errors, fines, and loss of
customer confidence. Contact us today to find out how our data quality
management tools protect your data, organize it, and create clear data lineage
for data governance. We’ve got you covered with solutions to help you
centralize and catalogue data, streamline discovery, drive collaboration and
data sharing, and understand the trustworthiness of data assets.

Nächste Begriff
Feature Engineering