A strong data governance framework is central to a successful data governance implementation. When it comes to a data governance framework, however, most people think of rules, roles, and responsibilities: rules for how data should and shouldn’t be stored, managed, and used; roles associated with governing data, such as data stewards and owners/custodians; and responsibilities for the decision making around, and the overall “care and feeding” of, well-governed data assets. Foundationally, they are correct, but much more is required for a data governance framework to be successful and sustainable in the modern enterprise.
What is a data governance framework?
A data governance framework is a standardized meta-model of key information that should be available for any data element or data source. It needs to be as broad as necessary to provide the wide use of data intelligence required to protect the data, mitigate risks associated with the data, and optimize the usage to capture the business value of the data. A data governance framework should be a single source of all relevant artifacts related to data as it is developed, stored, managed, transformed, moved, integrated and consumed across an enterprise. It needs to be able to deal with data in the abstract (agnostic to data management technologies, infrastructure and platforms, cloud or otherwise) for the business stakeholders while having deep support and awareness for the broad array of technologies, infrastructure and platforms impacting data today and into the future.
What are the key elements of a data governance framework?
The following is a typical but not exhaustive list of the key elements of a data governance framework. Once captured, the framework should enable the association and inter-relating of this information to provide visibility and intelligence around your data.
- Technical metadata for data stores, data movement processes and data consumption processes. This should include schema, relationships, data types, definitions, valid values, validation rules, constraints, and views.
- Business metadata such as ownership, stewardship, terminology, definitions, classification policies for sensitive data governance (e.g., PII, IP, HCI), business rules and any other artifact that may bring additional context, such as data sharing agreements, regulatory association, and data certification.
- Data quality scoring, rules, health checks, remediation processes and profiling results.
- Reference data, including code sets and crosswalks that associate your reference data to specific use cases.
- Data governance processes and workflows to manage the interaction between data governors, managers and users.
- Usage metadata such as who has accessed the data, how often and for what, as well as other key performance indicators that will promote and guide further usage.
- Insights derived from the association of these artifacts, such as impact analysis, data intelligence dashboards, where-used views, navigable graph views of relationships and associations and, of course, data lineage views and analysis.
- Communities, tagging and collaboration workflows to capture and codify stakeholders’ tribal knowledge and feedback on the data they use and/or find interesting.
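As a rough illustration of how the elements above might hang together in one meta-model record, here is a minimal Python sketch. The class name `DataElement`, its fields and the `governance_gaps` helper are all hypothetical and chosen for illustration only; a real framework would carry far more detail.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataElement:
    """One governed data element; every field name here is illustrative."""
    name: str                                   # physical name of the element
    data_type: str                              # technical metadata
    owner: str = ""                             # business metadata
    steward: str = ""
    classification: str = "unclassified"        # e.g. "PII" for sensitive data
    quality_score: Optional[float] = None       # 0.0 to 1.0, from quality rules
    glossary_terms: List[str] = field(default_factory=list)
    upstream: List[str] = field(default_factory=list)   # lineage sources
    tags: List[str] = field(default_factory=list)       # community tags


def governance_gaps(element: DataElement) -> List[str]:
    """Return the governance fields that have not been filled in yet."""
    gaps = []
    if not element.owner:
        gaps.append("owner")
    if not element.steward:
        gaps.append("steward")
    if element.classification == "unclassified":
        gaps.append("classification")
    if element.quality_score is None:
        gaps.append("quality_score")
    if not element.glossary_terms:
        gaps.append("glossary_terms")
    return gaps


# Example record with a steward and classification but no owner or score yet.
email = DataElement(
    name="crm.customer.email",
    data_type="varchar(255)",
    steward="j.doe",
    classification="PII",
    glossary_terms=["Customer Email"],
)
print(governance_gaps(email))
```

Keeping technical, business, quality and lineage attributes on one record is what makes it possible to relate them to each other later, for example to list every PII element that has no owner.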
Types of data governance frameworks
Establishing a data governance framework typically follows one of two traditional approaches: top-down or bottom-up. These approaches are rooted in opposing philosophies: one emphasizes data control for optimized quality, and the other prioritizes accessible data for end users across business units.
Top-Down Method: Focus on Data Control
This centralized approach relies on a small team of data professionals employing well-defined methodologies and best practices, emphasizing data modeling and governance. However, scalability becomes a challenge as data providers (usually IT) have exclusive control over the data, creating a bottleneck as demands from data consumers increase. Today’s business requirements necessitate widespread access to clean, complete data, challenging the traditional gatekeeper role of data providers.
Bottom-Up Method: Focus on Data Access
In contrast, the bottom-up method allows for more agility. Starting with raw data, structures are created on top (referred to as “schema on read”), and data quality controls, security rules, and policies are implemented after data ingestion. While more scalable than the top-down approach, the bottom-up method introduces challenges in establishing control since data governance is implemented later in the process.
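To make “schema on read” concrete, the sketch below stores raw records untouched and applies structure and a basic quality check only when the data is read. The record layout, `ORDER_SCHEMA` and `read_with_schema` are assumptions made for this example, not part of any particular product.

```python
import json

# Raw records are stored exactly as they arrive; nothing is validated on write.
RAW_STORE = [
    '{"id": "1", "amount": "19.99", "country": "US"}',
    '{"id": "2", "amount": "n/a", "country": "DE"}',
    '{"id": "3", "country": "FR"}',
]

# The schema lives with the reader, not the store: field name -> converter.
ORDER_SCHEMA = {"id": int, "amount": float, "country": str}


def read_with_schema(raw_lines, schema):
    """Apply the schema at read time and split rows into valid and rejected."""
    valid, rejected = [], []
    for line in raw_lines:
        record = json.loads(line)
        try:
            valid.append({name: cast(record[name]) for name, cast in schema.items()})
        except (KeyError, ValueError) as exc:
            # A missing field or an unparseable value only surfaces here,
            # after ingestion, which is the control trade-off described above.
            rejected.append((record, repr(exc)))
    return valid, rejected


valid_rows, rejected_rows = read_with_schema(RAW_STORE, ORDER_SCHEMA)
print(len(valid_rows), "valid,", len(rejected_rows), "rejected")
```

Ingestion never fails in this model, which is where the agility comes from, but bad records are discovered late and every reader has to apply the same rules to get consistent results.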
What’s needed is a modern data governance framework that strikes a balance between access and control, establishing control early without compromising the ability for users and subject matter experts to become data owners and curators.
How do you create a data governance framework?
Due to the complexity of every organization’s business, data, and technology landscape, as well as the uniqueness of their digital transformation goals, a data governance framework needs to be flexible enough to accommodate the individuality of the organization’s needs. Additionally, you don’t know what new data may come along in the future that could expand your governance requirements. With that being said, there are some core foundational principles to creating a solid framework. Here are five key steps to apply when creating a data governance framework in your organization.
1. Identify and document assets, processes and pipelines
It all begins with the ability to identify and document your physical data assets, processes and pipelines. This provides a foundation that represents the reality on the ground, detailing the structures that store data and the processes that move and consume data. This develops the basis for providing detailed data lineage and impact analysis and navigation aids that enable stakeholders to traverse the landscape in context.
A flexible data governance framework must break down organizational silos with awareness of and support for hybrid deployments (cloud and on-premises), different data formats (relational, flat files, NoSQL), and legacy and modern integration and movement approaches (ETL, ELT, streaming, etc.). It must also include the wide variety of data consumption use cases and technologies (BI/reporting, ML/AI, advanced analytics), combined with the ability to normalize these differences into a single comprehensive physical view.
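Once assets and pipelines are documented, lineage and impact analysis reduce to walking a graph. The sketch below assumes a hypothetical `DOWNSTREAM` map of asset names; the names and both helper functions are illustrative only.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets that read from it.
DOWNSTREAM = {
    "crm.customers": ["etl.load_customers"],
    "etl.load_customers": ["dw.dim_customer"],
    "dw.dim_customer": ["bi.churn_report", "ml.churn_model"],
    "erp.orders": ["dw.fact_orders"],
    "dw.fact_orders": ["bi.churn_report"],
}


def reachable(start, edges):
    """Breadth-first walk returning every asset reachable from `start`."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def invert(edges):
    """Flip the edge direction so the same walk answers lineage questions."""
    inverted = {}
    for source, targets in edges.items():
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return inverted


# Impact analysis: what is affected if crm.customers changes?
impact = reachable("crm.customers", DOWNSTREAM)

# Lineage: where does bi.churn_report get its data from?
lineage = reachable("bi.churn_report", invert(DOWNSTREAM))
```

Because sources, pipelines and consumers are all nodes in the same graph, the technology behind each asset does not matter to the traversal, which is the point of normalizing everything into a single physical view.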
2. Create business context
Next is the ability to create business context. A flexible business glossary capability is critical for this. A typical business glossary enables organizations to define business data terms, descriptions, policies and rules and associate these with the physical data landscape. However, the real value comes when organizations can easily define and maintain other business data assets that reflect the unique needs of the organization and associate these with any and all aspects of the framework that applies. Examples of this can be things like data sharing agreements, regulatory and other compliance tagging, data classification schemes and metrics that enable organizations to increase visibility and literacy into the organizational impact, risks and value of their data assets.
3. Understand data quality
The third leg of the stool is data quality. Business users need to understand the quality of data in order to weigh the value and insights it can bring to develop the trust required to make decisions using it. It starts with providing profiling statistics in order to understand the nuances of the data. Data quality rules need to be accessible so that stakeholders can understand what should be there, constantly check the data against the rules and request quality remediation when there are problems. Quality scores should be associated with all data sources so that end users can choose the highest quality data sources to satisfy their analytics, AI and ML use cases.
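The three pieces described here (profiling statistics, accessible rules and a score per source) can be sketched in a few lines of Python. The sample rows, rule names and scoring formula below are assumptions for illustration; real scoring schemes are usually weighted and more nuanced.

```python
def profile(rows, column):
    """Basic profiling statistics for one column."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }


# Rules are named so stakeholders can see what the data is expected to satisfy.
RULES = {
    "email_present": lambda row: bool(row.get("email")),
    "age_in_range": lambda row: row.get("age") is not None and 0 <= row["age"] <= 120,
}


def quality_score(rows, rules):
    """Return the fraction of (row, rule) checks that pass and failures per rule."""
    failures = {name: 0 for name in rules}
    for row in rows:
        for name, check in rules.items():
            if not check(row):
                failures[name] += 1
    total = len(rows) * len(rules)
    passed = total - sum(failures.values())
    return (passed / total if total else 1.0), failures


ROWS = [
    {"email": "a@example.com", "age": 34},
    {"email": None, "age": 41},
    {"email": "c@example.com", "age": 150},
    {"email": "a@example.com", "age": 28},
]

email_profile = profile(ROWS, "email")
score, failures = quality_score(ROWS, RULES)
```

The per-rule failure counts are what a steward would use to request remediation, while the single score is what an analyst would use to compare candidate data sources.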
4. Stakeholder socialization and collaboration
Once you have captured and combined the physical and business perspectives of your data, the next key capability to enable is stakeholder socialization and collaboration. A read-only view into the data governance framework allows stakeholders of all stripes from across the enterprise to discover and navigate their data, in the context of their unique role, providing the literacy required to enable a data-driven culture.
Enhancing this view with managed feedback loops and guided workflows allows organizations to capture and share the “tribal knowledge” that exists across the organization and provides the rigor and clarity required to maintain and enhance the framework with clear roles and responsibilities. With this, organizations can foster “data communities” and a social network around data that again, breaks down organizational silos, enhances cross-functional coordination and increases the organizational trust in and strategic use of data.
5. Automation
The last critical component for success is automation. If delivering a data governance framework were an easy task, you would have already done it. Setting up a flexible data governance framework is one thing, but maintaining its accuracy, completeness and timeliness is a daunting task. If the framework becomes out of date or incomplete, it will die on the vine due to lack of stakeholder trust and the risk that inaccurate insights into your data could lead to unwanted business results.
Thus, automating as much of your data governance framework as possible will assure that you will be able to maintain its relevance and agility no matter where your digital transformation efforts take you.
It starts with automating the ongoing capture and harvesting of metadata from your physical data landscape. Harvested metadata should be clearly versioned and visible through the complete lifecycle, and the process should introduce no latency, which mitigates the negative impacts of working from stale metadata.
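As one possible shape for automated, versioned harvesting, the sketch below reads table and column metadata from an in-memory SQLite database and records a new version only when the schema actually changes. SQLite stands in for whatever sources you really have, and `harvest_schema`, `fingerprint` and `record_version` are hypothetical helper names.

```python
import hashlib
import json
import sqlite3


def harvest_schema(conn):
    """Read table and column metadata from SQLite's own catalog."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        schema[table] = [{"name": c[1], "type": c[2]} for c in columns]
    return schema


def fingerprint(schema):
    """Stable hash of the harvested schema, used to detect change."""
    payload = json.dumps(schema, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def record_version(history, schema):
    """Append a new version only when the harvested schema has changed."""
    digest = fingerprint(schema)
    if history and history[-1]["fingerprint"] == digest:
        return False
    history.append(
        {"version": len(history) + 1, "fingerprint": digest, "schema": schema}
    )
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT)")

history = []
first = record_version(history, harvest_schema(conn))    # initial harvest
repeat = record_version(history, harvest_schema(conn))   # nothing changed
conn.execute("ALTER TABLE customer ADD COLUMN country TEXT")
changed = record_version(history, harvest_schema(conn))  # new column detected
```

Running the harvest on a schedule or on a change event, rather than by hand, is what keeps the gap between the physical landscape and the framework small.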
Using automated intelligence
Using automated intelligence to create and maintain the associations between physical and business data assets is important so that new data assets and/or governance artifacts are available, protected and relevant on day one.
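A very simple form of this association is matching physical column names against glossary terms. The sketch below uses name normalization plus the standard library's `difflib.SequenceMatcher`; the glossary contents, the `suggest_term` helper and the 0.6 threshold are assumptions for illustration, and in practice a steward would still review the suggestions.

```python
import re
from difflib import SequenceMatcher

# Hypothetical glossary: business term -> classification it carries.
GLOSSARY = {
    "Customer Email": "PII",
    "Order Amount": "internal",
    "Date of Birth": "PII",
}


def normalize(name):
    """Lowercase and collapse separators so CUSTOMER_EMAIL matches Customer Email."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def suggest_term(column, glossary, threshold=0.6):
    """Return the best-matching glossary term, or None if nothing is close enough."""
    best_term, best_score = None, 0.0
    for term in glossary:
        score = SequenceMatcher(None, normalize(column), normalize(term)).ratio()
        if score > best_score:
            best_term, best_score = term, score
    return best_term if best_score >= threshold else None


def classify_columns(columns, glossary):
    """Attach the suggested term and its classification to each new column."""
    result = {}
    for column in columns:
        term = suggest_term(column, glossary)
        result[column] = (term, glossary[term]) if term else (None, "unclassified")
    return result


suggestions = classify_columns(["CUSTOMER_EMAIL", "zzz_qqq"], GLOSSARY)
```

With a match in place, a newly harvested column inherits the term's classification immediately, which is what makes it protected and relevant on day one; unmatched columns fall into a review queue rather than going ungoverned.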
Data intelligence and insights
The next automation point is data intelligence and insights. The ability to utilize the contents of this framework to synthesize actionable insights such as on-demand lineage, impact analysis, asset inter-relationships and topical metrics will enable the organization to make better data decisions and maximize the return on the opportunity that your data represents.
Drive operational change
Finally, automation should drive functional and operational change. The key is to use this comprehensive view of your business data landscape to deliver new and modified data pipelines, and to satisfy new data use cases, faster and more accurately. Activating your metadata in this way will increase the return on managing the metadata and create a more agile and risk-mitigated backbone for your data-driven enterprise.
Maximizing business value
At the root of data intelligence is data governance, which helps ensure the right level of data access, availability and usage based on a defined set of data policies and principles. While the maturity of data governance best practices and implementations varies across all organizations, overcoming data governance challenges is a high priority for organizations of all sizes and types. However, deriving business value from data governance is not guaranteed and many data governance initiatives have failed. Adopting a flexible and automated data governance framework will ensure your business will benefit from having an agile, comprehensive, opportune and sustainable source of data intelligence to underpin your business-critical data capabilities and assure your data-driven approach to doing business.