Articles
AI in Data Governance
Assist in defining business terms and identifying weak ones
Assist in developing data quality rules
Assist in data labeling and information classification (sensitive data)
Identify gaps in data lineage
Identify objects a policy applies to
Future of Data Governance
Implementation of cloud-based services
Amazon has a growing number of services that help with governing data in the cloud. As a result, Data Governance teams will need to learn how to use them and determine which services
Implementation of AI
To my knowledge, no Data Governance tools currently take advantage of AI to help with governance. Some examples of AI use include assisted business definition creation, data labeling,
Must-Know Data Governance Vocabulary
Data Catalog
Metadata
Business Glossary
Business Term
Data Lineage
Data Model
Policy
Standard
Data Custodian
Data Governance Applications
Collibra
Positives
Negatives
Expensive
Informatica
DataHub
About
This is an open source data catalog
Positives
No licensing costs
Negatives
SAP Information Steward
Data Governance Success Measures
Enterprise use of data governance application
Difference in number of data quality incidents
Difference in number of downstream impact incidents
Enterprise business term fluency
Data Quality Rules - Can AI Help?
Yes, but because general AI doesn't exist yet, human input will still be needed. It's not far-fetched for a data governance application to analyze a dataset and suggest data quality rules. I'm hoping data governance applications begin to implement a feature like this; at the least, it would reduce the amount of human involvement. There could even be an ML model for specific dataset contents. For example, a database field named "SSN" could be read and identified as a Social Security number field. The model would then suggest data quality rules and ask for human input, which makes use of the data labeling concept.
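As a rough illustration of the SSN idea, here is a minimal Python sketch. It stands in for the ML model with a simple name pattern, which is an assumption made only to keep the example small. The names `LABEL_PATTERNS`, `SUGGESTED_RULES`, `label_field`, and `suggest_rules` are hypothetical and do not come from any data governance product.

```python
import re

# Hypothetical name patterns; a real system might use a trained model instead.
LABEL_PATTERNS = {
    "ssn": re.compile(
        r"(^|_)(ssn|social_security(_number|_no)?)($|_)", re.IGNORECASE
    ),
}

# Hypothetical mapping from a detected label to suggested data quality rules.
SUGGESTED_RULES = {
    "ssn": [
        "Data length: 9 digits (11 characters with dashes)",
        "Allowed characters: digits and dashes only",
        "Classification: sensitive data",
    ],
}


def label_field(field_name):
    """Return a label for the field name, or None if nothing matches."""
    for label, pattern in LABEL_PATTERNS.items():
        if pattern.search(field_name):
            return label
    return None


def suggest_rules(field_name):
    """Return suggested rules for a field; a human still reviews them."""
    return SUGGESTED_RULES.get(label_field(field_name), [])


for rule in suggest_rules("SSN"):
    print("Suggested:", rule)
```

The suggestions are only a starting point. The application would still present them to a person to accept, change, or reject.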
Data Governance Policies
What is a Data Governance Policy?
Describes the guidelines the enterprise wants to follow for managing its data
Why are policies needed?
Communicates which guidelines need to be followed and the positive outcome that following them produces
What are some examples?
Data Retention
Classification
Data Integrity
Data Classification
Confidential
Highly Confidential
Public
Restricted
Internal Only
Data Quality
Rules
Value range
Data length, precision, scale
Data type
Distinct values
Allowed characters
Allow present/future dates
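The rule types above could be expressed as small checks, roughly as in the Python sketch below. Everything here is illustrative: the function names, the sample fields, and the thresholds are assumptions. "Distinct values" is read as membership in a known set of values, and only data length is covered (not precision or scale).

```python
import re
from datetime import date


def check_type(value, expected_type):
    """Data type: the value is an instance of the expected type."""
    return isinstance(value, expected_type)


def check_value_range(value, minimum, maximum):
    """Value range: the value lies between minimum and maximum, inclusive."""
    return minimum <= value <= maximum


def check_length(value, max_length):
    """Data length: the text is no longer than max_length characters."""
    return len(value) <= max_length


def check_distinct_values(value, allowed):
    """Distinct values: the value is one of a known set (an assumed reading)."""
    return value in allowed


def check_allowed_characters(value, pattern):
    """Allowed characters: the whole value matches the regular expression."""
    return re.fullmatch(pattern, value) is not None


def check_date(value, today, allow_present_or_future):
    """Dates: reject today's or later dates unless the rule allows them."""
    return allow_present_or_future or value < today


def validate_record(record, rules):
    """Return (field, rule name) pairs for the first failed rule of each field."""
    failures = []
    for field, field_rules in rules.items():
        for rule_name, check in field_rules:
            if not check(record[field]):
                failures.append((field, rule_name))
                break  # skip later checks that assume earlier ones passed
    return failures


TODAY = date(2024, 1, 1)  # fixed so the example is repeatable

# Hypothetical rule set for a hypothetical record layout.
RULES = {
    "age": [("data type", lambda v: check_type(v, int)),
            ("value range", lambda v: check_value_range(v, 0, 120))],
    "state": [("distinct values",
               lambda v: check_distinct_values(v, {"CA", "NY", "TX"}))],
    "zip": [("data length", lambda v: check_length(v, 5)),
            ("allowed characters",
             lambda v: check_allowed_characters(v, r"[0-9]+"))],
    "birth_date": [("present/future dates",
                    lambda v: check_date(v, TODAY, False))],
}

record = {"age": 130, "state": "TX", "zip": "7500A",
          "birth_date": date(2999, 1, 1)}
print(validate_record(record, RULES))
```

For this sample record the age is out of range, the zip contains a letter, and the birth date is in the future, so those three fields are reported while the state passes.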
Monitoring
A process or application is needed to monitor data quality (typically on a daily basis)
It should group the data objects that have data quality issues by domain; the organization will need to define these domains
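Grouping issues by domain might look something like the Python sketch below. The issue format, the object names, and the `domain_of` mapping are all hypothetical; in practice the organization supplies its own mapping of data objects to domains.

```python
from collections import defaultdict


def group_issues_by_domain(issues, domain_of):
    """Group data quality issues by the domain that owns each data object.

    Objects without a known domain are collected under "Unassigned".
    """
    grouped = defaultdict(list)
    for issue in issues:
        domain = domain_of.get(issue["object"], "Unassigned")
        grouped[domain].append(issue)
    return dict(grouped)


# Hypothetical output of a daily monitoring run.
issues = [
    {"object": "customer.ssn", "rule": "data length"},
    {"object": "orders.total", "rule": "value range"},
    {"object": "staging.tmp_load", "rule": "data type"},
]

# Hypothetical mapping defined by the organization.
domain_of = {"customer.ssn": "Customer", "orders.total": "Sales"}

for domain, domain_issues in group_issues_by_domain(issues, domain_of).items():
    print(domain, len(domain_issues))
```

Keeping an explicit "Unassigned" group is a design choice: it makes objects with no identified domain visible instead of silently dropping them.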
Responsibility
Someone must be identified to take responsibility for addressing the data quality issues that are discovered