Course Title:
Data Cleansing - Practical Skills
Geared To:
Data quality practitioners: those in the trenches who are responsible for designing, developing,
maintaining, and operating data cleansing processes and for performing data cleansing activities.
You Will Learn:
The what, why, and how of data cleansing.
The role of data quality assessment in data cleansing processes and how the two relate.
How to define a goal-oriented processing architecture for data cleansing.
A variety of techniques and solutions to specific data cleansing issues and problems.
A variety of data cleansing approaches that can be applied to different data types.
Summary:
It is widely accepted that most databases are riddled with errors. These errors are
the cancer of information systems, spreading from place to place and wreaking operational and
financial havoc. Unfortunately, while errors occur throughout all parts of a database,
data cleansing efforts in practice focus mostly on customer data standardization, deduplication,
and matching. Cleansing the rest of the data is relegated to manual work and rarely succeeds.
This course presents a comprehensive, failsafe approach to data cleansing for all data types.
Course Outline:
1. Introduction to Data Cleansing
What is data cleansing?
What makes automated data cleansing possible?
What are the common mistakes of data cleansing?
What are the steps of data cleansing?
What are the roles and responsibilities in a data cleansing team?
2. Data Quality Assessment and Data Cleansing
Why is recurring data quality assessment necessary?
How frequently should data quality be assessed?
How to compare results of periodic assessments?
How to draw conclusions from periodic assessments?
What actions can be taken based on periodic assessments?
How to deal with changes in data structure and requirements?
3. Data Cleansing Overview
How to define data cleansing objectives?
How to build a staging area?
How to organize and analyze error reports?
How to identify automated data correction rules?
What are the sources and types of data corrections?
How to apply data cleansing to production data?
How to organize an ongoing data cleansing program?
4. Data Cleansing Problems and Solutions
How to decompose data cleansing into simple steps?
How to build and use a data cleansing decision tree?
How to validate data cleansing results?
How to integrate automated and manual data cleansing?
How to keep an audit trail of data corrections?
How to integrate data cleansing metadata into a data quality metadata warehouse?
5. Data Cleansing Approaches for Different Data Types
How to cleanse basic indicative data?
How to cleanse time-dependent data?
How to cleanse state-dependent data?
How to cleanse interrelated data for complex business objects?
How to parse and standardize free-form data?
How to deduplicate and match records?