Data cleansing is the process of removing irrelevant and redundant data and correcting incorrect and incomplete data. It is also called data cleaning or data scrubbing.
Organizations are growing rapidly amid intense competition, and they make business decisions based on past performance data and future projections. Better decisions come from accurate, consistent data.
However, not all source systems hold data at the expected level of accuracy. The data must be amended to reach that level, which in turn supports better decisions. Cleansing addresses four kinds of problem data:
- Irrelevant – deleting data that the business does not require or no longer needs
- Redundant – deleting duplicate data
- Incorrect – replacing incorrect values with the correct ones
- Incomplete – filling in incomplete values with the full information
All of these cleansing steps can be carried out with transformation components in ETL tools, or by running SQL procedures or simple queries in the staging area.
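As a rough illustration of the simple-queries approach, the sketch below runs one SQL statement per cleansing category against an in-memory SQLite database standing in for a staging area. The table names (`stg_customer`, `ref_customer_email`), the columns, the sample rows, and the `'obsolete'` status rule are all hypothetical and chosen only for demonstration. Real rules would come from the business requirements.

```python
import sqlite3

# In-memory database standing in for a staging area (assumption for the demo).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical staging table with one sample row per problem category.
cur.execute(
    "CREATE TABLE stg_customer "
    "(id INTEGER, name TEXT, country TEXT, email TEXT, status TEXT)"
)
cur.executemany(
    "INSERT INTO stg_customer VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Asha Rao", "IN", "asha@example.com", "active"),
        (1, "Asha Rao", "IN", "asha@example.com", "active"),            # redundant
        (2, "Ben Cole", "Untied States", "ben@example.com", "active"),  # incorrect
        (3, "Chen Li", "CN", None, "active"),                           # incomplete
        (4, "Test User", "US", "test@example.com", "obsolete"),         # irrelevant
    ],
)

# Hypothetical reference table used to complete missing e-mail addresses.
cur.execute("CREATE TABLE ref_customer_email (id INTEGER, email TEXT)")
cur.execute("INSERT INTO ref_customer_email VALUES (3, 'chen@example.com')")

# Irrelevant: delete rows that are no longer needed.
cur.execute("DELETE FROM stg_customer WHERE status = 'obsolete'")

# Redundant: keep the first copy of each fully identical row.
cur.execute(
    "DELETE FROM stg_customer WHERE rowid NOT IN ("
    "  SELECT MIN(rowid) FROM stg_customer"
    "  GROUP BY id, name, country, email, status)"
)

# Incorrect: replace a known bad value with the correct one.
cur.execute(
    "UPDATE stg_customer SET country = 'US' WHERE country = 'Untied States'"
)

# Incomplete: fill missing e-mails from the reference table.
# Rows with no match in the reference table simply stay NULL.
cur.execute(
    "UPDATE stg_customer SET email = ("
    "  SELECT r.email FROM ref_customer_email r WHERE r.id = stg_customer.id)"
    " WHERE email IS NULL"
)
conn.commit()

rows = cur.execute(
    "SELECT id, name, country, email FROM stg_customer ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

The order of the statements is a design choice: deleting irrelevant and duplicate rows first means the later updates touch fewer rows. In an ETL tool, each statement would typically map to its own transformation component.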