Importance of ETL Performance testing:
ETL performance testing is very importance one which we need have as must testable one as part of data integration or ETL testing.
The performance of an ETL job is one of the major factors which need to be taken care as there will be a dependency on other job and there is need of data load without any delay.
Deciding factors to select an ETL job for performance testing:
For bulk load, (truncate and load) job below factors is to be considered
- First, check the data volume of table in production
- Second check the job run frequency, one time or regular interval run
If a source table has a huge volume of data but it’s just a one-time run after that incremental job will take care of data load.
Then performance testing is optional. In this case, we can select this table based on business need or still the job runs for very long time.
In another case, high data volume and frequent run then definitely we should have test case for performance measurement
For incremental load (loading only modified data) job below factors to be considered,
- Job runs frequency, most of the cases it runs every day
- We need to check the maximum volume of data which will undergo changes (all three operations insert, update and delete) within a day in production
If the source table undergoes a reasonable volume of record and the job runs every data then not required performing performance testing for this job unless it runs with poor performance.
Common factors which we have to consider this for performance testing,
Job with complex SQL queries
As we all know that SQL file or query can be executed through a job, a sometimes developer might implement the incremental load process with SQL file. In this case strongly, recommend for doing performance testing.
Job with more transformations
Job which has performance requirement for completion
Doing performance testing for ETL jobs:
- Get the maximum data volume from production
- Simulate the same volume or more than that by loading or other SQL technique
- Note down source table record count
- Start the job
- After successful completion, calculate the time taken for each component/command
- Report the issue if any deviation