Showing posts from October, 2018

CD-Stream: CDC Replicator Tool & Cons of ETL Pipelines

Just another day at the workplace:

5 minutes after booting up:

You hear everyone complain that the production database is slow. You quickly start investigating, exploring every possible cause on the dashboards.
Could it be the long-running slow query you had raised a ticket with production support to fix? Or is it one of the queries filtering on an un-indexed column?

From the 6th minute to 15 minutes down the lane:

Next, you hear your fellow data analysts lament their failed reports.
You now realize that the database server's CPU has taken on a humongous query load, and that your relational database system has gone for a toss into an eternal slumber. And all of this because of a slow-running query in your ETL pipeline..!! Ding. Ding.. Ding...!! We have a winner!!! Alright, let's phrase it this way.
Chances are you used one of the following:
- SELECT * FROM production_database.table WHERE updated_at BETWEEN x AND y;
- Airflow pipelines
- Bulk exports and Dumps onc…
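To make the first bullet concrete, here is a minimal sketch of query-based incremental extraction. It assumes a hypothetical orders table with an updated_at column and uses an in-memory SQLite database as a stand-in for the production database; the table, the extract_window helper and the timestamps are all illustrative, not part of any real pipeline. It shows two commonly cited drawbacks of the pattern. First, with no index on updated_at, the window query has to scan the whole table. Second, a hard-deleted row leaves no updated_at behind, so the deletion never reaches the downstream copy.

```python
import sqlite3


def extract_window(conn, start, end):
    """Hypothetical incremental ETL step: pull rows whose updated_at is in [start, end]."""
    cur = conn.execute(
        "SELECT id, name, updated_at FROM orders "
        "WHERE updated_at BETWEEN ? AND ? ORDER BY id",
        (start, end),
    )
    return cur.fetchall()


# In-memory stand-in for the production table (hypothetical schema, no index on updated_at).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "a", "2018-10-01 10:00"),
        (2, "b", "2018-10-01 10:05"),
        (3, "c", "2018-10-01 11:30"),
    ],
)

# First scheduled run: picks up rows 1 and 2 from the 10:00-11:00 window.
first = extract_window(conn, "2018-10-01 10:00", "2018-10-01 11:00")

# Row 2 is hard-deleted; there is no longer any row whose updated_at records this change.
conn.execute("DELETE FROM orders WHERE id = 2")

# Next run: only row 3 comes back, so the target still holds the deleted row 2.
second = extract_window(conn, "2018-10-01 11:00", "2018-10-01 12:00")

# Without an index on updated_at, SQLite's plan for the window query is a full table SCAN.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE updated_at BETWEEN ? AND ?",
    ("2018-10-01 11:00", "2018-10-01 12:00"),
).fetchall()

print([row[0] for row in first])   # ids from the first window
print([row[0] for row in second])  # ids from the second window
print(plan[-1][-1])                # plan detail string (wording varies by SQLite version)
```

A log-based CDC approach, which CD-Stream is about, reads changes (including deletes) from the database's change log instead of re-querying the table on a schedule. That is why it avoids both the repeated scans and the missed deletions shown here.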