Skip to main content


Showing posts from 2018

CD-Stream:CDC Replicator Tool & Cons on ETL pipelines

Just another day at the work place;

5 minutes post the boot:

You hear everyone complain that the production database is slow. You quickly start to investigate; exploring all possible outcomes on the dashboards.. 
Could it have been the long-running slow query which you had raised a ticket for the production support to fix?.. Or Is it one of the queries run based on an un-indexed column?

6th Minute and 15 minutes down the lane:

Next you hear the fellow data-analysts lament over their failed reports. 
You now realize that your CPU had taken a humongous amount of query load and you understand that your relational database system has gone for a toss into an eternal slumber. And all of this due to a slow running query of your ETL pipeline..!! Ding. Ding.. Ding...!! We have a winner!!! Alright, let's phrase it this way. 
Probably you did/used one of the following:
- SELECT * from production_database.table where updated_at between x and y; - Airflow pipelines - Bulk exports and Dumps onc…

The No-BS guide to AutoComplete and FuzzySearch in Elasticsearch

Before we begin.. Here are a few basics.Analyzer: An analyzer does the analysis or splits the indexed phrase/word into tokens/terms upon which the search is performed with much ease.

An analyzer is made up of tokenizer and filters.

There are numerous analyzers in elasticsearch, by default;
here, we use some of the custom analyzers tweaked in order to meet our requirements.
Filter: A filter removes/filters keywords from the query. Useful when we need to remove false positives from the search results based on the inputs.

We will be using a stop word filter to remove the specified keywords in the search configuration from the query text.
Tokenizer: The input string needs to be split, in order  to be searched against the indexed documents. We are about to use ngram here, which splits the query text into sizeable terms.
Mappings: The created analyzer need to be mapped to a fieldname, for it to be efficiently used while querying.
T'is time!!! Now that we have covered the basics, t'is t…

Elasticsearch to MongoDB Migration - MongoES

The following are some of the instances where the developers simply love to hate! The one-last-thing syndrome - This reminds me of the following quote:   The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time. —Tom Cargill, Bell Labs, from the book `Programming Pearls ` QAs declaring certain undocumented features to be as bugs - Seriously, this create traumas for a devloper.Interruptions during coding - Here's an idea. Try talking to developers while they code; chances are, they have just about <10% of your attention. There are some problems which we get used to..

But, there are others which makes us wanna do this..

DISCONNECTION FROM THE SERVER DUE TO BAD INTERNET DURING A MIGRATION - Ouch!! That's gotta hurt real bad. Talking about ES to MongoDB Migration  - How hard could that be? Good Side: JSON objects are common for both. Numerous tools to…