
Posts

Flyway - Database Migrations made easy & How not to accidentally Roll Back all of your migrations

Flyway, by Boxfuse, is a schema migration tool that acts more like version control for your relational databases.

If you or your administrator are manually executing SQL scripts on your production or UAT environment, you definitely need this tool set up in all of your environments.

Before we proceed:

Statutory Warning: 

Never, ever execute the following command, be it on your production or UAT environment:

$ flyway clean   # Do not execute this, ever!!!!

Wondering what it does? It rolls back whatever table migrations/changes you have made through Flyway, along with their data.

In short: don't ever execute this command.

Now that we are done with all the warnings:


Installation: It is fairly straightforward:
Run the installation command in a shell prompt.
Running it creates a directory called flyway-x.x.x/
Inside this directory are several other directories, of which the two most important are:
 conf/ - Configuration for eac…
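To give a feel for the everyday usage once it is extracted (the JDBC URL, credentials and paths below are placeholders, not values from this post), the conf file and the safe commands look roughly like this:

# conf/flyway.conf (placeholder values)
flyway.url=jdbc:postgresql://localhost:5432/mydb
flyway.user=dbuser
flyway.password=******
flyway.locations=filesystem:sql

$ ./flyway info      # lists applied and pending migrations
$ ./flyway migrate   # applies the pending migrations - the command you actually want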
Recent posts

CD-Stream: CDC Replicator Tool & Cons on ETL pipelines

Just another day at the workplace:

5 minutes post boot:

You hear everyone complain that the production database is slow. You quickly start to investigate, exploring all possible causes on the dashboards..
Could it be the long-running slow query for which you had raised a ticket with production support?.. Or is it one of the queries hitting an un-indexed column?


Minute 6, and then 15 minutes down the line:

Next, you hear the fellow data analysts lament over their failed reports.
You now realize that your CPU has taken a humongous query load, and your relational database system has gone for a toss into an eternal slumber. And all of this due to a slow-running query in your ETL pipeline..!! Ding. Ding.. Ding...!! We have a winner!!! Alright, let's phrase it this way.
You probably did/used one of the following:
- SELECT * FROM production_database.table WHERE updated_at BETWEEN x AND y;
- Airflow pipelines
- Bulk exports and Dumps onc…

The No-BS guide to AutoComplete and FuzzySearch in Elasticsearch

Before we begin, here are a few basics.
Analyzer: An analyzer does the analysis, i.e. it splits the indexed phrase/word into tokens/terms, upon which the search is performed with much more ease.

An analyzer is made up of a tokenizer and filters.

Elasticsearch ships with numerous analyzers by default;
here, we use some custom analyzers, tweaked in order to meet our requirements.
Filter: A filter removes/filters keywords from the query. Useful when we need to remove false positives from the search results based on the inputs.

We will be using a stop word filter to remove the specified keywords in the search configuration from the query text.
Tokenizer: The input string needs to be split in order to be searched against the indexed documents. We are about to use ngram here, which splits the query text into sizeable terms.
Mappings: The created analyzer needs to be mapped to a field name for it to be used efficiently while querying.
T'is time!!! Now that we have covered the basics, t'is t…
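As a rough sketch of how these pieces fit together (the index, type and field names here are invented for illustration, not taken from the post), an Elasticsearch 5.x index combining an ngram tokenizer, a stop-word filter and a mapped field could be created like this:

$ curl -XPUT 'http://localhost:9200/products' -d '
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": { "type": "stop", "stopwords": ["the", "a", "an"] }
      },
      "tokenizer": {
        "my_ngram": { "type": "ngram", "min_gram": 2, "max_gram": 4 }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "my_ngram",
          "filter": ["lowercase", "my_stop"]
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "name": { "type": "text", "analyzer": "autocomplete" }
      }
    }
  }
}'

A match query against the mapped field then hits the ngram terms, which is what gives the autocomplete-style partial matching:

$ curl -XGET 'http://localhost:9200/products/_search' -d '
{ "query": { "match": { "name": "phon" } } }'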

Elasticsearch to MongoDB Migration - MongoES

The following are some of the instances that developers simply love to hate!
The one-last-thing syndrome - This reminds me of the following quote: "The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time." —Tom Cargill, Bell Labs, from the book `Programming Pearls`
QAs declaring certain undocumented features to be bugs - Seriously, this creates trauma for a developer.
Interruptions during coding - Here's an idea: try talking to developers while they code; chances are, you have less than 10% of their attention.
There are some problems which we get used to..

But there are others which make us wanna do this..



DISCONNECTION FROM THE SERVER DUE TO BAD INTERNET DURING A MIGRATION - Ouch!! That's gotta hurt real bad.
Talking about ES to MongoDB migration - how hard could that be?
Good side: JSON objects are common to both. Numerous tools to…
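The MongoES tool itself is not shown in this excerpt; purely as an illustration of why the shared JSON format keeps this simple (the index, database and collection names below are invented), one generic shell-level way to move documents across is:

$ elasticdump --input=http://localhost:9200/myindex --output=dump.json --type=data
$ mongoimport --db mydb --collection mycol --file dump.json
# note: each line of dump.json still carries ES metadata (_index, _id, _source)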

ES Index - S3 Snapshot & Restoration:

The question is.. What brings you here? Fed up with all the searches on how to back up and restore specific indices?

Fear not, for your search quest ends here!

After going through dozens of tiny gists and manual pages, here it is.. We've done all the heavy lifting for you.



The following tutorial was tested on Elasticsearch v5.4.0

And before we proceed, remember:

Do's:

Make sure that the Elasticsearch version of the backed-up cluster/node is <= the restoring cluster's version.

Dont's:

Unless it's highly necessary:

curl -XDELETE 'http://localhost:9200/nameOfTheIndex'

      - deletes a specific index

Especially not when you are drunk:

curl -XDELETE 'http://localhost:9200/_all'

      - deletes all indexes (This is where the drunk part comes in..!!)



Step 1: Install S3 plugin support:
        sudo bin/elasticsearch-plugin install repository-s3
                  (or)
        sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install repository-s3

Depends on w…
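Once the plugin is installed and the node restarted, registering an S3 repository and snapshotting/restoring an index goes through the standard snapshot API; the repository, bucket and snapshot names below are placeholders:

$ curl -XPUT 'http://localhost:9200/_snapshot/my_s3_repo' -d '
{ "type": "s3", "settings": { "bucket": "my-es-backups", "region": "us-east-1" } }'

$ curl -XPUT 'http://localhost:9200/_snapshot/my_s3_repo/snapshot_1?wait_for_completion=true' -d '
{ "indices": "nameOfTheIndex" }'

$ curl -XPOST 'http://localhost:9200/_snapshot/my_s3_repo/snapshot_1/_restore' -d '
{ "indices": "nameOfTheIndex" }'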

Postgres to Mongo Migrator - Batteries Included!!!

DATABASE MIGRATION ACROSS PLATFORMS - Got your goosebumps yet? Well, long story short: cross-platform database migrations equal sleep-talking, distress and long work days with coffee; and what good does it do? We just end up writing hours and hours of scripts to reach the end result. However, it is all one-time-use-only, which makes you think to yourself: "All this horsepower and no room to gallop?"

Postgres to MongoDB: Be it a platform change, organizational growth, perhaps bad coding, or perhaps your own microservices all set up and dwelling on JSON objects; you might have had to switch from a relational to a NoSQL database. Switching can be tedious. I hear you, and here lies the solution to all your worries.
Behold! Enter Pg2Mongo:
Pg2Mongo is an open-source migration tool, written in Python 3, which gives you exclusive control over the migrations. First steps: The initial step is to make sure you have acces…
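Pg2Mongo's own usage is cut off above; purely as a sketch of the underlying idea (the database, table and collection names are invented, and this is not Pg2Mongo's interface), one crude way to push relational rows into MongoDB as JSON documents is:

$ psql -d sourcedb -t -A -c "SELECT row_to_json(t) FROM my_table t" \
    | mongoimport --db targetdb --collection my_table
# row_to_json emits one JSON document per row; mongoimport reads them from stdin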

Model Productionisation - MNIST Handwritten Digits Prediction

Yet another post about MNIST Handwritten Digits Prediction?

Nope. Not this time!!

There are about a hundred tutorials available online for this cause.
Here's a quickie to understand the mechanics of the prediction process in TensorFlow for the MNIST dataset, which should get you up and running.

Done with the basics? Head over to

https://github.com/datawrangl3r/mnistProduction

and clone the project.

We are about to deploy an image prediction RESTful API, powered by the Flask microframework.

The code in the repo is written in Python 2.7; you may also use Python 3, which should be a breeze.


Step 2, mentioned above, powers up the API, serving the endpoint on port 5000. Time to test-query our API.

The project directory contains numerals_sample, from which one may crop out the required digits. For this demo, we shall look at numba3.jpg, numba5.jpg, numba6.jpg, numba7.jpg and numba9.jpg, present in the same directory as the project.

Fire up the browser, and hit the …
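The exact route is cut off above; assuming a hypothetical endpoint such as /predict/<image_name> served on port 5000, a test call would look something like:

$ curl 'http://localhost:5000/predict/numba3.jpg'   # the route here is an assumption, not taken from the repo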