
Postgres to Mongo Migrator - Batteries Included!!!


     Well, long story short: cross-platform database migrations mean sleep-talking, distress and long working days fuelled by coffee. And what good does it do? We end up spending hours and hours writing scripts to get to the end result, and those scripts are strictly one-time-use, which leaves you thinking to yourself, "All this horsepower and no room to gallop?".

Postgres to MongoDB:

    Be it a platform change, organizational growth, bad coding, or your own microservices all settled in and dwelling on JSON objects, you may have had to switch from a relational to a NoSQL database. Switching can be tedious. I hear you, and here lies the solution to all your worries.

Behold! Enter the Pg2Mongo:

 Pg2Mongo is an open-source migration tool, written in Python 3, which gives you full control over the migrations.

First Steps:

The initial step is to make sure you have access to both the Postgres and MongoDB servers. After cloning the repository, install the requirements that pg2mongo needs to run.

For demonstration's sake, let's migrate the sample dataset that ships with pg2mongo for us to play around with.

Configuration setup:

And now, all we have to do is set up the instructions for the migrator to wrangle. The configuration file is located at 'pg2mongo/pg2mongo.yml' and goes as follows:

The preliminary sections, extraction and commit, are self-explanatory: they state the configuration settings for the extraction and commit databases. The migration component is where all the magic happens!

The following section explains what the individual components are all about:


The initial table from which data needs to be migrated. This could be a prime table, such as a transactions table with a primary key and multiple foreign-key constraints to other tables of the PostgreSQL database. For each entry in this table, the linking to other tables happens as defined in the tables section.
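
To make the "for each entry, link the other tables" idea concrete, here is a minimal sketch of that loop. It is not pg2mongo's own code: SQLite (from the Python standard library) stands in for PostgreSQL so the example runs anywhere, and the `transactions` and `customers` tables, their columns and the `linked_rows` helper are all made up for illustration.

```python
import sqlite3

# Stand-in for PostgreSQL: an in-memory SQLite database with hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO transactions VALUES (10, 1, 25.0), (11, 2, 40.5), (12, 1, 9.99);
""")


def linked_rows(conn):
    """For each row of the initial table, look up its linked row by foreign key."""
    results = []
    init_rows = conn.execute(
        "SELECT id, customer_id, amount FROM transactions ORDER BY id"
    ).fetchall()
    for txn in init_rows:
        customer = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (txn["customer_id"],)
        ).fetchone()
        results.append(
            {"txn_id": txn["id"], "amount": txn["amount"], "customer": customer["name"]}
        )
    return results


rows = linked_rows(conn)
```

Each entry in `rows` pairs one row of the initial table with the data pulled from its linked table, which is the shape a document-per-entry migration needs.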


The keys of the init_table (aliases can be given using 'as').


The skeleton is an empty, raw Python dictionary that is transformed into a MongoDB document upon migration.

The order in which the entries of the tables section are executed for each entry from the init_table.


The set of PostgreSQL tables, each listed with its condition and its corresponding mapping. Where the skeleton holds a list inside a dictionary, the list can be declared here. The mapping is where the association between the skeleton and the table keys is defined. Value assignments are Python-compatible: they are written using '%s', and other Python-based transformation functions can be applied to them.
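
As a rough illustration of how a skeleton, a '%s'-style mapping and a transformation function could fit together, here is a small sketch. The names `SKELETON`, `MAPPING`, `set_path` and `fill_skeleton`, the dotted-path notation and the sample columns are assumptions made for this example; they are not the actual syntax of pg2mongo.yml.

```python
import copy

# Hypothetical skeleton: an empty template that becomes one MongoDB document.
SKELETON = {"txn_id": None, "customer": {"name": None}, "items": []}

# Hypothetical mapping: skeleton path -> ('%s' template, source column, transform).
MAPPING = {
    "txn_id": ("%s", "id", int),
    "customer.name": ("%s", "customer_name", str.upper),
}


def set_path(doc, dotted_path, value):
    """Assign value at a dotted path such as 'customer.name' in a nested dict."""
    *parents, leaf = dotted_path.split(".")
    for key in parents:
        doc = doc[key]
    doc[leaf] = value


def fill_skeleton(row, item_rows):
    """Build one document from an init-table row plus its linked rows."""
    doc = copy.deepcopy(SKELETON)  # never mutate the shared template
    for path, (template, column, transform) in MAPPING.items():
        # Format the column value with '%s', then apply the Python transform.
        set_path(doc, path, transform(template % row[column]))
    # A list inside the dictionary: one entry per linked row.
    doc["items"] = [{"sku": "%s" % r["sku"], "qty": int(r["qty"])} for r in item_rows]
    return doc


doc = fill_skeleton(
    {"id": 10, "customer_name": "asha"},
    [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}],
)
```

Deep-copying the skeleton for every init-table entry keeps one document's values from leaking into the next.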


This is where the push of the skeleton to the corresponding MongoDB collection takes place.
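
For a feel of what such a push might look like, here is a hedged sketch that writes filled skeletons in batches. It is an assumption about the general approach, not pg2mongo's implementation: `push_documents` and `InMemoryCollection` are hypothetical names, and the function only relies on the target having an `insert_many` method, which pymongo collections provide. In a real run you would pass a pymongo collection instead of the in-memory stand-in used here.

```python
def push_documents(collection, documents, batch_size=500):
    """Insert documents in batches; `collection` needs an insert_many(list) method."""
    pushed = 0
    batch = []
    for doc in documents:
        batch.append(doc)
        if len(batch) == batch_size:
            collection.insert_many(batch)
            pushed += len(batch)
            batch = []
    if batch:  # flush the final, partially filled batch
        collection.insert_many(batch)
        pushed += len(batch)
    return pushed


class InMemoryCollection:
    """Stand-in for a MongoDB collection that records what would be written."""

    def __init__(self):
        self.stored = []

    def insert_many(self, docs):
        self.stored.extend(docs)


fake = InMemoryCollection()
count = push_documents(fake, ({"txn_id": i} for i in range(5)), batch_size=2)
```

Batching keeps memory use flat on large tables, since only `batch_size` documents are held at a time.
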
With all the instructions in place, it's time to wrangle. You may invoke the migration by keying in the following command.
And off she goes!!

