
Elasticsearch to MongoDB Migration - MongoES

The following are some of the situations that developers simply love to hate!
  • The one-last-thing syndrome - This reminds me of the following quote:
  The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time.
  - Tom Cargill, Bell Labs, from the book `Programming Pearls`
  • QAs declaring certain undocumented features to be bugs - Seriously, this is traumatic for a developer.
  • Interruptions during coding - Here's an idea. Try talking to developers while they code; chances are, you have less than 10% of their attention.
There are some problems which we get used to..

But there are others which make us wanna do this..



  • DISCONNECTION FROM THE SERVER DUE TO BAD INTERNET DURING A MIGRATION - Ouch!! That's gotta hurt real bad.

Talking about ES to MongoDB Migration 

- How hard could that be?

Good Side:
JSON objects are common to both.
There are numerous migration tools to choose from.
Bad Side:
The migration can be hideous and can eat up a lot of system resources. Be ready for a system freeze if the migration tool uses a queue.
Ugly Side:
The migration can never be resumed from the point of failure. If connectivity goes down during the migration, the partially transferred collection has to be deleted and the data transfer has to be started again from the beginning.


Alright, there's nothing to feel bad about.

Enter, MongoES.



MongoES is a pure Python 3 migration tool that moves documents from an Elasticsearch index to a MongoDB collection.

It's robust in its own native way: no queues or message brokers are involved, which means that there won't be any queue-induced memory spikes or system freezes.

This is achievable because MongoES tags the documents prior to the migration. The tagging happens in the source Elasticsearch index, and the tags serve as checkpoints during the migration.
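The checkpoint idea can be sketched without either database. The snippet below is only an illustration and not MongoES's actual code: plain Python lists stand in for the Elasticsearch index and the MongoDB collection, and the field name `tag`, the helpers `tag_documents` and `migrate`, and the simulated disconnection are all assumptions made for this example.

```python
def tag_documents(source):
    """Give every source document a sequential integer tag (1, 2, 3, ...)."""
    for position, doc in enumerate(source, start=1):
        doc["tag"] = position


def migrate(source, destination, batch_size=1000, fail_after_batches=None):
    """Copy tagged documents in batches, resuming after the highest tag already copied.

    fail_after_batches simulates a dropped connection (hypothetical, demo only).
    """
    # The checkpoint is simply the largest tag that already reached the destination.
    checkpoint = max((doc["tag"] for doc in destination), default=0)
    batches_done = 0
    while True:
        if fail_after_batches is not None and batches_done >= fail_after_batches:
            raise ConnectionError("simulated disconnection")
        batch = [doc for doc in source
                 if checkpoint < doc["tag"] <= checkpoint + batch_size]
        if not batch:
            return
        destination.extend(dict(doc) for doc in batch)
        checkpoint = batch[-1]["tag"]
        batches_done += 1


source = [{"name": f"doc-{i}"} for i in range(2500)]
destination = []
tag_documents(source)

try:
    migrate(source, destination, fail_after_batches=1)  # connection drops after one batch
except ConnectionError:
    pass

copied_before_resume = len(destination)
migrate(source, destination)  # picks up after the last copied tag; nothing is deleted
```

Because the second call derives its starting point from the tags already present in the destination, the partially transferred data never has to be dropped.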

Why tag the documents with a new custom ID when there's an _id already?

Unless the documents were indexed with explicit IDs, the _id fields in Elasticsearch documents are auto-generated alphanumeric strings that merely identify the documents. These _id values are unusable as checkpoints, since they carry no meaningful order and range queries/aggregations cannot usefully be run on them.
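To see why a sequential tag helps, here is a rough sketch of the kind of search body a numeric tag makes possible. The field name `mongoes_id` and the helper `batch_query` are hypothetical (the tool's real field name may differ); the body itself uses the standard Elasticsearch `range` query with `gt`/`lte`, plus `sort` and `size`.

```python
def batch_query(last_tag, batch_size=1000, tag_field="mongoes_id"):
    """Build a search body for the next batch of tagged documents.

    tag_field is an assumed name for the numeric tag; adjust it to your index.
    """
    return {
        "query": {
            "range": {
                # Everything after the checkpoint, up to one batch further on.
                tag_field: {"gt": last_tag, "lte": last_tag + batch_size}
            }
        },
        "sort": [{tag_field: "asc"}],
        "size": batch_size,
    }


first = batch_query(0)         # the very first batch
resumed = batch_query(3000)    # resuming after tag 3000 was the last one copied
```

With random alphanumeric _id values there is no equivalent of "everything after 3000", which is what makes a numeric tag usable as a checkpoint.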

MongoES - How to:
  1. Install all the Prerequisites.
  2. Clone the repository from https://github.com/datawrangl3r/mongoes.git
  3. Edit the mongoes.json file according to your requirements.

  4. Make sure that both the Elasticsearch and MongoDB services are up and running, and fire up the migration by keying in:

  5. Sit back and relax, for we've got you covered! By default, the migration transfers 1000 documents per batch.
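To get a feel for what 1000 documents per transfer means, the small sketch below works out the tag window of each transfer for a given document count. The helper `transfer_windows` is hypothetical, and it assumes the tags are sequential integers starting at 1, which may not match the tool's internals.

```python
def transfer_windows(total_docs, batch_size=1000):
    """Yield (first_tag, last_tag) for each transfer, with tags starting at 1."""
    for start in range(1, total_docs + 1, batch_size):
        yield start, min(start + batch_size - 1, total_docs)


# An index of 2500 documents needs three transfers, the last one partial.
windows = list(transfer_windows(2500))
for first_tag, last_tag in windows:
    print(f"transfer tags {first_tag}..{last_tag}")
```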
Happy Wrangling!!! :)
