Skip to main content

The No-BS guide to AutoComplete and FuzzySearch in Elasticsearch


Before we begin.. Here are a few basics.

Analyzer:

An analyzer does the analysis or splits the indexed phrase/word into tokens/terms upon which the search is performed with much ease.

An analyzer is made up of tokenizer and filters.

There are numerous analyzers in elasticsearch, by default;
here, we use some of the custom analyzers tweaked in order to meet our requirements.

Filter:

A filter removes/filters keywords from the query. Useful when we need to remove false positives from the search results based on the inputs.

We will be using a stop word filter to remove the specified keywords in the search configuration from the query text.

Tokenizer:

The input string needs to be split, in order  to be searched against the indexed documents. We are about to use ngram here, which splits the query text into sizeable terms.

Mappings:

The created analyzer need to be mapped to a fieldname, for it to be efficiently used while querying.

T'is time!!!

Now that we have covered the basics, t'is time to create our index.

Fuzzy Search:

The first upon our index list is fuzzy search:

Index Creation:

curl -vX PUT http://localhost:9200/books -d @fuzzy_index.json \
--header "Content-Type: application/json"
 

And, the following books and their corresponding authors are loaded to the index.

name author
To Kill a Mockingbird Harper Lee
When You're Ready J.L. Berg
The Book Thief Markus Zusak
The Underground Railroad          Colson Whitehead
Pride and Prejudice Jane Austen
Ready Player One Ernest Cline

When a fuzzy query such as:

This query with the match keyword as "ready" returns the matched books ready as a keyword in the phrase; as,

AutoComplete:

Next up, is the autocomplete. The only difference between a fuzzy search and an autocomplete is the min_gram and max_gram values.

In this case, depending on the number of characters to be auto-filled, the min_gram and max_gram values are set, as follows:


Comments

Popular posts from this blog

ES Index - S3 Snapshot & Restoration:

The question is.. What brings you here? Fed up with all the searches on how to back-up and restore specific indices? 

Fear not, for your search quest ends here.!

After going through a dozens of tiny gists and manual pages, here it is.. We've done all the heavy-lifting for you.



The following tutorial was tested on elasticsearch V5.4.0

And before we proceed, remember:

Do's:

Make sure that the elasticsearch version of the backed-up cluster/node <= Restoring Cluster's version.

Dont's:

Unless it's highly necessary;

curl -XDELETE 'http://localhost:9200/nameOfTheIndex

      - deletes a specific index

Especially not, when you are drunk!:

curl -XDELETE 'http://localhost:9200/_all

      - deletes all indexes (This is where the drunk part comes in..!!)



Step1:Install S3 plugin Support:        sudo bin/elasticsearch-plugin install repository-s3
                                  (or)
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install repository-s3

Depends on w…

Flyway - Database Migrations made easy & How not to accidentally Roleback all of your migrations

Flyway - by boxfuse: Is a schema migration tool and it acts more of like a version control for your relational databases.

If you are manually executing your sql scripts or if your administrator is manually executing the sql scripts, on your production or UAT environment, you definitely need this tool to be setup in all of your environments.

Before we proceed:

Statutory Warning: 

Never ever execute the following command, be it your production or UAT environment:

$ flyway clean   # Do not execute this, ever!!!!

Wondering what it does? It roles back whatever table migrations/changes you have done through flyway, along with their data. 

In short, Don't ever execute this command.

Now that we are done with all the warnings:


Installation:It is fairly straight forward:
Run the above command in a shell prompt.
Running the above creates a directory called as flyway-x.x.x/
Inside this directory are many other directories of which, the two most import directories are:
 conf/ - Configuration for eac…

ELK Stack... Not!!! FEK, it is.!!! Fluentd, Elasticsearch & Kibana

If you are here, you probably know what elasticsearch is and at some point, trying to get into the mix. You were searching for the keywords "logging and elasticsearch" or perhaps, "ELK"; and probably ended up here. Well, you might have to take the following section with a pinch of salt, especially the "ELK Stack"  fam.
At least from my experience, working for start-ups teaches oneself, a lot of lessons and one of the vast challenges include minimizing the resource utilization bottlenecks. On one hand, the logging and real-time application tracking is mandatory; while on the the other hand, there's a bottle neck in the allocated system resource, which is probably an amazon EC2 instance with 4Gigs of RAM.
ELK Stack 101: Diving in, ELK => Elasticsearch, Logstash and Kibana. Hmm, That doesn't add up; don't you think? Elasticsearch stores the reformed log inputs, Logstash chops up the textual logs and transforms them to facilitate query, deriva…