Before we begin, here are a few basics.
Analyzer: An analyzer splits the indexed phrase or word into tokens (terms), and the search is then performed against those terms.
An analyzer is made up of a tokenizer and zero or more filters.
Elasticsearch ships with numerous built-in analyzers;
here, we use custom analyzers tweaked to meet our requirements.
Filter: A filter removes keywords from the query. This is useful when we need to remove false positives from the search results based on the inputs.
We will use a stop word filter to remove the keywords specified in the search configuration from the query text.
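To make the idea concrete, here is a minimal Python sketch of what a stop word filter does to a stream of tokens. It is only an illustration, not how Elasticsearch implements it; the function name `remove_stop_words` and the stop word list are assumptions made up for this example.

```python
def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in stop_words (case-insensitive)."""
    blocked = {word.lower() for word in stop_words}
    return [token for token in tokens if token.lower() not in blocked]


# Hypothetical stop word list; in practice it comes from the search configuration.
stop_words = ["a", "the", "and"]
query_tokens = "the book thief".split()
filtered = remove_stop_words(query_tokens, stop_words)
print(filtered)  # ['book', 'thief']
```

Only the tokens that are not in the stop word list are passed on to the search.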
Tokenizer: The input string must be split into terms before it can be searched against the indexed documents. We use the ngram tokenizer here, which splits the text into terms whose length lies between the min_gram and max_gram values.
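As a rough illustration of n-gram splitting, the sketch below produces every substring whose length falls between a minimum and a maximum. The function name `ngrams` and the values 2 and 3 are assumptions picked for the example; the real tokenizer has further options (such as which characters to keep) that are ignored here.

```python
def ngrams(text, min_gram, max_gram):
    """Return every substring of text with a length from min_gram to max_gram."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end <= len(text):
                grams.append(text[start:end])
    return grams


print(ngrams("ready", 2, 3))
# ['re', 'rea', 'ea', 'ead', 'ad', 'ady', 'dy']
```

A wider gap between the two values produces more terms per word, which makes matching more lenient but makes the index larger.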
Mappings: The custom analyzer must be mapped to a field so that it is applied when that field is indexed and queried.
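The sketch below shows roughly how these pieces fit together in an index definition, built as a Python dictionary and printed as JSON. It is not the configuration used in this post: the names `ngram_tokenizer`, `ngram_analyzer` and the field `title`, as well as the gram sizes, are assumptions, and the layout assumes Elasticsearch 7 or later, where mappings have no type name.

```python
import json

# Hypothetical index definition: a custom analyzer built from an ngram
# tokenizer and the built-in lowercase filter, mapped to a "title" field.
index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "ngram_tokenizer": {"type": "ngram", "min_gram": 2, "max_gram": 3}
            },
            "analyzer": {
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "ngram_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "ngram_analyzer"}
        }
    },
}

print(json.dumps(index_body, indent=2))
```

The analyzer name given in the mapping has to match a name defined under `settings.analysis.analyzer`; otherwise index creation is rejected.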
'Tis time! Now that we have covered the basics, it's time to create our index.
Fuzzy Search: The first index on our list is for fuzzy search:
curl -vX PUT http://localhost:9200/books -d @fuzzy_index.json \
  --header "Content-Type: application/json"
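For readers who prefer to script this step, the same request can be prepared from Python with the standard library. This is a sketch under assumptions: the helper name `build_create_index_request` is made up, and the body is a placeholder, since the real settings live in fuzzy_index.json. The request is only built here, not sent.

```python
import json
import urllib.request


def build_create_index_request(base_url, index_name, body):
    """Prepare (but do not send) a PUT request that creates an index."""
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder body; replace it with the contents of fuzzy_index.json.
request = build_create_index_request("http://localhost:9200", "books", {"settings": {}})
print(request.get_method(), request.full_url)

# To send it against a running cluster:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```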
The following books and their authors are then loaded into the index.
| Title | Author |
|---|---|
| To Kill a Mockingbird | Harper Lee |
| When You're Ready | J.L. Berg |
| The Book Thief | Markus Zusak |
| The Underground Railroad | Colson Whitehead |
| Pride and Prejudice | Jane Austen |
| Ready Player One | Ernest Cline |
A fuzzy query with the match keyword "ready" returns the books that contain "ready" as a keyword in the phrase.
Next up is autocomplete. The only difference between the fuzzy search index and the autocomplete index is the min_gram and max_gram values.
In this case, the min_gram and max_gram values are set according to the number of characters to be auto-completed, as follows: