Tuesday, July 23, 2013

Using ElasticSearch to cache more than 4.6 million records and integrating with jQuery auto-complete highlighting


Hi,

The topic is really interesting, right :)
Buzz words in the topic “Using ElasticSearch to cache more than 4.6 million records and integrating with jQuery auto-complete highlighting” is ElasticSearch.  Let's go through it.

PROBLEM TAKEN
Some background: In one of my project there is requirement to show auto completion box to display company names.  These company names were stored in one of the oracle table and in count total companies records are more than 4.6 million.

It was expected to
-          Fetch the records from database; populate the elastic-search index using these records. 
-          Integrate the calls for elastic-search from jQuery using ajax.
-          Server hits as the user types, fetching data from elastic-search and populate in jQuery auto-complete box.
-          Highlight the data matching in auto-complete box.

PICTORIAL REPRESENTATION

This is what expected.



TECHNOLOGY STACK
  •           ElasticSearch
  •          jQuery


SOLUTION APPROACH
In the approach, there were multiple tasks that were completed.
1.       ElasticSearch
Elastic search is a “flexible and powerful open source, distributed real-time search and analytics engine for the cloud”.

I installed the elastic search 0.90.2 version.   

By default elastic search does not provide any way to populate its indices via any data source.  There is a plugin called jdbc-river (https://github.com/jprante/elasticsearch-river-jdbc).  River is basically a terminology used to populate indices with specific key.  There are other types of rivers available using which you can populate indices via CSV, Oracle, MySQL etc.

After downloading elastic search I set it up and then install the river-jdbc plugin (instructions are available on site).  Now using the river I executed below command to fetch the records from database and build indices.

PUT Body:
{
    "type" : "jdbc",
    "jdbc" : {
        "driver" : "oracle.jdbc.driver.OracleDriver",
        "url" : "jdbc:oracle:thin:@//[IP]:1521/[DB]",
        "user" : "user",
        "password" : "pwd",
        "sql" : "select col1, col2, col3 from table"
    },
    "index" : {
        "index" : "index_name",
        "type" : "type_name"
      
    }
}

You can put any string as index and type.  You can use any REST client from FF or Chrome to fire above query.  Just after successful execution, elastic search will start fetching the data and building indices.  In my case elastic search fetch 4.6 million records.  I specifically fetched 2 columns and third column was concatenation of first two.

Now elastic search is ready and populated.  I just need to hit the REST services of elastic search to get desired data. 


You can test your elastic search using below command
     Url: http://localhost:9200/index_name/type_name/_search
     POST Body
     {"from":0,"size":10, "query":{"field":{"COL1":"*760*" } } }


2.       jQuery
As a second part, jQuery need to be integrated.  I created one html page that has code to call elastic search REST services after user has typed 3 characters in one of the text box.  I call the records in the chunk of 10. 
After getting 10 matching records, process it to make a java-script array and pass it to auto-complete control. 
Then there is one hack or you can say MonkeyPatch (a terminology to override thirdparty’s function for custom behavior.) as a java script function to match the words with spaces in search result and highlight their all occurrences.

<html lang="en">
<head>
  <meta charset="utf-8" />
  <title>jQuery UI Autocomplete - Default functionality</title>
  <link rel="stylesheet" href="jquery-ui.css" />
  <script src="jquery-1.9.1.js"></script>
  <script src="jquery-ui.js"></script>
  <script>

var AUTOCOMPLETE_VIEW = 1;

function monkeyPatchAutocomplete() {

          // Don't really need to save the old fn,
          // but I could chain if I wanted to
          var oldFn = jQuery.ui.autocomplete.prototype._renderItem;

          jQuery.ui.autocomplete.prototype._renderItem = function( ul, item) {
              //var re = new RegExp("^" + this.term, "i") ;
              var searchStr = this.term;
              if (searchStr.indexOf (" ")!=-1)
              {
                  var re = new RegExp((replaceAll (" ", searchStr, "|")), "ig") ;
                  t = item.label.replace(re,"<span style='font-weight:bold;color:Blue;'>" + "$&" +  "</span>");
              } else {
                  var re = new RegExp(this.term, "i") ;
                  var t = item.label.replace(re,"<span style='font-weight:bold;color:Blue;'>" + "$&" +  "</span>");
              }

              return jQuery( "<li></li>" )
                  .data( "item.autocomplete", item )
                  .append( "<a>" + t + "</a>" )
                  .appendTo( ul );
          };
      }


function replaceAll(oldStrPattern, str, newStrPattern)
{
  var temp="";
  if(str!=null && oldStrPattern!=null)
   {
     var idx = str.indexOf(oldStrPattern);
     while (idx > -1)
     {
         temp += str.substr (0, idx);
         temp += newStrPattern;
         str=str.substr (idx+oldStrPattern.length,str.length);
         idx = str.indexOf(oldStrPattern);
     }
     temp += str;
   }
   return temp;
}

  jQuery(function() {

    jQuery.support.cors = true;
    monkeyPatchAutocomplete();

    jQuery( "#tags" ).autocomplete({

      source: function( request, response ) {
     
        var isDuns = false;
        var val = jQuery("#tags").val();
        var arr = new Array ();
        try {
            var i = parseInt (val);
            isDuns = !isNaN(i);
        } catch (e) {
            isDuns = false;
        }

        if (isDuns){
            val = "*"+val+"*";
        } else {
            val = "+"+val+"*";
        }

        if (AUTOCOMPLETE_VIEW==2)
        {
            var dt = {"from":0,"size":10, "query":{ "field" : { "COL2" : val } } };
            if (val.indexOf(" ")!=-1)
            {
                val = replaceAll (" ", val, "* +");
                dt = {"from":0,"size":10, "query":{ "field" : { "COL2" : val } } };
            }
        } else {
            //var dt = {"from":0,"size":10,"query":{"wildcard" : { "COL1" : val+"*" }},"sort":[{"COL1":{"order":"asc"}}]};
            var dt = {"from":0,"size":10, "query":{ "field" : { "COL1" : val } } };
            if (isDuns){
                dt = {"from":0,"size":10, "query":{ "field" : { "COL3" : val } } }
            }
            if (val.indexOf(" ")!=-1)
            {
                val = replaceAll (" ", val, "* +");
                dt = {"from":0,"size":10, "query":{ "field" : { "COL1" : val } } };
                isDuns = false;
            }
        }

        if (val != null && val.length > 2) {

            jQuery.ajax({
                type: "POST",
                url: "http://192.168.35.106:9200/index_name/type_name/_search",
                dataType: "json",
                async: false,
                data: JSON.stringify(dt),
                contentType: "application/json; charset=utf-8",
                success: function(data) {
                    //console.log(data);
                    var result=data;
                    var idx = 0;
                    var hitsJson = (data.hits.hits);                  
                    for (var key in hitsJson) {
                      if (hitsJson.hasOwnProperty(key)) {
                          if (AUTOCOMPLETE_VIEW==2) {
                            arr.push ((hitsJson[key]._source.COL2));
                          } else {
                            if (!isDuns){
                                arr.push ((hitsJson[key]._source.COL1));
                            } else {
                                arr.push ((hitsJson[key]._source.COL3));
                            }
                          }
                      }
                    }                                  
                    response(arr);

          }
                , error: function (xhr) {
                    alert ("err");
                    alert (JSON.stringify(xhr));
                }
            });
        }
    }, minLength: 3, delay: 300
    });
});

  </script>
</head>
<body>

<div class="ui-widget">
  <label for="tags">Tags: </label>
  <input id="tags" />
</div>


</body>
</html>

I used jQuery word instead of $ as in project multiple JS frameworks were used and conflicts were detected.

That’s it.
Please let me know if you need any other help or more information.

Thanks

Shailendra