Elastic Search - implement "Did you Mean" - ruby-on-rails

We are trying to use Elasticsearch in a Rails app and would like input or a code example on implementing a "did you mean" feature. Essentially, we want to give the end user the option to search for an alternate query, like in Google.

As of version 0.90.0.Beta1, ElasticSearch has a "term suggest" feature included, which is what you are looking for:
http://www.elasticsearch.org/guide/reference/api/search/term-suggest/
For example, given the query "devloping distibutd saerch engies"
it can suggest "developing distributed search engines".
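For illustration, here is a minimal sketch of such a request, assuming an index my_index and a searchable field body (both names are placeholders):
curl -XPOST 'http://127.0.0.1:9200/my_index/_suggest?pretty=1' -d '
{
  "did_you_mean" : {
    "text" : "devloping distibutd saerch engies",
    "term" : {
      "field" : "body"
    }
  }
}'
The response lists, for each input token, candidate corrections ranked by score, which you can stitch back together into the alternate query you show the user.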

Elasticsearch doesn't have it yet; it is open as an issue here and is basically waiting for the next Lucene release.
I achieved a similar "did you mean" behaviour using phonetic analyzers. That worked for my use case (location names), but it is not going to work for all use cases.
An example mapping: https://gist.github.com/1171014
So you can query using the REST API like this (misspelled london):
{
  "query": {
    "field": {
      "nameSounds": "lundon"
    }
  }
}
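The linked gist has the full mapping; roughly, the idea is the sketch below, which assumes the elasticsearch-analysis-phonetic plugin is installed (the index, analyzer, and filter names are illustrative):
curl -XPUT 'http://127.0.0.1:9200/locations' -d '
{
  "settings" : {
    "analysis" : {
      "filter" : {
        "name_metaphone" : {
          "type" : "phonetic",
          "encoder" : "metaphone",
          "replace" : false
        }
      },
      "analyzer" : {
        "sounds_like" : {
          "tokenizer" : "standard",
          "filter" : ["lowercase", "name_metaphone"]
        }
      }
    }
  },
  "mappings" : {
    "location" : {
      "properties" : {
        "name" : { "type" : "string" },
        "nameSounds" : { "type" : "string", "analyzer" : "sounds_like" }
      }
    }
  }
}'
With a mapping like this, "lundon" and "london" produce the same metaphone tokens in nameSounds, so the misspelled query above still matches.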

You can use fuzzy search:
"fuzzy" : {
  "user" : {
    "value" : "Jon",
    "boost" : 1.0,
    "fuzziness" : 2,
    "prefix_length" : 0,
    "max_expansions" : 100
  }
}
Note that fuzziness is an edit distance and only accepts 0, 1, 2, or "AUTO"; a value such as 3 would be rejected.
See the fuzzy query documentation: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html
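For context, a complete request wrapping that clause might look like the following sketch (the index my_index and the user field are assumptions carried over from the snippet above):
curl -XGET 'http://127.0.0.1:9200/my_index/_search?pretty=1' -d '
{
  "query" : {
    "fuzzy" : {
      "user" : {
        "value" : "Jon",
        "fuzziness" : 2,
        "prefix_length" : 0,
        "max_expansions" : 100
      }
    }
  }
}'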

Related

How to concatenate 2 fields into one during query time in Solr

I have a document in Solr which is already indexed and stored like this:
{
  "title": "Harry potter",
  "url": "http://harrypotter.com",
  "series": [
    "sorcer's stone",
    "Goblin of fire"
  ]
}
My requirement is, during query time when I try to retrieve the document, it should concatenate the two fields into one and give output like this:
{
  "title": "Harry potter",
  "url": "http://harrypotter.com",
  "series": [
    "sorcer's stone",
    "Goblin of fire"
  ],
  "title_url": "Harry potter,http://harrypotter.com"
}
I know how to do it during index time by using an update request processor (URP), but I'm not able to understand how to achieve this during query time. Could anyone please help me with this? Any sample code for reference would be a great help to me. Thanks for your time.
The concat function is available in Solr 7:
http://localhost:8983/solr/col/query?...&fl=title,url,concat(title,url)
If you are on an older Solr, how difficult would it be to do this on the client side?
To concat you can use concat(field1, field2).
There are many other functions for manipulating data at retrieval time; see the Solr function query documentation.
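To get exactly the title_url field from the question, you can alias the function result in fl; a sketch, reusing the col collection from the example above:
http://localhost:8983/solr/col/query?q=*:*&fl=title,url,series,title_url:concat(title,url)
Note that concat(title,url) joins the raw values directly; whether you can inject a literal separator such as a comma depends on your Solr version, so check the function query documentation.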

Conditional queries for ElasticSearch 5.x (elasticsearch-rails/elasticsearch-model)

New to Elasticsearch, and I was wondering if there is a way to construct conditional queries/filters. I am working with Rails, so I suppose it has to happen at that level, as I couldn't find anything that points to conditional queries at the ES level, and I am pretty sure it was silly just to assume!
So here is the (working) query I have:
search_definition = {
  query: {
    bool: {
      must: [
        {
          more_like_this: {
            fields: tag_types,
            docs: [
              {
                _index: self.class.index_name,
                _type: self.class.document_type,
                _id: id
              }
            ],
            min_term_freq: 1
          }
        }
      ],
      should: [
        {
          range: {
            age: {
              gte: min_age,
              lte: max_age,
              boost: 4.0
            }
          }
        }
      ],
      filter: {
        bool: {
          # a single must array; two must: keys in one hash would silently drop the first
          must: [
            {
              term: {
                active: true
              }
            },
            {
              geo_distance: {
                distance: xdistance,
                unit: "km",
                location: {
                  lat: xlat,
                  lon: xlng
                },
                boost: 5.0
              }
            }
          ]
        }
      }
    }
  },
  size: how_many
}
And it works perfectly fine. Now let's assume I'd like to apply additional filters; in this particular example I need to verify that the users on the other end are, in fact, looking for a person of the searching user's gender. This is held in two separate boolean attributes in the database (male/female). I thought it would be simple enough to prepare two similar filters; however, a few more conditional filters feed into the queries, and I would eventually end up with more than ten pre-prepared filters. There must be a more elegant way! Thank you!
Are you familiar with Elasticsearch search templates?
Using search templates you can have conditional and dynamic queries. For example, you can have a list of fields and values for a terms filter and pass them to the search template as a parameter.
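For example, here is a minimal sketch of an inline search template with a conditional clause. The gender and active fields and the search_gender parameter are illustrative, and the inline key is the ES 5.x spelling (later versions rename it to source):
GET /_search/template
{
  "inline": "{ \"query\": { \"bool\": { \"filter\": [ {{#search_gender}} { \"term\": { \"gender\": \"{{search_gender}}\" } }, {{/search_gender}} { \"term\": { \"active\": true } } ] } } }",
  "params": {
    "search_gender": "female"
  }
}
If search_gender is omitted from params, the whole {{#search_gender}}...{{/search_gender}} section, including its trailing comma, disappears from the rendered query, leaving only the active filter.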
As suggested by Mohammad, in the end I pursued a solution using ES search templates, which made my life a lot easier. The problem with JBuilder, ElasticSearch-DSL, and other solutions is that they appear not to work with the latest ES, and consequently I am not sure where I would end up should there ever be any changes to the gems or the version of ES. So cutting the middleman out and taking full control with templates, which are in fact super easy to create, made a lot of sense to me. The versions I set up with JBuilder and ES-DSL never worked correctly, as their output was random at best.
Search Templates -> More Information
JBuilder -> More Information
ElasticSearch-DSL -> More Information
There are other solutions that I haven't tried, but with search templates I didn't see any need for them.

grails gorm mongodb `like` functionality in criteria

Is like or rlike supported for searching a string in a collection's property value?
Does the collection need to define a text-type index for this to work? Unfortunately I cannot create a text index for the property: there are 100 million documents, and a text index killed the performance (MongoDB is on a single node). If this is not doable without a text index, that's fine with me; I will look for alternatives.
Given the collection below:
Message {
  'payload' : 'XML or JSON string'
  // few other properties
}
In Grails, I created a criteria to return a list of documents which contain a specific string in the payload:
Message.list {
  projections {
    like('payload' : searchString)
  }
}
I tried using rlike('payload' : ".*${searchString}.*") as well. It did not return any documents.
Note: I was able to get the document when I fired the native query in the Mongo shell.
db.Message.find({payload : { $regex : ".*My search string.*" }}).pretty()
I got it working in a roundabout way. I believe there is a much better Grails solution. The criteria approach did not work, so I used the low-level API and converted the DBObjects to domain objects.
// Regex query against the raw payload field
def query = ['payload' : [ '$regex' : /${searchString}/ ] ]
// Query the collection directly, with paging
def dbObjects = Message.collection.find(query).skip(offset).limit(defaultPageSize).toArray()
// Re-hydrate domain objects from the raw documents
dbObjects?.collect { new Message(new JsonSlurper().parseText(it.toString())) }

Elastic Search: how to see the indexed data

I had a problem with ElasticSearch and Rails, where some data was not indexed properly because of attr_protected. Where does Elastic Search store the indexed data? It would be useful to check if the actual indexed data is wrong.
Checking the mapping with Tire.index('models').mapping does not help; the field is listed.
Probably the easiest way to explore your ElasticSearch cluster is to use elasticsearch-head.
You can install it by doing:
cd elasticsearch/
./bin/plugin -install mobz/elasticsearch-head
Then (assuming ElasticSearch is already running on your local machine), open a browser window to:
http://localhost:9200/_plugin/head/
Alternatively, you can just use curl from the command line, e.g.:
Check the mapping for an index:
curl -XGET 'http://127.0.0.1:9200/my_index/_mapping?pretty=1'
Get some sample docs:
curl -XGET 'http://127.0.0.1:9200/my_index/_search?pretty=1'
See the actual terms stored in a particular field (i.e. how that field has been analyzed):
curl -XGET 'http://127.0.0.1:9200/my_index/_search?pretty=1' -d '
{
  "facets" : {
    "my_terms" : {
      "terms" : {
        "size" : 50,
        "field" : "foo"
      }
    }
  }
}'
More available here: http://www.elasticsearch.org/guide
UPDATE: Sense plugin in Marvel
By far the easiest way of writing curl-style commands for Elasticsearch is the Sense plugin in Marvel.
It comes with source highlighting, pretty indenting and autocomplete.
Note: Sense was originally a standalone chrome plugin but is now part of the Marvel project.
Absolutely the easiest way to see your indexed data is to view it in your browser. No downloads or installation needed.
I'm going to assume your elasticsearch host is http://127.0.0.1:9200.
Step 1
Navigate to http://127.0.0.1:9200/_cat/indices?v to list your indices. You'll see a table with one row per index, showing its health, status, name, document count, and size.
Step 2
Try accessing the desired index:
http://127.0.0.1:9200/products_development_20160517164519304
The output will list the index's aliases, settings, and mappings.
Notice the aliases, meaning we can just as well access the index at:
http://127.0.0.1:9200/products_development
Step 3
Navigate to http://127.0.0.1:9200/products_development/_search?pretty to see your data:
Aggregation Solution
Solving the problem by grouping the data: DrTech's answer used facets to manage this, but facets are being deprecated according to the Elasticsearch 1.0 reference:
Warning
Facets are deprecated and will be removed in a future release. You are encouraged to
migrate to aggregations instead.
Facets are replaced by aggregations, which are introduced in an accessible manner in the Elasticsearch Guide (which loads an example into Sense).
Short Solution
The solution is the same, except aggregations require aggs instead of facets, and a size of 0 sets the limit to the max integer. The example code requires the Marvel plugin.
# Basic aggregation
GET /houses/occupier/_search?search_type=count
{
  "aggs" : {
    "indexed_occupier_names" : {  <= whatever you want this to be
      "terms" : {
        "field" : "first_name",   <= name of the field you want to aggregate
        "size" : 0
      }
    }
  }
}
Full Solution
Here is the Sense code to test it out: an example of a houses index, with an occupier type and a first_name field:
DELETE /houses

# Index example docs
POST /houses/occupier/_bulk
{ "index": {}}
{ "first_name": "john" }
{ "index": {}}
{ "first_name": "john" }
{ "index": {}}
{ "first_name": "mark" }

# Basic aggregation
GET /houses/occupier/_search?search_type=count
{
  "aggs" : {
    "indexed_occupier_names" : {
      "terms" : {
        "field" : "first_name",
        "size" : 0
      }
    }
  }
}
Response
Response showing the relevant aggregation output, with two keys in the index, john and mark:
....
"aggregations": {
  "indexed_occupier_names": {
    "buckets": [
      {
        "key": "john",
        "doc_count": 2   <= 2 documents matching
      },
      {
        "key": "mark",
        "doc_count": 1   <= 1 document matching
      }
    ]
  }
}
....
A tool that helps me a lot to debug Elasticsearch is ElasticHQ. Basically, it is an HTML file with some JavaScript. No need to install it anywhere, let alone in ES itself: just download it, unzip it, and open the HTML file in a browser.
Not sure it is the best tool for ES heavy users, yet it is really practical for whoever is in a hurry to see their entries.
Kibana is also a good solution. It is a data visualization platform for Elastic. If installed, it runs by default on port 5601.
Among the many things it provides, it has "Dev Tools" where you can do your debugging.
For example, you can check your available indices using the command:
GET /_cat/indices
If you are using Google Chrome, then you can simply use the extension named Sense; it is also available as a tool if you use Marvel.
https://chrome.google.com/webstore/detail/sense-beta/lhjgkmllcaadmopgmanpapmpjgmfcfig
Following @JanKlimo's example, all you have to do on the terminal is:
To see all the indices:
$ curl -XGET 'http://127.0.0.1:9200/_cat/indices?v'
To see the content of the index products_development_20160517164519304:
$ curl -XGET 'http://127.0.0.1:9200/products_development_20160517164519304/_search?pretty=1'

Storing graph-like structure in Couch DB or do include_docs yourself

I am trying to store a network layout in CouchDB, but my solution produces a rather randomized graph.
I store nodes as documents:
{ _id, nodeName, group }
and store links in the traditional way:
{ _id, source_id, target_id, value }
Following multiple tutorials on handling joins and multiple relationships in CouchDB, I created this view:
function(doc) {
  if (doc.type == 'connection') {
    if (doc.source_id)
      emit("source", {'_id': doc.source_id});
    if (doc.target_id)
      emit("target", {'_id': doc.target_id});
  }
}
which should have emitted a sequence of source and target ids. I then pass it to a list function with include_docs=true that assumes source and target come in pairs and stitches everything back into a structure like this:
{
  "nodes": [
    {"nodeName": "Name 1", "group": "1"},
    {"nodeName": "Name 2", "group": "1"}
  ],
  "links": [
    {"source": 7, "target": 0, "value": 1},
    {"source": 7, "target": 5, "value": 1}
  ]
}
Although my list produces proper JSON, the view map returns a number of rows of source docs and then the target docs.
So far I have no idea how to make this work properly. I am happy to fetch additional values from the document _id in the list, but I haven't found any good examples.
Alternative ways of achieving the same goal are welcome. The _id values are standard CouchDB ones so far.
Update: while writing the question I came up with a different view which sorted out my immediate problem, but I would still like to see other options.
The updated map:
function(doc) {
  if (doc.type == 'connection') {
    if (doc.source_id)
      emit([doc._id, 0, "source"], {'_id': doc.source_id});
    if (doc.target_id)
      emit([doc._id, 1, "target"], {'_id': doc.target_id});
  }
}
Your updated map function makes more sense. However, you don't need the 0 and 1 in your key, since you already have "source" and "target".
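For reference, querying such a view with include_docs=true is a plain HTTP GET; a sketch, assuming a database named network and a design document named graph that holds the view as connections:
curl 'http://127.0.0.1:5984/network/_design/graph/_view/connections?include_docs=true'
Each returned row then carries the linked node document in its doc field, in the order the composite keys enforce, so the list function can consume source/target pairs directly.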
