Tire gem: How to access Elasticsearch's 'highlight' property? - ruby-on-rails

I have some Rails models that are indexed in Elasticsearch (via Tire gem). I can index new documents and query the existing index.
What I can't seem to do is get hold of the highlight attached to a record from within my Rails app. I can, however, see that the highlight is returned in the JSON when I query Elasticsearch directly via curl.
When I try to access the highlight property of my record, I get: undefined method 'highlight' for #<Report:0x007fe8afa54700>
# app/views/reports/index.html.haml
%h1 Listing reports
...
- @reports.results.each do |report|
  %tr
    %td= report.title
    %td= raw report.highlight.attachment.first.to_s
But if I use curl, I can see that the highlight is returned to Tire:
$ curl -X GET "http://localhost:9200/testapp_development_reports/report/_search?load=true&pretty=true" -d '{"query":{"query_string":{"query":"contains","default_operator":"AND"}},"highlight":{"fields":{"attachment":{}}}}'
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.111475274,
"hits" : [ {
"_index" : "testapp_development_reports",
"_type" : "report",
"_id" : "1",
"_score" : 0.111475274, "_source" : {"id":1,"title":"Sample Number One",...,"attachment":"JVBERi0xMJ1Ci... ...UlRU9GCg==\n"},
"highlight" : {
"attachment" : [ "\nThis <em>contains</em> one\n\nodd\n\n\n" ]
}
}, {
"_index" : "testapp_development_reports",
"_type" : "report",
"_id" : "2",
"_score" : 0.111475274, "_source" : {"id":2,"title":"Number two",...,"attachment":"JVBERi0xLKM3OA... ...olJVPRgo=\n"},
"highlight" : {
"attachment" : [ "\nThis <em>contains</em> two\n\neven\n\n\n" ]
}
} ]
}
}
The search method in the model:
...
def self.search(params)
  tire.search(load: true) do
    query { string params[:query], default_operator: "AND" } if params[:query].present?
    highlight :attachment
  end
end
...

The highlight method is not accessible when you use the load: true option, because the results are then instances of your model (here, Report). This should be fixed in future versions of Tire.
Edit: you can now use the each_with_hit method to access the values returned by Elasticsearch.
For example:
results = Article.search 'One', :load => true
results.each_with_hit do |result, hit|
  puts "#{result.title} (score: #{hit['_score']})"
end
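Building on each_with_hit, the highlight fragments sit in the raw hit under the "highlight" key, as the curl output in the question shows. The following is only a sketch: it assumes the hit behaves like a plain Hash shaped like that JSON, and the helper name highlight_for is hypothetical, not part of Tire.

```ruby
# Hypothetical helper (not part of Tire): return the first highlight fragment
# for a field from a raw Elasticsearch hit, or a fallback when there is none.
def highlight_for(hit, field, fallback = nil)
  fragments = (hit['highlight'] || {})[field.to_s]
  fragments && !fragments.empty? ? fragments.first : fallback
end

# A hit shaped like the curl response in the question.
hit = {
  '_id'       => '1',
  '_score'    => 0.111475274,
  'highlight' => { 'attachment' => ["\nThis <em>contains</em> one\n\nodd\n\n\n"] }
}

puts highlight_for(hit, :attachment).strip
puts highlight_for({ '_id' => '3' }, :attachment, '(no highlight)')
```

In a view, such a helper could be called inside results.each_with_hit do |report, hit| ... end, assuming the hit exposes its highlight section the same way it exposes '_score'.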

You can find my answer in this post:
Elasticsearch/Lucene highlight
That method works fine for me, and I hope it works for you as well.

Related

How to check elasticsearch tokens after running a query in Rails?

My problem is the following:
I run an Elasticsearch query in a Rails app using specific settings for my index and my search analyzer. The problem is that it doesn't return any results in the app. On the other hand, when I run the analyzer directly against my Elasticsearch Docker container, tokens are returned. If I use these tokens in my app query, I get results.
So this is my Elasticsearch request:
curl -XGET 'localhost:9200/development-stoot-services/_analyze?analyzer=search_francais' -d 'cours de guitare'
{"tokens":[{"token":"cour","start_offset":0,"end_offset":5,"type":"<ALPHANUM>","position":1},{"token":"guitar","start_offset":9,"end_offset":16,"type":"<ALPHANUM>","position":3}]}
Here is the query from my Rails app to Elasticsearch:
query = {
"query" : {
"bool" : {
"must" : [
{
"range" : {
"deadline" : {
"gte" : "2016-05-26T10:27:19+02:00"
}
}
},
{
"terms" : {
"state" : [
"open"
]
}
},
{
"query_string" : {
"query" : "cours de guitare",
"default_operator" : "AND",
"fields" : [
"title",
"description",
"brand",
"category_name"
]
}
}
]
}
},
"filter" : {
"and" : [
{
"geo_distance" : {
"distance" : "40km",
"location" : {
"lat" : 48.855736,
"lon" : 2.32927300000006
}
}
}
]
},
"sort" : [
{
"created_at" : "desc"
}
]
}
The last query does not return any results, but if I run a query with the tokens returned by Elasticsearch ('cour', 'guitar'), I get the expected results. So I guess there is a problem between Rails and Elasticsearch that I can't find...
Can anyone help with that?
Try modifying your query like this: you need to specify the search_francais analyzer in your query_string so that cours de guitare is analyzed the same way as it was with the _analyze endpoint:
...
{
"query_string" : {
"query" : "cours de guitare",
"default_operator" : "AND",
"analyzer": "search_francais", <--- add this line
"fields" : [
"title",
"description",
"brand",
"category_name"
]
}
},
...
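Since the query is built in a Rails app, the same clause can be sketched as a Ruby hash. This is only an illustration: the helper name query_string_clause is hypothetical, and the field list and analyzer name are taken from the question.

```ruby
require 'json'

# Hypothetical helper: build a query_string clause that names its analyzer
# explicitly, so the text is analyzed the same way as with _analyze.
def query_string_clause(text, fields, analyzer)
  {
    'query_string' => {
      'query'            => text,
      'default_operator' => 'AND',
      'analyzer'         => analyzer,
      'fields'           => fields
    }
  }
end

clause = query_string_clause(
  'cours de guitare',
  %w[title description brand category_name],
  'search_francais'
)

# Print the JSON that would go into the "must" array of the bool query.
puts JSON.pretty_generate(clause)
```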

Escaping the @ (at) symbol in the Ruby Elasticsearch gem?

I have the following code in a custom ES 'where' wrapper method:
filter: { term: params }
Then we have a sample ES document that contains:
"emails" => { "email" => "johndoe@email.com" }
It is returned when my search is:
query.where("emails.email" => "johndoe")
but I get no results when:
query.where("emails.email" => "johndoe@email.com")
It seems like I have to escape the @ symbol somehow when using the ES gem?
It's probably because your field is analyzed using the default standard analyzer and is thus tokenized at the @ sign.
You can see what ES has indexed by running the command below:
curl -XGET 'localhost:9200/_analyze?analyzer=standard&pretty' -d 'johndoe@email.com'
And the result is
{
"tokens" : [ {
"token" : "johndoe",
"start_offset" : 0,
"end_offset" : 7,
"type" : "<ALPHANUM>",
"position" : 1
}, {
"token" : "email.com",
"start_offset" : 8,
"end_offset" : 17,
"type" : "<ALPHANUM>",
"position" : 2
} ]
}
As you can see, your email field has been tokenized as two different tokens and that's probably why searching for johndoe works, while searching for the full email address doesn't.
There are a few ways out from here, but one way that would work is to create your own analyzer based on a pattern_capture token filter and use it as index_analyzer for your emails.email field.
{
"settings" : {
"analysis" : {
"filter" : {
"email" : {
"type" : "pattern_capture",
"preserve_original" : 1,
"patterns" : [ "([^@]+)", "(\\p{L}+)", "(\\d+)", "@(.+)" ]
}
},
"analyzer" : {
"email" : {
"tokenizer" : "uax_url_email",
"filter" : [ "email", "lowercase", "unique" ]
}
}
}
},
"mappings": {
"emails": {
"properties": {
"email": {
"type": "string",
"analyzer": "email" <-- use the analyzer here
}
}
}
}
}
At indexing time, that analyzer will produce all of the following tokens, which will allow you to search for any parts of your email address:
johndoe@email.com
johndoe
email.com
email
com
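To see where those tokens come from, the capture patterns can be tried out in plain Ruby. This is a rough simulation only, not Lucene's implementation: it assumes the whole address arrives as a single token (as it does with the uax_url_email tokenizer), and the helper name email_tokens is hypothetical.

```ruby
# Rough simulation of a pattern_capture filter with preserve_original,
# followed by lowercase and unique. This is NOT Lucene's implementation;
# it only illustrates which substrings the patterns capture.
PATTERNS = [/([^@]+)/, /(\p{L}+)/, /(\d+)/, /@(.+)/].freeze

def email_tokens(token)
  # Collect every capture of every pattern, in pattern order.
  captured = PATTERNS.flat_map { |pattern| token.scan(pattern).flatten }
  # Keep the original token first, then lowercase and de-duplicate.
  ([token] + captured).map(&:downcase).uniq
end

puts email_tokens('johndoe@email.com')
```

The third pattern captures nothing here because the address contains no digits.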

Bulk Data Delete in elasticsearch

This is my code:
HTTParty.delete("http://#{SERVER_DOMAIN}:9200/monitoring/mention_reports/_query?q=id:11321779,11321779", {
})
I want to delete data in bulk by id, but this query does not delete anything from Elasticsearch.
Can anyone help me figure out how to delete data in bulk?
Replace index_name with the name of the index used in your code, and provide the ids to be deleted in the array ([1,2,3]).
CGI::escape URL-encodes the query.
HTTParty.delete "http://#{SERVER_DOMAIN}:9200/index_name/_query?source=#{CGI::escape("{\"terms\":{\"_id\":[1,2,3]}}")}"
This uses the delete-by-query API of Elasticsearch.
In case you are using the Tire Ruby client to connect to Elasticsearch:
id_array = [1,2,3]
query = Tire.search do |search|
  search.query { |q| q.terms :_id, id_array }
end
index = Tire.index('<index_name>') # provide the index name as you have in your code
Tire::Configuration.client.delete "#{index.url}/_query?source=#{Tire::Utils.escape(query.to_hash[:query].to_json)}"
Reference: https://github.com/karmi/tire/issues/309
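The delete-by-query URL from the HTTParty example can also be assembled step by step, which makes the escaping easier to check. This is only a sketch: the helper name delete_by_ids_url is hypothetical, and the host and index name are placeholders.

```ruby
require 'cgi'
require 'json'

# Hypothetical helper: build the delete-by-query URL for a list of ids.
# host and index_name are placeholders for your own values.
def delete_by_ids_url(host, index_name, ids)
  query = { 'terms' => { '_id' => ids } }
  "http://#{host}:9200/#{index_name}/_query?source=#{CGI.escape(query.to_json)}"
end

url = delete_by_ids_url('localhost', 'index_name', [1, 2, 3])
puts url

# Decoding the source parameter again shows the query that will be sent.
puts CGI.unescape(url.split('source=').last)
```

The resulting string is what would be passed to HTTParty.delete.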
Elasticsearch also provides the bulk API for this: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/docs-bulk.html
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }
{ "doc" : {"field2" : "value2"} }
Or, with curl:
curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}
'
See also: How to handle multiple updates / deletes with Elasticsearch?
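For deleting by id, the bulk body only needs one "delete" action line per document. A minimal sketch in Ruby, assuming the index and type names from the question; the helper name bulk_delete_body is hypothetical, and sending the body to the _bulk endpoint is left to whichever HTTP client you use.

```ruby
require 'json'

# Hypothetical helper: build a _bulk request body that deletes the given ids.
# Each action is one JSON object per line, and the body ends with a newline.
def bulk_delete_body(index, type, ids)
  lines = ids.map do |id|
    JSON.generate('delete' => { '_index' => index, '_type' => type, '_id' => id.to_s })
  end
  lines.join("\n") + "\n"
end

body = bulk_delete_body('monitoring', 'mention_reports', [1, 2, 3])
puts body
```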

mongodb/rails "exception: can't find special index: 2d for:"

I have a Rails app where I have some problems with indexes. I search locations by name.
At first I thought it was a problem with addresses.coords, but I am not sure about it.
The relevant parts of the search controller:
@practices = Practice.published
@practices = @practices.where(:"addresses.country" => params[:country].upcase) if params[:country].present?
if params[:location].present? && latlng = get_coordinates
  @practices = @practices.near_sphere(:"addresses.coords" => latlng).max_distance(:"addresses.coords" => get_distance )
end
# now find doctors based on resulting practices
@doctors = Doctor.published.in("organization_relations.practice_id" => @practices.distinct(:_id))
The complete crash log:
Moped::Errors::OperationFailure (The operation: #<Moped::Protocol::Command
@length=255
@request_id=646
@response_to=0
@op_code=2004
@flags=[]
@full_collection_name="um-contacts.$cmd"
@skip=0
@limit=-1
@selector={:distinct=>"practices", :key=>"_id", :query=>{"deleted_at"=>nil, "published_at"=>{"$lte"=>2012-11-05 15:17:14 UTC}, "addresses.country"=>"DE", "addresses.coords"=>{"$nearSphere"=>[13.4060912, 52.519171], "$maxDistance"=>0.01569612305760477}}}
@fields=nil>
failed with error 13038: "exception: can't find special index: 2d for: { deleted_at: null, published_at: { $lte: new Date(1352128634313) }, addresses.country: \"DE\", addresses.coords: { $nearSphere: [ 13.4060912, 52.519171 ], $maxDistance: 0.01569612305760477 } }"
See https://github.com/mongodb/mongo/blob/master/docs/errors.md
for details about this error.):
app/controllers/search_controller.rb:16:in `index'
That's the result of listing the indexes; I'm not sure how to query them for the addresses, which are embedded via embeds_many.
> db.practices.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "um-contacts.practices",
"name" : "_id_"
}
]
Help would be really appreciated!
Edit: It looks like the index for addresses.coords isn't created:
db.system.indexes.find()
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "um-contacts.users", "name" : "_id_" }
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "um-contacts.doctors", "name" : "_id_" }
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "um-contacts.collaborations", "name" : "_i
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "um-contacts.practices", "name" : "_id_" }
but it should be created by the Practice class:
class Practice
...
embeds_many :addresses, cascade_callbacks: true, as: :addressable
...
field :name, type: String
field :kind, type: String
field :slug, type: String
index({"addresses.coords" => '2d'}, { min: -180, max: 180, background: true })
index({name: 1})
index({slug: 1}, { unique: true })
...
Does anyone have an idea why it's failing?
Try re-creating your indexes. For Mongoid:
rake db:mongoid:create_indexes

ElasticSearch geo_bounding_box coordinate format

I was following the Elasticsearch guide online, which represents coordinates as "lat, lng", but it doesn't seem to work until I flip everything around to "lng, lat". I even have to swap top_left and bottom_right for the query to work.
Is anyone else experiencing the same problem? Clearly this is not how the documentation says to use it, but it only works when I format it this way.
Rails format
def self.search(params)
  tire.search( page: params[:page], per_page: 2 ) do
    query { all }
    filter :geo_bounding_box, location: { top_left: " -121.88596979687497, 37.33588487375733", bottom_right: " -122.43528620312497, 37.553946238118264" }
  end
end
CURL format
curl -X GET "http://localhost:9200/articles/article/_search?page=&per_page=2&size=2&pretty=true" -d '{"query":{"match_all":{}},"facets":{"condition":{"terms":{"field":"condition","size":10,"all_terms":false}}},"filter":{"geo_bounding_box":{"location":{"top_left":" -121.88596979687497, 37.33588487375733","bottom_right":" -122.43528620312497, 37.553946238118264"}}},"size":2}'
Console response
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 169,
"max_score" : 1.0,
"hits" : [ {
"_index" : "articles",
"_type" : "article",
"_id" : "4f72bc7d0bdb820f02000002",
"_score" : 1.0, "_source" : {"content":"words here!","location":[37.444995,-122.160628],"name":"harro"}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "4fdf0cf20bdb82336c000002",
"_score" : 1.0, "_source" : {"content":"Run of the mill","location":[37.33588487375733,-121.88596979687497],"name":"Billy Bob"}
} ]
},
"facets" : {
"condition" : {
"_type" : "terms",
"missing" : 5597,
"total" : 0,
"other" : 0,
"terms" : [ ]
}
}
}
When a geo point is specified as a string, it should be in "lat,lon" format. When it is specified as an array, it should be in [lon, lat] format.
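The two conventions are easy to mix up, so it can help to route every coordinate through one pair of helpers. A small sketch; the helper names geo_string and geo_array are hypothetical, and the sample point is taken from the first hit in the response above, read as latitude 37.444995 and longitude -122.160628.

```ruby
# Hypothetical helpers: the same point in the two formats described above.
def geo_string(lat, lon)
  "#{lat},#{lon}" # string form: "lat,lon"
end

def geo_array(lat, lon)
  [lon, lat] # array form: [lon, lat]
end

lat = 37.444995
lon = -122.160628

puts geo_string(lat, lon)
p geo_array(lat, lon)
```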
