I am still doing something wrong. Could somebody please help me?
I want to create a custom analyzer with an asciifolding filter in Rails + Mongoid. I have a simple Product model with a name field.
class Product
  include Mongoid::Document

  field :name

  settings analysis: {
    analyser: {
      ascii: {
        type: 'custom',
        tokenizer: 'whitespace',
        filter: ['lowercase', 'asciifolding']
      }
    }
  }

  mapping do
    indexes :name, analyzer: 'ascii'
  end
end
Product.create(name: "svíčka")
Product.search(q: "svíčka").count # 1
Product.search(q: "svicka").count # 0, can't find (expected 1)
Product.create(name: "svicka")
Product.search(q: "svíčka").count # 0, can't find (expected 1)
Product.search(q: "svicka").count # 1
And when I check the indexes with elasticsearch-head, I expected the term to be stored without accents, as "svicka", but it is stored as "Svíčka".
What am I doing wrong?
When I check it with API it looks OK:
curl -XGET 'localhost:9200/_analyze?tokenizer=whitespace&filters=asciifolding' -d 'svíčka'
{"tokens":[{"token":"svicka","start_offset":0,"end_offset":6,"type":"word","position":1}]}
http://localhost:9200/development_imango_products/_mapping
{"development_imango_products":{"product":{"properties":{"name":{"type":"string","analyzer":"ascii"}}}}}
curl -XGET 'localhost:9200/development_imango_products/_analyze?field=name' -d 'svíčka'
{"tokens":[{"token":"svicka","start_offset":0,"end_offset":6,"type":"word","position":1}]}
You can check how you are actually indexing your document using the Analyze API.
You also need to take into account that there's a difference between what you index and what you store. What you store is returned when you query, and it is exactly what you sent to elasticsearch; what you index determines which documents you get back when querying.
Using asciifolding is a good choice for your use case: it should return results when querying for either svíčka or svicka. I guess there's just a typo in your settings: analyser should be analyzer. That analyzer is probably not being applied as you'd expect.
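For intuition, the effect of the whitespace tokenizer plus lowercase and asciifolding filters can be mimicked in plain Ruby (a sketch only; the analyze helper below is hypothetical and is not how Elasticsearch implements it, but it produces the same tokens for simple input):

```ruby
# Plain-Ruby sketch of a whitespace tokenizer followed by
# lowercase and asciifolding-like filters. Illustration only:
# the real analysis happens inside Elasticsearch.
def analyze(text)
  text.split(/\s+/).map do |token|
    token.downcase
         .unicode_normalize(:nfd)  # decompose accented chars (í -> i + combining mark)
         .gsub(/\p{Mn}/, "")       # strip the combining marks
  end
end

analyze("Svíčka")  # => ["svicka"]
```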
UPDATE
Given your comment, you haven't solved the problem yet. Can you check what your mapping looks like (localhost:9200/index_name/_mapping)? The way you're using the Analyze API is not that useful, since you're manually providing the text analysis chain; that doesn't prove the same chain is applied to your field. It's better to provide the name of the field, like this:
curl -XGET 'localhost:9200/index_name/_analyze?field=field_name' -d 'svíčka'
That way the analyze api will rely on the actual mapping for that field.
UPDATE 2
After you made sure that the mapping is correctly submitted and everything looks fine, I noticed you're not specifying the field that you want to query. If you don't specify it, you're querying the _all special field, which by default contains all the fields you're indexing and is analyzed with the StandardAnalyzer. You should use the following query: name:svíčka.
elasticsearch needs settings and mapping in a single API call. I am not sure if it's mentioned in the tire docs, but I faced a similar problem using both settings and mapping when setting up tire. The following should work:
class Product
  include Mongoid::Document
  # include tire stuff

  field :name

  # this method is just created for readability;
  # the settings hash can also be passed directly.
  # note it must be defined before it is used below
  def self.tire_settings
    {
      analysis: {
        analyzer: {
          ascii: {
            type: 'custom',
            tokenizer: 'whitespace',
            filter: ['lowercase', 'asciifolding']
          }
        }
      }
    }
  end

  settings(self.tire_settings) do
    mapping do
      indexes :name, analyzer: 'ascii'
    end
  end
end
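For reference, combining settings and mapping this way makes Tire create the index with a single request; the body it sends would look roughly like this (a sketch based on the analyzer and field defined above, not Tire's exact output):

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ascii": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "name": { "type": "string", "analyzer": "ascii" }
      }
    }
  }
}
```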
Your notation for settings/mappings is incorrect, as @rubish suggests; check the documentation in https://github.com/karmi/tire/blob/master/lib/tire/model/indexing.rb (no question the docs should be better)
Always, always, always check the mapping of the index to see if your desired mapping has been applied.
Use the Analyze API, as @javanna suggests, to quickly check how your analysis chain works, without having to store documents, check results, etc.
Please note that it is very important to include two modules in a model to make it searchable through Tire. Your model should look like:
class Model
  include Mongoid::Document
  include Mongoid::Timestamps
  include Tire::Model::Search
  include Tire::Model::Callbacks

  field :field_name, type: String

  index_name "#{Tire::Model::Search.index_prefix}model_name"

  settings :analysis => {
    :analyzer => {
      "project_lowercase_analyzer" => {
        "tokenizer" => "keyword",
        "filter"    => ["lowercase"],
        "type"      => "custom"
      }
    }
  } do
    mapping do
      indexes :field_name, :boost => 10, :type => 'string', :analyzer => 'standard'
    end
  end

  def self.search(params = {})
    query = params[:text_field_name_from_form] # name of your search form's text field
    Model.tire.search(load: true, page: params[:page], per_page: 5) do
      query { string query, default_operator: "AND" } if query.present?
    end
  end
end
You can change the index_name (it should be unique) and the analyzer.
And your controller would look like:
def method_name
  @results = Model.search(params).results
end
You can use @results in your view. Hope this may help you.
Related
I'm a Rails developer. I've been tasked with upgrading Elasticsearch from 0.9 in a large and very old Rails application.
This application, in its current state, runs on Elasticsearch 0.90.7 and uses the retired gem Tire. I'll be switching to the officially supported elasticsearch-ruby (and elasticsearch-model) gems, but in the meantime I need to address breaking changes in the index definitions, which are defined in the Rails models.
My problem is that I don't understand how Tire (or Elasticsearch - not sure what's doing what here) is defining the mappings of classes that inherit from a class with an explicitly defined mapping block.
So, the code:
There's one model that has a ton of children. 'Tag' can be of many types. So the mapping looks like this:
class Tag < ActiveRecord::Base
  mapping do
    # some mappings that aren't causing me any problems
    # and then there's these guys
    indexes :name, analyzer: 'snowball', boost: 100
    indexes :type
    # more mappings that are happily being good, understandable mappings
  end
end
And we have some different types of tags:
class ATagType < Tag
  # absolutely no code regarding mappings, index definition, or the like
end

class BTagType < Tag
  # still no code regarding mappings, index definitions, or the like
end

class CTagType < Tag
  # ... just, yeah. nothing helpful here either.
end
So when I run Tag.create_elasticsearch_index and then look at the new index that's been created (just adding in the relevant properties for now):
$ $elasticsearch.indices.get_mapping["application_development_tags"]
=> { "tag" => {
"properties" => {
"name" => { "type" => "string", "boost" => 100.0, "analyzer" => "snowball" },
"type" => { "type" => "string" }
}
}
}
And then I run Tag.import in order to index all of the tags in the database.
This changes the "application_development_tags" index mapping:
$ $elasticsearch.indices.get_mapping["application_development_tags"]
=> { "tag" => {
"properties" => {
"name" => { "type" => "string", "boost" => 100.0, "analyzer" => "snowball" },
"type" => { "type" => "string" }
}
},
"atagtype" => {
"properties" => {
"name" => { "type" => "string" },
"type" => { "type" => "string" }
}
},
"btagtype" => { etc. }
}
And this is the way the index currently exists on production.
Then in upgrading, I've run into an issue regarding the fact that different types of an index have the same property ("name") with different "boost" and "analyzer" settings.
Thing is, I cannot for the life of me figure out what code is re-shaping the index after importing all of the tags. I think indexes :type is probably the culprit, and I read the conversion of the class type into an Elasticsearch document type as a feature. But then, why isn't the name for every type of tag that inherits from Tag mapped in the same way as "tag"? The ideal situation would give all tags and all of their children the same analyzer and boost, since that would maintain current behavior (or as close to current behavior as we're going to get) and let me recreate the indices in the newer version of Elasticsearch.
I've tried explicitly adding the mapping do block to each of the Tag model's children, but that doesn't change the behavior at all (I'm not calling e.g. ATagType.create_elasticsearch_index, so the mapping definition in the child class is ignored). If I remove the analyzer and boost definitions from the Tag mapping, everything is good, but... I'm pretty sure we still want the analyzer and boost, and it'd be better to add them to all tag types than to get rid of them entirely. (Related: this - but the provided 'answer' really doesn't actually address the problem).
To make this more complicated, I've listed all of the properties in all of the TagTypes, and each type has different properties from the originally defined "tag" type.
So is there anybody who might be able to help me figure out what's going on, and successfully recreate my indices in the newer versions of Elasticsearch?
Edit
Something else I just learned: Tag.import changes the mapping of application_development_tags, but $elasticsearch.search index: 'application_development_tags', type: 'tag' returns no results. Only searching by the other types actually returns results.
I'm trying to figure out the best way to do a multi-table search with elastic.co.
In particular, I was wondering if I could add more indexes to this search method.
Chapter.rb
def self.search(params)
  fields = [:title, :description, :content]
  tables = [Chapter.index_name, Book.index_name]

  tire.search(tables, load: true, page: params[:page], per_page: 5) do
    query do
      boolean do
        must { string params[:q], default_operator: "AND" } if params[:q].present?
      end
    end
    highlight *fields, :options => { :tag => '<strong>' }
  end
end
The above example works without the tables argument. How do I make it work with the tables?
If you're adding more indexes then you are moving away from it being a model-centric search. That's probably fine as I guess you'll be handling the search results differently on account of them being from different indexes.
In which case I think you can do:
Tire.search([Chapter.index_name, Book.index_name],
page: params[:page],
... etc ...
) do
query do
... etc ...
end
end
It does mean that you won't be able to do stuff like load: true because you've moved outside of knowing what model to load the results for.
From digging around in the code (here) it looks like you might be able to specify multiple indexes even for a model-centric search. Something like:
tire.search({
index: [Chapter.index_name, Book.index_name],
load: true,
... etc ...
I haven't tried it though and I'm doubtful as to whether it will work - again because of not being able to load the results into a specific model once multiple indexes are involved.
I have a model Event that is connected to MongoDB using Mongoid:
class Event
include Mongoid::Document
include Mongoid::Timestamps
field :user_name, type: String
field :action, type: String
field :ip_address, type: String
scope :recent, -> { where(:created_at.gte => 1.month.ago) }
end
Usually when I use ActiveRecord, I can do something like this to group results:
@action_counts = Event.group('action').where(:user_name => "my_name").recent.count
And I get results with the following format:
{"action_1"=>46, "action_2"=>36, "action_3"=>41, "action_4"=>40, "action_5"=>37}
What is the best way to do the same thing with Mongoid?
Thanks in advance
I think you'll have to use map/reduce to do that. Look at this SO question for more details:
Mongoid Group By or MongoDb group by in rails
Otherwise, you can simply use the group_by method from Enumerable. Less efficient, but it should do the trick unless you have hundreds of thousands of documents.
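As a sketch of the Enumerable fallback (plain Ruby, using in-memory hashes as stand-ins for the documents a real Event.where(...).recent query would return):

```ruby
# In-memory stand-ins for Event documents; in the real app these
# would come from Event.where(user_name: "my_name").recent
events = [
  { action: "action_1" },
  { action: "action_1" },
  { action: "action_2" }
]

# Group by action name, then count the members of each group
counts = events.group_by { |e| e[:action] }
               .transform_values(&:size)
# counts => {"action_1"=>2, "action_2"=>1}
```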
EDIT: Example of using map/reduce in this case
I'm not really familiar with it but by reading the docs and playing around I couldn't reproduce the exact same hash you want but try this:
def self.count_and_group_by_action
  # emit a new document {"_id" => "action", "value" => {count: 1}}
  # for each input document our scope is applied to
  map = %Q{
    function() {
      var key = this.action;
      var value = {count: 1};
      emit(key, value);
    }
  }

  # the idea now is to "flatten" the emitted documents that
  # have the same key, and do something with their values
  reduce = %Q{
    function(key, values) {
      // we prepare a reducedValue
      var reducedValue = {count: 0};
      // we then loop through the values associated with the same key,
      // in this case, the 'action' name
      values.forEach(function(value) {
        reducedValue.count += value.count; // increment the reducedValue
      });
      // and return the 'reduced' value for that key,
      // an 'aggregate' of all the values associated with the same key
      return reducedValue;
    }
  }

  # apply the map/reduce functions;
  # inline: true because we don't need to store the results in a collection,
  # we just need a hash
  self.map_reduce(map, reduce).out(inline: true)
end
So when you call:
Event.where(:user_name =>"my_name").recent.count_and_group_by_action
It should return something like:
[{ "_id" => "action1", "value" => { "count" => 20 }}, { "_id" => "action2" , "value" => { "count" => 10 }}]
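If you want the flat ActiveRecord-style hash shown earlier, you can post-process the inline map/reduce output (a sketch assuming the result shape above):

```ruby
# Shape returned by the inline map/reduce above
raw = [
  { "_id" => "action1", "value" => { "count" => 20 } },
  { "_id" => "action2", "value" => { "count" => 10 } }
]

# Flatten each {_id, value: {count}} document into key => count
counts = raw.each_with_object({}) do |doc, acc|
  acc[doc["_id"]] = doc["value"]["count"].to_i
end
# counts => {"action1"=>20, "action2"=>10}
```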
Disclaimer: I'm no mongodb nor mongoid specialist, I've based my example on what I could find in the referenced SO question and Mongodb/Mongoid documentation online, any suggestion to make this better would be appreciated.
Resources:
http://docs.mongodb.org/manual/core/map-reduce/
http://mongoid.org/en/mongoid/docs/querying.html#map_reduce
Mongoid Group By or MongoDb group by in rails
I'm trying to index and search by email using Tire and elasticsearch.
The problem is that if I search for "something@example.com", I get strange results because of the @ and . symbols. I "solved" it by hacking the query string and adding "email:" before any string I suspect is an email. If I don't do that, when searching for "something@example.com" I get results like "something@gmail.com" or "asd@example.com".
include Tire::Model::Search
include Tire::Model::Callbacks

settings :analysis => {
  :analyzer => {
    :whole_email => {
      'tokenizer' => 'uax_url_email'
    }
  }
} do
  mapping do
    indexes :id
    indexes :email, :analyzer => 'whole_email', :boost => 10
  end
end
def self.search(params)
  params[:query] = params[:query].split(" ").map { |x| x =~ EMAIL_REGEXP ? "email:#{x}" : x }.join(" ")

  tire.search(load: { :include => { 'event' => 'organizer' } }, page: params[:page], per_page: params[:per_page] || 10) do
    query do
      boolean do
        must { string params[:query] } if params[:query].present?
        must { term :event_id, params[:event_id] } if params[:event_id].present?
      end
    end
    sort do
      by :id, 'desc'
    end
  end
end

def to_indexed_json
  self.to_json
end
When searching with "email:", the analyzer works perfectly; without it, the string is searched in the email field without the specified analyzer, returning lots of undesired results.
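The query-rewriting hack can be exercised on its own (a sketch; the EMAIL_REGEXP below is a simplified stand-in for whatever constant the app actually defines):

```ruby
# Simplified stand-in for the app's EMAIL_REGEXP constant.
EMAIL_REGEXP = /\A[^@\s]+@[^@\s]+\.[^@\s]+\z/

# Prefix every email-looking term with "email:" so the query
# parser routes it to the email field.
def prefix_email_terms(query)
  query.split(" ").map { |x| x =~ EMAIL_REGEXP ? "email:#{x}" : x }.join(" ")
end

prefix_email_terms("john something@example.com")
# => "john email:something@example.com"
```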
I think your issue has to do with the _all field. By default, every field gets indexed twice: once under its own field name, and again, with a different analyzer, in the _all field.
If you send a query without specifying which field to search in, it is executed against the _all field. When you index your doc, the email field's content is indexed again under the _all field (to stop this, set include_in_all: false in your mapping), where it is tokenized the standard way (split on @ and .). This means that unguided queries will give strange results.
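To keep the raw email out of _all, the field mapping would look something like this (a sketch; the document type name is a placeholder, as the question doesn't show it):

```json
{
  "mappings": {
    "attendee": {
      "properties": {
        "email": {
          "type": "string",
          "index": "not_analyzed",
          "include_in_all": false,
          "boost": 10
        }
      }
    }
  }
}
```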
The way I would fix this is to use a term query for the emails and make sure to specify the field to search on. A term query is faster because it doesn't have the query-parsing step that a query_string query has (that parsing step is why prefixing the string with "email:" sends it to the right field). Also, you don't need a custom analyzer unless you are indexing a field that contains both free text and urls/emails. If the field only contains emails, just set index: not_analyzed and it will remain a single token. (You might still want a custom analyzer that lowercases the email, though.)
Make your search query like this:
{
  "term": {
    "email": "example@domain.com"
  }
}
Good luck!
Add the field to _all and try searching after adding an escape character (\) before the special characters of the email id.
Example: something\@example\.com
I'm trying to use Tire to perform a nested query on a persisted model. The model (Thing) has Tags, and I'm looking to find all Things tagged with a certain Tag.
class Thing
  include Tire::Model::Callbacks
  include Tire::Model::Persistence

  index_name { "#{Rails.env}-thing" }

  property :title, :type => :string
  property :tags,  :default => [], :analyzer => 'keyword', :class => [Tag], :type => :nested
end
The nested query looks like
class Thing
  def self.find_all_by_tag(tag_name, args)
    self.search(args) do
      query do
        nested path: 'tags' do
          query do
            boolean do
              must { match 'tags.name', tag_name }
            end
          end
        end
      end
    end
  end
end
When I execute the query I get a "not of nested type" error
Parse Failure [Failed to parse source [{\"query\":{\"nested\":{\"query\":{\"bool\":{\"must\":[{\"match\":{\"tags.name\":{\"query\":\"TestTag\"}}}]}},\"path\":\"tags\"}},\"size\":10,\"from\":0,\"version\":true}]]]; nested: QueryParsingException[[test-thing] [nested] nested object under path [tags] is not of nested type]; }]","status":500}
Looking at the source for Tire it seems that mappings are created from the options passed to the "property" method, so I don't think I need a separate "mapping" block in the class. Can anyone see what I am doing wrong?
UPDATE
Following Karmi's answer below, I recreated the index and verified that the mapping is correct:
thing: {
  properties: {
    tags: {
      properties: {
        name: { type: string }
      },
      type: nested
    },
    title: { type: string }
  }
}
However, when I add new Tags to Thing
thing = Thing.new
thing.title = "Title"
thing.tags << { :name => 'Tag' }
thing.save
The mapping reverts to "dynamic" type and "nested" is lost.
thing: {
  properties: {
    tags: {
      properties: {
        name: { type: string }
      },
      type: "dynamic"
    },
    title: { type: string }
  }
}
The query fails with the same error as before. How do I preserve the nested type when adding new Tags?
Yes, indeed, the mapping configuration in property declarations is passed on in the Persistence integration.
In a situation like this, there's always one first and only question: what does the mapping actually look like?
So use e.g. the Thing.index.mapping method or Elasticsearch's REST API, curl localhost:9200/things/_mapping, to have a look.
Chances are, that your index was created with the dynamic mapping, based on the JSON you have used, and you have changed the mapping later. In this case, the index creation logic is skipped, and the mapping is not what you expect.
There's a Tire issue opened about displaying warning when the index mapping is different from the mapping defined in the model.