Multiple model single index approach - elasticsearch via tire - ruby-on-rails

In my multi-tenant app (account based, with a number of users per account), how would I update the index for a particular account when a user document is changed?
Using Elasticsearch via the Tire gem.
Rails 2.3 app - I applied changes to enable Rails 2.3 support as per loe/tire's commit.
Account Model:
include Tire::Model::Search

Tire.index('account_1') do
  create(
    :mappings => {
      :user => {
        :properties => {
          :name         => { :type => :string, :boost => 10 },
          :company_name => { :type => :string, :boost => 5 }
        }
      },
      :comments => {
        :properties => {
          :description => { :type => :string, :boost => 5 }
        }
      }
    }
  )
end
As you can see above, there are two models here: user and comments. Is this the correct way to address a single index with multiple models?
In that case, how do I update the index when only a user document or a comment document is changed?

Usually when you index a model, it is good to index its own attributes along with its associations. So in this case, if you want to index users and their comments, you should define the index on the user model and index the comments through the association, so that Tire's callbacks on the user model reindex the user object whenever any of its attributes change. The callbacks only apply to the model the index is defined on.
If you want changes to the associations themselves to be indexed, you need hooks that reindex the parent object after save / after destroy of the user/comments models, or you could use the :touch => true option to touch the parent model on changes to user/comments (see the sketch after the example below).
Example: if you want to index a user and their comments:
include Tire::Model::Search
include Tire::Model::Callbacks

mapping do
  indexes :id,       :type => 'integer', :index => :not_analyzed
  indexes :about_me, :type => 'string',  :index => :snowball
  indexes :name,     :type => 'string',  :index => :whitespace
  indexes :comments do
    indexes :content, :type => 'string', :analyzer => 'snowball'
  end
end
So here the index is on the user model, and user.comments is an association. Hope this example explains it.
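For the hooks mentioned above, one way to wire them up is sketched here. This is a sketch, not from the answer itself, and it assumes ActiveRecord models named User and Comment associated as in the mapping:

class Comment < ActiveRecord::Base
  belongs_to :user

  # Reindex the parent user whenever a comment is created, updated,
  # or destroyed, so the nested comments in the user's index document
  # stay in sync. `tire.update_index` is the same method Tire's own
  # Tire::Model::Callbacks module invokes after save/destroy.
  after_save    { user.tire.update_index if user }
  after_destroy { user.tire.update_index if user }
end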

The answer to the question, as posted by Tire author Karmi, is as follows:
Let's say we have an Account class and we deal in article entities.
In that case, our Account class would contain the following:
class Account
  #...

  # Set index name based on account ID
  #
  def articles
    Article.index_name "articles-#{self.id}"
    Article
  end
end
So, whenever we need to access articles for a particular account, either for searching or for indexing, we can simply do:
@account = Account.find( remember_token_or_something_like_that )

# Instead of `Article.search(...)`:
@account.articles.search { query { string 'something interesting' } }

# Instead of `Article.create(...)`:
@account.articles.create id: 'abc123', title: 'Another interesting article!', ...
Having a separate index per user/account works perfectly in certain cases -- but definitely not in cases where you'd have tens or hundreds of thousands of indices (or more). Having index aliases, with properly set up filters and routing, would perform much better in such a case (see the sketch below). We would slice the data not based on the tenant identity, but based on time.
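For illustration, a filtered, routed alias per tenant on top of a single shared index could be set up through the standard Elasticsearch _aliases API. This is a sketch, not part of the original answer, and the index, alias, and field names are hypothetical:

require 'net/http'
require 'json'

# Create an alias "articles_account_42" that transparently filters the
# shared "articles" index down to one tenant, and routes that tenant's
# documents to a single shard.
body = {
  :actions => [
    { :add => {
        :index   => 'articles',
        :alias   => 'articles_account_42',
        :filter  => { :term => { :account_id => 42 } },
        :routing => '42'
      }
    }
  ]
}.to_json

uri = URI.parse('http://localhost:9200/_aliases')
Net::HTTP.start(uri.host, uri.port) do |http|
  http.post(uri.path, body, 'Content-Type' => 'application/json')
end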
Let's have a look at a second scenario, starting with a heavily simplified curl http://localhost:9200/_aliases?pretty output:
{
  "articles_2012-07-02" : {
    "aliases" : {
      "articles_plan_pro" : {
      }
    }
  },
  "articles_2012-07-09" : {
    "aliases" : {
      "articles_current" : {
      },
      "articles_shared" : {
      },
      "articles_plan_basic" : {
      },
      "articles_plan_pro" : {
      }
    }
  },
  "articles_2012-07-16" : {
    "aliases" : {
    }
  }
}
You can see that we have three indices, one per week, and two plan-based aliases: articles_plan_pro and articles_plan_basic -- obviously, accounts with the “pro” subscription can search two weeks back, but accounts with the “basic” subscription can search only the current week.
Notice also that the articles_current alias points to, ehm, the current week (I'm writing this on Thu 2012-07-12). The index for next week is just there, lying in wait -- when the time comes, a background job (cron, Resque worker, custom script, ...) will update the aliases. There's a nifty example of aliases in a “sliding window” scenario in the Tire integration test suite.
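Such a job might do something like the following. This is a sketch, not from the original answer; the remove/add pair in a single _aliases call is applied atomically by Elasticsearch:

require 'net/http'
require 'json'

# Move the articles_current alias from last week's index to this
# week's index in one atomic _aliases request.
body = {
  :actions => [
    { :remove => { :index => 'articles_2012-07-09', :alias => 'articles_current' } },
    { :add    => { :index => 'articles_2012-07-16', :alias => 'articles_current' } }
  ]
}.to_json

uri = URI.parse('http://localhost:9200/_aliases')
Net::HTTP.start(uri.host, uri.port) do |http|
  http.post(uri.path, body, 'Content-Type' => 'application/json')
end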
Let's leave the articles_shared alias aside for now, and look at what tricks we can play with this setup:
class Account
  # ...

  # Set index name based on account subscription
  #
  def articles
    if plan_code = self.subscription && self.subscription.plan_code
      Article.index_name "articles_plan_#{plan_code}"
    else
      Article.index_name "articles_shared"
    end
    return Article
  end
end
Again, we're setting up an index_name for the Article class, which holds our documents. When the current account has a valid subscription, we get the plan_code out of the subscription and direct searches for this account into the relevant index: “basic” or “pro”.
If the account has no subscription -- probably a “visitor” type -- we direct the searches to the articles_shared alias. Using the interface is as simple as before, e.g. in ArticlesController:
@account  = Account.find( remember_token_or_something_like_that )
@articles = @account.articles.search { query { ... } }
# ...
We are not using the Article class as a gateway for indexing in this case; we have a separate indexing component, a Sinatra application serving as a light proxy to the Elasticsearch Bulk API, which provides HTTP authentication and document validation (enforcing rules such as required properties or dates passed as UTC), and uses the bare Tire::Index#import and Tire::Index#store APIs.
These APIs talk to the articles_current index alias, which is periodically pointed to the current week's index by said background process. In this way, we have decoupled all the logic for setting up index names into separate components of the application, so we don't need access to the Article or Account classes in the indexing proxy (it runs on a separate server), or in any other component of the application. Whichever component is indexing, it indexes against the articles_current alias; whichever component is searching, it searches against whatever alias or index makes sense for that particular component.
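The bare Tire calls in such an indexing proxy might look like this (a sketch under the assumptions above; the document hashes are illustrative):

# Talk straight to the alias, bypassing the Article class entirely.
index = Tire::Index.new('articles_current')

# Store a single validated document:
index.store :type => 'article', :id => 'abc123', :title => 'An interesting article'

# Or bulk-import a batch of documents via the Bulk API:
index.import [
  { :type => 'article', :id => 'abc124', :title => 'Another one' },
  { :type => 'article', :id => 'abc125', :title => 'Yet another one' }
]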

Related

Sort Elasticsearch results by integer value via Searchkick

I'm working on a Rails application that uses Searchkick as an interface to Elasticsearch. Site search is working just fine, but I'm running into an unexpected issue on a page where I'm attempting to retrieve the most recent records from Searchkick across a couple of different models. The goal is a reverse-chronological list of this recent activity, with the two object types intermingled.
I'm using the following code:
models = [ Post, Project ]

includes = {
  Post    => [ :account => [ :profile ] ],
  Project => [ :account => [ :profile ] ],
}

@results = Searchkick.search('*',
  :models         => models,
  :model_includes => includes,
  :order          => { :id => :desc },
  :limit          => 27,
)
For the purposes of getting the backend working, the page in development is currently just displaying the title, record type (class name), and ID, like this:
<%= "#{result.title} (#{result.class} #{result.id})" %>
Which will output this:
Greetings from Tennessee! (Post 999)
This generally seems to be working fine, except that ES is returning the results sorted by ID as strings, not integers. I tested by setting the results limit to 1000 and found that with tables containing ~7,000 records, 999 is considered the highest, while 6905 comes after 691 in the list.
Looking through the Elasticsearch documentation, I do see mention of sorting numeric fields, but I'm unable to figure out how to translate that to the Searchkick DSL. Is this possible and supported?
I'm running Searchkick 4.4 and Elasticsearch 7.
Because Elasticsearch stores IDs as strings rather than integers, I solved this problem by adding a new obj_id field in ES and ordering results based on that.
In my Post and Project models:
def search_data
  {
    :obj_id  => id,
    :title   => title,
    :content => ActionController::Base.helpers.strip_tags(content),
  }
end
And in the controller I changed the order value to:
:order => { :obj_id => :desc }
The records are sorting correctly now.
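Worth noting (not shown in the answer above): changes to search_data only take effect after rebuilding the index, so both models need a reindex before the new obj_id field is available to sort on:

# Rebuild the Elasticsearch indices so obj_id is actually indexed:
Post.reindex
Project.reindex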

MongoDB conditional aggregate query on a HABTM relationship (Mongoid, RoR)?

Rails 4.2.5, Mongoid 5.1.0
I have three models - Mailbox, Communication, and Message.
mailbox.rb
class Mailbox
  include Mongoid::Document

  belongs_to :user
  has_many :communications
end
communication.rb
class Communication
  include Mongoid::Document
  include Mongoid::Timestamps
  include AASM

  belongs_to :mailbox
  has_and_belongs_to_many :messages, autosave: true

  field :read_at,          type: DateTime
  field :box,              type: String
  field :touched_at,       type: DateTime
  field :import_thread_id, type: Integer

  scope :inbox, -> { where(:box => 'inbox') }
end
message.rb
class Message
  include Mongoid::Document
  include Mongoid::Timestamps

  attr_accessor :communication_id

  has_and_belongs_to_many :communications, autosave: true
  belongs_to :from_user, class_name: 'User'
  belongs_to :to_user,   class_name: 'User'

  field :subject, type: String
  field :body,    type: String
  field :sent_at, type: DateTime
end
I'm using the authentication gem devise, which gives access to the current_user helper, which points at the current user logged in.
I have built a query for a controller that satisfies the following condition:
Get the current_user's mailbox, whose communications are filtered by the box field, where box == 'inbox'.
It was constructed like this (and is working):
current_user.mailbox.communications.where(:box => 'inbox')
My issue arises when I try to build upon this query. I wish to chain queries so that I only obtain communications whose last message is not from the current_user. I am aware of the .last method, which returns the most recent record. I have come up with the following query but cannot understand what would need to be adjusted in order to make it work:
current_user.mailbox.communications.where(:box => 'inbox').where(:messages.last.from_user => {'$ne' => current_user})
This query produces the following result:
undefined method 'from_user' for #<Origin::Key:0x007fd2295ff6d8>
I am currently able to accomplish this by doing the following, which I know is very inefficient and want to change immediately:
mb = current_user.mailbox.communications.inbox
comms = mb.reject {|c| c.messages.last.from_user == current_user}
I wish to move this logic from ruby to the actual database query. Thank you in advance to anyone who assists me with this, and please let me know if anymore information is helpful here.
Ok, so what's happening here is kind of messy, and has to do with how smart Mongoid is actually able to be when doing associations -- specifically, how queries are constructed when 'crossing' between two associations.
In the case of your first query:
current_user.mailbox.communications.where(:box => 'inbox')
That's cool with Mongoid, because it actually just desugars into two DB calls:
1. Get the current mailbox for the user.
2. Mongoid builds a criteria directly against the communication collection, with a where statement saying: use the mailbox id from step 1, and filter to box = inbox.
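Roughly the equivalent of this (a sketch of what Mongoid effectively runs; field names taken from the models above):

# Call 1: load the user's mailbox document
mailbox = current_user.mailbox

# Call 2: a single query against the communications collection
Communication.where(:mailbox_id => mailbox.id, :box => 'inbox')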
Now when we get to your next query,
current_user.mailbox.communications.where(:box => 'inbox').where(:messages.last.from_user => {'$ne' => current_user})
is when Mongoid starts to get confused.
Here's the main issue: When you use 'where' you are querying the collection you are on. You won't cross associations.
What the where(:messages.last.from_user => {'$ne' => current_user}) is actually doing is not checking the messages association. Mongoid is instead searching the communication document for a property that would have a JSON path similar to: communication['messages']['last']['from_user'].
Now that you know why, you can get at what you want, but it's going to require a little more sweat than the equivalent ActiveRecord work.
Here's more of the way you can get at what you want:
user_id = current_user.id
communication_ids = current_user.mailbox.communications.where(:box => 'inbox').pluck(:_id)

# We're going to need to work around the fact there is no 'group by' in
# Mongoid, so there's really no way to get the 'last' entry in a set;
# fetch the candidate messages and group them in Ruby instead.
messages_for_communications = Message.where(
  :communication_ids.in => communication_ids
).only(:_id, :communication_ids, :from_user_id, :sent_at)

# Expand each message once per communication it belongs to,
# and throw out communications that don't involve the user
messages_with_communication_ids = messages_for_communications.flat_map do |mesg|
  message_set = []
  mesg.communication_ids.each do |c_id|
    if communication_ids.include?(c_id)
      message_set << { :id               => mesg._id,
                       :communication_id => c_id,
                       :from_user        => mesg.from_user_id,
                       :sent_at          => mesg.sent_at }
    end
  end
  message_set
end

# Group by communication_id
grouped_messages = messages_with_communication_ids.group_by { |msg| msg[:communication_id] }

communications_and_message_ids = {}
grouped_messages.each_pair do |k, v|
  sorted_messages = v.sort_by { |msg| msg[:sent_at] }
  if sorted_messages.last[:from_user] != user_id
    communications_and_message_ids[k] = sorted_messages.last[:id]
  end
end

# This is now a hash of {:communication_id => :last_message_id}
communications_and_message_ids
I'm not sure my code is 100% (you probably need to check the field names in the documents to make sure I'm searching through the right ones), but I think you get the general pattern.

Exclude indexed data in search results - elasticsearch (tire)

I'm trying to use elasticsearch via tire gem for a multi-tenant app. The app has many accounts with each account having many users.
Now I would like to index the User based on account id.
User Model:
include Tire::Model::Search

mapping do
  indexes :name, :boost => 10
  indexes :account_id
  indexes :company_name
  indexes :description
end

def to_indexed_json
  to_json( :only => [:name, :account_id, :description, :company_name] )
end
Search Query:
User.tire.search do
  query do
    filtered do
      query { string 'vamsikrishna' }
      filter :term, :account_id => 1
    end
  end
end
The filter works fine and the results are displayed only for the filtered account id (1). But when I search for the query string 1:
User.tire.search do
  query do
    filtered do
      query { string '1' }
      filter :term, :account_id => 1
    end
  end
end
Let's say there is a user named 1. In that case, results are displayed for all the users with account id 1, not just that one user. This is because I added account_id to the to_indexed_json. Now, how can I display only the user with the name 1? (All users should not show up in the hits; only the user named 1 should be displayed.)
When there are no users with 1 as the name, company name, or description, I just don't want any results to be displayed. But in my case, as I explained, I would get all the users with account id 1.
You are searching on the _all special field, which contains a copy of the content of all the fields you indexed. You should search on a specific field to get the right results, like this: field_name:1.
If you want, you can search on multiple fields using the query_string query:
{
  "query_string" : {
    "fields" : ["field1", "field2", "field3"],
    "query" : "1"
  }
}
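Translated to the Tire DSL used above, that would presumably look like the following (a sketch; it assumes Tire's string query passes the :fields option through to Elasticsearch's query_string query):

User.tire.search do
  query do
    filtered do
      # Search only these fields instead of _all:
      query  { string '1', :fields => ['name', 'company_name', 'description'] }
      filter :term, :account_id => 1
    end
  end
end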

How to enforce unique embedded document in mongoid

I have the following model
class Person
  include Mongoid::Document
  embeds_many :tasks
end

class Task
  include Mongoid::Document
  embedded_in :person, :inverse_of => :tasks
  field :name
end
How can I ensure the following?
person.tasks.create :name => "create facebook killer"
person.tasks.create :name => "create facebook killer"
person.tasks.count == 1
different_person.tasks.create :name => "create facebook killer"
person.tasks.count == 1
different_person.tasks.count == 1
i.e. task names are unique within a particular person
Having checked out the docs on indexes I thought the following might work:
class Person
  include Mongoid::Document
  embeds_many :tasks

  index [
    ["tasks.name", Mongo::ASCENDING],
    ["_id", Mongo::ASCENDING]
  ], :unique => true
end
but
person.tasks.create :name => "create facebook killer"
person.tasks.create :name => "create facebook killer"
still produces a duplicate.
The index config shown above in Person would translate into the following for MongoDB:
db.things.ensureIndex({firstname : 1, 'tasks.name' : 1}, {unique : true})
Can't you just put a validator on the Task?
validates :name, :uniqueness => true
That should ensure uniqueness within the parent document.
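In context, that looks like this (a sketch based on the models from the question):

class Task
  include Mongoid::Document
  embedded_in :person, :inverse_of => :tasks
  field :name

  # Mongoid checks uniqueness within the scope of the parent document,
  # i.e. one person cannot have two tasks with the same name, but two
  # different people can.
  validates :name, :uniqueness => true
end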
Indexes are not unique by default. If you look at the Mongo Docs on this, uniqueness is an extra flag.
I don't know the exact Mongoid translation, but you're looking for something like this:
db.things.ensureIndex({firstname : 1}, {unique : true, dropDups : true})
I don't believe this is possible with embedded documents. I ran into the same issue as you and the only workaround I found was to use a referenced document, instead of an embedded document and then create a compound index on the referenced document.
Obviously, a uniqueness validation isn't enough, as it doesn't guard against race conditions. Another problem I faced with unique indexes was that Mongoid's default behavior is to not raise any errors if validation passes but the database refuses to accept the document. I had to change the following configuration option in mongoid.yml:
persist_in_safe_mode: true
This is documented at http://mongoid.org/docs/installation/configuration.html
Finally, after making this change, the save/create methods will start throwing an error if the database refuses to store the document. So, you'll need something like this to be able to tell users about what happened:
alias_method :explosive_save, :save

def save
  begin
    explosive_save
  rescue Exception => e
    logger.warn("Unable to save record: #{self.to_yaml}. Error: #{e}")
    errors[:base] << "Please correct the errors in your form"
    false
  end
end
Even this isn't really a great option because you're left guessing as to which fields really caused the error (and why). A better solution would be to look inside MongoidError and create a proper error message accordingly. The above suited my application, so I didn't go that far.
Add a validation check comparing the count of the array of embedded tasks' IDs against the count of the same array with duplicate IDs removed.
validates_each :tasks do |record, attr, tasks|
  ids = tasks.map { |t| t._id }
  record.errors.add :tasks, "Cannot have the same task more than once." unless ids.count == ids.uniq.count
end
Worked for me.
You can define a validates_uniqueness_of on your Task model to ensure this, according to the Mongoid documentation at http://mongoid.org/docs/validation.html this validation applies to the scope of the parent document and should do what you want.
Your index technique should work too, but you have to generate the indexes before they are brought into effect. With Rails you can do this with a rake task (in the current version of Mongoid it's called db:mongoid:create_indexes). Note that you won't get errors when saving something that violates the index constraint, because Mongoid doesn't use safe mode by default (see http://mongoid.org/docs/persistence/safe_mode.html for more information).
You can also specify the index in your model class:
index({ 'firstname' => 1, 'tasks.name' => 1 }, { unique: true, drop_dups: true })
and use the rake task
rake db:mongoid:create_indexes
You have to run:
db.things.ensureIndex({firstname : 1, 'tasks.name' : 1}, {unique : true})
directly on the database.
You appear to be including a "create index" command inside of your "active record" (i.e. class Person).

Return embedded documents in query

Is it possible to perform a query and return the embedded documents?
Currently, I have:
class Post
  include MongoMapper::Document
  many :comments
end

class Comment
  include MongoMapper::EmbeddedDocument
  belongs_to :post

  key :author
  key :date
  key :body
end
Here is a query that is almost there:
Post.all("comments.date" => {"$gt" => 3.days.ago})
This will return all the post objects but not the comments. I guess I could do something like:
Post.all("comments.date" => {"$gt" => 3.days.ago}).map(&:comments)
But this would return all comments from the matching posts. I'd like to get only the comments that meet this condition. Perhaps Comment should not be embedded.
I assume you're looking for all comments newer than three days ago? Since your Comments are just embedded documents, they do not exist without the Post object so there's no way to "query" them separately (this is actually a future feature of MongoDB). However, you could easily add a convenience method to help you out:
class Comment
  include MongoMapper::EmbeddedDocument

  def self.latest
    Post.all(:conditions => {"comments.date" => {"$gt" => 3.days.ago}}).map { |p| p.comments }.flatten
  end
end
This method would get you all of the comments that have been updated in the last three days, but they would not entirely be in order. A better solution might be to use Map/Reduce to pull the latest comments:
class Comment
  include MongoMapper::EmbeddedDocument

  def self.latest
    map = <<-JS
      function() {
        this.comments.forEach(function(comment) {
          emit(comment.created_at, comment);
        });
      }
    JS
    r = "function(k, vals){ return 1; }"
    q = {'comments.created_at' => {'$gt' => 3.days.ago}}
    Post.collection.map_reduce(map, r, :query => q).sort([['$natural', -1]])
  end
end
Caveat: the above is completely untested code that exists just as an example, but theoretically it should return all comments from the last three days, sorted in descending order.
