Unit Testing Tire (Elastic Search) - Filtering Results with Method from to_indexed_json

I am testing my Tire / ElasticSearch queries and am having a problem with a custom method I'm including in to_indexed_json. For some reason, it doesn't look like it's getting indexed properly - or at least I cannot filter with it.
In my development environment, my filters and facets work fine and I get the expected results. However, in my tests I consistently see zero results. I cannot figure out where I'm going wrong.
I have the following:
def to_indexed_json
  to_json methods: [:user_tags, :location_users]
end
For which my user_tags method looks as follows:
def user_tags
  tags.map(&:content) if tags.present?
end
Tags is a polymorphic relationship with my user model:
has_many :tags, :as => :tagable
My search block looks like this:
def self.online_sales(params)
  s = Tire.search('users') { query { string '*' } }
  filter = []
  filter << { :range => { :created_at => { :from => params[:start], :to => params[:end] } } }
  filter << { :terms => { :user_tags => ['online'] } }
  s.facet('online_sales') do
    date :created_at, interval: 'day'
    facet_filter :and, filter
  end
  s
end
I have checked the user_tags are included using User.last.to_indexed_json:
{"id":2,"username":"testusername", ... "user_tags":["online"] }
In my development environment, if I run the following query, I get a per-day list of online sales for my users:
@sales = User.online_sales(start_date: Date.today - 100.days).results.facets["online_sales"]
"_type"=>"date_histogram", "entries"=>[{"time"=>1350950400000, "count"=>1, "min"=>6.0, "max"=>6.0, "total"=>6.0, "total_count"=>1, "mean"=>6.0}, {"time"=>1361836800000, "count"=>7, "min"=>3.0, "max"=>9.0, "total"=>39.0, "total_count"=>7, "mean"=>#<BigDecimal:7fabc07348f8,'0.5571428571 428571E1',27(27)>}....
In my unit tests, I get zero results unless I remove the facet filter:
{"online_sales"=>{"_type"=>"date_histogram", "entries"=>[]}}
My test looks like this:
it "should test the online sales facets", focus: true do
User.index.delete
User.create_elasticsearch_index
user = User.create(username: 'testusername', value: 'pass', location_id: #location.id)
user.tags.create content: 'online'
user.tags.first.content.should eq 'online'
user.index.refresh
ws = User.online_sales(start: (Date.today - 10.days), :end => Date.today)
puts ws.results.facets["online_sales"]
end
Is there something I'm missing or doing wrong, or have I just misunderstood something? Thanks in advance.
-- EDIT --
It appears to be something to do with the tags relationship. I have another method, location_users, which is a has_many :through relationship. This is updated on index using:
def location_users
  location.users.map(&:id)
end
I can see an array of location_users in the results when searching. It doesn't make sense to me why the polymorphic relationship wouldn't work the same way.
-- EDIT 2 --
I have fixed this by putting this in my test:
User.index.import User.all
sleep 1
Which is silly. And I don't really understand why this works. Why?!

Elasticsearch by default refreshes its indices once per second.
This is a performance optimization, because committing your changes to Lucene (which ES uses under the hood) can be quite an expensive operation.
If you need a document to be searchable immediately, include refresh=true in the URL when inserting it. You normally don't want this, since committing on every insert is expensive when indexing lots of documents, but unit testing is one of those cases where you do want to use it.
From the documentation:
refresh
To refresh the index immediately after the operation occurs, so that the document appears in search results immediately, the refresh parameter can be set to true. Setting this option to true should ONLY be done after careful thought and verification that it does not lead to poor performance, both from an indexing and a search standpoint. Note, getting a document using the get API is completely realtime.
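Applied to the Tire test above, one plausible reading of the EDIT 2 fix is that the reimport matters as much as the sleep: the user document is indexed by the save callback before its tags exist, so user_tags is empty in the index until the document is re-indexed. A minimal sleep-free sketch, assuming Tire's update_index and index.refresh methods (refresh forces the Lucene commit described above):

it "should test the online sales facets" do
  user = User.create(username: 'testusername', value: 'pass', location_id: @location.id)
  user.tags.create content: 'online'

  user.update_index   # re-index the user now that user_tags is populated
  User.index.refresh  # commit to Lucene so search sees the document immediately

  ws = User.online_sales(start: Date.today - 10.days, :end => Date.today)
  ws.results.facets["online_sales"]["entries"].should_not be_empty
end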

Related

Updating Lots of Records at Once in Rails

I've got a background job that runs about 5,000 times every 10 minutes. Each job makes a request to an external API and then either adds new or updates existing records in my database. Each API request returns around 100 items, so every 10 minutes I am making 50,000 CREATE or UPDATE SQL queries.
The way I handle this now: each API item returned has a unique ID. I search my database for a record with this id; if it exists, I update it, and if it doesn't, I create a new one.
Imagine the api response looks like this:
[
  {
    external_id: '123',
    text: 'blah blah',
    count: 450
  },
  {
    external_id: 'abc',
    text: 'something else',
    count: 393
  }
]
which is set to the variable collection
Then I run this code in my parent model:
class ParentModel < ApplicationRecord
  def update
    collection.each do |attrs|
      child = ChildModel.find_or_initialize_by(external_id: attrs[:external_id], parent_model_id: self.id)
      child.assign_attributes attrs
      child.save if child.changed?
    end
  end
end
Each of these individual calls is extremely quick, but when I am doing 50,000 in a short period of time it really adds up and can slow things down.
I'm wondering if there's a more efficient way I can handle this. I was thinking of doing something like this instead:
class ParentModel < ApplicationRecord
  def update
    eager_loaded_children = ChildModel.where(parent_model_id: self.id).limit(100)
    collection.each do |attrs|
      cached_child = eager_loaded_children.select { |child| child.external_id == attrs[:external_id] }.first
      if cached_child
        cached_child.update_attributes attrs
      else
        ChildModel.create attrs
      end
    end
  end
end
Essentially I would be saving the lookups by doing one bigger query up front (which is also quite fast), trading queries for memory. But it doesn't seem like that would save much time: it might slightly speed up the lookup part, but I'd still have to do 100 updates and creates.
Is there some kind of way I can do batch updates that I'm not thinking of? Anything else obvious that could make this go faster, or reduce the number of queries I am doing?
You can do something like this:
def update
  collection2 = collection.map { |c| [c[:external_id], c.except(:external_id)] }.to_h

  ChildModel.where(external_id: collection2.keys).each do |cm|
    ext_id = cm.external_id
    cm.assign_attributes collection2[ext_id]
    cm.save if cm.changed?
    collection2.delete(ext_id)
  end

  if collection2.present?
    new_ids = collection2.keys
    new_records = collection.select { |c| new_ids.include? c[:external_id] }
    ChildModel.create(new_records)
  end
end
This is better because it:
fetches all required records at once
creates all new records at once
You can use update_columns if you don't need callbacks/validations.
The only drawback is more Ruby-side data manipulation, which I think is a good tradeoff for fewer db queries.
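For completeness: on Rails 6 and later, the whole loop can collapse into a single statement with upsert_all, at the cost of skipping callbacks and validations. A sketch assuming Rails 6+ and a unique index on external_id:

def update
  # One INSERT ... ON CONFLICT round trip; the unique index on external_id
  # lets the database decide between inserting and updating each row.
  rows = collection.map { |c| c.merge(parent_model_id: id) }
  ChildModel.upsert_all(rows, unique_by: :external_id)
end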

Indexing and ordering by dynamic field with sunspot

So I have many items that can be part of many different pages. Here are the simplified models:
class Page
  # we just need the id for this question
end

class Item
  embeds_many :page_usages
end

class PageUsage
  field :position, :default => 0
  embedded_in :item
  belongs_to :page
end
So the PageUsage holds the position of each item on every page. I want to put that into Solr so it can pull up the right items, in the right order, for me.
I've looked into dynamic fields and ended up with something like this, but I'm not really sure about it. I basically want the field to map each page id to the position of the item:
searchable do
  dynamic_integer :page_usages do
    page_usages.inject({}) do |hash, page_usage|
      hash.merge(page_usage.page_id => page_usage.position)
    end
  end
end
And in my controller I have something like this:
Item.search do
  dynamic :page_usages do
    # I have @page.id but am not sure how to get all items with that page id
  end
end
I need something that will check whether an item exists on the page, and I need to work out how to use order_by with the position. Is this possible this way, or do I have to find another solution?
Solved it after lots of trial and error.
searchable do
  dynamic_integer :page_usages do
    page_usages.inject({}) do |hash, page_usage|
      hash.merge(("page" + page_usage.page_id.to_s).to_sym => page_usage.position)
    end
  end
end
So I first had to store the key as a symbol, which is important. But the problem I ran into was that the symbol couldn't have quotes in it: if you call to_sym on the id directly, you get something like :"123456789", which causes a "wrong constant name" error later on. So I prepended a string to the id to create a new symbol, which looks like :page123456789.
The next step was to create the search block:
Item.search do
  dynamic :page_usages do
    with(("page" + page.id.to_s).to_sym).greater_than(-1)
    order_by(("page" + page.id.to_s).to_sym, :asc)
  end
end
By using that page id, I was able to pull up all the right items in the right order. I used greater_than(-1) because my positions start at 0 by default and go up from there.
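Since the :pageNNN symbol has to be built identically in the searchable block and in every search, a small helper method keeps the two from drifting apart. A sketch (dynamic_key is a hypothetical name, not part of Sunspot):

class PageUsage
  # Hypothetical helper: one place that defines the dynamic field key,
  # producing the same :page123456789 symbols as above.
  def self.dynamic_key(page_id)
    :"page#{page_id}"
  end
end

Item.search do
  dynamic :page_usages do
    with(PageUsage.dynamic_key(page.id)).greater_than(-1)
    order_by(PageUsage.dynamic_key(page.id), :asc)
  end
end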

Rails 3 multiple parameter filtering using scopes

I'm trying to do basic filtering in Rails 3 using URL params. I'd like to have a whitelist of params that can be filtered by, and to return all the items that match. I've set up some scopes (with many more to come):
# in the model:
scope :budget_min, lambda {|min| where("budget > ?", min)}
scope :budget_max, lambda {|max| where("budget < ?", max)}
...but what's the best way to apply some, none, or all of these scopes based on which params are present? I've gotten this far, but it doesn't extend to multiple options. I'm looking for a sort of "chain if present" type of operation.
@jobs = Job.all
@jobs = Job.budget_min(params[:budget_min]) if params[:budget_min]
I think you are close. Why wouldn't something like this extend to multiple options?
query = Job.scoped
query = query.budget_min(params[:budget_min]) if params[:budget_min]
query = query.budget_max(params[:budget_max]) if params[:budget_max]
@jobs = query.all
Generally, I'd prefer hand-made solutions but, for this kind of problem, a code base could become a mess very quickly. So I would go for a gem like meta_search.
One way would be to put your conditionals into the scopes:
scope :budget_max, lambda { |max| where("budget < ?", max) unless max.nil? }
That would still become rather cumbersome since you'd end up with:
Job.budget_min(params[:budget_min]).budget_max(params[:budget_max]) ...
A slightly different approach would be to use something like the following inside your model (based on code from here):
class << self
  def search(q)
    whitelisted_params = {
      :budget_min => "budget > ?",
      :budget_max => "budget < ?"
    }
    whitelisted_params.keys.inject(scoped) do |combined_scope, param|
      if q[param].nil?
        combined_scope
      else
        combined_scope.where(whitelisted_params[param], q[param])
      end
    end
  end
end
You can then use that method as follows, and it will apply the whitelisted filters that are present in params:
MyModel.search(params)
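For instance, a hypothetical controller action (params.slice is standard Rails; the key names are just this example's):

def index
  # Pass only the whitelisted filter keys through to the model.
  @jobs = MyModel.search(params.slice(:budget_min, :budget_max))
end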

Performance: minimize database hitting

I am using Ruby on Rails 3.0.7 and I am trying to minimize database hits. To do that, I retrieve all Article objects related to a User from the database and then search within those retrieved objects.
What I do is:
stored_objects = Article.where(:user_id => <id>) # => ActiveRecord::Relation

<some_iterative_function_1>.each { |...|
  stored_object = stored_objects.where(:status => 'published').limit(1)
  ...
  # perform operation on the current 'stored_object' considered
}

<some_iterative_function_2>.each { |...|
  stored_object = stored_objects.where(:visibility => 'public').limit(1)
  ...
  # perform operation on the current 'stored_object' considered
}

<some_iterative_function_n>.each { |...|
  ...
}
Will the stored_object = stored_objects.where(:status => 'published') code really avoid hitting the database? (I ask because my log file seems to show a database query running for each iteration.) If not, how can I minimize database hits?
P.S.: in a few words, what I would like to do is work on the ActiveRecord::Relation (essentially an array of Article objects), but the where method called on it seems to hit the database.
Rails has functionality to grab chunks of the database at one time, then iterate over the rows without having to hit the database again.
See "Retrieving Multiple Objects in Batches" for more information about find_each and find_in_batches.
Once you start iterating over stored_objects (if that's what you're doing), they'll be loaded from the database. If you want to load only the users's published articles, you could do this:
stored_objects = Article.where(:user_id => id, :status => 'published')
If you instead want to load published and unpublished articles and do something different with the published ones, you could do this:
stored_objects = Article.where(:user_id => id)
stored_objects.find_all { |a| a.status == 'published' }.each do |a|
  # ... do something with a published article
end
Or perhaps:
Article.where(:user_id => id).each do |article|
  case article.status
  when 'published'
    # ... do something with a published article
  else
    # ... do something with an article that's not published
  end
end
Each of these examples performs only one database query. Choosing which one depends on which data you really want to work with.
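To make the single-query pattern explicit, a sketch (to_a forces the relation to load; everything after that is pure Ruby):

articles = Article.where(:user_id => id).to_a  # one SELECT, loaded once

# In-memory filtering from here on -- no further SQL is issued.
published = articles.select { |a| a.status == 'published' }
public_articles = articles.select { |a| a.visibility == 'public' }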

How do I populate a table in rails from a fixture?

Quick summary:
I have a Rails app that is a personal checklist / to-do list. Basically, you can log in and manage your to-do list.
My Question:
When a user creates a new account, I want to populate their checklist with 20-30 default to-do items. I know I could say:
wash_the_car = ChecklistItem.new
wash_the_car.name = 'Wash and wax the Ford F650.'
wash_the_car.user = @new_user
wash_the_car.save!
...repeat 20 times...
However, I have 20 ChecklistItem rows to populate, so that would be 60 lines of very damp (aka not DRY) code. There's gotta be a better way.
So I want to seed the ChecklistItems table from a YAML file when the account is created. The YAML file can hold all of my ChecklistItem objects to be populated. When a new user is created -- bam! -- the preset to-do items are in their list.
How do I do this?
Thanks!
(PS: For those of you wondering WHY I am doing this: I am making a client login for my web design company. I have a set of 20 steps (first meeting, design, validate, test, etc.) that I go through with each web client. These 20 steps are the 20 checklist items that I want to populate for each new client. However, while everyone starts with the same 20 items, I normally customize the steps I'll take based on the project, hence my vanilla to-do list implementation and desire to populate the rows programmatically. If you have questions, I can explain further.)
Just write a function:
def add_data(data, user)
  wash_the_car = ChecklistItem.new
  wash_the_car.name = data
  wash_the_car.user = user
  wash_the_car.save!
end

add_data('Wash and wax the Ford F650.', @user)
I agree with the other answerers suggesting you just do it in code. But it doesn't have to be as verbose as suggested. It's already a one-liner if you want it to be:
@new_user.checklist_items.create! :name => 'Wash and wax the Ford F650.'
Throw that in a loop of items that you read from a file, or store in your class, or wherever:
class ChecklistItem < AR::Base
  DEFAULTS = ['do one thing', 'do another']
  ...
end

class User < AR::Base
  after_create :create_default_checklist_items

  protected

  def create_default_checklist_items
    ChecklistItem::DEFAULTS.each do |x|
      checklist_items.create! :name => x
    end
  end
end
or if your items increase in complexity, replace the array of strings with an array of hashes...
# ChecklistItem...
DEFAULTS = [
  { :name => 'do one thing', :other_thing => 'asdf' },
  { :name => 'do another', :other_thing => 'jkl' },
]

# User.rb, in the after_create hook:
ChecklistItem::DEFAULTS.each do |x|
  checklist_items.create! x
end
But I'm not really suggesting you throw all the defaults in a constant inside ChecklistItem. I just described it that way so that you could see the structure of the Ruby object. Instead, throw them in a YAML file that you read in once and cache:
class ChecklistItem < AR::Base
  def self.defaults
    @@defaults ||= YAML.load_file(...)
  end
end
Or if you want administrators to be able to manage the default options on the fly, put them in the database:
class ChecklistItem < AR::Base
  named_scope :defaults, :conditions => { :is_default => true }
end

# User.rb, in the after_create hook:
ChecklistItem.defaults.each do |x|
  checklist_items.create! :name => x.name
end
Lots of options.
A Rails fixture is used to populate test data for unit tests; I don't think it's meant to be used in the scenario you mentioned.
I'd say just extract a new method, add_checklist_item, and be done with it.
def on_user_create
  add_checklist_item 'Wash and wax the Ford F650.', @user
  # 19 more invocations to go
end
If you want more flexibility
def on_user_create(new_user_template_filename)
  # read each line from the file and call add_checklist_item
end
The file can be a simple text file where each line corresponds to a task description like "Wash and wax the Ford F650.". It should be pretty easy to write in Ruby.
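Tying the YAML idea from the question to the after_create hook above, a sketch (the file name, location, and keys are assumptions, not from the original answers):

# config/default_checklist_items.yml -- an array of attribute hashes:
#   - name: 'First meeting'
#   - name: 'Design'
#   - name: 'Validate and test'

class User < ActiveRecord::Base
  after_create :create_default_checklist_items

  private

  def create_default_checklist_items
    defaults = YAML.load_file(Rails.root.join('config', 'default_checklist_items.yml'))
    defaults.each { |attrs| checklist_items.create!(attrs) }
  end
end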
