Sort Elasticsearch results by integer value via Searchkick - ruby-on-rails

I'm working on a Rails application that uses Searchkick as an interface to Elasticsearch. Site search is working just fine, but I'm running into an unexpected issue on a page where I'm attempting to retrieve the most recent records from Searchkick across a couple of different models. The goal is a reverse chronological list of this recent activity, with the two object types intermingled.
I'm using the following code:
models = [ Post, Project ]
includes = {
  Post => [ :account => [ :profile ] ],
  Project => [ :account => [ :profile ] ],
}
@results = Searchkick.search('*',
  :models => models,
  :model_includes => includes,
  :order => { :id => :desc },
  :limit => 27,
)
For the purposes of getting the backend working, the page in development is currently just displaying the title, record type (class name), and ID, like this:
<%= "#{result.title} (#{result.class} #{result.id})" %>
This outputs:
Greetings from Tennessee! (Post 999)
This generally seems to be working fine, except that ES is returning the results sorted by ID as strings, not integers. I tested by setting the results limit to 1000 and found that with tables containing ~7,000 records, 999 is considered highest, while 6905 comes after 691 in the list.
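The ordering described here is plain lexicographic string comparison. The plain-Ruby sketch below reproduces it with the IDs from the question plus one made-up value ("12"); it does not touch Elasticsearch, it only shows why string order puts 999 first and 6905 after 691.

```ruby
# IDs as Elasticsearch returns them from _id: strings ("12" is made up).
ids = ["999", "6905", "691", "12"]

# String comparison goes character by character, so "6905" < "691" ("0" < "1").
string_desc = ids.sort.reverse

# Converting to Integer first gives the expected numeric order.
integer_desc = ids.sort_by { |id| -Integer(id) }

p string_desc
p integer_desc
```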
Looking through the Elasticsearch documentation, I do see mention of sorting numeric fields, but I'm unable to figure out how to translate that to the Searchkick DSL. Is this possible and supported?
I'm running Searchkick 4.4 and Elasticsearch 7.

Because Elasticsearch stores IDs as strings rather than integers, I solved this problem by adding a new obj_id field in ES and ordering results based on that.
In my Post and Project models:
def search_data
  {
    :obj_id => id,
    :title => title,
    :content => ActionController::Base.helpers.strip_tags(content),
  }
end
And in the controller I changed the order value to:
:order => { :obj_id => :desc }
The records are sorting correctly now.
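As a rough illustration of why the extra field helps, here is an in-memory sketch of documents from both models sorted the way :order => { :obj_id => :desc } would sort them. The struct, titles and values are hypothetical stand-ins, not Searchkick objects. One caveat worth keeping in mind: IDs from two separate tables are only a loose proxy for recency, so indexing a timestamp such as created_at and ordering on it is an alternative if strict chronological interleaving matters.

```ruby
# Hypothetical documents as they might sit in the index: _id is a string,
# obj_id is the integer copy added in search_data.
IndexedDoc = Struct.new(:model, :_id, :obj_id, :title)

docs = [
  IndexedDoc.new("Post", "999", 999, "Greetings from Tennessee!"),
  IndexedDoc.new("Project", "6905", 6905, "Example project"),
  IndexedDoc.new("Post", "691", 691, "Example post")
]

# In-memory equivalent of :order => { :obj_id => :desc }, :limit => 27.
recent = docs.sort_by { |d| -d.obj_id }.first(27)

# Same output format as the view in the question.
recent.each { |d| puts "#{d.title} (#{d.model} #{d.obj_id})" }
```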

Related

Thinking sphinx results based on model preference

I have two models, 'A' and 'B', and want to search objects from both of them using Thinking Sphinx, but I want all results of model 'A' first and then 'B'. How can I do that?
I pass the following options to the Sphinx query:
{:match_mode=>:extended, :sort_mode=>:extended, :star=>true, :order=>"@relevance DESC", :ignore_errors=>true, :populate=>true, :per_page=>10, :retry_stale=>true, :classes => [A,B]}
And then get search results using:
ThinkingSphinx.search "*xy*", options
But it gives results in mixed ordering, whereas I need all 'A' objects first. How can I do that?
The easiest way is to add an attribute to both models' indices:
has "1", :as => :sort_order, :type => :integer
The number within the string should be different per model. And then your :order argument becomes:
:order => 'sort_order ASC, @relevance DESC'
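The two-key ordering above (a per-model constant ascending, then relevance descending) can be sketched in plain Ruby. The hits and scores below are made up; Sphinx does this server-side, the sketch only shows the resulting order.

```ruby
# Hypothetical search hits: model name and relevance score.
Hit = Struct.new(:model, :relevance)

# Per-model constant, like `has "1", :as => :sort_order` on A's index
# and `has "2", :as => :sort_order` on B's index.
SORT_ORDER = { "A" => 1, "B" => 2 }

hits = [Hit.new("B", 0.9), Hit.new("A", 0.4), Hit.new("B", 0.2), Hit.new("A", 0.8)]

# sort_order ascending first, then relevance descending within each model.
ordered = hits.sort_by { |h| [SORT_ORDER[h.model], -h.relevance] }

ordered.each { |h| puts "#{h.model} #{h.relevance}" }
```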

Multiple model single index approach - elasticsearch via tire

In my multi-tenant app (account based with number of users per account), how would I update index for a particular account when a user document is changed.
Using Elasticsearch via Tire gem.
Rails 2.3 app - applied changes to enable support for Rails 2.3 as per loe/tire's commit
Account Model:
include Tire::Model::Search
Tire.index('account_1') do
  create(
    :mappings => {
      :user => {
        :properties => {
          :name => { :type => :string, :boost => 10 },
          :company_name => { :type => :string, :boost => 5 }
        }
      },
      :comments => {
        :properties => {
          :description => { :type => :string, :boost => 5 }
        }
      }
    }
  )
end
As you can see above, there are two models here: user and comments. Is this the correct way to address a single index with multiple models?
In that case, how do I update the index when a user document or comment document alone is changed?
Usually when you are indexing a model, it is good to index its own attributes along with its associations. So in this case, if you want to index users and their comments, you should define the index in the user model and index the comments through the association, so that Tire's callbacks on the user model reindex the user object whenever any of its attributes change. This only applies to the model on which the index is defined.
If you want changes to associated records to be indexed as well, you need hooks that reindex the account object after save/after destroy of the user/comment models. Alternatively, you could use the :touch => true option to touch the account model when a user or comment changes.
For example, if you want to index users and comments:
class User < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks

  # user.comments is the association indexed below
  has_many :comments

  mapping do
    indexes :id, :type => 'integer', :index => :not_analyzed
    indexes :about_me, :type => 'string', :analyzer => 'snowball'
    indexes :name, :type => 'string', :analyzer => 'whitespace'
    indexes :comments do
      indexes :content, :type => 'string', :analyzer => 'snowball'
    end
  end
end
So here the index is on the user model, and user.comments is an association. Hopefully this example explains it.
The answer to the question, as posted by Tire's author Karmi, is as follows:
Let's say we have an Account class and we deal with article entities.
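Tire builds each document from the model's `to_indexed_json` method, which is where associated records have to be pulled in for the nested comments mapping to have data. The sketch below mimics that in plain Ruby with hypothetical `User`/`Comment` structs and made-up data, so it runs without Tire or ActiveRecord; in a real model one would override `to_indexed_json` along the lines of `to_json(:include => :comments)`.

```ruby
require "json"

# Plain-Ruby stand-ins for the User and Comment models (hypothetical).
Comment = Struct.new(:content)
User = Struct.new(:id, :name, :about_me, :comments) do
  # Build the document that would be sent to the index: the user's own
  # attributes plus the contents of its comments association.
  def to_indexed_json
    JSON.generate({
      "id" => id,
      "name" => name,
      "about_me" => about_me,
      "comments" => comments.map { |c| { "content" => c.content } }
    })
  end
end

user = User.new(1, "Jane Doe", "Runs marathons",
                [Comment.new("First!"), Comment.new("Nice post")])
puts user.to_indexed_json
```

Because the comments live inside the user document, a changed comment only reaches the index when its user is reindexed, which is why the callbacks or :touch option mentioned above are needed.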
In that case, our Account class would have following:
class Account
  #...
  # Set index name based on account ID
  #
  def articles
    Article.index_name "articles-#{self.id}"
    Article
  end
end
So, whenever we need to access articles for a particular account, either for searching or for indexing, we can simply do:
@account = Account.find( remember_token_or_something_like_that )
# Instead of `Article.search(...)`:
@account.articles.search { query { string 'something interesting' } }
# Instead of `Article.create(...)`:
@account.articles.create id: 'abc123', title: 'Another interesting article!', ...
Having a separate index per user/account works perfectly in certain cases, but definitely not well in cases where you'd have tens or hundreds of thousands of indices (or more). Index aliases, with properly set up filters and routing, would perform much better in that case. We would then slice the data not by tenant identity, but by time.
Let's have a look at a second scenario, starting with a heavily simplified curl http://localhost:9200/_aliases?pretty output:
{
  "articles_2012-07-02" : {
    "aliases" : {
      "articles_plan_pro" : {
      }
    }
  },
  "articles_2012-07-09" : {
    "aliases" : {
      "articles_current" : {
      },
      "articles_shared" : {
      },
      "articles_plan_basic" : {
      },
      "articles_plan_pro" : {
      }
    }
  },
  "articles_2012-07-16" : {
    "aliases" : {
    }
  }
}
You can see that we have three indices, one per week. You can see there are two similar aliases: articles_plan_pro and articles_plan_basic -- obviously, accounts with the “pro” subscription can search two weeks back, but accounts with the “basic” subscription can search only this week.
Notice also that the articles_current alias points to, ehm, the current week (I'm writing this on Thu 2012-07-12). The index for next week is just there, lying and waiting; when the time comes, a background job (cron, Resque worker, custom script, ...) will update the aliases. There's a nifty example with aliases in a “sliding window” scenario in the Tire integration test suite.
Let's set aside the articles_shared alias for now and look at what tricks we can play with this setup:
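The weekly layout above can be derived from a date. The sketch below assumes weeks start on Monday (which matches the index names in the listing) and that "pro" covers two weeks while "basic" covers one; `weekly_index` and `alias_layout` are hypothetical helpers. A background job would then apply such a layout through the Elasticsearch `_aliases` API; that step is not shown.

```ruby
require "date"

# Name of the weekly index containing `date`, assuming Monday-based weeks
# (hypothetical helper).
def weekly_index(date)
  monday = date - ((date.wday - 1) % 7)
  "articles_#{monday.strftime('%Y-%m-%d')}"
end

# Which indices each alias should point at on a given day (hypothetical).
def alias_layout(today)
  current  = weekly_index(today)
  previous = weekly_index(today - 7)
  {
    "articles_current"    => [current],
    "articles_shared"     => [current],
    "articles_plan_basic" => [current],
    "articles_plan_pro"   => [previous, current]
  }
end

today = Date.new(2012, 7, 12)        # the Thursday mentioned above
layout = alias_layout(today)
next_week = weekly_index(today + 7)  # index created ahead of time
p layout
p next_week
```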
class Account
  # ...
  # Set index name based on account subscription
  #
  def articles
    if plan_code = self.subscription && self.subscription.plan_code
      Article.index_name "articles_plan_#{plan_code}"
    else
      Article.index_name "articles_shared"
    end
    return Article
  end
end
Again, we're setting up an index_name for the Article class, which holds our documents. When the current account has a valid subscription, we get the plan_code out of the subscription and direct searches for this account into the relevant alias: “basic” or “pro”.
If the account has no subscription (it's probably a “visitor” type), we direct the searches to the articles_shared alias. Using the interface is as simple as before, e.g. in ArticlesController:
@account = Account.find( remember_token_or_something_like_that )
@articles = @account.articles.search { query { ... } }
# ...
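The branching in Account#articles can be isolated and tested on its own. The sketch below uses hypothetical `Account`/`Subscription` structs and returns the alias name instead of calling `Article.index_name`; one reason to consider that shape is that `Article.index_name` is class-level state shared by every request in the process, whereas a returned name is local to the caller.

```ruby
# Hypothetical stand-ins for the Account and Subscription models.
Subscription = Struct.new(:plan_code)
Account = Struct.new(:subscription) do
  # Same decision as Account#articles: plan-specific alias when the
  # subscription has a plan code, otherwise the shared alias.
  def articles_index_name
    plan_code = subscription && subscription.plan_code
    plan_code ? "articles_plan_#{plan_code}" : "articles_shared"
  end
end

puts Account.new(Subscription.new("pro")).articles_index_name
puts Account.new(nil).articles_index_name
```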
We are not using the Article class as a gateway for indexing in this case; we have a separate indexing component, a Sinatra application serving as a light proxy to the elasticsearch Bulk API, which provides HTTP authentication and document validation (enforcing rules such as required properties or dates passed as UTC), and uses the bare Tire::Index#import and Tire::Index#store APIs.
These APIs talk to the articles_current index alias, which is periodically updated to the current week by said background process. In this way, we have decoupled all the logic for setting up index names into separate components of the application, so the indexing proxy (it runs on a separate server) needs no access to the Article or Account classes, or to any other component of the application. Whichever component is indexing indexes against the articles_current alias; whichever component is searching searches against whatever alias or index makes sense for it.

Multiple SQL statements for a single array of hash insert

I am doing something like this to insert multiple records at the same time in my rails app.
VoteRecord.create(
  [
    { :prospect_id => prospect.id, :state => "OH", :election_type => "GE" },
    { :prospect_id => prospect.id, :state => "OH", :election_type => "PR" },
    ...
  ]
)
When I check the log, I see that a separate INSERT query is fired for each record. Is it possible to do this in a single query?
You can try activerecord-import for bulk imports; check out its wiki and examples page.
I haven't used it myself, but you should check out the activerecord-import project on GitHub (for Rails 3). More on this can be found on its wiki.
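For a sense of what "a single query" means here, the sketch below builds one multi-row INSERT from the same array of hashes, which is roughly the kind of statement bulk-import tools aim to send. `multi_row_insert_sql` is a hypothetical helper, not part of Rails or activerecord-import, and its quoting is deliberately naive (single quotes doubled); real code should let the database adapter quote values.

```ruby
# Hypothetical helper: one INSERT with many VALUES tuples instead of one
# INSERT per hash. Assumes every row has the same keys as the first row.
def multi_row_insert_sql(table, rows)
  columns = rows.first.keys
  quote = lambda do |v|
    v.is_a?(Numeric) ? v.to_s : "'#{v.to_s.gsub("'", "''")}'"
  end
  tuples = rows.map { |row| "(" + columns.map { |c| quote.call(row[c]) }.join(", ") + ")" }
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{tuples.join(', ')}"
end

sql = multi_row_insert_sql("vote_records", [
  { :prospect_id => 42, :state => "OH", :election_type => "GE" },
  { :prospect_id => 42, :state => "OH", :election_type => "PR" }
])
puts sql
```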

Newbie: Rails' way to query database in my case

I am using Ruby v1.8 and Rails v2.3.
I have two model objects: Cars and Customers.
Model Cars:
class Car < ActiveRecord::Base
  # car has attribute :town_code
  has_many :customers
end
Model Customers:
class Customer < ActiveRecord::Base
  # customer has attributes :first_name, :last_name
  belongs_to :car
end
In my controller, I receive a string from the view in the format firstname.lastname#town_code, for example "John.smith#ac01", which can be parsed as first_name="John", last_name="smith" and town_code="ac01".
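Parsing the string is separate from the query itself. A minimal sketch, assuming the first and last names never contain "." or "#" (`parse_customer_key` is a hypothetical helper name):

```ruby
# Split "firstname.lastname#town_code" into its three parts.
# Returns nil when the string does not have that shape.
def parse_customer_key(str)
  m = str.match(/\A([^.#]+)\.([^.#]+)#(.+)\z/)
  m && { :first_name => m[1], :last_name => m[2], :town_code => m[3] }
end

p parse_customer_key("John.smith#ac01")
```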
Now I would like to use the Rails way to query the database to find all the customer objects (matching the above conditions) in the Customers table which have:
first_name="John",
last_name="smith"
and own a car (by car_id) with the car's town_code="ac01".
What is the Rails syntax to query this?
I know it should be something like this (if I want to count the number of matching customers):
Customer.count :conditions =>{:first_name => "John", :last_name=>"smith"...}
But I am not sure how to refer to a customer whose referenced car has town_code = "ac01".
------------------ My question --------------------
I want to have two queries:
- one to count the number of matching customers,
- the other to return the customer objects, like a find_by_ query.
What is the Ruby on Rails syntax for the two queries?
It should be something similar to
Customer.where(:first_name => "John", :last_name => "Smith").count
If you have many Customers of Car, you can do something like
Car.where(...).customers.where(...)
You should really fire up rails c and test your queries there (I might be slightly off).
You could have something like:
@customers = Car.where(:town_code => town_code).customers.where(:first_name => first_name, :last_name => last_name)
And then just count the results:
@customer_count = @customers.count
This assuming you parsed your string into the variables town_code, first_name, and last_name, like you said.
Edit
I don't think Rails v2.3 supports these chains of Active Record queries, because I believe it lacks lazy loading from the DB; I'm not completely sure. Also, I realize my first suggestion wouldn't work because there could be many cars with the same town_code. I guess you could solve it using the map function like so (not tested):
@customers = Car.all(:conditions => {:town_code => town_code}).map { |c| c.customers.all(:conditions => {:first_name => first_name, :last_name => last_name}) }.flatten
And then count them like before:
@customer_count = @customers.count
I believe you could do something like this:
Customer.find(:all, :include => :car, :conditions => "customers.first_name = 'John' AND customers.last_name = 'Smith' AND cars.town_code = 'ac01'")
Counting all customers matching the same conditions can be done with:
Customer.count(:all, :include => :car, :conditions => "customers.first_name = 'John' AND customers.last_name = 'Smith' AND cars.town_code = 'ac01'")
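To make explicit which rows those conditions select, here is an in-memory sketch of the same join and filter over hypothetical data. It only illustrates the semantics of :include => :car plus the three conditions; in the app the database should do this work through the query above.

```ruby
# Hypothetical in-memory data standing in for the cars and customers tables.
Car = Struct.new(:id, :town_code)
Customer = Struct.new(:first_name, :last_name, :car_id)

cars = [Car.new(1, "ac01"), Car.new(2, "bx02")]
customers = [
  Customer.new("John", "Smith", 1),
  Customer.new("John", "Smith", 2),
  Customer.new("Jane", "Smith", 1)
]

# Index cars by primary key, playing the role of the customers.car_id join.
cars_by_id = Hash[cars.map { |car| [car.id, car] }]

# Keep customers matching the name conditions whose car has the town code.
matching = customers.select do |c|
  car = cars_by_id[c.car_id]
  c.first_name == "John" && c.last_name == "Smith" && !car.nil? && car.town_code == "ac01"
end

puts matching.size
```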
By the way, if you are in the position to choose what you work with, I would advise you to go for Rails 3. The chaining methods described by Joseph would make this kind of query a lot easier and it'll save you upgrading issues down the road. (And you tagged the question for Rails 3)

Querying embedded objects in Mongoid/rails 3 ("Lower than", Min operators and sorting)

I am using Rails 3 with Mongoid.
I have a collection of Stocks with an embedded collection of Prices:
class Stock
  include Mongoid::Document
  field :name, :type => String
  field :code, :type => Integer
  embeds_many :prices
end
class Price
  include Mongoid::Document
  field :date, :type => DateTime
  field :value, :type => Float
  embedded_in :stock, :inverse_of => :prices
end
I would like to get the stocks whose minimum price since a given date is lower than a given price p, and then be able to sort the prices for each stock.
But it looks like MongoDB does not allow it, because this does not work:
@stocks = Stock.Where(:prices.value.lt => p)
Also, it seems that MongoDB cannot sort embedded objects.
So, is there an alternative way to accomplish this task?
Maybe I should put everything in one collection so that I could easily run the following query:
@stocks = Stock.Where(:prices.lt => p)
But I really want to get results grouped by stock name after my query (distinct stocks with an array of ordered prices, for example). I have heard about map/reduce with the group function, but I am not sure how to use it correctly with Mongoid.
http://www.mongodb.org/display/DOCS/Aggregation
The equivalent in SQL would be something like this:
SELECT name, code, min(price) from Stock WHERE price<p GROUP BY name, code
Thanks for your help.
MongoDB / Mongoid do allow you to do this. Your example will work; the syntax is just incorrect.
@stocks = Stock.Where(:prices.value.lt => p) # does not work
@stocks = Stock.where('prices.value' => {'$lt' => p}) # this should work
And it's still chainable, so you can order by name as well:
@stocks = Stock.where('prices.value' => {'$lt' => p}).asc(:name)
Hope this helps.
I've had a similar problem... here's what I suggest:
scope :price_min, lambda { |price_min| price_min.nil? ? {} : where("prices.value" => { '$lte' => price_min.to_f }) }
Place this scope in the parent model. This will enable you to make queries like:
Stock.price_min(1000).count
Note that the scope only applies the condition when you actually pass a value; with nil it adds no filter. This is very handy if you're building complex queries with Mongoid.
Good luck!
Very best,
Ruy
MongoDB does allow querying of embedded documents, http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-ValueinanEmbeddedObject
What you're missing is a scope on the Price model, something like this:
scope :greater_than, lambda {|value| { :where => {:value.gt => value} } }
This will let you pass in any value you want and return a Mongoid collection of prices with the value greater than what you passed in. It'll be an unsorted collection, so you'll have to sort it in Ruby.
prices.sort {|a,b| a.value <=> b.value}.each {|price| puts price.value}
Mongoid does have a map_reduce method to which you pass two string variables containing the Javascript functions to execute map/reduce, and this would probably be the best way of doing what you need, but the code above will work for now.
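If the candidate stocks are few enough to load, the grouping from the question (minimum price since a date, below p, with each stock's prices ordered) can also be done in Ruby after the query. The sketch below uses hypothetical plain structs and made-up data in place of the Mongoid documents; a map/reduce or server-side aggregation would do the same work inside MongoDB instead.

```ruby
require "date"

# Plain-Ruby stand-ins for the embedded Mongoid documents (hypothetical data).
Price = Struct.new(:date, :value)
Stock = Struct.new(:name, :code, :prices)

# For each stock: keep prices dated on or after `since`, sort them by value,
# and keep the stock only if its minimum such price is below `max_price`.
# Roughly: SELECT name, code, min(price) ... WHERE price < p GROUP BY name, code
def cheap_stocks(stocks, since, max_price)
  stocks.map do |stock|
    recent = stock.prices.select { |pr| pr.date >= since }.sort_by(&:value)
    next nil if recent.empty? || recent.first.value >= max_price
    { :name => stock.name, :code => stock.code,
      :min => recent.first.value, :prices => recent.map(&:value) }
  end.compact
end

stocks = [
  Stock.new("ACME", 1, [Price.new(Date.new(2011, 1, 3), 10.0),
                        Price.new(Date.new(2011, 1, 1), 5.0),
                        Price.new(Date.new(2011, 1, 2), 8.0)]),
  Stock.new("Globex", 2, [Price.new(Date.new(2011, 1, 2), 20.0),
                          Price.new(Date.new(2011, 1, 3), 15.0)])
]

result = cheap_stocks(stocks, Date.new(2011, 1, 2), 12.0)
p result
```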
