will_paginate in rails when using join and group by - ruby-on-rails

I noticed that when using will_paginate with 20 results per page, joining multiple tables, and grouping by "nickname", the output is paginated but only 3 nicknames show up (this makes sense, since pagination counts the rows before the group by). How can I solve that? Also, I want to display the output as shown below and limit the number of items per page based on the "nickname" column. Please note that the "data" table has many "preferences" (data.id = preferences.data_id).
{
"totalCount": 123,
"pageInfo": {
"currentPage": 1,
"nextPage": 2,
"lastPage": 8
},
"results": [
{
"data": {
"id": 1,
"nickname": "foo"
},
"preferences": [
{
"id": 4479,
"created_at": "2019-05-21T00:39:45.772Z",
"updated_at": "2019-05-21T00:39:45.772Z",
"title": "Check Database",
...
},
...
]
},
...
]
}
data_res = Data.paginate(page: params[:page], per_page: 20)
.joins("INNER JOIN preferences ON data.id = preferences.data_id")
.select("data.*, preferences.*")
.order(id: :asc)
data_group_by = data_res.group_by { |r| r.nickname }
respond_to do |format|
format.any
format.json {
render :json => {
:totalCount => data_res.total_entries,
:pageInfo => {
:currentPage => data_res.current_page,
:nextPage => data_res.next_page,
:total_pages => data_res.total_pages,
:per_page => data_res.per_page,
},
:results => data_res
}
}
end

If I'm understanding your question correctly (probably not), pagination says it has 20 records, but you're only seeing 3 records being returned because they're grouped?
However, what you want is 20 data records, each with its preferences grouped under it?
If that's the case, I think you're probably overcomplicating your query.
I don't think you should use select("data.*, preferences.*"), because it returns one row per preference fetched. The preferences, rather than the data records you're paginating on, then determine how many records you get, and you're also dynamically adding extra methods to each returned data record to account for the preference columns.
data_res.group_by { |r| r.nickname } seems unnecessary, unless you have data records that are not unique, in which case I'd question the reason for grouping them by that.
In my opinion, if nicknames are unique, i.e. there can only be 1 data record with a given nickname, here's what I'd propose:
class Data
has_many :preferences
end
class Preference
belongs_to :data
end
# joins and includes are both used here to ensure the preferences are eager loaded,
# while conforming to your existing query of only fetching data with preferences
data_res = Data.joins(:preferences).includes(:preferences).paginate(page: params[:page], per_page: 20).order(id: :asc) # this should give you 20 records of data that has preferences
Then your serializer can do the rest of the work to ensure your data is properly mapped and that your preferences are on the same level (there are several ways to achieve that with any serialization package), e.g.
class DataPreferencesSerializer < ActiveModel::Serializer
attributes :data, :preferences # preferences could instead be a has_many :preferences serializer
def data
{ id: object.id, nickname: object.nickname }
end
def preferences
object.preferences
end
end
results = ArraySerializer.new(data_res, each_serializer: DataPreferencesSerializer, root: false)
Like I said, there are several ways of achieving the serialization, so the implementation above is just an idea of the approach you might take and not a specific implementation.
PS: Your INNER JOIN ensures that all the returned data records have associated preferences, so any data record that doesn't have at least one preference is excluded from the records you get back.
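To make the difference concrete, here is a minimal plain-Ruby sketch (no Rails, no will_paginate) that contrasts paginating the joined rows with paginating the parent records first. The structs, sample data, and page size are all hypothetical stand-ins for the data and preferences tables.

```ruby
# Plain-Ruby stand-ins for the two tables (hypothetical sample data).
DataRow = Struct.new(:id, :nickname)
PreferenceRow = Struct.new(:id, :data_id, :title)

data_rows = (1..5).map { |i| DataRow.new(i, "nick#{i}") }
# Every data row gets 4 preferences, so the join produces 20 rows.
pref_rows = data_rows.flat_map do |d|
  (1..4).map { |n| PreferenceRow.new(d.id * 10 + n, d.id, "pref #{n}") }
end

per_page = 3

# Paginating the joined rows: the page holds 3 join rows, all from one parent.
joined = pref_rows.map { |p| [data_rows.find { |d| d.id == p.data_id }, p] }
joined_page = joined.first(per_page)
nicknames_on_joined_page = joined_page.map { |d, _| d.nickname }.uniq

# Paginating the parents first, then attaching each parent's preferences.
parent_page = data_rows.first(per_page)
prefs_by_data_id = pref_rows.group_by(&:data_id)
results = parent_page.map do |d|
  { data: { id: d.id, nickname: d.nickname },
    preferences: prefs_by_data_id.fetch(d.id, []).map(&:to_h) }
end

# Page count is based on the number of parents, not the number of join rows.
total_pages = (data_rows.size / per_page.to_f).ceil
```

In this sketch the joined page contains a single nickname, while the parent-first page contains three nicknames, each carrying all of its preferences.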

Related

Sort Elasticsearch results by integer value via Searchkick

I'm working on a Rails application that uses Searchkick as an interface to Elasticsearch. Site search is working just fine, but I'm running into an unexpected issue on a page where I'm attempting to retrieve the most recent records from Searchkick across a couple of different models. The goal is a reverse chronological list of this recent activity, with the two object types intermingled.
I'm using the following code:
models = [ Post, Project ]
includes = {
Post => [ :account => [ :profile ] ],
Project => [ :account => [ :profile ] ],
}
@results = Searchkick.search('*',
:models => models,
:model_includes => includes,
:order => { :id => :desc },
:limit => 27,
)
For the purposes of getting the backend working, the page in development is currently just displaying the title, record type (class name), and ID, like this:
<%= "#{result.title} (#{result.class} #{result.id})" %>
Which will output this:
Greetings from Tennessee! (Post 999)
This generally seems to be working fine, except that ES is returning the results sorted by ID as strings, not integers. I tested by setting the results limit to 1000 and found that with tables containing ~7,000 records, 999 is considered highest, while 6905 comes after 691 in the list.
Looking through the Elasticsearch documentation, I do see mention of sorting numeric fields, but I'm unable to figure out how to translate that to the Searchkick DSL. Is this possible and supported?
I'm running Searchkick 4.4 and Elasticsearch 7.
Because Elasticsearch stores IDs as strings rather than integers, I solved this problem by adding a new obj_id field in ES and ordering results based on that.
In my Post and Project models:
def search_data
{
:obj_id => id,
:title => title,
:content => ActionController::Base.helpers.strip_tags(content),
}
end
And in the controller I changed the order value to:
:order => { :obj_id => :desc }
The records are sorting correctly now.
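The ordering seen in the question is what lexicographic (string) comparison produces. A small plain-Ruby illustration, using the three IDs mentioned above, shows the same effect outside Elasticsearch:

```ruby
ids = [999, 6905, 691]

# Sorting the IDs as strings compares them character by character.
as_strings = ids.map(&:to_s).sort.reverse
# Sorting them as integers compares their numeric value.
as_integers = ids.sort.reverse

puts as_strings.inspect  # ["999", "691", "6905"]
puts as_integers.inspect # [6905, 999, 691]
```

Indexing the ID a second time as an integer field, as done with obj_id above, gives the sort a numeric value to compare.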

In Rails, how do I return an associated record using includes?

I'm aware of the includes, and extract_associated methods.
extract_associated as used here, in the Rails docs, only returns the user records -
account.memberships.extract_associated(:user)
# => Returns collection of User records
I'm looking to return the Membership records, WITH the User records in the same array. I know that the includes method should do this for me, but my response only includes the user_id and not the actual record, i.e. the use of includes hasn't changed what's returned at all, like so:-
account.memberships.includes(:user)
# => Returns collection of Membership records, with user_ids
[{ "id": 3, "account_id": 1, "user_id": 2, "membership_name": 'Annual Membership'}]
My Memberships belongs_to a User, and a User has_many Memberships.
What am I missing here?
It's not an option for me to do membership.user in my view, because I'm using VueJS so need to pass all the data I need in.
You can include the association's records in the JSON. Since you are already calling it on the memberships, only the user needs to be included:
account.memberships.includes(:user).as_json(include: :user)
See the API documentation for details.
https://api.rubyonrails.org/classes/ActiveModel/Serializers/JSON.html
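For reference, the shape the front end needs is each membership with its user nested inside it. The following plain-Ruby sketch (no Rails; the structs and the user's name are hypothetical) builds that shape by hand, only to show what the nested payload looks like:

```ruby
require "json"

# Hypothetical stand-ins for the two models.
User = Struct.new(:id, :name)
Membership = Struct.new(:id, :account_id, :user, :membership_name)

user = User.new(2, "Sam")
memberships = [Membership.new(3, 1, user, "Annual Membership")]

# Build one hash per membership, nesting the associated user's attributes.
payload = memberships.map do |m|
  { id: m.id, account_id: m.account_id, user_id: m.user.id,
    membership_name: m.membership_name,
    user: m.user.to_h }
end

puts JSON.generate(payload)
```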

Rails order by calculation in associated table

I’m working with the below Rails Models.
Artist.rb
has_many :updates
Update.rb
belongs_to :artist
Updates has a popularity column (int 0-100)
I need to order artists by difference in popularity within the last 30 days. (last row - first row of updates in range)
I’ve made this work in controller by iterating over list of artists, calculate the difference in popularity, and save that value together with the artist id in a new array. Then sort that array by increase value and recreate the list of artists in the correct order. Issue is this causes a timeout error on my application as the iteration happens upon clicking “search”.
Method to calculate difference:
class Update < ApplicationRecord
belongs_to :artist
def self.pop_diff(a)
in_range = joins(:artist).where(artists: {name: a}).where(created_at: 30.days.ago.to_date..Time.now().to_date)
diff = in_range.last.popularity - in_range.first.popularity
return diff
end
end
Creating a new array in controller with correct ordering:
@artists = Artist.all
@ordering = Array.new
@artists.each do |a|
@ordering << {"artist" => a, "diff" => Update.pop_diff(a) }
end
@ordering = @ordering.sort_by { |k| k["diff"]}.reverse!
Does anyone know best practice on dealing with these types of situations?
These are the three paths I can think of:
Tweaking above solution to work more efficiently
Using a virtual column (attr_accessor) and storing the increase there. I’ve never done this before, not sure what’s possible
Build a back-end script that saves increase value in database on a daily base.
It would be most performant to do this in SQL
class Artist < ApplicationRecord
def self.get_popularity_extreme(direction = 'ASC', days_ago = 30)
<<-SQL
SELECT popularity
FROM updates
WHERE updates.created_at BETWEEN (DATEADD(DAY, -#{days_ago.to_i.abs}, NOW()), NOW())
ORDER BY updates.created_at #{direction.presence || 'ASC'}
LIMIT 1
SQL
end
def self.by_popularity_difference
joins(
<<-SQL
LEFT JOIN (
#{get_popularity_extreme}
) earliest_update ON updates.artist_id = artists.id
LEFT JOIN (
#{get_popularity_extreme('DESC')}
) latest_update ON updates.artist_id = artists.id
SQL
).
where('earliest_update.popularity IS NOT NULL').
where('latest_update.popularity IS NOT NULL').
select('artists.*', 'latest_update.popularity - earliest_update.popularity AS popularity_difference').
order('popularity_difference DESC')
end
end
Of course this is not the 'rails way'
The other option I would take would be to add a trigger to Update after_save to also set a column in the parent artist table
class Update < ApplicationRecord
belongs_to :artist
after_save :set_artist_difference
def self.pop_diff(a)
in_range = where(artist_id: a.id).where(created_at: 30.days.ago.to_date..Time.now().to_date).limit(1)
in_range.order(created_at: :desc).first.popularity - in_range.order(:created_at).first.popularity
end
def set_artist_difference
artist.update(difference: self.class.pop_diff(artist))
end
end
The downside to this is that if not every artist gets an update every day, the number won't be accurate.
If you are to continue using your current solution, you should specify the order; the explicit return is unnecessary; you shouldn't look up an artist you already have; and the join isn't needed (it's also just wrong, because you're passing the whole artist yet filtering on 'name'):
class Update < ApplicationRecord
belongs_to :artist
def self.pop_diff(a)
in_range = where(artist_id: a.id).where(created_at: 30.days.ago.to_date..Time.now().to_date).limit(1)
in_range.order(created_at: :desc).first.popularity - in_range.order(:created_at).first.popularity
end
end
Also, instead of sorting in the opposite direction and then reversing, sort by negative diff:
@artists = Artist.all
@ordering = Array.new
@artists.find_in_batches do |batch|
batch.each do |a|
@ordering << {"artist" => a, "diff" => Update.pop_diff(a) }
end
end
@ordering = @ordering.sort_by { |k| -(k["diff"])}
Well, this approach you took has a problem with slow performance, in part because of the many queries you execute in the DB. Here's a simple way to do that (or very close to):
artists = Artist.all
pops =
artists.
includes(:updates).
where('updates.created_at' => 30.days.ago..Time.zone.now).
pluck(:id, 'updates.popularity').
group_by {|g| g.first}.
flat_map do |id, list|
diffs = list.map(&:second).compact
{
artist: artists.find { |artist| artist.id == id},
pops: diffs.last - diffs.first
}
end
# => [{:artist=>#<Artist id: 1, name: "1", created_at: "2018-07-10 05:44:29", updated_at: "2018-07-10 05:44:29">, :pops=>[10, 11, 1]}, {:artist=>#<Artist id: 2, name: "2", created_at: "2018-07-10 05:44:32", updated_at: "2018-07-10 05:44:32">, :pops=>[]}, {:artist=>#<Artist id: 3, name: "3", created_at: "2018-07-10 05:44:34", updated_at: "2018-07-10 05:44:34">, :pops=>[]}]
Much much more performant! But notice this is still not the most performant way to do the job. Still, it is very quick (although a little bit algebraic - you can improve somewhat) and uses a lot of the ruby and rails tricks to achieve the result you're looking for. Hope it helps! =)
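The common idea in these answers is to fetch the updates in the date range once and compute each artist's difference from that single result set, rather than querying per artist. A minimal plain-Ruby sketch of that computation follows; the struct, dates, and popularity values are hypothetical sample data, not a real schema.

```ruby
require "date"

# Hypothetical stand-in for rows of the updates table.
UpdateRow = Struct.new(:artist_id, :popularity, :created_at)

today = Date.new(2018, 7, 10)
updates = [
  UpdateRow.new(1, 40, today - 29),
  UpdateRow.new(1, 55, today - 1),
  UpdateRow.new(2, 70, today - 20),
  UpdateRow.new(2, 60, today - 2),
  UpdateRow.new(1, 10, today - 45), # outside the 30-day window, ignored
]

window = (today - 30)..today

# One pass over the rows: filter by date, group per artist, then take
# latest popularity minus earliest popularity within the window.
diffs = updates
  .select { |u| window.cover?(u.created_at) }
  .group_by(&:artist_id)
  .map do |artist_id, rows|
    first, last = rows.minmax_by(&:created_at)
    [artist_id, last.popularity - first.popularity]
  end

# Largest increase first.
ranking = diffs.sort_by { |_, diff| -diff }
```

With this sample data, artist 1 comes first with a difference of 15 and artist 2 follows with -10.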

Can't access data in ActiveHash

I'm using the Gem active_hash https://github.com/zilkey/active_hash to create models for simple data that I don't want to create DB tables for.
For example, I have this model setup for FieldTypes:
class FieldType < ActiveHash::Base
self.data = [
{:id => 1, :name => "text", :friendly_name => "Text"},
{:id => 2, :name => "textarea", :friendly_ => "Text Area"},
{:id => 3, :name => "image", :friendly_ => "Image"},
]
end
And I'm trying to list these field types for a select:
def field_types_for_select
#FieldType.all.order('name asc').collect { |t| [t.friendly_name, t.name] }
FieldType.pluck(:friendly_name, :name)
end
But I get an error that order, collect or pluck are not defined.
How do I access this data? This works fine on other models, just not ActiveHash ones. According to the docs the model should work the same as ActiveRecord but I don't seem to be able to access it the same. FieldType.all works, but other methods do not.
Pluck isn't defined on ActiveHash::Base. It is defined on ActiveRecord::Relation::Calculations, and its purpose is to produce a SQL select for the columns you specify. You will not be able to get it to work with ActiveHash.
You can, however, define your own pluck on your FieldType model.
def self.pluck(*columns)
data.map { |row| row.values_at(*columns) }
end
Or query the data directly:
FieldType.data.map { |row| row.values_at(:friendly_name, :name) }
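The same approach can be tried without the gem. This plain-Ruby sketch uses a hypothetical constant and helper name, and assumes every row carries a :friendly_name key; it also reproduces the alphabetical ordering from the commented-out line in the question.

```ruby
# Hypothetical sample rows; every row is assumed to have :friendly_name.
FIELD_TYPES = [
  { id: 1, name: "text", friendly_name: "Text" },
  { id: 2, name: "textarea", friendly_name: "Text Area" },
  { id: 3, name: "image", friendly_name: "Image" },
].freeze

# Hypothetical helper: pick the given keys out of each row, in order.
def pluck_rows(rows, *columns)
  rows.map { |row| row.values_at(*columns) }
end

options = pluck_rows(FIELD_TYPES.sort_by { |row| row[:name] }, :friendly_name, :name)
puts options.inspect
# [["Image", "image"], ["Text", "text"], ["Text Area", "textarea"]]

# A row that lacks the requested key yields nil in that position.
incomplete = pluck_rows([{ name: "x", friendly_: "X" }], :friendly_name, :name)
puts incomplete.inspect # [[nil, "x"]]
```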

Multiple model single index approach - elasticsearch via tire

In my multi-tenant app (account based, with a number of users per account), how would I update the index for a particular account when a user document is changed?
Using Elasticsearch via Tire gem.
Rails 2.3 app - applied changes to enable support for Rails 2.3 as per loe/tire's commit
Account Model:
include Tire::Model::Search
Tire.index('account_1') do
create(
:mappings => {
:user => {
:properties => {
:name => { :type => :string, :boost => 10 },
:company_name => { :type => :string, :boost => 5 }
}
},
:comments => {
:properties => {
:description => { :type => :string, :boost => 5 }
}
}
}
)
end
As you can see above, there are two models here, user and comments. Is this the correct way to address a single index with multiple models?
In that case how do I update index when a user document or comment document alone is changed?
Usually when you are indexing a model, it is good to index its own attributes along with its associations. So in this case, if you want to index users and their comments, you should put the index on the user model and index the comments through the association, so that the Tire callbacks on the user model reindex the user object whenever any of its attributes change. This only applies to the model that holds the index.
To keep indexed associations up to date, you need hooks that reindex the account object after save/after destroy of the user/comments models. Or you could use the :touch => true option to touch the account model whenever a user or comment changes.
Example: if you want to index users and comments,
include Tire::Model::Search
include Tire::Model::Callbacks
mapping do
indexes :id, :type => 'integer', :index => :not_analyzed
indexes :about_me, :type => 'string', :index => :snowball
indexes :name, :type => 'string', :index => :whitespace
indexes :comments do
indexes :content, :type => 'string', :analyzer => 'snowball'
end
end
So here the index is on the user model and user.comments is an association. Hope this example explains it.
The answer to the question as posted by Tire owner Karmi is as follows:
Let's say we have an Account class and we deal in articles entities.
In that case, our Account class would have following:
class Account
#...
# Set index name based on account ID
#
def articles
Article.index_name "articles-#{self.id}"
Article
end
end
So, whenever we need to access articles for a particular account, either for searching or for indexing, we can simply do:
@account = Account.find( remember_token_or_something_like_that )
# Instead of `Article.search(...)`:
@account.articles.search { query { string 'something interesting' } }
# Instead of `Article.create(...)`:
@account.articles.create id: 'abc123', title: 'Another interesting article!', ...
Having a separate index per user/account works perfect in certain cases -- but definitely not well in cases where you'd have tens or hundreds of thousands of indices (or more). Having index aliases, with properly set up filters and routing, would perform much better in this case. We would slice the data not based on the tenant identity, but based on time.
Let's have a look at a second scenario, starting with a heavily simplified curl http://localhost:9200/_aliases?pretty output:
{
"articles_2012-07-02" : {
"aliases" : {
"articles_plan_pro" : {
}
}
},
"articles_2012-07-09" : {
"aliases" : {
"articles_current" : {
},
"articles_shared" : {
},
"articles_plan_basic" : {
},
"articles_plan_pro" : {
}
}
},
"articles_2012-07-16" : {
"aliases" : {
}
}
}
You can see that we have three indices, one per week. You can see there are two similar aliases: articles_plan_pro and articles_plan_basic -- obviously, accounts with the “pro” subscription can search two weeks back, but accounts with the “basic” subscription can search only this week.
Notice also that the articles_current alias points to, ehm, the current week (I'm writing this on Thu 2012-07-12). The index for next week is just there, lying and waiting -- when the time comes, a background job (cron, Resque worker, custom script, ...) will update the aliases. There's a nifty example with aliases in a “sliding window” scenario in the Tire integration test suite.
Let's not look at the articles_shared alias right now; let's look at what tricks we can play with this setup:
class Account
# ...
# Set index name based on account subscription
#
def articles
if plan_code = self.subscription && self.subscription.plan_code
Article.index_name "articles_plan_#{plan_code}"
else
Article.index_name "articles_shared"
end
return Article
end
end
Again, we're setting up an index_name for the Article class, which holds our documents. When the current account has a valid subscription, we get the plan_code out of the subscription, and direct searches for this account into relevant index: “basic” or “pro”.
If the account has no subscription -- it's probably a “visitor” type -- we direct the searches to the articles_shared alias. Using the interface is as simple as previously, e.g. in ArticlesController:
@account = Account.find( remember_token_or_something_like_that )
@articles = @account.articles.search { query { ... } }
# ...
We are not using the Article class as a gateway for indexing in this case; we have a separate indexing component, a Sinatra application serving as a light proxy to the elasticsearch Bulk API. It provides HTTP authentication and document validation (enforcing rules such as required properties or dates passed as UTC), and uses the bare Tire::Index#import and Tire::Index#store APIs.
These APIs talk to the articles_current index alias, which is periodically updated to the current week by said background process. In this way, we have decoupled all the logic for setting up index names into separate components of the application, so we don't need access to the Article or Account classes in the indexing proxy (it runs on a separate server), or in any other component of the application. Whichever component is indexing, indexes against the articles_current alias; whichever component is searching, searches against whatever alias or index makes sense for the particular component.
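As a rough illustration of what the background job has to decide, the plain-Ruby sketch below computes which weekly index each alias should point at for a given day. It assumes weeks start on Monday (the index dates in the listing above are Mondays) and uses hypothetical helper names; the actual alias update against elasticsearch is not shown.

```ruby
require "date"

# Hypothetical helper: name of the weekly index containing the given date,
# assuming each index is named after the Monday that starts its week.
def weekly_index_name(date)
  monday = date - ((date.wday - 1) % 7)
  "articles_#{monday.strftime('%Y-%m-%d')}"
end

# Hypothetical helper: the indices each alias should cover on a given day.
# "pro" accounts see two weeks back, "basic" accounts only the current week.
def alias_targets(today)
  current = weekly_index_name(today)
  previous = weekly_index_name(today - 7)
  {
    "articles_current" => [current],
    "articles_plan_basic" => [current],
    "articles_plan_pro" => [previous, current],
  }
end

targets = alias_targets(Date.new(2012, 7, 12))
puts targets.inspect
```

For Thu 2012-07-12 this yields articles_2012-07-09 as the current index and adds articles_2012-07-02 for the “pro” alias, which matches the alias listing shown earlier.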
