Efficient way to display a nested tree with mongoid and rails - ruby-on-rails

I've got a comment tree nested in the document, using mongoid embeds_many_recursively like this:
Document: {
  ...
  comments: [{
    ...
    updated_at,
    child_comments: [{
      ...
      updated_at,
      child_comments: [{...}, {...}],
      ...
    }, {...}],
    ...
  }],
  ...
}
What's the most efficient way of passing this to a view so that it's ordered by the first-level comments' updated_at attribute?
At the moment, I've come up with this inside the main document model:
def flatten_comments
  @flat_comments = []
  self.comments.order_by([[:updated_at, :desc]]).each do |comment|
    flatten_comments_iterator(comment)
  end
  return @flat_comments
end

def flatten_comments_iterator(comment)
  @flat_comments << comment
  comment.child_comments.each { |reply| flatten_comments_iterator(reply) }
end
and then just iterating in the view over the array.
The problems are:
1) In the recursive flattening, the order is lost somewhere, and I can't figure out where. Stepping through it on paper, it seems to add items in the needed order; it's probably something to do with variable scope and access.
2) I'm not sure it's the most efficient way to do a simple retrieval.
I'd be thankful for advice, and for any experience with how to handle these kinds of tasks efficiently.

There are basically 2 design approaches (one of them being the same as yours), which are documented in the Ruby driver's modeling examples. There is also a similar question on SO about it.
About the other issue: there's nothing wrong with recursion in general, as long as the comments aren't nested too deeply. However, your implementation is not thread-safe, because it uses an instance variable rather than a local variable. To deal with that, you should turn @flat_comments into a local variable and pass it as a parameter to the flatten_comments_iterator method.
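As a minimal sketch of that local-variable approach (Comment here is a plain Struct standing in for the Mongoid model, so the example is self-contained):

```ruby
# Struct stand-in for the Mongoid Comment document (an assumption here).
Comment = Struct.new(:updated_at, :child_comments)

# Sorts the top-level comments newest-first, then flattens each subtree with
# a local accumulator passed down the recursion instead of a shared
# instance variable.
def flatten_comments(comments)
  comments.sort_by(&:updated_at).reverse.flat_map do |comment|
    flatten_comment_tree(comment, [])
  end
end

def flatten_comment_tree(comment, acc)
  acc << comment
  comment.child_comments.each { |reply| flatten_comment_tree(reply, acc) }
  acc
end
```

Because the accumulator is local to each call chain, two threads flattening different documents can no longer interleave writes into the same variable.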
Tip: since any recursive method can be transformed into an iteration, what you may want to implement is an iterative preorder traversal of the comment tree.
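An iterative preorder traversal with an explicit stack might look like this (again using a Struct as a stand-in for the Mongoid model, which is an assumption for illustration):

```ruby
# Struct stand-in for the Mongoid Comment document.
Comment = Struct.new(:updated_at, :child_comments)

def flatten_comments_iteratively(comments)
  flat = []
  # Seed with the top-level comments, newest first; the front of the
  # array acts as the top of the stack.
  stack = comments.sort_by(&:updated_at).reverse
  until stack.empty?
    comment = stack.shift # pop the next comment in preorder
    flat << comment
    # Push the children on top so they are visited before later siblings.
    stack.unshift(*comment.child_comments)
  end
  flat
end
```

This avoids both the instance-variable sharing problem and any risk of deep recursion on heavily nested threads.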


.where statement to filter posts by date causing large number of database queries?

My first StackOverflow question, so pardon if it's a little rough around the edges. I recently began my first engineering job and have inherited some legacy Ruby on Rails code to work through.
My goal:
is to fetch posts (this is a model, though with no association to user) belonging to a user, as seen below. The posts should be filtered to only include those whose end_date is null or in the future.
The problem:
The ActiveRecord query @valid_posts ||= Posts.for_user(user).where('end_date > ? OR end_date IS ?', Time.now.in_time_zone, nil).pluck(:post_set_id) (some further context below) generates ~15 calls to my database per user per second when testing with Postman, causing significant memory spikes, notably with an increased number of posts. I would only expect (not sure about this) 2 at most (one to fetch posts for the user, a second to fetch posts that match the date constraint).
In the absence of the .where('end_date > ? OR end_date IS ?', Time.now.in_time_zone, nil) clause, there are no memory issues whatsoever. My question, essentially, is why this particular line causes so many queries to the database (which seems to be the cause of the memory spikes), and what would be an improved implementation.
My reasoning thus far:
My initial suspicion was that I was making an N+1 query, though I no longer believe this to be the case (I compared .select with .where in the query; no significant changes). A third option would possibly be to use .includes, though there is no association between a user and a post, and I do not believe it would be feasible to create one since, to my level of understanding, users are a function of an organization, not their own model.
My second thought is that because I am using a date that is precise to the millisecond, the time is ever-changing, and therefore the query runs against the posts table with an updated time on every call. Would it be possible to capture the current time in a variable and then pass that to the .where statement, rather than the varying time currently implemented? This would ultimately act as a sort of caching mechanism, if I am not mistaken.
My third thought was to add an index to end_date on the posts table for quicker lookup, though in itself, I do not believe this provides a solution.
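For illustration, the "capture the time once" idea from my second thought might look like this. PostSetFilter is the class from my code below; fetch_ids is a hypothetical stand-in for the real ActiveRecord call, which would look roughly like Posts.for_user(user).where('end_date > ? OR end_date IS NULL', now).pluck(:post_set_id):

```ruby
class PostSetFilter
  attr_reader :query_count

  def initialize(user:)
    @user = user
    @query_count = 0
  end

  def valid_posts
    # ||= memoizes: the stand-in query runs at most once per instance.
    @valid_posts ||= begin
      now = Time.now.utc # captured a single time, not re-evaluated per call
      fetch_ids(now)
    end
  end

  private

  # Hypothetical stand-in for the database call; it counts invocations so
  # the memoization is observable in this sketch.
  def fetch_ids(_now)
    @query_count += 1
    [:post_a, :post_b]
  end
end
```

Whether this actually reduces the query count would depend on how often the controller calls valid_posts per request, but it at least guarantees a single, stable timestamp per filter instance.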
Some basic context:
While there are many files working together, I have tried to overly-simplify them to essentially reflect the information that I believe is necessary to understand the issue at hand. If there is no identifiable cause for this issue, then perhaps I need to dig into other areas of code.
for_user is a user scope defined below:
user_scope
module UserScopable
  extend ActiveSupport::Concern
  ...
  scope(:for_user,
        lambda { |poster|
          for_user_scope(
            { user_id: poster.user_id, organization_id: poster.organization_id }
          )
        })

  scope(:for_user_scope, lambda { |hash|
    where(user_id: hash.fetch(:user_id), organization_id: hash.fetch(:organization_id))
  })
@valid_posts is contained within a module, PostSetFilter, and is called in the users controller:
users_controller
def post_ids
  post_pools = PostSetFilter.new(user: user)
  render json: { PostPools: post_pools }
end
Ultimately, there's a lot that I do not know, and there seem to be many possible approaches, so I'm not entirely sure how to proceed. Any guidance on how to reduce the number of queries, and any reasoning as to why this happens, would be greatly appreciated.
I am happy to provide further context if needed, though everything points to the aforementioned line as the culprit. Thank you in advance.

What is one way that I can reduce .includes association query?

I have an extremely slow query that looks like this:
people = includes({ project: [{ best_analysis: :main_language }, :logo] }, :name, name_fact: :primary_language)
.where(name_id: limit(limit).unclaimed_people(opts))
Look at the includes method call and notice that it is loading a huge number of associations. In the RailsSpeed book, there is the following quote:
“For example, consider this:
Car.all.includes(:drivers, { parts: :vendors }, :log_messages)
How many ActiveRecord objects might get instantiated here?
The answer is:
# Cars * ( avg # drivers/car + avg log messages/car + average parts/car * ( average parts/vendor) )
Each eager load increases the number of instantiated objects, and in turn slows down the query. If these objects aren't used, you're potentially slowing down the query unnecessarily. Note how nested eager loads (parts and vendors in the example above) can really increase the number of objects instantiated.
Be careful with nesting in your eager loads, and always test with production-like data to see if includes is really speeding up your overall performance.”
The book fails to mention what could be a good substitute for this, though. So my question is: what sort of technique could I substitute for includes?
Before I jump to the answer: I don't see you using any pagination or limit on the query; that alone may help quite a lot.
Unfortunately, there isn't really a direct substitute. And if you use all of the objects in a view, that's okay. There is one possible alternative to includes, though. It's quite complex, but it's still helpful sometimes: you join all the needed tables, select only the fields you actually use, alias them, and access them as a flat structure.
Something like
(NOTE: it uses Arel helpers. You need to include ArelHelpers::ArelTable in the models where you use syntax like NameFact[:id].)
relation.joins(name_fact: :primary_language).select(
  NameFact[:id].as('name_fact_id'),
  PrimaryLanguage[:language].as('primary_language')
)
I'm not sure it will work for your case, but that's the only alternative I know.
I have an extremely slow query that looks like this
There are a couple of potential causes:
Too many unnecessary objects fetched and created. From your comment, it looks like that is not the case and you need all the data being fetched.
DB indexes not optimised. Check the time taken by the query. EXPLAIN the generated query (check the logs to get the query, or use .to_sql) and make sure it is not doing a table scan or other costly operations.

Hash/Array to Active Record

I have been searching everywhere but I can't seem to find this problem anywhere. In Rails 5.0.0.beta3 I need to sort my @record = user.records with an association and its records.
The sort goes something like this.
@record = @record.sort_by { |rec|
  If user.fav_record.find(rec.id)
    User.fav_record(rec.id).created_at
  Else
    rec.created_at
  End
This is just an example of what I do. But everything sorts fine.
The problem:
This returns an array and not an Active Record Class.
I've tried everything to get this to return an Active Record class. I've pushed the sorted elements into an ID array and tried to extract them in that order; I've tried mapping. Every approach turns my previous active record into an array or hash. Now I need it to go back into an active record. Does anyone know how to convert an array or hash of records back into an Active Record class?
There isn't an easy way to convert an array back into an ActiveRecord relation.
If you want to optimize the performance of your app, you should try to avoid converting arrays to ActiveRecord queries. Try and keep the object as a query as long as possible.
That being said, working with arrays is generally easier than queries, and it can feel like a hassle to convert a lot of array operations to ActiveRecord query (SQL) code.
It'd be better to write the sort code using ActiveRecord::Query methods or even writing it in plain SQL using find_by_sql.
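For example, one SQL-side version of the sort pushes the fallback logic into the database with a LEFT JOIN and COALESCE. The table and column names here (records, fav_records, record_id) are assumptions, since the actual schema isn't shown:

```ruby
# Builds SQL for a database-side sort: LEFT JOIN the favourites and order by
# the favourite's created_at when one exists, falling back to the record's own.
def sorted_records_sql
  <<~SQL
    SELECT records.*
    FROM records
    LEFT JOIN fav_records ON fav_records.record_id = records.id
    ORDER BY COALESCE(fav_records.created_at, records.created_at)
  SQL
end
```

Running this through Record.find_by_sql(sorted_records_sql) would return an array of ActiveRecord model instances already sorted by the database, so no Ruby-side sort_by (and no array/relation conversion) is needed.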
I don't know what code you should specifically use here, but I do see that your code could be refactored to be clearer. First of all, If and Else should not be capitalized, but I'm assuming that this is just pseudocode and you already realize this. Second, your variable names should be pluralized if they are queries or arrays (i.e. @record.sort_by should be @records.sort_by instead).
It's worth mentioning that ActiveRecord queries are difficult to master and a lot of people just use array operations instead since they're easier to write. If "premature optimization is the root of all evil", it's really not the end of the world if you sacrifice a bit of performance and just keep your array implementation if you're just trying to make an initial prototype. Just make sure that you're not making "n+1" SQL calls, i.e. do not make a database call every iteration of your loop.
Here's an example of an array implementation which avoids the N+1 SQL issue:
# first load all the user's favorites into memory
user_fav_records = user.fav_records.select(:id, :created_at)
#records = #records.sort_by do |record|
matching_rec = user.fav_records.find { |x| x.id.eql?(rec.id) }
# This is using Array#find, not the ActiveRecord method
if matching_rec
matching_rec.created_at
else
rec.created_at
end
end
The main difference between this implementation and the code in your question is that I'm avoiding calling ActiveRecord's find each iteration of the loop. SQL read/writes are computationally expensive, and you want your code to make as little of them as possible.

Best way to implement a large 2D Lookup Table in Rails

I am developing a Rails app and one of my actions compares two of the same kind of objects and returns a decimal value between 0 and 1. There are roughly 800 objects that need to be compared, thus there are roughly 800*800 possible decimal values that can be returned. Each action call requires about 300 or so comparisons, which are made via an API.
Because of the number of API calls that are needed, I have decided that the best approach is to make a lookup table with all 800*800 API comparison values stored locally, to avoid having to rely on the API, which has call limits and a significant overhead per call.
Basically I have decided that a lookup table best suits this task (although I am open for suggestions on this too).
My question is this: what is the best way to implement a two-dimensional lookup table with ~800 "rows" and ~800 "columns" in Rails? For example, if I wanted to compare objects 754 and 348, would it be best to create models for the rows and columns and access the decimal comparison like:
object1.754.object2.348 # => 0.8738
Or should I store all of the values in a CSV or something like this? If this is the better approach, how should I even approach setting this up? I am relatively new to the rails world so apologies if an obvious answer is dangling in front of me!
Bear in mind that the entire point of this approach was to avoid the overheads from API calls and thus avoid large waiting times for the end user, so I am looking for the most time-efficient way to approach this task!
I would consider a hash of hashes, so you retrieve the values with:
my_hash[754][348]
=> 0.8738
If you might not have already loaded the value for a particular combination, then you'd want to be careful and use:
my_hash[754].try(:[], 348)
There could be some subtleties in the implementation, to do with loading the hash, that make it beneficial to use the hashie gem:
https://rubygems.org/gems/hashie/versions/3.4.2
If you want to persist the values, they can be written into a database using serialize, and you can also extend the method to provide expiry dates on the values if you wish.
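As a minimal sketch of the hash-of-hashes idea (the keys and values are just the illustrative ones from above), a default proc sidesteps the nil-row problem without needing try:

```ruby
# Nested hash with a default proc: a missing outer key yields a fresh inner
# hash, so lookups on unknown rows return nil instead of raising NoMethodError.
lookup = Hash.new { |hash, key| hash[key] = {} }
lookup[754][348] = 0.8738

lookup[754][348]      # => 0.8738
lookup[999][123]      # => nil (no error)
lookup.dig(754, 348)  # => 0.8738
```

Hash#dig offers the same nil-safety for reads if you'd rather not install a default proc on the table.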
I had a similar problem comparing the contents of a large collection of ebooks. I stored the already-compared results in a matrix that I serialise with Marshal; the lookup key is a two-element array of the MD5 values of the file paths.
Here is the Matrix class I created for this task.
require 'digest/md5'

class Matrix
  attr_accessor :path, :store

  def initialize(path)
    @path = path
    @store = File.open(@path, 'rb') { |f| Marshal.load(f.read) } rescue Hash.new(nil)
  end

  def save
    File.open(@path, 'wb') { |f| f.write(Marshal.dump(@store)) }
    self
  end

  def add(file1, file2, value)
    @store[[Digest::MD5.hexdigest(file1), Digest::MD5.hexdigest(file2)]] = value
  end

  def has?(file1, file2)
    !@store[[Digest::MD5.hexdigest(file1), Digest::MD5.hexdigest(file2)]].nil?
  end

  def value(file1, file2)
    @store[[Digest::MD5.hexdigest(file1), Digest::MD5.hexdigest(file2)]]
  end

  def each(&blk)
    @store.each(&blk)
  end
end

Thinking Sphinx Application Wide Search and Dealing with Results

The use case is this:
I'd like to let my user search from a single text box, then on the search results page organize the results by class, essentially.
So for example, say I have the following models configured for Thinking Sphinx: Post, Comment and User. (In my situation I have about 10 models, but for clarity on StackOverflow I'm pretending there are only 3.)
When I do a search similar to ThinkingSphinx.search 'search term', :classes => [Post, Comment, User], I'm not sure of the best way to iterate through the results and build out the sections of my page.
My first inclination is to do something like:
Execute the search
Iterate over the returned result set and do a result.is_a?(ClassType)
Based on the ClassType, add the item to 1 of 3 arrays -- @match_posts, @matching_comments, or @matching_users
Pass those 3 instance variables down to my view
Is there a better or more efficient way to do this?
Thank you!
I think it comes down to what's useful for people using your website. Does it make sense to have the same query run across all models? Then ThinkingSphinx.search is probably best, especially from a performance perspective.
That said, do you want to group search results by their respective classes? Then some sorting is necessary. Or are you separating each class's results, like a GitHub search? Then having separate collections may be worthwhile, like what you've already thought of.
At the most basic level, you could just return everything sorted by relevance instead of class, and then render slightly different output depending on each result's type. A case statement may help with this; it's best to keep as much of the logic as possible in helpers, and/or possibly partials.
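The class-based grouping can be sketched with Enumerable#group_by in a single pass. The Structs below are stand-ins for real Thinking Sphinx results, which would be instances of the indexed models:

```ruby
# Stand-ins for the indexed models; in the real app these come back from
# ThinkingSphinx.search already sorted by relevance.
Post    = Struct.new(:title)
Comment = Struct.new(:body)
User    = Struct.new(:name)

results = [Post.new('a'), User.new('u'), Comment.new('c'), Post.new('b')]

# One pass instead of repeated result.is_a? checks; order within each
# group is preserved.
grouped = results.group_by(&:class)

matching_posts    = grouped.fetch(Post, [])
matching_comments = grouped.fetch(Comment, [])
matching_users    = grouped.fetch(User, [])
```

The three collections can then be passed to the view as instance variables, matching the approach outlined in the question.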
If you have only 3 models to search from, why not use Model.search instead of ThinkingSphinx.search? That would remove the need for the result.is_a? checks, and make it easier to control how you display results for each model.
