How is this method memory intense? - ruby-on-rails

I found that the following method is a HEAVY memory user on Ruby 1.8.7 and returns absolutely no results (when there should be lots). The same method works like a charm on Ruby 1.9.2, returning all the wanted results while consuming almost no memory. I guess that's because a local variable has the same name as the containing method, but does anyone have a clear answer for that?
def contact_of
contact_of = Circle.joins(:ties).where('ties.contact_id' => self.guid).map { |circle| circle.owner } || []
return contact_of.uniq!
end
By the way, I'm running Rails 3.1.1.
Thanks!
UPDATE: Part of the question is erroneous. No contacts were returned when there should have been some because I misunderstood 'uniq!' versus 'uniq'. 'uniq!' returns 'nil' when no duplicates are found.
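The difference between the two methods can be seen in a minimal sketch on plain arrays (the variable names are only for illustration and do not come from the original code):

```ruby
# uniq always returns a new array.
# uniq! mutates the receiver and returns nil when nothing was removed.
no_dups   = [1, 2, 3]
with_dups = [1, 2, 2, 3]

plain_result     = no_dups.uniq          # new array, same elements
bang_on_no_dups  = no_dups.dup.uniq!     # nil, because nothing was removed
bang_on_dups     = with_dups.dup.uniq!   # the deduplicated array

puts plain_result.inspect     # [1, 2, 3]
puts bang_on_no_dups.inspect  # nil
puts bang_on_dups.inspect     # [1, 2, 3]
```

So a method ending in `return contact_of.uniq!` returns nil whenever the list happens to contain no duplicates; `uniq` avoids that.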
Still trying to figure out the memory problem...

Yeah, contact_of.uniq! would make a recursive call to the same function. I'm surprised it works in Ruby 1.9, actually.
Also, your DB query is terrible, because it retrieves a lot of unnecessary records and then does further select logic on the Ruby side. You probably want to start the find from Owner, not Circle.

Related

Clone a mongodb collection from within Rails Mongoid

I am trying to implement this solution in rails, using the collection aggregate method, to clone an entire collection within the same database.
In mongo shell, this works perfectly, and a cloned collection is created successfully:
db.source_collection.aggregate([ { $match: {} }, { $out: "target_collection" } ])
The rails-mongoid alternate, according to my research, should be this, which runs without errors:
SourceCollection.collection.aggregate({"$match" => {}, "$out" => "target_collection"})
#<Mongo::Collection::View::Aggregation:0x000000055bced0 @view=#<Mongo::Collection::View:0x44951600 namespace='DB_dev.source_collection' @filter={} @options={}>, @pipeline={"$match"=>{}, "$out"=>"target_collection"}, @options={}>
I also tried with an array
SourceCollection.collection.aggregate([{"$match" => {}}, {"$out" => "target_collection"}])
#<Mongo::Collection::View::Aggregation:0x000000054936d0 @view=#<Mongo::Collection::View:0x44342320 namespace='DB_dev.source_collection' @filter={} @options={}>, @pipeline=[{"$match"=>{}}, {"$out"=>"target_collection"}], @options={}>
UPDATE
This simpler syntax also works in the Mongo console:
db.source_collection.aggregate( { $out: "target_collection" } )
But the equivalent syntax does not seem to work in Ruby:
SourceCollection.collection.aggregate({"$out" => "target_collection"})
Unfortunately, although there are no errors, the collection is not created.
Any clues as to the way I can make this happen?
Mongo gem version 2.5.3
Update2
Apparently $out is not taken into account in the pipeline, which renders the aggregation invalid.
This can be fixed with code... I am looking for a module/class/method override, since going through the MongoDB issue tracker for a change request might not be as quick.
UPDATE - FINAL
This issue has been solved with the help of Thomas R. Koll (thank you).
I am adding an update with the response I got from MongoDB's ticketing service, which pretty much describes Thomas's solution.
The reason you're not seeing the results without count is that the
aggregate method returns a lazy cursor; that is, the query does not
execute until the return value of aggregate is iterated over.
Calling count is one way to do this. This is the same behavior
you'll see if you call find or if you call aggregate without
specifying $out; the difference is that $out has a side effect
beyond just returning the results, so it's more obvious when exactly
it occurs.
Found the solution, and I have to explain a few things:
This returns a Mongo::Collection::View::Aggregation object; it won't send a query to the database:
User.collection.aggregate({"$out": "target_collection"})
The query is sent to the server only when you call a method like count or to_a on the aggregation object. If you pass a hash you'll get an error, so the pipeline has to be an array of hashes:
User.collection.aggregate([{"$out": "target_collection"}]).count
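The lazy behaviour described above can be illustrated without a database. The following is a toy sketch only: ToyAggregation is a hypothetical stand-in written for this example, not the Mongo driver's class, and the block passed to it merely simulates the side effect of $out.

```ruby
# Toy stand-in for a lazy cursor. NOT the Mongo driver's implementation.
class ToyAggregation
  include Enumerable

  attr_reader :executed

  def initialize(pipeline, &run)
    @pipeline = pipeline
    @run = run            # hypothetical callback that "talks to the server"
    @executed = false
  end

  # Enumerable methods such as count and to_a all go through each,
  # so this is the only place where the pipeline actually runs.
  def each(&block)
    @executed = true
    @run.call(@pipeline).each(&block)
  end
end

written = []
agg = ToyAggregation.new([{ "$out" => "target_collection" }]) do |pipeline|
  written << pipeline.last["$out"]   # simulated side effect of $out
  []                                 # simulated empty result set
end

before = written.dup   # building the object has not run anything yet
agg.count              # iterating forces execution
after = written.dup    # the simulated side effect has now happened
```

Building the object alone leaves `written` empty; only the call to `count` triggers the simulated write, which mirrors why the real aggregation needs `count` or `to_a` before the target collection appears.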

Independent ActiveRecord query inside ActiveRecord::Relation context

Here is some Ruby on Rails code:
class User < ActiveRecord::Base
  def self.all_users_count
    User.all.count
  end
end
User.all_users_count
returns, for example, 100.
User.limit(5).all_users_count
Now it returns 5 because of the ActiveRecord::Relation context, even though I wrote the class name explicitly (User.all) instead of a bare all.
(.to_sql shows that the query always contains the limit, a where on id, or whatever else the surrounding relation added.)
So, how can I make context-independent AR queries inside model methods, such as User.all and others?
Thank you!
P.S. Or maybe my code has an error, and User.all inside any method and context should in fact always return the correct row count for this model's table.
This is very weird and unexpected (unfortunately I can't confirm it, because my computer crashed and I have no Rails projects at hand).
I would expect
User.all
to create a new scope (or as you call it - context)
Try working around this with
User.unscoped.all
Edit:
I tried it out on my project and on clean rails repo, and the results are consistent.
And after thinking a bit, this may not even be an issue. I think your approach could be faulty.
In what scenario would you chain User.limit(2).all_users_count? I can't think of any. Either you need the count of all users, and you call User.all_users_count (or just User.count)
... or you need something else and you call User.limit(2).where(...). There's no point in calling all_users_count in that chain, is there?
And, when you think of it, it makes sense. Imagine you had some different method like count_retired, what would you expect from such call:
User.limit(2).count_retired ?
The number of retired users not bigger than 2, or the number of all retired users in the system? I would expect the former.
So I think there are two possibilities here:
either you implemented it wrong and should do it in a different way (as described above in the edit section)
or you have some more complex issue, but you boiled your examples down to a point where they don't make much sense anymore (please follow up with another question if you please, and please, ping me in the comment with a link if you do, because it sounds interesting)

Should PG::Result#clear be called after you've executed raw SQL?

When you use ActiveRecord::Base.connection.execute(sql_string), should you call clear on the result in order to free memory?
At 19:09 in this podcast, the speaker (a Rails committer who has done a lot of work on Active Record) says that if we use ActiveRecord::Base.connection.execute, we should call clear on the result, or we should use the method ActiveRecord::Base.connection.execute_and_clear, which takes a block.
(He’s a bit unclear on the method names. The method for the MySQL adapter is free and the method for the Postgres adapter is clear. He also mentions release, but that method doesn't exist.)
My understanding is that he's saying we should change
result = ActiveRecord::Base.connection.execute(sql_string).to_a
process_result(result)
to
ActiveRecord::Base.connection.execute_and_clear(sql_string, "SCHEMA", []) do |result|
process_result(result)
end
or
result = ActiveRecord::Base.connection.execute(sql_string)
process_result(result)
result.clear
That podcast was the only place I've heard this claim, and I couldn't find any other information about it. The Rails app I'm working on uses execute without clear in a number of instances, and we don't know of any problems caused by it. Are there certain circumstances under which failing to call clear is more likely to cause memory problems?
It depends on the adapter. Keep in mind that Rails doesn't control the object that is returned by execute. If you're using PostgreSQL, you'll get back a PG::Result, and using the mysql2 adapter, you'll get back a Mysql2::Result.
For PG (documented here), you need to call clear unless autoclear? returns true or you'll get a memory leak. You may also want to call clear manually if you've got a large enough result set to ensure it doesn't cause memory issues before it gets cleaned up.
Mysql2 doesn't appear to expose its free through the Ruby API, and appears to always clean itself up during GC.
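The execute-then-clear pattern can be wrapped so that the result is released even when processing raises. This is a minimal sketch: with_cleared_result is a hypothetical helper, and FakeResult is a stand-in for PG::Result so the example runs without a database.

```ruby
# Hypothetical helper: yields the result, then clears it even if the block raises.
def with_cleared_result(result)
  yield result
ensure
  # PG::Result responds to clear; the guard skips results that do not.
  result.clear if result.respond_to?(:clear)
end

# Stand-in for PG::Result, used only to make this sketch runnable.
class FakeResult
  attr_reader :cleared

  def initialize(rows)
    @rows = rows
    @cleared = false
  end

  def to_a
    @rows
  end

  def clear
    @cleared = true
  end
end

fake = FakeResult.new([{ "id" => 1 }])
rows = with_cleared_result(fake) { |result| result.to_a }
```

In an application the argument would be the object returned by ActiveRecord::Base.connection.execute(sql_string), and the block should copy out whatever it needs (for example with to_a) before the result is cleared.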

Mongoid identity_map and memory usage, memory leaks

When I execute this query:
Mymodel.all.each do |model|
# ..do something
end
It uses a lot of memory; the amount of used memory keeps increasing, and in the end it crashes. I found out that to fix it I need to disable identity_map, but when I add identity_map_enabled: false to my mongoid.yml file I get this error:
Invalid configuration option: identity_map_enabled.
Summary:
A invalid configuration option was provided in your mongoid.yml, or a typo is potentially present. The valid configuration options are: :include_root_in_json, :include_type_for_serialization, :preload_models, :raise_not_found_error, :scope_overwrite_exception, :duplicate_fields_exception, :use_activesupport_time_zone, :use_utc.
Resolution:
Remove the invalid option or fix the typo. If you were expecting the option to be there, please consult the following page with repect to Mongoid's configuration:
I am using Rails 4 and Mongoid 4, Mymodel.all.count => 3202400
How can I fix it, or does someone know another way to reduce the amount of memory used while executing .all.each?
Thank you very much for the help!!!!
I started with something just like yours, looping through millions of records while the memory just kept increasing.
Original code:
@portal.listings.each do |listing|
  listing.do_something
end
I went through many forum answers and tried them out.
1st attempt: I tried a combination of WeakRef and GC.start, but no luck.
2nd attempt: Adding listing = nil to the first attempt still failed.
Successful attempt:
@start_date = 10.years.ago
@end_date = 1.day.ago
while @start_date < @end_date
  @portal.listings.where(created_at: @start_date..@start_date.next_month).each do |listing|
    listing.do_something
  end
  @start_date = @start_date.next_month
end
Conclusion
The memory allocated for the records is never released while a single
query is being processed. Therefore, fetching a small number of records
per query does the job, and memory stays in good condition since it is
released after each query.
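The month-by-month windowing can be sketched in plain Ruby, independent of any ORM. each_month_window is a hypothetical helper introduced for this example. Unlike the loop above, it hands out half-open windows, on the assumption that a record created exactly on a boundary should be processed only once.

```ruby
require "date"

# Hypothetical helper: yields [window_start, window_end) pairs at most one month wide.
def each_month_window(start_date, end_date)
  return enum_for(:each_month_window, start_date, end_date) unless block_given?

  cursor = start_date
  while cursor < end_date
    # Never run past the overall end date.
    window_end = [cursor.next_month, end_date].min
    yield cursor, window_end
    cursor = window_end
  end
end

windows = each_month_window(Date.new(2020, 1, 1), Date.new(2020, 4, 1)).to_a
windows.each { |from, to| puts "#{from} up to (not including) #{to}" }
```

Each pair could then be used as an exclusive range, for example where(created_at: from...to), so that only one month of records is loaded at a time.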
Your problem isn't the identity map; I don't think Mongoid 4 even has an identity map built in, hence the configuration error when you try to turn it off. Your problem is that you're using all. When you do this:
Mymodel.all.each
Mongoid will attempt to instantiate every single document in the db.mymodels collection as a Mymodel instance before it starts iterating. You say that you have about 3.2 million documents in the collection, that means that Mongoid will try to create 3.2 million model instances before it tries to iterate. Presumably you don't have enough memory to handle that many objects.
Your Mymodel.all.count works fine because that just sends a simple count call into the database and returns a number, it won't instantiate any models at all.
The solution is to not use all (and preferably forget that it exists). Depending on what "do something" does, you could:
Page through all the models so that you're only working with a reasonable number of them at a time.
Push the logic into the database using mapReduce or the aggregation framework.
Whenever you're working with real data (i.e. something other than a trivially small database), you should push as much work as possible into the database because databases are built to manage and manipulate big piles of data.
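The paging option can be sketched as follows. each_in_pages is a hypothetical helper, and the fetch callable is an assumption made for this example: here it slices an in-memory array, whereas with Mongoid it might wrap something like Mymodel.skip(offset).limit(limit).to_a.

```ruby
# Hypothetical helper: walks a data set one page at a time.
# fetch_page is any callable taking (offset, limit) and returning an array.
def each_in_pages(fetch_page, page_size: 1000)
  offset = 0
  loop do
    page = fetch_page.call(offset, page_size)
    break if page.empty?

    page.each { |record| yield record }
    offset += page.size
  end
end

# In-memory stand-in for a collection, so the sketch runs without a database.
data = (1..10).to_a
page_sizes = []
fetch = lambda do |offset, limit|
  page = data[offset, limit] || []
  page_sizes << page.size   # record how many items were held at once
  page
end

seen = []
each_in_pages(fetch, page_size: 4) { |record| seen << record }
```

Only one page of records is referenced at a time, so earlier pages can be garbage collected. Offset-based paging tends to slow down as the offset grows, so paging on an indexed field may be a better fit for millions of documents.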

How to get result of the previous action

Hi. Inside the Rails console you can get the result of the previous operation with _. Is there any way to do such a thing inside a Ruby program?
Everything in Ruby is an object, so think about it: if a returned object is not assigned to a reference, it becomes eligible for garbage collection. So no, there is no way other than assigning the returned object to a variable!
You can do this with irb (and programs that improve on irb) - it's not specific to Rails. But apart from that, I'm not aware of being able to do what you want.
