When you use ActiveRecord::Base.connection.execute(sql_string), should you call clear on the result in order to free memory?
At 19:09 in this podcast, the speaker (a Rails committer who has done a lot of work on Active Record) says that if we use ActiveRecord::Base.connection.execute, we should call clear on the result, or we should use the method ActiveRecord::Base.connection.execute_and_clear, which takes a block.
(He’s a bit unclear on the method names. The method for the MySQL adapter is free and the method for the Postgres adapter is clear. He also mentions release, but that method doesn't exist.)
My understanding is that he's saying we should change
result = ActiveRecord::Base.connection.execute(sql_string).to_a
process_result(result)
to
ActiveRecord::Base.connection.execute_and_clear(sql_string, "SCHEMA", []) do |result|
process_result(result)
end
or
result = ActiveRecord::Base.connection.execute(sql_string)
process_result(result)
result.clear
That podcast was the only place I've heard this claim, and I couldn't find any other information about it. The Rails app I'm working on uses execute without clear in a number of instances, and we don't know of any problems caused by it. Are there certain circumstances under which failing to call clear is more likely to cause memory problems?
It depends on the adapter. Keep in mind that Rails doesn't control the object that is returned by execute. If you're using PostgreSQL, you'll get back a PG::Result, and using the mysql2 adapter, you'll get back a Mysql2::Result.
For PG (documented here), you need to call clear unless autoclear? returns true or you'll get a memory leak. You may also want to call clear manually if you've got a large enough result set to ensure it doesn't cause memory issues before it gets cleaned up.
Mysql2 doesn't appear to expose its free through the Ruby API, and appears to always clean itself up during GC.
Related
I'm working on a rails based web backend, and I've ran into a bit of an issue. I'm building a crypto trading based application, which relies on knowing the exact current price of many cryptos/stocks. To do this I seem to need a websocket to update certain data, however I can't figure out how to store this data. I need to be able to write to it on every websocket update, as well as read from it when sending out data to the front end. Both of these actions seem too fast to rely on my database, so I'm wondering if there is a better option. My idea was to use a class with a class method that is set on server startup. Then read from/write to that method when needed. The class looks something like this
class CryptoSocket
def self.start
##cryptos = {
BTC: 0,
ETH: 0,
DOGE: 0
}
end
def self.value(symbol)
##cryptos[symbol]
end
end
Inside the start method is a websocket which gets opened, and on message writes to ##cryptos with the updated value of the coin. I call CryptoSocket.start when the server boots up
To get the value for a symbol I can just call CryptoSocket.value(symbol) anywhere in my app. This seemed like it was working, however I've noticed sometimes it fails telling me NameError: uninitialized class variable ##cryptos in CryptoSocket
It seems like the issue is running reload! in the rails console, or entering a binding.pry then exiting. My guess is some garbage collection is happening, but overall it's something I'd like to avoid.
Does anyone have a suggestion for a better way to go about setting this class up? Does rails have a better way to persist an object in memory? It's fine to lose it when the server shuts down, but I would like to keep access to it when the server stays up
When I executing query
Mymodel.all.each do |model|
# ..do something
end
It uses allot of memory and amount of used memory increases at all the time and at the and it crashes. I found out that to fix it I need to disable identity_map but when I adding to my mongoid.yml file identity_map_enabled: false I am getting error
Invalid configuration option: identity_map_enabled.
Summary:
A invalid configuration option was provided in your mongoid.yml, or a typo is potentially present. The valid configuration options are: :include_root_in_json, :include_type_for_serialization, :preload_models, :raise_not_found_error, :scope_overwrite_exception, :duplicate_fields_exception, :use_activesupport_time_zone, :use_utc.
Resolution:
Remove the invalid option or fix the typo. If you were expecting the option to be there, please consult the following page with repect to Mongoid's configuration:
I am using Rails 4 and Mongoid 4, Mymodel.all.count => 3202400
How can I fix it or maybe some one know other way to reduce amount of memory used during executing query .all.each ..?
Thank you very much for the help!!!!
I started with something just like you by doing loop through millions of record and the memory just keep increasing.
Original code:
#portal.listings.each do |listing|
listing.do_something
end
I've gone through many forum answers and I tried them out.
1st attempt: I try to use the combination of WeakRef and GC.start but no luck, I fail.
2nd attempt: Adding listing = nil to the first attempt, and still fail.
Success Attempt:
#start_date = 10.years.ago
#end_date = 1.day.ago
while #start_date < #end_date
#portal.listings.where(created_at: #start_date..#start_date.next_month).each do |listing|
listing.do_something
end
#start_date = #start_date.next_month
end
Conclusion
All the memory allocated for the record will never be released during
the query request. Therefore, trying with small number of record every
request does the job, and memory is in good condition since it will be
released after each request.
Your problem isn't the identity map, I don't think Mongoid4 even has an identity map built in, hence the configuration error when you try to turn it off. Your problem is that you're using all. When you do this:
Mymodel.all.each
Mongoid will attempt to instantiate every single document in the db.mymodels collection as a Mymodel instance before it starts iterating. You say that you have about 3.2 million documents in the collection, that means that Mongoid will try to create 3.2 million model instances before it tries to iterate. Presumably you don't have enough memory to handle that many objects.
Your Mymodel.all.count works fine because that just sends a simple count call into the database and returns a number, it won't instantiate any models at all.
The solution is to not use all (and preferably forget that it exists). Depending on what "do something" does, you could:
Page through all the models so that you're only working with a reasonable number of them at a time.
Push the logic into the database using mapReduce or the aggregation framework.
Whenever you're working with real data (i.e. something other than a trivially small database), you should push as much work as possible into the database because databases are built to manage and manipulate big piles of data.
I have the following code in a rails model:
foo = Food.find(...)
foo.with_lock do
if bar = foo.bars.find_by_stuff(stuff)
# do something with bar
else
bar = foo.bars.create!
# do something with bar
end
end
The goal is to make sure that a Bar of the type being created is not being created twice.
Testing with_lock works at the console confirms my expectations. However, in production, it seems that in either some or all cases the lock is not working as expected, and the redundant Bar is being attempted -- so, the with_lock doesn't (always?) result in the code waiting for its turn.
What could be happening here?
update
so sorry to everyone who was saying "locking foo won't help you"!! my example initially didin't have the bar lookup. this is fixed now.
You're confused about what with_lock does. From the fine manual:
with_lock(lock = true)
Wraps the passed block in a transaction, locking the object before yielding. You pass can the SQL locking clause as argument (see lock!).
If you check what with_lock does internally, you'll see that it is little more than a thin wrapper around lock!:
lock!(lock = true)
Obtain a row lock on this record. Reloads the record to obtain the requested lock.
So with_lock is simply doing a row lock and locking foo's row.
Don't bother with all this locking nonsense. The only sane way to handle this sort of situation is to use a unique constraint in the database, no one but the database can ensure uniqueness unless you want to do absurd things like locking whole tables; then just go ahead and blindly try your INSERT or UPDATE and trap and ignore the exception that will be raised when the unique constraint is violated.
The correct way to handle this situation is actually right in the Rails docs:
http://apidock.com/rails/v4.0.2/ActiveRecord/Relation/find_or_create_by
begin
CreditAccount.find_or_create_by(user_id: user.id)
rescue ActiveRecord::RecordNotUnique
retry
end
("find_or_create_by" is not atomic, its actually a find and then a create. So replace that with your find and then create. The docs on this page describe this case exactly.)
Why don't you use a unique constraint? It's made for uniqueness
A reason why a lock wouldn't be working in a Rails app in query cache.
If you try to obtain an exclusive lock on the same row multiple times in a single request, query cached kicks in so subsequent locking queries never reach the DB itself.
The issue has been reported on Github.
I'm accessing Solr in a Ruby on Rails application by using rsolr (not Sunspot). I create the local solr object that I use to send requests like this:
solr = RSolr.connect(:url => "http://localhost:8983/solr")
as far as I understand, this is not really a connection but just an object that will issue requests on demand, so it shouldn't be expensive to keep it initialized and it should never disconnect. According to that, it should be ok to have one global solr object, create it at start time and forget about it. Right? But maybe it's not thread safe?
When should I create the solr connection?
All that the RSolr.connect method really does is sanitize and save the options that you're using. You can see that method here. It's passed a new connection object (which, notably, doesn't have an initialize method, so it's not doing anything when created) and the options that you pass to RSolr.connect.
So yes, you're right -- no harm at all in connecting once and leaving it connected forever hanging around in a variable somewhere. (For example, I memoize the result of RSolr.connect in my Solr/Rails app.)
I found the following method to be a HEAVY memory user on Ruby 1.8.7 and return absolutely no results (when there should be lots). The method also works like a charm on Ruby 1.9.2, returning all the wanted results while consuming no memory at all (or so!). I guess that's because a local variable has the same name as the containing method, but anyone have a clear answer for that?
def contact_of
contact_of = Circle.joins(:ties).where('ties.contact_id' => self.guid).map { |circle| circle.owner } || []
return contact_of.uniq!
end
By the way, I'm running Rails 3.1.1.
Thanks!
UPDATE : There's a part of the question that is erroneous. The fact that no contacts are returned when there should be is my misunderstading of 'uniq!' instead of 'uniq'. The first one does return 'nil' when no duplicates are found.
Still trying to figure out the memory problem...
Yeah, contact_of.uniq! would make a recursive call to the same function. I'm surprised it works in Ruby 1.9, actually.
Also, your DB query is terrible, because it retrieves a lot of unnecessary records and then does further select logic on the Ruby side. You probably want to start the find from Owner, not Circle.