First, sorry about the essay; I'm trying not to provide any actual application code, and I hope there is some logical explanation, because the problem is so weird.
I have spent the past two days debugging some code that works 95% of the time. The error seemed simple to debug at first, but now I find myself scratching my head with no answers.
Some background
We are running:
Ruby 1.8.6, Rails 2.3.2 & PostgreSQL 8
We still have to migrate to Rails 2.3.8, so unless the solution lies in upgrading to 2.3.8, please don't advise me to do it ;)
In a nutshell
We have a script that scans through CSV files. I store all the relevant columns in hashes by row, then iterate through the hashes, calling various ActiveRecord models and methods to store the data in our database.
The problem
The most important data that needs to be saved is written via method calls wrapped inside a transaction block, so if any errors are raised, no data should get inserted into our core tables.
The table in question has two foreign keys that BOTH need to be present in order for the Rails application to function as expected.
Like I said, ~95% of the time while processing our data, these values get inserted correctly.
The other ~5% of the time, one of the foreign key values does not get saved at all, for no apparent reason.
I have saved the script flow and object/variable output to a log file and stepped through/scrutinised it for the instances where the foreign key was missing.
In every instance I inspected, the objects/variables that I use as foreign keys were all reported as 'saved' by ActiveRecord.
One thing that might be worth noting is that the value that gets saved into the foreign key column in question gets computed outside of the transaction block - but I don't see why that would be a problem, seeing as I can output and use the value further down the line.
Simplified code flow
#Returns an ActiveRecord object
fk2_source_object = method_to_compute_fk2(x, y)
#logger.info fk2_source_object.inspect
begin
  ActiveRecord::Base.transaction do
    begin
      fk1_source_object = FK1Model.new
      #etc, etc.
      fk1_source_object.save!
      #logger.info fk1_source_object.inspect
      object_in_question = ObjectInQuestion.new
      object_in_question.fk1_source_object_id = fk1_source_object.id
      #FK2 >> This is the value that does not reflect in the database,
      #even though I could inspect it and see an id after calling .save!
      object_in_question.fk2_source_object_id = fk2_source_object.id
      begin
        object_in_question.save!
        #At this point the object reports as saved and all values have
        #been set, but it does not reflect in the database?
        #logger.info object_in_question.inspect
      rescue
        #logger.error "error message"
        raise ActiveRecord::Rollback
      end
    rescue
      #logger.error "error message"
      raise ActiveRecord::Rollback
    end
  end
rescue Exception => e
  #logger.error e.inspect
end
#etc, etc.
I'm currently grasping at straws; next I'm going to try wrapping the entire section in the transaction block.
I cannot recreate the error on my development box, and frankly, if I rerun the CSV files on the server, the values get inserted correctly the second or third time. (So it's not a data source problem.)
I'm starting to worry that it may be a Rails/PostgreSQL 8 problem.
Am I losing the plot, or what could be causing this?
Related
When I execute the query
Mymodel.all.each do |model|
  # ..do something
end
it uses a lot of memory, the amount of memory used keeps increasing, and in the end it crashes. I found out that to fix it I need to disable the identity map, but when I add identity_map_enabled: false to my mongoid.yml file, I get the error:
Invalid configuration option: identity_map_enabled.
Summary:
A invalid configuration option was provided in your mongoid.yml, or a typo is potentially present. The valid configuration options are: :include_root_in_json, :include_type_for_serialization, :preload_models, :raise_not_found_error, :scope_overwrite_exception, :duplicate_fields_exception, :use_activesupport_time_zone, :use_utc.
Resolution:
Remove the invalid option or fix the typo. If you were expecting the option to be there, please consult the following page with respect to Mongoid's configuration:
I am using Rails 4 and Mongoid 4; Mymodel.all.count => 3202400.
How can I fix it, or does someone know another way to reduce the amount of memory used when executing a .all.each query?
Thank you very much for the help!!!!
I started with something just like you, looping through millions of records, and the memory just kept increasing.
Original code:
@portal.listings.each do |listing|
  listing.do_something
end
I've gone through many forum answers and tried them out.
1st attempt: I tried the combination of WeakRef and GC.start, but no luck; it failed.
2nd attempt: adding listing = nil to the first attempt, and it still failed.
Success Attempt:
@start_date = 10.years.ago
@end_date = 1.day.ago
while @start_date < @end_date
  @portal.listings.where(created_at: @start_date..@start_date.next_month).each do |listing|
    listing.do_something
  end
  @start_date = @start_date.next_month
end
Conclusion
All the memory allocated for the records is never released during the
query request. Therefore, querying a small number of records per
request does the job, and memory stays in good shape since it is
released after each request.
Your problem isn't the identity map; I don't think Mongoid 4 even has an identity map built in, hence the configuration error when you try to turn it off. Your problem is that you're using all. When you do this:
Mymodel.all.each
Mongoid will attempt to instantiate every single document in the db.mymodels collection as a Mymodel instance before it starts iterating. You say that you have about 3.2 million documents in the collection, which means that Mongoid will try to create 3.2 million model instances before it starts iterating. Presumably you don't have enough memory to handle that many objects.
Your Mymodel.all.count works fine because that just sends a simple count call to the database and returns a number; it won't instantiate any models at all.
The solution is to not use all (and preferably forget that it exists). Depending on what "do something" does, you could:
Page through all the models so that you're only working with a reasonable number of them at a time.
Push the logic into the database using mapReduce or the aggregation framework.
Whenever you're working with real data (i.e. something other than a trivially small database), you should push as much work as possible into the database because databases are built to manage and manipulate big piles of data.
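For the paging option, here is a minimal sketch (the 1000-document batch size is arbitrary, and do_something stands in for your real logic):
last_id = nil
loop do
  # Walk the collection in _id order, holding only one bounded batch
  # of model instances in memory at a time
  criteria = Mymodel.asc(:_id).limit(1000)
  criteria = criteria.where(:_id.gt => last_id) if last_id
  batch = criteria.to_a
  break if batch.empty?
  batch.each { |model| model.do_something }
  last_id = batch.last.id
end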
I've inherited some code written in ruby 1.8.7 with Rails ~> 2.3.15 and it contains a test which looks something like this:
class Wibble < ActiveRecord::Base
  # Wibbles have integer primary keys and string names
end

def test
  create_two_wibbles
  w1 = Wibble.first
  sleep 2 # This sleep is necessary to
  w2 = Wibble.first
  w1.name = "Name for w1"
  w2.name = "Name for w2"
  w1.save
  w3 = Wibble.first
  assert(!w3.update_attributes(w2.attributes))
end
That comment next to the sleep line hasn't been cut off; it literally says "This sleep is necessary to". Without that sleep, this test fails - removing it changes the behaviour somehow, beyond making it run 2 seconds faster. I've been digging through the file's history in our version control system, and the commit messages were uninformative. I also cannot contact the original author of this test to figure out what they were trying to do.
From my understanding, we're pulling the same record out of the database twice, editing it in two different ways, saving the first, and asserting that we can't then save the second. I suspect this is a test to make sure our database locks the table correctly, but surely if this were to fail our Wibbles would be fine, and ActiveRecord would be at fault. Nobody in the office can figure out why this test may have been necessary, nor what difference the sleep statement might make. Any ideas?
This is likely caused by Active Record caching and memoization stopping the second find from actually going to the DB and getting fresh data.
In fact, try printing the object_id of each wibble; they might be the exact same memoized object. If that is the case, then the test kinda makes sense.
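For example, a quick sketch against the Wibble test above:
w1 = Wibble.first
w2 = Wibble.first
# If both finds returned the same memoized object, the object ids match
puts w1.object_id == w2.object_id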
You could also do the same test in a controller action and look at the verbose SQL logs from Rails; I'd expect the second find call to tell you it just used cached data and did not actually run any SQL query.
I have a simple Rails app that scrapes JSON from a remote URL for each instance of a model (let's call it A). The app then creates a new data point under a model associated with the first. Let's call this middle model B and the data point model C. There's also a front end that lets users browse this data graphically/visually.
Thus the hierarchy is: A has many B, which has many C. I scrape a URL for each A, which returns a few instances of B with new Cs that have data for the respective B.
While attempting to test/scale this app, I have encountered a problem where Rails will stop processing, hang for a while, and finally throw an "ActiveRecord::ConnectionTimeoutError: could not obtain a database connection within 5.000 seconds". Obviously the 5 is just the default.
I can't understand why this is happening when 1) there are no DB calls being made explicitly, 2) the log doesn't show any under-the-hood DB calls happening when it does work, and 3) it works sometimes and not others.
What's going on with Rails 4 AR and the connection pool?!
A couple of notes:
The general algorithm is to spawn a thread for each model A, scrape the data, create new instances of model C in memory, and save all the Cs in one transaction at the end.
Sometimes this works and other times it doesn't; I can't figure out what causes it to fail. However, once it fails, it seems to fail more and more.
I eager load all the model As and Bs to begin with.
I use a transaction at the end to insert all the newly created C instances.
I currently use resque and resque-scheduler to do this work, but I highly doubt they are the source of the problem, as it persists even if I just do "rails runner Class.do_work".
Any suggestions and/or thoughts greatly appreciated!
I believe I have found the cause of this problem. When you loop through an association via
model.association.each do |a|
  # work here
end
Rails does some behind-the-scenes work that "uses" a DB connection. I put "uses" in quotes because in my case I think the result is actually returned from memory; I eager loaded the association, so the DB is never actually hit.
Preliminary testing of wrapping my block in a
ActiveRecord::Base.connection_pool.with_connection do
  # work here
end
seems to have resolved the issue.
I uncovered this by adding a backtrace to the error message my thread was printing out.
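Putting it together, a minimal sketch of the resulting pattern (ModelA, the bs/cs associations, and scrape_for are hypothetical stand-ins for the question's A/B/C hierarchy, not real API):
threads = ModelA.all.map do |a|
  Thread.new do
    # Explicitly check a connection out of the pool for this block;
    # it is returned to the pool when the block exits
    ActiveRecord::Base.connection_pool.with_connection do
      ActiveRecord::Base.transaction do
        a.bs.each { |b| b.cs.create!(scrape_for(b)) }
      end
    end
  end
end
threads.each(&:join)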
----- For those using resque -----
I also had to add a bit in my resque.rake file to get this fully working as intended.
task 'resque:setup' => :environment do
  Resque.after_fork do |job|
    ActiveRecord::Base.establish_connection
  end
end
If you are using
ActiveRecord::Base.transaction do
  ... code
end
to accomplish faster transactions in a thread, note that this locks the database. I had an app that did this for a hugely expensive process, in a thread, and it would lock the DB for over 5 seconds. It is faster, but it will lock your database.
So I've done a couple of days' worth of research on the matter, and the general consensus is that there isn't one. So I was hoping for an answer more specific to my situation...
I'm using Rails to import a file into a database. Everything is working regarding the import, but I want to give the database itself an attribute, not just every entry. I'm creating a hash of the file, and I figured it'd be easiest to just assign it to the database (or the class).
I've created a class called Issue (and thus an 'issues' database) with each entry having a couple of attributes. I wanted to figure out a way to add a class variable (at least, that's what I think is the best option) to Issue to simply store the hash. I've written a rake task to import the file, iff the new file is different from the previously imported file (read: if the hashes are different).
desc "Parses a CSV file into the 'issues' database"
task :issues, [:file] => :environment do |t, args|
md5 = Digest::MD5.hexdigest(args[:file])
puts "1: Issue.md5 = #{Issue.md5}"
if md5 != Issue.md5
Issue.destroy_all()
#import new csv file
CSV.foreach(args[:file]) do |row|
issue = {
#various attributes to be columns...
}
Issue.create(issue)
end #end foreach loop
Issue.md5 = md5
puts "2: Issue.md5 = #{Issue.md5}"
end #end if statement
end #end task
And my model is as follows:
class Issue < ActiveRecord::Base
  attr_accessible :md5

  @@md5 = 5

  def self.md5
    @@md5
  end

  def self.md5=(newmd5)
    @@md5 = newmd5
  end

  attr_accessible # various database-entry attributes
end
I've tried various different ways to write my model, but it all comes down to this: whatever I set @@md5 to in my model becomes a permanent change, almost like a constant. If I change this value here and refresh my database, the change is noted immediately. If I go into the rails console and do:
Issue.md5 # => 5
Issue.md5 = 123 # => 123
Issue.md5 # => 123
But this change isn't committed to anything. As soon as I exit the console, things return to "5" again. It's almost like I need a .save method for my class.
Also, in the rake file, you see I have two print statements, printing out Issue.md5 before and after the parse. The first prints out "5" and the second prints out the new, correct hash. So Ruby is recognizing the fact that I'm changing this variable; it's just never saved anywhere.
Ruby 1.9.3, Rails 3.2.6, SQLite3 3.6.20.
tl;dr I need a way to create a class variable, and be able to access it, modify it, and re-store it.
Fixes please? Thanks!
There are a couple of solutions here. Essentially, you need to persist that one variable. Postgres provides a key/value store in the database (hstore), which would be ideal, but you're using SQLite, so that isn't an option for you. Instead, you'll probably need to use either Redis or memcached to persist this information alongside your database.
Either one allows you to persist values into a schemaless datastore and query them again later. Redis has the advantage of being saved to disk, so if the server craps out on you, you can get the value of md5 back when it restarts. Data saved into memcached is never persisted, so if the memcached instance goes away, when it comes back md5 will be 5 once again.
Both Redis and memcached enjoy a lot of support in the Ruby community. Installing one will complicate your stack slightly, but I think it's the best solution available to you. That said, if you just can't use either one, you could also write the value of md5 to a temporary file on your server and access it again later. The issue there is that the value then won't be shared among all your server processes.
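For example, a minimal sketch with the redis gem (assumes a local Redis server; the key name 'issues:md5' is made up, and md5 is the value from your rake task):
require 'redis'

redis = Redis.new
# Persist the hash under a well-known key...
redis.set('issues:md5', md5)
# ...and read it back later (get returns nil if the key was never set)
stored = redis.get('issues:md5')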
In my model I have:
class Log < ActiveRecord::Base
  serialize :data
  ...

  def self.recover(table_name, row_id)
    d = Log.where(table_name: table_name, row_id: row_id).where("log_type != #{symbol_to_constant(:delete)}").last
    raise "Nothing to recover" if d.nil?
    raise "No data to recover" if d.data.nil?
    row = d.data
    c = const_get(table_name)
    ret = c.create(row.attributes)
  end
And in my controller I call it like this:
def index
  Log.recover params[:t], params[:r]
  redirect_to request.referer
end
The problem is, when I access this page for the first time, I get the error specified below, but after a refresh everything is OK. Where can the problem be?
undefined method `attributes' for #<String:0x00000004326fc8>
Instances of models are saved in the data column. On the first access the column isn't properly deserialized; it's just YAML text. But after a refresh everything is fine. That's confusing; what is wrong? A bug in Rails?
It doesn't happen every time; sometimes everything is OK on the first access.
De-YAML-izing an object of class Foo will do funny things if there is no class Foo. This can quite easily happen in development because classes are only loaded when needed and unloaded when Rails thinks they might have changed.
Depending on whether the class is loaded or not, the YAML load will have different results (YAML doesn't know about Rails' automatic loading).
One solution worth considering is to store the attributes hash rather than the ActiveRecord object. You'll probably avoid problems and it will be more space efficient in the long run - there's a bunch of state in an ActiveRecord object that you probably don't care about in this case.
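A sketch of that approach (log and record are hypothetical local names):
# Store a plain attributes hash, which YAML round-trips safely
log.data = record.attributes
log.save!

# Recovering then builds a fresh object from the hash, without needing
# the original class to already be loaded when the YAML is parsed
c = const_get(table_name)
c.create(log.data)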
If that's not an option, your best bet is probably to make sure that the classes the serialized column might contain are loaded - via a few calls to require_dependency 'foo' at the top of the file.
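For example (a sketch; Foo and Bar stand in for whatever classes your data column can actually contain):
# Force-load the classes the serialized column may reference, so YAML
# can always find them regardless of Rails' lazy autoloading
require_dependency 'foo'
require_dependency 'bar'

class Log < ActiveRecord::Base
  serialize :data
end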