I am not sure how to use ActiveRecord to save data from within the model code. How can I save data to the DB in the model itself?
Basically my code runs into a race condition for the following reason (Ruby/Rails + ActiveRecord + Sidekiq):
- My model does something like the following:
def is_present?(data)
  memory['properties'].include?(data)
end

def update_mem(data, size)
  # Drop the oldest entry when the list is full, then append the new one.
  if size != 0 && memory['properties'].length == amount
    memory['properties'].shift
  end
  memory['properties'].push(data)
end
It checks whether a specific value is present (is_present?); if it is not yet present in memory['properties'], it adds it.
Clearly, if only one thread accesses "memory", it works just fine. But Sidekiq runs jobs concurrently, so multiple threads may be running and end up in a race condition (one thread writes something while another thread reads what was in memory before).
"memory" is actually a column in a (MySQL) table, and as soon as I write something to it with "memory['properties'].push(123)" I would like to save it immediately.
My question is, how can I prevent this weird race condition?
What I would like to do is save the data directly to the DB from the model. The problem is that it does not seem to work.
So, to access the data in the model, I use the following code:
memory['property'].push(123)
or
self.memory['property'].push(123)
They both reference the memory column in the DB table.
But then I want to do something like self.save!, and it is not working at all.
I tried adding the following code to the model itself:
self.memory_will_change!
self.memory['properties'].push(property)
self.save!
Unfortunately, it is not working and I cannot save the data into the DB.
This model is actually called via perform() using Sidekiq as per below:
model = Receivers.find(id)
model.receive(data)
model.time = Time.now
model.save!
So the time is updated correctly, but the memory (which is updated in the model when I call "receive") does not get updated. Does anyone know how to overcome this problem? I need to save the data to the DB directly in the model.
Thanks and I look forward to hearing from you.
One solution would be to use a lock so that only one process can access the table at a time. Once you are done processing, you would release the lock.
Check out the following link:
https://api.rubyonrails.org/v5.1/classes/ActiveRecord/Locking/Pessimistic.html
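A minimal sketch of how the check-and-append from the question could run under a row lock. with_lock is the ActiveRecord method documented on the linked page; it opens a transaction, reloads the record with a lock, and yields. The names remember and FakeReceiver are hypothetical: FakeReceiver is only an in-memory stand-in (a Mutex instead of a row lock) so that the sketch runs without a database.

```ruby
# Hypothetical helper: append `value` to record.memory['properties'] at most
# once, keeping at most `limit` entries (limit >= 1). Works with any object
# that responds to with_lock, memory, memory= and save!.
def remember(record, value, limit)
  record.with_lock do
    props = record.memory['properties']
    next false if props.include?(value)   # already stored, nothing to do

    kept = props.last(limit - 1)          # drop the oldest entries when full
    # Assign a new hash instead of mutating in place, so the change is an
    # ordinary attribute assignment.
    record.memory = record.memory.merge('properties' => kept + [value])
    record.save!
    true
  end
end

# Stand-in for the model (assumption: a real model would get with_lock and
# save! from ActiveRecord instead).
class FakeReceiver
  attr_accessor :memory

  def initialize
    @memory = { 'properties' => [] }
    @mutex = Mutex.new
  end

  def with_lock
    @mutex.synchronize { yield }
  end

  def save!
    true
  end
end

receiver = FakeReceiver.new
threads = 20.times.map { |i| Thread.new { remember(receiver, i % 5, 3) } }
threads.each(&:join)
puts receiver.memory['properties'].inspect
```

Because the presence check and the write happen inside the same lock, two workers cannot both decide that a value is missing and append it twice.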
I'm having a weird issue for which I can't find a logical explanation.
I'm investigating a bug and put some logging in place (through Rollbar) so I can see the evolution of some instances of one of my models.
Here is what I got:
class Connexion < ActiveRecord::Base
  before_validation :save_info_in_rollbar
  after_save :save_info_in_rollbar

  def save_info_in_rollbar
    Rollbar.log("debug", "Connexion save", :connexion_id => self.id, :connexion_details => self.attributes)
  end
end
Now I am getting loads of data in rollbar (pretty much 2 rows for every time a connexion is created/updated). But the weird thing is the following: for some connexions (=> exactly the ones with faulty data which I am investigating), I am getting no data at all!
I don't get how it's possible for a connexion to be created and persisted to the DB, and not have any trace of the before_validation logging.
It looks like the callback is not called, but unless I'm mistaken, it's supposed to be the first one in the callback order => what could prevent it from being called?
EDIT >> Copy and paste from a reply, might be relevant:
There are 3 cases in which a connexion is created or updated, and those cases are:
.connexions.create()
connexion.attr = "value"; connexion.save!
connexion.update_attributes(attr: "value")
The only cases in which the callback won’t be run are:
Explicitly skipping validations (e.g. with save(validate: false))
Using an update method that skips Ruby-land (either partially or entirely, see each method’s linked docs) and just runs the SQL directly (e.g. update_columns, update_attribute, update_all).
But: I might be missing a case. Also, I’m assuming there isn’t a bug in ActiveRecord/ActiveModel causing this.
Sorry about the stupid question guys, the explanation was that we were having 2 apps working on the same database, and the modification was made by the other app (which of course was not sending the Rollbar updates).
Sometimes the toughest issues have the most simple answers haha
Firstly, you don't need self to read attributes in instance methods, since the method already runs in the scope of the instance.
Secondly, check how you are saving the data to the database. Callbacks can be skipped in Rails; see "Rails 3 skip validations and callbacks".
Thirdly, double-check the data.
When I execute this query
Mymodel.all.each do |model|
  # ..do something
end
It uses a lot of memory; the amount of used memory keeps increasing and in the end it crashes. I found out that to fix it I need to disable identity_map, but when I add identity_map_enabled: false to my mongoid.yml file I get this error:
Invalid configuration option: identity_map_enabled.
Summary:
A invalid configuration option was provided in your mongoid.yml, or a typo is potentially present. The valid configuration options are: :include_root_in_json, :include_type_for_serialization, :preload_models, :raise_not_found_error, :scope_overwrite_exception, :duplicate_fields_exception, :use_activesupport_time_zone, :use_utc.
Resolution:
Remove the invalid option or fix the typo. If you were expecting the option to be there, please consult the following page with repect to Mongoid's configuration:
I am using Rails 4 and Mongoid 4; Mymodel.all.count => 3202400
How can I fix it, or does anyone know another way to reduce the memory used while running .all.each?
Thank you very much for the help!!!!
I started with something just like yours, looping through millions of records, and the memory just kept increasing.
Original code:
@portal.listings.each do |listing|
  listing.do_something
end
I went through many forum answers and tried them out.
1st attempt: I tried a combination of WeakRef and GC.start, but no luck.
2nd attempt: Adding listing = nil to the first attempt; it still failed.
Successful attempt:
@start_date = 10.years.ago
@end_date = 1.day.ago
while @start_date < @end_date
  @portal.listings.where(created_at: @start_date..@start_date.next_month).each do |listing|
    listing.do_something
  end
  @start_date = @start_date.next_month
end
Conclusion
The memory allocated for the records is never released while a single query is still being iterated. Working through a small number of records per query therefore does the job: memory stays in good condition because it is released after each query.
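The windowing idea can be separated from the query itself. Here is a minimal sketch in plain Ruby, assuming a hypothetical helper named each_month_window; it only yields date ranges, and the caller would run one query per range (for example the where(created_at: ...) call above).

```ruby
require 'date'

# Hypothetical helper: yield consecutive month-long windows that cover
# start_date up to (but not including) end_date, so that each query only
# loads the records of one window.
def each_month_window(start_date, end_date)
  return enum_for(:each_month_window, start_date, end_date) unless block_given?

  from = start_date
  while from < end_date
    to = [from.next_month, end_date].min   # clamp the last window
    yield from...to                        # exclusive end: windows never overlap
    from = to
  end
end

windows = each_month_window(Date.new(2024, 1, 15), Date.new(2024, 4, 1)).to_a
windows.each { |w| puts "#{w.begin} up to #{w.end}" }
```

The exclusive range is a deliberate choice here: with an inclusive range, a record created exactly on a window boundary would be processed twice.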
Your problem isn't the identity map. I don't think Mongoid 4 even has an identity map built in, hence the configuration error when you try to turn it off. Your problem is that you're using all. When you do this:
Mymodel.all.each
Mongoid will attempt to instantiate every single document in the db.mymodels collection as a Mymodel instance before it starts iterating. You say that you have about 3.2 million documents in the collection, that means that Mongoid will try to create 3.2 million model instances before it tries to iterate. Presumably you don't have enough memory to handle that many objects.
Your Mymodel.all.count works fine because that just sends a simple count call into the database and returns a number, it won't instantiate any models at all.
The solution is to not use all (and preferably forget that it exists). Depending on what "do something" does, you could:
Page through all the models so that you're only working with a reasonable number of them at a time.
Push the logic into the database using mapReduce or the aggregation framework.
Whenever you're working with real data (i.e. something other than a trivially small database), you should push as much work as possible into the database because databases are built to manage and manipulate big piles of data.
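A minimal sketch of the paging idea, independent of any ODM. The helper each_page and the fetch lambda are hypothetical; with Mongoid the lambda would presumably wrap a skip/limit query, which is an assumption on my part and not something the answer spells out.

```ruby
# Hypothetical helper: call fetch.(offset, page_size) repeatedly and yield
# each record, so that only one page of records is held at a time.
def each_page(page_size, fetch)
  offset = 0
  loop do
    page = fetch.call(offset, page_size)
    break if page.empty?

    page.each { |record| yield record }
    offset += page.size
  end
end

# Stand-in data source for the demo; a real fetch would query the database.
DATA = (1..10).to_a
fetch = ->(offset, limit) { DATA[offset, limit] || [] }

seen = []
each_page(4, fetch) { |record| seen << record }
puts seen.length
```

Offset-based paging gets slower on very large collections because the database still has to walk past the skipped rows, so paging on an indexed key may be the better fit for millions of documents.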
I have a simple Rails app that scrapes JSON from a remote URL for each instance of a model (let's call it A). The app then creates a new data point under an associated model of the first. Let's call this middle model B and the data point model C. There's also a front end that lets users browse this data graphically/visually.
Thus the hierarchy is A has many -> B which has many -> C. I scrape a URL for each A which returns a few instances of B with new Cs that have data for the respective B.
While attempting to test/scale this app I have encountered a problem where Rails will stop processing, hang for a while, and finally throw an "ActiveRecord::ConnectionTimeoutError could not obtain a database connection within 5.000 seconds". Obviously the 5 is just the default.
I can't understand why this is happening when 1) there are no DB calls being made explicitly, 2) the log doesn't show any under the hood DB calls happening when it does work 3) it works sometimes and not others.
What's going on with rails 4 AR and the connection pool?!
A couple of notes:
The general algorithm is to spawn a thread for each model A, scrape the data, create in memory new instances of model C, save all the C's in one transaction at the end.
Sometimes this works, other times it doesn't, and I can't figure out what causes it to fail. However, once it fails it seems to fail more and more.
I eager load all the model A's and B's to begin with.
I use a transaction at the end to insert all the newly created C instances.
I currently use resque and resque scheduler to do this work but I highly doubt they are the source of the problem as it persists even if I just do "rails runner Class.do_work"
Any suggestions and/or thoughts greatly appreciated!
I believe I have found the cause of this problem. When you loop through an association via
model.association.each do |a|
  # work here
end
Rails does some behind the scenes work that "uses" a DB connection. I put uses in quotes because in my case I think the result is actually returned from memory. I eager loaded the association and thus the DB is never actually hit.
Preliminary testing of wrapping my block in a
ActiveRecord::Base.connection_pool.with_connection do
  # do the work here
end
seems to have resolved the issue.
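To see why handing a connection back promptly matters, here is a toy model of a fixed-size pool in plain Ruby. TinyPool is a hypothetical stand-in, not ActiveRecord's implementation; it only illustrates that a block-scoped checkout returns the connection as soon as the block ends, so more threads than connections can still all finish.

```ruby
# Toy fixed-size pool (hypothetical; not how ActiveRecord implements its pool).
class TinyPool
  def initialize(size)
    @slots = Queue.new
    size.times { |i| @slots << "conn-#{i}" }
  end

  # Check a connection out for the duration of the block only.
  def with_connection
    conn = @slots.pop          # blocks until some thread returns a connection
    yield conn
  ensure
    @slots << conn if conn     # always hand it back, even if the block raises
  end

  def available
    @slots.size
  end
end

pool = TinyPool.new(2)
done = Queue.new
threads = 10.times.map do |i|
  Thread.new { pool.with_connection { |conn| done << [i, conn] } }
end
threads.each(&:join)
puts "finished: #{done.size}, back in pool: #{pool.available}"
```

If each thread instead kept its connection until the thread died, the third thread onward would wait on an empty pool, which matches the timeout described in the question.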
I uncovered this by adding a backtrace to my thread's error message that was printing out.
-----For those using resque----
I also had to add a bit in my resque.rake file to get this fully working as intended.
task 'resque:setup' => :environment do
  Resque.after_fork do |job|
    ActiveRecord::Base.establish_connection
  end
end
If you are using
ActiveRecord::Base.transaction do
... code
end
to speed up writes in a thread, note that this locks the database. I had an app that did this for a hugely expensive process, in a thread, and it would lock the DB for over 5 seconds. It is faster, but it will lock your database for as long as the transaction stays open.
I have one application that is a task manager.
Each user can select a new task to be assigned to himself.
Is there a problem of concurrency if 2 users accept the same task at the same moment?
My code looks like this:
if @user.task.nil?
  @task.user = @user
  @task.save
end
If 2 different users, on 2 different machines, open this URL at the same time, will I have a problem?
You can use optimistic locking to prevent other "stale" records from being saved to the database. To enable it, your model needs to have a lock_version column with a default value of 0.
When the record is fetched from the database, the current lock_version comes along with it. When the record is modified and saved to the database, the database row is updated conditionally, by constraining the UPDATE on the lock_version that was present when the record was fetched. If it hasn't changed, the UPDATE will increment the lock_version. If it has changed, the update will do nothing, and an exception (ActiveRecord::StaleObjectError) will be raised. This is the default behavior for ActiveRecord unless turned off as follows:
ActiveRecord::Base.lock_optimistically = false
You can (optionally) use a column name other than lock_version. To use a custom name, add a line like the following to your model class (newer Rails versions use self.locking_column = :some_column_name instead):
set_locking_column :some_column_name
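The conditional UPDATE described above can be modeled in a few lines of plain Ruby. This is a sketch of the mechanism only: Row, TinyTable and StaleRow are hypothetical names, and a Mutex stands in for the atomicity the database provides for a single UPDATE statement.

```ruby
StaleRow = Class.new(StandardError)   # plays the role of StaleObjectError
Row = Struct.new(:user, :lock_version)

class TinyTable
  def initialize
    @row = Row.new(nil, 0)
    @mutex = Mutex.new
  end

  # Like a SELECT: hand out a copy that carries the current lock_version.
  def fetch
    @mutex.synchronize { @row.dup }
  end

  # Like UPDATE ... SET lock_version = lock_version + 1
  #      WHERE lock_version = <the version the caller fetched>
  def update(copy)
    @mutex.synchronize do
      raise StaleRow if @row.lock_version != copy.lock_version

      @row = Row.new(copy.user, copy.lock_version + 1)
    end
  end
end

table = TinyTable.new
alice_copy = table.fetch
bob_copy = table.fetch          # both copies carry lock_version 0

alice_copy.user = 'alice'
table.update(alice_copy)        # succeeds, stored version becomes 1

bob_copy.user = 'bob'
begin
  table.update(bob_copy)        # still carries version 0, so it is rejected
rescue StaleRow
  puts 'bob must reload and retry'
end
```

In the task example, the second user to save would hit the stale error, and the application decides whether to reload and retry or to report that the task is already taken.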
An alternative to optimistic locking is pessimistic locking, which relies on table- or row-level locks at the database level. This mechanism will block out all access to a locked row, and thus may negatively affect your performance.
Never tried it but you may use http://api.rubyonrails.org/classes/ActiveRecord/Locking/Pessimistic.html
You should be able to acquire a lock on your specific task, something like that:
@task = Task.find(some_id)
@task.with_lock do
  # Then let's check if there's still no one assigned to this task
  if @task.user.nil? && @user.task.nil?
    @task.user = @user
    @task.save
  end
end
Again, I never used this so I'd test it with a big sleep inside the lock to make sure it actually locks everything the way you want it
Also I'm not sure about the reload here. Since the row is locked, it may fail. But you have to make sure your object is fresh from the db after acquiring the lock, there may be another way to do it.
Edit: no need to reload. I checked the source code and with_lock does it for you.
https://github.com/rails/rails/blob/4c5b73fef8a41bd2bd8435fa4b00f7c40b721650/activerecord/lib/active_record/locking/pessimistic.rb#L61
In my application there can be only one current Event which defaults to the nearest date event. I need to retrieve this event in various places and since it doesn't change it makes sense to cache it. There are two ways of doing it known to me:
class Event < ActiveRecord::Base
  CURRENT_EVENT = Event.where('starts_on >= ?', Time.now).
    order('starts_on ASC').limit(1).first
  # OR
  def self.current_event
    @@current_event ||= Event.where('starts_on >= ?', Time.now).
      order('starts_on ASC').limit(1).first
  end
end
Which one would be the best? Or any other alternatives? I know that using @@ class variables is not recommended since they are not thread safe.
I don't think your approach is right: this way your app will keep the cached value forever. New events won't affect it, which is simply wrong. An event may already have passed and still be cached as "current".
By the way: limit(1).first does the same as first on its own.
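One way to keep a cached value from living forever is to let it expire. This is a minimal plain-Ruby sketch, assuming a hypothetical ExpiringValue class; the loader block stands in for the Event.where(...) query, and the 300-second lifetime is an arbitrary choice.

```ruby
# Hypothetical helper: memoize the block's result for ttl seconds, guarded by
# a Mutex so that concurrent threads do not race on the refresh.
class ExpiringValue
  def initialize(ttl, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }, &loader)
    @ttl = ttl
    @clock = clock          # injectable so the demo can fake the time
    @loader = loader
    @mutex = Mutex.new
    @value = nil
    @expires_at = nil
  end

  def get
    @mutex.synchronize do
      now = @clock.call
      if @expires_at.nil? || now >= @expires_at
        @value = @loader.call
        @expires_at = now + @ttl
      end
      @value
    end
  end
end

# Demo with a fake clock and a counter instead of a database query.
fake_now = 0
loads = 0
current_event = ExpiringValue.new(300, clock: -> { fake_now }) { loads += 1 }

current_event.get          # first call runs the loader
current_event.get          # served from the cached value
fake_now = 301
current_event.get          # lifetime exceeded, loader runs again
puts loads
```

The value can still be stale for up to the chosen lifetime, so this only fits if showing a just-passed event for a few minutes is acceptable.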
Neither the first nor the second approach is correct. If you define a constant, it will hold the Event that was current at the time Rails initialized. The second approach will not cache your record.
As for me, this data is not heavy enough to be worth caching.