Rails unique validation didn't work with background jobs - ruby-on-rails

I have an application with a model named appointment. On this model, there is a column with the name event_uid and a validation like the following:
validates :event_uid, uniqueness: true, allow_nil: true
The uniqueness validation exists only in the Rails application, not in the database (PostgreSQL).
I am using background jobs with Sidekiq on Heroku to sync some remote calendars. I am not sure what happened, but it seems I ended up with multiple records with duplicate event_uid values. They were created in the exact same second.
My guess is that something happened on the workers and for some reason they got invoked at the same time, or the queue froze and when it came back it ran the same job twice. I don't understand why Rails let the above pass (maybe the fact that workers run on different threads plays a role?). I added the following migration:
add_index :appointments, [:event_uid], unique: true
With the hope that it won't happen again. OK, so now the questions:
What do you think, will this be enough?
Is it dangerous to allow unique / presence validations to exist only at the application level if you are using create / update with background jobs?
Any guess what could have caused the workers to run the same job more than once, at exactly the same second?

The Rails uniqueness validation has been a source of confusion for a long time.
When you persist a record, Rails validates the model by running a SELECT query to see whether any record already exists with the provided value (the article linked below uses a user's email as its example). Assuming the record proves to be valid, Rails then runs the INSERT statement to persist it.
https://thoughtbot.com/blog/the-perils-of-uniqueness-validations
This means that if you have several workers / threads running that SELECT at the same time, they will each find no existing record, pass validation, and insert a duplicate.
Most of the time it is desirable to have an index at the database level to avoid these race conditions too. However, you now also need to handle the ActiveRecord::RecordNotUnique exception.
What do you think, will this be enough?
Yes, adding an index is a good idea, but now you also need to handle ActiveRecord::RecordNotUnique.
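A minimal sketch of that handling, assuming the Appointment model and event_uid column from the question (the skip-on-duplicate strategy is just one option):

begin
  Appointment.create!(event_uid: event_uid)
rescue ActiveRecord::RecordNotUnique
  # another worker inserted the same event_uid first; since the sync
  # should produce exactly one record per event, skipping is safe here
  Rails.logger.info("appointment with event_uid=#{event_uid} already exists")
end

On Rails 6+, create_or_find_by wraps this insert-then-rescue pattern for you; it relies on exactly such a unique index.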
Is it dangerous to allow unique / presence validations to exist only on application level if you are using create / update with background jobs?
This depends on the application, but most of the time you want to have an index at the DB level too.
Any guess what could have caused the workers to run the same job more than once, at exactly the same second?
Most background job libraries only guarantee that a job runs at least once, not exactly once. Your jobs should therefore always be idempotent (safe to run several times). A good read is this guide about ActiveJob design, especially the part about idempotency.
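As a sketch, an idempotent version of such a sync job might look like the following (the job name and attributes argument are assumptions, not code from the question):

class SyncAppointmentJob < ApplicationJob
  queue_as :default

  def perform(event_uid, attributes)
    # find_or_initialize_by turns a duplicate run into an update instead
    # of a second insert, so running the job twice is harmless
    appointment = Appointment.find_or_initialize_by(event_uid: event_uid)
    appointment.update!(attributes)
  rescue ActiveRecord::RecordNotUnique
    retry # a concurrent insert won the race; on retry this becomes an update
  end
end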

Usually, validations in Rails run only during callbacks (before the record is committed to the DB), and yes, if you added a unique index this will not happen again: the DB takes charge this time, so even if you run into the same flow/issue again, the result will be an error saying that you can't duplicate that index value.
Given the nature of these validators (they usually run during callbacks and are not thread-safe), they can run into race conditions. How often this happens depends on your application, but you should always add the constraint on the DB side as well.
Regarding your workers: I ran into the same issue a few months ago due to Sidekiq's retry flow. The solution was to validate on the DB side as well, and to fix the code so that the workers/jobs are enqueued from an after_commit callback (not sure if you are using Sidekiq, but you can always use the after_commit callback; I was enqueuing my job after a certain operation took place on a particular object).
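A rough sketch of that enqueue-after-commit idea (the model and job names are illustrative):

class Appointment < ApplicationRecord
  # enqueue only after the transaction has committed, so a fast worker
  # (or a Sidekiq retry) never runs before the record is visible
  after_commit :enqueue_sync, on: :create

  private

  def enqueue_sync
    SyncAppointmentJob.perform_later(id)
  end
end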
Hope the above helps! 👍

Related

Rails scan ActiveRecord for attribute change

I am creating a matchmaking system in my Rails application where users can post matches for a certain time. If the time the match is posted for is reached without the match being accepted, I need to void it by setting its active flag to false.
What would be the best approach for this? I have read bad things about cluttering up models with too many callbacks, and am not sure if ActiveModel Dirty is the best solution here. Does anybody have a suggestion that goes along with best practices and does not require a ton of DB queries?
Here are two ideas that should get you started. For both approaches I suggest encapsulating the business logic for creating a new match in a service object, to avoid the model clutter you already mentioned.
Immediately after persisting, store the match's id in Redis using a pre-defined TTL, for example under a key of the form matches:open:1234. Whenever you need a list of open matches, just query the keys under matches:open:* and use those for a targeted query on your ActiveRecord DB. Expiration will be completely out of your hair. You can also easily "touch" a record if you want to keep the match open a bit longer. Just pay attention to your database transactions and the possibility of rollbacks: you don't want to write invalid database ids into Redis :)
When creating the match, also enqueue a job (either via ActiveJob or a more specific framework like Sidekiq) that retrieves the record and checks whether it has been accepted in the meantime. You can optimize this by using something like where(id: match_id, accepted_at: nil).first: if nil is returned, you can assume the match was accepted in the meantime, without having to instantiate the record. If you need to keep a match open for longer than the initial delay, you'll have to find and cancel the already-enqueued job and enqueue a new one. Both ideas are sketched below.
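Rough sketches of both ideas, assuming a Match model with accepted_at and active columns, the redis-rb gem, and a fixed 15-minute window (all of these names are illustrative):

require "redis"

# idea 1: track open matches in Redis with a TTL
redis = Redis.new
redis.set("matches:open:#{match.id}", "1", ex: 15.minutes.to_i)
# listing open matches later (scan_each avoids blocking on large keyspaces)
open_ids = redis.scan_each(match: "matches:open:*").map { |key| key.split(":").last }
open_matches = Match.where(id: open_ids)

# idea 2: enqueue a job that voids the match if it is still unaccepted
class VoidMatchJob < ApplicationJob
  def perform(match_id)
    match = Match.where(id: match_id, accepted_at: nil).first
    match&.update!(active: false) # nil means it was accepted in the meantime
  end
end

VoidMatchJob.set(wait: 15.minutes).perform_later(match.id)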
As for periodically querying all pending matches in a recurring job: it makes protection against race conditions a bit harder with regard to row-level locks, and it is also much harder to scale, so I would personally advise against it.

Prevent duplicate ActiveJob being scheduled

I have a Rails app that queries an external webservice to update records.
I want to continue polling the web service until the user session expires.
Currently I create an ActiveJob in the show action for the record I want updated.
In the ActiveJob I reschedule it using
self.class.set(wait: 60.seconds).perform_later(record_id)
The problem is that if the user goes to the show action again, it will create another ActiveJob.
Is there any way to prevent duplicate jobs from being created?
Ideally, there would be a way to search the ActiveJobs to see if one already exists. Something like ActiveJob.find_by(job: JobType, params: record_id). That would allow you to manipulate the jobs before creating a duplicate. I'm going to dig in further and see if that could be created...
The activejob-uniqueness gem is designed to de-dupe jobs. The gem is still relatively new, but it's sponsored by an engineering company (Veeqo) and seems well supported. See also: https://devs.veeqo.com/job-uniqueness-for-activejob/
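Usage is roughly the following sketch, based on the gem's README (the job itself is illustrative; check the gem's docs for the available strategies and conflict options):

class PollRecordJob < ApplicationJob
  # drop any attempt to enqueue a second copy of this job with the same
  # arguments until the first one has finished executing
  unique :until_executed

  def perform(record_id)
    # ... poll the web service and reschedule as before
  end
end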
First, set a cookie value when the user visits the show action for the first time:
self.class.set(wait: 60.seconds).perform_later(record_id) if cookies[:_aj].nil?
cookies[:_aj] = true
Also, consider creating a column on your Record, maybe called pending_update: set it to true whenever you schedule a job and back to false at the end of the scheduled job. That way, even if the user clears their cookies, your program will not create duplicate jobs.
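A sketch of that flag-based approach (pending_update and PollRecordJob are illustrative names):

# in the controller's show action:
unless @record.pending_update?
  @record.update!(pending_update: true)
  PollRecordJob.set(wait: 60.seconds).perform_later(@record.id)
end

# in the job, whenever it decides not to reschedule itself:
record.update!(pending_update: false)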

sunspot_rails not re-indexing model after save

I have a model which enqueues a delayed job that updates some of its attributes. The model is declared "searchable"...
searchable do
  text :content, :stored => true
end
... which I thought would re-index after a save. On testing, this doesn't seem to be the case. If I run: rake sunspot:reindex, then everything works as expected. What could be causing this issue?
As mentioned by Jason, you can call Sunspot.commit_if_dirty to issue a commit from your client.
From the server configuration side, another approach would be to set the autoCommit property in your solrconfig.xml to automatically issue commits when there have been changes made to your index. A maxTime of 60000 ms (one minute) should suffice for most sites.
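In solrconfig.xml that looks roughly like this (the 60000 ms value matches the suggestion above):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime> <!-- commit at most once a minute -->
  </autoCommit>
</updateHandler>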
Using autoCommit is probably the wiser choice in production applications, where a high volume of commits can easily impact your Solr server's performance. In fact, it's a good practice with Sunspot to disable its auto_commit_after_request option when your site starts getting a decent amount of updates.
Lastly, autoCommit has the advantage that you can set it and forget it.
At Websolr, our default is to ignore client-issued commits in favor of autoCommit.
The index will only reflect changes after Sunspot.commit is called. This happens automatically when you run rake sunspot:reindex.
Sunspot's Rails plugin also has an auto_commit_after_request config option which calls Sunspot.commit_if_dirty after every request, but this will not be triggered by your background processes.
Your best bet is to call Sunspot.commit_if_dirty as the last thing in your delayed job.
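A minimal sketch, assuming a delayed job that updates the searchable content (the model, job, and build_new_content method are stand-ins for your own code):

class UpdateContentJob < ApplicationJob
  def perform(record_id)
    record = Record.find(record_id)
    record.update!(content: build_new_content(record)) # marks the index dirty
    Sunspot.commit_if_dirty # flush the pending index changes to Solr
  end

  private

  def build_new_content(record)
    # stand-in for whatever your delayed job actually computes
    "..."
  end
end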
I had the exact same problem: when testing my search functionality, Sunspot would never issue a commit to Solr, but if I manually called Sunspot.commit everything worked. I fiddled around with auto_commit_after_request, but it is true by default so it shouldn't make a difference.
After some more investigation I found that Sunspot won't issue a commit automatically unless the change is made in the context of a web request. If you're making a change from a test or a background job, you have to call Sunspot.commit manually.

How to have users create scheduled tasks in rails app deployed on Heroku

I have a rails app deployed on Heroku. I want to add a feature that enables users of the app to set a reminder. I need some way for the app to schedule sending an email at the time specified by the user.
I have found numerous posts referring to using delayed_job for this, but none of the write-ups / tutorials / etc. that I have found directly address what I am trying to accomplish (the descriptions I have found seem more geared towards managing long-running jobs that are to be run "whenever").
Am I on the right track looking at delayed_job for this? If so, can somebody point me towards a tutorial that might help me?
If delayed_job is not quite right for the job, does anybody have a suggestion for how I might approach this?
The most typical way of handling this is to use a cron job. You schedule a job to run every 15 minutes or so and deliver any reminders that come up in that window. Unfortunately, Heroku only allows cron jobs to run every hour, which usually isn't often enough.
In this case, I'd use delayed_job and trick it into setting up a recurring task that delivers the notifications as often as necessary. For example, you could create a function that begins by rescheduling itself to run in 10 minutes and then goes on to send any reminders that popped up in the previous 10 minutes.
For delayed_job's send_at syntax for scheduling future jobs, check here: https://github.com/tobi/delayed_job/wiki
ADDED after comments:
To send the reminders, you need to create a function that searches for pending reminders and sends them. For example, let's say you have a model called Reminder (Rails 3 syntax because I like it better):
def self.find_and_send_reminders
  reminders = Reminder.where("send_at < ? AND sent = ?", Time.now, false).all
  reminders.each do |r|
    # the following delayed_job syntax is apparently new, and I haven't tried
    # it; it came from the collective_idea fork of delayed_job on GitHub
    Notifier.delay.deliver_reminder_email(r)
    # I'm not checking that anything actually sent successfully here, just
    # assuming it did; you may want to handle this better in your real app
    r.update_attributes!(:sent => true)
  end
  # again using the new syntax, untested; Heroku may require the old
  # "send_at" and "send_later" syntax
  Reminder.delay(:run_at => 15.minutes.from_now).find_and_send_reminders
end
This syntax assumes you decided to use a single reminder entry for every occurrence. If you decide to use a single entry for all recurring reminders, you could create a field like last_sent instead of the boolean sent and use that. Keep in mind these are all just ideas; I haven't actually implemented this yet, so I probably haven't considered all the options/problems.
Check out the runt gem; it may be useful for you: http://runt.rubyforge.org/
You can use delayed_job's run_at to schedule at a specific time instead of whenever.
If your application allows users to change the time of a reminder, you need to keep a reference to the delayed job so you can update or delete it when required.
Here are more details.
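A sketch of both parts, assuming delayed_job's ActiveRecord backend and a Reminder model that stores the job's id in a delayed_job_id column (an illustrative name):

# schedule the email for the user-chosen time; the delay proxy returns
# the Delayed::Job record, so its id can be remembered
job = Notifier.delay(run_at: reminder.send_at).deliver_reminder_email(reminder)
reminder.update!(delayed_job_id: job.id)

# later, if the user moves or cancels the reminder:
Delayed::Job.find(reminder.delayed_job_id).update!(run_at: new_time)
# or: Delayed::Job.find(reminder.delayed_job_id).destroy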
It's good to avoid polling if you can. The worker thread already polls at the database level; you don't want to add polling on top of polling.

Monitor database table for external changes from within Rails application

I'm integrating some non-rails-model tables in my Rails application. Everything works out very nicely, the way I set up the model is:
class Change < ActiveRecord::Base
  establish_connection(ActiveRecord::Base.configurations["otherdb_#{RAILS_ENV}"])
  set_table_name "change"
end
This way I can use the Change model for all existing records with find etc.
Now I'd like to run some sort of notification when a record is added to the table. Since the model never gets created via Change.new and Change.save, using ActiveRecord::Observer is not an option.
Is there any way I can get some of my Rails code to be executed whenever a new record is added? I looked at delayed_job but can't quite get my head around how to set that up. I imagine it revolves around a cron job that selects all rows created since the job last ran and then calls the respective Rails code for each row.
Update: Currently looking at Javan's Whenever; it looks like it can solve the 'run Rails code from cron' part.
Yeah, you'll want either some sort of background task processor (Delayed::Job is one of the popular ones, or you can fake your own with the Daemon library or similar) or a cron job that runs on some sort of schedule. If you want to check frequently (every minute, say) I'd recommend the Delayed::Job route; if it's longer (every hour or so), a cron job will do just fine.
Going the DJ route, you'd create a job that checks for new records, processes them if there are any, and then re-enqueues itself, since each job is marked "completed" when it finishes.
-jon
This is what I finally did: use Whenever, because it integrates nicely with Capistrano and showed me how to run Rails code from within cron. My missing piece was basically
script/runner -e production 'ChangeObserver.recentchanges'
which now runs every 5 minutes. The recentchanges method reads the last-seen ID from a tmp file, pulls all new Change records with a higher ID, and runs the normal observer code for each record (saving the highest seen ID back to the tmp file, of course).
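In modern ActiveRecord syntax, such a method might look roughly like this (the file location and the process step are illustrative stand-ins):

class ChangeObserver
  LAST_ID_FILE = "tmp/last_change_id".freeze

  def self.recentchanges
    last_id = File.exist?(LAST_ID_FILE) ? File.read(LAST_ID_FILE).to_i : 0
    changes = Change.where("id > ?", last_id).order(:id)
    changes.each { |change| process(change) } # the normal observer code
    File.write(LAST_ID_FILE, changes.last.id.to_s) if changes.any?
  end

  def self.process(change)
    # stand-in for whatever should react to a new row
  end
end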
As usual with monitoring state changes, there are two approaches: polling and notification. You seem to have chosen the polling way for now (having a cron job look at the state of the database on a regular basis and execute some code if it changed).
You can do the same thing using one of the Rails schedulers; there are a few out there (Google will find them readily, and they have various feature sets, so I'll let you choose the one that suits your needs if you go that way).
You could also go the notification way, depending on your database. Some databases support triggers combined with external process execution or specific notification protocols. In this case you are notified by the database itself that the table changed. There are many such options for various DBMSs in Getting events from a database.
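On PostgreSQL, for example, the Ruby side of the notification route might look roughly like this, assuming a trigger on the change table that issues NOTIFY new_change (trigger setup not shown; uses the pg gem's LISTEN support):

# a long-running process, not a web request handler
conn = ActiveRecord::Base.connection.raw_connection
conn.exec("LISTEN new_change")
loop do
  conn.wait_for_notify do |channel, _pid, payload|
    # look up the new row (e.g. by an id carried in the payload)
    # and run whatever Rails code should react to it
    puts "got #{channel}: #{payload}"
  end
end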
