sunspot_rails not re-indexing model after save - ruby-on-rails

I have a model which deploys a delayed job that updates some of its attributes. The model is declared "searchable"...
searchable do
  text :content, :stored => true
end
... which I thought would re-index after a save. On testing, this doesn't seem to be the case. If I run: rake sunspot:reindex, then everything works as expected. What could be causing this issue?

As mentioned by Jason, you can call Sunspot.commit_if_dirty to issue a commit from your client.
From the server configuration side, another approach would be to set the autoCommit property in your solrconfig.xml to automatically issue commits when there have been changes made to your index. A maxTime of 60000 ms (one minute) should suffice for most sites.
Using autoCommit is probably the wiser choice in production applications, where a high volume of commits can easily impact your Solr server's performance. In fact, it's a good practice with Sunspot to disable its auto_commit_after_request option when your site starts getting a decent amount of updates.
Lastly, autoCommit has the advantage that you can set it and forget it.
At Websolr, our default is to ignore client-issued commits in favor of autoCommit.

The index will only reflect changes after Sunspot.commit is called. This happens automatically when you run rake sunspot:reindex.
Sunspot's Rails plugin also has an auto_commit_after_request config option, which calls Sunspot.commit_if_dirty after every request, but this is not triggered by your background processes.
Your best bet is to call Sunspot.commit_if_dirty as the last thing in your delayed job.
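A minimal sketch of that advice, written so it runs outside a Rails app. The Sunspot stub, FakeRecord and UpdateContentJob are hypothetical stand-ins; in a real app sunspot_rails supplies Sunspot.commit_if_dirty and the record would be your searchable model.

```ruby
# Stand-in for the real Sunspot module so this sketch runs on its own.
# In a real app, drop this stub: sunspot_rails provides Sunspot.commit_if_dirty.
unless defined?(Sunspot)
  module Sunspot
    @commits = 0
    class << self
      attr_reader :commits

      def commit_if_dirty
        @commits += 1
      end
    end
  end
end

# Hypothetical record; stands in for the model declared "searchable".
FakeRecord = Struct.new(:content) do
  def save
    true
  end
end

# Hypothetical job object in the delayed_job style: anything with #perform.
class UpdateContentJob
  def initialize(record, new_content)
    @record = record
    @new_content = new_content
  end

  def perform
    @record.content = @new_content
    @record.save
    # No web request wraps this code, so auto_commit_after_request never
    # fires here. Commit explicitly as the last step of the job.
    Sunspot.commit_if_dirty
  end
end

record = FakeRecord.new("old text")
UpdateContentJob.new(record, "new text").perform
```

The same explicit commit applies to tests and rake tasks, since none of them run inside a request.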

I had the exact same problem as you: when I was testing my search functionality, Sunspot would never issue a commit to Solr. If I manually call Sunspot.commit, everything works. I fiddled around with auto_commit_after_request, but this is true by default, so it shouldn't make a difference.
So after some more investigation I found that Sunspot won't issue a commit automatically unless the change is made in the context of a web request. If you're doing a change from a test or a background job you have to call Sunspot.commit manually.

Related

Rails unique validation didn't work with background jobs

I have an application with a model named appointment. On this model, there is a column with the name event_uid and a validation like the following:
validates :event_uid, uniqueness: true, allow_nil: true
The unique validation is only on rails application and not in the database (postgresql).
I am using background jobs with Sidekiq on Heroku to sync some remote calendars. I am not sure what happened, but it seems I got multiple records with duplicate event_uid values. They were created in the exact same second.
My guess is that something happened on the workers and for some reason they were invoked at the same time, or the queue froze and, when it came back, ran the same job twice. I don't understand why Rails let this pass (maybe the fact that workers run on different threads plays a role?). I added the following migration:
add_index :appointments, [:event_uid], unique: true
With the hope that it won't happen again. Ok so now the questions:
What do you think, will this be enough?
Is it dangerous to allow unique / presence validations to exist only on application level if you are using create / update with background jobs?
Any guess what could have caused the workers to run the same job more than once, in exactly the same second?
The Rails uniqueness validation has been a source of confusion for a long time.
When you persist a user instance, Rails will validate your model by running a SELECT query to see if any user records already exist with the provided email.
Assuming the record proves to be valid, Rails will run the INSERT statement to persist the user.
https://thoughtbot.com/blog/the-perils-of-uniqueness-validations
This means that if several workers or threads run the SELECT at the same time, none of them finds an existing record, so every one of them passes validation and inserts a row.
Most of the time it is desirable to have an index on database level to avoid these race conditions too. However, you need to now also handle any ActiveRecord::RecordNotUnique exception.
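To make the race concrete, here is a small simulation in plain Ruby. Everything in it is a stand-in: FakeTable plays the role of the appointments table, the local RecordNotUnique class plays the role of ActiveRecord::RecordNotUnique, and the after_validation hook is only there to force a second worker into the gap between the SELECT and the INSERT.

```ruby
# Stand-in for ActiveRecord::RecordNotUnique so the sketch runs without Rails.
class RecordNotUnique < StandardError; end

# In-memory stand-in for a table. With unique_index: true, #insert checks and
# writes atomically, which is what a database unique index gives you.
class FakeTable
  attr_reader :rows

  def initialize(unique_index:)
    @unique_index = unique_index
    @rows = []
    @lock = Mutex.new
  end

  def exists?(uid)
    @lock.synchronize { @rows.include?(uid) }
  end

  def insert(uid)
    @lock.synchronize do
      raise RecordNotUnique, uid if @unique_index && @rows.include?(uid)
      @rows << uid
    end
  end
end

# Roughly what a uniqueness validation plus save does: SELECT, then INSERT.
def create_appointment(table, uid, after_validation: -> {})
  return :invalid if table.exists?(uid) # the validation's SELECT
  after_validation.call                 # window where another worker can insert
  table.insert(uid)                     # the INSERT
  :created
rescue RecordNotUnique
  :duplicate_rejected                   # handle it instead of letting the job crash
end

# Simulate a second worker inserting the same uid inside that window.
def race(table, uid)
  create_appointment(table, uid, after_validation: -> { create_appointment(table, uid) })
end

without_index = FakeTable.new(unique_index: false)
race(without_index, "evt-1")

with_index = FakeTable.new(unique_index: true)
result_with_index = race(with_index, "evt-1")
```

Without the index both workers pass validation and two rows are stored; with it, the second insert raises and is rescued, leaving a single row.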
What do you think, will this be enough?
Yes, adding an index is a good idea but now you need to also handle ActiveRecord::RecordNotUnique.
Is it dangerous to allow unique / presence validations to exist only on application level if you are using create / update with background jobs?
This depends on the application but most of the time you want to have an index on db level too.
Any guess what could have caused the workers to run the same job more than one and exactly the same second?
Most background job libraries guarantee only that a job runs at least once, not exactly once, so the same job can be executed more than one time. Your jobs should always be idempotent (safe to run several times). A good read is this guide about ActiveJob design, especially the part about idempotency.
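As an illustration of what idempotent means here, a sketch in plain Ruby. AppointmentStore and SyncEventJob are hypothetical names; the store stands in for a table keyed by event_uid, and in a Rails app the upsert would be a find-or-update against the real model.

```ruby
# Hypothetical in-memory store keyed by event_uid.
class AppointmentStore
  def initialize
    @by_uid = {}
  end

  # Writing the same uid twice overwrites instead of adding a second row.
  def upsert(uid, attrs)
    @by_uid[uid] = attrs
  end

  def count
    @by_uid.size
  end

  def fetch(uid)
    @by_uid.fetch(uid)
  end
end

# Hypothetical job: its effect depends only on the event, not on how many
# times it has already run.
class SyncEventJob
  def initialize(store)
    @store = store
  end

  def perform(event)
    @store.upsert(event.fetch(:uid), title: event.fetch(:title))
  end
end

store = AppointmentStore.new
job = SyncEventJob.new(store)
event = { uid: "evt-1", title: "Dentist" }

job.perform(event)
job.perform(event) # a retry or duplicate delivery changes nothing
```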
Validations in Rails usually run only during the save callbacks (before the record is committed to the DB), and yes, if you added a unique index this will not happen again, because the DB now enforces uniqueness. Even if you run into the same flow again, the result will be an error saying that you can't duplicate that index value.
Because validators usually run during callbacks and are not thread-safe, they can run into race conditions. How often this happens depends on your application, but you should always add the constraint in the DB as well.
As for your workers, I ran into the same issue a few months ago because of Sidekiq's retry flow. The solution was to validate on the DB side as well and to run the workers/jobs from an after_commit callback (not sure if you are using Sidekiq, but you can always use the after_commit callback; I was running my job after a certain operation took place on a particular object).
Hope the above helps! 👍

Prevent duplicate ActiveJob being scheduled

I have a Rails app that queries an external webservice to update records.
I want to continue polling the web service until the user session expires.
Currently I create an ActiveJob in the show action for the record I want updated.
In the ActiveJob I reschedule it using
self.class.set(wait: 60.seconds).perform_later(record_id)
The problem is that if the user goes to the show action again, it will create another ActiveJob.
Is there any way to prevent duplicate jobs from being created?
Ideally, there would be a way to search the ActiveJobs to see if one already exists. Something like ActiveJob.find_by(job: JobType, params: record_id). That would allow you to manipulate the jobs before creating a duplicate. I'm going to dig in further and see if that could be created...
The activejob-uniqueness gem is designed to de-dupe jobs. The gem is still relatively new, but it's sponsored by an engineering company (Veeqo) and seems well supported. See also: https://devs.veeqo.com/job-uniqueness-for-activejob/
First set a cookie value when the user visits the show for the first time.
self.class.set(wait: 60.seconds).perform_later(record_id) if cookies[:_aj].nil?
cookies[:_aj] = true
Also maybe create a column in your Record, maybe call it pending_update and set it true whenever you schedule a job to run and set it to false at the end of the scheduled job. That way, even if the user clears the cookies, your program will not create duplicate jobs.
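The pending_update idea could look roughly like this. It is a plain-Ruby sketch: Record and FakeQueue are stand-ins, and in the app the enqueue call would be the self.class.set(wait: 60.seconds).perform_later(record_id) line from the question.

```ruby
# Stand-in for the model; pending_update is the flag column suggested above.
Record = Struct.new(:id, :pending_update)

# Stand-in for the job backend; it only remembers what was enqueued.
class FakeQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def enqueue(record_id)
    @jobs << record_id
  end
end

module PollScheduler
  # Enqueue a poll only if none is pending. Returns true if a job was enqueued.
  def self.schedule(record, queue)
    return false if record.pending_update

    record.pending_update = true
    queue.enqueue(record.id)
    true
  end

  # Call at the end of the job so the next visit may schedule again.
  def self.finish(record)
    record.pending_update = false
  end
end

queue = FakeQueue.new
record = Record.new(42, false)

first_visit = PollScheduler.schedule(record, queue)  # enqueues
second_visit = PollScheduler.schedule(record, queue) # skipped, job still pending
PollScheduler.finish(record)
third_visit = PollScheduler.schedule(record, queue)  # enqueues again
```

With a real database the check and the update should happen in one atomic statement (for example a conditional UPDATE), otherwise two simultaneous requests can both see the flag as false.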

Ruby on Rails times out. How do I fork a process?

I have a page of a long list of items. Each has a check box next to it. There's a jQuery check-all function, but when I submit all of them at once, the request times out because it's doing a bunch of queries and inserting a bunch of records in the MySQL database for each item. If it were to not timeout, it'd probably take about 20 minutes. Instead, I just submit like 30 at a time.
I want to be able to just check all and submit and then just go on doing other work. My coworker (1) said I should just write a rake task. I did that, but I ended up duplicating code, and I prefer the user interface because what if I want to un-check a few? The rake task just submits them all.
Another coworker (2) recommended I use fork. He said that would spawn a new process that would run on the server but allow the server to respond before it's done. Then, since an item disappears after it's been submitted, I could just refresh the page to check if they're done.
I tried this on my local, however, it still seems that Rails is waiting for the process to finish before it responds to the POST request sent by the HTML form. The code I used looks like this:
def bulk_apply
  pid = fork do
    params[:ids].each do |id|
      Item.find(id).apply # takes a LONG time, esp. x 100
    end
  end
  Process.detach(pid) # reap child process automatically; don't leave running
  flash[:notice] = "Applying... Please wait... Then, refresh page. Only submit once. PID: #{pid}"
  redirect_to :back
end
Coworker 1 said that generally you don't want to fork Rails because fork creates a child process that is basically a copy of the Rails process. He said if you want to do it through the web GUI, use BackgroundJob (Bj) (because we're already using that in our Rails app). So, I'm looking into BackgroundJob now, but what do you recommend?
I've had good success using BackgroundJob. If you need Rails, you will be using script/runner, which still starts up a new process with Rails loaded. The good thing is that BackgroundJob makes sure that there is never more than one job running at a time.
You can also use script runner directly, or even run a rake task in the background like so:
system " RAILS_ENV=#{RAILS_ENV} ruby #{RAILS_ROOT}/script/runner 'CompositeGrid.calculate_values(#{self.id})' & " unless RAILS_ENV == "test"
The ampersand tells it to start a new process. Be careful because you probably don't want a bunch of these running at the same time. I would definitely take advantage of background job if it is already available.
You should check out IronWorker. It would be super easy to do what you want, and it doesn't matter how long it takes.
In your action you'd just instantiate a worker which has the code that's doing all your database queries. Example worker:
Item.find(id).apply # takes a LONG time, esp. x 100
And here's how you'd queue up those jobs to run in parallel:
client = IronWorkerNG::Client.new
ids.each do |id|
  client.tasks.create("MyWorker", "id"=>id)
end
That's all you'd need to do and IronWorker takes care of the rest.
Try delayed_job gem. This is a database-based background job gem. We used it in an e-commerce website. For example, sending order confirmation email to the user is an ideal candidate for delayed job.
Additionally you can try multi-threading, which is supported by Ruby. This could make things run faster. Forking an entire process tends to be expensive due to memory usage.
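For the multi-threading option, a minimal sketch with one worker thread and a Queue, so that submitting the work returns immediately. BackgroundApplier is a hypothetical name, and the lambda stands in for Item.find(id).apply.

```ruby
# One worker thread drains a thread-safe queue; #submit returns right away.
class BackgroundApplier
  STOP = :stop

  attr_reader :results

  # work: a callable taking an id, standing in for Item.find(id).apply
  def initialize(work)
    @work = work
    @queue = Queue.new
    @results = Queue.new
    @worker = Thread.new { drain }
  end

  # Hand the ids to the worker and return how many were accepted.
  def submit(ids)
    ids.each { |id| @queue << id }
    ids.size
  end

  # Let the worker finish what is queued, then stop it.
  def shutdown
    @queue << STOP
    @worker.join
  end

  private

  def drain
    while (id = @queue.pop) != STOP
      @results << @work.call(id)
    end
  end
end

applier = BackgroundApplier.new(->(id) { "applied #{id}" })
accepted = applier.submit([1, 2, 3]) # returns without waiting for the work
applier.shutdown                     # only so this script waits for the output

done = []
done << applier.results.pop until applier.results.empty?
```

Keep in mind that queued work held in memory is lost if the server process restarts, which is one reason a database-backed queue such as delayed_job or BackgroundJob is usually the safer choice for 20 minutes of work.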

How to have users create scheduled tasks in rails app deployed on Heroku

I have a rails app deployed on Heroku. I want to add a feature that enables users of the app to set a reminder. I need some way for the app to schedule sending an email at the time specified by the user.
I have found numerous posts referring to using delayed_job for this, but none of the write-ups / tutorials / etc. that I have found directly address what I am trying to accomplish (the descriptions I have found seem more geared towards managing long-running jobs that are to be run "whenever").
Am I on the right track looking at delayed_job for this? If so, can somebody point me towards a tutorial that might help me?
If delayed_job is not quite right for the job, does anybody have a suggestion for how I might approach this?
The most typical way of handling this is to use a cron job. You schedule a job to run every 15 minutes or so and deliver any reminders that come up in that time. Unfortunately, Heroku only allows cron jobs to run every hour, which usually isn't often enough.
In this case, I'd use delayed_job and trick it into setting up a recurring task that delivers the notifications as often as necessary. For example, you could create a function that begins by rescheduling itself to run in 10 minutes and then goes on to send any reminders that came due in the previous 10 minutes.
For delayed_job's send_at syntax for scheduling future jobs, see: https://github.com/tobi/delayed_job/wiki
ADDED after comments:
To send the reminder, you would need to create a function that searches for pending reminders and sends them. For example, let's say you have a model called Reminder (rails 3 syntax cause I like it better):
def self.find_and_send_reminders
  reminders = Reminder.where("send_at < ? AND sent = ?", Time.now, false).all
  reminders.each do |r|
    # the following delayed_job syntax is apparently new, and I haven't tried it. came from the collective_idea fork of delayed_job on github
    Notifier.delay.deliver_reminder_email(r)
    # I'm not checking to make sure that anything actually sent successfully here, just assuming they did. may want to address this better in your real app
    r.update_attributes!(:sent => true)
  end
  # again using the new syntax, untested. heroku may require the old "send_at" and "send_later" syntax
  Reminder.delay(:run_at => 15.minutes.from_now).find_and_send_reminders
end
This syntax assumes you decided to use the "single reminder entry for every occurrence" method. If you decide to use a single entry for all recurring reminders, you could create a field like "last_sent" instead of a boolean for "sent" and use that. Keep in mind these are all just ideas; I haven't actually taken the time to implement anything like this yet, so I probably haven't considered all the options/problems.
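The selection logic of such a sweep can be tried in plain Ruby without delayed_job. This is only a sketch: the Reminder struct and ReminderSweep module are stand-ins, and the mailer is any callable. Unlike the snippet above, a reminder is marked as sent only after the mailer call returns without raising.

```ruby
# Stand-in for the Reminder model with the fields used above.
Reminder = Struct.new(:message, :send_at, :sent)

module ReminderSweep
  # Deliver every unsent reminder whose time has come; return how many were sent.
  def self.deliver_due(reminders, now, mailer)
    due = reminders.select { |r| !r.sent && r.send_at <= now }
    due.each do |r|
      mailer.call(r) # if this raises, the reminder stays unsent for the next sweep
      r.sent = true
    end
    due.size
  end
end

now = Time.at(1_000_000)
reminders = [
  Reminder.new("past", now - 60, false),
  Reminder.new("future", now + 60, false),
  Reminder.new("already sent", now - 120, true)
]

delivered = []
mailer = ->(r) { delivered << r.message }

first_sweep = ReminderSweep.deliver_due(reminders, now, mailer)
second_sweep = ReminderSweep.deliver_due(reminders, now, mailer) # nothing left to send
```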
Check out the runt gem, may be useful for you: http://runt.rubyforge.org/
You can use delayed_job's run_at to schedule at a specific time instead of whenever.
If your application allows the users to change the time of the reminders you need to remember the delayed_job to be able to update it or delete it when required.
Here are more details.
It's good to avoid polling if you can. The worker thread will poll at the database level, you don't want to add polling on top of polling.

Monitor database table for external changes from within Rails application

I'm integrating some non-rails-model tables in my Rails application. Everything works out very nicely, the way I set up the model is:
class Change < ActiveRecord::Base
  establish_connection(ActiveRecord::Base.configurations["otherdb_#{RAILS_ENV}"])
  set_table_name "change"
end
This way I can use the Change model for all existing records with find etc.
Now I'd like to run some sort of notification, when a record is added to the table. Since the model never gets created via Change.new and Change.save using ActiveRecord::Observer is not an option.
Is there any way I can get some of my Rails code to be executed whenever a new record is added? I looked at delayed_job but can't quite get my head around how to set that up. I imagine it revolves around a cron job that selects all rows that were created since the job last ran and then calls the respective Rails code for each row.
Update Currently looking at Javan's Whenever, looks like it can solve the 'run rails code from cron part'.
Yeah, you'll want either some sort of background task processor (Delayed::Job is one of the popular ones, or you can fake your own with the Daemon library or similar) or a cron job that runs on some sort of schedule. If you want to check frequently (every minute, say), I'd recommend the Delayed::Job route; if the interval is longer (every hour or so), a cron job will do just fine.
Going the DJ route, you'd need to create a job that would check for new records, process them if there are any, then requeue the job, as each job is marked "completed" when it's finished.
-jon
This is what I finally did: use Whenever, because it integrates nicely with Capistrano and showed me how to run Rails code from within cron. My missing piece was basically
script/runner -e production 'ChangeObserver.recentchanges'
which is now run every 5 minutes. The recentchanges reads the last looked-at ID from a tmp-file, pulls all new Change records which have a higher ID than that and runs the normal observer code for each record (and saves the highest looked-at ID to the tmp-file, of course).
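The bookkeeping described above can be sketched in plain Ruby. ChangePoller is a hypothetical name, the rows are plain hashes standing in for Change records, and the block stands in for the observer code; in the app the rows would come from a query for records with an ID above the stored one.

```ruby
require "tmpdir"

# Remembers the highest processed ID in a small state file.
class ChangePoller
  def initialize(state_path)
    @state_path = state_path
  end

  # 0 when the poller has never run.
  def last_seen_id
    File.exist?(@state_path) ? File.read(@state_path).to_i : 0
  end

  # Yields each row newer than the stored ID, oldest first, and records
  # progress after every row so a crash does not replay finished work.
  def poll(rows)
    cutoff = last_seen_id
    fresh = rows.select { |row| row[:id] > cutoff }.sort_by { |row| row[:id] }
    fresh.each do |row|
      yield row
      File.write(@state_path, row[:id].to_s)
    end
    fresh.size
  end
end

seen = []
first_run = second_run = nil

Dir.mktmpdir do |dir|
  poller = ChangePoller.new(File.join(dir, "last_change_id"))
  rows = [{ id: 1 }, { id: 2 }]

  first_run = poller.poll(rows) { |row| seen << row[:id] }
  rows << { id: 3 } # a new record appears between two cron runs
  second_run = poller.poll(rows) { |row| seen << row[:id] }
end
```

Writing the ID after each row rather than once at the end is a design choice: if the observer code raises halfway through a batch, the next run resumes at the failed row.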
As usual with monitoring state changes, there are two approaches: polling and notification. You seem to have chosen the polling way for now (having a cron job look at the state of the database on a regular basis and execute some code if it changed).
You can do the same thing using one of the Rails schedulers. There are a few out there (Google will find them readily); they have various feature sets, so I'll let you choose the one that suits your needs if you go that way.
You could also try to go the notification way, depending on your database. Some databases support triggers combined with external process execution, or specific notification protocols.
In this case you are notified by the database itself that the table changed. There are many such options for various DBMSs in "Getting events from a database".
