How to use sidekiq and queue as searchkick `callbacks` option - ruby-on-rails

I'm using Searchkick to work with Elasticsearch in a Rails 4 app. Searchkick is great and surprisingly easy to use, but some of its options are not well described.
It's quite a heavy-traffic site, so I'm trying to do most of the work asynchronously with Sidekiq.
I'm trying to make the index update after creating/updating a record run asynchronously as well, but the :queue option seems to fit my case even better, as it does bulk updates of the pending records.
So, the docs say to set up Redis and the callbacks option on the model, and:
Then, set up a background job to run.
Searchkick::ProcessQueueJob.perform_later(class_name: "Product")
Where do I put that code?
When I add some records they are invisible until I run that job once, so should it be run on a schedule, as a cron task?

Yes, you should set up a cron job: create a rake task and run it on a schedule. Searchkick's queue callback mode uses a Redis queue; once the job runs, it takes all the "Product" ids accumulated so far and indexes them. So if you want to keep your Product index up to date, you need to perform that job repeatedly.
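For example, a small rake task like the one below (the task name is just an illustration) can be pointed at by cron every few minutes:

# lib/tasks/searchkick.rake -- the task name is an assumption
namespace :searchkick do
  desc "Index the Product ids accumulated in Searchkick's Redis queue"
  task process_queue: :environment do
    Searchkick::ProcessQueueJob.perform_later(class_name: "Product")
  end
end

A crontab entry along the lines of */5 * * * * cd /path/to/app && bundle exec rake searchkick:process_queue would then let the index catch up every five minutes.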

Related

Can I assign my own randomized Job ID to Sidekiq?

I am using Sidekiq to schedule some tasks based on a schedule that the user provides. However, if the user changes the schedule, I want to be able to simply update the old schedule with the new one.
Suggestion one
I saw a suggestion to just find the old job with Sidekiq::ScheduledSet.new.find_job(job_id), but I am trying to avoid having to create a new model just to simply store the job ID and the task.
Suggestion two
Another suggestion I saw was to just have the worker check if the time of the task matches the current time, but that won't work: if the server goes offline, it won't process the delayed jobs when it comes back online, because their times will no longer match the current time.
If I could assign my own job ID, like a hex version of the job name or a padded version of the task ID, then I could easily avoid having to create a new model to store the job IDs, and it would be a lot easier when the user reschedules a task.
Other thoughts
Maybe if I could check the job's at attribute and match that with the task, that may work, but I'm not sure how to access that attribute from within the worker without knowing the job ID.
Edit
I just tried to pull the current job's at attribute, but it looks like once the job kicks off it no longer exists in Sidekiq::ScheduledSet, so it seems there's no way to match this job's time with the Task's time.
I am using Sidekiq to schedule some tasks based on a schedule that the user provides...
There's an extension for that. Sidekiq-Scheduler gives you a cron-like schedule configuration file. Then you can alter the schedule as you see fit. This seems like the best option as it avoids having to write your own scheduler interface.
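If I remember the sidekiq-scheduler API correctly, you can also set or replace a schedule entry at runtime, which maps nicely onto "the user changes the schedule":

# worker name, entry key, and cron expression are made up for illustration
Sidekiq.set_schedule("user_42_report", { "cron" => "30 8 * * *", "class" => "UserReportWorker", "args" => [42] })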
Can I assign my own randomized Job ID to Sidekiq?
Yes, though it's undocumented. You can give Sidekiq::Client.push a jid attribute.
Sidekiq::Client.push('class' => MyWorker, 'args' => [1, 2, 3], 'jid' => ... )
This is not a good way to solve your problem. It's relying on an undocumented feature. And it invites collisions with normal Sidekiq IDs.
Maybe if I could check the job's at attribute and match that with the task, that may work, but I'm not sure how to access that attribute from within the worker without knowing the job ID.
This sounds very error prone. You'd have to store the timestamp in a model anyway. Better to store the job ID in the first place.
I am trying to avoid having to create a new model just to simply store the job ID and the task.
Storing things in models is what Rails does really well, so this would seem to be the way to go. It will take a trivial amount of coding, database storage, and processing. You should have a model, view, and controller for your scheduled jobs anyway; otherwise, how will you create scheduled jobs and view your schedule?
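A minimal sketch of that approach (model, worker, and column names are assumptions): schedule the job, remember its jid on the record, and remove the stale job when the user reschedules.

# Assumed model: ScheduledTask(job_id:string, run_at:datetime); TaskWorker is a hypothetical Sidekiq worker
class ScheduledTask < ActiveRecord::Base
  def reschedule!(new_time)
    # Delete the previously scheduled job if it is still waiting.
    old_job = Sidekiq::ScheduledSet.new.find_job(job_id)
    old_job.delete if old_job

    # Enqueue the replacement and remember its jid.
    new_jid = TaskWorker.perform_at(new_time, id)
    update!(job_id: new_jid, run_at: new_time)
  end
end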
However, the Sidekiq docs note that find_job is "a slow, inefficient operation. Do not use under normal conditions. Sidekiq Pro contains a faster version." This is because it has to iterate through all the jobs in the set.
I had a case where I had to reschedule jobs based on updates from the user, and it was actually pretty slow and complicated.
It's simpler to not reschedule, but instead make the old queued tasks no-ops (no operations) and then queue up the new tasks.
How you do this is basically defined by the logic within the task: you'd have to know somehow that the user updated their schedule, check for that within the old jobs, and based on that check, skip the work.
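A rough sketch of that no-op pattern (class and attribute names are illustrative): bake the schedule the job was created for into its arguments, and have the worker bail out if the task has since been rescheduled.

# Task, its run_at column, and execute! are assumed names
class TaskWorker
  include Sidekiq::Worker

  def perform(task_id, scheduled_for)
    task = Task.find_by(id: task_id)
    return if task.nil?
    # The user rescheduled after this job was enqueued; treat it as a no-op.
    return unless task.run_at.to_i == scheduled_for

    task.execute!
  end
end

# Enqueued elsewhere with the schedule baked into the arguments:
# TaskWorker.perform_at(task.run_at, task.id, task.run_at.to_i)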

Are rake tasks suitable for long running processes in production?

I'm planning on using a rake task to develop a long-running background process for my Rails application. Are rake tasks appropriate for this kind of process? Ideally, I would like to wrap it inside a Linux daemon to be able to start and stop the process easily.
If it's not the best option, what are the alternatives? I'm trying to avoid a cron-based solution so that I don't have to worry about the schedule and the possibility of different running instances of the same process overlapping with each other.
Thanks!
You can try delayed job with this extension.
class MyJob
  include Delayed::ScheduledJob

  run_every 1.day

  def display_name
    "MyJob"
  end

  def perform
    # code to run ...
  end
end
Or manually enqueue another job, with Time.now + 5.minutes for example, inside the perform method once the current job's work is finished.
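That re-enqueue-yourself idea could look roughly like this (the job name is made up; assumes delayed_job's Delayed::Job.enqueue with a run_at option):

# RecurringCleanupJob is an illustrative name
class RecurringCleanupJob
  def perform
    # ... the actual work ...

    # Queue the next run once this one has finished.
    Delayed::Job.enqueue(RecurringCleanupJob.new, run_at: 5.minutes.from_now)
  end
end

# Kick off the first run:
Delayed::Job.enqueue(RecurringCleanupJob.new)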
Have you looked at the delayed_job gem?
https://github.com/collectiveidea/delayed_job
From their documentation:
Delayed::Job (or DJ) encapsulates the common pattern of asynchronously executing longer tasks in the background.
It is a direct extraction from Shopify where the job table is responsible for a multitude of core tasks. Amongst those tasks are:
sending massive newsletters
image resizing
http downloads
updating smart collections
updating solr, our search server, after product changes
batch imports
spam checks
It might depend on the kind of background jobs you need to run.
Basically, if you need some sort of post-processing on data the users enter, like rendering images for posts, or some async integration with third-party resources, then you're better off using Sidekiq (yeah, it's better than DelayedJob, as people suggested).
But if you need to run something on a schedule, like, say, nightly downloads or cleaning up blocked users, then writing a rake task and kicking it off with a cron job might be a perfectly useful option, since you can also run those tasks from the CLI whenever you need them on demand.
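For instance, a nightly cleanup can be nothing more than a rake task (the names below are made up) that cron, or you at the console, can invoke:

# lib/tasks/maintenance.rake -- task name and query are illustrative
namespace :maintenance do
  desc "Remove users that have been blocked for more than 30 days"
  task purge_blocked_users: :environment do
    User.where(blocked: true).where("blocked_at < ?", 30.days.ago).find_each(&:destroy)
  end
end

Cron would run something like cd /path/to/app && bundle exec rake maintenance:purge_blocked_users once a night, and the same command works from the CLI on demand.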

rails periodic task

I have a ruby on rails app in which I'm trying to find a way to run some code every few seconds.
I've found lots of info and ideas using cron, or cron-like implementations, but these are only accurate down to the minute, and/or require external tools. I want to kick the task off every 15 seconds or so, and I want it to be entirely self contained within the application (if the app stops, the tasks stop, and no external setup).
This is being used for background generation of cache data. Every few seconds, the task will assemble some data, and then store it in a cache which gets used by all the client requests. The task is pretty slow, so it needs to run in the background and not block client requests.
I'm fairly new to ruby, but have a strong perl background, and the way I'd solve this there would be to create an interval timer & handler which forks, runs the code, and then exits when done.
It might be even nicer to just simulate a client request and have the rails controller fork itself. This way I could kick off the task by hitting the URI for it (though since the task will be running every few seconds, I doubt I'll ever need to, but might have future use). Though it would be trivial to just have the controller call whatever method is being called by the periodic task scheduler (once I have one).
I'd suggest the whenever gem https://github.com/javan/whenever
It allows you to specify a schedule like:
every 15.minutes do
  MyClass.do_stuff
end
Whenever writes the crontab entries for you, so there's no hand-editing of cron jobs or monkeying with external services.
Generally speaking, there's no built-in way that I know of to create a periodic task within the application. Rails is built on Rack and it expects to receive HTTP requests, do something, and then return. So you just have to manage the periodic task externally yourself.
I think, given the frequency at which you need to run the task, a decent solution could be just to write yourself a simple rake task that loops forever and to kick it off at the same time that you start your application, using something like Foreman. Foreman is often used like this to manage starting up/shutting down background workers along with their apps. In production, you may want to use something else to manage the processes, like Monit.
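A sketch of that setup, assuming the cache-building code lives in a class of your own (CacheWarmer here is hypothetical):

# lib/tasks/cache_warmer.rake -- loops until the process is stopped
task cache_warmer: :environment do
  loop do
    CacheWarmer.refresh   # hypothetical class that rebuilds the cached data
    sleep 15
  end
end

In a Foreman Procfile this would sit next to the web process, e.g. a line like worker: bundle exec rake cache_warmer, so both start and stop together.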
You can either write your own method, something like:
class MyWorker
  def self.work
    loop do
      # do your work here
      sleep 15
    end
  end
end
Run it with rails runner MyWorker.work
There will be a separate process running in the background
Or you can use something like Resque, but that's a different approach. It works like this: something adds a task to the queue, and meanwhile a worker fetches whatever job is in the queue and tries to finish it.
So it depends on your own needs.
I know it is an old question, but maybe this answer could be helpful for someone. There is a gem called crono.
Crono is a time-based background job scheduler daemon (just like Cron) for Ruby on Rails.
Crono is pure Ruby. It doesn't use Unix Cron and other platform-dependent things. So you can use it on all platforms supported by Ruby. It persists job states to your database using Active Record. You have full control of jobs performing process. It's Ruby, so you can understand and modify it to fit your needs.
The awesome thing about crono is that its code is self-explanatory. In order to run a task periodically you can just do:
Crono.perform(YourJob).every 2.days
Maybe you can also do:
Crono.perform(YourJob).every 30.seconds
Anyway you really can do a lot of things. Another example could be:
Crono.perform(TestJob).every 1.week, on: :monday, at: "15:30"
I suggest this gem instead of whenever because whenever uses the Unix cron table, which is not always available.
Throwing out a solution just because it looks somewhat elegant and answers the question without any extra gems. In my scenario I wanted to run some code, but only after all my Sidekiq workers were done doing their thing.
First I defined a method to check if any workers were working...
def workers_working?
  workers = Sidekiq::Workers.new.map do |_process_id, _thread_id, work|
    work
  end
  workers.size > 0
end
Then we just call the method with a loop which sleeps between calls.
sleep 5 while workers_working?
Use something like delayed job, and requeue it every so often?
Use Thin or another server that uses EventMachine, then just use the timers that are part of EventMachine. For example, in config/application.rb:
EM.add_periodic_timer(2) do
  do_this_every_2_sec
end

Destroying all delayed job in rails

I am using collectiveidea's delayed_job with Rails 2.3.8. I am creating an array of delayed jobs to perform some tasks; after some time I want to destroy all the delayed jobs which are running.
If anyone knows the way to do this, please help me.
You can invoke rake jobs:clear to delete all jobs in the queue.
In addition to the rake task, DelayedJob jobs are just a normal ActiveRecord model, so if you're in Ruby code you can do what you like with them:
Delayed::Job.destroy_all
Delayed::Job.delete_all
Delayed::Job.find(4).destroy
# etc.
Sounds like you've got a parent process that wants to time out if its set of jobs doesn't complete within a certain time. Instead of hanging on to references to the jobs themselves, set a flag on a model that indicates that the process has given up. Jobs can check for that flag and short circuit if they're not needed anymore. (Your Job class should also wrap the contents of its #perform method in a timeout.)
It's almost always a bad idea to try to hang on to references to DJ objects as you seem to be suggesting.
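A rough sketch of the flag-and-short-circuit idea (the model, its cancelled? flag, and the job names are assumptions):

require "timeout"

# Batch#cancelled? and Item#process! are assumed names for illustration
class ProcessItemJob < Struct.new(:batch_id, :item_id)
  def perform
    batch = Batch.find(batch_id)
    # The parent process gave up on this batch; do nothing.
    return if batch.cancelled?

    Timeout.timeout(300) do  # 5 minutes
      Item.find(item_id).process!
    end
  end
end

# Enqueued elsewhere with:
# Delayed::Job.enqueue(ProcessItemJob.new(batch.id, item.id))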

Some questions about using resque

I am using Resque to run a background process. This is how my background process works:
Scans through all the rows in an ActiveRecord model
Checks for a condition
Updates the row if the condition is met
And this needs to go on infinitely.
This is how I am trying to use Resque for my purpose; here's my worker class:
class ThumbnailMaker
  @queue = :thumbnail_queue

  def self.perform
    MyObj.check_thumbnails(root_url)
  end
end
I understand the perform() method keeps a task in a queue, which is run periodically. In my case, I need a task that scans the whole table, so it runs for a longer time. Is it a good solution to my requirements?
On another note, I need the root url of my Rails application, which is easily obtained with root_url in a Rails controller. But I need it in a class I have created; can you suggest how I can get it there?
Resque is for queueing tasks to be run in the background; each item in the queue runs once and then is removed. What you want is more like a scheduled task: for example, a custom Rake task or other script that runs from time to time. There are many scheduling gems available for this kind of thing (whenever is very popular), or just use cron. There is a great RailsCasts episode about this very topic.
You might want to try putting your code in a rake task and running it periodically through a cron job. Resque/Redis seems a bit too much for your needs.
You may consider passing the root url in as a parameter if you are calling your class through your controller. Otherwise, you may want to set it as an ENV variable and configure each of your deployments accordingly.
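For example (the environment variable name is an assumption), the worker could take the URL as an argument and fall back to an ENV value:

class ThumbnailMaker
  @queue = :thumbnail_queue

  # APP_ROOT_URL is an assumed environment variable name
  def self.perform(root_url = ENV["APP_ROOT_URL"])
    MyObj.check_thumbnails(root_url)
  end
end

# From a controller, passing the request's root url explicitly:
# Resque.enqueue(ThumbnailMaker, root_url)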
