How to move Sidekiq (Redis) queue to another server? - ruby-on-rails

We use Sidekiq as the queue manager in our Rails application.
We also use Sidetiq to manage scheduled and recurring tasks.
At the moment there are around 200-300 scheduled tasks that will run anywhere from a couple of minutes to 30 days from now.
I would simply transfer the Redis database rdb file, but due to some configuration changes the Rails project path has changed (hence the tasks will not be able to run anymore).
What is the preferred way to transfer the whole scheduled-task queue so that it works with the new project path? Moving the tasks manually is not an option.
Ruby 2.1.6
Rails 3.2.22
Sidekiq 3.4.2
Redis 2.8.4

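Use DUMP and RESTORE. RESTORE takes the key, then the TTL, then the serialized value, which redis-cli -x reads from stdin:
redis-cli -h source_host --raw dump schedule | head -c-1 | redis-cli -h dest_host -x restore schedule 0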
http://redis.io/commands/restore

You can copy your Redis dump file as you said. It's not clear to me why you are excluding that option.
Doing it manually (just write a Ruby script for it) should also be pretty easy. The only thing you have to do is move the Redis sorted sets retry and schedule.
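A minimal sketch of such a script, assuming a client that behaves like the redis gem's (`zrange` with `with_scores: true`, and `zadd`). The helper name `copy_sorted_set` and the host names are made up for illustration, and a Sidekiq namespace, if one is configured, would have to be prefixed to the keys.

```ruby
# Copies every member of a Redis sorted set from one connection to another,
# preserving scores (for Sidekiq these are the Unix times at which jobs run).
# `source` and `dest` only need to respond to zrange/zadd the way the
# redis gem's client does.
def copy_sorted_set(source, dest, key)
  entries = source.zrange(key, 0, -1, with_scores: true) # [[member, score], ...]
  entries.each { |member, score| dest.zadd(key, score, member) }
  entries.size
end

# Hypothetical usage with the redis gem (host names are placeholders):
#   require "redis"
#   old_redis = Redis.new(host: "old-server")
#   new_redis = Redis.new(host: "new-server")
#   %w[schedule retry].each { |key| copy_sorted_set(old_redis, new_redis, key) }
```

Stopping the Sidekiq processes on both sides while the copy runs avoids moving a job that is being enqueued at the same moment.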

My main concern was that I couldn't just copy the Redis database because the path of my project had changed, but as it turned out that wasn't an issue.
The fastest way for me to replicate the db was to first bind Redis on the old server to either its IP or 0.0.0.0,
and then on the new server run
redis-cli slaveof OLD_SERVER_IP 6379
and then, when everything is copied (copying takes a matter of seconds), run
redis-cli slaveof no one
Tada. Your Redis db is fully replicated.
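Rather than guessing when "everything is copied", the new server's replication status can be checked before running slaveof no one. The following is a sketch only: it parses the text printed by redis-cli info replication and looks at the role, master_link_status and master_sync_in_progress fields. The helper names are made up for this example.

```ruby
# Parses the text returned by `redis-cli info replication` into a Hash.
def parse_redis_info(text)
  text.each_line.with_object({}) do |line, info|
    line = line.strip
    next if line.empty? || line.start_with?("#") # skip blanks and section headers
    key, value = line.split(":", 2)
    info[key] = value
  end
end

# True once the replica reports a live link to the master and no sync in progress.
def replica_synced?(info_text)
  info = parse_redis_info(info_text)
  info["role"] == "slave" &&
    info["master_link_status"] == "up" &&
    info["master_sync_in_progress"] == "0"
end

# Hypothetical usage on the new server:
#   sleep 1 until replica_synced?(`redis-cli info replication`)
#   system("redis-cli", "slaveof", "no", "one")
```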

Related

Sidekiq set-up from localhost to deployment

Context: First time usage of ActiveJob via Sidekiq and Redis.
Situation: Sidekiq and redis are installed and running (route /sidekiq does generate the control panel) and is polling.
Issue(s?): Installing and enabling these on a remote server is rather opaque, which may explain why a default defined job is not executing although the server is idle.
Maybe the sequence of going through Rails, then Sidekiq, then redis documentation was mistaken.
Uncertainty #1: However, I finally picked up that redis-server has to be launched. But what is not clear is how this can be part of the deployment process and a server start-up process to ensure this is running without manual input?
Uncertainty #2: Documentation indicates that sidekiq should be started via
bundle exec sidekiq -q critical -q high -q default -q low
again, what is not clear is how this can be part of the deployment process and a server start-up process to ensure this is running without manual input?
Uncertainty #3: if one of the above two is done improperly it may explain the following behaviour.
After starting sidekiq with the above command (this is assuming 4 levels of priority are needed. In the present context they are not; but I wanted to test and observe behaviour) and through an action activating a job
GenerateCodeJob.perform_later(item, shop)
=> it is found in Sidekiq's queues, under the Queue default and sits there for yonks.
My understanding is that jobs are performed based on server resource availability. But that is clearly not the case.
So what are the priorities and what type of timings are involved?
Or could the set-up be mistaken?

Where should I run scheduled background jobs?

Here in my company we have our regular application in AWS EBS with some background jobs. The problem is that these jobs are starting to get heavier, and we were thinking of separating them from the application. The question is: where should we run them?
We were thinking of doing it in AWS Lambda, but then we would have to port our Rails code to Python, Node or Java, which seems to be a lot of work. What are the other options? Should we just create another EC2 environment for the jobs? Thanks in advance.
Edit: I'm using the shoryuken gem (http://github.com/phstc/shoryuken) integrated with SQS. It currently has a memory leak and my application sometimes goes down; I don't know whether the memory leak is the cause, though. We have already split the application into an API part in EBS and a front-end part in S3.
Normally, just another EC2 instance with a copy of your Rails app, where instead of rails s to start the web server, you run rake resque:work or whatever your job runner start command is. Both would share the same Redis instance and database so that your web server writes the jobs to the queue and the worker picks them up and runs them.
If you need more workers, just add more EC2 instances pointing to the same Redis instance. I would advise separating your jobs by queue name, so that one worker can just process fast stuff e.g. email sending, and others can do long running or slow jobs.
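As an illustration of separating jobs by queue name, here is a sketch that assumes Resque is the job runner (the answer mentions rake resque:work only as one possibility). The job class names and queue names are made up.

```ruby
# Resque picks the queue from the @queue class instance variable and calls
# the class-level perform method with the enqueued arguments.
class WelcomeEmailJob
  @queue = :fast

  def self.perform(user_id)
    "sending welcome email to user #{user_id}"
  end
end

class ReportExportJob
  @queue = :slow

  def self.perform(report_id)
    "exporting report #{report_id}"
  end
end

# Each worker instance then listens only to its own queue, for example:
#   QUEUE=fast bundle exec rake resque:work   # on the lightweight worker
#   QUEUE=slow bundle exec rake resque:work   # on the heavy worker
```

With this split, a backlog of slow exports cannot delay email delivery, because the two kinds of job never wait in the same list.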
We had a similar requirement. For us it was the Sidekiq background jobs that started to get very heavy, so we split them into a separate OpsWorks stack with a simple recipe to build the machine dependencies (Ruby, MySQL, etc.). Since we don't have to worry about load balancers and requests timing out, it's OK for all machines to deploy at the same time.
Another thing you could use in OpsWorks is scheduled machines (if the jobs are needed at certain times of day): have the machine provisioned a few minutes before the task is due, and shut it down automatically once the task is done. That would reduce your cost.
EB also has a different type of application, the worker application. You could check that out too, but honestly I haven't looked into it, so I can't tell you its pros and cons.
We recently went down that route. I dockerized our Rails app and wrote a custom entrypoint for the Docker container. In summary, the entrypoint parses the commands that follow docker run IMAGE_NAME.
For example, if you run docker run IMAGE_NAME sb rake do-something-magical, the entrypoint understands that it should run the rake job with the sandbox environment config. If you only run docker run IMAGE_NAME, it will do rails s -b 0.0.0.0.
PS: I wrote a custom entrypoint because we have 3 different environments; the entrypoint downloads environment-specific config from S3.
I then set up an ECS cluster and wrote a task-runner job on Lambda. This Lambda function schedules a task on the ECS cluster, and we trigger the Lambda from CloudWatch Events. You can send custom payloads to Lambda when using CloudWatch Events.
It sounds complicated but implementation is really simple.
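The argument handling of such an entrypoint might look roughly like the sketch below. This is not the author's script: only the sb code and the rails s -b 0.0.0.0 default come from the answer, while the other environment codes, the default environment and the function name are assumptions made for illustration.

```ruby
# Maps short environment codes to Rails environments. "sb" comes from the
# answer; the other two entries are made-up placeholders.
ENVIRONMENTS = { "sb" => "sandbox", "st" => "staging", "pr" => "production" }.freeze
DEFAULT_COMMAND = %w[rails s -b 0.0.0.0].freeze

# Returns [rails_env, command] for the arguments given after the image name.
# The default environment is an assumption; the answer does not state one.
def parse_entrypoint_args(argv, default_env: "production")
  args = argv.dup
  env = ENVIRONMENTS.key?(args.first) ? ENVIRONMENTS[args.shift] : default_env
  command = args.empty? ? DEFAULT_COMMAND.dup : args
  [env, command]
end

# A real entrypoint would then fetch that environment's config and hand over
# to the command, for example:
#   rails_env, command = parse_entrypoint_args(ARGV)
#   exec({ "RAILS_ENV" => rails_env }, *command)
```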
You may consider submitting your tasks to AWS SQS; you can then use an Elastic Beanstalk worker environment to process your background tasks.
Elastic Beanstalk supports Rails applications:
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create_deploy_Ruby_rails.html
Depending on what kind of work these background jobs perform, you might want to think about extracting those functions into microservices if you are running your jobs on a different instance anyway.
Here is a good codeship blog post on how to approach this.
For simple mailer type stuff, this definitely feels a little heavy handed, but if the functionality is more complex, e.g. general notification of different clients, it might well be worth the overhead.

How the Sidekiq server process pulls jobs from the queue in Redis?

I have two Rails applications running on two different instances (let's say Server1 and Server2); they have similar code and share the same PostgreSQL DB.
I installed Sidekiq and push jobs to the queue from both servers, but I run the Sidekiq process only on Server1.
I have a single Redis server running on Server1, which Server2 also uses.
If a job is pushed from Server2, it gets processed by Server1's Sidekiq process, which is what I actually wanted.
My question is:
How does the Sidekiq process on Server1 know that a job has been pushed to Redis?
Does the Sidekiq process continuously check the Redis server for new jobs, or does the Redis server notify the Sidekiq process about the new job?
I got confused and amazed about this!
Could anyone please clarify how Sidekiq gets jobs from the Redis server?
It will be helpful for newbies like me.
Sidekiq uses the Redis command BRPOP.
This command gets an element from a list (which is your job queue). And if the list is empty, it waits for element to appear and then pops/returns it. This also works with multiple queues at the same time.
So no, sidekiq does not poll redis and redis does not push notifications to sidekiq.
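The blocking fetch can be sketched as follows. This is a simplified illustration rather than Sidekiq's actual fetch code; it assumes a client whose brpop behaves like the redis gem's (returning [list_key, value], or nil on timeout) and relies on Sidekiq storing each queue as a list named queue:<name> holding JSON payloads. The function name and timeout value are made up.

```ruby
require "json"

# Blocks for up to `timeout` seconds waiting for a job on any of the queues,
# checked in the order given. Returns the decoded job Hash, or nil on timeout.
def fetch_job(redis, queue_names, timeout: 2)
  keys = queue_names.map { |name| "queue:#{name}" }
  _key, payload = redis.brpop(*keys, timeout: timeout)
  return nil unless payload
  JSON.parse(payload)
end

# Hypothetical usage with the redis gem:
#   require "redis"
#   job = fetch_job(Redis.new(host: "server1"), %w[critical default])
#   puts "#{job['class']} with #{job['args'].inspect}" if job
```

Because the pop is atomic on the Redis side, each job is handed to exactly one of the waiting Sidekiq processes.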
Sidekiq uses a polling mechanism to check Redis for scheduled jobs and retries that have become due. The default polling interval averages 5 seconds and can be adjusted; the default is defined in lib/sidekiq/config.rb:
# lib/sidekiq/config.rb
average_scheduled_poll_interval: 5
By the way, jobs are stored in Redis as a list and Sidekiq retrieves them by using the BRPOP (Blocking Right Pop) command to avoid any race conditions. This ensures that multiple Sidekiq processes running on different instances are able to retrieve the jobs in a coordinated manner.
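To make the distinction concrete: scheduled jobs sit in the schedule sorted set with their run time as the score, and a poller moves the due ones onto their queue list, where BRPOP picks them up. The sketch below follows that idea in simplified form and is not Sidekiq's actual implementation. It assumes a client with the redis gem's zrangebyscore, zrem and lpush signatures, and the function name is made up.

```ruby
require "json"

# Moves every job in `set_name` whose score (run-at time) is <= `now` onto
# the list for its queue. Returns the number of jobs moved.
def enqueue_due_jobs(redis, set_name = "schedule", now = Time.now.to_f)
  moved = 0
  loop do
    payload = redis.zrangebyscore(set_name, "-inf", now.to_s, limit: [0, 1]).first
    break unless payload
    # zrem succeeds for only one caller, so two pollers running at the same
    # time cannot both enqueue the same job.
    next unless redis.zrem(set_name, payload)
    queue = JSON.parse(payload).fetch("queue", "default")
    redis.lpush("queue:#{queue}", payload)
    moved += 1
  end
  moved
end
```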

Rails 4.2 load balancing with nginx redis and sidekiq

Hi I just launched a rails 4 application which uses nginx as load balancer with thin serving rails on 2 ports. Additionally I use redis as cache which is also getting used by sidekiq.
I was wondering how I can scale up by using another machine to run two more Rails applications there. My idea is simply to run two more Rails applications on another machine, but the headache comes with Redis, since Sidekiq makes heavy use of it. My first idea was to have another Redis slave, which is read-only, on the second machine. But this might be error prone, since I have a lot of writes into Redis in order to check a worker queue.
The following scenario kind of confuses me. The web app makes a request and triggers sidekiq which performs a long running action, it continuously updates the status in redis. The web client polls the app every second in order to get the status. Now it could be possible that the request gets redirected to the second machine with the redis slave which is not yet updated. So I was wondering how would be the best setup, just using one redis instance taking into account latency or run a redis slave?
You have two machines:
MachineA running thin and sidekiq.
MachineB running thin and sidekiq.
Now you install redis on MachineA and point Sidekiq to MachineA for Redis. Both Sidekiqs will talk to Redis on MachineA. See Using Redis for more detail.
Side note: A redis slave is useful for read-only debugging but isn't useful for scaling Sidekiq.
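Pointing both machines at the same Redis could look like the sketch below, placed in an initializer such as config/initializers/sidekiq.rb on MachineA and MachineB alike. The environment variable name SIDEKIQ_REDIS_URL and the host name are assumptions made for this example; Sidekiq.configure_server and Sidekiq.configure_client with config.redis are Sidekiq's own configuration hooks.

```ruby
# Builds the Redis options both machines hand to Sidekiq. SIDEKIQ_REDIS_URL
# and the default host name are made up for this example.
def sidekiq_redis_options(env = ENV)
  { url: env.fetch("SIDEKIQ_REDIS_URL", "redis://machine-a.internal:6379/0") }
end

# Only runs inside an app where the sidekiq gem is loaded.
if defined?(Sidekiq)
  Sidekiq.configure_server { |config| config.redis = sidekiq_redis_options }
  Sidekiq.configure_client { |config| config.redis = sidekiq_redis_options }
end
```

The client block matters as much as the server block: the thin processes on both machines enqueue through the client configuration, so the status updates and the queue all live in the one Redis instance.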

Keeping rake jobs:work running

I'm using delayed_job to run jobs, with new jobs being added every minute by a cronjob.
Currently I have an issue where the rake jobs:work task, currently started with 'nohup rake jobs:work &' manually, is randomly exiting.
While God seems to be a solution to some people, the extra memory overhead is rather annoying and I'd prefer a simpler solution that can be restarted by the deployment script (Capistrano).
Is there some bash/Ruby magic to make this happen, or am I destined to run a monitoring service on my server, with some horrid hacks to give the unprivileged account the site deploys to the ability to restart it?
I'd suggest you use foreman. It allows you to start any number of jobs in development by using foreman start, and then export your configuration (number of processes per type, limits, etc.) as upstart scripts, making them available to Ubuntu's upstart (why invoke God when the operating system already has this for free?).
The configuration file, Procfile, is also exactly the same file Heroku uses for process configuration, so with just one file you get three process management systems covered.
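For comparison, the "Ruby magic" the question asks about can be as small as a respawn loop. This is a different technique from foreman and upstart, shown only as a sketch: it restarts the command whenever it exits but does nothing about logging, boot-time startup or crash loops, which is what a process manager adds. The function name and parameters are made up.

```ruby
# Runs `command` and starts it again each time it exits, pausing `delay`
# seconds in between. `max_restarts: nil` means restart forever.
# Returns the number of times the command was run.
def keep_running(command, max_restarts: nil, delay: 5)
  runs = 0
  loop do
    system(*command) # blocks until the child process exits
    runs += 1
    break if max_restarts && runs > max_restarts
    sleep delay
  end
  runs
end

# Hypothetical usage, e.g. from a script started by the deploy:
#   keep_running(%w[bundle exec rake jobs:work])
```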
