Jobs are executed many times on an AWS worker - ruby-on-rails

I'm running some jobs in an AWS worker environment, and for some reason each job is executed multiple times. I can tell because I record the job's state in the database, and I saw it go from running to completed and then back to running, even though I launched it only once. My application is built in Ruby on Rails; I'm using Active Job with the active_elastic_job gem since I'm on Elastic Beanstalk. Anyone have any idea? I can provide any info you need.

Related

Rails/Capistrano: check if sidekiq is running on an EC2 instance

I'm using an EC2 instance to host a Rails application, deploying with Capistrano. I've already set up Sidekiq and it works fine. However, sometimes on deploy, and sometimes sporadically, Sidekiq stops running, and I don't notice until tasks that depend on it fail to run.
I could add a check on deploy, but if Sidekiq dies at some point after the deploy, that would still be a problem.
In this scenario, what is the best way to check periodically whether Sidekiq is running, and to restart it if it isn't?
I thought of writing a bash script for that, but apparently, when I run sidekiq from the command line, it spawns a process whose PID differs from the one initially launched... so I think it could get messy.
Any help is appreciated. Thanks!
Learn and use systemd to manage the service.
https://github.com/mperham/sidekiq/wiki/Deployment#running-your-own-process
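As a rough sketch, a unit file along these lines lets systemd restart Sidekiq automatically whenever it dies (the paths, user, and environment here are assumptions; adapt them to your deploy layout):

# /etc/systemd/system/sidekiq.service
[Unit]
Description=sidekiq
After=syslog.target network.target

[Service]
Type=simple
WorkingDirectory=/var/www/myapp/current
ExecStart=/usr/local/bin/bundle exec sidekiq -e production
User=deploy
Restart=on-failure

[Install]
WantedBy=multi-user.target

Enable it with systemctl enable sidekiq and start it with systemctl start sidekiq. The Restart=on-failure directive is what replaces the "check periodically and relaunch" bash script.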

Where should I run scheduled background jobs?

Here at my company we have our regular application on AWS Elastic Beanstalk, with some background jobs. The problem is that these jobs are starting to get heavier, and we're thinking of separating them from the application. The question is: where should we run them?
We were thinking of AWS Lambda, but then we would have to port our Rails code to Python, Node, or Java, which seems like a lot of work. What are the other options? Should we just create another EC2 environment for the jobs? Thanks in advance.
Edit: I'm using the shoryuken gem (http://github.com/phstc/shoryuken) integrated with SQS. But it currently has a memory leak and my application goes down sometimes; I don't know if the memory leak is the cause, though. We already separated the application into an API part on Elastic Beanstalk and a front-end part on S3.
Normally, just another EC2 instance with a copy of your Rails app, where instead of rails s to start the web server, you run rake resque:work or whatever your job runner's start command is. Both share the same Redis instance and database, so the web server writes jobs to the queue and the worker picks them up and runs them.
If you need more workers, just add more EC2 instances pointing at the same Redis instance. I would advise separating your jobs by queue name, so that one worker processes only fast stuff, e.g. email sending, while others handle long-running or slow jobs.
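For illustration, a minimal sketch of that queue split using Active Job with the Resque adapter (the job, mailer, and queue names here are made up):

# app/jobs/welcome_email_job.rb -- route quick jobs to a dedicated queue
class WelcomeEmailJob < ApplicationJob
  queue_as :mailers
  def perform(user_id)
    # UserMailer is a hypothetical mailer for this example
    UserMailer.welcome(User.find(user_id)).deliver_now
  end
end

# On a "fast" worker instance, process only the mailers queue:
#   QUEUE=mailers bundle exec rake resque:work
# On a "slow" worker instance, process the heavy queues:
#   QUEUE=reports,imports bundle exec rake resque:work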
We had a similar requirement; for us it was the Sidekiq background jobs. They started to get very heavy, so we split them into a separate OpsWorks stack, with a simple recipe to build the machine dependencies (Ruby, MySQL, etc.). Since we don't have to worry about load balancers and requests timing out, it's fine for all machines to deploy at the same time.
Another thing you could use in OpsWorks is scheduled machines (if the jobs are needed at certain times during the day): the machine gets provisioned a few minutes before the task is due, and after the task is done you can have it shut down automatically, which reduces your cost.
EB also has a different application type, the worker environment. You could check that out too, but honestly I haven't looked into it, so I can't tell you its pros and cons.
We recently went down that route. I Dockerized our Rails app and wrote a custom entrypoint for the container. In short, the entrypoint parses the commands you pass after docker run IMAGE_NAME.
For example, if you run docker run IMAGE_NAME sb rake do-something-magical, the entrypoint understands that it should run the rake task with the sandbox environment config. If you only run docker run IMAGE_NAME, it runs rails s -b 0.0.0.0.
PS: I wrote a custom entrypoint because we have 3 different environments; the entrypoint downloads environment-specific config from S3.
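As a sketch, an entrypoint along those lines could look like this (the sb prefix and the default command follow the description above; a real one would also fetch the environment config from S3):

#!/usr/bin/env ruby
# docker/entrypoint.rb -- parse the arguments passed to `docker run`
case ARGV.first
when "sb"
  # "sb" prefix selects the sandbox environment, then runs the rest
  ENV["RAILS_ENV"] = "sandbox"
  exec(*ARGV.drop(1))
when nil
  # no arguments: default to the web server
  exec("rails", "s", "-b", "0.0.0.0")
else
  exec(*ARGV)
end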
I also set up an ECS cluster and wrote a task-runner function on Lambda; that function schedules a task on the ECS cluster, and we trigger the Lambda from CloudWatch Events. You can send custom payloads to the Lambda when using CloudWatch Events.
It sounds complicated, but the implementation is really simple.
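A sketch of such a task-runner Lambda in Ruby (the cluster, task definition, and container names are placeholders):

require 'aws-sdk-ecs'

# Lambda handler: receives the CloudWatch Events payload and starts
# an ECS task that runs the requested command in the Rails image.
def handler(event:, context:)
  ecs = Aws::ECS::Client.new
  ecs.run_task(
    cluster: 'jobs-cluster',           # placeholder
    task_definition: 'rails-jobs',     # placeholder
    overrides: {
      container_overrides: [{
        name: 'app',                       # container name in the task definition
        command: event.fetch('command')    # e.g. ["rake", "do-something-magical"]
      }]
    }
  )
end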
You may consider submitting your tasks to AWS SQS, then using an Elastic Beanstalk worker environment to process your background tasks.
Elastic Beanstalk supports Rails applications:
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create_deploy_Ruby_rails.html
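A minimal sketch of the enqueue side with the aws-sdk-sqs gem (the region, queue URL, and message shape are placeholders); the worker environment's daemon then POSTs each message to your app over HTTP:

require 'aws-sdk-sqs'
require 'json'

# Push a job message onto the queue the worker environment reads from
sqs = Aws::SQS::Client.new(region: 'us-east-1')   # placeholder region
sqs.send_message(
  queue_url: 'https://sqs.us-east-1.amazonaws.com/123456789012/jobs-queue',  # placeholder
  message_body: { job: 'ReportJob', args: [42] }.to_json
)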
Depending on what kind of work these background jobs perform, you might want to think about extracting those functions into microservices if you are running your jobs on a different instance anyway.
Here is a good Codeship blog post on how to approach this.
For simple mailer-type stuff this definitely feels a little heavy-handed, but if the functionality is more complex, e.g. general notification of different clients, it might well be worth the overhead.

How does the Sidekiq server process pull jobs from the queue in Redis?

I have two Rails applications running on two different instances (let's say Server1 and Server2); they have similar code and share the same PostgreSQL DB.
I installed Sidekiq and push jobs onto the queue from both servers, but I run the Sidekiq process only on Server1.
I have a single Redis server, running on Server1 and shared with Server2.
If a job is pushed from Server2, it gets processed by Server1's Sidekiq process, which is exactly what I wanted.
My question is:
How does the Sidekiq process on Server1 know that a job has been pushed to Redis?
Does the Sidekiq process continuously check the Redis server for new jobs, or does the Redis server notify the Sidekiq process about new jobs?
I'm confused and amazed by this!
Could anyone please clarify how Sidekiq gets jobs from the Redis server?
It would be helpful for newbies like me.
Sidekiq uses the Redis command BRPOP.
This command pops an element from a list (which is your job queue). If the list is empty, it blocks and waits for an element to appear, then pops and returns it. It also works across multiple queues at the same time.
So no, Sidekiq does not poll Redis, and Redis does not push notifications to Sidekiq.
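To make that concrete, a small sketch with the redis gem (the queue names follow Sidekiq's queue:<name> convention; the payload is made up):

require 'redis'

redis = Redis.new

# Producer (your Rails app) pushes a job payload onto a list:
redis.lpush('queue:default', '{"class":"HardJob","args":[1]}')

# Consumer (a worker) blocks until any of the listed queues has an
# element, then atomically pops and returns it -- no busy polling:
queue, payload = redis.brpop('queue:critical', 'queue:default', timeout: 2)
puts "popped #{payload} from #{queue}" if payload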
Sidekiq does use a polling mechanism, but only for scheduled and retry jobs: a poller checks the scheduled and retry sets on average every 5 seconds by default. That default is defined in lib/sidekiq/config.rb and can be overridden in your own configuration:
# lib/sidekiq/config.rb
average_scheduled_poll_interval: 5
Ordinary jobs, by the way, are stored in Redis as a list, and Sidekiq retrieves them using the BRPOP (blocking right pop) command, which avoids race conditions: the pop is atomic, so multiple Sidekiq processes running on different instances retrieve jobs in a coordinated manner.
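If you do want to tune that interval, a sketch of the override (assuming Sidekiq 7's configuration API; the value is arbitrary):

# config/initializers/sidekiq.rb
Sidekiq.configure_server do |config|
  # poll the scheduled/retry sets roughly every 15s instead of 5s
  config[:average_scheduled_poll_interval] = 15
end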

Dockerizing Delayed Job

We are currently working on Dockerizing our Ruby on Rails application, which also includes Delayed Job. A question buzzing within our development team is whether and/or how to Dockerize the Delayed Job component separately from the application.
This would allow us to spin up new Delayed Job containers when necessary to handle high traffic in the jobs queue. In addition, since Delayed Job boots the full Rails application each time it starts up, we thought the following benefits would follow:
The Delayed Job container might start up quicker
Application code would start up regardless of the Delayed Job container startup time
So I know a guy responsible for a Rails app that uses Delayed Job. When it came time to Dockerize said app, it got a container for each: both containers use the same codebase, but one runs the frontend and one runs the jobs. It's not devops microservice-riffic, but it works.
Beyond the logical separation between the two, Docker containers are supposed to have only a single process running inside. We could've hacked it together, but it seemed wrong to break a Docker fundamental right out of the gate.
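A sketch of that layout in docker-compose terms (the service names are illustrative; rake jobs:work is Delayed Job's standard worker task):

# docker-compose.yml -- one image, two containers
services:
  web:
    build: .
    command: rails s -b 0.0.0.0
  worker:
    build: .
    command: rake jobs:work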

Sidekiq servers in different machines consuming from one Redis

Right now I have a Rails application hosted on Heroku, and I also manage the Sidekiq workers from there.
I am thinking of having Rails on Heroku act as a client that pushes jobs to the Redis queue, and then having N machines running Sidekiq daemons that take jobs from that Redis queue.
My questions are:
Can I have several Sidekiq daemons processing jobs from the same queue? I can't find an example that shows how to do this. I have seen sidekiq pure ruby, but I fail to see how I would write the server so that it starts fetching jobs.
(This is what concerns me most.) Once a job is done, how do I send the results back to Rails? I am thinking of POSTing the result as JSON, but I would appreciate any feedback on this too.
Finally, if it is possible to have several Sidekiq servers, where should I manage all the workers/jobs from? Would setting up the Web UI on the Rails side be enough?
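On the first question: as the answers above note, multiple Sidekiq processes can safely consume from one queue, since BRPOP hands each job to exactly one process. A minimal sketch of pointing every process at the same Redis, where REDIS_URL is an assumption:

# config/initializers/sidekiq.rb -- same file on the Heroku app and on
# each of the N worker machines, all pointing at one Redis
Sidekiq.configure_client do |config|
  config.redis = { url: ENV['REDIS_URL'] }
end

Sidekiq.configure_server do |config|
  config.redis = { url: ENV['REDIS_URL'] }
end

# On each worker machine:
#   bundle exec sidekiq -q default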