How to decrease the memory consumption of Sidekiq processes? - ruby-on-rails

I have a server that launches 10 web applications that are almost identical (only the assets and content differ). These applications use Sidekiq to send emails after a successful form submission. The problem is memory usage: each process consumes 80-100MB of RAM.
I've already set concurrency: 1 for every project. Since the number of jobs is small, I'd like to somehow combine these processes into one. How can I do this? Is it a reliable solution? Or maybe I should be looking for memory leaks instead?
I'm not very experienced in this area, so any advice is welcome.

You could centralize the email sending into a single Sidekiq process that sends out the emails for all of the applications.
Here's a good answer about the details of sharing Sidekiq between various apps:
How to share worker among two different applications on heroku?
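
For illustration (this is a sketch, not code from the linked answer): point every app's Sidekiq client at the same Redis instance, give each app its own queue, and run a single Sidekiq server process that consumes all of those queues. The Redis URL, queue names, and mailer below are placeholders, and the process running the shared server needs the worker and mailer classes loaded.

    # config/initializers/sidekiq.rb -- same in every app (hypothetical setup)
    require "sidekiq"

    Sidekiq.configure_client do |config|
      config.redis = { url: ENV.fetch("SHARED_REDIS_URL", "redis://localhost:6379/0") }
    end

    Sidekiq.configure_server do |config|
      config.redis = { url: ENV.fetch("SHARED_REDIS_URL", "redis://localhost:6379/0") }
    end

    # app/workers/mailer_worker.rb -- each app enqueues onto its own queue
    class MailerWorker
      include Sidekiq::Worker
      sidekiq_options queue: "app1_mailers" # "app2_mailers" in the next app, etc.

      def perform(user_id)
        UserMailer.welcome_email(user_id).deliver # placeholder mailer
      end
    end

    # Then launch one shared process that drains every app's queue:
    #   bundle exec sidekiq -c 1 -q app1_mailers -q app2_mailers -q app3_mailers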

Related

How to isolate worker dynos from web dynos on heroku?

We have developed a Rails app on Heroku; we have around 3 web dynos and 2-3 worker dynos. We have some exporting and importing functionality that uses a lot of our worker dynos, and when that happens, everything crashes and we get an App Error on the website.
Sentry tells us that it is due to a timeout. We are trying to find out which part of our software is taking so much worker time. The problem is that it affects all of our users, some of whom only use web-layer functionality.
But I was wondering: is there a way to isolate our worker dyno problems from the web dynos' work? I mean, is there a way to keep the site from crashing when one user exports a large amount of data and saturates the workers?
Thanks in advance!
Regards,
Gonzalo
Thank you for the answers; let me give you some related info:
- We use delayed_job for the workers, which is async.
- Our previous DB supported 120 connections and we never saw it completely busy. The current AWS RDS only reached 24% utilization, and we saw only 28 concurrent connections on the day of the crash.
- New Relic did not indicate any delay in the DB.
- The web dynos start to generate timeouts across many features; if the crash is not related to the workers, it may be caused by some functionality that is not run in jobs.
Update:
- We have set some limits on our exports; even though those run in jobs, they were affecting the web layer and causing App Errors. Once we set the limit, the App Errors were dramatically reduced (see the sketch after this list).
- We are still searching for any other unoptimized functionality.
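
For illustration only, since the post doesn't show its code: a cap like the one described might look roughly like this as a delayed_job custom job. The class, model, and row limit are all hypothetical.

    # Hypothetical delayed_job job that caps how much data one export may
    # process, so a single huge export can't monopolise the workers.
    require "csv"

    class ExportJob < Struct.new(:report_id)
      MAX_ROWS = 50_000 # assumed cap; tune to what the workers can handle

      def perform
        report = Report.find(report_id)   # Report is a placeholder model
        written = 0

        CSV.open("/tmp/report-#{report_id}.csv", "w") do |csv|
          report.rows.find_each do |row|  # batched reads keep memory flat
            break if written >= MAX_ROWS  # stop once the cap is reached
            csv << row.attributes.values
            written += 1
          end
        end
      end
    end

    # Enqueued from the controller so the web dyno returns immediately:
    #   Delayed::Job.enqueue(ExportJob.new(report.id))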

Do you have to use worker pools in Erlang?

I have a server I am creating (a messaging service) and I am doing some preliminary tests to benchmark it. So far, the fastest ways to process the data are to do it directly in the user's process and to use worker pools. I have tested spawning, and that is unbelievably slow.
The test is just connecting 10k users and having each one send 15kb of data a couple of times at the same time (or trying to, at least), then having the server process the data (total length, headers, and payload).
The issue I have with worker pools is that they're only fast when you have enough workers to offset the number of connections. For example, if you have 500k or 1 million users, you would need more workers to process all the concurrent data coming in. And, in my testing, having 1000 workers made it unusable.
So my question is the following: when does it make sense to use pools of workers? Will there be a tipping point where I would have to use workers to process the data to free up the user process? How many workers is too many; is 500,000 too many?
And, if workers are the way to go (for those massive concurrent distributed servers), I am guessing you can dynamically create/delete them as needed?
Any literature is also appreciated!
Thanks for your answer!
Maybe worker pools are not the best tool for your problem. If I were you, I would try using Jay Nelson's epocxy, which gives you a very basic backpressure mechanism while still letting you parallelize your tasks. From that library, I would check out either the concurrency fount or the concurrency control tools.

Running large amount of long running background jobs in Rails

We're building a web app where users will upload potentially large files that need to be processed in the background. The task involves calling 3rd-party APIs, so each job can take several hours to complete. We're using DelayedJob to run the background jobs.
With every user kicking off a background job, each of which takes a few hours to finish, that adds up to a lot of background jobs very quickly. I am wondering what the best way to set up the deployment for this would be. We're currently hosted on DigitalOcean. I've kicked off 10 DelayedJob workers. Each one (when idle) takes up 157MB; when actively running, it uses around 900MB. Our user base right now is pretty small, so it's not an issue yet, but it will be one soon. So on a 4GB droplet, I can probably run only 2 or 3 workers at a time.
How should we approach this issue? Should we be looking at using DigitalOcean's API to auto-spin cheap droplets on demand? Should we subscribe to high-memory droplets on a monthly basis instead? If we go with auto-spinning droplets, should we stick with DigitalOcean or would Heroku make more sense? Or is the entire approach wrong, and should we be coming at it from an entirely different direction? Any help/advice would be very much appreciated.
Thanks!
It sounds like you are limited by memory on the number of workers that you can run on your DigitalOcean host.
If you are worried about scaling, I would focus on making the workers as efficient as possible. Have you done any benchmarking to understand where the 900MB of memory is being allocated? I'm not sure what the nature of these jobs is, but you mentioned large files. Are you reading the contents of these files into memory, or are you streaming them? Are you using a database with SQL you can tune? Are you making many small API calls when you could be using a batch endpoint? Are you assigning intermediary variables that must then be garbage collected? Can you compress the files before you send them?
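
For example (hypothetical code, not from the question): streaming a large file line by line keeps a worker's memory roughly constant, whereas slurping it in scales with the file size.

    # Memory-hungry: loads the whole file into one string plus an array.
    def process_all_at_once(path)
      File.read(path).split("\n").each { |line| handle(line) } # handle is a placeholder
    end

    # Streaming: reads one line at a time, so memory use stays flat.
    def process_streaming(path)
      File.foreach(path) { |line| handle(line.chomp) }
    end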
Look at the job structure itself. I've found that background jobs work best as many smaller jobs rather than one larger job. This allows execution to happen in parallel and be better load-balanced across all workers. You could even have a job that generates other jobs. If you need a job to orchestrate callbacks when a group of jobs finishes, there is a DelayedJobGroup plugin at https://github.com/salsify/delayed_job_groups_plugin that allows you to invoke a final job only after the sibling jobs complete. I would aim for the execution time of a single job to be under 30 seconds. This is arbitrary, but it illustrates what I mean by smaller jobs.
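
A rough sketch of that fan-out idea with plain DelayedJob custom jobs (the class names, chunk size, and API call are assumptions, not from the answer):

    # One cheap job splits the upload into chunks and enqueues a small job
    # per chunk, so the work spreads across all available workers.
    class SplitUploadJob < Struct.new(:upload_id)
      CHUNK_SIZE = 500 # assumed; size chunks so each job stays short

      def perform
        upload = Upload.find(upload_id) # Upload is a placeholder model
        upload.record_ids.each_slice(CHUNK_SIZE) do |ids|
          Delayed::Job.enqueue(ProcessChunkJob.new(upload_id, ids))
        end
      end
    end

    class ProcessChunkJob < Struct.new(:upload_id, :record_ids)
      def perform
        record_ids.each { |id| ThirdPartyApi.sync(id) } # placeholder API call
      end
    end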
Some hosting providers, like Amazon, offer spot instances where you can pay a lower price for servers that do not have guaranteed availability. These pair well with the many-smaller-jobs approach mentioned above.
Ruby might also not be the right tool for the job. There are faster languages, and if you are limited by memory or CPU, you might consider writing these jobs and their workers in another language like JavaScript, Go, or Rust. These can pair well with a Ruby stack by offloading computationally expensive subroutines to faster languages.
Finally, like many scaling issues, if you have more money than time, you can always throw more hardware at it. At least for a while.
I think memory and time are the bigger problems for you. You should use the sidekiq gem for this processing, because it will take less time and use less memory to do the same jobs; it uses Redis, a key-value database, as its backing store. If the problem continues, go with JavaScript.

Split Heroku Web Workers by URL

Firstly: I realise in an ideal world I could achieve this using SOA. Humour me :)
Background
Imagine I have a rails app running on heroku with very minimal traffic in terms of user requests, they can be happily served by 1 web dyno.
I also have a machine somewhere in the world which is regularly and repeatedly submitting large files to my application via http://example.com/api/bigupload as fast as it is able.
The large files eat up my web dynos and so the user experience is bad. I increase the web dynos, but the large file uploads continue to tie them all up in long requests.
Question
Is there some way I can keep one worker in 'reserve' which will not respond to the big upload requests and concentrate on serving user traffic for other URLs?
Note: I have a similar situation to this one where automated large image uploads are eating my requests and delaying users accessing the API, albeit on a larger scale.
I think you're effectively asking: "Is there a way to partition my web dynos so that only some respond to a certain subset of requests?"
The answer (today) is unfortunately no. Heroku routes randomly across all of your web dynos.
What web server are you running on your web dyno? Are you using a concurrent web server? If you're not, that may have a large impact (in that it won't tie the dyno up nearly as much).
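
If the dyno is currently running a single-threaded server (WEBrick, or Thin in its default mode), switching to a concurrent server such as Puma is one option. A minimal, assumed configuration might look like this:

    # config/puma.rb -- illustrative values; tune workers/threads to the dyno's memory.
    workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))    # forked worker processes
    threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
    threads threads_count, threads_count                # min/max threads per process

    port        ENV.fetch("PORT", 3000)
    environment ENV.fetch("RACK_ENV", "production")

    # Procfile:
    #   web: bundle exec puma -C config/puma.rb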
Have you explored a different architecture where, instead of your other app submitting the big uploads themselves, it submits pointers to the big payloads? That way your web dynos can simply dump them on a queue, and your workers can fetch the payloads and process them; you can then scale by increasing the number of workers.
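
Sketched out (the controller, worker, and helper names are made up, and Sidekiq is assumed for the queue): the submitting machine POSTs a URL pointing at the payload instead of the payload itself, and a background worker downloads and processes it.

    # Hypothetical controller: accept a pointer to the payload, not the bytes.
    class BigUploadsController < ApplicationController
      def create
        # The submitting machine sends e.g. { "payload_url": "https://..." }
        ProcessPayloadJob.perform_async(params.require(:payload_url))
        head :accepted # respond immediately; the web dyno stays free
      end
    end

    # Hypothetical worker that fetches and processes the payload off the web tier.
    require "net/http"

    class ProcessPayloadJob
      include Sidekiq::Worker

      def perform(payload_url)
        data = Net::HTTP.get(URI(payload_url)) # download happens on the worker
        PayloadProcessor.run(data)             # placeholder processing step
      end
    end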

Heroku | Different performance parameters for different parts of your application

I have a Rails 3 application hosted on Heroku with a pretty common configuration: a client-facing part of the application, say www.myapplication.com, and an admin part, admin.myapplication.com.
I want the client-facing part of my application to be fast, and I don't really care how fast the admin module is. What I do care about is that usage of the admin site does not slow down the client-facing part of the application.
Ideally the client side of the app will have 3 dedicated dynos, and the admin side will have 1 dedicated dyno.
Does anyone have any idea on the best way to accomplish this?
Thanks!
If you split the applications, you're going to have to share the database between the two apps. To be honest, I'd just keep it as one single app and give it 4 dynos :)
Also, dynos don't increase performance; they increase throughput, so you're able to deal with more requests a second.
For example, roughly: if a typical page response takes 100ms, 1 dyno could process 10 requests a second. If you only have a single dyno and your app suddenly receives 10 requests per second, the excess requests will be queued until the dyno is freed up to process them. Requests taking longer than 30s will be timed out.
If you add a second dyno, requests are shared between the 2 dynos, so you'd now be able to process 20 requests a second (in an ideal world), and so on as you add more dynos.
And remember, a dyno is single-threaded, so if it's doing anything at all (rendering a page, building a PDF, even receiving an uploaded image) it's busy and unable to process further requests until it's finished; if you don't have any more dynos, requests will be queued.
My advice is to split your application into its logical parts. Having a separate application for the admin interface is a good thing.
It does not have to be on the same domain as the main application. It could have a global client IP restriction or just simple global Basic Auth.
Why complicate things and stuff two things into one application? This also lets you experiment more with the admin part and redeploy it without affecting your users.
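
As a small, assumed example of the "simple global Basic Auth" option (the credentials and names are placeholders), the separate admin app could gate every request in its ApplicationController:

    # Hypothetical ApplicationController in the separate admin application:
    # every request must pass HTTP Basic Auth before reaching any action.
    class ApplicationController < ActionController::Base
      before_filter :require_basic_auth # before_action on newer Rails

      private

      def require_basic_auth
        authenticate_or_request_with_http_basic("Admin") do |user, password|
          user == ENV["ADMIN_USER"] && password == ENV["ADMIN_PASSWORD"]
        end
      end
    end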
