Ruby Puma see all threads if they are occupied? - ruby-on-rails

I want to see if something is slow and occupied doing a request. Or if it is time to scale up.
I used Puma.stats but it only returns:
{
"started_at": "2020-09-07T14:43:53Z",
"backlog": 0,
"running": 7,
"pool_capacity": 3,
"max_threads": 7
}
I cannot see if the threadpool is full. Is there a way of seeing that info?

Puma.stats includes the information needed to know how many Puma threads are occupied. According to Puma's documentation, pool_capacity is the number of threads that are not occupied:
This number represents the number of requests that the server is capable of taking right now.
For example if the number is 5 then it means there are 5 threads sitting idle ready to take a request. If one request comes in, then the value would be 4 until it finishes processing.

Related

Understand how k6 manages at low level a large number of API call in a short period of time

I'm new with k6 and I'm sorry if I'm asking something naive. I'm trying to understand how that tool manage the network calls under the hood. Is it executing them at the max rate he can ? Is it queuing them based on the System Under Test's response time ?
I need to get that because I'm running a lot of tests using both k6 run and k6 cloud but I can't make more than ~2000 requests per second (looking at k6 results). I was wondering if it is k6 that implement some kind of back-pressure mechanism if it understand that my system is "slow" or if there are some other reasons why I can't overcome that limit.
I read here that is possible to make 300.000 request per second and that the cloud environment is already configured for that. I also try to manually configure my machine but nothing changed.
e.g. The following tests are identical, the only changes is the number of VUs. I run all test on k6 cloud.
Shared parameters:
60 api calls (I have a single http.batch with 60 api calls)
Iterations: 100
Executor: per-vu-iterations
Here I got 547 reqs/s:
VUs: 10 (60.000 calls with an avg response time of 108ms)
Here I got 1.051,67 reqs/s:
VUs: 20 (120.000 calls with an avg response time of 112 ms)
I got 1.794,33 reqs/s:
VUs: 40 (240.000 calls with an avg response time of 134 ms)
Here I got 2.060,33 ​reqs/s:
VUs: 80 (480.000 calls with an avg response time of 238 ms)
Here I got 2.223,33 ​reqs/s:
VUs: 160 (960.000 calls with an avg response time of 479 ms)
Here I got 2.102,83 peak ​reqs/s:
VUs: 200 (1.081.380 calls with an avg response time of 637 ms) // I reach the max duration here, that's why he stop
What I was expecting is that if my system can't handle so much requests I have to see a lot of timeout errors but I haven't see any. What I'm seeing is that all the API calls are executed and no errors is returned. Can anyone help me ?
As k6 - or more specifically, your VUs - execute code synchronously, the amount of throughput you can achieve is fully dependent on how quickly the system you're interacting with responds.
Lets take this script as an example:
import http from 'k6/http';
export default function() {
http.get("https://httpbin.org/delay/1");
}
The endpoint here is purposefully designed to take 1 second to respond. There is no other code in the exported default function. Because each VU will wait for a response (or a timeout) before proceeding past the http.get statement, the maximum amount of throughput for each VU will be a very predictable 1 HTTP request/sec.
Often, response times (and/or errors, like timeouts) will increase as you increase the number of VUs. You will eventually reach a point where adding VUs does not result in higher throughput. In this situation, you've basically established the maximum throughput the System-Under-Test can handle. It simply can't keep up.
The only situation where that might not be the case is when the system running k6 runs out of hardware resources (usually CPU time). This is something that you must always pay attention to.
If you are using k6 OSS, you can scale to as many VUs (concurrent threads) as your system can handle. You could also use http.batch to fire off multiple requests concurrently within each VU (the statement will still block until all responses have been received). This might be slightly less overhead than spinning up additional VUs.

Circumventing negative side effects of default request sizes

I have been using Reactor pretty extensively for a while now.
The biggest caveat I have had coming up multiple times is default request sizes / prefetch.
Take this simple code for example:
Mono.fromCallable(System::currentTimeMillis)
.repeat()
.delayElements(Duration.ofSeconds(1))
.take(5)
.doOnNext(n -> log.info(n.toString()))
.blockLast();
To the eye of someone who might have worked with other reactive libraries before, this piece of code
should log the current timestamp every second for five times.
What really happens is that the same timestamp is returned five times, because delayElements doesn't send one request upstream for every elapsed duration, it sends 32 requests upstream by default, replenishing the number of requested elements as they are consumed.
This wouldn't be a problem if the environment variable for overriding the default prefetch wasn't capped to minimum 8.
This means that if I want to write real reactive code like above, I have to set the prefetch to one in every transformation. That sucks.
Is there a better way?

sidekiq-pro batches don't appear to release redis memory after batches complete

We are using sidekiq pro 1.7.3 and sidekiq 3.1.4, Ruby 2.0, Rails 4.0.5 on heroku with the redis green addon with 1.75G of memory.
We run a lot of sidekiq batch jobs, probably around 2 million jobs a day. What we've noticed is that the redis memory steadily increases over the course of a week. I would have expected that when the queues are empty and no workers are busy that redis would have low memory usage, but it appears to stay high. I'm forced to do a flushdb pretty much every week or so because we approach our redis memory limit.
I've had a series of correspondence with Redisgreen and they suggested I reach out to the sidekiq community. Here are some stats from redisgreen:
Here's a quick summary of RAM use across your database:
The vast majority of keys in your database are simple values taking up 2 bytes each.
200MB is being consumed by "queue:low", the contents of your low-priority sidekiq queue.
The next largest key is "dead", which occupies about 14MB.
And:
We just ran an analysis of your database - here is a summary of what we found in 23129 keys:
18448 strings with 1048468 bytes (79.76% of keys, avg size 56.83)
6 lists with 41642 items (00.03% of keys, avg size 6940.33)
4660 sets with 3325721 members (20.15% of keys, avg size 713.67)
8 hashs with 58 fields (00.03% of keys, avg size 7.25)
7 zsets with 1459 members (00.03% of keys, avg size 208.43)
It appears that you have quite a lot of memory occupied by sets. For example - each of these sets have more than 10,000 members and occupies nearly 300KB:
b-3819647d4385b54b-jids
b-3b68a011a2bc55bf-jids
b-5eaa0cd3a4e13d99-jids
b-78604305f73e44ba-jids
b-e823c15161b02bde-jids
These look like Sidekiq Pro "batches". It seems like some of your batches are getting filled up with very large numbers of jobs, which is causing the additional memory usage that we've been seeing.
Let me know if that sounds like it might be the issue.
Don't be afraid to open a Sidekiq issue or email prosupport # sidekiq.org directly.
Sidekiq Pro Batches have a default expiration of 3 days. If you set the Batch's expires_in setting longer, the data will sit in Redis longer. Unlike jobs, batches do not disappear from Redis once they are complete. They need to expire over time. This means you need enough memory in Redis to hold N days of Batches, usually not a problem for most people, but if you have a busy Sidekiq installation and are creating lots of batches, you might notice elevated memory usage.

Parse list 3 threads a time, when 5 completed works, server signal to do something

Hy I am curious does anyone know a tutorial example where semaphores are used for more than 1 process /thread. I'm looking forward to fix this problem. I have an array, of elements and an x number of threads. This threads work over the array, only 3 at a moment. After 5 works have been completed, the server is signelised and it clean those 5 nodes. But I'm having problems with the designing this problem. (node contains worker value which contains the 'name' of the thread that is allowed to work on it, respectivly nrNodes % nrThreads)
In order to make changes on the list a mutex is neccesarly in order not to overwrite / make false evaluations.
But i have no clue how to limit 3 threads to parse, at a given point, the list, and how to signal the main for cleaning session. I have been thinking aboutusing a semafor and a global constant. When the costant reaches 5, the server to be signaled(which probably would eb another thread.)
Sorry for lack of code but this is a conceptual question, what i have written so far doesn't affect the question in any way.

rails high memory usage

I am planning on using delayed job to run some background analytics. In my initial test I saw tremendous amount of memory usage, so I basically created a very simple task that runs every 2 minutes just to observe how much memory is is being used.
The task is very simple and the analytics_eligbile? method always return false, given where the data is now, so basically none of the heavy hitting code is being called. I have around 200 Posts in my sample data in development. Post has_one analytics_facet.
Regardless of the internal logic/business here, the only thing this task is doing is calling the analytics_eligible? method 200 times every 2 minutes. In a matter of 4 hours my physical memory usage is at 110MB and Virtual memory at 200MB. Just for doing something this simple! I can't even begin to imagine how much memory this will eat if its doing real analytics on 10,000 Posts with real production data!! Granted it may not run evevery 2 minutes, more like every 30, still I don't think it will fly.
This is running ruby 1.9.7, rails 2.3.5 on Ubuntu 10.x 64 bit. My laptop has 4GB memory, dual core CPU.
Is rails really this bad or am I doing something wrong?
Delayed::Worker.logger.info('RAM USAGE Job Start: ' + `pmap #{Process.pid} | tail -1`[10,40].strip)
Post.not_expired.each do |p|
if p.analytics_eligible?
#this method is never called
Post.find_for_analytics_update(p.id).update_analytics
end
end
Delayed::Worker.logger.info('RAM USAGE Job End: ' + `pmap #{Process.pid} | tail -1`[10,40].strip)
Delayed::Job.enqueue PeriodicAnalyticsJob.new(), 0, 2.minutes.from_now
Post Model
def analytics_eligible?
vf = self.analytics_facet
if self.total_ratings > 0 && vf.nil?
return true
elsif !vf.nil? && vf.last_update_tv > 0
ratio = self.total_ratings / vf.last_update_tv
if (ratio - 1) >= Constants::FACET_UPDATE_ELIGIBILITY_DELTA
return true
end
end
return false
end
ActiveRecord is fairly memory-hungry - be very careful when doing selects, and be mindful that Ruby automatically returns the last statement in a block as the return value, potentially meaning that you're passing back an array of records that get saved as a result somewhere and thus aren't eligible for GC.
Additionally, when you call "Post.not_expired.each", you're loading all your not_expired posts into RAM. A better solution is find_in_batches, which specifically only loads X records into RAM at a time.
Fixing it could be something as simple as:
def do_analytics
Post.not_expired.find_in_batches(:batch_size => 100) do |batch|
batch.each do |post|
if post.analytics_eligible?
#this method is never called
Post.find_for_analytics_update(post.id).update_analytics
end
end
end
GC.start
end
do_analytics
A few things are happening here. First, the whole thing is scoped in a function to prevent variable collisions from holding onto references from the block iterators. Next, find_in_batches retrieves batch_size objects from the DB at a time, and as long as you aren't building references to them, become eligible for garbage collection after each iteration runs, which will keep total memory usage down. Finally, we call GC.start at the end of the method; this forces the GC to start a sweep (which you wouldn't want to do in a realtime app, but since this is a background job, it's okay if it takes an extra 300ms to run). It also has the very distinct benefit if returning nil, which means that the result of the method is nil, which means we can't accidentally hang on to AR instances returned from the finder.
Using something like this should ensure that you don't end up with leaked AR objects, and should vastly improve both performance and memory usage. You'll want to make sure you aren't leaking elsewhere in your app (class variables, globals, and class references are the worst offenders), but I suspect that this'll solve your problem.
All that said, this is a cron problem (periodic recurring work), rather than a DJ problem, in my opinion. You can have a one-shot analytics parser that runs your analytics every X minutes with script/runner, invoked by cron, which very neatly cleans up any potential memory leaks or misuses per-run (since the whole process terminates at the end)
Loading data in batches and using the garbage collector aggressively as Chris Heald has suggested is going to give you some really big gains, but another area people often overlook is what frameworks they're loading in.
Loading a default Rails stack will give you ActionController, ActionMailer, ActiveRecord and ActiveResource all together. If you're building a web application you may not be using all of these, but you're probably using most.
When you're building a background job, you can avoid loading things you don't need by creating a custom environment for that:
# config/environments/production_bg.rb
config.frameworks -= [ :action_controller, :active_resource, :action_mailer ]
# (Also include config directives from production.rb that apply)
Each of these frameworks will just be sitting around waiting for an email that will never be sent, or a controller that will never be called. There's simply no point in loading them. Adjust your database.yml file, set your background job to run in the production_bg environment, and you'll have a much cleaner slate to start with.
Another thing you can do is use ActiveRecord directly without loading Rails at all. This might be all that you need for this particular operation. I've also found using a light-weight ORM like Sequel makes your background job very light-weight if you're doing mostly SQL calls to reorganize records or delete old data. If you need access to your models and their methods you will need to use ActiveRecord, though. Sometimes it's worth re-implementing simple logic in pure SQL for reasons of performance and efficiency, though.
When measuring memory usage, the only number to be concerned with is "real" memory. The virtual amount contains shared libraries and the cost of these is spread amongst every process using them even though it is counted in full for each one.
In the end, if running something important takes 100MB of memory but you can get it down to 10MB with three weeks of work, I don't see why you'd bother. 90MB of memory costs at most about $60/year on a managed provider which is usually far less expensive than your time.
Ruby on Rails embraces the philosophy of being more concerned with your productivity and your time than about memory usage. If you want to trim it back, put it on a diet, you can do it but it will take a bit of effort.
If you are experiencing memory issues, one solution is to use another background processing tech, like resque. It is the BG processing used by github.
Thanks to Resque's parent / child
architecture, jobs that use too much
memory release that memory upon
completion. No unwanted growth
How?
On certain platforms, when a Resque
worker reserves a job it immediately
forks a child process. The child
processes the job then exits. When the
child has exited successfully, the
worker reserves another job and
repeats the process.
You can find more technical details in README.
It is a fact that Ruby consumes (and leaks) memory. I don't know if you can do much about it, but at least I recommend that you take a look on Ruby Enterprise Edition.
REE is an open source port which promises "33% less memory" among all the other good things. I have used REE with Passenger in production for almost two years now and I'm very pleased.

Resources