sdkperf_java takes forever to launch multiple clients while sdkperf_mqtt is instantaneous

Trying to test message delivery to multiple clients and found that sdkperf_java takes about 30 seconds per client before receiving messages.
./sdkperf_java.sh -cip=HOST -cu default#vpn-1 -cp default -stl=topics/data -cc=20
The MQTT version of sdkperf takes maybe 5 seconds to subscribe for hundreds of clients.
./sdkperf_mqtt.sh -cip=mqtt://HOST:PORT -stl=topics/data -cc=925
As I am trying to stress test with both types of connections, is there anything I can do about the launch time? I need hundreds of clients running the java integration, but the spin-up time is an issue.

Related

How can I see how long my Cloud Run deployed revision took to spin up?

I deployed a Vue.js app and a Kotlin server app. Cloud Run promises to put a service to sleep if no requests arrive for a certain time. I had not opened my app for a day. When I did open it, it was available almost immediately. Since I know how long it takes to start up locally, I don't quite trust that Cloud Run really put the app to sleep and then spun it up that fast.
I'd love to know how I can see how long the spin-up actually took, also as a basis for improving the startup of the backend service.

After the service has been inactive for some time, request the service URL and record the time of the request.
Then go to the logs for the Cloud Run service, and use this filter to see the logs for the service:
resource.type="cloud_run_revision"
resource.labels.service_name="$SERVICE_NAME"
Look for the log entry with the normal app output after your request, check its time and compare it with the recorded time.
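The comparison in that last step can be sketched in Python. This is only an illustration: the function names and the two timestamps are made up, and it assumes you copy the timestamps by hand in ISO 8601 form with at most microsecond precision.

```python
from datetime import datetime


def parse_timestamp(ts: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat(),
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def startup_delay_seconds(request_sent: str, first_app_log: str) -> float:
    """Seconds between sending the request and the first app log entry."""
    delta = parse_timestamp(first_app_log) - parse_timestamp(request_sent)
    return delta.total_seconds()


# Hypothetical values: the time you noted and the time of the log entry.
delay = startup_delay_seconds(
    "2024-01-01T12:00:00.000000Z",
    "2024-01-01T12:00:02.350000Z",
)
print(f"startup delay: {delay:.3f} s")
```

The result includes network latency and any clock difference between your machine and the logging service, so treat it as an upper bound rather than an exact cold start time.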

You can't know when the instance will be evicted or whether it is kept in memory. It could happen quickly, or it could take hours or days before eviction. It's "serverless".
As for the start time: when I test, I deploy a new revision and send it a request. In the logging service, the first log entry of the new revision gives me the cold start duration (usually 300+ ms, compared to the usual 20-50 ms for a warm start).
The billed instance time is the sum of all the containers' running times. A container is considered "running" while it is processing requests.

Why is ApproximateAgeOfOldestMessage in SQS not bigger than approx 5 mins

I am utilising spring cloud aws messaging (2.0.1.RELEASE) in java to consume from an SQS queue. If it's relevant, we use default settings, java 10 and spring cloud Finchley.SR2.
We recently had an issue where a message could not be processed due to an application bug, leading to an exception and no confirmation (deletion) of the message. The message is later retried (this is desirable), presumably after the visibility timeout has elapsed. Again, default values are in use; we have not customised the settings here.
We didn't spot the error above for a few days, meaning the message receive count was very high and the message had conceptually been on the queue for a while (several days by now). We considered creating a cloud watch SQS alarm to alert us to a similar situation in future. The only suitable metric appeared to be ApproximateAgeOfOldestMessage.
Sadly, when observing this metric I see this:
The max age doesn't go much above 5 mins (despite me knowing the message was several days old). If a message gets older each time a receive happens, and no acknowledgment comes and the message isn't deleted but instead becomes available again after the visibility timeout has elapsed, should this graph not be much, much higher?
I don't know if this is something specific to the way that spring cloud aws messaging consumes the message or whether it's a general SQS quirk, but my expectation was that if a message was put on the queue 5 days ago, and a consumer had not successfully consumed the message, then the max age would be 5 days.
Is it in fact the case that if a message is received by a consumer, but not ultimately deleted, the max age is actually the length of time between consume calls?
Can anyone confirm whether my expectation is incorrect, i.e. whether this is indeed how SQS is expected to behave (it doesn't consider the age to be the time since the message was first put on the queue, but instead considers it to be the time between receive calls)?

Based on a similar question on AWS forums, this is apparently a bug with regular SQS queues where only a single message is affected.
In order to have a useful alarm for this issue, I would suggest setting up a dead-letter-queue (where messages get automatically delivered after a configurable number of consume-without-deletes), and alarm on the size of the dead-letter-queue (ApproximateNumberOfMessagesVisible).
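A minimal sketch of that suggestion, written as plain Python dictionaries. The function names, the alarm name, and the default of 5 receives are assumptions chosen for illustration; the dictionaries are shaped to be passed to boto3's `set_queue_attributes` (as `Attributes`) and `put_metric_alarm` (as keyword arguments), but nothing here calls AWS.

```python
import json


def redrive_attributes(dlq_arn: str, max_receive_count: int = 5) -> dict:
    """Attributes for the source queue: after max_receive_count receives
    without a delete, SQS moves the message to the dead-letter queue."""
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    }
    # SQS expects the redrive policy as a JSON string, not a nested dict.
    return {"RedrivePolicy": json.dumps(policy)}


def dlq_alarm_params(dlq_name: str) -> dict:
    """CloudWatch alarm settings that fire once the DLQ holds any message."""
    return {
        "AlarmName": f"{dlq_name}-not-empty",  # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": dlq_name}],
        "Statistic": "Maximum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
    }
```

Alarming on "greater than zero" means a single poisoned message is enough to raise the alert, which is the situation described in the question.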

I think this might have to do with the poison pill handling by this metric. After 3+ tries, the message won't be included in the metric. From the AWS docs:
After a message is received three times (or more) and not processed,
the message is moved to the back of the queue and the
ApproximateAgeOfOldestMessage metric points at the second-oldest
message that hasn't been received more than three times. This action
occurs even if the queue has a redrive policy.

How can I get Jelastic to sleep?

Yesterday I got a trial account on webhosting.net's Jelastic v2.2.2 and configured an environment with a minimum of 0 cloudlets (max 8, i.e., all dynamic, no reserved). Then I deployed a Grails war which was using 3 cloudlets after it started up (around 350 MB). It worked great, and I was very impressed.
However, I did not access my app overnight, and the billing history shows it kept using 3 dynamic cloudlets every hour, even with 0 requests (i.e., 0 MB paid traffic) for 14 hours. Is there some way I can get my Jelastic environment to sleep (i.e., hibernation) after some period with no requests (e.g., after an hour or two)? Then, when it gets a request, I'd like it to automatically wake up (i.e., allocate some cloudlets and restore memory from disk). I see how to stop and restart it manually, but I would like it to work automatically, for any requester.
edit: I found the following documentation, but does it not work for Tomcat/Grails?
Hibernation
Jelastic’s hibernation feature delivers even better utilization of cluster resources. Optimal use of resources is achieved by suspending non-active containers and returning released resources back to the cluster.
Because they are in sleep mode, hibernated containers do not consume resources (only disk space). As a result you save money while your containers are in hibernate mode. If applications are needed again the platform returns them to a running state again in just a few seconds.

It takes a little time to awaken your environment from sleep, so the behaviour you describe is not suitable for production use: you would effectively lose visitors, because the delay on that first access would make your service seem offline.
For that reason the 'sleep' function is only active for trial accounts, and the inactivity time before sleep is set by the hosting provider (so you should contact them directly for help on that point).
Of course you should also remember that accesses from search engine spiders etc. may keep your environment awake.

Using Puma and Sidekiq in a backend Rails app

I have a backend Rails server with Sidekiq, which serves as an API server. The app works as follows:
My Rails server receives many requests from incoming API clients at the same time.
For each of these requests, the Rails server will allocate jobs to a Sidekiq server. The Sidekiq server makes requests to external APIs (such as Facebook) to get data, analyzes it and returns a result to the Rails server.
For example, if I receive 10 incoming requests from my API clients, for each request, I need to make 10 requests to external API servers, get data and process it.
My challenge is to make my app respond to incoming requests concurrently. That is, for each incoming request, my app should process in parallel: make calls to external APIs, get data and return the result.
Now, I know that Puma can add concurrency to a Rails app, while Sidekiq is multi-threaded.
My question is: Do I really need Sidekiq if I already have Puma? What would be the benefit of using both Puma and Sidekiq?
In particular, with Puma, I just invoke my external API calls, data processing etc. from my Rails app, and they will automatically be concurrent.

Yes, you probably do want to use Puma and Sidekiq. There are really two issues at play here.
Concurrency (as it seems you already know) is the number of web requests that can be handled simultaneously. Using an app server like Puma or Unicorn will definitely help you get better concurrency than the default WEBrick server.
The other issue at play is the length of time that it takes your server to process a web request.
The reason that these two things are related is that the number of requests per second that your app can process is a function of both the average processing time for each request and the number of worker processes that are accepting requests. Say your average response time is 100ms. Then a single web worker can process 10 requests per second. If you have 5 workers, then you can handle 50 requests per second. If your average response time is 500ms, then you can handle 2 reqs/sec with a single worker, and 10 reqs/sec with 5 workers.
Interacting with external APIs can be slow at times, and in the worst cases it can be very unreliable, with unresponsive servers on the remote end, or network outages or slowdowns. Sidekiq is a great way to insulate your application (and your end users) from the possibility that the remote API is responding slowly. Imagine that the remote API is running slowly for some reason and that the average response time from it has slowed down to 2 seconds per request. In that case you'd only be able to handle 2.5 reqs/sec with 5 workers. With any more traffic than that, your end users might start to have a long wait time before any page on your app could respond, even those that don't make remote API calls, because all of your web workers might be waiting for the slow remote API to respond. As traffic continues to increase, your users would start getting connection timeouts.
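The arithmetic in that paragraph is simple enough to write down. This is a rough Python sketch (the function name is made up) that assumes each worker handles exactly one request at a time:

```python
def max_requests_per_second(workers: int, avg_response_ms: float) -> float:
    """Upper bound on throughput when each worker serves one request at a time."""
    return workers * 1000 / avg_response_ms


for workers, avg_ms in [(1, 100), (5, 100), (1, 500), (5, 500)]:
    rate = max_requests_per_second(workers, avg_ms)
    print(f"{workers} worker(s) at {avg_ms} ms -> {rate:g} reqs/sec")
```

Real throughput will be lower than this bound whenever requests arrive in bursts, since some workers sit idle while others are queued up.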
The idea with using Sidekiq is that you separate the time spent waiting on the external API from your web workers. You'd basically take the request for data from your user, pass it to Sidekiq, and then immediately return a response to the user that basically says "we're processing your request". Sidekiq can then pick up the job and make the external request. After it has the data it can save that data back into your application. Then you can use web sockets to push a notification to the user that the data is ready. Or even push the data directly to them and update the page accordingly. (You could also use polling to have the page continually asking "is it ready yet?", but that gets very inefficient very quickly.)
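The hand-off described above can be illustrated with a toy in-process queue. To be clear, this is not Sidekiq (a Ruby library backed by Redis); it is a Python stand-in for the pattern, and every name in it, including the fake external API call, is hypothetical.

```python
import queue
import threading

jobs = queue.Queue()   # stands in for the job queue
results = {}           # stands in for "saving the data back into your application"


def fetch_from_external_api(request_id):
    # Hypothetical placeholder for the slow remote call.
    return f"data for request {request_id}"


def worker():
    # Background worker: takes jobs off the queue one at a time.
    while True:
        request_id = jobs.get()
        try:
            results[request_id] = fetch_from_external_api(request_id)
        finally:
            jobs.task_done()


def handle_web_request(request_id):
    # The web worker only enqueues the job and returns right away,
    # so it is never blocked waiting on the remote API.
    jobs.put(request_id)
    return "we're processing your request"


threading.Thread(target=worker, daemon=True).start()

print(handle_web_request(1))
jobs.join()  # only for the demo: wait until the background job has finished
print(results[1])
```

The point of the design is that the time spent in `fetch_from_external_api` is paid by the background worker, not by the thread that answers the user.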
I hope this makes sense. Let me know if you have any questions.

Sidekiq, like Resque and Delayed Job, is designed to provide asynchronous job processing from a queue.
If you don't need jobs to be queued up and run asynchronously, there's no substantial benefit (or harm) to using Sidekiq.
If the tasks need to run synchronously (which it sounds like you might; it's not clear if clients are waiting for data or just requesting that jobs run), Sidekiq and its relatives are likely the wrong tool for the job. There is no guaranteed processing time when using Sidekiq or other solutions; jobs are pushed onto the end of the queue, however long that may be, and won't be processed until their turn comes up. If clients are waiting for data, they may time out long before your worker pool ever processes their jobs.

Scaling Puppet - when is too much for WEBrick?

I've found the following at Docs: Scaling Puppet:
Are you using the default webserver?
WEBrick, the default web server used to enable Puppet’s web services connectivity, is essentially a reference implementation, and becomes unreliable beyond about ten managed nodes. In any sort of production environment serving many nodes, you should switch to a more efficient web server implementation such as Passenger or Mongrel.
Where does the number 10 come from in "ten managed nodes"?
I have a little over 20 nodes and I might soon have a little over 30. Should I change to Passenger or not?

You should change to Passenger when you start having problems with WEBrick (or a little before). When that happens for you will depend on your workload.
The biggest problem with WEBrick is that it's single-threaded and blocking; once it's started working on a request, it cannot handle any other requests until it's done with the first one. Thus, what will make the difference to you is how much of the time Puppet spends processing requests.
Each time a client asks for its catalog, that's a request. Each separate file retrieved via puppet:/// URLs is also a request. If you're using Puppet lightly, each catalog won't take too long to generate, you won't be distributing many files on any given Puppet run, and each client won't be taking more than four to six seconds of server time every hour. If each client takes four seconds of server time per hour, 10 clients have a 5% chance of collisions [0]--of at least one client having to wait while another's request is processed. For 20 or 30 clients, those chances are 19% and 39%, respectively. As long as each request is short, you might be able to live with some contention, but the odds of collisions increase pretty quickly, so if you've got more than, say, 50 hosts (75% collision chance) you really ought to be using Passenger unless you're doing active performance measuring that shows that you're doing okay.
If, however, you're working your Puppet master harder--taking longer to generate catalogs, serving lots of files, serving large files, or whatever--you need to switch to Passenger sooner. I inherited a set of about thirty hosts with a WEBrick Puppet master where things were doing okay, but when I started deploying new systems, all of the Puppet traffic caused by a fresh deployment (including a couple of gigabyte files [1]) was preventing other hosts from getting their updates, so that's when I was forced to switch to Passenger.
In short, you'll probably be okay with 30 nodes if you're not doing anything too intense with Puppet, but at that point you need to be monitoring the performance of at least your Puppet master and preferably your clients' update status, too, so you'll know when you start running beyond the capabilities of WEBrick.
[0] This is a standard birthday paradox calculation; if n is the number of clients and s is the average number of seconds of server time each client uses per hour, then the hour contains N = 3600/s slots of s seconds each, and the chance of having at least one collision during an hour is given by 1 - N!/(N^n * (N-n)!).
[1] Puppet isn't really a good avenue for distributing files of this size in any case. I eventually switched to putting them on an NFS share that all of the hosts had access to.
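The collision percentages quoted in this answer can be reproduced with a short Python sketch. It assumes the hour is divided into 3600/s equal slots and that each client lands in one slot at random; the function name is made up, and the product form is used instead of factorials to avoid huge intermediate numbers.

```python
def collision_chance(clients: int, seconds_per_client: float) -> float:
    """Chance that at least two clients pick the same slot within an hour."""
    slots = 3600 / seconds_per_client
    if clients > slots:
        return 1.0  # more clients than slots: a collision is certain
    p_no_collision = 1.0
    for i in range(clients):
        # The (i+1)-th client must avoid the i slots already taken.
        p_no_collision *= (slots - i) / slots
    return 1.0 - p_no_collision


for n in (10, 20, 30, 50):
    print(f"{n} clients: {collision_chance(n, 4):.0%}")
```

With four seconds per client this gives roughly 5%, 19%, 39% and 75% for 10, 20, 30 and 50 clients, in line with the figures above.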

For 20-30 nodes, there shouldn't be any problem. Note that passenger provides some additional features. It may be faster serving the nodes, but I am not sure how much improvement you will get if you have only 30 nodes.
You should change to passenger if you are using more than a hundred nodes. I started seeing problems when the number of nodes requesting service from the puppet-master reached about 200. In my case, with the default web-server, about 5% of the nodes (random) couldn't receive the catalog during the hourly run.
