I have a Jmeter script which I am running from local environment and achieved 20 tps(transactions) per second.
I have moved the same script to Jenkins and ran it from there. It worked as expected.
My Next step is to reduce tps from 20 to 2.
So I introduced Ramp up time of 30 seconds and it worked as expected from local environment.
I moved the script to jenkins, it gave me 20 tps when I run the script from jenkins.
Can someone tell me why this is happening and what I need to do to fix this.
I have tried several approaches like hard coding the ramp up time, creating a new jenkins project with new script.
Thanks in advance
It's hard to say what exactly is wrong without seeing your full Thread Group configuration, normally people use Timers for throttling JMeter's throughput to the given number of requests/transactions per second
Depending on what you're trying to achieve you can consider using:
Constant Throughput Timer (however it's precise enough on "minute" level, for the first minute of your test execution you will need to manipulate the load using the ramp-up approach
Precise Throughput Timer which is more powerful and "precise" however it's configuration is more complex and you need to provide test duration, units, etc.
Throughput Shaping Timer which is kind of a balance between precision and simplicity however it's a custom plugin so you will have to install it on Jenkins master/slaves
Related
We are testing Cloud Dataflow which pulls message from Pub/Sub subscription and convert data to BigQuery TableRow and load them to BigQuery as load job in every 1 min 30 sec.
We can see the pipeline works well and can process 500,000 elements per second with 40 workers. But when trying autoscaling, the number of workers unexpectedly goes up to 40 and stay there even if we send only 50,000 messages to Pub/Sub. In this situation, no unacknowledged message and workers' CPU utilizations are bellow 60%. One thing we noticed is that the Dataflow system lag goes up slowly.
Is system lag affects autoscaling?
If so, is there any solutions or ways to debugging this problem?
Is system lag affects autoscaling?
Google does not really expose the specifics of its autoscaling algorithm. Generally, though, it is based on CPU utilization, throughput and backlog. Since you're using Pub/Sub, backlog in by itself should be based on the number of unacknowledged messages. Still, the rate at which these are being consumed (i.e. the throughput at the Pub/Sub read stage) is also taken into account. Now, throughput as a whole relates to the rate at which each stage processes input bytes. As for CPU utilization, if the aforementioned don't "run smoothly", 60% usage is already too high. So, system lag at some stage could be interpreted as the throughput of that stage and therefore should affect autoscaling. Then again, these two should not always be conflated. If for example a worker gets stuck due to a hot key, system lag is high but there's no autoscaling, as the work is not parallelizable. So, all in all, it depends.
If so, is there any solutions or ways to debugging this problem?
The most important tools you have at hand are the execution graph, stackdriver logging and stackdriver monitoring. From monitoring, you should consider jvm, compute and dataflow metrics. gcloud dataflow jobs describe can also be useful, mostly to see how steps are fused and, by extension, see which steps are run in the same worker, like so:
gcloud dataflow jobs describe --full $JOB_ID --format json | jq '.pipelineDescription.executionPipelineStage[] | {"stage_id": .id, "stage_name": .name, "fused_steps": .componentTransform }'
Stackdriver monitoring exposes all three of the main autoscaling components.
Now, how you're going to take advantage of the above obviously depends on the problem. In your case, at first glance I'd say that, if you can work without autoscaling and 40 workers, you should normally expect that you can do the same with autoscaling when you've set maxNumWorkers to 40. Then again, the number of messages alone does not say the full story, their size/content also matters. I think you should start by analyzing your graph, check which step has the highest lag, see what's the input/output ratio and check for messages with severity>=WARNING in your logs. If you shared any of those here maybe we could spot something more specific.
I'm curious if anyone can point me towards greater visibility into how various Beam Runners manage autoscaling. We seem to be experiencing hiccups during both the 'spin up' and 'spin down' phases, and we're left wondering what to do about it. Here's the background of our particular flow:
1- Binary files arrive on gs://, and object notification duly notifies a PubSub topic.
2- Each file requires about 1Min of parsing on a standard VM to emit about 30K records to downstream areas of the Beam DAG.
3- 'Downstream' components include things like inserts to BigQuery, storage in GS:, and various sundry other tasks.
4- The files in step 1 arrive intermittently, usually in batches of 200-300 every hour, making this - we think - an ideal use case for autoscaling.
What we're seeing, however, has us a little perplexed:
1- It looks like when 'workers=1', Beam bites off a little more than it can chew, eventually causing some out-of-RAM errors, presumably as the first worker tries to process a few of the PubSub messages which, again, take about 60 seconds/message to complete because the 'message' in this case is that a binary file needs to be deserialized in gs.
2- At some point, the runner (in this case, Dataflow with jobId 2017-11-12_20_59_12-8830128066306583836), gets the message additional workers are needed and real work can now get done. During this phase, errors decrease and throughput rises. Not only are there more deserializers for step1, but the step3/downstream tasks are evenly spread out.
3-Alas, the previous step gets cut short when Dataflow senses (I'm guessing) that enough of the PubSub messages are 'in flight' to begin cooling down a little. That seems to come a little too soon, and workers are getting pulled as they chew through the PubSub messages themselves - even before the messages are 'ACK'd'.
We're still thrilled with Beam, but I'm guessing the less-than-optimal spin-up/spin-down phases are resulting in 50% more VM usage than what is needed. What do the runners look for beside PubSub consumption? Do they look at RAM/CPU/etc??? Is there anything a developer can do, beside ACK a PubSub message to provide feedback to the runner that more/less resources are required?
Incidentally, in case anyone doubted Google's commitment to open-source, I spoke about this very topic with an employee there yesterday, and she expressed interest in hearing about my use case, especially if it ran on a non-Dataflow runner! We hadn't yet tried our Beam work on Spark (or elsewhere), but would obviously be interested in hearing if one runner has superior abilities to accept feedback from the workers for THROUGHPUT_BASED work.
Thanks in advance,
Peter
CTO,
ATS, Inc.
Generally streaming autoscaling in Dataflow works like this :
Upscale: If the pipeline's backlog is more than a few seconds based on current throughput, pipeline is upscaled. Here CPU utilization does not directly affect the amount of upsize. Using CPU (say it is at 90%), does not help in answering the question 'how many more workers are required'. CPU does affect indirectly since pipelines fall behind when they they don't enough CPU thus increasing backlog.
Downcale: When backlog is low (i.e. < 10 seconds), pipeline is downcaled based on current CPU consumer. Here, CPU does directly influence down size.
I hope the above basic description helps.
Due to inherent delays involved in starting up new GCE VMs, the pipeline pauses for a minute or two during resizing events. This is expected to improve in near future.
I will ask specific questions about the job you mentioned in description.
We're building a web-app where users will be uploading potentially large files that will need to be processed in the background. The task involves calling 3rd-party APIs so each job can take several hours to complete. We're using DelayedJob to run the background jobs. With every user kicking off a background job, each of which will take a few hours to finish, that will add up to a lot of background jobs every quickly. I am wondering what would be the best way to setup the deployment for this? We're currently hosted on DigitalOcean. I've kicked off 10 DelayedJob workers. Each one (when ideal) takes up 157MB. When actively running it utilizes around 900 MB. Our user-base right now is pretty small so it's not an issue but will be one soon. So on a 4GB droplet, I can probably run like 2 or 3 workers at a time. How should we approach this issue? Should we be looking at using DigitalOcean's API to auto-spin cheap droplets on demand? Should we subscribe to high-memory droplets on a monthly basis instead? If we go with auto-spinning droplets, should we stick with DigitalOcean or would Heroku make more sense? Or is the entire approach wrong and should we be approaching it from an entire different direction? Any help/advice would be very much appreciated.
Thanks!
It sounds like you are limited by memory on the number of workers that you can run on your DigitalOcean host.
If you are worried about scaling, I would focus on making the workers as efficient as possible. Have you done any benchmarking to understanding where the 900MB of memory is being allocated? I'm not sure what the nature of these jobs are, but you mentioned large files. Are you reading the contents of these files into memory, or are you streaming them? Are you using a database with SQL you can tune? Are you making many small API calls when you could be using a batch endpoint? Are you assigning intermediary variables that must then be garbage collected? Can you compress the files before you send them?
Look at the job structure itself. I've found that background jobs work best with many smaller jobs rather than one larger job. This allows execution to happen in parallel, and be more load balanced across all workers. You could even have a job that generates other jobs. If you need a job to orchestrate callbacks when a group of jobs finishes there is a DelayedJobGroup plugin at https://github.com/salsify/delayed_job_groups_plugin that allows you to invoke a final job only after the sibling jobs complete. I would aim for an execution time of a single job to be under 30 seconds. This is arbitrary but it illustrates what I mean by smaller jobs.
Some hosting providers like Amazon provide spot instances where you can pay a lower price on servers that do not have guaranteed availability. These pair well with the many fewer jobs approach I mentioned earlier.
Finally, Ruby might not be the right tool for the job. There are faster languages, and if you are limited by memory, or CPU, you might consider writing these jobs and their workers in another language like Javascript, Go or Rust. These can pair well with a Ruby stack, but offload computationally expensive subroutines to faster languages.
Finally, like many scaling issues, if you have more money than time, you can always throw more hardware at it. At least for a while.
I thing memory and time is more problem for you. you have to use sidekiq gem for this process because it will consume less time and memory consumption for doing the same job,because it uses redis as database which is key value pair db.if the problem continues go with java script.
I'm running a job which reads about ~70GB of (compressed data).
In order to speed up processing, I tried to start a job with a large number of instances (500), but after 20 minutes of waiting, it doesn't seem to start processing the data (I have a counter for the number of records read). The reason for having a large number of instances is that as one of the steps, I need to produce an output similar to an inner join, which results in much bigger intermediate dataset for later steps.
What should be an average delay before the job is submitted and when it starts executing? Does it depend on the number of machines?
While I might have a bug that causes that behavior, I still wonder what that number/logic is.
Thanks,
G
The time necessary to start VMs on GCE grows with the number of VMs you start, and in general VM startup/shutdown performance can have high variance. 20 minutes would definitely be much higher than normal, but it is somewhere in the tail of the distribution we have been observing for similar sizes. This is a known pain point :(
To verify whether VM startup is actually at fault this time, you can look at Cloud Logs for your job ID, and see if there's any logging going on: if there is, then some VMs definitely started up. Additionally you can enable finer-grained logging by adding an argument to your main program:
--workerLogLevelOverrides=com.google.cloud.dataflow#DEBUG
This will cause workers to log detailed information, such as receiving and processing work items.
Meanwhile I suggest to enable autoscaling instead of specifying a large number of instances manually - it should gradually scale to the appropriate number of VMs at the appropriate moment in the job's lifetime.
Another possible (and probably more likely) explanation is that you are reading a compressed file that needs to be decompressed before it is processed. It is impossible to seek in the compressed file (since gzip doesn't support it directly), so even though you specify a large number of instances, only one instance is being used to read from the file.
The best way to approach the solution of this problem would be to split a single compressed file into many files that are compressed separately.
The best way to debug this problem would be to try it with a smaller compressed input and take a look at the logs.
In Jenkins I have 100 java projects. Each has its own build file.
Every time I want clear the build file and compile all source files again.
Using bulkbuilder plugin I tried compling all the jobs.. Having 100 jobs run parallel.
But performance is very bad. Individually if the job takes 1 min. in the batch it takes 20mins. More the batch size more the time it takes.. I am running this on powerful server so no problem of memory and CPU.
Please Suggest me how to over come this.. what configurations need to be done in jenkins.
I am launching jenkins using war file.
Thanks..
Even though you say you have enough memory and CPU resources, you seem to imply there is some kind of bottleneck when you increase the number of parallel running jobs. I think this is understandable. Even though I am not a java developer, I think most of the java build tools are able to parallelize build internally. I.e. building a single job may well consume more than one CPU core and quite a lot of memory.
Because of this I suggest you need to monitor your build server and experiment with different batch sizes to find an optimal number. You should execute e.g. "vmstat 5" while builds are running and see if you have idle cpu left. Also keep an eye on the disk I/O. If you increase the batch size but disk I/O does not increase, you are consuming all of the I/O capacity and it probably will not help much if you increase the batch size.
When you have found the optimal batch size (i.e. how many executors to configure for the build server), you can maybe tweak other things to make things faster:
Try to spend as little time checking out code as possible. Instead of deleting workspace before build starts, configure the SCM plugin to remove files that are not under version control. If you use git, you can use a local reference repo or do a shallow clone or something like that.
You can also try to speed things up by using SSD disks
You can get more servers, run Jenkins slaves on them and utilize the cpu and I/O capacity of multiple servers instead of only one.