How is the jenkins slave response time calculated? - jenkins

My Jenkins slaves are online.
What puzzles me is how the response time shown in the node list is calculated.
Some response times are higher than 18000 ms; others are normal, around 50-60 ms.
I pinged the Jenkins master from the slave server. The result is normal and never anywhere near 18000 ms.
I need to display the slave response time to our system users so that they at least know the state and performance of their network, but the ping result is bizarrely far lower than the response time shown in the Jenkins node list.
Can someone explain to me how that response time is calculated?
Or can someone direct me to the source code where the Jenkins slave response time is calculated? Is it different from ping?

The response time is different from a normal ping time.
Ping measures the network latency for data to travel between two nodes.
Worker response time is the time taken to get a response to a command sent from the controller (formerly called master) to a worker (formerly called slave), which goes through the Jenkins remoting layer. Thus, it is a kind of ping through the remoting layer.
Why is the response time shown as high?
Jenkins checks the response time of the workers at a set interval. If a check coincides with the worker running intensive operations, the response to the command might be delayed.
You can use the "Refresh status" button on http:///computer to recalculate this time on demand and check.
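If the goal is to show these values to users, the same numbers that appear in the node list can probably be read from the JSON API of the node list page. The following is a minimal sketch, not a definitive implementation. It assumes the response-time monitor is reported under the key hudson.node_monitors.ResponseTimeMonitor with an "average" field in milliseconds; check the output of /computer/api/json on your own instance first. The base_url parameter and the sample payload are hypothetical, and authentication is left out.

```python
import json
from urllib.request import urlopen

# Assumed key under which Jenkins reports the response-time monitor per node.
RESPONSE_MONITOR = "hudson.node_monitors.ResponseTimeMonitor"


def response_times(payload):
    """Map node display name -> average response time in ms (None if unavailable)."""
    result = {}
    for node in payload.get("computer", []):
        # monitorData or the monitor entry itself may be missing or null.
        monitor = (node.get("monitorData") or {}).get(RESPONSE_MONITOR) or {}
        result[node.get("displayName")] = monitor.get("average")
    return result


def fetch_response_times(base_url):
    """Read the node list from a controller; base_url is a placeholder."""
    with urlopen(f"{base_url}/computer/api/json") as resp:
        return response_times(json.load(resp))


# Offline example with a hand-written payload in the assumed shape.
sample = {
    "computer": [
        {"displayName": "agent-1",
         "monitorData": {RESPONSE_MONITOR: {"average": 55}}},
        {"displayName": "agent-2",
         "monitorData": {RESPONSE_MONITOR: {"average": 18342}}},
        {"displayName": "agent-3",
         "monitorData": {RESPONSE_MONITOR: None}},
    ]
}
print(response_times(sample))
```

Because the value is a round trip through the remoting layer rather than a network ping, it is better presented to users as "agent responsiveness" than as network latency.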

Related

Prometheus errors and log location

I have a Prometheus service running in a Docker container, and a group of servers keeps flapping between up and down with the error "context deadline exceeded".
Our scrape interval is 15 seconds and the timeout is 10 seconds.
The servers have been polled with no issues for months, and no new changes have been identified. At first I suspected a networking issue, but I have triple-checked the entire path and all containers, and everything is okay. I have even run tcpdump on the destination server and the Prometheus polling server and can see the connections establish and complete, yet the targets are still reported as down.
Can anyone tell me where I can find logs relating to "context deadline exceeded"? Is there any additional information I can find on what is causing this?
From other threads it seems like this is a timeout issue, but the servers are less than a second away and, again, there is no packet loss occurring anywhere.
Thanks for any help.

How can I see how long my Cloud Run deployed revision took to spin up?

I deployed a Vue.js app and a Kotlin server app. Cloud Run promises to put a service to sleep if no requests reach it for a certain time. I had not opened my app for a day. When I opened it, it was available almost immediately. Since I know how long it takes to spin up when started locally, I don't quite trust that Cloud Run really put the app to sleep and then spun it up so fast.
I'd love to know a way to see how long the spin-up really took, also to help improve the startup of the backend service.
After the service has been inactive for some time, request the service URL and record the time at which you do so.
Then go to the logs for the Cloud Run service, and use this filter to see the logs for the service:
resource.type="cloud_run_revision"
resource.labels.service_name="$SERVICE_NAME"
Look for the log entry with the normal app output after your request, check its time and compare it with the recorded time.
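The comparison described above is just a difference between two timestamps. A minimal sketch, assuming both are written as RFC 3339 UTC strings like the ones shown in the log viewer; the timestamps below are made up and the function names are hypothetical:

```python
from datetime import datetime, timedelta


def parse_ts(ts):
    """Parse an RFC 3339 UTC timestamp such as 2024-01-01T12:00:00.350Z."""
    # Older Python versions do not accept a trailing "Z", so spell out the offset.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def startup_delay_ms(request_sent, first_log_entry):
    """Milliseconds between sending the request and the first app log entry."""
    delta = parse_ts(first_log_entry) - parse_ts(request_sent)
    return delta / timedelta(milliseconds=1)


# Made-up values: the time you recorded and the time of the first log entry.
print(startup_delay_ms("2024-01-01T12:00:00.000Z", "2024-01-01T12:00:00.350Z"))
```

If the log timestamps carry more than six fractional digits, trim them to six on Python versions before 3.11. The result also includes network time between your machine and the service, so treat it as an upper bound on the startup time.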
You can't know when the instance will be evicted or whether it is kept in memory. It could happen quickly, or it could take hours or days before eviction. It's "serverless".
About the startup time: when I test, I deploy a new revision and try it. In the logging service, the first log entry of the new revision gives me the cold start duration (usually 300+ ms, compared to the usual 20-50 ms with a warm start).
The billed instance time is the sum of all the containers' running times. A container is considered "running" when it processes request(s).

How to send warning email when build queue exceeds a particular length?

I manage a Jenkins server with a few hundred projects in the whole ecosystem. Many of the projects rely on upstream servers, that, unfortunately, are not always responsive. When I have a lag on these servers, my build queue can get to 10 or more. Is there a plugin or setting to send a warning email when the build queue exceeds a particular length?
I have been unable to find a plugin that does this, but you can query Jenkins for the information as detailed here: Jenkins command to get number of builds in queue.
If you have a Jenkins slave available you could set up a job that runs every 15 minutes and just hit each of the other Jenkins servers with the API call to get build queue counts (this is easy if you have just one master and many slaves.)
If you wanted to stay completely outside of Jenkins (not add another job to the mix), you could write a script to poll the Jenkins API for the information. You could then run that script on, say, a 15-minute timer (or some other relevant interval) using cron (or a Windows scheduled task). Admittedly, you then have to dedicate some resources to running this job.
It looks like you could use Python to get the build queue with get_queue_info() and check the length of the returned list.
I haven't mucked about with the Jenkins API much myself so I'm not sure offhand exactly what the script would need, but it should be simple enough once you dig into it.
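A rough sketch of such a polling script, using only the Python standard library rather than a Jenkins client package. It assumes the queue is exposed at /queue/api/json with the queued builds under an "items" list; base_url, the threshold, and the sample payload are placeholders, and authentication and the actual email sending are left out.

```python
import json
from urllib.request import urlopen


def fetch_queue(base_url):
    """Read the build queue from Jenkins; base_url is a placeholder."""
    with urlopen(f"{base_url}/queue/api/json") as resp:
        return json.load(resp)


def queue_length(payload):
    """Number of items currently waiting in the build queue."""
    return len(payload.get("items", []))


def queue_warning(payload, threshold):
    """Return a warning message if the queue is longer than threshold, else None."""
    n = queue_length(payload)
    if n > threshold:
        return f"Jenkins build queue has {n} items (threshold {threshold})"
    return None


# Offline example with a hand-written payload in the assumed shape.
sample = {"items": [{"id": 1}, {"id": 2}, {"id": 3}]}
print(queue_warning(sample, threshold=2))
```

Run from cron, the script would call fetch_queue() and, when queue_warning() returns a message, hand it to whatever mail mechanism is available (for example the standard smtplib module).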

Jenkins - monitoring the estimated time of builds

I would like to monitor the estimated time of all of my builds to catch the cases where this value is shown as 'N/A'.
In these cases the build gets stuck (probably due to network issues in my environment) and it won't start new builds for that job until killed manually.
What I am missing is how to get that data for each job, either from the API or from another source.
I would appreciate any suggestions.
Thanks.
For each job, you can click "Trend" on the job run history table, and it will show you the currently executing progress along with a graph of "usual" execution times.
Using the API, you can go to http://jenkins/job/<your_job_name>/<build_number>/api/xml (or /json) and the information is under <duration> and <estimatedDuration> fields.
Finally, there is a Jenkins Timeout Plugin that you can use to automatically take care of "stuck" builds.
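To catch the 'N/A' cases from a script, the fields from the build API mentioned above can be checked directly. This is a sketch under one assumption you should verify on your own server: that a build shown as 'N/A' reports an estimatedDuration of -1 (or another non-positive value). The sample data is made up, and "building" is the flag the build JSON uses for a run that is still in progress.

```python
def has_no_estimate(build):
    """True for a running build whose estimate is missing or non-positive.

    Assumption: 'N/A' in the UI corresponds to estimatedDuration <= 0.
    """
    return bool(build.get("building")) and build.get("estimatedDuration", -1) <= 0


# Made-up build records in the shape returned by .../<build_number>/api/json.
builds = [
    {"number": 41, "building": False, "duration": 60000, "estimatedDuration": 58000},
    {"number": 42, "building": True, "duration": 0, "estimatedDuration": -1},
    {"number": 43, "building": True, "duration": 0, "estimatedDuration": 61000},
]
suspect = [b["number"] for b in builds if has_no_estimate(b)]
print(suspect)
```

Note that the very first build of a job may also have no estimate, so a real check might want to ignore jobs without any completed builds.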

Jenkins - Master and Slave asynchronous

I've installed a master Jenkins instance and 2 slave nodes.
Neither slave is in sync with the master. Sometimes it shows that the slaves are 2 days or 1 hour in the future, and sometimes it shows that the time on the slaves is behind the master; it seems to be random.
Because of this, some Selenium tests, builds, and other jobs don't work correctly anymore. The problem occurred suddenly, and it doesn't matter which version of Jenkins is installed.
Has anyone an idea how to fix this problem?
Thank you very much.
Cheers
Christoph
It is hard to explain why the time difference would vary abruptly between the two machines. I assume you are referring to the information given by the http://jenkins.mydomain/computer/ URL.
Normally you want to keep your machines in time sync by enabling NTP clients on each host, each pointing to the same set of NTP servers, either internal to your organization (if available) or the standard free NTP services available on the web.
Do you have this set up already and still see abrupt time variations? If so, review your list of NTP servers and make sure that every host uses the same list of reliable ones; that should help. Maybe narrow it down to just one server and then expand if need be.
