What are the 4 golden points to monitor Jenkins E2E server? - jenkins

I have been asked to monitor the four golden signals of a Jenkins E2E server. I have already configured latency, network throughput, and errors. Which metric should I add as the fourth signal alongside these three?
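The four golden signals are usually listed as latency, traffic, errors, and saturation, so with latency, throughput (traffic), and errors already covered, the remaining one would be saturation. For a Jenkins server, one plausible proxy is executor utilization combined with build queue length. Below is a minimal sketch, assuming the busy and total executor counts and the queue length have already been collected (for example from the `busyExecutors` and `totalExecutors` fields of `/computer/api/json`). The function name and the definition of "saturated" are illustrative assumptions, not Jenkins recommendations.

```python
def executor_saturation(busy_executors, total_executors, queue_length=0):
    """Summarize saturation from executor and queue counts.

    Hypothetical helper: the inputs are assumed to be collected elsewhere.
    """
    # Fraction of executors currently running a build (0.0 if none are configured).
    if total_executors > 0:
        utilization = busy_executors / total_executors
    else:
        utilization = 0.0
    return {
        "utilization": utilization,
        "queue_length": queue_length,
        # Assumed definition: every executor is busy and work is still waiting.
        "saturated": queue_length > 0 and busy_executors >= total_executors,
    }


if __name__ == "__main__":
    print(executor_saturation(busy_executors=4, total_executors=4, queue_length=2))
```

Tracking the queue length next to utilization helps distinguish a server that is merely busy from one where builds are waiting for a free executor.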

Related

Is it possible to run e2e tests in different google kubernetes engine clusters?

I have two private GKE clusters: the first runs Jenkins, and the second runs the app to be tested. The problem is that Jenkins cannot communicate with the app in the second cluster.
Making a direct connection between these clusters is not allowed, and neither is installing Jenkins in the cluster with the app. Spinnaker has access to all clusters, so we started using Spinnaker to run our containerized tests. It works to some extent, but gathering test results and retrying tests are problematic, and the whole setup is very complex.
I am looking for a simpler solution. Is there a way to run tests on the second cluster using Jenkins? Any help will be appreciated.
Running e2e tests using Spinnaker is a tricky task.
We have a standby Jenkins in the same cluster as Spinnaker, with the same network rules, so that Jenkins can reach the same clusters/namespaces/resources as Spinnaker does.

Query about Jenkins on Linux

I am having a problem with executors in Jenkins.
Can anyone please tell me what an executor is in Jenkins?
Also, please explain its practical use.
From the Jenkins User Documentation glossary:
Executor: A slot for execution of work defined by a Pipeline or Project on a Node. A Node may have zero or more Executors configured which corresponds to how many concurrent Projects or Pipelines are able to execute on that Node.
An Executor does "the work" of executing the job steps. In our configuration, we have many nodes, each one corresponding to a VM host / server. We have each node configured with one executor per core. That lets us run one job per core, which is generally a good performance balance, and gives us the ability to run n jobs in parallel on an n-core VM. There are no rules regarding the ratio; it really depends on what your jobs do and where the performance bottlenecks are.
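The one-executor-per-core rule described above can be sketched as follows. This is only an illustration of the arithmetic: the function names and the `executors_per_core` parameter are assumptions made for the example, and the resulting number would still have to be entered in each node's configuration in Jenkins.

```python
import os


def executors_for_node(cores=None, executors_per_core=1):
    """Suggest an executor count for a node (hypothetical helper).

    Defaults to one executor per core, the ratio used in the answer above.
    """
    if cores is None:
        # Fall back to the local machine's core count; cpu_count() may return None.
        cores = os.cpu_count() or 1
    return max(1, cores * executors_per_core)


def total_parallel_jobs(node_cores):
    """Total number of jobs that can run at once across all nodes."""
    return sum(executors_for_node(cores) for cores in node_cores)


if __name__ == "__main__":
    # Three example nodes with 4, 8 and 16 cores.
    print(total_parallel_jobs([4, 8, 16]))
```

Jobs that mostly wait on I/O may tolerate a higher ratio, while CPU-heavy builds may need a lower one, which is why the ratio is left as a parameter.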

How to support disconnections or reboots for Jenkins slaves on Windows?

I have many long-running jobs that take almost a day to complete. Splitting them is not possible. If the network fails, all progress is lost.
How can a slave survive disconnections?
EDIT 1
I have around 300 slaves running on Windows, tied to a single Jenkins instance.
The slaves are connected manually using java -jar slave.jar -jnlpUrl <serverUrl> <slaveName>. I cannot run them as a regular Windows service because some tests manipulate GUI elements and require a real interactive session; otherwise the tests fail.
EDIT 2
According to the Jenkins Cookbook, I should be using the Cygwin + OpenSSH approach instead of a custom script with the JNLP connector. Could this improve stability?
Jenkins was not originally designed for builds to survive server or slave restarts. There is a CloudBees Long-Running Build plugin that supports long-running builds but, unfortunately, it is available only to enterprise users and is still in beta.
I didn't find any free alternative, so I suggest you try to improve your network stability and split your long-running jobs. At the very least, you can divide your tests into logical groups (test suites).
Jenkins now has a workflow plugin. It claims to handle "server" restarts and loss of connectivity with a slave.
From the link
A key feature of a workflow execution is that it's suspendable. That
is, while the workflow is running your script, you can shut down
Jenkins or lose a connectivity to a slave. When it comes back, Jenkins
will still remember what it was doing, and your workflow script
resumes execution as if it was never interrupted. A technique known as
the "continuation-passing style" execution plays a key role in
achieving this.
(not tested at all)
Edit: Copied from @Jesse Glick's comments:
Workflow is open source and available for anyone running Jenkins 1.580.1 or later. CloudBees Jenkins Enterprise does include a checkpoint feature, but this is not necessary simply to have a build survive slave disconnections and Jenkins restarts: that is automatic.

Jenkins Build Stability/Statistics Report plugin

I have a Jenkins instance running around 200 jobs. I need a plugin that shows the following build statistics for all the jobs:
Total Builds for each project
Failures
Success
Average time per build.
I have searched a lot but couldn't find a suitable report plugin. Please help.
Here are a few you can look at, depending on how much customization you want and which features you want to display:
https://wiki.jenkins-ci.org/display/JENKINS/Global+Build+Stats+Plugin
https://wiki.jenkins-ci.org/display/JENKINS/build-metrics-plugin
https://wiki.jenkins-ci.org/display/JENKINS/Project+Statistics+Plugin
https://wiki.jenkins-ci.org/display/JENKINS/eXtreme+Feedback+Panel+Plugin
https://wiki.jenkins-ci.org/display/JENKINS/InfluxDB+Plugin
https://wiki.jenkins-ci.org/display/JENKINS/CouchDB+Statistics+Plugin
And there is Dashboard-View too.
For collecting machine metrics per node (CPU time, used memory, build time per node, etc.), I found the Monitoring plugin to be the best.
https://wiki.jenkins.io/display/JENKINS/Monitoring
When it comes to aggregate build times grouped by job, I couldn't find anything good within the Jenkins UI, but if you have a Datadog license, you can enable the Datadog Jenkins plugin, configure the Datadog API key and hostname in the Jenkins configuration, and you are good to go.
https://www.datadoghq.com/blog/monitor-jenkins-datadog/
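If none of the plugins fits, the four figures asked for in the question (total builds, failures, successes, and average time per build) can also be computed from exported build records. Below is a minimal sketch, assuming each record is a dict with a `result` string and a `duration` in milliseconds, as in the build JSON that Jenkins exposes. The `job` key and the `summarize_builds` function are hypothetical names introduced for this example, and fetching the records is left out.

```python
from collections import defaultdict


def summarize_builds(builds):
    """Aggregate per-job build statistics from a list of build records.

    Each record is assumed to look like:
    {"job": "my-job", "result": "SUCCESS", "duration": 120000}
    """
    totals = defaultdict(lambda: {"total": 0, "success": 0, "failure": 0, "duration_ms": 0})
    for build in builds:
        stats = totals[build["job"]]
        stats["total"] += 1
        if build["result"] == "SUCCESS":
            stats["success"] += 1
        elif build["result"] == "FAILURE":
            stats["failure"] += 1
        # Other results (e.g. UNSTABLE, ABORTED) count only toward the total.
        stats["duration_ms"] += build["duration"]

    return {
        job: {
            "total": s["total"],
            "success": s["success"],
            "failure": s["failure"],
            "avg_duration_ms": s["duration_ms"] / s["total"],
        }
        for job, s in totals.items()
    }


if __name__ == "__main__":
    sample = [
        {"job": "app", "result": "SUCCESS", "duration": 1000},
        {"job": "app", "result": "FAILURE", "duration": 3000},
    ]
    print(summarize_builds(sample))
```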

Tool for job scheduling with logs instead of Jenkins (Rails project)

I have more than 30 rake tasks added to Jenkins as scheduled jobs. (Rails project)
But the Jenkins server goes down frequently and uses 100% of the CPU most of the time.
Please suggest a better job scheduler than Jenkins that is also capable of
the following:
Send an email notification when a job fails
Log the job's terminal output
Add dependencies between jobs
Your question seems to come out as "recommend me a CI server".
But - why does Jenkins fall over and/or use 100% CPU most of the time? I'd be looking at why this is. My experience of Jenkins is that it is pretty stable and low overhead. If your hardware / OS / something else is flaky or just under provisioned for the task then swapping Jenkins out isn't going to fix that.
