How to stop a Flink job using REST API - jenkins

I am trying to deploy a job to Flink from Jenkins. Thus far I have figured out how to submit the jar file that is created in the build job. Now I want to find any Flink jobs running with the old jar, stop them gracefully, and start a new job utilizing my new jar.
The API has methods to list jobs, cancel jobs, and submit jobs. However, there does not seem to be a stop-job endpoint. Any ideas on how to gracefully stop a job using the API?

Even though the stop endpoint is not documented, it does exist and behaves similarly to the cancel one.
Basically, this is the bit missing in the Flink REST API documentation:
Stop Job
DELETE request to /jobs/:jobid/stop.
Stops a job, result on success is {}.
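A minimal sketch of issuing that call from a script (the base URL and helper name here are placeholders of my own, not part of the Flink API; the job ID is the example one from further down):

```python
import urllib.request

def build_stop_request(base_url, job_id):
    """Build a DELETE request for the undocumented /jobs/:jobid/stop endpoint."""
    url = f"{base_url}/jobs/{job_id}/stop"
    return urllib.request.Request(url, method="DELETE")

# Placeholder host/port and job id:
req = build_stop_request("http://localhost:8081", "4c88f503005f79fde0f2d92b4ad3ade4")
# urllib.request.urlopen(req)  # uncomment to actually send; returns {} on success
```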
For those who are not aware of the difference between cancelling and stopping (copied from here):
The difference between cancelling and stopping a (streaming) job is the following:
On a cancel call, the operators in a job immediately receive a cancel() method call to cancel them as
soon as possible.
If the operators do not stop after the cancel call, Flink will interrupt the thread periodically
until the job stops.
A “stop” call is a more graceful way of stopping a running streaming job. Stop is only available for jobs
which use sources that implement the StoppableFunction interface. When the user requests to stop a job,
all sources will receive a stop() method call. The job will keep running until all sources properly shut down.
This allows the job to finish processing all inflight data.

As I'm using Flink 1.7, below is how to cancel/stop a Flink job in this version.
Already Tested By Myself
Request path:
/jobs/{jobid}
jobid - 32-character hexadecimal string value that identifies a job.
Request method: PATCH
Query parameters:
mode (optional): String value that specifies the termination mode. Supported values are "cancel" and "stop".
Example
10.xx.xx.xx:50865/jobs/4c88f503005f79fde0f2d92b4ad3ade4?mode=cancel
The host and port are available when you start the yarn-session; the job ID is available when you submit a job.
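The same request, sketched in Python (the host, port, and helper name are placeholders of my own; substitute the host and port printed when your yarn-session starts):

```python
import urllib.request

def build_terminate_request(base_url, job_id, mode="cancel"):
    """Build a PATCH request for /jobs/:jobid?mode=... (Flink 1.7)."""
    if mode not in ("cancel", "stop"):
        raise ValueError("mode must be 'cancel' or 'stop'")
    return urllib.request.Request(f"{base_url}/jobs/{job_id}?mode={mode}",
                                  method="PATCH")

req = build_terminate_request("http://localhost:8081",
                              "4c88f503005f79fde0f2d92b4ad3ade4", mode="cancel")
# urllib.request.urlopen(req)  # uncomment to actually send
```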
Ref:
https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/rest_api.html

Related

Jenkins job gets stuck in queue. How to make it fail automatically if the worker is offline

So on my Jenkins, sometimes my worker "slave02" goes offline and needs to be unstuck manually. I will not get into details, because that's not the point of this question.
The scenario so far:
I've configured a job intentionally to get processed on that exact worker. But obviously it would not start since the worker is offline. I want to get notified when that job gets stuck in queue. I've tried to use Build Timeout Jenkins Plugin and I've configured it to fail the build if it waits for longer than 5 minutes to complete the job.
The problem with this is that the plugin makes sure the job fails 5 minutes after the build gets started... which does not help in my case. Because the job doesn't start, rather it sits in queue waiting to get processed but that never happens. So my question is - is there a way to make the job check if that worker is down to just automatically fail the build and send notification?
I am pretty sure that can be done but I could not find a thread where this type of scenario is being discussed.

Get Jenkins build job duration once finished

I am using Jenkins integration server for my CI/CD.
I am working with freestyle projects.
I want to get build duration once finished (in seconds) using RESTFUL API (JSON).
This is what I tried:
duration=$(curl -g -u login:token --silent "$BUILD_URL/api/json?pretty=true&tree=duration" | jq -r '.duration')
The duration is always equal to 0, even though I ran this shell script in a post-build task.
There are a few reasons why this may not work for you, but the most probable reason is that the build hasn't finished when you make the API call.
I tried it on our instance and for finished jobs it works fine and for running jobs it always returns 0. If your post build task is executed as part of the job, then the job has probably not finished executing yet, which is why you are always getting 0.
The API call for the build will not contain the duration attribute as long as the build is running, so you cannot use that mechanism during the build.
However you have a nice alternative for achieving what you want using Freestyle jobs.
The solution, which still uses the API approach, is to create a separate generic job that updates your database with the results. This job will receive the project name and build number as parameters, run the curl command to retrieve the duration, update your database, and run any other logic you need.
This job can now be called from any freestyle job using the Parameterized Trigger plugin post task with the relevant build environment parameters.
This has the additional benefit that the duration update mechanism is controlled in a single job and if updates are needed they can be made in a single location avoiding the need to update all separate jobs.
Assuming your job is called Update-Duration and it receives two parameters, Project and Build, the post-build trigger can look like the following:
And that's it: just add this trigger to any job that needs it, and in the future you can update the logic without changing the calling jobs.
One small thing: to avoid a race condition that can occur if the caller job has not yet finished executing, you can increase the quiet period of your database-updater job, allowing enough time for the caller job to finish so the duration is populated.
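As a sketch of the updater job's fetch step (the function and parameter names are mine, not from any plugin): query the build's JSON API for `duration` and `building`, and treat a still-running build as not yet measurable. Jenkins reports duration in milliseconds, so divide by 1000 to get seconds.

```python
import json
import urllib.request

def fetch_duration_seconds(build_url, opener=urllib.request.urlopen):
    """Return the finished build's duration in seconds, or None while it runs."""
    req = urllib.request.Request(build_url + "/api/json?tree=duration,building")
    with opener(req) as resp:
        data = json.loads(resp.read())
    if data.get("building"):
        return None                     # still running: duration would read as 0
    return data["duration"] // 1000     # Jenkins reports milliseconds
```

The `opener` parameter just makes the function easy to exercise without a live Jenkins; in the real updater job you would call it with the `BUILD_URL` passed in from the triggering job.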

How to reorder jobs in the build queue which are blocked by the Block Queued Job Plugin

I have a job which requires an external resource and therefore should not be executed twice or more at a time. I used the Block Queued Job Plugin to block the job if one of a list of jobs is currently running.
This sometimes creates a build queue with some jobs blocked by the plugin, which is correct.
But now I do need to reorder the build queue to give a specific build a chance to be executed.
Normally the queue follows the FIFO principle, but I need to override this manually in specific situations.
The Simple Queue plugin cannot deal with blocked jobs.
The Priority Sorter plugin seems to be outdated and does not work for such a simple thing.
Currently I write down the parameters handed over to each job, delete them all, and afterwards rebuild the queue in the new order with the parameters I noted down manually.
This is quite bad, and I need a working solution. Maybe I missed the right plugin.

Trigger a build asynchronously in Jenkins

I have a job A running in Jenkins, which kicks off a process A on a VM, waits for it to finish, picks up the report generated by it, and sends it as an attachment to the build notification. The problem is that process A takes too long to finish and job A keeps waiting on it. Is there any way I can start this process A, stop job A, and, when process A is done, trigger a new job B which would pick up the report generated by process A and send it out with the build success/failure status?
Any help is appreciated.
Thanks
Jenkins provides an API for kicking off jobs via simple HTTP requests. You can kick off job B using curl or something like that as the final step of process A on the VM.
The docs are on the Jenkins site. You can use your own Jenkins instance to find the specific URLs for kicking off particular jobs; there's a link in the bottom right-hand corner of the Jenkins page.
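For example, the last step of process A could POST to job B's build endpoint. A sketch, where the Jenkins URL, job name, user, and API token are all placeholders (depending on your Jenkins security settings you may also need a build token or CSRF crumb):

```python
import base64
import urllib.request

def build_trigger_request(jenkins_url, job_name, user, api_token):
    """Build a POST that asks Jenkins to queue a build of job_name."""
    req = urllib.request.Request(f"{jenkins_url}/job/{job_name}/build",
                                 data=b"", method="POST")
    cred = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    req.add_header("Authorization", f"Basic {cred}")
    return req

req = build_trigger_request("http://localhost:8080", "job-B", "user", "token")
# urllib.request.urlopen(req)  # uncomment to actually send
```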
Perhaps an even better match for your use case would be a job of type "Monitor an external job". I have not used it myself, but from the documentation it sounds like a useful tool. The docs are at: https://wiki.jenkins-ci.org/display/JENKINS/Monitoring+external+jobs

Blocking a triggered Jenkins job until something *outside* Jenkins is done

I have a Jenkins job which starts a long-running process outside of Jenkins. The job itself is triggered by Gerrit.
If this job is triggered again while the long-running process is ongoing, I need to ensure that the job remains on the Jenkins queue until said process has completed. Effectively I want to ensure that the job never runs in parallel with itself, with the wrinkle that "the job" is really the Jenkins job plus the external long-running process.
I can't find any way to achieve this. The External Resource Dispatcher plugin seems like it could work, but every time I've configured it on our system, Jenkins got extremely unstable (refusing page loads for minutes on end, slave threads dying with NPEs). Everything else I can see, such as the Exclusions plugin, depend on Jenkins itself controlling the entirety of the job.
I've tried hacking something together with node labels - having the job depend on a label "can_run", assigning that label to master, and then having the job execute a Groovy script that removes that label from master. (Theoretically there would be another Jenkins job that adds the label back, which would be triggered by the end of the long-running process.) But it didn't work: if there were any queued instances of the job on Jenkins, they went ahead and started right away even though the label had been removed.
I don't know what else to try! Is there anything other than a required node label being missing which will cause Jenkins to queue the job if it is triggered, but not start it?
I guess the long-running process is triggered and your job returns immediately, which makes it an async process, right? I would suggest you handle the long-running-process detection and waiting logic in your trigger process: every time before you trigger the job, check whether the long-running process is running; if not, trigger it.
Actually, I am not quite getting what you are trying to do. Basically, because of that long-running process, it is impossible for you to run two jobs in parallel. If this is true, make it a non-parallel job.

Resources