My Dataflow jobs are cluttering up my dashboard, and I'd like to delete the failed jobs from my project. But the dashboard doesn't offer any option to delete a Dataflow job. I'm looking for something like the following, at least:
$ gcloud beta dataflow jobs delete JOB_ID
To delete all jobs:
$ gcloud beta dataflow jobs delete
Can someone please help me with this?
Unfortunately, this is not currently possible. You cannot delete a Dataflow job. This is something that you could request via the public issue tracker (I've wanted it in the past too).
gcloud dataflow jobs --help
COMMANDS
COMMAND is one of the following:
cancel
Cancels all jobs that match the command line arguments.
describe
Outputs the Job object resulting from the Get API.
drain
Drains all jobs that match the command line arguments.
list
Lists all jobs in a particular project.
run
Runs a job from the specified path.
show
Shows a short description of the given job.
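The closest alternatives are cancelling or draining a job that is still active; a failed job has already terminated, so the best you can do there is filter it out of the list. A rough example (JOB_ID is a placeholder):

# Stop an active job immediately
gcloud dataflow jobs cancel JOB_ID

# Or stop a streaming job gracefully, letting in-flight data finish processing
gcloud dataflow jobs drain JOB_ID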
As Graham mentions, it is not possible to delete Dataflow jobs. However, note that you can filter the job list to only show the jobs you care about. For example, Status:Running,Succeeded will exclude all failed or cancelled jobs.
On the command line, you can use --status=(active|terminated|all):
gcloud beta dataflow jobs list --status=active
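If you want something similar on the command line, --status can also be combined with gcloud's generic --filter flag. A rough sketch (the exact field name and state values are assumptions and may vary by gcloud version, so check the JSON output first):

# See which fields and state values gcloud exposes for your jobs
gcloud beta dataflow jobs list --status=terminated --format=json --limit=1

# Assuming the field is "state" with values like "Failed", hide failed jobs:
gcloud beta dataflow jobs list --status=terminated --filter="state!=Failed"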
I am trying to write a script to automate the deployment of a Java Dataflow job. The script creates a template and then uses the command
gcloud dataflow jobs run my-job --gcs-location=gs://my_bucket/template
The issue is, I want to update the job if the job already exists and it's running. I can do the update if I run the job via maven, but I need to do this via gcloud so I can have a service account for deployment and another one for running the job. I tried different things (adding --parameters update to the command line), but I always get an error. Is there a way to update a Dataflow job exclusively via gcloud dataflow jobs run?
Referring to the official documentation, which describes gcloud beta dataflow jobs (a group of subcommands for working with Dataflow jobs), there is no way to update a job using gcloud.
For now, the Apache Beam SDKs provide a way to update an ongoing streaming job on the Dataflow managed service with new pipeline code; you can find more information here. Another way of updating an existing Dataflow job is via the REST API, for which a Java example is available.
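For illustration, here is a rough sketch of that SDK route run directly with Maven instead of gcloud (the main class, project, region and bucket names are placeholders); it is essentially the maven-based update you already have working:

# --update asks the Dataflow runner to replace the running job that has the same --jobName
mvn compile exec:java \
  -Dexec.mainClass=com.example.MyDataflowPipeline \
  -Dexec.args="--runner=DataflowRunner \
    --project=my-project \
    --region=us-central1 \
    --stagingLocation=gs://my_bucket/staging \
    --jobName=my-job \
    --update"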
Additionally, please follow the feature request regarding recreating a job with gcloud.
We're using BitBucket to host our Git repositories.
We have defined build jobs in a locally-hosted Jenkins server.
We are wondering whether we could use BitBucket pipelines to trigger builds in Jenkins, after pull request approvals, etc.
Triggering jobs in Jenkins through its REST API is fairly straightforward.
1: curl --request POST --user $username:$api_token --head http://jenkins.mydomain/job/myjob/build
This returns a location response header. By doing a GET on that, we can obtain information about the queued item:
2: curl --user $username:$api_token http://jenkins.mydomain/queue/item/<item#>/api/json
This returns JSON describing the queued item, indicating whether the item is blocked, and why. If it's not blocked, it includes the URL for the build. With that, we can check the status of the build itself:
3: curl --user $username:$api_token http://jenkins.mydomain/job/myjob/<build#>/api/json
This will return yet more JSON, indicating whether the job is currently building and, once it has completed, whether the build succeeded.
Now BitBucket pipeline steps run in Docker containers and have to run on Linux. Our Jenkins build jobs run on a number of platforms, not all of which are Linux. But BitBucket shouldn't care: the necessary REST API calls can all be made from Linux, as I do in the examples above.
But how do we script this?
Do we create a single step that runs a shell script that runs command #1, then repeatedly calls command #2 until the build is started, then repeatedly calls command #3 until the build is done?
Or do we create three steps, one for each? Do BitBucket pipelines provide for looping on steps? Calling a step, waiting for a bit, then calling it again until it succeeds?
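For reference, here's a minimal sketch of the single-step option (the Jenkins host, job name, and credentials are placeholders, and jq is assumed to be available in the pipeline's Docker image):

#!/bin/sh
# Trigger the Jenkins job and capture the queue item URL from the Location header
QUEUE_URL=$(curl -s --user "$JENKINS_USER:$JENKINS_TOKEN" --request POST --head \
  "$JENKINS_URL/job/myjob/build" | tr -d '\r' | awk 'tolower($1) == "location:" {print $2}')

# Poll the queue item until Jenkins assigns the build and exposes its URL
BUILD_URL=""
while [ -z "$BUILD_URL" ]; do
  sleep 10
  BUILD_URL=$(curl -s --user "$JENKINS_USER:$JENKINS_TOKEN" "${QUEUE_URL}api/json" \
    | jq -r '.executable.url // empty')
done

# Poll the build until .result is no longer null, then fail the step unless it succeeded
RESULT="null"
while [ "$RESULT" = "null" ]; do
  sleep 30
  RESULT=$(curl -s --user "$JENKINS_USER:$JENKINS_TOKEN" "${BUILD_URL}api/json" \
    | jq -r '.result')
done
echo "Jenkins build finished: $RESULT"
[ "$RESULT" = "SUCCESS" ]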
I think you should use either a Bitbucket pipeline or a Jenkins pipeline. Using both will give you too many options and make the project more complex than it should be.
I know there's a gcloud command for this:
gcloud dataflow jobs list --help
NAME
gcloud dataflow jobs list - lists all jobs in a particular project, optionally filtered by region
DESCRIPTION
By default, 100 jobs in the current project are listed; this can be overridden with the gcloud --project flag, and the --limit flag.
Using the --region flag will only list jobs from the given regional endpoint.
But I'd like to retrieve this list programmatically through Dataflow Java SDK.
The problem I'm trying to solve:
I have a Dataflow pipeline in streaming mode and I want to set the update option (https://cloud.google.com/dataflow/pipelines/updating-a-pipeline) accordingly based on if this job has been deployed or not.
e.g. when I'm deploying this job for the first time, the code shouldn't set the update flag to true, since there's no existing job to update (otherwise the driver program will complain and fail to launch); on subsequent deployments, the code should query the list of running jobs, see that the job is already running, and set the update option so the job gets updated (otherwise a DataflowJobAlreadyExistsException is thrown).
I've found org.apache.beam.runners.dataflow.DataflowClient#listJobs(String) that can achieve this.
I run Dataflow jobs from a Unix shell script and need to know each job's final/completion status. Is there any command-line tool to grab the job completion status?
There's indeed a CLI command to retrieve a job's execution status:
gcloud dataflow jobs list --project=<PROJECT_ID> --filter="id=<JOB_ID>" --format="get(state)"
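And a rough sketch of waiting on the final status from a shell script (JOB_ID and PROJECT_ID are placeholders; the exact state strings, e.g. Done vs. JOB_STATE_DONE, may vary by gcloud version, so check the output once first):

while true; do
  STATE=$(gcloud dataflow jobs list --project="$PROJECT_ID" \
    --filter="id=$JOB_ID" --format="get(state)")
  case "$STATE" in
    Done|JOB_STATE_DONE) echo "Job succeeded"; break ;;
    Failed|Cancelled|JOB_STATE_FAILED|JOB_STATE_CANCELLED)
      echo "Job finished with state: $STATE"; exit 1 ;;
    *) sleep 60 ;;
  esac
done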
Yes! Dataflow has a CLI that is available as part of gcloud.
You may need to install gcloud alpha components:
$ gcloud components update alpha
After that, you should be able to use gcloud alpha dataflow jobs list to list all the jobs in a project or gcloud alpha dataflow jobs show <JOBID> for more information on a specific job.
You can find more details about this command and others at https://cloud.google.com/sdk/gcloud/reference/alpha/dataflow/jobs/list
I have a Jenkins setup of 6 slaves and a master, all Windows machines. I have a housekeeping Jenkins job which I want to run periodically on all the slaves and the master, as this job does the following tasks:
Delete unused temporary files.
Delete unwanted processes, as some of the tests are leaking processes (why leak is different question).
Set certain environment variables, as sometimes I want to push environment variable changes to all machines.
Any idea how I can force Jenkins to run this one job on all slaves and the master once every day? As a workaround I could create multiple Jenkins jobs and mark each one to run on one particular slave or the master, but I would rather avoid having so many duplicate jobs.
The Node and Label Parameter plugin allows you to parameterize where a job should be run. The job can be run on more than one node -- each node shows up as a separate execution in the job's build history. When multiple nodes are selected, you can configure whether the job should continue to run on other nodes if an execution fails.
I had a similar need, but using the Node and Label Parameter Plugin didn't seem quite right, as I do not want to parameterize my cleanup jobs.
I found a more satisfying answer in this post and thought it would also benefit to this question: Jenkins - Running a single job in master as well as slave.
Here is some documentation on how to configure a "Matrix project": https://wiki.jenkins.io/display/JENKINS/Building+a+matrix+project.
What you are looking for is the "Slave axis". It's not very well documented in the page above, but it appears as an option in the "Add axis" menu whenever there is more than one node. Here's a screenshot of the interesting part:
Update for recent Jenkins versions (Pipeline job type): the relevant options are on the pipeline's "Configuration" page.