Is it possible to get the worker ID from an Apache Beam job? Or any unique identifier that can tell me which worker is currently running? I want to use it as a label for my metric.
Thank you.
I don't believe this is supported but feel free to file a JIRA for this and mention your use-case. Also note that such information might be runner specific so you might be able to get information regarding the workers started for a job through the API of the runner you are using.
I'm wondering if there is some kind of "hook" where I can place a piece of code that would be executed when an Apache Beam pipeline is being shut down (for whatever reason: crash, cancel).
I need to delete a subscription on a Pub/Sub topic each time the Dataflow job is stopped.
Apache Beam is not naturally suited for this sort of flow. For this you may want to look at an orchestration engine, such as Apache Airflow.
With Airflow you should be able to schedule any sort of script to run after a Beam pipeline finishes/fails/is cancelled, etc. Take a look at it!
There are some examples of waiting for pipelines to finish, and indeed of managing Pub/Sub topics/subscriptions, in an ExampleUtils class in the examples folder in the apache/beam repository here. See if there is anything you can use in the waitUntilFinish and tearDown methods.
This is Java code; I'm not sure if that is the language you use.
(In the long run @Pablo's suggestion to separate this further from the pipeline code may be best, though that perhaps depends on your exact goal here.)
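The wait-then-tear-down idea can be sketched in Python as a small wrapper. This is a minimal sketch, not the ExampleUtils code: `run_with_cleanup` and both callables are hypothetical names. With the Beam Python SDK, `run_and_wait` could be something like `lambda: pipeline.run().wait_until_finish()`, and `cleanup` could delete the subscription through the Pub/Sub client library.

```python
from typing import Callable, Optional


def run_with_cleanup(
    run_and_wait: Callable[[], str],
    cleanup: Callable[[], None],
) -> Optional[str]:
    """Block until the pipeline ends, then always run the cleanup step.

    run_and_wait: starts the pipeline and blocks until it reaches a
        terminal state, returning that state.
    cleanup: releases external resources, e.g. deletes a subscription.
    """
    final_state = None
    try:
        final_state = run_and_wait()
    finally:
        # Runs on normal completion and when run_and_wait raises,
        # including KeyboardInterrupt in the launching process.
        cleanup()
    return final_state
```

This only helps while the launching process stays alive and blocked on the job. If the launcher itself is killed, or has already exited, nothing runs the cleanup, which is one reason an external orchestrator is the more robust option.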
We have multiple Jenkins jobs scheduled at roughly the same time every night.
I would like a summary report of their status to be available to me or sent to me.
I do not want to walk through the test suite by hand every day.
Any advice on this topic would be much appreciated.
The Global Build Stats plugin might fit your needs. It does not support scheduled email, but if you need that you could use the REST API it exposes to write your own.
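As a rough illustration of the "write your own" route, the sketch below does not use the Global Build Stats plugin's API at all. It swaps in the JSON API that Jenkins core serves at `/api/json`, where each job's `color` field encodes its last result. The function names are hypothetical, and authentication and the email step are left out.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# Jenkins encodes the last build result of a job in its "color" field.
_STATUS_BY_COLOR = {
    "blue": "success",
    "yellow": "unstable",
    "red": "failed",
    "aborted": "aborted",
    "notbuilt": "not built",
    "disabled": "disabled",
}


def job_status(color: str) -> str:
    """Map a Jenkins color to a status; '_anime' only means 'building now'."""
    return _STATUS_BY_COLOR.get(color.replace("_anime", ""), "unknown")


def summarize(jobs: list) -> str:
    """Build a plain-text summary from a list of {'name', 'color'} dicts."""
    counts = Counter(job_status(job.get("color", "")) for job in jobs)
    lines = [f"{status}: {count}" for status, count in sorted(counts.items())]
    failing = sorted(
        job["name"]
        for job in jobs
        if job_status(job.get("color", "")) in ("failed", "unstable")
    )
    if failing:
        lines.append("needs attention: " + ", ".join(failing))
    return "\n".join(lines)


def fetch_jobs(base_url: str) -> list:
    """Fetch name and color for every top-level job (no authentication)."""
    query = urlencode({"tree": "jobs[name,color]"})
    with urlopen(f"{base_url}/api/json?{query}") as response:
        return json.load(response)["jobs"]
```

Run from a nightly job after the others have finished, `summarize(fetch_jobs(base_url))` gives text that can be mailed or posted wherever is convenient.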
I have been looking at using projects built using spring-cloud-task within spring-cloud-dataflow. Having looked at the example projects and the documentation, the indication seems to be that tasks are launched manually through the dashboard or the shell. Does spring-cloud-dataflow provide any way of scheduling task definitions so that they can run for example on a cron schedule? I.e. Can you create a spring-cloud-task app which itself has no knowledge of a schedule, but deploy it to the dataflow server and configure the scheduling there?
Among the posts and blogs I have looked at I noticed the following:
https://spring.io/blog/2016/01/27/introducing-spring-cloud-task
Some of the Q&A afterwards hints at this being a possibility, with the reference to triggers, but I think this was discussed before it was released.
Any advice would be greatly appreciated, many thanks.
There are a few ways you could launch Tasks in Spring Cloud Data Flow. The following options are available today.
Launch it using TriggerTask; with this you could either choose to launch it with fixedDelay or via a cron expression - example here.
Launch it via an event in a streaming pipeline. Imagine a use case where you want to create a "thumbnail" whenever there's a new image (event) in an S3 bucket or in a file-system directory; the "thumbnail" operation could be a task in this case - example here.
Lastly, in the upcoming releases, we will port over "scheduler" functionality from Spring XD to Spring Cloud Data Flow.
Yes, Spring Cloud Data Flow does provide a scheduling option. To enable it, you need to add the argument below when starting the server:
--spring.cloud.dataflow.features.schedules-enabled=true
While I'm interested in Jenkins as a means to provide continuous build functionality, I'm even more interested in Jenkins as a means to exercise my application in its prod environment against unexpected changes in infrastructure beyond my control that may affect my application. I can't find a ton of information on using Jenkins in this way, but I was wondering if there are others out there doing this? Essentially I have a project that runs maven test parameterized with my prod URL, but for these projects I don't actually do any building. Are there other tools besides Jenkins I should be considering to do this? If so, why?
If you've got your tests set up to run via Maven already, I think Jenkins would be a good option. You could set up email, IM or SMS alerts using Jenkins plugins, and keep a record of the results within Jenkins.
The only down sides I can think of are:
You'll probably want to run your monitoring a lot more frequently than a regular CI job, so you might want to keep more build records than the default of 10.
If you already have a system like Nagios or OpenView to monitor system resources, it might be better to integrate app monitoring into that rather than having another source of truth.
Jenkins provides a plugin called Status Monitor Plugin.
We have ours set to check a specific URL every 5 minutes and email us when something fails. Our problem is that it won't send emails to cell phone carrier email addresses. However, if regular email will suffice, the setup time for a plugin is less than half an hour, and it is reliable as long as the Jenkins server stays up.
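For comparison, the check itself is small enough to script. The sketch below is not how the plugin works internally; it is a stand-alone illustration, and `check_url` with its `opener` parameter is a hypothetical helper. The scheduling (every 5 minutes) and the email step would be left to Jenkins or whatever runs the script.

```python
from urllib.request import urlopen


def check_url(url: str, timeout: float = 10.0, opener=urlopen):
    """Return (ok, message) for a single health check of url.

    opener defaults to urllib's urlopen and is a parameter only so the
    function can be exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except OSError as exc:
        # URLError, HTTPError (4xx/5xx) and socket timeouts are all
        # subclasses of OSError.
        return False, f"{url} failed: {exc}"
    if 200 <= status < 300:
        return True, f"{url} OK ({status})"
    return False, f"{url} returned unexpected status {status}"
```

A Jenkins job could call this and exit with a non-zero code when `ok` is false, so the normal build-failure notification does the alerting.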
Sorry for another non-programming question, but I'm using Quartz.NET, a scheduler for .NET applications, for a Windows Service which allows users to schedule the transfer of files that match a regular expression from various sources. For example, the user may schedule a job to occur every day at 6pm that transfers the files from a network path to an FTP server.
Adding and managing jobs is done using an ASP.NET project, and I'm creating a Dashboard to display useful info to the user. I have the following information on the dashboard so far:
Total number of jobs
Windows Service status
Time since scheduler active
I know it's a very general question, but what other snippets of info can I add to the dashboard? It's very sparse at the moment.
I've worked as a product manager on a few schedulers. Here are some common requirements for these types of things, but I urge you to talk to some target users to find out if they are applicable to your application.
The use cases:
1. Trying to identify if the jobs are running okay.
2. If jobs are not running okay, give the user clues as to the cause. Give user tools to debug and fix.
General requirements:
1. Table with info on last N jobs:
- Time started, time completed. Status of completion (success / failure). Length of time. Any errors. User who scheduled job. Any dependencies this job has on other jobs or other events. Specific machine the job ran on (if in a cluster).
Might be nice to include links in this dashlet that would allow you to cancel a job that might be hung.
Priority of the job (if you have priorities).
Compare all jobs: %succeed, %failure. Avg time to complete job.
Compare jobs by the scheduling user: avg time, %success, %failure.
This is by no means a comprehensive list. I'm just trying to give you a few ideas, based on what I can remember off the top of my head.
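The comparison figures mentioned above (% success, % failure, average time, overall and per scheduling user) are simple aggregates over the job history. Here is a minimal sketch in Python rather than .NET, assuming a hypothetical `JobRun` record; the field names are made up for illustration and are not Quartz.NET types.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class JobRun:
    """One finished job execution (hypothetical record layout)."""
    user: str
    started: datetime
    finished: datetime
    succeeded: bool


def stats(runs: list) -> dict:
    """Return run count, % success, % failure and average duration."""
    total = len(runs)
    if total == 0:
        return {"runs": 0, "success_pct": 0.0, "failure_pct": 0.0, "avg_seconds": 0.0}
    succeeded = sum(1 for run in runs if run.succeeded)
    seconds = sum((run.finished - run.started).total_seconds() for run in runs)
    return {
        "runs": total,
        "success_pct": 100.0 * succeeded / total,
        "failure_pct": 100.0 * (total - succeeded) / total,
        "avg_seconds": seconds / total,
    }


def stats_by_user(runs: list) -> dict:
    """Group runs by the user who scheduled them and compute stats per user."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run.user].append(run)
    return {user: stats(user_runs) for user, user_runs in grouped.items()}
```

The same two functions cover both the "compare all jobs" and the "compare by scheduling user" dashlets; restricting the input to the last N runs gives the recent-history view.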