Hangfire Register IoC Dependency in Scope For Job at Run Time

Issue: most of my jobs depend on a configuration dependency.
Ideal Solution: (copied here and repeated at the end just to save reading if you already know exactly how to do this)
I would like to:
during job enqueue (using a filter or any other option): save some data (object / JSON / string)
during job processing: intercept the job creation process and, using the stashed data, register a dependency in the IoC container scope for that job
Old Solution: for the custom "job scheduler" that I am replacing, we take the following steps:
before resolving the job processor, we create a lifetime scope and, using the job data, register the dependency with the data provided
then, using the "job type", we resolve the job and process it
Hack Solution: create a "wrapper job" class that uses the same data to create the same scope when running the job. This works for most cases, but I still want to know how to accomplish what I am actually trying to do.
Ideal Solution: ideally I can inject logic so that:
during save, I stash the needed data (there are plenty of ways to do this)
during pre-job processing, I access the IoC container to create the scope and register this dependency
I Have Tried / Looked Into: filter attributes work to save and restore data, but I have not found any access to the IoC container or lifetime scope there.
There were a few other options, but they too were limited in their "exposure" to the IoC container / scope.
Ideal Solution (repeated from above):
I would like to:
during job enqueue (using a filter or any other option): save some data (object / JSON / string)
during job processing: intercept the job creation process and, using the stashed data, register a dependency in the IoC container scope for that job

Related

Automated ansible output parsing in jenkins pipeline

How do you automatically parse ansible warnings and errors in your jenkins pipeline jobs?
I greatly enjoy the power of leveraging Ansible in Jenkins when it works. Upon a failure, though, the hunt to locate the actual error can be challenging.
I use WarningsNG which supports custom parsers (and allows their programmatic generation)
Do you know of any plugins or addons that already transform these logs into the kind of charts WarningsNG produces?
I figured I'd ask as I go off into deep regex land and make my own.
One good way to achieve this seems to be the following:
select an existing structured-output Ansible callback plugin (json, junit and yaml are all viable). I selected junit as I can play with the format to get a really nice view into the playbook, with errors reported in a very obvious way.
fork that GPL file (yes, so be careful with that license) and augment it with the following:
store output as file
implement the missing callback methods (the three mentioned above do not implement the v2...item callbacks)
forward events to the default or debug callback to ensure operators see something when they execute the plan
add a secrets cleaner - if you use the Jenkins credentials-binding-plugin it will hide secrets from the console, but it will not hide secrets within stored files. You'll need to handle that in your playbook or via some Groovy code (if Groovy, try{...} finally { clean } seems a good pattern)
Snippet - forwarding to the default callback
from ansible.plugins.callback import CallbackBase
from ansible.plugins.callback.default import CallbackModule as CallbackModule_default
...
class CallbackModule(CallbackBase):
    CALLBACK_VERSION = 2.0
    CALLBACK_TYPE = 'stdout'
    CALLBACK_NAME = 'json'
    def __init__(self, display=None):
        super(CallbackModule, self).__init__(display)
        # keep an instance of the default callback so operators still see console output
        self.default_callback = CallbackModule_default()
    ...
    def v2_on_file_diff(self, result):
        # forward the event to the default callback first
        self.default_callback.v2_on_file_diff(result)
        # ... then do whatever you'd want to ensure the content appears in the json file

Dataflow/Beam Templates, Productionization, Initialization, and ValueProviders

I have an Apache Beam job running on Google Cloud Dataflow, and as part of its initialization it needs to run some basic sanity/availability checks on services, pub/sub subscriptions, GCS blobs, etc. It's a streaming pipeline intended to run ad infinitum that processes hundreds of thousands of pub/sub messages.
Currently it needs a whole heap of required, variable parameters: which Google Cloud project it needs to run in, which bucket and directory prefix it's going to be storing files in, which pub/sub subscriptions it needs to read from, and so on. It does some work with these parameters before pipeline.run is called - validation, string splitting, and the like. In its current form, in order to start a job we've been passing these parameters to a PipelineOptionsFactory and issuing a new compile every single time, but it seems like there should be a better way. I've set up the parameters to be ValueProvider objects, but because they're being called outside of pipeline.run, Maven complains at compile time that ValueProvider.get() is being called outside of a runtime context (which, yes, it is.)
I've tried using NestedValueProviders as in the Google "Creating Templates" document, but my IDE complains if I try to use NestedValueProvider.of to return a string as shown in the document. The only way I've been able to get NestedValueProviders to compile is as follows:
NestedValueProvider<String, String> pid = NestedValueProvider.of(
    pipelineOptions.getDataflowProjectId(),
    (SerializableFunction<String, String>) s -> s
);
(String pid = NestedValueProvider.of(...) results in the following error: "incompatible types: no instance(s) of type variable(s) T,X exist so that org.apache.beam.sdk.options.ValueProvider.NestedValueProvider conforms to java.lang.String")
I have the following in my pipelineOptions:
ValueProvider<String> getDataflowProjectId();
void setDataflowProjectId(ValueProvider<String> value);
Because of the volume of messages we're going to be processing, adding these checks at the front of the pipeline for every message that comes through isn't really practical; we'll hit daily account administrative limits on some of these calls pretty quickly.
Are templates the right approach for what I want to do? How do I go about actually productionizing this? Should (can?) I compile with maven into a jar, then just run the jar on a local dev/qa/prod box with my parameters and just not bother with ValueProviders at all? Or is it possible to provide a default to a ValueProvider and override it as part of the options passed to the template?
Any advice on how to proceed would be most appreciated. Thanks!
The way templates are currently implemented, there is no point at which to perform initialization/validation that happens after template creation but before pipeline start.
All of the existing validation executes during template creation. If the validation detects that the values aren't available (because they are ValueProviders), the validation is skipped.
In some cases it is possible to approximate validation by adding runtime checks, either as part of the initial splitting of a custom source or as part of the @Setup method of a DoFn. In the latter case, the @Setup method will run once for each instance of the DoFn that is created. If the pipeline is batch, after 4 failures for a specific instance the pipeline will fail.
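To make the @Setup option concrete, here is a minimal Java sketch (it is not code from the question): the ValueProvider is only read inside @Setup, where get() is legal because the method runs at pipeline runtime rather than at template creation; the subscriptionExists() helper is hypothetical.
import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.transforms.DoFn;
class AvailabilityCheckFn extends DoFn<String, String> {
    private final ValueProvider<String> subscription;
    AvailabilityCheckFn(ValueProvider<String> subscription) {
        this.subscription = subscription;
    }
    @Setup
    public void setup() {
        // runs once per DoFn instance, at runtime, so calling get() here is allowed
        String sub = subscription.get();
        if (!subscriptionExists(sub)) {   // hypothetical availability check
            throw new IllegalStateException("Pub/Sub subscription not available: " + sub);
        }
    }
    @ProcessElement
    public void processElement(ProcessContext c) {
        c.output(c.element());   // pass elements through unchanged
    }
    private boolean subscriptionExists(String sub) {
        return true;   // placeholder for a real call to the Pub/Sub admin API
    }
}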
Another option for productionizing pipelines is to build the JAR that runs the pipeline, and have a production process that runs that JAR to initiate the pipeline.
Regarding the compile error you received -- the NestedValueProvider returns a ValueProvider -- it isn't possible to get a String out of that. You could, however, put the validation code into the SerializableFunction that is run within the NestedValueProvider.
Although I believe this will currently re-run the validation every time the value is accessed, it wouldn't be unreasonable to have the NestedValueProvider cache the translated value.
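Building on the snippet from the question, a sketch of what folding validation into the SerializableFunction could look like; validateProjectId() is a hypothetical check that throws if the project is unusable:
ValueProvider<String> pid = NestedValueProvider.of(
    pipelineOptions.getDataflowProjectId(),
    (SerializableFunction<String, String>) projectId -> {
        validateProjectId(projectId);   // hypothetical: throw if the project is not reachable
        return projectId;
    });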

Perform action after Dataflow pipeline has processed all data

Is it possible to perform an action once a batch Dataflow job has finished processing all data? Specifically, I'd like to move the text file that the pipeline just processed to a different GCS bucket. I'm not sure where to place that in my pipeline to ensure it executes once after the data processing has completed.
I don't see why you need to do this after the pipeline executes. You could use side outputs to write the file to multiple buckets, and save yourself the copy after the pipeline finishes.
If that's not going to work for you (for whatever reason), then you can simply run your pipeline in blocking execution mode, i.e. use pipeline.run().waitUntilFinish(), and then just write the rest of your code (which does the copy) after that.
[..]
// do some stuff before the pipeline runs
Pipeline pipeline = ...
pipeline.run().waitUntilFinish();
// do something after the pipeline finishes here
[..]
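To make the "do something after" step concrete, here is a sketch assuming the google-cloud-storage client library; the bucket and object names are hypothetical:
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
// after pipeline.run().waitUntilFinish() has returned
Storage storage = StorageOptions.getDefaultInstance().getService();
BlobId source = BlobId.of("source-bucket", "input/data.txt");
BlobId target = BlobId.of("archive-bucket", "input/data.txt");
storage.copy(Storage.CopyRequest.of(source, target)).getResult();   // copy to the other bucket
storage.delete(source);                                             // then remove the original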
A little trick I got from reading the source code of apache beam's PassThroughThenCleanup.java.
Right after your reader, create a side input that 'combines' the entire collection (in the source code, it is the View.asIterable() PTransform) and connect its output to a DoFn. This DoFn will be called only after the reader has finished reading ALL elements.
P.S. The code literally names the operation cleanupSignalView, which I found really clever.
Note that you can achieve the same effect using Combine.globally() (java) or beam.CombineGlobally() (python). For more info check out section 4.2.4.3 here
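Sketched with the Beam Java SDK, the trick described above could look roughly like this; the GCS paths and the GcsFileMover helper are illustrative placeholders for your own move logic:
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;
PCollection<String> lines = pipeline.apply("ReadInput",
        TextIO.read().from("gs://source-bucket/input.txt"));
// the "cleanup signal": a side-input view over the whole collection; it only becomes
// available once the reader has finished, so anything that consumes it runs afterwards
final PCollectionView<Iterable<String>> cleanupSignalView =
        lines.apply("CleanupSignal", View.<String>asIterable());
pipeline.apply("CleanupTrigger", Create.of("go"))
        .apply("MoveInputFile", ParDo.of(new DoFn<String, Void>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                c.sideInput(cleanupSignalView);   // side input is ready only after the read completes
                // hypothetical helper: move the processed file to its archive bucket
                GcsFileMover.move("gs://source-bucket/input.txt",
                                  "gs://archive-bucket/input.txt");
            }
        }).withSideInputs(cleanupSignalView));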
I think two options can help you here:
1) Use TextIO to write to the bucket or folder you want, specifying the exact GCS path (e.g. gs://sandbox/other-bucket); see the sketch after this list.
2) Use Object Change Notifications in combination with Cloud Functions. You can find a good primer on doing this here and the SDK for GCS in JS here. What you will do in this option is basically set up a trigger for when something drops in a certain bucket, and move it to another one using your self-written Cloud Function.
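A sketch of option 1 with the Dataflow/Beam Java SDK, assuming results is the PCollection<String> the pipeline produces:
results.apply("WriteToOtherBucket",
        TextIO.write().to("gs://sandbox/other-bucket/results").withSuffix(".txt"));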

quartz grails multi-entity environment

I have a web-app (grails 2.3.5, quartz plugin) with multiple users. Now I want my users to be able to schedule jobs using Quartz. I wonder what the best approach is to separate the triggers of one user from the triggers of another user.
e.g. provide a list of all scheduled tasks for a given user
Are there any recommendations how to make this differentiation?
Implementing some scheme for naming your triggers would likely be the best approach here. That way you can query the triggers for a job and filter them by some type of matching pattern.
It's really up to you to decide how you want to handle the visibility and management of the triggers. Using the trigger name seems to be the most logical approach in my own opinion.
Alternatively, you could build a framework (e.g. Domain model) that relates the triggers to a user.
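For illustration, a sketch in the plain Quartz API (Java) of that name-based filtering, assuming the user id is appended to each trigger name as in the builder example further down:
import java.util.ArrayList;
import java.util.List;
import org.quartz.JobKey;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.Trigger;
List<Trigger> triggersForUser(Scheduler scheduler, String userId) throws SchedulerException {
    // job name and group reused from the example below; adjust to your own job definitions
    List<? extends Trigger> triggers =
            scheduler.getTriggersOfJob(JobKey.jobKey("com.example.package.JobClassNameJob", "groupName"));
    List<Trigger> userTriggers = new ArrayList<>();
    for (Trigger t : triggers) {
        if (t.getKey().getName().endsWith("-" + userId)) {   // per-user suffix in the trigger name
            userTriggers.add(t);
        }
    }
    return userTriggers;
}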
Update
In light of the content of your comment I'd like to offer you a glimpse into how you can dynamically add a trigger to an existing job. This is only an example to help you get further down the path of accomplishing the goal you have.
import org.quartz.CronScheduleBuilder
import org.quartz.Trigger
import org.quartz.TriggerBuilder
...
def jobManagerService
String cronExpression = ... // whatever the expression is
Trigger trigger = TriggerBuilder.newTrigger()
    .withIdentity("UniqueNameOfYourTriggerHere-UserId")
    .withPriority(6)
    .forJob("com.example.package.JobClassNameJob", "groupName")
    .withSchedule(CronScheduleBuilder.cronSchedule(cronExpression))
    .build()
// if you need job parameters
trigger.jobDataMap.putAll([param1: 'example'])
jobManagerService.getQuartzScheduler().scheduleJob(trigger)

Jenkins Plugin: create a new job programmatically

How to create a new Jenkins job within a plugin?
I have a Jenkins plugin that listens to a message queue and, when a message arrives, fires a new event to create a new job (or start a run).
I'm looking for something like:
Job myJob = new Job(...);
I know I can use the REST API or the CLI, but since I'm inside the plugin I'd prefer an internal Java solution.
Use Job DSL Plugin.
From the plugin page:
Jenkins is a wonderful system for managing builds, and people love using its UI to configure jobs. Unfortunately, as the number of jobs grows, maintaining them becomes tedious, and the paradigm of using a UI falls apart. Additionally, the common pattern in this situation is to copy jobs to create new ones, these "children" have a habit of diverging from their original "template" and consequently it becomes difficult to maintain consistency between these jobs.
The Jenkins job-dsl-plugin attempts to solve this problem by allowing jobs to be defined with the absolute minimum necessary in a programmatic form, with the help of templates that are synced with the generated jobs. The goal is for your project to be able to define all the jobs they want to be related to their project, declaring their intent for the jobs, leaving the common stuff up to a template that were defined earlier or hidden behind the DSL.
You can create a new hudson/jenkins job by simply doing:
FreeStyleProject proj = Hudson.getInstance().createProject(FreeStyleProject.class, NAMEOFJOB);
If you want to be able to handle updates (and you already have the config.xml):
import hudson.model.AbstractItem
import javax.xml.transform.stream.StreamSource
import jenkins.model.Jenkins
final jenkins = Jenkins.getInstance()
final itemName = 'name-of-job-to-be-created-or-updated'
final configXml = new FileInputStream('/path/to/config.xml')
final item = jenkins.getItemByFullName(itemName, AbstractItem.class)
if (item != null) {
    item.updateByXml(new StreamSource(configXml))
} else {
    jenkins.createProjectFromXML(itemName, configXml)
}
Make sure, though, that you have the Jenkins core .jar file available before doing this.
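Putting that together, a hedged Java sketch of how the createProject call above might be wired into the plugin's message-queue listener; onMessage() and the job-name handling are hypothetical:
import java.io.IOException;
import hudson.model.FreeStyleProject;
import jenkins.model.Jenkins;
public void onMessage(String jobName) throws IOException {
    Jenkins jenkins = Jenkins.getInstance();
    FreeStyleProject project = jenkins.getItemByFullName(jobName, FreeStyleProject.class);
    if (project == null) {
        project = jenkins.createProject(FreeStyleProject.class, jobName);   // create the job
    }
    project.scheduleBuild2(0);   // optionally queue a run straight away
}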
