Apache Beam Initializer - google-cloud-dataflow

In my dataflow job, I need to initialize a Config factory and log certain messages in an audit log before actual processing begins.
I have placed the Config factory initialization code and the audit logging in a parent class, PlatformInitializer, which my main pipeline class extends.
public class CustomJob extends PlatformInitializer implements Serializable {
    public static void main(String[] args) throws PropertyVetoException {
        CustomJob myCustomjob = new CustomJob();
        // Initialize config factories
        myCustomjob.initialize();
        // trigger dataflow job
        myCustomjob.parallelRead(args);
    }
    // ... rest of the class (parallelRead etc.) omitted
}
As a result, I also had to implement the Serializable interface in my pipeline class, because Beam was throwing the error java.io.NotSerializableException: org.devoteam.CustomJob
Inside PlatformInitializer, I have an initialize() method that contains the initialization logic for the config factory and also logs some initial audit messages.
public class PlatformInitializer {
    public void initialize() {
        // Configfactory factory = new Configfactory()
        // CustomLogger.log("JOB-BEGIN-EVENT" + Current timestamp )
    }
}
My question is: is this the right way to invoke code that needs to run before the pipeline begins execution?

If you need the initialized object at runtime (not at pipeline construction time), you should move your initialization logic into a Beam DoFn. DoFn has a number of method annotations that mark methods to be executed in different lifecycle phases. The @Setup and @StartBundle annotations might be useful for your use case. See the Beam documentation on DoFn for more details.
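To illustrate why state initialized in main() is not available at runtime, here is a minimal plain-Java sketch with no Beam dependency. Beam ships each DoFn to the workers in serialized form, which is probably also where the NotSerializableException in the question comes from. WorkerFn, setup() and roundTrip() are hypothetical names used only for illustration; setup() plays the role that a method annotated with @Setup would play in a real DoFn.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a DoFn; this is not a Beam class.
final class WorkerFn implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: not part of the serialized form, like a non-serializable config factory.
    private transient String config;

    // Plays the role of a @Setup method: meant to run on the worker, after deserialization.
    void setup() {
        config = "initialized-on-worker";
    }

    boolean isConfigured() {
        return config != null;
    }

    String process(String element) {
        if (config == null) {
            throw new IllegalStateException("setup() was not called");
        }
        return element + ":" + config;
    }
}

final class SerializationDemo {
    // Round-trips an object through Java serialization, as happens when a function is shipped to a worker.
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        WorkerFn launcherSide = new WorkerFn();
        launcherSide.setup();                           // initialized before "submission"
        WorkerFn workerSide = roundTrip(launcherSide);
        System.out.println(workerSide.isConfigured());  // false: transient state was not shipped
        workerSide.setup();                             // what a @Setup method would do on the worker
        System.out.println(workerSide.process("a"));    // a:initialized-on-worker
    }
}
```

The same reasoning applies to static state: a worker is a different JVM, so anything set up in the launcher's main() has to be rebuilt there.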

Related

Access side input in a non-anonymous DoFn

How to access the elements of a side input if I have my class extend DoFn?
For example:
Say I have a ParDo transform like:
PCollection<String> data = myData.apply("Get data",
ParDo.of(new MyClass()).withSideInputs(myDataView));
And I have a class:-
static class MyClass extends DoFn<String,String>
{
//How to access side input here
}
c.sideInput() isn't working in this case.
Thanks.
In this case, the problem is that the processElement method in your DoFn does not have access to the PCollectionView instance in your main method.
You can pass the PCollectionView to the DoFn in the constructor:
class MyClass extends DoFn<String,String>
{
    private final PCollectionView<..> mySideInput;
    public MyClass(PCollectionView<..> mySideInput) {
        // List, or Map or anything:
        this.mySideInput = mySideInput;
    }
    @ProcessElement
    public void processElement(ProcessContext c) throws IOException
    {
        // List or Map or any type you need:
        List<..> sideInputList = c.sideInput(mySideInput);
    }
}
You would then pass the side input to the class when you instantiate it, and indicate it as a side input like so:
p.apply(ParDo.of(new MyClass(mySideInput)).withSideInputs(mySideInput));
The explanation for this is that when you use an anonymous DoFn, the process method has a closure with access to all the objects within the scope that encloses the DoFn (among them is the PCollectionView). When you're not using an anonymous DoFn, there is no closure, and you need another way of passing the PCollectionView.
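The closure point can be shown without Beam at all. The following is a minimal plain-Java sketch in which java.util.function.Function stands in for DoFn and a List stands in for the PCollectionView; PrefixFn and ClosureDemo are hypothetical names chosen for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Named class: nothing is captured, so the dependency has to arrive through the constructor.
final class PrefixFn implements Function<String, String> {
    private final List<String> lookup; // stands in for the PCollectionView field

    PrefixFn(List<String> lookup) {
        this.lookup = lookup;
    }

    @Override
    public String apply(String element) {
        return lookup.get(0) + element;
    }
}

final class ClosureDemo {
    // Anonymous class: 'lookup' is captured from the enclosing scope (the closure).
    static Function<String, String> anonymous(final List<String> lookup) {
        return new Function<String, String>() {
            @Override
            public String apply(String element) {
                return lookup.get(0) + element;
            }
        };
    }

    public static void main(String[] args) {
        List<String> lookup = Collections.singletonList("side-");
        System.out.println(anonymous(lookup).apply("a"));   // side-a
        System.out.println(new PrefixFn(lookup).apply("a")); // side-a
    }
}
```

Both variants behave the same; the only difference is how the function gets hold of the reference.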
So although the answer above is correct, it is still a little incomplete.
So once you finish implementing the above answer, you need to execute your pipeline like this:
p.apply(ParDo.of(new MyClass(mySideInput)).withSideInputs(mySideInput));

Quartz.NET: with a JobListener, Jobs are not triggered

I have properly created and scheduled a Job (I don't write the Job and Trigger creation here, just to be brief). Scheduler is created and started as follows:
_scheduler = New StdSchedulerFactory().GetScheduler
_scheduler.Start()
Jobs are executed at the scheduled time.
Then I created a very simple (and, at the moment, empty) JobListener:
Imports Quartz
Public Class JobListener
    Implements IJobListener
#Region "Public properties"
    Public ReadOnly Property Name As String Implements Quartz.IJobListener.Name
        Get
            Return "JOB_LISTENER"
        End Get
    End Property
#End Region
#Region "Methods"
    Public Sub JobExecutionVetoed(context As Quartz.IJobExecutionContext) Implements Quartz.IJobListener.JobExecutionVetoed
        Throw New NotImplementedException
    End Sub
    Public Sub JobToBeExecuted(context As Quartz.IJobExecutionContext) Implements Quartz.IJobListener.JobToBeExecuted
        Throw New NotImplementedException
    End Sub
    Public Sub JobWasExecuted(context As Quartz.IJobExecutionContext, jobException As Quartz.JobExecutionException) Implements Quartz.IJobListener.JobWasExecuted
    End Sub
#End Region
End Class
and add it to the scheduler:
_scheduler = New StdSchedulerFactory().GetScheduler
_scheduler.Start()
_jobListener = New JobListener()
_scheduler.ListenerManager.AddJobListener(_jobListener, GroupMatcher(Of JobKey).AnyGroup())
and now the Jobs are not executed anymore.
Any hint why it happens?
Same result if I add the JobListener before starting the Scheduler:
_jobListener = New JobListener()
_scheduler = New StdSchedulerFactory().GetScheduler
_scheduler.ListenerManager.AddJobListener(_jobListener, GroupMatcher(Of JobKey).AnyGroup())
_scheduler.Start()
I figured out what the problem was.
First of all, a piece of advice: always configure logging before you start debugging with Quartz.NET.
When the Job is ready to be executed, the JobListener is notified and its JobToBeExecuted method is called. As you can see in my JobListener implementation, I throw an exception in JobToBeExecuted, and that exception prevents the Job from being executed.
I didn't investigate why an error in the JobListener should prevent the Job from being executed. I guess the exception breaks a chain of calls.
Anyway, this is the answer to my question.
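The effect can be reproduced with a small model. The sketch below is a simplified stand-in written in plain Java, not Quartz's actual code: MiniRunShell and SimpleJobListener are hypothetical names, and the assumption is only that listeners are notified before the job runs and that a failure during notification aborts the run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified listener interface (not the Quartz API).
interface SimpleJobListener {
    void jobToBeExecuted(String jobName);
}

// Hypothetical model of a scheduler's "run shell": notify listeners first, then run the job.
final class MiniRunShell {
    private final List<SimpleJobListener> listeners = new ArrayList<>();

    void addListener(SimpleJobListener listener) {
        listeners.add(listener);
    }

    // Returns true if the job ran, false if a listener failure aborted it.
    boolean run(String jobName, Runnable job) {
        try {
            for (SimpleJobListener listener : listeners) {
                listener.jobToBeExecuted(jobName);
            }
        } catch (RuntimeException e) {
            // Assumption of this model: a failing listener means the job is skipped.
            System.out.println("Listener failed, job " + jobName + " will not run: " + e);
            return false;
        }
        job.run();
        return true;
    }

    public static void main(String[] args) {
        MiniRunShell shell = new MiniRunShell();
        shell.addListener(name -> {
            throw new UnsupportedOperationException("not implemented");
        });
        boolean ran = shell.run("demo", () -> System.out.println("job body"));
        System.out.println(ran); // false
    }
}
```

Replacing the throwing body with an empty one lets the job run again, which matches the fix described above.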

How to populate parameter "defaultValue" in Maven "AbstractMojoTestCase"?

I have a Maven plugin that I am attempting to test using a subclass of the AbstractMojoTestCase. The plugin Mojo defines an outputFolder parameter with a defaultValue. This parameter is not generally expected to be provided by the user in the POM.
@Parameter(defaultValue = "${project.build.directory}/someOutputFolder")
private File outputFolder;
And if I use the plugin in a real scenario then the outputFolder gets defaulted as expected.
But if I test the Mojo using the AbstractMojoTestCase then while parameters defined in the test POM are populated, parameters with a defaultValue that are not defined in the POM are not populated.
public class MyPluginTestCase extends AbstractMojoTestCase {
    public void testAssembly() throws Exception {
        final File pom = getTestFile( "src/test/resources/test-pom.xml");
        assertNotNull(pom);
        assertTrue(pom.exists());
        final MyMojo myMojo = (MyMojo) lookupMojo("assemble", pom);
        assertNotNull(myMojo);
        myMojo.execute(); // Dies due to NullPointerException on outputFolder.
    }
}
Further: if I define the outputFolder parameter in the POM like so:
<outputFolder>${project.build.directory}/someOutputFolder</outputFolder>
then ${project.build.directory} is NOT resolved within the AbstractMojoTestCase.
So what do I need to do to get the defaultValue populated when testing?
Or is this a fault in the AbstractMojoTestCase?
This is Maven-3.2.3, maven-plugin-plugin-3.2, JDK 8
You need to use lookupConfiguredMojo.
Here's what I ended up using:
public class MyPluginTest
{
    @Rule
    public MojoRule mojoRule = new MojoRule();
    @Test
    public void noSource() throws Exception
    {
        // Just give the location, where the pom.xml is located
        MyPlugin plugin = (MyPlugin) mojoRule.lookupConfiguredMojo(getResourcesFile("basic-test"), "myGoal");
        plugin.execute();
        assertThat(plugin.getSomeInformation()).isEmpty();
    }
    public File getResourcesFile(String filename)
    {
        return new File("src/test/resources", filename);
    }
}
Of course you need to replace myGoal with your plugin's goal. You also need to figure out how to assert that your plugin executed successfully.
For a more complete example, check out the tests I wrote for fmt-maven-plugin

Groovy method interception

In my Grails app I've installed the Quartz plugin. I want to intercept calls to every Quartz job class' execute method in order to do something before the execute method is invoked (similar to AOP before advice).
Currently, I'm trying to do this interception from the doWithDynamicMethods closure of another plugin as shown below:
def doWithDynamicMethods = { ctx ->
    // get all the job classes
    application.getArtefacts("Job").each { klass ->
        MetaClass jobMetaClass = klass.clazz.metaClass
        // intercept the methods of the job classes
        jobMetaClass.invokeMethod = { String name, Object args ->
            // do something before invoking the called method
            if (name == "execute") {
                println "this should happen before execute()"
            }
            // now call the method that was originally invoked
            def validMethod = jobMetaClass.getMetaMethod(name, args)
            if (validMethod != null) {
                validMethod.invoke(delegate, args)
            } else {
                jobMetaClass.invokeMissingMethod(delegate, name, args)
            }
        }
    }
}
So, given a job such as
class TestJob {
    static triggers = {
        simple repeatInterval: 5000l // execute job once in 5 seconds
    }
    def execute() {
        println "execute called"
    }
}
It should print:
this should happen before execute()
execute called
But my attempt at method interception seems to have no effect and instead it just prints:
execute called
Perhaps the cause of the problem is this Groovy bug? Even though the Job classes don't explicitly implement the org.quartz.Job interface, I suspect that implicitly (due to some Groovy voodoo), they are instances of this interface.
If indeed this bug is the cause of my problem, is there another way that I can do "before method interception"?
Because all the job classes are Spring beans you can solve this problem using Spring AOP. Define an aspect such as the following (adjust the pointcut definition so that it matches only your job classes, I've assumed they are all in a package named org.example.job and have a class name that ends with Job).
@Aspect
class JobExecutionAspect {
    @Pointcut("execution(public * org.example.job.*Job.execute(..))")
    public void executeMethods() {}
    @Around("executeMethods()")
    def interceptJobExecuteMethod(ProceedingJoinPoint jp) {
        // do your stuff that should happen before execute() here, if you need access
        // to the job object call jp.getTarget()
        // now call the job's execute() method
        jp.proceed()
    }
}
You'll need to register this aspect as a Spring bean (it doesn't matter what name you give the bean).
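For intuition about what such an advice does, here is a plain-Java sketch that uses a JDK dynamic proxy rather than Spring AOP itself. SimpleJob and BeforeAdvice are hypothetical names for illustration; the handler runs some code before execute() and then delegates to the real object, which is roughly what the around advice does when it calls jp.proceed().

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical job interface; JDK proxies can only intercept interface methods.
interface SimpleJob {
    String execute();
}

final class BeforeAdvice {
    // Wraps 'target' so that a message is recorded in 'log' before every execute() call.
    static SimpleJob wrap(SimpleJob target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("execute")) {
                log.add("before execute()"); // the "before" part of the advice
            }
            try {
                return method.invoke(target, args); // proceed to the real method
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the job's own exception unwrapped
            }
        };
        return (SimpleJob) Proxy.newProxyInstance(
                SimpleJob.class.getClassLoader(),
                new Class<?>[] {SimpleJob.class},
                handler);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        SimpleJob job = wrap(() -> "execute called", log);
        String result = job.execute();
        System.out.println(log);    // [before execute()]
        System.out.println(result); // execute called
    }
}
```

The job class itself stays untouched; callers simply receive the wrapped instance instead of the original.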
You can register a custom JobListener in the application to run logic before execute() is triggered. You can use something like:-
public class MyJobListener implements JobListener {
public void jobToBeExecuted(JobExecutionContext context) {
println "Before calling Execute"
}
public void jobWasExecuted(JobExecutionContext context,
JobExecutionException jobException) {}
public void jobExecutionVetoed(JobExecutionContext context) {}
}
Register the customized Job Listener to Quartz Scheduler in Bootstrap:-
Scheduler scheduler = ctx.getBean("quartzScheduler") //ctx being application context
scheduler.getListenerManager().addJobListener(myJobListener, allJobs())
resources.groovy:-
beans = {
myJobListener(MyJobListener)
}
One benefit I see in this approach is that we no longer need the second plugin for method interception.
Second, we can register the listener to listen to all jobs, to specific jobs, or to the jobs in a group. Refer to Customize Quartz JobListener and API for JobListener, TriggerListener, ScheduleListener for more insight.
Obviously, AOP is another approach if we do not want to use the Quartz API.
You won't get the job classes that way. If you look at the Quartz plugin, you can get them by calling jobClasses:
application.jobClasses.each {GrailsJobClass tc -> ... }
see https://github.com/nebolsin/grails-quartz/blob/master/QuartzGrailsPlugin.groovy
If you look at that file, you can see that the plugin already does almost what you are trying to achieve, without needing AOP or anything else.
For method interception, implement invokeMethod on the metaclass. In my case the class was not third-party, so I could modify the implementation.
Follow this blog for more information.

How can I run Quartz.NET Jobs in a separate AppDomain?

Is it possible to run Quartz.NET jobs in a separate AppDomain? If so, how can this be achieved?
Disclaimer: I've not tried this, it's just an idea. And none of this code has been compiled, even.
Create a custom job factory that creates a wrapper for your real jobs. Have this wrapper implement the Execute method by creating a new app domain and running the original job in that app domain.
In more detail: Create a new type of job, say IsolatedJob : IJob. Have this job take as a constructor parameter the type of a job that it should encapsulate:
internal class IsolatedJob: IJob
{
    private readonly Type _jobType;
    public IsolatedJob(Type jobType)
    {
        _jobType = jobType ?? throw new ArgumentNullException(nameof(jobType));
    }
    public void Execute(IJobExecutionContext context)
    {
        // Create the job in the new app domain
        System.AppDomain domain = System.AppDomain.CreateDomain("Isolation");
        try
        {
            // CreateInstanceAndUnwrap expects the namespace-qualified type name
            var job = (IJob)domain.CreateInstanceAndUnwrap("yourAssembly", _jobType.FullName);
            job.Execute(context);
        }
        finally
        {
            // Unload the domain so it does not pile up after every run
            System.AppDomain.Unload(domain);
        }
    }
}
You may need to create an implementation of IJobExecutionContext that inherits from MarshalByRefObject and proxies calls onto the original context object. Given the number of other objects that IJobExecutionContext provides access to, I'd be tempted to implement many members with a NotImplementedException as most won't be needed during job execution.
Next you need the custom job factory. This bit is easier:
internal class IsolatedJobFactory : IJobFactory
{
    public IJob NewJob(TriggerFiredBundle bundle, IScheduler scheduler)
    {
        return NewJob(bundle.JobDetail.JobType);
    }
    private IJob NewJob(Type jobType)
    {
        return new IsolatedJob(jobType);
    }
}
Finally, you will need to instruct Quartz to use this job factory rather than the out of the box one. Use the IScheduler.JobFactory property setter and provide a new instance of IsolatedJobFactory.

Resources