How to pass custom MyPipelineOptions to a Google Dataflow DoFn? - google-cloud-dataflow

When I try to add a MyPipelineOptions parameter to my Google Dataflow DoFn as documented, I get the following error:
java.lang.IllegalArgumentException:
com.xxx.MyProcessor,
@ProcessElement parseItem(PubsubMessage, MyPipelineOptions, OutputReceiver),
@ProcessElement parseItem(PubsubMessage, MyPipelineOptions, OutputReceiver),
parameter of type MyPipelineOptions at index 1:
MyPipelineOptions is not a valid context parameter.
Should be one of [BoundedWindow, RestrictionTracker<?, ?>]
If I change MyPipelineOptions to PipelineOptions, the error is gone, but if I try to cast it back to MyPipelineOptions inside my function, I get a ClassCastException, so I'm guessing that's not the right way. Any idea how to pass my custom options class to the element processors?
Here's the code structure:
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
public interface MyPipelineOptions extends DataflowPipelineOptions {
...
}
public class MyProcessor extends DoFn<PubsubMessage, String> {
@ProcessElement
public void parseItem(@Element PubsubMessage message, MyPipelineOptions po, OutputReceiver<String> out) throws Exception {
...
}
}
Note the docs only show an example of non-custom PipelineOptions:
PipelineOptions: The PipelineOptions for the current pipeline can always be accessed in a process method by adding it as a parameter:
.of(new DoFn<String, String>() {
@ProcessElement
public void processElement(
@Element String word, PipelineOptions options) {
}
})

OK, found the problem. The PipelineOptions argument is a proxy; to get my custom options correctly I need to do this:
public class MyProcessor extends DoFn<PubsubMessage, String> {
@ProcessElement
public void parseItem(
@Element PubsubMessage message,
PipelineOptions po,
OutputReceiver<String> out) throws Exception {
MyPipelineOptions opts = po.as(MyPipelineOptions.class);
...
}
}
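For reference, here is a minimal sketch (my addition, not part of the original post) of how such a custom options interface is usually bound at pipeline construction time with the standard Beam API; the MainSketch class name and argument handling are illustrative assumptions:
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
public class MainSketch {
    public static void main(String[] args) {
        // Registering the interface lets PipelineOptionsFactory validate and describe its options.
        PipelineOptionsFactory.register(MyPipelineOptions.class);
        MyPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
                .withValidation()
                .as(MyPipelineOptions.class);
        Pipeline pipeline = Pipeline.create(options);
        // ... apply transforms, e.g. ParDo.of(new MyProcessor()) ...
        // At runtime each DoFn receives a PipelineOptions proxy and can call
        // po.as(MyPipelineOptions.class) as shown above.
        pipeline.run();
    }
}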

Related

OAuth2Authentication object deserialization (RedisTokenStore)

I'm trying to rewrite some legacy code which used org.springframework.security.oauth2.provider.token.store.InMemoryTokenStore to store the access tokens. I'm now trying to use RedisTokenStore instead of the previously used InMemoryTokenStore. The token gets generated and stored in Redis fine (standalone Redis configuration); however, deserialization of OAuth2Authentication fails with the following error:
Could not read JSON: Cannot construct instance of `org.springframework.security.oauth2.provider.OAuth2Authentication` (no Creators, like default constructor, exist): cannot deserialize from Object value (no delegate- or property-based Creator)
Since there's no default constructor for this class, the deserialization and mapping to the actual object while looking up from Redis fails.
RedisTokenStore redisTokenStore = new RedisTokenStore(jedisConnectionFactory);
redisTokenStore.setSerializationStrategy(new StandardStringSerializationStrategy() {
@Override
protected <T> T deserializeInternal(byte[] bytes, Class<T> aClass) {
return Utilities.parse(new String(bytes, StandardCharsets.UTF_8),aClass);
}
@Override
protected byte[] serializeInternal(Object o) {
return Objects.requireNonNull(Utilities.convert(o)).getBytes();
}
});
this.tokenStore = redisTokenStore;
public static <T> T parse(String json, Class<T> clazz) {
try {
return OBJECT_MAPPER.readValue(json, clazz);
} catch (IOException e) {
log.error("Jackson2Json failed: " + e.getMessage());
}
return null;
}
public static String convert(Object data) {
try {
return OBJECT_MAPPER.writeValueAsString(data);
} catch (JsonProcessingException e) {
log.error("Conversion failed: " + e.getMessage());
}
return null;
}
How is the OAuth2Authentication object reconstructed when the token is looked up from Redis? Since it does not define a default constructor, any Jackson-based serializer and object mapper won't be able to deserialize it.
Again, serialization works fine (since OAuth2Authentication implements the Serializable interface) and the token gets stored in Redis without problems. It only fails when /oauth/check_token is called.
What am I missing and how is this problem dealt with while storing access token in Redis?
I solved the issue by writing a custom deserializer. It looks like this:
import com.fasterxml.jackson.core.JacksonException;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.databind.module.SimpleModule;
import org.springframework.security.oauth2.core.AuthorizationGrantType;
import java.io.IOException;
public class AuthorizationGrantTypeCustomDeserializer extends JsonDeserializer<AuthorizationGrantType> {
@Override
public AuthorizationGrantType deserialize(JsonParser p, DeserializationContext ctxt) throws IOException, JacksonException {
Root root = p.readValueAs(Root.class);
return root != null ? new AuthorizationGrantType(root.value) : new AuthorizationGrantType("");
}
private static class Root {
public String value;
}
public static SimpleModule generateModule() {
SimpleModule authGrantModule = new SimpleModule();
authGrantModule.addDeserializer(AuthorizationGrantType.class, new AuthorizationGrantTypeCustomDeserializer());
return authGrantModule;
}
}
Then I registered the deserializer in the ObjectMapper which is later used by the Jackson API:
ObjectMapper mapper = new ObjectMapper()
.registerModule(AuthorizationGrantTypeCustomDeserializer.generateModule());
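For completeness, here is a short sketch of the wiring I'm assuming (not shown explicitly in the original answer): the mapper with the module registered is the OBJECT_MAPPER used by the Utilities.parse/convert helpers in the RedisTokenStore serialization strategy above.
import com.fasterxml.jackson.databind.ObjectMapper;
// Assumption: this is the mapper Utilities.parse/convert delegate to, so the custom
// AuthorizationGrantType deserializer is applied when tokens are read back from Redis.
private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper()
        .registerModule(AuthorizationGrantTypeCustomDeserializer.generateModule());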

Dependency Injection in Apache Storm topology

A little background: I am working on a topology using Apache Storm and thought, why not use dependency injection in it? But I was not sure how it would behave in a cluster environment once the topology is deployed. I started looking for answers on whether DI is a good option for Storm topologies and came across some threads about Apache Spark where it was mentioned that serialization would be a problem, and saw some responses for Apache Storm along the same lines. So finally I decided to write a sample topology with Google Guice to see what happens.
I wrote a sample topology with two bolts and used Google Guice to inject dependencies. The first bolt is configured to receive a tick tuple every 10 seconds; on each tick it creates a message, prints it to the log, and calls some injected classes which do the same. The message is then emitted to the second bolt, which applies the same printing logic.
First Bolt
public class FirstBolt extends BaseRichBolt {
private OutputCollector collector;
private static int count = 0;
private FirstInjectClass firstInjectClass;
@Override
public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
collector = outputCollector;
Injector injector = Guice.createInjector(new Module());
firstInjectClass = injector.getInstance(FirstInjectClass.class);
}
@Override
public void execute(Tuple tuple) {
count++;
String message = "Message count "+count;
firstInjectClass.printMessage(message);
log.error(message);
collector.emit("TO_SECOND_BOLT", new Values(message));
collector.ack(tuple);
}
@Override
public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
outputFieldsDeclarer.declareStream("TO_SECOND_BOLT", new Fields("MESSAGE"));
}
@Override
public Map<String, Object> getComponentConfiguration() {
Config conf = new Config();
conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 10);
return conf;
}
}
Second Bolt
public class SecondBolt extends BaseRichBolt {
private OutputCollector collector;
private SecondInjectClass secondInjectClass;
@Override
public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
collector = outputCollector;
Injector injector = Guice.createInjector(new Module());
secondInjectClass = injector.getInstance(SecondInjectClass.class);
}
@Override
public void execute(Tuple tuple) {
String message = (String) tuple.getValue(0);
secondInjectClass.printMessage(message);
log.error("SecondBolt {}",message);
collector.ack(tuple);
}
@Override
public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
}
}
Class in which dependencies are injected
public class FirstInjectClass {
FirstInterface firstInterface;
private final String prepend = "FirstInjectClass";
@Inject
public FirstInjectClass(FirstInterface firstInterface) {
this.firstInterface = firstInterface;
}
public void printMessage(String message){
log.error("{} {}", prepend, message);
firstInterface.printMethod(message);
}
}
Interface used for binding
public interface FirstInterface {
void printMethod(String message);
}
Implementation of interface
public class FirstInterfaceImpl implements FirstInterface{
private final String prepend = "FirstInterfaceImpl";
public void printMethod(String message){
log.error("{} {}", prepend, message);
}
}
Same way another class that receives dependency via DI
public class SecondInjectClass {
SecondInterface secondInterface;
private final String prepend = "SecondInjectClass";
@Inject
public SecondInjectClass(SecondInterface secondInterface) {
this.secondInterface = secondInterface;
}
public void printMessage(String message){
log.error("{} {}", prepend, message);
secondInterface.printMethod(message);
}
}
another interface for binding
public interface SecondInterface {
void printMethod(String message);
}
implementation of second interface
public class SecondInterfaceImpl implements SecondInterface{
private final String prepend = "SecondInterfaceImpl";
public void printMethod(String message){
log.error("{} {}", prepend, message);
}
}
Module Class
public class Module extends AbstractModule {
@Override
protected void configure() {
bind(FirstInterface.class).to(FirstInterfaceImpl.class);
bind(SecondInterface.class).to(SecondInterfaceImpl.class);
}
}
Nothing fancy here, just two bolts and a couple of classes for DI. I deployed it on a server and it works just fine. The catch, though, is that I have to initialize an Injector in each bolt, which makes me question what the side effects of that are going to be.
This implementation is simple, just two bolts. What if I have more bolts? What impact would it have on the topology if I have to initialize an Injector in every bolt?
If I try to initialize the Injector outside the prepare method, I get a serialization error.
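One way to limit the side effects of per-bolt injectors is to share a single, lazily created Injector per worker JVM and keep all Guice objects out of the serialized topology. This is only a sketch under that assumption (not tested on a cluster); BaseInjectedBolt and InjectorHolder are illustrative names, and Module is the Guice module class shown above:
import com.google.inject.Guice;
import com.google.inject.Injector;
import org.apache.storm.topology.base.BaseRichBolt;
// Bolts extend this class and call injector() from prepare(), which runs on the
// worker after deserialization, so nothing Guice-related needs to be Serializable.
public abstract class BaseInjectedBolt extends BaseRichBolt {
    // Holder idiom: exactly one Injector per worker JVM, created on first use.
    private static final class InjectorHolder {
        static final Injector INSTANCE = Guice.createInjector(new Module());
    }
    protected Injector injector() {
        return InjectorHolder.INSTANCE;
    }
}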

Why can't Apache Beam infer the default coder when using KV<String, String>?

I'm implementing the CombinePerKeyExample using a subclass of CombineFn instead of using an implementation of SerializableFunction.
package me.examples;
import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.coders.DefaultCoder;
import org.apache.beam.sdk.transforms.Combine.CombineFn;
import java.util.HashSet;
import java.util.Set;
public class ConcatWordsCombineFn extends CombineFn<String, ConcatWordsCombineFn.Accumulator, String> {
@DefaultCoder(AvroCoder.class)
public static class Accumulator{
HashSet<String> plays;
}
@Override
public Accumulator createAccumulator(){
Accumulator accumulator = new Accumulator();
accumulator.plays = new HashSet<>();
return accumulator;
}
@Override
public Accumulator addInput(Accumulator accumulator, String input){
accumulator.plays.add(input);
return accumulator;
}
@Override
public Accumulator mergeAccumulators(Iterable<Accumulator> accumulators){
Accumulator mergeAccumulator = new Accumulator();
mergeAccumulator.plays = new HashSet<>();
for(Accumulator accumulator: accumulators){
mergeAccumulator.plays.addAll(accumulator.plays);
}
return mergeAccumulator;
}
@Override
public String extractOutput(Accumulator accumulator){
return String.join(",", accumulator.plays);
}
}
The pipeline is composed of ReadFromBigQuery, ExtractAllPlaysOfWords (code below), and WriteToBigQuery:
package me.examples;
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
public class PlaysForWord extends PTransform<PCollection<TableRow>, PCollection<TableRow>> {
@Override
public PCollection<TableRow> expand(PCollection<TableRow> input) {
PCollection<KV<String, String>> largeWords = input.apply("ExtractLargeWords", ParDo.of(new ExtractLargeWordsFn()));
//PCollection<KV<String, String>> wordNPlays = largeWords.apply("CombinePlays", Combine.perKey(new ConcatWordsCombineFunction()));
//using CombineFn instead
PCollection<KV<String, String>> wordNPlays = largeWords.apply("CombinePlays",Combine.perKey(new ConcatWordsCombineFn()));
wordNPlays.setCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()));
PCollection<TableRow> rows = wordNPlays.apply("FormatToRow", ParDo.of(new FormatShakespeareOutputFn()));
return rows;
}
}
If I don't add this line in the code above:
wordNPlays.setCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()));
I get this exception:
Exception in thread "main" java.lang.IllegalStateException: Unable to return a default Coder for ExtractAllPlaysOfWords/CombinePlays/Combine.GroupedValues/ParDo(Anonymous)/ParMultiDo(Anonymous).output [PCollection]. Correct one of the following root causes:
No Coder has been manually specified; you may do so using .setCoder().
Inferring a Coder from the CoderRegistry failed: Cannot provide coder for parameterized type org.apache.beam.sdk.values.KV<K, OutputT>: Unable to provide a Coder for K.
Building a Coder using a registered CoderProvider failed.
See suppressed exceptions for detailed failures.
Using the default output Coder from the producing PTransform failed: PTransform.getOutputCoder called.
at org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkState(Preconditions.java:444)
at org.apache.beam.sdk.values.PCollection.getCoder(PCollection.java:278)
at org.apache.beam.sdk.values.PCollection.finishSpecifying(PCollection.java:115)
at org.apache.beam.sdk.runners.TransformHierarchy.finishSpecifyingInput(TransformHierarchy.java:191)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:536)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488)
at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:370)
at me.examples.PlaysForWord.expand(PlaysForWord.java:21)
at me.examples.PlaysForWord.expand(PlaysForWord.java:10)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488)
at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:370)
at me.examples.Main.main(Main.java:41)
From the stack trace, I think the pipeline is not able to get a coder for the String type of the KV object. Why is that? Isn't it supposed to be a "known" type for Apache Beam? Why does it work without specifying the coder when using an implementation of SerializableFunction in Combine.perKey?
In addition to that, when I tried to get the default coder for String from the coder registry, I got StringUtf8Coder:
Coder coder = null;
try {
coder = pipeline.getCoderRegistry().getCoder(String.class);
logger.info("coder is " + coder);
} catch (Exception e){
logger.info("exception "+ e.getMessage() +"\n coder is " + coder );
}
/*result
INFO: coder is StringUtf8Coder
*/
I used Apache Beam 2.12.0 and ran it on Google Dataflow.

ValueProvider Issue

I am trying to get the value of a property that is passed from a Cloud Function to a Dataflow template. I am getting errors because the value being passed is a ValueProvider wrapper, and calling its .get() method fails during template construction with this error:
An exception occurred while executing the Java class. null: InvocationTargetException: Not called from a runtime context.
public interface MyOptions extends DataflowPipelineOptions {
...
@Description("schema of csv file")
ValueProvider<String> getHeader();
void setHeader(ValueProvider<String> header);
...
}
public static void main(String[] args) throws IOException {
...
List<String> sideInputColumns = Arrays.asList(options.getHeader().get().split(","));
...
//ultimately use the getHeaders as side inputs
PCollection<String> input = p.apply(Create.of(sideInputColumns));
final PCollectionView<List<String>> finalColumnView = input.apply(View.asList());
}
How do I extract the value from the ValueProvider type?
The value of a ValueProvider is not available during pipeline construction. As such, you need to organize your pipeline so that it always has the same structure, and serializes the ValueProvider. At runtime, the individual transforms within your pipeline can inspect the value to determine how to operate.
Based on your example, you may need to do something like the following. It creates a single element, and then uses a DoFn that is evaluated at runtime to expand the headers:
public static class HeaderDoFn extends DoFn<String, String> {
private final ValueProvider<String> header;
public HeaderDoFn(ValueProvider<String> header) {
this.header = header;
}
@ProcessElement
public void processElement(ProcessContext c) {
// Ignore input element -- there should be exactly one
for (String column : this.header.get().split(",")) {
c.output(column);
}
}
}
public static void main(String[] args) throws IOException {
PCollection<String> input = p
.apply(Create.of("one")) // create a single element
.apply(ParDo.of(new HeaderDoFn(options.getHeader()))); // expand the header at runtime using the DoFn above
// Note that the order of this list is not guaranteed.
final PCollectionView<List<String>> finalColumnView =
input.apply(View.asList());
}
Another option would be to use a NestedValueProvider to create a ValueProvider<List<String>> from the option, and pass that ValueProvider<List<String>> to the necessary DoFns rather than using a side input.
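A hedged sketch of that NestedValueProvider approach (NestedValueProvider.of is standard Beam API; options is the MyOptions instance from the question, and the snippet belongs inside main()):
import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.options.ValueProvider.NestedValueProvider;
import java.util.Arrays;
import java.util.List;
// Derive a ValueProvider<List<String>> from the header option; the split only runs
// when .get() is called at pipeline execution time, never during template construction.
ValueProvider<List<String>> columnsProvider =
    NestedValueProvider.of(
        options.getHeader(),
        (String header) -> Arrays.asList(header.split(",")));
// Pass columnsProvider to the DoFns that need the columns and call .get() inside @ProcessElement.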

@override of Dart code

I noticed PetitParserDart has a lot of @override in the code, but I don't know how they are checked.
I tried the IDEA Dart plugin for @override, but it has no effect at all. How can we use @override with Dart?
From the @override doc:
An annotation used to mark an instance member (method, field, getter or setter) as overriding an inherited class member. Tools can use this annotation to provide a warning if there is no overridden member.
So, it depends on the tool you use.
In the current Dart Editor (r24275), there's no warning for the following code, but there should be (it looks like a bug).
import 'package:meta/meta.dart';
class A {
m1() {}
}
class B extends A {
@override m1() {} // no warning because A has an m1()
@override m2() {} // tools should display a warning because A has no m2()
}
The @override annotation is an example of metadata. You can use Mirrors to check for these in code. Here is a simple example that checks if the m1() method in the child class has the @override annotation:
import 'package:meta/meta.dart';
import 'dart:mirrors';
class A {
m1() {}
}
class B extends A {
@override m1() {}
}
void main() {
ClassMirror classMirror = reflectClass(B);
MethodMirror methodMirror = classMirror.methods[const Symbol('m1')];
InstanceMirror instanceMirror = methodMirror.metadata.first;
print(instanceMirror.reflectee); // Instance of '_Override#0x2fa0dc31'
}
It's 2021: the @override annotation is optional.
"Use the @override annotation judiciously and only for methods where the superclass is not under the programmer's control, the superclass is in a different library or package, and it is not considered stable. In any case, the use of @override is optional." From the Dart API docs: https://api.dart.dev/stable/2.10.5/dart-core/override-constant.html
Example:
class A {
void say() {
print('Say something 1');
}
}
class B extends A {
@override
void adds() { // when I don't use the same name as the superclass method, this shows a
// warning (not an error): "The method doesn't override an inherited
// method." because it's not the same name; when I use the same name it is
// an override
print('Say something 2');
}
}
Update: the main use of @override is when implementing an abstract method from an abstract superclass in a subclass; annotate the implementation with @override to make it explicit that you are overriding the abstract method.
