Context dependent ANTLR4 ParseTreeVisitor implementation - parsing

I am working on a project where we migrate massive number (more than 12000) views to Hadoop/Impala from Oracle. I have written a small Java utility to extract view DDL from Oracle and would like to use ANTLR4 to traverse the AST and generate an Impala-compatible view DDL statement.
The most of the work is relatively simple, only involves re-writing some Oracle specific syntax quirks to Impala style. However, I am facing an issue, where I am not sure I have the best answer yet: we have a number of special cases, where values from a date field are extracted in multiple nested function calls. For example, the following extracts the day from a Date field:
TO_NUMBER(TO_CHAR(d.R_DATE , 'DD' ))
I have an ANTLR4 grammar declared for Oracle SQL and hence get the visitor callback when it reaches TO_NUMBER and TO_CHAR as well, but I would like to have special handling for this special case.
Is not there any other way than implementing the handler method for the outer function and then resorting to manual traversal of the nested structure to see
I have something like in the generated Visitor class:
#Override
public String visitNumber_function(PlSqlParser.Number_functionContext ctx) {
// FIXME: seems to be dodgy code, can it be improved?
String functionName = ctx.name.getText();
if (functionName.equalsIgnoreCase("TO_NUMBER")) {
final int childCount = ctx.getChildCount();
if (childCount == 4) {
final int functionNameIndex = 0;
final int openRoundBracketIndex = 1;
final int encapsulatedValueIndex = 2;
final int closeRoundBracketIndex = 3;
ParseTree encapsulated = ctx.getChild(encapsulatedValueIndex);
if (encapsulated instanceof TerminalNode) {
throw new IllegalStateException("TerminalNode is found at: " + encapsulatedValueIndex);
}
String customDateConversionOrNullOnOtherType =
customDateConversionFromToNumberAndNestedToChar(encapsulated);
if (customDateConversionOrNullOnOtherType != null) {
// the child node contained our expected child element, so return the converted value
return customDateConversionOrNullOnOtherType;
}
// otherwise the child was something unexpected, signalled by null
// so simply fall-back to the default handler
}
}
// some other numeric function, default handling
return super.visitNumber_function(ctx);
}
private String customDateConversionFromToNumberAndNestedToChar(ParseTree parseTree) {
// ...
}

For anyone hitting the same issue, the way to go seems to be:
changing the grammar definition and introducing custom sub-types for
the encapsulated expression of the nested function.
Then, I it is possible to hook into the processing at precisely the desired location of the Parse tree.
Using a second custom ParseTreeVisitor that captures the values of function call and delegates back the processing of the rest of the sub-tree to the main, "outer" ParseTreeVisitor.
Once the second custom ParseTreeVisitor has finished visiting all the sub-ParseTrees I had the context information I required and all the sub-tree visited properly.

Related

Null Assertions in null-safe mode and How to Avoid If Possible

Learning Dart and using dart_code_metrics to ensure that I write code that meets expectations. One of the rules that is active is avoid-non-null-assertion.
Note, the code below was created to recreate the problem encountered in a larger code base where the value of unitString is taken from a JSON file. As such the program cannot control what is specified in the JSON file.
From pubspec.yaml
environment:
sdk: '>=2.15.0 <3.0.0'
// ignore_for_file: avoid_print
import 'package:qty/qty.dart';
void main() {
const String unitString = 'in';
// unit.Width returns null if unitString is not a unit of Length.
if (Length().unitWith(symbol: unitString) == null) {
print('units $unitString not supported.');
} else {
// The following line triggers avoid-non-null-assertion with the use of !.
final Unit<Length> units = Length().unitWith(symbol: unitString)!;
final qty = Quantity(amount: 0.0, unit: units);
print('Qty = $qty');
}
}
If I don't use ! then I get the following type error:
A value of type 'Unit<Length>?' can't be assigned to a variable of type 'Unit<Length>'.
Try changing the type of the variable, or casting the right-hand type to 'Unit<Length>'.
Casting the right-hand side to
Unit<Length>
fixes the above error but cause a new error when instantiating Quantity() since the constructor expects
Unit<Length>
and not
Unit<Length>?
I assume there is an solution but I'm new to Dart and cannot formulate the correct search query to find the answer.
How can I modify the sample code to make Dart and dart_code_metrics happy?
Your idea of checking for null before using a value is good, it's just not implemented correctly. Dart does automatically promote nullable types to non-null ones when you check for null with an if, but in this case you need to use a temporary variable.
void main() {
const String unitString = 'in';
//Use a temp variable, you could specify the type instead of using just using final
final temp = Length().unitWith(symbol: unitString);
if (temp == null) {
print('units $unitString not supported.');
} else {
final Unit<Length> units = temp;
final qty = Quantity(amount: 0.0, unit: units);
print('Qty = $qty');
}
}
The basic reason for that when you call your unitWith function and see that it's not null the first time, there's no guarantee that the when you call it again that it will still return a non-null value. I think there's another SO question that details this better, but I can't seem to find.

forEach vs for in: Different Behavior When Calling a Method

I noticed that forEach and for in to produce different behavior. I have a list of RegExp and want to run hasMatch on each one. When iterating through the list using forEach, hasMatch never returns true. However, if I use for in, hasMatch returns true.
Sample code:
class Foo {
final str = "Hello";
final regexes = [new RegExp(r"(\w+)")];
String a() {
regexes.forEach((RegExp reg) {
if (reg.hasMatch(str)) {
return 'match';
}
});
return 'no match';
}
String b() {
for (RegExp reg in regexes) {
if (reg.hasMatch(str)) {
return 'match';
}
}
return 'no match';
}
}
void main() {
Foo foo = new Foo();
print(foo.a()); // prints "no match"
print(foo.b()); // prints "match"
}
(DartPad with the above sample code)
The only difference between the methods a and b is that a uses forEach and b uses for in, yet they produce different results. Why is this?
Although there is a prefer_foreach lint, that recommendation is specifically for cases where you can use it with a tear-off (a reference to an existing function). Effective Dart recommends against using Iterable.forEach with anything else, and there is a corresponding avoid_function_literals_in_foreach_calls lint to enforce it.
Except for those simple cases where the callback is a tear-off, Iterable.forEach is not any simpler than using a basic and more general for loop. There are more pitfalls using Iterable.forEach, and this is one of them.
Iterable.forEach is a function that takes a callback as an argument. Iterable.forEach is not a control structure, and the callback is an ordinary function. You therefore cannot use break to stop iterating early or use continue to skip to the next iteration.
A return statement in the callback returns from the callback, and the return value is ignored. The caller of Iterable.forEach will never receive the returned value and will never have an opportunity to propagate it. For example, in:
bool f(List<int> list) {
for (var i in list) {
if (i == 42) {
return true;
}
}
return false;
}
the return true statement returns from the function f and stops iteration. In contrast, with forEach:
bool g(List<int> list) {
list.forEach((i) {
if (i == 42) {
return true;
}
});
return false;
}
the return true statement returns from only the callback. The function g will not return until it completes all iterations and reaches the return false statement at the end. This perhaps is clearer as:
bool callback(int i) {
if (i == 42) {
return true;
}
}
bool g(List<int> list) {
list.forEach(callback);
return false;
}
which makes it more obvious that:
There is no way for callback to cause g to return true.
callback does not return a value along all paths.
(That's the problem you encountered.)
Iterable.forEach must not be used with asynchronous callbacks. Because any value returned by the callback is ignored, asynchronous callbacks can never be waited upon.
I should also point out that if you enable Dart's new null-safety features, which enable stricter type-checking, your forEach code will generate an error because it returns a value in a callback that is expected to have a void return value.
A notable case where Iterable.forEach can be simpler than a regular for loop is if the object you're iterating over might be null:
List<int>? nullableList;
nullableList?.forEach((e) => ...);
whereas a regular for loop would require an additional if check or doing:
List<int>? nullableList;
for (var e in nullableList ?? []) {
...
}
(In JavaScript, for-in has unintuitive pitfalls, so Array.forEach often is recommended instead. Perhaps that's why a lot of people seem to be conditioned to use a .forEach method over a built-in language construct. However, Dart does not share those pitfalls with JavaScript.)
👋 jamesdin! Everything you have shared about the limitations of forEach is correct however there's one part where you are wrong. In the code snippet showing the example of how you the return value from forEach is ignored, you have return true; inside the callback function for forEach which is not allowed as the callback has a return type of void and returning any other value from the callback is not allowed.
Although you have mentioned that returning a value from within the callback will result in an error, I'm just pointing at the code snippet.
Here's the signature for forEach
Also, some more pitfalls of forEach are:
One can't use break or continue statements.
One can't get access to the index of the item as opposed to using the regular for loop

Apache Beam Stateful DoFn Periodically Output All K/V Pairs

I'm trying to aggregate (per key) a streaming data source in Apache Beam (via Scio) using a stateful DoFn (using #ProcessElement with #StateId ValueState elements). I thought this would be most appropriate for the problem I'm trying to solve. The requirements are:
for a given key, records are aggregated (essentially summed) across all time - I don't care about previously computed aggregates, just the most recent
keys may be evicted from the state (state.clear()) based on certain conditions that I control
Every 5 minutes, regardless if any new keys were seen, all keys that haven't been evicted from the state should be outputted
Given that this is a streaming pipeline and will be running indefinitely, using a combinePerKey over a global window with accumulating fired panes seems like it will continue to increase its memory footprint and the amount of data it needs to run over time, so I'd like to avoid it. Additionally, when testing this out, (maybe as expected) it simply appends the newly computed aggregates to the output along with the historical input, rather than using the latest value for each key.
My thought was that using a StatefulDoFn would simply allow me to output all of the global state up until now(), but it seems this isn't a trivial solution. I've seen hintings at using timers to artificially execute callbacks for this, as well as potentially using a slowly growing side input map (How to solve Duplicate values exception when I create PCollectionView<Map<String,String>>) and somehow flushing this, but this would essentially require iterating over all values in the map rather than joining on it.
I feel like I might be overlooking something simple to get this working. I'm relatively new to many concepts of windowing and timers in Beam, looking for any advice on how to solve this. Thanks!
You are right that Stateful DoFn should help you here. This is a basic sketch of what you can do. Note that this only outputs the sum without the key. It may not be exactly what you want, but it should help you move forward.
class CombiningEmittingFn extends DoFn<KV<Integer, Integer>, Integer> {
#TimerId("emitter")
private final TimerSpec emitterSpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);
#StateId("done")
private final StateSpec<ValueState<Boolean>> doneState = StateSpecs.value();
#StateId("agg")
private final StateSpec<CombiningState<Integer, int[], Integer>>
aggSpec = StateSpecs.combining(
Sum.ofIntegers().getAccumulatorCoder(null, VarIntCoder.of()), Sum.ofIntegers());
#ProcessElement
public void processElement(ProcessContext c,
#StateId("agg") CombiningState<Integer, int[], Integer> aggState,
#StateId("done") ValueState<Boolean> doneState,
#TimerId("emitter") Timer emitterTimer) throws Exception {
if (SOME CONDITION) {
countValueState.clear();
doneState.write(true);
} else {
countValueState.addAccum(c.element().getValue());
emitterTimer.align(Duration.standardMinutes(5)).setRelative();
}
}
}
#OnTimer("emitter")
public void onEmit(
OnTimerContext context,
#StateId("agg") CombiningState<Integer, int[], Integer> aggState,
#StateId("done") ValueState<Boolean> doneState,
#TimerId("emitter") Timer emitterTimer) {
Boolean isDone = doneState.read();
if (isDone != null && isDone) {
return;
} else {
context.output(aggState.getAccum());
// Set the timer to emit again
emitterTimer.align(Duration.standardMinutes(5)).setRelative();
}
}
}
}
Happy to iterate with you on something that'll work.
#Pablo was indeed correct that a StatefulDoFn and timers are useful in this scenario. Here is the with code I was able to get working.
Stateful Do Fn
// DomainState is a custom case class I'm using
type DoFnT = DoFn[KV[String, DomainState], KV[String, DomainState]]
class StatefulDoFn extends DoFnT {
#StateId("key")
private val keySpec = StateSpecs.value[String]()
#StateId("domainState")
private val domainStateSpec = StateSpecs.value[DomainState]()
#TimerId("loopingTimer")
private val loopingTimer: TimerSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME)
#ProcessElement
def process(
context: DoFnT#ProcessContext,
#StateId("key") stateKey: ValueState[String],
#StateId("domainState") stateValue: ValueState[DomainState],
#TimerId("loopingTimer") loopingTimer: Timer): Unit = {
... logic to create key/value from potentially null values
if (keepState(value)) {
loopingTimer.align(Duration.standardMinutes(5)).setRelative()
stateKey.write(key)
stateValue.write(value)
if (flushState(value)) {
context.output(KV.of(key, value))
}
} else {
stateValue.clear()
}
}
#OnTimer("loopingTimer")
def onLoopingTimer(
context: DoFnT#OnTimerContext,
#StateId("key") stateKey: ValueState[String],
#StateId("domainState") stateValue: ValueState[DomainState],
#TimerId("loopingTimer") loopingTimer: Timer): Unit = {
... logic to create key/value checking for nulls
if (keepState(value)) {
loopingTimer.align(Duration.standardMinutes(5)).setRelative()
if (flushState(value)) {
context.output(KV.of(key, value))
}
}
}
}
With pipeline
sc
.pubsubSubscription(...)
.keyBy(...)
.withGlobalWindow()
.applyPerKeyDoFn(new StatefulDoFn())
.withFixedWindows(
duration = Duration.standardMinutes(5),
options = WindowOptions(
accumulationMode = DISCARDING_FIRED_PANES,
trigger = AfterWatermark.pastEndOfWindow(),
allowedLateness = Duration.ZERO,
// Only take the latest per key during a window
timestampCombiner = TimestampCombiner.END_OF_WINDOW
))
.reduceByKey(mostRecentEvent())
.saveAsCustomOutput(TextIO.write()...)

Attaching a subscriber/listener within EPL module to a statement with context partitions

I have the following EPL module which successfully deploys:
module context;
import events.*;
import configDemo.*;
import annotations.*;
import main.*;
import subscribers.*;
import listeners.*;
#Name('schemaCreator')
create schema InitEvent(firstStock String, secondStock String, bias double);
#Name('createSchemaEvent')
create schema TickEvent as TickEvent;
#Name('contextCreator')
create context TwoStocksContext
initiated by InitEvent as initEvent;
#Name('compareStocks')
#Description('Compare the difference between two different stocks and make a decision')
#Subscriber('subscribers.MySubscriber')
context TwoStocksContext
select * from TickEvent
match_recognize (
measures A.currentPrice as a_currentPrice, B.currentPrice as b_currentPrice,
A.stockCode as a_stockCode, B.stockCode as b_stockCode
pattern (A C* B)
define
A as A.stockCode = context.initEvent.firstStock,
B as A.currentPrice - B.currentPrice >= context.initEvent.bias and
B.stockCode = context.initEvent.secondStock
);
I have a problem with the listeners/subscribers. According to my checks and debugging, the classes don't have any problems, the annotations work, they are attached to the statement upon deployment, and yet neither of them receive any updates from the events.
This is my subscriber, I simply want to print that it has been received:
package subscribers;
import java.util.Map;
public class MySubscriber {
public void update(Map row) {
System.out.println("got it");
}
}
I previously had the same module without any context partitions and then the subscribers worked without a problem. After I added the context, it stopped.
So far I have tried:
Checking if the statement has any subscriber/listener attached (it does)
Checking their names
Remove the annotations and set them manually within Java code after deployment (same thing - they attach, I can retrieve their name but still don't receive updates)
Debugging the subscriber class. The program either doesn't go there at all to stop at a break point or I get an error (missing line number attribute error - ("can't place a break point there" which I tried to fix to no avail)
Any idea what could cause this or what is the best way to set a subscriber to a statement which has context partitions?
This is a continuation of a previous problem which was solved here - Creating instances of Esper's epl
EDIT: Events being sent in the format I use them and in the EPL online tool format:
I first get the pair to be followed from the user:
System.out.println("First stock:");
String first = scanner.nextLine();
System.out.println("Second stock:");
String second = scanner.nextLine();
System.out.println("Difference:");
double diff= scanner.nextDouble();
InitEvent init = new InitEvent(first, second, diff);
After that I have an engine thread the continuously sends events, but before it starts InitEvents is sent as such:
#Override
public void run() {
runtime.sendEvent(initEvent);
while (contSimulation) {
TickEvent tick1 = new TickEvent(Math.random() * 100, "YAH");
runtime.sendEvent(tick1);
TickEvent tick2 = new TickEvent(Math.random() * 100, "GOO");
runtime.sendEvent(tick2);
TickEvent tick3 = new TickEvent(Math.random() * 100, "IBM");
runtime.sendEvent(tick3);
TickEvent tick4 = new TickEvent(Math.random() * 100, "MIC");
runtime.sendEvent(tick4);
try {
TimeUnit.SECONDS.sleep(1);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
latch.countDown();
}
}
I haven't used the online tool before but I think I got it working. This is the module text:
module context;
create schema InitEvent(firstStock String, secondStock String, bias double);
create schema TickEvent(currentPrice double, stockCode String);
create context TwoStocksContext
initiated by InitEvent as initEvent;
context TwoStocksContext
select * from TickEvent
match_recognize (
measures A.currentPrice as a_currentPrice, B.currentPrice as b_currentPrice,
A.stockCode as a_stockCode, B.stockCode as b_stockCode
pattern (A C* B)
define
A as A.stockCode = context.initEvent.firstStock,
B as A.currentPrice - B.currentPrice >= context.initEvent.bias and
B.stockCode = context.initEvent.secondStock
);
And the sequence of events:
InitEvent={firstStock='YAH', secondStock = 'GOO', bias=5}
TickEvent={currentPrice=55.6, stockCode='YAH'}
TickEvent={currentPrice=50.4, stockCode='GOO'}
TickEvent={currentPrice=30.8, stockCode='MIC'}
TickEvent={currentPrice=24.9, stockCode='APP'}
TickEvent={currentPrice=51.6, stockCode='YAH'}
TickEvent={currentPrice=45.8, stockCode='GOO'}
TickEvent={currentPrice=32.8, stockCode='MIC'}
TickEvent={currentPrice=28.9, stockCode='APP'}
The result I get using them:
At: 2001-01-01 08:00:00.000
Statement: Stmt-4
Insert
Stmt-4-output={a_currentPrice=55.6, b_currentPrice=50.4, a_stockCode='YAH',
b_stockCode='GOO'}
At: 2001-01-01 08:00:00.000
Statement: Stmt-4
Insert
Stmt-4-output={a_currentPrice=51.6, b_currentPrice=45.8, a_stockCode='YAH',
b_stockCode='GOO'}
If I make the second set of events having a difference less than 5 between YAH/GOO, I only get output from the first pair which makes sense. This is, I think what it is supposed to do.
In case needed, those two methods read and process the annotations of the EPL module (I didn't write them myself, they are taken from coinTrader Context class that could be found here - https://github.com/timolson/cointrader/blob/master/src/main/java/org/cryptocoinpartners/module/Context.java ):
private static Object getSubscriber(String className) throws Exception {
Class<?> cl = Class.forName(className);
return cl.newInstance();
}
private static void processAnnotations(EPStatement statement) throws Exception {
Annotation[] annotations = statement.getAnnotations();
for (Annotation annotation : annotations) {
if (annotation instanceof Subscriber) {
Subscriber subscriber = (Subscriber) annotation;
Object obj = getSubscriber(subscriber.className());
System.out.println(subscriber.className());
statement.setSubscriber(obj);
} else if (annotation instanceof Listeners) {
Listeners listeners = (Listeners) annotation;
for (String className : listeners.classNames()) {
Class<?> cl = Class.forName(className);
Object obj = cl.newInstance();
if (obj instanceof StatementAwareUpdateListener) {
statement.addListener((StatementAwareUpdateListener) obj);
} else {
statement.addListener((UpdateListener) obj);
}
}
}
}
}
Well, after a month of struggle I finally solved it. In case anyone has similar problem in the future, here's where the problem was. The epl worked fine in the online tool but not in my code. Eventually, I figured out initial events aren't firing, hence context partitions aren't being created and as a result the subscribers and listeners do not receive any updates. My mistake was that I had POJO InitEvent fired, but the event that the context was using was created within the EPL module via create schema. I don't know what I was thinking, it makes sense now that it didn't work. As a result, the events I fire within Java aren't the events that the context uses. My solution was only within the EPL. Since I couldn't figure out if I can fire events in Java that are created within the module, I created a schema which is populated by my POJO and the stream is then used by the context as such:
#Name('schemaCreator')
create schema StartEvent(firstStock string, secondStock string, difference
double);
#Name('insertInitEvent')
insert into StartEvent
select * from InitEvent;
All else remains the same, as well as the Java code.

Test that either one thing holds or another in AssertJ

I am in the process of converting some tests from Hamcrest to AssertJ. In Hamcrest I use the following snippet:
assertThat(list, either(contains(Tags.SWEETS, Tags.HIGH))
.or(contains(Tags.SOUPS, Tags.RED)));
That is, the list may be either that or that. How can I express this in AssertJ? The anyOf function (of course, any is something else than either, but that would be a second question) takes a Condition; I have implemented that myself, but it feels as if this should be a common case.
Edited:
Since 3.12.0 AssertJ provides satisfiesAnyOf which succeeds is one of the given assertion succeeds,
assertThat(list).satisfiesAnyOf(
listParam -> assertThat(listParam).contains(Tags.SWEETS, Tags.HIGH),
listParam -> assertThat(listParam).contains(Tags.SOUPS, Tags.RED)
);
Original answer:
No, this is an area where Hamcrest is better than AssertJ.
To write the following assertion:
Set<String> goodTags = newLinkedHashSet("Fine", "Good");
Set<String> badTags = newLinkedHashSet("Bad!", "Awful");
Set<String> tags = newLinkedHashSet("Fine", "Good", "Ok", "?");
// contains is statically imported from ContainsCondition
// anyOf succeeds if one of the conditions is met (logical 'or')
assertThat(tags).has(anyOf(contains(goodTags), contains(badTags)));
you need to create this Condition:
import static org.assertj.core.util.Lists.newArrayList;
import java.util.Collection;
import org.assertj.core.api.Condition;
public class ContainsCondition extends Condition<Iterable<String>> {
private Collection<String> collection;
public ContainsCondition(Iterable<String> values) {
super("contains " + values);
this.collection = newArrayList(values);
}
static ContainsCondition contains(Collection<String> set) {
return new ContainsCondition(set);
}
#Override
public boolean matches(Iterable<String> actual) {
Collection<String> values = newArrayList(actual);
for (String string : collection) {
if (!values.contains(string)) return false;
}
return true;
};
}
It might not be what you if you expect that the presence of your tags in one collection implies they are not in the other one.
Inspired by this thread, you might want to use this little repo I put together, that adapts the Hamcrest Matcher API into AssertJ's Condition API. Also includes a handy-dandy conversion shell script.

Resources