Example of FileBasedSource usage in Google Cloud Dataflow

Can someone post a simple example of subclassing FileBasedSource? I'm new to Google Dataflow and very inexperienced with Java. My goal is to read files while including line numbers as a key, or to skip lines based on the line number.

The implementation of XMLSource is a good starting point for understanding how FileBasedSource works. You'll likely want something like this for your reader (where readNextLine() reads to the end of a line and updates the offset):
protected void startReading(ReadableByteChannel channel) throws IOException {
    if (getCurrentSource().getMode() == FileBasedSource.Mode.SINGLE_FILE_OR_SUBRANGE) {
        // If we are not at the beginning of a line, we should ignore the current line.
        if (getCurrentSource().getStartOffset() > 0) {
            SeekableByteChannel seekChannel = (SeekableByteChannel) channel;
            // Start from one character back and read till we find a new line.
            seekChannel.position(seekChannel.position() - 1);
            nextOffset = seekChannel.position() + readNextLine(new ByteArrayOutputStream());
        }
    }
}
I've created a gist with the complete LineIO example, which may be simpler than XMLSource.
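The boundary handling above can be tried without the Dataflow SDK. The following is a minimal sketch in plain java.nio, not the FileBasedSource API: the class LineRangeReader and the method readRange are hypothetical names, and readNextLine is only assumed to behave as described above (read through the end of the line and report how many bytes were consumed). A range that starts mid-file steps back one byte and discards everything up to the first newline, so each line should be returned by exactly one range.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class LineRangeReader {

    /** Copies bytes up to the next '\n' (or EOF) into out; returns the bytes consumed, newline included. */
    static int readNextLine(SeekableByteChannel channel, ByteArrayOutputStream out) throws IOException {
        ByteBuffer one = ByteBuffer.allocate(1); // byte-at-a-time: slow, but keeps the sketch short
        int consumed = 0;
        while (true) {
            one.clear();
            if (channel.read(one) <= 0) {
                break; // end of file
            }
            consumed++;
            byte b = one.get(0);
            if (b == '\n') {
                break;
            }
            out.write(b);
        }
        return consumed;
    }

    /** Returns every line whose first byte lies in [start, end). */
    static List<String> readRange(Path file, long start, long end) throws IOException {
        List<String> lines = new ArrayList<>();
        try (SeekableByteChannel channel = Files.newByteChannel(file)) {
            long nextOffset = start;
            if (start > 0) {
                // Step back one byte and skip to the next line start. If the byte
                // before start is a newline, nothing beyond that newline is skipped.
                channel.position(start - 1);
                nextOffset = channel.position() + readNextLine(channel, new ByteArrayOutputStream());
            }
            while (nextOffset < end) {
                ByteArrayOutputStream line = new ByteArrayOutputStream();
                int consumed = readNextLine(channel, line);
                if (consumed == 0) {
                    break; // nothing left in the file
                }
                lines.add(new String(line.toByteArray(), StandardCharsets.UTF_8));
                nextOffset += consumed;
            }
        }
        return lines;
    }
}
```

With a file containing aa, bb and cc on separate lines, readRange(file, 0, 4) returns aa and bb, and readRange(file, 4, 9) returns cc. Note that a reader working on a subrange knows byte offsets but not absolute line numbers, because it never sees the lines before its start offset.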

Related

ESP8266 EEPROM READ/WRITE - Write seems to happen before read of old value

I'm trying to write some code for the ESP8266-12E that detects the initial program load of a new version of the code. In this simplified version of my code (which still exhibits the behavior I'm seeing), there is no code in the loop() section.
I place my current version of the code in a const String pgmVersion. The code reads the EEPROM (actually flash for the ESP8266) and compares it to the current version of the code (pgmVersion). If they are different, then I know that I have a new version of the code. This is then followed by a write to EEPROM to save the current version pgmVersion so that the next time I boot this version will be the old version.
When I run the code with only the for loop for the eeprom read, I can see that the saved version is different than the current version (and can also see when they are the same). This seems to work properly.
However, when I run the full code that includes the write to eeprom, the read for loop always indicates that the saved version matches the current version and does not execute the eeprom write for loop. This happens consistently even when I run with a new value for the current version. This is simply baffling to me. I can remove power and then power up again and the new version data has been saved to eeprom so it seems that it is really being written.
Can anyone see what is wrong with my code or explain why the eeprom seems to be written without going through my eeprom write for loop? I've read lots of posts and online documentation and still can't figure this out.
Jim
#include <EEPROM.h>
const String pgmVersion = "00.04";
void setup() {
  Serial.begin(115200);
  EEPROM.begin(6);
  delay(500);
  char eepData;
  char pgmData;
  bool pgmMatch = true;
  for (unsigned int i = 0; i < pgmVersion.length(); i++)
  {
    eepData = char(EEPROM.read(i));
    pgmData = pgmVersion.charAt(i);
    Serial.print("eepData = ");
    Serial.println(eepData);
    Serial.print("pgmVersion[i] = ");
    Serial.println(pgmData);
    if (eepData == pgmData)
    {
      Serial.println("eepData matches pgmData at index " + String(i));
    } else
    {
      Serial.println("eepData does NOT match pgmData at index " + String(i));
      pgmMatch = false;
    }
  }
  if (!pgmMatch)
  {
    Serial.println("Writing EEPROM");
    for (unsigned int i = 0; i < pgmVersion.length(); i++)
    {
      pgmData = pgmVersion.charAt(i);
      EEPROM.write(i,pgmData);
      delay(10);
    }
    if (EEPROM.commit())
    {
      Serial.println("EEPROM successfully committed");
    } else
    {
      Serial.println("ERROR! EEPROM commit failed");
    }
  }
}
void loop() {
  // put your main code here, to run repeatedly:
}

OK, I've found out what's going on. The above code fails as described in the original post when built with VS Code and PlatformIO, but it works as it should when built with the Arduino IDE. (I did not originally post the #include <Arduino.h> that is needed in the PlatformIO environment - my mistake!)
When running the code with the write loop included, it looks as if the EEPROM write is executed before the write loop itself: even when the saved program data and the new program data are known to be different, the comparison code says they are the same.
I tried commenting out just the line with the EEPROM write for a case where the saved and new program data are known to be different. This resulted in the write loop being entered as it should (meaning the code detected that the saved and new program data were not the same).
So it looks like the VS Code / PlatformIO build reorders the code by hoisting the EEPROM write, or does something with that effect. If that is actually the case, what is needed is a fix to some piece of platform code or some sort of barrier instruction to prevent this from happening. This is unfortunate, as I do appreciate the extra functionality available in the VS Code / PlatformIO environment.

Logging large strings from Flutter

I'm trying to build a Flutter App and learning Dart in the process, but I'm getting kind of frustrated when debugging. I have fetched a resource from an API and now I want to print the JSON string to the console, but it keeps cutting off the string.
So I actually have two questions: is the terminal console really the only way to print debug messages and how can I print large strings to the console without them automatically getting cut off?

How about using log from the dart:developer library? It does not seem to have the maximum length limit that print() and debugPrint() have. This is the only solution that seems to work fine. Try it as below:
import 'dart:developer';
log(reallyReallyLongText);
The output will be the entire long string without breaks, prefixed with [log].

You can make your own print. Define this method
void printWrapped(String text) {
  final pattern = RegExp('.{1,800}'); // 800 is the size of each chunk
  pattern.allMatches(text).forEach((match) => print(match.group(0)));
}
Use it like
printWrapped("Your very long string ...");
Credit

Use debugPrint with the optional parameter to wrap according to the platform's output limit.
debugPrint(someSuperLongString, wrapWidth: 1024);

Currently dart doesn't support printing logs more than 1020 characters (found that out by trying).
So, I came up with this method to print long logs:
static void LogPrint(Object object) async {
  int defaultPrintLength = 1020;
  if (object == null || object.toString().length <= defaultPrintLength) {
    print(object);
  } else {
    String log = object.toString();
    int start = 0;
    int endIndex = defaultPrintLength;
    int logLength = log.length;
    int tmpLogLength = log.length;
    while (endIndex < logLength) {
      print(log.substring(start, endIndex));
      endIndex += defaultPrintLength;
      start += defaultPrintLength;
      tmpLogLength -= defaultPrintLength;
    }
    if (tmpLogLength > 0) {
      print(log.substring(start, logLength));
    }
  }
}

There is an open issue for that: https://github.com/flutter/flutter/issues/22665
debugPrint and print are actually truncating the output.

You can achieve this using the Logger Plugin: https://pub.dev/packages/logger
To print any type of log, just do the following.
var logger = Logger();
logger.d("Logger is working!"); // It also accepts JSON objects
In fact, it will even format the output for you.

Please try debugPrint('your output'); instead of print('your output');. According to the documentation, debugPrint throttles the output to a rate that avoids it being dropped by Android's kernel.

Here is a one-liner based on @CopsOnRoad's answer that you can quickly copy and paste (for example, when you want to temporarily modify your code to log some data):
void printWrapped(String text) => RegExp('.{1,800}').allMatches(text).map((m) => m.group(0)).forEach(print);

Method 1
void prints(var s1) {
  String s = s1.toString();
  debugPrint(" =======> " + s, wrapWidth: 1024);
}
Method 2
void prints(var s1) {
  String s = s1.toString();
  final pattern = RegExp('.{1,800}');
  pattern.allMatches(s).forEach((match) => print(match.group(0)));
}
Just call this method to print your long string.

If you run the application in Android Studio, it will truncate long strings.
In Xcode 10.2, which I am using, long strings are not truncated.
My suggestion is to write your print statements and run the application in Xcode instead of Android Studio.

The same issue caused a lot of frustration when I had to test base64-encoded images.
I was using the iTerm2 terminal, so this answer is specific to iTerm2:
1. Navigate to Preferences -> Profiles
2. Select your profile (in my case there was only **Default**)
3. Select **Terminal** in the header of the right pane
4. Check Unlimited scrollback
Now you can copy the large strings from the terminal.

How can I read input after the wrong type has been entered in D readf?

I am wondering how to continue using stdin in D after the program has read an unsuitable value. (for example, letters when it was expecting an int)
I wrote this to test it:
import std.stdio;
void main()
{
    int a;
    for(;;){
        try{
            stdin.readf(" %s", a);
            break;
        }catch(Exception e){
            writeln(e);
            writeln("Please enter a number.");
        }
    }
    writeln(a);
}
After entering an incorrect value such as 'b', the program would print out the message indefinitely. I also examined the exception, which indicated that it was trying to read the same characters again, so I made a version like this:
import std.stdio;
void main()
{
    int a;
    for(;;){
        try{
            stdin.readf(" %s", a);
            break;
        }catch(Exception e){
            writeln(e);
            writeln("Please enter a number.");
            char c;
            readf("%c", c);
        }
    }
    writeln(a);
}
Which still threw an exception when trying to read a, but not c. I also tried using stdin.clearerr(), which had no effect. Does anyone know how to solve this? Thanks.

My recommendation: don't use readf. It is so bad. Everyone goes to it at first since it is in the stdlib (and has been since 1979 lol, well scanf has... and imo i think scanf is better than readf! but i digress), and almost everyone has trouble with it. It is really picky about formats and whitespace consumption when it goes right, and when it goes wrong, it gives crappy error messages and leaves the input stream in an indeterminate state. And, on top of that, is still really limited in what data types it can actually read in and is horribly user-unfriendly, not even allowing things like working backspacing on most systems!
Slightly less bad than readf is to use readln then strip and to!int it once you check the line and give errors. Something like this:
import std.stdio;
import std.string; // for strip, cuts off whitespace
import std.algorithm.searching; // for all
import std.ascii; // for isDigit
import std.conv; // for to, does string to other type conversions
int readInt() {
    for(;;) {
        string line = stdin.readln();
        line = line.strip();
        // all!isDigit is true for an empty string, so reject empty lines explicitly
        if(line.length > 0 && all!isDigit(line))
            return to!int(line);
        else
            writeln("Please enter a number");
    }
    assert(0);
}
void main()
{
    int a = readInt();
    writeln(a);
}
I know that's a lot of import spam (and for a bunch of individual trivial functions too), and readln still sucks for the end user, but this little function is going to be so much nicer on your users and on yourself than trying to use readf. It will consistently consume one line at a time and give a nice message. Moreover, the same pattern can be extended to any other type of validation you need, and the call to readln can be replaced by a call to a more user-friendly function that allows editing and history and stuff later if you decide to go down that route.
If you must use readf anyway though, easiest way to make things sane again in your catch block is still to just call readln and discard its result. So then it just skips the whole line containing the error, allowing your user to start fresh. That'd also drop if they were doing "1 2" and wanted two ints to be read at once... but meh, I'd rather start them fresh anyway than try to pick up an errored line half way through.
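The read-a-whole-line-then-validate pattern is not specific to D. As a hedged illustration, here is a minimal Java sketch of the same loop; the class name LineInput and the method readInt are hypothetical, and unlike the D version above it throws at end of input rather than looping, which is an added assumption about the desired behavior.

```java
import java.io.BufferedReader;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;

public class LineInput {

    /**
     * Consumes one full line per attempt, so a bad entry never leaves
     * half-read input behind for the next attempt.
     */
    static int readInt(BufferedReader in, PrintStream out) throws IOException {
        while (true) {
            String line = in.readLine();
            if (line == null) {
                // Assumption: treat end of input as an error instead of retrying forever.
                throw new EOFException("input ended before a number was entered");
            }
            try {
                return Integer.parseInt(line.trim());
            } catch (NumberFormatException e) {
                // The offending line has already been consumed in full; ask again.
                out.println("Please enter a number.");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));
        System.out.println(readInt(stdin, System.out));
    }
}
```

As with the readln approach, a line such as "1 2" is rejected as a whole and the user starts fresh on the next line.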

Caching streams in Functional Reactive Programming

I have an application which is written entirely using the FRP paradigm and I think I am having performance issues due to the way that I am creating the streams. It is written in Haxe but the problem is not language specific.
For example, I have this function which returns a stream that resolves every time a config file is updated for that specific section like the following:
function getConfigSection(section:String) : Stream<Map<String, String>> {
    return configFileUpdated()
        .then(filterForSectionChanged(section))
        .then(readFile)
        .then(parseYaml);
}
In promhx, the reactive programming library I am using, each step of the chain should remember its last resolved value, but I think that every time I call this function I am recreating the stream and reprocessing each step. This is a problem with the way I am using the library rather than with the library itself.
Since this function is called everywhere, parsing the YAML every time it is needed is killing performance: according to profiling, it takes up over 50% of the CPU time.
As a fix I have done something like the following using a Map stored as an instance variable that caches the streams:
function getConfigSection(section:String) : Stream<Map<String, String>> {
    var cachedStream = this._streamCache.get(section);
    if (cachedStream != null) {
        return cachedStream;
    }
    var stream = configFileUpdated()
        .filter(sectionFilter(section))
        .then(readFile)
        .then(parseYaml);
    this._streamCache.set(section, stream);
    return stream;
}
This might be a good solution to the problem but it doesn't feel right to me. I am wondering if anyone can think of a cleaner solution that maybe uses a more functional approach (closures etc.) or even an extension I can add to the stream like a cache function.
Another way I could do it is to create the streams beforehand and store them in fields that can be accessed by consumers. I don't like this approach because I don't want to make a field for every config section; I like being able to call a function with a specific section and get a stream back.
I'd love any ideas that could give me a fresh perspective!

Well, I think one answer is to just abstract away the caching like so:
class Test {
    static function main() {
        var sideeffects = 0;
        var cached = memoize(function (x) return x + sideeffects++);
        cached(1);
        trace(sideeffects);//1
        cached(1);
        trace(sideeffects);//1
        cached(3);
        trace(sideeffects);//2
        cached(3);
        trace(sideeffects);//2
    }
    @:generic static function memoize<In, Out>(f:In->Out):In->Out {
        var m = new Map<In, Out>();
        return
            function (input:In)
                return switch m[input] {
                    case null: m[input] = f(input);
                    case output: output;
                }
    }
}
You may be able to find a more "functional" implementation for memoize down the road. But the important thing is that it is a separate thing now and you can use it at will.
You may choose to memoize(parseYaml) so that toggling two states in the file actually becomes very cheap after both have been parsed once. You can also tweak memoize to manage the cache size according to whatever strategy proves the most valuable.
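For comparison, the same idea can be sketched in Java. This is only an illustration of the memoization technique under stated assumptions, not promhx or Haxe code: the class name Memo is hypothetical, the cache is unbounded, null inputs are not supported, and a null result is not cached (both follow from how ConcurrentHashMap.computeIfAbsent behaves).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class Memo {

    /**
     * Wraps f so that each distinct input is computed once and then served
     * from the cache. The wrapped function must not call back into the same
     * memoized function, because computeIfAbsent does not allow the mapping
     * function to modify the map.
     */
    static <I, O> Function<I, O> memoize(Function<I, O> f) {
        Map<I, O> cache = new ConcurrentHashMap<>();
        return input -> cache.computeIfAbsent(input, f);
    }

    public static void main(String[] args) {
        Function<String, Integer> length = memoize(s -> {
            System.out.println("computing " + s); // runs only on a cache miss
            return s.length();
        });
        length.apply("database");
        length.apply("database"); // served from the cache, nothing is recomputed
    }
}
```

In the question's setting, the same wrapper could cache one stream per section, for example memoize(section -> buildStream(section)), where buildStream is a hypothetical stand-in for the stream-building chain.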

ANTLR Parse tree modification

I'm using ANTLR4 to create a parse tree for my grammar. What I want to do is modify certain nodes in the tree, which will include removing certain nodes and inserting new ones. The purpose behind this is optimization for the language I am writing. I have yet to find a solution to this problem. What would be the best way to go about this?

While there is currently no real support or tools for tree rewriting, it is very possible to do. It's not even that painful.
The ParseTreeListener or your MyBaseListener can be used with a ParseTreeWalker to walk your parse tree.
From here, you can remove nodes with ParserRuleContext.removeLastChild(). When doing this, however, you have to watch out for ParseTreeWalker.walk:
public void walk(ParseTreeListener listener, ParseTree t) {
    if ( t instanceof ErrorNode) {
        listener.visitErrorNode((ErrorNode)t);
        return;
    }
    else if ( t instanceof TerminalNode) {
        listener.visitTerminal((TerminalNode)t);
        return;
    }
    RuleNode r = (RuleNode)t;
    enterRule(listener, r);
    int n = r.getChildCount();
    for (int i = 0; i<n; i++) {
        walk(listener, r.getChild(i));
    }
    exitRule(listener, r);
}
You must replace removed nodes with something if the walker has already visited the parents of those nodes; I usually pick empty ParserRuleContext objects. This is because of the cached value of n in the method above: the loop still runs to the old child count, and the replacement prevents the ParseTreeWalker from throwing a NullPointerException (NPE).
When adding nodes, make sure to set the mutable parent field on the ParserRuleContext to the new parent. Also, because of the cached n in the method above, a good strategy is to detect where the changes need to be made before the walk reaches the place where they go, so the ParseTreeWalker will walk over them in the same pass (otherwise you might need multiple passes).
Your pseudo code should look like this:
public void enterRewriteTarget(@NotNull MyParser.RewriteTargetContext ctx){
    if(shouldRewrite(ctx)){
        ArrayList<ParseTree> nodesReplaced = replaceNodes(ctx);
        addChildTo(ctx, createNewParentFor(nodesReplaced));
    }
}
I've used this method to write a transpiler that compiled a synchronous internal language into asynchronous javascript. It was pretty painful.
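The cached child count described in this answer can be reproduced without ANTLR. The sketch below uses a hypothetical stand-in Node class and a simplified walk method that mirror only the relevant parts of ParseTreeWalker.walk (child count read once per node, getChild returning null when the index is out of range); none of these names are ANTLR's API. It contrasts removing a not-yet-visited sibling with replacing it by an empty placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class CachedCountWalk {

    /** Hypothetical stand-in for a parse tree node. */
    static final class Node {
        final String name;
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        /** Returns null when the index is out of range instead of throwing. */
        Node getChild(int i) {
            return i >= 0 && i < children.size() ? children.get(i) : null;
        }
    }

    static void walk(Node t, Consumer<Node> onEnter, List<String> visited) {
        visited.add(t.name);       // throws NullPointerException when t is null
        onEnter.accept(t);
        int n = t.children.size(); // read once; later removals are not seen by the loop
        for (int i = 0; i < n; i++) {
            walk(t.getChild(i), onEnter, visited);
        }
    }

    static Node sampleTree() {
        return new Node("root").add(new Node("a")).add(new Node("b"));
    }

    /** Removes sibling "b" while visiting "a": the parent's loop later hits a null child. */
    static List<String> walkRemovingSibling() {
        List<String> visited = new ArrayList<>();
        walk(sampleTree(), node -> {
            if (node.name.equals("a")) {
                node.parent.children.remove(1);
            }
        }, visited);
        return visited;
    }

    /** Replaces sibling "b" with an empty placeholder: the walk completes. */
    static List<String> walkReplacingSibling() {
        List<String> visited = new ArrayList<>();
        walk(sampleTree(), node -> {
            if (node.name.equals("a")) {
                Node placeholder = new Node("<empty>");
                placeholder.parent = node.parent;
                node.parent.children.set(1, placeholder);
            }
        }, visited);
        return visited;
    }
}
```

In this sketch walkRemovingSibling() fails with a NullPointerException, while walkReplacingSibling() visits root, a and the placeholder in that order.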

Another approach would be to write a ParseTreeVisitor that converts the tree back to a string. (This can be trivial in some cases, because you are only calling TerminalNode.getText() and concatenating the results in aggregateResult(..).)
You then add the modifications to this visitor so that the resulting string representation contains the modifications you try to achieve.
Then parse the string and you get a parse tree with the desired modifications.
This is certainly hackish in some ways, since you parse the string twice. On the other hand the solution does not rely on antlr implementation details.

I needed something similar for simple transformations. I ended up using a ParseTreeWalker and a custom ...BaseListener in which I overrode the enter... methods. Inside these methods, ParserRuleContext.children is available and can be manipulated.
class MyListener extends ...BaseListener {
    @Override
    public void enter...(...Context ctx) {
        super.enter...(ctx);
        ctx.children.add(...);
    }
}
new ParseTreeWalker().walk(new MyListener(), parseTree);
