Why doesn't directory comparison work the way I expect in Dart?
import 'dart:io';

void main() {
  Directory d = Directory('/kek');
  Directory e = Directory('/kek');
  print(d == e); // false
  print(d.hashCode); // 123456
  print(e.hashCode); // 654321
}
As the documentation for the Directory class indicates, hashCode and operator== are simply inherited from Object, and thus have no special implementation that would cause two different Directory objects to compare equal just because they point to the same place.
This would be difficult to implement. Should the hashCode canonicalize relative paths and paths containing "." and ".."? Should it follow symlinks? What about files with multiple hard links?
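If all you need is value equality on the path, you can compare the path strings yourself instead of the Directory objects. A minimal sketch (resolveSymbolicLinksSync() canonicalizes '.', '..' and symlinks by asking the file system, so it throws if the path does not exist; whether that is the right semantics is exactly the question raised above):

import 'dart:io';

void main() {
  Directory d = Directory('/kek');
  Directory e = Directory('/kek');

  // Textual comparison: no canonicalization, no file-system access.
  print(d.path == e.path); // true

  // Canonical comparison: resolves '.', '..' and symlinks, but requires
  // the directory to actually exist.
  print(d.resolveSymbolicLinksSync() == e.resolveSymbolicLinksSync());
}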
Does someone know how to get the filename when using file pattern matching in google-cloud-dataflow?
I'm a newbie to Dataflow. How can I get the filename when using file pattern matching this way?
p.apply(TextIO.Read.from("gs://dataflow-samples/shakespeare/*.txt"))
I'd like to know how to detect filenames such as kinglear.txt, Hamlet.txt, etc.
If you would like to simply expand the filepattern and get a list of filenames matching it, you can use GcsIoChannelFactory.match("gs://dataflow-samples/shakespeare/*.txt") (see GcsIoChannelFactory).
If you would like to access the "current filename" from inside one of the DoFns downstream in your pipeline, that is currently not supported (though there are some workarounds; see below). It is a common feature request, and we are still thinking about how best to fit it into the framework in a natural, generic, and high-performance way.
Some workarounds include:
Writing a pipeline like this (the tf-idf example uses this approach):
DoFn<String, T> readFile = ...; // takes a filename, reads the file and produces records
p.apply(Create.of(filenames))
 .apply(ParDo.of(readFile))
 .apply(/* the rest of your pipeline */)
This has the downside that dynamic work rebalancing won't work particularly well, because it currently applies only at the level of Read PTransforms, not at the level of ParDos with high fan-out (like the one here, which reads a file and produces all of its records). Parallelization will also only work down to the level of individual files; files will not be split into sub-ranges. At the scale of reading Shakespeare this is not an issue, but if you are reading a set of files of wildly different sizes, some extremely large, it may become one.
Implementing your own FileBasedSource (javadoc, general documentation) which would return records of a type like Pair<String, T>, where the String is the filename and the T is the record you're reading. In this case the framework handles the filepattern matching for you and dynamic work rebalancing works just fine; however, it is up to you to write the reading logic in your FileBasedReader.
Both of these workarounds are non-ideal, but depending on your requirements, one of them may do the trick for you.
Update based on latest SDK
Java (sdk 2.9.0):
Beam's TextIO readers do not give access to the filename itself. For these use cases we need to use FileIO to match the files and gain access to the information stored in the filename. Unlike with TextIO, reading the file needs to be taken care of by the user in transforms downstream of the FileIO match. The result of a FileIO read is a PCollection of ReadableFile; the ReadableFile class contains the filename as metadata, which can be used along with the contents of the file.
FileIO does have a convenience method readFullyAsUTF8String(), which reads the entire file into a String object; note that this pulls the whole file into memory first. If memory is a concern, you can work directly with the file via utility classes like FileSystems.
From the documentation:
PCollection<KV<String, String>> filesAndContents = p
    .apply(FileIO.match().filepattern("hdfs://path/to/*.gz"))
    // withCompression can be omitted - by default compression is detected from the filename.
    .apply(FileIO.readMatches().withCompression(GZIP))
    .apply(MapElements
        // uses imports from TypeDescriptors
        .into(kvs(strings(), strings()))
        .via((ReadableFile f) -> KV.of(
            f.getMetadata().resourceId().toString(), f.readFullyAsUTF8String())));
Python (sdk 2.9.0):
For 2.9.0 for Python you will need to collect the list of URIs outside of the Dataflow pipeline and feed them in as a parameter to the pipeline, for example by using FileSystems to read in the list of files via a glob pattern and then passing that to a PCollection for processing.
Once fileio is available (see PR https://github.com/apache/beam/pull/7791/), the following code would also be an option for Python.
import apache_beam as beam
from apache_beam.io import fileio

with beam.Pipeline() as p:
    readable_files = (p
                      | fileio.MatchFiles('hdfs://path/to/*.txt')
                      | fileio.ReadMatches()
                      | beam.Reshuffle())
    files_and_contents = (readable_files
                          | beam.Map(lambda x: (x.metadata.path,
                                                x.read_utf8())))
One approach is to build a List<PCollection> where each entry corresponds to an input file, then use Flatten. For example, if you want to parse each line of a collection of files into a Foo object, you might do something like this:
public static class FooParserFn extends DoFn<String, Foo> {
  private String fileName;

  public FooParserFn(String fileName) {
    this.fileName = fileName;
  }

  @Override
  public void processElement(ProcessContext processContext) throws Exception {
    String line = processContext.element();
    // here you have access to both the line of text and the name of the file
    // from which it came.
  }
}
public static void main(String[] args) {
  ...
  List<String> inputFiles = ...;
  List<PCollection<Foo>> foosByFile =
      Lists.transform(inputFiles,
          new Function<String, PCollection<Foo>>() {
            @Override
            public PCollection<Foo> apply(String fileName) {
              return p.apply(TextIO.Read.from(fileName))
                  .apply(ParDo.of(new FooParserFn(fileName)));
            }
          });
  PCollection<Foo> foos = PCollectionList.<Foo>empty(p)
      .and(foosByFile)
      .apply(Flatten.<Foo>pCollections());
  ...
}
One downside of this approach is that, if you have 100 input files, you'll also have 100 nodes in the Cloud Dataflow monitoring console. This makes it hard to tell what's going on. I'd be interested in hearing from the Google Cloud Dataflow people whether this approach is efficient.
I also had the 100 input files = 100 nodes on the Dataflow diagram when using code similar to @danvk's. I switched to an approach like the following, which resulted in all the reads being combined into a single block that you can expand to drill down into each file/directory that was read. In our use case the job also ran faster this way than with the Lists.transform approach.
GcsOptions gcsOptions = options.as(GcsOptions.class);
List<GcsPath> paths = gcsOptions.getGcsUtil().expand(GcsPath.fromUri(options.getInputFile()));
List<String> filesToProcess = paths.stream().map(item -> item.toString()).collect(Collectors.toList());

PCollectionList<SomeClass> pcl = PCollectionList.empty(p);
for (String fileName : filesToProcess) {
  pcl = pcl.and(
      p.apply("ReadAvroFile" + fileName, AvroIO.Read.named("ReadFromAvro")
          .from(fileName)
          .withSchema(SomeClass.class))
      .apply(ParDo.of(new MyDoFn(fileName)))
  );
}

// flatten the PCollectionList, combining all the PCollections together
PCollection<SomeClass> flattenedPCollection = pcl.apply(Flatten.pCollections());
This might be a very late post for the above question, but I wanted to add an answer using Beam's bundled classes.
This could also be seen as code extracted from the solution provided by @Reza Rokni.
PCollection<String> listOfFilenames =
    pipe.apply(FileIO.match().filepattern("gs://apache-beam-samples/shakespeare/*"))
        .apply(FileIO.readMatches())
        .apply(
            MapElements.into(TypeDescriptors.strings())
                .via(
                    (FileIO.ReadableFile file) -> {
                      String f = file.getMetadata().resourceId().getFilename();
                      System.out.println(f);
                      return f;
                    }));

pipe.run().waitUntilFinish();
The above PCollection<String> will contain the list of files available in the provided directory.
I was struggling with the same use case while using a wildcard to read files from GCS, but I also needed to modify the collection based on the file name. The key is to use ReadFromTextWithFilename instead of ReadFromText. In Java you already have a way out and can use:
String filename =context.element().getMetadata().resourceId().getCurrentDirectory().toString()
inside your processElement method.
But for Python, the technique below will work:
-> Use beam.io.ReadFromTextWithFilename for reading the wildcard path from GCS
-> As per the documentation, ReadFromTextWithFilename returns the file's name and the file's content.
Below is the code snippet:
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class GetFileNameFromWildcard(beam.DoFn):
    def process(self, element, *args, **kwargs):
        # ReadFromTextWithFilename emits (file path, line) tuples.
        file_path, content = element
        schema = ["id", "name", "mob", "email", "dept", "store"]
        store_name = file_path.split("/")[-2]
        content_list = content.split(",")
        content_list.append(store_name)
        out_dict = dict(zip(schema, content_list))
        print(out_dict)
        yield out_dict

def run():
    pipeline_options = PipelineOptions()
    with beam.Pipeline(options=pipeline_options) as p:
        # ReadFromTextWithFilename is a source, so it is applied directly to
        # the pipeline root; no Create() initiator is needed.
        (p
         | 'Read From GCS' >> beam.io.ReadFromTextWithFilename(
             "gs://<bkt-name>/20220826/*/dlp*", skip_header_lines=1)
         | beam.ParDo(GetFileNameFromWildcard())
         | beam.io.WriteToText('df_out.csv'))
I'm using ANTLR4 to create a parse tree for my grammar, what I want to do is modify certain nodes in the tree. This will include removing certain nodes and inserting new ones. The purpose behind this is optimization for the language I am writing. I have yet to find a solution to this problem. What would be the best way to go about this?
While there is currently no real support or tools for tree rewriting, it is very possible to do. It's not even that painful.
The ParseTreeListener or your MyBaseListener can be used with a ParseTreeWalker to walk your parse tree.
From here, you can remove nodes with ParserRuleContext.removeLastChild(); however, when doing this, you have to watch out for ParseTreeWalker.walk:
public void walk(ParseTreeListener listener, ParseTree t) {
    if ( t instanceof ErrorNode) {
        listener.visitErrorNode((ErrorNode)t);
        return;
    }
    else if ( t instanceof TerminalNode) {
        listener.visitTerminal((TerminalNode)t);
        return;
    }
    RuleNode r = (RuleNode)t;
    enterRule(listener, r);
    int n = r.getChildCount();
    for (int i = 0; i < n; i++) {
        walk(listener, r.getChild(i));
    }
    exitRule(listener, r);
}
You must replace removed nodes with something if the walker has already visited the parents of those nodes; I usually pick empty ParserRuleContext objects (this is because of the cached value of n in the method above). This prevents the ParseTreeWalker from throwing an NPE.
When adding nodes, make sure to set the mutable parent on the ParserRuleContext to the new parent. Also, because of the cached n in the method above, a good strategy is to detect where the changes need to be made before the walk reaches the place where you want your changes to go, so the ParseTreeWalker will walk over them in the same pass (otherwise you might need multiple passes...).
Your pseudo code should look like this:
public void enterRewriteTarget(@NotNull MyParser.RewriteTargetContext ctx) {
    if (shouldRewrite(ctx)) {
        ArrayList<ParseTree> nodesReplaced = replaceNodes(ctx);
        addChildTo(ctx, createNewParentFor(nodesReplaced));
    }
}
I've used this method to write a transpiler that compiled a synchronous internal language into asynchronous javascript. It was pretty painful.
Another approach would be to write a ParseTreeVisitor that converts the tree back to a string. (This can be trivial in some cases, because you are only calling TerminalNode.getText() and concatenating in aggregateResult(..).)
You then add the modifications to this visitor so that the resulting string representation contains the modifications you try to achieve.
Then parse the string and you get a parse tree with the desired modifications.
This is certainly hackish in some ways, since you parse the string twice. On the other hand the solution does not rely on antlr implementation details.
I needed something similar for simple transformations. I ended up using a ParseTreeWalker and a custom ...BaseListener where I overrode the enter... methods. Inside such a method, ParserRuleContext.children is available and can be manipulated.
class MyListener extends ...BaseListener {
    @Override
    public void enter...(...Context ctx) {
        super.enter...(ctx);
        ctx.children.add(...);
    }
}

new ParseTreeWalker().walk(new MyListener(), parseTree);
I am trying to make some reusable blocks for my application.
CommonBlocks.h
void (^testBlock)(int) = ^(int number) {
    // do nothing for now;
};
VariousImplementationFile.m
#import "CommonBlocks.h"
- (void)setup {
    testBlock(5);
}
Unfortunately, when I try to push this code to an iOS device I receive the error: linker command failed with exit code 1 (use -v to see invocation). It seems that I am missing something.
Any advice?
Thanks
Try adding the static keyword before the declaration:
static void (^testBlock)(int) = ^(int number) {
    // do nothing for now;
};
Your code causes an error because you have the non-static variable testBlock defined in a .h header file.
When you #import "CommonBlocks.h" in VariousImplementationFile.m, testBlock is defined once. Then, when you import CommonBlocks.h somewhere else, testBlock is defined once more, so you'll get a duplicate symbol error.
Declare the block in CommonBlocks.h this way:
typedef void (^RCCompleteBlockWithResult) (BOOL result, NSError *error);
Then you may use it in any method, for example:
-(void)getConversationFromServer:(NSInteger)placeId completionBlock:(RCCompleteBlockWithResult)completionBlock
This is not specific to blocks. Basically, you want to know how to have a global variable that is accessible from multiple files.
Basically, the issue is that in C, each "symbol" can only be "defined" once (it can be "declared" multiple times, but must be "defined" exactly once). Thus, you cannot put the "definition" of a symbol in a header file, because it will be included in multiple source files, so effectively, the same symbol will be "defined" multiple times.
For a function, the prototype is declaration, and the implementation with the code is the definition. You cannot implement a function in a header file for this reason. For a regular variable, writing the name and type of the variable is defining it. To only "declare" it, you need to use extern.
It is also worth mentioning static. static makes a variable local to a particular source file. That way, its name won't interfere with variables with the same name elsewhere. You can use this to make global variables that are "private" to a particular file. However, that is not what you are asking for -- you are asking for the exact opposite -- a variable that is "public", i.e. shared among files.
The standard way to do it is this:
CommonBlocks.h
extern void (^testBlock)(int); // any file can include the declaration
CommonBlocks.m
// but it's only defined in one source file
void (^testBlock)(int) = ^(int number) {
    // do nothing for now;
};
Is there any way to conditionally import libraries / code based on environment flags or target platforms in Dart? I'm trying to switch out between dart:io's ZLibDecoder / ZLibEncoder classes and zlib.js based on the target platform.
There is an article that describes how to create a unified interface, but I can't see how that technique would avoid duplicate code and the redundant tests needed to cover it. game_loop employs this technique, but uses separate classes (GameLoopHtml and GameLoopIsolate) that don't seem to share anything.
My code looks a bit like this:
class Parser {
  Layer parse(String data) {
    List<int> rawBytes = /* ... */;
    /* stuff you don't care about */
    return new Layer(_inflateBytes(rawBytes));
  }

  String _inflateBytes(List<int> bytes) {
    // Uses ZLibEncoder on dartvm, zlib.js in browser
  }
}
I'd like to avoid duplicating code by having two separate classes -- ParserHtml and ParserServer -- that implement everything identically except for _inflateBytes.
EDIT: concrete example here: https://github.com/radicaled/citadel/blob/master/lib/tilemap/parser.dart. It's a TMX (Tile Map XML) parser.
You could use mirrors (reflection) to solve this problem. The pub package path is using reflection to access dart:io on the standalone VM or dart:html in the browser.
The source is located here. The good thing is that they use @MirrorsUsed, so only the required classes are included for the mirrors API. In my opinion the code is very well documented; it should be easy to adapt the solution to your code.
Start at the getters _io and _html (starting at line 72): they show that you can look up a library without it being available on your flavor of the VM. The lookup simply returns null if the library isn't available.
/// If we're running in the server-side Dart VM, this will return a
/// [LibraryMirror] that gives access to the `dart:io` library.
///
/// If `dart:io` is not available, this returns null.
LibraryMirror get _io => currentMirrorSystem().libraries[Uri.parse('dart:io')];
// TODO(nweiz): when issue 6490 or 6943 are fixed, make this work under dart2js.
/// If we're running in Dartium, this will return a [LibraryMirror] that gives
/// access to the `dart:html` library.
///
/// If `dart:html` is not available, this returns null.
LibraryMirror get _html =>
currentMirrorSystem().libraries[Uri.parse('dart:html')];
Later you can use mirrors to invoke methods or getters. See the getter current (starting at line 86) for an example implementation.
/// Gets the path to the current working directory.
///
/// In the browser, this means the current URL. When using dart2js, this
/// currently returns `.` due to technical constraints. In the future, it will
/// return the current URL.
String get current {
  if (_io != null) {
    return _io.classes[#Directory].getField(#current).reflectee.path;
  } else if (_html != null) {
    return _html.getField(#window).reflectee.location.href;
  } else {
    return '.';
  }
}
As you see in the comments, this only works in the Dart VM at the moment. Once issue 6490 is solved, it should work in dart2js, too. This may mean that this solution isn't applicable for you right now, but could be later.
Issue 6943 could also be helpful, but it describes another solution that is not implemented yet.
Conditional imports are possible based on the presence of dart:html or dart:io, see for example the import statements of resource_loader.dart in package:resource.
I'm not yet sure how to do an import conditional on being on the Flutter platform.
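As a sketch of how that applies to the Parser example above (the file names inflater_io.dart and inflater_html.dart and the shared top-level inflateBytes() function are hypothetical; both files just have to declare the same public API):

// Defaults to the dart:io-based implementation; the dart:html-based one is
// substituted when dart:html is available, i.e. when compiled for the browser.
import 'inflater_io.dart'
    if (dart.library.html) 'inflater_html.dart';

String _inflateBytes(List<int> bytes) => inflateBytes(bytes);

This keeps a single Parser class; only the small inflater files are platform-specific.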
I'm new to dart, and trying to use dart to write a hello world and a unit test, but I get the error:
duplicate top-level declaration 'METHOD main' at ../app.dart::5:6
My project dir is test-dart, and it has 3 files.
test-dart/models.dart
class User {
  hello(String name) {
    print("Hello, ${name}");
  }
}
test-dart/app.dart
#library("app");
#source("./models.dart");
void main() {
  new User().hello("app");
}
test-dart/test/test.dart
#library("test");
#import("../app.dart");
void main() {
  print("hello, test");
}
Now there is an error in "test.dart" on void main(), the error message is:
duplicate top-level declaration 'METHOD main' at ../app.dart::5:6
The two main() methods are in different libraries, why they are still duplicated? How to fix it?
If you import a library like this: #import('../app.dart'), then all names from app.dart become visible in the importing code (all public names, actually -- those that don't start with a _). So in your test.dart library, you now have two main functions visible. That is obviously a collision. There are two ways to solve it (that I know of).
First: import the library with a prefix, like this: #import('../app.dart', prefix: 'app'). Then all public names from app.dart are still visible, but only with the app prefix, so the main function from app.dart is only accessible as app.main. No collision here, but you have to use the prefix every time.
Second: use a show combinator, like this: #import('../app.dart', show: ['a', 'b']). Then it is no longer true that all names from app.dart are visible; only those explicitly named (a and b here) are. I'm not sure if this is already implemented, though.
Maybe in the future, we will get something opposite to the show combinator, so that you could do #import('../app.dart', hide: ['main']). That would be the best solution for your problem, but it isn't in the current language (as specified by 0.09).
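For what it's worth, later versions of Dart added exactly that combinator: with the modern import syntax (which replaced #import), the collision could be avoided like this:

import '../app.dart' hide main; // everything from app.dart except main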
You are importing app.dart without a prefix which means that the symbols of the importing and imported library can collide if there are duplicates such as in your example.
To resolve these collisions the library import allows you to prefix imports with an identifier. Your example should work if you change test.dart as follows:
#library("test");
#import("../app.dart", prefix: "app");
void main() {
print("hello, test");
app.main();
new app.User().hello("main");
}
Notice how the classes and top-level functions in the app.dart library are now accessed using the "app" prefix and thus do not collide with the names in test.dart.