Setting multiple files as source input generated by a single task generator - waf

A single task generator produces a number of source and header files, and the number of generated files is not known in advance. How can I set these generated files as source input?
I used the code shown in the documentation, but it only covers the case a.a → a.b + a.c, whereas my case is a.a → many files in directory a. Therefore I cannot use:
b_node = node.change_ext('.b')
c_node = node.change_ext('.c')
self.create_task('idl', node, [b_node, c_node])
self.source.append(b_node)
The example is shown in the documentation here: https://waf.io/book/#_mixing_extensions_and_c_c_features
How can this unknown number of files be used as input for self.source.append(<what goes here?>)?

You should look at §11.4.2, "A compiler producing source files with names unknown in advance". The trick is to manage dependencies by overriding the runnable_status() and run() methods.
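A minimal sketch of that pattern (not the book's exact code; the tool name my_tool, the output extensions and the output directory layout are assumptions here): the task runs the generator first, then discovers whatever files actually appeared and records them as outputs.

from waflib import Task

class idl(Task.Task):
    def runnable_status(self):
        ret = super(idl, self).runnable_status()
        if ret == Task.SKIP_ME and not self.outputs:
            # Outputs are only known after a run; if none have been
            # recorded yet, force the task to execute.
            return Task.RUN_ME
        return ret

    def run(self):
        out_dir = self.generator.path.get_bld()
        # Run the generator; it writes an unknown number of files into out_dir.
        ret = self.exec_command('my_tool %s %s' % (
            self.inputs[0].abspath(), out_dir.abspath()))
        if ret:
            return ret
        # Discover what was actually produced and record it as outputs,
        # so the nodes can later be appended to self.generator.source.
        self.outputs = out_dir.ant_glob('**/*.b **/*.c', remove=False)
        return 0

The full example in the book also updates the task signatures so later builds can tell whether the discovered files changed; this sketch omits that bookkeeping.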


Consuming contents of declare_directory

I have rule A implemented with a macro that uses declare_directory to produce a set of files:
output = ctx.actions.declare_directory("selected")
Names of those files are not known in advance. The implementation returns the directory created by declare_directory with the following:
return DefaultInfo(
    files = depset([output]),
)
Rule A is included in the "srcs" attribute of rule B. Rule B is also implemented with a macro. Unfortunately, the list of files passed to B's implementation through the "srcs" attribute only contains the "selected" directory created by rule A, not the files residing in that directory.
I know that the Args class supports expansion of directories, so I could pass the names of all files in the "selected" directory to a single action. What I need, however, is a separate action for every individual file, for parallelism and caching. What is the best way to achieve that?
This is one of the intended use cases of directory outputs (called TreeArtifacts in the implementation), and it's implemented using ActionTemplate:
https://github.com/bazelbuild/bazel/blob/c2100ad420618bb53754508da806b5624209d9be/src/main/java/com/google/devtools/build/lib/actions/ActionTemplate.java#L24-L57
However, this is not exposed to Starlark, and it has only a couple of usages currently: in the Android rules (AndroidBinary.java) and the C++ rules (CcCompilationHelper.java). The Android and C++ rules are going to be migrated to Starlark, so this functionality might eventually become available in Starlark, but I'm not aware of any concrete timeline. It would probably be good to file a feature request on GitHub.
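In the meantime, the single-action fallback the question mentions would look roughly like this in rule B's implementation (a sketch; the srcs and _tool attribute names are hypothetical):

def _rule_b_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".out")
    args = ctx.actions.args()
    # add_all() expands a directory (TreeArtifact) into the paths of the
    # files inside it; the expansion happens at execution time.
    args.add_all(ctx.files.srcs, expand_directories = True)
    args.add(out.path)
    ctx.actions.run(
        executable = ctx.executable._tool,  # hypothetical tool attribute
        arguments = [args],
        inputs = ctx.files.srcs,
        outputs = [out],
    )

That gives correct, cached builds, but only one big action; per-file actions over a TreeArtifact still require ActionTemplate.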

File Organization in a tar.gz archive

I have observed that the files inside a folder are usually listed sequentially in a tar.gz archive, but in one exceptional case I found them listed in a random order. E.g., say there are three folders a, b, and c, and each contains files 1, 2, 3. In the usual case the archive entries are listed as a/1, a/2, a/3, b/1, b/2, b/3, c/1, c/2, c/3, but in this case it is something like b/2, a/1, b/4, ... Why could this happen? I rely on the first ordering assumption to read a .tar.gz archive file and do some processing on the data at a folder level. Is there any way to get the folder listings sorted inline for such cases, without traversing the whole archive each time and generating a parent/child mapping? Sample code below:
// Classes from Apache Commons Compress (archivers.tar, compressors.gzip)
TarArchiveInputStream tis = new TarArchiveInputStream(
        new GzipCompressorInputStream(new FileInputStream("a.tar.gz")));
TarArchiveEntry entry;
while ((entry = tis.getNextTarEntry()) != null)
    System.out.println(entry.getName());
I could not find any API that would give me such a sorted list inline. Any help would be much appreciated; I'm stuck on this case.
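Since a tar archive is a linear stream, no API can hand back a re-sorted listing without reading the entries first; the practical approach is a single pass that groups entry names by folder and then processes each group. A sketch of that idea in Python's tarfile module, purely for illustration:

import tarfile
from collections import defaultdict

# One pass over the archive: group member names by their top-level folder.
groups = defaultdict(list)
with tarfile.open("a.tar.gz", "r:gz") as tf:
    for member in tf:
        if member.isfile():
            top = member.name.split("/", 1)[0]
            groups[top].append(member.name)

# Now each folder's files can be processed together, in sorted order.
for folder in sorted(groups):
    for name in sorted(groups[folder]):
        print(folder, name)

This still reads the whole archive once, which is unavoidable for a compressed stream, but it avoids re-traversing it per folder.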

How to deal with implicit dependencies (e.g. C++ includes) for incremental builds in a custom Skylark rule

Problem
I wonder how to inform Bazel about dependencies that are unknown at declaration time but known at build time (a.k.a. implicit dependencies, dynamic dependencies, ...). For instance, when compiling C++ sources, the .cpp source file depends on some header files, and this information is not available when writing the BUILD file; it needs to be retrieved at build time. Whatever the solution for getting the information (dry run, generating a depfile, parsing stdout), it has to happen at build time, and the information then needs to be fed back into Bazel's build graph.
Since Skylark does not allow I/O, for instance reading a generated depfile or parsing a stdout result that contains a dependency list, I have no clue how to deal with this.
Behind implicit dependencies, what I am really after is a correct incremental build.
Example
To experiment with this problem I created a simple tool, just_a_tool.exe, which takes an input file, reads a list of files from it, and concatenates the content of all these files into an output file.
command line example:
just_a_tool.exe --input input.txt --depfile dep.d output.txt
dep.d contains the list of all the read files.
Issue
If I change the content of test1.txt, test2.txt, or test3.txt, Bazel does not rebuild the output.txt file. Of course: it does not know these dependencies exist.
Example files
just_a_tool.bzl
def _impl(ctx):
    exec_path = "C:/Code/JustATool/just_a_tool.exe"
    for f in ctx.attr.source.files:
        source_path = f.path
    output_path = ctx.outputs.out.path
    dep_file = ctx.actions.declare_file("dep.d")
    args = ["--input", source_path, "--dep_file", dep_file.path, output_path]
    ctx.actions.run(
        outputs = [ctx.outputs.out, dep_file],
        executable = exec_path,
        inputs = ctx.attr.source.files,
        arguments = args,
    )
jat_convert = rule(
    implementation = _impl,
    attrs = {
        "source": attr.label(mandatory = True, allow_files = True, single_file = True),
    },
    outputs = {"out": "%{name}.txt"},
)
BUILD
load("//tool:just_a_tool.bzl", "jat_convert")
jat_convert(
name="my_output",
source=":input.txt"
)
input.txt
test1.txt
test2.txt
test3.txt
Goal
I want to do correct and fast incremental builds for the following situations:
Generate reflection data from C++ sources; this custom tool's execution depends on the header files included in my source files.
Use an internal tool to build asset files which can include other files.
Run a custom preprocessor on my shaders allowing a #include feature.
Thanks!
Bazel's extension language doesn't support creating actions with a dynamic set of inputs, where this set depends on the output of a previous action. In other words, custom rules cannot run an action, read the action's output, then create actions with those inputs or update (or prune the set of) inputs of already created actions.
Instead, I suggest adding attribute(s) to your rule where the user can declare the set of files that the sources may include. I call this "the universe of headers". The actions you create depend on this user-defined universe, so the set of action inputs is completely defined. Of course, this means these actions potentially depend on more files than the .cpp files they process actually include.
This approach is analogous to how the cc_* rules work: a file in cc_*.srcs can include other files in the srcs of the same rule and from hdrs of dependencies, but nothing else. Thus the union of srcs + hdrs of (direct & transitive) dependencies defines the universe of header files that a cpp file may include.
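Applied to the jat_convert rule from the question, that could look roughly like this (a sketch; the universe attribute name and the tool attribute are inventions here):

def _impl(ctx):
    src = ctx.files.source[0]
    # Every file in the user-declared universe becomes an action input,
    # so editing any of them invalidates and re-runs the action.
    inputs = ctx.files.source + ctx.files.universe
    ctx.actions.run(
        outputs = [ctx.outputs.out],
        inputs = inputs,
        executable = ctx.executable.tool,
        arguments = ["--input", src.path, ctx.outputs.out.path],
    )

jat_convert = rule(
    implementation = _impl,
    attrs = {
        "source": attr.label(mandatory = True, allow_files = True, single_file = True),
        # The "universe": everything the input file may reference.
        "universe": attr.label_list(allow_files = True),
        "tool": attr.label(mandatory = True, executable = True, cfg = "host",
                           allow_files = True),
    },
    outputs = {"out": "%{name}.txt"},
)

Note that the depfile is no longer needed for correctness, since the universe over-approximates the true dependencies.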

Multiple file generation while writing to XML through Apache Beam

I'm trying to write an XML file whose source is a text file stored in GCS. The code runs fine, but instead of a single XML file it generates multiple XML files (the number of XML files seems to match the total number of records in the source text file). I have observed this scenario while using the DataflowRunner.
When I run the same code locally, two files get generated: the first contains all the records with proper elements, and the second contains only the opening and closing root element.
Any idea why this unexpected behaviour occurs? Please find below the code snippet I'm using:
PCollection<String>input_records=p.apply(TextIO.read().from("gs://balajee_test/xml_source.txt"));
PCollection<XMLFormatter> input_object= input_records.apply(ParDo.of(new DoFn<String,XMLFormatter>(){
#ProcessElement
public void processElement(ProcessContext c)
{
String elements[]=c.element().toString().split(",");
c.output(new XMLFormatter(elements[0],elements[1],elements[2],elements[3],elements[4]));
System.out.println("Values to be written have been provided to constructor ");
}
})).setCoder(AvroCoder.of(XMLFormatter.class));
input_object.apply(XmlIO.<XMLFormatter>write()
    .withRecordClass(XMLFormatter.class)
    .withRootElement("library")
    .to("gs://balajee_test/book_output"));
Please let me know how to generate a single XML file (book_output.xml) as output.
XmlIO.write().to() is documented as follows:
/**
 * Writes to files with the given path prefix.
 *
 * <p>Output files will have the name {@literal {filenamePrefix}-0000i-of-0000n.xml} where n is
 * the number of output bundles.
 */
That is, it is expected that it may produce multiple files: e.g. if the runner chooses to parallelize your data into 3 tasks ("bundles"), you'll get 3 files. Some of the parts may turn out empty in some cases, but the total data written will always add up to the expected data.
Asking the IO to produce exactly one file is a reasonable request if your data is not particularly big. It is supported in TextIO and AvroIO via .withoutSharding(), but not yet supported in XmlIO. Please feel free to file a JIRA with the feature request.
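For comparison only (this is the Python SDK, not XmlIO): in sinks that do support it, you force a single output file at the cost of write parallelism, e.g. num_shards=1 in Python's WriteToText, the analogue of .withoutSharding(). A sketch, reusing the question's bucket paths:

import apache_beam as beam

# Sketch: num_shards=1 forces one output file; fine for small outputs,
# but it serializes the final write step.
with beam.Pipeline() as p:
    (p
     | beam.io.ReadFromText("gs://balajee_test/xml_source.txt")
     | beam.io.WriteToText("gs://balajee_test/book_output",
                           file_name_suffix=".txt",
                           num_shards=1))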

F# 'modular' scripting

What is the recommended way to load and reload .fsx files? Just experimenting... yes yes, right language for the right job, etc. etc.
I love how the following can be done in FSI:
#load "script.fsx";
open Script
> let p = script.x 1
Error: This expression was expected to have type string but here has int...
(* edit script.fsx x to make it int -> int *)
>
> #load "script.fsx"
> let p = script.x 1
val it : int = 2
But how do we do this for an application that we are running via fsi blah.fsx? Maybe something that is sitting in a while loop. It seems #load and #use must not be inside a let or a module, i.e. you cannot use #load like let reload script = #load script. I wonder why?
My original method was to have .fs files and recompile + relaunch each time I wanted to add/fix something. This method feels primitive.
My second method was to attempt to use the #load directive inside of a module, which turns out not to work (which kind of makes sense in terms of scoping)...
module Test1 =
    #load @"C:\users\pc\Desktop\test.fsx"
    open Test

module Test2 =
    ...
Another way would be to create a new process for every module by launching fsi module.fsx through System.Diagnostics.Process, but this seems horrible, inefficient and ugh.
I have a feeling deep in my heart that this will not be trivial in .NET, but I would like to pose the question anyway; FSI does it... I wonder if I can leverage the FSI API somehow (or at the least copy their code)?
TL;DR I read the following about Erlang and want it for myself in F#.
Erlang: Is there a way to reload changed modules into an already running node with rebar?
"...any time a module in your program changes on disk, the reloader will replace the running copy automatically."
I don't know if this would work in F#, but in ML you can load a master file that loads all the files in your project, then executes whatever code is needed to knit them together, and finally runs your application. To see an example of a massive app run from inside of a REPL, look at the Isabelle/HOL site at the Cambridge Computer Laboratory: http://www.cl.cam.ac.uk/research/hvg/Isabelle/installation.html. After downloading the app, look in the src directory for any file called root.ml. There will be half a dozen of them, controlling various levels of implementation. This is recursive, because a top-level file can call a file in several sub-directories that loads that particular sub-feature. This allows targeting your application to various scenarios depending on which top-level file is executed.
Typical .NET Framework applications cannot unload/reload assemblies unless they are loaded into an AppDomain separate from the primary one created when the application starts. This is essentially how most plugin systems are designed for applications that run on the full .NET Framework. Things may be changing post .NET Standard 2.0 in .NET Core with the Collectible Assemblies feature.
References:
https://github.com/dotnet/coreclr/issues/552
https://github.com/dotnet/corefx/issues/19773
