How to not rebuild artifacts on every invocation - bazel

I want to download and build ruby within a workspace. I've been trying to implement this by mimicking rules_go, and I have that part working. The issue I'm having is that it rebuilds the openssl and ruby artifacts each time ruby_download_sdk is invoked. In the code below, the downloaded artifacts are cached, but the builds of openssl and ruby are always executed.
def ruby_download_sdk(name, version = None):
    # TODO detect os and arch
    os, arch = "osx", "x86_64"
    _ruby_download_sdk(
        name = name,
        version = version,
    )
    _register_toolchains(name, os, arch)

def _ruby_download_sdk_impl(repository_ctx):
    # TODO detect platform
    platform = ("osx", "x86_64")
    _sdk_build_file(repository_ctx, platform)
    _remote_sdk(repository_ctx)

_ruby_download_sdk = repository_rule(
    _ruby_download_sdk_impl,
    attrs = {
        "version": attr.string(),
    },
)

def _remote_sdk(repository_ctx):
    _download_openssl(repository_ctx, version = "1.1.1c")
    _download_ruby(repository_ctx, version = "2.6.3")
    openssl_path, ruby_path = "openssl/build", ""
    _build(repository_ctx, "openssl", openssl_path, ruby_path)
    _build(repository_ctx, "ruby", openssl_path, ruby_path)

def _build(repository_ctx, name, openssl_path, ruby_path):
    script_name = "build-{}.sh".format(name)
    template_name = "build-{}.template".format(name)
    repository_ctx.template(
        script_name,
        Label("@rules_ruby//ruby/private:{}".format(template_name)),
        substitutions = {
            "{ssl_build}": openssl_path,
            "{ruby_build}": ruby_path,
        },
    )
    repository_ctx.report_progress("Building {}".format(name))
    res = repository_ctx.execute(["./" + script_name], timeout = 20 * 60)
    if res.return_code != 0:
        print("res %s" % res.return_code)
        print(" -stdout: %s" % res.stdout)
        print(" -stderr: %s" % res.stderr)
Any advice on how I can make Bazel aware of these build artifacts so that it doesn't rebuild them on every invocation?

The problem is that Bazel isn't really building your ruby and openssl. When it prepares your build tree and runs the repository rule, it just executes a shell script as instructed; that script apparently happens to build things, but that fact is essentially opaque to Bazel (and it also happens before Bazel itself would even build anything).
There might be others, but off the top of my head I see the following options:
Pre-build your ruby environment and pull in its results as an external dependency (a sketch follows below). The obvious downside (which may or may not be quite a lot of pain) is that you need to do so for every platform you need to support (including making sure detection and the corresponding download are correct). The upside is that you really only build once (per platform) and also have control over the tooling used across all hosts. This would likely be my primary choice.
Build openssl and ruby like any other C sources, making them just another Bazel target. This however means you'd need to bazelify their builds (describe and maintain a Bazel build of an otherwise Bazel-unaware project).
Continue further along the path you've started and just (sort of) leave Bazel out of it. That is, extend the magic in the build scripts: use a deterministic output location, and perhaps manifest files of what is around (also to make corruption less likely), so that a script can determine the build has already taken place and simply collect its previous results.
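For illustration, a minimal sketch of the first option, assuming you publish prebuilt SDK archives yourself; the URLs, checksums and the BUILD file label below are hypothetical placeholders:

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# Hypothetical map of prebuilt Ruby SDK archives, one entry per platform.
_PREBUILT_SDKS = {
    ("osx", "x86_64"): (
        "https://example.com/ruby-2.6.3-osx-x86_64.tar.gz",
        "0000000000000000000000000000000000000000000000000000000000000000",  # sha256 of the archive
    ),
}

def ruby_download_sdk(name, version = None):
    # TODO: detect the host platform; version would select the archive in a fuller implementation.
    os, arch = "osx", "x86_64"
    url, sha256 = _PREBUILT_SDKS[(os, arch)]

    # http_archive only re-fetches when url/sha256 change, so nothing is rebuilt per invocation.
    http_archive(
        name = name,
        urls = [url],
        sha256 = sha256,
        build_file = "@rules_ruby//ruby/private:BUILD.sdk",  # hypothetical BUILD file describing the SDK layout
    )
    _register_toolchains(name, os, arch)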

Related

Bazel fetch remote file not as a WORKSPACE rule?

In Bazel, how do I fetch a remote file as a build rule, not as a WORKSPACE rule?
I want to use a build rule because WORKSPACE rules are not loaded transitively.
e.g. this fails
load("#bazel_tools//tools/build_defs/repo:http.bzl", "http_file")
http_file(
name = "foo",
urls = [ "https://example.com" ],
sha256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
executable = True,
)
Error in repository_rule: 'repository rule http_file' can only be called during workspace loading
If you really want to do that, you have to implement your own rule, a naïve trivial example relying on curl to fetch could be:
def _impl(ctx):
    args = ctx.actions.args()
    args.add("-o", ctx.outputs.out)
    args.add(ctx.attr.url)
    ctx.actions.run(
        outputs = [ctx.outputs.out],
        executable = "curl",
        arguments = [args],
    )

get_stuff = rule(
    _impl,
    attrs = {
        "url": attr.string(
            mandatory = True,
        ),
    },
    outputs = {"out": "%{name}.out"},
)
But, especially in such a trivial form, it comes with problems. Apart from anything else: do you want to step out of the sandbox during the build? Do you want to talk to something across the network during the build (out of the sandbox), bypassing the repository_cache and possibly getting the remote_cache involved (networked caching of networked fetching)? Specifically in this example, if the content of the file pointed to by url changes, the build has no idea and only fetches it again when it either hasn't done so yet or the url itself has changed. In other words, the implementation would need to be more robust (mimicking that of http_file, for instance).
But it actually sounds like you're trying to address a different problem (transitive external dependencies), for which there could be another solution. One trick used for that is to define a macro in your first-level dependency that declares the next hop; after declaring that first-level dependency as an external dependency in your parent project, you load that macro and call it from the parent project's WORKSPACE. This too has a price though, namely that the first-level dependency always has to be present (fetched or already cached), even if the build target asked for does not actually need it (as that load and macro call will always pull it in).
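To illustrate the trick (all names and URLs here are hypothetical placeholders), the first-level dependency ships a deps.bzl exposing a macro that declares its own external dependencies:

# deps.bzl, shipped inside the first-level dependency (@my_dep)
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

def my_dep_dependencies():
    # Second-level (transitive) external dependencies are declared here.
    http_archive(
        name = "some_transitive_dep",
        urls = ["https://example.com/some_transitive_dep.tar.gz"],
        sha256 = "0000000000000000000000000000000000000000000000000000000000000000",
    )

and the parent project's WORKSPACE pulls it in explicitly:

# WORKSPACE of the parent project
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "my_dep",
    urls = ["https://example.com/my_dep.tar.gz"],
    sha256 = "0000000000000000000000000000000000000000000000000000000000000000",
)

load("@my_dep//:deps.bzl", "my_dep_dependencies")

my_dep_dependencies()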

Bazel: BUILD fragments need to differ based on the target operating system

I have something like this in a BUILD file. I am un/commenting lines based on the operating system. Is there a graceful way to do this?
# Ubuntu
#shared_libraries = [
#    "libboost_atomic.so"
#],

# OSX
shared_libraries = [
    "libboost_atomic.dylib"
],
I haven't tried this, however bazelbuild/rules_nodejs uses this kind of approach: it wraps the native binary and queries Node's os API (which uses the context of the user running node.exe) to detect the host OS.
UPDATE
Check the "Setting up a C++ Toolchain" tutorial in the Bazel docs; this seems to be what you need.
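For what it's worth, a common way to keep both variants in one BUILD file is select() with a config_setting; a minimal sketch (the setting name is arbitrary, and shared_libraries stands in for whatever attribute your rule takes):

config_setting(
    name = "osx",
    constraint_values = ["@platforms//os:osx"],
)

# In the target that previously needed un/commenting:
shared_libraries = select({
    ":osx": ["libboost_atomic.dylib"],
    "//conditions:default": ["libboost_atomic.so"],
}),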

Is it possible to keep my Nix packages in sync across machines not running NixOS?

I know with NixOS, you can simply copy over the configuration.nix file to sync your OS state including installed packages between machines.
Is it possible then, to do the same using Nix the package manager on a non-NixOS OS to sync only the installed packages?
Please note that, at least since 30.03.2017 (corresponding to the 17.03 Nix/NixOS channel/release), as far as I understand, the official, modern, supported and suggested solution is to use so-called overlays.
See the chapter titled "Overlays" in the nixpkgs manual for a nice guide on how to use the new approach.
As a short summary: you can put any number of files with .nix extension in $HOME/.config/nixpkgs/overlays/ directory. They will be processed in alphabetical order, and each one can modify the set of available Nix packages. Each of the files must be written as in the following pattern:
self: super:

{
  boost = super.boost.override {
    python = self.python3;
  };
  rr = super.callPackage ./pkgs/rr {
    stdenv = self.stdenv_32bit;
  };
}
The super set corresponds to the "old" set of packages (before the overlay was applied). If you want to refer to the old version of a package (as in boost above), or callPackage, you should reference it via super.
The self set corresponds to the eventual, "future" set of packages, representing the final result after all overlays are applied. (Note: don't be scared when sometimes using them might get rejected by Nix, as it would result in infinite recursion. Probably you should rather just use super in those cases instead.)
Note: with the above changes, the solution I mention below in the original answer seems "deprecated" now — I believe it should still work as of April 2017, but I have no idea for how long. It appears marked as "obsolete" in the nixpkgs repository.
Old answer, before 17.03:
Assuming you want to synchronize apps per-user (as non-NixOS Nix keeps apps visible on a per-user basis, not system-wide, as far as I know), it is possible to do it declaratively. It's just not well advertised in the manual — though it seems quite popular among long-time Nixers!
You must create a text file at: $HOME/.nixpkgs/config.nix — e.g.:
$ mkdir -p ~/.nixpkgs
$ $EDITOR ~/.nixpkgs/config.nix
then enter the following contents:
{
  packageOverrides = defaultPkgs: with defaultPkgs; {
    home = with pkgs; buildEnv {
      name = "home";
      paths = [
        nethack mc pstree #...your favourite pkgs here...
      ];
    };
  };
}
Then you should be able to install all listed packages with:
$ nix-env -i home
or:
$ nix-env -iA nixos.home # *much* faster than above
In paths you can put stuff in a similar way as in /etc/nixos/configuration.nix on NixOS. Also, home is actually a "fake package" here. You can add more custom package definitions beside it and then include them in your paths; a sketch of that follows below.
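For example, a minimal sketch of adding a custom local derivation next to the stock packages (the myTool name and ./pkgs/mytool path are hypothetical):

{
  packageOverrides = defaultPkgs: with defaultPkgs;
    let
      # Hypothetical local derivation kept next to config.nix.
      myTool = callPackage ./pkgs/mytool { };
    in {
      home = buildEnv {
        name = "home";
        paths = [ nethack mc pstree myTool ];
      };
    };
}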
(Side note: I'm hoping to write a blog post with what I learned on how exactly this works, and also showing how to extend it with more customizations. I'll try to remember to link it here if I succeed.)

Different checksum results for jar files compiled on subsequent build?

I am working on verifying the jar files present on remote unix boxes against the ones built on my local machine (Windows & Cygwin) with the same JVM.
As a POC I am trying to verify whether the same checksum is produced for jar files generated on my machine by consecutive builds. I tried the following:
Generated the jar file the first time using an ant script
Calculated the checksum (e.g. "xyz abc")
Generated the jar file again with the same ant script, without changing anything
Got a different checksum but the same byte count (e.g. "xvw abc")
I am not sure how java internally produces the class files and then the jar files. Can someone please help me understand the points below:
1. Does the cksum utility of unix/cygwin consider the timestamp of the file when computing the value?
2. Will the checksum be different for the compiled class files/jar file produced if we keep everything else the same [compiler version + source code + machine + environment]?
Answer to question 1: cksum doesn't consider the timestamp of the archive file itself (e.g. the jar file), but the timestamps of the files inside the jar are part of the archive's contents, so they do affect the value.
Answer to question 2: The checksums of the individual class files will be the same if all other things are the same (source code, compiler, etc.). The checksums of the jar files will be different. Causes of differences can be the timestamps of the files inside the jar file, or files being put into the archive in a different order (e.g. caused by parallel builds).
If you want to create a reproducible build with gradle you can do so with the config below:
tasks.withType(AbstractArchiveTask) {
    preserveFileTimestamps = false
    reproducibleFileOrder = true
}
Maven allows something similar; sorry, I don't know how to do this with Ant.
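For the Maven side, a minimal sketch (assuming plugin versions recent enough to honour it) is to pin a fixed output timestamp in the pom.xml:

<properties>
  <!-- A fixed timestamp applied to all archive entries makes the jar reproducible. -->
  <project.build.outputTimestamp>2020-01-01T00:00:00Z</project.build.outputTimestamp>
</properties>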
More info here:
https://dzone.com/articles/reproducible-builds-in-java
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74682318

Comparing generated executables for equivalence

I need to compare 2 executables and/or shared objects, compiled using the same compiler/flags, and verify that they have not changed. We work in a regulated environment, so it would be really useful for testing purposes to isolate exactly which parts of the executable have changed.
Using MD5 sums/hashes doesn't work because the headers contain information about the file.
Does anyone know of a program or way to verify that 2 files are executionally the same even if they were built at a different time?
An interesting question. I have a similar problem on linux. Intrusion detection systems like OSSEC or tripwire may generate false positives if the hashsum of an executable changes all of a sudden. This may be nothing worse than the Linux "prelink" program patching the executable file for faster startups.
In order to compare two binaries (in the ELF format), one can use the "readelf" executable and then "diff" to compare outputs. I'm sure there are refined solutions, but without further ado, a poor man's comparator in Perl:
#!/usr/bin/perl -w

$exe = $ARGV[0];

if (!$exe) {
    die "Please give name of executable\n";
}

if (! -f $exe) {
    die "Executable $exe not found or not a file\n";
}

if (! (`file '$exe'` =~ /\bELF\b.*?\bexecutable\b/)) {
    die "file command says '$exe' is not an ELF executable\n";
}

# Identify sections in ELF
@lines = pipeIt("readelf --wide --section-headers '$exe'");

@sections = ();

for my $line (@lines) {
    if ($line =~ /^\s*\[\s*(\d+)\s*\]\s+(\S+)/) {
        my $secnum = $1;
        my $secnam = $2;
        print "Found section $1 named $2\n";
        push @sections, $secnam;
    }
}

# Dump file header
@lines = pipeIt("readelf --file-header --wide '$exe'");
print @lines;

# Dump all interesting section headers
@lines = pipeIt("readelf --all --wide '$exe'");
print @lines;

# Dump individual sections as hexdump
for my $section (@sections) {
    @lines = pipeIt("readelf --hex-dump='$section' --wide '$exe'");
    print @lines;
}

sub pipeIt {
    my ($cmd) = @_;
    my $fh;
    open($fh, "$cmd |") or die "Could not open pipe from command '$cmd': $!\n";
    my @lines = <$fh>;
    close $fh or die "Could not close pipe to command '$cmd': $!\n";
    return @lines;
}
Now you can run for example, on machine 1:
./checkexe.pl /usr/bin/curl > curl_machine1
And on machine 2:
./checkexe.pl /usr/bin/curl > curl_machine2
After having copy-pasted, SFTP-ed or NFS-ed (you don't use FTP, do you?) the files into the same file tree, compare the files:
diff --side-by-side --width=200 curl_machine1 curl_machine2 | less
In my case, differences exist in the sections ".gnu.conflict", ".gnu.liblist", ".got.plt" and ".dynbss", which might be ok for a "prelink" intervention, but a difference in the code section, ".text", would be a Bad Sign.
To follow up, here is what I came up with finally:
Instead of comparing the final executables & shared objects, we compared the .o files output before linking. We assumed that the linking process was sufficiently reproducible that this would be fine.
This works in some of our cases, where we have two builds with some small change that shouldn't affect the final code (a code pretty-printer, say), but it doesn't help us if we do not have the intermediate build output.
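If the object files from both builds are available, a quick check is just to compare them pairwise; a minimal sketch (the build_a/ and build_b/ directory names are placeholders):

# Compare each object file from build A against its counterpart from build B.
for f in build_a/*.o; do
    cmp -s "$f" "build_b/$(basename "$f")" || echo "differs: $(basename "$f")"
done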
You can compare the contents of RO and RW initialized sections by generating a binary file from the ELF file.
objcopy <elf_file> -O binary <binary_file>
Compare the generated binary files to see whether they are identical, using diff, for example.
In my opinion, this is enough to guarantee you are generating the same executable.
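For example (the file names here are just placeholders):

objcopy -O binary build_a/libfoo.so a.bin   # strip ELF headers, keep the loadable section contents
objcopy -O binary build_b/libfoo.so b.bin
cmp a.bin b.bin && echo "section contents identical"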
A few years back I had to do the same thing. We had to prove that we could rebuild the executable from source when given only a revision number, revision control repository, build tools, and build configuration. Note: If any of these change you may see a difference.
I remember there are some timestamps in the executable. The trick is to realize that the file is not just a bunch of bytes that cannot be interpreted. The file has sections; most will not change, but there will be a section for the time of build (or some such thing).
I don't remember all the details, but the commands you will need are { objcopy, objdump, nm }; I think objdump would be the first to try.
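If you go down that route, a simple starting point (assuming GNU binutils; the file names are placeholders) is to diff the disassembly rather than the raw bytes:

objdump -d build_a/prog > a.asm   # disassemble the code sections
objdump -d build_b/prog > b.asm
diff a.asm b.asm | less           # inspect where the two builds actually differ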
Hope this helps.
