Finding all Nix build time dependencies, including bootstrapping ones - nix

I'm trying to set up a Nix cache containing all the store paths needed to build a simple derivation. The goal is for this to work on an empty store, with no cache misses so I don't have to hit cache.nixos.org at all. I'm having trouble because Nix seems to download a bunch of extra bootstrapping stuff, apparently to help with fetching.
For example, consider this derivation:
# empty_store_test_simple.nix
with { inherit (import <nixpkgs> {}) fetchFromGitHub; };
(fetchFromGitHub {
  owner = "codedownio";
  repo = "templates";
  rev = "ba68b83d25d2b74f5475521ac00de3bbb884c983";
  sha256 = "sha256-LNTi1ZBEsThmGWK53U9Na1j5DKHljcS42/PRXj97p6s=";
})
If I build this on an empty store with --dry-run, I see the following:
λ nix build --impure --store ~/experimental-store --substituters https://cache.nixos.org/ --dry-run -f ./empty_store_test_simple.nix
this derivation will be built:
/nix/store/jr55kq3kk6va95rvjcdyn5jmh059007p-source.drv
these 48 paths will be fetched (24.52 MiB download, 121.41 MiB unpacked):
/nix/store/02bfycjg1607gpcnsg8l13lc45qa8qj3-libssh2-1.10.0
/nix/store/0fi0432kdh46x9kbngnmz2y7z0q68cdz-xz-5.2.5-bin
/nix/store/0rizskpri8d8qawx6qjqcnvlxcvzr1bm-keyutils-1.6.3-lib
/nix/store/1l4r0r4ab3v3a3ppir4jwiah3icalk9d-zlib-1.2.11
/nix/store/1xyz8jwyg9rya2f7gs549c7n2ah378v6-stdenv-linux
/nix/store/3h4a92kysiw3s3rvbsa6a2nys3lf8f8v-libkrb5-1.18
/nix/store/3ibnw61rlgj2lj5hycy2dn3ybpq7wapm-libev-4.33
/nix/store/5qbrz5fimkbywws73vaim8allyh8kjy5-nghttp2-1.43.0-bin
/nix/store/67x6kxbanrqafx3hg7pb3bc83i3d1v3f-gzip-1.11
/nix/store/6irxz4fbf1d1ac7wvdjf8cqb3sgmnvg8-zlib-1.2.11-dev
/nix/store/71pachqc22wlvf3xjhwjh2rqbl6l3ngg-diffutils-3.8
/nix/store/9mp06ni69a44dmrjhn28mn15brdry52w-gnused-4.8
/nix/store/9ppi191zsi7zvynkm8vy2bi22lci9iwg-bzip2-1.0.6.0.2
/nix/store/c9f15p1kwm0mw5p13wsnvd1ixrhbhb12-gcc-10.3.0-lib
/nix/store/d1n274a607fmqdgr7888nq19hdsj7av0-openssl-1.1.1l-bin
/nix/store/d8p27w4d21xs6svkaf3ij60lsw243rn2-openssl-1.1.1l-dev
/nix/store/fdbwa5jrijn0yzwl8l4xdxa0l5daf5j6-curl-7.79.1
/nix/store/fvprxgcxf4px865gdjd81fbwnxcjrg41-coreutils-9.0
/nix/store/gf6j3k1flnhayvpnwnhikkg0s5dxrn1i-openssl-1.1.1l
/nix/store/gmnh4jfjhx83aggwgwzcnrwmpmqr8fwf-gnutar-1.34
/nix/store/gmzhclix3kzhir5jmmwakwhpg6j5zwf1-acl-2.3.1
/nix/store/h0b8ajwz9lvw3a3vqrf41cxrhlx9dz7p-nghttp2-1.43.0-lib
/nix/store/h7srws2r1nalsih91lrm0hfhhar14jzm-libkrb5-1.18-dev
/nix/store/h97sr1q1rpv1ry83031q51jbkba7q0m4-bzip2-1.0.6.0.2-bin
/nix/store/ihscadskdrvwc9dvbirff51lr70cphjj-curl-7.79.1-bin
/nix/store/ikvp5db9hygc14da45lvxi1c9b4ylna9-pcre-8.44
/nix/store/ilszk5f0zcv8lifkixg47ja1f2lsgxkd-nghttp2-1.43.0-dev
/nix/store/k0qa3rjifblr2vrgx4g54a59zxlfhg90-xz-5.2.5
/nix/store/kd14wd2wfmb56zpv5y71yq2lqs11l06k-attr-2.5.1
/nix/store/ksqy6mszsld4z3w8ybxa2vkjf5cqxw3f-c-ares-1.17.2
/nix/store/l0wlqpbsvh1pgvhcdhw7qkka3d31si7k-bash-5.1-p8
/nix/store/lhambyc1v2c7qzzr5sq7p449xs1j6pg8-gnugrep-3.7
/nix/store/lypy3bif096j0qc1divwa87gdvv3r575-curl-7.79.1-dev
/nix/store/p12km8psjlmvbmi52wb9r6gfykqxcdnd-libssh2-1.10.0-dev
/nix/store/pkpynsyxm8c38z4m8ngv52c7v8vhkr2h-unzip-6.0
/nix/store/prq96vz3ywk955nnxlr7s892wf5qvbr0-mirrors-list
/nix/store/psqacrv7k5fxz6mdiawc28sxcdchb4c9-ed-1.17
/nix/store/qbdsd82q5fyr0v31cvfxda0n0h7jh03g-libunistring-0.9.10
/nix/store/qzr7r4w5gm5m20afn2wz4vlv7ah4sr89-gnumake-4.3
/nix/store/r5niwjr8r8qags2bzv9z583r9vajxag3-patchelf-0.13
/nix/store/rnx655nq2qs53yb5arv2gapa91r1wsbn-findutils-4.8.0
/nix/store/scz4zbxirykss3hh5iahgl39wk9wpaps-libidn2-2.3.2
/nix/store/sqn31ly001033hsz0dpxwcsay5qdbk2w-gawk-5.1.1
/nix/store/vslsa0l17xjcrdgm2knwj0z5hlvf73m7-perl-5.34.0
/nix/store/x6pz7c0ffcd6kxzc8m1rflvqmdbjiihh-nghttp2-1.43.0
/nix/store/yj11v0gdjqli4nzax4x48xjnh9y36b2q-curl-7.79.1-man
/nix/store/z56jcx3j1gfyk4sv7g8iaan0ssbdkhz1-glibc-2.33-56
/nix/store/zjm4xv4nr872mdhvv3j22bzb08rgf1hk-patch-2.7.6
However, I can't find all these paths by using the means I would expect:
λ nix repl empty_store_test_simple.nix
nix-repl> :b with import <nixpkgs> {}; closureInfo { rootPaths = [inputDerivation]; }
This derivation produced the following outputs:
out -> /nix/store/1iqm7sr2rr6i5njnfxlqqyzi567mb4cz-closure-info
λ cat /nix/store/1iqm7sr2rr6i5njnfxlqqyzi567mb4cz-closure-info/store-paths
/nix/store/icvdinlsgl6y2kxk9wzkj82a53jgpdlm-source
I would expect to see ~48 paths here, but I only see 1! How can I get all the build-time dependencies indicated by the dry run? I've seen this kind of issue in the past when import-from-derivation (IFD) is present; could some of that be going on in Nixpkgs?

If you build your derivation and store it in your own Nix cache that should work; Nix shouldn't need to get the build-time dependencies of that thing, it can just download the result (i.e. the source code you're fetching) from your cache.
If you want to get the build-time dependencies anyway, try:
nix-store -qR $(nix-instantiate test.nix)
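This lists the closure of the .drv file (the derivations and sources it depends on), not their built outputs, so I think it gets you one step closer but is not a complete solution. You'd probably still need to build all the derivations in this list, or something along those lines.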
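To check whichever list you end up with against what the dry run reports, it can help to turn the dry-run output into data. The following is a minimal sketch in Python, assuming the output has the shape shown in the question; `parse_dry_run` is a hypothetical helper name, not part of any Nix tooling.

```python
import re

# A store path line: optional indentation, then /nix/store/<hash>-<name>.
STORE_PATH = re.compile(r"^\s*(/nix/store/[0-9a-z]+-\S+)\s*$")

def parse_dry_run(text):
    """Split dry-run output into paths to be built and paths to be fetched."""
    result = {"built": [], "fetched": []}
    section = None
    for line in text.splitlines():
        if "will be built" in line:
            section = "built"
        elif "will be fetched" in line:
            section = "fetched"
        else:
            match = STORE_PATH.match(line)
            if match and section:
                result[section].append(match.group(1))
    return result

# Sample input abbreviated from the output in the question.
sample = """\
this derivation will be built:
  /nix/store/jr55kq3kk6va95rvjcdyn5jmh059007p-source.drv
these 48 paths will be fetched (24.52 MiB download, 121.41 MiB unpacked):
  /nix/store/02bfycjg1607gpcnsg8l13lc45qa8qj3-libssh2-1.10.0
  /nix/store/0fi0432kdh46x9kbngnmz2y7z0q68cdz-xz-5.2.5-bin
"""
paths = parse_dry_run(sample)
print(paths["fetched"])
```

The "fetched" list is the set of paths that would have to be present in your own cache for the build to avoid cache.nixos.org.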


How to issue Message Before Build--or seq problems

I'm trying to add helpful messages for arbitrary builds. If the build fails the user can, for example, install the package with different arguments.
My interface idea is to provide a function, build-with-message, that would be called with something like this:
build-with-message
  ''Building ${pkg.name}. Alternative invocations are: ..''
  pkg
My implementation is based on builtins.seq:
build-with-message = msg : pkg :
  seq
    (self.runCommand "issue-message" {} ''mkdir $out; echo ${msg}'')
    pkg;
When I build a package with build-with-message I never see the message. My hunch is that seq evaluates the runCommand far enough to see that a set is returned and moves on to building the package. I tried with deepSeq as well, but a deepSeq build fails on runCommand. I also tried calling out some attributes from the runCommand, e.g.
(self.runCommand "issue-message" {} ''mkdir $out; echo ${msg}'').drvPath
(self.runCommand "issue-message" {} ''mkdir $out; echo ${msg}'').out
My thought being that calling for one of these would prompt the rest of the build. Perhaps I'm not calling the right attribute, but in any case the ones I've tried don't work.
So:
Is there a way to force the runCommand to build in the above scenario?
Is there already some builtin that just lets me issue messages on top of arbitrary builds?
Here's me answering my own question again, consider this a warning.
Solution:
I've in-lined some numbered comments to help with the explanation.
build-with-message = msg : pkg :
  let runMsg /*1*/ = self.runCommand "issue-message"
        { version = toString currentTime; /*2*/ } ''
        cat <<EOF
        ${msg}
        EOF
        echo 0 > $out /*3*/
      '';
  in seq (import runMsg /*4*/) pkg; /*5*/
Explanation:
1. runMsg is the derivation that issues the message.
2. Adding a version based on the current time ensures that the build of runMsg will not already be in /nix/store. Otherwise, each unique message would only be issued for the first build.
3. After the message is printed, a 0 is saved to a file as the output of the derivation.
4. The import loads runMsg. Since runMsg is a derivation, it is coerced to its output path, $out. import expects a file containing a Nix expression, which in this case is just the number 0 (a valid Nix expression).
5. Now, since the runMsg output will not be available until after it has been built, seq will build it (issuing the message) and then build pkg.
Discussion:
I take note of Robert Hensing's comment on my question: this may not be something Nix was intended for. I'm not arguing against that. Moving on.
Notice that issuing a message like so will add a file to your nix store for every message issued. I don't know if the message build will be garbage collected while pkg is still installed, so there's the possibility of polluting the nix store if such a pattern is overused.
I also think it's really interesting that the result of the runMsg build was to install a nix expression. I suppose this opens the door to doing useful things.

How to not rebuild artifacts on every invocation

I want to download and build ruby within a workspace. I've been trying to implement this by mimicking rules_go. I have that part working. The issue I'm having is it rebuilds the openssl and ruby artifacts each time ruby_download_sdk is invoked. In the code below the download artifacts are cached but the builds of openssl and ruby are always executed.
def ruby_download_sdk(name, version = None):
    # TODO detect os and arch
    os, arch = "osx", "x86_64"
    _ruby_download_sdk(
        name = name,
        version = version,
    )
    _register_toolchains(name, os, arch)

def _ruby_download_sdk_impl(repository_ctx):
    # TODO detect platform
    platform = ("osx", "x86_64")
    _sdk_build_file(repository_ctx, platform)
    _remote_sdk(repository_ctx)

_ruby_download_sdk = repository_rule(
    _ruby_download_sdk_impl,
    attrs = {
        "version": attr.string(),
    },
)

def _remote_sdk(repository_ctx):
    _download_openssl(repository_ctx, version = "1.1.1c")
    _download_ruby(repository_ctx, version = "2.6.3")
    openssl_path, ruby_path = "openssl/build", ""
    _build(repository_ctx, "openssl", openssl_path, ruby_path)
    _build(repository_ctx, "ruby", openssl_path, ruby_path)

def _build(repository_ctx, name, openssl_path, ruby_path):
    script_name = "build-{}.sh".format(name)
    template_name = "build-{}.template".format(name)
    repository_ctx.template(
        script_name,
        Label("@rules_ruby//ruby/private:{}".format(template_name)),
        substitutions = {
            "{ssl_build}": openssl_path,
            "{ruby_build}": ruby_path,
        }
    )
    repository_ctx.report_progress("Building {}".format(name))
    res = repository_ctx.execute(["./" + script_name], timeout=20*60)
    if res.return_code != 0:
        print("res %s" % res.return_code)
        print(" -stdout: %s" % res.stdout)
        print(" -stderr: %s" % res.stderr)
Any advice on how I can make bazel aware such that it doesn't rebuild these build artifacts every time?
The problem is that Bazel isn't really building your Ruby and OpenSSL. When it prepares your build tree and runs the repository rule, it just executes a shell script as instructed. The script happens to build something, but that fact is essentially opaque to Bazel (and it also happens before Bazel itself would build anything).
There might be others, but off the top of my head I see the following options:
1. Pre-build your Ruby environment and consume the results as an external dependency. The obvious downside (which may or may not be quite a lot of pain) is that you need to do so for every platform you need to support (including correct platform detection and the corresponding download). The upside is that you really only build once (per platform) and also have control over the tooling used across all hosts. This would likely be my primary choice.
2. Build OpenSSL and Ruby like any other C sources, making them just another Bazel target. This, however, means you'd need to bazelify their builds (describe and maintain a Bazel build of an otherwise Bazel-unaware project).
3. Continue along the path you've started and (sort of) leave Bazel out of it. That is, extend the build scripts so they can determine that the build has already taken place and just collect its previous results, for instance by using a deterministic output location and perhaps manifest files describing what is there (which also makes corruption less likely).
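The third option boils down to a stamp (or manifest) check in front of the expensive step. Here is a minimal sketch of that idea in Python, purely for illustration: the real check would live in your build-*.sh scripts, the helper names are hypothetical, and it assumes the stamp is kept somewhere that survives Bazel re-running the repository rule.

```python
import hashlib
import os
import tempfile

def inputs_digest(paths):
    """Hash the names and contents of the build inputs in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(path.encode())
        with open(path, "rb") as handle:
            digest.update(handle.read())
    return digest.hexdigest()

def build_if_needed(inputs, stamp_path, build):
    """Run build() unless stamp_path already records the current inputs.

    Returns True if the build ran, False if it was skipped.
    """
    current = inputs_digest(inputs)
    if os.path.exists(stamp_path):
        with open(stamp_path) as handle:
            if handle.read().strip() == current:
                return False
    build()
    # Write the stamp only after build() returns, so a failed or
    # interrupted build is retried on the next invocation.
    with open(stamp_path, "w") as handle:
        handle.write(current)
    return True

# Demonstration with a throwaway directory and a fake build step.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "openssl.tar.gz")
    stamp = os.path.join(workdir, "openssl.stamp")
    with open(source, "wb") as handle:
        handle.write(b"pretend tarball")
    runs = []
    first = build_if_needed([source], stamp, lambda: runs.append("built"))
    second = build_if_needed([source], stamp, lambda: runs.append("built"))
    print(first, second, len(runs))
```

Hashing the inputs rather than only checking that an output directory exists means a changed tarball or build script triggers a rebuild, while an unchanged one is skipped.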

How do I get output files for a given Bazel target?

Ideally, I'd like a list of output files for a target without building. I imagine this should be possible using cquery which runs post-analysis, but can't figure out how.
Here's my output.cquery:
def format(target):
    outputs = target.files.to_list()
    return outputs[0].path if len(outputs) > 0 else "(missing)"
You can run this as follows:
bazel cquery //a/b:bundle --output starlark \
--starlark:file=output.cquery 2>/dev/null
bazel-out/darwin-fastbuild/bin/a/b/something-bundle.zip
For more information, see the Bazel documentation on cquery.
What exactly do you mean by "output files" here? Do you mean that you'd like to know the files generated if you build the target on the command line?
At what point would you like to have this information? Do you really want to invoke a bazel query command to acquire this information, or would you like it during analysis? I don't think there's a way, using bazel query, to get the exact expected absolute path of output files (or even the workspace-relative path, for example, bazel-out/foo/bar/baz.txt)
It may be a bit more involved than you want, but Requesting Output Files has some information about specifying output files in Starlark, with a brief bit about acquiring information about your dependencies' output files (see DefaultInfo).
I made a slight improvement to Engene's answer, since a target might have multiple outputs:
bazel cquery --output=starlark \
--starlark:expr="'\n'.join([f.path for f in target.files.to_list()])" \
//foo:bar

Is it possible to keep my Nix packages in sync across machines not running NixOS?

I know with NixOS, you can simply copy over the configuration.nix file to sync your OS state including installed packages between machines.
Is it possible then, to do the same using Nix the package manager on a non-NixOS OS to sync only the installed packages?
Please note that, at least since 30.03.2017 (corresponding to the 17.03 Nix/NixOS channel/release), the official, modern, supported and suggested solution is, as far as I understand, to use so-called overlays.
See the chapter titled "Overlays" in the nixpkgs manual for a nice guide on how to use the new approach.
As a short summary: you can put any number of files with .nix extension in $HOME/.config/nixpkgs/overlays/ directory. They will be processed in alphabetical order, and each one can modify the set of available Nix packages. Each of the files must be written as in the following pattern:
self: super:
{
  boost = super.boost.override {
    python = self.python3;
  };
  rr = super.callPackage ./pkgs/rr {
    stdenv = self.stdenv_32bit;
  };
}
The super set corresponds to the "old" set of packages (before the overlay was applied). If you want to refer to the old version of a package (as in boost above), or callPackage, you should reference it via super.
The self set corresponds to the eventual, "future" set of packages, representing the final result after all overlays are applied. (Note: don't be scared if Nix sometimes rejects a use of self because it would result in infinite recursion. In those cases you should probably just use super instead.)
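The difference between self and super is easier to see in a toy model. The following Python sketch only illustrates the idea and is not how nixpkgs implements overlays: each overlay is a function of (self, super) returning lazily evaluated entries, and the package names and version strings are made up.

```python
class LazySet:
    """Attribute set whose values are computed on first access and cached."""

    def __init__(self, thunks):
        self._thunks = thunks
        self._cache = {}

    def __getattr__(self, name):
        # Only reached for names not found normally, i.e. package names.
        if name.startswith("_") or name not in self._thunks:
            raise AttributeError(name)
        if name not in self._cache:
            self._cache[name] = self._thunks[name]()
        return self._cache[name]

def apply_overlays(base, overlays):
    """Apply overlays in order; each one is f(self, super) -> dict of thunks."""
    final = LazySet({})
    layer = dict(base)
    for overlay in overlays:
        previous = LazySet(dict(layer))  # "super": the set before this overlay
        layer.update(overlay(final, previous))  # "self": the final set
    final._thunks.update(layer)
    return final

# Hypothetical base package set.
base = {
    "python2": lambda: "python-2.7",
    "python3": lambda: "python-3.9",
    "boost": lambda: "boost(python-2.7)",
}

def boost_overlay(self_, super_):
    # Old boost comes from super_, the python to use comes from self_.
    return {"boost": lambda: super_.boost.replace("python-2.7", self_.python3)}

def python_overlay(self_, super_):
    return {"python3": lambda: "python-3.10"}

pkgs = apply_overlays(base, [boost_overlay, python_overlay])
print(pkgs.boost)
```

Because boost reads python3 through self_, it picks up the version set by the later overlay. Had it read super_.python3, it would have seen the older value from the layer below.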
Note: with the above changes, the solution I mention below in the original answer seems "deprecated" now — I believe it should still work as of April 2017, but I have no idea for how long. It appears marked as "obsolete" in the nixpkgs repository.
Old answer, before 17.03:
Assuming you want to synchronize apps per-user (as non-NixOS Nix keeps apps visible on per-user basis, not system-wide, as far as I know), it is possible to do it declaratively. It's just not well advertised in the manual — though it seems quite popular among long-time Nixers!
You must create a text file at: $HOME/.nixpkgs/config.nix — e.g.:
$ mkdir -p ~/.nixpkgs
$ $EDITOR ~/.nixpkgs/config.nix
then enter the following contents:
{
  packageOverrides = defaultPkgs: with defaultPkgs; {
    home = with pkgs; buildEnv {
      name = "home";
      paths = [
        nethack mc pstree #...your favourite pkgs here...
      ];
    };
  };
}
Then you should be able to install all listed packages with:
$ nix-env -i home
or:
$ nix-env -iA nixos.home # *much* faster than above
In paths you can list packages much as you would in /etc/nixos/configuration.nix on NixOS. Also, home is actually a "fake package" here. You can add more custom package definitions beside it, and then include them in your paths.
(Side note: I'm hoping to write a blog post with what I learned on how exactly this works, and also showing how to extend it with more customizations. I'll try to remember to link it here if I succeed.)

Comparing generated executables for equivalence

I need to compare 2 executables and/or shared objects, compiled using the same compiler/flags, and verify that they have not changed. We work in a regulated environment, so it would be really useful for testing purposes to isolate exactly which parts of the executable have changed.
Using MD5Sums/Hashes doesn't work due to the headers containing information about the file.
Does anyone know of a program or way to verify that 2 files are executionally the same even if they were built at a different time?
An interesting question. I have a similar problem on linux. Intrusion detection systems like OSSEC or tripwire may generate false positives if the hashsum of an executable changes all of a sudden. This may be nothing worse than the Linux "prelink" program patching the executable file for faster startups.
In order to compare two binaries (in the ELF format), one can use the "readelf" executable and then "diff" to compare outputs. I'm sure there are refined solutions, but without further ado, a poor man's comparator in Perl:
#!/usr/bin/perl -w
$exe = $ARGV[0];
if (!$exe) {
   die "Please give name of executable\n"
}
if (! -f $exe) {
   die "Executable $exe not found or not a file\n";
}
if (! (`file '$exe'` =~ /\bELF\b.*?\bexecutable\b/)) {
   die "file command says '$exe' is not an ELF executable\n";
}
# Identify sections in ELF
@lines = pipeIt("readelf --wide --section-headers '$exe'");
@sections = ();
for my $line (@lines) {
   if ($line =~ /^\s*\[\s*(\d+)\s*\]\s+(\S+)/) {
      my $secnum = $1;
      my $secnam = $2;
      print "Found section $1 named $2\n";
      push @sections, $secnam;
   }
}
# Dump file header
@lines = pipeIt("readelf --file-header --wide '$exe'");
print @lines;
# Dump all interesting section headers
@lines = pipeIt("readelf --all --wide '$exe'");
print @lines;
# Dump individual sections as hexdump
for my $section (@sections) {
   @lines = pipeIt("readelf --hex-dump='$section' --wide '$exe'");
   print @lines;
}
# Run a command and return its output as a list of lines
sub pipeIt {
   my($cmd) = @_;
   my $fh;
   open ($fh,"$cmd |") or die "Could not open pipe from command '$cmd': $!\n";
   my @lines = <$fh>;
   close $fh or die "Could not close pipe to command '$cmd': $!\n";
   return @lines;
}
Now you can run for example, on machine 1:
./checkexe.pl /usr/bin/curl > curl_machine1
And on machine 2:
./checkexe.pl /usr/bin/curl > curl_machine2
After having copy-pasted, SFTP-ed or NFS-ed (you don't use FTP, do you?) the files into the same filetree, compare the files:
diff --side-by-side --width=200 curl_machine1 curl_machine2 | less
In my case, differences exist in sections ".gnu.conflict", ".gnu.liblist", ".got.plt" and ".dynbss", which might be OK for a "prelink" intervention; a difference in the code section, ".text", would be a Bad Sign.
To follow up, here is what I came up with finally:
Instead of comparing the final executables & shared objects, we compared the .o files output before linking. We assumed that the linking process was sufficiently reproducible that this would be fine.
It works in some of our cases, where we have two builds between which we've made some small change that shouldn't affect the final code (e.g. running a code pretty-printer), but it doesn't help us if we do not have the intermediate build output.
You can compare the contents of RO and RW initialized sections by generating a binary file from the ELF file.
objcopy <elf_file> -O binary <binary_file>
Use the generated binary files to compare if they are identical, using diff, for example.
In my opinion, this is enough to guarantee you are generating the same executable.
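If the two generated binary files do differ, it helps to know where rather than just that they differ. A minimal sketch in Python, assuming the files are small enough to read into memory; `diff_ranges` and `compare_files` are hypothetical helper names.

```python
def diff_ranges(left, right):
    """Return (start, end) byte ranges, end exclusive, where the inputs differ."""
    ranges = []
    start = None
    common = min(len(left), len(right))
    for offset in range(common):
        if left[offset] != right[offset]:
            if start is None:
                start = offset
        elif start is not None:
            ranges.append((start, offset))
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(left) != len(right):
        # Trailing bytes present in only one input count as a difference.
        ranges.append((common, max(len(left), len(right))))
    return ranges

def compare_files(path_a, path_b):
    """Compare two files on disk, e.g. the outputs of objcopy -O binary."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return diff_ranges(a.read(), b.read())

# Two made-up byte strings that differ only at offset 5.
print(diff_ranges(b"\x7fELF\x01\x02\x03", b"\x7fELF\x01\xff\x03"))
```

An empty list means the two files are byte-for-byte identical. Otherwise the offsets can be mapped back to sections to decide whether a difference is benign (a timestamp, say) or in code.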
A few years back I had to do the same thing. We had to prove that we could rebuild the executable from source when given only a revision number, revision control repository, build tools, and build configuration. Note: If any of these change you may see a difference.
I remember there are some timestamps in the executable. The trick is to realize that the file is not just an opaque bunch of bytes that cannot be interpreted. The file has sections; most will not change, but there will be a section for the time of build (or some such thing).
I don't remember all the details, but the commands you will need are { objcopy, objdump, nm }, I think objdump would be the first to try.
Hope this helps.
