Can the same code and same compiler produce a different binary on different machines? - nix

The idea of NixOS binary caches led me to consider this question.
In Nix, every compiled binary is associated with a hash key that is obtained by hashing all the dependencies and the build script, i.e. a 'derivation' in Nix-speak. That is my understanding, anyway.
But couldn't the same derivation lead to different binaries when compiled on different machines?
If machine A's processor has a slightly different instruction set than machine B's processor, and the compiler took this different instruction set into account, wouldn't the binary produced by compiling the derivation on machine A be distinguishable from the binary produced by compiling the derivation on machine B? If so, couldn't different binaries have the same derivation and thus the same Nix hash?
Does the same derivation built on machines with different instruction sets always produce the same binary?

This depends on the compiler implementation and the options passed to it. For example, GCC by default does not seem to pay attention to the specifics of the current processor unless you specify -march=native or -mtune=native.
So yes, if you use flags like these, or a compiler whose default behavior matches these flags, you will get different output on a machine with a different CPU model.
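To see this concretely, you can ask GCC what -march=native resolves to on a given machine. A minimal sketch using standard GCC options (exact output varies with the CPU and GCC version):

gcc -march=native -Q --help=target | grep -- '-march='   # the architecture -march=native resolved to
gcc -march=native -Q --help=target | grep -- '-mtune='   # the tuning target

Two machines with different CPUs will typically print different values here, and the generated code can differ accordingly.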
A build can be non-reproducible for other reasons as well, such as inappropriate use of clock values or random values, or even counters that are accessed by threads in non-deterministically interleaved order.
Nix does provide a sandbox that removes some sources of entropy, primarily the supposedly unrelated software that may be present on a machine. For practical reasons it does not remove all such sources.
For these reasons, reproducibility remains a consideration even when packaging with Nix; it is not something Nix solves completely.
I'll quote the "Achieve deterministic builds" menu from https://reproducible-builds.org/docs/ and annotate it with the effect of Nix, to the best of my knowledge; don't quote me on this. (A practical way to check reproducibility is sketched after the list.)
SOURCE_DATE_EPOCH: solved; set by Nixpkgs
Deterministic build systems: partially solved; Nixpkgs may include patches
Volatile inputs can disappear: solvable with Nix if you upload sources to the (binary) cache. Hercules CI does this.
Stable order for inputs: mostly solved. Nix language preserves source order and sorts attributes.
Value initialization: low-level problem not solved by Nix
Version information: not solved; clock is accessible in sandbox
Timestamps: same as above
Timezones: solved by sandbox
Locales: solved by sandbox
Archive metadata: not solved
Stable order for outputs: use of randomness not solvable by sandbox
Randomness: same
Build path: partially; Linux uses /build, while macOS may differ depending on installation method
System images: broad issue taking elements from previous items
JVM: same
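As a practical check of whether a given derivation builds reproducibly on your machine, you can build it twice and let Nix compare the outputs. This is a minimal sketch using standard nix-build flags, with hello as an example package:

nix-build '<nixpkgs>' -A hello           # build once
nix-build '<nixpkgs>' -A hello --check   # rebuild and fail if the output differs

If the check fails, a tool such as diffoscope can show exactly where the two outputs diverge.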

Related

With Bazel, how do I make sure objects taken from the cache have been built for the right system/libraries?

I got some strange glibc-related linker errors for builds with a distributed build cache configured on build nodes running different Linux distributions.
I now suspect that build artifacts from machines with different glibc versions are getting mixed up, but I don't know how to investigate this.
How do I find out what Bazel takes into account when building the hash for a certain build artifact?
I know I can explicitly set environment variables which then will affect the hash. But how can I be sure a given compiler, a certain version of glibc, etc. will lead to different hashes for built artifacts?
And how do I check/compare what's been taken into account?
This is a complex topic and a multi-faceted question. I am going to answer in the following order:
How do I check/compare what's been taken into account?
How to investigate against which glibc a build linked?
How can I be sure a given compiler, a certain version of glibc, etc. will lead to different hashes for built artifacts?
How do I check/compare what's been taken into account?
To answer this, you should look into the execution log; specifically, you can read up on https://bazel.build/remote/cache-remote#compare-logs. The *.json execution log should contain everything you need to know (granted, it might be a bit verbose) and is a little easier to process with shell magic or your editor.
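For example, you might record the log for the same target on both machines and diff the two files. The flag below is the one covered on that page, but verify it against your Bazel version; also note that action order in the log can differ between runs, so sort before comparing. //your:target and the file paths are illustrative:

bazel build //your:target --execution_log_json_file=/tmp/exec_machine_A.json
bazel build //your:target --execution_log_json_file=/tmp/exec_machine_B.json   # run on the other machine
diff <(jq -S . /tmp/exec_machine_A.json) <(jq -S . /tmp/exec_machine_B.json)   # jq only for stable formatting

Lines that differ in the input digests point at the files (for example system headers or libraries) that differ between the two machines.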
How to investigate against which glibc a build linked?
From the execution log, you can get all the required hashes to retrieve cached artifacts/binaries from your remote cache. Given these files, you should be able to use standard tools to get to the glibc version (ldd -r -v binary | grep GLIBC).
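A rough sketch of that, assuming an HTTP remote cache exposing the common cas/<sha256> layout (as bazel-remote does; adjust the URL and layout to your cache) and a digest copied from the execution log:

CACHE=http://my-remote-cache:8080                      # illustrative cache endpoint
DIGEST=<sha256 of the binary, from the execution log>  # placeholder
curl -sf "$CACHE/cas/$DIGEST" -o cached-binary
ldd -r -v cached-binary | grep GLIBC                   # which glibc symbol versions it requires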
How can I be sure a given compiler, a certain version of glibc, etc. will lead to different hashes for built artifacts?
This depends on how you have set up your compilation toolchain. The best case would be a fully hermetic compilation toolchain, where all necessary files are declared using attributes like https://bazel.build/reference/be/c-cpp#cc_toolchain.compiler_files.
But this would also mean locking down the compiler sysroot. This should include all libraries you are linking against if you want full hermeticity. If you want to use some system libraries, you need to tell Bazel where to find them and to factor in their hash: https://stackoverflow.com/a/43419786/20546409 or https://www.stevenengelhardt.com/2021/09/22/practical-bazel-depending-on-a-system-provided-c-cpp-library/
If you use the auto-detected compiler toolchain, some tricks are used to lock down the sysroot paths, but expect some non-hermeticity. https://github.com/limdor/bazel-examples/tree/master/linux_toolchain is a nice write-up on how to move from the auto-detected toolchain to something more hermetic.
The hack
Of course, you can hack around this. Note that this is inherently a bad idea:
Create a script that inspects the system and determines everything important, such as the glibc version and maybe the Linux distribution (flavor).
Have it build a string describing this variation and hash it.
Use that hash as the instance key/name for your remote cache (a sketch follows below).
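A minimal sketch of that hack, assuming a glibc-based Linux system; all names are illustrative, and --remote_instance_name is the standard Bazel flag for partitioning a remote cache:

#!/usr/bin/env sh
# Fingerprint the ABI-relevant properties of this machine.
GLIBC=$(ldd --version | head -n1)
DISTRO=$(. /etc/os-release; echo "$ID-$VERSION_ID")
KEY=$(printf '%s|%s' "$GLIBC" "$DISTRO" | sha256sum | cut -d' ' -f1)
# Partition the remote cache by that fingerprint so incompatible machines never share artifacts.
bazel build //your:target --remote_instance_name="$KEY"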

linux hdf5 build exceptionally noisy -- reasonable?

I recently downloaded the source for HDF5 (hdf5-1.12.1) and am building it on Ubuntu 20.04. The number of compiler warnings is remarkable: the messages reference -Wnull-dereference, -Wstringop-truncation, -Wcast-qual, -Wstrict-overflow, and so on.
It's possible my system is configured by default to emit more warnings than usual, but even so, I wouldn't expect something with this wide a distribution to fail so many warning checks. Any thoughts, beyond "don't run with strict checking"? If anyone is on the source side of HDF5, is this something you are aware of?

Handling complex and large dependencies

Problem
I've been developing a game in C++ in my spare time, and I've opted to use Bazel as my build tool since I have never had a ton of luck (or fun) working with make or cmake. I also have dependencies in other languages (Python for some of the high-level scripting). I'm using glfw for basic window handling and high-level graphics support, and that works well enough, but now comes the problem: I'm uncertain how I should handle dependencies like glfw in a Bazel world.
For some of my dependencies (like gtest and fruit) I can just reference them in my WORKSPACE file and Bazel handles them automagically but glfw hasn't adopted Bazel. So all of this leads me to ask, what should I do about dependencies that don't use Bazel inside a Bazel project?
Current approach
For many of the simpler dependencies I have, I simply created a new_git_repository entry in my WORKSPACE file and created a BUILD file for the library. This works great until you get to really complicated libraries like glfw that have a number of dependencies on their own.
When building glfw for a Linux machine running X11, you now have a dependency on X11, which means adding X11 to my Bazel setup. X11 comes with its own set of dependencies (the X11 libraries like Xcursor) and so on.
glfw also tries to provide basic joystick support, which is provided by default on Linux, which is great! Except that this is provided by the kernel, which means the kernel is also a dependency of my project. Now, I shouldn't need anything more than the kernel headers, but this still seems like a lot to bring in.
Alternative Options
The reason I took the approach I've taken so far is to make the dependencies required to spin up a machine that can successfully build my game very minimal. In theory they just need a C/C++ compiler, Java 8, and Bazel and they're off to the races. This is great since it also means I can create a Docker container that has Bazel installed and do CI/CD really easily.
I could sacrifice this ease and just say that you need to have libraries like glfw installed before attempting to compile the game, but that brings back the whole "which version is installed and how is it configured" problem that Bazel is supposed to help solve.
Surely there is a simpler solution and I'm overthinking this?
If the glfw project has no BUILD files, then you have the following options:
Build glfw inside a genrule.
If glfw supports some other build system like make, you could create a genrule that runs the tool. This approach has obvious drawbacks, like the not-to-be-underestimated impracticality of having to declare all inputs of that genrule, but it'd be the simplest way of Bazel'izing glfw. (A sketch of the kind of command such a genrule might run appears after the last option below.)
Pre-build glfw.o and check it into your source tree.
You can create a cc_library rule for it, and put the .o file in the srcs. Even though this solution is the least flexible of all because you not only restrict the target platform to whatever the .o was built for, but also make it harder to reproduce the whole build, the benefits are sometimes worth the costs.
I view this approach as a last resort. Even in Bazel's own source code there's one cc_library.srcs that includes a raw object file, because it was worth it, as the commit message of 92caf38 explains.
Require that glfw be installed.
You already considered this option. Some people may prefer this to the other approaches.
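As mentioned under the first option, here is a rough sketch of the command such a genrule might run, assuming glfw is fetched with new_git_repository and built with its own CMake build system; the repository path, CMake options, and output location are illustrative and would need to be wired into the genrule's srcs, outs, and cmd:

# Configure and build glfw's static library via its CMake build, then copy the result to the declared output.
cmake -S external/glfw -B glfw-build -DGLFW_BUILD_EXAMPLES=OFF -DGLFW_BUILD_TESTS=OFF -DGLFW_BUILD_DOCS=OFF
cmake --build glfw-build
cp glfw-build/src/libglfw3.a "$OUT_DIR"/             # OUT_DIR is a placeholder for the genrule's output location

A cc_library could then wrap the resulting libglfw3.a (plus glfw's headers) so the rest of the build can depend on it like any other Bazel target.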

How to re-use existing CMake variables with new generator

I need to build OpenCV for both 32-bit and 64-bit in VS2015.
I'm aware that I need a separate build tree for each generator.
OpenCV's CMake configuration has approximately 300 user-configurable variables, which I have finally got set to my satisfaction. Now I want to use the exact same set of decisions to build the 64-bit version.
Is there a way to transfer the variable values that represent my decisions to the new build tree? (Other than opening two CMake-GUIs side by side and checking that all ~300 values correspond.)
BTW, if the generator is changed, CMakeCache.txt must be deleted, according to the CMake mailing list [ http://cmake.3232098.n2.nabble.com/Changing-the-the-current-generator-in-CMake-GUI-td7587876.html ]. Manually editing it is very risky and will likely lead to undefined behaviour.
Thanks
Turning my comment into an answer
You can use a partial CMakeCache.txt in the new directory (CMake will just pre-load the values that are there and reevaluate the rest).
So you can use a grep-like approach and do
findstr "OpenCV_" CMakeCache.txt > \My\New\Path\CMakeCache.txt
Just tested it, and it seems to work as expected.
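After seeding the new build tree this way, you would configure it with the 64-bit generator; the paths below are illustrative, and "Visual Studio 14 2015 Win64" is the standard CMake generator name for 64-bit VS2015 builds:

cd \My\New\Path
cmake -G "Visual Studio 14 2015 Win64" C:\path\to\opencv\source

An alternative to copying cache lines is to turn your decisions into a script of set(... CACHE ...) commands and pass it to the new tree with cmake -C, which CMake loads as an initial cache.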
Reference
What are good grep tools for Windows?

how does nix know what binary a machine needs?

It is described in the Nix documentation:
However, Nix can automatically skip building from source and instead use a binary cache, a web server that provides pre-built binaries. For instance, when asked to build /nix/store/b6gvzjyb2pg0…-firefox-33.1 from source, Nix would first check if the file https://cache.nixos.org/b6gvzjyb2pg0….narinfo exists, and if so, fetch the pre-built binary referenced from there; otherwise, it would fall back to building from source.
I wonder how Nix knows what kind of binary my machine needs, say if I run Nix on ARM or Intel or AMD, you name it. How are the binaries from this cache selected for the right architecture?
Nix is a purely functional package manager. As such, it takes everything a package needs as input to a function, including the so-called stdenv, which contains information about the architecture.
Using these inputs, Nix generates a hash; this is what you see in front of the package name in the store path.
So the architecture is encoded in that hash, which is why only the hash needs to be checked when downloading from a binary cache.
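A small sketch of how to see this, assuming a recent Nix with flakes and the new nix command enabled (the package is just an example; older versions use nix show-derivation instead):

nix derivation show nixpkgs#hello | grep '"system"'
# e.g.  "system": "x86_64-linux"   on an Intel/AMD machine, or "aarch64-linux" on ARM

Because this system value is one of the hashed inputs, an x86_64 build and an aarch64 build of the same package end up at different store paths, so the binary cache is queried with the right hash automatically.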

Resources