What does "DSO" stand for as in, "kernel version 460.39.0 does not match DSO version 460.56.0"? - nvidia

Searching for "DSO" in the context of "nvidia" yields "Days Sales Outstanding". Looking at a comprehensive acronym list for "DSO" yields a likely candidate: "Data Source Object" but it is for some sort of Microsoft standard that would seem to be inapplicable to Linux platforms.

From Webopedia:
A dynamic shared object (DSO) is an object file that is intended to be used simultaneously (or shared by) multiple applications while they’re executing. A DSO can be used in place of archive libraries and will minimize overall memory usage because code is shared. Two executables that use the same DSO and that run simultaneously have only one copy of the shared components loaded into memory.
[Source: Adapted from SGI]

Related

With bazel how do I be/make sure objects taken from cache have been built for the right system/libraries?

I got some strange glibc-related linker errors for builds with distributed build cache configured on build nodes running different Linux distributions.
Now I somehow suspect build artifacts from those machines with different glibc versions getting mixed up, but I don't know how to investigate this.
How do I find out what Bazel takes into account when building the hash for a certain build artifact?
I know I can explicitly set environment variables which then will affect the hash. But how can I be sure a given compiler, a certain version of glibc, etc. will lead to different hashes for built artifacts?
And how do I check/compare what's been taken into account?
This is a complex topic and a multi-faceted question. I am going to answer in the following order:
How do I check/compare what's been taken into account?
How to investigate against which glibc a build linked?
How can I be sure a given compiler, a certain version of glibc, etc. will lead to different hashes for built artifacts?
How do I check/compare what's been taken into account?
To answer this, you should look into the execution log; specifically, you can read up on https://bazel.build/remote/cache-remote#compare-logs. The *.json execution log should contain everything you need to know (granted, it might be a bit verbose) and is a little easier to process with shell magic or your editor.
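As a rough sketch of what comparing two such logs could look like: the script below reads two JSON execution logs (treated as a stream of concatenated JSON objects) and reports, per output file, which inputs have different digests. The field names `listedOutputs`, `inputs`, `path` and `digest.hash` are my assumption of how Bazel's execution log entries are rendered as JSON; check them against your own log before relying on this.

```python
import json
import sys


def parse_json_stream(text):
    """Yield each object from a string of concatenated JSON objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not accept leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def index_by_output(entries):
    """Map the first listed output of each action to {input path: digest hash}."""
    index = {}
    for entry in entries:
        outputs = entry.get("listedOutputs") or []  # assumed field name
        if not outputs:
            continue
        index[outputs[0]] = {
            item["path"]: item.get("digest", {}).get("hash", "")
            for item in entry.get("inputs", [])  # assumed field names
        }
    return index


def diff_logs(text_a, text_b):
    """Return {output: {input path: (hash_a, hash_b)}} for inputs that differ."""
    a = index_by_output(parse_json_stream(text_a))
    b = index_by_output(parse_json_stream(text_b))
    report = {}
    for output in sorted(a.keys() & b.keys()):
        paths = a[output].keys() | b[output].keys()
        changed = {
            p: (a[output].get(p), b[output].get(p))
            for p in sorted(paths)
            if a[output].get(p) != b[output].get(p)
        }
        if changed:
            report[output] = changed
    return report


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        for output, changed in diff_logs(fa.read(), fb.read()).items():
            print(output)
            for path, (ha, hb) in changed.items():
                print(f"  {path}: {ha} != {hb}")
```

An input that shows up with a different hash on two build nodes (a system header, for instance) is a candidate for what Bazel did take into account; a file that you know differs but never shows up in `inputs` is a candidate for what it did not.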
How to investigate against which glibc a build linked?
From the execution log, you can get all the required hashes to retrieve cached artifacts/binaries from your remote cache. Given these files, you should be able to use standard tools to get to the glibc version (ldd -r -v binary | grep GLIBC).
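If you have many binaries to check, the grep step can be scripted. The following is only a sketch: it takes text such as the output of `ldd -v` and returns the highest `GLIBC_x.y` symbol version mentioned, which is the oldest glibc the binary can run against. The sample output in `SAMPLE` is made up for illustration and the function names are mine.

```python
import re

# Matches versioned references such as GLIBC_2.34 or GLIBC_2.2.5.
GLIBC_RE = re.compile(r"GLIBC_(\d+(?:\.\d+)+)")


def required_glibc_versions(ldd_output):
    """Return the sorted list of GLIBC versions (as int tuples) found in the text."""
    versions = {
        tuple(int(part) for part in match.split("."))
        for match in GLIBC_RE.findall(ldd_output)
    }
    return sorted(versions)


def minimum_glibc(ldd_output):
    """Highest referenced version, i.e. the oldest glibc that satisfies the binary."""
    versions = required_glibc_versions(ldd_output)
    return ".".join(map(str, versions[-1])) if versions else None


# Illustrative text only, shaped like the "Version information" part of ldd -v.
SAMPLE = """
    Version information:
    ./binary:
        libc.so.6 (GLIBC_2.4) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.34) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libc.so.6
"""

if __name__ == "__main__":
    print(minimum_glibc(SAMPLE))
```

Versions are compared as integer tuples rather than strings, so 2.34 correctly sorts above 2.4.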
How can I be sure a given compiler, a certain version of glibc, etc. will lead to different hashes for built artifacts?
This depends on how you have set up your compilation toolchain. The best case would be a fully hermetic compilation toolchain, where all necessary files are declared using attributes like https://bazel.build/reference/be/c-cpp#cc_toolchain.compiler_files.
But this would also mean locking down the compiler sysroot. This should include all libraries you are linking against if you want full hermeticity. If you want to use some system libraries, you need to tell Bazel where to find them and to factor in their hash: https://stackoverflow.com/a/43419786/20546409 or https://www.stevenengelhardt.com/2021/09/22/practical-bazel-depending-on-a-system-provided-c-cpp-library/
If you use the auto-detected compiler toolchain, some tricks are used to lock down the sysroot paths, but expect some non-hermeticity. https://github.com/limdor/bazel-examples/tree/master/linux_toolchain is a nice write-up on how to move from the auto-detected toolchain to something more hermetic.
The hack
Of course, you can hack around this. Note, this is inherently a bad idea:
create a script that inspects the system and determines everything important, like the glibc version and maybe the Linux distribution (flavor)
have it create a string describing this variation and hash it
use that hash as the instance key/name for your remote cache
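A minimal sketch of such a script, with the same caveat that this is a hack: it collects the machine architecture, the libc version and the distribution ID, joins them into a canonical string and hashes it. Which properties are "important" is an assumption on my part, and the function names are made up; extend `describe_system` with whatever differs between your build nodes.

```python
import hashlib
import platform


def read_os_release(path="/etc/os-release"):
    """Parse KEY=value pairs from os-release; return {} if the file is missing."""
    fields = {}
    try:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                fields[key] = value.strip('"')
    except OSError:
        pass
    return fields


def describe_system():
    """Collect the properties assumed to matter for binary compatibility."""
    libc_name, libc_version = platform.libc_ver()
    os_release = read_os_release()
    return {
        "machine": platform.machine(),
        "libc": f"{libc_name}-{libc_version}",
        "distro": os_release.get("ID", "unknown"),
        "distro_version": os_release.get("VERSION_ID", "unknown"),
    }


def cache_instance_name(description):
    """Hash a canonical (sorted) rendering of the description into a short key."""
    canonical = ";".join(f"{key}={description[key]}" for key in sorted(description))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


if __name__ == "__main__":
    print(cache_instance_name(describe_system()))
```

The printed key could then be passed to Bazel, for example through its `--remote_instance_name` option, so that nodes with different system libraries stop sharing cache entries. Note that this only partitions the cache; it does not make the builds hermetic.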

What is a snapshot in Dart or compilation (not Flutter)?

I'm reading about dart compile, and it has a few options: Executable, AOT snapshot, JIT snapshot, Kernel snapshot and JavaScript.
What is the difference between an executable and a snapshot? Is it purely the fact that executables contain the Dart runtime/VM, whereas a snapshot doesn't? Why is it called a snapshot?
Two highly related questions (which I found after posting this question) are:
What is the difference between Dart's snapshots and Java bytecode?
What is the snapshot concept in dart?
This question is different from "Dart: Snapshots vs AOT", since that one asks about the difference between a snapshot and AOT, but AOT files actually are snapshots. It also primarily asks about the differences between the snapshot options (AOT, Kernel, JIT).
An executable (created by dart compile exe) is a combination of an AOT snapshot, and the Dart runtime. The Dart runtime is needed to run any Dart code, as it performs critical tasks like managing memory (including garbage collection) and performing runtime type checks.
The three kinds of snapshots (AOT, kernel, and JIT) all contain just the compiled source code. They all need a runtime to be run (typically you just use dart run <snapshot>).
The snapshots should perhaps have been named differently. Would 'module' have been easier to understand?
When Dart code is compiled, it is compiled into intermediary byte code rather than native code. This means that in order for the code to be executed in any arbitrary environment, the compiler needs to include the Dart virtual machine inside the executable along with the user code as well as any portions of the core library the program requires. Usually, this is fine because the Dart runtime is actually quite compact, but the obvious downside to this is that the executable will be larger and start-up time will be longer as the runtime needs to be extracted and warmed up before the user code can be run.
If, however, you are compiling code for an environment where you can guarantee that a Dart runtime will be present (such as a server machine or an IoT device), you can omit the runtime from the compiled program by building a snapshot file rather than an executable. This results in a smaller compiled file with a faster start-up time, although it requires a command to execute and as such is less convenient. You can learn more about snapshots and how to build and execute them on the Dart GitHub wiki page on snapshots.
There are three different kinds of snapshots: Kernel snapshots, which contain only AST information rather than compiled byte code, making them usable by Dart runtimes on any supported architecture (portable but slow); JIT snapshots, which contain just the parts of the program necessary for startup and leave the rest to be interpreted at runtime (fastest startup but slower execution); and AOT snapshots, which fully compile the entire program into byte code (slower startup but fastest execution).
As for why it's called a "snapshot", I couldn't say. If I had to guess, that's because it's a "snapshot" of the program in its compiled state but without the instructions necessary to run it as a standalone executable.
(The above is based on my quick research on the subject and may be missing some key details. If a member of the Dart team happens upon this question, they will probably be able to offer a more detailed and technical explanation.)

How do compilers create the executable file at the end of the compilation process?

I've been reading up on the compilation process. I understand some of the earlier concepts like parsing, but I stop short of understanding how the executable file is created at the end.
In the examples I've seen around, the "compiler" takes input in the form of a language defined by BNF and then, upon parsing it, outputs assembly.
Is the executable file literally just that assembly in binary form? I feel like this can't be the case, given that there are applications for making executables from assembly.
If this isn't answerable (i.e. it's too complex for the Stack Overflow format), I'd totally be happy with links/books so I can educate myself.
The compiler (or more specifically, the linker) creates the executable.
The format of the file generally varies depending on the operating system.
There are currently two main formats: ELF and COFF.
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
http://en.wikipedia.org/wiki/COFF
If you understand the concept of a structure, this is the same, only within a file. Each file has a first structure called a header, and from there you can access the other structures as required.
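To make the "structure in a file" idea concrete, here is a small sketch that reads the first structure of an ELF file, the header, and pulls a few fields out of it. The field layout follows the ELF format description linked above; the function name and the choice of which fields to return are mine.

```python
import struct
import sys

# e_type values; 3 (ET_DYN) is used for shared objects and for PIE executables.
ELF_TYPES = {
    1: "relocatable object",
    2: "executable",
    3: "shared object or PIE executable",
    4: "core dump",
}


def parse_elf_header(data):
    """Decode the leading ELF header fields from the first bytes of a file."""
    if len(data) < 16 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported ELF class or byte order")
    endian = "<" if ei_data == 1 else ">"  # 1 = little-endian, 2 = big-endian
    addr = "I" if ei_class == 1 else "Q"   # 32-bit or 64-bit address fields
    # Fields after the 16-byte e_ident:
    # e_type, e_machine, e_version, e_entry, e_phoff, e_shoff
    layout = endian + "HHI" + addr * 3
    e_type, e_machine, _version, e_entry, e_phoff, e_shoff = struct.unpack_from(
        layout, data, 16
    )
    return {
        "bits": 32 if ei_class == 1 else 64,
        "type": ELF_TYPES.get(e_type, f"unknown ({e_type})"),
        "machine": e_machine,
        "entry": e_entry,
        "program_headers_at": e_phoff,
        "section_headers_at": e_shoff,
    }


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as handle:
        print(parse_elf_header(handle.read(64)))
```

The two offsets at the end are how "from there you can access the other structures": they point at the program header table and the section header table elsewhere in the same file.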
In most cases, only the resulting binary code is saved in these files, although you often find debug information. Some formats could save the source alongside the code, but nowadays they only save the necessary references to the source.
With dynamic linking, you also find symbol tables that include the actual symbol name. Otherwise, only relocation tables would be required.
Under the Amiga we also had the possibility to define code in a "segment". Only one segment could be loaded at a time. Once you were done with the segment, you could unload it and load another. Yet, in the end the concepts were similar. Structures in a file.
Microsoft offers a PDF about the COFF format. I could not find it on their website just now, but it looks like others have it. ELF has many links in the Wikipedia page so you should be able to find a PDF to get started.
Not all, but some compilers (gcc, etc.) go from the high-level language to assembly language and then spawn the assembler. The assembler reads the assembly language, generates machine code and writes an object file which, as you have guessed, contains more than just the machine code bits. If you think about it for a second, you may realize that a variable or function can be defined in another source file, which means its code lives in another object file. Until link time, one object doesn't know how to get at that external function, so 1) the machine code is not finished, because patching up external addresses is not done until link time, and 2) there needs to be some information in the object file that defines which public items are in this object file and which external items are missing, for example names of functions, which are obviously not embedded in the machine code.
So the objects have machine code in various states of completion, as well as other data needed by the linker. The linker then...links...the objects together into one program with everything resolved. It basically completes all the machine code and puts the fragments of machine code (from the separate objects) into one place. Then it has to save all that on disk in some format, and typically that format is not just raw machine code. It has extra stuff in the file, starting with a header and then a way to define each binary blob and where it needs to live in memory before executing.
When you run a program on the command line of your operating system, or by double-clicking it in a file manager GUI, the operating system knows how to read that file format, extract the blobs of binary, place those blobs in RAM at the locations defined by the file format, and then start executing at the place defined by the file format.
a.out, ELF, COFF, Intel HEX and Motorola S-record are all popular formats, as is raw binary, which some toolchains can produce. The GNU tools will default to one (COFF, ELF, EXE or a.out), and objcopy is then used to convert from one to another, or at least from the default one to the others; its help shows what your possible choices are. Then simply search Google or Wikipedia for those to find the definitions of the file formats. Intel HEX and Motorola S-record are good ones to start with on Wikipedia, then ELF perhaps.
If you want to produce a native executable file, you have two options: you can assemble the binary form yourself, or you can translate your program to another language and use its compiler to produce the executable.
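The linking step described here can be shown with a toy model. This is not how any real linker stores its data; the object layout (a `code` list, a `defines` table of public symbols and a `relocs` list of unresolved references) and the instruction names are invented for illustration only.

```python
def link(objects, load_address=0):
    """Concatenate toy object files and patch in final symbol addresses."""
    image = []
    symbols = {}
    bases = []
    # Pass 1: place each object and record where its public symbols end up.
    for obj in objects:
        base = load_address + len(image)
        bases.append(base)
        for name, offset in obj["defines"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = base + offset
        image.extend(obj["code"])
    # Pass 2: fill every placeholder slot with the address of the symbol it names.
    for obj, base in zip(objects, bases):
        for offset, name in obj["relocs"]:
            if name not in symbols:
                raise ValueError(f"undefined symbol: {name}")
            image[base - load_address + offset] = symbols[name]
    return image, symbols


# main.o calls a function it does not define; slot 1 is an unfinished address.
main_o = {
    "code": ["CALL", 0, "HALT"],
    "defines": {"main": 0},
    "relocs": [(1, "helper")],
}
# helper.o defines that function and has nothing left to resolve.
helper_o = {
    "code": ["PUSH", 42, "RET"],
    "defines": {"helper": 0},
    "relocs": [],
}

if __name__ == "__main__":
    image, symbols = link([main_o, helper_o], load_address=0x1000)
    print(image)
    print(symbols)
```

Before linking, `main_o` is exactly the "machine code in various states of completion" from the answer: the call target is a placeholder plus a note saying which name belongs there. After linking, the placeholder holds the address where `helper` was placed.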

How to obtain a file with the content of all include files explicitly included?

George "Mirage" Bakhtadze, the author of the Cast II engine, has written about an include-based technique which can be used to create generic containers and algorithms. The source is available from the repo on GitHub. For me, his include-based technique is very interesting and useful, because it can be used with older Delphi versions and it is compatible between Delphi and Free Pascal (and ready for non-Windows OSes).
It would be more useful for me if the _GenVector written in "gen_coll_vector.inc" had Sorted & Duplicates properties and related behaviors (behaving the same way as in TStringList).
However, it is not obvious to me where to insert the code when there are so many include directives (I wonder how George managed this in the first place). Therefore, I wonder whether it is possible to obtain a sample file with all include files explicitly included? It might be more straightforward for me to start from there.
In other words, is there a built-in pre-processor that runs before the actual compilation, and is there a way to keep these intermediate files?
Delphi does not use a pre-processor. It is (and always has been, since Turbo Pascal days) a single-pass compiler. There is no intermediate step. When you {$I} to include files, they are inserted in place in memory during the compilation process. Therefore, there is no "intermediate file" that can be kept.

What is the difference between Dart's snapshots and Java bytecode?

I've been reading up on Dart snapshots, and they're frequently compared to Smalltalk images. But to me, they sound a lot like Java bytecode.
For example:
"A Dart snapshot is just a binary serialization of the token stream, generated from parsing the code. A snapshot is not a "snapshot of a running program", it's generated before the tokens are turned into machine code. So, no program state is captured in a snapshot."
Plus they're cross-platform:
"The snapshot format itself is cross-platform meaning that it works between 32-bit, 64-bit machines and so forth. The format has been made so that it's quick to read into memory with a emphasis on minimizing extra work like pointer fixups."
Am I getting it wrong somewhere?
Sources:
What is the snapshot concept in dart?
http://www.infoq.com/articles/google-dart
Snapshots contain the VM data structures representing the loaded script in a serialized form similar to Smalltalk images. To get a better understanding of what is contained in the snapshot, we should take a look at what the Dart VM creates as it reads the script:
Library objects, referring to all top-level structures such as classes or top-level methods and variables.
Class objects, containing all objects describing all methods and fields.
Script and Tokenstream objects representing all loaded source code.
String objects for all used identifiers and string constants in the source code.
This object graph is serialized into a file when generating a snapshot using a format that is architecture agnostic. This allows the Dart VM to deserialize this snapshot file on 32-bit or 64-bit machines and recreate all of the necessary internal VM data structures much quicker than reading the original scripts from a set of files (see John's answer).
To clarify John's answer a bit: the Dart VM does not parse ALL of the source code when generating the snapshot. It only needs to parse the top level of the sources to be able to extract class, method and field definitions, as these are represented in the serialized graph. In particular, method bodies are not parsed and, as is customary for a scripting language, errors will only be reported once control reaches the particular method.
The purpose of Java bytecodes is entirely different as Ladicek points out. You could create a snapshot of the VM data structures in a JVM once the bytecodes are loaded to get a similar effect.
In short: The snapshot contains an efficient representation of all the data structures allocated on the Dart VM heap which are needed to start executing the script.
A Dart snapshot is just a roll up of all source files that has been parsed ahead of time. A Dart snapshot is not similar to a Java bytecode file. A Java bytecode file consists of JVM machine code and is the product of a compile, link, and assembly (into JVM machine code) phase.
A Dart snapshot is a binary file of a Dart program and its import/part source file dependencies that has been parsed into an abstract syntax tree and rolled into a single file. Executing a Dart snapshot allows for faster startup times because:
Only 1 file must be loaded from disk or off network. In contrast, a non-snapshot Dart program must be fetched, then any imported files must be fetched, and so on. Before each subsequent source file request can be made the previously fetched source file must be parsed to find out if it's referencing more source files. Imagine if your Dart program imported 10 libraries which consisted of 10 source files each. That means 110 I/O requests and parses that are done one after another.
The parsing has been done ahead of time. It's already known to be syntactically correct and ready to be compiled by the Dart VM.
I will just point out that as of Dart 2+, there are several distinct concepts when it comes to snapshots:
Kernel Snapshot
JIT Snapshot
AOT Snapshot
You can read more here.
