Nix: How to override stdenv.cc with overlay globally?

I would like to override stdenv.cc with a specific GCC version (not necessarily one in nixpkgs) globally, using an overlay (i.e. without changing nixpkgs). Is there a way to do that?
An overlay like this causes an infinite recursion (since package gcc49 has stdenv as input):
self: super:
{
  stdenv = super.overrideCC super.stdenv super.gcc49;
}
What is the correct way to change the stdenv.cc globally?
Manually setting stdenv = ... when importing nixpkgs is not feasible, because I want to replace the cc not only when building or using Nix expressions but also in, e.g., nix-shell -p package.
Can someone help me with this?

(import <nixpkgs> { overlays = [(self: super: { gcc = self.gcc10; })]; }).stdenv.cc
This returns a derivation for gcc-10.1.0, so it could work.
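Why does the overlay in the question loop even though it uses super.gcc49? As far as I understand the nixpkgs fixpoint, super is only the previous layer, but the packages in that layer were themselves built from the final set (self), so super.gcc49 still receives the overridden stdenv. The Java toy model below reproduces this. All names (OverlayModel, Def, the "gcc-9"/"gcc-10" strings) are hypothetical and this is not how nixpkgs is implemented: each attribute is computed lazily from (self, super), and re-entering an attribute that is still being computed is reported as infinite recursion. In the model gcc10 is a leaf that does not depend on stdenv; whether the real attribute behaves that way depends on how nixpkgs bootstraps its stdenv, which this sketch does not capture.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

/** Toy model of nixpkgs overlays (hypothetical; not the real nixpkgs code). */
final class OverlayModel {
    /** Read-only view of an attribute set whose values are computed on demand. */
    interface Attrs { String get(String name); }

    /** An attribute definition, given the final set (self) and the previous layer (super). */
    interface Def { String eval(Attrs self, Attrs sup); }

    /** Computes the fixpoint of the layers; later layers override earlier ones. */
    static Attrs fix(List<Map<String, Def>> layers) {
        Map<String, String> cache = new HashMap<>();
        Set<String> inProgress = new HashSet<>();
        Attrs[] self = new Attrs[1];
        Attrs top = upTo(layers, layers.size() - 1, self);
        self[0] = name -> {
            if (cache.containsKey(name)) return cache.get(name);
            // Re-entering an attribute that is still being computed: Nix's "infinite recursion".
            if (!inProgress.add(name))
                throw new IllegalStateException("infinite recursion encountered at '" + name + "'");
            try {
                String value = top.get(name);
                cache.put(name, value);
                return value;
            } finally {
                inProgress.remove(name);
            }
        };
        return self[0];
    }

    /** View of layers[0..i], i.e. "super" for layer i + 1. Every layer still sees the FINAL self. */
    private static Attrs upTo(List<Map<String, Def>> layers, int i, Attrs[] self) {
        if (i < 0) return name -> { throw new NoSuchElementException("attribute '" + name + "' missing"); };
        Attrs sup = upTo(layers, i - 1, self);
        Map<String, Def> layer = layers.get(i);
        return name -> layer.containsKey(name) ? layer.get(name).eval(self[0], sup) : sup.get(name);
    }

    /** Base set: stdenv uses self.gcc, and gcc49 is built with self.stdenv. */
    static Map<String, Def> base() {
        Map<String, Def> m = new HashMap<>();
        m.put("gcc", (self, sup) -> "gcc-9");
        m.put("gcc10", (self, sup) -> "gcc-10");
        m.put("gcc49", (self, sup) -> "gcc-4.9 built with " + self.get("stdenv"));
        m.put("stdenv", (self, sup) -> "stdenv(cc=" + self.get("gcc") + ")");
        return m;
    }

    /** Models `stdenv = overrideCC super.stdenv super.gcc49;` */
    static String brokenOverlay() {
        Map<String, Def> overlay = new HashMap<>();
        overlay.put("stdenv", (self, sup) -> "stdenv(cc=" + sup.get("gcc49") + ")");
        return fix(List.of(base(), overlay)).get("stdenv");
    }

    /** Models `gcc = self.gcc10;` */
    static String workingOverlay() {
        Map<String, Def> overlay = new HashMap<>();
        overlay.put("gcc", (self, sup) -> self.get("gcc10"));
        return fix(List.of(base(), overlay)).get("stdenv");
    }
}
```

In the model, brokenOverlay() throws "infinite recursion encountered at 'stdenv'", while workingOverlay() returns "stdenv(cc=gcc-10)" because it replaces the attribute stdenv reads instead of making stdenv depend on a package that depends on stdenv.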

Related

What's the mechanism behind `(import nixpkgs) { ... }` in Nix flakes?

I'm working to understand as much as I can about Nix flakes. I'm puzzled by the fact that a nixpkgs input is usually imported, and the imported value is called as a function. How does the result of import nixpkgs map to code in the nixpkgs flake?
It looks like this use of nixpkgs is common practice in flakes:
# flake.nix
{
  inputs = {
    flake-utils.url = "github:numtide/flake-utils";
    nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
    /* ... */
  };
  outputs = { self, flake-utils, nixpkgs /*, ... */ }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        pkgs = (import nixpkgs) {
          inherit system;
        };
      in
      {
        /* ... */
      }
    );
}
My understanding is that the nixpkgs value in this flake's outputs function is the attribute set produced by the nixpkgs flake. I understand that a flake output is a derivation, and that a derivation can be imported. But how does the imported value become a function? I expected it to be an attribute set.
I see that the nixpkgs flake includes a lib output. Is there some mechanism where an attribute with a lib attribute path is callable? I have been looking for information on this, but I have not found anything.
If (import nixpkgs) {} is effectively calling that lib attribute, then how does importing differ from calling nixpkgs.lib directly? From what I've read, importing a derivation has some effect on either forcing evaluation or not forcing evaluation of something. I don't understand the details yet.
Flakes behave like paths because of their .outPath attribute, which Nix automatically adds to every flake and which contains the flake's source path.
This lets import load the file as a Nix value.
Specifically, because the path is a directory, it loads the default.nix in it, which then loads impure.nix, which contains a function.
Note that when flakes are evaluated without --impure, evaluation is always pure, despite the "impure.nix" file name above.
The .outPath attribute got its name because it is also responsible for making derivations coercible to strings.
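The two steps described in this answer (coercing the flake to a path through outPath, then loading default.nix when that path is a directory) can be modelled in a few lines of Java. This is a hypothetical sketch of the semantics only, with the flake represented as a Map and the class name ImportModel invented for illustration; it is not how the Nix evaluator is implemented, and it stops before actually parsing or evaluating the file.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/** Toy model of how `import` turns a flake input into a file to load (not Nix's real code). */
final class ImportModel {
    /** Coerces a value to a path: paths and strings are used as-is, attribute sets via outPath. */
    static Path coerceToPath(Object value) {
        if (value instanceof Path) return (Path) value;
        if (value instanceof String) return Paths.get((String) value);
        if (value instanceof Map && ((Map<?, ?>) value).containsKey("outPath"))
            return coerceToPath(((Map<?, ?>) value).get("outPath"));
        throw new IllegalArgumentException("cannot coerce value to a path");
    }

    /** `import` of a directory loads the default.nix inside it; a file is loaded directly. */
    static Path fileToImport(Object value) {
        Path p = coerceToPath(value);
        return Files.isDirectory(p) ? p.resolve("default.nix") : p;
    }
}
```

Passing a Map such as {outPath = "/nix/store/...-source", lib = ...} yields "/nix/store/...-source/default.nix"; the lib attribute plays no role in the lookup, which is why (import nixpkgs) is not the same thing as calling nixpkgs.lib.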

Nix: how to change stdenv in nixpkgs.mkShell

I'd like to override the stdenv for mkShell to use gcc10Stdenv. I've looked at https://nixos.wiki/wiki/Using_Clang_instead_of_GCC, which provides instructions for overriding stdenv, but it doesn't describe how to do it for mkShell when just making a shell without reference to any specific package (only for "Using Nix CLI on existing packages").
My question is: is it possible to override stdenv for mkShell without an existing package, and if so, how?
Try:
pkgs.mkShell.override { stdenv = pkgs.gcc10Stdenv; } {
  inputsFrom = ...;
  ...
}
This is the standard way to alter the inputs to packages (which are just functions) in nixpkgs. It should work in this case.
Alternatively, you could just copy the mkShell implementation into ./mkShell.nix and import it, as Chris suggested.
let mkShell = import ./mkShell.nix;
in mkShell {
  lib = pkgs.lib;
  stdenv = pkgs.gcc10Stdenv;
} {
  inputsFrom = ...;
}
This is just a regular function, so we're calling it with two parameters.
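To see why mkShell.override works: a package function called through callPackage remembers the arguments it was called with, and .override calls the same function again with the old arguments merged with the new ones. The Java sketch below models that idea with a hypothetical Overridable class standing in for what lib.makeOverridable does; mkShell is modelled as a curried function (first the { stdenv } inputs, then the shell attributes), mirroring the two-parameter call in the mkShell.nix variant. The strings are placeholders, not real derivations.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy model of the override pattern (hypothetical; not lib.makeOverridable itself). */
final class Overridable<R> {
    private final Function<Map<String, Object>, R> fn; // the package function
    private final Map<String, Object> args;            // the inputs it was called with

    Overridable(Function<Map<String, Object>, R> fn, Map<String, Object> args) {
        this.fn = fn;
        this.args = Map.copyOf(args);
    }

    /** Result of calling the function with the current inputs. */
    R value() { return fn.apply(args); }

    /** Like `pkg.override newArgs`: call the same function again with args // newArgs. */
    Overridable<R> override(Map<String, Object> newArgs) {
        Map<String, Object> merged = new HashMap<>(args);
        merged.putAll(newArgs);
        return new Overridable<>(fn, merged);
    }

    /** mkShell as a curried function: { stdenv } -> shell name -> description. */
    static Overridable<Function<String, String>> mkShell(String defaultStdenv) {
        return new Overridable<Function<String, String>>(
            inputs -> shellName -> shellName + " built with " + inputs.get("stdenv"),
            Map.of("stdenv", defaultStdenv));
    }
}
```

Calling mkShell("stdenv").override(Map.of("stdenv", "gcc10Stdenv")).value().apply("my-shell") gives "my-shell built with gcc10Stdenv": the function body is untouched, only its input changes, which is exactly what both Nix snippets above do.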

Nix overlays and override pattern

I have trouble understanding Nix overlays and the override pattern. What I want to do is add something to the "patches" of gdb without copy/pasting the whole derivation.
From Nix Pills I gather that override just mimics OOP; in reality it is just another attribute of the set. But how does it work then? Is override a function from the original attribute set to a transformed one that again has a predefined override function?
And since Nix is a functional language, you don't have variables, only bindings, which you can shadow in a different scope. But that still doesn't explain how overlays achieve their "magic".
In ~/.config/nixpkgs I have configured a test overlay, approximately like this:
self: super:
{
test1 = super.gdb // { name = "test1"; buildInputs = [ super.curl ]; };
test2 = super.gdb // { name = "test2"; buildInputs = [ super.coreutils ]; };
test3 = super.gdb.override { pythonSupport = false; };
};
And I get:
nix-repl> "${test1}"
"/nix/store/ib55xzrp60fmbf5dcswxy6v8hjjl0s34-gdb-8.3"
nix-repl> "${test2}"
"/nix/store/ib55xzrp60fmbf5dcswxy6v8hjjl0s34-gdb-8.3"
nix-repl> "${test3}"
"/nix/store/vqlrphs3a2jfw69v8kwk60vhdsadv3k5-gdb-8.3"
But then
$ nix-env -iA nixpkgs.test1
replacing old 'test1'
installing 'test1'
Can you explain these results, please? Am I correct that override can only alter the "defined interface", that is, the parameters of the function, and that since "patches" isn't a parameter of gdb I won't be able to change it? What is the best alternative then?
I will write an answer in case anyone else stumbles on this.
Edit 21.8.2019:
What I actually wanted is described in https://nixos.org/nixpkgs/manual/#sec-overrides
overrideDerivation and overrideAttrs
overrideDerivation is basically "derivation (drv.drvAttrs // (f drv))" and overrideAttrs is defined as part of mkDerivation in https://github.com/NixOS/nixpkgs/blob/master/pkgs/stdenv/generic/make-derivation.nix
And my code then looks like:
gdb = super.gdb.overrideAttrs (oldAttrs: {
  # "or [ ]" avoids an error if the original derivation defines no patches
  patches = (oldAttrs.patches or [ ]) ++ [
    (super.fetchpatch {
      name = "...";
      url = "...";
      sha256 = "...";
    })
  ];
});
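The overrideAttrs call can be read as "new attrs = oldAttrs // (f oldAttrs)", followed by a fresh mkDerivation call with the merged attributes. The Java sketch below models that reading; OverrideAttrsModel, Drv and addPatch are hypothetical names, the "derivation" is just a Map, and a missing patches attribute is treated as an empty list, the way oldAttrs.patches or [ ] would in Nix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Toy model of mkDerivation + overrideAttrs (hypothetical; not the nixpkgs source). */
final class OverrideAttrsModel {
    /** A "derivation" that remembers the attributes it was made from. */
    static final class Drv {
        final Map<String, Object> attrs;

        Drv(Map<String, Object> attrs) { this.attrs = Map.copyOf(attrs); }

        /** Like overrideAttrs: new attrs = old // f(old), then build a new derivation. */
        Drv overrideAttrs(UnaryOperator<Map<String, Object>> f) {
            Map<String, Object> merged = new HashMap<>(attrs);
            merged.putAll(f.apply(attrs));
            return new Drv(merged);
        }
    }

    /** Appends one patch; a missing "patches" attribute counts as an empty list. */
    static Drv addPatch(Drv drv, String patch) {
        return drv.overrideAttrs(old -> {
            List<Object> patches = new ArrayList<>((List<?>) old.getOrDefault("patches", List.of()));
            patches.add(patch);
            return Map.of("patches", patches);
        });
    }
}
```

Because f receives the old attributes, it can extend patches instead of replacing them, and the original Drv is left unchanged.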
The question title is misleading and comes from my fundamental misunderstanding of derivations. Overlays work exactly as advertised, and they are probably not that magic either: just some recursion where the end result is the result of the previous step // the output of the last overlay function.
What is the purpose of nix-instantiate? What is a store-derivation?
Please correct me wherever I am wrong.
Basically, when you evaluate Nix code, the "derivation function" turns a descriptive attribute set (name, system, builder) into an "actual derivation". That "actual derivation" is again an attribute set, but the trick is that it is backed by a .drv file in the store, so in some sense derivation has side effects. The .drv file encodes how the build is supposed to take place and which dependencies are required. The hash of this file also determines the directory name for the artefacts (even though nothing has been built yet), so implicitly the name in the Nix store also depends on all build inputs.
When I created a new "Frankenstein" derivation by tying together existing derivations, all I did was create multiple references to the same .drv file, as if I had copied a pointer and ended up with two pointers to the same value on the heap. I was able to change some metadata, but in the end the build procedure was still the same. In fact, since Nix is pure, I bet there is no way to even write to the filesystem (to change the .drv file), except, again, with something that wraps the derivation function.
Override, on the other hand, allows you to create a "new instance". Due to the "inputs pattern", every package in Nix is a function from a dependencies attribute set to the actual code that in the end invokes the "derivation function". With override you are able to call that function again, which makes the "derivation function" receive different parameters.
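The difference between test1/test2 and test3 can be modelled in Java. In this toy model (DerivationModel and its methods are hypothetical, and a SHA-256 of the sorted attributes stands in for the .drv hash Nix actually computes), derivation fixes the out path when it is called, // only copies and updates the resulting attribute set, and calling the package function again with different inputs, which is what override does, produces a new out path.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of why `drv // { ... }` keeps the store path while override changes it. */
final class DerivationModel {
    /** Stand-in for `derivation`: the out path is fixed from the inputs at creation time. */
    static Map<String, Object> derivation(Map<String, Object> drvAttrs) {
        Map<String, Object> result = new HashMap<>(drvAttrs);
        String digest = hash(new TreeMap<>(drvAttrs).toString()); // sorted => deterministic
        result.put("outPath", "/nix/store/" + digest + "-" + drvAttrs.get("name"));
        return result;
    }

    /** Stand-in for the gdb package function: inputs -> derivation. */
    static Map<String, Object> gdb(boolean pythonSupport) {
        Map<String, Object> attrs = new HashMap<>();
        attrs.put("name", "gdb-8.3");
        attrs.put("pythonSupport", pythonSupport);
        return derivation(attrs);
    }

    /** Nix's `//`: a shallow update of the attribute set; nothing is recomputed. */
    static Map<String, Object> update(Map<String, Object> left, Map<String, Object> right) {
        Map<String, Object> result = new HashMap<>(left);
        result.putAll(right);
        return result;
    }

    /** First 8 bytes of SHA-256, hex-encoded. */
    private static String hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 8; i++) sb.append(String.format("%02x", d[i]));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is available on every Java platform
        }
    }
}
```

update(gdb(true), Map.of("name", "test1")) keeps gdb's outPath but reports the name test1, which matches nix-env installing 'test1' while "${test1}" still prints the gdb store path; gdb(false) (the override case) gets a different outPath.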

CoreNLP : provide pos tags

I have text that is already tokenized, sentence-split, and POS-tagged.
I would like to use CoreNLP to additionally annotate lemmas (lemma), named entities (ner), constituency and dependency parses (parse), and coreferences (dcoref).
Is there a combination of command-line options and properties-file settings that makes this possible from the command line?
According to this question, I can ask the parser to view whitespace as delimiting tokens, and newlines as delimiting sentences by adding this to my properties file:
tokenize.whitespace = true
ssplit.eolonly = true
This works well, so all that remains is to tell CoreNLP that I would like to provide POS tags too.
When using the Stanford Parser standing alone, it seems to be possible to have it use existing POS tags, but copying that syntax to the invocation of CoreNLP doesn't seem to work. For example, this does not work:
java -cp *:./* -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -props my-properties-file -outputFormat xml -outputDirectory my-output-dir -sentences newline -tokenized -tagSeparator / -tokenizerFactory edu.stanford.nlp.process.WhitespaceTokenizer -tokenizerMethod newCoreLabelTokenizerFactory -file my-annotated-text.txt
While this question covers programmatic invocation, I'm invoking CoreNLP from the command line as part of a larger system, so I'm really asking whether this can be achieved with command-line options.
I don't think this is possible with command line options.
If you want, you could make a custom annotator and include it in your pipeline.
Here is some sample code:
package edu.stanford.nlp.pipeline;

import edu.stanford.nlp.ling.CoreAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import java.util.Collections;
import java.util.Properties;
import java.util.Set;

// Custom annotators are loaded by reflection and must implement Annotator.
public class ProvidedPOSTaggerAnnotator implements Annotator {
    // separator between a word and its tag, e.g. "dog_NN"; set via <annotatorName>.tagSeparator
    public String tagSeparator;

    public ProvidedPOSTaggerAnnotator(String annotatorName, Properties props) {
        tagSeparator = props.getProperty(annotatorName + ".tagSeparator", "_");
    }

    @Override
    public void annotate(Annotation annotation) {
        for (CoreLabel token : annotation.get(CoreAnnotations.TokensAnnotation.class)) {
            String word = token.word();
            // split on the LAST separator so words that contain the separator survive
            int idx = word.lastIndexOf(tagSeparator);
            if (idx < 0) continue; // no tag attached: leave the token untouched
            // set the word with the POS tag removed
            token.set(CoreAnnotations.TextAnnotation.class, word.substring(0, idx));
            // set the POS
            token.set(CoreAnnotations.PartOfSpeechAnnotation.class,
                      word.substring(idx + tagSeparator.length()));
        }
    }

    @Override
    public Set<Class<? extends CoreAnnotation>> requirementsSatisfied() {
        return Collections.singleton(CoreAnnotations.PartOfSpeechAnnotation.class);
    }

    @Override
    public Set<Class<? extends CoreAnnotation>> requires() {
        return Collections.singleton(CoreAnnotations.TokensAnnotation.class);
    }
}
This should work if each token carries its POS tag, separated by "_" (e.g. dog_NN). You can change the separator with the forcedpos.tagSeparator property.
If you add customAnnotatorClass.forcedpos = edu.stanford.nlp.pipeline.ProvidedPOSTaggerAnnotator
to the properties file, include the above class in your CLASSPATH, and then include "forcedpos" in your list of annotators after "tokenize", you should be able to pass in your own POS tags.
I may clean this up some more and actually include it in future releases for people!
I have not had time to actually test this code; if you try it and find errors, please let me know and I'll fix it!
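Since the annotator is untested, it may help to check the word/tag splitting on its own first. This is a plain-Java sketch with no CoreNLP dependency (PreTaggedReader and TaggedWord are hypothetical names) that reads input in the format from the question: one sentence per line as with ssplit.eolonly, whitespace-separated tokens as with tokenize.whitespace, and each token split at the last occurrence of the separator. Tokens without a separator get a null tag here; how the real pipeline should treat them is a design choice left open.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of reading pre-tagged text such as "The_DT dog_NN barked_VBD". */
final class PreTaggedReader {
    /** A token and its POS tag (tag is null when the token carries no separator). */
    static final class TaggedWord {
        final String word;
        final String tag;

        TaggedWord(String word, String tag) { this.word = word; this.tag = tag; }

        @Override public String toString() { return word + "/" + tag; }
    }

    /** Splits on the LAST separator, so words that contain the separator stay intact. */
    static TaggedWord split(String token, String separator) {
        int idx = token.lastIndexOf(separator);
        if (idx < 0) return new TaggedWord(token, null);
        return new TaggedWord(token.substring(0, idx), token.substring(idx + separator.length()));
    }

    /** One sentence per line (like ssplit.eolonly), whitespace-separated tokens (like tokenize.whitespace). */
    static List<List<TaggedWord>> read(String text, String separator) {
        List<List<TaggedWord>> sentences = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (line.isBlank()) continue;
            List<TaggedWord> sentence = new ArrayList<>();
            for (String token : line.trim().split("\\s+")) sentence.add(split(token, separator));
            sentences.add(sentence);
        }
        return sentences;
    }
}
```

Splitting on the last separator (rather than with String.split, which treats the separator as a regular expression) keeps a token like rock_n_roll_NN as the word rock_n_roll with tag NN.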

clang set metadata to allocainst

First of all, I'm a real noob with clang/LLVM.
But I'm trying to modify clang for my own purposes.
I'd like to add metadata whenever an Alloca instruction is emitted in IR code for a variable which has some annotation.
I noticed this function in CGDecl.cpp:
CodeGenFunction::AutoVarEmission
CodeGenFunction::EmitAutoVarAlloca(const VarDecl &D)
which contains this nice line at the end:
if (D.hasAttr<AnnotateAttr>())
  EmitVarAnnotations(&D, emission.Address);
This looks like the condition I need, so I modified it to:
if (D.hasAttr<AnnotateAttr>()) {
  AnnotateAttr* attr = D.getAttr<AnnotateAttr>();
  if (attr->getAnnotation() == "_my_custom_annotation_") {
    // set metadata...
  }
  EmitVarAnnotations(&D, emission.Address);
}
My issue is that I don't know how to add metadata at this point, because I can't find a way to access the instruction.
In CGExpr.cpp, however, I can see where the AllocaInst is built, but there I don't have access to the VarDecl, so I can't tell whether the annotation is present.
I tried to add metadata anyway (unconditionally) in this function:
llvm::AllocaInst *CodeGenFunction::CreateIRTemp(QualType Ty,
                                                const Twine &Name) {
  llvm::AllocaInst *Alloc = CreateTempAlloca(ConvertType(Ty), Name);
  // FIXME: Should we prefer the preferred type alignment here?
  CharUnits Align = getContext().getTypeAlignInChars(Ty);
  // how to put it conditionally on the annotation?
  llvm::MDNode* node = getRangeForLoadFromType(Ty);
  Alloc->setMetadata("_my_custom_metadata", node);
  Alloc->setAlignment(Align.getQuantity());
  return Alloc;
}
by adding the setMetadata call.
However I don't see the metadata attached in the generated IR.
I compile with clang -g -S -target i686-pc-win32 -emit-llvm main.cpp -o output.ll
Maybe I'm totally wrong; the thing is, I haven't mastered code generation in clang :)
PS: here is the code I compile:
int main() {
__attribute__ ((annotate("_my_custom_annotation_"))) float a[12];
}
Any help is appreciated!
Thanks
It looks like you are in the right place. EmitAutoVarAlloca has special handling for different kinds of variable declarations, but they all end with the "address" (i.e., the instruction) in emission.Address.
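So what you want to do is:
if (D.hasAttr<AnnotateAttr>()) {
  AnnotateAttr* attr = D.getAttr<AnnotateAttr>();
  if (attr->getAnnotation() == "_my_custom_annotation_") {
    // emission.Address is an llvm::Value*; setMetadata is defined on llvm::Instruction
    if (auto *Alloca = llvm::dyn_cast<llvm::AllocaInst>(emission.Address))
      Alloca->setMetadata(...); // <--- your MDNode goes here
  }
  EmitVarAnnotations(&D, emission.Address);
}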
However, I would recommend a special attribute for adding metadata to instructions. If you read further through the code you will see that the AnnotateAttr has a special meaning and your emitted IR may not be as expected. You can add a custom attribute in the Attr.td file. I suggest a copy of the Annotate entry. Then you can follow the AnnotateAttr through the code and add code for your Attribute at the right places to get it recognized and handled by clang.
