What is a Nix expression in regard to Nix package management?

Even after reading the Nix manuals, it is still confusing what Nix expressions really are. Sometimes they are referred to as derivations, but "store derivation" also means something different.

In Nix, a Nix expression is just a general term for any type of value that you can write in the Nix language. A Nix expression can be a set, a list, a number, a string, a function, a name, an arithmetic operation, a function call, and much more.
Nix expressions can contain other Nix expressions: for example, the expression 1 + 2 contains two expressions inside it: 1 and 2.
People often like to write complicated Nix expressions that represent how to build a piece of software. Those expressions are really just sets with some special attributes. The Nix software can evaluate such an expression and turn it into a .drv file (a very simple, compact way of describing how to build some software), which can then be built.
You can do lots of things with the Nix language and Nix expressions that don't involve derivations or building software. The nix eval command lets you evaluate a Nix expression. Run nix eval --help to see its help screen, or run these commands to evaluate some simple expressions:
nix eval '(1 + 2)' # gives 3
nix eval '({ a = 1; b = 2; }.a)' # gives 1
(For some reason, this command requires parentheses around most of the Nix expressions it evaluates. That seems like a bug or an odd design choice of the command-line interface; surrounding parentheses are not an essential part of every Nix expression.)

A Nix expression is a set of instructions describing how to build a software component (package, project, application, etc.) using the Nix purely functional language. To quote Gabriel Gonzalez: "You can think of a derivation as a language-agnostic recipe for how to build something (such as a Haskell package)."
Nix expressions are also commonly called derivations (as in Nix derivation expressions), but
*------------------------------------------------------*
|                                                      |
|         STORE DERIVATION =/= NIX EXPRESSION          |
|                                                      |
*------------------------------------------------------*
|                                                      |
|  NIX EXPRESSION == function                          |
|                                                      |
|  ( Describes how to build a component. That is, how  |
|    to compose its input parameters, which can be     |
|    other components as well. )                       |
|                                                      |
|  STORE DERIVATION == function application            |
|                                                      |
|  ( Call a Nix expression with concrete arguments.    |
|    Corollary: a single Nix expression can produce    |
|    different derivations depending on the inputs. )  |
|                                                      |
*------------------------------------------------------*
The purpose of Nix expressions is to produce a store derivation that can be built into a component (executable, library, etc.).
For context: (image omitted; taken from Eelco Dolstra's PhD thesis, section "2.4 Store derivations").
Extra

Normal form of a Nix expression

According to section "5.4 Translating Nix expressions to store derivations" in Eelco Dolstra's PhD thesis, the normal form [of a Nix expression] should be

- a call to derivation, or
- a nested structure of lists and attribute sets that contain calls to derivation.

In any case, these derivation Nix expressions are subsequently translated to store derivations.
What is a software component?

A package, application, development environment, software library, etc. More formally, from "3.1 What is a component?" in Eelco Dolstra's PhD thesis, a software component is

   *-------------------------------------*
1. | a software artifact that is subject |
   | to automatic composition            |
   *-------------------------------------*

   It can require, and be required by, other components.

   *----------------------*
2. | a unit of deployment |
   *----------------------*

(That entire section is worth reading.)

Related

Represent postfix and prefix increment and decrement in AST and grammar

I have these rules to build a simple calculator:
statement -> assignment | calculation
assignment -> variable '=' sum end
calculation -> sum end
sum -> product (('+' product)|('-' product))*
product -> factor (('*' factor)|('/' factor))*
factor -> term
term -> variable | number
My problem is how to model the rules for postfix and prefix increment and decrement. How can I represent them in the grammar above so that, for example, given the statements:
x=1
j=x++ +2
the result will be j=3 and x=2. How do I do post-increment after assignment?
The simplest grammar change would be to add the new operators to term:
term -> variable
| '++' variable | '--' variable
| variable '++' | variable '--'
| number
The new rules could have been added to factor instead, particularly since factor currently has no point at all and could be removed. However, if you ever add more complicated lvalues than a single variable (array subscripts for example) then that will have to be adjusted. Also, adding the operators to factor would make nonsense like ++2 syntactically possible, or (a+b)++ once you implement parentheses. So, although putting them in some non-terminal other than term is more common and probably more appropriate, it's not necessarily the best solution in this particular case.
The questions about the AST and the evaluation of the AST can't be answered without knowing a lot more about how you structure your ASTs. You're free to build ASTs in any way you feel appropriate, but it's probably worth noting that the AST must be able to distinguish between post- and pre-increment. Either you need to use a different operator symbol for the two cases, or you need some hack (such as the C++ hack of adding a fake operand to one of the two cases).
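To make the post/pre distinction concrete, here is a minimal Python sketch (not the asker's implementation) of an AST that uses distinct node tags for the two cases; the node shapes and tag names are illustrative assumptions:

```python
# A sketch of an AST that distinguishes pre- and post-increment by
# giving each its own node kind ("preinc" vs "postinc").

class Node:
    def __init__(self, kind, *children):
        self.kind = kind          # e.g. "num", "var", "add", "preinc", "postinc", "assign"
        self.children = children

def evaluate(node, env):
    """Evaluate a node, mutating env for assignments and increments."""
    kind = node.kind
    if kind == "num":
        return node.children[0]
    if kind == "var":
        return env[node.children[0]]
    if kind == "add":
        return evaluate(node.children[0], env) + evaluate(node.children[1], env)
    if kind == "preinc":              # ++x: increment first, yield the new value
        name = node.children[0]
        env[name] += 1
        return env[name]
    if kind == "postinc":             # x++: yield the old value, then increment
        name = node.children[0]
        old = env[name]
        env[name] = old + 1
        return old
    if kind == "assign":
        value = evaluate(node.children[1], env)
        env[node.children[0]] = value
        return value
    raise ValueError(f"unknown node kind: {kind}")

# x = 1 ; j = x++ + 2
env = {}
evaluate(Node("assign", "x", Node("num", 1)), env)
evaluate(Node("assign", "j", Node("add", Node("postinc", "x"), Node("num", 2))), env)
print(env)  # {'x': 2, 'j': 3}
```

Running the question's example yields j = 3 and x = 2, because "postinc" returns the old value of x to the addition and only then stores the incremented value back into the environment.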

Grammar for parsing multiple calls?

As a side project and learning experiment, I'm writing my own programming language with no extra premade tools, such as LLVM.
I've already written my own recursive descent parser, but I'm having a problem trying to think of the logistics of parsing a statement like this:
x()()[0]()
I can't think of a good way to make a parse tree/AST out of this. I've tried reading the grammars of other programming languages (notably Python and C#), but I just can't figure out how they do it.
How would I write something to parse the above grammar?
From an AST perspective, it probably helps to imagine that you have some sort of "function call" node, with one child representing the expression representing the function to call and one child per expression denoting an argument. For example, the code fn()()[0]() might look like this:
              +--------+
              |  call  |
              +--------+
    function /          \ args
            /          (null)
           /
    +-----------+
    | selection |
    +-----------+
    array /   \ index
         /     0
        /
   +--------+
   |  call  |
   +--------+
 function /  \ args
         /  (null)
        /
   +--------+
   |  call  |
   +--------+
 function /  \ args
       fn   (null)
In terms of how to parse something like this, I'd recommend treating the function call as a postfix operator like array selection (arr[index]) or member selection (object.field). A CFG fragment for this might look like this:
Expr --> Expr '(' ArgList ')'
       | /* other expression types */
From a recursive descent perspective, after you've parsed an expression, you'd do a lookahead to see if there's an open parenthesis token after it. If so, that means that whatever you just read should be treated as the function component of a function call expression, and what you're about to read is the argument list.
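That postfix loop can be sketched in a few lines of Python; the tokenizer and tuple-shaped AST here are illustrative assumptions, not part of any particular language:

```python
# A sketch of the postfix-operator loop: parse a primary expression,
# then repeatedly absorb '(' ... ')' calls and '[' ... ']' subscripts,
# which makes them bind tightest and associate to the left.

import re

def tokenize(src):
    return re.findall(r"[A-Za-z_]\w*|\d+|[()\[\],]", src)

def parse_expr(tokens, pos=0):
    # primary: identifier or number
    tok = tokens[pos]
    if tok.isdigit():
        node, pos = ("num", int(tok)), pos + 1
    else:
        node, pos = ("name", tok), pos + 1
    # postfix loop: each iteration wraps the node parsed so far
    while pos < len(tokens):
        if tokens[pos] == "(":
            args = []
            pos += 1
            while tokens[pos] != ")":
                arg, pos = parse_expr(tokens, pos)
                args.append(arg)
                if tokens[pos] == ",":
                    pos += 1
            node, pos = ("call", node, args), pos + 1   # skip ')'
        elif tokens[pos] == "[":
            index, pos = parse_expr(tokens, pos + 1)
            node, pos = ("index", node, index), pos + 1  # skip ']'
        else:
            break
    return node, pos

tree, _ = parse_expr(tokenize("x()()[0]()"))
print(tree)
# ('call', ('index', ('call', ('call', ('name', 'x'), []), []), ('num', 0)), [])
```

The printed tree matches the diagram above: the outermost node is the final call, whose function child is the subscript, whose array child is the chain of earlier calls.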

Preferring one alternative

An excerpt of my ANTLR v4 grammar looks like this:
expression:
| expression BINARY_OPERATOR expression
| unaryExpression
| nularExpression
;
unaryExpression:
ID expression
;
nularExpression:
ID
| NUMBER
| STRING
;
My goal is to match the language without knowing all the necessary keywords and therefore I'm simply matching keywords as IDs.
However, there are binary operators that take an argument on both sides of the keyword (e.g. keyword ) and therefore need "special treatment". As you can see, I already included this "special treatment" in the expression rule.
The actual problem now consists of the fact that some of these binary operators can be used as unary operators (=normal keywords) as well meaning that the left argument does not have to be specified.
The above grammar can't handle this case; every time I tried to implement this, I ended up with every binary operator being consumed as a unary operator.
Example:
Let's assume count is a binary operator.
Possible syntaxes are <arg1> count <arg2> and count <arg>
All my attempts to implement the above mentioned case ended up grouping myArgument count otherArgument like (myArgument (count (otherArgument) ) ) instead of (myArgument) count (otherArgument)
My brain tells me that the solution to this problem is to tell the parser always to take two arguments for a binary operator, and if that fails, to try to consume the binary operator as a unary one.
Does anybody know how to accomplish this?
How about something like this:
lower_precedence_expression
    : ID higher_precedence_expression
    | higher_precedence_expression
    ;
higher_precedence_expression
    : higher_precedence_expression ID lower_precedence_expression
    | ID
    | NUMBER
    | STRING
    ;
?
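To see why this grammar groups (myArgument) count (otherArgument) rather than nesting everything under a unary reading, the two rules can be prototyped as a small backtracking parser; this Python sketch (token classification and AST shapes are my assumptions, not ANTLR's) accepts a parse only if it consumes every token, which is what forces the binary reading when a left argument is present:

```python
# Backtracking sketch of the two-level grammar above.
# lower  := ID higher | higher        (unary keyword, tried first)
# higher := higher ID lower | atom    (binary keyword, left recursion unrolled)

def is_id(tok):
    return tok.isalpha()

def is_atom(tok):
    return tok.isalpha() or tok.isdigit() or tok.startswith('"')

def parse(tokens):
    """Return the first parse that consumes every token, or None."""
    for tree, pos in lower(tokens, 0):
        if pos == len(tokens):
            return tree
    return None

def lower(tokens, pos):
    # lower_precedence_expression : ID higher | higher
    if pos < len(tokens) and is_id(tokens[pos]):
        for tree, p in higher(tokens, pos + 1):
            yield ("unary", tokens[pos], tree), p
    yield from higher(tokens, pos)

def higher(tokens, pos):
    # higher_precedence_expression : higher ID lower | ID | NUMBER | STRING
    if pos < len(tokens) and is_atom(tokens[pos]):
        yield from extend(tokens, pos + 1, tokens[pos])

def extend(tokens, pos, tree):
    # enumerate zero or more left-associative (ID lower) extensions
    yield tree, pos
    if pos < len(tokens) and is_id(tokens[pos]):
        op = tokens[pos]
        for right, p in lower(tokens, pos + 1):
            yield from extend(tokens, p, ("binary", tree, op, right))

print(parse(["myArgument", "count", "otherArgument"]))
print(parse(["count", "arg"]))
```

With three tokens, the unary reading of count only covers "myArgument count" and leaves a token unconsumed, so the binary reading wins; with two tokens, the unary reading consumes everything and is accepted.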

Faulty bison reduction using %glr-parser and %merge rules

Currently I'm trying to build a parser for VHDL, which has some of the problems C++ parsers have to face. The context-free grammar of VHDL produces a parse forest rather than a single parse tree because of its ambiguity regarding function calls and array subscriptions:
foo := fun(3) + arr(5);
This assignment can only be parsed unambiguously if the parser carried around a hierarchical, type-aware symbol table which it would use to resolve the ambiguities somewhat on the fly. I don't want to do this, because for statements like the one above, the parse forest does not grow exponentially, but rather linearly with the number of function calls and array subscriptions.
(Except, of course, one would torture the parser with statements like)
foo := fun(fun(fun(fun(fun(4)))));
Since bison forces the user to create one single parse tree, I used %merge attributes to collect all subtrees recursively and added those subtrees under so-called AMBIG nodes in the singleton AST.
The result looks like this.
In order to produce the above, I parsed the token stream "I=I(N);".
The substance of the grammar I used inside the parse.y file is collected below. It tries to resemble the ambiguous parts of VHDL:
start: prog
;
/* I cut out every semantic action to make this
text more readable */
prog: assignment ';'
| prog assignment ';'
;
assignment: 'I' '=' expression
;
expression: function_call %merge <stmtmerge2>
| array_indexing %merge <stmtmerge2>
| 'N'
;
function_call: 'I' '(' expression ')'
| 'I'
;
array_indexing: function_call '(' expression ')' %merge <stmtmerge>
| 'I' '(' expression ')' %merge <stmtmerge>
;
The whole sourcecode can be read at this github repository.
And now, let's get down to the actual problem.
As you can see in the generated parse tree above, the nodes FCALL1 and ARRIDX1 refer to the same single node EXPR1, which in turn refers to N1 twice. This, by all means, should not have happened, and I don't know why. Instead there should be the paths
FCALL1 -> EXPR2 -> N2
ARRIDX1 -> EXPR1 -> N1
Do you have any idea why bison reuses the aforementioned nodes?
I also wrote a bug report on the official GNU mailing list for bison, though without a reply so far. Unfortunately, due to the restrictions for new Stack Overflow users, I can't provide a link to this bug report.
That behaviour is expected.
expression can be unambiguously reduced, and that reduced value is used by both possible ambiguous reductions which include the value. Remember that GLR, like LR, is a left-to-right parser. When a reduction action is executed, all of the child reductions have already happened. The effect is not different from the use of a terminal in a right-hand side; the terminal will not be artificially copied in order to produce different instances in the ambiguous productions which use it.
For most people, this would be a feature rather than a bug, and I don't mean that as a joke. Without the graph-structured stack, GLR has exponential run-time. If you really want to do a deep copy of shared AST nodes when you merge parse trees, you will have to do it yourself, but I suggest that you find a way to make use of the fact that the parse forest is really a directed acyclic graph rather than a tree; you will probably be able to take advantage of the lack of duplication.
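The sharing can be demonstrated outside bison entirely. In this small Python illustration (node names FCALL/ARRIDX/AMBIG are borrowed from the question; everything else is a stand-in), the unambiguously reduced EXPR subtree is one object referenced by both ambiguous parents, so the merged "parse tree" is really a DAG:

```python
# Illustration of node sharing in a merged parse forest:
# both ambiguous readings reference the same reduced child.
import copy

expr = ["EXPR", ["N"]]           # reduced once, before the ambiguity arises
fcall = ["FCALL", expr]          # function-call reading
arridx = ["ARRIDX", expr]        # array-indexing reading
ambig = ["AMBIG", fcall, arridx]

print(fcall[1] is arridx[1])     # True: both parents share the same child object

# Turning the DAG into two independent trees requires a deep copy per parent:
tree_f = copy.deepcopy(fcall)
tree_a = copy.deepcopy(arridx)
print(tree_f[1] is tree_a[1])    # False: the copies no longer share the node
```

This mirrors what the answer says you would have to do yourself if you insist on tree-shaped output from the %merge handlers.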

Is My Lambda Calculus Grammar Unambiguous?

I am trying to write a small compiler for a language that handles lambda calculus. Here is the ambiguous definition of the language that I've found:
E → ^ v . E | E E | ( E ) | v
The symbols ^, ., (, ) and v are tokens. ^ represents lambda and v represents a variable.
An expression of the form ^v.E is a function definition where v is the formal parameter of the function and E is its body. If f and g are lambda expressions, then the lambda expression fg represents the application of the function f to the argument g.
I'm trying to write an unambiguous grammar for this language, under the assumption that function application is left associative, e.g., fgh = (fg)h, and that function application binds tighter than ., e.g., (^x. ^y. xy) ^z.z = (^x. (^y. xy)) ^z.z
Here is what I have so far, but I'm not sure if it's correct:
E -> ^v.E | T
T -> vF | (E) E
F -> v | epsilon
Could someone help out?
Between reading your question and comments, you seem to be looking more for help with learning and implementing lambda calculus than just the specific question you asked here. If so then I am on the same path so I will share some useful info.
The best book I have, which is not to say the best book possible, is Types and Programming Languages (WorldCat) by Benjamin C. Pierce. I know the title doesn't sound anything like lambda calculus but take a look at λ-Calculus extensions: meaning of extension symbols which list many of the lambda calculi that come from the book. There is code for the book in OCaml and F#.
Try searching in CiteSeerX for research papers on lambda calculus to learn more.
The best λ-Calculus evaluator I have found so far is:
Lambda calculus reduction workbench with info here.
Also, I find that you get much better answers for lambda calculus questions related to programming at CS:StackExchange and for math-related questions at Math:StackExchange.
As for programming languages to implement lambda calculus you will probably need to learn a functional language if you haven't; Yes it's a different beast, but the enlightenment on the other side of the mountain is spectacular. Most of the source code I find uses a functional language such as ML or OCaml, and once you learn one, the rest get easier to learn.
To be more specific, here is the source code for the untyped lambda calculus project, here is the input file to an F# variation of YACC which from reading your previous questions seems to be in your world of knowledge, and here is sample input.
Since the grammar is for implementing a REPL, it starts with toplevel (think: command prompt) and accepts multiple commands, which in this case are lambda calculus expressions. Since this grammar is used for many calculi, it has parts that are placeholders relative to the earlier examples; thus binding here is more of a placeholder.
Finally we get to the part you are after
Note LCID is Lower Case Identifier
Term : AppTerm
     | LAMBDA LCID DOT Term
     | LAMBDA USCORE DOT Term

AppTerm : ATerm
        | AppTerm ATerm

/* Atomic terms are ones that never require extra parentheses */
ATerm : LPAREN Term RPAREN
      | LCID
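That grammar translates almost directly into a recursive-descent parser. This Python sketch (token syntax with ^ for lambda and the tuple AST are my own choices, not the F# project's) shows the three rules and the left-associative application loop:

```python
# Recursive-descent parser for:
#   Term    := '^' var '.' Term | AppTerm
#   AppTerm := ATerm | AppTerm ATerm      (left-associative, via a loop)
#   ATerm   := '(' Term ')' | var
import re

def tokenize(src):
    # '^' is lambda; variables are lowercase identifiers
    return re.findall(r"[\^.()]|[a-z]\w*", src)

def parse_term(toks, pos=0):
    if pos < len(toks) and toks[pos] == "^":
        var = toks[pos + 1]
        assert toks[pos + 2] == ".", "expected '.' after parameter"
        body, pos = parse_term(toks, pos + 3)
        return ("lam", var, body), pos
    return parse_appterm(toks, pos)

def parse_appterm(toks, pos):
    left, pos = parse_aterm(toks, pos)
    # keep absorbing atomic terms: f g h groups as (f g) h
    while pos < len(toks) and (toks[pos] == "(" or toks[pos].isalpha()):
        right, pos = parse_aterm(toks, pos)
        left = ("app", left, right)
    return left, pos

def parse_aterm(toks, pos):
    if toks[pos] == "(":
        term, pos = parse_term(toks, pos + 1)
        assert toks[pos] == ")", "expected ')'"
        return term, pos + 1
    return ("var", toks[pos]), pos + 1

tree, _ = parse_term(tokenize("f g h"))
print(tree)  # ('app', ('app', ('var', 'f'), ('var', 'g')), ('var', 'h'))
```

Note that "f g h" and "(f g) h" produce the same tree, which is exactly the left associativity the question asks for.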
You may be able to show that a particular grammar is ambiguous by exhibiting a single sentence with two distinct derivations, but proving that a grammar is unambiguous is undecidable in general: there is no algorithm that decides it for every context-free grammar, and you cannot simply check every possible sentence in the language, since there are infinitely many.
