Why F# compiler gets into twist with seq{0L..-5L..-10L}? - f#

I'm having a bit of trouble declaring a descending sequence of int64.
What I want is this:
seq{0L..-5L..-10L};;
However, I get an error:
seq{0L..-5L..-10L};;
---^^^^^^^^^^^^^^^
stdin(5,4): error FS0739: Invalid object, sequence or record expression
Interestingly, it works with plain int:
> seq{0..-5..-10};;
val it : seq<int> = seq [0; -5; -10]
Even more interestingly, if I put spaces between .., it starts working with int64 too:
> seq{0L .. -5L .. -10L};;
val it : seq<int64> = seq [0L; -5L; -10L]
Can someone explain why the compiler gets into the twist with seq{0L..-5L..-10L}?

I agree that this is a bit odd behavior. It is generally recommended (although this is not strictly required by the specification) to write spaces around .. and it works correctly in that case. So I'd recommend using:
seq { 0 .. -5 .. -10 }
seq { 0L .. -5L .. -10L }
Why is this behaving differently for int and int64? You may notice that when you write 1..-2 and 1L..-2, Visual Studio colorizes the text differently (in the first case .. has the same color as numbers, in the other case, it has the same color as .. with spaces).
The problem is that when the compiler sees 1., it may mean a floating point value (1.0) or it may be a start of 1.., so this case is handled specially. For 1L., this is not a problem - 1L. has to be the beginning of 1L...
So, if you write 1..-5..-10, the compiler uses the special handling and generates a sequence. If you write 1L..-5..-10, then the compiler parses ..- as a unary operator that is applied to 5L. Writing the spaces resolves the ambiguity between unary operator and .. followed by a negative number.
For reference, here is a screenshot from my Visual Studio (which shows 10.. in green, but .. on the second line in yellow - not particularly noticeable difference, but they are different :-))

Related

Creating an array with operands results in nil members

I just started learning lua and run into a strange problem. The following code...
local xx = 100
vertices0 = {xx, xx}
vertices1 = {xx−5, xx-5}
results in...
an array containing (100,100) for vertices0 (as expected) but
in an array containing (nil, 95)for vertices1.
I really dont understand what is causing the nil to appear. I expected to get an array with (95,95).
I checked the documentation and tried to google the problem. But was not able to solve this problem.
Btw - I'm using love2d but "regular" lua seeems to cause the same behaviour.
xx−5 is not using - but −, lua treats − as part of a identifier so xx−5 is a separate identifier rather than the desired subtraction operation xx - 5
local xx = 100
local xx−5 = 100
vertices0 = {xx, xx}
vertices1 = {xx−5, xx-5}
print(vertices1[1])
This appears to work in 5.1, but not later version of lua. additionally an issue like this can be seen easier if you place a space around an operator and it's operands, which does tend to be a good style choice for readability.
vertices1 = {xx − 5, xx - 5}
Also if you have syntax highlighting than you can notice the improper char does not get highlighted properly.

Is it appropriate for a parser DCG to not be deterministic?

I am writing a parser for a query engine. My parser DCG query is not deterministic.
I will be using the parser in a relational manner, to both check and synthesize queries.
Is it appropriate for a parser DCG to not be deterministic?
In code:
If I want to be able to use query/2 both ways, does it require that
?- phrase(query, [q,u,e,r,y]).
true;
false.
or should I be able to obtain
?- phrase(query, [q,u,e,r,y]).
true.
nevertheless, given that the first snippet would require me to use it as such
?- bagof(X, phrase(query, [q,u,e,r,y]), [true]).
true.
when using it to check a formula?
The first question to ask yourself, is your grammar deterministic, or in the terminology of grammars, unambiguous. This is not asking if your DCG is deterministic, but if the grammar is unambiguous. That can be answered with basic parsing concepts, no use of DCG is needed to answer that question. In other words, is there only one way to parse a valid input. The standard book for this is "Compilers : principles, techniques, & tools" (WorldCat)
Now you are actually asking about three different uses for parsing.
A recognizer.
A parser.
A generator.
If your grammar is unambiguous then
For a recognizer the answer should only be true for valid input that can be parsed and false for invalid input.
For the parser it should be deterministic as there is only one way to parse the input. The difference between a parser and an recognizer is that a recognizer only returns true or false and a parser will return something more, typically an abstract syntax tree.
For the generator, it should be semi-deterministic so that it can generate multiple results.
Can all of this be done with one, DCG, yes. The three different ways are dependent upon how you use the input and output of the DCG.
Here is an example with a very simple grammar.
The grammar is just an infix binary expression with one operator and two possible operands. The operator is (+) and the operands are either (1) or (2).
expr(expr(Operand_1,Operator,Operand_2)) -->
operand(Operand_1),
operator(Operator),
operand(Operand_2).
operand(operand(1)) --> "1".
operand(operand(2)) --> "2".
operator(operator(+)) --> "+".
recognizer(Input) :-
string_codes(Input,Codes),
DCG = expr(_),
phrase(DCG,Codes,[]).
parser(Input,Ast) :-
string_codes(Input,Codes),
DCG = expr(Ast),
phrase(DCG,Codes,[]).
generator(Generated) :-
DCG = expr(_),
phrase(DCG,Codes,[]),
string_codes(Generated,Codes).
:- begin_tests(expr).
recognizer_test_case_success("1+1").
recognizer_test_case_success("1+2").
recognizer_test_case_success("2+1").
recognizer_test_case_success("2+2").
test(recognizer,[ forall(recognizer_test_case_success(Input)) ] ) :-
recognizer(Input).
recognizer_test_case_fail("2+3").
test(recognizer,[ forall(recognizer_test_case_fail(Input)), fail ] ) :-
recognizer(Input).
parser_test_case_success("1+1",expr(operand(1),operator(+),operand(1))).
parser_test_case_success("1+2",expr(operand(1),operator(+),operand(2))).
parser_test_case_success("2+1",expr(operand(2),operator(+),operand(1))).
parser_test_case_success("2+2",expr(operand(2),operator(+),operand(2))).
test(parser,[ forall(parser_test_case_success(Input,Expected_ast)) ] ) :-
parser(Input,Ast),
assertion( Ast == Expected_ast).
parser_test_case_fail("2+3").
test(parser,[ forall(parser_test_case_fail(Input)), fail ] ) :-
parser(Input,_).
test(generator,all(Generated == ["1+1","1+2","2+1","2+2"]) ) :-
generator(Generated).
:- end_tests(expr).
The grammar is unambiguous and has only 4 valid strings which are all unique.
The recognizer is deterministic and only returns true or false.
The parser is deterministic and returns a unique AST.
The generator is semi-deterministic and returns all 4 valid unique strings.
Example run of the test cases.
?- run_tests.
% PL-Unit: expr ........... done
% All 11 tests passed
true.
To expand a little on the comment by Daniel
As Daniel notes
1 + 2 + 3
can be parsed as
(1 + 2) + 3
or
1 + (2 + 3)
So 1+2+3 is an example as you said is specified by a recursive DCG and as I noted a common way out of the problem is to use parenthesizes to start a new context. What is meant by starting a new context is that it is like getting a new clean slate to start over again. If you are creating an AST, you just put the new context, items in between the parenthesizes, as a new subtree at the current node.
With regards to write_canonical/1, this is also helpful but be aware of left and right associativity of operators. See Associative property
e.g.
+ is left associative
?- write_canonical(1+2+3).
+(+(1,2),3)
true.
^ is right associative
?- write_canonical(2^3^4).
^(2,^(3,4))
true.
i.e.
2^3^4 = 2^(3^4) = 2^81 = 2417851639229258349412352
2^3^4 != (2^3)^4 = 8^4 = 4096
The point of this added info is to warn you that grammar design is full of hidden pitfalls and if you have not had a rigorous class in it and done some of it you could easily create a grammar that looks great and works great and then years latter is found to have a serious problem. While Python was not ambiguous AFAIK, it did have grammar issues, it had enough issues that when Python 3 was created, many of the issues were fixed. So Python 3 is not backward compatible with Python 2 (differences). Yes they have made changes and libraries to make it easier to use Python 2 code with Python 3, but the point is that the grammar could have used a bit more analysis when designed.
The only reason why code should be non-deterministic is that your question has multiple answers. In that case, you'd of course want your query to have multiple solutions. Even then, however, you'd like it to not leave a choice point after the last solution, if at all possible.
Here is what I mean:
"What is the smaller of two numbers?"
min_a(A, B, B) :- B < A.
min_a(A, B, A) :- A =< B.
So now you ask, "what is the smaller of 1 and 2" and the answer you expect is "1":
?- min_a(1, 2, Min).
Min = 1.
?- min_a(2, 1, Min).
Min = 1 ; % crap...
false.
?- min_a(2, 1, 2).
false.
?- min_a(2, 1, 1).
true ; % crap...
false.
So that's not bad code but I think it's still crap. This is why, for the smaller of two numbers, you'd use something like the min() function in SWI-Prolog.
Similarly, say you want to ask, "What are the even numbers between 1 and 10"; you write the query:
?- between(1, 10, X), X rem 2 =:= 0.
X = 2 ;
X = 4 ;
X = 6 ;
X = 8 ;
X = 10.
... and that's fine, but if you then ask for the numbers that are multiple of 3, you get:
?- between(1, 10, X), X rem 3 =:= 0.
X = 3 ;
X = 6 ;
X = 9 ;
false. % crap...
The "low-hanging fruit" are the cases where you as a programmer would see that there cannot be non-determinism, but for some reason your Prolog is not able to deduce that from the code you wrote. In most cases, you can do something about it.
On to your actual question. If you can, write your code so that there is non-determinism only if there are multiple answers to the question you'll be asking. When you use a DCG for both parsing and generating, this sometimes means you end up with two code paths. It feels clumsy but it is easier to write, to read, to understand, and probably to make efficient. As a word of caution, take a look at this question. I can't know that for sure, but the problems that OP is running into are almost certainly caused by unnecessary non-determinism. What probably happens with larger inputs is that a lot of choice points are left behind, there is a lot of memory that cannot be reclaimed, a lot of processing time going into book keeping, huge solution trees being traversed only to get (as expected) no solutions.... you get the point.
For examples of what I mean, you can take a look at the implementation of library(dcg/basics) in SWI-Prolog. Pay attention to several things:
The documentation is very explicit about what is deterministic, what isn't, and how non-determinism is supposed to be useful to the client code;
The use of cuts, where necessary, to get rid of choice points that are useless;
The implementation of number//1 (towards the bottom) that can "generate extract a number".
(Hint: use the primitives in this library when you write your own parser!)
I hope you find this unnecessarily long answer useful.

Performant, idiomatic way to concatenate two chars that are not in a list into a string

I've done most of my development in C# and am just learning F#. Here's what I want to do in C#:
string AddChars(char char1, char char2) => char1.ToString() + char2.ToString();
EDIT: added ToString() method to the C# example.
I want to write the same method in F# and I don't know how to do it other than this:
let addChars char1 char2 = Char.ToString(char1) + Char.ToString(char2)
Is there a way to add concatenate these chars into a string without converting both into strings first?
Sidenote:
I also have considered making a char array and converting that into a string, but that seems similarly wasteful.
let addChars (char1:char) (char2: char) = string([|char1; char2|])
As I said in my comment, your C# code is not going to do what you want ( i.e. concatenate the characters into a string). In C#, adding a char and a char will result in an int. The reason for this is because the char type doesn't define a + operator, so C# reverts to the nearest compatable type that does, which just happens to be int. (Source)
So to accomplish this behavior, you will need to do something similar to what you are already trying to do in F#:
char a = 'a';
char b = 'b';
// This is the wrong way to concatenate chars, because the
// chars will be treated as ints and the result will be 195.
Console.WriteLine(a + b);
// These are the correct ways to concatenate characters into
// a single string. The result of all of these will be "ab".
// The third way is the recommended way as it is concise and
// involves creating the fewest temporary objects.
Console.WriteLine(a.ToString() + b.ToString());
Console.WriteLine(Char.ToString(a) + Char.ToString(b));
Console.WriteLine(new String(new[] { a, b }));
(See https://dotnetfiddle.net/aEh1FI)
F# is the same way in that concatenating two or more chars doesn't result in a String. Unlike C#, it results instead in another char, but the process is the same - the char values are treated like int and added together, and the result is the char representation of the sum.
So really, the way to concatenate chars into a String in F# is what you already have, and is the direct translation of the C# equivalent:
let a = 'a'
let b = 'b'
// This is still the wrong way (prints 'Ã')
printfn "%O" (a + b)
// These are still the right ways (prints "ab")
printfn "%O" (a.ToString() + b.ToString())
printfn "%O" (Char.ToString(a) + Char.ToString(b))
printfn "%O" (String [| a;b |]) // This is still the best way
(See https://dotnetfiddle.net/ALwI3V)
The reason the "String from char array" approach is the best way is two-fold. First, it is the most concise, since you can see that that approach offers the shortest line of code in both languages (and the difference only increases as you add more and more chars together). And second, only one temporary object is created (the array) before the final String, whereas the other two methods involve making two separate temporary String objects to feed into the final result.
(Also, I'm not sure if it works this way as the String constructors are hidden in external sources, but I imagine that the array passed into the constructor would be used as the String's backing data, so it wouldn't end up getting wasted at all.) Strings are immutable, but using the passed array directly as the created String's backing data could result in a situation where a reference to the array could be held elsewhere in the program and jeopardize the String's immutability, so this speculation wouldn't fly in practice. (Credit: #CaringDev)
Another option you could do in F# that could be more idiomatic is to use the sprintf function to combine the two characters (Credit: #rmunn):
let a = 'a'
let b = 'b'
let s = sprintf "%c%c" a b
printfn "%O" s
// Prints "ab"
(See https://dotnetfiddle.net/Pp9Tee)
A note of warning about this method, however, is that it is almost certainly going to be much slower than any of the other three methods listed above. That's because instead of processing array or String data directly, sprintf is going to be performing more advanced formatting logic on the output. (I'm not in a position where I could benchmark this myself at the moment, but plugged into #TomasPetricek's benckmarking code below, I wouldn't be surprised if you got performance hits of 10x or more.)
This might not be a big deal as for a single conversion it will still be far faster than any end-user could possibly notice, but be careful if this is going to be used in any performance-critical code.
The answer by #Abion47 already lists all the possible sensible methods I can think of. If you are interested in performance, then you can run a quick experiment using the F# Interactive #time feature:
#time
open System
open System.Text
let a = 'a'
let b = 'b'
Comparing the three methods, the one with String [| a; b |] turns out to be about twice as fast as the methods involving ToString. In practice, that's probably not a big deal unless you are doing millions of such operations (as my experiment does), but it's an interesting fact to know:
// 432ms, 468ms, 472ms
for i in 0 .. 10000000 do
let s = a.ToString() + b.ToString()
ignore s
// 396ms 440ms, 458ms
for i in 0 .. 10000000 do
let s = Char.ToString(a) + Char.ToString(b)
ignore s
// 201ms, 171ms, 170ms
for i in 0 .. 10000000 do
let s = String [| a;b |]
ignore s

Why should successive arguments involving method application be parenthesized?

Suppose the following F# function:
let f (x:int) (y:int) = 42
I suspect that the reason I need to parenthesize the arguments in example z2 below is because of type inference; my example might not be great, but it's easy to imagine how things could get very hairy:
let z1 = f 2 3
let z2 = f 2 (f 3 5)
However, the following case is less clear to me:
let rng = System.Random()
let z3 = f 1 rng.Next(5)
z3 doesn't work, with a clear error message:
error FS0597: Successive arguments should be separated by spaces or
tupled, and arguments involving function or method applications should
be parenthesized.
Fixing it is trivial (parenthesize all the things), but what I am not clear about is why such an expression is a problem. I assume this has to do with type inference again, but naively, it seems to me that here, methods having a list of arguments surrounded by a parenthesis would actually make things less potentially ambiguous. Does this have to do with the fact that rng.Next(5) is equivalent to rng.Next 5?
Can someone hint, give an example or explain why this rule is needed, or what type of problems would arise if it were not there?
I think that the problem here is that the code could be treated as:
let z3 = f 1 rng.Next (5)
This would be equivalent to omitting the parentheses and so it would be calling f with 3 arguments (the second being a function value). This sounds a bit silly, but the compiler actually does not strictly insist on having a space between parameters. For example:
let second a b = b
add 5(1) // This works fine and calls 'add 5 1'
add id(1) // error FS0597
add rng.Next(5) // error FS0597
add (rng.Next(5)) // This works fine (partial application)
I think the problem is that if you look at the sequence of the 4 examples in the above snippet, it is not clear which behavior should you get in the second and the third case.
The call rng.Next(5) is still treated in a special way, because F# allows you to chain calls if they are formed by single-parameter application without space. For example rng.Next(5).ToString(). But, for example, writing second(1)(2) is allowed, but second(1)(2).ToString() will not work.

Writing F# code to parse "2 + 2" into code

Extremely just-started-yesterday new to F#.
What I want: To write code that parses the string "2 + 2" into (using as an example code from the tutorial project) Expr.Add(Expr.Num 2, Expr.Num 2) for evaluation. Some help to at least point me in the right direction or tell me it's too complex for my first F# project. (This is how I learn things: By bashing my head against stuff that's hard)
What I have: My best guess at code to extract the numbers. Probably horribly off base. Also, a lack of clue.
let script = "2 + 2";
let rec scriptParse xs =
match xs with
| [] -> (double)0
| y::ys -> (double)y
let split = (script.Split([|' '|]))
let f x = (split[x]) // "This code is not a function and cannot be applied."
let list = [ for x in 0..script.Length -> f x ]
let result = scriptParse
Thanks.
The immediate issue that you're running into is that split is an array of strings. To access an element of this array, the syntax is split.[x], not split[x] (which would apply split to the singleton list [x], assuming it were a function).
Here are a few other issues:
Your definition of list is probably wrong: x ranges up to the length of script, not the length of the array split. If you want to convert an array or other sequence to a list you can just use List.ofSeq or Seq.toList instead of an explicit list comprehension [...].
Your "casts" to double are a bit odd - that's not the right syntax for performing conversions in F#, although it will work in this case. double is a function, so the parentheses are unnecessary and what you are doing is really calling double 0 and double y. You should just use 0.0 for the first case, and in the second case, it's unclear what you are converting from.
In general, it would probably be better to do a bit more design up front to decide what your overall strategy will be, since it's not clear to me that you'll be able to piece together a working parser based on your current approach. There are several well known techniques for writing a parser - are you trying to use a particular approach?

Resources