In SynExpr.Lambda, what is the usage of args/body vs. parsedData? - f#

In FSharp.Compiler.Service, SynExpr.Lambda is defined like this:
/// First bool indicates if lambda originates from a method. Patterns here are always "simple"
/// Second bool indicates if this is a "later" part of an iterated sequence of lambdas
/// parsedData keeps original parsed patterns and expression,
/// prior to transforming to "simple" patterns and iterated lambdas
///
/// F# syntax: fun pat -> expr
| Lambda of
fromMethod: bool *
inLambdaSeq: bool *
args: SynSimplePats *
body: SynExpr *
parsedData: (SynPat list * SynExpr) option *
range: range *
trivia: SynExprLambdaTrivia
I am implementing an F# linter (i.e., a source code analyzer). I am confused about the usage of args/body vs. parsedData.
My understanding is that args/body is a simplified version of parsedData, and may therefore not correspond to what is in the source file. Therefore, since I am implementing a linter, I should use parsedData.
However, parsedData is option-wrapped, and I have no idea when it may be None or what I should do then.
Could anyone clear up the usage of args/body vs. parsedData in the context of implementing a linter?

Your understainding is entirely correct! The rationale for adding that field is explained in this PR: https://github.com/dotnet/fsharp/pull/10166
Keeps initially parsed lambda patterns and expressions in the syntax tree.
The parser replaces patterns and expression with iterated lambdas that
are later used by the type checker and code generation. Ideally this
conversion should be done at a later stage, so the syntax tree keeps
the actually parsed data instead. Doing so would make a lot of churn,
since at this point every other part of the compiler already assumes
to work with the iterated lambdas.
The problem with this early conversion approach is tools interested in
the syntax tree try to "reconstruct" the original lambdas, each one
using a different approach (e.g. FSharpLint, Fantomas). This may be
error-prone due to in some of the cases the compiler effectively
throws the tree data away (by stripping parens and errors in various
cases).
This PR adds another field to SynExpr.Lambda so tooling interested in
the syntax tree would be able to work with parsed data.
It does not explain what to do when the parsedData is None. Since the purpose of the parsedData is to support Fantomas etc. (not compiling code), I presume that it was made optional for easier backward compatibility.
If you are writing a linter, I would suggest using the parsedData if present and either skipping, crashing or using the other fields as a fall-back.
Indeed, skipping is what Fantomas appears to do:
| SynExpr.Lambda(_, _, _, _, Some(pats, body), _, { ArrowRange = Some mArrow }) ->
mkLambda creationAide pats mArrow body exprRange |> Expr.Lambda

Related

Extracting values from a deeply nested data structure in haskell

I've been trying to work out how to use the language-bash package to parse some simple bash scripts, and I've come across the following structure
Right (List [Statement (Last (Pipeline {timed = False, timedPosix = False, inverted = False, commands = [Command (SimpleCommand [Assign (Parameter "x" Nothing) Equals (RValue [Char '3'])] []) []]})) Sequential])
as a result of running
import Language.Bash.Parse
parse "" "x=3"
I could theoretically just pattern match the whole thing away, though I was wondering if there was a cleaner way of accessing the values of the Assign datatype ('x', (Char '3').
Is there anyway to cleanly access those values (or generally access values in a complex datastructure) without obsessive pattern matching ?
Not really.
Here's the problem. You probably want to either handle an extremely limited set of possible Bash statements, in which case just writing out the patterns for specific List values will be faster than anything else you could possibly do.
Or, you want to handle a wide variety of Bash statements, in which case you can't really avoid the functional infrastructure to handle general List values. The same way you'd write an interpreter or compiler for any complex abstract syntax tree, you'll end up more or less writing a function for every (major) type and a case for every constructor.
The main Haskell tools for dealing with big, complex data structures are:
The "functional infrastructure" described above. That is, plain old functions defined using pattern matching, that process recursive data structures in a manner that mirrors the structures themselves. Don't underestimate this approach! It may seem like a lot of work, but it's likely to lead you to a correct program that handles all well-formed inputs, in a way that ad hoc approaches won't. Start with:
{-# OPTIONS_GHC -Wall #-}
data M = ... some monad ...
data Result = ... representation of what you want to extract from the script ...
processList :: List -> M Result
...
processStatement :: Statement -> M Result
...
and go from there. The -Wall is important to get the -Wincomplete-patterns warning so you don't miss any constructors.
Lenses, which provide a more ergonomic hierarchical syntax for referring to parts of deeply nested data structures. Since bash-language doesn't provide lenses for these structures, you'd need to write them yourself. They might allow you to write something along the lines of:
lst ^. _Right.statements._head.andOr.pipeline.commands.
_head._SimpleCommand.assignments._head.parameter.base
to extract the "x" from "x=3". Obviously, that doesn't help much, but lenses complement the "functional infrastructure" approach. The code to actually process all those types is often more easily expressed with lenses than pattern matching.
Generics, which allow you to generically access certain patterns within recursive data structures, while ignoring the "rest" of the data structure that you don't care about. The bash-language library includes deriving clauses for both Data and Generic generics. If it didn't, you could use StandaloneDeriving clauses to derive them. As an example, you can use Data generics to extract all Parameters from a List, regardless of where those Parameters appear, with something like:
import Language.Bash.Parse
import Language.Bash.Word
import Data.Data
import Data.Generics.Schemes
import Data.Generics.Aliases
parameters :: (Data a) => a -> [Parameter]
parameters = everything (++) (mkQ [] (\p -> [p]))
main = do
let Right lst = parse "" "x=3; y=4; LANG=C echo $x $y"
print $ parameters lst
Here, this prints out a list of all parameters appearing in this shell "script", whether for purposes of assignment or substitution, so it includes: "x", "y", "LANG", and "x" and "y" again.
This is a powerful tool, but it's likely to be applicable to only a few specific use-cases.
Ultimately, you'll probably have to take the view that you are writing a Bash interpreter (even if your interpreter does something besides "executing" the Bash script). Someone's been nice enough to supply a Bash parser to get the input source code into an AST, but the other half of the interpreter -- the actual interpretation itself -- still needs to be written by you.

F# "reverse" exhausting match on discriminated unions?

Taking the example of writing a parser-unparser pair for a DU.
The unparser function of course uses match expression, where we can count on static exhaustiveness check if new cases are added later.
Now, for the parsing side, the obvious route is (with eg FParsec) use a choice parser with a list of the DU case parsers. This works, but does not have exhaustiveness checking.
My best solution for this, is to create a DU_Case -> parser function that uses match expression, and using runtime reflection get all DU cases, call the function with them, and pass the so generated list to the choice combinator. This works, and does have static exhaustiveness checking, BUT gets pretty clunky fast, especially if the cases have varying fields.
In other words, is there a good pattern other than using reflection for ensuring exhaustiveness when the discriminated union is the output and not the input?
Edit: example on the reflection based solution
By getting clunky, I mean that for every case that has fields, I have to also create the field values dynamically, instead of the simple Array.zeroCreate(0) which is of course always valid. But what in the else branch? There needs to be code to dynamically create the correct number of field values, with their correct (possibly complex) constructors and so on. Not impossible with reflection, but clunky.
FSharpType.GetUnionCases duType
|> Seq.map (fun duCase ->
if duCase.GetFields().Length = 0 then
let case = FSharpValue.MakeUnion(duCase, Array.zeroCreate(0))
else
(*...*)

wrap all module's functions with a decorator programmatically

There is need for tracing. The decorator should print function name, parameters values and return value. Instead of writing each time a decorator for each function, it would be terrific if it could be possible to do this programmatically.
The current function name can be discovered using reflection via MethodBase.GetCurrentMethod. Functions could be easily decorated with an inline function for logging:
let inline log args f =
let mi = System.Reflection.MethodBase.GetCurrentMethod()
let result = f ()
printf "%s %A -> %A" mi.Name args result
let add a b = log (a,b) (fun () -> a + b)
add 1 1
Which prints: add (1, 1) -> 2
EDIT: Another option would be to create a wrap function, i.e.:
let inline wrap f =
fun ps ->
let result = f args
printfn "%A -> %A" args result
result
let add (a,b) = a + b
wrap add (1,1)
However in this case there is not an easy way to programmatically retrieve the function name.
Yet another option might be to develop a Type Provider that takes an assembly path as a parameter and provides wrapped versions of all members.
I had the same desire previously and found that there are no current automated solutions for F#.
See: Converting OCaml to F#: Is there a simple way to simulate OCaml top-level #trace in F#
While OCaml's trace facility with time travel is the most useful debugging feature of comparison to what is desired, it is not an exact fit; but when I use OCaml it is the first inspection tool I use.
See: Using PostSharp with F# - Need documentation with working example
Also the suggestion of using AOP, i.e. PostSharp, was another good suggestion, as the response from Gael Fraiteur, the Principal Engineer of PostSharp points out:
PostSharp does not officially support F#.
Other than using reflection as suggested by Phillip Trelford, which I have not tried, the best solution I have found is to manually modify each function to be traced as I noted in Converting OCaml to F#: Is there a simple way to simulate OCaml top-level #trace in F# and save the results to a separate log file using NLog.
See: Using NLog with F# Interactive in Visual Studio - Need documentation
Another route to pursue would be check out the work of F# on Mono as there is a lot of work being done there to add extra tooling for use with F#.
Basically what I have found is that as my F# skills increase my need
to use a debugger or tracing decrease.
At present when I do run into a problem needing this level of inspection, adding the inspection code as noted in Converting OCaml to F#: Is there a simple way to simulate OCaml top-level #trace in F# helps to resolve my misunderstanding.
Also of note is that people who come from the C# world to the F# world tend to expect a debugger to be just as useful. Remember that imperative languages tend to be about manipulating data held in variables and that the debugger is used to inspect the values in these variables, while with functional programming, at least for me, I try to avoid mutable values and so the only values one needs to inspect are the values being passed to the function and no other values, thus reducing or obviating the need for a debugger or inspection beyond that of the function of question.

F#'s underscore: why not just create a variable name?

Reading about F# today and I'm not clear on one thing:
From: http://msdn.microsoft.com/en-us/library/dd233200.aspx
you need only one element of the tuple, the wildcard character (the underscore) can be used to avoid creating a new name for a variable that you do not need
let (a, _) = (1, 2)
I can't think of a time that I've been in this situation. Why would you avoid creating a variable name?
Because you don't need the value. I use this often. It documents the fact that a value is unused and saves naming variables unused, dummy, etc. Great feature if you ask me.
Interesting question. There are many trade-offs involved here.
Your comparisons have been with the Ruby programming language so perhaps the first trade-off you should consider is static typing. If you use the pattern x, _, _ then F# knows you are referring to the first element of a triple of exactly three elements and will enforce this constraint at compile time. Ruby cannot. F# also checks patterns for exhaustiveness and redundancy. Again, Ruby cannot.
Your comparisons have also used only flat patterns. Consider the patterns _, (x, _) or x, None | _, Some x or [] | [_] and so on. These are not so easily translated.
Finally, I'd mention that Standard ML is a programming language related to F# and it does provide operators called #1 etc. to extract the first element of a tuple with an arbitrary number of elements (see here) so this idea was implemented and discarded decades ago. I believe this is because SML's #n notation culminates in incomprehensible error messages within the constraints of the type system. For example, a function that uses #n is not making it clear what the arity of the tuple is but functions cannot be generic over tuple arity so this must result in an error message saying that you must give more type information but many users found that confusing. With the CAML/OCaml/F# approach there is no such confusion.
The let-binding you've given is an example of a language facility called pattern matching, which can be used to destructure many types, not just tuples. In pattern matches, underscores are the idiomatic way to express that you won't refer to a value.
Directly accessing the elements of a tuple can be more concise, but it's less general. Pattern matching allows you to look at the structure of some data and dispatch to an approprate handling case.
match x with
| (x, _, 20) -> x
| (_, y, _) -> y
This pattern match will return the first item in x only if the third element is 20. Otherwise it returns the second element. Once you get beyond trivial cases, the underscores are an important readability aid. Compare the above with:
match x with
| (x, y, 20) -> x
| (x, y, z) -> y
In the first code sample, it's much easier to tell which bindings you care about in the pattern.
Sometimes a method will return multiple values but the code you're writing is only interested in a select few (or one) of them. You can use multiple underscores to essentially ignore the values you don't need, rather than having a bunch of variables hanging around in local scope.

What is the order of evaluation in F#?

I was reading this and now wonder: what is the evaluation order in F#?
Obviously ; makes effects happen in a sequential fashion. But what about things like function calls or applications, order of evaluation for operators, and the like.
I've glanced at the F# spec, but there is no mention of that. Thanks for any insight!
I found some emails where we fixed the implementation to have a rigid application order. The code
open System
let f a =
Console.WriteLine "app1";
fun b ->
Console.WriteLine "app2";
()
(Console.WriteLine "f"; f) (Console.WriteLine "arg1") (Console.WriteLine "arg2")
will print "f", "arg1", "arg2", "app1", "app2". However this didn't make it into the spec. I'll file a spec bug.
(Some other portions of the spec are already more explicit, e.g.
6.9.6 Evaluating Method Applications
For elaborated applications of methods, the elaborated form of the expression will be either expr.M(args) or M(args).
The (optional) expr and args are evaluated in left-to-right order and the body of the member is evaluated in an environment with formal parameters that are mapped to corresponding argument values.
If expr evaluates to null then NullReferenceException is raised.
If the method is a virtual dispatch slot (that is, a method that is declared abstract) then the body of the member is chosen according to the dispatch maps of the value of expr.
That said, some experts believe that you will live a longer, happier life if you do not rely on evaluation order. :) )
(Possibly see also
http://blogs.msdn.com/ericlippert/archive/2009/11/19/always-write-a-spec-part-one.aspx
http://blogs.msdn.com/ericlippert/archive/2009/11/23/always-write-a-spec-part-two.aspx
for more on how easy it is to screw things up with evaluation order.)

Resources