How to guarantee referential transparency in F# applications?

How to guarantee referential transparency in F# applications? - f#

So I'm trying to learn FP and I'm trying to get my head around referential transparency and side effects.
I have learned that making all effects explicit in the type system is the only way to guarantee referential transparency:
The idea of “mostly functional programming” is unfeasible. It is impossible to make imperative
programming languages safer by only partially removing implicit side effects. Leaving one kind of effect is often enough to simulate the very effect you just tried to remove. On the other hand, allowing effects to be “forgotten” in a pure language also causes mayhem in its own way.
Unfortunately, there is no golden middle, and we are faced with a classic dichotomy: the curse of the excluded middle, which presents the choice of either (a) trying to tame effects using purity annotations, yet fully embracing the fact that your code is still fundamentally effectful; or (b) fully embracing purity by making all effects explicit in the type system and being pragmatic - Source
I have also learned that not-pure FP languages like Scala or F# cannot guarantee referential transparency:
The ability to enforce referential transparency this is pretty much incompatible with Scala's goal of having a class/object system that is interoperable with Java. - Source
And that in not-pure FP it is up to the programmer to ensure referential transparency:
In impure languages like ML, Scala or F#, it is up to the programmer to ensure referential transparency, and of course in dynamically typed languages like Clojure or Scheme, there is no static type system to enforce referential transparency. - Source
I'm interested in F# because I have a .Net background so my next questions is:
What can I do to guarantee referential transparency in an F# applications if it is not enforced by the F# compiler?

The short answer to this question is that there is no way to guarantee referential transparency in F#. One of the big advantages of F# is that it has fantastic interop with other .NET languages but the downside of this, compared to a more isolated language like Haskell, is that side-effects are there and you will have to deal with them.
How you actually deal with side effects in F# is a different question entirely.
There is actually nothing to stop you from bringing effects into the type system in F# in very much the same way as you might in Haskell although effectively you are 'opting in' to this approach rather than it being enforced upon you.
All you really need is some infrastructure like this:
/// A value of type IO<'a> represents an action which, when performed (e.g. by calling the IO.run function), does some I/O which results in a value of type 'a.
type IO<'a> =
private
|Return of 'a
|Delay of (unit -> 'a)
/// Pure IO Functions
module IO =
/// Runs the IO actions and evaluates the result
let run io =
match io with
|Return a -> a
|Delay (a) -> a()
/// Return a value as an IO action
let return' x = Return x
/// Creates an IO action from an effectful computation, this simply takes a side effecting function and brings it into IO
let fromEffectful f = Delay (f)
/// Monadic bind for IO action, this is used to combine and sequence IO actions
let bind x f =
match x with
|Return a -> f a
|Delay (g) -> Delay (fun _ -> run << f <| g())
return brings a value within IO.
fromEffectful takes a side-effecting function unit -> 'a and brings it within IO.
bind is the monadic bind function and lets you sequence effects.
run runs the IO to perform all of the enclosed effects. This is like unsafePerformIO in Haskell.
You could then define a computation expression builder using these primitive functions and give yourself lots of nice syntactic sugar.
Another worthwhile question to ask is, is this useful in F#?
A fundamental difference between F# and Haskell is that F# is an eager by default language while Haskell is lazy by default. The Haskell community (and I suspect the .NET community, to a lesser extent) has learnt that when you combine lazy evaluation and side-effects/IO, very bad things can happen.
When you work in the IO monad in Haskell, you are (generally) guaranteeing something about the sequential nature of IO and ensuring that one piece of IO is done before another. You are also guaranteeing something about how often and when effects can occur.
One example I like to pose in F# is this one:
let randomSeq = Seq.init 4 (fun _ -> rnd.Next())
let sortedSeq = Seq.sort randomSeq
printfn "Sorted: %A" sortedSeq
printfn "Random: %A" randomSeq
At first glance, this code might appear to generate a sequence, sort the same sequence and then print the sorted and unsorted versions.
It doesn't. It generates two sequences, one of which is sorted and one of which isn't. They can, and almost certainly do, have completely distinct values.
This is a direct consequence of combining side effects and lazy evaluation without referential transparency. You could gain back some control by using Seq.cache which prevents repeat evaluation but still doesn't give you control over when, and in what order, effects occur.
By contrast, when you're working with eagerly evaluated data structures, the consequences are generally less insidious so I think the requirement for explicit effects in F# is vastly reduced compared to Haskell.
That said, a large advantage of making all effects explicit within the type system is that it helps to enforce good design. The likes of Mark Seemann will tell you that the best strategy for designing robust a system, whether it's object oriented or functional, involves isolating side-effects at the edge of your system and relying on a referentially transparent, highly unit-testable, core.
If you are working with explicit effects and IO in the type system and all of your functions are ending up being written in IO, that's a strong and obvious design smell.
Going back to the original question of whether this is worthwhile in F# though, I still have to answer with a "I don't know". I have been working on a library for referentially transparent effects in F# to explore this possibility myself. There is more material there on this subject as well as a much fuller implementation of IO there, if you are interested.
Finally, I think it's worth remembering that the Curse of the Excluded Middle is probably targeted at programming language designers more than your typical developer.
If you are working in an impure language, you will need to find a way of coping with and taming your side effects, the precise strategy which you follow to do this is open to interpretation and what best suits the needs of yourself and/or your team but I think that F# gives you plenty of tools to do this.
Finally, my pragmatic and experienced view of F# tells me that actually, "mostly functional" programming is still a big improvement over its competition almost all of the time.

I think you need to read the source article in an appropriate context - it is an opinion piece coming from a specific perspective and it is intentionally provocative - but it is not a hard fact.
If you are using F#, you will get referential transparency by writing good code. That means writing most logic as a sequence of transformations and performing effects to read the data before running the transformations & running effects to write the results somewhere after. (Not all programs fit into this pattern, but those that can be written in a referentially transparent way generally do.)
In my experience, you can live perfectly happily in the "middle". That means, write referentially transparent code most of the time, but break the rules when you need to for some practical reason.
To respond to some of the specific points in the quotes:
It is impossible to make imperative programming languages safer by only partially removing implicit side effects.
I would agree it is impossible to make them "safe" (if by safe we mean they have no side-effects), but you can make them safer by removing some side effects.
Leaving one kind of effect is often enough to simulate the very effect you just tried to remove.
Yes, but simulating effect to provide theoretical proof is not what programmers do. If it is sufficiently discouraged to achieve the effect, you'll tend to write code in other (safer) ways.
I have also learned that not-pure FP languages like Scala or F# cannot guarantee referential transparency:
Yes, that's true - but "referential transparency" is not what functional programming is about. For me, it is about having better ways to model my domain and having tools (like the type system) that guide me along the "happy path". Referential transparency is one part of that, but it is not a silver bullet. Referential transparency is not going to magically solve all your problems.

Like Mark Seemann has confirmed in the comments "Nothing in F# can guarantee referential transparency. It's up to the programmer to think about this."
I have been doing some search online and I found that "discipline is your best friend" and some recommendations to try to keep the level of referential transparency in your F# applications as high as possible:
Don't use mutable, for or while loops, ref keywords, etc.
Stick with purely immutable data structures (discriminated union, list, tuple, map, etc).
If you need to do IO at some point, architect your program so that they are separated from your purely functional code. Don't forget functional programming is all about limiting and isolating side-effects.
Algebraic data types (ADT) AKA "discriminated unions" instead of objects.
Learning to love laziness.
Embracing the Monad.

Related

Why are mutables allowed in F#? When are they essential?

Coming from C#, trying to get my head around the language.
From what I understand one of the main benefits of F# is that you ditch the concept of state, which should (in many cases) make things much more robust.
If this is the case (and correct me if it's not), why allow us to break this principle with mutables? To me it feels like it they don't belong in the language. I understand you don't have to use them, but it gives you the tools to go off track and think in an OOP manner.
Can anyone provide an example of where a mutable value is essential?

Current compilers for declarative (stateless) code are not very smart. This results in lots of memory allocations and copy operations, which are rather expensive. Mutating some property of an object allows to re-use the object in its new state, which is much faster.
Imagine you make a game with 10000 units moving around at 60 ticks a second. You can do this in F#, including collisions with a mutable quad- or octree, on a single CPU core.
Now imagine the units and quadtree are immutable. The compiler would have no better idea than to allocate and construct 600000 units per second and create 60 new trees per second. And this excludes any changes in other management structures. In a real-world use case with complex units, this kind of solution will be too slow.
F# is a multi-paradigm language that enables the programmer to write functional, object-oriented, and, to an extent, imperative programs. Currently, each variant has its valid uses. Maybe, at some point in the future, better compilers will allow for better optimization of declarative programs, but right now, we have to fall back to imperative programming when performance becomes an issue.

Having the ability to use mutable state is often important for performance reasons, among other things.
Consider implementing the API List.take: count : int -> list : 'a list -> 'a list which returns a list consisting of only the first count elements from the input list.
If you are bound by immutability, Lists can only be built up back-to-front. Implementing take then boils down to
Build up result list back-to-front with first count guys from input: O(count)
Reverse that result and return O(count)
The F# runtime, for performance reasons, has the magic special ability to build Lists front-to-back when needed (i.e. to mutate the tail of the last guy to point to a new tail element). The basic algorithm used for List.take is:
Build up result list front-to-back with first count guys from input: O(count)
Return the result
Same asymptotic performance, but in practical terms it's twice as fast to use mutation in this case.
Pervasive mutable state can be a nightmare as code is difficult to reason about. But if you factor your code so that mutable state is tightly encapsulated (e.g. in implementation details of List.take), then you can enjoy its benefits where it makes sense. So making immutability the default, but still allowing mutability, is a very practical and useful feature of the language.

First of all, what makes F# powerful is, in my opinion, not just the immutability by default, but a whole mix of features like: immutability by default, type inference, lightweight syntax, sum (DUs) and product types (tuples), pattern matching and currying by default. Possibly more.
These make F# very functional by default and they make you program in a certain way. In particular they make you feel uncomfortable when you use mutable state, as it requires the mutable keyword. Uncomfortable in this sense means more careful. And that is exactly what you should be.
Mutable state is not forbidden or evil per se, but it should be controlled. The need to explicitly use mutable is like a warning sign making you aware of danger. And good ways how to control it, is using it internally within a function. That way you can have your own internal mutable state and still be perfectly thread-safe because you don't have shared mutable state. In fact, your function can still be referentially transparent even if it uses mutable state internally.
As for why F# allows mutable state; it would be very difficult to write usual real-world code without the possibility for it. For instance in Haskell, something like a random number cannot be done in the same way as it can be done in F#, but rather needs threading through the state explicitly.
When I write applications, I tend to have about 95% of the code base in a very functional style that would be pretty much 1:1 portable to say Haskell without any trouble. But then at the system boundaries or at some performance-critical inner loop mutable state is used. That way you get the best of both worlds.

F# tail recursion and why not write a while loop?

I'm learning F# (new to functional programming in general though used functional aspects of C# for years but let's face it, that's pretty different) and one of the things that I've read is that the F# compiler identifies tail recursion and compiles it into a while loop (see http://thevalerios.net/matt/2009/01/recursion-in-f-and-the-tail-recursion-police/).
What I don't understand is why you would write a recursive function instead of a while loop if that's what it's going to turn into anyway. Especially considering that you need to do some extra work to make your function recursive.
I have a feeling someone might say that the while loop is not particularly functional and you want to act all functional and whatnot so you use recursion but then why is it sufficient for the compiler to turn it into a while loop?
Can someone explain this to me?

You could use the same argument for any transformation that the compiler performs. For instance, when you're using C#, do you ever use lambda expressions or anonymous delegates? If the compiler is just going to turn those into classes and (non-anonymous) delegates, then why not just use those constructions yourself? Likewise, do you ever use iterator blocks? If the compiler is just going to turn those into state machines which explicitly implement IEnumerable<T>, then why not just write that code yourself? Or if the C# compiler is just going to emit IL anyway, why bother writing C# instead of IL in the first place? And so on.
One obvious answer to all of these questions is that we want to write code which allows us to express ourselves clearly. Likewise, there are many algorithms which are naturally recursive, and so writing recursive functions will often lead to a clear expression of those algorithms. In particular, it is arguably easier to reason about the termination of a recursive algorithm than a while loop in many cases (e.g. is there a clear base case, and does each recursive call make the problem "smaller"?).
However, since we're writing code and not mathematics papers, it's also nice to have software which meets certain real-world performance criteria (such as the ability to handle large inputs without overflowing the stack). Therefore, the fact that tail recursion is converted into the equivalent of while loops is critical for being able to use recursive formulations of algorithms.

A recursive function is often the most natural way to work with certain data structures (such as trees and F# lists). If the compiler wants to transform my natural, intuitive code into an awkward while loop for performance reasons that's fine, but why would I want to write that myself?
Also, Brian's answer to a related question is relevant here. Higher-order functions can often replace both loops and recursive functions in your code.

The fact that F# performs tail optimization is just an implementation detail that allows you to use tail recursion with the same efficiency (and no fear of a stack overflow) as a while loop. But it is just that - an implementation detail - on the surface your algorithm is still recursive and is structured that way, which for many algorithms is the most logical, functional way to represent it.
The same applies to some of the list handling internals as well in F# - internally mutation is used for a more efficient implementation of list manipulation, but this fact is hidden from the programmer.
What it comes down to is how the language allows you to describe and implement your algorithm, not what mechanics are used under the hood to make it happen.

A while loop is imperative by its nature. Most of the time, when using while loops, you will find yourself writing code like this:
let mutable x = ...
...
while someCond do
...
x <- ...
This pattern is common in imperative languages like C, C++ or C#, but not so common in functional languages.
As the other posters have said some data structures, more exactly recursive data structures, lend themselves to recursive processing. Since the most common data structure in functional languages is by far the singly linked list, solving problems by using lists and recursive functions is a common practice.
Another argument in favor of recursive solutions is the tight relation between recursion and induction. Using a recursive solution allows the programmer to think about the problem inductively, which arguably helps in solving it.
Again, as other posters said, the fact that the compiler optimizes tail-recursive functions (obviously, not all functions can benefit from tail-call optimization) is an implementation detail which lets your recursive algorithm run in constant space.

What are advantages and disadvantages of "point free" style in functional programming?

I know that in some languages (Haskell?) the striving is to achieve point-free style, or to never explicitly refer to function arguments by name. This is a very difficult concept for me to master, but it might help me to understand what the advantages (or maybe even disadvantages) of that style are. Can anyone explain?

The point-free style is considered by some author as the ultimate functional programming style. To put things simply, a function of type t1 -> t2 describes a transformation from one element of type t1 into another element of type t2. The idea is that "pointful" functions (written using variables) emphasize elements (when you write \x -> ... x ..., you're describing what's happening to the element x), while "point-free" functions (expressed without using variables) emphasize the transformation itself, as a composition of simpler transforms. Advocates of the point-free style argue that transformations should indeed be the central concept, and that the pointful notation, while easy to use, distracts us from this noble ideal.
Point-free functional programming has been available for a very long time. It was already known by logicians which have studied combinatory logic since the seminal work by Moses Schönfinkel in 1924, and has been the basis for the first study on what would become ML type inference by Robert Feys and Haskell Curry in the 1950s.
The idea to build functions from an expressive set of basic combinators is very appealing and has been applied in various domains, such as the array-manipulation languages derived from APL, or the parser combinator libraries such as Haskell's Parsec. A notable advocate of point-free programming is John Backus. In his 1978 speech "Can Programming Be Liberated From the Von Neumann Style ?", he wrote:
The lambda expression (with its substitution rules) is capable of
defining all possible computable functions of all possible types
and of any number of arguments. This freedom and power has its
disadvantages as well as its obvious advantages. It is analogous
to the power of unrestricted control statements in conventional
languages: with unrestricted freedom comes chaos. If one
constantly invents new combining forms to suit the occasion, as
one can in the lambda calculus, one will not become familiar with
the style or useful properties of the few combining forms that
are adequate for all purposes. Just as structured programming
eschews many control statements to obtain programs with simpler
structure, better properties, and uniform methods for
understanding their behavior, so functional programming eschews
the lambda expression, substitution, and multiple function
types. It thereby achieves programs built with familiar
functional forms with known useful properties. These programs are
so structured that their behavior can often be understood and
proven by mechanical use of algebraic techniques similar to those
used in solving high school algebra problems.
So here they are. The main advantage of point-free programming are that they force a structured combinator style which makes equational reasoning natural. Equational reasoning has been particularly advertised by the proponents of the "Squiggol" movement (see [1] [2]), and indeed use a fair share of point-free combinators and computation/rewriting/reasoning rules.
[1] "An introduction to the Bird-Merteens Formalism", Jeremy Gibbons, 1994
[2] "Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire", Erik Meijer, Maarten Fokkinga and Ross Paterson, 1991
Finally, one cause for the popularity of point-free programming among Haskellites is its relation to category theory. In category theory, morphisms (which could be seen as "transformations between objects") are the basic object of study and computation. While partial results allow reasoning in specific categories to be performed in a pointful style, the common way to build, examine and manipulate arrows is still the point-free style, and other syntaxes such as string diagrams also exhibit this "pointfreeness". There are rather tight links between the people advocating "algebra of programming" methods and users of categories in programming (for example the authors of the banana paper [2] are/were hardcore categorists).
You may be interested in the Pointfree page of the Haskell wiki.
The downside of pointfree style is rather obvious: it can be a real pain to read. The reason why we still love to use variables, despite the numerous horrors of shadowing, alpha-equivalence etc., is that it's a notation that's just so natural to read and think about. The general idea is that a complex function (in a referentially transparent language) is like a complex plumbing system: the inputs are the parameters, they get into some pipes, are applied to inner functions, duplicated (\x -> (x,x)) or forgotten (\x -> (), pipe leading nowhere), etc. And the variable notation is nicely implicit about all that machinery: you give a name to the input, and names on the outputs (or auxiliary computations), but you don't have to describe all the plumbing plan, where the small pipes will go not to be a hindrance for the bigger ones, etc. The amount of plumbing inside something as short as \(f,x,y) -> ((x,y), f x y) is amazing. You may follow each variable individually, or read each intermediate plumbing node, but you never have to see the whole machinery together. When you use a point-free style, all the plumbing is explicit, you have to write everything down, and look at it afterwards, and sometimes it's just plain ugly.
PS: this plumbing vision is closely related to the stack programming languages, which are probably the least pointful programming languages (barely) in use. I would recommend trying to do some programming in them just to get of feeling of it (as I would recommend logic programming). See Factor, Cat or the venerable Forth.

I believe the purpose is to be succinct and to express pipelined computations as a composition of functions rather than thinking of threading arguments through. Simple example (in F#) - given:
let sum = List.sum
let sqr = List.map (fun x -> x * x)
Used like:
> sum [3;4;5]
12
> sqr [3;4;5]
[9;16;25]
We could express a "sum of squares" function as:
let sumsqr x = sum (sqr x)
And use like:
> sumsqr [3;4;5]
50
Or we could define it by piping x through:
let sumsqr x = x |> sqr |> sum
Written this way, it's obvious that x is being passed in only to be "threaded" through a sequence of functions. Direct composition looks much nicer:
let sumsqr = sqr >> sum
This is more concise and it's a different way of thinking of what we're doing; composing functions rather than imagining the process of arguments flowing through. We're not describing how sumsqr works. We're describing what it is.
PS: An interesting way to get your head around composition is to try programming in a concatenative language such as Forth, Joy, Factor, etc. These can be thought of as being nothing but composition (Forth : sumsqr sqr sum ;) in which the space between words is the composition operator.
PPS: Perhaps others could comment on the performance differences. It seems to me that composition may reduce GC pressure by making it more obvious to the compiler that there is no need to produce intermediate values as in pipelining; helping make the so-called "deforestation" problem more tractable.

While I'm attracted to the point-free concept and used it for some things, and agree with all the positives said before, I found these things with it as negative (some are detailed above):
The shorter notation reduces redundancy; in a heavily structured composition (ramda.js style, or point-free in Haskell, or whatever concatenative language) the code reading is more complex than linearly scanning through a bunch of const bindings and using a symbol highlighter to see which binding goes into what other downstream calculation. Besides the tree vs linear structure, the loss of descriptive symbol names makes the function hard to intuitively grasp. Of course both the tree structure and the loss of named bindings also have a lot of positives as well, for example, functions will feel more general - not bound to some application domain via the chosen symbol names - and the tree structure is semantically present even if bindings are laid out, and can be comprehended sequentially (lisp let/let* style).
Point-free is simplest when just piping through or composing a series of functions, as this also results in a linear structure that we humans find easy to follow. However, threading some interim calculation through multiple recipients is tedious. There are all kinds of wrapping into tuples, lensing and other painstaking mechanisms go into just making some calculation accessible, that would otherwise be just the multiple use of some value binding. Of course the repeated part can be extracted out as a separate function and maybe it's a good idea anyway, but there are also arguments for some non-short functions and even if it's extracted, its arguments will have to be somehow threaded through both applications, and then there may be a need for memoizing the function to not actually repeat the calculation. One will use a lot of converge, lens, memoize, useWidth etc.
JavaScript specific: harder to casually debug. With a linear flow of let bindings, it's easy to add a breakpoint wherever. With the point-free style, even if a breakpoint is somehow added, the value flow is hard to read, eg. you can't just query or hover over some variable in the dev console. Also, as point-free is not native in JS, library functions of ramda.js or similar will obscure the stack quite a bit, especially with the obligate currying.
Code brittleness, especially on nontrivial size systems and in production. If a new piece of requirement comes in, then the above disadvantages get into play (eg. harder to read the code for the next maintainer who may be yourself a few weeks down the line, and also harder to trace the dataflow for inspection). But most importantly, even something seemingly small and innocent new requirement can necessitate a whole different structuring of the code. It may be argued that it's a good thing in that it'll be a crystal clear representation of the new thing, but rewriting large swaths of point-free code is very time consuming and then we haven't mentioned testing. So it feels that the looser, less structured, lexical assignment based coding can be more quickly repurposed. Especially if the coding is exploratory, and in the domain of human data with weird conventions (time etc.) that can rarely be captured 100% accurately and there may always be an upcoming request for handling something more accurately or more to the needs of the customer, whichever method leads to faster pivoting matters a lot.

To the pointfree variant, the concatenative programming language, i have to write:
I had a little experience with Joy. Joy is a very simple and beautiful concept with lists. When converting a problem into a Joy function, you have to split your brain into a part for the stack plumbing work and a part for the solution in the Joy syntax. The stack is always handled from the back. Since the composition is contained in Joy, there is no computing time for a composition combiner.

Does functional programming avoid state?

According to wikipedia: functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data. (emphasis mine).
Is this really true? My personal understanding is that it makes the state more explicit, in the sense that programming is essentially applying functions (transforms) to a given state to get a transformed state. In particular, constructs like monads let you carry the state explicitly through functions. I also don't think that any programming paradigm can avoid state altogether.
So, is the Wikipedia definition right or wrong? And if it is wrong what is a better way to define functional programming?
Edit: I guess a central point in this question is what is state? Do you understand state to be variables or object attributes (mutable data) or is immutable data also state? To take an example (in F#):
let x = 3
let double n = 2 * n
let y = double x
printfn "%A" y
would you say this snippet contains a state or not?
Edit 2: Thanks to everyone for participating. I now understand the issue to be more of a linguistic discrepancy, with the use of the word state differing from one community to the other, as Brian mentions in his comment. In particular, many in the functional programming community (mainly Haskellers) interpret state to carry some state of dynamism like a signal varying with time. Other uses of state in things like finite state machine, Representational State Transfer, and stateless network protocols may mean different things.

I think you're just using the term "state" in an unusual way. If you consider adding 1 and 1 to get 2 as being stateful, then you could say functional programming embraces state. But when most people say "state," they mean storing and changing values, such that calling a function might leave things different than before the function was called, and calling the function a second time with the same input might not have the same result.
Essentially, stateless computations are things like this:
The result of 1 + 1
The string consisting of 's' prepended to "pool"
The area of a given rectangle
Stateful computations are things like this:
Increment a counter
Remove an element from the array
Set the width of a rectangle to be twice what it is now

Rather than avoids state, think of it like this:
It avoids changing state. That word "mutable".
Think in terms of a C# or Java object. Normally you would call a method on the object and you might expect it to modify its internal state as a result of that method call.
With functional programming, you still have data, but it's just passed through each function, creating output that corresponds to the input and the operation.
At least in theory. In reality, not everything you do actually works functionally so you frequently end up hiding state to make things work.
Edit:
Of course, hiding state also frequently leads to some spectacular bugs, which is why you should only use functional programming languages for purely functional situations. I've found that the best languages are the ones that are object oriented and functional, like Python or C#, giving you the best of both worlds and the freedom to move between each as necessary.

The Wikipedia definition is right. Functional programming avoids state. Functional programming means that you apply a function to a given input and get a result. However, it is guaranteed that your input will not be modified in any way. It doesn't mean that you cannot have a state at all. Monads are perfect example of that.

The Wikipedia definition is correct. It may seem puzzling at first, but if you start working with say Haskell, you'll note that you don't have any variables laying around containing values.
The state can still be sort of represented using state monads.

At some point, all apps must have state (some data that changes). Otherwise, you would start the app up, it wouldn't be able to do anything, and the app would finish.
You can have immutable data structures as part of your state, but you will swap those for other instances of other immutable data structures.
The point of immutability is not that data does not change over time. The point is that you can create pure functions that don't have side effects.

Could the compiler automatically parallelize code that support parallelization?

Instead of adding the PLinq extension statement AsParallel() manually could not the compiler figure this out for us auto-magically? Are there any examples where you specifically do not want parallelization if the code supports it?

I do research in the area of automatic parallelization. This is a very tricky topic that is the subject of many Ph.D. dissertations. Static code analysis and automatic parallelization has been done with great success for languages such as Fortran, but there are limits to the compiler's ability to analyze inter-regional dependencies. Most people would not be willing to sacrifice the guarantee of code correctness in exchange for potential parallel performance gains, and so the compiler must be quite conservative in where it inserts parallel markers.
Bottom line: yes, a compiler can parallelize code. But a human can often parallelize it better, and having the compiler figure out where to put the markers can be very, very, very tricky. There are dozens of research papers available on the subject, such as the background work for the Mitosis parallelizing compiler or the D-MPI work.

Auto parallelization is trickier than it might initially appear. It's that "if the code supports it" part that gets you. Consider something like
counter = 0
f(i)
{
counter = counter + 1
return i + counter
}
result = map(f, {1,2,3,4})
If the compiler just decides to parallelize map here, you could get different results on each run of the program. Granted, it is obvious that f doesn't actually support being used in this manner because it has a global variable. However if f is in a different assembly the compiler can't know that it is unsafe to parallelize it. It could introspect the assembly f is in at runtime and decide then, but then it becomes a question of "is the introspection fast enough and the dynamic parallelization fast enough to not negate the benefits from doing this?" And sometimes it may not be possible to introspect, f could be a P/Invoked function, that might actually be perfectly suited to parallelization, but since the runtime/compiler has no way of knowing it has to assume it can't be. This is just the tip of the iceberg.
In short, it is possible, but difficult, so the trade-off between implementing the magic and the benefits of the magic were likely skewed too far in the wrong direction.

The good news is that the compiler researchers say that we are really close to having automatically parallelizing compilers. The bad news is that they have been saying that for fifty years.
The problem with impure languages such as C# is usually that there is not enough parallelism. In an impure language, it is very hard for a human and pretty much impossible for a program to figure out if and how two pieces of code interact with each other or with the environment.
In a pure, referentially transparent, language, you have the opposite problem: everything is parallelizable, but it usually doesn't make sense, because scheduling a thread takes way longer than just simply executing the damn thing.
For example, in a pure, referentially transparent, functional language, if I have something like this:
if a <= b + c
foo
else
bar
end
I could fire up five threads and compute a, b, c, foo and bar in parallel, then I compute the + and then the <= and lastly, I compute the if, which simply means throwing away the result of either foo or bar, which both have already been computed. (Note how this depends on being functional: in an impure language, you cannot simply compute both branches of an if and then throw one away. What if both print something? How would you "unprint" that?)
But if a, b, c, foo and bar are really cheap, then the overhead of those five threads will be far greater than the computations themselves.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart