Why are compilers non-reentrant in the first place? - parsing

I'm reading some yacc- and lex-related stuff and some other compiler implementations, and it seems like they all use global state, which makes them really unsafe to use in multithreaded situations and hence hard to embed in other programs. I know GNU Bison and Flex can be made reentrant, but why is that not on by default?

Because when the interface for Lex and Yacc was defined, many many many years ago, the use of globals was much more common. Reentrancy changes the interface, and the reentrant interfaces have never been formally standardised (which is probably just as well, given the state of play). At the time, multithreading was not very common, largely because a typical computer just barely had the resources to do one compilation (and sometimes not even that; it was also pretty common for compilation passes to be sequentially loaded executables).
So the default continues to be the non-reentrant, standardised interface. And it probably will remain that way, whether or not we like it.


If Lisp is the perfect language, why are there so many? [duplicate]

Possible Duplicate:
Why is the Lisp community so fragmented?
Despite the snarky tone, I'm actually looking for a serious answer.
I know the textbook response: Lisp is a model for computation, not a "language" per se. Still, why exactly are there so many different dialects of Lisp?
Presumably it isn't because of surface syntax issues or crucial missing features, the way it is with so many other languages. But if not that, then what?
Do they interpret that model of computation in slightly different ways? Are they pursuing different simplicity versus efficiency tradeoffs? Is it because of limitations in different compiler/interpreter codebases? Or something else entirely that non-Lispers like myself can't even imagine?
I suppose the followup question would be: if the differences matter, which is the best modern Lisp for real-world usage?
Thanks,
Dr. Ernie
There are a number of reasons for the many dialects of Lisp, some historical, some technical, and some mostly psychological.
Historical: By classical standards, Lisp was fairly slow and used lots of memory. Quite a few people have devised various techniques (or corruptions, if you don't like them) to try to make it more practical. This was especially true when Lisp machines were being built -- the hardware was devised specifically to run Lisp, and at the same time, the Lisp they ran was devised (revised?) specifically to run on that hardware and to take full advantage of its capabilities.
Technical: Some decisions that have been made at times in Lisp were questionable (to put it nicely). For example, all modern Lisps use lexical scoping, but quite a few early ones used dynamic scoping. Some Scheme users don't think much of the non-hygienic macros in most other Lisp dialects.
Psychological: Lisp is so simple that many people have felt qualified to write their own implementations. Many Lisp programmers are also fond of experimentation and pursuing perfection, so many of those implementations included the implementor's ideas of improvements of various (usually incompatible) kinds. Nobody was coordinating efforts, so many of those extensions/changes were incompatible with each other in various ways, and each became a (more or less) distinct dialect. Some of this was probably avoidable, but some of it wasn't -- just for example, two people might see a particular feature as flawed. One would work at improving it into something he found more acceptable, while another would remove it completely and either consider that an improvement in itself or devise something completely different to replace it.
Poor communication also often played a role. Somebody at (say) MIT might go somewhere on sabbatical and take along a tape of some Lisp implementation, which would start to be used wherever they went. That would often (quite unintentionally) fork the implementation, as the two schools worked independently in parallel.
Because it's a mudball: it gets picked up and shaped, but it still remains what it was.
Common Lisp is a standard, and there are many implementations of it. It shares that trait with many other languages like C, C++, Fortran, Ada, etc. If you look around, you will find that there are many implementations of all these languages, with slightly different flags, options, and support for the corner cases.
Many other languages that are common today were not standards (at least to begin with) and had one canonical implementation/compiler/interpreter. I am thinking of languages like Perl, Python, Java, .NET, Ruby, etc. There may be some offshoots of these languages, and ports to new platforms... but overall the syntax and usage of the language is always referenced against the one true implementation.
I use GNU CLISP for my work... I chose it because it is free, available for the platforms I am interested in, reasonably well documented, and appears to be robust, mature, and complete (at least in terms of the ANSI Common Lisp standard). You may have very different requirements for your Lisp environment, and those may lead you to a different choice of implementation.

Why is F#'s type inference so fickle?

The F# compiler appears to perform type inference in a (fairly) strict top-to-bottom, left-to-right fashion. This means that you must put all definitions before their uses, that the order of file compilation is significant, and that you tend to need to rearrange code (via |> or what have you) to avoid explicit type annotations.
How hard is it to make this more flexible, and is that planned for a future version of F#? Obviously it can be done, since Haskell (for example) has no such limitations with equally powerful inference. Is there anything inherently different about the design or ideology of F# that is causing this?
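To make the problem concrete, here is a minimal illustration (the list and names are invented for the example):

let names = [ "one"; "two"; "three" ]

// Error: "Lookup on object of indeterminate type" - the lambda is checked
// before the compiler reaches 'names', so the type of x is still unknown here.
// let lengths = List.map (fun x -> x.Length) names

// Works: piping makes 'names' flow in from the left, so x is known to be a
// string by the time the lambda body is checked.
let lengths = names |> List.map (fun x -> x.Length)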
Regarding "Haskell's equally powerful inference", I don't think Haskell has to deal with
OO-style dynamic subtyping (type classes can do some similar stuff, but type classes are easier to type/infer)
method overloading (type classes can do some similar stuff, but type classes are easier to type/infer)
That is, I think F# has to deal with some hard stuff that Haskell does not. (Almost certainly, Haskell has to deal with some hard stuff that F# does not.)
As is mentioned by other answers, most of the major .NET languages have the Visual Studio tooling as a major language design influence (see e.g. how LINQ has "from ... select" rather than the SQL-y "select... from", motivated by getting intellisense from a program prefix). Intellisense, error squiggles, and error-message comprehensibility are all tooling factors that inform the F# design.
It may well be possible to do better and infer more (without sacrificing on other experiences), but I don't think it's among our high priorities for future versions of the language. (The Haskellers may see F# type inference as somewhat weak, but they are probably outnumbered by the C#ers who see F# type inference as very strong. :) )
It might also be hard to extend the type inference in a non-breaking fashion; it is ok to change illegal programs into legal ones in a future version, but you have to be very careful to ensure previously-legal programs do not change semantics under new inference rules, and name resolution (an awful nightmare in every language) is likely to interact with type-inference-changes in surprising ways.
I think that the algorithm used by F# has the benefit that it is easy to (at least roughly) explain how it works, so once you understand it, you can have some expectations about the result.
The algorithm will always have some limitations. Currently, it is quite easy to understand them. For more complicated algorithms, this could be difficult. For example, I think you could run into situations where you think that the algorithm should be able to deduce something - but if it were general enough to cover that case, it would be undecidable (e.g. it could keep looping forever).
Another thought on this is that checking the code from the top to the bottom corresponds to how we read code (at least sometimes). So, maybe the fact that we tend to write the code in a way that enables type-inference also makes the code more readable for people...
F# uses one-pass compilation such that you can only reference types or functions which have been defined either earlier in the file you're currently in or appear in a file which is specified earlier in the compilation order.

I recently asked Don Syme about making multiple source passes to improve the type inference process. His reply was: "Yes, it's possible to do multi-pass type inference. There are also single-pass variations that generate a finite set of constraints. However these approaches tend to give bad error messages and poor intellisense results in a visual editor."
http://www.markhneedham.com/blog/2009/05/02/f-stuff-i-get-confused-about/#comment-16153
The short answer is that F# is based on the tradition of SML and OCaml, whereas Haskell comes from a slightly different world of Miranda, Gofer, and the like. The differences in historical tradition are subtle, but pervasive. This distinction is paralleled in other modern languages too, such as the ML-like Coq which has the same ordering restrictions vs the Haskell-like Agda which doesn't.
This difference is related to lazy vs. strict evaluation. The Haskell side of the universe believes in laziness, and once you already believe in laziness, the idea of adding laziness to things like type inference is a no-brainer. Whereas in the ML side of the universe, whenever laziness or mutual recursion is necessary it must be explicitly noted by the use of keywords like and, rec, lazy, etc. I prefer the Haskell approach because it results in less boilerplate code, but there are a lot of folks who think it's better to make these things explicit.
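For instance, here is a minimal sketch of the explicitness being described: in F# (as in other MLs), self-reference and mutual recursion must be opted into with rec and and:

// Without 'rec', a function cannot refer to itself; without 'and', these two
// functions could not refer to each other, since isOdd is defined later.
let rec isEven n = if n = 0 then true else isOdd (n - 1)
and isOdd n = if n = 0 then false else isEven (n - 1)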

Grammar/hand-written parser?

I'm doing some small projects which involve having different syntaxes for something; however, sometimes these syntaxes are so simple that using a parser generator might be overkill.
Now, when should I use a hand-made parser, and when should I use a parser generator?
Thanks,
William van Doorn
There is no hard-and-fast answer, other than "use whatever is easiest for the particular situation".
My experience is that parsers tend to get more complicated over their lifetimes, so using a parser generator up front usually pays off. Even if the language doesn't get more complicated, using a generator forces you to create a formal specification of the syntax, which is itself valuable.
The downsides are that other programmers may not know how to use the generator, so it makes it difficult for others to help out, and it makes your project dependent on that generator.
It's worth coding the parser by hand if, and only if, you're super-keen to have it be extremely fast even on a machine of very modest speed. For example, in this article on the history of Turbo Pascal from before it got its name, you can see how and why the prototype impressed the small (then Danish) firm "Borland" to hire the prototype's author (Anders Hejlsberg), fully develop the compiler, and launch it as its main product, and I quote...:
with no great expectations I hit the compile key - AND THEN I WAS COMPLETELY FLOORED! My test program, that took minutes to compile and link using Digital Research's Pascal MT+, was compiled and running before I could blink an eye! That was a great WOW moment!
Turbo Pascal's amazing compile speed -- coming first and foremost from a carefully hand-coded and highly tuned recursive descent parser written in assembly language -- allowed it to use a very different strategy from most compilers: there was no separate compilation pass generating object files and libraries, followed by a linker to put them together; rather, Turbo Pascal 1.0 was a single-pass compiler that turned source code directly into a single executable binary.
I remember just the same amazing experience on the tiny personal computers of that era (when a Z80, 64K of RAM, and two floppies was a lot;-) -- Turbo Pascal, with its amazing parser and the IDE and everything else, fit comfortably in memory together with a substantial program in both source and compiled form -- no floppies were needed, which meant many orders of magnitude of difference in program turnaround time.
If Hejlsberg had stuck to what was already the traditional wisdom at the time -- always use parser generators -- Turbo Pascal would probably never have emerged as a commercial product, and definitely not achieved the dominance in the Pascal world it enjoyed for years.
Of course, on a typical PC of today, such extreme parsing speed would not be needed for most compilers. Possible exceptions include compilers that must run seamlessly as part of an "interpreter-like" environment (the simple compilers for languages such as Perl and Python are typically hand-coded, to substantial extents, for that reason -- that was an implementation choice that made them viable in the '90s, although today it's not clear it's still needed), or compilers that run on very limited hardware resources, such as smartphones or low-end netbooks.
In the vast majority of cases in which you'll be writing a compiler, none of these performance considerations probably apply, and you'll be happier with a parser generator.
Your question title suggests that using a grammar is optional. It really isn't - even if I was going to implement a tiny language, I'd sketch out a grammar on a single sheet of paper.
As for when to use parser generators, this is really personal preference. Many people believe in hand-writing recursive descent parsers, rather than using the table-driven approach, for example. The important thing is to be comfortable in understanding the capabilities of the generator.
And don't be thinking that using parser generators is somehow the more professional, or even the easier, approach. Bjarne Stroustrup, when writing the first C++ compiler, intended to use recursive descent but was talked out of it by some keen colleagues at Bell Labs, much to his eventual chagrin. See section 3.3.2 of The Design and Evolution of C++ for more details.
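To give a sense of what the hand-written recursive descent option looks like, here is a minimal F# sketch for a toy grammar of integer sums and differences (the grammar and all names are invented for illustration, and it assumes well-formed input):

// Toy grammar:  expr := term (('+' | '-') term)*    term := digit+
let parse (input: string) =
    let pos = ref 0   // current position; a ref cell so the nested closures can update it
    let peek () = if pos.Value < input.Length then Some input.[pos.Value] else None
    let advance () = pos.Value <- pos.Value + 1
    // term: consume one or more digits and convert them to an int
    let term () =
        let start = pos.Value
        while (match peek () with Some c when System.Char.IsDigit c -> true | _ -> false) do
            advance ()
        int (input.Substring(start, pos.Value - start))
    // expr: a term followed by any number of +/- terms, folded left to right
    let rec exprTail acc =
        match peek () with
        | Some '+' -> advance (); exprTail (acc + term ())
        | Some '-' -> advance (); exprTail (acc - term ())
        | _ -> acc
    exprTail (term ())

// parse "12+34-5" evaluates to 41

Each grammar rule becomes one function, which is exactly why this style stays readable as the language grows, and why a tuned version of it (as in Turbo Pascal) can be so fast.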

Does F# provide you automatic parallelism?

By this I meant: when you design your app to be side-effect free, etc., will F# code be automatically distributed across all cores?
No, I'm afraid not. Given that F# isn't a pure functional language (in the strictest sense), it would be rather difficult to do so, I believe. The primary way to make good use of parallelism in F# is to use Async Workflows (mainly via the Async module, I believe). The TPL (Task Parallel Library), which is being introduced with .NET 4.0, is going to fulfil a similar role in F# (though notably it can be used in all .NET languages equally well), though I can't say I'm sure exactly how it's going to integrate with the existing async framework. Perhaps Microsoft will simply advise the use of the TPL for everything, or maybe they will leave both as an option and one will eventually become the de facto standard...
Anyway, here are a few articles on asynchronous programming/workflows in F# to get you started.
http://blogs.msdn.com/dsyme/archive/2007/10/11/introducing-f-asynchronous-workflows.aspx
http://strangelights.com/blog/archive/2007/09/29/1597.aspx
http://www.infoq.com/articles/pickering-fsharp-async
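As a minimal sketch of how explicit (not automatic) this parallelism is, here is an async workflow example; slowSquare is a made-up stand-in for any independent computation:

let slowSquare n = async {
    do! Async.Sleep 1000   // simulate some work
    return n * n
}

// Nothing here is automatic: the programmer opts in by composing the
// workflows with Async.Parallel and then running them.
let results =
    [ 1 .. 10 ]
    |> List.map slowSquare
    |> Async.Parallel
    |> Async.RunSynchronously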
F# does not make it automatic; it just makes it easy.
Yet another chance to link to Luca's PDC talk. Eight minutes starting at 52:20 are an awesome demo of F# async workflows. It rocks!
No, I'm pretty sure that it won't automatically parallelise for you. It would have to know that your code was side-effect free, which could be hard to prove, for one thing.
Of course, F# can make it easier to parallelise your code, particularly if you don't have any side effects... but that's a different matter.
Like the others mentioned, F# will not automatically scale across cores and will still require a framework such as the port of ParallelFX that Josh mentioned.
F# is commonly associated with potential for parallel processing because it defaults to objects being immutable, removing the need for locking for many scenarios.
On purity annotations: Code Contracts have a Pure attribute. I remember hearing that some parts of the BCL already use this. Potentially, this attribute could be used by parallelization frameworks as well, but I'm not aware of such work at this point. Also, I'm not even sure how well Code Contracts are usable from within F#, so there are a lot of unknowns here.
Still, it will be interesting to see how all this stuff comes together.
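For reference, the attribute itself is straightforward to apply from F# (a minimal sketch; whether any tooling then exploits it is, as noted above, unknown):

open System.Diagnostics.Contracts

// The attribute merely asserts purity; nothing verifies it or parallelizes
// based on it today.
[<Pure>]
let square (x: int) = x * x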
No it will not. You must still explicitly marshal calls to other threads via one of the many mechanisms supported by F#.
My understanding is that it won't, but Parallel Extensions is being modified to make it consumable by F#. That won't make it automatically multi-threaded, but it should make multi-threading very easy to achieve.
Well, you have your answer, but I just wanted to add that I think this is the most significant limitation of F# stemming from the fact that it is a hybrid imperative/functional language.
I would like to see some extension to F# that declares a function to be pure. That is, it has no side-effects that are not denoted by the function's type. The idea would be that a function is pure only if it references other "known-pure" functions. Of course, this would only be useful if it were then possible to require that a delegate passed as a function parameter references a pure function.

Is it possible that F# will be optimized more than other .Net languages in the future?

Is it possible that Microsoft will be able to make F# programs, either at VM execution time, or more likely at compile time, detect that a program was built with a functional language and automatically parallelize it better?
Right now I believe there is no such effort to automatically execute a program that was built as a single-threaded program as a multi-threaded one.
That is to say, the developer would write a single-threaded program, and the compiler would spit out a compiled program that is multi-threaded, complete with mutexes and synchronization where needed.
Would these optimizations be visible in task manager in the process thread count, or would it be lower level than that?
I think this is unlikely in the near future. And if it does happen, I think it would be more likely at the IL level (assembly rewriting) rather than language level (e.g. something specific to F#/compiler). It's an interesting question, and I expect that some fine minds have been looking at this and will continue to look at this for a while, but in the near-term, I think the focus will be on making it easier for humans to direct the threading/parallelization of programs, rather than just having it all happen as if by magic.
(Language features like F# async workflows, and libraries like the task-parallel library and others, are good examples of near-term progress here; they can do most of the heavy lifting for you, especially when your program is more declarative than imperative, but they still require the programmer to opt-in, do analysis for correctness/meaningfulness, and probably make slight alterations to the structure of the code to make it all work.)
Anyway, that's all speculation; who can say what the future will bring? I look forward to finding out (and hopefully making some of it happen). :)
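As a concrete example of that opt-in, the data-parallel helpers in the F# core library make the parallel choice a one-word change, but it is still the programmer's choice, not the compiler's:

// Sequential map
let squares = [| 1 .. 1000 |] |> Array.map (fun n -> n * n)

// Parallel map: same shape, but the programmer explicitly opts in
let squares' = [| 1 .. 1000 |] |> Array.Parallel.map (fun n -> n * n)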
Given that F# is derived from OCaml, and OCaml compilers can optimize your programs far better than many other compilers, it probably could be done.
I don't believe it is possible to autovectorize code in a generally-useful way and the functional programming facet of F# is essentially irrelevant in this context.
The hardest problem is not detecting when you can perform subcomputations in parallel, it is determining when that will not degrade performance, i.e. when the subtasks will take sufficiently long to compute that it is worth taking the performance hit of a parallel spawn.
We have researched this in detail in the context of scientific computing and we have adopted a hybrid approach in our F# for Numerics library. Our parallel algorithms, built upon Microsoft's Task Parallel Library, require an additional parameter that is a function giving the estimated computational complexity of a subtask. This allows our implementation to avoid excessive subdivision and ensure optimal performance. Moreover, this solution is ideal for the F# programming language because the function parameter describing the complexity is typically an anonymous first-class function.
Cheers,
Jon Harrop.
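A hedged sketch of the idea described above (this is not the actual F# for Numerics API, just an illustration of using a caller-supplied complexity estimate to decide whether a parallel spawn is worthwhile):

open System.Threading.Tasks

type Tree<'T> =
    | Leaf of 'T
    | Node of Tree<'T> * Tree<'T>

// 'complexity' estimates the cost of mapping over a subtree; the threshold
// below is arbitrary and would need tuning in practice.
let rec mapTree (complexity: Tree<'a> -> int) f tree =
    match tree with
    | Leaf x -> Leaf (f x)
    | Node (l, r) when complexity tree < 10000 ->
        // Subtask too cheap to pay for a spawn: stay sequential.
        Node (mapTree complexity f l, mapTree complexity f r)
    | Node (l, r) ->
        // Big enough: evaluate the left subtree on another task
        // while this thread handles the right one.
        let lt = Task.Run(fun () -> mapTree complexity f l)
        let r' = mapTree complexity f r
        Node (lt.Result, r')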
I think the question misses the point of the .NET architecture-- F#, C# and VB (etc.) all get compiled to IL, which then gets compiled to machine code via the JIT compiler. The fact that a program was written in a functional language isn't relevant-- if there are optimizations (like tail recursion, etc.) available to the JIT compiler from the IL, the compiler should take advantage of it.
Naturally, this doesn't mean that writing functional code is irrelevant-- obviously, there are ways to write IL which will parallelize better-- but many of these techniques could be used in any .NET language.
So, there's no need to flag the IL as coming from F# in order to examine it for potential parallelism, nor would such a thing be desirable.
There's active research into autoparallelization and autovectorization for a variety of languages. And one could hope (since I really like F#) that they would conceive a way to determine whether a "pure", side-effect-free subset was used and then parallelize that.
Also, since Simon Peyton Jones, one of the fathers of Haskell, is working at Microsoft, I have a hard time not believing there's some fantastic stuff coming.
It's possible but unlikely. Microsoft spends most of its time supporting and implementing features requested by their biggest clients. That usually means C#, VB.NET, and C++ (not necessarily in that order). F# doesn't seem like it's high on the list of priorities.
Microsoft is currently developing two avenues for parallelisation of code: PLINQ (Parallel LINQ, which owes much to functional languages) and the Task Parallel Library (TPL), which was originally part of Robotics Studio. A beta of PLINQ is available here.
I would put my money on PLINQ becoming the norm for auto-parallelisation of .NET code.
