Related
I'm reading some yacc and lex related stuff and some other compiler implementations, it seems like they are all using global state and hence really unsafe to use multithreaded situations so it is hard to embed them in other programs. I know GNU Bison and Flex could be used for re-entrancy but why are they not on by default?
Because when the interface for Lex and Yacc was defined, many many many years ago, the use of globals was much more common. Reentrancy changes the interface, and the reentrant interfaces have never been formally standardised (which is probably just as well, given the state of play). At the time, multithreading was not very common, largely because a typical computer just barely had the resources to do one compilation (and sometimes not even that; it was also pretty common for compilation passes to be sequentially loaded executables).
So the default continues to be the non-reentrant, standardised interface. And it probably will remain that way, whether or not we like it.
Aside from using a different scripting language, it seems that the main appeal of node.js is it's support for event-driven programming which makes it easier to write scalable servers (or other typically I/O bound applications) due to its simplified non-blocking I/O calls. However, this feature comes at the expense of having to learn a new programming model which essentially requires you to pass callback after callback function making some straightforward tasks (e.g. dependent sequences of actions) a bit more complicated.
Contrast that programming model to the traditional one of Ruby on Rails which blocks on all I/O operations and is (effectively) single-threaded (due to MRI's green thread implementation).
Just dreaming out loud here, it seems that it should be possible to implement a Ruby (or Rails) runtime which reconciles these models by trapping I/O calls, transparently replacing them with their non-blocking version, storing the current continuation and calling it when the I/O operation is complete. This way you would get the familiar, procedural programming style and the benefits of the event-driven/asynchronous/callback model.
Is such a runtime (or runtime translator) technically possible? Are there web frameworks that do something like this already?
Yes.
There are two possibilities for doing asynchronous but imperative programming
Use a real asynchronous language:
Erlang would be an example where you can write imperative do this, do that code and it translates to async. I don't think it goes all the way though.
Use a compiler
You can use a compiler that converts blocking style code into non-blocking code. I personally highly recommend against this because it's a black box and a nightmare to debug.
One example would be storm
However, this feature comes at the expense of having to learn a new programming model which essentially requires you to pass callback after callback function making some straightforward tasks (e.g. dependent sequences of actions) a bit more complicated.
I however recommend that you bite the bullet and make the paradigm switch. This will be a far better investment in the long-run. Mind you it's not neccessary to use node.js, there are strong alternatives like erlang and haskell out there.
Thanks to #igorw, the async-rails project is what I was imagining.
But as #Raynos and #apneadiving point out, there are potentially better solutions such as Ruby EventMachine and stormjs.
I've been checking out the Nitrogen Project which is supposed to be the most mature web development framework for Erlang.
Erlang, as a language, is extremely impressive. However, with regards to Nitrogen, what I am not too keen about is using Erlang's rather uncommon syntax (unless you're native in PROLOG) to build UIs.
What is your experience with it as opposed to other mainstream web frameworks such as Django or Rails?
I've done very little with Nitrogen so far, but I've been monitoring the mailing list for months, so I think I have something useful to say about it.
To your concern about the syntax of Erlang and the Nitrogen framework, I'd respond that that sounds like a pure case of unfamiliarity, rather than unsuitability. Objectively, HTML is not a beautiful language, and it has plenty of quirks. You're used to this now, so it doesn't seem so bad. Give Nitrogen/Erlang a chance and you may find that you get used to it soon enough, too.
To your question about comparison to other languages and frameworks, I'd say the biggest difference is that with Nitrogen, the entire web site is being served directly by the Erlang runtime. Ruby on Rails has such a mode, but it's intended only for testing. Many other frameworks don't even offer the option of running everything within a single long-running process.
Running the entire web application and its underlying infrastructure within a single long-running process has significant implications on how the site runs:
With Apache, each child gets killed off every N connections, where N=500 or so, and you can't say whether a given child will always handle all of a given client's requests. Because HTTP is stateless but web apps almost always require some client state, an Apache child must rebuild its view of client state as part of handling a new connection. By default, this means going back to disk for persistent data stored about that client. There are alternatives like memcached, but these aren't built into the core of a LAMP type stack. With Erlang, nothing is killed off periodically, and Erlang offers standard facilities like Mnesia which provide disk-backed in-memory DBs.
Incidentally, if you're familiar with nginx, it's built on the same principles as Erlang, and it's fast for the same reason. The main difference between nginx and an Erlang instance running a web server is that nginx isn't a programming environment, so it still has to delegate a lot of processing to outside code. That means it shares the same IPC and persistent state problems as Apache.
Because the runtime stays up continuously and is a fully-functional programming environment, you can probably build more parts of your system in Erlang than with a lashed-together LAMP type stack. This magnifies the above benefits. The various parts of your system can coordinate via message passing and Mnesia instead of heavyweight IPC and MySQL, and all the pieces stay up and running continually, leading to less time-consuming state reconstruction.
A dozen or so Apache children all accessing the persistent client state data store is a lock-based hairball. The frameworks all handle locking and such for you transparently, but what they can't hide is the time it takes to do all this correctly.
Erlang is an impure functional language, which implies but does not require data purity; it is also built with multiprocessing in mind, going clear down to the core of the runtime design. These two facts mean you're less likely to spend time waiting on locks in an Erlang based server than one naively built on one of the other frameworks. It is certainly possible to optimize away lock delays in the other systems, but is that really what you want to be doing? Do you want to be on the thousandth team that has to learn how to optimize its web stack after the service becomes popular, or would you rather leave it all up to the tooling so you can spend your time doing something no one else has done yet?
I, too, was once concerned about clunky Erlang syntax. I've built a couple of tools to alleviate its annoyances for everyday web programming, and perhaps you will find one or both of them helpful:
ErlyDTL is an Erlang implementation of the Django Template Language; it's not available in Nitrogen, but it is available in other frameworks, such as Zotonic, Erlang Web, BeepBeep, and Chicago Boss
Chicago Boss is a full-stack Erlang framework that does a lot of code generation so that you can access data fields with function calls instead of Erlang's rather verbose record syntax (e.g. Person:name() instead of Person#person.name)
Note that Nitrogen does not include a database layer, so it's not really comparable to Rails or Django. For a comprehensive comparison of the database-driven frameworks, check out my answer to this StackOverflow question:
https://stackoverflow.com/questions/1822518/current-state-of-erlang-web-development-frameworks-template-languages/2898271#2898271
I would check out Webmachine if I were you. It is quite simple, fast, and leaves the interface up to you.
Erlang Web should also be considered mature. It is an MVC framework, whereas Nitrogen is more event based. It's a matter of preference.
I haven't used the other tools mentioned here except Webmachine, which I think it's a wonderful tool, but it is not a web framework like the others. It is as HTTP processor, and is ideal for building a restful interfaces.
I would also suggest you give the Erlang syntax a chance. Erlang is one of my favourite languages to use.
I keep seeing "bootstrapping" mentioned in discussions of application development. It seems both widespread and important, but I've yet to come across even a poor explanation of what bootstrapping actually is; rather, it seems as though everyone is just supposed to know what it means. I don't, though. Near as I can figure, it has something to do with initialization tasks required of an application upon launch, but I could be completely wrong about that. Can anyone help me to understand this idea?
"Bootstrapping" comes from the term "pulling yourself up by your own bootstraps." That much you can get from Wikipedia.
In computing, a bootstrap loader is the first piece of code that runs when a machine starts, and is responsible for loading the rest of the operating system. In modern computers it's stored in ROM, but I recall the bootstrap process on the PDP-11, where you would poke bits via the front-panel switches to load a particular disk segment into memory, and then run it. Needless to say, the bootstrap loader is normally pretty small.
"Bootstrapping" is also used as a term for building a system using itself -- or more correctly, a predecessor version. For example, ANTLR version 3 is written using a parser developed in ANTLR version 2.
An example of bootstrapping is in some web frameworks. You call index.php (the bootstrapper), and then it loads the frameworks helpers, models, configuration, and then loads the controller and passes off control to it.
As you can see, it's a simple file that starts a large process.
The term "bootstrapping" usually applies to a situation where a system depends on itself to start, sort of a chicken and egg problem.
For instance:
How do you compile a C compiler written in C?
How do you start an OS initialization process if you don't have the OS running yet?
How do you start a distributed (peer-to-peer) system where the clients depend on their currently known peers to find out about new peers in the system?
In that case, bootstrapping refers to a way of breaking the circular dependency, usually with the help of an external entity, e.g.
You can use another C compiler to compile (bootstrap) your own compiler, and then you can use it to recompile itself
You use a separate piece of code that sets up the initial process without depending on any functions provided by the OS
You use a hard-coded list of initial peers or a hard-coded tracker URL that supplies the peer list
etc.
See on the Wikipedia article on bootstrapping.
There is a section and links explaining what it means in Computing. It has four different uses in the field.
Here are some quotes, but for a more in depth explanation, and alternative meanings, consult the links above.
"...is a technique by which a simple computer program activates a more complicated system of programs."
"A different use of the term bootstrapping is to use a compiler to compile itself, by first writing a small part of a compiler of a new programming language in an existing language to compile more programs of the new compiler written in the new language."
In the context of application development, "bootstrapping" usually comes up when talking about modular and/or auto-updatable software.
Rather than the user downloading the entire app, including features he does not need, and re-downloading and manually updating it whenever there is an update, the user only downloads and starts a small "bootstrap" executable, which in turn downloads and installs those parts of the application that the user needs. Additionally, the bootstrap component is able to look for updates and install them each time it is started.
Alex, it's pretty much what your computer does when it boots up. ('Booting' a computer actually comes from the word bootstrapping)
Initially, the small program in your BIOS runs. That contains enough machine code to load and run a larger, more complex program.
That second program is probably something like NTLDR (in Windows) or LILO (in Linux), which then executes and is able to load, then run, the rest of the operating system.
For completeness, it is also a rather important (and relatively new) method in statistics that uses resampling / simulation to infer population properties from a sample. It has its own lengthy Wikipedia article on bootstrapping (statistics).
Boot strapping the dictionary meaning is to start up with minimum resources. In the Context of an OS the OS should be able to swiftly load once the Power On Self Test (POST) determines that its safe to wake up the CPU. The boot strap code will be run from the BIOS. BIOS is a small sized ROM. Generally it is a jump instruction to the set of instructions which will load the Operating system to the RAM. The destination of the Jump is the Boot sector in the Hard Disk. Once the bios program checks it is a valid Boot sector which contains the starting address of the stored OS, ie whether it is a valid MBR (Master Boot Record) or not. If its a valid MBR the OS will be copied to the memory (RAM)from there on the OS takes care of Memory and Process management.
As the question is answered. For web develoment.
I came so far and found a good explanation about bootsrapping in Laravel doc. Here is the link
In general, we mean registering things, including registering service
container bindings, event listeners, middleware, and even routes.
hope it will help someone who learning web application development.
Bootstrapping has yet another meaning in the context of reinforcement learning that may be useful to know for developers, in addition to its use in software development (most answers here, e.g. by kdgregory) and its use in statistics as discussed by Dirk Eddelbuettel.
From Sutton and Barto:
Widrow, Gupta, and Maitra (1973) modified the Least-Mean-Square (LMS)
algorithm of Widrow and Hoff (1960) to produce a reinforcement
learning rule that could learn from success and failure signals
instead of from training examples. They called this form of learning
“selective bootstrap adaptation” and described it as “learning with a
critic” instead of “learning with a teacher.” They analyzed this rule
and showed how it could learn to play blackjack. This was an isolated
foray into reinforcement learning by Widrow, whose contributions to
supervised learning were much more influential.
The book describes various reinforcement algorithms where the target value is based on a previous approximation as bootstrap methods:
Finally, we note
one last special property of DP [Dynamic Programming] methods. All of them update estimates
of the values of states based on estimates of the values of successor
states. That is, they update estimates on the basis of other
estimates. We call this general idea bootstrapping. Many reinforcement
learning methods perform bootstrapping, even those that do not
require, as DP requires, a complete and accurate model of the
environment.
Note that this differs from bootstrap aggregating and intelligence explosion that is mentioned on the wikipedia page on bootstrapping.
I belong to the generation who flipped switches to enter a boot program. In the early 1980s, I worked on a microcomputer called Micro-78, developed by Electronics Corporation of India Ltd (ECIL). It was a sort of clone of Altair 8800. I distinctly remember what happens when a small boot program was entered using the toggle switches and executed by pressing a button. The program reads a second boot program contained in the 1st track of the floppy disk and overwrites it on itself in such a way that the second boot program starts executing to load a disk operating system. I think the term "bootstrap" refers to this process of the first boot program reading and overwriting the second boot program on itself, in a way "pulling itself up" with the additional functionality of the second boot program. That may be the origin of the original meaning of "the bootstrap program".
IMHO there is not a better explanation than the fact about How was the first compiler written?
These days the Operating System loading is the most common process referred as Bootstrapping
In terms of it in regards to using the popular Twitter Bootstrap I feel like this type of bootstrapping is the action of integrating a modular component into a Web application without the Web application having to even acknowledge the modular component exists until it needs it or references it.
The developer can seamlessly integrate a default copy of the CSS Twitter Bootstrap theme by simply loading (referencing) it into the Web application. Vuola! Then you may need to override some of these changes, but you can do so in such a way that the resource/component is untouched and completely reusable.
This same concept is how Web Devs implement jQuery APIs and so on, but it's not really expressed by Devs as bootstrapping per se. What it does is it improves flexibility and reusability while allowing the isolation of different components/resources of an app to reside freely either on the same server/s or possibly on a CDN.
NOTE: In computing bootstrapping deals with the MBR and in UNIX it requires a special bootloader or manager which is a small program in ROM that loads the OS into RAM. If you think about it the same concept takes places in the action of the bootstrap loader checking the MBR and loading the OS based on this table which occurs without the OS having any idea that this takes place.
Bootstrap file is responsible for loading contents of main file. It is a wrapper around main file. This way we can catch errors if loading of file was unsuccessful for some reason.
As a humble beginner in the world of programming, and flicking through all the answers here after seeing this word used a lot in apparently slightly different ways in different places, I found reading the Wikipedia page on Bootstrapping (duh! I didn't think of it either at first) is very informative to understand differences in use of this word. Could it be......on extremely rare occasions......Wikipedia might even have better explanations of certain terms than....(redacted)? Will they bring in rep points on Wikipedia though?
To me, it seems all the meanings something to do with: start with something as simple as possible Thing1, make something slightly more complex with that Thing2, and now you can use Thing2 to do some kind of tasks more efficiently and quickly than you could originally with Thing1. Then repeat from Thing2 to Thing 3 ad infinitum...
I see it as closely connected to both biological evolution and 'Layers of Abstraction' (newbies like me see, ahem, Wikipedia, cough) - the evolution from 1940's computers with switches, machine code, Assembly, C, Python, AIs you can give all kinds of complex instructions to like "make the %4^% dinner to my default &^$% requirements and clean the floor you %$£"#:~" in drunken slang English or Amazon tribal dialect without them 'raising an exception' (for newbies again...you guessed it) - missed out lot of links there due to simple ignorance.
Then in certain specific software meanings:
Meaning1: Thing1 is used to load latest version of Thing2 (because of course Thing2 will be bigger than Thing1, just as Thing3 will be be bigger than Thing2).
Meaning2: Thing1 is a lower level language (closer to 1001011100....011001 than print("Hello, ", user.name)) used to write a little bit of the higher language of Thing2, then this little bit of Thing2 is used to expand Thing2 itself from baby vocabulary level towards adult vocabulary level (Thing2 starts to be processed, or to use correct technical term 'compiled', by the baby version of itself (it's a clever baby!), whereas the baby version of Thing2 itself could of course only be compiled by Thing1, cause it can't exist before it exists, right duh!), then child version of Thing2 compiles Surly Teenager version of Thing2, at which point programming community decides whether Surly Teenager's 'issues' (software term and metaphor term!) are worth spending enough time resolving to be accepted long term, or to abandon them to (not sure where to take the analogy here).
If yes, then Thing2 has 'Bootstrapped' itself (possibly a few times) from babyhood to adulthood: "the child is the father of the man" (Wordsworth, suggest don't try looking up the quote or the author on Stack Overflow).
I've read somewhere that functional programming is suitable to take advantage of multi-core trend in computing. I didn't really get the idea. Is it related to the lambda calculus and von neumann architecture?
Functional programming minimizes or eliminates side effects and thus is better suited to distributed programming. i.e. multicore processing.
In other words, lots of pieces of the puzzle can be solved independently on separate cores simultaneously without having to worry about one operation affecting another nearly as much as you would in other programming styles.
One of the hardest things about dealing with parallel processing is locking data structures to prevent corruption. If two threads were to mutate a data structure at once without having it locked perfectly, anything from invalid data to a deadlock could result.
In contrast, functional programming languages tend to emphasize immutable data. Any state is kept separate from the logic, and once a data structure is created it cannot be modified. The need for locking is greatly reduced.
Another benefit is that some processes that parallelize very easily, like iteration, are abstracted to functions. In C++, You might have a for loop that runs some data processing over each item in a list. But the compiler has no way of knowing if those operations may be safely run in parallel -- maybe the result of one depends on the one before it. When a function like map() or reduce() is used, the compiler can know that there is no dependency between calls. Multiple items can thus be processed at the same time.
I've read somewhere that functional programming is suitable to take advantage of multi-core trend in computing... I didn't really get the idea. Is it related to the lambda calculus and von neumann architecture?
The argument behind the belief you quoted is that purely functional programming controls side effects which makes it much easier and safer to introduce parallelism and, therefore, that purely functional programming languages should be advantageous in the context of multicore computers.
Unfortunately, this belief was long since disproven for several reasons:
The absolute performance of purely functional data structures is poor. So purely functional programming is a big initial step in the wrong direction in the context of performance (which is the sole purpose of parallel programming).
Purely functional data structures scale badly because they stress shared resources including the allocator/GC and main memory bandwidth. So parallelized purely functional programs often obtain poor speedups as the number of cores increases.
Purely functional programming renders performance unpredictable. So real purely functional programs often see performance degradation when parallelized because granularity is effectively random.
For example, the bastardized two-line quicksort often cited by the Haskell community typically runs thousands of times slower than a real in-place quicksort written in a more conventional language like F#. Moreover, although you can easily parallelize the elegant Haskell program, you are unlikely to see any performance improvement whatsoever because all of the unnecessary copying makes a single core saturate the entire main memory bandwidth of a multicore machine, rendering parallelism worthless. In fact, nobody has ever managed to write any kind of generic parallel sort in Haskell that is competitively performant. The state-of-the-art sorts provided by Haskell's standard library are typically hundreds of times slower than conventional alternatives.
However, the more common definition of functional programming as a style that emphasizes the use of first-class functions does actually turn out to be very useful in the context of multicore programming because this paradigm is ideal for factoring parallel programs. For example, see the new higher-order Parallel.For function from the System.Threading.Tasks namespace in .NET 4.
When there are no side effects the order of evaluation does not matter. It is then possible to evaluate expressions in parallel.
The basic argument is that it is difficult to automatically parallelize languages like C/C++/etc because functions can set global variables. Consider two function calls:
a = foo(b, c);
d = bar(e, f);
Though foo and bar have no arguments in common and one does not depend on the return code of the other, they nonetheless might have dependencies because foo might set a global variable (or other side effect) which bar depends upon.
Functional languages guarantee that foo and bar are independant: there are no globals, and no side effects. Therefore foo and bar could be safely run on different cores, automatically, without programmer intervention.
All the answers above go to the key idea that "no shared mutable storage" is a key enabler to execute pieces of a program in parallel. It does not really solve the equally hard problem of finding things to execute in parallel. But the typical clearer expressions of functionality in functional languages do make it theoretically easier to extract parallelism from a sequential expression.
In practice, I think the "no shared mutable storage" property of languages based on garbage collection and copy-on-change semantics make them easier to add threads to. The best example is probably Erlang, that combines near-functional semantics with explicit threads.
This is a little bit of a vague question. One perk of multi-core CPUs is that you can run a functional program and let it plug away serially without worrying about affecting any computing going on that has to do with other functions the machine is carrying out.
The difference between a multi-U server and a multi-core CPU in a server or PC is the speed savings you get by having it on the same BUS, allowing better and faster communication to the cores.
edit: I should probably qualify this post by saying that in most of the scripting I do, with or without multiple cores, I rarely see a problem in getting my data through hackish parallelizing, such as running multiple small scripts at once in my script so I'm not slowed down by things like waiting for URLs to load and what not.
double edit: Furthermore, a lot of functional programming languages have had forked parallel variants for decades. These better utilize parallel computation with some speed improvement, but they never really caught on.
Omitting any technical/scientific terms the reason is because functional program doesn't share data. Data is copied and transfered among functions, thus there is no shared data in the application.
And shared data is what causes half the headaches with multithreading.
The book Programming Erlang: Software for a Concurrent World by Joe Armstrong (the creator of Erlang) talks quite a bit about using Erlang for multicore(/multiprocessor) systems. As the wikipedia article states:
Creating and managing processes is trivial in Erlang, whereas threads are considered a complicated and error-prone topic in most languages. Though all concurrency is explicit in Erlang, processes communicate using message passing instead of shared variables, which removes the need for locks.