Understanding Memory Layouts of an Interpreter (JVM/JS)

I am trying to understand the memory layout of a process at the OS level, and we are used to this diagram.

Forget the multithreading part of the diagram; for general purposes, we assume the "code" block shown in the diagram above to be the binary instructions of our program. This assumes that the code has already been compiled and is now available in its binary form. But what about interpreted languages, e.g. bytecode to be executed by the JVM interpreter? While I am choosing the JVM interpreter here, my question is for any interpreted language and how it fits in the diagram shown above. My understanding is that the interpreter itself is a program and therefore has to sit in the code block shown in the diagram above, and the .class file in the case of Java, or a .js file in the case of JavaScript interpreters, is the "argument", so to speak, that this interpreter works upon to translate it into OS/machine-understandable code, which is then executed. Request your thoughts on this.

It is a matter of perspective whether you would consider bytecode "code". The terminology is a bit fuzzy.
The "code" in the diagram is native executable code, i.e. your interpreter. As far as the CPU and operating system are concerned, that is the only code that ever runs. To the OS, the bytecode being interpreted is simply data that the actual native code operates on.
The fact that, in this case, the data happens to be a form of instructions as well is a detail the CPU is unaware of and doesn't care about.
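To make this concrete, here is a minimal sketch of the idea in Python, with invented opcodes: the interpreter below is the program the OS sees as code, while the "bytecode" it executes is an ordinary list sitting in the process's data area.

```python
# Toy "bytecode" interpreter. The opcodes are invented; the point is that
# `program` is plain data in memory, never instructions the CPU runs directly.
PUSH, ADD, PRINT, HALT = range(4)

program = [PUSH, 2, PUSH, 3, ADD, PRINT, HALT]  # the "bytecode"

def run(bytecode):
    stack, pc = [], 0
    while True:
        op = bytecode[pc]
        if op == PUSH:
            stack.append(bytecode[pc + 1])  # operand follows the opcode
            pc += 2
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == PRINT:
            print(stack.pop())
            pc += 1
        elif op == HALT:
            return

run(program)  # prints 5; to the OS, only the interpreter's native code ran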

Related

Understanding how coverage modules work

I came across a new dynamic language. I would like to create a coverage tool for that language. I started reading the source code of the Perl 5 and Python coverage modules, but it got complicated. It's a dynamic scripting language, so I guess that the source code of static languages (like Java and C++) won't help me here. Also, as I understand it, each language was built in a different way, and the same ideas won't work. But the big concepts could be similar.
My question is as follows: how do I "attack" this task? What is the proper workflow I need to follow? What do I need to investigate? Are there any books or blogs I can read about this kind of thing?
There are two kinds of coverage collection mechanisms:
1) Real-time sampling of the program counter, typically by a clock interrupt running every 1-10 ms. Difficulties: a) mapping an actual PC value back to a source line, and b) sampling means you might not see execution of a rarely used bit of code, so your coverage reporting is inaccurate. Because of these issues, this approach isn't used very often.
2) Instrumenting the program so that it collects coverage as it runs. This is hard to do with object code: a) you have to decode the instructions to see where to put probes, and this can be very hard to do right; b) you have to patch the object code to include the probes (this can be really awkward; a "probe" might consist of a 5-byte subroutine call, but the probe may have to replace a single-byte instruction); c) you still have to figure out how to map a probe location back to a source code line. A more effective way is to instrument the source code, which requires pretty sophisticated machinery to read source, make probe patches, and regenerate the instrumented code for execution/compilation (see the sketch below).
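As a toy illustration of source instrumentation, here is a crude Python sketch; the probe scheme is invented and only handles simple one-line statements, where a real tool would parse the source properly.

```python
# Crude source-instrumentation sketch: splice a probe call in front of each
# non-blank line, then run the rewritten source. Real tools parse the source
# instead of splicing text (this toy breaks on compound statements).
hits = set()

def probe(line_no):
    hits.add(line_no)

source = """x = 2
y = 3
print(x + y)"""

instrumented = "\n".join(
    f"probe({n}); {line}" if line.strip() else line
    for n, line in enumerate(source.splitlines(), start=1)
)

exec(instrumented)                      # runs the instrumented program
print("covered lines:", sorted(hits))  # -> covered lines: [1, 2, 3]
```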
My technical paper "Branch Coverage for Arbitrary Languages Made Easy" provides explicit detail on how to do this in a general way. My company has built commercial test coverage tools for a wide variety of languages (C, Python, PHP, COBOL, Java, C++, C#, ProC, ...) using this approach. This covers most static and dynamic languages. Some dynamic mechanisms are extremely difficult to instrument, e.g. eval(), but that is true of every approach.
In addition to Ira's answer, there is a third coverage collection mechanism: the language implementation provides a callback that can inform you about program events. For example, Python has sys.settrace: you provide it a function, and Python calls your function for every function called or returned, and every line executed.
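For instance, a minimal line-coverage collector built on sys.settrace might look like this; the demo function is invented, but the hook itself is the real API.

```python
# Record which lines execute using Python's sys.settrace callback hook.
import sys

executed = set()

def tracer(frame, event, arg):
    if event == "line":
        executed.add(frame.f_lineno)
    return tracer  # returning the tracer keeps line events coming

def demo(x):
    if x > 0:
        return "positive"
    return "non-positive"  # not reached below, so this line never shows up

sys.settrace(tracer)
demo(5)
sys.settrace(None)

print("lines executed:", sorted(executed))
```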

View code generated by IBM's Enterprise COBOL compiler

I have recently started doing some work with COBOL, having only ever worked in z/OS Assembler on a mainframe before.
I know that COBOL is translated into mainframe machine code, but I am wondering if it is possible to see the generated code.
I want to use this to better understand the inner workings of COBOL.
For example, if I were to compile a COBOL program, I would like to see the assembly that results from the compile. Is something like this possible?
Relenting, only because of this: "I want to use this to better understand the inner workings of COBOL".
The simple answer is that there is, for Enterprise COBOL on z/OS, a compiler option, LIST. LIST will provide what is known as the "pseudo assembler" output in your compile listing (and some other useful stuff for understanding the executable program). Another compiler option, OFFSET, shows the displacement from the start of the program of the code generated for each COBOL verb. LIST (which inherently has the offset already) and OFFSET are mutually exclusive. So you need to specify LIST and NOOFFSET.
Compiler options can be specified on the PARM of the EXEC PGM= statement for the compiler. Since the PARM is limited to 100 characters, compiler options can also be specified in a data set, with a DDName of SYSOPTF (whose use is, in turn, enabled by another compiler option).
A third way to specify compiler options is to include them in the program source, using the PROCESS or (more common, since it is shorter) CBL statement.
It is likely that you have a "panel" to compile your programs. This may have a field allowing options to be specified.
However, be aware of a couple of things: it is possible, when installing the compiler, to "nail in" compiler options (which means they can't be changed by the application programmer); it is possible, when installing the compiler, to prevent the use of PROCESS/CBL statements.
The reason for the above is standardisation. There are compiler options which affect code generation, and using different code-generation options within the same system can cause unwanted effects. Even across systems, different code-generation options may not be desirable if programmers are prone to expect the "normal" options.
It is unlikely that listing-only options will be "nailed", but if you are prevented from specifying options, then you may need to make a special request. This is not common, but you may be unlucky. Not my fault if it doesn't work for you.
These compiler options, and how you can specify them, are documented in the Enterprise COBOL Programming Guide for your specific release. There you will also find the documentation of the pseudo-assembler (be aware that it appears in the document as "pseudo-assembler", "pseudoassembler" and "pseudo assembler", for no good reason).
When you see the pseudo-assembler, you will see that it is not in the same format as an Assembler statement (I've never discovered why, but as far as I know it has been that way for more than 40 years). The line with the pseudo-assembler will also contain the machine-code in the format you are already familiar with from the output of the Assembler.
Don't expect to see a compiled COBOL program looking like an Assembler program that you would write. Enterprise COBOL adheres to a language Standard (1985) with IBM Extensions. The answer to "why does it do it like that?" will be "because", except for optimisations (see later).
What you see will depend heavily on the version of your compiler, because in the summer of 2013, IBM introduced V5, with entirely new code generation and optimisation. Up to V4.2, the code generator dated back to "ESA", which meant that over 600 machine instructions introduced since ESA, as well as the extended registers, were not available to Enterprise COBOL programs. The same COBOL program compiled with V4.2 and with V6.1 (the latest version at the time of writing) will be markedly different, not only because of the different instructions, but also because the structure of an executable COBOL program was redesigned.
Then there's optimisation. With V4.2, there was one level of possible optimisation, and the optimised code was generally "recognisable". With V5+, there are three levels of optimisation (you get level zero without asking for it) and the optimisations are much more extreme, including, well, extreme stuff. If you have V5+ and want to know a bit more about what is going on, use OPT(0) to get a grip on what is happening, and then note the effects of OPT(1) and OPT(2) (and realise, from the increased compile times, how much work is put into the optimisation).
There's not really a substantial amount of official documentation of the internals. A web search will reveal some stuff. IBM's COBOL Cafe forum is a good place if you want more knowledge of V5+ internals, as a couple of the developers attend there. For up to V4.2, here may be as good a place as any to ask further specific questions.

Is Dart statically compiled, or is code interpreted at runtime as it's parsed and loaded into the VM?

I'm trying to understand why adding traits to Dart would cause the shape of objects in memory to change, and am therefore curious about how it loads code right now.
Dart is a dynamically typed language that generates its own machine-language equivalents straight from source code, with no intermediate bytecode step. There is no generic bytecode (like JVM bytecode or LLVM bitcode); instead, source is compiled directly into machine code.
I would add that despite compiling straight to machine code, the language itself is not designed in a way that would allow a C/C++-style compiler to generate fast, efficient code. This is by design, as Dart seems to be an attempt to fill the gap between JavaScript and Java rather than the gap between Java and C/C++. Dart addresses many issues that make JavaScript hard to optimize, most importantly the typing of numeric variables.
There are some efforts to port the Dart environment to various platforms beyond Windows/Mac/Linux, but I have yet to see an actual straight-to-machine-language compiler for Dart. That doesn't mean they don't exist; I just haven't seen anything other than ports of the Linux Dart environment onto BeagleBoard and other small Linux distros.
From the Dart FAQ:

Q. Why didn't Google build a bytecode VM targetable by multiple languages including Dart?

Each approach has advantages and disadvantages, but we feel that in the context of Dart it made sense to build a language-specific VM for the following reasons:
- Google already works on a multi-language bytecode: LLVM bitcode in PNaCl.
- Even if a bytecode VM is specialized for Dart, a language VM will be simpler and faster because it can work under stronger assumptions, for instance structured control flow. These assumptions make the implementation cleaner and optimizations easier.
- A general-purpose bytecode VM would be even larger and slower, as it generalizes assumptions and adds functionality that for Dart is dead code: for example, multithreading with a shared heap.
- No bytecode VM is truly general-purpose; they all make assumptions that privilege some class of languages. A language VM leaves more room to improve the VM and make deep changes to optimization of the language.

Some Dart engineers wrote an article talking about the VM question in more detail.

There is also a pretty good presentation on Compiling Dart to Efficient Machine Code.

Can't we just write a program in machine level language?

A compiler converts human-understandable language into machine-level language. Can't we just write a program in machine-level language so that it will be easy and quick for the program to execute?
Yes, you can program in assembler under Linux.
Check this and this question on Stack Overflow, for example. Also, the Linux Assembly HOWTO looks good.
A compiler converts human-understandable language into machine-level language. Can't we just write a program in machine-level language so that it will be easy and quick for the program to execute?
No one writes programs in machine language. Normal binary executables are not just machine code anyway, so it would be pointless to try to do this. Binaries contain machine code but include specific, OS-dependent formatting. For example, Linux uses ELF. This format is understood by the linker and loader (on *nix, the loader is part of the kernel). The only place unadulterated machine code exists is in system memory.
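As a small illustration of that OS-dependent wrapping, here is a sketch that reads an ELF header by hand; it assumes a 64-bit little-endian ELF at /bin/ls, as on a typical x86-64 Linux box.

```python
# Peek at the formatting that wraps the machine code in an ELF binary.
import struct

with open("/bin/ls", "rb") as f:
    header = f.read(64)  # an ELF64 header is 64 bytes

assert header[:4] == b"\x7fELF", "not an ELF binary"

ei_class = header[4]  # 1 = 32-bit, 2 = 64-bit
(e_entry,) = struct.unpack_from("<Q", header, 24)  # ELF64 entry-point field

print("class:", "64-bit" if ei_class == 2 else "32-bit")
print("entry point:", hex(e_entry))  # where the loader starts the machine code
```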
You can write programs in assembly language, which is very similar to machine language, but this must then be assembled and linked. In other words, it is the same thing as writing a program in any other compiled language.
Finally, creating a binary manually by formatting some machine code would not provide any advantages, and it would be an endless headache to work with. You might do it as a learning exercise, but not for any real purpose.
Yes, but then you have to write more and it takes longer, so using a higher-level language and a compiler is more efficient with your time. More often than not, a compiler is better than what a mere human can do (these compiler writers are pretty good programmers).

What is bootstrapping?

I keep seeing "bootstrapping" mentioned in discussions of application development. It seems both widespread and important, but I've yet to come across even a poor explanation of what bootstrapping actually is; rather, it seems as though everyone is just supposed to know what it means. I don't, though. Near as I can figure, it has something to do with initialization tasks required of an application upon launch, but I could be completely wrong about that. Can anyone help me to understand this idea?
"Bootstrapping" comes from the term "pulling yourself up by your own bootstraps." That much you can get from Wikipedia.
In computing, a bootstrap loader is the first piece of code that runs when a machine starts, and is responsible for loading the rest of the operating system. In modern computers it's stored in ROM, but I recall the bootstrap process on the PDP-11, where you would poke bits via the front-panel switches to load a particular disk segment into memory, and then run it. Needless to say, the bootstrap loader is normally pretty small.
"Bootstrapping" is also used as a term for building a system using itself -- or more correctly, a predecessor version. For example, ANTLR version 3 is written using a parser developed in ANTLR version 2.
An example of bootstrapping is in some web frameworks. You call index.php (the bootstrapper), which then loads the framework's helpers, models, and configuration, then loads the controller and passes control off to it.
As you can see, it's a simple file that starts a large process.
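Here is a minimal, self-contained sketch of that pattern in Python; the route name, handler, and config are invented stand-ins for what a real framework would load from separate files.

```python
# Front-controller "bootstrap" sketch, loosely modelled on the index.php
# pattern described above.

def handle_home(config):
    print("rendering home page, debug =", config["debug"])

CONTROLLERS = {"home": handle_home}  # the "routing table"

def bootstrap(route):
    config = {"debug": True}         # 1. load configuration
    controller = CONTROLLERS[route]  # 2. locate the controller
    controller(config)               # 3. pass control off to it

if __name__ == "__main__":
    bootstrap("home")  # the index.php-style entry point
```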
The term "bootstrapping" usually applies to a situation where a system depends on itself to start, sort of a chicken and egg problem.
For instance:
How do you compile a C compiler written in C?
How do you start an OS initialization process if you don't have the OS running yet?
How do you start a distributed (peer-to-peer) system where the clients depend on their currently known peers to find out about new peers in the system?
In that case, bootstrapping refers to a way of breaking the circular dependency, usually with the help of an external entity, e.g.
You can use another C compiler to compile (bootstrap) your own compiler, and then you can use it to recompile itself
You use a separate piece of code that sets up the initial process without depending on any functions provided by the OS
You use a hard-coded list of initial peers or a hard-coded tracker URL that supplies the peer list
etc.
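As a toy illustration of the compiler case, here is a sketch in which "compiling" simply means exec-ing Python source, so the new compiler's source and target languages are both Python and all names are invented. The shape matches a real bootstrap: the host builds stage 1, stage 1 rebuilds itself, and the successive builds are compared.

```python
# Toy compiler bootstrap: break the circular dependency with the host's
# exec (the "other compiler"), then let the compiler rebuild itself.
COMPILER_SOURCE = (
    "def compile(source):\n"
    "    ns = {}\n"
    "    exec(source, ns)      # the toy's 'code generation'\n"
    "    return ns['compile']  # hand back the freshly built compiler\n"
)

ns = {}
exec(COMPILER_SOURCE, ns)         # stage 0: built by the external host
stage1 = ns["compile"]

stage2 = stage1(COMPILER_SOURCE)  # stage 1 compiles its own source
stage3 = stage2(COMPILER_SOURCE)  # stage 2 does the same

# Fixed point: successive self-builds produce identical code.
print(stage2.__code__.co_code == stage3.__code__.co_code)  # True
```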
See the Wikipedia article on bootstrapping.
There is a section, with links, explaining what it means in computing. It has four different uses in the field.
Here are some quotes, but for a more in-depth explanation and alternative meanings, consult the links above.
"...is a technique by which a simple computer program activates a more complicated system of programs."
"A different use of the term bootstrapping is to use a compiler to compile itself, by first writing a small part of a compiler of a new programming language in an existing language to compile more programs of the new compiler written in the new language."
In the context of application development, "bootstrapping" usually comes up when talking about modular and/or auto-updatable software.
Rather than the user downloading the entire app, including features he does not need, and re-downloading and manually updating it whenever there is an update, the user only downloads and starts a small "bootstrap" executable, which in turn downloads and installs those parts of the application that the user needs. Additionally, the bootstrap component is able to look for updates and install them each time it is started.
Alex, it's pretty much what your computer does when it boots up. ('Booting' a computer actually comes from the word bootstrapping.)
Initially, the small program in your BIOS runs. That contains enough machine code to load and run a larger, more complex program.
That second program is probably something like NTLDR (in Windows) or LILO (in Linux), which then executes and is able to load, then run, the rest of the operating system.
For completeness, it is also a rather important (and relatively new) method in statistics that uses resampling / simulation to infer population properties from a sample. It has its own lengthy Wikipedia article on bootstrapping (statistics).
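For instance, a minimal sketch of the resampling idea, with made-up data:

```python
# Statistical bootstrap: estimate a rough 95% interval for the sample mean
# by drawing resamples (with replacement) from the sample itself.
import random

sample = [2.3, 3.1, 2.8, 4.0, 3.5, 2.9, 3.3]

means = []
for _ in range(10_000):
    resample = random.choices(sample, k=len(sample))  # draw with replacement
    means.append(sum(resample) / len(resample))

means.sort()
lo, hi = means[250], means[9750]  # rough 2.5th and 97.5th percentiles
print(f"bootstrap 95% interval for the mean: ({lo:.2f}, {hi:.2f})")
```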
Bootstrapping, in the dictionary sense, means to start up with minimal resources. In the context of an OS, the OS should be able to load swiftly once the Power-On Self-Test (POST) determines that it is safe to wake up the CPU. The bootstrap code is run from the BIOS, which is a small ROM. Generally it is a jump instruction to the set of instructions which will load the operating system into RAM; the destination of the jump is the boot sector on the hard disk. The BIOS program checks whether that boot sector, which contains the starting address of the stored OS, is a valid MBR (Master Boot Record). If it is a valid MBR, the OS is copied into memory (RAM), and from there on the OS takes care of memory and process management.
As the question is already answered, here is a note for web development.
I came this far and found a good explanation about bootstrapping in the Laravel docs. Here is the link.
"In general, we mean registering things, including registering service container bindings, event listeners, middleware, and even routes."
Hope it will help someone who is learning web application development.
Bootstrapping has yet another meaning in the context of reinforcement learning that may be useful to know for developers, in addition to its use in software development (most answers here, e.g. by kdgregory) and its use in statistics as discussed by Dirk Eddelbuettel.
From Sutton and Barto:
Widrow, Gupta, and Maitra (1973) modified the Least-Mean-Square (LMS) algorithm of Widrow and Hoff (1960) to produce a reinforcement learning rule that could learn from success and failure signals instead of from training examples. They called this form of learning "selective bootstrap adaptation" and described it as "learning with a critic" instead of "learning with a teacher." They analyzed this rule and showed how it could learn to play blackjack. This was an isolated foray into reinforcement learning by Widrow, whose contributions to supervised learning were much more influential.
The book describes various reinforcement learning algorithms, in which the target value is based on a previous approximation, as bootstrapping methods:
Finally, we note one last special property of DP [Dynamic Programming] methods. All of them update estimates of the values of states based on estimates of the values of successor states. That is, they update estimates on the basis of other estimates. We call this general idea bootstrapping. Many reinforcement learning methods perform bootstrapping, even those that do not require, as DP requires, a complete and accurate model of the environment.
Note that this differs from bootstrap aggregating and the intelligence explosion that are mentioned on the Wikipedia page on bootstrapping.
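To make the "estimates from estimates" idea concrete, here is a tiny sketch of a TD(0)-style value update; the three-state chain and the constants are invented.

```python
# TD(0)-style bootstrapping: the new estimate for a state leans on the
# *estimated* value of its successor, i.e. an estimate built from another.
V = {"A": 0.0, "B": 0.0, "C": 0.0}  # value estimates
alpha, gamma = 0.1, 0.9             # step size and discount factor

def td0_update(s, reward, s_next):
    target = reward + gamma * V[s_next]  # the bootstrapped target
    V[s] += alpha * (target - V[s])

# One episode through the chain A -> B -> C, reward 1 on the last step.
td0_update("A", 0.0, "B")
td0_update("B", 1.0, "C")
print(V)  # V["B"] moved toward 1; V["A"] will follow later via V["B"]
```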
I belong to the generation who flipped switches to enter a boot program. In the early 1980s, I worked on a microcomputer called Micro-78, developed by Electronics Corporation of India Ltd (ECIL). It was a sort of clone of the Altair 8800. I distinctly remember what happened when a small boot program was entered using the toggle switches and executed by pressing a button. The program read a second boot program contained in the first track of the floppy disk and overwrote it on top of itself, in such a way that the second boot program started executing to load a disk operating system. I think the term "bootstrap" refers to this process of the first boot program reading and overwriting the second boot program over itself, in a way "pulling itself up" with the additional functionality of the second boot program. That may be the origin of the original meaning of "the bootstrap program".
IMHO there is no better explanation than the story behind "How was the first compiler written?"
These days, loading the operating system is the process most commonly referred to as bootstrapping.
In regard to the popular Twitter Bootstrap, I feel this type of bootstrapping is the action of integrating a modular component into a Web application without the Web application having to even acknowledge that the modular component exists until it needs or references it.
The developer can seamlessly integrate a default copy of the Twitter Bootstrap CSS theme by simply loading (referencing) it in the Web application. Voilà! You may then need to override some of these changes, but you can do so in such a way that the resource/component is untouched and completely reusable.
This same concept is how Web devs implement jQuery APIs and so on, but it's not really expressed by devs as bootstrapping per se. What it does is improve flexibility and reusability while allowing the isolation of different components/resources of an app to reside freely either on the same server(s) or possibly on a CDN.
NOTE: In computing, bootstrapping deals with the MBR, and in UNIX it requires a special boot loader or manager, which is a small program in ROM that loads the OS into RAM. If you think about it, the same concept takes place in the action of the bootstrap loader checking the MBR and loading the OS based on this table, all of which occurs without the OS having any idea that it takes place.
A bootstrap file is responsible for loading the contents of the main file. It is a wrapper around the main file. This way we can catch errors if loading the file was unsuccessful for some reason.
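A minimal sketch of that wrapper idea, assuming a hypothetical main module named main_app (running this without such a module simply demonstrates the error path):

```python
# Bootstrap wrapper: load the "main file" and catch any failure to do so.
import importlib
import sys
import traceback

try:
    main = importlib.import_module("main_app")  # hypothetical main module
    main.run()                                  # hypothetical entry point
except Exception:
    traceback.print_exc()
    sys.exit("bootstrap: could not load the main application")
```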
As a humble beginner in the world of programming, flicking through all the answers here after seeing this word used a lot in apparently slightly different ways in different places, I found that reading the Wikipedia page on bootstrapping (duh! I didn't think of it at first either) is very informative for understanding the differences in use of this word.
To me, it seems all the meanings have something to do with this: start with something as simple as possible, Thing1; make something slightly more complex with it, Thing2; and now you can use Thing2 to do some kinds of tasks more efficiently and quickly than you could originally with Thing1. Then repeat from Thing2 to Thing3, ad infinitum...
I see it as closely connected to both biological evolution and layers of abstraction: the evolution from 1940s computers with switches, to machine code, Assembly, C, Python, and on to AIs you can give all kinds of complex instructions to in casual English without them 'raising an exception'.
Then in certain specific software meanings:
Meaning 1: Thing1 is used to load the latest version of Thing2 (because, of course, Thing2 will be bigger than Thing1, just as Thing3 will be bigger than Thing2).
Meaning 2: Thing1 is a lower-level language (closer to 1001011100...011001 than to print("Hello, ", user.name)) used to write a little bit of the higher-level language Thing2. That little bit of Thing2 is then used to expand Thing2 itself from baby vocabulary level towards adult vocabulary level: Thing2 starts to be processed, or, to use the correct technical term, 'compiled', by the baby version of itself (it's a clever baby!), whereas the baby version of Thing2 could of course only be compiled by Thing1, because it can't exist before it exists, right? Then the child version of Thing2 compiles the surly-teenager version of Thing2, at which point the programming community decides whether the surly teenager's 'issues' (software term and metaphor!) are worth resolving so it can be accepted long-term, or whether to abandon it.
If yes, then Thing2 has 'bootstrapped' itself (possibly a few times) from babyhood to adulthood: "the child is the father of the man" (Wordsworth; I suggest you don't try looking up the quote or the author on Stack Overflow).
