I've been reading up on Dart snapshots, and they're frequently compared to Smalltalk images. But to me, they sound a lot like Java bytecode.
For example:
"A Dart snapshot is just a binary serialization of the token stream, generated from parsing the code. A snapshot is not a "snapshot of a running program", it's generated before the tokens are turned into machine code. So, no program state is captured in a snapshot."
Plus they're cross-platform:
"The snapshot format itself is cross-platform meaning that it works between 32-bit, 64-bit machines and so forth. The format has been made so that it's quick to read into memory with a emphasis on minimizing extra work like pointer fixups."
Am I getting it wrong somewhere?
Sources:
What is the snapshot concept in dart?
http://www.infoq.com/articles/google-dart
Snapshots contain the VM data structures representing the loaded script in a serialized form similar to Smalltalk images. To get a better understanding of what is contained in the snapshot, we should take a look at what the Dart VM creates as it reads the script:
Library objects, referring to all top-level structures such as classes or top-level methods and variables.
Class objects, containing all objects describing all methods and fields.
Script and Tokenstream objects representing all loaded source code.
String objects for all used identifiers and string constants in the source code.
This object graph is serialized into a file when generating a snapshot using a format that is architecture agnostic. This allows the Dart VM to deserialize this snapshot file on 32-bit or 64-bit machines and recreate all of the necessary internal VM data structures much quicker than reading the original scripts from a set of files (see John's answer).
To clarify John's answer a bit: the Dart VM does not parse ALL of the source code when generating the snapshot. It only needs to parse the top level of the sources to be able to extract class, method, and field definitions, as these are represented in the serialized graph. In particular, method bodies are not parsed, and, as is customary for a scripting language, errors will only be reported once control reaches the particular method.
The purpose of Java bytecodes is entirely different as Ladicek points out. You could create a snapshot of the VM data structures in a JVM once the bytecodes are loaded to get a similar effect.
In short: The snapshot contains an efficient representation of all the data structures allocated on the Dart VM heap which are needed to start executing the script.
A Dart snapshot is just a roll-up of all source files that have been parsed ahead of time. A Dart snapshot is not similar to a Java bytecode file. A Java bytecode file consists of JVM machine code and is the product of a compile, link, and assembly (into JVM machine code) phase.
A Dart snapshot is a binary file of a Dart program and its import/part source file dependencies that have been parsed into an abstract syntax tree and rolled into a single file. Executing a Dart snapshot allows for faster startup times because:
Only 1 file must be loaded from disk or over the network. In contrast, a non-snapshot Dart program must be fetched, then any imported files must be fetched, and so on. Before each subsequent source file request can be made, the previously fetched source file must be parsed to find out whether it references more source files. Imagine if your Dart program imported 10 libraries consisting of 10 source files each. That means 110 I/O requests and parses that are done one after another.
The parsing has been done ahead of time. It's already known to be syntactically correct and ready to be compiled by the Dart VM.
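To make that concrete (a rough, hedged sketch: the exact commands depend on your SDK version and are my assumption, not part of the answer above; very old SDKs used a dart --snapshot= flag instead), a modern equivalent is to roll a program into a JIT snapshot and then run that single file:

    // bin/hello.dart -- a trivial program; normally its imports would be
    // fetched and parsed one by one at startup.
    import 'dart:math';

    void main() {
      final rng = Random(42); // fixed seed so repeated runs print the same value
      print('hello from a snapshot: ${rng.nextInt(100)}');
    }

    // Assumed commands for a recent Dart SDK:
    //   dart compile jit-snapshot bin/hello.dart   # writes bin/hello.jit
    //   dart run bin/hello.jit                     # starts from the single,
    //                                              # already-parsed file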
I will just point out that as of Dart 2+, there are several distinct concepts when it comes to snapshots:
Kernel Snapshot
JIT Snapshot
AOT Snapshot
You can read more here.
Related
I'm reading about dart compile, and it has a few options: Executable, AOT snapshot, JIT snapshot, Kernel snapshot, and JavaScript.
What is the difference between an executable and a snapshot? Is it purely that executables contain the Dart runtime/VM, whereas a snapshot doesn't? Why is it called a snapshot?
Two highly related questions (which I found after posting this one) are:
What is the difference between Dart's snapshots and Java bytecode?
What is the snapshot concept in dart?
This question is different from Dart: Snapshots vs AOT, since that one asks about the difference between a snapshot and AOT, when in fact AOT files are snapshots. It also primarily asks about the differences between the snapshot options (AOT, Kernel, JIT).
An executable (created by dart compile exe) is a combination of an AOT snapshot, and the Dart runtime. The Dart runtime is needed to run any Dart code, as it performs critical tasks like managing memory (including garbage collection) and performing runtime type checks.
The three kinds of snapshots (AOT, kernel, and JIT) all contain just the compiled source code. They all need a runtime to be run (typically you just use dart run <snapshot>).
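For example (hedged: the commands and output names below are assumptions based on a recent SDK, not something stated in this answer), the difference shows up in how you invoke the result:

    // bin/server.dart -- placeholder entry point; any program behaves the same way.
    void main() {
      print('started');
    }

    // Self-contained executable: an AOT snapshot plus the Dart runtime in one file.
    //   dart compile exe bin/server.dart -o bin/server
    //   ./bin/server                                # no Dart SDK needed on the target
    //
    // Bare AOT snapshot: compiled code only, no runtime bundled.
    //   dart compile aot-snapshot bin/server.dart   # writes bin/server.aot
    //   dartaotruntime bin/server.aot               # the runtime is supplied separately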
The snapshots should perhaps have been named differently. Would 'module' have been easier to understand?
When Dart code is compiled, it is compiled into intermediary byte code rather than native code. This means that in order for the code to be executed in any arbitrary environment, the compiler needs to include the Dart virtual machine inside the executable along with the user code as well as any portions of the core library the program requires. Usually, this is fine because the Dart runtime is actually quite compact, but the obvious downside to this is that the executable will be larger and start-up time will be longer as the runtime needs to be extracted and warmed up before the user code can be run.
If, however, you are compiling code for an environment where you can guarantee that a Dart runtime will be present (such as a server machine or an IoT device), you can omit the runtime from the compiled program by building a snapshot file rather than an executable. This results in a smaller compiled file with a faster start-up time, although it requires a command to execute and as such is less convenient. You can learn more about snapshots and how to build and execute them on the Dart GitHub wiki page on snapshots.
There are three different kinds of snapshots: kernel snapshots, which contain only AST information rather than compiled code, making them usable by Dart runtimes on any supported architecture (portable but slow); JIT snapshots, which contain just the parts of the program necessary for startup and leave the rest to be interpreted at runtime (fastest startup but slower execution); and AOT snapshots, which fully compile the entire program into native code (slower startup but fastest execution).
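As a rough sketch of the portable (kernel) case in particular (the commands are assumptions for a Dart 2.10+ SDK, not something this answer spells out):

    // bin/tool.dart -- compiled to a kernel snapshot rather than machine code.
    void main() {
      print('running from a kernel (.dill) file');
    }

    // Assumed commands:
    //   dart compile kernel bin/tool.dart   # writes bin/tool.dill (AST-level, portable)
    //   dart run bin/tool.dill              # a VM on any supported architecture can
    //                                       # load it and compile it at runtime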
As for why it's called a "snapshot", I couldn't say. If I had to guess, that's because it's a "snapshot" of the program in its compiled state but without the instructions necessary to run it as a standalone executable.
(The above is based on my quick research on the subject and may be missing some key details. If a member of the Dart team happens upon this question, they will probably be able to offer a more detailed and technical explanation.)
Searching for "DSO" in the context of "nvidia" yields "Days Sales Outstanding". Looking at a comprehensive acronym list for "DSO" yields a likely candidate: "Data Source Object" but it is for some sort of Microsoft standard that would seem to be inapplicable to Linux platforms.
From Webopedia:
A dynamic shared object (DSO) is an object file that is intended to be used simultaneously (or shared by) multiple applications while they’re executing. A DSO can be used in place of archive libraries and will minimize overall memory usage because code is shared. Two executables that use the same DSO and that run simultaneously have only one copy of the shared components loaded into memory.
[Source: Adapted from SGI]
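To see a DSO in use from the Dart side (a minimal sketch; the library name libm.so.6 is Linux-specific and merely an assumption about the machine), dart:ffi can map one into the running process and bind one of its shared functions:

    import 'dart:ffi';

    void main() {
      // Map the shared math library into this process; other programs using the
      // same DSO share a single copy of its code in memory.
      final libm = DynamicLibrary.open('libm.so.6');

      // Bind the C function `double cos(double)` exported by the DSO.
      final cos = libm
          .lookupFunction<Double Function(Double), double Function(double)>('cos');

      print(cos(0.0)); // 1.0
    }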
I've been reading about the compilation process. I understand some of the earlier concepts like parsing, but I stop short of understanding how the executable file is created at the end.
In the examples I've seen, the "compiler" takes input in the form of a language defined by a BNF grammar and then, after parsing it, outputs assembly.
Is the executable file literally just that assembly in binary form? I feel like this can't be the case given that there are applications for making executables from assembly?
If this isn't answerable (i.e. it's too complex for the Stack Overflow format), I'd totally be happy with links/books so I can educate myself.
The compiler (or more specifically, the linker) creates the executable.
The format of the file generally varies depending on the operating system.
There are currently two main formats: ELF and COFF.
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
http://en.wikipedia.org/wiki/COFF
If you understand the concept of a structure, this is the same, only within a file. Each file has a first structure called a header, and from there you can access the other structures as required.
In most cases, only the resulting binary code is saved in these files, although you often find debug information. Some formats could save the source along with the code, but nowadays they only save the necessary references to the source.
With dynamic linking, you also find symbol tables that include the actual symbol name. Otherwise, only relocation tables would be required.
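As a small illustration of the "structures in a file" idea (a hedged sketch: it assumes a Linux ELF binary exists at /bin/ls), the header really is just the first bytes of the file, and everything else is reached from it:

    import 'dart:io';

    void main() {
      // The ELF header is the first structure in the file; its first four bytes
      // are the magic number 0x7F 'E' 'L' 'F'.
      final bytes = File('/bin/ls').readAsBytesSync();

      final isElf = bytes.length >= 5 &&
          bytes[0] == 0x7f &&
          bytes[1] == 0x45 && // 'E'
          bytes[2] == 0x4c && // 'L'
          bytes[3] == 0x46;   // 'F'

      // Byte 4 (EI_CLASS) says whether the rest of the header uses 32-bit (1)
      // or 64-bit (2) field layouts.
      print(isElf ? 'ELF file, class ${bytes[4]}' : 'not an ELF file');
    }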
Under the Amiga we also had the possibility to define code in a "segment". Only one segment could be loaded at a time. Once you were done with the segment, you could unload it and load another. Yet, in the end the concepts were similar. Structures in a file.
Microsoft offers a PDF about the COFF format. I could not find it on their website just now, but it looks like others have it. ELF has many links in the Wikipedia page so you should be able to find a PDF to get started.
Not all, but some compilers (gcc, etc.) go from the high-level language to assembly language and then spawn the assembler. The assembler reads the assembly language, generates machine code, and produces an object file which, as you have guessed, contains more than just the machine code bits.

If you think about it for a second, you may realize that a variable or function defined in another source file lives in another object file, so until link time one object doesn't know how to reach that external function. This means 1) the machine code is not finished, because patching up external addresses is not done until link time, and 2) there needs to be some information in the object file that defines what public items this object file provides and what external items are missing (names of functions, for example, which are obviously not embedded in the machine code). So the objects contain machine code in various states of completion, as well as other data needed by the linker.

The linker then...links...the objects together into one program with everything resolved: it completes all the machine code and puts the fragments of machine code (from the separate objects) into one place. Then it has to save all of that to disk in some format, and typically that format is not just raw machine code. There is extra stuff in the file, starting with a header, and then a way to define each binary blob and where it needs to live in memory before executing.

When you run a program from the command line of your operating system, or by double-clicking it in a file manager GUI, the operating system knows how to read that file format, extract the blobs of binary, place those blobs in RAM as defined by the file format, and then start executing at the place defined by the file format.
a.out, ELF, COFF, Intel HEX, and Motorola S-record are all popular formats, as is raw binary, which some toolchains can produce. The GNU tools will default to one of them (COFF, ELF, EXE, or a.out), and objcopy is then used to convert from one to another, or at least from the default one to the others; its help shows what your possible choices are. Then simply google those or look them up on Wikipedia to find the definitions of the file formats. Intel HEX or Motorola S-record are good ones to start with on Wikipedia, then perhaps ELF.
If you want to produce a native executable file, you have two options: you can assemble the binary form yourself, or you can translate your program to another language and use its compiler to produce the executable.
Source code is present at run time in production system (though it may be compiled into ByteCode, native object code, or some other format for performance reasons). Application code is not delivered as object code for the underlying processor (some stable system libraries may be pre-compiled in this way however).
I read these lines from
http://c2.com/cgi/wiki?ScriptingLanguage
and I did not understand the line "Application code is not delivered as object code for the underlying processor". Can anybody help me understand this line? Unless and until the object code is delivered to the system, the code will not be executed, so how is it possible to have application code that is not delivered to the processor? Please help me with a small example. Thank you.
A scripting language is (generally) interpreted. This means that there is an application (the interpreter) that reads the source file (which is in text format) and executes the instructions as it reads them (*). Thus no object code (for the interpreted program) is required
(*) This might not result in actually "performing" the code; it might just store the definition of a structure/class, etc.
This is in contrast with compiled programs that are first translated from source code to native-binary/byte-code/etc. by the compiler. In this case the source (text format) is not needed to execute the program, only the object code (the result of the translation).
Note: the line is a bit blurred with byte-code-like object formats. Although byte code is not the source code, it will still have to be interpreted by the underlying virtual machine to be executed on a CPU, unless you treat the virtual machine as the machine that "executes" the object code (in the form of byte code).
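A toy illustration of the interpreted case (a made-up two-instruction "language", not any real scripting language): the program below exists only as text, and the interpreter performs each instruction as it reads it, so no object code for the script is ever produced.

    void main() {
      // The "program" is just text; nothing is compiled to object code.
      const source = [
        'set greeting Hello from the interpreter',
        'print greeting',
      ];

      final variables = <String, String>{};

      for (final line in source) {
        final parts = line.trim().split(' ');
        if (parts[0] == 'set') {
          // As in the footnote above: this only stores a definition.
          variables[parts[1]] = parts.sublist(2).join(' ');
        } else if (parts[0] == 'print') {
          // This instruction is performed the moment it is read.
          print(variables[parts[1]]);
        } else {
          throw FormatException('unknown instruction: $line');
        }
      }
    }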
So here is the problem: recently someone bought a new PC as a server to replace an older one dating from before 1985 (I wonder how it is possible to keep working with it daily after all this time).
He wants to put the old COBOL software on it, and he isn't willing, by any means, to rewrite it as something better.
So is there any compiler for 1985 COBOL for present-day Red Hat Linux? Googling turned up OpenCOBOL and a few others, but they all convert the code to C... That seems too complicated to me.
UPDATE AS REQUESTED
AIX was the old system
What's the problem with converting the COBOL to C and then compiling? As long as it works. Early C++ environments were implemented in the same way: they converted the C++ to C, and then invoked the C compiler.
Converting the COBOL to C allows them to use high-level abstractions that implement the COBOL equivalents in C. They can leverage the standard C libraries, and also convert the COBOL data access code into calls to widely available databases like MySQL. Finally, converting to C and then compiling leverages the vast amount of development effort that went into code generation. Were they to try compiling directly to object code, they'd have to generate the intermediate code expected by the GNU compiler subsystem, or they'd have to go directly to object code. Either one of those would be much more complicated than converting to C, meaning that the likelihood of bugs in the COBOL compiler would be much higher.
From where I sit, I'd say OpenCOBOL is worth looking into. Note that they say they implement "a substantial part of the COBOL 85 and COBOL 2002 standards." You probably want to make sure that they implement the parts that you need.
I would also suggest that you look into TinyCOBOL.
You don't mention when the application or AIX was last updated. If these were updated in the last few years, you may be able to port the application without re-compiling. You should check to see what COBOL compiler was used originally, e.g. IBM, RM/COBOL, AcuCOBOL, etc. It might be possible to buy a run-time only version (which will execute, but not compile), which would be cheaper than buying a compiler.
A company called Micro Focus makes a COBOL compiler for Windows, but I can assure you it is not cheap at all!
The standard method for doing this is called migration, and it involves a number of steps.

First, convert the source file to a text-file format or a file type compatible with the target computer, using an approved conversion method, and write it to magtape with a compatible recording method (such as phase encoding), or to disk or another data medium, possibly in the ASN.xx mode. Transfer it to the new computer, read the file in (through ASN.yy), and store it in a native or import file format. Then either use a utility to convert it to the source-file format, or run the program development environment to access the native text file or import file and save the content as a native source file.

Perform manual checks and amendments to the source or script code, then compile the program and repeat the alterations until a working version is achieved. Create test data files on the new computer and create a new job file or macro to run the job in the development environment. When fully tested, the program can be run live, using data files and live macros or job files either migrated over from the old system or newly created in much the same way as the source code was brought over.

An important point is that, where a structured data file is required, the live data must be read in by a specialized data take-on or loading program so that the database is populated before any new transactions occur.

When moving from AIX or another version of Unix to an entirely different operating system, the characters for end of line, line feed, and end of record may need specific conversion if they are not handled by a file-format converter or exporter utility.
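For the last point about end-of-line and end-of-record characters, here is a minimal sketch of normalising transferred files before anything else touches them (the file names are made up for illustration, and the target convention here is simply LF):

    import 'dart:io';

    void main() {
      // Files copied from older systems may use CR LF or a lone CR as the
      // end-of-line / end-of-record marker; rewrite everything as LF.
      final raw = File('transferred/payroll.cbl').readAsStringSync();
      final normalised = raw.replaceAll('\r\n', '\n').replaceAll('\r', '\n');
      File('converted/payroll.cbl').writeAsStringSync(normalised);
    }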