I have recently started doing some work with COBOL, having only ever worked in z/OS Assembler on a mainframe before.
I know that COBOL will be translated into Mainframe machine-code, but I am wondering if it is possible to see the generated code?
I want to use this to better understand the inner workings of COBOL.
For example, if I was to compile a COBOL program, I would like to see the assembly that results from the compile. Is something like this possible?
Relenting, only because of this: "I want to use this to better understand the inner workings of COBOL".
The simple answer is that there is, for Enterprise COBOL on z/OS, a compiler option, LIST. LIST will provide what is known as the "pseudo assembler" output in your compile listing (and some other useful stuff for understanding the executable program). Another compiler option, OFFSET, shows the displacement from the start of the program of the code generated for each COBOL verb. LIST (which inherently has the offset already) and OFFSET are mutually exclusive. So you need to specify LIST and NOOFFSET.
Compiler options can be specified on the PARM of the EXEC PGM= for the compiler. Since the PARM is limited to 100 characters, compiler options can also be specified in a data set with a DDName of SYSOPTF (whose use you enable, in turn, with another compiler option, OPTFILE).
A third way to specify compiler options is to include them in the program source, using the PROCESS or (more common, since it is shorter) CBL statement.
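For illustration, a minimal compile step specifying these options might look like the sketch below (IGYCRCTL and SIGYCOMP are the usual Enterprise COBOL program and library names, but the data set names are invented, your site's compile procedure will certainly differ, and the compiler's work-file DDs are omitted):

    //COBCOMP  EXEC PGM=IGYCRCTL,PARM='LIST,NOOFFSET'
    //STEPLIB  DD DSN=IGY.SIGYCOMP,DISP=SHR
    //SYSIN    DD DSN=MY.COBOL.SOURCE(MYPROG),DISP=SHR
    //SYSPRINT DD SYSOUT=*

The equivalent in the source itself, on a line ahead of the IDENTIFICATION DIVISION, would be:

    CBL LIST,NOOFFSET

The pseudo-assembler then appears in the SYSPRINT compile listing.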
It is likely that you have a "panel" to compile your programs. This may have a field allowing options to be specified.
However, be aware of a couple of things: it is possible, when installing the compiler, to "nail in" compiler options (which means they can't be changed by the application programmer); it is possible, when installing the compiler, to prevent the use of PROCESS/CBL statements.
The reason for the above is standardisation. There are compiler options which affect code generation, and using different code-generation options within the same system can cause unwanted effects. Even across systems, different code-generation options may not be desirable if programmers are prone to expect the "normal" options.
It is unlikely that listing-only options will be "nailed", but if you are prevented from specifying options, then you may need to make a special request. This is not common, but you may be unlucky. Not my fault if it doesn't work for you.
These compiler options, and how you can specify them, are documented in the Enterprise COBOL Programming Guide for your specific release. There you will also find the documentation of the pseudo-assembler (be aware that it appears in the document as "pseudo-assembler", "pseudoassembler" and "pseudo assembler", for no good reason).
When you see the pseudo-assembler, you will see that it is not in the same format as an Assembler statement (I've never discovered why, but as far as I know it has been that way for more than 40 years). The line with the pseudo-assembler will also contain the machine-code in the format you are already familiar with from the output of the Assembler.
Don't expect to see a compiled COBOL program looking like an Assembler program that you would write. Enterprise COBOL adheres to a language Standard (1985) with IBM Extensions. The answer to "why does it do it like that" will be "because", except for optimisations (see later).
What you see will depend heavily on the version of your compiler, because in the summer of 2013 IBM introduced V5, with entirely new code generation and optimisation. Up to V4.2, the code generator dated back to "ESA", which meant that the more than 600 machine instructions introduced since ESA, as well as the extended registers, were not available to Enterprise COBOL programs. The same COBOL program compiled with V4.2 and with V6.1 (the latest version at the time of writing) will be markedly different, not only because of the different instructions, but also because the structure of an executable COBOL program was redesigned.
Then there's optimisation. With V4.2, there was one level of possible optimisation, and the optimised code was generally "recognisable". With V5+, there are three levels of optimisation (you get level zero without asking for it) and the optimisations are much more extreme, including, well, extreme stuff. If you have V5+, and want to know a bit more about what is going on, use OPT(0) to get a grip on what is happening, and then note the effects of OPT(1) and OPT(2) (and realise, from the increased compile times, how much work is put into the optimisation).
There's not really a substantial amount of official documentation of the internals. A web search will reveal some material. IBM's COBOL Cafe forum is a good place if you want more knowledge of V5+ internals, as a couple of the developers attend there. For up to V4.2, here may be as good a place as any to ask further specific questions.
I am curious whether there is any extensive overview, preferably specifications or technical reports, of the GNU style and other commonly used styles for parsing command-line arguments.
As far as I know, there are many catches and it's not completely trivial to write a parsing library that would be as compliant as, for example, C++ boost::program_options, Python's argparse, GNU getopt and more.
On the other hand, there might be libraries that are too liberal in accepting certain options or too restrictive. So, if one wants to aim for a good compatibility / conformance with a de-facto standard (if such exists), is there a better way than simply reading a number of mature libraries' source code and/or test cases?
Posix provides guidelines for the syntax of utilities, in Chapter 12 of XBD (the Base Definitions). It's certainly worth a read. As is noted there, backwards compatibility has meant that many standardized utilities do not conform to these guidelines, but nonetheless the standard recommends
... that all future utilities and applications use these guidelines to enhance user portability. The fact that some historical utilities could not be changed (to avoid breaking existing applications) should not deter this future goal.
You can also read the rationale for the syntax guidelines.
Posix provides a basic syntax but it's insufficient for utilities with a large number of arguments, and single-letter options are somewhat lacking in self-documentation. Some utilities -- test, find and tcpdump spring to mind -- essentially implement domain specific languages. Others -- ls and ps, for example -- have a bewildering pantheon of invocation options. To say nothing of compilers...
Over the years, a number of possible extension methods have been considered, and probably all of them are still in use in at least one common (possibly even standard) utility. Posix recommends the use of -W as an extension mechanism, but there are few uses of that. X Windows and TCL/Tk popularized the use of spelled-out multicharacter options, but those utilities expect long option names to still start with a single dash, which makes it impossible to coalesce non-argument options [Note 1]. Other utilities -- dd, make and awk, to name a few -- special-case arguments which have the form id=val, with no hyphens at all. The GNU approach of using a double hyphen seems to have largely won, partly for this reason, but GNU-style option reordering is not universally appreciated.
A brief discussion of GNU style is found in the GNU style guide (see also the list of long options), and a slightly less brief discussion is in Eric Raymond's The Art of Unix Programming [Note 2].
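To make the GNU conventions concrete, here is a minimal sketch in C using getopt_long (the option names are invented for illustration): it accepts -v or --verbose, and --env VALUE or --env=VALUE in the GNU "=" form:

    #include <getopt.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int verbose = 0;
        const char *env = NULL;

        static struct option longopts[] = {
            { "verbose", no_argument,       NULL, 'v' },
            { "env",     required_argument, NULL, 'e' },
            { NULL, 0, NULL, 0 }
        };

        /* getopt_long handles coalesced short options ("-ve foo"),
           "--env=VALUE", and (by default, a GNU extension) permutes
           options found after positional arguments. */
        int c;
        while ((c = getopt_long(argc, argv, "ve:", longopts, NULL)) != -1) {
            switch (c) {
            case 'v': verbose = 1;  break;
            case 'e': env = optarg; break;
            default:  exit(EXIT_FAILURE); /* getopt_long printed a diagnostic */
            }
        }

        if (verbose)
            printf("env = %s\n", env ? env : "(unset)");
        /* positional arguments remain in argv[optind..argc-1] */
        return 0;
    }

Note that this sketch also reproduces the -env=... pitfall described below: because a short option may take an attached argument, -env=FOO parses as -e with the argument nv=FOO.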
Google code takes command-line options to a new level; the internal library has now been open-sourced as gflags, so I suppose it is no longer breaking confidentiality to observe how much of Google's server management tooling is done through command-line options. Google flags are scattered indiscriminately throughout the code, so that library functions can define their own options without the calling program ever being aware of them, making it possible to tailor the behaviour of key libraries independently of the application. (It's also possible to modify the value of a gflag on the fly at runtime, another interesting tool for service management.) From a syntactic viewpoint, gflags accepts long options with either a single or a double hyphen, interchangeably, and it doesn't allow coalesced single-character option calls. [Note 3]
It's worth highlighting the observation in The Unix Programming Environment (Kernighan & Pike) that because the shell "must satisfy both the interactive and programming aspects of command execution, it is a strange language, shaped as much by history as by design." The requirements of these two aspects -- the desire for a concise interactive language and for a precise programming language -- are not always compatible.
Syntax flexibility, while handy for the interactive user, can be disastrous for the script author. As an example, last night I typed -env=... instead of --env=... which resulted in my passing nv=... to the -e option rather than passing ... to the --env option, which I didn't notice until someone asked me why I was passing that odd string as an EOF indicator. On the other hand, my pet bugbear -- the fact that some prefer --long-option and others prefer --long_option and sometimes you find both styles in the same program (I'm looking at you, gcc) -- is equally annoying as an interactive user and as a scripter.
Sadly, I don't know of any resource which would serve as an answer to this question, and I'm not sure that the above serves the need either. But perhaps we can improve it over time.
Notes:
Obviously a bad idea, since it would make impossible the pastime of constructing useful netstat invocations whose argument is a readable word.
The book and its author are commonly known as TAOUP and ESR, respectively.
It took me a while to get used to this, and very little time to revert to my old habits. So you can see where my biases lie.
Two obfuscation-related questions:
1) Is there any tool that can disassemble F# back to its source form, or something close to it, from the MSIL target form? This is not an attempt at security through obscurity but I want to protect some source code from "theft".
2) I looked briefly at some F# compiler output and in general it appears to be gibberish compared to what you get if you disassemble compiled C# code, presumably because C# is closer to the MSIL intermediate representation. The only partly mangled code I've seen from the C# compiler is for iterators (and presumably async as of C# 5.0).
So far my impression is that the F# compiled code is reasonably "obfuscated" but is that true? (I realize this is a somewhat subjective question.)
I haven't heard of anything like this; however, I think it's quite likely such a tool will appear in the relatively-near future.
Assemblies produced by the F# compiler (i.e., MSIL and related metadata) aren't obfuscated in any way. However, some of the code it produces is quite different from the code produced by the C# or VB.NET compilers, so it's not going to be as easy to reverse-engineer (simply because the tools to do so aren't available). Of course, as @Craig Stuntz said, this doesn't afford much protection against an experienced, motivated attacker.
If you're really paranoid, you might consider using an obfuscation tool on your compiled assemblies before shipping them. I've been using SmartAssembly with F# since late 2010, so I know that one works with F#; if you go with another tool, make sure you test it against some reasonably complicated F# assemblies before buying it -- at the time I was looking for an obfuscator, many of them didn't work correctly (or at all) with F# assemblies.
I wrote up some notes a while back about obfuscating F# assemblies, if you want to read more: Any experience using .NET obfuscators on F# assemblies?
F# is a .NET language, therefore it can be decompiled. You can have a look at Red Gate's Reflector if you want to spend money, or 0xd4d's dnSpy (and yes, it's the same developer as the very well-known deobfuscator de4dot). Decompiled code is really close to the original code: the logic is still the same, and you can copy/paste the source code.
If you want to protect an F# application you may consider using an obfuscator, though currently almost all of them are handled by de4dot, so it's hard to choose wisely. .NETGuard is really strong: it can handle F# applications, it can produce a native output, and it has some strong constant protection that de4dot cannot handle.
I used to use Pythia to obfuscate my D6 program. But it seems Pythia does not work anymore with my D2007.
Here's the link for Pythia (no update since early 2007): http://www.the-interweb.com/serendipity/index.php?/archives/86-Pythia-1.1.html
From the link above, here's what I want to achieve:
Over the course of time, a lot of new language features were added.
Since there is no formal grammar available, it is very hard for tool vendors (including Embarcadero themselves) to keep their Delphi language parsers at the same level as the Delphi compiler.
It is one of the reasons it takes tool vendors a bit of time (and for Delphi generics support: a lot of time!) to update their tools, if they are updated at all.
You even see artifacts of this in Delphi itself:
the structure pane often gets things wrong
the Delphi modelling and refactoring sometimes fails
the Delphi code formatter goes haywire
Pythia is the only obfuscator for the native Delphi language I know of.
You could ask them on their site if they plan for a newer version.
Personally, I almost never use obfuscators for these reasons:
reverse engineering non-obfuscated projects is difficult enough (it would take competitors long enough to reverse engineer, so the chance to lessen the backlog they already have in the first place is virtually zero)
their added value is limited when you have multi-project solutions (basically they only hide internal or private stuff)
they make bug hunting production code far too cumbersome
You may try UPX (the Ultimate Packer for eXecutables). It will compress the resources, and all the text entries are unreadable without decompressing first.
I don't know any good free solutions, but if you really need some protection you can always buy something like:
http://www.aspack.com/asprotect.html
or
http://www.oreans.com/themida.php
I have a task to convert COBOL code to .NET. Are there any converters available? I am trying to understand the COBOL code at a high level, but I have trouble understanding it. Are there any flowchart generators? I appreciate any help.
Thank you.
Migrating software systems from one language or operating environment to another is always a challenge. Here are a few things to consider:

- Legacy code tends to be poorly structured as a result of a long history of quick fixes and problem work-arounds. This really lowers the signal-to-noise ratio when trying to wrap your head around what is really going on.
- Converting code leads to further "de-structuring" to compensate for mis-matches between the source and target implementation platforms. When you start from a poorly structured base (legacy system), the end result may be totally unintelligible.
- Documentation of the legacy architecture and/or business processes is generally so far out of date that it is worse than useless; it may actually be misleading.
- The complexity of COBOL code is almost always underestimated.
- A number of "features" will be promulgated into the converted system that were originally built to compensate for things that "couldn't be done" at one time (due to smaller memories, slower computers, etc.). Many of these may now be non-issues, and you really don't want them.
- There are no obvious or straightforward ways to refactor legacy process-driven systems into an equivalent object-oriented system (at least not in a meaningful way).

There have been successful projects that migrated COBOL directly into Java. See NACA. However, the end result is only something its mother (or another COBOL programmer) could love; see this discussion.

In general I would be suspicious of any product or tool claiming to convert your COBOL legacy system into anything but another version of COBOL (e.g. COBOL.net). Down that path you still end up with what is essentially a COBOL system. If this approach is acceptable then you might want to review this white paper from Micro Focus.

IMHO, your best bet for replacing COBOL is to re-engineer your system. If you ever find a silver bullet to get from where you are to where you want to be - write a book, become a consultant and make many millions of dollars.

Sorry to have provided such a negative answer, but if you are working with anything but a trivial legacy system, the problem is going to be anything but trivial to solve.

Note: Don't bother with flowcharting the existing system. Try to get a handle on process input/output and program-to-program data transformation and flow. You need to understand the business function here, not a specific implementation of it.
Micro Focus and Fujitsu both have COBOL products that work with .NET. Micro Focus allow you to download a product trial, while the Fujitsu NetCOBOL site has a number of articles and case studies.
Micro Focus
http://www.microfocus.com/products/micro-focus-developer/micro-focus-cobol/windows-and-net/micro-focus-visual-cobol.aspx
Fujitsu
http://www.netcobol.com/products/Fujitsu-NetCOBOL-for-.NET/overview
[Note: I work for Micro Focus]
Actually, making COBOL applications available on the .NET framework is pretty straightforward (contrary to the claim made in one of the earlier responses). Fujitsu and Micro Focus both have COBOL compilers that can create ILASM code for execution in the CLR.
Micro Focus Visual COBOL (http://www.microfocus.com/visualcobol) makes it particularly easy to deploy traditional, procedural COBOL as managed code, with full support for COBOL data types, file systems etc. It also includes an updated OO COBOL syntax that takes away a lot of the verbosity and complexity of the syntax, making it easy to write COBOL code based on C# examples. Its unique approach also makes it easy to use all the Visual Studio tools such as IntelliSense.
The original question mentioned "convert" and I would strongly recommend against any approach that requires the source code to be converted to some other language before being used in a .NET environment. The amount of effort and risk involved is highly unlikely to be worth any benefits accrued. On the contrary, keeping the code in COBOL maintains the existing, working code and allows for the option to deploy onto other platforms in the future. For example, how about having a single set of source code and having the option to deploy into .NET as a native language and into a Java environment without changing a line of source code?
I recommend you get a trial copy of Visual COBOL from the link above and see how you can use your existing code in .NET without making any changes.
This is not an easy task. COBOL has fundamental ideas about data types that do not map well onto the object-oriented .NET framework (e.g. in COBOL, all data types are represented in terms of fixed-size buffers), and in particular the way groups and arrays work does not map well to .NET classes.
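As an illustrative snippet (invented for this answer, not from any real application), consider how much of COBOL's data model is about fixed-size byte buffers and overlays:

    01  CUSTOMER-REC.
        05  CUST-NAME     PIC X(30).
        05  CUST-BALANCE  PIC S9(7)V99 COMP-3.
        05  CUST-DATE.
            10  CUST-YYYY PIC 9(4).
            10  CUST-MM   PIC 9(2).
            10  CUST-DD   PIC 9(2).
    01  CUSTOMER-RAW REDEFINES CUSTOMER-REC PIC X(43).

The fixed-length text field, the packed-decimal COMP-3 field, and the REDEFINES byte overlay all have to be emulated in the .NET runtime rather than mapped directly onto object fields.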
I believe there are COBOL compilers that can actually compile to .NET bytecode, but they have their own runtime libraries to manage all of that. It might be worth looking at one of these compilers and simply leaving the legacy code in COBOL.
Other than that, line-by-line translation is probably not possible. Look at the code at a higher level and translate blocks of code at a time (e.g. at the procedure level or even higher).
There are a number of mechanisms for converting COBOL to modern scalable environments such as .NET or Java.
The first is migration to a new environment that keeps the existing COBOL code with some minor modifications (e.g. Micro Focus COBOL for .NET).
The second is migration to a new platform that simulates COBOL statements and constructs, where additional .NET/Java libraries emulate specific COBOL logic: ACCEPT goes to NETLibrary.Accept, and so on.
The third approach is the most valuable one: migrating to "pure" .NET/Java code with all the benefits of the new environment. Such code can be easily maintained and developed in the future.
However, unique expertise and toolkits are required for this approach, and there are only a few players on the global market that can help you in this case.
If we are talking about automatic migration, the number of players decreases greatly and, unfortunately for you, you have to pay for the specific technologies and tools (like ours).
However, it is a better idea to invest your money in your future growth in the modern environment, than to spend your money on the "simulation" of old technologies.
Translation is not an easy task. Besides Micro Focus and Fujitsu, there is also Raincode, which offers a free version of COBOL that nicely integrates with Visual Studio.
I have recently spent several years translating legacy FORTRAN into Java. Prior to that, I found myself translating FORTRAN into C (for which I wrote a simple translation tool). After all this work, I find myself wondering how many others are doing similar language-to-language translations and whether an automated way of doing so would be beneficial.
I know about F2C, For_C, F2J and others, as well as some of the translation sites, but none seem to be all that successful. Having seen output from For_C, I can see why it just hasn't taken off. While it is technically correct, it is very difficult to maintain.
So, I guess what I am wondering is: if there were a tool that produced more maintainable, more grok-able code than the code I have seen, would developers use it? Or are developers as jaded as many posts seem to indicate, and unwilling to use generated code because it could never be as good as their manually translated code?
In short, no. Obviously time constraints necessitate it sometimes, but...
Rarely is code written in one language going to translate well to another - every language has certain ways of doing things that are more suited to the constructs available / common libraries / etc.
Consider for example a program written in C as compared to something written in Python - certainly you can write for loops and iterate through things in Python just as easily as you can in C, but it is much simpler to use list comprehensions and take advantage of the features the language provides.
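As a trivial, invented illustration of that point: squaring the even numbers of an array takes an explicit loop in C, where Python would use a single list comprehension; a mechanical translator reproduces the loop rather than discovering the comprehension.

    #include <stddef.h>

    /* Square the even elements of in[] into out[], returning the count.
       In Python, the body of this function is a one-liner:
           [x*x for x in xs if x % 2 == 0]                              */
    size_t square_evens(const int *in, size_t n, long *out) {
        size_t k = 0;
        for (size_t i = 0; i < n; i++) {
            if (in[i] % 2 == 0)
                out[k++] = (long)in[i] * in[i];
        }
        return k;
    }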
I'd be surprised to see an example of a reasonably sized program written in any language that could be translated into 'correct', well-maintainable code in any other.
This was already covered to some extent in Conversion of Fortran 77 code to C++, but I'll take a stab at it here.
I think there's a lot of time wasted translating legacy code to new languages. It takes a phenomenal amount of time and energy to do, and you introduce new bugs when you do it.
Joel mentioned why rewriting from scratch is a horrible idea in Things you Should Never do Part I, and though I realize that translating something to a new language isn't quite the same as rewriting from scratch, I claim it's close enough:
Automated translation tools aren't wonderful because you don't get anything maintainable out of them. You pretty much have to know the old code to understand the new code, and then what have you gained?
To port something manually, you have to know how the code works to do it well. Rewriting code is seldom done by the original developers, so you seldom get people who understand everything that's going on to do the rewrite. I worked at a company where an outsource team was hired to translate an entire website backend from ColdFusion to JSP. That project kept getting delayed and delayed because the port team didn't know the code at all. Our guys never quite liked their design, and they never quite got it right, so there was constant iteration as everyone worked out all the issues that were solved in the original code. Then, the porting itself took forever.
You also need to be familiar with really technical inconsistencies between languages. People who are very familiar with two languages are rare.
For Fortran specifically, I now work at a place where there are millions of lines of legacy Fortran code, and no one here is about to rewrite it. There's just too much risk. Old bugs would have to be re-fixed, and there are hundreds of man-years that went into working out the math. Nobody wants to introduce those kinds of bugs, and it's probably downright unsafe to do it.
Instead of porting, we have hybrid codes. After all, you can link Fortran and C/C++, and if you make a C interface around your Fortran code, you can call it from Java. Modern codes here have C/C++ components that make calls into old Fortran routines, and if you do it this way you get the added benefit that Fortran compilers are screaming fast, so the old code continues to run as fast as it ever did.
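As a sketch of that wrapping approach (the routine name is invented, and the trailing-underscore, pass-by-reference convention shown is typical of many Unix Fortran compilers but is compiler-specific):

    /* C wrapper around a legacy Fortran subroutine:
     *     SUBROUTINE SOLVE(A, N, INFO)
     * Many Fortran compilers export this symbol as "solve_" and
     * expect every argument to be passed by reference. */
    extern void solve_(double *a, int *n, int *info);

    int solve_system(double *a, int n) {
        int info = 0;
        solve_(a, &n, &info);  /* scalars, too, go by address */
        return info;           /* 0 means success, by the legacy convention */
    }

A thin layer like this is what you would then expose to Java (via JNI) or call from C++, leaving the numerical core untouched.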
I think the best way to handle this is to do any porting you need to do incrementally. Make a lightweight interface around your old fortran code and call the pieces you need, but only port things as you need them in the new part. There are also component frameworks for integrating multi-language applications that can make this easier, but you can check out Conversion of Fortran 77 code to C++ for more on that.
Since programming is hard, no such tool can really exist.
If it was trivial to change one language into another, the idea of "compiler" would be moot. You'd just map the language you liked into the language of the hardware, press the button and be done.
However, it's never that simple. Each VM, each language, each API library adds nuances that are just impossible to automate.
" I can see why it just hasn't taken off. While it is technically correct, it is very difficult to maintain."
Correct for F2C as well as Fortran to machine language. The object code generated from most compilers can't easily be read by people. Either it's cruddy or it's highly optimized. Either way, it doesn't look a thing like an expert human would write in the assembler language for that hardware.
If only compiling could be reduced to some XSLT-like transformations that preserved the clarity of the old language in the new language. If only there were some universal Lingua Franca of computing that would be the Rosetta Stone of programming.
Until someone invents that Lingua Franca of computing, every language translation job will be hard and will lead to code that's "difficult to maintain" in the new language.
I've used f2c, and I agree with whoever wanted to name it cc2fc instead. It isn't a way of transforming Fortran into anything vaguely usable as C. It's a way of taking a C compiler and making a Fortran compiler out of it.
It did work just fine at taking that Fortran code and turning it (through C) to a Macintosh library I could call from Macintosh Common Lisp. Those were the days.