I know that many COBOL compilers do allow nested copybooks (with a different depth), but I did not found any "official" rule.
Questions:
Is there any COBOL standard explicit allowing or forbidding nested copybooks?
Is there any COBOL compiler not allowing nested copybooks?
The standard permits nested copybooks, but does not allow any COPY statements to be mentioned in or be manipulated by the REPLACING clause. From the standard, COPY statement, general rules 11 to 14 (COBOL 2014 7.2.2.3, identical in COBOL 2002 7.1.2.3):
If the REPLACING phrase is specified, the library text shall not contain a COPY statement.
...
If the REPLACING phrase is not specified, the library text may contain a COPY statement that does not include a REPLACING phrase. The implementation shall support nesting of at least 5 levels, including the first COPY statement in the sequence. The library text being copied shall not cause the processing of a COPY statement that directly or indirectly copies itself.
The replacing action of a COPY statement shall not introduce a COPY statement, a SOURCE FORMAT directive, a comment or a blank line.
You can using COPY REPLACING in a main program to insert a copybook.
You can use COPY REPLACING in a copybook, to copy other copy books into the copybook.
So, yes, nested copy books are allowed. So the COBOL compiler allow this.
They don't occur a lot in practice, but they do occur.
The standard (draft 2010) has 7.2.2.3 13)
The implementation shall support nesting of at least 5 levels, including the first COPY statement in the sequence. The library text being copied shall not cause the processing of a COPY statement that directly or indirectly copies itself.
But 11) 12) and 14) mention restrictions on these nested books not including any REPLACING phrase, or where the replacement includes COPY. That all makes sense, as a single COPY statement is not meant to be recursive, but can encapsulate other COPY statements during the processing.
Related
What is the benefit of using paragraphs and sections for executing pieces of code, instead of using a subprogram instead? As far as I can see paragraphs and sections are dangerous because they have an non intuitive control flow, its easy to fall through and execute stuff you never meant to execute, and there is no variable (item) scoping, therefore it encourages a style of programming where everything is visible to everything else. Its a slippery soup.
I read a lot, but I could not find anything related to the comparative benefit of paragraphs/sections vs a subprogram. I also asked online some people in some COBOL forum, but their answers were along the lines of "is this a joke" or "go learn programming"(!!!).
I do not wish to engage in a discussion of stylistic preferences, everyone writes the way that their brain works, I only want to know, is there any benefit to using paragraphs/sections for flow control? As in, are there any COBOL operations that can be done only by using paragraphs/sections? Or is it just a remnant of an early way of thinking about code?
Because no other language I know of has mimicked that, so it either has some mechanical concrete essential reason to exist in COBOL, or it is a stylistic preference of the COBOL people. Can someone illuminate me on what is happening?
These are multiple questions... the two most important ones:
Are there any COBOL operations that can be done only by using paragraphs/sections?
Yes. A likely not complete list:
USE statements in DECLARATIVES can only apply to a paragraph or a section. These are used for handling file errors and exceptions. Not all compilers support this COBOL standard feature in full.
Segmentation (primary: a program that is only partially loaded in memory) is only possible with sections; but that is to be considered a "legacy feature" (at least I don't know of people actually using it this way explicitly); see the comment of Gilbert Le Blanc for more details on this
fall-through, many other languages have this feature with a kind of a switch statement (COBOL's EVALUATE, which is not the same as a common switch but can be used similar has an explicit break and no fall-through)
GO TO DEPENDING ON (could be recoded to achieve something similar with EVALUATE and then either PERFORM, if the paragraphs are expected to fall-through, which is not uncommon, then that creates a lot of extra code)
GO TO in general and especially nice - the old obsolete ALTER statement
PERFORM statement, format 1 "out-of-line"
file state is only shared between programs when you define it as EXTERNAL, and you often want to have a file state being limited to a single program
up to COBOL85: EXIT statement (plain without anything else, actually doing nothing else then a CONTINUE would)
What is the benefit of using paragraphs and sections for executing pieces of code, instead of using a subprogram instead?
shared data (I guess you know of programs with static data or otherwise (module)global data that is shared between functions/methods and also different source code files)
much less overhead than a CALL is
consistency:
you know what's in your code, you don't know what another program does (or at least: you cannot guarantee that it will do the same some years later exactly the same)
easier to extend/change: adding another variable (and also removing part of it, change its size) to a CALL USING means that you also have to adjust the called program - and all programs that call this, even when you place the complete definition in a copybook, which is very reasonable, this means you have to recompile all programs that use this
a section/paragraph is always available (it is already loaded when the program runs), a CALLed program may not be available or lead to an exception, for example because it cannot be loaded as its parameters have changed
less stuff to code
Note: While not all compilers support this you can work around nearly all of the runtime overhead and consistency issue when you use one source files with multiple program definitions in (possibly nested) and using a static call-convention. This likely gives you the "modern" view you aim for with scope-limitation of variables, within the programs either persistent (like local-static) when defined in WORKING-STORAGE or always passed when in LINKAGE or "local-temporary" when in LOCAL-STORAGE.
Should all code of an application be in one program?
[I've added this one to not lead to bad assumptions] Of course not!
Using sub-programs and also user-defined functions (possibly even nested providing the option for "scoped" and "shared" data) is a good thing where you have a "feature boundary" (for example: access to data, user-interface, ...) or with "modern" COBOL where you have a "language boundary" (for example: direct CALLs of C/Java/whatever), but it isn't "jut for limiting a counter to a section" - in this case: either define a variable which state is not guaranteed to be available after any PERFORM or define one for the section/paragraph; in both cases it would be reasonable to use a prefix telling you this.
Using that "separate by boundary" approach also takes care of the "bad habit of everything being seen by everyone" issue (which is in any case only true for "all sections/paragraphs in the same program).
Personal side note: I would only use paragraphs where it is a "shop/team rule" (it is better to stay consistent then to do things different "just because they are better" [still providing an option to possibly change the common rule]) or for GO TO, which I normally not use.
SECTIONs and EXIT SECTION + EXIT PERFORM [CYCLE] (and very rarely GOBACK/EXIT PROGRAM) make paragraphs nearly unnecessary.
very short answer. subroutines!!
Subroutines execute in the context of the calling routine. Two virtues: no parameter passing, easy to create. In some languages, subroutines are private to (and are part of) the calling (invoking) routine (see various dialects of BASIC).
direct answer: Section and Paragraph support a different way of thinking about programming. Higher performance than call subprogram. Supports overlays. The "fall thru" aspect can be quite useful, a feature rather than a vice. They may be necessary depending on what you are doing with a specific COBOL compiler.
See also PL/1, BAL/360, architecture 360/370/...
As a veteran Cobol dinosaur, I would say asking about the benefit is not the right question. I used paragraph (or section) differently than a subprogram. The right question in my opinion is when to use them logically. If I can make an analogy, if you have a Dog java class, you will write Dog-appropriate methods within it. If there's a cat involved, you may need a helper class. In this case the helper class is the subprogram. Though, you can instead code the helper class methods inside the Dog class, but that will be bad coding.
In any other language I would recommend putting self contained functions into subroutines.
However in COBOL not so much. If the code is very likely to be used in other programs then a subroutine is a good idea. Otherwise not!
The reason being the total lack of any checks on the number type or existence of passed parameters at compile time. Small errors in call statements lead to program crashes at run time. Limiting the use of sub-routines and carefully checking the calling code for errors makes for a more reliable program.
Using paragraphs any type mismatch will be flagged at compile time, or, an automatic conversion will occur.
I am trying to figure out what the purpose of the PERFORM command is below. Code was written 20 years ago. ACPY-READ-FIRST, ACPY-READ-NEXT, and ACPY-EXIT don't exist anywhere in the program.
MOVE ACPY-ID TO WS-ACPY-ID.
PERFORM ACPY-READ-FIRST THRU ACPY-EXIT.
150-PYMTS.
PERFORM ACPY-READ-NEXT THRU ACPY-EXIT.
IF NOT SUCCESSFUL OR
ACCT-ID NOT = ACPY-ACCT-ID
GO TO 160-DONE.
Answer: You wouldn't as this would create a syntax error with every compiler.
The paragraphs (or even sections, but I'd look for the former) have to be somewhere in the source unit, I'd say: 95% likeliness to find it in a copybook named in the COPY statement (= COBOL's "include"), 4% that it is inserted by a code generator that was used to process this and 1% that you've just overlooked it (COBOL is case-insensitive, just in case).
Hint: If you have all necessary sources you can use GnuCOBOL to process it and create a listing which shows you the copybook it the paragraphs are included in.
I am working on a project to automate COBOL to generate a class diagram. I am developing using a .NET console application. I need help tracking down the procedure name where the perform statement in used in the below example.
**Z-POST-COPYRIGHT.
move 0 to RETURN-CODE
perform Z-WRITE-FILE**
How do I track the procedure name 'Z-Post-COPYRIGHT' where the procedure 'Z-write-file' is called? The only idea I could think of in terms of COBOL is through indentation as the procedure names are always indented. Ideally in the database, the code should track the procedure name after the word 'perform' and procedure under which it is called (in this case it is Z-POST-COPYRIGHT).
I assume you want to do this "on your own" without external tools (a faster approach can be found at the end).
You first have to "know" your source:
which compiler was it compiled with (get a manual for this compiler)
which options were used
Then you have to preparse the source:
include copybooks (doing the given REPLACING rules if any)
if the source is in free-form reference format: concatenate contents of last line and current line if you find a - in column 7
check for REPLACE and change the result accordingly
remove all comments (maybe only * and \ in column 7 in fixed-form reference format or similar (extensions like "variable" format / "terminal" format", ... exist, maybe only inline comments - when in free-form reference-format, otherwise maybe inline comments *> or compiler specific extensions like |) - depending on the further re-engineering you want to do it could be a good idea to extract them and store them at least with a line number reference
The you finally can track the procedure name with the following rule:
go backwards to the last separator period (there are more rules but the rule "at least one line break, another period, a space a comma or a semicolon" [I've never seen the last two in real code but it is possible" should be enough)
check if there is only one word between this separator period and the next
if this word is no reserved COBOL word (this depends on your compiler) it is very likely a procedure name
Start from here and check the output, then fine grade the rule with actual false positives or missing entries.
If you want to do more than only extract the procedure-names for PERFORM and GO TO (you should at least check the sources for PERFROM ... THRU) then this can get to a lot of work...
Faster approach with external tools:
run a COBOL compiler on the complete sources and tell it to do the preparsing only - this way you have the big second point solved already
if you have the option: tell the compiler or an external tool to create a symbol table / cross reference - this will tell you in which line a procedure is and its name (you can simply find the correct procedure by comparing the line)
Just a note: You may want to check GnuCOBOL (formerly OpenCOBOL) for the preparsing and/or generation of symbol tables/cross-reference and/or printcbl for a completely external tool doing preparsing and/or cobxref for a complete cross reference generation.
I'm re-building a Lua to ES3 transpiler (a tool for converting Lua to cross-browser JavaScript). Before I start to spend my ideas on this transpiler, I want to ask if it's possible to convert Lua labels to ECMAScript 3. For example:
goto label;
:: label ::
print "skipped";
My first idea was to separate each body of statements in parts, e.g, when there's a label, its next statements must be stored as a entire next part:
some body
label (& statements)
other label (& statements)
and so on. Every statement that has a body (or the program chunk) gets a list of parts like this. Each part of a label should have its name stored in somewhere (e.g, in its own part object, inside a property).
Each part would be a function or would store a function on itself to be executed sequentially in relation to the others.
A goto statement would lookup its specific label to run its statement and invoke a ES return statement to stop the current statements execution.
The limitations of separating the body statements in this way is to access the variables and functions defined in different parts... So, is there a idea or answer for this? Is it impossible to have stable labels if converting them to ECMAScript?
I can't quite follow your idea, but it seems someone already solved the problem: JavaScript allows labelled continues, which, combined with dummy while loops, permit emulating goto within a function. (And unless I forgot something, that should be all you need for Lua.)
Compare pages 72-74 of the ECMAScript spec ed. #3 of 2000-03-24 to see that it should work in ES3, or just look at e.g. this answer to a question about goto in JS. As usual on the 'net, the URLs referenced there are dead but you can get summerofgoto.com [archived] at the awesome Internet Archive. (Outgoing GitHub link is also dead, but the scripts are also archived: parseScripts.js, goto.min.js or goto.js.)
I hope that's enough to get things running, good luck!
I'm developing an application that parses Cobol programs. In these programs some respect the traditional coding style (programm text from column 8 to 72), and some are newer and don't follow this style.
In my application I need to determine the coding style in order to know if I should parse content after column 72.
I've been able to determine if the program start at column 1 or 8, but prog that start at column 1 can also follow the rule of comments after column 72.
So I'm trying to find rules that will allow me to determine if texts after column 72 are comments or valid code.
I've find some but it's hard to tell if it will work everytime :
dot after column 72, determine the end of sentence but I fear that dot can be in comments too
find the close character of a statement after column 72 : " ' ) }
look for char at columns 71 - 72 - 73, if there is not space then find the whole word, and check if it's a key word or a var. Problem, it can be a var from a COPY or a replacement etc...
I'd like to know what do you think of these rules and if you have any ideas to help me determine the coding style of a Cobol program.
I don't need an API or something just solid rules that I will be able to rely on.
I think you need to know the COBOL compiler for each program. Its documentation should tell you what conventions/configurations/switches it uses to decide if the source code ends at column 72 or not.
So.... which compiler(s)?
And if you think the column 72 issue is a pain, wait till you get around to actually parsing the COBOL itself. If you are not well prepared to handle the lexical issues of the language, you are probably very badly prepared to handle the syntactic ones.
There is no absolutely reliable way to determine if a COBOL program
is in fixed or free format based only on the source code. Heck it is sometimes difficult to identify
the programming language based only on source code. Check out
this classic polyglot - it is valid under 8 different language compilers. That
said, you could try a few heuristics that might yield
the correct answer more often than not.
Compiler directives imbedded in source code
Watch for certain compiler directives that determine code format.
Unfortunately, every compiler vendor uses their own flavour of directive.
For example, Microfocus COBOL uses the
SOURCEFORMAT directive. This directive will appear near the top of the program so a short pre-scan
could be used to find it. On the other hand, OpenCobol uses >>SOURCE FORMAT IS FREE and
>>SOURCE FORMAT IS FIXED to toggle between free and fixed format, different parts of the same program
could be formatted differently!
The bottom line here is that you will have to support the conventions of multiple COBOL compilers.
Compiler switches
Source code format can be also be specified using a compiler switch. In this case, there are no concrete
clues to go on. However, you can be reasonably sure that the entire source program will be either
fixed or free. All you can do here is guess. Unless the programmer is out to "mess with
your head" (and some will), a program in free format will have the keywords IDENTIFICATION DIVISION or ID DIVISION, starting before column 8.
Every COBOL program will begin with these keywords so you can use them as the anchor point for determining code format in the
absence of imbedded compiler directives.
Warning - this is far from fool proof, but might be a good start.
There won't be an algorithm to do this with 100% certainty, because if comments can be anything, they can also be compilable COBOL code. So you could theoretically write a program that means one thing if the comments are ignored, and something else entirely if the comments are treated as part of the COBOL.
But that's extremely unlikely. What's most likely to happen is that if you try to compile the code under the wrong convention, it will simply fail. So the only accurate way to do this is to try compiling/parsing the program one way, and if you come to a line that can't make sense, switch to the other style. You could also support passing an argument to the compiler when the style is already known.
You can try using heuristics like what you've described, but that will never be totally accurate. The most they can give you is a probability that the code is one or the other style, which will increase as they examine more and more lines of code. They could be useful for helping you guess the style before you start compiling, or for figuring out when the problem is really just a typo in the code.
EDIT:
Regarding ideas for heuristics, it's hard to say. If there were a standard comment sigil like // or # in other languages, this would be a lot easier (actually, there is, but it sounds like your code doesn't follow this convention). The only thing I can think of would be to check whether every line (or maybe 99% of lines, and not counting empty lines or lines commented with *) has a period somewhere before position 72.
One thing you DON'T want to do is apply any heuristics to the part after position 72. That is, you don't want to be checking the comments to see if they're valid COBOL. You want to check what you know is COBOL first, and see if that works by itself. There are several reasons for this:
Comments written in English are likely to have periods and quotes in them, so your first and second bullet points are out.
Natural languages are WAY harder to parse than something like COBOL.
The comments could easily have COBOL in them (maybe someone commented out the previous version of the line).
An important rule for comments is that they should never affect what the program does. If changing the comments can change how the program is compiled, you violate that.
All that in mind, my opinion is that you shouldn't use heuristics at all. You should always try to compile the program under both conventions unless one is explicitly specified. There's a chance that code will compile successfully under both conventions, and then you'll have two different programs and no way to tell which one is correct.
If that happens, you need to compare the two results (perhaps with a hash or something) to see if they're the same program. If they're the same, great, but if not, you'll need to force the user to explicitly choose a convention.
Most COBOL compilers will allow you to generate and analyze the post text manipulation phase.
The text preprocessor output can be seen (using OpenCOBOL for the example)
cobc -E program.cob
The text manipulation processor deals with any COPY ... REPLACING compiler directives, as well as converting SOURCE FORMAT IS FIXED (with line continuations, string literal concatenations, comment line removal, among other things) to the actual free format that the compiler lexical analyzer needs. A lot of the OpenCOBOL toolkits (Cross referencer and Animator, to name two) use source code AFTER the preprocessor pass. I don't think you'll lose any street cred if your parser program relies on post processed source code files.