Why flang Fortran print adds a new line at a certain width? [duplicate] - printing

I want to write the following output to a txt file using f77:
14 76900.56273 0.000077 -100000 1000000000 -0.769006
I use:
write(6,*) KINC, BM, R2, AF, BK, BM/AF
without any format (which works well in terms of decimal digits). However in my txt file the output is written as:
14 76900.56273 0.000077 -100000
1000000000 -0.769006
I think this is because there is a fixed column width limit by default. Is it possible to change this so that I can just copy and paste the output into Excel?
I've looked at FORTRAN 77 Language Reference but I haven't found a way to do it. Any ideas? Thanks

Use an explicit format, or check your compiler's options.
If your compiler is one of DEC/Compaq/Intel, read this link:
http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/fortran-win/hh_goto.htm#GUID-C6A40AAC-81D8-4DD8-A792-62792B3AC213.htm#GUID-C6A40AAC-81D8-4DD8-A792-62792B3AC213
List-directed output (fmt=*) has a default limit of 80 columns:
"There is a property of list-directed sequential WRITE statements called the right margin. If you do not specify RECL as an OPEN statement specifier or in environmental variable FORT_FMT_RECL, the right margin value defaults to 80. When RECL is specified, the right margin is set to the value of RECL. If the length of a list-directed sequential WRITE exceeds the value of the right margin value, the remaining characters will wrap to the next line. Therefore, writing 100 characters will produce two lines of output, and writing 180 characters will produce three lines of output."
In intel's manual, blue color indicates extensions to the Fortran Standards. These extensions (non-standard features) may or may not be implemented by other compilers that conform to the language standard.
Oracle (Sun) F77:
http://docs.oracle.com/cd/E19957-01/805-4939/6j4m0vnbu/index.html#z400074369ac
"Output lines longer than 80 characters are avoided where possible"

With the asterisk as the format, you are using list-directed I/O. This is intended as a convenience. It gives the programmer minimal control, with few restrictions on the compiler and incomplete portability. The compiler is free to determine aspects such as line length. If you want control over line length, switch to using an actual format.
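To make the wrapping concrete, here is a toy model in Python (not Fortran, and not any compiler's actual algorithm). It assumes the runtime starts a new record whenever the next item would cross the right margin; the name wrap_fields and the 18-column item width are made up for illustration.

```python
def wrap_fields(fields, margin=80):
    """Greedily pack already-formatted items into records no wider than margin."""
    records, current = [], ""
    for field in fields:
        candidate = field if not current else current + " " + field
        if current and len(candidate) > margin:
            records.append(current)   # right margin reached: start a new record
            current = field
        else:
            current = candidate
    if current:
        records.append(current)
    return records


# Hypothetical widths: pretend every item is written right-justified in 18 columns.
values = ["14", "76900.56273", "0.000077", "-100000", "1000000000", "-0.769006"]
items = [f"{v:>18}" for v in values]

for record in wrap_fields(items):             # default margin of 80: two records
    print(record)
for record in wrap_fields(items, margin=200):  # wider margin: one record
    print(record)
```

In this model a wider margin (which is what RECL sets, according to the Intel text quoted above) keeps everything on one line; an explicit format sidesteps the margin question altogether.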
P.S. Why use FORTRAN 77? Fortran 90/95/2003 is easier to use and more powerful. gfortran is an open-source compiler.

Related

Add to zero...What is it for?

Why such code is used in some applications instead of a MOVE?
add 16 to ZERO giving SOME-RESULT
I spotted this in professionally written code at several spots.
Source is on this page
Without seeing more of the code, it appears that it could be a translation of IBM Assembler to COBOL. In particular, the ZAP (Zero and Add Packed) instruction may be literally translated to the above instruction, particularly if SOME-RESULT is COMP-3. Thus, someone checking the translation could see that the ZAP instruction was faithfully translated.
Or, it could be an assembler programmer's idea of a joke.
Having seen the code, I also note the use of
subtract some-data-item from some-data-item
which is used instead of
move zero to some-data-item
This is consistent with operations used with packed decimal fields in IBM Assembly, where there are no other instructions to accomplish "flexible" moves. By flexible, I mean that the packed decimal instructions contain a length field so that specific size MVC instructions need not be used.
This particular style, being unusual, may be related to catching copyright violations.
From my experience, I'm pretty sure I know the reason why the programmer would have done this. It has something to do with the binary representation of the number.
I bet SOME-RESULT is a packed-decimal (or COMP-3) format number. Let's assume the field is defined like this
05 SOME-RESULT PIC S9(5) COMP-3.
This results in a 3-byte field with a hex representation like this
x'00016C'
The decimal number is encoded as a binary-coded decimal (BCD, one decimal digit per half-byte), and the last half-byte holds the sign.
Let's take a look at how the sign is defined:
if it is one of x'C', x'A', x'F', x'E' (café), then the number is positive
if it is one of x'B', x'D', then the number is negative
any of x'0'..x'9' are not valid signs, so we can distinguish signed packed-decimals from unsigned.
However, a zoned number (PIC 9(5) DISPLAY) - as in the source code - looks like this:
x'F0F0F0F1F6'
As you can see, each decimal digit is an EBCDIC character with the 'zone' part (the first half-byte) always being x'F'.
Now we get closer to your question!
What happens when we use
MOVE 16 TO SOME-RESULT
If you just MOVE a number to such a field, this results in being compiled into a PACK instruction on the machine code level.
PACK SOME-RESULT,=C'16'
A PACK instruction takes a zoned number and packs it by picking only the second half-byte of each byte and storing it in the half-bytes of the packed number - with one exception! When it comes to the last byte, it simply swaps the two half-bytes and stores them in the last byte of the packed decimal.
This means that the zone of the last byte of the zoned decimal becomes the sign in the packed decimal:
x'00016F'
So now we have an x'F' as the sign – which is a valid positive sign.
However, what happens if we use this COBOL instruction instead
ADD 16 TO ZERO GIVING SOME-RESULT
This compiles into multiple machine level instructions
PACK SOME_RESULT,=C'0'
PACK TEMP,=C'16'
AP SOME_RESULT,TEMP
(or similar - the key point is that it needs an AP somewhere)
This makes a slight difference in the result, because the AP (add packed) instruction always sets the resulting sign to either x'C' for a positive or x'D' for a negative result.
So the difference lies in the sign
x'00016C'
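The two byte patterns can be reproduced with a small Python model. This is only a sketch of the half-byte shuffling described above, not of the real machine instructions (no overflow handling, no condition codes), and the function names pack and normalize_sign are mine.

```python
POSITIVE_SIGNS = {0xA, 0xC, 0xE, 0xF}
NEGATIVE_SIGNS = {0xB, 0xD}


def pack(zoned: bytes, length: int) -> bytes:
    """Model of PACK: zoned decimal in, packed decimal of `length` bytes out."""
    last = zoned[-1]
    # Digits of every byte but the last, then the last byte with its
    # half-bytes swapped (digit first, zone last, where it acts as the sign).
    nibbles = [b & 0x0F for b in zoned[:-1]] + [last & 0x0F, last >> 4]
    # Left-pad with zero half-bytes (or truncate on the left) to fit the target.
    nibbles = ([0] * (2 * length) + nibbles)[-2 * length:]
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, 2 * length, 2))


def normalize_sign(packed: bytes) -> bytes:
    """Model of the sign that a packed-decimal add leaves behind: C or D."""
    sign = packed[-1] & 0x0F
    if sign in POSITIVE_SIGNS:
        preferred = 0xC
    elif sign in NEGATIVE_SIGNS:
        preferred = 0xD
    else:
        raise ValueError(f"invalid sign half-byte: {sign:X}")
    return packed[:-1] + bytes([(packed[-1] & 0xF0) | preferred])


moved = pack(b"\xF1\xF6", 3)       # EBCDIC zoned '16' into a 3-byte field
added = normalize_sign(moved)

print(moved.hex().upper())         # 00016F
print(added.hex().upper())         # 00016C
print(moved == added)              # False: same value, different bytes
```

The last line is the point: a byte-wise comparison treats the two encodings of +16 as different.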
Finally, the question is why would one make this difference? After all, both x'F' and x'C' are valid positive signs. So why care?
There is one situation when this slight difference can cause big problems: When the packed decimal is part of an index key, then we would not get a match, even though the numbers are semantically identical!
Because this situation occurred quite often in older databases like VSAM and DL/I (later: IMS/DB), it became good practice to "normalize" packed decimals if they were part of an index key.
However, some programmers adopted the practice without knowing why, so you may come across code that uses this "normalization" even though the data are not used for index keys.
You might also wonder why a compiler does not optimize out the ADD 16 TO ZERO. I'm pretty sure it once did, but that broke a lot of applications, so this specific optimization was removed again or at least made a non-default option with warnings.
Additional useful info
Note that at least the Enterprise COBOL for z/OS compiler allows you to see exactly the machine code that is produced from your source code if you use the LIST compile option (see this example output). I recommend always compiling with the options LIST, MAP, OFFSET, XREF, because these options enable you to find the exact problem in your COBOL source even when you only have a program dump from an abend.
Anyway, good programming practice is not to care about the compiler or the machine code, but about the other programmers who will have to maintain, and thus read and understand the code. Good practice would be to always prefer simple and readable instructions, and to document the reasons (right in the code) when deviating from this rule.
Some programmers like to do things "just because they can". I have a feeling that is what you are seeing here. It makes about as much sense as doing
a := 0 + b
would in Go.

Are there Ansi escape sequences for superscript and subscript?

I'm playing around with ANSI escape sequences, e.g.
echo -e "\e[91mHello\e[m"
on a Linux console to display colored text.
Now I try to use superscript and subscript output like a=b².
I read here and here about: Partial Line Down (subscript) and Partial Line Up (superscript) but I'm not sure about the exact syntax and even which terminal client might support this.
Any suggestions about this?
Possibly some commercial product supports it, but it's not supported by any terminal emulator you'll encounter (unless someone modifies one just to prove a point).
The standard describes possible escape sequences, but there is no requirement that any given sequence is supported by any terminal. There are commonly supported (and assumed) sequences such as clearing the screen, but even for that, not all terminals have supported the feature.
The reason is that terminal emulators are generally used with applications (such as text editors) which assume a regular set of rows/columns, and that the text is shown compactly (no extra space such as would be needed to allow for partial line movement). Back in the day when people used typewriters, it was common to have 1.5 or 2.0 line-spacing, and get no more than 33 lines on a page. That changed, long ago.
The need for subscripts/superscripts didn't go away: Unicode provides a usable set of characters with that representation (see Superscripts and Subscripts, range 2070–209F).
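As a rough illustration of the Unicode route, here is a Python sketch that maps ASCII digits to their superscript and subscript forms. The helper names are mine. Note that superscript 1, 2 and 3 sit outside that block, at U+00B9, U+00B2 and U+00B3, and whether the glyphs render depends on the terminal font.

```python
# Superscript digits 0-9: U+2070, U+00B9, U+00B2, U+00B3, U+2074..U+2079
SUPERSCRIPT = str.maketrans(
    "0123456789",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
)
# Subscript digits 0-9: U+2080..U+2089
SUBSCRIPT = str.maketrans(
    "0123456789",
    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089",
)


def to_superscript(text: str) -> str:
    """Replace ASCII digits with superscript digits; leave other characters alone."""
    return text.translate(SUPERSCRIPT)


def to_subscript(text: str) -> str:
    """Replace ASCII digits with subscript digits; leave other characters alone."""
    return text.translate(SUBSCRIPT)


print("a=b" + to_superscript("2"))        # a=b²
print("H" + to_subscript("2") + "O")      # H₂O
```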
Further reading:
Your New Royal Portable (1953).
Line Spacing - Butterick's Practical Typography
console_codes - Linux console escape and control sequences

Informix 4GL report to screen - Reverse

I have a generated report in Informix 4GL that prints to the screen.
I need to have one column displayed in reverse format.
I tried the following:
print line_image attribute(reverse)
But that doesn't work. Is this possible at all?
Adding on to the previous answer, you can try the following
print "\033[7mHello \033[0mWorld"
\033[7m means to print in reverse. And, \033[0m means to go back to standard.
If you mean "is there any way at all to do it", the answer's "yes". If you mean "is there a nice easy built-in way to do it", the answer's "no".
What you'll need to do is:
Determine the character sequence that switches to 'reverse' video — store the characters in a string variable brv (begin reverse video; choose your own name if you don't like mine).
Determine the character sequence that switches to 'normal' video — store the characters in a string variable erv (end reverse video).
Arrange for your printing to use:
PRINT COLUMN 1, first_lot_of_data,
COLUMN 37, brv, reverse_data,
COLUMN 52, erv,
COLUMN 56, next_lot_of_data
There'll probably be 3 or 4 characters needed to switch. Those characters will be counted by the column-counting code in the report.
Different terminal types will have different sequences. These days, the chances are you're not dealing with the huge variety of actual green-screen terminals that were prevalent in the mid-80s, so you may be able to hardwire your findings for the brv and erv strings. OTOH, you may have to do some fancy footwork to find the correct sequences for different terminals at runtime. Shout if you need more information on this.
A simple way which might allow you to discover the relevant sequences is to run a program such as (this hasn't been anywhere near an I4GL compiler — there are probably syntax errors in it):
MAIN
DISPLAY "HI" AT 1,1
DISPLAY "REVERSE" AT 1,4 ATTRIBUTE(REVERSE)
DISPLAY "LO" AT 1, 12
SLEEP 2
END MAIN
Compile that into terminfo.4ge and run:
./terminfo.4ge # So you know what the screen looks like
./terminfo.4ge > out.file
There's a chance that this won't use the display attributes: if you run cat out.file and don't see the reverse video flash up, then we have to work harder.
You could also look at the terminal entry in the termcap file or from the terminfo entry. Use infocmp $TERM (with the correct terminal type set in the environment variable) and look for the smso (enter standout mode) and rmso (exit standout mode) capabilities. Decipher those (I have rmso=\E[27m and smso=\E[7m for an xterm-256color terminal; the \E is ASCII ESC or \033) and use them in the brv and erv strings. Note that rmso is 5 characters long.
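The column-counting caveat is easy to see outside I4GL. The Python sketch below hardwires the xterm-256color sequences quoted above into brv and erv (other terminals may differ) and compares the raw string length with what actually occupies screen columns. The helper names reverse and visible_width are mine.

```python
import re

ESC = "\033"
brv = ESC + "[7m"     # smso: begin reverse video (4 characters)
erv = ESC + "[27m"    # rmso: end reverse video (5 characters)

# Matches SGR-style sequences such as ESC[7m, ESC[27m or ESC[0m.
SGR = re.compile(r"\033\[[0-9;]*m")


def reverse(text: str) -> str:
    """Wrap text in the reverse-video on/off sequences."""
    return brv + text + erv


def visible_width(text: str) -> int:
    """Number of characters left once the escape sequences are stripped."""
    return len(SGR.sub("", text))


cell = reverse("REVERSE")
print(len(cell))             # 16: what a naive column counter sees
print(visible_width(cell))   # 7: what the terminal actually displays
print("HI " + cell + " LO")
```

The nine-character difference is what you would have to allow for when choosing the COLUMN positions after the reversed field.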

How can these strings be different?

I am facing a weird problem.
I have extracted data from an Excel file. It should contain an IBAN account number.
Then I tried to analyze the set of account numbers (which the source guarantees to be good) with a Java library.
To keep the scope of the question narrow, I can't explain the following. The below strings are different
030​69
03069
The first is a copy & paste from the Excel file, the second is handwritten. Google returns different results for abi [above number] and in fact in the second case I can find that it is the bank code for Intesa Sanpaolo bank (exact page displaying the ABI code, localized, here).
So, to keep the scope narrow: how is that possible? Is it something to do with the encoding?
Try it yourself: press CTRL+F and try typing "030"; it will select both lines. Now type 6, and it will match only the 2nd line.
Same happened in Notepad++
There's a U+200B ZERO WIDTH SPACE in between 030 and 69 in the first text.
Paste the text in https://www.branah.com/unicode-converter for example, or edit in a hexadecimal capable editor.
One solution for cleaning such strings is to whitelist characters, for example by scrubbing everything that isn't A-Z or 0-9.
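Here is a short Python sketch of both steps, spotting the hidden character and scrubbing by whitelist (the question uses Java, where a regex replace with the same pattern should work the same way). The function name scrub is mine.

```python
import re
import unicodedata

pasted = "030\u200b69"   # as copied from the Excel file
typed = "03069"          # as typed by hand

print(pasted == typed)            # False
print(len(pasted), len(typed))    # 6 5

# List every character with its code point and Unicode name.
for ch in pasted:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")


def scrub(value: str) -> str:
    """Keep only A-Z and 0-9; drop everything else, visible or not."""
    return re.sub(r"[^A-Z0-9]", "", value)


print(scrub(pasted) == typed)     # True
```

A whitelist this strict also drops lowercase letters and spaces, so adjust the character class if your account numbers can legitimately contain them.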

Determine Cobol coding style

I'm developing an application that parses COBOL programs. Some of these programs respect the traditional coding style (program text from column 8 to 72), and some are newer and don't follow this style.
In my application I need to determine the coding style in order to know if I should parse content after column 72.
I've been able to determine whether a program starts at column 1 or 8, but programs that start at column 1 can also follow the rule of comments after column 72.
So I'm trying to find rules that will allow me to determine if texts after column 72 are comments or valid code.
I've found some, but it's hard to tell if they will work every time:
a dot after column 72 marks the end of a sentence, but I fear that a dot can appear in comments too
find the closing character of a statement after column 72: " ' ) }
look at the characters in columns 71, 72 and 73; if there is no space, find the whole word and check whether it's a keyword or a variable. The problem is that it can be a variable from a COPY or a replacement, etc.
I'd like to know what you think of these rules, and whether you have any ideas to help me determine the coding style of a COBOL program.
I don't need an API or something just solid rules that I will be able to rely on.
I think you need to know the COBOL compiler for each program. Its documentation should tell you what conventions/configurations/switches it uses to decide if the source code ends at column 72 or not.
So.... which compiler(s)?
And if you think the column 72 issue is a pain, wait till you get around to actually parsing the COBOL itself. If you are not well prepared to handle the lexical issues of the language, you are probably very badly prepared to handle the syntactic ones.
There is no absolutely reliable way to determine if a COBOL program is in fixed or free format based only on the source code. Heck, it is sometimes difficult to identify the programming language based only on source code. Check out this classic polyglot - it is valid under 8 different language compilers. That said, you could try a few heuristics that might yield the correct answer more often than not.
Compiler directives imbedded in source code
Watch for certain compiler directives that determine code format. Unfortunately, every compiler vendor uses their own flavour of directive. For example, Micro Focus COBOL uses the SOURCEFORMAT directive. This directive will appear near the top of the program, so a short pre-scan could be used to find it. On the other hand, OpenCobol uses >>SOURCE FORMAT IS FREE and >>SOURCE FORMAT IS FIXED to toggle between free and fixed format, so different parts of the same program could be formatted differently!
The bottom line here is that you will have to support the conventions of multiple COBOL compilers.
Compiler switches
Source code format can also be specified using a compiler switch. In this case, there are no concrete clues to go on. However, you can be reasonably sure that the entire source program will be either fixed or free. All you can do here is guess. Unless the programmer is out to "mess with your head" (and some will), a program in free format will have the keywords IDENTIFICATION DIVISION or ID DIVISION starting before column 8. Every COBOL program will begin with these keywords, so you can use them as the anchor point for determining code format in the absence of imbedded compiler directives.
Warning - this is far from foolproof, but might be a good start.
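A minimal Python sketch of that pre-scan follows, with the same caveats. It only knows the OpenCobol-style >>SOURCE FORMAT directive and the division-header position; the Micro Focus SOURCEFORMAT directive and anything else vendor-specific would need rules of their own. The name guess_source_format is hypothetical.

```python
import re

# OpenCobol-style directive, e.g. ">>SOURCE FORMAT IS FREE"
DIRECTIVE = re.compile(r">>\s*SOURCE\s+FORMAT\s+(?:IS\s+)?(FREE|FIXED)",
                       re.IGNORECASE)
# The header that the answer suggests using as an anchor point.
DIVISION = re.compile(r"\b(?:IDENTIFICATION|ID)\s+DIVISION\b", re.IGNORECASE)


def guess_source_format(lines):
    """Return 'free', 'fixed' or 'unknown' from the first clue found."""
    for line in lines:
        directive = DIRECTIVE.search(line)
        if directive:
            return directive.group(1).lower()
        header = DIVISION.search(line)
        if header:
            # Column 8 is index 7: a header starting earlier suggests free format.
            return "free" if header.start() < 7 else "fixed"
    return "unknown"


print(guess_source_format(["000100 IDENTIFICATION DIVISION."]))   # fixed
print(guess_source_format(["identification division."]))          # free
print(guess_source_format(["       >>SOURCE FORMAT IS FREE"]))    # free
```

Because the function returns on the first directive, it ignores the case where a program toggles formats part-way through; a real pre-scan would have to track that line by line.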
There won't be an algorithm to do this with 100% certainty, because if comments can be anything, they can also be compilable COBOL code. So you could theoretically write a program that means one thing if the comments are ignored, and something else entirely if the comments are treated as part of the COBOL.
But that's extremely unlikely. What's most likely to happen is that if you try to compile the code under the wrong convention, it will simply fail. So the only accurate way to do this is to try compiling/parsing the program one way, and if you come to a line that can't make sense, switch to the other style. You could also support passing an argument to the compiler when the style is already known.
You can try using heuristics like what you've described, but that will never be totally accurate. The most they can give you is a probability that the code is one or the other style, which will increase as they examine more and more lines of code. They could be useful for helping you guess the style before you start compiling, or for figuring out when the problem is really just a typo in the code.
EDIT:
Regarding ideas for heuristics, it's hard to say. If there were a standard comment sigil like // or # in other languages, this would be a lot easier (actually, there is, but it sounds like your code doesn't follow this convention). The only thing I can think of would be to check whether every line (or maybe 99% of lines, and not counting empty lines or lines commented with *) has a period somewhere before position 72.
One thing you DON'T want to do is apply any heuristics to the part after position 72. That is, you don't want to be checking the comments to see if they're valid COBOL. You want to check what you know is COBOL first, and see if that works by itself. There are several reasons for this:
Comments written in English are likely to have periods and quotes in them, so your first and second bullet points are out.
Natural languages are WAY harder to parse than something like COBOL.
The comments could easily have COBOL in them (maybe someone commented out the previous version of the line).
An important rule for comments is that they should never affect what the program does. If changing the comments can change how the program is compiled, you violate that.
All that in mind, my opinion is that you shouldn't use heuristics at all. You should always try to compile the program under both conventions unless one is explicitly specified. There's a chance that code will compile successfully under both conventions, and then you'll have two different programs and no way to tell which one is correct.
If that happens, you need to compare the two results (perhaps with a hash or something) to see if they're the same program. If they're the same, great, but if not, you'll need to force the user to explicitly choose a convention.
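The try-both-and-compare idea could be sketched in Python like this. The parse callable stands in for your real parser and is assumed to raise ValueError on code it cannot make sense of; toy_parse is a deliberately silly stand-in so the example runs, and both names are hypothetical.

```python
def choose_format(source, parse):
    """Parse under both conventions and report which one(s) succeed.

    Returns 'fixed' or 'free' if only one parses, 'either' if both give the
    same result, and 'ambiguous' if both parse but the results differ.
    """
    results = {}
    for fmt in ("fixed", "free"):
        try:
            results[fmt] = parse(source, fmt)
        except ValueError:
            pass                      # this convention does not fit
    if not results:
        raise ValueError("source does not parse under either convention")
    if len(results) == 1:
        return next(iter(results))
    if results["fixed"] == results["free"]:
        return "either"
    return "ambiguous"                # the user has to pick a convention


def toy_parse(source, fmt):
    """Stand-in parser: tokenise, and insist that the first token is ID."""
    lines = source.splitlines()
    if fmt == "fixed":
        lines = [line[7:72] for line in lines]   # keep columns 8 to 72 only
    tokens = tuple(tok for line in lines for tok in line.split())
    if not tokens or tokens[0] != "ID":
        raise ValueError("not a program")
    return tokens


print(choose_format("000100 ID DIVISION.", toy_parse))                  # fixed
print(choose_format("ID DIVISION.", toy_parse))                         # free
print(choose_format("       ID DIVISION.", toy_parse))                  # either
print(choose_format("       ID DIVISION.".ljust(72) + "NOTE", toy_parse))  # ambiguous
```

Comparing the parse results directly plays the role of the hash mentioned above; with a real parser you would compare whatever canonical form it produces.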
Most COBOL compilers will allow you to generate and analyze the output of the text manipulation phase.
The text preprocessor output can be seen (using OpenCOBOL for the example)
cobc -E program.cob
The text manipulation processor deals with any COPY ... REPLACING compiler directives, as well as converting SOURCE FORMAT IS FIXED (with line continuations, string literal concatenations, comment line removal, among other things) to the actual free format that the compiler lexical analyzer needs. A lot of the OpenCOBOL toolkits (Cross referencer and Animator, to name two) use source code AFTER the preprocessor pass. I don't think you'll lose any street cred if your parser program relies on post processed source code files.
