Suppose I have a PS input file that contains 9-byte long data. The input data would be numbers that are necessarily left aligned. Hence, in case the number in the input file is smaller than 9 bytes, the number would be trailed by spaces.
When I read such a number into a 9(9) variable and DISPLAY it, the value displayed is the number followed by the spaces (checked with SET HEX ON in the spool).
But instead when I MOVE the value from the 9(9) variable into an S9(9) COMP variable and then DISPLAY its value, the value displayed would be some random numeric value.
My question here is, how does COBOL interpret/convert the value for an S9(9) COMP variable in the above scenario?
It's a computer, the results you get are far from random.
What did you think would happen?
You are MOVEing something which is not numeric, but which you have defined as numeric, to a binary field. You will have a value which relates to your data, just not what you expect.
You have compiler option NUMPROC(NOPFD). If you had NUMPROC(PFD) you would have got an S0C7 abend.
You should find that all your trailing spaces get treated as zero, with NUMPROC(NOPFD).
A zoned-decimal field, prior to a calculation or, as in your case, a conversion to binary, is "packed". This means all the zones are tossed away, except in the final byte, where the two halves are swapped so that its zone becomes the sign.
So, in the packed number you only get the numerics and the sign.
As long as all the numerics are 0-9 and the sign is A-F, no S0C7. Garbled results, but no S0C7.
Consider your final two trailing spaces, X'4040'. You have NUMPROC(NOPFD), so the compiler will "fix" the sign for you (to an F in this case). The pack tosses the zones (so the first 4 goes) and swaps the halves of the last byte (which becomes X'04'). The sign is then fixed (giving X'0F') and the value is converted to binary (which is successful). You have turned spaces into zeros.
If you used NUMPROC(PFD) the sign fixing would not occur, and the convert to binary (CVB) would give you a S0C7 abend.
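To make the steps above concrete, here is a rough Python model of them. It is only a sketch: the function names are made up for illustration, the `fix_sign` flag stands in for the NUMPROC(NOPFD) behaviour as described in this answer, and the real code generated by the compiler will differ in detail.

```python
def pack_zoned(zoned: bytes) -> list[int]:
    """Model of packing: keep the numeric half of every byte,
    and use the zone (high half) of the last byte as the sign."""
    digits = [b & 0x0F for b in zoned]
    sign = zoned[-1] >> 4
    return digits + [sign]


def to_binary(zoned: bytes, fix_sign: bool) -> int:
    """Model of the conversion to binary.

    fix_sign=True imitates the sign "fix" described for NUMPROC(NOPFD);
    fix_sign=False leaves the sign as found, as described for NUMPROC(PFD).
    """
    *digits, sign = pack_zoned(zoned)
    if fix_sign:
        sign = 0xF
    # The conversion needs digits 0-9 and a sign of A-F.
    if any(d > 9 for d in digits) or sign < 0xA:
        raise ValueError("data exception (S0C7)")
    value = int("".join(str(d) for d in digits))
    return -value if sign in (0xB, 0xD) else value


# "123" followed by six spaces, in EBCDIC: F1 F2 F3 40 40 40 40 40 40
field = "123".ljust(9).encode("cp037")
print(to_binary(field, fix_sign=True))  # 123000000
```

With `fix_sign=False` the same input raises the error instead, because the zone of the final space (4) is not a valid sign.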
Basically, I've been given an input file that contains a mix of alphanumeric and numeric fields. My goal is to test each field for valid data. The first field is alphanumeric, with a PIC X(3) description, and should represent a number. However, because I'm testing for data validity, there will be instances where the value could have a letter in it, such as 0R1, or be a negative value, such as -001.
When testing the data, the "IS NUMERIC" test works perfectly for finding values that aren't numeric. However, it fails when a negative sign is passed in. I assume this is happening because it recognizes that a dash or negative symbol (-) is not a numeric character. My overall goal is to test whether the number is both numeric and positive, but given the circumstances above, I'm having trouble achieving a proper test.
Any recommendations on how to get past this?
Thanks
I suggest to use FUNCTION TEST-NUMVAL(some-data) if you need a good check of anything that is defined as alphanumeric/alphabetic; if this passes you can then use FUNCTION NUMVAL(some-data) to get the actual value.
For more details you could check on those in the current draft of the COBOL standard.
I am reading the following statement and I am not sure why packed-decimal fields must have an odd number of digits. Is the following statement true, that you can only have an odd number of digits in hardware? Can you give me an example to show why it says that?
RULES(NOEVENPACK) This compiler option will tell you if you
accidentally define a Pack Decimal data item within even number of
digits. You can only have odd number of digits in hardware. If you
have one byte you have one digit , 2 byte you have 3 digit, 3 byte
-->5 digit.
In a packed-decimal field, the right-most half-byte (nybble) is the sign position. Each other half-byte in the field is a digit, 0-9.
This means that the storage occupied by a packed-decimal field represents an odd number of digits. You have no choice over that.
If you define PACKED-DECIMAL PIC 9(4) you get
?NNNNS
Where N is a digit (see, there are four of them) and S is the sign (since the field is defined as unsigned, it will have a sign of F, which is always treated as positive).
What about that ?. It can't not be there. Since it can't not be there, the compiler has to generate code so that it can only contain a zero, which won't affect the value of the field.
If you define PACKED-DECIMAL PIC 9(5) you get
NNNNNS
Five digits, sign, and nothing else for the compiler to worry about. No code generated beyond what is otherwise required for the field.
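The arithmetic behind those two layouts can be sketched in a few lines of Python. The function name is hypothetical. It just counts half-bytes: one per digit, one for the sign, rounded up to whole bytes.

```python
def packed_layout(digits: int) -> tuple[int, int]:
    """Return (bytes occupied, unused leading half-bytes)
    for a packed-decimal field declared with the given digit count."""
    nibbles = digits + 1            # the digits plus one sign half-byte
    size = (nibbles + 1) // 2       # round up to whole bytes
    return size, size * 2 - nibbles


for n in (4, 5):
    print(n, packed_layout(n))
# 4 (3, 1)   <- the "?" half-byte
# 5 (3, 0)
```

Both definitions occupy three bytes. The even-digit one simply carries a spare half-byte that has to be kept at zero.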
So your code runs faster.
You may wonder "how much does that matter?". If you consider how many packed-decimal fields you may see in a program, if each of those, every time it was referenced, had code to make the first digit zero, you've got quite a lot of code, for every pass through the program.
On the Mainframe you, generally, pay for resource-usage. If you avoid that in 5,000 programs which are processing 10,000,000 transactions a day, 365 days a year, then it adds up.
I am writing a program that converts national and international account numbers into IBAN numbers. To start, I need to form a string: Bank ID + Branch ID + Account Number + ISO Country Code without the trailing spaces that may be present in these fields. But not every account number has the same length, some account numbers have branch identifiers while others don't, so I will always end up with trailing spaces from these fields.
My working storage looks something like this:
01  Input-IBAN.
    05  BANK-ID       PIC N(10) VALUE "LOYD".
    05  BRANCH-ID     PIC N(10) VALUE " ".
    05  ACCOUNT-NR    PIC N(28) VALUE "012345678912 ".
    05  COUNTRY-CODE  PIC N(02) VALUE "GB".
01  Output-IBAN       PIC N(34).
I've put some values in there for the example; in reality it would depend on the input. The branch code is optional, hence me leaving it empty in the example.
I basically want to go from this input strung together:
"LOYD 012345678912 GB"
to this:
"LOYD012345678912GB"
Does anyone know a way to do this that does not result in performance issues? I have thought of using the FUNCTION REVERSE and then using an INSPECT for tallying leading spaces. But I've heard that's a slow way to do it. Does anyone have any ideas? And maybe an example on how to use said idea?
EDIT:
I've been informed that the elementary fields may contain embedded spaces.
I see now that you have embedded blanks in the data. Neither answer you have so far works, then. Gilbert's "squeezes out" the embedded blanks, mine would lose any data after the first blank in each field.
However, just to point out, I don't really believe you can have embedded blanks if you are in any way generating an "IBAN". For instance, https://en.wikipedia.org/wiki/International_Bank_Account_Number#Structure,
specifically:
The IBAN should not contain spaces when transmitted electronically.
When printed it is expressed in groups of four characters separated by
a single space, the last group being of variable length
If your source-data has embedded blanks, at the field level, then you need to refer that back up the line for a decision on what to do. Presuming that you receive the correct answer (no embedded blanks at the field level) then both existing answers are back on the table. You amend Gilbert's by (logically) changing LENGTH OF to FUNCTION LENGTH and dealing with any possibility of overflowing the output.
With the STRING you again have to deal with the possibility of overflowing the output.
Original answer based on the assumption of no embedded blanks.
I'll assume you don't have embedded blanks in the elementary items which make up your structure, as they are sourced by standard values which do not contain embedded blanks.
    MOVE SPACE TO OUTPUT-IBAN
    STRING BANK-ID
           BRANCH-ID
           ACCOUNT-NR
           COUNTRY-CODE
        DELIMITED BY SPACE
        INTO OUTPUT-IBAN
STRING only copies the values until it runs out of data to copy, so it is necessary to clear the OUTPUT-IBAN before the STRING.
Copying of the data from each source field will end when the first SPACE is encountered in each source field. If a field is entirely space, no data will be copied from it.
STRING will almost certainly cause a run-time routine to be executed and there will be some overhead for that. Gilbert LeBlanc's example may be slightly faster, but with STRING the compiler deals automatically with all the lengths of all the fields. Because you have National fields, ensure you use the figurative-constant SPACE (or SPACES, they are identical) not a literal value which you think contains a space " ". It does, but it doesn't contain a National space.
If the result of the STRING is greater than 34 characters, the excess characters will be quietly truncated. If you want to deal with that, STRING has an ON OVERFLOW phrase, where you specify what you want done in that case. If using ON OVERFLOW, or indeed NOT ON OVERFLOW you should use the END-STRING scope-terminator. A full-stop/period will terminate the STRING statement as well, but when used like that it can never, with ON/NOT ON, be used within a conditional statement of any type.
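For anyone who wants to see the behaviour described above without a compiler to hand, here is a small Python model of it. It is a sketch only: the function name is made up, and it imitates just the three points mentioned (output cleared to spaces first, each source copied up to its first space, copying stopped with an overflow indication when the output is full).

```python
def string_delimited_by_space(fields: list[str], out_size: int) -> tuple[str, bool]:
    """Model of MOVE SPACE + STRING ... DELIMITED BY SPACE.

    Returns the fixed-length output and whether overflow occurred."""
    out = ""
    overflow = False
    for field in fields:
        piece = field.split(" ", 1)[0]      # data up to the first space
        room = out_size - len(out)
        if len(piece) > room:
            out += piece[:room]             # excess is truncated
            overflow = True
            break
        out += piece
    return out.ljust(out_size), overflow    # the rest stays as spaces


fields = ["LOYD".ljust(10), " " * 10, "012345678912".ljust(28), "GB"]
result, overflow = string_delimited_by_space(fields, 34)
print(repr(result.rstrip()), overflow)  # 'LOYD012345678912GB' False
```

Feeding it an account number with an embedded blank, such as "0123 5678912", shows the limitation mentioned at the top of this answer: everything after the first blank in that field is lost.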
Don't use full-stops/periods to terminate scopes.
COBOL doesn't have "strings". You cannot get rid of trailing spaces in fixed-length fields, unless the data fills the field. Your output IBAN will always contain trailing spaces when the data is short.
If you were to actually have embedded blanks at the field level:
Firstly, if you want to "squeeze out" embedded blanks so that they don't appear in the output, I can't think of a simpler way (using COBOL) than Gilbert's.
Otherwise, if you want to preserve embedded blanks, you have no reasonable choice other than to count the trailing blanks so that you can calculate the length of the actual data in each field.
COBOL implementations do have Language Extensions. It is unclear which COBOL compiler you are using. If it happens to be AcuCOBOL (now from Micro Focus) then INSPECT supports TRAILING, and you can count trailing blanks that way. GnuCOBOL also supports TRAILING on INSPECT and in addition has a useful intrinsic FUNCTION, TRIM, which you could use to do exactly what you want (trimming trailing blanks) in a STRING statement.
    move space to your-output-field
    string function trim ( your-first-national-source trailing )
           function trim ( your-second-national-source trailing )
           function trim ( your-third-national-source trailing )
           ...
        delimited by size
        into your-output-field
Note that other than the PIC N in your definitions, the code is the same as if using alphanumeric fields.
However, for Standard COBOL 85 code...
You mentioned using FUNCTION REVERSE followed by INSPECT. INSPECT can count leading spaces, but not, by Standard, trailing spaces. So you can reverse the bytes in a field, and then count the leading spaces.
You have National data (PIC N). A difference with that is that it is not bytes you need to count, but characters, which are made up of two bytes. Since the compiler knows you are using PIC N fields, there is only one thing to trip you - the Special Register, LENGTH OF, counts bytes, you need FUNCTION LENGTH to count characters.
National data is UTF-16. Which happens to mean the two bytes for each character happen to be "ASCII", when one of the bytes happens to represent a displayable character. That doesn't matter either, running on z/OS, an EBCDIC machine, as the compiler will do necessary conversions automatically for literals or alpha-numeric data-items.
    MOVE ZERO TO a-count-for-each-field
    INSPECT FUNCTION REVERSE ( each-source-field )
        TALLYING a-count-for-each-field
        FOR LEADING SPACE
After doing one of those for each field, you could use reference-modification.
How to use reference-modification for this?
Firstly, you have to be careful. Secondly you don't.
Secondly first:
    MOVE SPACE TO output-field
    STRING field-1 ( 1 : length-1 )
           field-2 ( 1 : length-2 )
        DELIMITED BY SIZE
        INTO output-field
Again deal with overflow if possible/necessary.
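The lengths used in that STRING come from the reverse-and-count step: the length of the data is the field length in characters minus the number of trailing spaces. A Python sketch of the whole idea follows. The names are hypothetical, and note that Python slices happily accept a zero length, which is exactly the case to be careful about with reference-modification.

```python
def data_length(field: str) -> int:
    """Characters of real data: field length minus trailing spaces,
    counted as leading spaces of the reversed field."""
    reversed_field = field[::-1]
    leading = len(reversed_field) - len(reversed_field.lstrip(" "))
    return len(field) - leading


def build_output(fields: list[str], out_size: int) -> str:
    """Concatenate the data part of each field, keeping embedded blanks."""
    joined = "".join(f[:data_length(f)] for f in fields)
    if len(joined) > out_size:
        raise OverflowError("output field too small")
    return joined.ljust(out_size)           # fixed length, space padded


fields = ["LOYD".ljust(10), " " * 10, "0123 5678912".ljust(28), "GB"]
print(repr(build_output(fields, 34).rstrip()))  # 'LOYD0123 5678912GB'
```

The embedded blank in the account number survives, and the all-space branch field contributes nothing.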
It is also possible with plain MOVEs and reference-modification, as in this answer, https://stackoverflow.com/a/31941665/1927206, whose question is close to a duplicate of your question.
Why do you have to be careful? Again, from the answer linked previously, theoretically a reference-modification can't have a zero length.
In practice, it will probably work. COBOL programmers generally seem to be so keen on reference-modification that they don't bother to read about it fully, so don't worry about a zero-length not being Standard, and don't notice that it is non-Standard, because it "works". For now. Until the compiler changes.
If you are using Enterprise COBOL V5.2 or above (possibly V5.1 as well, I just haven't checked) then you can be sure, by compiler option, if you want, that a zero-length reference-modification works as expected.
Some other ways to achieve your task, if embedded blanks can exist and can be significant in the output, are covered in that answer. With National, just always watch to use FUNCTION LENGTH (which counts characters), not LENGTH OF (which counts bytes). Usually LENGTH OF and FUNCTION LENGTH give the same answer. For multi-byte characters, they do not.
I have no way to verify this COBOL. Let me know if this works.
77  SUB1  PIC S9(4) COMP.
77  SUB2  PIC S9(4) COMP.

    MOVE 1 TO SUB2
    PERFORM VARYING SUB1 FROM 1 BY 1
            UNTIL SUB1 > LENGTH OF INPUT-IBAN
        IF INPUT-IBAN(SUB1:1) IS NOT EQUAL TO SPACE
            MOVE INPUT-IBAN(SUB1:1) TO OUTPUT-IBAN(SUB2:1)
            ADD +1 TO SUB2
        END-IF
    END-PERFORM.
I have a compute statement that uses fields like so:
WS-COMPUTE PIC 9(14).
WS-NUM-1 PIC 9(09).
WS-NUM-2 PIC 9(09).
WS-NUM-3 PIC S9(11) COMP-3.
WS-DENOM PIC 9(09).
And then there is logic to make a computation
    COMPUTE WS-COMPUTE =
        ((WS-NUM-1 - WS-NUM-2 + WS-NUM-3)
            / WS-DENOM) * 100
The * 100 is in there because a number < 1 is expected from the division, but 0 is what was always stored in WS-COMPUTE.
We got a workaround by declaring another field that did have implied decimals, and then moving that value to WS-COMPUTE, but I was lost on why the original would always populate WS-COMPUTE with 0.
The number of decimal places for the results of intermediate calculations is directly related to the number of decimal places in your final result field (you can consult the manual for the case where you have multiple result fields) when there are no decimal places in the individual operands. COBOL is not going to use a predetermined number of decimal places for intermediate results. If neither the actual operands in question nor the final result contain decimal places, the intermediate result will not contain decimal places.
The relationship is: number of decimal places in intermediate results = number of decimal places in final result field. The only thing which can modify this is the specification of ROUNDED. If ROUNDED is specified, one extra decimal place is kept for the intermediate result fields, and that will be used to perform the rounding of the final result.
You have no decimal places on your final result, and no ROUNDED. So the intermediate results will have no decimal places. If the division gives a value of less than one, then it is gone before anything can happen to it. It is stored as zero, because there is no decimal part available to store it in.
You need to understand COMPUTE before you use it. Nowhere near enough people do. There is absolutely no need to specify excessive lengths of fields or decimal places where none are needed. These are common ways to "deal with" a problem, but are unnecessary, as the actual problem is a poorly-formed COMPUTE.
If your COMPUTE contains multiplication, do that first. If it contains division, do that last. This may require re-arranging a formula, but this will give you the correct result. Subject to truncation, which comes in two parts, as Bruce Martin has indicated. There is the one you are getting, decimal truncation through not specifying enough (any) decimal places when you expect a decimal-only value for an intermediate result, and high-order truncation if your source fields are not big enough. Always remember that the result field controls the size (decimal and integer) of the intermediate results. If you do those things, your COMPUTEs will always work.
And consider whether you want the final result rounded. If so, use ROUNDED. If you want intermediate results to be rounded, you need to do that yourself with separate COMPUTEs or DIVIDEs or MULTIPLYs.
If you don't take these things into account, your COMPUTEs will work by accident, or sometimes, or not at all, or when you specify excessive size or decimal places. Always remember that the result field controls the size (decimal) of the intermediate results where operands contain no decimal places.
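A quick way to see the effect of the ordering is to model the intermediate results with integer arithmetic. This Python sketch is a simplification: it assumes non-negative values and intermediate results with no decimal places at all, the function names are made up, and the compiler's actual rules for intermediate results are more detailed than this.

```python
def divide_first(num1: int, num2: int, num3: int, denom: int) -> int:
    """Models ((NUM-1 - NUM-2 + NUM-3) / DENOM) * 100
    with no decimal places kept for the intermediate quotient."""
    quotient = (num1 - num2 + num3) // denom    # decimal part is lost here
    return quotient * 100


def multiply_first(num1: int, num2: int, num3: int, denom: int) -> int:
    """Models ((NUM-1 - NUM-2 + NUM-3) * 100) / DENOM,
    where the division is the last step."""
    return ((num1 - num2 + num3) * 100) // denom


print(divide_first(500, 200, 0, 1200))    # 0
print(multiply_first(500, 200, 0, 1200))  # 25
```

The true ratio is 0.25 in both cases. Dividing first throws the 0.25 away before the multiplication can rescue it.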
If you don't need any decimal places in the final result, use Bruce Martin's first COMPUTE:
COMPUTE WS-COMPUTE = ((WS-NUM-1 - WS-NUM-2 + WS-NUM-3) * 100) / WS-DENOM
If you do need decimal places, use Bruce Martin's first COMPUTE (yes, the same one) with the decimals defined on the final result (WS-COMPUTE).
If you need the result to be rounded (0-4 down, 5-9 up) use ROUNDED. If you need some other rounding, specify the final result with an extra decimal place beyond what you need, and do your own rounding to your specification.
If you look at the column to the right of your question, under Related, you'll find existing questions here which would/should have answered this one for you.
You do not need to add spurious digits or spurious decimal places to everything in sight. Ensure your final result is big enough, has enough decimal places, and pay attention to the order of things. Read your manual which should document intermediate results. If your manual does not cover this, the IBM Enterprise COBOL manuals are an excellent general reference, as well as specific ones. The Programming Guide devotes an entire Appendix to intermediate results.
It sounds like you are using the TRUNC(STD) option, where the compiler takes the PICTURE clause to decide what precision to use for intermediate results. You can either add implied decimals to all your intermediate fields or try something like TRUNC(BIN) or TRUNC(OPT), though in this case I don't think they will help.
Truncates final intermediate results. OS/VS COBOL has the TRUNC and NOTRUNC options (NOTRUNC is the default). VS COBOL II , IBM COBOL, and Enterprise COBOL have the TRUNC(STD|OPT|BIN) option.
TRUNC(STD)
Truncates numeric fields according to PICTURE specification of the binary receiving field
TRUNC(OPT)
Truncates numeric fields in the most optimal way
TRUNC(BIN)
Truncates binary fields based on the storage they occupy
TRUNC(STD) is the default.
For a complete description, see the Enterprise COBOL Programming Guide.
The default for Cobol is normally to truncate. This includes intermediate results.
So the decimal places will be truncated in your calculation.
You could try:
COMPUTE WS-COMPUTE = ((WS-NUM-1 - WS-NUM-2 + WS-NUM-3) * 100) / WS-DENOM
This could result in losing high-order digits.
Alternatively you could:
Use 2 computes.
Add decimals to the input declaration.
Use floating point fields (comp-1, comp-2). As they are rarely used in Cobol, I do not advise it.
Using 2 computes (with a temporary field):
03 WS-Temp Pic 9(11)V9999 comp-3.
Compute WS-Temp = WS-NUM-1 - WS-NUM-2 + WS-NUM-3.
Compute WS-Temp = (WS-Temp / WS-DENOM) * 100.
Compute WS-COMPUTE = WS-Temp.
Change the field definition:
WS-COMPUTE PIC 9(14).
WS-NUM-1 PIC 9(09)V999.
WS-NUM-2 PIC 9(09)V999.
WS-NUM-3 PIC S9(11)V999 COMP-3.
WS-DENOM PIC 9(09).
In EDIFACT there are numeric data elements, specified e.g. as format n..5 -- we want to store those fields in a database table (with alphanumeric fields, so we can check them). How long must the db-fields be, so we can for sure store every possible valid value? I know it's at least two additional chars (for decimal point (or comma or whatever) and possibly a leading minus sign).
We are building our tables after the UN/EDIFACT standard we use in our message, not the specific guide involved, so we want to be able to store everything matching that standard. But documentation on the numeric data elements isn't really straightforward (or at least I could not find that part).
Thanks for any help
I finally found the information on the UNECE web site in the documentation on UN/EDIFACT rules, Part 4 (UN/EDIFACT rules), Chapter 2.2 (Syntax Rules). They don't say it directly, but when you put all the parts together, you get it. See TOC entry 10: REPRESENTATION OF NUMERIC DATA ELEMENT VALUES.
Here's what it basically says:
10.1: Decimal Mark
Decimal mark must be transmitted (if needed) as specified in UNA (comma or point, but always one character). It shall not be counted as a character of the value when computing the maximum field length of a data element.
10.2: Triad Separator
Triad separators shall not be used in interchange.
10.3: Sign
[...] If a value is to be indicated to be negative, it shall in transmission be immediately preceded by a minus sign e.g. -112. The minus sign shall not be counted as a character of the value when computing the maximum field length of a data element. However, allowance has to be made for the character in transmission and reception.
To put it together:
Other than the digits themselves there are only two (optional) chars allowed in a numeric field: the decimal separator and a minus sign (no blanks are permitted in between any of the characters). These two extra chars are not counted against the maximum length of the value in the field.
So the maximum number of characters in a numeric field is the maximal length of the numeric field plus 2. If you want your database to be able to store every syntactically correct value transmitted in a field specified as n..17, your column would have to be 19 chars long (something like varchar(19)). Every EDIFACT-message that has a value longer than 19 chars in a field specified as n..17 does not need to be stored in the DB for semantic checking, because it is already syntactically wrong and can be rejected.
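As a rough illustration of that rule, here is a Python sketch of the column sizing and a syntax check for a field specified as n..N. The function names are made up. Requiring at least one digit on each side of the decimal mark is an assumption on my part that is not covered by the excerpts quoted above, and the check does not try to cover every rule in the standard.

```python
import re


def max_stored_length(n: int) -> int:
    """Column width needed for a field specified as n..N:
    N digits plus an optional decimal mark and an optional minus sign."""
    return n + 2


def is_valid_numeric(value: str, n: int, decimal_mark: str = ".") -> bool:
    """Check an n..N value: optional leading minus, digits, at most one
    decimal mark, no blanks, and no more than N digits in total."""
    mark = re.escape(decimal_mark)
    # Assumption: a decimal mark must have a digit before and after it.
    if not re.fullmatch(rf"-?[0-9]+(?:{mark}[0-9]+)?", value):
        return False
    digits = sum(c.isdigit() for c in value)
    return digits <= n


print(max_stored_length(17))             # 19
print(is_valid_numeric("-1234.5", 5))    # True: 5 digits, 7 chars in total
print(is_valid_numeric("-12345.6", 5))   # False: 6 digits
```

Anything failing the check is already syntactically wrong and never needs to fit in the column.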
I used EDI Notepad from Liaison to solve a similar challenge. https://liaison.com/products/integrate/edi/edi-notepad
I recommend anyone looking at EDI to at least get their free (express) version of EDI Notepad.
The "high end" version of their product (EDI Notepad Productivity Suite) comes with a "Dictionary Viewer" tool from which you can export the min / max lengths of the elements, as well as their types. You can export the document to HTML from the Viewer tool. It also handles ANSI X12.