Issues with ANDs and ORs (COBOL) - cobol

I can't seem to get this one part right. I was given an input file with a bunch of names, some of which I need to skip, with extra information on each one. I was trying to use ANDs and ORs to skip over the names I did not need and I came up with this.
IF DL-CLASS-STANDING = 'First Yr' OR 'Second Yr' AND
GRAD-STAT-IN = ' ' OR 'X'
It got rid of all but one person, but when I tried to add another set of ANDs and ORs, the program started acting as if the conditions were not even there.
Did I make it too complex for the compiler? Is there an easier way to skip over things?

Try adding some parentheses to group things logically:
IF (DL-CLASS-STANDING = 'First Yr' OR 'Second Yr') AND
(GRAD-STAT-IN = ' ' OR 'X')

You may want to look into fully expanding that abbreviated expression, since the expansion may not be what you think when there are a lot of clauses; it's often far better to be explicit.
However, what I would do is use 88-level variables to make this more readable. These are special levels that allow conditions to be specified directly in the data division rather than as explicit conditions in the code.
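To see what the expansion does, here is a small Python sketch (a model, not COBOL) of the question's condition. It assumes the standard COBOL rules: an abbreviated relation such as `= 'First Yr' OR 'Second Yr'` borrows the last stated subject and operator, and AND is evaluated before OR, which Python's `and`/`or` also do. The sample values and function names are hypothetical.

```python
from itertools import product

# Hypothetical sample values: only equality with the literals matters.
STANDINGS = ["First Yr", "Second Yr", "Third Yr"]
GRAD_STATS = [" ", "X", "N"]

def without_parentheses(cs, gs):
    # IF DL-CLASS-STANDING = 'First Yr' OR 'Second Yr' AND
    #    GRAD-STAT-IN = ' ' OR 'X'
    # expands to CS = 'First Yr' OR CS = 'Second Yr' AND GS = ' ' OR GS = 'X',
    # and AND is evaluated before OR:
    return cs == "First Yr" or (cs == "Second Yr" and gs == " ") or gs == "X"

def with_parentheses(cs, gs):
    # IF (DL-CLASS-STANDING = 'First Yr' OR 'Second Yr') AND
    #    (GRAD-STAT-IN = ' ' OR 'X')
    return cs in ("First Yr", "Second Yr") and gs in (" ", "X")

# Collect every combination where the two readings disagree.
differences = [(cs, gs) for cs, gs in product(STANDINGS, GRAD_STATS)
               if without_parentheses(cs, gs) != with_parentheses(cs, gs)]
for cs, gs in differences:
    print(repr(cs), repr(gs), without_parentheses(cs, gs), with_parentheses(cs, gs))
```

With these sample values, the two readings disagree for a first-year record whose GRAD-STAT-IN is 'N' and for a third-year record whose GRAD-STAT-IN is 'X'.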
In other words, put something like this in your data division:
03 DL-CLASS-STANDING PIC X(20).
88 FIRST-YEAR VALUE 'First Yr'.
88 SECOND-YEAR VALUE 'Second Yr'.
03 GRAD-STAT-IN PIC X.
88 GS-UNKNOWN VALUE ' '.
88 GS-NO VALUE 'X'.
Then you can use the 88 level variables in your expressions:
IF (FIRST-YEAR OR SECOND-YEAR) AND (GS-UNKNOWN OR GS-NO) ...
This is, in my opinion, more readable and the whole point of COBOL was to look like readable English, after all.

The first thing to note is that the code shown is the code which was working; the amended code which did not give the desired result was never shown. Also, if only one person was left, why would more selection be necessary? In short, the actual question is unclear beyond saying "I don't know how to use OR in COBOL. I don't know how to use AND in COBOL".
Beyond that, there were two actual questions:
Did I make it too complex for the compiler?
Is there an easier way to skip over things [is there a clearer way to write conditions]?
To the first, the answer is No. It is very far from difficult for the compiler. The compiler knows exactly how to handle any combinations of OR, AND (and NOT, which we will come to later). The problem is, can the human writer/reader code a condition successfully such that the compiler will know what they want, rather than just giving the result from the compiler following its rules (which don't account for multiple possible human interpretations of a line of code)?
The second question therefore becomes:
How do I write a complex condition which the compiler will understand in an identical way to my intention as author and in an identical way for any reader of the code with some experience of COBOL?
Firstly, a quick rearrangement of the (working) code in the question:
IF DL-CLASS-STANDING = 'First Yr' OR 'Second Yr'
AND GRAD-STAT-IN = ' ' OR 'X'
And of the suggested code in one of the answers:
IF (DL-CLASS-STANDING = 'First Yr' OR 'Second Yr')
AND (GRAD-STAT-IN = ' ' OR 'X')
The second version is clearer, but (or and) it is identical to the first. It did not make that code work, it allowed that code to continue to work.
The answer was addressing how to resolve the problem of a condition whose complexity increases: brackets/parentheses (simply reducing the complexity is another possibility, but without the non-working example it is difficult to make suggestions).
The original code works, but when it needs to be more complex, the wheels start to fall off.
The suggested code works, but it does not (fully) resolve the problem of extending the complexity of the condition, because it repeats that same problem in miniature, inside the parentheses.
How is this so?
A simple condition:
IF A EQUAL TO "B"
A slightly more complex condition:
IF A EQUAL TO "B" OR "C"
A slight, but not complete, simplification of that:
IF (A EQUAL TO "B" OR "C")
If the condition has to become more complex, with an AND, it can be simple for the humans (the compiler does not care, it cannot be fooled):
IF (A EQUAL TO "B" OR "C")
AND (E EQUAL TO "F")
But what of this?
IF (A EQUAL TO "B" OR "C" AND E EQUAL TO "F")
Placing the AND inside the brackets has allowed the original problem for humans to be replicated. What does that mean, and how does it work?
One answer is this:
IF (A EQUAL TO ("B" OR "C") AND E EQUAL TO "F")
Perhaps clearer, but not to everyone, and again the original problem still exists in miniature.
So:
IF A EQUAL TO "B"
OR A EQUAL TO "C"
Simplified, for the first part, but that problem still exists in miniature (just add AND ...), so:
IF (A EQUAL TO "B")
OR (A EQUAL TO "C")
Leading to:
IF ((A EQUAL TO "B")
OR (A EQUAL TO "C"))
And:
IF ((A EQUAL TO "B")
OR (A EQUAL TO "C"))
Now, if someone wants to augment with AND, it is easy and clear. If done at the same level as one of the condition parts, it solely attaches to that. If done at the outermost level, it attaches to both (all).
IF (((A EQUAL TO "B")
AND (E EQUAL TO "F"))
OR (A EQUAL TO "C"))
or
IF (((A EQUAL TO "B")
OR (A EQUAL TO "C"))
AND (E EQUAL TO "F"))
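The same kind of truth-table check, again as a Python sketch with hypothetical sample values, shows why placement matters. Assuming AND is evaluated before OR, the unbracketed `IF (A EQUAL TO "B" OR "C" AND E EQUAL TO "F")` attaches the AND to the "C" comparison only, which matches neither of the two bracketed versions.

```python
from itertools import product

def unbracketed(a, e):
    # A EQUAL TO "B" OR "C" AND E EQUAL TO "F"
    # expands to A = "B" OR A = "C" AND E = "F"; AND is evaluated first.
    return a == "B" or (a == "C" and e == "F")

def and_on_first(a, e):
    # (((A EQUAL TO "B") AND (E EQUAL TO "F")) OR (A EQUAL TO "C"))
    return (a == "B" and e == "F") or a == "C"

def and_on_both(a, e):
    # (((A EQUAL TO "B") OR (A EQUAL TO "C")) AND (E EQUAL TO "F"))
    return (a == "B" or a == "C") and e == "F"

# Hypothetical sample values for A and E.
rows = list(product("BCD", "FG"))
for a, e in rows:
    print(a, e, unbracketed(a, e), and_on_first(a, e), and_on_both(a, e))
```

Comparing the columns shows the unbracketed form differs from the first bracketed version when E is not "F", and from the second when A is "B" and E is not "F".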
What if someone wants to insert the AND inside the brackets? Well, because what is inside the brackets is simple, people don't tend to do that. If what is inside the brackets is already complicated, an extra condition does tend to be added there. It seems that something which is simple through being on its own tends not to be made complicated, whereas something which is already complicated (more than one thing, not on its own) tends to be made more complex without much further thought.
COBOL is an old language. Many old programs written in COBOL are still running. Many COBOL programs have to be amended, or just read to understand something, and that many times over their lifetimes of many years.
When changing code, by adding something to a condition, it is best if the original parts of the condition do not need to be "disturbed". If complexity is left within brackets, it is more likely that code needs to be disturbed, which increases the amount of time in understanding (it is more complex) and changing (more care is needed, more testing necessary, because the code is disturbed).
Many old programs will be examples of bad practice. There is not much to do about that, except to be careful with them.
There isn't any excuse for writing new code which requires more maintenance and care in the future than is absolutely necessary.
Now, the above examples may be considered long-winded. It's COBOL, right? Lots of typing? But COBOL gives immense flexibility in data definitions. COBOL has, as part of that, the Level 88, the Condition Name.
Here are data definitions for part of the above:
01 A PIC X.
88 PARCEL-IS-OUTSIZED VALUE "B" "C".
01 E PIC X.
88 POSTAGE-IS-SUFFICIENT VALUE "F".
The condition becomes:
IF PARCEL-IS-OUTSIZED
AND POSTAGE-IS-SUFFICIENT
Instead of just literal values, all the relevant literal values now have a name, so that the coder can indicate what they actually mean, as well as the actual values which carry that meaning. If more categories should be added to PARCEL-IS-OUTSIZED, the VALUE clause on the 88-level is extended.
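As a rough analogy only (Python has no condition-names), an 88-level behaves like a named test bound to its parent field, true when the field holds any of the listed values. The class and attribute names below are hypothetical; extending a category means extending the value set, just as the VALUE clause is extended.

```python
from dataclasses import dataclass

@dataclass
class Parcel:
    """Rough model of two PIC X fields, each with an 88-level condition-name."""
    a: str  # the field tested against "B" and "C"
    e: str  # the field tested against "F"

    # Like 88 PARCEL-IS-OUTSIZED VALUE "B" "C": add values here to add categories.
    # (No annotation, so dataclass treats these as class constants, not fields.)
    OUTSIZED_VALUES = frozenset({"B", "C"})
    SUFFICIENT_POSTAGE_VALUES = frozenset({"F"})

    @property
    def parcel_is_outsized(self) -> bool:
        return self.a in self.OUTSIZED_VALUES

    @property
    def postage_is_sufficient(self) -> bool:
        return self.e in self.SUFFICIENT_POSTAGE_VALUES

def should_process(p: Parcel) -> bool:
    # IF PARCEL-IS-OUTSIZED AND POSTAGE-IS-SUFFICIENT
    return p.parcel_is_outsized and p.postage_is_sufficient

print(should_process(Parcel("C", "F")), should_process(Parcel("D", "F")))
```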
If another condition is to be combined, it is much more simple to do so.
Is this all true? Well, yes. Look at it this way.
COBOL operates on the results of a condition where coded.
If condition
Simple conditions can be compounded through the use of brackets, to make a condition:
If condition = If (condition) = If ((condition1) operator (condition2))...
And so on, to the limits of the compiler.
The human just has to deal with the condition they want for the purpose at hand. For general logic-flow, look at the If condition. For verification, look at the lowest detail. For a subset, look at the part of the condition relevant to the subset.
Use simple conditions. Make conditions simple through brackets/parentheses. Make complex conditions, where needed, by combining simple conditions. Use condition-names for comparisons to literal values.
OR and AND have been treated so far. NOT is often seen as something to treat warily:
IF NOT A EQUAL TO B
IF A NOT EQUAL TO B
IF (NOT (A EQUAL TO B)), remembering that this is just IF condition
So NOT is not scary, if it is made simple.
Throughout, I've been editing out spaces. Because the brackets are there, I like to make them in-your-face. I like to structure and indent conditions, to emphasize the meaning I have given them.
So:
IF ( ( ( condition1 )
OR ( condition2 ) )
AND
( ( condition3 )
OR ( condition4 ) ) )
(and more sculptured than that as well). By structuring, I hope that a) I mess up less and b) when/if I do mess up, someone has a better chance of noticing it.
If conditions are not simplified, then understanding the code is more difficult. Changing the code is more difficult. For people learning COBOL, keeping things simple is a long-term benefit to all.

As a rule, I avoid the use of AND if at all possible. Nested IFs work just as well, are easier to read, and, with judicious use of 88-levels, do not have to go very deep. This seems much easier to read, at least in my experience:
05 DL-CLASS-STANDING PIC X(20) VALUE SPACE.
88 DL-CLASS-STANDING-VALID VALUE 'First Yr' 'Second Yr'.
05 GRAD-STAT-IN PIC X VALUE SPACE.
88 GRAD-STAT-IN-VALID VALUE SPACE 'N'.
Then the code is as simple as this:
IF DL-CLASS-STANDING-VALID
IF GRAD-STAT-IN-VALID
ACTION ... .

Related

Basic error using solver in Z3-Python: it is returning [] as a model

Maybe I slept badly last night, but I am really struggling with this simple query in Z3-Python:
from z3 import *
a = Bool('a')
b = Bool('b')
sss = Solver()
sss.add(Exists([a,b], True))
print(sss.check())
print(sss.model())
The check prints out sat, but the model is []. However, it should be printing some (any) concrete assignment, such as a=True, b=True.
The same happens if I change the formula to, say, sss.add(Exists([a,b], Not(And(a,b)))). I also tested sss.add(True). So I am missing something really basic; sorry for the basic nature of this question.
Other code is working normally (even with optimizers instead of solvers), so it is not a problem with my environment (Colab).
Any help?
Note that there's a big difference between:
a = Bool('a')
b = Bool('b')
sss.add(Exists([a, b], True))
and
a = Bool('a')
b = Bool('b')
sss.add(True) # redundant, but to illustrate
In the first case, you're checking whether the statement Exists a. Exists b. True is satisfiable, which it trivially is; but there is no model to display: the variables a and b are inside the quantification, and they don't play any role in model construction.
In the second case, a and b are part of the model, and hence will be displayed.
What's confusing is why you need to declare a and b in the first case. That is, why can't we just say:
sss.add(Exists([a, b], True))
without any a or b in the environment? After all, they are irrelevant as far as the problem is concerned. This is merely a peculiarity of the Python API; there's really no good reason other than this is how it is implemented.
You can see the generated SMTLib by adding a statement of the form:
print(sss.sexpr())
and if you do that for the above segments, you'll see the first one doesn't even declare the variables at all.
So, long story short, the formula exists a, b. True has no "model" variables and thus there's nothing to display. The only reason you declare them in z3py is because of an implementation trick that they use (so they can figure out the type of the variable), nothing more than that.
Here're some specific comments about your questions:
An SMT solver will only construct model-values for top-level declared variables. If you create "local" variables via exists/forall, they are not going to be displayed in models constructed. Note that you could have two totally separate assertions that talk about the "same" existentially quantified variable: It wouldn't even have a way of referring to that item. Rule-of-thumb: If you want to see the value in a model, it has to be declared at the top-level.
Yes, this is the reason for the trick. So they can figure out what type those variables are. (And other bookkeeping.) There's just no syntax afforded by z3py to let you say something like Exists([Int(a), Int(b)], True) or some such. Note that this doesn't mean something like this cannot be implemented. They just didn't. (And is probably not worth it.)
No, you understood correctly. The reason you get [] is that there are absolutely no constraints on a and b, and z3's model constructor will not assign any values because they are irrelevant. You can recover their values via the model_completion parameter. But the best way to experiment with this is to add some extra constraints at the top level. (It might be easier to play around if you make a and b Ints.) At the top level, assert that a = 5, b = 12. At the "existential" level, assert something else. You'll see that your model will only satisfy the top-level constraints. Interestingly, if you assert something that's unsatisfiable in your existential query, the whole thing will become unsat, which is another sign of how they are treated.
I said trivially true because your formula is Exists a. Exists b. True. There are no constraints, so any assignment to a and b satisfies it. It's trivial in this sense. (And all SMTLib logics work over non-empty domains, so you can always assign values freely.)
Quantified variables can always be alpha-renamed without changing the semantics. So, whenever there's collision between quantified names and top-level names, imagine renaming them to be unique. I think you'll be able to answer your own question if you think of it this way.
Extended discussions in comments are really not productive. Feel free to ask new questions if anything isn't clear.

Difference between multiset(a[..a.Length]) and multiset(a[..]) in Dafny

I'm trying to figure something out in dafny.
Given two arrays a and b, my assertions, invariants, postconditions, etc. of the form:
multiset(a[..]) == multiset(b[..]);
fails but
multiset(a[..a.Length]) == multiset(b[..b.Length])
succeeds.
I'm very confused by this because I assumed a[..] and a[..a.Length] would be the exact same thing. However, I found something interesting. If I add at the end of my method:
assert a[..a.Length] == a[..];
assert b[..b.Length] == b[..];
then I can get the invariants, post conditions, assertions involving my first example to work.
This suggests to me that a[..] and a[..a.Length] are actually different.
Could someone please explain why this is the case and what is happening here?
You are correct that a[..] and a[..a.Length] (and, for that matter, also a[0..] and a[0..a.Length]) are the same thing. However, the verifier may treat them slightly differently. This makes a difference because of the lack of (caution: technical word coming up) extensionality in Dafny.
Extensionality means that, if you know two things have the same elements, then they are the same thing. In your example, extensionality would mean that, if you know a[..] and a[..a.Length] to have the same elements, then a[..] and a[..a.Length] are the same thing.
The lack of extensionality in Dafny means that the verifier sometimes knows that two things have the same elements, but it still doesn't draw the conclusion that the two things are the same. This tends to be noticeable when the two things are passed to a function. In your example, that function is multiset(...), which converts a sequence into a multiset.
While Dafny does not support extensionality automatically, it does offer a simple technique to "remind" the verifier about extensionality. The technique is to assert the equality between the two things. In particular, when you write assert A == B; for two sequences A and B, then the verifier will first prove that A and B have the same elements, and then it will draw the conclusion that A is in fact equal to B. In other words, when the verifier is explicitly asked to verify the equality of two sequence-valued expressions like A and B, then it instead just proves that they are element-wise equal, after which it concludes A == B.
So, your remedy above is exactly the right thing to do. When you assert the equality between a[..a.Length] and a[..], the verifier proves that they have the same elements. After that, it "remembers" that this also means that a[..a.Length] and a[..] are equal. Once it has realized the two sequences are the same, it immediately also knows that functions of them, like multiset(a[..a.Length]) and multiset(a[..]), are the same.
More generally, extensionality is relevant for not just sequences, but also for sets, multisets, and maps. So, if you're working with any of those collection types, you may need to write assertions about sequence equality, set equality, etc., to "remind" the verifier about extensionality.
More generally, there are many things that are true and that the verifier doesn't immediately verify. To home in on what may be missing, the common technique is to start breaking down the proof obligations yourself, like you did in the assert statements. By breaking down more complicated proof obligations into simpler ones (which is usually done with assert statements or calc statements), you essentially provide hints to the verifier about how to prove the more complicated things.

Parsing and generalising a Stata program for renaming variables

I'm learning how to write a program in Stata for the first time and I'm having difficulty generalising my program so I can parse an arbitrary list of variables when renaming variables in datasets.
I'm working with two datasets. The first one is a panel dataset containing the life satisfaction of interviewees in a survey over a period of 26 years (the same dataset as in my previous question). The variables are originally named in this format: ap6801 bp9301 cp9601 and so on, all the way to zp15701. ap6801 contains the respondents' life satisfaction for the year 1985, bp9301 contains it for 1986, and so on.
I wrote the following program to rename the variables so instead of ap6801 it would be lsat1985.
program myprogram
local mcode 1984
foreach stub in a b c d e f g h i j k l m n o p q r s t u v w x y z {
local mcode = `mcode' + 1
rename `stub'* lsat`mcode'
}
Now, I want to modify and generalise this program so that I can use it on my second dataset and with arbitrary numbers. The second dataset consists of variables abetto bbetto cbetto all the way until zbetto. These variables indicate whether a specific person has been interviewed in a specific year and, if not, why not. abetto corresponds to year 1985, bbetto corresponds to year 1986, and so on.
My goal is to write a generalised version of the program so that when I enter an arbitrary list of variables and other information (for example lsat, and a list of numbers (eg: 1985-2010)): myprogram ap6801-zp15701 , the variables will be renamed lsat1985 lsat1986... lsat2010
I'm guessing the program will have the following basic structure:
program myprogram
syntax varlist
foreach x of varlist {
}
Within the loop, there might be local letter = substr("`x'",1,1) to identify the first letter of the variables (a, b, c, d...). Next step would be to link the letters of the alphabet to numbers that will be specified by the user, and a rename command that renames the variable in the format: lsat/betto year. I'm having a difficult time putting all that together in code.
I'm new to Stata and programming, so any help is appreciated!
Your program only works in your extraordinary circumstances. (Trivially, there is no end statement.) It will fail if any of the wildcards a* b* and so on does not occur as corresponding variable names in the data. Other way round, each rename is from a wildcard, say a*, to one variable name, say lsat1985, and this will work if and only if there is precisely one variable in each case.
More generally, this is an example of premature programming. Prejudice alert: It's not good style to write programs for such highly specific tasks, wiring in a specific variable name prefix, and for such highly unusual circumstances. At most, this is territory for code in a do-file. But if you allow variable name abbreviation, this should work. Good style here implies explaining the circumstances.
* each wildcard a* b* ... is matched by a single variable
set varabbrev on
local mcode = 1985
foreach pre in `c(alpha)' {
rename `pre' lsat`mcode'
local ++mcode
}
Note that there is no need to type out all the lower case letters a to z. Stata holds all those in one place as c(alpha). You are not expected to know that. For completeness, and this may be irrelevant to your problem, note that Stata variable names can start with other characters, most notably underscore _. I am left wondering about data for years from 2011 on.
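As a cross-check of the single-match condition described above, here is a Python sketch (not Stata) of what that loop relies on: `string.ascii_lowercase` plays the role of `c(alpha)`, and `rename_plan` is a hypothetical helper that only builds the old-name to new-name mapping. It assumes lower-case variable names and raises an error, instead of renaming anything, when a letter matches no variable or more than one.

```python
import string

def rename_plan(varnames, stub, first_year):
    """Hypothetical dry run: map the single variable starting with each letter
    a..z to stub + year, failing if that assumption does not hold."""
    plan = {}
    for offset, letter in enumerate(string.ascii_lowercase):
        matches = [v for v in varnames if v.startswith(letter)]
        if len(matches) != 1:
            raise ValueError(f"{letter}* matches {len(matches)} variables: {matches}")
        plan[matches[0]] = f"{stub}{first_year + offset}"
    return plan

names = [letter + "betto" for letter in string.ascii_lowercase]
plan = rename_plan(names, "betto", 1985)
print(plan["abetto"], plan["zbetto"])
```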
It's hard to write this without running the risk of seeming obnoxiously patronising. And the commentary may seem entirely unfair, as your goal is precisely that of generalising the program! However, in Stata it is often a better idea to write do-files first and move to a program when, and only when, you have used a do-file in so many different circumstances that the need for a more general program becomes evident. So, I won't answer your question on a more general program. It's not obviously a good idea and even if it were it is hard for an outsider to know what that program should look like. You have given one example where the suffix looks unpredictable (6801 9301 9601 and so on) and one where it is predictable (betto) and we can have no idea what else is true (unless freakishly someone here recognises your dataset). A program written for a dataset that is just 26 variables asomething to zsomething is possible, but would you ever use it more than a few times?

Why is this causing an error in Cobol?

Why would this IF statement below need a NEXT SENTENCE when there is a statement in both the IF and the ELSE part of the statement?
Question: Why is this an error in the if statement.
CHECK-PARM.
IF NAME = 'SW89JS' THEN
E-NAME = 'FALSE'
Expected a verb or "NEXT SENTENCE", but found "E-NAME". The statement was discarded.
ELSE
E-NAME = 'TRUE'
"E-NAME" was invalid. Skipped to the next verb, period or procedure-name definition.
P-NAME = 'SW89JS'
END-IF.
Since it is somewhere buried in this answer, I'm going to repeat it up here, and even expand on it a little.
You have a value you are testing. From the name it likely comes from the PARM on the EXEC card in the JCL.
You test the value, set a flag (TRUE/FALSE literals) based on the result of the test, and use it later.
With an 88 you can make that parm value into the flag itself.
01 NAME PIC X(6).
88 IT-IS-SW89JS VALUE "SW89JS".
Now you can never get your flags out of step, as you only have one flag. One fewer flag to understand and potentially get wrong.
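The "out of step" point can be illustrated with a small Python analogy (not COBOL): a separately stored flag goes stale when the value it was derived from changes later, while a condition-name-style property is re-evaluated against the field every time. The class and attribute names are hypothetical.

```python
class StoredFlag:
    """A value plus a separate flag copied from a test on it (two things to keep in step)."""
    def __init__(self, name: str):
        self.name = name
        self.is_sw89js = name == "SW89JS"  # evaluated once, here

class ConditionNameFlag:
    """88-level style: the 'flag' is only a named test on the field itself."""
    IT_IS_SW89JS_VALUES = frozenset({"SW89JS"})

    def __init__(self, name: str):
        self.name = name

    @property
    def it_is_sw89js(self) -> bool:
        return self.name in self.IT_IS_SW89JS_VALUES

stored = StoredFlag("SW89JS")
named = ConditionNameFlag("SW89JS")
stored.name = "XX1234"  # the value changes later...
named.name = "XX1234"
print(stored.is_sw89js, named.it_is_sw89js)  # True False: only the stored flag is stale
```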
Because we don't have assignments in COBOL.
MOVE 'TRUE' TO E-NAME
or
MOVE data-name-with-value-true TO E-NAME
or
01 FILLER PIC X VALUE SPACE.
88 IT-IS-SW89JS VALUE "Y".
SET IT-IS-SW89JS TO TRUE
or
01 NAME PIC X(6).
88 IT-IS-SW89JS VALUE "SW89JS".
And with the last forget about anything else.
COBOL is not like many other languages. No strings, as you may know them. No arrays, as you may know them. No assignments. No booleans. No user-written functions. Few Intrinsic Functions. No public function libraries. It does have some other stuff :-)
A couple of points from the comments.
COBOL is a language of fixedness. Fixed-length fields, fixed-length tables. The length of the data in a 30-byte field is 30 bytes. The length of the content of the field, in terms of what the data represents, is something the programmer has to work out, if needed. Mostly we don't need it, so we don't have to work it out.
The fixedness also imposes limits. So we think of ways to do things differently, so we don't have a limit, waiting to bust, dangling over our heads. We don't just pick a function which looks like maybe it makes life easy for us (less code to write) regardless of how it carries out the task. Usually we don't have a function anyway, and we write specific code to be re-used, through a CALL, for a specific system or set of systems.
A COBOL program may take longer to write (I say may because 90+% of the time it is a question of starting out by copying a program which is close to what you want, and then making it specific) but that program may have a lifetime of 30 years. It may be changed many times during its life. It may never be changed, but need to be understood many times during that period.
Conceptually, COBOL is a very different language from those with assignments/strings/arrays. If you are supposed to pick up COBOL with no training, there will be many pitfalls.
Yes, Bruce Martin, I suppose COBOL does have an assignment: the COMPUTE. The left-side can only be numeric or numeric-edited, although there can be multiple fields, and the right-side can only have numerics (or intrinsic functions returning numerics). It only supports basic mathematical operators (+, -, *, /, **). It does allow rounding of the final answer if desired, and also allows for interception of "overflow" (ON SIZE ERROR).
It can be used as a simple assignment:
COMPUTE A = B
This will generate the same code as:
MOVE B TO A
Some people do this, though I've never really worked out why. There is a rumour that it means you can use ON SIZE ERROR (and don't forget END-COMPUTE if you do use it) to trap an overflow.
However, I always make my fields big enough, or deliberately truncate when that is the result I want, so I don't really get that.
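To make the difference concrete, here is a rough Python model (not generated code) of a receiving field defined as an unsigned integer PIC 9(3): a MOVE silently truncates high-order digits, while COMPUTE with ON SIZE ERROR leaves the field unchanged and runs the error branch. It assumes whole numbers only and ignores decimal places, ROUNDED, and COMPUTE without ON SIZE ERROR; the function names are hypothetical.

```python
def move_to_pic9(value: int, digits: int) -> int:
    """Rough model of MOVE into an unsigned integer PIC 9(digits) field:
    the sign is dropped and high-order digits are truncated."""
    return abs(value) % 10 ** digits

def compute_with_size_error(target: int, result: int, digits: int):
    """Rough model of COMPUTE target = ... ON SIZE ERROR for the same field:
    if the result does not fit, the target keeps its old value and the
    size-error branch runs (signalled here by the returned flag)."""
    if abs(result) >= 10 ** digits:
        return target, True
    return abs(result), False

print(move_to_pic9(1234, 3))                  # 234: the leading 1 is lost
print(compute_with_size_error(999, 1234, 3))  # (999, True): field unchanged
```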
In short, welcome to COBOL. Don't expect it to be like any other language you've used.
As Bill stated, the problem is:
E-NAME = 'FALSE'
In Cobol (unlike most other languages), each statement starts with a control word
e.g.
Compute abc = 123
Move 'FALSE' to E-NAME
Perform abc
Call 'xyz'
With Cobol the control word on the far left of a line tells you what the statement is doing.
Also, as Bill stated, in Cobol a boolean is normally defined using 88 levels:
01 FILLER PIC X VALUE SPACE.
88 IS-ENAME VALUE "Y".
88 ENAME-OFF VALUE "N".
and your code becomes
IF NAME = 'SW89JS' THEN
Set ENAME-OFF to true
ELSE
Set IS-ENAME to true
Move 'SW89JS' to P-NAME
END-IF.

can we make printf align values on decimal point?

I've encountered an array of floating point values that I can't make print out in a reasonable way with the old tried and true printf() function. The problem, I guess, is that the range of numbers is huge... from tiny numbers like -3.66542e-296 to +9.5543e+301 and lots of values in between.
Normally values are more related to each other and something like %23.16f will work. But with these huge numbers the f specifier doesn't work, because some numbers print out dozens to hundreds of digits (overflowing the size specification). This leaves the e format (or the g format which lets printf() switch back and forth between e and f formats).
When forced to adopt e or g specifier due to large range of values, is there any way to:
1. make the decimal points of all values align over each other.
2. make the e (of the exponent) of all values align over each other.
3. make the number of digits following the e be fixed (always the same).
For almost any purpose, #1 is the best option; alignment on the . is most often helpful. But it seems impossible to make nice, neat, readable columns in any way whatsoever in this situation... unless I'm missing something.
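One possible approach, sketched in Python because its `%` operator follows C printf conventions for `%e`: print the mantissa with a forced sign (`%+.5e`), so the decimal point always lands in the same column, then re-emit the exponent with a fixed number of digits, since `%e` itself only guarantees two exponent digits and uses three when needed. The helper name `align_sci` and its default widths are assumptions, and only finite values are handled. The same split-and-repad idea should carry over to C by formatting into a buffer with snprintf first.

```python
def align_sci(value: float, digits: int = 5, exp_width: int = 3) -> str:
    """Format a finite float so the '.', the 'e' and the exponent width are fixed.

    '%+.{digits}e' always yields sign, one digit, '.', `digits` digits, so the
    mantissa has a constant width; the exponent is then re-padded to
    `exp_width` digits because %e only guarantees two.
    """
    mantissa, exponent = (f"%+.{digits}e" % value).split("e")
    return f"{mantissa}e{int(exponent):+0{exp_width + 1}d}"

values = [-3.66542e-296, 9.5543e+301, 1.5, 0.0, -42.0]
for v in values:
    print(align_sci(v))
# -3.66542e-296
# +9.55430e+301
# +1.50000e+000
# +0.00000e+000
# -4.20000e+001
```

The forced sign costs one column but is what keeps the decimal point fixed for both negative and positive values.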
