Find duplicates in array in cobol - cobol

I have a requirement where I need to check duplicate records in an array and merge them in one single record.
For finding duplicates in array, I have written below pseudo code:
PERFORM VARYING I1 FROM 1 BY 1 UNTIL I1 > MAX-TABLE-COUNT
PERFORM VARYING I2 FROM 1 BY 1 UNTIL I2 > I1-1
IF FIELD1(I1) = FIELD1(I2) AND FIELD2(I1)= FIELD2(I2)
PERFORM MERGE-RECORDS
END-IF
END-PERFORM
END-PERFORM
Here the issue is if the field1 and field2 are present 4 times, then I have to merge All four records into 1 record. Could you please suggest how it can done.

Here are two suggestions. Change them as you see fit.
The code is written for COBOL 85. Depending on your compiler and system there will be other, possibly better, means to do the same.
The following code will not maintain the original order unless the table is already ordered by ascending field1 and field2. The table is compacted by restoring only the merged records.
main-line.
*> procesing before merging records
perform sort-records
*> procesing after merging records
stop run *> or exit program or goback
.
sort-records.
if max-table-count > 1
sort sort-file ascending field1 field2
duplicates in order
input procedure release-records
output procedure return records
end-if
.
release-records.
perform varying i from 1 by 1
until i > max-table-count
release sort-record from table-entry (i)
end-perform
.
return-records.
move 0 to max-table-count
perform return-next-record
move sort-record to ws-record
perform return-next-record
*> first two records are in place for following loop
perform until end-of-input
if ws-field1 = sort-field1
and ws-field2 = sort-field2
perform merge-records
*> duplicates are dropped after merging
else
perform move-record-to-table
*> merged or unique record is saved
move sort-record to ws-record
end-if
perform return-next-record
end-perform
perform move-record-to-table
*> last record is saved
.
move-record-to-table.
add 1 to max-table-count
move ws-record to table-entry (max-table-count)
.
return-next-record.
return sort-file
at end
set end-of-input to true
end-return
.
merge-records.
*> whatever is needed
The following maintains the original order and compacts the table by removing the merged duplicates. Note, in the code comments, that marked records are to be deleted, while unmarked records remain.
main-line.
*> procesing before merging records
perform find-and-remove-duplicates
*> procesing after merging records
stop run *> or exit program or goback
.
find-and-remove-duplicates.
*> find and merge duplicates
perform varying i from 1 by 1
until i > (max-table-count - 1)
if field1 (i) (1:1) not = high-values
*> only compare unmarked records
add 1 to i giving j
perform varying j from j by 1
unitl j > max-table-count
if field1 (i) = field1 (j)
and field2 (i) = field2 (j)
perform merge-records
move high-values to field1 (j) (1:1)
*> mark for deletion
end-if
end-perform
end-if
end-perform
*> remove duplicates
perform varying i from 1 by 1
until i > max-table-count
or field1 (i) (1:1) = high-values
*> find first marked record
continue
end-perform
*> if there are no marked records, control
*> will pass to the end of the paragraph
*> and the table will remain unchanged
if i not > max-table-count
perform varying j from i by 1
until j > max-table-count
or field1 (j) (1:1) not = high-values
continue
end-perform
*> i points to a marked record
*> j points to an unmarked record
*> or is greater than max-table-count
*> which would occur if all marked records
*> are at the end of the table
*> loop to compact the table
perform until j > max-table-count
move table-entry (j) to table-entry (i)
add 1 to i
add i to j
perform varying j from j by 1
until j > max-table-count
or field1 (j) (1:1) not = high-values
*> find next unmarked record
continue
end-perform
end-perform
subtract 1 from i giving max-table-count
*> adjust count for removed records
end-if
.
merge-records.
*> whatever is needed
If the records are in sorted order, the code may be made more efficient.

Related

How do you compare multiple strings to find duplicates?

Is their a quicker way to compare multiple strings at once and figure out duplicates?
I have 5 course codes that the user will enter. In the case the user enters duplicate course codes the error message should spit out that "duplicate codes detected".
IF((WS-STUCODEL1 > 1) AND (STUCODE1 = STUCODE2
OR STUCODE3 OR STUCODE4 OR STUCODE5))
MOVE 'DUPLICATE CODES DETECTED' TO WS-MSG
MOVE 'B' TO WS-CURSOR
GO TO 400-RETURN
ELSE IF
Assuming there are many course codes available, then
SET NO-DUPLICATE-FOUND TO TRUE
IF STUCODE5 NOT = SPACES
IF STUCODE5 = STUCODE1 OR STUCODE2 OR STUCODE3
OR STUCODE4
SET DUPLICATE-FOUND TO TRUE
END-IF
END-IF
IF STUCODE4 NOT = SPACES
IF STUCODE4 = STUCODE1 OR STUCODE2 OR STUCODE3
SET DUPLICATE-FOUND TO TRUE
END-IF
END-IF
IF STUCODE3 NOT = SPACES
IF STUCODE3 = STUCODE1 OR STUCODE2
SET DUPLICATE-FOUND TO TRUE
END-IF
END-IF
IF STUCODE2 NOT = SPACES
IF STUCODE2 = STUCODE1
SET DUPLICATE-FOUND TO TRUE
END-IF
END-IF
IF DUPLICATE-FOUND
MOVE 'DUPLICATE CODES DETECTED' TO WS-MSG
MOVE 'B' TO WS-CURSOR
END-IF
GO TO 400-RETURN
might be reasonably quick.
Note that if any course code is blank, the remaining comparisons to that course code are skipped. This means that SPACES in any STUCODEn will not cause duplicates.
Test code to run online
If the amount of entries to check is big or may change it is reasonable to use a table to save typing...
One option is to use two PERFORM (test code online):
SET NO-DUPLICATE-FOUND TO TRUE
MOVE 1 TO IND
PERFORM VARYING IND FROM 1 BY 1
UNTIL IND = NUMBER-OF-STUCODES
MOVE STUCODE (IND) TO COMPCODE
IF COMPCODE NOT = SPACES
ADD 1 TO IND GIVING IND-2
PERFORM UNTIL IND-2 > NUMBER-OF-STUCODES
IF COMPCODE = STUCODE (IND-2)
SET DUPLICATE-FOUND TO TRUE
EXIT PERFORM
END-IF
ADD 1 TO IND-2
END-PERFORM
IF DUPLICATE-FOUND
MOVE 'DUPLICATE CODES DETECTED' TO WS-MSG
MOVE 'B' TO WS-CURSOR
EXIT PERFORM
END-IF
END-IF
END-PERFORM
Another option to indexed tables, with use one PERFORM and one SEARCH (test code online):
SET NO-DUPLICATE-FOUND TO TRUE
PERFORM VARYING EXTRA-IND FROM 1 BY 1 *> not all compilers support VARYING on USAGE INDEX
UNTIL EXTRA-IND = NUMBER-OF-STUCODES
MOVE STUCODE (EXTRA-IND) TO COMPCODE
IF COMPCODE NOT = SPACES
SET IND TO EXTRA-IND
SET IND UP BY 1
SEARCH STUCODE
WHEN STUCODE(IND) = COMPCODE
SET DUPLICATE-FOUND TO TRUE
END-SEARCH
IF DUPLICATE-FOUND
MOVE 'DUPLICATE CODES DETECTED' TO WS-MSG
MOVE 'B' TO WS-CURSOR
EXIT PERFORM
END-IF
END-IF
END-PERFORM

Finding the total from a single dimensional array - COBOL

I need to iterate through a 1D array and add all of the elements together to find the total. I must use a Perfrom ... Varying statement, this is what I have come up with so far.
perform 100-read-input-file
varying emp-rec-calls(ws-emp-total)
from 1 by ws-emp-total
until (ws-eof-flag = 'Y'
OR ws-array-counter > ws-array-max)
add emp-rec-calls(ws-emp-total) to ws-total-temp
The code for 100-read-input-file is simply
read input-file at end move 'y' to found-eof.
The problem I am currently getting is "Subscript out of range:" on this line "perform 100-read-input-file". All help is appretiated, thanks!
Let's analyze the code you provided:
perform 100-read-input-file
varying emp-rec-calls(ws-emp-total)
from 1 by ws-emp-total
until (ws-eof-flag = 'Y'
OR ws-array-counter > ws-array-max)
add emp-rec-calls(ws-emp-total) to ws-total-temp
This loop doesn't really make any sense. You are saying perform this loop varying occurance X of the array EMP-REC-CALLS from 1 by X until a flag that never gets set within the loop is equal to yes OR a counter you are not incrementing is greater than the array size.
I think you are trying to achieve something like this:
PERFORM VARYING WS-ARRAY-COUNTER
FROM 1 BY 1
UNTIL WS-ARRAY-COUNTER > WS-ARRAY-MAX
ADD EMP-REC-CALLS(WS-COUNTER) TO WS-TOTAL-TEMP
END-PERFORM
This will vary the counter WS-ARRAY-COUNTER by 1 every iteration of the loop (starting at 1) until that counter is greater than the max defined.

COBOL Beer on the Wall Program

I'm making the "99 Bottles" program, but with user input on how many to take down. I'm very new to COBOL and I'm definitely overlooking something simple or just completely thinking about this the wrong way.
The following is what I currently have:
IDENTIFICATION DIVISION.
PROGRAM-ID. HW.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 COUNTER PIC S99.
01 BOTTLES PIC Z9.
01 BOTTLES-REMAINING PIC Z9.
01 NUM PIC s9(02) VALUE 0.
PROCEDURE DIVISION.
PERFORM VARYING COUNTER FROM 99 BY NUM UNTIL COUNTER = 0
DISPLAY "How many bottles would you like to take down?"
ACCEPT NUM
MOVE COUNTER to Bottles
subtract NUM FROM COUNTER GIVING BOTTLES-REMAINING
DISPLAY SPACES
EVALUATE COUNTER
WHEN 1
DISPLAY " 1 bottle of beer on the wall, "
" 1 bottle of beer."
DISPLAY "Take one down and pass it around, "
"no more bottles of beer on the wall."
WHEN 2 Thru 99
DISPLAY BOTTLES " bottles of beer on the wall, "
BOTTLES " bottles of beer."
DISPLAY "Take one down and pass it around, "
BOTTLES-REMAINING
" bottles of beer on the wall."
END-EVALUATE
END-PERFORM
GOBACK.
I need to make the NUM clause negative in the following statement (or the data division) so it will subtract from the counter:
PERFORM VARYING COUNTER FROM 99 BY NUM UNTIL COUNTER = 0
I see a few issues here.
First, and this is from admittedly faded memory, but I seem to recall that the VARYING clause required a constant value for the delta. I don't think you can use an actual changing NUM to do this.
So your loop would be better off not using the VARYING clause and instead be something like (code here may not be syntactically correct COBOL, it's meant more to show intent and/or method):
set counter to 99
perform until counter = 0
blah blah blah then change counter
end perform
Second, your little ditty doesn't make sense any more if you're allowed to remove more than one bottle at a time. The statements for the third stanza of the rhyme should be modified similarly to the bottles-left stanza:
evaluate num
when 1
display "Take one down and pass it around, "
when 2 thru 99
display "Take ", num, " down and pass them around, "
end evaluate
And, finally, you probably want to avoid the situation where you remove more bottles than you have available (or less than one, for that matter). That can be done by silently enforcing those limits (clamping) immediately after getting the user input:
accept num
if num is less than one
set num to one
end if
if num is greater than counter
set num to counter
end if
You could also complain and require the user to enter a valid quantity but the easiest solution is probably just to clamp it.

Loop using GO TO

I need to write a "for loop" in COBOL without using the 'PERFORM ... THRU ...' structure. My idea is to add a paragraph that I can jump back to once certain conditions are met. Here is what I came up with:
PROGRAM-BEGIN.
PAR-A.
IF I <= 10 THEN
SET J TO 1
PAR-B.
IF J <= 10 THEN
DISPLAY ARRAY(I,J)
SET J UP BY 1
GO TO PAR-B
END-IF.
SET I UP BY 1
GO TO PAR-A
END-IF.
PROGRAM-DONE.
Clearly this doesn't work because writing in this way will incur a syntax error. Can anyone help me on this? I can only use IF and GO TO.
This is what you'd need with your existing structure. You've not shown the initialisation of I, but you'll need one. You've attempted to avoid an explicit termination condition/GO TO,
PROGRAM-BEGIN.
SET I TO 1
PAR-A.
IF I <= 10 THEN
SET J TO 1
ELSE
GO TO PROGRAM-DONE
END-IF
.
PAR-B.
IF J <= 10 THEN
DISPLAY ARRAY(I,J)
SET J UP BY 1
GO TO PAR-B
END-IF
SET I UP BY 1
GO TO PAR-A
.
PROGRAM-DONE.
Note the use of the full-stops/periods. You should adopt that for your code, you'll have fewer troubles.
That comma is also trying to disguise itself as a full-stop/period/mark on the screen, and why even include it if you are jamming everything up against each other:
DISPLAY ARRAY ( I J )
There, isn't that nicer?
From the start, work on your names. Use descriptive names. I and J are just plain dumb, and in some wonderful situations you will even confuse them with the number 1.
SET first-level-index
second-level-index TO 1
.
output-results.
IF second-level-index
NOT GREATER THAN 10
DISPLAY
">"
the-data
( first-level-index
second-level-index )
"<"
SET second-level-index UP BY 1
GO TO output-results
END-IF
IF first-level-index
NOT GREATER THAN 10
SET second-level-index TO 1
SET first-level-index UP BY 1
GO TO output-results
END-IF
.
Or
set-up-for-loop.
SET first-level-index TO 1
.
outer-loop.
SET second-level-index TO 1
.
inner-loop.
IF second-level-index
NOT GREATER THAN 10
DISPLAY
">"
the-data
( first-level-index
second-level-index )
"<"
SET second-level-index UP BY 1
GO TO inner-loop
END-IF
IF first-level-index
NOT GREATER THAN 10
SET first-level-index UP BY 1
GO TO outer-loop
END-IF
.
You'd give those paragrpah-names descibing the actual task.
Be aware that comparing indexes (your I and J) to literals requires some twists and turns for the compiler.
To fix the syntax error caused by putting a paragraph name inside
the if, you can use the GOTO to move the PARB paragraph out of the first IF statement:
PROGRAM-BEGIN.
PAR-A.
IF I <= 10 THEN
SET J TO 1
GOTO PAR-B
END-IF
PAR-B.
IF J <= 10 THEN
DISPLAY ARRAY(I,J)
SET J UP BY 1
GO TO PAR-B
END-IF.
SET I UP BY 1
GO TO PAR-A
PROGRAM-DONE.

Nested Perform Loops in COBOL?

Why can I not do this nested perform loop in COBOL?
If I put END-PERFORM. in any line sooner than where I have the last one just before EXIT PROGRAM - it works. But I need the program to display the INPUT C value every time. in the outer perform loop. Its driving me nuts.
PROCEDURE DIVISION USING INPUTC CIPHER.
COMPUTE CIPHERMAX = CIPHER.
MULTIPLY -1 BY CIPHER
---> PERFORM VARYING CIPHER FROM 0 BY 1
UNTIL CIPHERMAX = CIPHER
DISPLAY 'This is loop number: ' CIPHER
INSPECT INPUTC CONVERTING
"avcdefghijklmnopqrstuvwxyz" to "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
COMPUTE CONVERTNUM = FUNCTION MOD (CIPHER, 26)
INSPECT FUNCTION REVERSE(INPUTC) TALLYING LENGTHNUM FOR LEADING SPACES
COMPUTE LENGTHNUM = LENGTH OF CIPHER - LENGTHNUM
---> PERFORM UNTIL SENTRY = LENGTHNUM
IF ((FUNCTION ORD(INPUTC(SENTRY:1)) + CONVERTNUM) > (FUNCTION ORD('Z')))
MOVE FUNCTION CHAR((FUNCTION ORD(INPUTC(SENTRY:1)) + CONVERTNUM) - 26) TO RECHAR
ELSE
MOVE FUNCTION CHAR(FUNCTION ORD(INPUTC(SENTRY:1)) + CONVERTNUM) TO RECHAR
END-IF
IF (((FUNCTION ORD(INPUTC(SENTRY:1))) >= (FUNCTION ORD('A'))) AND
((FUNCTION ORD(INPUTC(SENTRY:1))) <= (FUNCTION ORD('Z'))))
IF ((FUNCTION ORD(INPUTC(SENTRY:1)) + CONVERTNUM) > (FUNCTION ORD('Z')))
INSPECT INPUTC(SENTRY:1) REPLACING ALL INPUTC(SENTRY:1) BY RECHAR
ELSE
INSPECT INPUTC(SENTRY:1) REPLACING ALL INPUTC(SENTRY:1) BY RECHAR
END-IF
ELSE
INSPECT INPUTC(SENTRY:1) REPLACING ALL INPUTC(SENTRY:1) BY INPUTC(SENTRY:1)
END-IF
COMPUTE SENTRY = SENTRY + 1
---> END-PERFORM
DISPLAY INPUTC.
COMPUTE LOOPI = LOOPI + 1
--->END-PERFORM.
EXIT PROGRAM.
END PROGRAM SOLVE.
That nasty scope terminating period after DISPLAY INPUTC. is terminating the scope of the nested PERFORM statements. Get rid of the period and all should work fine.
The only periods you should ever use in the Procedure Division when coding a program to COBOL-85 standard are the ones required to terminate section and paragraph headers and another one to terminate the current paragraph, section or program.
You have a period in the display INPUTC statement. Remove the period and you should be ok. The Period always ends a statement.

Resources