S0C1 abend in COBOL program

I get a S0C1 abend when executing my COBOL program. Any ideas?
I get these messages in the JESMSGLG:
10.18.45 JOB07120 IGD17296I DYNAMIC VOLUME COUNT (DVC=5) WAS USED TO 433
433 EXTEND DATA SET VALSD.ALT.CACD602.RF0020RC.LONENSEL.#C
10.18.48 JOB07120 IGD17296I DYNAMIC VOLUME COUNT (DVC=5) WAS USED TO 467
467 EXTEND DATA SET VALSD.ALT.CACD602.RF0020RC.LONENSEL.#C
10.18.51 JOB07120 IGD17296I DYNAMIC VOLUME COUNT (DVC=5) WAS USED TO 544
544 EXTEND DATA SET VALSD.ALT.CACD602.RF0020RC.LONENSEL.#C
10.18.54 JOB07120 IGD17296I DYNAMIC VOLUME COUNT (DVC=5) WAS USED TO 597
597 EXTEND DATA SET VALSD.ALT.CACD602.RF0020RC.LONENSEL.#C
10.18.59 JOB07120 IEC028I 837-08,IFG0554A,OCACD602,COLST51P,LONENSEL,6355,TSOD05, 688
688 VALSD.ALT.CACD602.RF0020RC.LONENSEL.#C
EDIT: When I use less input (= less output) I don't get the abend.

I can't see from the image, but as the earlier answers stated, disk space appears to be your problem. Try allocating a small primary allocation and more on secondary. My recollection of this problem is that the primary allocation needs one contiguous chunk of the specified size, whereas secondary allocations can be split into smaller extents. This matters more when disk space is limited. Try running an IDCAMS LISTCAT to check the available space, and if necessary add a VOL=SER parameter to your JCL. It would also be a good idea to include file status checking in the COBOL program; that makes issues like this far easier to diagnose.

I can't recalculate CRC32 checksum for a CD-ROM sector

I managed to extract a single sector from a CD-ROM .bin file using a hex editor.
According to the CD-ROM specification, the sector should have the following format:
* 16 bytes for synchronization and header
* 8 bytes for the sub-header
* 2048 bytes for user data
* 4 bytes for the EDC which happens to be a CRC32 applied to user data and sub-header.
I found detailed information on this matter on page 55 of the following document: https://ia800408.us.archive.org/4/items/cdi_may94_r2/cdi_may94_r2.pdf
I want to manually modify the user data in the sector, but to do that I must recalculate the EDC as well. (I assume the ECC is ignored if the EDC check passes.)
However, before modifying anything I want to verify that I can reproduce the sector's existing CRC32. But I can't, and that's the problem.
Here is my 2352 byte sector:
https://drive.google.com/open?id=1CYAInG8TYMyRMOgR1zeooUrLK5CR7cmM
The EDC section has the following CRC32 result: 92 54 48 44 (hex)
But when I recalculate the CRC32 myself using HxD hex-editor I get: 55 32 CA 62
Why am I getting a different result?
I tried changing input/output reflection in the custom CRC32 settings, but nothing worked. I've been stuck on this for an hour and couldn't find anything on Google.
I'm going to answer my own question since I finally found the answer:
The CRC32 used a different polynomial.
The typical forward/reverse values are: 0x04C11DB7 / 0xEDB88320
The forward/reverse values my cd-rom used: 0x8001801B / 0xD8018001
So if you wanted to recalculate the CRC32 manually in HxD hex-editor you would
need to use a manual CRC with the following settings:
* Polynomial: 0x8001801B
* Initial value: 0
* Output XOR: 0
* Input & output reflections checked
Though CDmage can do all this automatically.
In the end none of this helped me, because the emulator didn't give a flying goldfish about the EDC/ECC bytes. I should have known that.
After slightly changing a few text-related bytes in my sector I noticed that the disk image wouldn't load anymore, and so I immediately assumed the EDC/ECC bytes were responsible. The real cause probably lies somewhere in the software.

Optimal layout of 2D array with least memory access time

Let us say I have a 2D array that I can read from a file
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
I am looking to store them as 1D array arr[16].
I am aware of row wise and column wise storage.
This messes up the structure of the array. Say I would like to convolve this with a 2x2 filter. Then at conv(1,1) I would be accessing memory at positions 1, 2, 5, 6.
Instead, can I optimize the storage of the data in a pattern such that the elements 1, 2, 5, 6 are stored next to each other rather than far apart?
That would reduce the memory latency issue.
It depends on your processor, but supposing you have a typical Intel cache line size of 64 bytes, then picking square subregions that are each 64 bytes in size feels like a smart move.
If your individual elements are a byte each then 8x8 subtiles makes sense. So, e.g.
#define index(x, y, width) \
    (((x) & 7) | (((y) & 7) << 3) | \
     (((x) & ~7) << 3) | ((((y) & ~7) * ((width) >> 3)) << 3))
(where width is the row length in elements, a multiple of 8)
So in each full tile:
in 49 of every 64 cases all data is going to be within the same cache line;
in a further 14 it's going to lie across two cache lines; and
in one case in 64 it's going to need four.
So that's an average of 1.265625 cache lines touched per output pixel, versus 2.03125 in the naive case.
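Those counts and averages can be checked by brute force. A sketch (assuming 1-byte elements, 64-byte cache lines, 8x8 tiles stored contiguously, and an illustrative row width of 512 for the naive row-major case):

```python
CACHE_LINE = 64  # bytes; with 1-byte elements, one 8x8 tile = one cache line

def tiled_lines(x, y):
    """Distinct 8x8 tiles (cache lines) touched by a 2x2 window at (x, y)."""
    return len({(cx // 8, cy // 8) for cx in (x, x + 1) for cy in (y, y + 1)})

def naive_lines(x, y, width=512):
    """Distinct 64-byte lines touched by the same window in row-major order."""
    return len({(cy * width + cx) // CACHE_LINE
                for cx in (x, x + 1) for cy in (y, y + 1)})

# The access pattern repeats every 8 positions in the tiled layout and every
# 64 positions in the naive layout, so averaging over one period is exact.
tiled_counts = [tiled_lines(x, y) for x in range(8) for y in range(8)]
tiled_avg = sum(tiled_counts) / 64
naive_avg = sum(naive_lines(x, 0) for x in range(64)) / 64
```

Running this reproduces the 49/14/1 split and the 1.265625 vs 2.03125 averages claimed above.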
I found out what I was looking for: it is called Morton ordering of an array, which has been shown to reduce memory access time. Another method is the Hilbert curve, which is reported to be even more effective than Morton ordering.
I am attaching a link to an article explaining this:
https://insidehpc.com/2015/10/morton-ordering/
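For reference, Morton (Z-order) indexing just interleaves the bits of the x and y coordinates. A minimal sketch (the function name is mine); note that for the 4x4 example above, the 2x2 block containing elements 1, 2, 5, 6 maps to the contiguous indices 0..3:

```python
def morton_index(x, y, bits=16):
    """Interleave the low `bits` bits of x (even positions) and y (odd)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x bit i -> output bit 2i
        z |= ((y >> i) & 1) << (2 * i + 1)  # y bit i -> output bit 2i+1
    return z
```

For any square power-of-two array this is a bijection onto 0..n*n-1, so it can be used directly as the 1D storage index.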

Why is stack frame a multiple of 16 bytes long?

CSAPP explains that SSE instructions operate on 16-byte blocks of data and it needs memory addresses to be multiple of 16.
But what's the relationship with the stack frame? Does it mean SSE instructions operate on the stack frame? If so, which instructions are commonly used?
Yes, the alignment of the stack frame is chosen so that any instruction can work on any data type you could potentially store in the stack frame.
On x86/x86_64, for example, there are SSE instructions that require the memory address to be aligned to 16 bytes. The compiler therefore assumes the stack frame is 16-byte aligned, so it can arrange local variables to be aligned as well when needed. SSE instructions (like any others) can operate on any memory: global, heap or stack.
The same actually holds for the heap: when you allocate a structure of 16 bytes or longer, malloc/new must return a 16-byte-aligned address so that those instructions can work with it.
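"Aligned to 16 bytes" just means the address is a multiple of 16. A sketch of the rounding-up arithmetic an allocator or compiler typically performs (hypothetical helpers, not a real allocator API):

```python
def align_up(addr, alignment=16):
    """Round addr up to the next multiple of `alignment` (a power of two)."""
    assert alignment & (alignment - 1) == 0, "alignment must be a power of two"
    return (addr + alignment - 1) & ~(alignment - 1)

def is_aligned(addr, alignment=16):
    """True if addr is a multiple of `alignment`."""
    return addr % alignment == 0
```

This is the same trick a compiler uses when it pads a stack frame size up to a multiple of 16.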

How does the size of a realm-file develop?

To start with: I have a Realm file with several properties, one of them being an array of 860 entries, where each entry again consists of a couple of properties.
One of those properties states the name of the entry.
I observed the following:
If the name property says "Criteria_A1" (up to "Criteria_A860"), the Realm file is 1.6 MB.
If the name property says "A1" (up to "A860"), the Realm file is only 786 kB.
Why do the extra letters in the name property make the Realm file this much bigger?
A second observation:
If I add more objects (each again having an array with 860 entries), the file size becomes 1.6 MB again, no matter how many objects I add (I guess until some critical value where the size triples... or am I wrong?).
It almost seems that the 786 kB Realm file doubles in size as soon as something is added (either a property with more letters or an additional object). Why does the Realm file double at a critical value instead of growing linearly with the content added?
Thanks for a clarification on this.
Well observed. :-) The Realm file starts out at about 4 kB and doubles in size whenever it runs out of free space. It keeps doubling until it reaches 128 MB, and then grows in constant 128 MB increments thereafter.
The reason to double the file rather than grow it linearly is purely performance: doubling is a common strategy for dynamic data structures.
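A sketch of that growth policy (the 4 kB start, doubling rule, and 128 MB cap are the figures quoted above; the function itself is hypothetical, not a Realm API):

```python
START = 4 * 1024          # initial file size: ~4 kB
CAP = 128 * 1024 * 1024   # doubling stops at 128 MB

def realm_file_size(needed):
    """File size after growing to hold `needed` bytes:
    double until 128 MB, then add 128 MB per step."""
    size = START
    while size < needed:
        size = size * 2 if size < CAP else size + CAP
    return size
```

This also explains the observed jump from roughly 0.8 MB to 1.6 MB: that is a single doubling step triggered by the extra letters in the names.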
You can use the methods below to write a compacted copy, removing all free space in the file. This can be useful if you no longer add new data, want to ship a static database, or want to send the file over the network.
* Realm.writeCopyToURL(_:encryptionKey:) in Swift
* -[RLMRealm writeCopyToURL:encryptionKey:error:] in Objective-C
* Realm.writeCopyTo() in Java
Those thresholds and the algorithm described are the current ones and may change in future versions, though.
Hope this clarifies things.

CUDA 128 bytes read in a single instruction

I am new to CUDA and am currently optimizing an existing molecular dynamics application. It takes an array of double4 with coordinates and computes forces based on a neighbor list. I wrote a kernel with the following lines:
double4 mPos = d_arr_xyz[gid];
while (-1 != (id = d_neib_list[gid*MAX_NEIGHBORS + i])) {
    Calc(gid, mPos, AA, d_arr_xyz, id);
    i++;
}
Calc then takes d_arr_xyz[id] and calculates the force. That gives 1 read of double4 plus 65 reads of (int + double4) inside every call of Calc (65 being the average number of neighbors, i.e. entries not equal to -1, in d_neib_list per particle).
Is it possible to reduce those reads? Neighbor lists for different particles, i.e. d_arr_xyz[gid] and d_arr_xyz[id], do not correlate, so I cannot use shared memory for a block of threads to cache d_arr_xyz.
What I see is that if I could somehow load the whole list of int*MAX_NEIGHBORS into shared memory in one or a few large transactions, that would remove the 65 separate int reads.
So the question is: can those 65 int reads be turned into a few large transactions? I read in the documentation that reads can be as wide as 128 bytes. What exactly should I write so that the compiler generates one large load?
Update:
Thank you for your replies. Following the answer from talonmies below, I changed the code, swapping the x and y dimensions of the neighbor matrix. Now consecutive threads load consecutive ints indexed by gid, which I guess results in 128-byte reads. The program runs 8% faster.
All memory transactions are issued (where possible) on a per warp basis. So the 128 byte transaction you are asking about is when all 32 threads in a warp issue a memory load instruction which can be serviced in a single "coalesced" transaction. A single thread can't issue large memory transactions, only a warp of 32 threads can, and only when the memory coalescing requirements of whichever architecture you run the code on can be satisfied.
I couldn't really follow your description of what your code is actually doing, but from first principles alone, the answer would appear to be no.
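The effect of the transposition described in the update can be sketched numerically, by counting how many 128-byte segments a warp's 32 int loads touch under the original `d_neib_list[gid*MAX_NEIGHBORS + i]` layout versus the transposed `d_neib_list[i*N + gid]` layout (a Python model, not CUDA; the `MAX_NEIGHBORS` and `N` values are illustrative and the base address is assumed segment-aligned):

```python
WARP = 32      # threads per warp
INT = 4        # bytes per int
SEGMENT = 128  # bytes per coalesced memory transaction segment

def segments_touched(addresses):
    """Number of distinct 128-byte segments covered by the byte addresses."""
    return len({a // SEGMENT for a in addresses})

def row_major(i, max_neighbors=128):
    # original layout: thread gid reads d_neib_list[gid*MAX_NEIGHBORS + i],
    # so consecutive threads are MAX_NEIGHBORS ints apart
    return [(gid * max_neighbors + i) * INT for gid in range(WARP)]

def transposed(i, n=1024):
    # transposed layout: thread gid reads d_neib_list[i*N + gid],
    # so consecutive threads read consecutive ints
    return [(i * n + gid) * INT for gid in range(WARP)]
```

With the original layout every thread's load lands in its own segment (32 transactions per warp-wide read); after transposing, the whole warp is serviced by a single 128-byte transaction, which is consistent with the answer's per-warp coalescing description.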
