Cobol program A calls program B via an entry point in B and crashes - linkage

COBOL program B has 3 entry points. Linkage section contains 1 general area, and then 3 areas (call them link-sect-a, link-sect-b and link-sect-c)
Cobol program A calls program B using entry 3. In z/OS, it's perfectly valid (and normal) to write
CALL PROGB-ENTRY3 using common area, link-sect-c
The trouble seems to be with GnuCOBOL: after compiling both, something as simple as the following in program B after entry point 3
DISPLAY 'First 50 bytes in link-sect-c 'link-sect-c(1:50)
causes a crash on the reference to link-sect-c
If instead I change the call in program A (as well as entry 3 in program B) to include all 4 arguments,
CALL PROGB-ENTRY3 using common area, link-sect-a, link-sect-b, link-sect-c
(even though I have no need for either link-sect-a or link-sect-b)
the code works
I can include the 2 example programs if required, since they're really quite trivial

I added the option -fsticky-linkage to the compilation of program B, and that solved the problem. (It was easy to confirm: remove the option and compile again, and the problem is reintroduced.)


forth implementation with JIT write protection?

I believe Apple has disabled the ability to write and execute the same memory at the same time on the ARM64 architecture; see mmap() RWX page on MacOS (ARM64 architecture)?
This makes it difficult to port implementations like jonesforth, which keeps generated code and the code to generate it (like the built-in assembler in jonesforth.f) in the same segment.
I thought I could do something like map the user space from the start to HERE as 'r-x', and from HERE to the end as 'rw-'. Then I'd have to constantly remap memory as I compile new words, and I couldn't go back and fix up previous words (I believe SCODE would make use of that).
Do you have any advice on how to handle such limitations?
I guess I should look into other forth implementations that are running on M1 Macs.
A Forth implementation can have a problem with write-protected code segments only when it generates machine code that must be executable at once. There is no such problem if it uses threaded code. So it is assumed below that the Forth system has to generate machine code.
Data space and code space
Obviously, you have to separate code space from data space. Data space (at least its mutable regions, including regions for variables and data fields), as well as internal mutable memory regions and probably headers, should be mapped to 'rw-' segments. Code space should be mapped to 'r-x' segments.
The word here ( -- addr ) returns the address of the first cell available for reservation, which is writable by a program, so it should always be in an 'rw-' segment. You can have an internal word code::here ( -- addr ) that returns an address in code space, if you need one.
A decision for execution tokens is a compromise between speed and simplicity of implementation (an 'r-x' segment vs 'rw-'). The simplest case is that an execution token is represented by an address in an 'rw-' segment, and then execute does an additional dereferencing to get the corresponding address of code.
Code generation
In the given conditions we should generate machine code into an 'rw-' segment, but before this code is executed, this segment should be made 'r-x'.
Probably the simplest solution is to allocate a memory block for every new definition, resize (shrink) the block on completion, and make it 'r-x'. Possible disadvantages: waste due to page-size granularity (e.g. 4 KiB), and maybe memory fragmentation.
Changing protection of the main code segment starting from code::here also implies losses due to page size granularity.
Another variant is to break the creation of a definition into two stages:
generate intermediate representation (IR) in a separate 'rw-' segment during compilation of a definition;
when the definition is completed, generate machine code in the main code segment from IR, and discard IR code.
Actually, it could be machine code in the first stage too, which is then just relocated to another place in the second stage.
Before writing to the main code segment, change it (or the affected part) to 'rw-', and afterwards revert it to 'r-x'.
A subroutine that translates the IR code should reside in another 'r-x' segment that you don't change.
Forth is agnostic to the format of generated code, and in a straightforward system only a handful of definitions "know" what format is generated. So only these definitions should be changed to generate IR code. If you relocate machine code, you probably don't need to change even these definitions.

What instruction set would be easiest to implement on a homemade ALU?

I'm designing a basic 8- or 16-bit computer (haven't really decided yet) using EEPROM chips, SRAM, and an ALU made (mostly) out of individual transistors on a PCB using CMOS logic, which I have already partially designed and tested. I thought it would be cool to use an existing instruction set so I can compile C++ code for it instead of writing everything in machine code.
I looked at the AVR gcc compiler on Compiler Explorer and the machine code it produces; it looks very simple, and I think it is only 8-bit. Or should I go for 32 bits and try to use x86? That would make the ALU a lot bigger. Are there compilers that let you use a limited set of instructions so I don't have to implement every single one? Or would it be easier to just write an interpreter for a custom instruction set? Any advice is welcome, thank you.
After a bit of research it has become apparent that trying to recreate modern ALUs and instructions would be very complicated and time consuming, and I should definitely make my own simplistic architecture and if I really want to compile C code for it I could probably just interpret x86 or AVR assembly from gcc.
I would also love some feedback on my design, I came up with a really weird ISA last night that is focused mainly on being easy to engineer the hardware.
There are two registers in the ALU; all other registers compute functions of those two numbers, all at the same time. For instance, there is a register that holds the sum of A and B, one that holds the result of A shifted right B times, a "jump if A > B" branch, and so on.
So adding two numbers takes 3 clock cycles: you move two values from RAM into A and B, then copy the result back to RAM afterwards. It would look like this:
setA addressInRam1 (6-bit opcode, 18-bit address/value)
setB addressInRam2
copyAddedResult addressInRam1
And program code is executed directly from EEPROM memory. I don't know if I should think of it as having two general-purpose registers or 2^18 registers. Either way, it makes the machine much easier and simpler to build when you're executing instructions one at a time like that. Again, any advice is welcome; I am somewhat of a noob in this field, thank you!
Oh, and then an additional C register to hold a value to be stored in RAM on the next clock cycle, at the address specified by the set instruction. This is what the Fibonacci sequence would look like:
1: setC 1; // setting C reg to 1
2: set 0; // setting address 0 in ram to the C register
3: setA 0; // copying value in address 0 of ram into A reg
// repeat for B reg
4: set 1; // setting this to the same as the other
5: setB 1;
6: jumpIf> 9; // jump to line 9 if A > B
7: getSum 0; // put sum of A and B into address 0 of ram
8: setA 0; // set the A register to address 0 of ram
9: getSum 1; // "else" put the sum into the second variable
10: setB 1;
11: jump 6; // loop back to line 6 forever
I made a C++ equivalent and put it through Compiler Explorer, and despite the many drawbacks of this architecture it uses the same number of clock cycles as x64 in the loop, and two more in total. But I think this function in particular works pretty well with it, as I don't have to reassign A and B often.

How is SLOC counted by Delphi IDE?

You pretty often see people here saying that they have an x-million-line-of-code project. How is this measured? Is this number the one shown under the 'Information' menu? The manual says that only the compiled lines are counted (so, without comments and empty lines):
Source compiled -> Displays total number of lines compiled.
But the manual doesn't explain how a piece of code such as if/then/else is counted:
if B=true
then
for i:= 0 to 100
do Stuff
else ;
Is every line that has a blue dot a "compiled line"?
Are the Embarcadero code (the RTL and VCL code) and 3rd-party libraries also included in the count?
(The conclusion:) What does it mean when somebody says that a Delphi program has 1 million lines?
The Total lines figure the compiler reports is simply the number of lines in the unit(s), regardless of what code is (or isn't) there. It even counts blank lines. Start a new project, compile it, and note the number of lines it reports (mine says 42). Then add just one line break somewhere and compile again. It will claim there is one more line of code (43). So the compiler does not seem to take any code into consideration for this number - only the actual line breaks.
In fact, if you add the total number of lines in the main form's unit (new project) as well as the project's main file, it will total to 2 less than what the compiler tells you (40 out of 42). So I wouldn't trust this number to mean much other than a rough estimate.
Libraries such as the VCL, RTL, and Indy are not included in this count because they are pre-compiled. It is possible that your project refers to a library or external unit that needs to be compiled, in which case those will also be included in the count.
As for your question about how it counts if..then..else blocks, keep in mind that your 5 lines of code can be combined into just 1 line (stripping line breaks) and it will still compile, and the compiler will count only 1 line, not 5.

What's the difference of j__objc_msgSend and objc_msgSend?

I'm surprised that searching for j__objc_msgSend returns 0 results on Stack Overflow, and Google doesn't seem to know it well either. According to its disassembly, j__objc_msgSend only calls objc_msgSend, so why do we need j__objc_msgSend when we already have objc_msgSend? And in general, what's the difference between j__objc_msgSend and objc_msgSend?
And in this screenshot specifically, what's the difference between the rightmost branch, ending with "End of function", and the leftmost branch, ending without "End of function"? Does it have something to do with j__objc_msgSend?
This is ARM RISC code, and the preferred programming mode for ARM is relative: don't use absolute addresses, always use "IP +/- offset". Here, the called address was out of range for a direct call or jump, and the compiler used the nearest location it could reach. It adds an extra jump (or more than one!), but it's position-independent. (*)
The compiler cannot construct a jump to the target address with a simple instruction, because you cannot load every possible 32-bit value as an immediate in RISC assembly.
If the routine objc_msgSend returns of its own, then this is equivalent to call objc_msgSend; return -- only shorter. Both forms do a single 'return', from the point of view of the current function.
(*) You can see in the disassembly screenshot (?? why not text?) that R12 gets loaded with the difference between the target and the current address. This difference is calculated by the compiler; it does not appear as a subtraction in the original binary, that's IDA's work. Then the difference is added to the current address, whatever it is! The immediate value objc_msgSend - 0x1AE030 fits in few enough bits to be loaded into R12 in a single instruction (an ARM RISC 'feature' you ought to be familiar with).
In case you are wondering about the j__label syntax: that's just IDA, telling you this is a direct jump to a known label. Presumably, if your code is long enough, you might find the distance to this label is too big again, and so you might find j__j__objc_msgSend.

F# Array.Parallel hanging

I have been struggling with parallel and async constructs in F# for the last couple of days and am not sure where to go at this point. I have been programming in F# for about 4 months - certainly no expert - and I currently have a series of calculations that are implemented in F# (asp.net 4.5) and work correctly when executed sequentially. I am running the calculations on a multi-core server, and since there are millions of inputs to perform the same calculation on, I am hoping to take advantage of parallelism to speed it up.
The calculations are extremely data parallel - basically the exact same calculation on different input data. I have tried a number of different avenues and I continually run into the same issue - it seems as if the parallel looping never gets to the end of the input data set. I have tried TPL, ConcurrentQueues, Parallel.Array.map/iter, all with the same result: the program starts out fine and then somewhere in the middle (indeterminate) it just hangs and never completes. For simplicity I actually removed the calculation from the program and am just calling a print method. Here is where the code currently stands:
let runParallel =
    let ids = query { for c in db.CustTable do select c.id } |> Seq.take(5)
    let customerInputArray = getAllObservations ids
    Array.Parallel.iter (fun c -> testParallel c) customerInputArray
    let key = System.Console.ReadKey()
    0
A few points...
I limited the results above to only 5 just for debugging. The actual program does not apply the Take(5).
The testParallel method is just a printfn "test".
The customerInputArray is a complex data type. It is a tuple of lists that contain records. So I am pretty sure my problem must be there...but I added exception handling and no exception is getting raised, so have no idea how to go about finding the problem.
Any help is appreciated. Thanks in advance.
EDIT: Thanks for the advice...I think it is definitely deadlock. When I remove all of the printfn, sprintfn, and string concat operations, it completes. (of course, I need those things in there.)
Are printfn, sprintfn, and string ops not thread-safe?
Another EDIT: Iteration always stops on the last item. So if my input array has 15 items, the processing stops on item 14, or seems to never get to item 15. Then everything just hangs. It does not matter what the size of the input array is. Any ideas what could be causing this? I even switched over to Parallel.ForEach (instead of Array.Parallel) and got the same behavior.
Update on the situation and how I resolved this issue.
I was unable to upload code from my example due to my company's firewall policy, so in the end my question did not have enough details. I failed to mention that I was using a type provider which was important information in this situation. But here is what I figured out.
I am using the F# type provider for SQL Server and was passing around its Service Types which I suspect are not thread-safe. When I replaced the ServiceTypes with plain old F# Records, the code worked fine - no more deadlocks and everything completed without error.
