Cobol RELEASE statement? - cobol

I am working on my first project with Cobol Sort VSAM files. I ran into a keyword RELEASE.
The way the book read that I have is the "The release statement transfers records from the INPUT PROCEDURE to the input Phase of the sort operation.
My Question is: That takes what ever I have in my Sort-Rec (or whatever I called it) and sends it directly into the OUTPUT PROCEDURE part of my SORT?
Seems confusing to me here.
Cobol Code:
SORT SORT-FILE ASCENDING KEY SORT-PROVIDER
INPUT PROCEDURE IS PROC-THE-REC THRU PTR-X
OUTPUT PROCEDURE IS WRITE-THE-RPT THRU WTR-X.
MOVE CC-CERT-NO TO SAVE-CERT-NO.
MOVE CC-CERT-STATUS TO SAVE-CERT-STATUS.
MOVE CC-CERT-BEGIN-DATE TO SAVE-CERT-BEGIN-DATE.
MOVE CC-CERT-END-DATE TO SAVE-CERT-END-DATE.
MOVE CC-CERT-FUNDING TO SAVE-CERT-FUNDING.
MOVE CC-PROV-NUMB TO SAVE-PROV-NUMB.
MOVE CC-PROV-RES-CNTY TO SAVE-PROV-RES-CNTY.
MOVE CC-PROV-TYPE TO SAVE-PROV-TYPE.
MOVE CC-WORKER-USERID TO SAVE-WORKER-USERID.
MOVE CC-WORKER-NAME TO SAVE-WORKER-NAME.
RELEASE SORT-REC.

Following from Bill,
When using a sort in Cobol is a bit like having two or three separate programs in the one program.
Pre-Sort (PROC-THE-REC in your case)
|
V
Sort
|
V
Post sort (WRITE-THE-RPT in your case)
The RELEASE statement "Writes" a record to the "Sort" step.
In Unix you could achieve the same thing by
writing 2 separate programs (Pre and post sort)
Replacing the Release with Writes
piping the output from the Pre-Sort to the sort.
On the mainframe you would use 3 JCL steps and temporary files.
On the mainframe, most sites I worked at discourage (ban) the use of the Cobol sort verb and you would write 2 programs and use the utility sort.

The release statement transfers records from the INPUT PROCEDURE to
the input Phase of the sort operation.
The input Phase of a sort is where SORT gets its data from, in this case.
COBOL Program
Loop-construct
Some COBOL code
Release
Next
The sort is actually an external program. In the case of the Mainframe,
it is the installed SORT product (usually DFSORT or SyncSort)
Input Phase of SORT
SORT
Output Phase of SORT
Another-Loop-construct
Some COBOL code
Return
Next
COBOL Program
Your input procedure will process, release, and then continue. When all data has been released, the sort will take place. The sorted records will be presented back to your program at the point of the RETURN statement you will have coded, and this will continue (return, stuff after return, another return, repeat until finished) until all the sorted file is processed (assuming that nothing goes wrong an you want to stop early).

normally, COBOL sort procedures are :
SORT SORTFILE ON SORT-ID, SORT-NAME, SORT-PHONE
INPUT PROCEDURE IS READ-IN
OUTPUT PROCEDURE IS PRINT-SORTED.
READ-IN SECTION.
loop:
READ INPUTFILE.
MOVE IN-ID TO SORT-ID.
MOVE IN-NAME TO SORT-NAME.
MOVE IN-PHONE TO SORT-PHONE.
RELEASE SORT-REC.
PRINT-SORTED SECTION.
loop:
RETURN SORT-REC.
DISPLAY "id#: " SORT-ID.
DISPLAY "Name: " SORT-NAME.
DISPLAY "Phone: " SORT-PHONE.

An INPUT PROCEDURE supplies records to the sort process by writing them to the work file declared in the SORT's SD entry. But to write the records to the work file a special verb - the RELEASE verb is used.
An operational template for an INPUT PROCEDURE, which gets records from and input file and RELEASEs them to the work file, is shown below.
OPEN INPUT InFileName
READ InFileName
PERFORM UNTIL TerminatingCondition
Process input record to create SDWorkRec
RELEASE SDWorkRec
READ InFileName
END-PERFORM
CLOSE InFileName
For further details on the SORT see my tutorial at http://www.csis.ul.ie/cobol/course/SortMerge.htm

Related

Splitting a string variable delimited list into individual binary variables in SPSS

I have a string variable created from a checkbox questions (Which of the following assets do you own?)
I am trying to create individual binary variables for each type of asset based on whether that number is present in the string list.
The syntax I am using cannot differentiate between 1 and 11.
do repeat wrd="1," ",2," ",3," ",4," ",5," ",6," ",7,"/NewVar= W3_CG_asset_TV_1 W3_CG_asset_radio_2 W3_CG_asset_payTV_3 W3_CG_asset_tel_4 W3_CG_asset_cellphone_5
W3_CG_asset_fridge_6 W3_CG_asset_freezer_7.
compute NewVar=char.index(W3_CG_HouseExpen1, wrd)>0.
end repeat.
do repeat wrd= ",8," ",9," ",10," ",11," ",12," ",13," ",14," ",15," ",16," ",17," ",18," ",19," /NewVar= W3_CG_asset_electricstove_8 W3_CG_asset_primusstove_9
W3_CG_asset_gasstove_10 W3_CG_asset_electrickettle_11 W3_CG_asset_microwave_12 W3_CG_asset_computer_13 W3_CG_asset_electricity_14 W3_CG_asset_geyser_15
W3_CG_asset_washingmachine_16 W3_CG_asset_workingvehicle_17 W3_CG_asset_bicycle_18 W3_CG_asset_donkeyhorse_19.
compute NewVar=char.index(W3_CG_HouseExpen1, wrd)>0.
end repeat.
I have tested this on SPSS 28.
make sure the column W3_CG_HouseExpen1 is string and length of it long enough to hold the data.
Then I added execute
data list list/W3_CG_HouseExpen1 (a50).
begin data
"1,2,11,12,"
"2,12,"
"1,2,"
"1,11,12,"
end data.
do repeat
wrd="1," ",2," ",3," ",4," ",5," ",6," ",7," ",8," ",9," ",10," ",11," ",12," ",13," ",14," ",15," ",16," ",17," ",18," ",19,"
/NewVar = W3_CG_asset_TV_1 W3_CG_asset_radio_2 W3_CG_asset_payTV_3 W3_CG_asset_tel_4 W3_CG_asset_cellphone_5 W3_CG_asset_fridge_6 W3_CG_asset_freezer_7
W3_CG_asset_electricstove_8 W3_CG_asset_primusstove_9 W3_CG_asset_gasstove_10 W3_CG_asset_electrickettle_11 W3_CG_asset_microwave_12 W3_CG_asset_computer_13 W3_CG_asset_electricity_14 W3_CG_asset_geyser_15
W3_CG_asset_washingmachine_16 W3_CG_asset_workingvehicle_17 W3_CG_asset_bicycle_18 W3_CG_asset_donkeyhorse_19.
compute NewVar=char.index(W3_CG_HouseExpen1, wrd)>0.
end repeat.
EXECUTE.
My suggestion is to run through this in reverse, erasing the values you've already recognized. So if you've got "11" and erased it, when you later search for "1" you won't find it in an "11".
I recreated a tiny exaple dataset to demonstrate on (EDIT-improved example):
data list list/W3_CG_HouseExpen1 (a50) .
begin data
"1,2,11,12,"
"11,12,"
"2,11,"
end data.
Now I do the whole process on a copy of the original W3_CG_HouseExpen1 variable so I can eat it away without damage to the original data:
string #temp(a50).
compute #temp=W3_CG_HouseExpen1.
do repeat wrd="12," "11," "2," "1," /NewVar= W3_12 W3_11 W3_2 W3_1.
compute NewVar=char.index(#temp, wrd)>0.
compute #temp=replace(#temp, wrd, ""). /*deleting the search string from the full string.
end repeat.
exe.

Why is my if statement not determining the correct output in two nested performs?

I have this Cobol paragraph that will search one table which at this point in my example would have a table counter of 2 which is what the first INDEX loop does. The variable A represents an Occurs that is defined in a file (include) which has 5 occurrences. I can get to the if statement but it returns false. I read the information out of a ParmCard and store that in the table which is Table-B and the ParmCard is correct.
I did get it to find one value when was changing values around (conditional statements) but I know that both of the values that it is looking for in the ParmCard are in the file and should be found and it should find two results. I would have tried Expeditor but the system was down at work.
Is there something wrong with the index or may be I think that the perform's are working one way but they are really working a different way? This Search paragraph gets executed with every read of the ID file thus it will look in the table as many times as the ID file has an ID and ID symbols are unique.
Question: Why would the IF-STATEMENT not be working?
Code:
SEARCH-PARAGRAPH.
PERFORM VARYING SUB FROM 1 BY 1 UNTIL SUB > 2 <--DUPLICATE INDEXER
IF A(TAB) = TABLE-B(SUB) THEN
MOVE 6 TO TAB
MOVE 'TRUE' TO FOUND-IS
PERFORM WRITE-FILE THRU X-WF
PERFORM LOG-RESULT THRU X-LR
END-IF
END-PERFORM
X-SP. EXIT.
SEARCH-INDEX.
PERFORM VARYING I FROM 1 BY 1 UNTIL I > 2
DISPLAY 'INDEX --> ' I
PERFORM VARYING TAB FROM 1 BY 1 UNTIL TAB > 5
DISPLAY 'TAB --> ' TAB
PERFORM SEARCH-PARAGRPAH THRU X-SP
END-PERFORM
END-PERFORM.
X-SEARCH-INDEX. EXIT.
Here is the way that it works now and I do get the results I want. It is difficult to past the company code up because you never know who might have a problem.
New Code:
READ-PROV.
READ P-FILE
AT END
MOVE 'Y' TO EOF2
GO TO X-READ-PROV
NOT AT END
ADD 1 TO T-REC-READ
MOVE P-RECORD TO TEST-RECORD
PERFORM CHECK-MATCH THRU X-CHECK-MATCH
END-READ.
X-READ-PROV. EXIT.
CHECK-MATCH.
PERFORM VARYING SUB FROM 1 BY 1 UNTIL SUB > TABLECOUNTER
IF PID >= FROM(SUB) AND
PID <= THRU(SUB) THEN
IF TODAY < P-END-DTE THEN
IF TOTAL-PD = 0 AND
TOTAL-PD = 0 AND
TOTAL-PD = 0 AND
TOTAL-PD = 0 AND
TOTAL-PD = 0 THEN
IF PBILLIND NOT EQUAL 'Y'
PERFORM VARYING TAB FROM 1 BY 1 UNTIL TAB > 5
IF P-CD(TAB) = TY(SUB) THEN
MOVE 6 TO TAB
DISPLAY('***Found***')
ADD 1 TO T-REC-FOUND
END-IF
END-PERFORM
END-IF
END-IF
END-IF
END-IF
END-PERFORM.
X-CM. EXIT.
We can't tell.
There is nothing "wrong" with your nested PERFORM. The IF test is failing simply because it is never true.
We can't get you further with that without seeing your data-definitions, sample input and expected output.
However... my guess would be that the problem is with your data from the PARM in the JCL. That is the most likely area.
It is of course possible that the problem is with the other definition.
A couple of things whilst waiting.
Please always post the actual code, always. We don't want to look for errors in what you have typed here, we want to see the actual code. You have not shown the actual code, because it will not compile, as INDEX is a Reserved Word in COBOL, so you can't use it for a data-name.
Please always bear in mind that what you think may be wrong may not be the problem, so post everything we are likely to need (data-definitions, data you used, actual results you got with the code (including anything you've added for problem-determination), results which were expected).
Some tips.
A paragraph requires a full-stop/period after the paragraph-name and before the next paragraph. If you put that second full-stop/period on a line of its own, and have no full-stops/periods attached to your PROCEDURE code itself, you'll make things look neater and avoid problems when you want to copy some lines which happen to have a full-stop/period to a place where they cause you a mess.
You are using literal values. This is bad. When the number of entries in one of your tables changes, you have to change those literal values. Say the 2 needs to be changed to 5. You have to look at every occurrence of the literal 2 and decide if it needs to be changed. Then change it to 5. Then you get another request, to change the table which originally had five entries so that it will have six. See how difficult/error-prone life can be?
If instead you have unique and well-named data-names for your maximum number of entries, you only have one place to make a change, and you know it can be changed without reference to the rest of the code (assuming someone clever hasn't seen it has a value they want for something, and use it despite its name, of course...).
The content of those fields you can set automatically:
01 TABLE-1.
05 FILLER OCCURS 2 TIMES.
10 A PIC X(10).
01 TABLE-2.
05 FILLER OCCURS 5 TIMES.
10 TABLE-B PIC X(10).
01 TABLE-1-NO-OF-ENTRIES COMP PIC 9(4).
01 TABLE-2-NO-OF-ENTRIES COMP PIC 9(4).
...
PROCEDURE DIVISION.
...
COMPUTE TABLE-1-NO-OF-ENTRIES = LENGTH OF TABLE-1
/ LENGTH OF A
COMPUTE TABLE-2-NO-OF-ENTRIES = LENGTH OF TABLE-2
/ LENGTH OF TABLE-B
DISPLAY TABLE-1-NO-OF-ENTRIES
DISPLAY TABLE-2-NO-OF-ENTRIES
That gives you the output 2 and 5.
The names I've used are a mixture of yours and some for demonstration purposes only. Make everything meaningful, and by that I don't mean trite, as my example names would be in real life.
If you insist on escaping from within your PERFORM like that (and take note of Bruce Martin's comment), you can calculate your escape value by using new, aptly-named, fields and giving them the value of the above plus one.
To do a nested loop when the outer loop only has two entries is overkill. You don't need to escape out of the loops like you do, if you have a termination condition on the loop.
That'll do for now until we see your definitions, data and results.

Boost spirit can handle Postscript/PDF like languages?

I noticed that Boost spirit offers some limits, in a question here on SO there is an user asking for help about boost spirit and the other user who gave the answer specified that boost spirit works well with statements and not with "generic text" ( I'm sorry if I don't recall it correctly ).
Now I would like to think about Postscript and PDF in terms of tokens and simplify my approach to this formats this way, the problem is that the PDF is kind of a mix between a markup language and a programming language with jumps and tables in it, and I can't think about something similar when considering the most popular file formats like XML, C++ code and others languages and formats.
There is also another fact: I can't really find people that had some kind of experience with boost::spirit wiriting a pdf parser or writer, so I'm asking, boost::spirit it's capable of parsing a PDF file and output the elements as tokens ?
Although this has nothing to do with Boost, let me assure you that the parsing of PDF (and PostScript) are about as trivial as you could want. Let's say that you have a scanner object that returns a series of tokens. The token types you will get from the scanner are:
String
Dict begin (<<)
Dict End (>>)
Name (/whatever)
Number
Hex array
Left Angle (<)
Right Angle (>)
Array begin ([)
Array end (])
Procedure begin ({)
Procedure end (})
Comment (%foo)
Word
My scanner is a finite-state automata with states for Start, Comment, String, HexArray, Token, DictEnd, and Done.
The way you parse PDF is not by parsing it, but by executing it. Given these tokens, my "parser" looks like this (in C#):
while (true) {
MLPdfToken = scanner.GetToken();
if (token == null)
return MachineExit.EndOfFile;
PdfObject obj = PdfObject.FromToken(token);
PdfProcedure proc = obj as PdfProcedure;
if (proc != null)
{
if (IsExecuting())
{
if (token.Type == PdfTokenType.RBrace)
proc.Execute(this);
else
Push(obj);
}
else {
proc.Execute(this);
}
if (proc.IsTerminal)
return Machine.ParseComplete;
}
else {
Push(obj);
}
}
I'll also add that if you give every PdfObject an Execute() method such that the base class implementation is machine.Push(this) and IsTerminal that returns false, the REPL gets easier:
while (true) {
MLPdfToken = scanner.GetToken();
if (token == null)
return MachineExit.EndOfFile;
PdfObject obj = PdfObject.FromToken(token);
if (IsExecuting())
{
if (token.Type == PdfTokenType.RBrace)
obj.Execute(this);
else
Push(obj);
}
else {
obj.Execute(this);
if (obj.IsTerminal)
return Machine.ParseComplete;
}
}
There's more support in Machine - Machine has a Stack of PdfObject and a few methods for accessing it (Push, Pop, Mark, CountToMark, Index, Dup, Swap), as well as ExecProcBegin and ExecProcEnd.
Beyond that, it's very light. The only thing that is slightly odd is that PdfObject.FromToken takes a token and if it is a primitive type (number, string, name, hex, bool) returns a corresponding PdfObject. Otherwise, it takes the given token and looks in a "proc set" dictionary of procedure names associated with PdfProcedure objects. So when you encounter the token << that gets looked up in a the proc set and comes up with this code:
void DictBegin(PdfMachine machine)
{
machine.Push(new PdfMark(PdfMarkType.Dictionary));
}
So << really means "mark the stack as the start of a dictionary. >> gets more interesting:
void DictEnd(PdfMachine machine)
{
PdfDict dict = new PdfDict();
// PopThroughMark pops the entire stack up to the first matching mark,
// throws an exception if it fails.
PdfObject[] arr = machine.PopThroughMark(PdfMarkType.Dictionary);
if ((arr.Length & 1) != 0)
throw new PdfException("dictionaries need an even number of objects.");
for (int i=0; i < arr.Length; i += 2)
{
PdfObject key = arr[i], val = arr[i + 1];
if (key.Type != PdfObjectType.Name)
throw new PdfException("dictionaries need a /name for the key.");
dict.put((PdfName)key, val);
}
machine.Push(dict);
}
So >> Pops up to the nearest dictionary mark into an array then puts each pair into the dictionary. Now, I could have done this without allocating the array. I could just pop pairs, putting them into the dictionary until I either hit the mark, fail to get a name or underflow the stack.
The important takeaway is that there really isn't any syntax in PDF, nor is there any in PostScript. At least not so much as you'd notice. The only real Syntax (and the read-eval-(push) loop shows it) is '}'.
So when you this is a PDF 14 0 obj << /Type /Annot /SubType /Square >> endobj what your really seeing is a series of procedures:
Push 14
Push 0
Execute obj (Pop two numbers and push a "definition" object).
Execute dictionary begin
Push /Type
Push /Annot
Push /SubType
Push /Square
Execute dictionary end
Execute endobj (pop the top object and then get (not pop) the next one. If the second is a definition, set its "value" to the first object, else throw).
Since "endobj" is terminal, parsing ends and the top of the stack is the result.
So when you are asked to look up object 14 in the PDF, the cross-reference table tells you where to seek to, you make a new Machine with the stream pointer at that location and run it. If the top of the stack is a "definition" object, you've succeeded.
About now you should be nodding but not trusting me, since you're thinking about PDF streams, which look like this:
<< [/key value]* >> stream ...raw data... endstream endobj
Again, there is no syntax. The proc stream looks at the top of the stack, which should be a PdfDict. If it is, it consumes characters until the next newline (scanner does this), stores the current file position in the stream as data start, reads the stream length from the dict (which may cause another Machine to get newed up), and skips past the end of stream and pushes the new stream object on the stack. endstream is a no-op. The only difference between a PdfDict and a PdfStream is that a PdfStream has a start position and a bool saying that it's a stream, otherwise I dual-purpose the object.
PostScript is almost identical except that the execution environment is a little more complex. For example, you need several stacks in your machine: a parameter stack, a dictionary stack, and an execution stack. From there, you more or less just bind your tokenizer into the set of primitive procedures as well as the word exec, and then most of your interpreter is written in PS itself.
If you're talking about boost, you're looking at C++, which means that you can't be as fast and loose with memory as I am, so you'll want to either use smart pointers or figure out where you scope is and be careful to dispose objects instead of blithely throwing them away, but that's just the normal C++ stuff.
Currently, I make PDF tools for my company in .NET, but in a former life I worked on Acrobat versions 1-4, and most of what I described is exactly what Acrobat did under the hood (well, more or less - it was C, not C++, but it's the same approach).
With respect to the xref table (or xref stream), you read that first - the spec tells you that if you jump to EOF and scan back, you find the start of the xref table. You parse that (which is a CS 101 assignment), parse the trailer, seek to the /Prev if any and repeat until no more /Prev entries. That gives you a complete xref for looking up objects.
As for writing - there are a number of approaches that you can take. The most obvious one is that when an object is meant to be referenced, you create a new reference object by assigning the newest available xref entry to it. Whenever objects refer to other objects for writing, they ask if these objects are referenced. If they are, they write the reference (ie, 14 0 R). When it comes time to write a referenced object, you get the current stream pointer and store it in the xref, then write <objnum> <generation> obj <object contents> endobj. For example, my code to write a dictionary looks like this:
public override ToStream(PdfStreamingContext context)
{
if (context.HasReference(this)) // is object referenced in xref
{
PdfUtils.WriteObjectDefinitionBegin(this, context);
}
context.Writer.Indent();
context.Writer.WriteLine("<<");
WriteContents(context);
context.Writer.Exdent();
context.Writer.Writeline(">>");
if (context.HasReference(this))
{
PdfUtils.WriteObjectDefinitionEnd(this, context);
}
}
I've chopped out some chaff so you can see the wheat underneath. The context is an object that holds a new xref table as well as an object for writing to streams that automagically handles appropriate newline discipline, indentation, line wrapping, and so on.
What you should see is that the basics here are straight forward, if not trivial. And now's when you should be asking yourself the question, "if it's trivial, how come there isn't more (serious) competition for Acrobat in the market? The answer is that even though it's trivial, it's still easy to write PDFs that aren't spec compliant and Acrobat handles most of those. The real challenge is to be able to honor the spec and make sure that you include all required values in a dictionary and that they are in range and semantically correct. Hell, even the date time format--which is pretty well-specified--is a mound of special case code in my library to manage where other people have screwed it up royally. Being able to generate consistently correct PDF is hard and consuming the garbage in the sea of PDFs in the world is harder.
I could (and probably should) write a book about how to do this. While a lot of the fringe code is grubby, the overall structure can be very pretty.
tl;dr - If you're thinking of a recursive descent parser for PDF, you're thinking too hard. All you need is a tokenizer and a simple REPL.

Can I get a list of all currently-registered atoms?

My project has blown through the max 1M atoms, we've cranked up the limit, but I need to apply some sanity to the code that people are submitting with regard to list_to_atom and its friends. I'd like to start by getting a list of all the registered atoms so I can see where the largest offenders are. Is there any way to do this. I'll have to be creative about how I do it so I don't end up trying to dump 1-2M lines in a live console.
You can get hold of all atoms by using an undocumented feature of the external term format.
TL;DR: Paste the following line into the Erlang shell of your running node. Read on for explanation and a non-terse version of the code.
(fun F(N)->try binary_to_term(<<131,75,N:24>>) of A->[A]++F(N+1) catch error:badarg->[]end end)(0).
Elixir version by Ivar Vong:
for i <- 0..:erlang.system_info(:atom_count)-1, do: :erlang.binary_to_term(<<131,75,i::24>>)
An Erlang term encoded in the external term format starts with the byte 131, then a byte identifying the type, and then the actual data. I found that EEP-43 mentions all the possible types, including ATOM_INTERNAL_REF3 with type byte 75, which isn't mentioned in the official documentation of the external term format.
For ATOM_INTERNAL_REF3, the data is an index into the atom table, encoded as a 24-bit integer. We can easily create such a binary: <<131,75,N:24>>
For example, in my Erlang VM, false seems to be the zeroth atom in the atom table:
> binary_to_term(<<131,75,0:24>>).
false
There's no simple way to find the number of atoms currently in the atom table*, but we can keep increasing the number until we get a badarg error.
So this little module gives you a list of all atoms:
-module(all_atoms).
-export([all_atoms/0]).
atom_by_number(N) ->
binary_to_term(<<131,75,N:24>>).
all_atoms() ->
atoms_starting_at(0).
atoms_starting_at(N) ->
try atom_by_number(N) of
Atom ->
[Atom] ++ atoms_starting_at(N + 1)
catch
error:badarg ->
[]
end.
The output looks like:
> all_atoms:all_atoms().
[false,true,'_',nonode#nohost,'$end_of_table','','fun',
infinity,timeout,normal,call,return,throw,error,exit,
undefined,nocatch,undefined_function,undefined_lambda,
'DOWN','UP','EXIT',aborted,abs_path,absoluteURI,ac,accessor,
active,all|...]
> length(v(-1)).
9821
* In Erlang/OTP 20.0, you can call erlang:system_info(atom_count):
> length(all_atoms:all_atoms()) == erlang:system_info(atom_count).
true
I'm not sure if there's a way to do it on a live system, but if you can run it in a test environment you should be able to get a list via crash dump. The atom table is near the end of the crash dump format. You can create a crash dump via erlang:halt/1, but that will bring down the whole runtime system.
I dare say that if you use more than 1M atoms, then you are doing something wrong. Atoms are intended to be static as soon as the application runs or at least upper bounded by some small number, 3000 or so for a medium sized application.
Be very careful when an enemy can generate atoms in your vm. especially calls like list_to_atom/1 is somewhat dangerous.
EDITED (wrong answer..)
You can adjust number of atoms with +t
http://www.erlang.org/doc/efficiency_guide/advanced.html
..but I know very few use cases when it is necessary.
You can track atom stats with erlang:memory()

Is there a safe way to clean up stack-based code when jumping out of a block?

I've been working on Issue 14 on the PascalScript scripting engine, in which using a Goto command to jump out of a Case block produces a compiler error, even though this is perfectly valid (if ugly) Object Pascal code.
Turns out the ProcessCase routine in the compiler calls HasInvalidJumps, which scans for any Gotos that lead outside of the Case block, and gives a compiler error if it finds one. If I comment that check out, it compiles just fine, but ends up crashing at runtime. A disassembly of the bytecode shows why. I've annotated it with the original script code:
[TYPES]
<SNIPPED>
[VARS]
Var [0]: 27 Class TFORM
Var [1]: 28 Class TAPPLICATION
Var [2]: 11 S32 //i: integer
[PROCS]
Proc [0] Export: !MAIN -1
{begin}
[0] ASSIGN GlobalVar[2], [1]
{ i := 1;}
[15] PUSHTYPE 11(S32) // 1
[20] ASSIGN Base[1], GlobalVar[2]
{ case i of}
[31] PUSHTYPE 25(U8) // 2
{ 0:}
[36] COMPARE into Base[2]: [0] = Base[1]
[57] COND_NOT_GOTO currpos + 5 Base[2] [72]
{ end;}
[67] GOTO currpos + 41 [113]
{ 1:}
[72] COMPARE into Base[2]: [1] = Base[1]
[93] COND_NOT_GOTO currpos + 10 Base[2] [113]
{ goto L1;}
[103] GOTO currpos + 8 [116]
{ end;}
[108] GOTO currpos + 0 [113]
{ end; //<-- case}
[113] POP // 1
[114] POP // 0
{ Exit;}
[115] RET
{L1:
Writeln('Label L1');}
[116] PUSHTYPE 17(WideString) // 1
[121] ASSIGN Base[1], ['????????']
[144] CALL 1
{end.}
[149] POP // 0
[150] RET
Proc [1]: External Decl: \00\00 WRITELN
The "goto L1;" statement at 103 skips the cleanup pops at 113 and 114, which leaves the stack in an invalid state.
Delphi doesn't have any trouble with this, because it doesn't use a calculation stack. PascalScript, though, is not as fortunate. I need some way to make this work, as this pattern is very common in some legacy scripts from a much simpler system with little in the way of control structures that I've translated to PascalScript and need to be able to support.
Anyone have any ideas how to patch the codegen so it'll clean up the stack properly?
IIRC the goto rules in classic pascals were:
jumps are only allowed out of a block (iow from a higher to a lower nesting level on the "same" branch of the tree)
from local procedures to their parents.
The later was afaik never supported by Borland derived Pascals, but the first still holds.
So you need to generate exiting code like Martin says, but possibly it can be for multiple block levels, so you can't have a could codegeneration for each goto, but must generate code (to exit the precise number of needed blocks).
A typical test pattern is to exit from multiple nested ifs (possibly within a loop) using a goto, since that was a classic microoptimization that was faster at least up to D7.
Keep in mind that the if evaluation(s) and the begin..end blocks of their branches might have generated temps that need cleanup.
---------- added later
I think the codegenerator needs a way to walk the scopes between the goto and its endpoint, generating the relevant exit code for blocks along the way. That way a fix works for the general case and not just this example.
Since you can only jump out of scopes, and not into it that might not that be that hard.
IOW generate something that is equivalent to (for a hypothetical double case block)
Lgoto1gluecode:
// exit code first block
pop x
pop y
// exit code first block
pop A
pop B
goto real_goto_destination
Additional analysis can be done. E.g. if there is only one scope, and it has already a cleanup exit label, you can jump directly. If you know for certain that the above pop's are only discarded values (and not saves of registers) you can do them at once with add $16,%esp (4*4 byte values) etc.
The straightforward solution would be:
When generating a GOTO for goto statement, prefix the GOTO with the same cleanup code that comes before RET.
It looks to me like the calculation of how far to jump forward is the problem. I would have to spend some time looking at the implementation of the parser to help further, but my guess would be that additional handling must be performed when using a goto and there are values on the stack AND the goto would be placed after those values would be removed from the stack. Of course to determine this you would need to save the current location being parsed (the goto) and the forward parse to the target location watching for stack changes, and if so then to either adjust the goto location backwards, or inject the code as Martin suggested.

Resources