How do I control where new Forth words will be compiled?

Is there a way to change what HERE points to, or some other way to make sure that the next definition I compile will end up in some special location? Or can I copy a definition to somewhere else once it's made? Ideally I'd want a solution in ANS Forth, but something Gforth-specific is good enough.

You can use ALLOT to move the data space pointer in both directions:
: here! ( a -- ) here - allot ;
In Gforth, this simpler version should work:
: here! dp ! ;
Having said this, HERE isn't necessarily where new words will be compiled. HERE points to data space, whereas definitions are written to name space and code space. However, in a traditional design like Gforth, the three are a single contiguous region. See DPANS94 3.3.

Related

How to force nom to parse the whole input string?

I am working with nom version 6.1.2 and I am trying to parse strings like
A 2 1 2.
At the moment I would be happy to at least distinguish input that meets the requirements from input that doesn't. (After that I would like to change the output to a tuple with "A" as the first value and a vector of the u16 numbers as the second.)
The string always has to start with a capital A, followed by at least one space and then a number. After that there can be as many additional spaces and numbers as you want; the only requirement is that the string ends with a number and not with a space. All numbers will be within the range of u16. I already wrote the following function:
extern crate nom;
use nom::sequence::{preceded, pair};
use nom::character::streaming::{char, space1};
use nom::combinator::recognize;
use nom::multi::many1;
use nom::character::complete::digit1;
use nom::IResult; // needed for the return type below

pub fn parse_and(line: &str) -> IResult<&str, &str> {
    preceded(
        char('A'),
        recognize(
            many1(
                pair(
                    space1,
                    digit1
                )
            )
        )
    )(line)
}
I also want to mention that there are answers to this kind of problem that use CompleteStr, but that isn't an option anymore because it was removed some time ago.
People explained that the reason for this behavior is that nom doesn't know when the string slice ends, and therefore I get parse_and: Err(Incomplete(Size(1))) as the result for the example input above.
It turns out that one of the use declarations caused the problem. The documentation (in a paragraph so far down the page that I had not read it) says:
"
Streaming / Complete
Some of nom's modules have streaming or complete submodules. They hold different variants of the same combinators.
A streaming parser assumes that we might not have all of the input data. This can happen with some network protocol or large file parsers, where the input buffer can be full and need to be resized or refilled.
A complete parser assumes that we already have all of the input data. This will be the common case with small files that can be read entirely to memory.
"
Therefore, the solution to my problem is to use nom::character::complete::{char, space1}; instead of nom::character::streaming::{char, space1}; (the third line of code, not counting empty lines). That worked for me :)

Is there a solution for transpiling Lua labels to ECMAScript3?

I'm rebuilding a Lua to ES3 transpiler (a tool for converting Lua to cross-browser JavaScript). Before I invest more effort in this transpiler, I want to ask whether it's possible to convert Lua labels to ECMAScript 3. For example:
goto label;
:: label ::
print "skipped";
My first idea was to split each body of statements into parts: when there's a label, the statements that follow it are stored together as the next part:
some body
label (& statements)
other label (& statements)
and so on. Every statement that has a body (and the program chunk itself) gets a list of parts like this. Each part that belongs to a label should store the label's name somewhere (e.g., in a property of its own part object).
Each part would be a function, or would store a function, to be executed sequentially in relation to the others.
A goto statement would look up its label, run that label's part, and then invoke an ES return statement to stop executing the current statements.
The limitation of splitting the body this way is access to variables and functions defined in different parts... So, is there an idea or answer for this? Is it impossible to have stable labels when converting them to ECMAScript?
I can't quite follow your idea, but it seems someone already solved the problem: JavaScript allows labelled continues, which, combined with dummy while loops, permit emulating goto within a function. (And unless I forgot something, that should be all you need for Lua.)
Compare pages 72-74 of the ECMAScript spec ed. #3 of 2000-03-24 to see that it should work in ES3, or just look at e.g. this answer to a question about goto in JS. As usual on the 'net, the URLs referenced there are dead but you can get summerofgoto.com [archived] at the awesome Internet Archive. (Outgoing GitHub link is also dead, but the scripts are also archived: parseScripts.js, goto.min.js or goto.js.)
I hope that's enough to get things running, good luck!

Determine Cobol coding style

I'm developing an application that parses Cobol programs. Some of these programs follow the traditional coding style (program text from column 8 to 72), and some are newer and don't.
In my application I need to determine the coding style in order to know whether I should parse content after column 72.
I've been able to determine whether a program starts at column 1 or 8, but programs that start at column 1 can also follow the rule that text after column 72 is a comment.
So I'm trying to find rules that will allow me to determine whether text after column 72 is a comment or valid code.
I've found some, but it's hard to tell whether they will work every time:
A dot after column 72 marks the end of a sentence, but I fear that a dot can appear in comments too.
Find the closing character of a statement after column 72: " ' ) }
Look at the characters in columns 71, 72 and 73: if none is a space, find the whole word and check whether it's a keyword or a variable. The problem is that it can be a variable from a COPY or a replacement, etc.
I'd like to know what you think of these rules, and whether you have any ideas to help me determine the coding style of a Cobol program.
I don't need an API or something, just solid rules that I can rely on.
I think you need to know the COBOL compiler for each program. Its documentation should tell you what conventions/configurations/switches it uses to decide if the source code ends at column 72 or not.
So.... which compiler(s)?
And if you think the column 72 issue is a pain, wait till you get around to actually parsing the COBOL itself. If you are not well prepared to handle the lexical issues of the language, you are probably very badly prepared to handle the syntactic ones.
There is no absolutely reliable way to determine if a COBOL program is in fixed or free format based only on the source code. Heck, it is sometimes difficult to identify the programming language based only on source code. Check out this classic polyglot - it is valid under 8 different language compilers. That said, you could try a few heuristics that might yield the correct answer more often than not.
Compiler directives embedded in source code
Watch for certain compiler directives that determine code format. Unfortunately, every compiler vendor uses their own flavour of directive. For example, Micro Focus COBOL uses the SOURCEFORMAT directive. This directive will appear near the top of the program, so a short pre-scan could be used to find it. On the other hand, OpenCobol uses >>SOURCE FORMAT IS FREE and >>SOURCE FORMAT IS FIXED to toggle between free and fixed format, so different parts of the same program could be formatted differently!
The bottom line here is that you will have to support the conventions of multiple COBOL compilers.
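The pre-scan described above could be sketched roughly as follows, in Python. The regular expressions are assumptions about how the two directives are commonly spelled (for instance $SET SOURCEFORMAT"FREE" for Micro Focus), and scan_format_directives is a made-up name, so check each compiler's manual before relying on this. It also makes no attempt to ignore directives that appear inside comments.

```python
import re

# Assumed spellings; verify against each compiler's documentation.
# OpenCOBOL style: >>SOURCE FORMAT IS FREE (the word IS is treated as optional here).
_OPENCOBOL = re.compile(r">>\s*SOURCE\s+FORMAT\s+(?:IS\s+)?(FREE|FIXED)", re.IGNORECASE)
# Micro Focus style: a SOURCEFORMAT directive such as $SET SOURCEFORMAT"FREE".
_MICROFOCUS = re.compile(r"SOURCEFORMAT\s*[\"'(=]?\s*(FREE|FIXED)", re.IGNORECASE)


def scan_format_directives(source, max_lines=None):
    """Return a list of (line_number, 'FREE' or 'FIXED') for each directive found.

    max_lines limits the pre-scan to the top of the program; None scans everything,
    which matters when a program toggles format part-way through.
    """
    found = []
    for number, line in enumerate(source.splitlines(), start=1):
        if max_lines is not None and number > max_lines:
            break
        match = _OPENCOBOL.search(line) or _MICROFOCUS.search(line)
        if match:
            found.append((number, match.group(1).upper()))
    return found
```

Returning every hit with its line number, rather than a single answer, leaves room for programs that switch format more than once.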
Compiler switches
Source code format can also be specified using a compiler switch. In this case, there are no concrete clues to go on. However, you can be reasonably sure that the entire source program will be either fixed or free. All you can do here is guess. Unless the programmer is out to "mess with your head" (and some will), a program in free format will have the keywords IDENTIFICATION DIVISION or ID DIVISION starting before column 8. Every COBOL program will begin with these keywords, so you can use them as the anchor point for determining code format in the absence of embedded compiler directives.
Warning - this is far from foolproof, but might be a good start.
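As a minimal sketch of that anchor-point guess, in Python: the function name guess_format_from_division is made up for illustration, and tabs are assumed to expand to 8-column stops. It looks only at the first line that contains the division header and is exactly as fallible as the heuristic itself.

```python
import re

_DIVISION = re.compile(r"\b(?:IDENTIFICATION|ID)\s+DIVISION\b", re.IGNORECASE)


def guess_format_from_division(source):
    """Guess 'FREE' or 'FIXED' from the column where the division header starts.

    Returns None when no IDENTIFICATION DIVISION / ID DIVISION header is found.
    """
    for line in source.splitlines():
        # Assumption: tabs are expanded to 8-column tab stops before measuring.
        match = _DIVISION.search(line.expandtabs(8))
        if match:
            column = match.start() + 1  # COBOL columns are 1-based
            return "FREE" if column < 8 else "FIXED"
    return None
```

A header that starts in column 8 or later is reported as FIXED, although a free-format program may legally be indented that far, so treat the result as a hint only.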
There won't be an algorithm to do this with 100% certainty, because if comments can be anything, they can also be compilable COBOL code. So you could theoretically write a program that means one thing if the comments are ignored, and something else entirely if the comments are treated as part of the COBOL.
But that's extremely unlikely. What's most likely to happen is that if you try to compile the code under the wrong convention, it will simply fail. So the only accurate way to do this is to try compiling/parsing the program one way, and if you come to a line that can't make sense, switch to the other style. You could also support passing an argument to the compiler when the style is already known.
You can try using heuristics like what you've described, but that will never be totally accurate. The most they can give you is a probability that the code is one or the other style, which will increase as they examine more and more lines of code. They could be useful for helping you guess the style before you start compiling, or for figuring out when the problem is really just a typo in the code.
EDIT:
Regarding ideas for heuristics, it's hard to say. If there were a standard comment sigil like // or # in other languages, this would be a lot easier (actually, there is, but it sounds like your code doesn't follow this convention). The only thing I can think of would be to check whether every line (or maybe 99% of lines, and not counting empty lines or lines commented with *) has a period somewhere before position 72.
One thing you DON'T want to do is apply any heuristics to the part after position 72. That is, you don't want to be checking the comments to see if they're valid COBOL. You want to check what you know is COBOL first, and see if that works by itself. There are several reasons for this:
Comments written in English are likely to have periods and quotes in them, so your first and second bullet points are out.
Natural languages are WAY harder to parse than something like COBOL.
The comments could easily have COBOL in them (maybe someone commented out the previous version of the line).
An important rule for comments is that they should never affect what the program does. If changing the comments can change how the program is compiled, you violate that.
All that in mind, my opinion is that you shouldn't use heuristics at all. You should always try to compile the program under both conventions unless one is explicitly specified. There's a chance that code will compile successfully under both conventions, and then you'll have two different programs and no way to tell which one is correct.
If that happens, you need to compare the two results (perhaps with a hash or something) to see if they're the same program. If they're the same, great, but if not, you'll need to force the user to explicitly choose a convention.
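The "try both conventions and compare" approach might look something like this in Python. The parse argument is hypothetical: it stands for whatever parser you already have, and it is assumed to return a comparable result and to raise ValueError on text it cannot parse. The names choose_convention and strip_after_column_72 are likewise made up for the sketch.

```python
def strip_after_column_72(source):
    """Keep only columns 1-72 of every line (the fixed-format view of the text)."""
    return "\n".join(line[:72] for line in source.splitlines())


def choose_convention(source, parse):
    """Parse the program under both conventions and report which ones work.

    `parse` is a caller-supplied (hypothetical) function: it takes program text
    and returns a comparable parse result, or raises ValueError on failure.
    Returns 'FIXED', 'FREE', 'EITHER', 'AMBIGUOUS' or 'NEITHER'.
    """
    results = {}
    for name, text in (("FIXED", strip_after_column_72(source)), ("FREE", source)):
        try:
            results[name] = parse(text)
        except ValueError:
            pass  # this convention does not yield a valid program
    if not results:
        return "NEITHER"
    if len(results) == 1:
        return next(iter(results))
    # Both conventions parse: identical results mean the choice doesn't matter.
    return "EITHER" if results["FIXED"] == results["FREE"] else "AMBIGUOUS"
```

An AMBIGUOUS result is the case where the user has to be asked to pick a convention explicitly.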
Most COBOL compilers will let you generate and inspect the output of the text manipulation phase.
The text preprocessor output can be seen with (using OpenCOBOL for the example):
cobc -E program.cob
The text manipulation processor deals with any COPY ... REPLACING compiler directives, as well as converting SOURCE FORMAT IS FIXED (with line continuations, string literal concatenations, comment line removal, among other things) to the actual free format that the compiler lexical analyzer needs. A lot of the OpenCOBOL toolkits (Cross referencer and Animator, to name two) use source code AFTER the preprocessor pass. I don't think you'll lose any street cred if your parser program relies on post processed source code files.

Strings in a separate .pas file

This may not be the correct place for this question; if not, feel free to move it. I tagged it as Delphi/Pascal because that's what I am working in at the moment, but this could apply to all programming, I guess.
Anyway, I am doing some code cleanup and thinking of moving all the strings in my program to a single separate .pas file. Are there any pros and cons to doing this? Is it even worth doing?
To clarify: I mean that I will create a separate file, Strings.pas, and declare all my text string variables in it.
Ex
Current Code
Messages.Add('The voucher was NOT sent to ' + sName+
' because the application is in TEST MODE.');
Messages.Add('Voucher Saved to ' + sFullPath);
Messages.Add('----------------------------------------------------------');
New Code would be something like:
Messages.Add(sMsgText1 + '' + sName + '' + sMsgText2 + '' + sFullPath)
The Strings.pas file would hold all the string data. Hope that makes better sense
Moving your strings to a separate file is a good idea! It keeps them together and will let you easily change them if required. Your question doesn't say you want to be able to translate them, but centralizing will help with that too.
But, code like:
Messages.Add(sMsgText1 + '' + sName + '' + sMsgText2 + '' + sFullPath)
is not better than code like:
Messages.Add('The voucher was NOT sent to ' + sName+
' because the application is in TEST MODE.');
You've turned a messy but readable function call into a messy and un-readable function call. With the old code (the second snippet just above), you can read the code and see roughly what the message is going to say, because a lot of it is there in text. With the new code, you can't.
Second, the reason for moving the strings is to keep related items together and make it easier to change them. What if you want to change the above message so that instead of saying "The file 'foo' in path 'bar'..." it is phrased "The file bar\foo is..."? You can't: the way the messages are built is still fixed and scattered throughout your code. If you want to change several messages to be formatted the same way, you will need to change lots of individual places.
This will be even more of a problem if your goal is to translate your messages, since often translation requires rephrasing a message not just translating the components. (You need to change the order of subitems included in your messages, for example - you can't just assume each language is a phrase-for-phrase in order substitution.)
Refactor one step further
I'd suggest instead a more aggressive refactoring of your message code. You're definitely on the right track when you suggest moving your messages to a separate file. But don't just move the strings: move the functions as well. Instead of a large number of Messages.Add('...') scattered through your code, find the common subset of messages you create. Many will be very similar. Create a family of functions you can call, so that all similar messages are implemented with a single function, and if you need to change the phrasing for them, you can do it in a single spot.
For example, instead of:
Messages.Add('The file ' + sFile + ' in ' + sPath + ' was not found.');
... and elsewhere:
Messages.Add('The file ' + sFileName + ' in ' + sURL + ' was not found.');
have a single function:
Messages.ItemNotFound(sFile, sPath);
...
Messages.ItemNotFound(sFileName, sURL);
You get:
Centralized message strings
Centralized message functions
Less code duplication
Cleaner code (no assembling of strings in a function call, just parameters)
Easier to translate - provide an alternate implementation of the functions (don't forget that just translating the substrings may not be enough, you often need to be able to alter the phrasing substantially.)
Clear descriptions of what the message is in the function name, such as ItemNotFound(item, path), which leads to
Clearer code when you're reading it
Sounds good to me :)
I think it makes a lot of sense to move all string constants to a single unit. It makes changing the texts a lot easier, especially if you want to translate to other (human) languages.
But instead of strings, why don't you do what I usually do, i.e. use resourcestring. That way, your strings can be changed by someone else with a resource editor, without recompilation.
unit Strings;
resourcestring
strMsgText1 = 'The voucher was NOT sent to ';
etc...
But such a string is probably better done as:
resourcestring
strVoucherNotSent =
'The voucher was NOT sent to %s because the application is in TEST MODE.';
strVoucherForNHasValueOf =
'The voucher for %s has a value of $%.2f';
The advantage of that is that in some languages, the placement and order of such substitutions is different. That way, a translator can place the arguments wherever it is necessary. Of course the application must then use Format() to handle the string:
Messages.Add(Format(strVoucherNotSent, [sName]));
Messages.Add(Format(strVoucherSavedTo, [sFullPath]));
Messages.Add(Format(strVoucherForNHasValueOf, [sName, dblValue]));
If you are wanting to translate the UI into different languages then you may benefit from having all your text in a single file, or perhaps a number of files dedicated to declaring string constants.
However, if you do this change without such a strong motivation, then you may just make your code hard to read.
Generally you have to ask what the benefits of such a major refactoring are, and if they are not self-evident, then you may well be changing things just for the sake of change.
If you want to translate your app, consider using GNU gettext for Delphi, also known as dxgettext. This is better than putting your strings into a separate .pas file because it enables translation without any special tools or recompilation, even by end users.
Well, the consensus here seems to tend towards the pro side, and I agree with that completely. Overuse of string literals could lead to your source getting stringly typed.
One benefit I am still missing is reusability of strings. Although that is not a direct benefit of moving them to another unit, it is a benefit of moving the string literals to constants.
But a rather significant drawback is the time required to create this separate source file. You might consider adopting this good programming practice in your next project instead. It entirely depends on whether you have sufficient time (e.g. hobby project) or a deadline, on whether there will be a next project (e.g. student), or on just wanting to get some practice. As David H says in his answer and comments, as with all decisions you make, you have to weigh the benefits.
Leaving aside all kinds of fancy refactoring tools that could provide some automatic assistance, realize that moving the string literals to another unit does not get the job done on its own. As Rudy and David M already answered, you have to partly rewrite your source. Also, finding readable, short and applicable constant names does take time. As many comments already stated, having control over spelling and consistency is important, and I like to think the same argument applies to the replacement constants themselves.
And about the translation answers, whether applicable to the OP or not: moving all your source strings to a separate unit is only part of a translation solution. You also have to take care of designer strings (i.e. captions) and GUI compatibility: a longer translation still has to fit in your label.
If you have the luxury or the need to: go for it. But I would take this to the next project.
I did group all literals as resourcestrings in an older version of our framework. I moved away from that, since with a framework you might not use all of its strings (e.g. because some units or unit groups are not used, but scanning the common dirs (*) will still show all strings in your translation tool).
Now I distribute them over the units again. I originally started grouping them to avoid duplicates, but in retrospect that was the lesser problem.
(*) using dxgettext on the "common" dirs.

how to use GO TO in COBOL

I have the following code snippet in one of my COBOL program.
IF FIRST < SECOND
MOVE FIRST TO WS
END-IF.
MOVE SECOND TO WS.
MOVE WS TO RESULT.
I need to use GO TO inside the IF block to jump to the last statement (MOVE WS TO RESULT).
IF FIRST < SECOND
MOVE FIRST TO WS
GO TO <last line.(MOVE WS to RESULT)>
END-IF.
MOVE SECOND TO WS.
MOVE WS TO RESULT.
In other words, I need to skip "MOVE SECOND TO WS.".
What is the simplest way to jump to a specific line in COBOL?
I read somewhere that this is possible by defining a PARAGRAPH, but I don't know how to define one.
It might seem very simple, but I'm a newbie to COBOL programming.
Thanks.
----------------* UPDATE *----------
based on #lawerence solution, is this correct?
IF FIRST < SECOND
MOVE FIRST TO WS
GO TO C10-END.
END-IF.
MOVE SECOND TO WS.
C10-END.
MOVE WS TO RESULT.
I just moved the last statement back to the first level.
GOTO can do what you're looking for, but IF/ELSE would be more direct. You want MOVE SECOND TO WS to run iff the IF block does not, correct?
IF FIRST < SECOND
MOVE FIRST TO WS
ELSE
MOVE SECOND TO WS
END-IF.
MOVE WS TO RESULT.
I hope I got the syntax right; I have never used COBOL and just worked from your snippet and this example: http://www.fluffycat.com/COBOL/If-and-End-If/. There will probably be situations in the future where you need GOTO, but A) it should be avoided when another control structure will work and B) I haven't the slightest idea how it's done.
To be honest, COBOL looks pretty miserable, lol. I've never seen a language so verbose. Good luck with everything.
EDIT
Mostly for joe...
Can't this all be done better with a min function? I'm sure the syntax is wrong, but:
Move Function Min(FIRST, SECOND) to RESULT
OMFSM! It is not 1974, why are you writing Cobol like that? This:
IF FIRST < SECOND
MOVE FIRST TO WS
END-IF.
MOVE SECOND TO WS.
MOVE WS TO RESULT.
Has a number of problems:
It uses periods to delimit scope, a practice that has been deprecated for nearly three decades.
It isn't using ELSE
It is trying to use GO TO.
May I suggest the following as the way to approach it since 1985:
If FIRST < SECOND
Move FIRST to WS
Else
Move SECOND to WS
End-IF
Move WS to RESULT
But really, the code should simply read:
If FIRST < SECOND
Move FIRST to RESULT
Else
Move SECOND to RESULT
End-If
Unless the intermediate result is needed in WS. Cobol 66 and 74 used GOTO and periods because they lacked modern control structures. I realize you are a 'newbie', but you should suggest to whoever is teaching you that they really need to upgrade their skills.
jon_darkstar is right when it comes to improving the logic, however if you want to see how GO TO works here goes:
IF FIRST < SECOND
MOVE FIRST TO WS
GO TO C10-RESULT
END-IF.
MOVE SECOND TO WS.
C10-RESULT.
MOVE WS TO RESULT.
C10-RESULT. starts a paragraph and has to be a unique name in your code SECTION. By convention it should also start with the same prefix as the enclosing section. Therefore this example assumes that your code SECTION is something like C00-MAIN-PROCESS SECTION.
A new non-Answer has brought this silly question to light.
ELSE in the IF is the 100% obvious answer. It is deeply odd that GO TO has to be used whereas ELSE may not be used.
Another way, surprised it didn't come up:
MOVE SECOND TO WS
IF FIRST LESS THAN SECOND
MOVE FIRST TO WS
END-IF
MOVE WS TO RESULT
No ELSE, no GO TO. Extra MOVE executed when FIRST is less than second, though.
To include a GO TO is simple, but stupid. Add a GO TO. A GO TO has to go somewhere (unless using ALTER ... TO PROCEED TO ..., which everyone hopes you weren't), so make a label at the point you want it to arrive, and add the name of that label to the GO TO.
A label is a user-defined word. If referenced (as in this case) it must be unique within a SECTION, if you use SECTIONs, which you don't need to, otherwise unique within the program and whether referenced or not it may not be the same name as something else (like a data-definition or the internal name of a file).
A label is a procedure-name. A procedure-name should terminate with a period/full-stop and the procedure itself should also terminate with a period/full-stop.
What about the MOVE FUNCTION MIN ( ... ) ... as a solution?
Well, it works. If other staff at your site are not used to it, you will not be thanked for using it (without prior discussion, anyway).
What does it do? Well, in Enterprise COBOL, the compiler generates an extra little area, copies the first argument to that area, tests against the second argument, copies the copy of the first argument, or the second argument, whichever is relevant, to the result.
Vs the ELSE, that is an extra storage area defined, an extra instruction for addressability of that, and an extra Assembler move (MVC) plus the lack of ready recognition.
Better for programmers new to COBOL, used to a multitude of functions in other languages? Not really, as they will be soundly beaten if they don't write programs that can be understood (at 2am) by the rest of the staff.
IF FUNCTION MIN(VAR1 VAR2 VAR3 VAR4 VAR5) = 17
It's another downside of FUNCTION. You see, you can do that. Then, at 2am, when the program has crashed 32 lines later, after VAR1 and VAR3 have been changed, are you going to be able to find the result of that IF in the core dump? Maybe, maybe not. Depends if any other functions, and of what type, have been used. At 2am, you don't want that. Not at all.
On the upside, it is less typing. For those who type, rather than use the editor.
In our shop, we'd use the example provided by Bill Woodger. However, we do use periods as scope-terminators. COBOL should be structured and use the KISS principle, just like any other language. This:
MOVE SECOND TO WS.
IF FIRST LESS THAN SECOND
MOVE FIRST TO WS.
MOVE WS TO RESULT.
Note that this only works if we are assured that SECOND and FIRST have numeric values, or that WS is a PIC X() string, not a numeric PIC 9().
This is by far the easiest to read. No ELSE or GO TO required.
