What is the equivalent of memset in Forth?

If I need to fill an array with a single value, is there a typical Forth word to do something like what memset from C does (that is, set a region of bytes to a specific value)?

In the Forth standard, the word for initializing raw memory is ERASE, which is in the Core Extensions word set:
ERASE ( addr u -- )
If u is greater than zero, clear all bits in each of u consecutive address units of memory beginning at addr.
ERASE, as its name suggests, only clears/zeroes memory.
There is also FILL, in the Core word set, for setting consecutive characters:
FILL ( c-addr u char -- )
If u is greater than zero, store char in each of u consecutive characters of memory beginning at c-addr.
However, the standard does not require characters to be one address unit in size.
Setting a range of address units to a value other than zero is not really a portable or standard-compliant operation anyway, so if your address unit is the same size as a character (as it is in most Forths), just use FILL.
The standard also provides BLANK, in the String word set, for setting a string to spaces:
BLANK ( c-addr u -- )
If u is greater than zero, store the character value for space in u consecutive character positions beginning at c-addr.
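To make the three stack effects concrete, here is a rough Python model of what FILL, ERASE and BLANK do to memory. It is only a sketch: it assumes one character equals one address unit equals one byte (which the standard does not guarantee), and the function names and the bytearray standing in for Forth memory are illustrative, not part of any Forth system.

```python
def fill(mem, addr, u, char):
    """Model of FILL ( c-addr u char -- ): store char in u consecutive bytes."""
    if addr < 0 or u < 0 or addr + u > len(mem):
        raise IndexError("region lies outside the modelled memory")
    # Same-length slice assignment, so the bytearray never grows or shrinks.
    mem[addr:addr + u] = bytes([char & 0xFF]) * u


def erase(mem, addr, u):
    """Model of ERASE ( addr u -- ): clear all bits in u consecutive bytes."""
    fill(mem, addr, u, 0)


def blank(mem, addr, u):
    """Model of BLANK ( c-addr u -- ): store spaces in u consecutive bytes."""
    fill(mem, addr, u, ord(" "))


mem = bytearray(b"abcdefgh")
fill(mem, 2, 3, ord("x"))   # like memset(mem + 2, 'x', 3) in C
erase(mem, 0, 2)
blank(mem, 6, 2)
print(mem)
```

Under the one-byte-per-character assumption, FILL is the closest match to memset, and ERASE and BLANK are the special cases for zero and space.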

Related

Using an escaped (magic) character as boundary in a character range in Lua patterns

The Lua manual in section 6.4.1 on Lua Patterns states
A character class is used to represent a set of characters. The
following combinations are allowed in describing a character class:
x: (where x is not one of the magic characters ^$()%.[]*+-?) represents the character x itself.
.: (a dot) represents all characters.
%a: represents all letters.
%c: represents all control characters.
%d: represents all digits.
%g: represents all printable characters except space.
%l: represents all lowercase letters.
%p: represents all punctuation characters.
%s: represents all space characters.
%u: represents all uppercase letters.
%w: represents all alphanumeric characters.
%x: represents all hexadecimal digits.
%x: (where x is any non-alphanumeric character) represents the character x. This is the standard way to escape the magic characters.
Any non-alphanumeric character (including all punctuation characters,
even the non-magical) can be preceded by a % when used to represent
itself in a pattern.
[set]: represents the class which is the union of all characters in set. A range of characters can be specified by separating the end
characters of the range, in ascending order, with a -. All classes
%x described above can also be used as components in set. All other
characters in set represent themselves. For example, [%w_] (or
[_%w]) represents all alphanumeric characters plus the underscore,
[0-7] represents the octal digits, and [0-7%l%-] represents the
octal digits plus the lowercase letters plus the - character.
You can put a closing square bracket in a set by positioning it as the
first character in the set. You can put a hyphen in a set by
positioning it as the first or the last character in the set. (You can
also use an escape for both cases.)
The interaction between ranges and classes is not defined. Therefore, patterns like [%a-z] or [a-%%] have no meaning.
[^set]: represents the complement of set, where set is interpreted
as above.
For all classes represented by single letters (%a, %c, etc.), the
corresponding uppercase letter represents the complement of the class.
For instance, %S represents all non-space characters.
The definitions of letter, space, and other character groups depend on
the current locale. In particular, the class [a-z] may not be
equivalent to %l.
(Highlighting and some formatting added by me)
So, since the "interaction between ranges and classes is not defined.", how do you create a character class set that starts and/or ends with a (magic) character that needs to be escaped?
For example,
[%%-c]
does not define a character class that ranges from % to c and includes all characters in-between but a set that consists only of the three characters %, -, and c.
The interaction between ranges and classes is not defined.
Obviously, this is not a hard and fast rule of regex character sets in general but a Lua implementation decision. Using shorthand classes in character sets/ranges works in some (most) regex flavors, but not in all (for example, not in Python's re module).
However, the second example is misleading:
Therefore, patterns like [%a-z] or [a-%%] have no meaning.
The first example is a fair one: %a is a shorthand class (representing all letters), so [%a-z] combines a class with a range, which is undefined, and it will return nil if matched against a string.
Escaped range characters in a [set]
In the second example, [a-%%], %% simply defines an escaped % sign and not a shorthand character class. The superficial problem is that the range is defined upside down, from high to low (in terms of the US-ASCII values of the characters: a is 0x61 and % is 0x25), like the erroneous Lua pattern [f-a]. If the endpoints are swapped, [%%-a], it seems to work, but all it does is match the three individual characters %, -, and a instead of the range of characters between % and a (credit: cyclaminist).
This could be considered a bug and, indeed, means it is not possible to create a range of characters in a [set] if one of the characters defining the range needs to be escaped.
Possible Solution
Start the character range from the next character that does not need to be escaped - and then add the remaining escaped characters individually, e.g.
[%%&-a]
Sample:
for w in string.gmatch("%&*()-0Aa", "[%%&-a]") do
  print(w)
end
This is the answer I have found. Still, maybe somebody else has something better.
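As a cross-check of the workaround, the following Python sketch compares two things for the same sample string: the characters whose code points lie between % and a, and the characters matched by a set that lists % on its own and then the range from & to a. Python's re module is used only because % needs no escaping there; it says nothing about how Lua parses its own patterns.

```python
import re

sample = "%&*()-0Aa"

# Characters whose code point lies between '%' (0x25) and 'a' (0x61), inclusive.
by_codepoint = [ch for ch in sample if ord("%") <= ord(ch) <= ord("a")]

# Same split as the Lua workaround: '%' listed alone, then the range '&'..'a'.
by_split_set = re.findall(r"[%&-a]", sample)

print(by_codepoint == by_split_set)
```

Because & (0x26) is the character immediately after % (0x25), the split set covers exactly the intended range.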

How to test if a string character is a digit?

How can I test if a certain character of a string variable is a digit in SPSS (and then apply some operations, depending on the result)?
So let's for example say, I have a variable that reflects the street number. Some street numbers have additional character at the end e.g. "12b". Now let's further assume that I extracted the last character (that could be a digit, or the additional letter) into a string variable. After that I'd like to check if this character is a digit or a letter. How can this be done?
I managed to do this with the MAX function, where "mychar" is the character variable to be checked:
COMPUTE digitcheck = (MAX(mychar,"9")="9").
If the content of "mychar" is a digit [0-9] the result of the MAX function will be "9" otherwise the MAX function will return the letter and the equality test fails.
In this way you can also check if a whole string variable contains a letter or not. It looks pretty ugly though, because you have to compare every single character of your string variable.
compute justdigits = (MAX(CHAR.SUBSTR(mystr,1,1), CHAR.SUBSTR(mystr,2,1), CHAR.SUBSTR(mystr,3,1), ..., CHAR.SUBSTR(mystr,n,1), "9")="9").
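The logic of the MAX comparison can be tried out in a small Python sketch. This is not SPSS: it relies on Python's code-point ordering of strings, and the helper names are made up for illustration. It also shows one caveat of the trick under that ordering: any character that sorts before "0" passes the test as well.

```python
def max_trick(ch):
    # Mirrors MAX(mychar, "9") = "9": true when ch does not sort after "9".
    return max(ch, "9") == "9"


def range_check(ch):
    # Stricter variant: ch must also not sort before "0".
    return "0" <= ch <= "9"


for ch in ("7", "b", " "):
    print(repr(ch), max_trick(ch), range_check(ch))
```

If the variable can contain blanks or punctuation, a two-sided comparison like the second helper may be the safer choice.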
If you try to turn a letter into a number then it becomes a missing value. Therefore, to test whether a character is a digit, you can do this:
if not missing(number(YourCharacter,f1)) .....
The same test can determine whether a string has only a number in it or not:
compute OnlyNumber=(not missing(number(YourString,f10))).
Note: using the number function on strings will produce warning messages, which you can of course ignore.

Under what conditions can [NSEvent characters] be a NSString of length greater than 1?

NSEvent has a characters property which is a NSString valid for key up/down events. Under what conditions can the string length be greater than 1?
The only condition I have been able to find till now is when the NSEvent corresponds to input from an IME (Input Method Editor).
Edit: I knew about the surrogate pair case, but it slipped my mind while asking this. I am more interested in the case where the number of graphemes (characters) itself is greater than 1.
Under what conditions can the string length be greater than 1?
When you have a keyboard/input method which can input any single character which requires a surrogate pair in UTF-16, e.g. a 𐀀 (Unicode Linear B Syllable B008 A), then the length will be 2. This is because length returns the number of 16-bit code units, not the number of characters.
You can also get this with programmatically-posted events. CGEventKeyboardSetUnicodeString() allows the caller to attach any arbitrary string to the key event.
Unicode code points outside the Basic Multilingual Plane are encoded as a sequence of two UTF-16 code units in Mac OS X. Try 𫝑.
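The difference between counting characters and counting 16-bit code units can be checked outside of Cocoa. The Python sketch below uses the Linear B syllable mentioned above (U+10000) and counts its UTF-16 code units by encoding it; it illustrates the encoding only and does not call any NSEvent or NSString API.

```python
# U+10000 LINEAR B SYLLABLE B008 A, the character mentioned above.
ch = "\U00010000"

code_points = len(ch)                           # Python counts code points
utf16_units = len(ch.encode("utf-16-le")) // 2  # 2 bytes per UTF-16 code unit

print(code_points, utf16_units)  # 1 2
```

A length measured in UTF-16 code units is therefore 2 for this single character.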

ruby/rails detect financial track data and return nil/empty string

I read through similar Stack Overflow questions to understand financial card track data.
I think the issue I am facing might be slightly different, or maybe I am just weak at regex.
We have a service that accidentally returns track data instead of the guest name.
My goal: every time I receive track data, I display an empty string (""); otherwise I return the guest name. (This is a temporary solution until we fix the root cause.)
This is my regular expression, but it doesn't seem to detect track data:
irb(main):043:0> guestname="%4234242xx12^TEST/GUEST L ^324532635645744646462"
irb(main):044:0> (/[(%[bB])(;)]\d{3,}.{9,}[(^.+^)(=)].+\?.{,2}/.match(guestname)) ? "" : guestname
=> "%4234242xx12^TEST/GUEST L ^324532635645744646462"
(Not real data)
Now, looking at the wiki for track data information I want to cover most cases, if not all:
https://en.wikipedia.org/wiki/Magnetic_stripe_card#Financial_cards
Could someone help with my regex? This is what I have:
/[(%[bB])(;)]\d{3,}.{9,}[(^.+^)(=)].+\?.{,2}/
Track 1, Format B:
Start sentinel — one character (generally '%')
Format code="B" — one character (alpha only)
Primary account number (PAN) — up to 19 characters. Usually, but not
always, matches the credit card number printed on the front of the
card.
Field Separator — one character (generally '^')
Name — 2 to 26 characters
Field Separator — one character (generally '^')
Expiration date — four characters in the form YYMM.
Service code — three characters
Discretionary data — may include Pin Verification Key Indicator (PVKI,
1 character), PIN Verification Value (PVV, 4 characters), Card
Verification Value or Card Verification Code (CVV or CVC, 3
characters)
End sentinel — one character (generally '?')
Longitudinal redundancy check (LRC) — it is one character and a
validity character calculated from other data on the track.
Track 2: This format was developed by the banking industry (ABA). This
track is written with a 5-bit scheme (4 data bits + 1 parity), which
allows for sixteen possible characters, which are the numbers 0-9,
plus the six characters : ; < = > ? . The selection of six
punctuation symbols may seem odd, but in fact the sixteen codes simply
map to the ASCII range 0x30 through 0x3f, which defines ten digit
characters plus those six symbols. The data format is as follows:
Start sentinel — one character (generally ';')
Primary account number (PAN) — up to 19 characters. Usually, but not
always, matches the credit card number printed on the front of the
card.
Separator — one char (generally '=')
Expiration date — four characters in the form YYMM.
Service code — three digits. The first digit specifies the interchange
rules, the second specifies authorisation processing and the third
specifies the range of services
Discretionary data — as in track one
End sentinel — one character (generally '?')
Longitudinal redundancy check (LRC) — it is one character and a
validity character calculated from other data on the track. Most
reader devices do not return this value when the card is swiped to the
presentation layer, and use it only to verify the input internally to
the reader.
Your example input string does not contain a format code after the first sentinel.
You are trying to parse an HTML-encoded version, which is weird.
So, I would start with HTML decoding, e.g. with Nokogiri:
▶ guestname="%4234242xx12^TEST/GUEST L ^324532635645744646462"
#⇒ "%4234242xx12^TEST/GUEST L ^324532635645744646462"
▶ parsed = Nokogiri::HTML.parse(guestname).text
#⇒ "%4234242xx12^TEST/GUEST L ^324532635645744646462"
OK, now we at least have a leading percent. Now let us ask ourselves: how many users have a guest name starting with a percent sign? I bet none. You might re-check yourself by running a query against your database. Since it is a temporary solution, I would definitely shut the perfectionism up and go with:
▶ parsed =~ /\A%/ ? '' : parsed
Hope it helps.
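For readers who want something closer to the field layout quoted in the question, here is a hedged Python sketch. The two patterns follow the Track 1 (format B) and Track 2 field lists as quoted; the function name is hypothetical, the fallback on a leading ';' is an added assumption (the answer only mentions '%'), and the card strings in the usage lines are made-up test values, not real data.

```python
import re

# Track 1, format B: %B PAN ^ NAME ^ YYMM service-code discretionary ?
TRACK1 = re.compile(r"%[Bb]\d{1,19}\^[^^]{2,26}\^\d{4}\d{3}[^?]*\?")
# Track 2: ; PAN = YYMM service-code discretionary ?
TRACK2 = re.compile(r";\d{1,19}=\d{4}\d{3}[^?]*\?")


def scrub_guest_name(value):
    """Return "" if value looks like track data, else return value unchanged."""
    if TRACK1.search(value) or TRACK2.search(value):
        return ""
    # Fallback in the spirit of the answer: a real guest name is unlikely
    # to start with a start sentinel. Treating ';' this way is an assumption.
    if value.startswith(("%", ";")):
        return ""
    return value


print(repr(scrub_guest_name("%B4111111111111111^DOE/JANE^25121010000?")))
print(repr(scrub_guest_name("%4234242xx12^TEST/GUEST L ^324532635645744646462")))
print(repr(scrub_guest_name("JANE DOE")))
```

The strict patterns catch well-formed tracks anywhere in the value, while the prefix check catches malformed input such as the sample from the question, which has no format code and no end sentinel.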

In Cobol, to test "null or empty" we use "NOT = SPACE [ AND/OR ] LOW-VALUE" ? Which is it?

I am now working in mainframe,
in some modules, to test
Not null or Empty
we see :
NOT = SPACE OR LOW-VALUE
The chief says that we should do :
NOT = SPACE AND LOW-VALUE
Which one is it ?
Thanks!
Chief is correct.
COBOL is supposed to read something like natural language (this turns out to be just
another bad joke).
Let's play with the following variables and values:
A = 1
B = 2
C = 3
An expression such as:
IF A NOT EQUAL B THEN...
Is fairly straightforward to understand. One is not equal to two so we will do
whatever follows the THEN. However,
IF A NOT EQUAL B AND A NOT EQUAL C THEN...
Is a whole lot harder to follow. Again one is not equal to two AND one is not
equal to three so we will do whatever follows the 'THEN'.
COBOL has a shorthand construct that IMHO should never be used. It confuses just about
everyone (including me from time to time). Shorthand expressions let you reduce the above to:
IF A NOT EQUAL B AND C THEN...
or, if you would like to apply De Morgan's rule:
IF NOT (A EQUAL B OR C) THEN...
My advice to you is to avoid NOT in expressions and NEVER use COBOL shorthand expressions.
What you really want is:
IF X = SPACE OR X = LOW-VALUE THEN...
    CONTINUE
ELSE
    do whatever...
END-IF
The above does nothing when the 'X' contains either spaces or low-values (nulls). It
is exactly the same as:
IF NOT (X = SPACE OR X = LOW-VALUE) THEN
    do whatever...
END-IF
Which can be transformed into:
IF X NOT = SPACE AND X NOT = LOW-VALUE THEN...
And finally...
IF X NOT = SPACE AND LOW-VALUE THEN...
My advice is to stick to longer, straightforward expressions that are simple to understand
in COBOL, and forget the shorthand crap.
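The difference between the two shorthand forms from the question can be checked with a small truth table. The Python sketch below models the field as a single character and expands each shorthand the way described above; the names are illustrative and "\x00" merely stands in for LOW-VALUE.

```python
SPACE, LOW_VALUE = " ", "\x00"


def or_form(x):
    # X NOT = SPACE OR LOW-VALUE expands to:
    # X NOT = SPACE OR X NOT = LOW-VALUE
    return x != SPACE or x != LOW_VALUE


def and_form(x):
    # X NOT = SPACE AND LOW-VALUE expands to:
    # X NOT = SPACE AND X NOT = LOW-VALUE
    return x != SPACE and x != LOW_VALUE


for x in (SPACE, LOW_VALUE, "A"):
    print(repr(x), or_form(x), and_form(x))
```

The OR form is true for every value, because no value can equal both SPACE and LOW-VALUE at once, so only the AND form actually tests for "not spaces and not low-values".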
In COBOL, there is no such thing as a Java null AND it is never "empty".
For example, take a field
05 FIELD-1 PIC X(5).
The field will always contain something.
MOVE LOW-VALUES TO FIELD-1.
Now it contains hexadecimal zeros: x'0000000000'
MOVE HIGH-VALUES TO FIELD-1.
Now it contains all binary ones: x'FFFFFFFFFF'
MOVE SPACES TO FIELD-1.
Now each byte is a space. x'4040404040'
Once you declare a field, it points to a certain area in memory. That memory area always contains something: even if you never modify it, it will still hold whatever garbage was there before the program was loaded, unless you initialize it.
05 FIELD-1 PIC X(6) VALUE 'BARUCH'.
It is worth noting that the value null is not always the same as low-value; this depends on the device architecture and the character set in use, as determined by the manufacturer. Mainframes can have an entirely different collating sequence (low-to-high character code and symbol order) and symbol set compared to a device running Linux or Windows, as you have no doubt seen by now. The shorthand used in Cobol for comparisons is sometimes used for boolean operations, like IF A GOTO PAR-5 and IF A OR C THEN ...., and can be combined with comparisons of two variables or of a variable and a literal value. The parser and compiler on different devices should deal with these situations in a standard (ANSI) way, but this is not always the case.
I agree with NealB. Keep it simple, avoid "short cuts", make it easy to understand without having to refer to the manual to check things out.
IF ( X EQUAL TO SPACE )
OR ( X EQUAL TO LOW-VALUES )
    CONTINUE
ELSE
    do whatever...
END-IF
However, why not put an 88-level on X and keep it really simple?
88 X-HAS-A-VALUE-INDICATING-NULL-OR-EMPTY VALUE SPACE, LOW-VALUES.
IF X-HAS-A-VALUE-INDICATING-NULL-OR-EMPTY
    CONTINUE
ELSE
    do whatever...
END-IF
Note, in Mainframe Cobol, NULL is very restricted in meaning, and is not the meaning that you are attributing to it, Tom. "Empty" only means something in a particular coder-generated context (it means nothing to Cobol as far as a field is concerned).
We don't have "strings". Therefore, we don't have "null strings" (a string of length one including string-terminator). We don't have strings, so a field always has a value, so it can never be "empty" other than as termed by the programmer.
Oguz, I think your post illustrates how complex something that is really simple can be made, and how that can lead to errors. Can you test your conditions, please?

Resources