How to edit an IBAN number in four character groups? - delphi

https://en.wikipedia.org/wiki/International_Bank_Account_Number#Practicalities
The IBAN should not contain spaces when transmitted electronically:
when printed it is expressed in groups of four characters separated by
a single space, the last group being of variable length as shown in
the example below:
A typical IBAN looks like this: GR16 0110 1250 0000 0001 2300 695 (taken from the above link).
I want to make it easier for users to enter IBAN numbers. Currently I use a TDBEdit to display the IBAN number and it is stored as characters (without the spaces) in the database.
I know that it is possible to format numbers using TNumericField.DisplayFormat, and there is also TMaskEdit, but neither is terribly useful for this purpose, as an IBAN is not a number and has different lengths in different countries.
How to edit an IBAN number in four character groups in a DB control?
PS: I'm not asking for the actual IBAN validation as I already have that figured out.

You can use the EditMask property of the IBAN field as this will also work for string fields. A suitable EditMask for IBAN may look like this:
">LL00 aaaa aaaa aaaa aaaa aaaa aaaa aaaa aa;0; "
The first character makes the edit field convert all characters to upper case. The next four entries require two alpha characters followed by two numeric ones. The blanks represent the requested gaps. The lowercase "a" allows an alphanumeric character but doesn't require it.
The "0" in the second part of the mask will strip any literals (the gap blanks) from the entry stored in the field.
The blank in the third part of the mask is the character displayed for positions that have not been entered yet, so they show as blanks.
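The effect the mask aims for (groups of four on screen, no spaces in storage) can also be illustrated outside Delphi. The following is a minimal Python sketch, not VCL code; the function names `strip_iban` and `format_iban` are hypothetical and chosen only for this illustration.

```python
def strip_iban(text: str) -> str:
    """Return the stored form: no whitespace, upper case."""
    return "".join(text.split()).upper()


def format_iban(text: str) -> str:
    """Return the printed form: groups of four separated by single spaces.

    The last group may be shorter than four characters.
    """
    compact = strip_iban(text)
    return " ".join(compact[i:i + 4] for i in range(0, len(compact), 4))


if __name__ == "__main__":
    print(format_iban("gr1601101250000000012300695"))
    print(strip_iban("GR16 0110 1250 0000 0001 2300 695"))
```

Because the grouping is derived from the compact form, it works for any IBAN length and does not need a per-country mask.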

Related

How do I get rid of trailing and embedded spaces in a string?

I am writing a program that converts national and international account numbers into IBAN numbers. To start, I need to form a string: Bank ID + Branch ID + Account Number + ISO Country Code without the trailing spaces that may be present in these fields. But not every account number has the same length, some account numbers have branch identifiers while others don't, so I will always end up with trailing spaces from these fields.
My working storage looks something like this:
01  Input-IBAN.
    05  BANK-ID         PIC N(10) VALUE "LOYD".
    05  BRANCH-ID       PIC N(10) VALUE " ".
    05  ACCOUNT-NR      PIC N(28) VALUE "012345678912 ".
    05  COUNTRY-CODE    PIC N(02) VALUE "GB".
01  Output-IBAN         PIC N(34).
I've put some values in there for the example; in reality it would depend on the input. The branch code is optional, hence me leaving it empty in the example.
I basically want to go from this input strung together:
"LOYD 012345678912 GB"
to this:
"LOYD012345678912GB"
Does anyone know a way to do this that does not result in performance issues? I have thought of using the FUNCTION REVERSE and then using an INSPECT for tallying leading spaces. But I've heard that's a slow way to do it. Does anyone have any ideas? And maybe an example on how to use said idea?
EDIT:
I've been informed that the elementary fields may contain embedded spaces.
I see now that you have embedded blanks in the data. Neither answer you have so far works, then. Gilbert's "squeezes out" the embedded blanks, mine would lose any data after the first blank in each field.
However, just to point out, I don't really believe you can have embedded blanks if you are in any way generating an "IBAN". For instance, https://en.wikipedia.org/wiki/International_Bank_Account_Number#Structure,
specifically:
The IBAN should not contain spaces when transmitted electronically.
When printed it is expressed in groups of four characters separated by
a single space, the last group being of variable length
If your source-data has embedded blanks, at the field level, then you need to refer that back up the line for a decision on what to do. Presuming that you receive the correct answer (no embedded blanks at the field level) then both existing answers are back on the table. You amend Gilbert's by (logically) changing LENGTH OF to FUNCTION LENGTH and dealing with any possibility of overflowing the output.
With the STRING you again have to deal with the possibility of overflowing the output.
Original answer based on the assumption of no embedded blanks.
I'll assume you don't have embedded blanks in the elementary items which make up your structure, as they are sourced by standard values which do not contain embedded blanks.
MOVE SPACE                  TO OUTPUT-IBAN
STRING BANK-ID
       BRANCH-ID
       ACCOUNT-NR
       COUNTRY-CODE
  DELIMITED BY SPACE
  INTO OUTPUT-IBAN
STRING only copies the values until it runs out of data to copy, so it is necessary to clear the OUTPUT-IBAN before the STRING.
Copying of the data from each source field will end when the first SPACE is encountered in each source field. If a field is entirely space, no data will be copied from it.
STRING will almost certainly cause a run-time routine to be executed and there will be some overhead for that. Gilbert LeBlanc's example may be slightly faster, but with STRING the compiler deals automatically with all the lengths of all the fields. Because you have National fields, ensure you use the figurative-constant SPACE (or SPACES, they are identical) not a literal value which you think contains a space " ". It does, but it doesn't contain a National space.
If the result of the STRING is greater than 34 characters, the excess characters will be quietly truncated. If you want to deal with that, STRING has an ON OVERFLOW phrase, where you specify what you want done in that case. If using ON OVERFLOW, or indeed NOT ON OVERFLOW you should use the END-STRING scope-terminator. A full-stop/period will terminate the STRING statement as well, but when used like that it can never, with ON/NOT ON, be used within a conditional statement of any type.
Don't use full-stops/periods to terminate scopes.
COBOL doesn't have "strings". You cannot get rid of trailing spaces in fixed-length fields, unless the data fills the field. Your output IBAN will always contain trailing spaces when the data is short.
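For readers less familiar with COBOL, the behaviour described above can be modelled in a few lines of Python. This is only a sketch of the semantics as described in this answer (copy each field up to its first space, write into a fixed-length, space-filled output, truncate on overflow); the function name `string_delimited_by_space` is hypothetical and the model ignores National versus alphanumeric storage.

```python
def string_delimited_by_space(fields, out_len=34):
    """Model STRING ... DELIMITED BY SPACE INTO a fixed-length field.

    Returns (output, overflow). The output is always out_len characters
    long and space-padded on the right, like a pre-cleared COBOL field.
    overflow is True when the joined data did not fit and was truncated.
    """
    pieces = []
    for field in fields:
        # Copying stops at the first space; an all-space field adds nothing.
        pieces.append(field.split(" ", 1)[0])
    joined = "".join(pieces)
    overflow = len(joined) > out_len
    return joined[:out_len].ljust(out_len), overflow


if __name__ == "__main__":
    fields = ["LOYD".ljust(10), " " * 10, "012345678912".ljust(28), "GB"]
    output, overflow = string_delimited_by_space(fields)
    print(repr(output), overflow)
```

Note that the result still carries trailing spaces up to 34 characters, which mirrors the point that a fixed-length output field cannot lose them.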
If you were to actually have embedded blanks at the field level:
Firstly, if you want to "squeeze out" embedded blanks so that they don't appear in the output, I can't think of a simpler way (using COBOL) than Gilbert's.
Otherwise, if you want to preserve embedded blanks, you have no reasonable choice other than to count the trailing blanks so that you can calculate the length of the actual data in each field.
COBOL implementations do have Language Extensions. It is unclear which COBOL compiler you are using. If it happens to be AcuCOBOL (now from Micro Focus) then INSPECT supports TRAILING, and you can count trailing blanks that way. GnuCOBOL also supports TRAILING on INSPECT and in addition has a useful intrinsic FUNCTION, TRIM, which you could use to do exactly what you want (trimming trailing blanks) in a STRING statement.
move space                  to your-output-field
string function
        trim
         ( your-first-national-source
           trailing )
       function
        trim
         ( your-second-national-source
           trailing )
       function
        trim
         ( your-third-national-source
           trailing )
       ...
  delimited by size
  into your-output-field
Note that other than the PIC N in your definitions, the code is the same as if using alphanumeric fields.
However, for Standard COBOL 85 code...
You mentioned using FUNCTION REVERSE followed by INSPECT. INSPECT can count leading spaces, but not, by Standard, trailing spaces. So you can reverse the bytes in a field, and then count the leading spaces.
You have National data (PIC N). A difference with that is that it is not bytes you need to count, but characters, which are made up of two bytes. Since the compiler knows you are using PIC N fields, there is only one thing to trip you - the Special Register, LENGTH OF, counts bytes, you need FUNCTION LENGTH to count characters.
National data is UTF-16. Which happens to mean the two bytes for each character happen to be "ASCII", when one of the bytes happens to represent a displayable character. That doesn't matter either, running on z/OS, an EBCDIC machine, as the compiler will do necessary conversions automatically for literals or alpha-numeric data-items.
MOVE ZERO                   TO a-count-for-each-field
INSPECT FUNCTION
         REVERSE
          ( each-source-field )
  TALLYING a-count-for-each-field
   FOR LEADING SPACE
After doing one of those for each field, you could use reference-modification.
How to use reference-modification for this?
Firstly, you have to be careful. Secondly you don't.
Secondly first:
MOVE SPACE                  TO output-field
STRING field-1 ( 1 : length-1 )
       field-2 ( 1 : length-2 )
  DELIMITED BY SIZE
  INTO output-field
Again deal with overflow if possible/necessary.
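The two steps above (count trailing spaces by reversing the field, then copy only the data part of each field) can be sketched in Python to make the arithmetic explicit. This is an illustrative model rather than COBOL; the names `trailing_spaces` and `join_preserving_embedded` are hypothetical.

```python
def trailing_spaces(field: str) -> int:
    """Count trailing spaces by scanning the reversed field for leading spaces."""
    count = 0
    for ch in reversed(field):
        if ch != " ":
            break
        count += 1
    return count


def join_preserving_embedded(fields, out_len=34):
    """Concatenate each field without its trailing spaces, keeping embedded ones.

    The result is truncated or space-padded to out_len, like a fixed field.
    """
    parts = []
    for field in fields:
        data_len = len(field) - trailing_spaces(field)
        # Equivalent in spirit to field ( 1 : data_len ); data_len may be 0 here.
        parts.append(field[:data_len])
    return "".join(parts)[:out_len].ljust(out_len)


if __name__ == "__main__":
    print(repr(join_preserving_embedded(["AB C   ", "    ", "GB"], out_len=10)))
```

In Python a zero-length slice is harmless, so the all-space field simply contributes nothing; in COBOL that case corresponds to a zero-length reference-modification and needs separate handling.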
It is also possible with plain MOVEs and reference-modification, as in this answer, https://stackoverflow.com/a/31941665/1927206, whose question is close to a duplicate of your question.
Why do you have to be careful? Again, from the answer linked previously, theoretically a reference-modification can't have a zero length.
In practice, it will probably work. COBOL programmers generally seem to be so keen on reference-modification that they don't bother to read about it fully, so they don't worry about a zero length not being Standard, and don't notice that it is non-Standard, because it "works". For now. Until the compiler changes.
If you are using Enterprise COBOL V5.2 or above (possibly V5.1 as well, I just haven't checked) then you can be sure, by compiler option, if you want, that a zero-length reference-modification works as expected.
Some other ways to achieve your task, if embedded blanks can exist and can be significant in the output, are covered in that answer. With National, just always watch to use FUNCTION LENGTH (which counts characters), not LENGTH OF (which counts bytes). Usually LENGTH OF and FUNCTION LENGTH give the same answer. For multi-byte characters, they do not.
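The difference between counting bytes and counting characters is easy to see with UTF-16 data in any language. Here is a small Python sketch; `lengths` is a hypothetical helper, and it assumes the text contains only characters that occupy a single two-byte UTF-16 code unit, as the digits and letters of an IBAN do.

```python
def lengths(text: str):
    """Return (character_count, byte_count) for text stored as UTF-16.

    character_count plays the role of FUNCTION LENGTH,
    byte_count the role of LENGTH OF for a PIC N field.
    """
    encoded = text.encode("utf-16-be")  # two bytes per character here
    return len(text), len(encoded)


if __name__ == "__main__":
    print(lengths("LOYD".ljust(10)))
```

Using the byte count as a loop limit or a reference-modification length on a National field would run past the end of the data, which is why the character count is the one to use.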
I have no way to verify this COBOL. Let me know if this works.
77  SUB1                    PIC S9(4) COMP.
77  SUB2                    PIC S9(4) COMP.
MOVE 1 TO SUB2
PERFORM VARYING SUB1 FROM 1 BY 1
        UNTIL SUB1 > LENGTH OF INPUT-IBAN
    IF INPUT-IBAN(SUB1:1) IS NOT EQUAL TO SPACE
        MOVE INPUT-IBAN(SUB1:1) TO OUTPUT-IBAN(SUB2:1)
        ADD +1 TO SUB2
    END-IF
END-PERFORM.

I want the first 6 characters of my EditText to allow only letters and the following characters to allow only digits

I want the first 6 characters of my EditText to allow only letters and the following characters to allow only digits, and the whole EditText should not allow special characters.
Once you have the text the user typed, you can run it through a regular expression. Java has a class, java.util.regex.Pattern, that can tell you whether a string fits a pattern (the matches method).
Your pattern would be something like:
[\p{Alpha}][\p{Alpha}][\p{Alpha}][\p{Alpha}][\p{Alpha}][\p{Alpha}][\d]+
This regular expression says "six alphabet characters followed by one or more digits."
That's the best I can offer. Your question is a bit vague. Does the string the user enters have to have six letters? Or is it one to six letters? If it's the latter then my expression above is insufficient. And what about the digits? Is there a minimum number of digits required?
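As an illustration of the same check in a compact form, here is a Python sketch. It assumes exactly six ASCII letters followed by at least one ASCII digit, which is one reading of the question; the function name `is_valid_code` is hypothetical. Python's `re` module does not support `\p{Alpha}`, so an explicit character class is used instead.

```python
import re

# Exactly six ASCII letters, then one or more ASCII digits, nothing else.
CODE_RE = re.compile(r"[A-Za-z]{6}[0-9]+")


def is_valid_code(text: str) -> bool:
    """Return True when the whole string matches the pattern."""
    return CODE_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("ABCDEF123", "ABCDE1234", "ABC-EF123"):
        print(sample, is_valid_code(sample))
```

If one to six letters should be accepted, `{6}` would become `{1,6}`; a minimum digit count can be expressed the same way on the digit class.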

How many alphanumeric characters can be embedded in a QR Code?

As per the Wiki below
QRC Storage Data
the maximum number of alphanumeric characters that can be stored is 4,296. But while trying it out, I'm unable to get beyond approximately 2220 characters at an error correction level of L.
Are alphanumeric characters not counted the same way as numeric characters? Do "123", "XYZ" and "AB%" not all contain three characters?
The max is 4296 if you are in alphanumeric mode. To do that you can only use the characters listed on the link you sent. Your examples are certainly all 3 characters, and fit the alphanumeric character set.
It is probably a problem with the encoder, or some other intermediate limitation, like the length of a URL you are sending to the encoder.
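One quick thing to rule out is input that falls outside the alphanumeric character set, since an encoder would then have to use a different mode with a lower capacity. A minimal Python sketch of such a check follows; the set is the 45-character QR alphanumeric set (digits, upper-case letters, space and `$%*+-./:`), and the function name `fits_alphanumeric_mode` is hypothetical.

```python
# The 45 characters permitted in QR alphanumeric mode.
QR_ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")


def fits_alphanumeric_mode(text: str) -> bool:
    """Return True when every character belongs to the alphanumeric set."""
    return all(ch in QR_ALPHANUMERIC for ch in text)


if __name__ == "__main__":
    for sample in ("123", "XYZ", "AB%", "abc"):
        print(sample, fits_alphanumeric_mode(sample))
```

Lower-case letters are not part of the set, so a payload containing them would not qualify for the 4,296 figure.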

How many characters long can numeric EDIFACT data elements be?

In EDIFACT there are numeric data elements, specified e.g. as format n..5 -- we want to store those fields in a database table (with alphanumeric fields, so we can check them). How long must the db-fields be, so we can for sure store every possible valid value? I know it's at least two additional chars (for decimal point (or comma or whatever) and possibly a leading minus sign).
We are building our tables after the UN/EDIFACT standard we use in our message, not the specific guide involved, so we want to be able to store everything matching that standard. But documentation on the numeric data elements isn't really straightforward (or at least I could not find that part).
Thanks for any help
I finally found the information on the UNECE web site in the documentation on UN/EDIFACT rules Part 4. UN/EDIFACT rules Chapter 2.2 Syntax Rules. They don't say it directly, but when you put all the parts together, you get it. See TOC-entry 10: REPRESENTATION OF NUMERIC DATA ELEMENT VALUES.
Here's what it basically says:
10.1: Decimal Mark
Decimal mark must be transmitted (if needed) as specified in UNA (comma or point, but always one character). It shall not be counted as a character of the value when computing the maximum field length of a data element.
10.2: Triad Separator
Triad separators shall not be used in interchange.
10.3: Sign
[...] If a value is to be indicated to be negative, it shall in transmission be immediately preceded by a minus sign e.g. -112. The minus sign shall not be counted as a character of the value when computing the maximum field length of a data element. However, allowance has to be made for the character in transmission and reception.
To put it together:
Other than the digits themselves there are only two (optional) chars allowed in a numeric field: the decimal separator and a minus sign (no blanks are permitted in between any of the characters). These two extra chars are not counted against the maximum length of the value in the field.
So the maximum number of characters in a numeric field is the maximal length of the numeric field plus 2. If you want your database to be able to store every syntactically correct value transmitted in a field specified as n..17, your column would have to be 19 chars long (something like varchar(19)). Every EDIFACT-message that has a value longer than 19 chars in a field specified as n..17 does not need to be stored in the DB for semantic checking, because it is already syntactically wrong and can be rejected.
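The length rule can be written down as a small check. The Python sketch below is a simplification under stated assumptions: it requires at least one digit on each side of a decimal mark, takes the decimal mark as a parameter (standing in for the UNA setting), and uses hypothetical names `max_transmitted_length` and `is_valid_numeric`.

```python
import re


def max_transmitted_length(max_digits: int) -> int:
    """Digits plus one optional minus sign plus one optional decimal mark."""
    return max_digits + 2


def is_valid_numeric(value: str, max_digits: int, decimal_mark: str = ".") -> bool:
    """Check a transmitted value against an n..max_digits specification.

    The sign and the decimal mark are allowed but are not counted
    against max_digits. No blanks or triad separators are accepted.
    """
    pattern = r"-?[0-9]+(?:%s[0-9]+)?" % re.escape(decimal_mark)
    if re.fullmatch(pattern, value) is None:
        return False
    digit_count = sum(ch.isdigit() for ch in value)
    return digit_count <= max_digits


if __name__ == "__main__":
    print(max_transmitted_length(17))
    print(is_valid_numeric("-12345678901234.567", 17))
```

A value that fails this check can be rejected before it ever reaches the database column sized with `max_transmitted_length`.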
I used EDI Notepad from Liaison to solve a similar challenge. https://liaison.com/products/integrate/edi/edi-notepad
I recommend anyone looking at EDI to at least get their free (express) version of EDI Notepad.
The "high end" version of their product (EDI Notepad Productivity Suite) comes with a "Dictionary Viewer" tool from which you can export the min/max lengths of the elements, as well as their types. You can export the document to HTML from the Viewer tool. It would also handle ANSI X12.

When to use the terms "delimiter," "terminator," and "separator"

What are the semantics behind usage of the words "delimiter," "terminator," and "separator"? For example, I believe that a terminator would occur after each token and a separator between each token. Is a delimiter the same as either of these, or are they simply forms of a delimiter?
SO has all three as tags, yet they are not synonyms of each other. Is this because they are all truly different?
A delimiter denotes the limits of something, where it starts and where it ends. For example:
"this is a string"
has two delimiters, both of which happen to be the double-quote character. The delimiters indicate what's part of the thing, and what is not.
A separator distinguishes two things in a sequence:
one, two
1\t2
code(); // comment
The role of a separator is to demarcate two distinct entities so that they can be distinguished. (Note that I say "two" because in computer science we're generally talking about processing a linear sequence of characters).
A terminator indicates the end of a sequence. In a CSV, you could think of the newline as terminating the record on one line, or as separating one record from the next.
Token boundaries are often denoted by a change in syntax classes:
foo()
would likely be tokenised as word(foo), lparen, rparen - there aren't any explicit delimiters between the tokens, but a tokenizer would recognise the change in grammar classes between alpha and punctuation characters.
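A tiny tokenizer makes the point concrete: no delimiter characters are consumed between tokens, and the boundaries fall where the character class changes. This Python sketch is illustrative only; the token names follow the sentence above and the function `tokenize` is hypothetical.

```python
import re

# One named group per token class; the boundaries come from the classes
# themselves, not from any explicit delimiter between tokens.
TOKEN_RE = re.compile(r"(?P<word>[A-Za-z_]\w*)|(?P<lparen>\()|(?P<rparen>\))")


def tokenize(text: str):
    """Split text into (kind, lexeme) pairs, failing on unknown characters."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("foo()"))
```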
The categories aren't completely distinct. For example:
[red, green, blue]
could (depending on your syntax) be a list of three items; the brackets delimit the list and the right-bracket terminates the list and marks the end of the blue token.
As for SO's use of those terms as tags, they're just that: tags to indicate the topic of a question. There isn't a single unified controlled vocabulary for tags; anyone with enough karma can add a new tag. Enough differences in terminology exist that you could never have a single controlled tag vocabulary across all of the topics that SO covers.
Technically a delimiter goes between things, perhaps in order to tell you where one field ends and another begins, such as in a comma-separated-value (CSV) file.
A terminator goes at the end of something, terminating the line/input/whatever.
A separator can be a delimiter or anything else that separates things. Consider the spaces between words in the English language for example.
You could argue that a newline character is a line terminator, a delimiter of lines or something that separates two lines. For this reason there are a few different newline-type characters in the Unicode specification.
A delimiter is one or two markers that show the start and end of something. They're needed because we don't know how long that 'something' will be. We can have either: 1. a single delimiter, or 2. a pair of pair-delimiters
[a, b, c, d, e] each comma (,) is a single delimiter. The left and right brackets, ([, ]) are pair-delimiters.
"hello", the two quote symbols (") are pair-delimiters
A separator is a synonym of a "delimiter", but from my experience it usually refers to field delimiters. A field delimiter acts as a divider between one field and the one following it, which is why it can be thought of as "separating" them.
<file1>␜<file2>␜<file3>, the file separator character (␜), despite explicitly having "separator" in its name, is both a delimiter and a separator
A terminator marks the end of a group of things, again needed because we don't know how long it is.
abdefa\0, here the null character \0 is a terminator that tells us the string has ended.
foo\n, here the newline character \n is a terminator that tells us the line has ended.
The terms delimiter and separator originate from the classical idea of storage as conceptually comprising files, records, and fields (a file has many records, a record has many fields). In this context, a single delimiter and pair-delimiters might be called record delimiters and field delimiters. Because of the historical significance of the files-records-fields taxonomy, these terms have more widespread usage (see the Wikipedia page for Delimiter).
Below are two files, each with three records with each record having four fields:
martin,rodgers,33,28000\n
timothy,byrd,22,25000\n
marion,summers,35,37000\n
===
lucille,rowe,28,33000\n
whitney,turner,24,19000\n
fernando,simpson,35,40900\n
Here, , and \n are, as we know, single delimiters, but they might also be called field delimiters and record delimiters respectively.
For complex nested structures, a terminator can also be a delimiter/separator (they're not mutually exclusive definitions). From the previous example, the === marker from inside a file could be considered a terminator (it's the end of the file). But when we look at many files, the === acts like a delimiter/separator.
Consider lines in a UNIX file
This is line 1\n
This is line 2\n
This is line 3\n
The newlines are both terminators (they tell us where the string ends) and are delimiters (they tell us where each line begins and ends). From Wikipedia:
Two ways to view newlines, both of which are self-consistent, are that newlines either separate lines or that they terminate lines.
Really you'll only need to say "terminator" when you're talking about one individual item, (just one string 1234\0, just one line abcd\n, etc.) -- and it'll be unclear whether the terminator in this context could also be a delimiter in a more complex parent structure.
This response is in context of CSV because all of the provided answers focus on English language instead.
Delimiters are all elements mentioned in the given CSV specification that describe the boundaries of stuff, separator is a common name for field delimiters, terminator is a common name for record delimiters.
Delimiter is a part of CSV format specification, it defines boundaries and doesn't have to be a printable character.
Terminators, separators and field qualifiers are delimiters but are not necessary to specify a CSV format, e.g. 10 columns field delimiter and 30 columns record delimiter mean each 30 columns are one record and each 10 columns are one field (usually padded with white space). In other words CSV format without separators has a constant field and record length, e.g.:
will smith 1 chris rock 0
Terminator is a delimiter that marks the end of a single CSV record and is usually represented either by Line Feed (LF), a Carriage Return (CR) or a combination of both (e.g. CRLF), e.g.:
will smith 1
chris rock 0
Separator is a delimiter that marks the division between CSV fields and is most often represented by a comma (or a semicolon), it has been introduced to store dynamic length values, e.g. two comma separated records in CSV format with CRLF terminator after 1 and 0:
will,smith,1
chris,rock,0
Field qualifier is a delimiter usually used in pairs instead of escape sequence. It is a printable character that isn't allowed in the field value (unless given CSV format specification provides the escape sequence) and marks the beginning and the end of a field, it was introduced to store values containing separators, e.g. this CSV has 2 records with 3 fields each but 3rd field value can contain a semicolon that otherwise acts as a fields separator:
will;smith;"rich;famous;slaps people"
chris;rock;"rich;famous;gets slapped"
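To see the qualifier at work, the two records above can be read with Python's standard `csv` module, with the semicolon configured as the separator and the double quote as the field qualifier. This is a minimal sketch; the wrapper name `parse_qualified` is hypothetical.

```python
import csv
import io


def parse_qualified(data: str):
    """Parse semicolon-separated records whose fields may be quoted."""
    reader = csv.reader(io.StringIO(data), delimiter=";", quotechar='"')
    return list(reader)


if __name__ == "__main__":
    sample = (
        'will;smith;"rich;famous;slaps people"\n'
        'chris;rock;"rich;famous;gets slapped"\n'
    )
    for record in parse_qualified(sample):
        print(record)
```

Each record comes back with three fields, and the semicolons inside the quoted third field are kept as data rather than treated as separators.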
Escape sequence is a character (or a set of characters) that marks anything that follows the escape sequence as non-significant and therefore as a part of the field value (e.g. backslash might specify the immediately following separator as a part of the value). This sequence can escape one or multiple characters, e.g. CSV with \ as a 1 character escape sequence:
will;smith;rich\;famous\;slaps people 100\\100% of time
chris;rock;rich\;famous\;slaps people 0\\100% of time
Delimiter
There are a couple of senses for delimiter:
As the space used in sentences (frontier).
A delimiter is like a frontier, it exists between countries.
In that sense, there must be two countries to have a frontier.
A space usually exists between words, but not at the end. The space delimits words but does not terminate sentences (collection of words). The sentence:
This is a short sentence.
Has four spaces, they act as word delimiters. There is no ending space.
In fact, there are two additional delimiters usually not named: The start and end of the sentence. Like the ^ and $ used in regular expressions to mark the start and end of a string of text.
And, in human language, there are punctuation marks (dot, comma, semicolon, colon, etc.) that serve also as word delimiters (additionally to spaces)
As used in quotes (boundary).
A sentence like:
“This is a short sentence.”
Is delimited (start and end) by the double quotes (“”). In this sense it is like "balanced delimiters" (Balanced Brackets in Wikipedia).
Some may argue the frontier and boundary are essentially the same, and, under some conditions they actually are correct.
Separator
Is exactly the same as the first sense (above) of a delimiter (a frontier).
So, a separator is a synonym of delimiter in many computer uses.
Terminator
Demarcates the end of an individual "field".
Like the newlines in a Unix text file. Each line is terminated by a NewLine (\n).
In a proper Unix text file all lines are terminated (even the last one).
Like paragraphs are terminated by a newline in human language.
Or, more strictly, as the NUL (\0) is the terminator of a C string:
A string is defined as a contiguous sequence of code units terminated by the first zero code unit (often called the NUL code unit).
So, a terminator character is also a delimiter but must also appear at the end.
Tags
Stackoverflow has tags only for delimiters and separators
delimiter: A delimiter is a sequence of one or more characters used to specify the boundary between separate, independent regions in plain text or other data streams.
separator: A character that separates parts of a string.
The terminator tag applies only to a terminal emulator:
terminator: Terminator is a GPL terminal emulator.
And, yes, delimiter and separator are many times equivalent
except for parentheses, braces, square brackets and similar balanced delimiters.
Interesting question and answers. To summarize, 1) delimiter marks the "limits" of something, i.e. beginning and/or end; 2) terminator is just a special term for "end delimiter"; 3) separator entails there are items on both sides of it (unlike delimiter).
Best example I can think of for a start delimiter is the start-comment markers in programming languages ("#", "//", etc.).
Best example I can think of for a terminator (end delimiter) is the newline character in Unix. It's a misnomer -- it always terminates a (possibly empty) line but doesn't always start a new line, i.e. when it is the last character in a file. Maybe a better common example is the simple period for sentences.
Best example I can think of for a separator is the simple comma. Note that comma never appears in English without text both before and after it.
Interesting to note that none of these is necessarily limited to single-character. In fact awk (or maybe only gawk?) in Unix allows FS (field separator) to be any regexp.
Also, although "any non-zero amount of whitespace" is considered a "word delimiter" in e.g. the wc command, there are also zero-width "word boundary" specifiers in regexps (e.g. \b). Interesting to ponder whether such zero-width items/boundaries could be considered "delimiters" as well. I tend to think not (too much of a stretch).
Terminators are separators when you start with empty. A;B;C; is actually A;B;C;empty.
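This observation is visible in how a plain split behaves. The Python sketch below contrasts the two readings of the same text; the function names `split_separated` and `split_terminated` are hypothetical.

```python
def split_separated(text: str, sep: str = ";"):
    """Treat sep as a separator: n separators always yield n + 1 items."""
    return text.split(sep)


def split_terminated(text: str, term: str = ";"):
    """Treat term as a terminator: every item must be followed by term."""
    if text == "":
        return []  # no terminators means no items
    if not text.endswith(term):
        raise ValueError("last item is not terminated")
    return text[:-len(term)].split(term)


if __name__ == "__main__":
    print(split_separated("A;B;C;"))
    print(split_terminated("A;B;C;"))
```

Read as separators, the three semicolons produce four items with an empty one at the end; read as terminators, they produce exactly three.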
Just like the English language, there is the technically correct answer, and the generally used answer, and it is probably relevant to isolate to the programming usage of the term definitions being sought.
The industry has long used the phrase 'Comma Delimited' file to mean:
FirstRowFirstValue,FirstRowSecondValue,FirstRowThirdValue
SecondRowFirstValue,SecondRowSecondValue,SecondRowThirdValue
TECHNICALLY, this is a Comma 'SEPARATED' list.
TECHNICALLY, THIS is a Comma 'DELIMITED' list.
,FirstRowFirstValue,FirstRowSecondValue,FirstRowThirdValue,
,SecondRowFirstValue,SecondRowSecondValue,SecondRowThirdValue,
or this:
,FirstRowFirstValue,,FirstRowSecondValue,,FirstRowThirdValue,
,SecondRowFirstValue,,SecondRowSecondValue,,SecondRowThirdValue,
and nobody does that. Ever.
And the industry standard is to use 'TEXT QUALIFIER' for the TECHNICAL definition of a 'DELIMITER' where (") is the 'TEXT QUALIFIER' and (,) is called the 'DELIMITER'.
FirstRowFirstValue,"First Row Second Value",FirstRowThirdValue
SecondRowFirstValue,SecondRowSecondValue,SecondRowThirdValue
Adding to the answers here already, I've used the term notator.
Annotation is a super set of notation.
A notator is the super set of delimiter.
A delimiter is the super set of terminator and separator.
Annotation is all notation and markup used in a particular document. For example, a "TODO List" document must be a line separated list of strings.
Notation is markup used to denote specific meaning. For example, "strings are in quotes" is a notation.
A delimiter is the character or set of characters used to denote a notation. For example, the character quote is the delimiter for strings.
A terminator is an ending delimiter and a prefix is a starting delimiter. For the "TODO List" document, quote may be used as the prefix and terminating delimiter.
A separator is a delimiter that separates two things. For example, "new line" is the separator for each "TODO List" item. In this example, "new line" is also a terminator; a new line may be used to terminate each line. A separator also being a terminator is typical, but not guaranteed to always be the case.
Delimiters can also be "positional". A positionally delimited example is a column delimited mainframe flat file.
"word 1", "word 2" \NULL
The words are delimited by quotes,
separated by the comma,
and the whole thing is terminated by \NULL.

Resources