Convert WideString to String - delphi

How can I replace the empty widestring characters from a string?
e.g:
'H e l l o' convert to 'Hello'
I read an text blob field from a DB, and want to store it in another table. When I read it, I get the extra spaces, when I store it the extra spaces remains, and cannot be read correctly at the next query.
DXE, Firebird 2.5
UPDATE:
I use the IBQuery -> DataSetProvider -> ClientDataSet without desin-time created fields.
It seams, that the IBQuery retrieves the data in that wrong format.
Current Write Code:
blobStream := TStringStream.Create;
...
blobStream.WriteString(blobText); //blobText = 'H e l l o';
ibsql.ParamByName('ABLOBCOL').LoadFromStream(blobStream);
...
ibsql.ExecQuery;
...
In the FB database is 'H e l l o' stored, but it must be 'Hello'. Since it seems to be a bug in IBQuery, I need a way to convert that string.

First of all, I'm not going to attempt to describe how to remove every other character from a string. Whilst that might appear to solve your problem it merely papers thinly over the gaping cracks. What you have here is a classic text encoding mismatch. The real solution to your problem will involve fixing the mismatch.
I suspect the problem arises in code that you have not shown. As I understand your question now, you have a string variable blobText that contains incorrectly encoded text. But the code in the question takes blobText as input, and so the damage has already been done by the time we reach the code in the question. The key to solving this is the code that puts duff data into blobText.
You need to find the code which assigns to blobText and sort out the encoding problem there. It looks like you have taken UTF-16 encoded text and interpreted it as if it had an 8 bit encoding. I could speculate as to how that would happen, but it would be better for you to look at the actual code, the code that assigns to blobText. If you cannot work it out, please do post an update to the question.
I'm pretty confident that there are no bugs in the database libraries and that this is just an encoding mismatch.

Related

Number notation "SK"

I use an ODBC table handler to read data from Excel and CSV files into an AMPL model. But the thing I encountered probably doesn't have much to do with the precise programs and programming language I use.
Among the data are two specific types of strings: three-digit alphabetic and six-digit alphanumeric.
When the three-digit alphabetic type includes a NAN string, AMPL throws an error. As I found out, the reason is that it understands NAN as "NaN" (not a number). It cannot use this as an index.
The six-digit alphanumeric type sometimes include strings like 3E1234. This seems to be a problem because AMPL (or the handler) understands this as a number in scientific notation. So it reads 3*10^1234, which is handled as infinity. So when there is one 3E1234 entry and one 3E1235 entry, it sees them both as infinity.
I understand these two. And although they are annoying, I can work with that. Now I encountered that a string SK1234 is parsed as the number 1234. I have learned a bit of programming in college, but I don't have any idea why this happens. Is the prefix SK anything special?
EDIT: Here is an example that reproduces the error:
The model file:
set INDEX;
param value;
The "run" file:
table Table1 IN "tableproxy" "odbc" "DSN=NDE" "Test.csv": INDEX <- [Index], value ~ Value;
read table Table1;
NDE is a user DSN that uses the Microsoft Text Driver in the appropriate folder.
And the CSV file:
Index,Value
SK1202,1
SK1445,2
SK0124,3
SK7896,4
SK1,5
AB1234,6
After running all this code, I type display INDEX and get
set INDEX := 1202 1445 124 7896 1 Missing;
So the field Index is treated as a numeric field with the first five entries converted to a number. The last entry cannot be converted so it is treated as Missing.
The DSN's setting is that it sets the type according to the first 25 lines. For some reason, it understands the SK... entries as numbers and therefore reads all as numbers.
For the Text ODBC driver to detect column type correctly, values should be quoted:
Index,Value
'SK1202',1
'SK1445',2
'SK0124',3
'SK7896',4
'SK1',5
'AB1234',6

What has happened to 'tick' in ANS Forth?

As I remembered 'tick' from FIG-Forth, it could be used without abortion when a word wasn't in the wordlist:
' the_word
gave a reference to the word if it was in the word-list and gave 'false' otherwise.
Is it possible to construct something like that in ANS Forth to be used with [if], [then] and [else]?
I guess something like this:
: tick ( a u -- xt|f ) bl word find 0= if drop 0 then ;
The FIG-Forth document says:
Leaves the parameter field address of dictionary word nnnn. As a
compiler directive, executes in a colon-definition to compile the
address as a literal. If the word is not found after a search of
CONTEXT and CURRENT, an appropriate error message is given.
Although it is entirely possible the version of FIG-Forth you where using did not abide by the standard, and returned false.

URL Escape in Uppercase

I have a requirement to escape a string with url information but also some special characters such as '<'.
Using cl_http_utility=>escape_url this translates to '%3c'. However due to our backend webserver, it is unable to recognize this as special character and takes the value literally. What it does recognize as special character is '%3C' (C is upper case). Also if one checks http://www.w3schools.com/tags/ref_urlencode.asp it shows the value with all caps as the proper encoding.
I guess my question is is there an alternative to cl_http_utility=>escape_url that does essentially the same thing except outputs the value in upper case?
Thanks.
Use the string function.
l_escaped = escape( val = l_unescaped
format = cl_abap_format=>e_url ).
Other possible formats are e_url_full, e_uri, e_uri_full, and a bunch of xml/json stuff too. The string function escape is documented pretty well, demo programs and all.

How can we eliminate junk value in field?

I have some csv record which are variable in length , for example:
0005464560,45667759,ZAMTR,!To ACC 12345678,DR,79.85
0006786565,34567899,ZAMTR,!To ACC 26575443,DR,1000
I need to seperate each of these fields and I need the last field which should be a money.
However, as I read the file, and unstring the record into fields, I found that the last field contain junk value at the end of itself. The amount(money) field should be 8 characters, 5 digit at the front, 1 dot, 2 digit at the end. The values from the input could be any value such as 13.5, 1000 and 354.23 .
"FILE SECTION"
FD INPUT_FILE.
01 INPUT_REC PIC X(66).
"WORKING STORAGE SECTion"
01 WS_INPUT_REC PIC X(66).
01 WS_AMOUNT_NUM PIC 9(5).9(2).
01 WS_AMOUNT_TXT PIC X(8).
"MAIN SECTION"
UNSTRING INPUT_REC DELIMITED BY ","
INTO WS_ID_1, WS_ID_2, WS_CODE, WS_DESCRIPTION, WS_FLAG, WS_AMOUNT_TXT
MOVE WS_AMOUNT_TXT(1:8) TO WS_AMOUNT_NUM(1:8)
DISPLAY WS_AMOUNT_NUM
From the display, the value is rather normal: 345.23, 1000, just as what are, however, after I wrote the field into a file, here is what they become:
79.85^M^#^#
137.35^M^#
I have inspect the field WS_AMOUNT_NUM, which came from the field WS_AMOUNT_TXT, and found that ^# is a kind of LOW-VALUE. However, I cannot find what is ^M, it is not a space, not a high-value.
I am guessing, but it looks like you may be reading variable length records from a file into a fixed length
COBOL record. The junk
at the end of the COBOL record is giving you some grief. Hard to say how consistent that junk is going
to be from one read to the next (data beyond the bounds of actual input record length are technically
undefined). That junk ends up
being included in WS_AMOUNT_TXT after the UNSTRING
There are a number of ways to solve this problem. The suggestion I am giving you here may not
be optimal, but it is simple and should get the job done.
The last INTO field, WS_AMOUNT_TXT, in your UNSTRING statement is the one that receives all of the trailing
junk. That junk needs to be stripped off. Knowing that the only valid characters in the last field are
digits and the decimal character, you could clean it up as follows:
PERFORM VARYING WS_I FROM LENGTH OF WS_AMOUNT_TXT BY -1
UNTIL WS_I = ZERO
IF WS_AMOUNT_TXT(WS_I:1) IS NUMERIC OR
WS_AMOUNT_TXT(WS_I:1) = '.'
MOVE ZERO TO WS_I
ELSE
MOVE SPACE TO WS_AMOUNT_TXT(WS_I:1)
END-IF
END-PERFORM
The basic idea in the above code is to scan from the end of the last UNSTRING output field
to the beginning replacing anything that is not a valid digit or decimal point with a space.
Once a valid digit/decimal is found, exit the loop on the assumption that the rest will
be valid.
After cleanup use the intrinsic function NUMVAL as outlined in my answer to your
previous question
to convert WS_AMOUNT_TXT into a numeric data type.
One final piece of advice, MOVE SPACES TO INPUT_REC before each READ to blow away data left over
from a previous read that might be left in the buffer. This will protect you when reading a very "short"
record after a "long" one - otherwise you may trip over data left over from the previous read.
Hope this helps.
EDIT Just noticed this answer to your question about reading variable length files. Using a variable length input record is a better approach. Given the
actual input record length you can do something like:
UNSTRING INPUT_REC(1:REC_LEN) INTO...
Where REC_LEN is the variable specified after OCCURS DEPENDING ON for the INPUT_REC file FD. All the junk you are encountering occurs after the end of the record as defined by REC_LEN. Using reference modification as illustrated above trims it off before UNSTRING does its work to separate out the individual data fields.
EDIT 2:
Cannot use reference modification with UNSTRING. Darn... It is possible with some other COBOL dialects but not with OpenVMS COBOL. Try the following:
MOVE INPUT_REC(1:REC_LEN) TO WS_BUFFER
UNSTRING WS_BUFFER INTO...
Where WS_BUFFER is a working storage PIC X variable long enough to hold the longest input record. When you MOVE a short alpha-numeric field to a longer one, the destination field is left justified with spaces used to pad remaining space (ie. WS_BUFFER). Since leading and trailing spaces are acceptable to the NUMVAL fucnction you have exactly what you need.
I have a reason for pushing you in this direction. Any junk that ends up at the trailing end of a record buffer when reading a short record is undefined. There is a possibility that some of that junk just might end up being a digit or a decimal point. Should this occur, the cleanup routine I originally suggested would fail.
EDIT 3:
There are no ^# in the resulting WS_AMOUNT_TXT, but still there are a ^M
Looks like the file system is treating <CR> (that ^M thing) at the end of each record as data.
If the file you are reading came from a Windows platform and you are now
reading it on a UNIX platform that would explain the problem. Under Windows records
are terminated with <CR><LF> while on UNIX they are terminated with <LF> only. The
UNIX file system treats <CR> as if it were part of the record.
If this is the case, you can be pretty sure that there will be a single <CR> at the
end of every record read. There are a number of ways to deal with this:
Method 1: As you already noted, pre-edit the file using Notepad++ or some other
tool to remove the <CR> characters before processing through your COBOL program.
Personally I don't think this is the best way of going about it. I prefer to use a COBOL
only solution since it involves fewer processing steps.
Method 2: Trim the last character from each input record before processing it. The last
character should always be <CR>. Try the following if you
are reading records as variable length and have the actual input record length available.
SUBTRACT 1 FROM REC_LEN
MOVE INPUT_REC(1:REC_LEN) TO WS_BUFFER
UNSTRING WS_BUFFER INTO...
Method 3: Treat <CR> as a delimiter when UNSTRINGing as follows:
UNSTRING INPUT_REC DELIMITED BY "," OR x"0D"
INTO WS_ID_1, WS_ID_2, WS_CODE, WS_DESCRIPTION, WS_FLAG, WS_AMOUNT_TXT
Method 4: Condition the last receiving field from UNSTRING by replacing trailing
non digit/non decimal point characters with spaces. I outlined this solution a litte earlier in this
question. You could also explore the INSPECT statement using the REPLACING option (Format 2). This should be able to do pretty much the same thing - just replace all x"00" by SPACE and x"0D" by SPACE.
Where there is a will, there is a way. Any of the above solutions should work for you. Choose the one you are most comfortable with.
^M is a carriage return.
Would Google Refine be useful for rectifying this data?

.gsub erroring with non-regular character 194

I've seen this posted a couple of times but none of the solutions seem to work for me so far...
I'm trying to remove a spurious  character from a string...
e.g.
"myÂstring here Â$100"
..but it should be my string here $100
I've tried:
string.gsub(/\194/,'')
string.gsub(194.chr,'')
string.delete 194.chr
All of these still leave the  intact..
Any thoughts?
By default, Rails supports UTF-8.
You can use your favorite editor to write a gsub call using the proper character you want to replace, as in:
"myÂstring here Â$100".gsub(/Â/,"")
If this does not work as well, you might be having an encoding error somewhere on your stack, probably on your HTML document. Try running rails console, extract somehow that string (if it comes from the Model, try to perform a find on the containing class) and run the gsub. It won't solve your problem, but you'll get a clue to where exactly the problem may lie.
Looks like a character encoding problem to me. For every Unicode code point in the range U+0080..U+00BF inclusive, the UTF-8 encoding is a two-byte sequence, 0xC2 (194 decimal) and the numeric value the code point. For example, a non-breaking space--U+00A0--becomes 0xC2 0xA0. Was there another extra character in there, that you already removed?
At any rate, gsub(/\194/,'') is wrong. \nnn is supposed to be an octal escape, but the number is in its decimal form. 194 in octal is \302.
"myÂstring here Â$100".gsub("Â","") # "mystring here $100"
Is that what you meant?

Resources