Delphi - Problem With Set String and PAnsiChar and Other Strings not Displaying - delphi

I was getting advice from Rob Kennedy and one of his suggestions that greatly increased the speed of an app I was working on was to use SetString and then load it into the VCL component that displayed it.
I'm using Delphi 2009 so now that PChar is Unicode,
SetString(OutputString, PChar(Output), OutputLength.Value);
edtString.Text := edtString.Text + OutputString;
Works and I changed it to PChar myself but since the data being moved isn't always Unicode in fact its usually ShortString Data.... so onto what he actually gave me to use:
SetString(OutputString, PAnsiChar(Output), OutputLength.Value);
edtString.Text := edtString.Text + OutputString;
Nothing shows up but I check in the debugger and the text that normally appeared the way I did it building 1 char at a time in the past was in the variable.
Oddly enough this is not the first time I ran into this tonight. Because I was trying to come up with another way, I took part of his advice and instead of building into a VCL's TCaption I built it into a string variable and then copied it, but when I send it over nothing's displayed. Once again in the debugger the variable that the data is built in... has the data.
for I := 0 to OutputLength.Value - 1 do
begin
OutputString := OutputString + Char(OutputData^[I]);
end;
edtString.Text := OutputString;
The above does not work but the old slow way of doing it worked just fine....
for I := 0 to OutputLength.Value - 1 do
begin
edtString.Text := edtString.Text + Char(OutputData^[I]);
end;
I tried making the variable a ShortString, String and TCaption and nothing is displayed. What I also find interesting is while I build my hex data from the same array into a richedit it's very fast while doing it inside an edit for the text data is very very slow. Which is why I haven't bothered trying to change the code for the richedit as it works superfast as it is.
Edit to add - I think I sorta found the problem but I have no solution. If I edit the value in the debugger to remove anything that can't be displayed (which by the old method used to just not display... not fail) then what I have left is displayed. So if it's just a matter of getting rid of bytes that were turned to characters that are garbage how can I fix that?
I basically have incoming raw data from a SCSI device that's being displayed hex-editor style. My original slow style of adding one char at a time successfully displayed strings and Unicode strings that did not have Unicode-specific characters in them. The faster methods even if working won't display ShortStrings one way and the other way wont display UnicodeStrings that aren't using non 0-255 characters. I really like and could use the speed boost but if it means sacrificing the ability to read the string... then what's the point in the app?
EDIT3 - Alright now that I've figured out that 0-31 is control char and 32 and up is valid I think I'm gonna make an attempt to filter char and replace those not valid with a . which is something I was planning on doing later to emulate hex editor style.
If there are any other suggestions I'd be glad to hear about them but otherwise I think I can craft a solution that's faster than the original and does what I need it to at the same time.

Some comments:
Your question is very unclear. What exactly do you want to do?
Your question reads terrible, please check your text with a spelling checker.
The question you are referring to is this one: Delphi accessing data from dynamic array that is populated from an untyped pointer
Please give a complete code sample of your function like you did in your previous question, I want to know if you implemented Rob Kennedy's suggestion or the code you gave yourself in a following answer (let's hope not :) )
As far a I understand your question: You're sending a query to your SCSI device and you get an array of bytes back which you store in the variable OutputData. After that you want to show your data to the user. So your real question is: How to show an array of bytes to the user?
Login as the same user and don't create an account for every new question. That way we can track your question history an can find out what you mean by 'getting advice'.
Some assumptions and suggestions if I'm right about the true meaning of your question:
Showing your data as a hex string won't give any problems
Showing your data in a normal Memo field gives you problems, although a Delphi string can contain any character, including 0 bytes, displaying them will give you problems. A TMemo for example will show your data till the first 0 byte. What you have to do (and you gave the answer yourself), is replacing non viewable characters with a dummy. After that you can show your data in a TMemo. Actually all hex viewers do the same, characters that cannot be printed will be shown as a dot.

I used PAnsiChar in my example for a reason. It looked like OutputLength was being measured in bytes, not characters, so I made sure to use a type whose length is always measured in bytes. You'll also notice that I showed the declaration of OutputString as an AnsiString.
Since the edit control stored Unicode, though, there will be a conversion between AnsiString and UnicodeString. That will take the system's current code page into account, but that's probably not what you want. You might want to declare the variable as a RawByteString instead. That won't have any code page associated with it, so there won't be any unexpected conversions.
Don't use strings for storing binary data. If you're building what amounts to a hex editor, then you're working with binary data. It's important to remember that. Even if your binary data happens to consist mostly of bytes that can be interpreted as text, you can't treat the data as text or you'll run into exactly the problems you're seeing — characters that don't appear as expected. If you get a bunch of bytes from your SCSI device, then store them in an array of bytes, not characters.
In hex editors, you'll notice that they always show the hexadecimal values of the bytes. They might show those bytes interpreted as characters, but that's secondary, and they generally only show the bytes that can represent ASCII characters; they don't try to get too fancy with the basic display. The good hex editors will offer to display the data interpreted as wide characters, too. This aids in debugging because the user can look at the same data in multiple ways. But they're just views of the data. They're not actually changing the binary contents of the data.

When you filter out non viewable characters...You'll probably need to decide what to do with a couple of them like #9(Tab), #10(LF), #11(Verticle Tab), #12(FF-or New Page),#13(CR)

Related

ASCII Representation of Hexadecimal

I have a string that, by using string.format("%02X", char), I've received the following:
74657874000000EDD37001000300
In the end, I'd like that string to look like the following:
t e x t NUL NUL NUL í Ó p SOH NUL ETX NUL (spaces are there just for clarification of characters desired in example).
I've tried to use \x..(hex#), string.char(0x..(hex#)) (where (hex#) is alphanumeric representation of my desired character) and I am still having issues with getting the result I'm looking for. After reading another thread about this topic: what is the way to represent a unichar in lua and the links provided in the answers, I am not fully understanding what I need to do in my final code that is acceptable for this to work.
I'm looking for some help in better understanding an approach that would help me to achieve my desired result provided below.
ETA:
Well I thought that I had fixed it with the following code:
function hexToAscii(input)
local convString = ""
for char in input:gmatch("(..)") do
convString = convString..(string.char("0x"..char))
end
return convString
end
It appeared to work, but didnt think about characters above 127. Rookie mistake. Now I'm unsure how I can get the additional characters up to 256 display their ASCII values.
I did the following to check since I couldn't truly "see" them in the file.
function asciiSub(input)
input = input:gsub(string.char(0x00), "<NUL>") -- suggested by a coworker
print(input)
end
I did a few gsub strings to substitute in other characters and my file comes back with the replacement strings. But when I ran into characters in the extended ASCII table, it got all forgotten.
Can anyone assist me in understanding a fix or new approach to this problem? As I've stated before, I read other topics on this and am still confused as to the best approach towards this issue.
The simple way to transform a base16-encoded string is just to
function unhex( input )
return (input:gsub( "..", function(c)
return string.char( tonumber( c, 16 ) )
end))
end
This is basically what you have, just a bit cleaner. (There's no need to say "(..)", ".." is enough – if you specify no captures, you'll automatically get the whole match. And while it might work if you write string.char( "0x"..c ), it's just evil – you concatenate lots of strings and then trigger the automatic conversion to numbers. Much better to just specify the base when explicitly converting.)
The resulting string should be exactly what went into the hex-dumper, no matter the encoding.
If you cannot correctly display the result, your viewer will also be unable to display the original input. If you used different viewers for the original input and the resulting output (e.g. a text editor and a terminal), try writing the output to a file instead and looking at it with the same viewer you used for the original input, then the two should be exactly the same.
Getting viewers that assume different encodings (e.g. one of the "old" 8-bit code pages or one of the many versions of Unicode) to display the same thing will require conversion between different formats, which tends to be quite complicated or even impossible. As you did not mention what encodings are involved (nor any other information like OS or programs used that might hint at the likely encodings), this could be just about anything, so it's impossible to say anything more specific on that.
You actually have a couple of problems:
First, make sure you know the meaning of the term character encoding, and that you know the difference between characters and bytes. A popular post on the topic is The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Then, what encoding was used for the bytes you just received? You need to know this, otherwise you don't know what byte 234 means. For example it could be ISO-8859-1, in which case it is U+00EA, the character ê.
The characters 0 to 31 are control characters (eg. 0 is NUL). Use a lookup table for these.
Then, displaying the characters on the terminal is the hard part. There is no platform-independent way to display ê on the terminal. It may well be impossible with the standard print function. If you can't figure this step out you can search for a question dealing specifically with how to print Unicode text from Lua.

Does Fast Report 4 (Delphi 7) support UTF8 using frxUserDataSet?

I've done my homework, and specifically:
1) Read the whole FASTREPORT 4 manual. It does not mention UTF8, nor Unicode support
2) Looked for an answer here on SO
3) Googled it around
If I set a Text field and fill it with Thai characters, they are perfectly printed, so FastReport CAN handle Unicode characters, at least it can print them.
If I try to "pass" a value using the callbacks provided by the frxUserDataSet, then what I see is some garbled not-unicode text. In particular, if I pass e.g. a string made with the same 10 Thai characters, I see the same "set" of 3 or 4 garbled characters repeated ten times, so I am sure the data is passed correctly, but then FastReport has probably no way to know that they should be handled as Unicode.
The callback requires the data passed back to be of "variant" type, so I guess it's totally useless to cast them to any type, because variant will accept any of them.
I forgot to mention that I get the strings from a MySql DB and the data is stored as UTF8, and I do not even copy the data in a local variable: what I get from the DB is put into the variant.
Is there a way to force FastReport to print the data received as Unicode?
Thank you
Yes, FR4 with Delphi7 supports UTF8 using frxUserDataSet.
Just for future reference:
1) You MUST set your DB (MySql in my case) to use UTF8
2) You MUST set the character set in the component you use to access the DB to utf8 ("DAC for MySql" in my case, and the property is called ConnectionCharacterSet)
3) In all the frxUserDataSet callbacks, before setting the "value" variable, you MUST CONVERT whatever you have using the Utf8decode Delphi system routine, like this:
value := Utf8decode(fReports.q1.FieldValueByFieldName('yourDBfield'));
where fReports is the form name, and q1 the component used to access the DB.
I keep reading that using D7 and UniCode is almost impossible, but - as long as you use XP and up - it's only harder from what I am seeing. Unfortunately, I must use XP, D7 and cannot upgrade. But, as said, I am quickly becoming used to solve these problems so, in the future, I hope to be able to give back some help in the same way everybody has always helped me here :)

How to get a single Arabic letter in a string with its Unicode transformation value in DELPHI?

Considering this Arabic word(جبل) made of 3 letters .
-the first letter is جـ,
-name is (ǧīm),
-its Unicode value is FE9F when its in the beginning,
-its basic value is 062C and
-its isolated value is FE9D but the last two values return the same shape drawing ج .
Now, Whenever I try to get it as a single character -trying many different ways-, Delphi returns the basic Unicode value.
well,that makes sense,but what happens to the char with transformation? It is a single char too..Looks like it takes the transformed value only when it is within a string, but where? how to extract it?When and which process decides these values?
Again the MAIN QUESTION:
How can I get the Arabic letter or its Unicode value as it is within a string?
just for information: Unlike English which has tow cases for its letters(Capital and Small), Arabic has four cases(Isolated, Beginning,Middle And End) with different rules as well.
I'm not sure I understand the question. If you want to know how to write U+FE9F in Delphi source code, in a modern Unicode version of Delphi. Do that simply like so:
Char($FE9F)
If you want to read individual characters from جبل then do it like this:
const
MyWord = 'جبل';
var
c: Char;
....
c := MyWord[1];//this is U+062C
Note that the code above is fine for your particular word because each code point can be encoded with a single UTF-16 WideChar character element. If the code point required multiple elements, then it would be best to transform to UTF-32 for code point level processing.
Now, let's look at the string that you included in the question. I downloaded this question using wget and the file that came down the wires was UTF-8 encoded. I used Notepad++ to convert to UTF16-LE and then picked out the three UTF-16 characters of your string. They are:
U+062C
U+0628
U+0644
You stated:
The first letter is جـ, name is (ǧīm), its Unicode value is U+FE9F.
But that is simply incorrect. As can be seen from the above, the actual character you posted was U+062C. So the reason why your attempts to read the first character yield U+062C is that U+062C really is the first character of your string.
The bottom line is that nothing in your Delphi code is transforming your character. When you do:
S[1] := Char($FE9F);
the compiler performs a simple two byte copy. There is no context aware transformation that occurs. And likewise when reading S[1].
Let's look at how these characters are displayed, using this simple code on a VCL forms application that contains a memo control:
Memo1.Clear;
Memo1.Lines.Add(StringOfChar(Char($FE9F), 2));
Memo1.Lines.Add(StringOfChar(Char($062C), 2));
The output looks like this:
As you can see, the rendering layer knows what to do with a U+062C character that appears at the beginning of the string.
Shaping of Arabic characters for presentation in Windows is served by the Uniscribe services (USP10.dll).
UniScribe
You may find the following blog post useful:
Roozbeh's Programming Blog
I don't think you can do it using string/char related methods. But using pchar, maybe can you access the memory and read the Pword values directly
EDIT: After discussing with David, I think that you will always get the basic/isolated value of the letter. The fact that begin or end glyph is used, is probably just handled by the display framework of the OS

how to know (in code) that some characters are displayed fine (or not) in the interface of a program made in Delphi

Sorry about my english...
I'm trying to make a small program in Delphi 7.
Its interface will have text in my language, which has some characters with diacritics.
If "Language for non-Unicode programs" is set to my language those characters are always displayed fine. That's normal.
If is set to something else, sometimes are displayed fine, sometimes they are not.
How can I know that they can be displayed fine or not...?
Oh, and I can't use Unicode components, only normal.
Only way that I found is to capture the image of one characters into a bitmap and check pixel by pixel. But it's a lot of work to implement, slow and imprecise.
I can use GetSystemDefaultLangID function and know that "Language for non-Unicode programs" is set to something else but still don't know if they are displayed fine or not.
Thank you for any idea.
Welcome to the joys of AnsiStrings encoded using code-pages. You should not be using AnsiStrings at all, and you know that, but you say without explaining it that you can't use unicode controls. this seems strange to me. You should be using either:
(a) A Unicode version of Delphi (2009,2010, XE), where String=UnicodeString.
(b) If not that, at least use Proper Unicode controls, such as TNT Controls, and internally use WideString types where you need to store accented or international characters.
Your version of Delphi has String=AnsiString, and you are relying on the locale that your system is set to (as you say in your question) to select the codepage representations of accented characters, a problematic scheme. If you really can't move up from Delphi 7, at least start using WideStrings, and TNT Unicode Controls, but I must say that effort is WASTED you would be better off getting Delphi XE, and just porting to Unicode.
Your question asks "how can I know if they can be stored fine or not?" You can encode and decode using your codepage, and check if anything is replaced with a "?". The windows function WideCharToMultiByte, for example behaves like this. MBCS is a world of pain, and not worth doing, but you asked how you can find out where the floor falls out from under you, so that API will help you understand your selected encoding rule.
Use WideCharToMultiByte Function - http://msdn.microsoft.com/en-us/library/dd374130(v=vs.85).aspx and check lpUsedDefaultChar parameter.
Since this has been on my research list for a while, but didn't reach the top of that list yet, I can only help you out with a few links.
You will need to to quite a bit of experimentation :-)
When using Unicode, you can use functions ScriptGetCMap and GetGlyphIndices to test if a code point is in the font.
When not using Unicode, you can use the function GetGlyphIndices
There are few Delphi translations of these functions around. This Borland Newsgroup thread has a few hints on using GetGlyphIndices in Delphi.
Here is a search ScriptGetCMap in Delphi.
This page has a list of some interesting API calls that might help you further.
An extra handicap is that because not all fonts contain all characters, so Windows can do font substitution for you.
I'm not sure how to figure out that, but it is something you have to check for too.
Good luck :-)
procedure TForm1.Button2Click(Sender: TObject);
var
ACP: Integer;
begin
ACP := GetACP;
Caption := 'CP' + IntToStr(ACP);
if ACP = 1250 then
Caption := Caption + ' is okay for Romanian language';
end;

Delphi 2009 + Unicode + Char-size

I just got Delphi 2009 and have previously read some articles about modifications that might be necessary because of the switch to Unicode strings.
Mostly, it is mentioned that sizeof(char) is not guaranteed to be 1 anymore.
But why would this be interesting regarding string manipulation?
For example, if I use an AnsiString:='Test' and do the same with a String (which is unicode now), then I get Length() = 4 which is correct for both cases.
Without having tested it, I'm sure all other string manipulation functions behave the same way and decide internally if the argument is a unicode string or anything else.
Why would the actual size of a char be of interest for me if I do string manipulations?
(Of course if I use strings as strings and not to store any other data)
Thanks for any help!
Holger
With Unicode SizeOf(SomeChar) <> Length(SomeChar). Essentially the length of a string is less then the sum of the size of its chars. As long as you don't assume SizeOf(Char) = 1, or SizeOf(SomeString[x]) = 1 (since both are FALSE now) or try to interchange bytes with chars, then you shouldn't have any trouble. Any place you are doing something creative stuffing Bytes into Chars or Strings, then you will need to use AnsiString.
(SizeOf(SomeString) is still 4 no matter the length since it is essentially a pointer with some compiler magic.)
People often implicitly convert from characters to bytes in old Delphi code without really thinking about it. For example, when writing to a stream. When you write a string to a stream, you have to specify the number of bytes you write, but people often pass the character count instead. See this post from Chris Bensen for another example.
Another way people often make this implicit conversion and older code is by using a "string" to store binary data. In this case, they actually want bytes, but the data type expects characters. D2009 has a better type for this.
I didn't try Delphi 2009, but are using fpc which is also switching to unicode slowly. I'm 95% sure that everything below also holds for Delphi 2009
In fpc (when supporting unicode) it will be so that functions like 'length' take the codepage into consideration. Thus it will return the length of the string as a 'human' would see it. If there are - for example - two chinese characters, that both take two bytes of memory in unicode, length will return 2, since there are two characters in the string. But the string will take 4 bytes of memory. (+the memory for the reference count and the leading #0, but that aside)
What you can not do anymore is this:
var p : pchar;
begin
p := s[1];
for i := 0 to length(string)-1 do
begin
write(p);
inc(p);
end;
end;
Because this code will - in the two chinese-character example - write the wrong two characters. Namely the two bytes which are part of the first 'real' character.
In short: Length() doesn't return the amount of bytes allocated for the string anymore, but the amount of characters. (Before the switch to unicode, those two values were equal to eachother)
The actual size of a character shouldn't matter, unless you are doing the manipulation at the byte level.
(Of course if I use strings as strings and not to store any other data)
That's the key point, YOU don't use strings for other purposes, but some people do. They use strings just like arrays, so they (and that's including me) would need to check all such uses to make sure nothing is broken...
Lets not forget that there are times when this conversion is not really desired. Say for storing a GUID in a record for instance. The guid can only contain hexadecimal characters plus the - and brackets...making them take up twice the space can make quite an impact on existing code. Sure the simple solution is to change them to AnsiString, and deal with the compiler warnings if you do any string manipulation on them.
It can be an issue if you make Windows API calls. Or if you have legacy code that does inc or dec of str[0] to change its length.

Resources