How can i convert string character (123-jhk25) to ASCII in Delphi7
If you mean the ASCII code for the character you need to use the Ord() function which returns the Ordinal value of any "enumerable" type
In this case it works on character values, returning a byte:
var
Asc : Byte;
i : Integer;
begin
for i := 1 to Length(s) do
begin
Asc := Ord(s[i]);
// do something with Asc
end;
end;
It depends on your Delphi version. In Delphi 2007 and before, strings are automatically in ANSI string format, and anything below 128 is an ASCII character.
In D2009 and later, things become more complicated since the default string type is UnicodeString. You'll have to cast the character to AnsiChar. It'll perform a codepage conversion, and then whatever you end up with may or may not work depending on which language the character in question came from. But if it was originally an ASCII character, it should convert without trouble.
Related
I'm writing a delimiter for some Excel spreadsheet data and I need to read the rightward arrow symbol and pilcrow symbol in a large string.
The pilcrow symbol, for row ends, was fairly simply, using the Chr function and the AnsiChar code 182.
The rightward arrow has been more tricky to figure out. There isn't an AnsiChar code for it. The Unicode value for it is '2192'. I can't, however, figure out how to make this into a string or char type for me to use in my function.
Any easy ways to do this?
You can't use the 2192 character directly. But since a STRING variable can't contain this value either (as thus your TStringList can't either), that doesn't matter.
What character(s) are the 2192 character represented as in your StringList AFTER you have read it in? Probably by these three separate characters: 0xE2 0x86 0x92 (in UTF-8 format). The simple solution, therefore, is to start by replacing these three characters with a single, unique character that you can then assign to the Delimiter field of the TStringList.
Like this:
.
.
.
<Read file into a STRING variable, say S>
S := ReplaceStr(S,#$E2#$86#$92,'|');
SL := TStringList.Create;
SL.Text := S;
SL.Delimiter := '|';
.
.
.
You'll have to select a single-character representation of your 3-byte UTF-8 Unicode character that doesn't occur in your data elsewhere.
You need to represent that character as a UTF-16 character. In Unicode Delphi you would do it like this:
Chr(2192)
which is of type WideChar.
However, you are using Delphi 7 which is a pre-Unicode Delphi. So you have to do it like this:
var
wc: WideChar;
....
wc := WideChar(2192);
Now, this might all be to no avail for you since it sounds a little like your code is working with 8 bit ANSI text. In which case that character cannot be encoded in any 8 bit ANSI character set. If you really must use that character, you'll need to use Unicode text.
I am making a program in Delphi 7, that is supposed to encode a unicode string into html entity string.
For example, "ABCģķī" would result in "ABCģķī"
Now 2 basic things:
Delphi 7 is non-Unicode, so I can't just write unicode chars directly in code to encode them.
Codepages consist of 255 entries, each holding a character, specific to that codepage, except first 127, that are same for all the codepages.
So - How do I get a value of a char, that is in 1-255 range?
I tried Ord(Integer), but it also returns values way past 255. Basically, everything is fine (A returns 65 an so on) until my string reaches non-Latin unicode.
Is there any other method for returning char value? Any help appreciated
I suggest you avoid codepages like the plague.
There are two approaches for Unicode that I'd consider: WideString, and UTF-8.
Widestrings have the advantage that it's 'native' to Windows, which helps if you need to use Windows API calls. Disadvantages are storage space, and that they (like UTF-8) can require multiple WideChars to encode the full Unicode space.
UTF-8 is generally preferable. Like WideStrings, this is a multi-byte encoding, so a particular unicode 'code point' may need several bytes in the string to encode it. This is only an issue if you're doing lots of character-by-character processing on your strings.
#DavidHeffernan comments (correctly) that WideStrings may be more compact in certain cases. However, I'd only recommend UTF-16 only if you are absolutely sure that your encoded text will really be more compact (don't forget markup!), and this compactness is highly important to you.
In HTML 4, numeric character references are relative to the charset used by the HTML. Whether that charset is specified in the HTML itself via a <meta> tag, or out-of-band via an HTTP/MIME Content-Type header or other means, it does not matter. As such, "ABCģķī" would be an accurate representation of "ABCģķī" only if the HTML were using UTF-16. If the HTML were using UTF-8, the correct representation would be either "ABCģķī" or "ABCģķī" instead. Most other charsets do no support those particular Unicode characters.
In HTML 5, numeric character references contain original Unicode codepoint values regardless of the charset used by the HTML. As such, "ABCģķī" would be represented as either "ABC#291;ķī" or "ABCģķī".
So, to answer your question, the first thing you have to do is decide whether you need to use HTML 4 or HTML 5 semantics for numeric character references. Then, you need to assign your Unicode data to a WideString (which is the only Unicode string type that Delphi 7 natively supports), which uses UTF-16, then:
if you need HTML 4:
A. if the HTML charset is not UTF-16, then use WideCharToMultiByte() (or equivalent) to convert the WideString to that charset, then loop through the resulting values outputting unreserved characters as-is and character references for reserved values, using IntToStr() for decimal notation or IntToHex() for hex notation.
B. if the HTML charset is UTF-16, then simply loop through each WideChar in the WideString, outputting unreserved characters as-is and character references for reserved values, using IntToStr() for decimal notation or IntToHex() for hex notation.
If you need HTML 5:
A. if the WideString does not contain any surrogate pairs, then simply loop through each WideChar in the WideString, outputting unreserved characters as-is and character references for reserved values, using IntToStr() for decimal notation or IntToHex() for hex notation.
B. otherwise, convert the WideString to UTF-32 using WideStringToUCS4String(), then loop through the resulting values outputting unreserved codepoints as-is and character references for reserved codepoints, using IntToStr() for decimal notation or IntToHex() for hex notation.
In case I understood the OP correctly, I'll just leave this here.
function Entitties(const S: WideString): string;
var
I: Integer;
begin
Result := '';
for I := 1 to Length(S) do
begin
if Word(S[I]) > Word(High(AnsiChar)) then
Result := Result + '#' + IntToStr(Word(S[I])) + ';'
else
Result := Result + S[I];
end;
end;
I have a text that I need to store it in a widestring variable. But my text is UTF8 and widestring doesn't support UTF8 and converts it to some chinese characters.
so is there any UTF8 version of WIDESTRING?
I always use UTF8string but in this case I have to use WideString
When you assign a UTF8String variable to a WideString variable, the compiler automatically inserts instructions to decode the string (in Delphi 2009 and later). It coverts UTF-8 to UTF-16, which is what WideString holds. If your WideString variable holds Chinese characters, then that's because your UTF-8-encoded string holds UTF-8-encoded Chinese characters.
If you want your string ws to hold 16-bit versions of the bytes in your UTF8String s, then you can by-pass the automatic conversion with some type-casting:
var
ws: WideString;
i: Integer;
c: AnsiChar;
SetLength(ws, Length(s));
for i := 1 to Length(s) do begin
c := s[i];
ws[i] := WideChar(Ord(c));
end;
If you're using Delphi 2009 or later (which includes the XE series), then you should consider using UnicodeString instead of WideString. The former is a native Delphi type, whereas the latter is more of a wrapper for the Windows BSTR type. Both types exhibit the automatic conversion behavior when assigning to and from AnsiString derivatives like UTF8String, though, so they type you use doesn't affect this answer.
In earlier Delphi versions, the compiler would attempt to decode the string using the system code page (which is never UTF-8). To make it decode the string properly, call Utf8Decode:
ws := Utf8Decode(s);
In the case where a Unicode character or a UTF8 character exists in a ansistring is it possible to strip the characters from the string? In this particular case the ansistring contains EXIF parameters.
Edit
When the string is read it is visible as: Copyright © 2013 The States of Guernsey (Guernsey Museums & Galleries)
In one case, the copyright symbol © is encoded as UTF-8 sequence (that is 0xc2 and 0xa9).
Delphi 7 and Delphi 2010 shows it as ascii, displaying an "Â" (C2) and a "©" (A9), ignoring that is a UTF8 sequence. Exif tags and the Copyright tag (33432) should be simple ASCII, not UTF8 or unicode.
So if a ansistring contains one or more of these characters can they be stripped from the string or do they have to be manually edited?
Edit2
Attempting to recover the UTF8 I tried:
// remove the null terminator from a string (part of imageen unit}
function RemoveNull(sValue: string): string;
begin
result := trim(svalue);
if (result <> '') and
(result[length(result)] = #0) then
SetLength(result, length(result) - 1);
result := trim(result);
end;
EXIF_Copyright: is defined by ImageEn as AnsiString;
utf8: UTF8String;
// EXIF_Copyright
// Shows copyright information
SetLength(utf8, Length(EXIF_Copyright)); // [DCC Error] iexEXIFRoutines.pas(911): E2026 Constant expression expected
Move(Pointer(EXIF_Copyright)^, Pointer(utf8)^, Length(EXIF_Copyright)));
_EXIF_Copyright: result := RemoveNull(EXIF_Copyright);
Unfortunately I have little experience dealing with UTF8.
where EXIF_Copyright is an ansistring;
but this will not compile...
The simplest approach is to read your UTF-8 string into a variable of type UTF8String and then assign to another string variable.
You can assign to an AnsiString if you want, but I don't understand why you would do that. If you do convert to ANSI, any characters that cannot be represented will be converted to question marks. If you are desperate to strip non-ASCII characters, read into UTF8String, convert to string, and strip characters > 127.
As I understand it, the standard mandates ASCII but it's common now for EXIF text to be encoded with UTF-8.
I suggest you simply read the text into a UTF8String and leave it at that.
Your library gives you an AnsiString that actually contains UTF-8 text. So you can simply convert to UTF8String like this:
function ReinterpUTF8storedInAnsiString(const ansi: AnsiString): string;
var
utf8: UTF8String;
begin
SetLength(utf8, Length(ansi));
Move(Pointer(ansi)^, Pointer(utf8)^, Length(ansi));
Result := utf8;
end;
Now you will have the text that the file creator intended you to see.
I have incorrect result when converting file to string in Delphi XE. There are several ' characters that makes the result incorrect. I've used UnicodeFileToWideString and FileToString from http://www.delphidabbler.com/codesnip and my code :
function LoadFile(const FileName: TFileName): ansistring;
begin
with TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite) do
begin
try
SetLength(Result, Size);
Read(Pointer(Result)^, Size);
// ReadBuffer(Result[1], Size);
except
Result := '';
Free;
end;
Free;
end;
end;
The result between Delphi XE and Delphi 6 is different. The result from D6 is correct. I've compared with result of a hex editor program.
Your output is being produced in the style of the Delphi debugger, which displays string variables using Delphi's own string-literal format. Whatever function you're using to produce that output from your own program has actually been fixed for Delphi XE. It's really your Delphi 6 output that's incorrect.
Delphi string literals consist of a series of printable characters between apostrophes and a series of non-printable characters designated by number signs and the numeric values of each character. To represent an apostrophe, write two of them next to each other. The printable and non-printable series of characters can be written right not to each other; there's no need to concatenate them with the + operator.
Here's an excerpt from the output you say is correct:
#$12'O)=ù'dlû'#6't
There are four lone apostrophes in that string, so each one either opens or closes a series of printable characters. We don't necessarily know which is which when we start reading the string at the left because the #, $, 1, and 2 characters are all printable on their own. But if they represent printable characters, then the 0, ), =, and ù characters are in the non-printable region, and that can't be. Therefore, the first apostrophe above opens a printable series, and the #$12 part represents the character at code 18 (12 in hexadecimal). After the ù is another apostrophe. Since the previous one opened a printable string, this one must close it. But the next character after that is d, which is not #, and therefore cannot be the start of a non-printable character code. Therefore, this string from your Delphi 6 code is mal-formed.
The correct version of that excerpt is this:
#$12'O)=ù''dlû'#6't
Now there are three lone apostrophes and one set of doubled apostrophes. The problematic apostrophe from the previous string has been doubled, indicating that it is a literal apostrophe instead of a printable-string-closing one. The printable series continues with dlû. Then it's closed to insert character No. 6, and then opened again for t. The apostrophe that opens the entire string, at the beginning of the file, is implicit.
You haven't indicated what code you're using to produce the output you've shown, but that's where the problem was. It's not there anymore, and the code that loads the file is correct, so the only place that needs your debugging attention is any code that depended on the old, incorrect format. You'd still do well to replace your code with that of Robmil since it does better at handling (or not handling) exceptions and empty files.
Actually, looking at the real data, your problem is that the file stores binary data, not string data, so interpreting this as a string is not valid at all. The only reason it works at all in Delphi 6 is that non-Unicode Delphi allows you to treat binary data and strings the same way. You cannot do this in Unicode Delphi, nor should you.
The solution to get the actual text from within the file is to read the file as binary data, and then copy any values from this binary data, one byte at a time, to a string if it is a "valid" Ansi character (printable).
I will suggest the code:
function LoadFile(const FileName: TFileName): AnsiString;
begin
with TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite) do
try
SetLength(Result, Size);
if Size > 0 then
Read(Result[1], Size);
finally
Free;
end;
end;