I use the HTTPEncode() function in Delphi XE8 to encode Japanese text. Some characters can encode correctly, but some cannot. Below is an example:
aStr := HTTPEncode('萩原小学校');
I expected this:
aStr = '%E8%90%A9%E5%8E%9F%E5%B0%8F%E5%AD%A6%E6%A0%A1'
But I got this:
aStr = '%E8%90%A9%E5%8E%9F%E5%B0%8F%3F%E6%A0%A1'
Can someone help me to encode '萩原小学校' as '%E8%90%A9%E5%8E%9F%E5%B0%8F%E5%AD%A6%E6%A0%A1'?
I'm not sure what this HTTPEncode function is. There is a function in Web.HTTPApp of that name. Perhaps that is what you refer to. If so, it is clearly marked as deprecated. Assuming you have enabled compiler warnings, the compiler will be telling you this, and telling you also to use TNetEncoding.UTL.Encode instead.
Let's try that:
{$APPTYPE CONSOLE}
uses
System.NetEncoding;
begin
Writeln(TNetEncoding.URL.Encode('萩原小学校'));
end.
Output
%E8%90%A9%E5%8E%9F%E5%B0%8F%E5%AD%A6%E6%A0%A1
# David Heffernan, # Remy Lebeau, thank you so much for your time to help me. your answer make me understand why i cannot convert my string with HTTPEncode.
I have tried many times myself till i found this: Delphi: Convert from windows-1251 to Shift-JIS
function MyEncode(const S: string; const CodePage: Integer): string;
var
Encoding: TEncoding;
Bytes: TBytes;
b: Byte;
sb: TStringBuilder;
begin
Encoding := TEncoding.GetEncoding(CodePage);
try
Bytes := Encoding.GetBytes(S);
finally
Encoding.Free;
end;
sb := TStringBuilder.Create;
try
for b in Bytes do begin
sb.Append('%');
sb.Append(IntToHex(b, 2));
end;
Result := sb.ToString;
finally
sb.Free;
end;
end;
MyEncode('萩原小学校', 65001);
Output = %E8%90%A9%E5%8E%9F%E5%B0%8F%E5%AD%A6%E6%A0%A1
Related
I have this function to check if a string is a regular expression and it works fine :
function IsValidRegEx(aString: string): Boolean;
var
aReg : TRegEx;
begin
Result := False;
if Trim(aString) = '' then
begin
Exit;
end;
try
aReg := TRegEx.Create(aString);
if aReg.IsMatch('asdf') then
begin
end;
Result := True;
except
end;
end;
the problem is it always raise a debugger exception notification if string value is false. I want to eliminate that notification. There is an option to ignore that exception in the notification itself but I don't want it. As much as possible it would be the codes that will adjust.
If you want to use this approach, then you can't avoid exceptions being raised by the Delphi regex library. You'd need to dig down to the PCRE library that Delphi uses to implement its regex library. For instance:
{$APPTYPE CONSOLE}
uses
System.RegularExpressionsAPI;
function IsValidRegEx(const Value: UTF8String): Boolean;
var
CharTable: Pointer;
Options: Integer;
Pattern: Pointer;
Error: PAnsiChar;
ErrorOffset: Integer;
begin
CharTable := pcre_maketables;
Options := PCRE_UTF8 or PCRE_NEWLINE_ANY;
Pattern := pcre_compile(PAnsiChar(Value), Options, #Error, #ErrorOffset, CharTable);
Result := Assigned(Pattern);
pcre_dispose(Pattern, nil, CharTable);
end;
begin
Writeln(IsValidRegEx('*'));
Writeln(IsValidRegEx('.*'));
Readln;
end.
Note that I have written this with Delphi XE7, as I don't have access to XE2. If this code doesn't compile, then it should not be too hard to study the source code for the Delphi regex library to work out how to achieve the same in XE2.
I want to achieve a very very basic task in Delphi: to save a string to disk and load it back. It seems trivial but I had problems doing this TWICE since I upgraded to IOUtils (and one more time before that... this is why I took the 'brilliant' decision to upgrade to IOUtils).
I use something like this:
procedure WriteToFile(CONST FileName: string; CONST uString: string; CONST WriteOp: WriteOperation);
begin
if WriteOp= (woOverwrite)
then IOUtils.TFile.WriteAllText (FileName, uString) //overwrite
else IOUtils.TFile.AppendAllText(FileName, uString); //append
end;
Simple right? What could go wrong? Well, I recently stepped into a (another) bug in IOUtils. So, TFile is buggy. The bug is detailed here.
Anyone has can share an alternative (or simply your thoughts/ideas) that is not based on IOUtils and it is known to work? Well... the code above also worked for a while for me... So, I know if difficult to guaranty that a piece of code (no matter how small) will really work!
Also I would REALLY like to have my WriteToFile procedure to save the string to an ANSI file when it is possible (the uString contains only ANSI chars) and as Unicode otherwise.
Then the ReadAFile function should automagically detect the encoding and correctly read the string back.
The idea is that there are still text editors out there that will wrongly open/interpret an Unicode/UTF file. So, whenever possible, give a good old ANSI text file to the user.
So:
- Overwrite/Append
- Save as ANSI when possible
- Memory efficient (don't eat 4GB of ram when the file to load is 2GB)
- Should work with any text file (up to 2GB, obviously)
- No IOUtils (too buggy to be of use)
Then the ReadAFile function should automagically detect the encoding and correctly read the string back.
This is not possible. There exists files that are well-formed if interpreted as any text encoding. For instance see The Notepad file encoding problem, redux.
This means that your goals are unattainable and that you need to change them.
My advice is to do the following:
Pick a single encoding, UTF-8, and stick to it.
If the file does not exists, create it and write UTF-8 bytes to it.
If the file exists, open it, seek to the end, and append UTF-8 bytes.
A text editor that does not understand UTF-8 is not worth supporting. If you feel inclined, include a UTF-8 BOM when you create the file. Use TEncoding.UTF8.GetBytes and TEncoding.UTF8.GetString to encode and decode.
Just use TStringList, until size of file < ~50-100Mb (it depends on CPU speed):
procedure ReadTextFromFile(const AFileName: string; SL: TStringList);
begin
SL.Clear;
SL.DefaultEncoding:=TEncoding.ANSI; // we know, that old files has this encoding
SL.LoadFromFile(AFileName, nil); // let TStringList detect real encoding.
// if not - it just use DefaultEncoding.
end;
procedure WriteTextToFile(const AFileName: string; const TextToWrite: string);
var
SL: TStringList;
begin
SL:=TStringList.Create;
try
ReadTextFromFile(AFileName, SL); // read all file with encoding detection
SL.Add(TextToWrite);
SL.SaveToFile(AFileName, TEncoding.UTF8); // write file with new encoding.
// DO NOT SET SL.WriteBOM to False!!!
finally
SL.Free;
end;
end;
The Inifiles unit should support unicode. At least according to this answer: How do I read a UTF8 encoded INI file?
Inifiles are quite commonly used to store strings, integers, booleans and even stringlists.
procedure TConfig.ReadValues();
var
appINI: TIniFile;
begin
appINI := TIniFile.Create(ChangeFileExt(Application.ExeName,'.ini'));
try
FMainScreen_Top := appINI.ReadInteger('Options', 'MainScreen_Top', -1);
FMainScreen_Left := appINI.ReadInteger('Options', 'MainScreen_Left', -1);
FUserName := appINI.ReadString('Login', 'UserName', '');
FDevMode := appINI.ReadBool('Globals', 'DevMode', False);
finally
appINI.Free;
end;
end;
procedure TConfig.WriteValues(OnlyWriteAnalyzer: Boolean);
var
appINI: TIniFile;
begin
appINI := TIniFile.Create(ChangeFileExt(Application.ExeName,'.ini'));
try
appINI.WriteInteger('Options', 'MainScreen_Top', FMainScreen_Top);
appINI.WriteInteger('Options', 'MainScreen_Left', FMainScreen_Left);
appINI.WriteString('Login', 'UserName', FUserName);
appINI.WriteBool('Globals', 'DevMode', FDevMode);
finally
appINI.Free;
end;
end;
Also see the embarcadero documentation on inifiles: http://docwiki.embarcadero.com/Libraries/Seattle/en/System.IniFiles.TIniFile
Code based on David's suggestions:
{--------------------------------------------------------------------------------------------------
READ/WRITE UNICODE
--------------------------------------------------------------------------------------------------}
procedure WriteToFile(CONST FileName: string; CONST aString: String; CONST WriteOp: WriteOperation= woOverwrite; WritePreamble: Boolean= FALSE); { Write Unicode strings to a UTF8 file. It can also write a preamble }
VAR
Stream: TFileStream;
Preamble: TBytes;
sUTF8: RawByteString;
aMode: Integer;
begin
ForceDirectories(ExtractFilePath(FileName));
if (WriteOp= woAppend) AND FileExists(FileName)
then aMode := fmOpenReadWrite
else aMode := fmCreate;
Stream := TFileStream.Create(filename, aMode, fmShareDenyWrite); { Allow read during our writes }
TRY
sUTF8 := Utf8Encode(aString); { UTF16 to UTF8 encoding conversion. It will convert UnicodeString to WideString }
if (aMode = fmCreate) AND WritePreamble then
begin
preamble := TEncoding.UTF8.GetPreamble;
Stream.WriteBuffer( PAnsiChar(preamble)^, Length(preamble));
end;
if aMode = fmOpenReadWrite
then Stream.Position:= Stream.Size; { Go to the end }
Stream.WriteBuffer( PAnsiChar(sUTF8)^, Length(sUTF8) );
FINALLY
FreeAndNil(Stream);
END;
end;
procedure WriteToFile (CONST FileName: string; CONST aString: AnsiString; CONST WriteOp: WriteOperation);
begin
WriteToFile(FileName, String(aString), WriteOp, FALSE);
end;
function ReadFile(CONST FileName: string): String; {Tries to autodetermine the file type (ANSI, UTF8, UTF16, etc). Works with UNC paths }
begin
Result:= System.IOUtils.TFile.ReadAllText(FileName);
end;
In Delphi 7, I have a widestring encoded with Base64(That I received from a Web service with WideString result) :
PD94bWwgdmVyc2lvbj0iMS4wIj8+DQo8c3RyaW5nPtiq2LPYqjwvc3RyaW5nPg==
when I decoded it, that result is not UTF-8:
<?xml version="1.0"?>
<string>طھط³طھ</string>
But when I decoded it by base64decode.org, result is true :
<?xml version="1.0"?>
<string>تست</string>
I have use EncdDecd unit for DecodeString function.
The problem you have is that you are using DecodeString. That function, in Delphi 7, treats the decoded binary data as being ANSI encoded. And the problem is that your text is UTF-8 encoded.
To continue with the EncdDecd unit you have a couple of options. You can switch to DecodeStream. For instance, this code will produce a UTF-8 encoded text file with your data:
{$APPTYPE CONSOLE}
uses
Classes,
EncdDecd;
const
Data = 'PD94bWwgdmVyc2lvbj0iMS4wIj8+DQo8c3RyaW5nPtiq2LPYqjwvc3RyaW5nPg==';
var
Input: TStringStream;
Output: TFileStream;
begin
Input := TStringStream.Create(Data);
try
Output := TFileStream.Create('C:\desktop\out.txt', fmCreate);
try
DecodeStream(Input, Output);
finally
Output.Free;
end;
finally
Input.Free;
end;
end.
Or you could continue with DecodeString, but then immediately decode the UTF-8 text to a WideString. Like this:
{$APPTYPE CONSOLE}
uses
Classes,
EncdDecd;
const
Data = 'PD94bWwgdmVyc2lvbj0iMS4wIj8+DQo8c3RyaW5nPtiq2LPYqjwvc3RyaW5nPg==';
var
Utf8: AnsiString;
wstr: WideString;
begin
Utf8 := DecodeString(Data);
wstr := UTF8Decode(Utf8);
end.
If the content of the file can be represented in your application's prevailing ANSI locale then you can convert that WideString to a plain AnsiString.
var
wstr: WideString;
str: string; // alias to AnsiString
....
wstr := ... // as before
str := wstr;
However, I really don't think that using ANSI encoded text is going to lead to a very fruitful programming life. I encourage you to embrace Unicode solutions.
Judging by the content of the decoded data, it is XML. Which is usually handed to an XML parser. Most XML parsers will accept UTF-8 encoded data, so you quite probably can base64 decode to a memory stream using DecodeStream and then hand that stream off to your XML parser. That way you don't need to decode the UTF-8 to text and can let the XML parser deal with that aspect.
As an addendum to David Heffernan's awesome answer, and Remy Lebeau's note on how it's broken on Delphi 7, I would like to add a function that will help any developer stuck on Delphi 7.
Since UTF8Decode() is broken in Delphi 7, I found a function in a forum that solved my problem:
function UTF8ToWideString(const S: AnsiString): WideString;
var
BufSize: Integer;
begin
Result := '';
if Length(S) = 0 then Exit;
BufSize := MultiByteToWideChar(CP_UTF8, 0, PAnsiChar(S), Length(S), nil, 0);
SetLength(result, BufSize);
MultiByteToWideChar(CP_UTF8, 0, PANsiChar(S), Length(S), PWideChar(Result), BufSize);
end;
So now, you can use DecodeString, and then decode the UTF-8 text to a WideString using this function:
begin
Utf8 := DecodeString(Data);
wstr := UTF8ToWideString(Utf8);
end.
i have the following code snippit that won't compile:
procedure Frob(const Grob: WideString);
var
s: WideString;
begin
s :=
Grob[7]+Grob[8]+Grob[5]+Grob[6]+Grob[3]+Grob[4]+Grob[1]+Grob[2];
...
end;
Delphi5 complains Incompatible types.
i tried simplifying it down to:
s := Grob[7];
which works, and:
s := Grob[7]+Grob[8];
which does not.
i can only assume that WideString[index] does not return a WideChar.
i tried forcing things to be WideChars:
s := WideChar(Grob[7])+WideChar(Grob[8]);
But that also fails:
Incompatible types
Footnotes
5: Delphi 5
The easier, and faster, in your case, is the following code:
procedure Frob(const Grob: WideString);
var
s: WideString;
begin
SetLength(s,8);
s[1] := Grob[7];
s[2] := Grob[8];
s[3] := Grob[5];
s[4] := Grob[6];
s[5] := Grob[3];
s[6] := Grob[4];
s[7] := Grob[1];
s[8] := Grob[2];
...
end;
Using a WideString(Grob[7])+WideString(Grob[8]) expression will work (it circumvent the Delphi 5 bug by which you can't make a WideString from a concatenation of WideChars), but is much slower.
Creation of a WideString is very slow: it does not use the Delphi memory allocator, but the BSTR memory allocator supplied by Windows (for OLE), which is damn slow.
Grob[7] is a WideChar; that's not the issue.
The issue seems to be that the + operator cannot act on wide chars. But it can act on wide strings, and any wide char can be cast to a wide string:
S := WideString(Grob[7]) + WideString(Grob[8]);
As Geoff pointed out my other question dealing with WideString weirdness in Delphi, i randomly tried my solution from there:
procedure Frob(const Grob: WideString);
var
s: WideString;
const
n: WideString = ''; //n=nothing
begin
s :=
n+Grob[7]+Grob[8]+Grob[5]+Grob[6]+Grob[3]+Grob[4]+Grob[1]+Grob[2];
end;
And it works. Delphi is confused about what type a WideString[index] in, so i have to beat it over the head.
I created the following code:
Function AnsiStringToStream(Const AString: AnsiString): TStream;
Begin
Result := TStringStream.Create(AString, TEncoding.ANSI);
End;
But I'm "W1057 Implicit string cast from 'AnsiString' to 'string'"
There is something wrong with him?
Thank you.
The TStringStream constructor expects a string as its parameter. When you give it an an AnsiString instead, the compiler has to insert conversion code, and the fact that you've specified the TEncoding.ANSI doesn't change that.
Try it like this instead:
Function AnsiStringToStream(Const AString: AnsiString): TStream;
Begin
Result := TStringStream.Create(string(AString));
End;
This uses an explicit conversion, and leaves the encoding-related work up to the compiler, which already knows how to take care of it.
In D2009+, TStringStream expects a UnicodeString, not an AnsiString. If you just want to write the contents of the AnsiString as-is without having to convert the data to Unicode and then back to Ansi, use TMemoryStream instead:
function AnsiStringToStream(const AString: AnsiString): TStream;
begin
Result := TMemoryStream.Create;
Result.Write(PAnsiChar(AString)^, Length(AString));
Result.Position := 0;
end;
Since AnsiString is codepage-aware in D2009+, ANY string that is passed to your function will be forced to the OS default Ansi encoding. If you want to be able to pass any 8-bit string type, such as UTF8String, without converting the data at all, use RawByteString instead of AnsiString:
function AnsiStringToStream(const AString: RawByteString): TStream;
begin
Result := TMemoryStream.Create;
Result.Write(PAnsiChar(AString)^, Length(AString));
Result.Position := 0;
end;