Handling of Unicode Characters using Delphi 6 - delphi

I have a polling application developed in Delphi 6.
It reads a file, parses it according to a specification, performs validation, and uploads the result into a database (SQL Server 2008 Express Edition).
We had to provide support for Operating Systems having Double Byte Character Sets (DBCS) e.g. Japanese OS.
So, we changed the database fields in SQL Server from varchar to nvarchar.
Polling works fine in Operating Systems with DBCS. It also works successfully for non-DBCS Operating systems, if the
System Locale is set to Japanese/Chinese/Korean and Operating system has the respective language pack.
But if the locale is set to English, the database contains junk characters in place of the double-byte characters.
I performed a few tests but failed to identify the solution.
e.g. If I read from a UTF-8 file using a TStringList and save it to another file then, the Unicode data is saved.
But, if I use the contents of the file to run an update query using TADOQuery component then, the junk characters are shown.
The database also contains the junk characters.
PFB the sample code:
var
stlTemp : TStringList;
qry : TADOQuery;
stQuery : string;
begin
stlTemp := TStringList.Create;
qry := TADOQuery.Create(nil);
stlTemp.LoadFromFile('D:\DelphiUnicode\unicode.txt');
//stlTemp.SaveToFile('D:\DelphiUnicode\1.txt'); // This works. Even though
//the stlTemp.Strings[0] contains junk characters if seen in watch
stQuery := 'UPDATE dbo.receivers SET company = ' + QuotedStr(stlTemp.Strings[0]) +
' WHERE receiver_cd = N' + QuotedStr('Receiver');
//company is a nvarchar field in the database
qry.Connection := ADOConnection1;
with qry do
begin
Close;
SQL.Clear;
SQL.Add(stQuery);
ExecSQL;
end;
qry.Free;
stlTemp.Free
end;
The above code works fine in a DBCS Operating system.
I have tried playing with string, WideString and UTF8String, but none of them works on an English OS when the locale is set to English.
Please provide any pointers for this issue.

In non-Unicode Delphi versions, the basic rule is that you need to work with WideStrings (Unicode) instead of Strings (Ansi).
Forget about TADOQuery.SQL (TStrings), and work with TADODataSet.CommandText or TADOCommand.CommandText(WideString) or typecast TADOQuery as TADODataSet. e.g:
stlTemp: TWideStringList; // <- Unicode strings - TNT or other Unicode lib
qry: TADOQuery;
stQuery: WideString; // <- Unicode string
TADODataSet(qry).CommandText := stQuery;
RowsAffected := qry.ExecSQL;
You can also use TADOConnection.Execute(stQuery) to execute queries directly.
Be extra careful with Parametrized queries: ADODB.TParameters.ParseSQL is Ansi. If ParamCheck is true (by default) TADOCommand.SetCommandText->AssignCommandText will cause
problems if your Query is Unicode (InitParameters is Ansi).
(note that you can use ADO Command.Parameters directly - using ? chars as placeholder for the parameter instead of Delphi's convention :param_name).
QuotedStr returns Ansi string. You need a Wide version of this function (TNT)
Also, as @Arioch 'The mentioned, the TNT Unicode Controls suite is your best friend for making a Delphi Unicode application.
It has all the controls and classes you need to successfully manage Unicode tasks in your application.
In short, you need to think Wide :)
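As a rough illustration of the "Wide version of QuotedStr" point, here is a minimal sketch of such a helper for Delphi 6. The name WideQuotedStrSketch is hypothetical; it is not the TNT function and not part of the RTL. It only doubles embedded single quotes and wraps the result in quotes, keeping everything in WideString so nothing passes through the Ansi code page.

```delphi
unit WideQuoteSketch;

interface

// Hypothetical helper, not part of the Delphi 6 RTL or of TNT.
function WideQuotedStrSketch(const S: WideString): WideString;

implementation

function WideQuotedStrSketch(const S: WideString): WideString;
var
  I: Integer;
begin
  Result := '''';                  // opening quote
  for I := 1 to Length(S) do
  begin
    if S[I] = '''' then
      Result := Result + '''';     // double every embedded single quote
    Result := Result + S[I];
  end;
  Result := Result + '''';         // closing quote
end;

end.
```

With this, the statement could be built as 'UPDATE dbo.receivers SET company = N' + WideQuotedStrSketch(wsCompany) + ... (wsCompany being an assumed WideString variable) and executed through CommandText or TADOConnection.Execute as described above. Parameters are still the safer route than concatenation.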

You did not specify the database server, so that part of the investigation remains open. You should check how your database server supports Unicode: how to specify a Unicode charset for the database and for the tables, columns, indices, collations, etc. inside it. You have to ensure that the whole DB is Unicode-enabled in every detail to avoid data loss.
Generally, you should also check that your database connection (using the database access library of your choice) is Unicode-enabled. Microsoft ADO, just like OLE, should generally be Unicode-enabled. Still, check your database server manual for how to specify a Unicode codepage or charset in the connection string. A non-Unicode connection may also result in data loss.
When you say you read some Unicode file, that is ambiguous. What is a Unicode file? Is it UTF-8? Or one of the four flavours of UTF-16? Or UTF-7? Or some other Unicode Transformation Format? The usual Windows WideChar roughly corresponds to legacy UCS-2 and is expected to be the BOM-stripped Intel-endian flavour of UTF-16. http://msdn.microsoft.com/en-us/library/windows/desktop/ms221069.aspx
If the file is surely that flavour of UTF-16, then you can load it using Delphi TWideStringList or Jedi Code Library TJclWideStringList. Review your code so that you never handle your data through string variables; use WideString everywhere to avoid data loss.
Since D6 was one of the buggiest releases, I'd make sure EVERY update to Delphi is installed and then install and use JCL. JCL also provides codepage conversion functions, which might be more flexible than the plain AnsiStringVar := WideStringVar approach.
For UTF-8 file, it can be loaded by TWideStringList class of JCL (but not TJclWideStringList).
When debugging, load lines of the list to WideString variable and see that their content is preserved.
Don't write queries like that. See http://bobby-tables.com/ Even if you do not expect a malicious cracker, you can make errors yourself or meet unexpected data. Use parametrized queries, everywhere, every time! EVER!
See the example of such: http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/delphivclwin32/ADODB_TADOQuery_Parameters.html
Check that every SQL VARCHAR parameter would be ftWideString to contain Unicode, not ftString. Check the same about fields(columns).
Consider whether legacy technologies can be set aside, since supporting them will only get harder over time.
7.1. Since Microsoft ADO is deprecated (for example, newer versions of Microsoft SQL Server would not support it), consider switching to 'live' data access libraries such as AnyDAC, UniDAC, ZeosDB or some other library. Torry.net may point you to some.
7.2. Since the Delphi 6 RTL and VCL are not Unicode-ready, consider migrating your application to TNT Unicode Components, if you manage to find their free version or purchase them, or migrating to newer Delphi releases.
7.3. Since Delphi 6 is very old and long unsupported, and since it was one of the buggiest Delphi releases, consider migrating to newer Delphi versions or to free tools like CodeTyphon or Lazarus. As a bonus, Lazarus started moving to Unicode in its recent beta builds, and it is possible that by the end of the migration your application would be Unicode-ready.
7.4 Migration might be excuse and stimulus for re-factoring your application and getting rid of legacy spaghetti.
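To make the "parametrized queries" and ftWideString advice concrete, here is a minimal sketch using TADOCommand. The procedure name UpdateCompany and the parameter names :company and :cd are illustrative assumptions; the table and columns come from the question. It has not been checked against a specific Delphi 6 ADO update level. Because the SQL text itself is plain ASCII, the Ansi-only parameter parsing mentioned in the other answer should not damage it; the Unicode data travels only in the parameter values.

```delphi
unit CompanyUpdateSketch;

interface

uses
  DB, ADODB;

// Hypothetical helper: Conn is assumed to be an open TADOConnection.
procedure UpdateCompany(Conn: TADOConnection;
  const Company, ReceiverCd: WideString);

implementation

procedure UpdateCompany(Conn: TADOConnection;
  const Company, ReceiverCd: WideString);
var
  Cmd: TADOCommand;
  P: TParameter;
begin
  Cmd := TADOCommand.Create(nil);
  try
    Cmd.Connection := Conn;
    // ASCII-only statement; no user data is concatenated into it.
    Cmd.CommandText :=
      'UPDATE dbo.receivers SET company = :company WHERE receiver_cd = :cd';

    P := Cmd.Parameters.ParamByName('company');
    P.DataType := ftWideString;   // Unicode parameter, not ftString
    P.Value := Company;           // WideString goes into the Variant as an OLE string

    P := Cmd.Parameters.ParamByName('cd');
    P.DataType := ftWideString;
    P.Value := ReceiverCd;

    Cmd.Execute;
  finally
    Cmd.Free;
  end;
end;

end.
```

The value passed as Company must already be a correctly decoded WideString (for example, loaded with a Unicode-aware string list); a parameter cannot repair text that was damaged while it was held in an Ansi string.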

Related

IBStoredProc does not commit insert if returns data?

I have a stored procedure which insert/update, and then returns result.
create or alter procedure sp_update_system_sticker (
i_sticker_id integer,
i_file_name file_name_type,
i_sticker_name item_name_type,
i_group_id integer)
returns (
o_result integer)
as
begin
update
system_stickers s
set
s.file_name = :i_file_name,
s.name = :i_sticker_name,
s.fk_stickers_groups = :i_group_id
where
s.id = :i_sticker_id;
o_result = -1;
suspend;
end
I am setting it in Delphi in a IBStoredProc, and execute it as follow:
procedure TDataModule_.updateSystemSticker(stickerId, groupId: integer;
stickerName, fileName: String);
var
r : Integer;
begin
with IBStoredProc_UpdateSystemSticker do
begin
Transaction.Active := true;
ParamByName( 'I_STICKER_ID' ).AsInteger := stickerId;
ParamByName( 'I_GROUP_ID' ).AsInteger := groupID;
ParamByName( 'I_STICKER_NAME' ).AsString := stickerName;
ParamByName( 'I_FILE_NAME' ).AsString := fileName;
ExecProc;
Transaction.Commit;
end;
end;
However, it does not commit the result into the database.
If I remove the returns clause, it starts to commit.
How do I properly execute and commit a stored procedure that returns results with IBStoredProc?
The problem is the presence of SUSPEND. This makes your stored procedure a selectable procedure, and not an executable procedure. When you use a selectable procedure, then all work done since the previous fetched row will be undone when the cursor is closed (which happens on commit). If you fetched nothing, this means that it is as if no work was performed by the stored procedure*.
In other words, you need to remove the SUSPEND (an executable stored procedure outputs a single row immediately on execute without having to wait for a fetch).
I don't program Delphi, so I can't comment on the specifics of getting results in Delphi.
*: Recent versions of Firebird can prefetch rows, so this might not be entirely accurate
Since you made it a "selectable stored procedure" by adding SUSPEND PSQL statement - just do a select from it.
Use regular TIBQuery instead of TIBStoredProc with a command like
select * from StoredProcedureName( :input_param1, :input_param2, :input_param3 )
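A minimal sketch of that select-based call, assuming a TIBQuery that is already connected to the same database and transaction as the original TIBStoredProc. The function name UpdateSystemStickerViaSelect is hypothetical; the procedure and parameter names are taken from the question. The row is fetched before the commit, so that the work done before SUSPEND is not discarded when the cursor closes.

```delphi
unit StickerSelectSketch;

interface

uses
  DB, IBDatabase, IBQuery;

// Hypothetical helper: Qry is assumed to have Database and Transaction set.
function UpdateSystemStickerViaSelect(Qry: TIBQuery;
  StickerId, GroupId: Integer; const StickerName, FileName: string): Integer;

implementation

function UpdateSystemStickerViaSelect(Qry: TIBQuery;
  StickerId, GroupId: Integer; const StickerName, FileName: string): Integer;
begin
  Qry.Close;
  Qry.SQL.Text :=
    'select o_result from sp_update_system_sticker(' +
    ':i_sticker_id, :i_file_name, :i_sticker_name, :i_group_id)';

  if not Qry.Transaction.InTransaction then
    Qry.Transaction.StartTransaction;

  Qry.ParamByName('i_sticker_id').AsInteger := StickerId;
  Qry.ParamByName('i_file_name').AsString := FileName;
  Qry.ParamByName('i_sticker_name').AsString := StickerName;
  Qry.ParamByName('i_group_id').AsInteger := GroupId;

  Qry.Open;                     // runs the selectable procedure
  try
    // Reading the field means the single output row has been fetched.
    Result := Qry.FieldByName('o_result').AsInteger;
  finally
    Qry.Close;
  end;
  Qry.Transaction.Commit;       // commit only after the row was fetched
end;

end.
```

If an exception is raised, the transaction is left uncommitted here; rolling it back is left to the caller in this sketch.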
I would not recommend using IBExpress (IBX) to call stored procedures in Firebird directly. Interbase turned out to have a bug in stored procedure execution, AFAIR something wrong with error handling. To counter it, the IBX team added an intentional bug of executing SPs twice (under some conditions), which usually (not always) neutralized the Interbase bug. When the Firebird team fixed the server bug, this IBX counter-bug started breaking data. The IBX team refused to revert to normal behavior for Firebird databases, as Firebird was considered a competitor to Interbase.
Equally, IBX would not support Firebird-specific changes made after Interbase 6.x/Firebird 0.9 split. For example:
client DLL name change to avoid collision: it became fbclient.dll or fbembed.dll from gds32.dll, however IBX only supports the legacy name. It is hardcoded and can not be changed. If you have IBX sources you may patch it and recompile the library - but why bother?
Firebird's new datatypes like 64-bit integer and boolean. Again, if you have IBX sources...
Firebird's new APIs are not supported.
That said, there is an IBX add-on library by Dmitry Loginov, IBX FB Utils, and it has a number of rather comfy wrappers, on top of IBX. It alone might be a good pro-IBX argument.
The FPC (FreePascal) folks started an IBX fork they named IBX2, which will hopefully have first-class support for Firebird. I do not know about its quality or development speed, but I suspect it might turn out to be the easiest migration route out of IBX, or not.
Personally for Firebird-centric Delphi projects i prefer opensource UIB (Unified Interbase) library. However
Being "lean thin API wrapper" it is not TDataSet derived, albeit having a read-only TDataSet wrapper and trying to keep API closely resembling one of TDataSet.
being Henri's Delphi project it has little documentation (tests and examples mostly) and is abandoned by the author (albeit some other guy was adding patches later)
it has neat features like SQL scripter component (but you might need to extend it to support all Firebird new SQL commands, at least i did it to support FB2's MERGE) and for RECORD in SQLQuery do... loop (albeit you can extract it and make it into a separate add-on over any your DB library of choice)

Delphi: ADOConnection, DBASE3 and character set (bug?)

Delphi XE3, Win7 Prof.
I need to write into DBASE 3 (old format) files to export data for a DOS-like application (Clipper?).
Ok, I thought: MS DBASE driver can do this.
But I have problem with hungarian accents.
I tried this connection string:
Driver={Microsoft dBASE Driver (*.dbf)};DriverID=21;Dbq=c:\temp;Extended Properties=dBASE III;charSet=CP 852;Locale Identifier=1038;Character Set=CP 852;CODEPAGE=852
As far as I can see, it can write only ANSI files (the DOS app accepts CP852 chars).
I tried to convert the content with AnsiToOEM, but some characters are lost on save. In the record I see good content, but the saved file contains wrong accents.
The test text is "árvíztűrő tükörfúrógép".
The "í", "ó", "Ó" is missing from the result.
And I found some strange thing!
If the main form has an opened ADOConnection (the Connected property is true in the DFM), then I read good characters from the DBASE files and I can write them into the file: the ANSI characters are converted correctly. "í" is ok, "ó" is ok.
This ADOConnection object could be different than the reader.
If I close this ADOConnection in IDE mode, the opened files won't be converted, so I will see some strange accented chars, and I won't write good text into the file.
It is strange, because if I open this connection on FormCreate by code, the problem will appear...
I can read and write the ADOQuery records if the resource streamer read the ADOConnection's active (True value) "connected" property from the DFM!
I don't know what happened in the background, and how to force this ADO character transformation routine to work, but I wasted more days to find a working DBASE III exporter, and I have found only a buglike thing...
Does anyone know what is this? Why the ADO character encoder/decoder works only if I had a connected ADOConnection in DFM?
Or how I can use ADODB.Connection instead of ADOConnection object to avoid this side effect?
Thanks for every idea!
As I see I need to set the code page to fix the string for ADO.
var
s: string;
aStr1, aStr2: AnsiString;
begin
...
s := 'árvíztűrő tükörfúrógép';
aStr1 := s;
SetLength(aStr2, Length(aStr1));
AnsiToOemBuff(PAnsiChar(aStr1), PAnsiChar(aStr2), Length(aStr1));
SetCodePage(RawbyteString(aStr2), 852, False); // THIS IS THE SOLUTION
ADOQuery1.FieldByName('name').AsAnsiString := aStr2;
Otherwise something is converting my AnsiString again in the background.

Delphi 7 calling DelphiXE2 dll getting corrupt widestrings

I have a Delphi 7 application that needs to call a SOAP API that is much too new for the available SOAP importers. I have satisfied myself that D7 can't call the SOAP API without too much effort to be worth while. But I also have Delphi XE2, and that can import the SOAP and call it quite happily. So I have written a simple dll wrapper in XE2 that exposes the necessary parts of the soap interface. I can call the dll from an XE program.
In Delphi7 I took the SOAP API import file from XE, stripped out the {$SCOPED_ENUMS ON} defines and the initialization section that calls unavailable SOAP wrappers, plus changed string to widestring throughout. That compiles. I'm using FastMM with ShareMM enabled to make string passing work and avoid making everything stdcall.
The reason I'm trying to do it this way is that if it works it will make the SOAP shim very easy to code and maintain, since 90% of the code is generated by the XE2 SOAP importer, and it will mean that when we move the D7 app to a modern Delphi the code will remain largely unchanged.
But when I run it, I get weird strings (and consequent access violations). I've got simple functions that don't use the SOAP code to make the problem more obvious.
Passing a widestring from the Delphi7 exe into the DelphiXE2 dll, the string length is doubled (according to the Length() function), but there's no matching data conversion. So a widestring "123" in D7 becomes "1234...." in XE2, where the .... is whatever garbage happens to be on the stack. Viewed as byte arrays, both have half zero bytes as expected.
Passing a widestring back from XE2 dll to D7 I get the mirror effect - the string length is halved and strings are simply truncated ("1234" becomes "12").
I'm pasting code in because I know you will ask for it.
In Delphi XE2 I'm exporting these functions:
// testing
function GetString(s:string):string; export;
function AddToString(s:string):string; export;
implementation
function GetString(s:string):string;
begin
Result := '0987654321';
end;
function AddToString(s:string):string;
begin
Result := s + '| ' + IntToStr(length(s)) + ' there is more';
end;
In Delphi 7:
function GetString(s:widestring):widestring; external 'SMSShim.dll';
function AddToString(s:widestring):widestring; external 'SMSShim.dll';
procedure TForm1.btnTestGetClick(Sender: TObject);
var
s: widestring;
begin
s := widestring('1234');
Memo1.Lines.Add(' GetString: ' + GetString(s));
end;
procedure TForm1.btnTestAddClick(Sender: TObject);
var
s: widestring;
begin
s := widestring('1234567890');
Memo1.Lines.Add(' AddToString: ' + AddToString('1234567890'));
end;
I can run from either side, using the D7 executable as the host app to debug the dll. Inspecting the parameters and return values in the debugger gives the results above.
Annoyingly, if I declare the imports in delphi7 as strings I get the correct length but invalid data. Declaring as shown I get valid data, wrong lengths, and access violations when I try to return.
Making it all stdcall doesn't change the behaviour.
The obvious solution is to just write simple wrapper functions that expose exactly the functionality I need right now. I can do that, but I'd prefer the above cunning way.
The DLL in question exports functions that expect to receive UnicodeString parameters. (As you know, the string type became an alias for UnicodeString in Delphi 2009.) A Delphi 7 application cannot consume that DLL; the run-time library does not know how to operate on that type because it didn't exist back in 2002 when Delphi 7 was published.
Although the character size for UnicodeString is compatible with WideString, they are not the same types. UnicodeString is structured like the new AnsiString, so it has a length field, a reference count, a character size, and a code page. WideString has a length field, but any other metadata it carries is undocumented. WideString is simply Delphi's way of exposing the COM BSTR type.
A general rule to live by is to never export DLL functions that couldn't be consumed by C.1 In particular, this means using only C-compatible types for any function parameters and return types, so string is out, but WideString is safe because of its BSTR roots.
Change the DLL to use WideString for its parameters instead of string.
1 Maintaining C compatibility also means using calling conventions that C supports. Delphi's default register calling convention is not supported in Microsoft C, so use cdecl or stdcall instead, just like you've seen in every Windows DLL you've ever used.
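A minimal sketch of what the XE2 side could look like after that change, assuming the library keeps the name SMSShim from the question. The result is returned through an out parameter rather than as a function result; that is a choice made for this sketch, since WideString function results are a frequent source of interop trouble, and it is not something the answer above requires.

```delphi
library SMSShim;

uses
  SysUtils;

// WideString maps to BSTR, which is allocated by the COM allocator,
// so the string can cross the DLL boundary without a shared memory manager.
procedure AddToString(const S: WideString; out Res: WideString); stdcall;
begin
  Res := S + '| ' + IntToStr(Length(S)) + ' there is more';
end;

exports
  AddToString;

begin
end.
```

The matching Delphi 7 import would then be declared with the same signature and calling convention: procedure AddToString(const S: WideString; out Res: WideString); stdcall; external 'SMSShim.dll';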
There's no way to disable Unicode in Delphi XE2 (or in any version from 2009 onwards); however, there are many resources that can help you migrate your application.
White Paper: Delphi and Unicode (from Marco Cantù)
Delphi Conversion Unicode Issues
"Globalizing your Delphi applications" - Delphi Unicode Resources
Compilation of resources for migrate to Delphi 2009/2010 Unicode

HttpGetText(), autodetect charset, and convert source to UTF8

I'm using HttpGetText with Synapse for Delphi 7 Professional to get the source of a web page - but feel free to recommend any component and code.
The goal is to save some time by 'unifying' non-ASCII characters to a single charset, so I can process it with the same Delphi code.
So I'm looking for something similar to "Select All and Convert To UTF without BOM in Notepad++", if you know what I mean. ANSI instead of UTF8 would also be okay.
Webpages are encoded in 3 charsets: UTF8, "ISO-8859-1=Win 1252=ANSI" and straight up the alley HTML4 without charset spec, ie. htmlencoded Å type characters in the content.
If I need to code a PHP page that does the conversion, that's fine too. Whatever is the least code / time.
When you retrieve a webpage, its Content-Type header (or sometimes a <meta> tag inside the HTML itself) tells you which charset is being used for the data. You would decode the data to Unicode using that charset, then you can encode the Unicode to whatever you need for your processing.
I instead did the reverse conversion directly after retrieving the HTML using GpTextStream. Making the documents conform to ISO-8859-1 made them processable using straight up Delphi, which saved quite a bit of code changes. On output all the data was converted to UTF-8 :)
Here's some code. Perhaps not the prettiest solution but it certainly got the job done in less time. Note that this is for the reverse conversion.
procedure UTF8FileTo88591(fileName: string);
const bufsize=1024*1024;
var
fs1,fs2: TFileStream;
ts1,ts2: TGpTextStream;
buf:PChar;
siz:integer;
procedure LG2(ss:string);
begin
//dont log for now.
end;
begin
fs1 := TFileStream.Create(fileName,fmOpenRead);
fs2 := TFileStream.Create(fileName+'_ISO88591.txt',fmCreate);
//compatible enough for my purposes with default 'Windows/Notepad' CP 1252 ANSI and Swe ANSI codepage, Latin1 etc.
//also works for ASCII sources with htmlencoded accent chars, naturally
try
LG2('Files opened OK.');
GetMem(buf,bufsize);
ts1 := TGpTextStream.Create(fs1,tsaccRead,[],CP_UTF8);
ts2 := TGpTextStream.Create(fs2,tsaccWrite,[],ISO_8859_1);
try
siz:=ts1.Read(buf^,bufsize);
LG2(inttostr(siz)+' bytes read.');
if siz>0 then ts2.Write(buf^,siz);
finally
LG2('Bytes read and written OK.');
FreeAndNil(ts1);FreeAndNil(ts2);end;
finally FreeAndNil(fs1);FreeAndNil(fs2);FreeMem(buf);
LG2('Everything freed OK.');
end;
end; // UTF8FileTo88591

How can a text file be converted from ANSI to UTF-8 with Delphi 7?

I have written a program with Delphi 7 which searches for *.srt files on a hard drive. This program lists the path and name of these files in a memo. Now I need to convert these files from ANSI to UTF-8, but I haven't succeeded.
The Utf8Encode function takes a WideString string as parameter and returns a Utf-8 string.
Sample:
procedure ConvertANSIFileToUTF8File(AInputFileName, AOutputFileName: TFileName);
var
Strings: TStrings;
begin
Strings := TStringList.Create;
try
Strings.LoadFromFile(AInputFileName);
Strings.Text := UTF8Encode(Strings.Text);
Strings.SaveToFile(AOutputFileName);
finally
Strings.Free;
end;
end;
Take a look at GpTextStream which looks like it works with Delphi 7. It has the ability to read/write unicode files in older versions of Delphi (although does work with Delphi 2009) and should help with your conversion.
var
Latin1Encoding: TEncoding;
begin
Latin1Encoding := TEncoding.GetEncoding(28591);
try
MyTStringList.SaveToFile('some file.txt', Latin1Encoding);
finally
Latin1Encoding.Free;
end;
end;
Please read the whole answer before you start coding.
The proper answer to the question, and it is not the easy one, basically consists of three steps:
You have to determine the ANSI code page used on your computer. You can do this with the GetACP() function from the Windows API. (Important: you have to retrieve the codepage as soon as possible after the file name retrieval, because it can be changed by the user.)
You must convert your ANSI string to Unicode by calling the MultiByteToWideChar() Windows API function with the correct CodePage parameter (retrieved in the previous step). After this step you have a UTF-16 string (practically a WideString) containing the file name list.
You have to convert the Unicode string to UTF-8 using UTF8Encode() or the WideCharToMultiByte() Windows API function. This returns the UTF-8 string you need.
However, although this solution returns a UTF-8 string containing the input ANSI string, it is probably not the best way to solve your problem: the file names may already have been corrupted when the ANSI functions returned them, so proper file names are not guaranteed.
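The three steps can be sketched for Delphi 7 roughly as follows. The function name AnsiToUtf8Sketch and the unit name are hypothetical; the API calls are the ones named in the steps. The code page is passed in explicitly so the caller decides when GetACP is read.

```delphi
unit AnsiUtf8Sketch;

interface

uses
  Windows, SysUtils;

// Hypothetical helper: converts text in the given ANSI code page to UTF-8.
function AnsiToUtf8Sketch(const S: AnsiString; CodePage: UINT): AnsiString;

implementation

function AnsiToUtf8Sketch(const S: AnsiString; CodePage: UINT): AnsiString;
var
  W: WideString;
  Len: Integer;
begin
  Result := '';
  if S = '' then
    Exit;
  // First call: ask how many UTF-16 code units are needed.
  Len := MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), nil, 0);
  if Len = 0 then
    RaiseLastOSError;
  SetLength(W, Len);
  // Second call: ANSI bytes -> UTF-16.
  MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), PWideChar(W), Len);
  // UTF-16 -> UTF-8 bytes.
  Result := UTF8Encode(W);
end;

end.
```

A typical call would be Utf8Text := AnsiToUtf8Sketch(AnsiText, GetACP); where AnsiText and Utf8Text are assumed variables. No byte order mark is added.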
The proper solution to your problem is way more complicated:
If you want to be sure that your file name list is exactly clean, you have to make sure it won't get converted to ANSI at all. You can do this by explicitly using the "W" version of the file handling API's. In this case - of course - you can not use TFileStream and other ANSI file handling objects, but the Windows API calls directly.
It is not that hard, but if you already have a complex framework built on e.g. TFileStream it could be a bit of a pain in the #ss. In this case the best solution is to create a TStream descendant that uses the appropriate API's.
I hope my answer helps you or anyone who has to deal with the same problem. (I had to not so long ago.)
I did only this:
procedure TForm1.FormCreate(Sender: TObject);
begin
Strings := TStringList.Create;
end;
procedure TForm1.Button3Click(Sender: TObject);
begin
Strings.Text := UTF8Encode(Memo1.Text);
Strings.SaveToFile('new.txt');
end;
Verified with Notepad++ UTF8 without BOM
Did you mean ASCII?
ASCII is backwards compatible with UTF-8.
http://en.wikipedia.org/wiki/UTF-8
