Open an ANSI file and save as a Unicode file using Delphi - delphi

For some reason, the *.UDL files on many of my client systems have lately stopped working: at some point they were saved as ANSI files, which is not compatible with the expected Unicode file format. The end result is an error dialog which states "the file is not a valid compound file".
What is the easiest way to programmatically open these files and save them as Unicode files? I know I can do this by opening each one in Notepad and saving it under the same name with "Unicode" selected in the encoding section of the Save As dialog, but I need to do this in the program to cut down on support calls.
This problem is very easy to duplicate: create a *.txt file in a directory, rename it to *.UDL, then edit it using the Microsoft editor. Then open it in Notepad and save it as an ANSI-encoded file. Try to open the UDL from the UDL editor and it will tell you it's corrupt. Then save it (using Notepad) as a Unicode-encoded file and it will open properly again.

OK, using Delphi 2009, I was able to come up with the following code, which appears to work. But is it the proper way of doing this conversion?
var
sl : TStrings;
FileName : string;
begin
FileName := fServerDir+'configuration\hdconfig4.udl';
sl := TStringList.Create;
try
sl.LoadFromFile(FileName, TEncoding.Default);
sl.SaveToFile(FileName, TEncoding.Unicode);
finally
sl.Free;
end;
end;
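One possible refinement, offered as a sketch rather than a definitive answer: if the conversion might run more than once on the same file, passing TEncoding.Default to LoadFromFile would misread a file that has already been converted. Calling LoadFromFile without an encoding lets the RTL look for a BOM first and fall back to the ANSI code page only when none is found. The procedure name ConvertUdlToUnicode is hypothetical.

```delphi
uses
  Classes, SysUtils;

// Hypothetical helper: converts a text file to UTF-16LE in place.
// Intended to be safe to call again on an already converted file.
procedure ConvertUdlToUnicode(const FileName: string);
var
  sl: TStringList;
begin
  sl := TStringList.Create;
  try
    // No encoding argument: the RTL checks for a BOM and falls back
    // to the system ANSI code page when there is none.
    sl.LoadFromFile(FileName);
    // TEncoding.Unicode is UTF-16LE; SaveToFile writes its BOM.
    sl.SaveToFile(FileName, TEncoding.Unicode);
  finally
    sl.Free;
  end;
end;
```

A UTF-8 file without a BOM would still be read as ANSI, so this assumes the input is either ANSI or carries a BOM.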

This is very simple to do with my TGpTextFile unit. I'll put together a short sample and post it here.
It should also be very simple with the new Delphi 2009 - are you maybe using it?
EDIT: This is how you can do it using my stuff in pre-2009 Delphis.
var
strAnsi : TGpTextFile;
strUnicode: TGpTextFile;
begin
strAnsi := TGpTextFile.Create('c:\0\test.udl');
try
strAnsi.Reset; // you can also specify non-default 8-bit codepage here
strUnicode := TGpTextFile.Create('c:\0\test-out.udl');
try
strUnicode.Rewrite([cfUnicode]);
while not strAnsi.Eof do
strUnicode.Writeln(strAnsi.Readln);
finally FreeAndNil(strUnicode); end;
finally FreeAndNil(strAnsi); end;
end;
License: The code fragment above belongs to the public domain. Use it any way you like.

Related

Delphi TZipFile extracts zero byte files

Using Delphi XE2 and the native TZipFile I attempt to extract the contents of a downloaded zip file (which contains 2 zipped XML files) and it always extracts zero byte files.
The file is being compressed by C# code like this:
var zipFile = new ZipFile();
foreach (Tuple<string, string> t in filesMeta) {
zipFile.AddFile(string.Format("{0}{1}", StaticVariables.WebServerFileStorage, t.Item2), "").FileName = t.Item1 + ".xml";
}
response.Clear();
response.ContentType = "application/zip";
zipFile.Save(response.OutputStream);
response.End();
The Delphi extraction code is this:
zipFile := TZipFile.Create;
try
filename := 'C:\test\57f52480-ec87-4169-a820-0a65bc4ad952.zip';
if zipFile.IsValid(filename) then begin
zipFile.Open(filename, zmRead);
zipFile.ExtractAll('C:\test\');
end;
finally
zipFile.Free;
end;
I even tried using a TStream as the source instead of a file on disk. That's actually what I want to do since the zip file is downloaded from a web server into a TStream. I tried to extract the data using the overloaded open method of TZipFile to open the stream.
That got me zero byte files so I saved the zip file to disk and tried to open the file from disk and extract. Same zero byte files are extracted.
I even tried using the class method to extract the files from the zip file on disk:
System.Zip.TZipFile.ExtractZipFile(filename, 'C:\Test\');
Same zero byte files extracted.
The zip file is valid and the 2 zipped XML files can be extracted properly by both Windows 7 native file handling and 7-Zip.
Now here is something nutty...
In desperation I tried to see what the ExtractToFile() procedure David Heffernan came up with in this question about extracting a zip to a stream would do, so I tried using it like this:
var x : integer;
var fileCount : integer;
var fileNames : TArray<string>;
begin
zipFile := TZipFile.Create;
try
filename := 'C:\test\57f52480-ec87-4169-a820-0a65bc4ad952.zip';
if zipFile.IsValid(filename) then begin
zipFile.Open(filename, zmRead);
fileCount := zipFile.FileCount;
fileNames := copy(zipFile.FileNames, 0, MaxInt);
zipFile.Close;
for x := 0 to fileCount-1 do begin
// Use David Heffernan's stream procedure
ExtractToFile(filename, x, 'C:\test\' + fileNames[x]);
end;
end;
finally
zipFile.Free;
end;
end;
And David's procedure extracts the files to disk as expected! WTF???
I am really confused why a convoluted method of extraction would work and the simple extraction method would not work. I'll use David's example if I have to but I'd prefer to get the normal extract working if possible.
Any ideas are appreciated.
Cheers!
TJ
I had this same problem.
The source code of TZipFile shows that the TStream returned by the Read function is the entire zip file, with its position set to the start of the file you asked for. So don't rewind it. Just use CopyFrom, or do whatever you want with the TStream, for the uncompressed length given in the TZipHeader.
// Read creates the stream through its out parameter; do not create one yourself
ZipFile.Read(MyFileName, ZipStream, ZipHeader);
//leave ZipStream pointer where it is!!!
SomethingElse.LoadFromStream(ZipStream, ZipHeader.UncompressedSize);
ZipStream.Free;
In my opinion, TZipFile should really load the ZipStream with only what is requested. The way this is implemented is not intuitive without first going through the TZipFile source code.
TL;DR: The solution to my problem was an external component: ZipForge (or Abbrevia).
Read on for details.
Nothing I tried except for the roundabout way of saving the file and re-opening it using David's function worked. While that would have worked, it was not optimal as it required me to save the downloaded file to disk first and reopen it for extract and then delete the zip file. My goal was to just open the downloaded stream and extract the files directly to disk. One write to disk and no temporary file.
We even tried two different C# libraries to zip the files and both gave the same results on the streamed data. The Delphi TZipFile component could not handle it.
It turns out we have a license for ZipForge, which I had forgotten about since I had not used it in ages. It handles the download stream from the C# web server and extracts the files successfully.
For reference, I also tried the Abbrevia component version 5.2 and that also successfully extracted the files from the stream.
Hopefully this will help someone else.
All the suggestions by David and Uwe were appreciated.
Cheers!
TJ

Exception with German Umlaut characters in TMemIniFile.Create

I have a .URL file containing the following text, which includes a German umlaut character:
[InternetShortcut]
URL=http://edn.embarcadero.com/article/44358
[MyApp]
Notes=Special Test geändert
Icon=default
Title=Bug fix list for RAD Studio XE8
I try to load the text with TMemIniFile:
uses System.IniFiles;
//
procedure TForm1.Button1Click(Sender: TObject);
var
BookmarkIni: TMemIniFile;
begin
// The error occurs here:
BookmarkIni := TMemIniFile.Create('F:\Bug fix list for RAD Studio XE8.url',
TEncoding.UTF8);
try
// Some code here
finally
BookmarkIni.Free;
end;
end;
This is the error message text from the debugger:
Project MyApp.exe raised exception class EEncodingError with message
'No mapping for the Unicode character exists in the target multi-byte
code page'.
When I remove the word with the German Umlaut character "geändert" from the .URL file then there is NO error.
But that's why I use TMemIniFile, because TIniFile does not work here when the text in the .URL file contains Unicode characters. (There could also be other Unicode characters in the .URL file).
So why do I get an exception here in TMemIniFile.Create?
EDIT: Found the culprit: The .URL file is in ANSI format. The error does not happen when the .URL file is in UTF-8 format. But what can I do when the file is in ANSI format?
EDIT2: I've created a workaround which does work BOTH with ANSI and UTF-8 files:
procedure TForm1.Button1Click(Sender: TObject);
var
BookmarkIni: TMemIniFile;
BookmarkIni_: TIniFile;
ThisFileIsAnsi: Boolean;
begin
try
ThisFileIsAnsi := False;
BookmarkIni := TMemIniFile.Create('F:\Bug fix list for RAD Studio XE8.url',
TEncoding.UTF8);
except
BookmarkIni_ := TIniFile.Create('F:\Bug fix list for RAD Studio XE8.url');
ThisFileIsAnsi := True;
end;
try
// Some code here
finally
if ThisFileIsAnsi then
BookmarkIni_.Free
else
BookmarkIni.Free;
end;
end;
What do you think?
It is not possible, in general, to auto-detect the encoding of a file from its contents.
A clear demonstration of this is given by this article from Raymond Chen: The Notepad file encoding problem, redux. Raymond uses the example of a file containing these two bytes:
D0 AE
Raymond goes on to show that this is a well formed file with the following four encodings: ANSI 1252, UTF-8, UTF-16BE and UTF-16LE.
The take-home lesson here is that you have to know the encoding of your file. Either agree on it by convention with whoever writes the file, or enforce the presence of a BOM.
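If you settle on the BOM convention, the check itself is small. A minimal sketch, assuming that files without a UTF-8 BOM should be treated as ANSI (the system code page); HasUtf8Bom and OpenBookmarkIni are hypothetical helper names, not part of the RTL:

```delphi
uses
  System.SysUtils, System.Classes, System.IniFiles;

// Hypothetical helper: True if the file starts with the UTF-8 BOM (EF BB BF).
function HasUtf8Bom(const FileName: string): Boolean;
var
  Stream: TFileStream;
  Bom: array[0..2] of Byte;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    // A file shorter than three bytes cannot carry the BOM.
    Result := (Stream.Read(Bom, SizeOf(Bom)) = SizeOf(Bom)) and
      (Bom[0] = $EF) and (Bom[1] = $BB) and (Bom[2] = $BF);
  finally
    Stream.Free;
  end;
end;

// Hypothetical helper: picks the encoding by convention instead of
// relying on an exception to signal the wrong guess.
function OpenBookmarkIni(const FileName: string): TMemIniFile;
begin
  if HasUtf8Bom(FileName) then
    Result := TMemIniFile.Create(FileName, TEncoding.UTF8)
  else
    Result := TMemIniFile.Create(FileName, TEncoding.Default);
end;
```

This does not detect UTF-8 files written without a BOM; those are read as ANSI, which is exactly the ambiguity described above.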
You need to decide on what the encoding of the file is, once and for all. There's no fool proof way to auto-detect this, so you'll have to enforce it from your code that creates these files.
If the creation of this file is outside your control, then you are more or less out of luck. You can try to rely on the BOM (Byte-Order-Mark) at the beginning of the file (which should be there if it is a UTF-8 file). I can't see from the documentation of TMemIniFile what the Create constructor without an encoding parameter assumes about the encoding of the file (my guess is that it follows the BOM and, if there is none, assumes ANSI, i.e. the system code page).
One thing you can do - if you decide to stick to your current method - is to change your code to:
procedure TForm1.Button1Click(Sender: TObject);
var
BookmarkIni: TCustomIniFile;
begin
// The error occurs here:
try
BookmarkIni := TMemIniFile.Create('F:\Bug fix list for RAD Studio XE8.url',
TEncoding.UTF8);
except
BookmarkIni := TIniFile.Create('F:\Bug fix list for RAD Studio XE8.url');
end;
try
// Some code here
finally
BookmarkIni.Free;
end;
end;
You don't need two separate variables, as TIniFile and TMemIniFile (as well as TRegistryIniFile) all have a common ancestor: TCustomIniFile. By declaring your variable as this common ancestor, you can instantiate (create) it as any of the class types that inherit from TCustomIniFile. The actual (run-time) type is determined by which constructor you call to create it.
But first, you should try to use
BookmarkIni := TMemIniFile.Create('F:\Bug fix list for RAD Studio XE8.url');
ie. without any encoding specified, and see if it works with both ANSI and UTF-8 files.
EDIT: Here's a test program to verify my claim made in the comments:
program Project21;
{$APPTYPE CONSOLE}
uses
IniFiles, System.SysUtils;
const
FileName = 'F:\Bug fix list for RAD Studio XE8.url';
var
TXT : TextFile;
procedure Test;
var
BookmarkIni: TCustomIniFile;
begin
try
BookmarkIni := TMemIniFile.Create(FileName,TEncoding.UTF8);
except
BookmarkIni := TIniFile.Create(FileName);
end;
try
Writeln(BookmarkIni.ReadString('MyApp','Notes','xxx'))
finally
BookmarkIni.Free;
end;
end;
begin
try
AssignFile(TXT,FileName); REWRITE(TXT);
try
WRITELN(TXT,'[InternetShortcut]');
WRITELN(TXT,'URL=http://edn.embarcadero.com/article/44358');
WRITELN(TXT,'[MyApp]');
WRITELN(TXT,'Notes=The German a umlaut consists of the following two ANSI characters: '#$C3#$A4);
WRITELN(TXT,'Icon=default');
WRITELN(TXT,'Title=Bug fix list for RAD Studio XE8');
finally
CloseFile(TXT)
end;
Test;
ReadLn
except
on E: Exception do
Writeln(E.ClassName, ': ', E.Message);
end;
end.
The rule of thumb: to read data (a file, a stream, whatever) correctly you must know the encoding! The best solution is to let the user choose the encoding or to force one, e.g. UTF-8.
Moreover, knowing only that a file is "ANSI" does not make things much easier unless you also know the code page.
A must read: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Another approach is to try to detect the encoding (like browsers do with sites when no encoding is specified). Detecting UTF is relatively easy if a BOM exists, but more often than not it is omitted. Take a look at Mozilla's universalchardet or chsdet.

How to read and write a PDF file using TStreamWriter and TStreamReader?

Here is my code:
procedure TForm1.Button2Click(Sender: TObject);
var
Reader: TStreamReader;
Writer: TStreamWriter;
begin
Reader := TStreamReader.Create('D:\Downloads\cover.pdf', TEncoding.UTF8, False);
try
Writer := TStreamWriter.Create('D:\Downloads\coverb.pdf', False, TEncoding.UTF8);
try
Writer.Write(Reader.ReadToEnd());
finally
Writer.Free;
ShowMessage('Berhasil');
end;
finally
Reader.Free();
end;
end;
With the above code, Reader.ReadToEnd() returns an empty string and coverb.pdf is empty.
I'm using Delphi XE.
PDF files are generally compressed binary files and so cannot be read as UTF8. Doing so will lead to codec errors. Remember that not all sequences of bytes are valid UTF8 sequences.
It looks like you just need to call CopyFile instead of your complex stream based code, but perhaps this is just a cut down sample.
If the file is not empty but ReadToEnd() is returning an empty string, then the TEncoding object being used to decode the file bytes into Unicode is encountering conversion errors. The RTL does not raise an exception on string conversion errors. If all you want to do is make an exact copy of the file, use CopyFile(), or use TFileStream and the TStream.CopyFrom() method.
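A minimal sketch of the TFileStream approach for a byte-for-byte copy, with no text decoding involved. CopyBinaryFile is a hypothetical name:

```delphi
uses
  Classes, SysUtils;

// Hypothetical helper: copies a file as raw bytes.
procedure CopyBinaryFile(const SourceName, DestName: string);
var
  Source, Dest: TFileStream;
begin
  Source := TFileStream.Create(SourceName, fmOpenRead or fmShareDenyWrite);
  try
    // fmCreate overwrites DestName if it already exists.
    Dest := TFileStream.Create(DestName, fmCreate);
    try
      // Count = 0 tells CopyFrom to copy the whole source stream.
      Dest.CopyFrom(Source, 0);
    finally
      Dest.Free;
    end;
  finally
    Source.Free;
  end;
end;
```

With the paths from the question, the call would be CopyBinaryFile('D:\Downloads\cover.pdf', 'D:\Downloads\coverb.pdf').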
You can use Embarcadero's ReadAllText function. Like this:
Uses IOUtils;
TFile.ReadAllText(FileName);
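It detects the encoding from a BOM and falls back to ANSI when there is none. Note that it returns decoded text, so it is not a way to copy binary data such as a PDF.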
Do you need to write a new PDF file manually? In that case you need to know the structure of the PDF file format. ISO 32000-2 specifies PDF 2.0; with it you can build your PDF file using binary streams, but it will certainly be easier to use ready-made components.
Here is one example of how to do it manually: https://blogs.embarcadero.com/how-to-create-a-pdf-file-with-delphi-and-add-an-image-to-it/ (this example is very simple and does not use compression inside the file)
But I suggest libraries such as GDPicture or Gnostice.

CopyFile docx makes hidden conversion to doc

I have a Delphi 7 program which has to copy a docx file. I'm using the Windows API CopyFile function. The problem is that this function seems to make a hidden conversion to the older doc format.
First thing: the file size increases after the copy.
Second: when opening the file in Office 2007 I get an error message telling me to check my permissions to the document or disk and to check free disk space.
And then the strange thing: if I change the extension of the copied file from docx to doc in TotalCommander, it opens normally. So it seems to make a hidden conversion, and I don't know why.
Tested on two different computers. Both Win XP Prof SP3, Office 2007 Prof Plus SP2
Any ideas?
Function body is below:
function TDlgNowySzablon.PobierzPlikNaDoc() : string;
var
  openDlg : TOpenDialog;
begin
  Result:='';
  openDlg:=TOpenDialog.Create(self);
  try
    openDlg.Filter:='Dokumenty Microsoft Word (*.doc;*.docx)|*.doc;*.docx';
    if openDlg.Execute then begin
      Result := IObsSzab.GetTempFullFileName( ExtractFileExt(openDlg.FileName) );
      // bFailIfExists = true: the copy fails if the target already exists
      if not CopyFile(PChar(openDlg.FileName),PChar(Result),true) then begin
        Result:='';
      end;
    end;
  finally
    openDlg.Free; // released even if an exception is raised above
  end;
end;
Try changing your code as follows:
Result := IObsSzab.GetTempFullFileName('.tmp');
Result := ChangeFileExt(Result, ExtractFileExt(openDlg.FileName));
I think your GetTempFullFileName function is truncating .docx to .doc. It's all guesswork though!
The CopyFile function does not modify the contents of the file.

How can a text file be converted from ANSI to UTF-8 with Delphi 7?

I have written a program with Delphi 7 which searches for *.srt files on a hard drive. The program lists the path and name of these files in a memo. Now I need to convert these files from ANSI to UTF-8, but I haven't succeeded.
The Utf8Encode function takes a WideString string as parameter and returns a Utf-8 string.
Sample:
procedure ConvertANSIFileToUTF8File(AInputFileName, AOutputFileName: TFileName);
var
Strings: TStrings;
begin
Strings := TStringList.Create;
try
Strings.LoadFromFile(AInputFileName);
Strings.Text := UTF8Encode(Strings.Text);
Strings.SaveToFile(AOutputFileName);
finally
Strings.Free;
end;
end;
Take a look at GpTextStream, which looks like it works with Delphi 7. It can read and write Unicode files in older versions of Delphi (although it also works with Delphi 2009) and should help with your conversion.
var
Latin1Encoding: TEncoding;
begin
Latin1Encoding := TEncoding.GetEncoding(28591);
try
MyTStringList.SaveToFile('some file.txt', Latin1Encoding);
finally
Latin1Encoding.Free;
end;
end;
Please read the whole answer before you start coding.
The proper answer to the question, and it is not the easy one, basically consists of three steps:
1. You have to determine the ANSI code page used on your computer. You can do this with the GetACP() function from the Windows API. (Important: retrieve the code page as soon as possible after retrieving the file names, because the user can change it.)
2. You must convert your ANSI string to Unicode by calling the MultiByteToWideChar() Windows API function with the correct CodePage parameter (retrieved in the previous step). After this step you have a UTF-16 string (practically a WideString) containing the file name list.
3. You have to convert the Unicode string to UTF-8 using UTF8Encode() or the WideCharToMultiByte() Windows API function. This returns the UTF-8 string you need.
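The three steps above can be sketched roughly as follows for Delphi 7. AnsiToUtf8 is a hypothetical helper name, and error handling is kept minimal:

```delphi
uses
  Windows, SysUtils;

// Hypothetical helper: converts text in the given ANSI code page to UTF-8.
function AnsiToUtf8(const S: AnsiString; CodePage: UINT): UTF8String;
var
  W: WideString;
  Len: Integer;
begin
  Result := '';
  if S = '' then
    Exit;
  // First call: ask how many UTF-16 code units the result needs.
  Len := MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), nil, 0);
  if Len = 0 then
    RaiseLastOSError;
  SetLength(W, Len);
  // Second call: perform the ANSI -> UTF-16 conversion.
  if MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S),
    PWideChar(W), Len) = 0 then
    RaiseLastOSError;
  // UTF-16 -> UTF-8.
  Result := UTF8Encode(W);
end;
```

A call such as AnsiToUtf8(SomeAnsiText, GetACP) covers step 1 by passing the current ANSI code page; SomeAnsiText is a placeholder for your own string.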
However, although this solution returns a UTF-8 string containing the input ANSI string, it is probably not the best way to solve your problem: the file names may already have been corrupted when the ANSI functions returned them, so proper file names are not guaranteed.
The proper solution to your problem is much more complicated:
If you want to be sure that your file name list is completely clean, you have to make sure it never gets converted to ANSI at all. You can do this by explicitly using the "W" versions of the file handling APIs. In this case, of course, you cannot use TFileStream and other ANSI file handling objects; you have to call the Windows API directly.
It is not that hard, but if you already have a complex framework built on e.g. TFileStream it could be a bit of a pain in the #ss. In this case the best solution is to create a TStream descendant that uses the appropriate APIs.
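A rough sketch of such a descendant, assuming read-only access is enough. TWideFileReadStream is a hypothetical class name. It builds on THandleStream, which does not close its handle, so the destructor does that itself:

```delphi
uses
  Windows, SysUtils, Classes;

type
  // Hypothetical class: read-only file stream opened through CreateFileW,
  // so the file name is never converted to ANSI.
  TWideFileReadStream = class(THandleStream)
  private
    FOpened: Boolean;
  public
    constructor Create(const FileName: WideString);
    destructor Destroy; override;
  end;

constructor TWideFileReadStream.Create(const FileName: WideString);
var
  H: THandle;
begin
  H := CreateFileW(PWideChar(FileName), GENERIC_READ, FILE_SHARE_READ, nil,
    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if H = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  inherited Create(H);
  FOpened := True; // only set once the handle is known to be valid
end;

destructor TWideFileReadStream.Destroy;
begin
  // THandleStream leaves the handle open, so close it here.
  if FOpened then
    CloseHandle(Handle);
  inherited Destroy;
end;
```

Existing code that accepts a TStream can then read from it unchanged; writing would need a second variant with different access and creation flags.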
I hope my answer helps you or anyone who has to deal with the same problem. (I had to not so long ago.)
I did only this:
procedure TForm1.FormCreate(Sender: TObject);
begin
Strings := TStringList.Create;
end;
procedure TForm1.Button3Click(Sender: TObject);
begin
Strings.Text := UTF8Encode(Memo1.Text);
Strings.SaveToFile('new.txt');
end;
Verified with Notepad++: UTF-8 without BOM.
Did you mean ASCII?
ASCII is backwards compatible with UTF-8.
http://en.wikipedia.org/wiki/UTF-8
