I read a UTF-8 file, made with Winword, into a TMemo, using the code below (I tried both methods). The file contains IPA pronunciation characters. For these characters, I see only squares. I tried different values of TMemo.Font.Charset, but it did not help.
What can I do?
Peter
// OD is a TOpenDialog
procedure TForm1.Load1Click(Sender: TObject);
{
var fileH: textFile;
newLine: RawByteString;
begin
if od.execute (self.Handle) then begin
assignFile(fileH,od.filename);
reset(fileH);
while not eof(fileH) do begin
readln(fileH,newLine);
Memo1.lines.Add(UTF8toString(newLine));
end;
closeFile(fileH);
end;
end;
}
var
FileStream: tFileStream;
Preamble: TBytes;
memStream: TMemoryStream;
begin
if od.Execute then
begin
FileStream := TFileStream.Create(od.FileName,fmOpenRead or fmShareDenyWrite);
MemStream := TMemoryStream.Create;
Preamble := TEncoding.UTF8.GetPreamble;
memStream.Write(Preamble[0],length(Preamble));
memStream.CopyFrom(FileStream,FileStream.Size);
memStream.Seek(0,soFromBeginning);
memo1.Lines.LoadFromStream(memStream);
showmessage(SysErrorMessage(GetLastError));
FileStream.Free;
memStream.Free;
end;
end;
First, you are doing too much work. Your code can be simplified to this:
procedure TForm1.Load1Click(Sender: TObject);
begin
if od.Execute then
memo1.Lines.LoadFromFile(od.FileName, TEncoding.UTF8);
end;
Second, as David said, you need to use a font that supports the Unicode characters/glyphs that are stored in the file. It is not enough to set the Font.Charset, you have to set the Font.Name to a compatible font. Look at the fonts that loursonwinny mentioned.
For these characters, I see only squares.
The squares indicate that the font does not contain glyphs for those characters. You'll need to switch to a font that does. That assumes your file has been properly encoded and that you are reading in the code points you intend to.
You can pass TEncoding.UTF8 to the LoadFromFile method to avoid having to add a BOM to the content. Finally, don't call GetLastError unless the Win32 documentation says it has meaning. Where you call it, there is no reason to believe that the value has any meaning.
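As a rough sketch of both points together (load the file as UTF-8 and select a font by name): the font name below is only an assumption, not one named in this thread, so substitute any installed font that covers the IPA range. The handler and component names are the ones from the question.

```pascal
// Needs Forms (for Screen) and SysUtils (for TEncoding) in the uses clause.
procedure TForm1.Load1Click(Sender: TObject);
const
  // Assumption: a font believed to contain IPA glyphs. Replace as needed.
  IpaFontName = 'Lucida Sans Unicode';
begin
  if not od.Execute then
    Exit;
  // Only switch fonts if the assumed font is actually installed.
  if Screen.Fonts.IndexOf(IpaFontName) >= 0 then
    Memo1.Font.Name := IpaFontName;
  // Decode the file as UTF-8 whether or not it starts with a BOM.
  Memo1.Lines.LoadFromFile(od.FileName, TEncoding.UTF8);
end;
```

If the squares remain after this, the chosen font probably lacks the glyphs, or the file is not really UTF-8.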
I would like to retrieve the file size of a file copied into the clipboard.
I read the documentation of TClipboard but I did not find a solution.
I see that TClipboard.GetAsHandle could be of some help but I was not able to complete the task.
Just from inspecting the clipboard I could see at least 2 useful formats:
FileName (Ansi) and FileNameW (Unicode) which hold the file name copied to the clipboard.
So basically you could register one of them (or both) with RegisterClipboardFormat and then retrieve the information you need, e.g.:
uses Clipbrd;
var
CF_FILE: UINT;
procedure TForm1.FormCreate(Sender: TObject);
begin
CF_FILE := RegisterClipboardFormat('FileName');
end;
function ClipboardGetAsFile: string;
var
Data: THandle;
begin
Clipboard.Open;
Data := GetClipboardData(CF_FILE);
try
if Data <> 0 then
Result := PChar(GlobalLock(Data)) else
Result := '';
finally
if Data <> 0 then GlobalUnlock(Data);
Clipboard.Close;
end;
end;
procedure TForm1.Button1Click(Sender: TObject);
begin
if Clipboard.HasFormat(CF_FILE) then
ShowMessage(ClipboardGetAsFile);
end;
Once you have the file name, just get its size or whatever other properties you want.
Note: The above was tested in Delphi 7. For Unicode versions of Delphi, use the FileNameW format.
An alternative and more practical way (also useful for multiple files copied) is to register and handle the CF_HDROP format.
Here is an example in Delphi: How to paste files from Windows Explorer into your application
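A minimal sketch of the CF_HDROP route, in case it helps: it lists the files currently on the clipboard and looks up a size for each. GetClipboardFiles and FileSizeOf are hypothetical helper names made up for this example, not part of the VCL.

```pascal
uses
  Windows, SysUtils, Classes, Clipbrd, ShellAPI;

// Hypothetical helper: fills Files with the paths held in CF_HDROP.
procedure GetClipboardFiles(Files: TStrings);
var
  Drop: HDROP;
  Count, I: Integer;
  Buffer: array[0..MAX_PATH] of Char;
begin
  Files.Clear;
  if not Clipboard.HasFormat(CF_HDROP) then
    Exit;
  Clipboard.Open;
  try
    Drop := Clipboard.GetAsHandle(CF_HDROP);
    if Drop = 0 then
      Exit;
    // Passing $FFFFFFFF as the index returns the number of files.
    Count := DragQueryFile(Drop, $FFFFFFFF, nil, 0);
    for I := 0 to Count - 1 do
    begin
      DragQueryFile(Drop, I, Buffer, Length(Buffer));
      Files.Add(Buffer);
    end;
  finally
    Clipboard.Close;
  end;
end;

// Hypothetical helper: returns the size in bytes, or -1 if not found.
function FileSizeOf(const FileName: string): Int64;
var
  SR: TSearchRec;
begin
  Result := -1;
  if FindFirst(FileName, faAnyFile, SR) = 0 then
  try
    Result := SR.Size;
  finally
    SysUtils.FindClose(SR);
  end;
end;
```

Usage would be something like calling GetClipboardFiles(List) and then FileSizeOf(List[I]) for each entry. Paths longer than MAX_PATH are not handled in this sketch.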
I have the following setup:
- Windows system language is English.
- I use Delphi 10.1 Berlin.
- In Windows Region & Language/Country set to Japan.
- Region/Administrative/Language for non-Unicode programs set to Japanese (Japan).
I have implemented communication client/server using strings.
Let's skip the question 'why not bytes' for now. I want to show the issue and find the reason why.
I write 2 things into TStringStream:
- Header: which includes its size of Int64 (8 bytes), object size of Int64 (8 bytes) and class name: Header length - 2*SizeOf(Int64).
- Object (TComponent descendant)
procedure ComponentToStream(AComponent: TComponent; AStream: TStream; out HL,OL: Int64);
var
CN: TBytes;
MS1: TMemoryStream;
begin
MS1 := TMemoryStream.Create;
try
CN := TEncoding.Unicode.GetBytes(AComponent.ClassName);
SaveComponentToStream(MS1, AComponent);
OL := MS1.Size;
MS1.Position := 0;
HL := SizeOf(HL) + SizeOf(OL) + Length(CN);
AStream.Write(HL,SizeOf(HL));
AStream.Write(OL,SizeOf(OL));
AStream.Write(CN[0], Length(CN));
MS1.SaveToStream(AStream);
finally
FreeAndNil(MS1);
end;
end;
function PrepareDataBeforeSend(Component: TComponent): string;
var
HL, OL: Int64;
SS: TStringStream;
begin
SS := TStringStream.Create('', TEncoding.Unicode);
try
ComponentToStream(Component, SS, HL, OL);
Result := SS.DataString;
SS.SaveToFile('Orginal stream data.debug');
finally
FreeAndNil(SS);
end;
end;
The result of this method is saved in a file here: click.
To verify the data, I used the code below right after calling the one above.
SS := TStringStream.Create({PrepareDataBeforeSend result}, TEncoding.Unicode);
SS.SaveToFile('New stream data.debug');
SS.Free;
The saved binary can be found here: Click
And now 2 problems:
If I don't explicitly specify the TEncoding.Unicode encoding in the constructor of TStringStream, then TEncoding.Default will be used. But for the Japanese code page it is ANSI and for English it is Unicode. As a result, the object size I read later
SS.Read(OL, SizeOf(OL));
is wrong.
Here's the binary to compare. See bytes 8-15: Click
OK, issue 1 was resolved, but still the binary I saved for verification does not match the original one: there is 1 byte missing at the end.
Can anyone tell me where the problem is?
Important: there are no issues if I have English localization!!
I want to achieve a very very basic task in Delphi: to save a string to disk and load it back. It seems trivial but I had problems doing this TWICE since I upgraded to IOUtils (and one more time before that... this is why I took the 'brilliant' decision to upgrade to IOUtils).
I use something like this:
procedure WriteToFile(CONST FileName: string; CONST uString: string; CONST WriteOp: WriteOperation);
begin
if WriteOp= (woOverwrite)
then IOUtils.TFile.WriteAllText (FileName, uString) //overwrite
else IOUtils.TFile.AppendAllText(FileName, uString); //append
end;
Simple right? What could go wrong? Well, I recently stepped into a (another) bug in IOUtils. So, TFile is buggy. The bug is detailed here.
Can anyone share an alternative (or simply your thoughts/ideas) that is not based on IOUtils and is known to work? Well... the code above also worked for a while for me... So, I know it is difficult to guarantee that a piece of code (no matter how small) will really work!
Also I would REALLY like to have my WriteToFile procedure to save the string to an ANSI file when it is possible (the uString contains only ANSI chars) and as Unicode otherwise.
Then the ReadAFile function should automagically detect the encoding and correctly read the string back.
The idea is that there are still text editors out there that will wrongly open/interpret a Unicode/UTF file. So, whenever possible, give a good old ANSI text file to the user.
So:
- Overwrite/Append
- Save as ANSI when possible
- Memory efficient (don't eat 4GB of ram when the file to load is 2GB)
- Should work with any text file (up to 2GB, obviously)
- No IOUtils (too buggy to be of use)
Then the ReadAFile function should automagically detect the encoding and correctly read the string back.
This is not possible. There exist files that are well-formed when interpreted as any text encoding. For instance, see The Notepad file encoding problem, redux.
This means that your goals are unattainable and that you need to change them.
My advice is to do the following:
- Pick a single encoding, UTF-8, and stick to it.
- If the file does not exist, create it and write UTF-8 bytes to it.
- If the file exists, open it, seek to the end, and append UTF-8 bytes.
A text editor that does not understand UTF-8 is not worth supporting. If you feel inclined, include a UTF-8 BOM when you create the file. Use TEncoding.UTF8.GetBytes and TEncoding.UTF8.GetString to encode and decode.
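For the decoding side, a minimal sketch without IOUtils might look like this. ReadUtf8File is a hypothetical name; the function assumes the file really is UTF-8, reads the whole file into memory, and skips a BOM if one is present.

```pascal
uses
  Classes, SysUtils;

// Hypothetical helper: reads a whole UTF-8 file into a string.
function ReadUtf8File(const FileName: string): string;
var
  Stream: TFileStream;
  Bytes, Bom: TBytes;
  Offset: Integer;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Bytes, Stream.Size);
    if Length(Bytes) > 0 then
      Stream.ReadBuffer(Bytes[0], Length(Bytes));
  finally
    Stream.Free;
  end;

  // Skip the UTF-8 preamble (BOM) when the file starts with it.
  Bom := TEncoding.UTF8.GetPreamble;
  Offset := 0;
  if (Length(Bom) > 0) and (Length(Bytes) >= Length(Bom)) and
     CompareMem(@Bytes[0], @Bom[0], Length(Bom)) then
    Offset := Length(Bom);

  if Length(Bytes) - Offset = 0 then
    Result := ''
  else
    Result := TEncoding.UTF8.GetString(Bytes, Offset, Length(Bytes) - Offset);
end;
```

Because everything is loaded at once, this is only reasonable for files that fit comfortably in memory; very large files would need chunked reading that respects UTF-8 sequence boundaries.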
Just use TStringList, as long as the file is smaller than about 50-100 MB (the limit depends on CPU speed):
procedure ReadTextFromFile(const AFileName: string; SL: TStringList);
begin
SL.Clear;
SL.DefaultEncoding:=TEncoding.ANSI; // we know that old files have this encoding
SL.LoadFromFile(AFileName, nil); // let TStringList detect the real encoding.
// if it cannot, it just uses DefaultEncoding.
end;
procedure WriteTextToFile(const AFileName: string; const TextToWrite: string);
var
SL: TStringList;
begin
SL:=TStringList.Create;
try
ReadTextFromFile(AFileName, SL); // read all file with encoding detection
SL.Add(TextToWrite);
SL.SaveToFile(AFileName, TEncoding.UTF8); // write file with new encoding.
// DO NOT SET SL.WriteBOM to False!!!
finally
SL.Free;
end;
end;
The Inifiles unit should support unicode. At least according to this answer: How do I read a UTF8 encoded INI file?
Inifiles are quite commonly used to store strings, integers, booleans and even stringlists.
procedure TConfig.ReadValues();
var
appINI: TIniFile;
begin
appINI := TIniFile.Create(ChangeFileExt(Application.ExeName,'.ini'));
try
FMainScreen_Top := appINI.ReadInteger('Options', 'MainScreen_Top', -1);
FMainScreen_Left := appINI.ReadInteger('Options', 'MainScreen_Left', -1);
FUserName := appINI.ReadString('Login', 'UserName', '');
FDevMode := appINI.ReadBool('Globals', 'DevMode', False);
finally
appINI.Free;
end;
end;
procedure TConfig.WriteValues(OnlyWriteAnalyzer: Boolean);
var
appINI: TIniFile;
begin
appINI := TIniFile.Create(ChangeFileExt(Application.ExeName,'.ini'));
try
appINI.WriteInteger('Options', 'MainScreen_Top', FMainScreen_Top);
appINI.WriteInteger('Options', 'MainScreen_Left', FMainScreen_Left);
appINI.WriteString('Login', 'UserName', FUserName);
appINI.WriteBool('Globals', 'DevMode', FDevMode);
finally
appINI.Free;
end;
end;
Also see the Embarcadero documentation on INI files: http://docwiki.embarcadero.com/Libraries/Seattle/en/System.IniFiles.TIniFile
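If the INI file itself has to be stored as UTF-8, one option (assuming Delphi 2009 or later, where TMemIniFile has a constructor overload that takes a TEncoding) is a sketch like the following. WriteUserName and ReadUserName are hypothetical names used only for illustration; the section and key come from the code above.

```pascal
uses
  SysUtils, IniFiles;

// Hypothetical helper: stores one string value in a UTF-8 encoded INI file.
procedure WriteUserName(const IniPath, UserName: string);
var
  Ini: TMemIniFile;
begin
  Ini := TMemIniFile.Create(IniPath, TEncoding.UTF8);
  try
    Ini.WriteString('Login', 'UserName', UserName);
    Ini.UpdateFile; // TMemIniFile only writes to disk when asked to
  finally
    Ini.Free;
  end;
end;

// Hypothetical helper: reads the value back, decoding the file as UTF-8.
function ReadUserName(const IniPath: string): string;
var
  Ini: TMemIniFile;
begin
  Ini := TMemIniFile.Create(IniPath, TEncoding.UTF8);
  try
    Result := Ini.ReadString('Login', 'UserName', '');
  finally
    Ini.Free;
  end;
end;
```

Unlike TIniFile, TMemIniFile keeps everything in memory, so forgetting UpdateFile silently loses the changes.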
Code based on David's suggestions:
{--------------------------------------------------------------------------------------------------
READ/WRITE UNICODE
--------------------------------------------------------------------------------------------------}
procedure WriteToFile(CONST FileName: string; CONST aString: String; CONST WriteOp: WriteOperation= woOverwrite; WritePreamble: Boolean= FALSE); { Write Unicode strings to a UTF8 file. It can also write a preamble }
VAR
Stream: TFileStream;
Preamble: TBytes;
sUTF8: RawByteString;
aMode: Integer;
begin
ForceDirectories(ExtractFilePath(FileName));
if (WriteOp= woAppend) AND FileExists(FileName)
then aMode := fmOpenReadWrite
else aMode := fmCreate;
Stream := TFileStream.Create(FileName, aMode OR fmShareDenyWrite); { The share mode must be OR-ed into the Mode argument; the third parameter (Rights) is ignored on Windows. Allow read during our writes }
TRY
sUTF8 := Utf8Encode(aString); { UTF-16 to UTF-8 encoding conversion }
if (aMode = fmCreate) AND WritePreamble then
begin
preamble := TEncoding.UTF8.GetPreamble;
Stream.WriteBuffer( PAnsiChar(preamble)^, Length(preamble));
end;
if aMode = fmOpenReadWrite
then Stream.Position:= Stream.Size; { Go to the end }
Stream.WriteBuffer( PAnsiChar(sUTF8)^, Length(sUTF8) );
FINALLY
FreeAndNil(Stream);
END;
end;
procedure WriteToFile (CONST FileName: string; CONST aString: AnsiString; CONST WriteOp: WriteOperation);
begin
WriteToFile(FileName, String(aString), WriteOp, FALSE);
end;
function ReadFile(CONST FileName: string): String; {Tries to autodetermine the file type (ANSI, UTF8, UTF16, etc). Works with UNC paths }
begin
Result:= System.IOUtils.TFile.ReadAllText(FileName);
end;
Okay, so I (VERY) recently started playing with Lazarus/Free Pascal, and I'm a little stuck with reading files with TMemoryStream and its streaming kin.
I'm trying to write a simple base64 encoder, that can encode strings of text, or files (like images and WAVs) to then be used in html and javascript.
The following code compiles great but I get EReadError Illegal stream image when trying to load a file. I'll include the working string only procedure for reference:
procedure TForm1.TextStringChange(Sender: TObject);
begin
Memo1.Lines.Text := EncodeStringBase64(TextString.Text);
end;
procedure TForm1.FormCreate(Sender: TObject);
begin
Memo1.Lines.Text := '';
Form1.BorderIcons := [biSystemMenu,biMinimize];
end;
procedure TForm1.BitBtn1Click(Sender: TObject);
begin
if OpenDialog1.Execute then
begin
filename := OpenDialog1.Filename;
stream := TMemoryStream.Create;
try
StrStream := TStringStream.Create(s);
try
stream.LoadFromFile(filename);
stream.Seek(0, soFromBeginning);
ObjectBinaryToText(stream, StrStream);
StrStream.Seek(0, soFromBeginning);
Memo1.Lines.Text := EncodeStringBase64(StrStream.DataString);
finally
StrStream.Free;
end;
finally
stream.Free;
end;
end;
end;
Can anyone help me out?
You get the "illegal stream image" exception because the file you're loading probably isn't a binary DFM file. That's what ObjectBinaryToText is meant to process. It's not for arbitrary data. So get rid of that command.
You can skip the TMemoryStream, too. TStringStream already has a LoadFromFile method, so you can call it directly instead of involving another buffer.
StrStream.LoadFromFile(filename);
But a string isn't really the right data structure to store your file in prior to base64-encoding it. The input to base64 encoding is binary data; the output is text. Using a text data structure as an intermediate format means you may introduce errors into your data because of difficulties in encoding certain data as valid characters. The right interface for your encoding function is this:
function Base64Encode(Data: TStream): string;
You don't need to load the entire file into memory prior to encoding it. Just open the file with a TFileStream and pass it to your encoding function. Read a few bytes from it at a time with the stream's Read method, encode them as base64, and append them to the result string. (If you find that you need them, you can use an intermediate TStringBuilder for collecting the result, and you can add different buffering around the file reads. Don't worry about those right away, though; get your program working correctly first.)
Use it something like this:
procedure TForm1.BitBtn1Click(Sender: TObject);
var
filename: string;
stream: TStream;
begin
if OpenDialog1.Execute then begin
filename := OpenDialog1.Filename;
stream := TFileStream.Create(filename, fmOpenRead);
try
Memo1.Lines.Text := Base64Encode(stream);
finally
stream.Free;
end;
end;
end;
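Base64Encode is called above but never defined. One possible sketch for Lazarus/FPC, assuming the base64 unit that already provides EncodeStringBase64, is to push the input through a TBase64EncodingStream. This is just one way to do it, not necessarily what the answer had in mind.

```pascal
uses
  Classes, SysUtils, base64;

// Encodes everything from the stream's current position to its end.
function Base64Encode(Data: TStream): string;
var
  Output: TStringStream;
  Encoder: TBase64EncodingStream;
  Remaining: Int64;
begin
  Output := TStringStream.Create('');
  try
    Encoder := TBase64EncodingStream.Create(Output);
    try
      Remaining := Data.Size - Data.Position;
      // CopyFrom treats a count of 0 as "copy everything", so guard it.
      if Remaining > 0 then
        Encoder.CopyFrom(Data, Remaining);
    finally
      // Freeing the encoder flushes the final group and the '=' padding.
      Encoder.Free;
    end;
    Result := Output.DataString;
  finally
    Output.Free;
  end;
end;
```

The input is treated as raw bytes throughout, so no text encoding is applied to the file contents before they are base64-encoded.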
I had never heard of ObjectBinaryToText() before, but it looks like a funky one. Also, what is the EncodeStringBase64() function?
In the first place, you shouldn't convert a binary stream to text in order to encode it; instead, you should Base64-encode the binary data directly. The Base64 algorithm is intended to work on an array of bytes.
Since Delphi 6, there is EncdDecd.pas unit, which implements B64 encoding methods. I'm not sure if Lazarus/FPC have this, but if they do, your code to B64 encode file should look like this (add EncdDecd to uses list):
procedure TForm1.Button1Click(Sender: TObject);
var
instream : TFileStream;
outstream: TStringStream;
begin
if OpenDialog1.Execute then
begin
instream := TFileStream.Create(OpenDialog1.FileName, fmOpenRead or fmShareDenyNone);
try
outstream := TStringStream.Create;
try
EncodeStream(instream, outstream);
Memo1.Lines.Text := outstream.DataString;
finally
outstream.Free;
end;
finally
instream.Free;
end;
end;
end;
I'm trying to save some lines of text in a code page different from my system's, such as Cyrillic, to a TFileStream using Delphi XE. However, I can't find any code sample that produces such an encoded file.
I tried using the same code as TStrings.SaveToStream however I'm not sure I implemented it correctly (the WriteBom part for example) and would like to know how it would be done elsewhere. Here is my code:
FEncoding := TEncoding.GetEncoding(1251);
FFilePool := TObjectDictionary<string,TFileStream>.Create([doOwnsValues]);
//...
procedure WriteToFile(const aFile, aText: string);
var
Preamble, Buffer: TBytes;
begin
// Create the file if it doesn't exist
if not FFilePool.ContainsKey(aFile) then
begin
// Create the file
FFilePool.Add(aFile, TFileStream.Create(aFile, fmCreate));
// Write the BOM
Preamble := FEncoding.GetPreamble;
if Length(Preamble) > 0 then
FFilePool[aFile].WriteBuffer(Preamble[0], Length(Preamble));
end;
// Write to the file
Buffer := FEncoding.GetBytes(aText);
FFilePool[aFile].WriteBuffer(Buffer[0], Length(Buffer));
end;
Thanks in advance.
Not sure what example you are looking for; maybe the following can help. The example converts Unicode strings (SL) to ANSI Cyrillic:
procedure SaveCyrillic(SL: TStrings; Stream: TStream);
var
CyrillicEncoding: TEncoding;
begin
CyrillicEncoding := TEncoding.GetEncoding(1251);
try
SL.SaveToStream(Stream, CyrillicEncoding);
finally
CyrillicEncoding.Free;
end;
end;
If I understand correctly, it's pretty simple. Declare an AnsiString with affinity for Cyrillic 1251:
type
// The code page for ANSI-Cyrillic is 1251
CyrillicString = type AnsiString(1251);
Then assign your Unicode string to one of these:
var
UnicodeText: string;
CyrillicText: CyrillicString;
....
CyrillicText := UnicodeText;
You can then write CyrillicText to a stream in the traditional manner:
if Length(CyrillicText)>0 then
Stream.WriteBuffer(CyrillicText[1], Length(CyrillicText));
There should be no BOM for an ANSI encoded text file.