I have written a C++ program that uses ofstream to create two files on my disk: details.obf and records.txt. The details file contains a single line (one integer), while records.txt contains a variable number of lines (all strings).
With the code below I can get the value stored in the details file. It's pretty simple, and I am using a TMemoryStream.
m := TMemoryStream.Create;
try
  try
    m.LoadFromFile(TPath.Combine(TPath.GetHomePath, 'details.obf'));
    m.Read(i, SizeOf(i));
    //other stuff...
  except
    //...
  end;
finally
  m.Free;
end;
With the code below instead I am reading the content of the records file:
a := TStreamReader.Create('C:\Users\betom\Desktop\records.txt');
try
  while not(a.EndOfStream) do
  begin
    Memo1.Lines.Add(a.ReadLine);
  end;
finally
  a.Free;
end;
In the second block of code I used a different class (TStreamReader), and I wrote that code by following Embarcadero's documentation. I had to use while not(a.EndOfStream) do because the length of records.txt is unknown.
I have seen that TMemoryStream (and other classes) are all subclasses of TStream. Why can't I call something like while not(m.EndOfStream) do when m is a TMemoryStream?
I cannot understand the difference between a TMemoryStream and a TStreamReader. From what I have understood, the latter can automatically read all the values in a given range while the former cannot.
Note: I have read in the docs that I can use a TStreamReader and a TStreamWriter, and both are fine when I need to create a file that contains some data. I just cannot understand what a TMemoryStream is used for if I get the same behavior with a TStreamReader.
TStreamReader is a general purpose class for reading text/character data from any stream. It does not support any other form of data in a stream and is not intended for use with any other form of data.
A stream itself might be a file on disk or data on the network or data in memory. Different stream classes exist to provide stream-access to data from those different sources.
TMemoryStream exists specifically to provide access to data in memory as a sequence of bytes, which may be binary data or text/character data or a mixture of both.
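To make the division of labour concrete, here is a minimal sketch (not taken from the original posts) in which a TMemoryStream holds the bytes and a TStreamReader interprets them as text. The sample lines and the choice of UTF-8 are assumptions made only for the example.

```pascal
program ReaderOverMemoryStream;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

var
  M: TMemoryStream;
  R: TStreamReader;
  Bytes: TBytes;
begin
  M := TMemoryStream.Create;
  try
    // Put some UTF-8 text into memory so the example is self-contained.
    Bytes := TEncoding.UTF8.GetBytes('first' + sLineBreak + 'second' + sLineBreak);
    M.WriteBuffer(Bytes[0], Length(Bytes));
    M.Position := 0;

    // The reader does not care where the bytes come from; it only needs a TStream.
    R := TStreamReader.Create(M, TEncoding.UTF8);
    try
      while not R.EndOfStream do
        Writeln(R.ReadLine);
    finally
      R.Free;  // by default the reader does not free the underlying stream
    end;
  finally
    M.Free;
  end;
end.
```

The same reader code would work unchanged if M were a TFileStream, which is the point of separating the stream (where the bytes live) from the reader (how they are interpreted).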
To answer your actual question:
I have seen that TMemoryStream (and other classes) are all subclasses
of TStream. Why can't I call something like while not(m.EndOfStream) do
when m is a TMemoryStream?
First, a correction. It is correct that TMemoryStream and some other stream classes (e.g. TFileStream) inherit from TStream. That is not the case, however, for TStreamReader (and TStringReader). These inherit from TTextReader, which, together with TTextWriter and its descendants TStreamWriter and TStringWriter, exists mainly to provide familiar classes for .NET users.
Here's the hierarchy of some of the discussed classes:
TObject
  TStream
    TCustomMemoryStream
      TMemoryStream
        TBytesStream
          TStringStream
    THandleStream
      TFileStream
    TWinSocketStream
    TOleStream
  TTextReader
    TStreamReader
    TStringReader
  TBinaryReader
The answer is that the EndOfStream property is declared in TStreamReader, in other words in a different branch of the hierarchy than TMemoryStream.
In TStream descendants you can use, for example, the Position and Size properties to determine whether you are at the end of the stream.
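As a rough illustration of the Position/Size check, the sketch below fills a TMemoryStream with three integers and then reads them back until the end is reached. The values are made up for the example.

```pascal
program StreamEndDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

var
  M: TMemoryStream;
  Value, I: Integer;
begin
  M := TMemoryStream.Create;
  try
    // Write three integers so there is something to read back.
    for I := 1 to 3 do
    begin
      Value := I * 10;
      M.WriteBuffer(Value, SizeOf(Value));
    end;
    M.Position := 0;

    // TStream has no EndOfStream property; compare Position with Size instead.
    while M.Position < M.Size do
    begin
      M.ReadBuffer(Value, SizeOf(Value));
      Writeln(Value);
    end;
  finally
    M.Free;
  end;
end.
```

ReadBuffer raises an exception if fewer bytes are available than requested, so a truncated file is reported rather than silently producing a partial value.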
I need to be able to pass the same set of structures (basically arrays of different records) over two different interfaces.
The first (legacy) interface, which is working, requires a pointer to a record and the record size.
The second, which I am attempting to develop, is type-safe and requires individual fields to be set using Get/Set methods for each field.
Existing code uses records (probably around 100 or so) with memory management being handled in a 3rd party DLL (i.e. we pass the record pointer and size to it and it deals with memory management of new records).
My original thought was to bring the memory management into my app and then copy over the data on the API call. This would be easy enough with the old interface, as I just need to be able to access SizeOf() and the pointer to the record structure held in my internal TList. The problem comes when writing the adapter for the new type-safe interface.
As these records rely on having a known size, there is heavy use of array 0..n of char static arrays. However, as soon as I try to access these via 2010-flavour RTTI, I get error messages stating 'Insufficient RTTI information available to support this operation'. Standard Delphi strings work, but old short strings don't. Unfortunately, fixing string lengths is important for the old-style interface to work properly. I've had a look at 3rd party solutions such as SuperObject and the streaming in mORMot, but they can't do this out of the box, which doesn't give me much hope of a solution that avoids significant rework.
What I want to be able to do is something like the following (don't have access to my Delphi VM at the moment, so not perfect code, but hopefully you get the gist):
type
  RTestRec = record
    a : array [0..5] of char;
    b : integer;
  end;

// hopefully this would be handled by generic <T = record> or passing instance as a pointer
procedure PassToAPI(TypeInfo: (old or new RTTI info); instance: RTestRec);
var
  Field: RTTIField;
begin
  for Field in TypeInfo.Fields do
  begin
    case Field.FieldType of
      ftArray: APICallArray(Field.FieldName, Field.Value);
      ftInteger: APICallInteger(Field.FieldName, Field.Value.AsInteger);
      ...
    end;
  end;
end;
Called as:
var
  MyTestRec: RTestRec;
begin
  MyTestRec.a := 'TEST';
  MyTestRec.b := 5;
  PassToAPI(TypeInfo(RTestRec), MyTestRec);
end;
Can generation of the missing RTTI be forced by a compiler flag or similar? (Wishful thinking, I feel!)
Can a mixture of old-style and new-style RTTI help?
Can I declare the arrays differently to give RTTI but still having the size constraints needed for old-style streaming?
Would moving from Records to Classes help? (I think I'd need to write my own streaming to an ArrayOfByte to handle the old interface)
Could a hacky solution using Attributes help? Maybe storing some of the missing RTTI information there? Feels like a bit of a long-term maintenance issue, though.
In the past, I have seen this work, but I never really understood how it should be done.
Assume we have a file of known data types, but unknown length, like a dynamic array of TSomething, where
type
  TSomething = class
    Name: String;
    Var1: Integer;
    Var2: boolean;
  end;
The problem, though, is that this object type may be extended in the future, adding more variables (e.g. Var3: String).
Then, files saved with an older version will not contain the newest variables.
The File Read procedure should somehow recognize data in blocks, with an algorithm like:
procedure Read(Path: String);
begin
  // Read Array Size
  // Read TSomething --> where does this record end? May not contain Var3!
  // --> how to know that the next data block I read is not a new object?
end;
I have seen this work with BlockRead and BlockWrite, and I assume each object should probably write its size before writing itself in the file, but I would appreciate an example (not necessarily code), to know that I am thinking towards the right direction.
Related readings I have found:
SO - Delphi 2010: How to save a whole record to a file?
Delphi Basics - BlockRead
SO - Reading/writing dynamic arrays of objects to a file - Delphi
SO - How Can I Save a Dynamic Array to a FileStream in Delphi?
In order to make this work, you need to write the element size to the file. Then when you read the file, you read that element length which allows you to read each entire element, even if your program does not know how to understand all of it.
In terms of matching up your record with the on-disk record that's easy enough if your record only contains simple types. In that scenario you can read from the file Min(ElementLength, YourRecordSize) bytes into your record.
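A minimal sketch of that idea follows, assuming records made only of simple types. The names TItemV1, TItemV2, WriteV1 and ReadAsV2 are hypothetical and exist only for this illustration; the file layout (count, element size, then the elements) is one possible choice, not a standard.

```pascal
program ElementSizeDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Math;

type
  // Hypothetical "old" layout: simple types only.
  TItemV1 = packed record
    Var1: Integer;
    Var2: Boolean;
  end;

  // Hypothetical "new" layout: the same fields plus one appended at the end.
  TItemV2 = packed record
    Var1: Integer;
    Var2: Boolean;
    Var3: Integer;
  end;

procedure WriteV1(S: TStream; const Items: array of TItemV1);
var
  Count, ElemSize, I: Integer;
begin
  Count := Length(Items);
  ElemSize := SizeOf(TItemV1);
  S.WriteBuffer(Count, SizeOf(Count));        // header: number of elements
  S.WriteBuffer(ElemSize, SizeOf(ElemSize));  // header: size of each element
  for I := 0 to High(Items) do
    S.WriteBuffer(Items[I], ElemSize);
end;

function ReadAsV2(S: TStream): TArray<TItemV2>;
var
  Count, ElemSize, ToCopy, I: Integer;
begin
  S.ReadBuffer(Count, SizeOf(Count));
  S.ReadBuffer(ElemSize, SizeOf(ElemSize));
  SetLength(Result, Count);  // new elements start out zero-filled
  ToCopy := Min(ElemSize, SizeOf(TItemV2));
  for I := 0 to Count - 1 do
  begin
    S.ReadBuffer(Result[I], ToCopy);
    // Skip bytes this reader does not understand (file written by a newer version).
    S.Position := S.Position + (ElemSize - ToCopy);
  end;
end;

var
  S: TMemoryStream;
  Old: array[0..1] of TItemV1;
  Loaded: TArray<TItemV2>;
begin
  Old[0].Var1 := 10; Old[0].Var2 := True;
  Old[1].Var1 := 20; Old[1].Var2 := False;
  S := TMemoryStream.Create;
  try
    WriteV1(S, Old);
    S.Position := 0;
    Loaded := ReadAsV2(S);
    Writeln(Loaded[1].Var1, ' ', Loaded[1].Var3);  // prints: 20 0
  finally
    S.Free;
  end;
end.
```

This only works while new fields are appended at the end and the record contains no managed types such as string, which is the limitation discussed in the next paragraph.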
But it does not look as though you actually have that scenario. Your record is in fact a class and so not suitable for memory copying. What's more, its first member is a string which is most definitely not a simple type.
Back in the day (say the 1970s), the techniques you described were how files were read. But these days programming has moved on. Saving structured data to files usually means using a more flexible and adaptable serialization format. You should be looking to using JSON, XML, YAML or similar for such tasks.
I'd say you need a method of versioning your file. That way you know what version of the record is contained in the file. Write it at the start of the file and then on reading, read in the version identifier first and then use the corresponding structure to read the rest.
If I understand you correctly, your main issue is what happens if TSomething changes. The most important thing is that you add version info to your file; this you really cannot avoid.
As for actual storage, using SQLite would most likely solve all your problems, but depending on your situation it might be overkill.
Except in exceptional circumstances, I wouldn't worry too much about extending the class. If you add a version number to the beginning of the file, you can easily convert the file after the class has changed. All you need to do is implement your solution so that adding conversions is as simple as is reasonable.
To read and write files I would prefer streams/XML/JSON (depending on the situation) over BlockRead/BlockWrite, as you don't have to implement a hack to store the version number.
In theory you could also reserve unused space in each record, so that you could avoid recreating the entire file when the class changes, up to a point (until the reserved space runs out). That may be helpful if TSomething changes often and the files are big, but most likely I would not go that route.
This is how I would do it: Include a simple version number in the header. This can be any string, integer or whatever.
Reading and writing the file is very easy (I am using pseudocode):
procedure Read(MyFile: TFile);
var
  reader: IMyFileReader;
  versionInfo: TVersionInfo;
begin
  versionInfo := MyFile.ReadVersionInfo();
  reader := ReaderFactory.CreateFromVersion(versionInfo);
  reader.Read(MyFile);
end;

type
  ReaderFactory = class
  public
    class function CreateFromVersion(VersionInfo: TVersionInfo): IMyFileReader;
  end;

class function ReaderFactory.CreateFromVersion(VersionInfo: TVersionInfo): IMyFileReader;
begin
  if VersionInfo = '0.9-Alpha' then
    result := TVersion_0_9_Alpha_Reader.Create()
  else if VersionInfo = '1.0' then
    result := TVersion1_0_Reader.Create()
  else ....
end;
This can easily be maintained and extended forever. You will never have to touch the Read-routine, but only add a new reader and enhance the factory. With a simple registration method and a TDictionary<TVersionInfo,TMyFileReaderClass>, you can even avoid having to modify the factory.
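The registration idea could look roughly like the sketch below. It is an illustration under several assumptions: the version is a plain string, readers are ordinary classes rather than the interfaces used in the pseudocode, and all names (TMyFileReader, RegisterReader, CreateReader, Describe) are hypothetical.

```pascal
program ReaderRegistryDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

type
  // Hypothetical base class; each file version gets its own descendant.
  TMyFileReader = class
  public
    constructor Create; virtual;  // virtual so class references build the right type
    function Describe: string; virtual; abstract;
  end;
  TMyFileReaderClass = class of TMyFileReader;

  TVersion09AlphaReader = class(TMyFileReader)
  public
    function Describe: string; override;
  end;

  TVersion10Reader = class(TMyFileReader)
  public
    function Describe: string; override;
  end;

var
  // Maps a version string to the class that knows how to read that version.
  Readers: TDictionary<string, TMyFileReaderClass>;

constructor TMyFileReader.Create;
begin
  inherited Create;
end;

function TVersion09AlphaReader.Describe: string;
begin
  Result := 'reader for 0.9-Alpha files';
end;

function TVersion10Reader.Describe: string;
begin
  Result := 'reader for 1.0 files';
end;

procedure RegisterReader(const Version: string; ReaderClass: TMyFileReaderClass);
begin
  Readers.AddOrSetValue(Version, ReaderClass);
end;

function CreateReader(const Version: string): TMyFileReader;
var
  ReaderClass: TMyFileReaderClass;
begin
  if not Readers.TryGetValue(Version, ReaderClass) then
    raise Exception.CreateFmt('Unsupported file version "%s"', [Version]);
  Result := ReaderClass.Create;
end;

var
  Reader: TMyFileReader;
begin
  Readers := TDictionary<string, TMyFileReaderClass>.Create;
  try
    // Supporting a new version means adding one class and one registration line.
    RegisterReader('0.9-Alpha', TVersion09AlphaReader);
    RegisterReader('1.0', TVersion10Reader);

    Reader := CreateReader('1.0');
    try
      Writeln(Reader.Describe);
    finally
      Reader.Free;
    end;
  finally
    Readers.Free;
  end;
end.
```

In a larger project each reader unit could call RegisterReader from its own initialization section, so the lookup code never needs to change.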
I have discovered (the hard way) that if a file has a valid UTF-8 BOM but contains any invalid UTF8 encodings, and is read by any of the Delphi (2009+) encoding-enabled methods such as LoadFromFile, then the result is a completely empty file with no error indication. In several of my applications, I would prefer to simply lose a few bad encodings, even if I get no error report in this case either.
Debugging reveals that MultiByteToWideChar is called twice, first to get the output buffer size, then to do the conversion. But TEncoding.UTF8 contains a private FMBToWCharFlags value for these calls, and this is initialized with a MB_ERR_INVALID_CHARS value. So the call to get the charcount returns 0 and the loaded file is completely empty. Calling this API without the flag would 'silently drop illegal code points'.
My question is how best to weave through the nest of classes in the Encoding area to work around the fact that this is a private value (and needs to be, because it is a class var for all threads). I think I could add a custom UTF8 encoding, using the guidance in Marco Cantu's Delphi 2009 book. And it could optionally raise an exception if MultiByteToWideChar has returned an encoding error, after calling it again without the flag. But that does not solve the problem of how to get my custom encoding used instead of TEncoding.UTF8.
If I could just set this up as a default for the application at initialization, perhaps by actually modifying the class var for TEncoding.UTF8, this would probably be sufficient.
Of course, I need a solution without waiting to lodge a QC report asking for a more robust design, getting it accepted, and seeing it changed.
Any ideas would be very welcome. And can someone confirm this is still an issue for XE4, which I have not yet installed?
I ran into the MB_ERR_INVALID_CHARS issue when I first updated Indy to support TEncoding, and ended up implementing a custom TEncoding-derived class for UTF-8 handling to avoid specifying MB_ERR_INVALID_CHARS. I didn't think to use a class helper.
However, this issue is not just limited to UTF-8. Any decoding failure of any of the TEncoding classes will result in a blank result, not an exception being raised. Why Embarcadero chose that route, when most of the RTL/VCL uses exceptions instead, is beyond me. Not raising an exception on error caused a fair amount of issues in Indy that had to be worked around.
This can be done pretty simply, at least in Delphi XE5 (have not checked earlier versions). Just instantiate your own TUTF8Encoding:
procedure LoadInvalidUTF8File(const Filename: string);
var
  FEncoding: TUTF8Encoding;
begin
  FEncoding := TUTF8Encoding.Create(CP_UTF8, 0, 0);
  // Instead of CP_UTF8, MB_ERR_INVALID_CHARS, 0
  try
    with TStringList.Create do
    try
      LoadFromFile(Filename, FEncoding);
      // ...
    finally
      Free;
    end;
  finally
    FEncoding.Free;
  end;
end;
The only issue here is that the IsSingleByte property for the newly instantiated TUTF8Encoding is then incorrectly set to False, but this property is not currently used anywhere in the Delphi sources.
A partial workaround is to force the UTF8 encoding to suppress MB_ERR_INVALID_CHARS globally. For me, this avoids the need for raising an exception, because I find it makes MultiByteToWideChar not quite 'silent': it actually inserts $fffd characters (Unicode 'replacement character') which I can then find in the cases where this is important. The following code does this:
unit fixutf8;

interface

uses System.Sysutils;

type
  TUTF8fixer = class helper for TMBCSEncoding
  public
    procedure setflag0;
  end;

implementation

procedure TUTF8fixer.setflag0;
{$if CompilerVersion = 31}
asm
  XOR ECX,ECX
  MOV Self.FMBToWCharFlags,ECX
end;
{$else}
begin
  Self.FMBToWCharFlags := 0;
end;
{$endif}

procedure initencoding;
begin
  (TEncoding.UTF8 as TMBCSEncoding).setflag0;
end;

initialization
  initencoding;
end.
A more useful and principled fix would require changing the calls to MultiByteToWideChar not to use MB_ERR_INVALID_CHARS, and to make an initial call with this flag so that an exception could be raised after the load is complete, to indicate that characters will have been replaced.
There are relevant QC reports on this issue, including 76571, 79042 and 111980. The first one has been resolved 'as designed'.
(Edited to work with Delphi Berlin)
Your "global" approach is not really global: it relies on the assumption that all code uses one and the same instance of TUTF8Encoding, the instance whose flags field you hacked.
It would not work if one obtains TUTF8Encoding object(s) by means other than TEncoding.GetUTF8. For example, in XE2 another method, TEncoding.GetEncoding(CP_UTF8), would create a new instance of TUTF8Encoding instead of reusing the shared FUTF8 one. Or some function might call TUTF8Encoding.Create directly.
So I'd suggest two more approaches.
The first approach is to patch the class implementation, which is somewhat hacky. You introduce your own class for the sake of obtaining a new, "fixed" constructor body.
type
  TMyUTF8Encoding = class(TUTF8Encoding)
  public
    constructor Create; override;
  end;
This constructor would be a copy of the TUTF8Encoding.Create() implementation, except that it sets the flag the way you want it. (In XE2 this is done by calling another, inherited Create(x,y,z), so you would not need access to the private field.)
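A sketch of what such a constructor might look like is given below. It assumes a Windows target and an RTL in which the three-argument Create(CodePage, MBToWCharFlags, WCharToMBFlags) is reachable from the descendant, as the XE2/XE5 remarks in this thread suggest; other versions may differ. The class name TLenientUTF8Encoding is hypothetical, and the VMT patching step is not shown.

```pascal
program LenientUtf8Demo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Hypothetical descendant whose only job is to supply a "fixed" constructor.
  TLenientUTF8Encoding = class(TUTF8Encoding)
  public
    constructor Create; override;
  end;

constructor TLenientUTF8Encoding.Create;
begin
  // Same code page as the stock class, but with both flag values cleared,
  // so MB_ERR_INVALID_CHARS is not passed to MultiByteToWideChar.
  inherited Create(CP_UTF8, 0, 0);
end;

var
  Enc: TEncoding;
  S: string;
begin
  Enc := TLenientUTF8Encoding.Create;
  try
    // 'A', then a byte that is never valid in UTF-8, then 'B'.
    S := Enc.GetString(TBytes.Create($41, $FF, $42));
    Writeln('Decoded length: ', Length(S));
  finally
    Enc.Free;
  end;
end.
```

Used on its own, an instance of this class can simply be passed to LoadFromFile; the patching described next is only needed if code you do not control must pick up the changed behaviour.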
Then you can patch the stock TUTF8Encoding VMT overriding its virtual constructor to that new constructor of yours.
You may read Delphi documentation about "internal formats" and so forth, to get the VMT layout. You would also need calling VirtualProtect (or other platform-specific function) to remove protection from VMT memory area before patching and then to restore it.
Examples to learn from
How to change the implementation (detour) of an externally declared function
https://stackoverflow.com/a/1482802/976391
Or you may try the Delphi Detours library; hopefully it can patch virtual constructors. Then again, it might be overkill to use that rather complex library for this single goal.
After you have hacked the TUTF8Encoding class, call TEncoding.FreeEncodings to remove any already created shared instances and thus trigger re-creation of the UTF8 instances with your modifications.
The second approach: if you compile your program as a single monolithic EXE, without runtime BPL packages, you can simply copy the SysUtils.pas source to your application folder and include that local copy in your project explicitly.
How to patch a method in Classes.pas
There you would change the very TUTF8Encoding implementation as you see fit in the sources and Delphi would use it.
This dead-simple (and hence equally reliable) approach would not work, though, if your projects are built to use the rtlNNN.bpl runtime package instead of being monolithic.
How can I best determine the Size of a Class from Memory?
Here is a basic sample class to work with - Note the variables serve no purpose other than for the example:
type
  TMyClass = class
  public
    fString1: string;
    fString2: string;
    fInteger1: Integer;
    fInteger2: Integer;
    constructor Create;
    destructor Destroy; override;
  end;
What I would like to do is return the same size as if it were a file on disk.
So if I saved TMyClass to File and the Size of the File was 2.4kb for example, I would like to get that value without even needing the File to be on disk (getting the size from Memory).
I have been searching and reading before asking here, this is what I have tried so far with mixed results:
InstanceSize
Using TMyClass.InstanceSize on my class only ever returns the value 12.
SizeOf
Using SizeOf(TMyClass) on my class only ever returns the value 4.
Whatever the values return I will be formatting to FileSize format such as kilobytes and megabytes etc.
I must be doing something wrong because I know for a fact there is more than 12 (or 4) bytes of data being used on my class. I also know I am referencing the correct class, it is the same class I am creating and freeing in the Form Create and Destroy events - as well as using at runtime.
Is what I am trying to do the correct way to go about it, and if so should my examples work? If they should work, I know then I need to look closer at my actual classes and objects.
The dirty way to achieve this would be for me to save my object classes to file, read the file size and then delete the temp file. But I don't like doing approaches like that, I think they are dirty and taking shortcuts. The purpose is to read the size of my classes from memory, without resorting to saving and temp files etc.
I look forward to hearing your advice and suggestions thanks.
Not sure how you got 12 for your InstanceSize on a class like that. It should be 20 (pre-D2009) or 24 (D2009 and later). The reason it's so much smaller than the actual size of your saved file is because the strings aren't held in the object itself; they're reference types, implemented as pointers to the actual string data.
Your "dirty approach" is on the right track. Pretty much the only way to find out how much disk space you need for an object like that is to actually serialize it. But you don't need to save it to disk. If you're currently saving the object with a TFileStream, (and you should be if you're not,) use a TMemoryStream instead, which will "save" it to a memory buffer. Then get the Size of the stream and that's the serialized size of your object, all without having to create and then delete a temp file.
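A minimal sketch of that suggestion follows. The question does not show how TMyClass is saved, so SaveToStream, SerializedSize and the helper WriteStr are hypothetical, and the format (a 4-byte length followed by UTF-8 bytes for each string, then the raw integers) is just one assumption about what the on-disk layout could be.

```pascal
program SerializedSizeDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

type
  TMyClass = class
  public
    fString1: string;
    fString2: string;
    fInteger1: Integer;
    fInteger2: Integer;
    procedure SaveToStream(S: TStream);  // hypothetical serializer
    function SerializedSize: Int64;
  end;

// Writes a string as a 4-byte length followed by its UTF-8 bytes.
procedure WriteStr(S: TStream; const Value: string);
var
  Bytes: TBytes;
  Len: Integer;
begin
  Bytes := TEncoding.UTF8.GetBytes(Value);
  Len := Length(Bytes);
  S.WriteBuffer(Len, SizeOf(Len));
  if Len > 0 then
    S.WriteBuffer(Bytes[0], Len);
end;

procedure TMyClass.SaveToStream(S: TStream);
begin
  WriteStr(S, fString1);
  WriteStr(S, fString2);
  S.WriteBuffer(fInteger1, SizeOf(fInteger1));
  S.WriteBuffer(fInteger2, SizeOf(fInteger2));
end;

function TMyClass.SerializedSize: Int64;
var
  M: TMemoryStream;
begin
  // "Save" into memory instead of a temp file, then ask the stream for its size.
  M := TMemoryStream.Create;
  try
    SaveToStream(M);
    Result := M.Size;
  finally
    M.Free;
  end;
end;

var
  Obj: TMyClass;
begin
  Obj := TMyClass.Create;
  try
    Obj.fString1 := 'hello';
    Obj.fString2 := 'world!';
    Writeln('InstanceSize:    ', TMyClass.InstanceSize);
    Writeln('Serialized size: ', Obj.SerializedSize);
  finally
    Obj.Free;
  end;
end.
```

Because SaveToStream takes a TStream, the same method can write to a TFileStream when the object really has to go to disk, so the reported size and the actual file size come from the same code path.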
I need to store multiple objects (most of them are TObject/non persistent) to a TMemoryStream, save the stream to disk and load it back. The objects need to be streamed one after each other. Some kind of universal container.
At the moment I put all properties/fields/variables of an object into a record and save the record to the stream. But I intend to use functions like WriteInteger, WriteString (see below), WriteBoolean, etc. to save/load data to/from the stream.
function StreamReadString(const MemStream: TMemoryStream): string;
procedure StreamWriteString(const MemStream: TMemoryStream; s: string);
However, it seems that I need to rewrite a lot of code. One of the many examples is TStringList.LoadFromStream that will not work so it needs to be rewritten. This is because TStringList needs to be the last object in the stream (it reads from current position to the end of the stream).
Anybody knows a library that provide basic functionality like this?
I am using Delphi 7 so RTTI is not that great.
See related post here
Btw, Delphi7 also has RTTI support, otherwise your forms (.dfm) could not be loaded :-)
If you use published properties, RTTI will work for you "out of the box".
Otherwise you have to do it yourself with a
procedure DefineProperties(Filer: TFiler); override;
You can take a look at how it's implemented in:
procedure TDataModule.DefineProperties(Filer: TFiler);
These are the only ways for object serialization.
But you could also try records: if you do not use arrays (strings are also arrays of char) or object properties, you can directly save and load a record to memory (stream, file, etc.). I use this in my AsmProfiler to be able to read and write many (small) results very fast (an array of records with some integer values can be saved and loaded with one Move/CopyMemory call!).
Which Delphi version? Delphi 2010 has new RTTI functionality, so you can use DeHL which has "Full generic serialization for all included types and collections".
Have you thought about using TReader and TWriter to fill your streams?
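As a rough sketch of that suggestion, the example below writes a few values one after another with TWriter and reads them back in the same order with TReader. The values and the buffer size of 4096 are arbitrary; the unit names are left unscoped so the code should also suit Delphi 7, while newer projects may prefer System.SysUtils and System.Classes.

```pascal
program ReaderWriterDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  M: TMemoryStream;
  W: TWriter;
  R: TReader;
begin
  M := TMemoryStream.Create;
  try
    W := TWriter.Create(M, 4096);
    try
      // Each value is written with a type marker, so items can follow each other.
      W.WriteString('first object');
      W.WriteInteger(42);
      W.WriteBoolean(True);
      W.WriteString('second object');
    finally
      W.Free;  // freeing the writer flushes its buffer into the stream
    end;

    M.Position := 0;
    R := TReader.Create(M, 4096);
    try
      // Values must be read back in the order they were written.
      Writeln(R.ReadString);
      Writeln(R.ReadInteger);
      Writeln(R.ReadBoolean);
      Writeln(R.ReadString);
    finally
      R.Free;
    end;
  finally
    M.Free;
  end;
end.
```

Since strings are stored with their length, a string list could be written as a count followed by its items, which avoids the "must be last in the stream" limitation of TStringList.LoadFromStream mentioned in the question.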
Why not use XML?
Write an XSD for the XML that defines the XML.
Generate a Delphi unit from that XSD using the XML Data Binding Wizard.
Put a bunch of your objects into that XML.
Save the XML to disk (or stream it to some other medium).
For more info on XML and the XML Data Binding Wizard see this answer.
Edit:
Just map your objects to the interfaces/objects generated from the XSD; or use the objects/interfaces that have been generated.
That is usually far easier than hooking into the Delphi streaming mechanism (by either writing TPersistent wrappers with published properties around your objects, going the DefineBinaryProperty way, or the TReader/TWriter/DefineProperty way).
--jeroen