I'm looking for the best option to store my application settings. I decided to write my own class, inheriting from TPersistent, that would hold all the available config options. Currently I'm looking for the best way to save it, and I found JvAppStorage, which looked very promising (as I'm using JVCL in my project anyway...), but it doesn't handle Unicode (WideStrings) properly. For XML files it stores the characters as entities; for INI files it seems to be stored OK, but in both cases loading the strings replaces the text with lots of question marks...
Is there any good replacement that handles Unicode as well?
Thanks in advance.
I recently converted from INI files (and dreaded XML!) to JSON for settings storage. It's just so convenient and flexible. See SuperObject.
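For illustration, here is a minimal sketch of what settings storage with SuperObject can look like (the key names and file name are made up, and the exact SuperObject calls may differ slightly between library versions):

```delphi
uses
  System.Classes, System.SysUtils, SuperObject;

procedure SaveAndLoadSettings;
var
  Cfg: ISuperObject;
  Txt: TStringList;
begin
  // build the settings object; the keys here are purely illustrative
  Cfg := SO;
  Cfg.S['UserName']    := 'Zażółć gęślą jaźń';  // Unicode survives the round trip
  Cfg.I['WindowWidth'] := 1024;
  Cfg.B['ShowToolbar'] := True;

  // write it out as indented UTF-8 JSON text
  Txt := TStringList.Create;
  try
    Txt.Text := Cfg.AsJSon(True);
    Txt.SaveToFile('settings.json', TEncoding.UTF8);
  finally
    Txt.Free;
  end;

  // ...and read it back
  Cfg := TSuperObject.ParseFile('settings.json', False);
  // Cfg.S['UserName'] etc. are now available again
end;
```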
It's quite common to use UTF-8 as the on-disk representation of Unicode data. In your code, use the Utf8String data type to hold data encoded that way, so you remember that you'll need to convert it before using it in the rest of your application.
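For example (assuming Delphi 2009 or later, where string is UnicodeString; the variable names are just for illustration):

```delphi
uses
  System.SysUtils;

procedure Utf8RoundTrip;
var
  Native: string;       // UnicodeString, what the rest of the application works with
  OnDisk: Utf8String;   // the UTF-8 encoded form that actually goes to disk
begin
  Native := 'Příliš žluťoučký kůň';

  // convert only at the I/O boundary
  OnDisk := Utf8Encode(Native);
  // ... write the OnDisk bytes to the settings file ...

  // ... and convert back when loading
  Native := Utf8ToString(OnDisk);
end;
```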
I use MSXML to store settings per user in a personal directory on the network.
It should handle Unicode as well.
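As a rough sketch (the node names and file name are invented), the stock TXMLDocument wrapper, which uses MSXML by default on Windows, can do this:

```delphi
uses
  XMLDoc, XMLIntf;

procedure SaveAndLoadXmlSettings;
var
  Doc: IXMLDocument;
  Root: IXMLNode;
begin
  // build and save the settings document
  Doc := NewXMLDocument;
  Doc.Encoding := 'UTF-8';
  Doc.Options := Doc.Options + [doNodeAutoIndent];
  Root := Doc.AddChild('settings');
  Root.AddChild('username').Text := 'Zażółć gęślą jaźń';  // Unicode is preserved
  Root.AddChild('windowwidth').Text := '1024';
  Doc.SaveToFile('settings.xml');

  // read it back
  Doc := LoadXMLDocument('settings.xml');
  Root := Doc.DocumentElement;
  // Root.ChildNodes['username'].Text etc. are available again
end;
```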
I am having a hard time loading a text file into a string list in FireMonkey on OS X when the encoding of the text file is not known.
When I just use list.LoadFromFile(filename), I get an exception regarding encoding most of the time.
list.LoadFromFile(filename, TEncoding.Unicode) will also fail when the file is ANSI, and vice versa.
There is no issue on Windows; list.LoadFromFile(filename) just works, but not on OS X.
I can't specify the encoding, because it will be unknown (users provide the text files).
Any clue how I can get around this encoding issue when running the app on a Mac?
In general this is not possible: it is quite possible to create a single file that is valid when interpreted in any of the common encodings. This has been discussed many times, for instance: The Notepad file encoding problem, redux.
I'm assuming that you are working with files that do not contain byte order marks (BOMs). Obviously, if your input files contained BOMs then you could simply check the BOM and be done.
With that assumption stated, the right solution to the problem, in a perfect world, is to know the encoding. Either pick a specific encoding which your program requires, or arrange for the user to tell you the encoding when they supply the file.
If, for whatever reason, you cannot do that then the next best thing to do is to use heuristics to attempt to guess the encoding used. I'm not aware of any Pascal code to do this. But you should be able to put something together that will work reasonably well. This answer gives an outline of a basic strategy: https://stackoverflow.com/a/20747074
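A very rough sketch of such a heuristic in Delphi (the names are mine; it checks for a BOM, then tests whether the bytes scan as valid UTF-8, and otherwise falls back to the local ANSI code page; it ignores overlong sequences and other corner cases):

```delphi
uses
  System.SysUtils, System.Classes, System.IOUtils;

// Crude UTF-8 validity scan: every lead byte must be followed by the
// right number of continuation bytes. Overlong forms are not rejected.
function LooksLikeUtf8(const Bytes: TBytes): Boolean;
var
  i, Follow: Integer;
begin
  i := 0;
  while i < Length(Bytes) do
  begin
    if Bytes[i] < $80 then
      Follow := 0
    else if (Bytes[i] and $E0) = $C0 then
      Follow := 1
    else if (Bytes[i] and $F0) = $E0 then
      Follow := 2
    else if (Bytes[i] and $F8) = $F0 then
      Follow := 3
    else
      Exit(False);                                  // invalid lead byte
    Inc(i);
    while Follow > 0 do
    begin
      if (i >= Length(Bytes)) or ((Bytes[i] and $C0) <> $80) then
        Exit(False);                                // missing continuation byte
      Inc(i);
      Dec(Follow);
    end;
  end;
  Result := True;
end;

function GuessEncoding(const Bytes: TBytes): TEncoding;
begin
  // BOM check first: UTF-8, UTF-16 LE, UTF-16 BE
  if (Length(Bytes) >= 3) and (Bytes[0] = $EF) and (Bytes[1] = $BB) and (Bytes[2] = $BF) then
    Exit(TEncoding.UTF8);
  if (Length(Bytes) >= 2) and (Bytes[0] = $FF) and (Bytes[1] = $FE) then
    Exit(TEncoding.Unicode);
  if (Length(Bytes) >= 2) and (Bytes[0] = $FE) and (Bytes[1] = $FF) then
    Exit(TEncoding.BigEndianUnicode);

  // no BOM: prefer UTF-8 if the bytes are consistent with it, otherwise assume ANSI
  if LooksLikeUtf8(Bytes) then
    Result := TEncoding.UTF8
  else
    Result := TEncoding.ANSI;
end;

procedure LoadWithGuessedEncoding(List: TStrings; const FileName: string);
begin
  List.LoadFromFile(FileName, GuessEncoding(TFile.ReadAllBytes(FileName)));
end;
```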
Over time I've rolled my own formats for saving and loading object properties, but having to revisit this, I'm wondering about using Delphi's own text DFM format. I know that this is really an 'internal' format, but the reader for it now seems pretty well defined, and it copes with all types of property. Has anyone any comments about possible pitfalls?
I wouldn't really say that DFM is an 'internal format'. Sure, Delphi uses it internally for forms and data modules, but the TReader and TWriter classes that perform the streaming are publicly accessible and even documented, so they are clearly intended for end users as well.
Now, the possible problem is when you save a stream and later one of the classes in the stream changes so that the stream is not compatible any more. You may have seen this in Delphi if you attempt to open a form saved in D2007+ in D7 (missing property). But even if it happens, it's not too hard to resolve. You will get an exception that will report the exact property that is causing the problem. You also have to register all classes that you want to stream with RegisterClass.
DFM can be stored in binary or text format. Even if you store it as binary, you can convert it to text (using ObjectBinaryToText); once in text format, it's easy to fix.
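For what it's worth, here is a minimal sketch of that round trip (TAppSettings and the file name are invented for illustration): published properties are streamed with WriteComponent and converted to and from the text DFM form with ObjectBinaryToText/ObjectTextToBinary.

```delphi
uses
  System.Classes, System.SysUtils;

type
  // a made-up settings class; the published properties are what gets streamed
  TAppSettings = class(TComponent)
  private
    FUserName: string;
    FWindowWidth: Integer;
  published
    property UserName: string read FUserName write FUserName;
    property WindowWidth: Integer read FWindowWidth write FWindowWidth;
  end;

procedure SaveSettingsAsTextDfm(Settings: TAppSettings; const FileName: string);
var
  Bin: TMemoryStream;
  Txt: TFileStream;
begin
  Bin := TMemoryStream.Create;
  Txt := TFileStream.Create(FileName, fmCreate);
  try
    Bin.WriteComponent(Settings);   // binary component stream...
    Bin.Position := 0;
    ObjectBinaryToText(Bin, Txt);   // ...converted to the readable text DFM form
  finally
    Txt.Free;
    Bin.Free;
  end;
end;

procedure LoadSettingsFromTextDfm(Settings: TAppSettings; const FileName: string);
var
  Txt: TFileStream;
  Bin: TMemoryStream;
begin
  Txt := TFileStream.Create(FileName, fmOpenRead);
  Bin := TMemoryStream.Create;
  try
    ObjectTextToBinary(Txt, Bin);
    Bin.Position := 0;
    Bin.ReadComponent(Settings);
  finally
    Bin.Free;
    Txt.Free;
  end;
end;

// If the reader ever has to create the instance by class name (e.g. ReadComponent(nil)),
// call RegisterClass(TAppSettings) once at startup, as mentioned above.
```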
So, the problems you may get happen due to incompatible changes in the structure, but those have nothing to do with DFM mechanism itself, and would also happen using any other streaming mechanism.
As for longevity, you can still open DFM's saved with D1 in the latest Delphi. So as long as you keep backward compatibility in mind, you have nothing to fear.
In conclusion, the choice of any particular format (DFM, XML, JSON, your own...) doesn't really affect longevity. They all require the same level of compatibility.
The reasons for choosing the format have more to do with decisions regarding:
interoperability with other apps/services
size/speed/human readability
But you didn't mention any of those in the question.
So I suggest using DFM over rolling your own, as it would mean less code to maintain.
I am trying to deserialize an old file format that was serialized in Delphi; it uses binary serialization. I know nothing about the structure of the file except some very high-level records that are in it.
What steps would you take to solve this problem? Any tools etc?
A good hex editor, and use the gray matter to identify structures.
If you get a hint about what kind of file it is, you can search for more specialized tools.
Running the Unix/Linux "file" command can be good too (*). See Barry's comment below for how it works. It can be a quick check for common file types like DBF, ZIP, etc. hidden behind a different extension.
(*) There are third-party builds for Windows, but they might lag behind in version. If you can do it on a recent *nix distro, that is advisable.
The serialization process simply loops over all published properties and streams their values to the file. If you do not know the exact classes that were streamed to the file, you will have a very hard time (if not an impossible one) deserializing it.
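One Delphi-specific check worth trying: if the file happens to be a standard Delphi binary component stream (those start with the four-byte signature 'TPF0'), ObjectBinaryToText can render it readable even though you do not have the original classes. A small sketch:

```delphi
uses
  System.Classes, System.SysUtils;

// Dump a (suspected) binary component stream as readable text.
// ObjectBinaryToText only parses the stream format itself, so the
// original classes do not have to be known or registered.
procedure DumpComponentStream(const InFile, OutFile: string);
var
  Src, Dst: TFileStream;
begin
  Src := TFileStream.Create(InFile, fmOpenRead or fmShareDenyWrite);
  Dst := TFileStream.Create(OutFile, fmCreate);
  try
    ObjectBinaryToText(Src, Dst);
  finally
    Dst.Free;
    Src.Free;
  end;
end;
```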
A good hex editor comes first. If the file is read without buffering (e.g. read directly from a TFileStream), you can also gain some information by using ProcMon from Sysinternals: you can see exactly what data is read in what chunks, and thus determine more quickly where the boundaries are between the structures you have already identified.
I have an external device that spits out UDP packets of binary data, and software running on an embedded system needs to read this data stream, parse it and do something useful. The binary data gets logged to a file as well. I would like to write a parser that can easily take the input directly from either the UDP stream or a file, parse the data into a specific format, and then direct the output either to a file (e.g. a MATLAB .dat file) or to another process that will do some real-time processing. Are there any resources that would help me with this, and what is the best way to go about it? I think it might make sense to use C++ streams, but I'm not familiar with creating custom output streams. Does this seem like a good approach, or is there a better way to go about it?
Thanks.
The beauty of binary data is that it is generally of a very fixed format.
A typical method of parsing it is to declare a structure that maps onto the received packets, and then to just use type-casts to read the fields as structure elements.
The beauty is that this requires no parsing.
You have to be careful about structure packing rules and endianness to make the structure map exactly the same way. The C "offsetof" and "sizeof" macros are useful for emitting some debug info to check that your structure really does map onto what you think it maps onto.
Packing rules can typically be altered either by directives (such as #pragmas) or by command-line options. Endianness you are stuck with: if it's different from what your embedded system uses, declare all the fields as bytes, or use something like the "ntoh" macro to do the byte swapping.
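The same overlay idea, sketched here in Delphi rather than C++ (the packet layout below is entirely invented): a packed record plays the role of the #pragma-packed struct, and a manual swap stands in for ntohs.

```delphi
uses
  System.SysUtils;

type
  // hypothetical packet layout, for illustration only
  TSensorPacket = packed record    // 'packed' is the equivalent of #pragma pack(1)
    Magic: UInt16;                 // big-endian on the wire in this example
    Sequence: UInt32;
    Temperature: Int16;            // hundredths of a degree
    Flags: UInt8;
  end;

// equivalent of ntohs for a 16-bit field
function SwapWord(W: UInt16): UInt16; inline;
begin
  Result := (W shr 8) or (W shl 8);
end;

procedure ParsePacket(const Buf; Len: Integer);
var
  Pkt: TSensorPacket;
begin
  // the debug check suggested above: make sure the layout is what we expect
  Assert(SizeOf(TSensorPacket) = 9, 'unexpected packing');
  if Len < SizeOf(Pkt) then
    raise Exception.Create('short packet');

  Move(Buf, Pkt, SizeOf(Pkt));       // overlay the received bytes onto the record
  Pkt.Magic := SwapWord(Pkt.Magic);  // fix endianness where the wire order differs
  // Pkt.Sequence, Pkt.Temperature, Pkt.Flags are now directly usable
end;
```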
The New Jersey Machine Code Toolkit is a scheme for decoding arbitrary binary patterns. It was originally designed for decoding instruction sets, but it ought to be just fine for decoding message formats. You provide a description of the binary format, and it synthesizes code to access the fields of that format (when valid). Thus you can refer to message fields using generated function calls rather than think about where each field is or how it is encoded.
Is there any difference or advantage to using a binary file or an XML file with TClientDataSet?
Binary will be smaller and faster.
XML will be more portable and human readable.
The Binary file will be a little smaller.
The main advantage of the XML format is that you can pass it around via HTTP(S).
Binary is smaller and faster, but only readable by TClientDataSets.
XML is larger and slower (though neither by an order of magnitude).
XML is readable by people (hand-editing it is not recommended in general, but it is doable) and by software.
Therefore it is more portable (as Nick wrote).
TClientDataSets can load and save their own style of XML, or you can use the Delphi XML Mapper tool to read and write any kind of XML.
XSLT can for instance be used to transform those XML files into any kind of text, including other XML, HTML, CSV, fixed columns, etc.
In contrast to what Tim indicates, both binary and XML can be transferred over HTTP and HTTPS. However, sending XML is often appreciated, as it is easier to trace.
Without having tested it: I guess the binary format would be quite a lot faster when reading and writing. You'd better do your own benchmarks for that, though.
Another advantage of binary might be that it cannot easily be edited, which prevents people from mucking up the data outside the application.
When using Delphi 2009, we have noticed that if the file has an extension of .XML, it will not save in binary format over an existing dfXMLUTF8-format file, even with LoadFromFile/SaveToFile. Changing the file extension to something else (.DAT, for example) allows saving the file in dfBinary. Our experience is that the binary file, in addition to being somewhat more difficult for the end user to manipulate (a plus!), is approximately 50% smaller than the dfXMLUTF8 format file.
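For reference, the packet format is selected with the Format parameter of SaveToFile; a small sketch (the file names are arbitrary, and note the extension-based behaviour described above):

```delphi
uses
  Datasnap.DBClient;   // just 'DBClient' in older Delphi versions

// Saving the same TClientDataSet in the different packet formats.
procedure SaveBothWays(CDS: TClientDataSet);
begin
  CDS.SaveToFile('data.dat', dfBinary);    // smallest and fastest, readable only by TClientDataSet
  CDS.SaveToFile('data.xml', dfXMLUTF8);   // larger, but portable and human readable
end;

// Loading does not take a format; the data packet is self-describing.
procedure LoadIt(CDS: TClientDataSet; const FileName: string);
begin
  CDS.LoadFromFile(FileName);
end;
```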