Does anybody know which charset is used in Moldova? We need to prepare our software (and database) for Moldova. I guess UTF-8 should work, shouldn't it?
UTF-8 works for everything :-)
The question is whether your software will need to interface with "native" applications. If so, it may need to understand the encodings used by that software. Those are most likely ISO-8859-5 for Cyrillic script and ISO-8859-16 for Latin script.
Moldova has some controversy over which script to use (Transnistria uses Moldovan Cyrillic, while the mainland uses Latin script with lots of diacritics).
UTF-8 is always a good choice, anyway.
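If you do end up interfacing with such native applications, re-encoding their data to UTF-8 at the boundary is simple. A minimal Delphi sketch, assuming the legacy side uses ISO-8859-5 (Windows code page 28595); LegacyToUtf8 is just an illustrative name, and you should substitute whatever code page the native software really uses:

uses
  System.SysUtils;

// Hypothetical helper: re-encode bytes from a legacy Cyrillic-script
// application as UTF-8. 28595 is the Windows code page for ISO-8859-5.
function LegacyToUtf8(const LegacyBytes: TBytes): TBytes;
var
  Legacy: TEncoding;
begin
  Legacy := TEncoding.GetEncoding(28595); // adjust to the real legacy encoding
  try
    Result := TEncoding.UTF8.GetBytes(Legacy.GetString(LegacyBytes));
  finally
    Legacy.Free; // instances returned by GetEncoding must be freed
  end;
end;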
I am writing a Delphi 2009 program that sends escape commands to a label printer to print barcodes. Referring to Sending printer specific commands, I can use Windows.Escape() to do the job. But my question is: our database stores UTF-8 data (for storing different languages), so may I ask whether Windows.Escape() accepts UTF-8 data?
Thanks
*I discovered that Escape accepts PAnsiChar...
When using PASSTHROUGH, as the linked code does, the Escape API accepts raw 8-bit data, which is not processed in any way by Escape. The data is passed directly to the device.
You can learn about the Escape function from its documentation: https://msdn.microsoft.com/en-us/library/windows/desktop/dd162701.aspx
If the printer understands UTF-8, then your approach should work. But if the printer does not understand UTF-8, it will fail. In other words, this is not really a question about Escape, but rather a question about your printer. You will need to consult its documentation.
Reading between the lines of your question, it seems that you are letting the encoding used in your database drive your thinking regarding printing. That seems to me to be mistaken. There's no connection between your database and the printer. Whether or not your printer understands UTF-8 is unrelated to your database text encoding. You need to first work out what encoding the printer needs. If it is not the same as used by the database, then you will need to convert. Converting from one encoding to another is usually straightforward.
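For example, here is a hedged sketch of that conversion plus the PASSTHROUGH call. The choice of code page 850 and the SendTextRaw name are assumptions; per the Escape documentation, the PASSTHROUGH input buffer is a WORD byte count followed by the raw bytes:

uses
  Winapi.Windows, Vcl.Printers, System.SysUtils;

// Sketch: convert UTF-8 database text to the printer's (assumed)
// code page and send it to the device unmodified. Call this while a
// print job is open (between Printer.BeginDoc and Printer.EndDoc).
procedure SendTextRaw(const Utf8Bytes: TBytes);
var
  PrinterEnc: TEncoding;
  Data, Buffer: TBytes;
  Count: Word;
begin
  PrinterEnc := TEncoding.GetEncoding(850); // assumption: check your printer manual
  try
    Data := TEncoding.Convert(TEncoding.UTF8, PrinterEnc, Utf8Bytes);
  finally
    PrinterEnc.Free;
  end;
  Count := Length(Data);
  SetLength(Buffer, SizeOf(Word) + Count);
  Move(Count, Buffer[0], SizeOf(Word));         // leading WORD byte count
  if Count > 0 then
    Move(Data[0], Buffer[SizeOf(Word)], Count); // followed by the raw bytes
  Escape(Printer.Canvas.Handle, PASSTHROUGH, Length(Buffer),
    PAnsiChar(@Buffer[0]), nil);
end;

If the printer turns out to understand UTF-8 natively, the conversion step disappears and you pass the UTF-8 bytes straight through.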
I am having a hard time loading a text file into a string list in FireMonkey on OS X when the encoding of the text file is not known.
When I just use list.LoadFromFile(filename), I get an exception regarding encoding most of the time.
list.LoadFromFile(filename, TEncoding.Unicode) will also fail when the file is ANSI, and vice versa.
There is no issue on Windows; list.LoadFromFile(filename) just works, but not on OS X.
I can't specify the encoding, because it will be unknown (users provide the text files).
Any clue how I can get around this encoding issue when running the app on a Mac?
In general this is not possible: one and the same file can be perfectly valid when interpreted in several common encodings. This has been discussed many times, for instance: The Notepad file encoding problem, redux.
I'm assuming that you are working with files that do not contain byte order marks (BOMs). Obviously, if your input files contained BOMs then you could simply check the BOM and be done.
With that assumption stated, the right solution to the problem, in a perfect world, is to know the encoding. Either pick a specific encoding which your program requires, or arrange for the user to tell you the encoding when they supply the file.
If, for whatever reason, you cannot do that then the next best thing to do is to use heuristics to attempt to guess the encoding used. I'm not aware of any Pascal code to do this. But you should be able to put something together that will work reasonably well. This answer gives an outline of a basic strategy: https://stackoverflow.com/a/20747074
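To make that concrete, here is a minimal sketch of one common heuristic: if a BOM-less file parses as well-formed UTF-8, load it as UTF-8, otherwise fall back to ANSI. LooksLikeUtf8 and LoadGuessingEncoding are hypothetical names, and the check is simplified (it does not reject overlong sequences or invalid code points):

uses
  System.SysUtils, System.Classes, System.IOUtils;

// Returns True if Bytes form well-formed UTF-8 sequences. Pure ASCII
// also passes, which is harmless since ASCII decodes the same either way.
function LooksLikeUtf8(const Bytes: TBytes): Boolean;
var
  I, Follow: Integer;
begin
  I := 0;
  while I < Length(Bytes) do
  begin
    if Bytes[I] < $80 then
      Follow := 0                               // single byte (ASCII)
    else if (Bytes[I] and $E0) = $C0 then
      Follow := 1                               // 2-byte sequence
    else if (Bytes[I] and $F0) = $E0 then
      Follow := 2                               // 3-byte sequence
    else if (Bytes[I] and $F8) = $F0 then
      Follow := 3                               // 4-byte sequence
    else
      Exit(False);                              // invalid lead byte
    Inc(I);
    while Follow > 0 do
    begin
      if (I >= Length(Bytes)) or ((Bytes[I] and $C0) <> $80) then
        Exit(False);                            // bad continuation byte
      Inc(I);
      Dec(Follow);
    end;
  end;
  Result := True;
end;

procedure LoadGuessingEncoding(List: TStrings; const FileName: string);
begin
  if LooksLikeUtf8(TFile.ReadAllBytes(FileName)) then
    List.LoadFromFile(FileName, TEncoding.UTF8)
  else
    List.LoadFromFile(FileName, TEncoding.ANSI);
end;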
I am working on an application based on Java and JavaScript (Dojo). When the user enters Danish characters, they are converted into question marks. I have checked that only UTF-8 encoding is used throughout the application. I have also tried different encoding schemes, but to no effect.
One solution I found suggested saving the data to a Notepad file and then using that... but that also yields nothing.
Can anybody suggest what might be causing this issue?
Appreciate your help!
Thank you.
The Wikipedia entry for Subversion contains a paragraph about problems with the different forms of Unicode composition:
"While Subversion stores filenames as Unicode, it does not specify if precomposition or decomposition is used for certain accented characters (such as é). Thus, files added in SVN clients running on some operating systems (such as OS X) use decomposition encoding, while clients running on other operating systems (such as Linux) use precomposition encoding, with the consequence that those accented characters do not display correctly if the local SVN client is not using the same encoding as the client used to add the files."
While this describes a specific problem with Subversion client implementations, I am not sure whether the underlying Unicode composition problem could also appear in regular Delphi applications. I guess the problem can only arise if Delphi applications are able to use both composition forms (maybe in Delphi XE2). If so, what can Delphi developers do to avoid it?
There is a minor display issue in that many fonts used on Windows won't render the decomposed form in the ideal way, by using the combined glyph for both the letter and the diacritic. Instead it falls back to rendering the letter and then overlaying the standalone diacritical mark on top, which typically results in a less visually pleasing, potentially lopsided grapheme.
However, that is not the issue the Subversion bug referenced from the Wikipedia article is talking about. It's actually completely fine to check filenames into SVN that contain composed or decomposed character sequences; SVN neither knows nor cares about composition, it just uses the Unicode code points as-is. As long as the backend filesystem leaves filenames in the same state as they were put in, all is fine.
Windows and Linux both have filesystems that are equally blind to composition. Mac OS X, unfortunately, does not. Both HFS+ and UFS filesystems perform ‘normalisation’ to decomposed form before storing an incoming filename, so the filename you get back won't necessarily be the same sequence of Unicode code points you put in.
It is this [IMO: insane] behaviour that confuses SVN—and many other programs—when being run on OS X. It's particularly likely to bite because Apple happened to choose decomposed (NFD) as their normalisation form, whereas most of the rest of the world uses composed (NFC) characters.
(And it's not even real NFD, but an incompatible Apple-only variant. Joy.)
The best way to cope with this, if you can, is never to rely on the exact filename something's stored under. If you only ever read a file from a given name, that's fine, as it'll be normalised to match the filesystem at the time. But if you're reading a directory listing and trying to match the filenames you find there against what you expected the filename to be (which is what Subversion is doing), you're going to get mismatches.
To do a filename match reliably you would have to detect that you're running on OS X, and manually normalise both the filename and the string to some normal form (NFC or NFD) before doing the comparison. You shouldn't do this on other OSes which treat the two forms as different.
AFAICT, both forms should produce the same results when displayed, and both are valid Unicode, so I don't quite see the problem there. A display routine should be able to handle both, provided decomposition is catered for: a precomposed code point é should display as-is, while the decomposed sequence e + combining acute should also display as é.
The problem is not display, IMO, it is comparison, either for equality (which fails if the two strings use different composition forms) or lexically, i.e. for sorting. That is why one should normalize to one form, as David says. That way there are no ambiguities anymore.
The same problem could arise in any application that deals with text. How to avoid it depends on what operations the application is performing and the question lacks specific details. Mostly I think you'd solve such problems by normalizing the text. This involves using a single preferred representation whenever you encounter ambiguity of encoding.
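As a hedged illustration of that normalization step on the Windows side (OS X has its own platform facilities for this), using the NormalizeString API from normaliz.dll, available from Windows Vista on; the helper names are mine and error handling is simplified:

uses
  System.SysUtils;

const
  NormalizationC = 1; // NFC, the form most of the world uses

// Win32 API, exported by normaliz.dll on Vista and later.
function NormalizeString(NormForm: Integer; lpSrcString: PWideChar;
  cwSrcLength: Integer; lpDstString: PWideChar;
  cwDstLength: Integer): Integer; stdcall; external 'normaliz.dll';

// Bring a string to NFC so composed and decomposed spellings of the
// same text become byte-for-byte identical.
function ToNFC(const S: string): string;
var
  Len: Integer;
begin
  // First call estimates the required buffer, second call converts.
  Len := NormalizeString(NormalizationC, PWideChar(S), Length(S), nil, 0);
  SetLength(Result, Len);
  Len := NormalizeString(NormalizationC, PWideChar(S), Length(S),
    PWideChar(Result), Len);
  if Len <= 0 then
    RaiseLastOSError;
  SetLength(Result, Len);
end;

// Compare two names while ignoring composition differences.
function SameTextNormalized(const A, B: string): Boolean;
begin
  Result := ToNFC(A) = ToNFC(B);
end;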
I'm looking for the best option to store my application settings. I decided to write my own class, inheriting from TPersistent, which would store all the available config options. Currently I'm looking for the best way to save it, and I found JvAppStorage, which looked very promising (as I'm using JVCL in my project anyway...) but it doesn't handle Unicode (WideStrings) properly. For XML files it stores the characters as entities; for INI files the data seems to be stored OK; but in both cases loading the strings back replaces the text with lots of question marks...
Is there any good replacement that handles Unicode as well?
Thanks in advance.
I recently converted from INI files (and dreaded XML!) to JSON for settings storage. It's just so convenient and flexible. See SuperObject.
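A rough sketch of what that looks like; the calls below (SO, the S[]/I[] accessors, SaveTo, ParseFile) are from memory of the SuperObject API, so double-check them against the library's own documentation:

uses
  SuperObject;

procedure SaveAndLoadSettings;
var
  Cfg: ISuperObject;
begin
  // Build and save
  Cfg := SO('{}');              // empty JSON object
  Cfg.S['username'] := 'søren'; // Unicode survives the round trip
  Cfg.I['timeout'] := 30;
  Cfg.SaveTo('settings.json');

  // Load and read back
  Cfg := TSuperObject.ParseFile('settings.json', False);
  Writeln(Cfg.S['username']);
end;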
It's quite common to use UTF-8 as the on-disk representation of Unicode data. In your code, use the UTF8String data type to hold data encoded that way, so you remember that you'll need to convert it before using it in the rest of your application.
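A minimal sketch of that boundary discipline, using the standard UTF8Encode and UTF8ToString helpers:

uses
  System.SysUtils;

// UTF8String tags the data, so the conversion to and from the native
// UTF-16 string type stays explicit at the application boundary.
procedure RoundTripExample;
var
  OnDisk: UTF8String;
  S: string;
begin
  OnDisk := UTF8Encode('café'); // narrow, UTF-8 encoded bytes for disk
  S := UTF8ToString(OnDisk);    // widen back before general use
end;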
I use MSXML to store settings per user in a personal directory on the network.
It should handle Unicode as well.