Danish characters not supported in UTF-8 encoded application

I am working on an application based on Java and JavaScript (Dojo). When the user enters Danish characters, they are converted into question marks. I have checked that UTF-8 encoding is used throughout the application. I have also tried different encoding schemes, but to no effect.
One suggested solution was to save the data in a Notepad file and then use that, but it also yielded nothing.
Can anybody suggest what might be causing this issue?
Appreciate your help!
Thank you.
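For what it's worth, a "?" almost always means the text passed through a lossy conversion into a charset that cannot represent æ, ø, and å somewhere in the pipeline (commonly the servlet request parsing or the database connection, not the source files). A minimal Python sketch of that failure mode:

```python
danish = "blåbærgrød"

# Somewhere a component encodes the text into a charset that lacks
# the Danish letters; the replacement character is a literal '?'.
lossy = danish.encode("ascii", errors="replace")
print(lossy)  # b'bl?b?rgr?d'
```

The practical implication: once "?" has been stored, the original characters are gone, so the fix is to find the first hop where the bytes stop being UTF-8 rather than re-encoding anything downstream.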

Related

Loading textfile into stringlist with firemonkey on osx when the encoding is unknown

I am having a hard time loading a text file into a string list in FireMonkey on OS X when the encoding of the text file is not known.
When I just use list.LoadFromFile(filename), I get an exception regarding encoding most of the time.
list.LoadFromFile(filename, TEncoding.Unicode) will also fail when the file is in ANSI, and vice versa.
There is no issue on Windows, where list.LoadFromFile(filename) just works, but not on OS X.
I can't specify the encoding, because it will be unknown (users provide the text files).
Any clue how I can get around this encoding issue when running the app on a Mac?
In general this is not possible. It is quite possible to create a single file that is valid when interpreted in all common encodings. This has been discussed many times, for instance: The Notepad file encoding problem, redux.
I'm assuming that you are working with files that do not contain byte order marks, BOMs. Obviously if your input files contained BOMs then you could simply check the BOM and be done.
With that assumption stated, the right solution to the problem, in a perfect world, is to know the encoding. Either pick a specific encoding which your program requires, or arrange for the user to tell you the encoding when they supply the file.
If, for whatever reason, you cannot do that then the next best thing to do is to use heuristics to attempt to guess the encoding used. I'm not aware of any Pascal code to do this. But you should be able to put something together that will work reasonably well. This answer gives an outline of a basic strategy: https://stackoverflow.com/a/20747074
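Not Pascal, but here is a rough sketch of that strategy in Python: check for a BOM first, then attempt a strict UTF-8 decode (random ANSI text rarely passes it), then fall back to an ANSI codepage. The fallback choices here are assumptions you would tune for your users' locales:

```python
import codecs

# UTF-32 BOMs must be tested before UTF-16, since the UTF-32-LE BOM
# begins with the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def guess_encoding(data: bytes) -> str:
    """Heuristically guess a text file's encoding from its raw bytes."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("utf-8")   # strict decode; fails fast on typical ANSI bytes
        return "utf-8"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("cp1252")  # common Windows "ANSI" codepage (assumed default)
        return "cp1252"
    except UnicodeDecodeError:
        return "latin-1"       # last resort: every byte sequence decodes
```

Like any heuristic, this can misclassify short or unusual files; it only makes the common cases work without user input.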

Error encoding special characters

Bit of a strange problem (at least for me). In my Grails app I'm sending emails with some special characters (Eastern European letters). String values with special characters that I fetch from the database are valid, but the ones I create in the application contain "?".
Even more confusing is the fact that in development everything works fine, but when I deploy the app to a Tomcat instance I get the question marks.
I've set up everything to encode to UTF-8. At least I believe so - obviously I'm missing something.
It sounds like you don't have the operating system language packs installed for the languages you're trying to display. While it appears as if the files themselves are saved properly, and the JVM 'understands' them because the character sets are supported, the GUIs you're using can't display the corresponding encoding because the underlying OS isn't displaying them.
I've experienced similar problems, and the solution that worked for me was to turn on the corresponding languages in the OS.

why can't I use secureTextEntry with a UTF-8 keyboard?

All,
I ran into this problem where, for a UITextField that has secureTextEntry=YES, I cannot get any UTF-8 keyboards (Japanese, Arabic, etc.) to show; only non-UTF-8 ones do (English, French, etc.). I did a lot of searching on Google, on this site, and on the Apple dev forums, and I see others with the same problem, but short of implementing my own UITextField, nobody seems to have a reasonable solution or an answer as to whether this is a bug or intended behavior.
And if this is intended behavior, why? Is there a standard, a white paper, SOMETHING someplace that I can look at and then point to when I go to my Product Manager and say we cannot support UTF-8 passwords?
Thanks,
I was unable to find anything in Apple's documentation to explain why this should be the case, but after creating a test project it does indeed appear to be so. At a guess, I imagine secure text entry is disallowed for any language using composite characters because it would make character input difficult.
For instance, for Japanese input, should each kana character be hidden after it is typed? Or just kanji characters? If the latter, the length of time characters remain onscreen is long enough to make secure input almost moot. Similarly for other languages using composite input methods.
This post includes code for manually implementing your own secure input behaviour.

Which codepage was used to encode this DOC document?

I've got a bunch of .DOC documents. I'm not even positive they are Word documents, but even if they are, I need to open and parse them with e.g. Python to extract information from them.
The problem is, I couldn't figure out how they were encoded: UltraEdit's conversion function wouldn't correct the text no matter which encoding I tried. OpenOffice 3.2 also failed to display the contents correctly (guessing Windows-1252).
Here's an example, hoping that someone knows which codepage it is:
"lÕAssemblŽe gŽnŽrale" instead of "l'Assemblée générale"
Thank you for any tip.
The Greenstone digital library http://www.greenstone.org/ provides pretty good text extraction from Word documents, including encoding detection.
Running MS Word in server mode gives you a range of scripting options - I'm sure detecting the encoding will be possible.
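As a guess from the sample itself: the mojibake pattern (Õ where an apostrophe belongs, Ž where é belongs) matches text written in Mac Roman and then displayed as Windows-1252. A quick Python round trip, assuming the quoted sample is representative, recovers readable French:

```python
# The garbled sample exactly as it appears when (mis)read as Windows-1252.
sample = "lÕAssemblŽe gŽnŽrale"

# Recover the original bytes, then decode them as Mac Roman instead.
raw = sample.encode("cp1252")
fixed = raw.decode("mac_roman")

print(fixed)  # l’Assemblée générale
```

That would be consistent with the .DOC files having been produced on a classic Mac OS version of Word.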

Usable charset for Moldova

Does anybody know which charset is used in Moldova? We need to prepare our software (and database) for Moldova. I guess UTF-8 should work, shouldn't it?
UTF-8 works for everything :-)
The question is whether your software will need to interface with "native" applications. If so, it may need to understand the encodings used by that software. Those are most likely ISO-8859-5 for Cyrillic script and ISO-8859-16 for Latin script.
Moldova has some controversy over which script to use (Transnistria uses Moldovan Cyrillic, and the mainland uses Latin with lots of diacritics).
UTF-8 is always a good choice, anyway.
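As a quick illustration (the phrase samples below are just assumptions for the demo), each single-byte encoding covers only one of the two scripts in use, while UTF-8 covers both:

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

latin = "bună ziua"     # Latin script with a Romanian diacritic
cyrillic = "бунэ зиуа"  # Moldovan Cyrillic script

# ISO-8859-16 handles the Latin diacritics but has no Cyrillic letters,
# and ISO-8859-5 is the mirror image; UTF-8 handles the mixture.
print(can_encode(latin, "iso8859-16"), can_encode(latin, "iso8859-5"))
print(can_encode(cyrillic, "iso8859-5"), can_encode(cyrillic, "iso8859-16"))
print(can_encode(latin + cyrillic, "utf-8"))
```

This is why UTF-8 is the safe choice for storage, with the ISO-8859 codepages relevant only at the boundary to legacy applications.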
