FreeMarker on Chinese OS loses a character - localization

I am using FreeMarker in an application running on Windows 7 in a Chinese locale.
The .ftl file includes this XML:
<run fontname='Arial'>Æ&#10;</run>
The text is the letter Æ (the grapheme of AE, U+00C6) followed by an encoded newline. There is no FreeMarker text substitution on this line.
After FreeMarker text substitution is run on the file, the XML is changed, losing the ampersand:
<run fontname='Arial'>Æ#10;</run>
Without the ampersand, the encoded newline is lost, and the text "#10;" is displayed instead.
This isn't happening on other Windows systems running other locales (English, French, German, and most notably Japanese). How can I avoid this, or is this a bug?

Looks like the result of some kind of charset disagreement. Ensure that FreeMarker uses the same charset for decoding the template file as the charset the file is actually saved in. (For XML that's usually UTF-8.) If you don't configure FreeMarker to use a specific charset, it defaults to the default charset of the OS, which is usually not what you want. Assuming your files are in UTF-8, set the default_encoding setting (Configuration.setDefaultEncoding) to "UTF-8". You can also force the charset of an individual template by starting it with <#ftl encoding="utf-8">.
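For example, a minimal sketch of that setup (assuming FreeMarker 2.3.x; the template directory and file name are made up):

import java.io.File;
import freemarker.template.Configuration;
import freemarker.template.Template;

public class TemplateSetup {
    public static void main(String[] args) throws Exception {
        Configuration cfg = new Configuration(Configuration.VERSION_2_3_21);
        cfg.setDirectoryForTemplateLoading(new File("templates"));
        cfg.setDefaultEncoding("UTF-8"); // decode .ftl files as UTF-8, not as the OS default charset
        Template template = cfg.getTemplate("report.ftl"); // now read as UTF-8
    }
}

With that in place the template bytes are decoded as UTF-8 regardless of the OS default charset (GBK on Chinese Windows, for example), which should keep the Æ and the &#10; entity intact.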

Related

karaf configuration property is garbled

I implement the org.osgi.service.cm.ManagedService interface to get the Karaf configuration, but when I give a Chinese value to a property, it gets garbled. Initially, the files in the etc folder are encoded in Latin-1. I have tried to set UTF-8 encoding, but it has no effect. Can anyone help me?
In Karaf, the configuration files (i.e. etc/*.cfg) are handled by the Felix subproject "fileinstall".
fileinstall doesn't yet support specifying a custom character encoding for the configuration; it uses the Properties class and Properties.load(InputStream), whose documentation states:
The load(Reader) / store(Writer, String) methods load and store
properties from and to a character based stream in a simple
line-oriented format specified below. The load(InputStream) /
store(OutputStream, String) methods work the same way as the
load(Reader)/store(Writer, String) pair, except the input/output
stream is encoded in ISO 8859-1 character encoding. Characters that
cannot be directly represented in this encoding can be written using
Unicode escapes as defined in section 3.3 of The Java™ Language
Specification; only a single 'u' character is allowed in an escape
sequence. The native2ascii tool can be used to convert property files
to and from other character encodings.
So you have to encode your file in ISO-8859-1 and escape every character outside that charset as a Unicode escape, or use an XML file for your configuration.
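For illustration, a sketch of what such an escaped entry could look like (the file and property names here are made up):

# etc/com.example.demo.cfg, saved as ISO-8859-1
# The value is "你好" written as Unicode escapes, so Properties.load(InputStream) reads it intact:
greeting = \u4F60\u597D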
There is, however, a way to change the encoding of the .cfg files.
The configuration for the fileinstall subproject that polls the etc/*.cfg files lives in the etc/config.properties file. You can add:
felix.fileinstall.configEncoding = UTF-8
This solution was verified on Karaf 4.

"—-" added in HTML when converting a Markdown file to HTML using the Jekyll tool

I have used the Jekyll tool to convert a Markdown file to HTML. The conversion succeeds, but the following encoded punctuation characters are added at the top of the HTML, because the file is encoded in UTF-8:
"—-"
After changing the same Markdown file to ANSI encoding in Notepad++ (the Encoding option in the menu bar), the punctuation characters are no longer included in the generated HTML.
So at the moment we need to manually change the Markdown file to ANSI before generating HTML with Jekyll.
Any solution for this?
The invisible character at the start is the UTF-8 BOM (U+FEFF), so that's probably what you are seeing, assuming you are looking at the file as CP1252 (where the BOM's bytes render as ï»¿); and — is an em dash from the General Punctuation block.
Proper diagnostics are not possible without an indication of which character encoding you are using instead of UTF-8 to view the file, and/or what exact bytes you have in the file, probably as a hex dump. The first few bytes (the BOM) would be EF BB BF. See also the character-encoding tag wiki for troubleshooting tips.
Quick googling indicates that Jekyll is highly allergic to UTF-8 BOM in its input, so it seems unlikely that it generates spurious BOM characters on output. I could speculate that the template file you are using has a BOM and that it is being faithfully included in the output, but I'm not really familiar enough with Jekyll to actually help troubleshoot any further.
Of course, as per the big ugly warnings all over the Jekyll site, I assume you have already made sure that your Markdown input doesn't have a BOM character. Many Windows editors are notorious for putting one in when you save as UTF-8; make sure you use "UTF-8 without a BOM" as the "Save As..." format -- and switch to an editor which offers this option if yours doesn't have it.
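If you want to check for and strip the BOM programmatically, here is a minimal sketch in plain Java (the file name is made up; a hex editor works just as well):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class StripBom {
    public static void main(String[] args) throws IOException {
        Path path = Paths.get("index.md");
        byte[] bytes = Files.readAllBytes(path);
        // A UTF-8 BOM is the three bytes EF BB BF at the very start of the file.
        if (bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF) {
            Files.write(path, Arrays.copyOfRange(bytes, 3, bytes.length));
            System.out.println("BOM removed");
        } else {
            System.out.println("no UTF-8 BOM found");
        }
    }
}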
Try declaring charset=utf-8 explicitly,
or
check whether your content has any straight double quotes (") or straight single quotes (') and remove them:
http://practicaltypography.com/straight-and-curly-quotes.html
This is an encoding format issue. Save the Markdown file as UTF-8 without BOM; this removes the spurious punctuation characters from the generated HTML.

How to programmatically set application name in Japanese?

Currently I am trying to set application name using
net.rim.blackberry.api.homescreen.HomeScreen.setName("これはある");
but it throws an IllegalArgumentException.
Can anyone provide the solution?
I am using Blackberry JDE 5.0.
This is probably a string encoding problem. Try
new String(new String("これはある").getBytes("UTF-16BE"), "UTF-16BE");
It's not pretty but I think that will work.
Here's a link to the Blackberry string spec: http://www.blackberry.com/developers/docs/5.0.0api/java/lang/String.html
By default it's ISO-8859-1 which does not include Japanese characters.
The problem you are facing is how to get a string represented in your source code into your application with the same characters. For latin characters, this is pretty straightforward, as we can just put the characters in quotes, and get a string, like "Hello world"
When you go to non-latin, like Japanese, it gets harder. You can still directly write Japanese in your source code, but you need to make sure your editor and your compiler agree on an encoding so that the characters can be interpreted correctly. The Java-SE compiler takes an argument "-encoding" which allows you to specify the encoding of your java source files.
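For instance (the source file name is hypothetical):

javac -encoding UTF-8 HelloScreen.java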
Unfortunately, rapc, the BlackBerry compiler, does not offer an option to specify encoding, even though it is invoking javac itself. So rapc uses the platform default, which is utf-8 on Linux and OSX and iso-8859-1 on Windows.
The way around this problem is to use a feature of the Java language for parsing strings: Unicode escaping. By entering the six-character sequence "\u3053" in a string, the Java compiler will parse that number as hexadecimal and use the corresponding Unicode code point, solving problems with source file encoding.
So "Hello world" and "\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064" will result in the same strings appearing in your class files.
Because of this, Svetlin's answer from the comments is the right approach here:
net.rim.blackberry.api.homescreen.HomeScreen.setName("\u3053\u308C\u306F\u3042\u308B");

User language support in Delphi 7

My program is written in Delphi 7, and I want to prevent Russian, Chinese, or Korean users from trying to use my software, because file paths can contain Unicode characters and my program can't handle them yet (until I port it to a newer Delphi version with Unicode support).
How do I write a function detecting the "Unicode language" in Delphi 7?
A Delphi 7 program (in its VCL part) can handle Russian, Chinese or Korean characters without any problem.
If the Windows system language is properly set, the charset will match the corresponding encoding, and file names will be able to contain the Unicode characters available in that charset. In fact, the default string type (= AnsiString) is converted to Unicode when the VCL calls the Windows APIs (all ...A() calls do the conversion, then call the ...W() version).
You can force the default code page (the one which will select the charset to be used) by calling code like this:
if GetThreadLocale <> LCID then   // force locale settings if different
  if SetThreadLocale(LCID) then
    GetFormatSettings;            // resets all locale-specific variables
In this case, the TFileName (=AnsiString) in the current system charset will be converted by Windows into the corresponding Unicode characters, and you'll be able to use it in your Delphi 7 application.
What you can't do with the standard VCL AnsiString is directly mix charsets, as you can since Delphi 2009 thanks to the new string = UnicodeString default paradigm.
PS:
Since the charset only affects characters #128..#255 (i.e. those with bit 7 set), if you use only characters #0..#127 your strings will be consistent whatever the current charset/codepage setting is. If you use only English letters and digits, for example, your path will always work, whatever the charset/codepage. But if you use non-English characters, the path will only work if the charset/codepage is set correctly, which is the case for a path supplied by an end user (via a TOpenDialog at runtime, for instance).

ruby on rails x charset

I'm having a problem dealing with charsets in a Ruby on Rails app, specifically in my templates. Text that comes from my database works fine, but characters like ç and ~ that appear directly in my views do not. I added the following filter to my code, but it is still not working; the ç and ~ characters in my application.rhtml are still rendered incorrectly:
before_filter :configure_charsets

# Configure the charset to UTF-8
def configure_charsets
  headers["Content-Type"] = "text/html; charset=UTF-8"
end
I also added a meta http-equiv tag setting the content type to UTF-8 in the HTML, and an AddDefaultCharset UTF-8 directive in .htaccess.
It's still not working. Any other tips?
Put this piece of code in your config (environment.rb):
Rails::Initializer.run do |config|
  config.action_controller.default_charset = "iso-8859-1"
end
This will do it.
Also, remove the default charset line if any in layouts/application.html
Is the text editor you're using to put the special characters into the file (either source or views) treating those characters as UTF-8? For example, if you're using TextMate, you can deliberately save a file as UTF-8. If for some reason you used a different encoding earlier (a default, perhaps), those UTF-8 characters might be getting transcoded at the code editing stage, so even if the rendering process is using UTF-8 throughout, it'll still not work.
Further, if you're using something from a shell, like vi, or whatever, is your terminal set up to accept UTF-8 as default? If you had it set to ISO-8859-1 or whatever, you'd get the same issue.
Is your application.rhtml file written in the correct character set? Make sure it's UTF-8, and not ISO-8859-1.
So if the contents of your file are UTF-8, and the output is being interpreted as UTF-8, something in between is changing the data. Can you give us the hex interpretation of the input bytes (anything non-ASCII will be at least two bytes in UTF-8) for one of your special characters, and the hex interpretation of the output byte or bytes? Perhaps we can figure out what the change is and work back from there.
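In the meantime, here is a quick way to see the expected UTF-8 bytes for one of the characters; this is a sketch in plain Java, but any language would do, and ç is just an example:

import java.nio.charset.StandardCharsets;

public class HexDump {
    public static void main(String[] args) {
        // "\u00E7" is ç; UTF-8 encodes U+00E7 as the two bytes C3 A7.
        for (byte b : "\u00E7".getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%02X ", b & 0xFF);
        }
        // prints: C3 A7
    }
}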
