How to programmatically set application name in Japanese? - blackberry

Currently I am trying to set the application name using
net.rim.blackberry.api.homescreen.HomeScreen.setName("これはある");
but it throws an IllegalArgumentException.
Can anyone provide a solution?
I am using BlackBerry JDE 5.0.

This is probably a string encoding problem. Try
new String(new String("これはある").getBytes("UTF-16BE"), "UTF-16BE");
It's not pretty, but I think that will work.
Here's a link to the BlackBerry String documentation: http://www.blackberry.com/developers/docs/5.0.0api/java/lang/String.html
By default the encoding is ISO-8859-1, which does not include Japanese characters.

The problem you are facing is how to get a string represented in your source code into your application with the same characters. For Latin characters this is pretty straightforward: we can just put the characters in quotes and get a string, like "Hello world".
When you go to non-Latin characters, like Japanese, it gets harder. You can still write Japanese directly in your source code, but you need to make sure your editor and your compiler agree on an encoding so that the characters are interpreted correctly. The Java SE compiler takes an "-encoding" argument which allows you to specify the encoding of your Java source files.
Unfortunately rapc, the BlackBerry compiler, does not offer an option to specify the encoding, even though it invokes javac itself. So rapc uses the platform default, which is UTF-8 on Linux and OS X and ISO-8859-1 on Windows.
The way around this problem is to use a feature of the Java language for parsing strings: Unicode escaping. By entering the six-character sequence "\u3053" in a string, the Java compiler will parse that number as hexadecimal and use the corresponding Unicode code point, sidestepping any problems with source file encoding.
So "Hello world" and "\u0048\u0065\u006c\u006c\u006f\u0020\u0077\u006f\u0072\u006c\u0064" will result in the same string appearing in your class files.
Because of this, Svetlin's answer from the comments is the right approach here:
net.rim.blackberry.api.homescreen.HomeScreen.setName("\u3053\u308C\u306F\u3042\u308B");
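If you don't want to look up the code points by hand, you can use the JDK's native2ascii tool, or generate the escapes with a small helper like this sketch (the class and method names here are made up; only standard Java SE APIs are used). Run it once on a desktop JVM whose default encoding matches the file containing the literal, then paste the output into your BlackBerry code:

public class UnicodeEscaper {
    // Replace every non-ASCII character with its \uXXXX escape,
    // similar to what the JDK's native2ascii tool does.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c); // plain ASCII passes through unchanged
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints \u3053\u308C\u306F\u3042\u308B
        System.out.println(escapeNonAscii("これはある"));
    }
}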

Related

String UTF-8 encoding with Cyrillic in H2O

I load a CSV file with UTF-8-encoded Cyrillic strings. After parsing, in the Flow interface I see not Cyrillic but unreadable symbols like "пїўпѕЂпѕ™пїђпѕ". How can I use UTF-8 Cyrillic strings in H2O?
This appears to be a bug in the Flow interface, but only in the setupParse command. If you continue through and do the import, the data gets imported correctly.
I've reported the bug, with test data and screenshots (taken in Firefox) here:
https://0xdata.atlassian.net/browse/PUBDEV-4640
So if you have additional information, or the bug is behaving differently for you, it'd be good to add it to that bug report.
Check your CSV file in both text and binary (hex) views to find out how the Cyrillic text is encoded. If it is UTF-8, the word Привет should appear as the byte sequence D0 9F D1 80 D0 B8 D0 B2 D0 B5 D1 82 (which displays as ÐŸÑ€Ð¸Ð²ÐµÑ‚ if you view the file as CP1252).
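If you want to double-check what the bytes should be, here is a minimal Java sketch (the class name is arbitrary) that prints the UTF-8 encoding of the word in hex:

import java.nio.charset.StandardCharsets;

public class HexDump {
    public static void main(String[] args) {
        // Encode the word as UTF-8 and print each byte in hex.
        // Expected output: D0 9F D1 80 D0 B8 D0 B2 D0 B5 D1 82
        byte[] bytes = "Привет".getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        System.out.println(sb.toString().trim());
    }
}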

karaf configuration property is garbled

I implement the org.osgi.service.cm.ManagedService interface to get Karaf configuration, but when I give a Chinese value to a property, it comes out garbled. Initially, the files in the etc folder are encoded in Latin-1. I have tried setting UTF-8 encoding, but it has no effect. Can anyone help me?
In Karaf, the configuration files (i.e. etc/*.cfg) are handled by the Felix subproject "fileinstall".
fileinstall does not yet support specifying a custom character encoding for the configuration; it uses the Properties class and Properties.load(InputStream), which documents:
The load(Reader) / store(Writer, String) methods load and store
properties from and to a character based stream in a simple
line-oriented format specified below. The load(InputStream) /
store(OutputStream, String) methods work the same way as the
load(Reader)/store(Writer, String) pair, except the input/output
stream is encoded in ISO 8859-1 character encoding. Characters that
cannot be directly represented in this encoding can be written using
Unicode escapes as defined in section 3.3 of The Java™ Language
Specification; only a single 'u' character is allowed in an escape
sequence. The native2ascii tool can be used to convert property files
to and from other character encodings.
So you have to encode your file in ISO-8859-1 and write every non-Latin-1 character as a Unicode escape, or use an XML file to encode your configuration files.
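For example, with a hypothetical property named greeting, the Chinese value 你好 (code points U+4F60 and U+597D) would be written in the cfg file as:

greeting = \u4F60\u597D

Properties.load(InputStream) turns the escapes back into the original characters when the file is read.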
There is a way to change the encoding of the cfg files.
The configuration for the fileinstall subproject that polls the etc/*.cfg files is written in the config.properties file.
You can add
felix.fileinstall.configEncoding = UTF-8
The solution was verified on Karaf 4.

—- " added in HTML when converting MarkDown file to HTML using Jekyll tool

I have used the Jekyll tool to convert a Markdown file to HTML. The conversion succeeds, but the following encoded punctuation characters are added at the top of the HTML because the file's encoding format is UTF-8:
"—-"
After changing the same Markdown file to ANSI encoding in Notepad++ (Encoding option in the menu bar), the punctuation characters are not included in the generated HTML.
For now we have to manually change the Markdown file to ANSI before generating the HTML with Jekyll.
Is there a solution for this?
 is the UTF-8 BOM, so that's probably what you are seeing, assuming you are looking at the file as CP1252; and — is something out of the General Punctuation block.
Proper diagnostics are not possible without an indication of which character encoding you are using instead of UTF-8 to view the file, and/or what exact bytes you have in the file, probably as a hex dump. The first few bytes (the BOM) would be EF BB BF. See also the character-encoding tag wiki for troubleshooting tips.
Quick googling indicates that Jekyll is highly allergic to UTF-8 BOM in its input, so it seems unlikely that it generates spurious BOM characters on output. I could speculate that the template file you are using has a BOM and that it is being faithfully included in the output, but I'm not really familiar enough with Jekyll to actually help troubleshoot any further.
Of course, as per the big ugly warnings all over the Jekyll site, I assume you have already made sure that your Markdown input doesn't have a BOM character. Many Windows editors are notorious for putting one in when you save as UTF-8; make sure you use "UTF-8 without a BOM" as the "Save As..." format -- and switch to an editor which offers this option if yours doesn't have it.
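If you'd rather check for the BOM programmatically than with a hex dump, a minimal Java sketch along these lines should do (the file name is a placeholder):

import java.io.FileInputStream;
import java.io.IOException;

public class BomCheck {
    public static void main(String[] args) throws IOException {
        // Read the first three bytes and compare them to the UTF-8 BOM (EF BB BF).
        try (FileInputStream in = new FileInputStream("index.md")) {
            byte[] head = new byte[3];
            int n = in.read(head);
            boolean hasBom = n == 3
                    && (head[0] & 0xFF) == 0xEF
                    && (head[1] & 0xFF) == 0xBB
                    && (head[2] & 0xFF) == 0xBF;
            System.out.println(hasBom ? "UTF-8 BOM present" : "no UTF-8 BOM");
        }
    }
}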
Try using charset=utf-8,
or
check whether your content has any straight double quotes (" ") or straight single quotes (' ') and remove them:
http://practicaltypography.com/straight-and-curly-quotes.html
This is an encoding format issue. Make the Markdown file UTF-8 without BOM;
this will remove the punctuation characters from the generated HTML.

What encoding does the Windows 'hosts' file use?

What is the Windows 'hosts' file encoding? Is it UTF-8, or ASCII plus the system codepage? How should IDN entries (international domain names with umlauts etc.) be added, and can they be added at all?
It should be ANSI or UTF-8 without BOM. I just dealt with a server that had the hosts file encoding set to UCS-2 Little Endian, and that led to the file being ignored.
There is a wealth of information here:
https://serverfault.com/questions/452268/hosts-file-ignored-how-to-troubleshoot
The simple answer is
ANSI or UTF-8 WITH BOM.
(UTF-8 without BOM is NOT valid).
Details:
As far as I have tried, the encoding of the hosts file on Windows should be
ANSI or UTF-8 with BOM.
I know this question is many years old, but a colleague made the mistake of looking at this post and the ServerFault post, so I decided to add an answer.
1. Simple case: ASCII only
Works.
Without any multi-byte characters, this is equivalent to ANSI, and also equivalent to UTF-8 without BOM.
2. ANSI (with Japanese ANSI multi-byte characters)
Works.
Note: these are Japanese characters, but this is valid ANSI encoding on Windows. In Japanese editions of Windows, code page 932 (cp932) is referred to as "ANSI":
https://en.wikipedia.org/wiki/Code_page_932_(Microsoft_Windows)
3. UTF-8 with BOM
Works.
Note: "BOM 付き" is Japanese for "with BOM".
4. UTF-8 without BOM
DOES NOT work.
5. Additional test cases
If you use emoji instead of Japanese, the result is the same.
Emoji saved as UTF-8 without BOM do not resolve.
(However, other lines that do not include emoji may still resolve correctly.)
Emoji saved as UTF-8 with BOM resolve correctly.
Note: if you use Notepad to check this yourself, be sure to put double quotes around the file name when you save, or Notepad will create hosts.txt.
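If you generate the hosts file from a program rather than an editor, you can write the BOM explicitly. A minimal Java sketch, with a placeholder output file name and host entry:

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class WriteHostsWithBom {
    public static void main(String[] args) throws IOException {
        // Write the UTF-8 BOM (EF BB BF) first, then the UTF-8 content.
        try (FileOutputStream out = new FileOutputStream("hosts.sample")) {
            out.write(new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF });
            out.write("127.0.0.1 myhost.example\r\n".getBytes(StandardCharsets.UTF_8));
        }
    }
}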
Appended (asked in a comment):
the hosts file supports inline comments.
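For example (the entry itself is just an illustration):

127.0.0.1    myhost.example    # comment after the mapping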

How does acrobat encode annotations added as sticky notes to pdfs?

We have been reading and writing sticky notes/annotations/comments in PDFs via an ActiveX control in our application for a number of years. We have recently upgraded to Delphi 2009 with Unicode support, and the following is causing problems.
When we call
CAcroPDAnnot.GetContents
the results seem rather strange and we lose our Unicode characters. It is not like saving as an ANSI string, which would usually return ?????; instead we get a string such as
‚És‚­“ú‚É•—Ž×‚ð‚Ђ¢‚½‚ç
for a string of Japanese characters.
However, if I save the comments in the PDF to a data file via the menu in the PDF itself, they are written to file as something like
0kˆL0Oeå0k˜¨ª0’0r0D0_0‰
The latter can be exported and reimported into an Acrobat PDF and will recreate the correct Unicode characters. However, once I call CAcroPDAnnot.GetContents in my code, it comes back as something else.
Is CAcroPDAnnot.GetContents broken?
Is there an encoding scheme I should be aware of?
Is there an alternative I might be able to use?
Thanks
‚És‚­“ú‚É•—Ž×‚ð‚Ђ¢‚½‚ç
That's the string:
に行く日に風邪をひいたら
in CP-932 aka Shift-JIS encoding, an awful but lamentably still-popular encoding in Japan.
You're currently interpreting it as CP-1252 (Windows Western European). If your PDF-reading component won't convert it for you automatically, you'll need to find a way to detect which encoding the document is in and convert it manually.
I don't know what Delphi provides for reading encodings, but have you got the encoding files for Shift-JIS installed in Windows, from the Control Panel -> Regional Options -> "Install files for East Asian languages" option? If not, that might explain why it fails to convert automatically.
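As a sanity check, you can reproduce the exact garbling outside Delphi. A minimal Java sketch (Java purely for illustration; the fix in Delphi is the same idea: decode the raw bytes with the right code page):

import java.nio.charset.Charset;

public class MojibakeDemo {
    public static void main(String[] args) {
        String original = "に行く日に風邪をひいたら";
        // Encode the text as CP-932 (Shift-JIS), as Acrobat stored it...
        byte[] sjis = original.getBytes(Charset.forName("windows-31j"));
        // ...then decode the same bytes as CP-1252, as the Delphi side did.
        // This prints garbage resembling the string in the question.
        System.out.println(new String(sjis, Charset.forName("windows-1252")));
        // Decoding with the correct code page recovers the original text.
        System.out.println(new String(sjis, Charset.forName("windows-31j")));
    }
}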
You're not exactly giving us a lot of information to work with.
I take it you're talking about the "Acrobat.CAcroPDAnnot" class' method GetContents here. Which version of Acrobat are you using? Have you perhaps switched versions (or run an update) around the time you started programming with Delphi 2009?
Then: how did you instantiate the object? If using a *_TLB.pas file generated from the DLL, are you certain it still matches it? (Try re-generating it, if uncertain).
Third: how are you calling the method? What type of variable are you assigning the result to?
What might also help, is if you could provide a sample of an annotation (preferably including non-ASCII chars); and for that annotation:
what it should look like (and what it does look like inside Reader)
what it returns when using a pre-2009 version of Delphi*
what it returns when using Delphi 2009*
(* preferably the HEX byte codes of the (ansi/wide)strings; but output from the Ctrl-F7 inspector should do)
Then maybe someone could provide a more meaningful answer.
OK, one of the main differences between Delphi 2009 and earlier versions is that the default string type is a Unicode string. That means that if you use the same ActiveX component as in previous versions, you are passing Unicode strings into ANSI strings, and that is usually not a good idea.
There are a couple of solutions for this problem:
Check whether you can upgrade your ActiveX component so that it supports full Unicode strings.
Use AnsiString rather than string to communicate with the ActiveX component. In this case you can still use the old interface, but you are still bound to the same limitations.
Use another control that creates PDFs. There are plenty to choose from, but be prepared to change a big chunk of your software. (Some controls are XML-based and use encoding.)
