Native2Ascii task not working - ant

I'm trying to use the native2ascii Ant task, but it seems that it is not doing anything. Here's my Ant target:
<target name="-pre-init">
<native2ascii src="src/com/bluecubs/xinco/messages" dest="src/com/bluecubs/xinco/messages/test"
includes="**/_*.properties"/>
<copy todir="src/com/bluecubs/xinco/messages">
<fileset dir="src/com/bluecubs/xinco/messages/test"/>
</copy>
<delete dir="src/com/bluecubs/xinco/messages/test" />
</target>
I did the copy part to see if it was an overwriting issue but the files come out exactly the same.
This is the output I get when running the task:
Converting 12 files from Z:\Netbeans\Xinco\2.01.xx\Xinco\src\com\bluecubs\xinco\messages to Z:\Netbeans\Xinco\2.01.xx\Xinco\src\com\bluecubs\xinco\messages\test
Copying 12 files to Z:\Netbeans\Xinco\2.01.xx\Xinco\src\com\bluecubs\xinco\messages
Deleting directory Z:\Netbeans\Xinco\2.01.xx\Xinco\src\com\bluecubs\xinco\messages\test
Edit:
Additional information:
OS: Windows 7 (but the answer should work on any OS)
File encoding: Western (ISO-8859-1) obtained with this article.
Files location
Any idea?

native2ascii converts native characters like áéí to escaped unicode sequences. It means that á will be \u00e1, é -> \u00e9 and í -> \u00ed. After running native2ascii your files will be standard ASCII files which are more portable.
native2ascii does not touch the characters which are already in the escaped unicode form. Your properties files are already in escaped unicode form so it does not change anything. For example _XincoMessages_cz.properties contains this line:
general.accessrights=opr\u00E1vnen\u00ED k pr\u00EDstupu
It's escaped unicode. The nonescaped unicode form is this:
general.accessrights=oprávnení k prístupu
WordPad vs. NetBeans: When you open the properties files with WordPad, it opens them as plain text and shows \u00e1 as \u00e1; it does not convert it back to á. NetBeans does this conversion, so you see the 'á' character. Furthermore, it writes it back to disk as \u00e1 (!) when you save the file. To see the raw files, use a viewer that doesn't do any conversion, for example Total Commander or Double Commander. (Note that NetBeans does this conversion only for properties files.)
If you put, for example, an á character into your _XincoMessages_cz.properties file, it will be changed to \u00e1 when you run your Ant task. Of course, don't use NetBeans for this edit; a simple Notepad will do.
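The behaviour described above can be illustrated with a small Java sketch. This is not the implementation of native2ascii, only an approximation under the assumption that the tool escapes every character above 0x7F and passes plain ASCII (including escape sequences that are already present) through unchanged. The class and method names are made up for the example.

```java
class EscapeSketch {
    // Rough approximation of the conversion: every char above 0x7F becomes
    // a backslash-u escape, while plain ASCII passes through untouched.
    static String toAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c <= 0x7F) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Native form of the line, written with source-level escapes so the
        // example does not depend on the encoding of this source file.
        String nativeLine = "general.accessrights=opr\u00e1vnen\u00ed k pr\u00edstupu";
        String once = toAscii(nativeLine);
        String twice = toAscii(once);
        System.out.println(once);
        // A second pass finds only ASCII, so nothing changes.
        System.out.println(once.equals(twice));
    }
}
```

Running the conversion on an already escaped line is a no-op, which matches what the Ant output in the question suggests: the files were processed, but there was nothing left to escape.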
Loading properties files in java converts the escaped unicode characters to real unicode characters. An example:
final Reader inStream = new FileReader("..../_XincoMessages_cz.properties");
final Properties properties = new Properties();
properties.load(inStream);
System.out.println(properties.getProperty("general.accessrights"));
It prints:
oprávnení k prístupu
The ASCII/escaped Unicode form in properties files is usually handled well by Java applications. In short, I think your properties files are fine in their current format.

It ended up being a viewing issue. Looking at the files in a raw editor (e.g. WordPad) showed that they had already been converted by the task. Viewed from NetBeans, they look the same as before.

Related

Delphi 7 cyrillic characters not showing correctly

I recently asked (and paid) for a translation of my Delphi app so that it supports Macedonian (Cyrillic script).
I sent the text to translate to my contracted translator, and she sent me back the translated strings. The text was extracted from all my .dfm and .pas files.
When I replaced the original text with the Cyrillic translation, I could open the .dfm and .pas files in my favourite Notepad++ (or Notepad) and see the translated characters correctly.
When I open these files in Delphi (as a .dpr project), I see something like this:
Please, someone tell me how to convert/display these strings in Delphi correctly.
I am using Macedonian regional settings, but that did not help with this problem.
PS: Yes, I am still using Delphi 7 because I love it and purchased this version.
UPDATE
Original text in Delphi:
original: ПОДГОТВИ КУТИИ ЗРДРУГИТЕ ЦЕÐТРÐЛИ
Correct text:
ПОДГОТВИ КУТИИ ЗА ДРУГИТЕ ЦЕНТРАЛИ
I noticed that when I set the ParentFont property to False, set the font to Verdana with the Cyrillic (RUSSIAN_CHARSET) charset, and then copy/paste the Cyrillic text, it shows normally in Delphi.
OK, so I SOLVED that!
The solution has several steps, and Notepad++ is needed:
1st step: Replace all fonts in the .dfm with (for example) Verdana, or some other font that supports Cyrillic
2nd step: Replace all ParentFont = False with ParentFont = True
3rd step: In Notepad++ choose: Encoding -> Convert to ANSI
That's all. Do this for all .dfm and .pas files (.pas files need only the 3rd step).
I am happy that I did not listen to David Heffernan and did not give up!
Your text file was UTF-8 encoded, whereas Delphi7 requires WinAnsi encoding, with codepage 1251 for Cyrillic characters.
You have the UTF8Decode() function in System.pas to make the conversion programmatically, if you prefer.
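The conversion this answer describes (UTF-8 in, code page 1251 out) can be sketched outside Delphi as well. The following Java example is only an illustration, not the Delphi approach itself; it assumes the running JRE provides the windows-1251 charset, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf8ToCp1251 {
    // Assumption: the running JRE provides the windows-1251 charset.
    static final Charset CP1251 = Charset.forName("windows-1251");

    // Decode the UTF-8 bytes to text, then re-encode the text as windows-1251.
    static byte[] convert(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        return text.getBytes(CP1251);
    }

    public static void main(String[] args) {
        // The last word of the sample text, written with escapes so that
        // this source file stays ASCII-only.
        String word = "\u0426\u0415\u041d\u0422\u0420\u0410\u041b\u0418";
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);
        byte[] ansi = convert(utf8);
        // Cyrillic letters take two bytes each in UTF-8 and one in windows-1251.
        System.out.println(utf8.length + " UTF-8 bytes, " + ansi.length + " windows-1251 bytes");
        // Decoding the converted bytes with the same code page restores the text.
        System.out.println(new String(ansi, CP1251).equals(word));
    }
}
```

A tool that only understands single-byte ANSI text reads each of the two UTF-8 bytes as a separate character, which is why the untouched files look garbled in the Delphi 7 IDE.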

What encoding is this and how do I turn it into something I can see properly?

I'm writing a script that will operate on the subtitle files of a popular streaming service (Netfl*x).
The subtitle files have strange characters in them and I can't get them to render in a way that my text editors or web browser will display in a readable way. The xml encoding says UTF-8, but some characters are not readable.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<tt xmlns:tt="http://www.w3.org/ns/ttml" xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xmlns:ttp="http://www.w3.org/ns/ttml#parameter" xmlns:tts="http://www.w3.org/ns/ttml#styling" ttp:tickRate="10000000" ttp:timeBase="media" xmlns="http://www.w3.org/ns/ttml">
<p>de 15 % la nuit dernière.</span></p>
<p>if youâve got things to doâ¦</span></p>
And in Vim:
This is what it looks like in the browser:
How can I convert this into something I can use?
I'll go out on a limb and say that file is UTF-8 encoded just fine, and you're merely looking at it using the wrong encoding. The character À encoded in UTF-8 is C3 80. C3 in ISO-8859-1 is Ã, which in your screenshot is followed by an 80. So looks like you're looking at a UTF-8 file using the (wrong) ISO-8859 encoding.
Use the correct encoding when opening the file.
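The byte-level reasoning above can be reproduced in a few lines of Java. This is a minimal sketch with made-up class and method names: it simulates reading UTF-8 bytes with an ISO-8859-1 decoder, and then reverses the mistake.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Simulates opening a UTF-8 file with the wrong (ISO-8859-1) decoder.
    static String misread(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake: recover the raw bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The word from the subtitle sample; the accented letter is written
        // as an escape so this source file stays ASCII-only.
        String original = "derni\u00e8re";
        String garbled = misread(original);
        // The accented letter occupies two bytes in UTF-8, so the misread
        // text is one character longer than the original.
        System.out.println(garbled.length() - original.length());
        System.out.println(repair(garbled).equals(original));
    }
}
```

The repair step only works when the wrong decoder maps every byte to a character, as ISO-8859-1 does; with other single-byte code pages some bytes may be lost.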
My terminal is set to en_US.UTF-8, but it was also rendering this supposedly UTF-8 encoded file incorrectly (sonné -> sonnÃ©). I was able to solve this by using iconv to convert the file to ISO8859-1.
iconv original.xml -t ISO8859-1 -o converted.xml
In the new file, the characters were properly rendered, although I don't quite understand why.

Jenkins Pipeline: How to write UTF-8 files with writeFile?

I'm hoping this is not a bug and that I'm just doing something wrong. I have a Jenkins (v2.19.1) Pipeline job, and in its Groovy script I need to search and replace some text in an existing text file on a Windows node.
I've used fart.exe and powershell to do the search and replace, but I would really like to do this with just the groovy in Jenkins and eliminate dependency on fart/powershell/etc. and make this code more reusable on both linux and windows nodes.
After much googling and trying various approaches, the closest I got was to use readFile and writeFile. However, I've not been able to get writeFile to create a UTF-8 file. It creates an ANSI file even when I specify UTF-8 (assuming I'm doing it correctly).
Here's what I have so far...
def fileContents = readFile file: "test.txt", encoding: "UTF-8"
fileContents = fileContents.replace("hello", "world")
echo fileContents
writeFile file: "test.txt", text: fileContents, encoding: "UTF-8"
I've confirmed with multiple text editors that the test.txt file is UTF-8 when I start, and ANSI after the writeFile line. I've tried all combinations of including/not-including the encoding property and "utf-8" vs "UTF-8". But in all cases, the file is written out as ANSI (as reported by both Notepad++ and VS Code). Also, a question mark (HEX 3F) is added as the very first character of the file.
The echo line does not show the extra 3F character, so it seems the issue is in the writeFile line.
Your code looks correct. Note that ANSI and UTF-8 are identical if the file contains only unaccented letters and digits. Try putting some accented letters (áéíóúç) in your file; the editor will then probably recognize it as a UTF-8 file.
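The claim that the two encodings coincide for plain text is easy to check. The sketch below is written in Java rather than pipeline Groovy, and it assumes that "ANSI" means windows-1252, as it typically does on a Western European Windows node; the class and method names are made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SameBytesDemo {
    // Assumption: "ANSI" stands for windows-1252 in this comparison.
    static final Charset ANSI = Charset.forName("windows-1252");

    // True when the text encodes to exactly the same bytes in both charsets.
    static boolean sameBytes(String text) {
        return Arrays.equals(text.getBytes(StandardCharsets.UTF_8), text.getBytes(ANSI));
    }

    public static void main(String[] args) {
        // ASCII-only content: an editor has no way to tell the encodings apart.
        System.out.println(sameBytes("hello world"));
        // Accented letters (written as escapes) encode differently.
        System.out.println(sameBytes("\u00e1\u00e9\u00ed\u00f3\u00fa"));
    }
}
```

When the bytes are identical, an editor's "ANSI" label is only a guess, so it does not by itself show that writeFile ignored the encoding parameter.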
You must use the Jenkins pipeline writeFile function. This is a special Jenkins method to write files inside your workspace. The default Java File objects won't work.
To specify the encoding, you must use named parameters. Here is an example:
writeFile(file: "filename.txt", text: "áéíóú", encoding: "UTF-8")
This will create at the root of your workspace, a file named filename.txt with "áéíóú" as content and encoded as UTF-8.
BTW, if you have full control of the file you must search and replace, consider using Groovy's builtin SimpleTemplateEngine

BlackBerry - language support for Chinese

I have localised my app by adding the correct resource files for various European languages / dialects.
I have the required folder in my project: ./res/com/demo/localization
It contains the required files e.g. Demo.rrh, Demo.rrc, Demo_de.rrc etc.
I want to add support for 2 Chinese dialects, and I have the translations in an Excel file. On iPhone, they are referred to by the codes zh_TW & zh_CN. Following the pattern with German, I created 2 extra files called Demo_zh_TW.rrc & Demo_zh_CN.rrc.
I opened the file Demo_zh_CN.rrc using Eclipse's text editor and pasted in a line of the Chinese translation using the normal resource file format:
START_LOCATION#0="开始位置";
When I tried to save the file, I got Eclipse's error about the Cp1252 character encoding:
Save could not be completed.
Reason:
Some characters cannot be mapped using "Cp1252" character encoding.
Either change the encoding or remove the characters which are not
supported by the "Cp1252" character encoding.
It seems the Eclipse editor will accept the Chinese characters, but the resource tool expects these characters to be saved in the resource file as Java Unicode \u escapes.
How do I add language support for these 2 regions without manually copying and pasting each string?
Is there maybe a tool that I can use to convert the strings from Excel to Java Unicode \u escapes, so they can be saved using only code page 1252 Latin characters?
I'm not aware of any readily available tools for working with BlackBerry's peculiar localization style.
Here's a snippet of Java-SE code I use to convert the UTF-8 strings I get for use with BlackBerry:
private static String unicodeEscape(String value, CharsetEncoder encoder) {
    StringBuilder sb = new StringBuilder();
    for (char c : value.toCharArray()) {
        if (encoder.canEncode(c)) {
            sb.append(c);
        } else {
            sb.append("\\u");
            sb.append(hex4(c));
        }
    }
    return sb.toString();
}

private static String hex4(char c) {
    String ret = Integer.toHexString(c);
    while (ret.length() < 4) {
        ret = "0" + ret;
    }
    return ret;
}
Call unicodeEscape with an ISO-8859-1 encoder, obtained via Charset.forName("ISO-8859-1").newEncoder().
I suggest you look at Blackberry Hindi and Gujarati text display
You need to use the resource editor to make these files with the right encoding. Eclipse will escape the characters automatically.
This is a problem with the encoding of your resource file. 1252 Code Page contains Latin characters only.
I have never worked with Eclipse, but there should be somewhere to specify the encoding of the file. Set your default file encoding to UTF-8 if possible; this will handle your Chinese characters.
You could also use a good editor like Notepad++ or EMEditor to set the encoding of your file.
See here for how you can configure Eclipse to use UTF-8 by default.

how to load a properties file with non-ascii in ant

Suppose I have a properties file test.properties, which is saved as UTF-8:
testOne=测试
I am using the following ant script to load it and echo it to another file:
<loadproperties srcFile="test.properties" encoding="utf-8"/>
<echo encoding="utf-8" file="text.txt">${testOne}</echo>
When I open the generated text.txt file using "utf-8" encoding I see:
??
What's wrong with my script?
Use "encoding" and "escapeunicode" together. It works fine.
<loadproperties srcfile="${your.properties.file}" encoding="UTF-8">
<filterchain>
<escapeunicode />
</filterchain>
</loadproperties>
I found a workaround, but I still don't understand why the original one doesn't work:
<native2ascii src="." dest=".">
<mapper type="glob" from="test.properties" to="testASCII.properties"/>
</native2ascii>
<loadproperties srcFile="testASCII.properties"/>
Then the echo works as expected.
I don't know why the encoding in loadproperties doesn't work.
Can anyone explain?
Try it this way:
<loadproperties srcfile="non_ascii_property.properties">
<filterchain>
<escapeunicode/>
</filterchain>
</loadproperties>
Apparently, the file is read through an InputStreamReader that uses the ISO Latin-1 charset, which mangles your non-ASCII characters. I ran into the same issue with Arabic.
What editor were you using and what platform are you on?
Your generated property file might actually be good, but the editor you're using to examine it may be incapable of viewing it. For example, on my Mac, the VIM command line editor can view it (which surprises me), but in Eclipse, it looks like this:
testOne=������
If you're on Unix/Linux/Mac, try using od to dump your generated file, and examine the actual hex code to see what it should be.
For example, I copied your property file, and ran od on a Mac:
$ od -t x1 -t c test.property
0000000 74 65 73 74 4f 6e 65 3d e6 b5 8b e8 af 95 0a
t e s t O n e = 测 ** ** 试 ** ** \n
Here I can see that the code for 测 is e6 b5 8b and 试 is e8 af 95, which is the correct UTF-8 representation for these two characters. (Or at least I think so. It shows up correctly in the Character Viewer panel of Mac OS X.)
The right answer is pointed to in this comment by David W.:
how to load a properties file with non-ascii in ant
Java Property Files must be encoded in ISO-8859-1:
http://docs.oracle.com/javase/7/docs/api/java/util/Properties.html
But Unicode escape sequences like \u6d4b can/must be used to encode unicode characters therein.
Tools/ANT-Targets like <native2ascii>, generating ascii-encoded files from natively maintained ones, can help here.
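The difference between the two ways of reading a properties file can be checked with plain java.util.Properties, outside Ant. This is a minimal sketch with made-up class and method names; it assumes nothing about how Ant's loadproperties works internally and only shows the documented behaviour of Properties.load.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class PropertiesEncodingDemo {
    // Loads through the InputStream overload, which decodes bytes as ISO-8859-1.
    static String viaStream(byte[] fileBytes) {
        try {
            Properties p = new Properties();
            p.load(new ByteArrayInputStream(fileBytes));
            return p.getProperty("testOne");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads through a Reader, so the caller picks the charset (UTF-8 here).
    static String viaUtf8Reader(byte[] fileBytes) {
        try {
            Properties p = new Properties();
            p.load(new InputStreamReader(new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8));
            return p.getProperty("testOne");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The two characters from the question, written as source-level escapes.
        String expected = "\u6d4b\u8bd5";
        // File content as saved in UTF-8, and the same content in escaped ASCII form.
        byte[] utf8File = ("testOne=" + expected).getBytes(StandardCharsets.UTF_8);
        byte[] escapedFile = "testOne=\\u6d4b\\u8bd5".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(viaStream(utf8File).equals(expected));      // misread as Latin-1
        System.out.println(viaUtf8Reader(utf8File).equals(expected));  // decoded as UTF-8
        System.out.println(viaStream(escapedFile).equals(expected));   // escapes are expanded
    }
}
```

The stream-based load of the UTF-8 file yields six Latin-1 characters instead of the two Chinese ones, while both the Reader-based load and the escaped file give the expected value.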
You can write your own task that reads properties from your default java character encoding for your OS (mine is utf-8), instead of converting your property files to unreadable unicode-escaped ASCII files (designed by people who read and write only English). Here's how to do it by copying and modifying Property.java from Ant's source code to your own package (e.g. org.my.ant). I used Ant 1.10.1.
Download Ant's source code in your format of choice from here:
http://ant.apache.org/srcdownload.cgi
Copy src/main/org/apache/tools/ant/taskdefs/Property.java to your own project (such as a new Java project), as org/my/ant/Property.java.
replace:
package org.apache.tools.ant.taskdefs;
with:
package org.my.ant;
Fix any imports needed by the package change. I just needed to add:
import org.apache.tools.ant.taskdefs.Execute;
In the method:
loadProperties(Properties props, InputStream is, boolean isXml)
replace:
props.load(is);
with:
props.load(new InputStreamReader(is));
In your project's resources folder (could be the same as your source folder), add the file org/my/ant/antlib.xml, with the content:
<?xml version="1.0" encoding="UTF-8"?>
<antlib>
<taskdef name="property" classname="org.my.ant.Property"/>
</antlib>
Compile this project (Property.java + antlib.xml).
Put the resulting jar in Ant's classpath, as explained here:
http://ant.apache.org/manual/using.html#external-tasks
Then use it in a build.xml file as follows:
<?xml version="1.0" encoding="UTF-8"?>
<project name="example"
xmlns:my="antlib:org.my.ant"
default="print"
>
<my:property file="greek.properties" prefix="example" />
<target name="print">
<echo message="${example.a}"/>
</target>
</project>
The file greek.properties contains:
a: ΑΒΓ

Resources