Ant copy task corrupts UTF-8 symbols - ant

I have a .properties file with translations in Arabic. I am using it to replace strings in an html file. However, when I start the copy task, it completely corrupts the symbols and I get something like this:
اÙÙزادات
Any idea what's causing this and how I can fix it?
build.xml
<target name="copyAndReplace">
<copy todir="..." overwrite="yes" encoding="UTF-8">
<fileset dir="..." includes="*.html"></fileset>
<filterset>
<filtersfile file="***.properties" />
</filterset>
</copy>
</target>

I see some possible problems:
In Java, .properties files are assumed to be ISO-8859-1 encoded. Even if you're not dealing with Java directly, Ant reads the filters file as a Java properties file. I ran into this when opening a properties file in Vim and in the NetBeans editor: Vim saved it as UTF-8 and NetBeans as ISO-8859-1.
You should also set the outputencoding attribute of the copy task. On Windows, UTF-8 is not the default encoding.
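A minimal sketch of the second point, assuming the HTML sources are UTF-8 and live in a hypothetical src directory (the directory names and the properties file name are placeholders, not taken from the question). Note that this only controls how the copied files are read and written; the filters file is still read as ISO-8859-1.

```xml
<target name="copyAndReplace">
  <!-- encoding: how the source files are read; outputencoding: how the copies are written -->
  <copy todir="dst" overwrite="yes" encoding="UTF-8" outputencoding="UTF-8">
    <fileset dir="src" includes="*.html" />
    <filterset>
      <!-- hypothetical file name; non-Latin-1 characters in it still need \uXXXX escapes -->
      <filtersfile file="translations.properties" />
    </filterset>
  </copy>
</target>
```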

I encountered the same issue, but with images.
In the Ant manual I found the following remark:
Note: If you employ filters in your copy operation, you should limit the copy to text files. Binary files will be corrupted by the copy operation. This applies whether the filters are implicitly defined by the filter task or explicitly provided to the copy operation as filtersets. See encoding note.
Maybe that is the source of the problem. I will need to check on my own whether this solves my problem.
Kind regards,
Marc
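If binary files are the cause, one way to follow the manual's advice is to split the copy in two, so that only text files pass through the filterset. This is a sketch under assumptions: the directory names, the image extensions and the properties file name are made up for illustration.

```xml
<target name="copyAll">
  <!-- text files: filtered, with an explicit encoding -->
  <copy todir="dst" encoding="UTF-8">
    <fileset dir="src" includes="**/*.html" />
    <filterset>
      <filtersfile file="translations.properties" />
    </filterset>
  </copy>
  <!-- binary files: plain copy, no filters, so the bytes are left untouched -->
  <copy todir="dst">
    <fileset dir="src" includes="**/*.png,**/*.jpg,**/*.gif" />
  </copy>
</target>
```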

As mentioned by @Jean Waghetti above, Ant expects the files to be ISO-8859-1 encoded. I posted a similar Stack Overflow question for Chinese characters.
The only solution I've found is by ensuring my .properties file was ISO-8859-1 and the characters were escaped.
For example مرحبا بالعالم
Would be:
\u0645\u0631\u062D\u0628\u0627 \u0628\u0627\u0644\u0639\u0627\u0644\u0645
This is not ideal, as it's not terribly human-readable. I have noticed that Eclipse automatically shows the decoded text on hover.
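The escaping does not have to be done by hand. Ant ships an optional native2ascii task that can produce the escaped file from a UTF-8 original at build time. The sketch below assumes the readable UTF-8 properties files are in a hypothetical src directory and that the escaped copies go to tmp; depending on your Ant and JDK versions the task may need a particular implementation, so treat this as a starting point rather than a guaranteed recipe.

```xml
<target name="escapeProps">
  <mkdir dir="tmp" />
  <!-- reads UTF-8 files from src and writes \uXXXX-escaped copies to tmp -->
  <native2ascii encoding="UTF-8" src="src" dest="tmp" includes="*.properties" />
</target>
```

The copy task would then point its filtersfile at the escaped copy in tmp, while the UTF-8 original stays human-readable in version control.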

You can add some code that translates the UTF-8 properties files into escaped ISO-8859-1 properties files, and then use the converted files as the filters:
<project name="xyz" default="copyAndReplace">
<property name="srcdir" value="src" />
<property name="propdir" value="src" />
<property name="tmpdir" value="tmp" />
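<target name="encodeProps">
<!-- the output directory must exist before the script opens files in it -->
<mkdir dir="${tmpdir}" />
<script language="javascript">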
importPackage(java.io);
importPackage(java.lang);
var files = new java.io.File(propdir).listFiles();
for (var i in files) {
var f = files[i];
if (!f.getName().endsWith(".properties")) continue;
var io = new InputStreamReader(new FileInputStream(f), "utf-8");
var out = new FileOutputStream(new File(tmpdir, f.getName()));
do {
var c = io.read();
if (c == -1) break;
if (c > 127) {
var s = Integer.toHexString(c);
s = new StringBuilder().append("\\u").append("0000".substring(s.length())).append(s).toString();
out.write(s.getBytes());
} else {
out.write(c);
}
} while (true);
io.close();
out.close();
}
</script>
</target>
<target name="copyAndReplace" depends="encodeProps">
<copy todir="dst" overwrite="yes" encoding="UTF-8" filtering="true">
<fileset dir="${srcdir}" includes="*.html">
</fileset>
<filterset>
<filtersfile file="${tmpdir}/c.properties" />
</filterset>
</copy>
</target>
</project>

Related

How to use ant expandproperties with windows pathseparator

I tried to use Ant's loadproperties with expandproperties.
This works for simple text properties, but I get weird results when a property contains a Windows path:
<property name="myAntFile" value="${ant.file}" />
<loadproperties srcFile="my.properties">
<filterchain>
<expandproperties />
</filterchain>
</loadproperties>
<echo message="$${external} = ${external}" />
the properties file looks like this:
external=${myAntFile}
the result is:
Buildfile: C:\projects\trunk\build.xml
...
[echo] ${external} = C:projects\trunkbuild.xml
I know that properties files have escape rules for backslashes and special whitespace characters. However, I don't see how I can translate the build script's properties into that escaped form.
Does anyone have an idea how to solve this, or is it an Ant bug (maybe the expandproperties filter should get an additional attribute for escaping when used in a properties file context)?
With ant you can use a forward slash / as the path separator when defining paths, even on Windows: C:/projects/trunk/build.xml
If ${ant.file} returns the path using backslashes, convert this path first before you load the properties file.
Unfortunately I haven't yet found the definitive way to convert paths from C:\a\path to C:/a/path and back. Supposedly pathconvert can do the trick...
<pathconvert targetos="unix" property="myAntFile.withForwardSlashes">
<path location="${myAntFile}"/>
</pathconvert>
... but it confuses relative and absolute paths and I couldn't make it work while testing this on my OS X machine.
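Another variant worth trying is pathconvert with only the dirsep attribute set, which leaves the rest of the path alone and just swaps the directory separator. This is a sketch, not something verified against the asker's setup; it reuses the property and file names from the question and defines myAntFile through pathconvert instead of the property task.

```xml
<!-- define myAntFile with forward slashes so the properties parser does not eat backslashes -->
<pathconvert property="myAntFile" dirsep="/">
  <path location="${ant.file}" />
</pathconvert>
<loadproperties srcFile="my.properties">
  <filterchain>
    <expandproperties />
  </filterchain>
</loadproperties>
<echo message="$${external} = ${external}" />
```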

Ant: Rename files to include their MD5

The question is likely VERY trivial for anyone familiar with ant, of which I only use the basics thus far.
I know how to rename files, e.g. I already use:
<copy todir="build/css/">
<fileset dir="css/">
<include name="*.css"/>
</fileset>
<globmapper from="*.css" to="*-min.css"/>
</copy>
I know how to calculate an MD5:
<checksum file="foo.bar" property="foobarMD5"/>
I don't know how to include the second into the first, to rename all those files to include their MD5 - the purpose is to serve as webbrowser cache buster. The other cache-busting option, to append "?[something]" is not as good, as is explained on some Google webmaster pages, having the MD5 as part of the name is better.
I managed to produce a somewhat strange solution using for from ant contrib.
But you have to install ant contrib first.
The copy in the sequential does not seem to accept/evaluate mappers (it wouldn't work, I tried with ant 1.7.0), so I had to create an extra move with a filtermapper to create the results.
It does the following:
for each file create an md5sum and save it in property foobarMD5
the property has to be unset before each iteration
I create a new file in the same dir named example.java_foobarMD5.java (Notice that the filename contains the fileextension)
I move all files with .java_ in its name to a new Folder and remove the .java_
I leave this example with .java.
<for param="file">
<path>
<fileset dir="src/" includes="**/*.java"/>
</path>
<sequential>
<echo>Letter @{file}</echo>
<var name="foobarMD5" unset="true"/>
<checksum file="@{file}" property="foobarMD5"/>
<echo>${foobarMD5}</echo>
<copy file="@{file}" tofile="@{file}_${foobarMD5}.java"/>
</sequential>
</for>
<move todir="teststack" verbose="true">
<fileset dir="src/">
<include name="**/*java_*"/>
</fileset>
<filtermapper>
<replacestring from=".java_" to="-"/>
</filtermapper>
</move>
You could do this without having to include ant contrib. I had to implement this for work and was not allowed to introduce that extension for security reasons. The solution I came to was this:
<target name="appendMD5">
<copy todir="teststack">
<fileset dir="css/" includes="**/*.css"/>
<scriptmapper language="javascript"><![CDATA[
var File = Java.type('java.io.File');
var Files = Java.type('java.nio.file.Files');
var MessageDigest = Java.type('java.security.MessageDigest');
var DatatypeConverter = Java.type('javax.xml.bind.DatatypeConverter');
// 'project' is the variable Ant exposes to scripts for the current project
var buildDir = project.getProperty('builddir');
var md5Digest = MessageDigest.getInstance('MD5');
var file = new File(buildDir, source);
var fileContents = Files.readAllBytes(file.toPath());
var hash = DatatypeConverter.printHexBinary(md5Digest.digest(fileContents));
var baseName = source.substring(0, source.lastIndexOf('.'));
var extension = source.substring(source.lastIndexOf('.'));
self.addMappedName(baseName + '-' + hash + extension);
]]></scriptmapper>
</copy>
</target>
It is worth noting that I wrote this for Java 8 but with some minor tweaks it could work on Java 7. Sadly this won't work for earlier versions of Java without more effort.

ant copy from absolute path read from xml

My problem is, I have to read the source path for a copy job from an xml file and then copy all files in that dir read from the xml file to another dir.
Since code is more than words:
<xmltask source="${projectfile}">
<copy path="Project/RecentResultsInfo/ResultsDirectoryOfRecentLoadTest/text()" property="recentdir" attrValue="true"/>
</xmltask>
<copy todir="${targetdirectory}">
<fileset dir="${recentdir}"/>
</copy>
The output when running this target is:
C:\develop\build.xml:44: Warning: Could not find resource file "C:\develop\C:\Programme\tool\test_90\" to copy.
It seems that fileset does not recognize that recentdir holds a full path. The XML written by the application has a newline before and after the path, and that newline is read along with it. So Ant does not recognize the value as an absolute path, because there's a newline in front of it.
Is there anything like trim for ant?
Can anybody help me getting ant to accept that path?
Done it now by using Ant-Contrib, but that is used in this project anyway.
<xmltask source="${projectfile}">
<copy path="Project/RecentResultsInfo/ResultsDirectoryOfRecentLoadTest/text()" property="recentdirraw" attrValue="true"/>
</xmltask>
<!-- replace newlines and whitespace from read path -->
<propertyregex property="recentdir" input="${recentdirraw}" regexp="^[ \t\n]+|[ \t\n]+$" replace="" casesensitive="false" />
<copy todir="${targetdirectory}">
<fileset dir="${recentdir}"/>
</copy>
This simply modifies the property with a regex that trims the text by stripping off leading and trailing whitespace and newlines.
As far as I can see, the copy element in xmltask provides a trim attribute.
trims leading/trailing spaces when writing to properties
Does that work?
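If the trim attribute behaves as the xmltask documentation describes, the ant-contrib step would not be needed. An untested sketch, reusing the path and property names from the question and assuming the attribute takes a boolean value:

```xml
<xmltask source="${projectfile}">
  <!-- trim="true" is assumed to strip the leading/trailing whitespace before the property is set -->
  <copy path="Project/RecentResultsInfo/ResultsDirectoryOfRecentLoadTest/text()"
        property="recentdir" attrValue="true" trim="true" />
</xmltask>
<copy todir="${targetdirectory}">
  <fileset dir="${recentdir}" />
</copy>
```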

Can I send Ant 'replace' task output to a new file?

The Ant replace task does an in-place replacement without creating a new file.
The below snippet replaces tokens in any of the '*.xml' files with the corresponding values from the 'my.properties' file.
<replace dir="${projects.prj.dir}/config"
replacefilterfile="${projects.prj.dir}/my.properties"
includes="*.xml" summary="true" />
I want those files that had their tokens replaced to be created named after a pattern (e.g.) '*.xml.filtered', and keep the original files.
Is this possible in Ant with some smart combination of tasks?
There are a couple of ways to get close to what you want without copying to a temporary directory and copying back.
Filtersets
If the source files can be changed so that the parts to be replaced are delimited with begin and end tokens, as in @date@ (@ is the default token, but it can be changed), then you can use the copy task with a globmapper and a filterset:
<copy todir="config">
<fileset dir="config" includes="*.xml" />
<globmapper from="*.xml" to="*.xml.filtered" />
<filterset filtersfile="replace.properties" />
</copy>
If replace.properties contains FOO = bar, then any occurrence of @FOO@ in a source xml file will be replaced with bar in the target.
Note that the source and target directories are the same; the globmapper means the target files are named with the suffix .filtered. It's possible (and more usual) to copy files into a different target directory.
Filterchains
If the source file can't be changed to add begin and end tokens, a possible alternative would be to use a filterchain with one or more replacestring filters instead of the filterset:
<copy todir="config">
<fileset dir="config" includes="*.xml" />
<globmapper from="*.xml" to="*.xml.filtered" />
<filterchain>
<tokenfilter>
<replacestring from="foo" to="bar" />
<!-- extra replacestring elements here as required -->
</tokenfilter>
</filterchain>
</copy>
This will replace any occurrence of foo with bar, anywhere in the file, which is more like the behaviour of the replace task. Unfortunately this way means you need to include all your replacements in the build file itself, you can't have them in a separate properties file.
In both cases the copy task will only copy source files that are newer than the target files, so unnecessary work won't be done.
Copy then replace
A third possibility (that has just occurred to me whilst writing up the other two) would be to perform the copy first to the renamed files, then run the replace task specifying the renamed files:
<copy todir="config">
<fileset dir="config" includes="*.xml" />
<globmapper from="*.xml" to="*.xml.filtered" />
</copy>
<replace dir="config" replacefilterfile="replace.properties" summary="true"
includes="*.xml.filtered" />
This might be the closest solution to the original requirement. The downside is that the replace task will be run each time on the renamed files. This could be a problem for some replacement patterns (admittedly they would be odd ones like foo=foofoo, but they would be okay with the first two methods) and you will be doing unnecessary work when the dependencies don't change.
The replace task doesn't observe dependencies, instead it carries out the replacement by writing a temporary file for each input file. If the temporary file is the same as the input file, it is discarded. A temporary file that differs from the input file is renamed to replace that input. This means all the files are processed, even if none of them need be - hence it can be inefficient.
The original solution to this question was to carry out a copy-replace-copy. The second copy isn't needed though, as a mapper can be used in the first. In the copy, dependencies can be used to restrict processing to just the files that have changed - by means of a depend selector in an explicit fileset:
<copy todir="${projects.prj.dir}">
<fileset dir="${projects.prj.dir}">
<include name="*.xml" />
<depend targetdir="${projects.prj.dir}">
<mapper type="glob" from="*.xml" to="*.xml.filtered" />
</depend>
</fileset>
<mapper type="glob" from="*.xml" to="*.xml.filtered" />
</copy>
That will restrict the copy fileset to just those files that have changed. An alternative syntax for the mappers is:
<globmapper from="*.xml" to="*.xml.filtered" />
The simplest replace would then be:
<replace dir="${projects.prj.dir}"
replacefilterfile="my.properties"
includes="*.xml.filtered" />
That will still process all the files though, even if none of them need undergo replacements. The replace task has an implicit fileset and can operate on an explicit fileset, but unlike similar tasks the implicit fileset is not optional, hence to take advantage of selectors in an explicit fileset you must make the implicit one 'do nothing' - hence the .dummy file here:
<replace dir="${projects.prj.dir}"
replacefilterfile="my.properties"
includes=".dummy">
<fileset dir="${projects.prj.dir}" includes="*.xml.filtered">
<not>
<different targetdir="${projects.prj.dir}">
<globmapper from="*.xml.filtered" to="*.xml" />
</different>
</not>
</fileset>
</replace>
That will prevent the replace task from needlessly processing files that have previously undergone substitution. It doesn't, however, prevent processing of files that haven't changed and don't need substitution.
Beyond that, I'm not sure there is a way to 'code golf' this problem to reduce the number of steps to one.
There isn't a multiple string replacement filter that can be used in a copy task to achieve the same effect as replace, which is a shame because that feels like it would be the right solution.
One other approach would be to generate the xml for a series of replace string filters and then have Ant execute that. But that will be more complex than the existing solution, and prone to problems with replacement strings that, if pasted into an xml fragment will result in something that can't be parsed.
Yet another approach would be to write a custom task or script task to do the work. If there are many files and the copy-replace solution is judged to be too slow, then this might be the way to go. But again, that approach is less simple than the existing solution.
If the requirement is to minimise the work done in the processing, rather than to come up with the shortest Ant solution, then this approach might do.
Make a fileset containing a list of inputs that have changed.
From that fileset create a comma-separated list of corresponding filtered files.
Carry out the copy on the fileset.
Carry out the replace on the comma-separated list.
A wrinkle here is that the implicit fileset in the replace task will fall back to processing everything if no files have changed. To overcome this we insert a dummy file name.
<fileset id="changed" dir="${projects.prj.dir}" includes="*.xml">
<depend targetdir="${projects.prj.dir}">
<globmapper from="*.xml" to="*.xml.filtered" />
</depend>
</fileset>
<pathconvert property="replace.includes" refid="changed" pathsep=",">
<chainedmapper>
<flattenmapper />
<globmapper from="*.xml" to="*.xml.filtered" />
</chainedmapper>
</pathconvert>
<copy todir="${projects.prj.dir}" preservelastmodified="true">
<fileset refid="changed" />
<globmapper from="*.xml" to="*.xml.filtered" />
</copy>
<replace dir="${projects.prj.dir}"
replacefilterfile="my.properties"
includes=".dummy,${replace.includes}" summary="true" />

Text manipulation in ant

Given an ant fileset, I need to perform some sed-like manipulations on it, condense it to a multi-line string (with effectively one line per file), and output the result to a text file.
What ant task am I looking for?
The Ant script task allows you to implement a task in a scripting language. If you have JDK 1.6 installed, Ant can execute JavaScript without needing any additional dependent libraries. The JavaScript code can read a fileset, transform the file names, and write them to a file.
<fileset id="jars" dir="${lib.dir}">
<include name="*.jar"/>
</fileset>
<target name="init">
<script language="javascript"><![CDATA[
var out = new java.io.PrintWriter(new java.io.FileWriter('jars.txt'));
var iJar = project.getReference('jars').iterator();
while (iJar.hasNext()) {
var jar = new String(iJar.next());
out.println(jar);
}
out.close();
]]></script>
</target>
Try the ReplaceRegExp optional task.
ReplaceRegExp is a directory based task for replacing the occurrence of a given regular expression with a substitution pattern in a selected file or set of files.
There are a few examples near the bottom of the page to get you started.
Looks like you need a combination of tasks.
This strips the '\r' and '\n' characters from a file and loads the result into a property:
<loadfile srcfile="${src.file}" property="src.file.contents">
<filterchain>
<filterreader classname="org.apache.tools.ant.filters.StripLineBreaks"/>
</filterchain>
</loadfile>
After loading the files concatenate them to another one:
<concat destfile="final.txt">
...
</concat>
Inside concat use a propertyset to reference the files content:
<propertyset id="properties-starting-with-bar">
<propertyref prefix="src.file"/>
</propertyset>
rodrigoap's answer is enough to build a pure ant solution, but it's not clean enough for me and would be some very complicated ant code, so I used a different method: I subclassed ant's echo task to make an echofileset task, which takes a fileset and a mapper. Subclassing echo buys me the ability to output to a file. A regexmapper performs the transformation on filenames that I need. I hardcoded it to print out each file on a separate line, but if I needed more flexibility I could add an optional separator attribute. I also thought about providing the ability to output to a property, too, but it turned out I didn't need it since I echo'ed straight to a file.
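For simple transformations, a pure-Ant route that stays fairly short is pathconvert with a nested mapper, followed by echo to a file. The sketch below is an assumption-laden illustration, not the custom echofileset task described above: it reuses the jars fileset id from the earlier answer, and the transformation (drop the directory and the .jar extension) and the output file name are made up. A regexpmapper could be swapped in for more sed-like edits.

```xml
<target name="listFiles">
  <!-- one entry per file, joined with the platform line separator -->
  <pathconvert property="file.list" refid="jars" pathsep="${line.separator}">
    <chainedmapper>
      <flattenmapper />
      <globmapper from="*.jar" to="*" />
    </chainedmapper>
  </pathconvert>
  <!-- write the resulting multi-line string to a text file -->
  <echo file="jars.txt" message="${file.list}" />
</target>
```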
