Parse HTML with an Ant script - ant

I need to retrieve some values from an HTML file. I need to use Ant so I can use these values in other parts of my script.
Can this even be achieved in Ant?

As stated in the other answers, you can't do this in "pure" XML; you need to embed a programming language. My personal favourite is Groovy, whose integration with Ant is excellent.
Here's a sample that retrieves the logo URL from the Groovy homepage:
parse:
print:
[echo]
[echo] Logo URL: http://groovy.codehaus.org/images/groovy-logo-medium.png
[echo]
build.xml
Build uses the ivy plug-in to retrieve all 3rd party dependencies.
<project name="demo" default="print" xmlns:ivy="antlib:org.apache.ivy.ant">
<target name="resolve">
<ivy:resolve/>
<ivy:cachepath pathid="build.path" conf="build"/>
</target>
<target name="parse" depends="resolve">
<taskdef name="groovy" classname="org.codehaus.groovy.ant.Groovy" classpathref="build.path"/>
<groovy>
import org.htmlcleaner.*
def address = 'http://groovy.codehaus.org/'
// Clean any messy HTML
def cleaner = new HtmlCleaner()
def node = cleaner.clean(address.toURL())
// Convert from HTML to XML
def props = cleaner.getProperties()
def serializer = new SimpleXmlSerializer(props)
def xml = serializer.getXmlAsString(node)
// Parse the XML into a document we can work with
def page = new XmlSlurper(false,false).parseText(xml)
// Retrieve the logo URL
properties["logo"] = page.body.div[0].div[1].div[0].div[0].div[0].img.@src
</groovy>
</target>
<target name="print" depends="parse">
<echo>
Logo URL: ${logo}
</echo>
</target>
</project>
The parsing logic is pure groovy programming. I love the way you can easily walk the page's DOM tree:
// Retrieve the logo URL
properties["logo"] = page.body.div[0].div[1].div[0].div[0].div[0].img.@src
ivy.xml
Ivy is similar to Maven: it manages your dependencies on third-party software. Here it pulls down Groovy and the HtmlCleaner library that the Groovy logic uses:
<ivy-module version="2.0">
<info organisation="org.myspotontheweb" module="demo"/>
<configurations defaultconfmapping="build->default">
<conf name="build" description="ANT tasks"/>
</configurations>
<dependencies>
<dependency org="org.codehaus.groovy" name="groovy-all" rev="1.8.2"/>
<dependency org="net.sourceforge.htmlcleaner" name="htmlcleaner" rev="2.2"/>
</dependencies>
</ivy-module>
How to install ivy
Ivy is a standard Ant plugin. Download its jar and place it in one of the following directories:
$HOME/.ant/lib
$ANT_HOME/lib
I don't know why the ANT project doesn't ship with ivy.

Yes, this is possible.
Note that in order to use this solution you will need to point your JAVA_HOME variable at a JRE 1.6 or later.
<project name="extractElement" default="test">
<!--Extract element from html file-->
<scriptdef name="findelement" language="javascript">
<attribute name="tag" />
<attribute name="file" />
<attribute name="property" />
<![CDATA[
var tag = attributes.get("tag");
var file = attributes.get("file");
var regex = "<" + tag + "[^>]*>(.*?)</" + tag + ">";
var patt = new RegExp(regex,"g");
project.setProperty(attributes.get("property"), patt.exec(file));
]]>
</scriptdef>
<!--Only available target...-->
<target name="test">
<!--Load html file into property-->
<loadfile srcFile="D:\Tools\CruiseControl\Build\artifacts\RECO\20110831100942\RECO_merged_report.html" property="html.file"/>
<!--Find element with specific tag and save it to property element-->
<findelement tag="title" file="${html.file}" property="element"/>
<echo message="File : ${html.file}"/>
<echo message="Title : ${element}"/>
</target>
</project>
Output : [echo] Title : <title>Test Report</title>,Test Report
As I don't know exactly which values you are looking for, this solution finds the first element with the tag you specify in the tag attribute. The property holds the full match followed by the captured content, separated by a comma. Of course, you can modify the regex to suit your own needs.
Also, this is a pure Ant build.xml with no external dependencies whatsoever.
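To make the comma-separated output above easier to read: RegExp.exec returns an array holding the full match followed by each capture group, and storing that array in a property stringifies it. The following is a minimal standalone JavaScript sketch (run outside Ant; the function name extractElement is made up for illustration) that keeps only the captured content. It assumes the tag name contains no regex metacharacters and, like any regex approach, it does not cope with nested or malformed markup.

```javascript
// Hypothetical helper, not part of Ant: returns the text inside the first
// <tag>...</tag> pair found in html, or null when there is no match.
function extractElement(html, tag) {
  // [\s\S] is used instead of "." so the content may span several lines.
  const pattern = new RegExp("<" + tag + "\\b[^>]*>([\\s\\S]*?)</" + tag + ">", "i");
  const match = pattern.exec(html);
  // match[0] is the whole element, match[1] is the first capture group.
  return match === null ? null : match[1];
}

const sampleHtml = "<html><head><title>Test Report</title></head></html>";
console.log(extractElement(sampleHtml, "title")); // prints Test Report
console.log(extractElement(sampleHtml, "h1"));    // prints null
```

Inside the scriptdef, the equivalent change would presumably be to pass element 1 of the exec result to project.setProperty instead of the whole array.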

Sure, but you have to write your own task for it. Visit http://ant.apache.org/manual/develop.html#writingowntask for more information about writing your own tasks for Ant. In your Ant task you can parse your HTML file as needed.
I claim that it is not directly possible to achieve what you want with "pure" XML (build.xml).

Take a look at the xmlproperty task (http://ant.apache.org/manual/Tasks/xmlproperty.html) and see if it'll work for you. It's pretty straightforward:
<xmlproperty file="${html.file}"
prefix="html."/>
This works as long as the HTML file is also well-formed XML (XHTML, for example); HTML in general is not a subset of XML, so arbitrary pages may fail to parse. I've used it before to do this very task. No need to write your own task or script.

Related

Is it possible to call an Non-Standard/Custom Ant Task from inside a script?

I'm fairly new to ant.
I'm currently working with the IBM MQ product.
As part of this product there are some Ant executions that make use of IBM's own Ant tasks (e.g. 'fte:filecopy' below):
<project xmlns:fte="antlib:com.ibm.wmqfte.ant.taskdefs" name="transfer" default="orchestrate_transfers">
<taskdef resource="net/sf/antcontrib/antcontrib.properties"/>
...
<target name="copy">
<fte:filecopy src="X#Y" dst="X#Y" outcome="await" jobname="test_job" rcproperty="result">
<fte:filespec srcfilespec="/from/here" dstfile="/to/here" overwrite="true" recurse="false"/>
<fte:metadata>
<fte:entry name="mykey" value="myvalue"/>
</fte:metadata>
</fte:filecopy>
</target>
...
I need to execute this and tasks like it, but with some controlling logic that assigns a variable number of metadata entries. I could potentially do this with JavaScript.
Now, I know I can perform tasks from JavaScript (e.g. the two targets below are equivalent, but one is implemented natively and the other via JavaScript), but project.createTask("filecopy") returns null.
<project name="demo">
<target name="test_task">
<echo message="Hello World"/>
</target>
<target name="test_script">
<script language="javascript">
var echo = project.createTask("echo")
echo.setMessage("Hello World")
echo.perform()
</script>
</target>
</project>
Does anyone know how you would normally call a custom task from inside the script?

ANT Build: Can the token itself be parsed from other values from within the property file?

Is it possible to evaluate the token key without hardcoding the token? That is, can the token itself be built from other values within the property file?
For example, if the properties file has the following tokens (test.properties):
module_no = 01
module_code = bb
title_01_aa = ABC
title_02_aa = DEF
title_03_aa = GHI
title_01_bb = JKL
title_02_bb = MNO
title_03_bb = PQR
Contents of build.xml
<?xml version="1.0" encoding="utf-8"?>
<project default="repl">
<property file="test.properties" />
<target name="repl">
<replace file="test.txt" token="module_title" value="title_${module_no}_${module_code}" />
</target>
</project>
Sample content of test.txt:
Welcome to module_title.
The replace task will result in:
Welcome to title_01_bb.
How to achieve this instead?
Welcome to JKL.
This might be very basic, but please do guide me in the right direction. Thank you.
Nested property expansion does not work by default in Ant as described in the documentation:
Nesting of Braces
In its default configuration Ant will not try to balance braces in property expansions, it will only consume the text up to the first closing brace when creating a property name. I.e. when expanding something like ${a${b}} it will be translated into two parts:
the expansion of property a${b - likely nothing useful.
the literal text } resulting from the second closing brace
This means you can't easily expand properties whose names are given by properties, but there are some workarounds for older versions of Ant. With Ant 1.8.0 and the props Antlib you can configure Ant to use the NestedPropertyExpander defined there if you need such a feature.
If you check the workarounds link, one solution is to use a macrodef to copy the property:
<property file="test.properties" />
<target name="repl">
<gettitleprop name="titleprop" moduleno="${module_no}" modulecode="${module_code}" />
<replace file="test.txt" token="module_title" value="${titleprop}" />
</target>
<macrodef name="gettitleprop">
<attribute name="name"/>
<attribute name="moduleno"/>
<attribute name="modulecode"/>
<sequential>
<property name="@{name}" value="${title_@{moduleno}_@{modulecode}}"/>
</sequential>
</macrodef>
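Why the macrodef helps can be illustrated with a toy model in JavaScript. This is only a sketch of the behaviour described in the quoted documentation, not Ant's real implementation: the function names are made up, the "$$" escape is ignored, and unknown properties are simply left in place. Macrodef attributes (written @{name} in Ant) are substituted first, so by the time property expansion runs there is only one pair of braces left.

```javascript
// Toy model (an assumption, not Ant's code): expand ${name} by consuming
// text up to the FIRST closing brace, as the quoted documentation describes.
function expandProperties(text, props) {
  let out = "";
  let pos = 0;
  while (pos < text.length) {
    const start = text.indexOf("${", pos);
    const end = start === -1 ? -1 : text.indexOf("}", start + 2);
    if (end === -1) {
      out += text.slice(pos); // no further complete reference
      break;
    }
    out += text.slice(pos, start);
    const name = text.slice(start + 2, end);
    const known = Object.prototype.hasOwnProperty.call(props, name);
    out += known ? props[name] : "${" + name + "}"; // unknown: leave as-is
    pos = end + 1;
  }
  return out;
}

// Toy model of macrodef attribute substitution: replace @{attr} first.
function expandAttributes(text, attrs) {
  return text.replace(/@\{([^}]+)\}/g, (whole, name) =>
    Object.prototype.hasOwnProperty.call(attrs, name) ? attrs[name] : whole);
}

const props = { module_no: "01", module_code: "bb", title_01_bb: "JKL" };

// Direct nesting: the first "}" ends the name too early, so no title is found.
const nested = expandProperties("${title_${module_no}_${module_code}}", props);
console.log(nested === "JKL"); // prints false

// Macrodef route: attributes first, then a single, flat property reference.
const body = expandAttributes("${title_@{moduleno}_@{modulecode}}",
  { moduleno: props.module_no, modulecode: props.module_code });
console.log(expandProperties(body, props)); // prints JKL
```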

Phing alternative for fixlastline from ant

I'm rewriting a build.xml file from Ant to Phing, and everything is going fine with one exception.
I need to add a newline at the end of each appended file, but I can't find any alternative to fixlastline="true".
In Ant it was
<concat destfile="${libraryFilePrefix}.js" fixlastline="yes">
<!-- many filesets -->
</concat>
In Phing it's like
<append destfile="${libraryFilePrefix}.js">
<!-- many filesets -->
</append>
Is there any attribute that works like fixlastline or maybe I need to find another way to achieve this?
I believe one approach (and possibly the only one) is to apply a replaceregexp filter to each fileset. You only need to declare the filterchain once at the beginning and it will do the job for every fileset, like this:
<append destfile="${libraryFilePrefix}.js">
<filterchain>
<replaceregexp>
<regexp pattern="([^\n])$" replace="$1${line.separator}" ignoreCase="true"/>
</replaceregexp>
</filterchain>
<!-- many filesets -->
</append>
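For reference, the effect that fixlastline is meant to have can be sketched in a few lines of standalone JavaScript. This only illustrates the intended result and is not Phing or Ant code; the function names are made up, and leaving empty files untouched is an assumption.

```javascript
// Hypothetical illustration: make sure one file's content ends with a newline.
function fixLastLine(content, eol = "\n") {
  // Assumption: empty content is left alone; "\r\n" endings also end in "\n".
  if (content.length === 0 || content.endsWith("\n")) {
    return content;
  }
  return content + eol;
}

// Concatenate several file contents, fixing the last line of each one first.
function concatFiles(contents, eol = "\n") {
  return contents.map((content) => fixLastLine(content, eol)).join("");
}

const merged = concatFiles(["a();", "b();\n", "c();"]);
console.log(JSON.stringify(merged)); // prints "a();\nb();\nc();\n"
```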
As of Phing 3.x, the AppendTask supports the fixlastline attribute, so the Ant snippet you provided now works as expected:
<project name="concat-supports-fixlastline" default="concat-fixed-lastline" basedir=".">
<target name="concat-fixed-lastline">
<concat destfile="${libraryFilePrefix}.js" fixlastline="yes">
<!-- many filesets -->
</concat>
</target>
</project>

passing properties defined inside antcall target back to calling target

I'm rather new to Ant, but I have found it a good pattern to create generic Ant targets that are called via the antcall task with varying parameters.
My example is a compile target, which compiles multiple systems using a complex build command that is slightly different for each system. With the pattern described above, the compile command does not have to be copied and pasted for each system.
My problem is that I'm not aware of any way to pass a return value (for example the return value of the compiler) back to the target that called the antcall task. Is my approach misguided and it's simply not possible to return a value from an antcall task, or do you know a workaround?
Thanks,
Use antcallback from the ant-contrib jar instead of antcall
<target name="testCallback">
<antcallback target="capitalize2" return="myKey">
</antcallback>
<echo>a = ${myKey}</echo>
</target>
<target name="capitalize2">
<property name="myKey" value="it works"/>
</target>
Output:
testCallback:
capitalize2:
[echo] a = it works
BUILD SUCCESSFUL
One approach is to write the property out to a temp file using "echo file= ...." or the PropertyFile task, then read the property back in where required. It's a kludge, but it works.
Ant tasks are all about stuff goes in, side effect happens. So trying to program in terms of functions (stuff goes in, stuff comes out) is going to be messy.
That said, what you can do is generate a property name per invocation and store the result value in that property. You would need to pass in an identifier so you do not end up trying to create copies of the same property. Something like this:
<target name="default">
<property name="key" value="world"/>
<antcall target="doSomethingElse">
<param name="param1" value="${key}"/>
</antcall>
<echo>${result-${key}}</echo>
</target>
<target name="doSomethingElse">
<property name="result-${param1}" value="it works?"/>
</target>
But I believe the more typical approach, instead of antcalls, is to use macros: http://ant.apache.org/manual/Tasks/macrodef.html
antcallback is available from the ant-contrib jar.
You can get similar behaviour with the "depends" attribute.
<?xml version="1.0" encoding="UTF-8"?>
<project name="test" default="main">
<target name="main">
<antcall target="build-system-with-depends" />
<!-- wait for different results -->
<waitfor checkevery="1000" checkeveryunit="millisecond" maxwaitunit="millisecond" maxwait="2000">
<available file="dummy.not.present.file" classname="" property=""></available>
</waitfor>
<antcall target="build-system-with-depends" />
</target>
<target name="build-system-with-depends" depends="do-compiler-stuff">
<echo>$${compiler.result}=${compiler.result}</echo>
</target>
<target name="do-compiler-stuff">
<!-- simulate different return states -->
<tstamp>
<format pattern="yyyyMMddHHmmss" property="compiler.result" />
</tstamp>
</target>
</project>

editing the xml using ant task

I was trying to edit my config.xml file using an Ant task, but I could not get it to work. Can anybody tell me how to edit the XML automatically with an Ant task, so that I don't need to change it manually for every new branch?
The first option to check would be the Ant xslt task. For an introduction to its use see the Ant/XSLT Wikibook.
I've used Groovy to do this. Groovy is very Java-like, so you can write your Groovy script much as you would a static Java method, and have Ant call it using the <groovy> task (you will of course need to include the groovy taskdef).
Because Groovy accepts Java syntax, you can import the org.w3c.dom.* classes to get access to the DOM API.
For example, here is a snippet that adds a resource-ref element to a specified web.xml file:
import org.w3c.dom.*;
String web_xml_filename=args[0];
String res_ref_name=args[1];
Document doc = DomHelper.getDoc(web_xml_filename);
Element rootNode=doc.getDocumentElement();
newNode = doc.createElement("resource-ref");
DomHelper.createElement(doc, newNode, "res-ref-name", res_ref_name);
DomHelper.createElement(doc, newNode, "res-type", "javax.sql.DataSource");
DomHelper.createElement(doc, newNode, "description", description);
DomHelper.createElement(doc, newNode, "res-auth", "Container");
rootNode.insertBefore(newNode, nodes.item(0));
DomHelper.writeDoc(doc, web_xml_filename, false);
To call from ant, use the groovy task :-
<groovy src="${e5ahr-groovy.dir}/addResoureRefToJBossWebXML.groovy" classpath="${groovy.dir}">
<arg value="${jboss-web.xml}"/>
<arg value="jdbc/somesource"/>
<arg value="java:jdbc/somesource"/>
</groovy>
You can use ReplaceRegExp. A literal < is not allowed inside an XML attribute value, so the pattern and expression attributes can't contain raw angle brackets, but you can write them as the entities &lt; and &gt;. For example:
<replaceregexp byline="true">
<!-- In config.xml this looks like <myVariable></myVariable> -->
<regexp pattern="&lt;myVariable&gt;(.*)&lt;/myVariable&gt;" />
<substitution expression="&lt;myVariable&gt;${myVariable.value}&lt;/myVariable&gt;" />
<fileset dir="${user.dir}">
<include name="config.xml" />
</fileset>
</replaceregexp>
