How can I parse a YouTube Url using XSLT? - parsing

I would like to parse a youtube url using XSLT and get only the Video ID from that URL. What's the best way to do this using XSLT?
So, if the url is: http://www.youtube.com/watch?v=qadqO3TOvbQ&feature=channel&list=UL
I only want qadqO3TOvbQ and put it into an embed code:
<iframe width="560" height="315" src="http://www.youtube.com/embed/qadqO3TOvbQ" frameborder="0" allowfullscreen=""></iframe>

I. This XPath 2.0 expression:
substring-after(tokenize($pUrl, '[?&]')[starts-with(., 'v=')], 'v=')
produces the wanted, correct result.
Alternatively, one can use the slightly shorter:
tokenize(tokenize($pUrl, '[?&]')[starts-with(., 'v=')], '=')[2]
Here is a complete XSLT 2.0 transformation:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:param name="pUrl" select=
"'http://www.youtube.com/watch?v=qadqO3TOvbQ&amp;feature=channel&amp;list=UL'"/>
<xsl:template match="/">
<xsl:sequence select=
"tokenize(tokenize($pUrl, '[?&amp;]')[starts-with(., 'v=')], '=')[2]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on any XML document (not used), the wanted, correct result is produced:
qadqO3TOvbQ
II. This XPath 1.0 expression:
concat
(substring-before(substring-after(concat($pUrl,'&'),'?v='),'&'),
substring-before(substring-after(concat($pUrl,'&'),'&v='),'&')
)
produces the wanted result.
Do note:
Both solutions extract the wanted string even if the query string parameter named v isn't the first one or even in the case when it is the last one.
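To tie the XPath 1.0 expression back to the embed code asked for in the question, here is a minimal sketch of a complete XSLT 1.0 stylesheet. The variable names vPadded and vId are made up for this illustration, and the ampersands are written as &amp; because the expression now lives inside an XML attribute:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:param name="pUrl"
      select="'http://www.youtube.com/watch?v=qadqO3TOvbQ&amp;feature=channel&amp;list=UL'"/>
  <xsl:template match="/">
    <!-- Append '&' so that a trailing v= parameter is also terminated -->
    <xsl:variable name="vPadded" select="concat($pUrl, '&amp;')"/>
    <!-- One of the two branches yields the ID, the other the empty string -->
    <xsl:variable name="vId" select="concat(
        substring-before(substring-after($vPadded, '?v='), '&amp;'),
        substring-before(substring-after($vPadded, '&amp;v='), '&amp;'))"/>
    <iframe width="560" height="315"
        src="http://www.youtube.com/embed/{$vId}"
        frameborder="0" allowfullscreen=""></iframe>
  </xsl:template>
</xsl:stylesheet>
```

As with the 2.0 transformation, the source XML document is not used; pass a different pUrl parameter to embed another video.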

XSLT/XPath is not best suited to string handling (1.0 especially) but you can achieve what you need by mixing up the substring-after() and substring-before() functions:
<xsl:value-of select="substring-before(substring-after($yt_url, '?v='), '&amp;feature')" />
(This assumes the YouTube URL is stored in an XSLT variable, $yt_url, and that each & in it is escaped as &amp;.)
Demo at this XML Playground

Related

xslt 2.0: read in text files via collection()

I have a bunch of text files that I'd like to process with XSLT 2.0.
Here's how I try to read them in:
<xsl:variable name="input" select="collection(iri-to-uri('file:///.?select=*.txt'))" />
However, when I do this:
<xsl:message>
<xsl:sequence select="count($input)"/>
</xsl:message>
It outputs 0. No files are selected.
If I do it like this:
<xsl:variable name="input" select="collection(iri-to-uri('.?select=*.txt'))" />
I get the error that collection should return a node but is returning an xs:string.
What I would like to do is read each file and then iterate over each file and process the text, like this:
<xsl:for-each select="unparsed-text($input, 'UTF-8')">
<!-- tokenizing, etc. -->
How would I do that?
You need the XPath 3.0 uri-collection function, which is supported in version="3.0" stylesheets in Saxon 9.7 (all editions, including HE) and in 9.6 (commercial editions, I think):
<xsl:template match="/" name="main">
<xsl:for-each select="uri-collection('.?select=*.txt')!unparsed-text(.)">
<xsl:message select="'Parsed:' || . || '&#10;'"/>
</xsl:for-each>
</xsl:template>
collection is supposed to return a sequence of nodes while uri-collection can access other resources not parsable as XML.
With Altova XMLSpy or RaptorXML and XSLT 3.0 you can also use uri-collection. The way to select the files seems to be a bit different from Saxon: you use uri-collection('*.txt') to access all .txt files in the directory.
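For the "iterate over each file and process the text" part of the question, a rough sketch along the same lines might look as follows. It assumes an XSLT 3.0 processor and the Saxon-style '.?select=*.txt' collection URI from the answer above; the variable name vLines is made up for this illustration:

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/" name="main">
    <!-- Each item is the URI of one text file, not its content -->
    <xsl:for-each select="uri-collection('.?select=*.txt')">
      <!-- unparsed-text-lines() splits the file into one string per line -->
      <xsl:variable name="vLines" select="unparsed-text-lines(.)"/>
      <xsl:value-of select="concat(., ': ', count($vLines), ' lines&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The line count is only a placeholder; the tokenizing logic would go where $vLines is used.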

how to decode base64Binary-to-string in xslt

Hi, I am trying to convert my Base64 data to a string in XSLT, but I am unable to do so.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:saxon="http://saxon.sf.net/">
<xsl:template match="Data">
<xsl:element name="fos1">
<xsl:value-of select="(saxon:base64Binary-to-string(
xs:base64Binary("Data"),
"UTF8"))"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
I also have an XML file in which I have stored the Base64 data; it looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<Data>e1xydGYxXGZiaWRpc1xhbnNpXGFuc2ljcGcxMjUyXGRlZmYwXGRlZmxhbmcxMDMxXGRlZmxhbmdmZTEwMzFcZGVmdGFiNzA4e1xmb250dGJse1xmMFxmc3dpc3NcZnBycTJcZmNoYXJzZXQwIFZlcmRhbmE7fX0NClx2aWV3a2luZDRcdWMxXHBhcmRcbHRycGFyXGxhbmcyMDU3XGJcZjBcZnMyMCBGYXJlIFJ1bGVzXHBhcg0KXGIwXHBhcg0KXHBhcg0KXGIgUmVib29raW5nc1xwYXINClxwYXJkXGx0cnBhclxxalxiMCBCb29raW5nIGNoYW5nZXMgYWxsb3dlZCwgdXAgdG8gMzAgbWludXRlcyBwcmlvciB0byBkZXBhcnR1cmUsIGZvciBhIGZlZSBvZiBcYiBFVVIgNjAgLyBDSEYgNzkgLyBHQlAgNTQgLyBVU0QgODMgLyBDWksgMTY4MCAvIFNFSyA2MDAgLyBOT0sgNDkyIC8gUExOIDI3NiAvIEhVRiAxOTgwMFxiMCAgcGVyIHBlcnNvbiBhbmQgbGVnLCB3aGVyZSBhcHBsaWNhYmxlIHBsdXMgZGlmZmVyZW5jZSB0byBhY3R1YWwgZmxpZ2h0IGZhcmUuIENoYW5nZSBvZiB0YXJpZmYgb25seSBhbGxvd2VkIGludG8gbmV4dCBoaWdoZXIgY2F0ZWdvcnkgaWYgaW5pdGlhbCB0YXJpZmYgbm90IGF2YWlsYWJsZS4gSUYgVEhFIE5FVyBGQVJFIElTIExPV0VSIFRIQU4gVEhFIE9SR0lOQUwgT05FLCBOTyBSRUZVTkQgV0lMTCBCRSBHUkFOVEVELlxwYXINClxwYXINClxwYXJkXGx0cnBhclxiIENhbmNlbGxhdGlvbnNccGFyDQpcYjAgQ2FuY2VsbGF0aW9ucyBhcmUgbm90IGFsbG93ZWQuIEZhcmUgaXMgbm9uIHJlZnVuZGFibGUuIElOIENBU0UgT0YgTk9TSE9XLCBQQVggTVVTVCBQVVJDSEFTRSBBTiBFTlRJUkVMWSBORVcgVElDS0VUIC8gT1JJR0lOQUwgUEFJRCBBTU9VTlQgV0lMTCBOT1QgQkUgUkVGVU5ERUQvQ1JFRElURUQuXHBhcg0KXHBhcg0KXGIgTmFtZSBDaGFuZ2VzXHBhcg0KXGIwIE5hbWUgY2hhbmdlcyBhbGxvd2VkLCB1cCB0byAzMCBtaW51dGVzIHByaW9yIHRvIGRlcGFydHVyZSwgZm9yIGEgZmVlIG9mIFxiIEVVUiA2MCAvIENIRiA3OSAvIEdCUCA1NCAvIFVTRCA4MyAvIENaSyAxNjgwIC8gU0VLIDYwMCAvIE5PSyA0OTIgLyBQTE4gMjc2IC8gSFVGIDE5ODAwIFxiMCBwZXIgcGVyc29uIGFuZCBib29raW5nLCB3aGVyZSBhcHBsaWNhYmxlIHBsdXMgZGlmZmVyZW5jZSB0byBhY3R1YWwgZmxpZ2h0IGZhcmUuXHBhcg0KXHBhcg0KXGIgQ2hlY2tlZCBMdWdnYWdlXHBhcg0KXGIwIENoZWNrZWQgbHVnZ2FnZSBvcHRpb25hbCwgc3RhbmRhcmQgYWRkaXRpb25hbCBjb3N0cyBhcHBseS5ccGFyDQpcYiBBVFROISBTbWFydDpcYjAgIE9uZSBwaWVjZSBvZiBjaGVja2VkIGx1Z2dhZ2UgaXMgYXV0b21hdGljYWxseSBpbmNsdWRlZCBwZXIgcGVyc29uICh0b3RhbCB3ZWlnaHQgMjMga2cpLCBmb3IgZnVydGhlciBwaWVjZXMgYW5kIGV4Y2VzcyBsdWdnYWdlLCBzdGFuZGFyZCBhZGRpdGlvbmFsIGNvc3RzIGFwcGx5LlxwYXINClxwYXINClxiIFNlYXRpbmdccGFyDQpcYjAgT3B0aW9uYWwsIHN0YW5kYXJkIGFkZGl0aW9uYWwgY29zdHMgYX
BwbHkuXHBhcg0KXGIgQVRUTiEgU21hcnQ6IFxiMCBTZWF0IHJlc2VydmF0aW9uIGlzIGluY2x1ZGVkIHBlciBQZXJzb24sIGFsc28gbW9yZSBsZWdyb29tIHNlYXQgKGlmIGF2YWlsYWJsZSlcYlxwYXINClxiMFxwYXINClxiIE1pbGVzXHBhcg0KXGIwIEluY2x1ZGluZyBtaWxlc1xwYXINClxwYXINClxiIE1lYWxccGFyDQpcYjAgTm90IGluY2x1ZGVkXHBhcg0KXGIgQVRUTiEgU21hcnQ6IFxiMCBTbmFjayBhbmQgZHJpbmsgaW5jbHVkZWRccGFyDQpccGFyDQpccGFyDQpcYiBBVFROISEgRm9yIEJsaW5kIEJvb2tpbmcgRmFyZXMgYW5kIEJvb21lcmFuZyBSZXdhcmQgZmxpZ2h0cyB0aGUgcGFydGljdWxhciBjb25kaXRpb25zIGFjY29yZGluZyB0byBHZXJtYW53aW5ncyBHQ0MgYXBwbHkuXHBhcg0KXHBhcg0KfQ0K</Data>
The call xs:base64Binary("Data") tries to create a base-64 value from the character string "Data", which is not a legal base-64 string. So you should be getting an error message saying roughly that. Hmm. You didn't mention an error message; you didn't explain what exactly you mean by "unable to do so". Next time, provide more information, ok?
If you want to apply the function to the string value of the Data element, bear in mind that you are making the call in an expression located in a template which matches that element. You almost certainly want to replace the call in the current expression with something like xs:base64Binary(.).
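Put together, the corrected stylesheet might look roughly like this. It is a sketch that keeps the Saxon extension function from the question (which requires a Saxon version and edition that provides it); the single-quoted 'UTF8' and the exclude-result-prefixes attribute are additions made here so that the select attribute is well-formed and the output stays free of unused namespace declarations:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="xs saxon">
  <xsl:template match="Data">
    <fos1>
      <!-- '.' is the matched Data element; its string value is the Base64 text -->
      <xsl:value-of select="saxon:base64Binary-to-string(xs:base64Binary(.), 'UTF8')"/>
    </fos1>
  </xsl:template>
</xsl:stylesheet>
```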

Can I apply a character map to a given node?

If I look at the XSLT specs, it seems a character map applies to the whole document, but is it also possible to use it on a given node, or within a template?
Example: I have a node containing lookup values, but they might contain characters that don't play well with regular expressions when used in another template. For now I use the replace function, which works well, but after a few characters that becomes pretty hard to read or maintain. So if I have something like this:
<xsl:variable name="myLookup" select="
replace(
replace(
replace(
replace(
string-join(/*/lookup/*, '|'),
'\[','\\['),
'\]','\\]'),
'\(','\\('),
'\)','\\)')
"/>
is there a way to achieve something like below fictitious example ?
<xsl:character-map name="escapechar">
<xsl:output-character character="[" string="\[" />
<xsl:output-character character="]" string="\]" />
<xsl:output-character character="(" string="\(" />
<xsl:output-character character=")" string="\)" />
</xsl:character-map>
<xsl:variable name="myLookup" select="string-join(/*/lookup/*, '|')" use-character-map="escapechar"/>
I know this is not working at all, it is just to make my request a bit visual.
Any idea ?
I think character maps in XSLT 2.0 are a serialization feature to be applied when a result tree is serialized to a file or stream so I don't see how you could apply one to a certain string or certain node during a transformation.
As for escaping meta characters of regular expression patterns, maybe http://www.xsltfunctions.com/xsl/functx_escape-for-regex.html helps.
Character maps are only a serialization feature, which means that they are only applied when the final output of a transformation is produced. However, you can significantly simplify your current code.
Just use:
replace($pStr, '(\[|\]|\(|\))','\\$1')
Here is a complete example:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:my="my:my">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/*">
<xsl:value-of select="my:escape(.)"/>
</xsl:template>
<xsl:function name="my:escape" as="xs:string">
<xsl:param name="pStr" as="xs:string"/>
<xsl:value-of select="replace($pStr, '(\[|\]|\(|\))','\\$1')"/>
</xsl:function>
</xsl:stylesheet>
When this transformation is applied on the following XML document:
<t>([a-z]*)</t>
the wanted, correct result is produced:
\(\[a-z\]*\)
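Applied to the lookup variable from the question, the same single replace() call could be used like this. This is only a sketch: it assumes the /*/lookup/* structure shown in the question and escapes each value before joining, so the '|' separators themselves are left alone:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Escape [ ] ( ) in every lookup value, then join the values with '|' -->
  <xsl:variable name="myLookup" select="
      string-join(
        for $v in /*/lookup/* return replace($v, '(\[|\]|\(|\))', '\\$1'),
        '|')"/>
  <xsl:template match="/">
    <xsl:value-of select="$myLookup"/>
  </xsl:template>
</xsl:stylesheet>
```

The resulting string can then be passed as the pattern argument to matches() or replace() in another template.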

How can I get xslt to indent xml (from Ant)?

From what I understand having looked around for an answer to this the following should work:
<xslt basedir="..." destdir="..." style="xslt-stylesheet.xsd" extension=".xml"/>
Where xslt-stylesheet.xsd contains the following:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
Unfortunately while most formatting is applied (spaces are stripped, newlines entered, etc.), indentation is not and every element is along the left side in the file. Is this an issue with the xslt processor Ant uses, or am I doing something wrong? (Using Ant 1.8.2).
It might help to set some processor-specific output options, though you should note that these may vary depending on the XSLT processor that you're using.
For example, if you're using Xalan, it defines an indent-amount property, which seems to default to 0.
To override this property at runtime, you can declare xalan namespace in your stylesheet and override using the processor-specific attribute indent-amount in your output element as follows:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xalan="http://xml.apache.org/xalan">
<xsl:output method="xml"
encoding="UTF-8"
indent="yes"
xalan:indent-amount="2"/>
This example is from the Xalan usage patterns documentation at http://xml.apache.org/xalan-j/usagepatterns.html
If you do happen to be using Xalan, the documentation also says you can change all of the output preferences globally by changing the file org/apache/serializer/output_xml.properties in the serializer jar.
In the interest of completeness, the complete set of Xalan-specific xml output properties defined in that file (Xalan 2.7.1) are:
{http://xml.apache.org/xalan}indent-amount=0
{http://xml.apache.org/xalan}content-handler=org.apache.xml.serializer.ToXMLStream
{http://xml.apache.org/xalan}entities=org/apache/xml/serializer/XMLEntities
If you're not using Xalan, you might have some luck looking for processor-specific output properties in the documentation for your XSLT processor.
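For reference, a complete stylesheet using the Xalan attribute might look like the sketch below. The identity template and exclude-result-prefixes are additions for illustration, and the xalan:indent-amount attribute should only have an effect when the processor is actually Xalan:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xalan="http://xml.apache.org/xalan"
    exclude-result-prefixes="xalan">
  <!-- indent="yes" turns indentation on; indent-amount sets its width -->
  <xsl:output method="xml" encoding="UTF-8" indent="yes"
      xalan:indent-amount="2"/>
  <xsl:strip-space elements="*"/>
  <!-- Identity template: copy every node and attribute unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```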
Different XSLT processors implement indent="yes" in different ways. Some indent properly, while others only put each element on a new line. It seems that your XSLT processor is among the latter group.
Why is this so?
The reason is that the W3C XSLT Specification allows significant leeway in what indentation could be produced:
"If the indent attribute has the value yes, then the xml output
method may output whitespace in addition to the whitespace in the
result tree (possibly based on whitespace stripped from either the
source document or the stylesheet) in order to indent the result
nicely; if the indent attribute has the value no, it should not
output any additional whitespace. The default value is no. The xml
output method should use an algorithm to output additional whitespace
that ensures that the result if whitespace were to be stripped from
the output using the process described in [3.4 Whitespace Stripping]
with the set of whitespace-preserving elements consisting of just
xsl:text would be the same when additional whitespace is output as
when additional whitespace is not output.
NOTE:It is usually not safe to use indent="yes" with document types that include element types with mixed content."
Possible solutions:
Start using another XSLT processor. For example, Saxon indents quite well.
Remove the <xsl:strip-space elements="*"/> directive. If there are whitespace-only text nodes in the source XML, they would be copied to the output and this may result in a better-looking indented output.
I don't know whether Ant is the problem, but concerning your XSLT:
When you use copy-of on an element, your XSLT processor does not indent. If you change your XSLT like this, your XSLT processor may manage to indent:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
This XSLT will go through the whole XML tree and give the processor the chance to indent each element it creates.
EDIT after comment :
You can see the following question to change your XSLT processor, maybe it will solve your problem : How to execute XSLT 2.0 with ant?
You can try adding the {http://xml.apache.org/xslt}indent-amount output property in ant, something like this:
<target name="applyXsl">
<xslt in="${inputFile}" out="${outputFile}" extension=".html" style="${xslFile}" force="true">
<outputproperty name="indent" value="yes"/>
<outputproperty name="{http://xml.apache.org/xslt}indent-amount" value="4"/>
</xslt>
</target>

XSLT append URL from relative to absolute

In my XML document, I am pulling the content of a <TextBlock> that contains images. The XML shows:
<img src="/templates_soft/images/facebook.png" alt="twitter" />
When I view the page, the image doesn't show up because it is not at the same path as the original page.
I need to add the rest of the URL for the images to display. Something like http://www.mypage.com/ so that the image displays from http://www.mypage.com/templates_soft/images/facebook.png
Is there a way to do this?
Use:
<img src="{$imageBase}/templates_soft/images/facebook.png" alt="twitter" />
where the xsl:variable named $imageBase is defined to contain the necessary prefix (in your case "http://www.mypage.com").
Here is a complete XSLT solution:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:param name="pimageBase" select="'http://www.mypage.com'"/>
<xsl:template match="img">
<img src="{concat($pimageBase, @src)}" alt="{@alt}"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the following XML document:
<img src="/templates_soft/images/facebook.png" alt="twitter" />
the wanted, correct result is produced:
<img src="http://www.mypage.com/templates_soft/images/facebook.png" alt="twitter"/>
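If the img elements sit inside a larger document, such as the TextBlock content mentioned in the question, a variant built on the identity template may be more practical. The following is a sketch under that assumption: it copies everything unchanged and only rewrites src attributes that start with a slash, so URLs that are already absolute are left alone:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:param name="pimageBase" select="'http://www.mypage.com'"/>
  <!-- Identity template: copy every node and attribute unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <!-- Override only root-relative image paths -->
  <xsl:template match="img/@src[starts-with(., '/')]">
    <xsl:attribute name="src">
      <xsl:value-of select="concat($pimageBase, .)"/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
```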
If you go with XSLT, you simply create an XML that contains the entire URL as you desire, you then tag the XSLT up so it contains the "pointers" to the original fields in the XML file. If you are binding to a control, like a Grid, you can row bind and add the information at that point, if it is easier for you to do than XSLT.

Resources