How to extend a search for elements and values to attributes? - xml-parsing

I have an XML document like the one below. I am trying to get key/value pairs for all elements and attributes. The query I have returns all element names and element values. However, I would also like to get attribute names and attribute values (preferably in the same query, otherwise in a different query).
XML Document:
<bookstore>
<book category="COOKING">
<title lang="en">Cook Book</title>
<author>Chef Author</author>
<year>2015</year>
<price>310.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Kid Story Book</title>
<author>KJ Banter</author>
<year>2010</year>
<price>229.99</price>
</book>
</bookstore>
SQL Query in Oracle:
WITH tab AS
(SELECT xmltype('
<bookstore>
<book category="COOKING">
<title lang="en">Cook Book</title>
<author>Chef Author</author>
<year>2015</year>
<price>310.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Kid Story Book</title>
<author>KJ Banter</author>
<year>2010</year>
<price>229.99</price>
</book>
</bookstore>
') col
FROM dual
)
SELECT nodepath,nodevalue
FROM tab t,
xmltable('
for $i in $tmp/descendant::*
where $i/text() != ""
return <R><P>{string-join($i/ancestor-or-self::*/name(), "/")}</P><V>{$i/text()}</V></R>'
passing t.col AS "tmp" columns
nodepath varchar2(1000) path '//P',
nodevalue varchar2(1000) path '//V')

Given you want them in document order, descend for each retrieved element and follow its self- and attribute axis. You need to retrieve element and attribute values differently, so apply a typeswitch:
for $node in /descendant::*/(., @*)
let $value := (
typeswitch ($node)
case element() return $node/text()
case attribute() return data($node)
default return ()
)
where $value
return <R>
<P>{
string-join(($node/ancestor::*/name(), $node/name()), "/")
}</P>
<V>{ $value }</V>
</R>
Also observe I changed the ancestor-or-self step to an ancestor step and explicitly return $node's name, so we don't have to distinguish between elements and attributes again in the ancestor axis.
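For readers who want to check the expected rows outside the database, here is a minimal Python sketch of the same traversal using only the standard library. It is an illustration, not the Oracle query: the `walk` helper and the `@` prefix on attribute names are assumptions of this sketch, the sample is a trimmed copy of the question's document, and whitespace-only element text is skipped.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document (one book, two child elements).
XML = """<bookstore>
<book category="COOKING">
<title lang="en">Cook Book</title>
<author>Chef Author</author>
</book>
</bookstore>"""


def walk(elem, ancestors=()):
    """Yield (path, value) pairs for elements and attributes in document order."""
    path = ancestors + (elem.tag,)
    # elem.text is only the text before the first child; enough for this sample.
    text = (elem.text or "").strip()
    if text:
        yield "/".join(path), text
    # Attributes come right after their owner element, like the (., @*) step.
    for name, value in elem.attrib.items():
        yield "/".join(path + ("@" + name,)), value
    for child in elem:
        yield from walk(child, path)


pairs = list(walk(ET.fromstring(XML)))
for node_path, node_value in pairs:
    print(node_path, "=", node_value)
```

Each element is visited before its attributes and its children, which mirrors the document-order requirement stated above.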

Related

xslt2: sequence of attribute nodes

This is not really a question but an astonishing xslt2 experience that I like to share.
Take the snippet (subtract one set from another)
<xsl:variable name="v" as="node()*">
<e a="a"/>
<e a="b"/>
<e a="c"/>
<e a="d"/>
</xsl:variable>
<xsl:message select="$v/@a[not(.=('b','c'))]"/>
<ee>
<xsl:sequence select="$v/@a[not(.=('b','c'))]"/>
</ee>
What should I expect to get?
I expected a d at the console and
<ee>a d</ee>
at the output.
What I got is
<?attribute name="a" value="a"?><?attribute name="a" value="d"?>
at the console and
<ee a="d"/>
at the output. I should have realized that $v/@a is a sequence of attribute nodes, which would have let me predict the output.
In order to get what I wanted, I had to convert the sequence of attributes to a sequence of strings like:
<xsl:variable name="w" select="$v/@a[not(.=('b','c'))]" as="xs:string*"/>
Questions:
Is there any use of sequences of attributes (or is it just an interesting effect of the node set concept)?
If so, would I be able to enter statically a sequence of attributes like I am able to enter a sequence of strings: ('a','b','c','d')
Is there any inline syntax to convert a sequence of attributes to a sequence of strings? (In order to achieve the same result omitting the variable w)
It seems to be an elegant way for creating attributes using xsl:sequence. Or would that be a misuse of xslt2, not covered by the standard?
As for "Is there any inline syntax to convert a sequence of attributes to a sequence of strings": you can simply add a step, $v/@a[not(.=('b','c'))]/string(), or use for $a in $v/@a[not(.=('b','c'))] return string($a), and of course in XPath 3 there is $v/@a[not(.=('b','c'))]!string().
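The distinction between attribute nodes and their string values can also be seen outside XSLT. The following is a rough Python analogue, not XSLT: the wrapper element `v` and the names `excluded`, `kept`, and `joined` are made up for this sketch. It reads the attribute values as plain strings, filters them, and joins them with a space, which is the result the question expected.

```python
import xml.etree.ElementTree as ET

# The four <e> elements from the question, wrapped in a hypothetical <v> root.
v = ET.fromstring('<v><e a="a"/><e a="b"/><e a="c"/><e a="d"/></v>')

excluded = {"b", "c"}

# e.get("a") returns the attribute value as a string, not an attribute node.
kept = [e.get("a") for e in v.findall("e") if e.get("a") not in excluded]

joined = " ".join(kept)
print(joined)
```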
I am not sure what the question about the "use of sequences of attributes" is about, in particular as it then mentions the XPath 1 concept of node sets. If you want to write a function or template to return some original attribute nodes from an input then xsl:sequence allows that. Of course, inside a sequence constructor like the contents of an element, if you look at 10) in https://www.w3.org/TR/xslt20/#constructing-complex-content, in the end a copy of the attribute is created.
As for creating a sequence of attributes, you can't do that in XPath which can't create new nodes, you can however do that in XSLT:
<xsl:variable name="att-sequence" as="attribute()*">
<xsl:attribute name="a" select="1"/>
<xsl:attribute name="b" select="2"/>
<xsl:attribute name="c" select="3"/>
</xsl:variable>
then you can use it elsewhere, as in
<xsl:template match="/*">
<xsl:copy>
<element>
<xsl:sequence select="$att-sequence"/>
</element>
<element>
<xsl:value-of select="$att-sequence"/>
</element>
</xsl:copy>
</xsl:template>
and will get
<example>
<element a="1" b="2" c="3"/>
<element>1 2 3</element>
</example>
http://xsltfiddle.liberty-development.net/jyyiVhg
XQuery has a more compact syntax and in contrast to XPath allows expressions to create new nodes:
let $att-sequence as attribute()* := (attribute a {1}, attribute b {2}, attribute c {3})
return
<example>
<element>{$att-sequence}</element>
<element>{data($att-sequence)}</element>
</example>
http://xqueryfiddle.liberty-development.net/948Fn56

XSLT: Using a key with a result tree fragment?

Following on from my earlier question, the p elements I want to apply the answer to are actually in a result tree fragment.
How do I make the key function:
<xsl:key name="kRByLevelAndParent" match="p"
use="generate-id(preceding-sibling::p
[not(@ilvl >= current()/@ilvl)][1])"/>
match against p elements in a result tree fragment?
In that answer the key is used via apply-templates:
<xsl:template match="/*">
<list>
<item>
<xsl:apply-templates select="key('kRByLevelAndParent', '')[1]" mode="start">
<xsl:with-param name="pParentLevel" select="$pStartLevel"/>
<xsl:with-param name="pSiblings" select="key('kRByLevelAndParent', '')"/>
</xsl:apply-templates>
</item>
</list>
</xsl:template>
I'd like to pass my result tree fragment as a parameter, and have the key match p elements in that.
Is this the right way to think about it?
There are no result tree fragments in XSLT 2.0 and later; you simply have temporary trees. Keys apply to each document, and the key function takes a third argument that specifies the root node or subtree to search. So, assuming your temporary tree is in $var, you can use key('keyname', key-value-expression, $var) to find elements in $var.

XSLT2 context for for-each-group attributes

I am fairly experienced with XSLT1, and have been starting to work with XSLT2. Most of the changes are easy enough to understand, but I am having a little trouble understanding how the attributes on a for-each-group element are evaluated in xslt2. Suppose that I have the following input document
<root>
<item>1</item>
<item>2</item>
<item>3</item>
<item>4</item>
<item>5</item>
<notanitem>abc</notanitem>
<item>6</item>
<item>7</item>
</root>
The following stylesheet
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<result>
<xsl:for-each-group select="root/item" group-by="(position() - 1) idiv 3">
<row>
<xsl:for-each select="current-group()">
<cell><xsl:value-of select="."/></cell>
</xsl:for-each>
</row>
</xsl:for-each-group>
</result>
</xsl:template>
</xsl:stylesheet>
groups the items into rows with 3 cells each, producing
<result>
<row>
<cell>1</cell>
<cell>2</cell>
<cell>3</cell>
</row>
<row>
<cell>4</cell>
<cell>5</cell>
<cell>6</cell>
</row>
<row>
<cell>7</cell>
</row>
</result>
Thus only the item elements are being counted in assigning positions.
Now, if I change the for-each-group to use group-starting-with="*[position() mod 3 = 1]", I get items 1, 2, and 3 in the first row; items 4 and 5 in the second row; and items 6 and 7 in the third row (so the notanitem element is being counted in assigning positions).
It seems that in the first case, the position function is evaluated only on the items, yet in the second case, the position function is evaluated on the entire document (or the children under the root element).
Is this the case? Is the group-by evaluation limited only to the items that actually are selected by the for-each-group construct, but the group-starting-with evaluation is based on the entire subtree that those elements are in (including unselected elements)? What is the context that those two attributes are evaluated in?
I feel that I almost have the right idea of how these work, but this is confusing me as I can't quite see the right way to interpret this in the specification, and other questions that I have looked at don't seem to be asking or answering about this.
group-by is an expression that computes grouping keys; group-starting-with, however, is a pattern (the kind you know from XSLT 1.0 as the match attribute of xsl:template or xsl:key).
If you use group-starting-with="item[position() mod 3 = 1]" instead, then you can see the same result as with your group-by.
As for the evaluation of the group-by, see https://www.w3.org/TR/xslt20/#grouping which says
When calculating grouping keys for an item in the population, the
expression contained in the group-by or group-adjacent attribute is
evaluated with that item as the context item, with its position in
population order as the context position, and with the size of the
population as the context size. The resulting sequence is atomized,
and each atomic value in the atomized sequence acts as a grouping key
for that item in the population.
The population https://www.w3.org/TR/xslt20/#dt-population is defined as the sequence selected by the for-each-group select="expression":
[Definition: The sequence of items to be grouped, which is referred to
as the population, is determined by evaluating the XPath expression
contained in the select attribute.]
As for the position() call inside the pattern of your group-starting-with, you first need to look at the pattern and obviously * matches all elements and not only item elements.
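The difference between the two position counts can be reproduced with a small Python simulation. This is only a sketch of the semantics described above, not an XSLT processor: the helper names `group_by_population_position` and `group_starting_with` are invented here, and the pattern is restricted to the form NAME[position() mod N = 1].

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<root><item>1</item><item>2</item><item>3</item><item>4</item>"
    "<item>5</item><notanitem>abc</notanitem><item>6</item><item>7</item></root>"
)
population = root.findall("item")  # corresponds to select="root/item"


def group_by_population_position(items, size=3):
    """group-by="(position() - 1) idiv size": position is counted in the population."""
    groups = {}
    for pos, item in enumerate(items, start=1):
        groups.setdefault((pos - 1) // size, []).append(item.text)
    return list(groups.values())


def group_starting_with(items, parent, step=3, name=None):
    """Pattern NAME[position() mod step = 1]; name=None stands for *.

    Position is counted among the siblings that match NAME, not in the population.
    """
    siblings = [child for child in parent if name is None or child.tag == name]
    groups = []
    for item in items:
        starts_group = siblings.index(item) % step == 0  # 1-based position mod step = 1
        if starts_group or not groups:  # the first item always opens a group
            groups.append([])
        groups[-1].append(item.text)
    return groups


by_key = group_by_population_position(population)
by_star = group_starting_with(population, root)               # *[position() mod 3 = 1]
by_item = group_starting_with(population, root, name="item")  # item[position() mod 3 = 1]
print(by_key, by_star, by_item, sep="\n")
```

With the `*` pattern the notanitem element takes up sibling position 6, so item 6 sits at position 7 and opens a new group; with the `item` pattern the count runs over item siblings only and the rows match the group-by result.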

Understand the response of Open street map query response

I am trying to get the speed limit of the surrounding locations of a specific coordinate.
OSM Query: www.overpass-api.de/api/xapi?*[maxspeed=*][bbox=5.6283473,50.5348043,5.6285261,50.534884]
Response:
<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="Overpass API">
<note>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</note>
<meta osm_base="2015-06-09T07:04:02Z"/>
<node id="21265775" lat="50.5350159" lon="5.6293520"/>
<node id="21265776" lat="50.5346804" lon="5.6276238"/>
<node id="1312239857" lat="50.5347491" lon="5.6278274"/>
<node id="1312239864" lat="50.5348877" lon="5.6286790">
<tag k="highway" v="crossing"/>
<tag k="traffic_calming" v="table"/>
</node>
<node id="2025084669" lat="50.5353414" lon="5.6303289">
<tag k="highway" v="traffic_calming"/>
<tag k="traffic_calming" v="choker"/>
</node>
<node id="3362188585" lat="50.5345623" lon="5.6274183">
<tag k="highway" v="traffic_calming"/>
<tag k="traffic_calming" v="choker"/>
</node>
<way id="191950462">
<nd ref="2025084669"/>
<nd ref="21265775"/>
<nd ref="1312239864"/>
<nd ref="1312239857"/>
<nd ref="21265776"/>
<nd ref="3362188585"/>
<tag k="highway" v="secondary"/>
<tag k="maxspeed" v="30"/>
<tag k="name" v="Rue d'Esneux"/>
<tag k="source:maxspeed" v="school zone"/>
</way>
</osm>
This works for a bounding box (bbox: I am guessing these are the corner coordinates, or that the API builds a box or polygon from the provided coordinates). However, I have only one coordinate. I also see maxspeed = 30 in the response, but I am not sure what kind of code I should write to parse it, as the response format may change. I am using Objective-C to parse this response.
The format of the response is regular XML. For understanding it you should read about OSM's elements.
Your response contains one way and several nodes as well as their tags. But it could contain more than a single way when querying a different bounding box.
The way has a maxspeed tag in which you seem to be interested. The way geometry is defined by its nodes. The way references six different nodes via <nd ref="<node ID>"/>. Each <node> has a unique ID and a coordinate specified via lat and lon. The way geometry is defined by the order in which it references its nodes, not the order in which the nodes appear in the response file! In your specific case, the way starts at the node with ID 2025084669 and ends at the node with ID 3362188585. Also keep in mind that a single way can reference the same node more than once (e.g. if it is a roundabout), and that a single node can be referenced by more than one way (e.g. if it is a junction).
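The structure described above can be walked with any XML parser. The question targets Objective-C, so the following Python sketch is only meant to show the logic: index the nodes by ID, read each way's tags, and resolve the nd references in the way's own order. The function name `ways_with_maxspeed` and the result layout are assumptions of this sketch, and the sample is a trimmed version of the response.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the response from the question (two nodes, one way).
OSM = """<osm version="0.6">
<node id="21265775" lat="50.5350159" lon="5.6293520"/>
<node id="2025084669" lat="50.5353414" lon="5.6303289"/>
<way id="191950462">
<nd ref="2025084669"/>
<nd ref="21265775"/>
<tag k="highway" v="secondary"/>
<tag k="maxspeed" v="30"/>
</way>
</osm>"""


def ways_with_maxspeed(xml_text):
    """Return id, maxspeed and ordered coordinates for every way tagged maxspeed."""
    root = ET.fromstring(xml_text)
    # Index node coordinates by ID so that nd references can be resolved.
    coords = {
        node.get("id"): (float(node.get("lat")), float(node.get("lon")))
        for node in root.findall("node")
    }
    result = []
    for way in root.findall("way"):
        tags = {tag.get("k"): tag.get("v") for tag in way.findall("tag")}
        if "maxspeed" not in tags:
            continue
        # Geometry follows the order of the <nd> references, not of the <node> elements.
        geometry = [
            coords[nd.get("ref")]
            for nd in way.findall("nd")
            if nd.get("ref") in coords
        ]
        result.append(
            {"id": way.get("id"), "maxspeed": tags["maxspeed"], "geometry": geometry}
        )
    return result


ways = ways_with_maxspeed(OSM)
print(ways)
```

The maxspeed value is kept as a string because the tag value is free text in the data; converting it to a number is left to the caller.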
Understanding these primitives might get easier for you if you create an OSM account and try one of the map editors.
Regarding JSON output: I suggest getting rid of the XAPI compatibility call and instead using Overpass XML or Overpass QL, which are much more powerful (see the language guide): raw data, query and data on overpass turbo. Note that the bounding box format here differs from the ordering in the XAPI syntax.

regular expression in XPATH

How can I use the XPath matches function to search for whole words in an XML tag?
The following code returns "unknown method matches":
XML_Doc:=CreateOleObject('Msxml2.DOMDocument.6.0') as IXMLDOMDocument3;
XML_DOC.selectNodes('/DATI/DATO[matches(TEST_TAG,"\bTest\b")]');
Example XML FILE
<DATI>
<DATO>
<TEST_TAG>Test</TEST_TAG>
</DATO>
<DATO>
<TEST_TAG>Test21</TEST_TAG>
</DATO>
<DATO>
<TEST_TAG>Abc</TEST_TAG>
</DATO>
</DATI>
matches is XPath 2 and Msxml only supports XPath 1.
As far as I know, there is no library supporting XPath 2 for Delphi (although I wrote an XPath 2 library for Free Pascal, which should not be too difficult to port).
You could use
/DATI/DATO[not(contains(TEST_TAG," "))]
to find words that do not contain a space, which is XPath 1.
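Another workaround, when the XPath engine only supports XPath 1, is to select the candidate nodes with a plain path and apply the regular expression in the host language. The sketch below shows the idea in Python rather than Delphi, purely as an illustration; the names `WHOLE_WORD` and `matches` are chosen for this example.

```python
import re
import xml.etree.ElementTree as ET

XML = """<DATI>
<DATO><TEST_TAG>Test</TEST_TAG></DATO>
<DATO><TEST_TAG>Test21</TEST_TAG></DATO>
<DATO><TEST_TAG>Abc</TEST_TAG></DATO>
</DATI>"""

# Same regular expression as in the question: "Test" as a whole word.
WHOLE_WORD = re.compile(r"\bTest\b")

root = ET.fromstring(XML)

# Select all DATO elements with a simple path, then filter with the regex.
matches = [
    dato
    for dato in root.findall("DATO")
    if WHOLE_WORD.search(dato.findtext("TEST_TAG", default=""))
]
print([dato.findtext("TEST_TAG") for dato in matches])
```

"Test21" is not selected because there is no word boundary between "Test" and "2".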
Suppose that by "word" you mean:
Starting with a Latin alphabet letter and all characters contained are either latin letters or decimal digits,
one can use an XPath expression to find exactly these:
//TEST_TAG
[contains('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',
substring(.,1,1)
)
and
not(
translate(.,
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
'')
)
]
XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/*">
<xsl:copy-of select=
"//TEST_TAG
[contains('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',
substring(.,1,1)
)
and
not(
translate(.,
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
'')
)
]
"/>
</xsl:template>
</xsl:stylesheet>
when applied on this XML document (the provided one, but with an illegal "word" added):
<DATI>
<DATO>
<TEST_TAG>Test</TEST_TAG>
</DATO>
<DATO>
<TEST_TAG>#$%Test21</TEST_TAG>
</DATO>
<DATO>
<TEST_TAG>Abc</TEST_TAG>
</DATO>
</DATI>
evaluates the above XPath expression and copies the selected elements to the output:
<TEST_TAG>Test</TEST_TAG>
<TEST_TAG>Abc</TEST_TAG>
Do note:
The currently-accepted answer incorrectly produces this:
<TEST_TAG>#$%Test21</TEST_TAG>
as an element whose string value is a "word".
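The same "word" definition can be restated in a few lines of Python, which may help when checking test values by hand. This is a sketch of the definition given above (first character a Latin letter, all characters Latin letters or decimal digits), not of any XPath engine; the function name `is_word` is hypothetical, and rejecting the empty string is an assumption of this sketch.

```python
import string

LETTERS = string.ascii_letters
ALNUM = string.ascii_letters + string.digits

# Translation table that deletes every Latin letter and decimal digit.
DELETE_ALNUM = str.maketrans("", "", ALNUM)


def is_word(value):
    """True if value starts with a Latin letter and contains only letters and digits."""
    if not value or value[0] not in LETTERS:
        return False
    # Like translate(., alnum, '') in the XPath: nothing may remain after deletion.
    return value.translate(DELETE_ALNUM) == ""


for candidate in ["Test", "#$%Test21", "Abc", "Test21"]:
    print(candidate, is_word(candidate))
```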
