I have a large xml file like below
:
:
<CN>222</CN>
<CT>Raam</CT>
:
:
I would like to merge these two elements into
<CN>222 Raam</CN>
and then convert that to
<div>222 Raam</div>
which is the final output.
Well, if all you need is to merge the two consecutive elements into a div (I don't understand what the intermediary CN is for), then use
<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>
<xsl:template match="CN[following-sibling::*[1][self::CT]]">
  <div>
    <xsl:value-of select="concat(., ' ', following-sibling::*[1][self::CT])"/>
  </div>
</xsl:template>
<xsl:template match="CT[preceding-sibling::*[1][self::CN]]"/>
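For comparison, here is a minimal Python sketch of the same merge outside XSLT, using the standard library's xml.etree.ElementTree. The root, X and Y element names and the merge_cn_ct helper are assumptions made up for the illustration; only CN, CT and div come from the question.

```python
import xml.etree.ElementTree as ET


def merge_cn_ct(parent):
    """Replace every CN that is immediately followed by CT with one div (in place)."""
    children = list(parent)  # snapshot, so removals below do not disturb the loop
    i = 0
    while i < len(children):
        cur = children[i]
        nxt = children[i + 1] if i + 1 < len(children) else None
        if cur.tag == "CN" and nxt is not None and nxt.tag == "CT":
            div = ET.Element("div")
            div.text = f"{cur.text or ''} {nxt.text or ''}"
            div.tail = nxt.tail  # keep whatever text followed the CT element
            idx = list(parent).index(cur)
            parent.remove(cur)
            parent.remove(nxt)
            parent.insert(idx, div)
            i += 2  # skip the CT that was just consumed
        else:
            merge_cn_ct(cur)  # look for CN/CT pairs deeper in the tree
            i += 1


# Hypothetical sample document; the real file is much larger.
root = ET.fromstring("<root><X>1</X><CN>222</CN><CT>Raam</CT><Y>2</Y></root>")
merge_cn_ct(root)
result = ET.tostring(root, encoding="unicode")
print(result)
```

Like the stylesheet, this leaves a CN without a following CT (or a CT without a preceding CN) untouched.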
I need suggestions for merging multiple elements and their sibling text nodes into a single element. See the xref elements in the sample below.
Input: <section><p>These pages are all about XSLT, an XML-based language <xref ref-type="bibr" rid="r1">1</xref><xref ref-type="bibr" rid="r2"/>--<xref ref-type="bibr" rid="r3">3</xref> for translating one set of XML into another set of XML, <xref ref-type="bibr" rid="r3">3</xref>, <xref ref-type="bibr" rid="r5">5</xref><xref ref-type="bibr" rid="r6"/>--<xref ref-type="bibr" rid="r7">7</xref> or into HTML. Of course, there are all sorts of other pages <xref ref-type="bibr" rid="r1">7</xref>, <xref ref-type="bibr" rid="r3">8</xref> around that cover XSLT. <xref ref-type="bibr" rid="r12">12</xref>, <xref ref-type="bibr" rid="r15">15</xref><xref ref-type="bibr" rid="r16"/><xref ref-type="bibr" rid="r17"/><xref ref-type="bibr" rid="r18"/><xref ref-type="bibr" rid="r19"/>--<xref ref-type="bibr" rid="r20">20</xref></p></section>
Output: <section><p>These pages are all about XSLT, an XML-based language <xref ref-type="bibr" rid="r1 r2 r3">1--3</xref> for translating one set of XML into another set of XML, <xref ref-type="bibr" rid="r3 r5 r6 r7">3, 5--7</xref> or into HTML. Of course, there are all sorts of other pages <xref ref-type="bibr" rid="r7 r8">7, 8</xref> around that cover XSLT. <xref ref-type="bibr" rid="r12 r15 r16 r17 r18 r19 r20">12, 15--20</xref></p></section>
The merge should happen only when the xref elements are separated by nothing more than a comma followed by a space (, ), a single space ( ), two hyphens (--), or an empty xref element (<xref ref-type="bibr" rid="r2"/>).
E.g.
Input content: <xref ref-type="bibr" rid="r3">3</xref>, <xref ref-type="bibr" rid="r5">5</xref><xref ref-type="bibr" rid="r6"/>--<xref ref-type="bibr" rid="r7">7</xref>
Expected output: <xref ref-type="bibr" rid="r3 r5 r6 r7">3, 5--7</xref>
Thanks and Regards
Bala
Using XSLT 2.0, you can find adjacent nodes using for-each-group select="node()" group-adjacent="boolean(self::xref | self::text()[matches(., $pattern)])", so you can use an approach like
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0">
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="*[xref]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each-group select="node()"
        group-adjacent="boolean(self::xref | self::text()[matches(., '^[\s\p{P}]+$')])">
        <xsl:choose>
          <xsl:when test="current-grouping-key()">
            <xsl:copy>
              <xsl:copy-of select="@* except @rid"/>
              <xsl:attribute name="rid" select="current-group()/@rid"/>
              <xsl:value-of select="current-group()" separator=""/>
            </xsl:copy>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
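To see what group-adjacent does here, the grouping step can be modelled in Python with itertools.groupby, which also collects runs of adjacent items that share a key. This is only a sketch under assumptions: the tuple representation of nodes, the helper names, and the narrower separator pattern (exactly ", ", " " or "--", as listed in the question, rather than the stylesheet's \p{P} class) are all made up for the illustration.

```python
import re
from itertools import groupby

# Assumed separator set, taken from the question: ", " or " " or "--".
SEPARATOR = re.compile(r", | |--")


def mergeable(node):
    """Grouping key: True for xref nodes and for separator-only text nodes."""
    if node[0] == "xref":
        return True
    return SEPARATOR.fullmatch(node[1]) is not None


def merge_xrefs(nodes):
    """nodes: ("text", string) or ("xref", rid, label) tuples in document order."""
    out = []
    for key, run in groupby(nodes, key=mergeable):
        run = list(run)
        # Guard (not in the stylesheet): merge only runs that contain an xref.
        if key and any(n[0] == "xref" for n in run):
            rids = " ".join(n[1] for n in run if n[0] == "xref")
            label = "".join(n[2] if n[0] == "xref" else n[1] for n in run)
            out.append(("xref", rids, label))
        else:
            out.extend(run)
    return out


sample = [
    ("text", "an XML-based language "),
    ("xref", "r1", "1"),
    ("xref", "r2", ""),
    ("text", "--"),
    ("xref", "r3", "3"),
    ("text", " for translating"),
]
merged = merge_xrefs(sample)
print(merged)
```

The rid values are joined with spaces and the labels with nothing in between, which is the same split the stylesheet makes between the rid attribute and the element's text.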
What is the start symbol?
According to what I have read, "The start symbol we choose should allow the grammar to parse the most input sentences".
Clearly <Var> is NOT the start symbol, as it would parse the fewest input sentences. So is the start symbol <One> or <Group>?
<Group> ::= [ <One>, <Group> ] | <One>
<One> ::= <Var> | ( <Group> )
<Var> ::= a | b | c
The start symbol is also called the AXIOM.
It is always given explicitly and should never be deduced: the author of the grammar decides it.
Considering the following grammar for propositional logic:
<A> ::= <B> <-> <A> | <B>
<B> ::= <C> -> <B> | <C>
<C> ::= <D> \/ <C> | <D>
<D> ::= <E> /\ <D> | <E>
<E> ::= <F> | -<F>
<F> ::= <G> | <H>
<G> ::= (<A>)
<H> ::= p | q | r | ... | z
Precedence for connectives, from highest to lowest, is: -, /\, \/, ->, <->.
Associativity is also considered: for example, p\/q\/r should be the same as p\/(q\/r). The same holds for the other connectives.
I intend to build a predictive top-down parser in Java. I don't see any ambiguity or direct left recursion here, but I'm not sure whether that is all I need to check to consider this an LL(1) grammar. Maybe indirect left recursion?
If this is not an LL(1) grammar, what steps would be required to transform it for my purposes?
It's not LL(1). Here's why:
The first rule of an LL(1) grammar is:
A grammar G is LL(1) if and only if whenever A --> C | D are two distinct productions of G, the following conditions hold:
For no terminal a do both C and D derive strings beginning with a.
This rule exists so that the parser never faces a conflict when choosing a production: with a single token of lookahead, the next terminal must identify exactly one alternative. In your grammar, when the parser encounters a (, it won't know which production to use.
Your grammar violates this first rule. For your non-terminals A, B, C and D, both alternatives on the right-hand side (the C and D of the rule above) start with the same non-terminal, which eventually reduces to G and H, so both of them derive at least one string beginning with (.
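The overlap can be checked mechanically. Below is a small Python sketch (not Java, and not a parser) that computes FIRST sets for the grammar in the question and reports which non-terminals have two alternatives starting with the same terminal. It assumes H is limited to p, q and r, since the question elides the rest, and it relies on the grammar having no empty productions, so the FIRST set of an alternative is just the FIRST set of its first symbol. All function names are made up for the illustration.

```python
# The grammar from the question; H is cut down to p | q | r (assumption).
GRAMMAR = {
    "A": [["B", "<->", "A"], ["B"]],
    "B": [["C", "->", "B"], ["C"]],
    "C": [["D", "\\/", "C"], ["D"]],
    "D": [["E", "/\\", "D"], ["E"]],
    "E": [["F"], ["-", "F"]],
    "F": [["G"], ["H"]],
    "G": [["(", "A", ")"]],
    "H": [["p"], ["q"], ["r"]],
}


def first_of_symbol(symbol, first, grammar):
    """FIRST of one symbol: itself if it is a terminal, else the set computed so far."""
    return first[symbol] if symbol in grammar else {symbol}


def first_sets(grammar):
    """Fixed-point computation of FIRST for every non-terminal (no empty productions)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_symbol(alt[0], first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def ll1_conflicts(grammar):
    """Map each non-terminal to the terminals shared by two of its alternatives."""
    first = first_sets(grammar)
    conflicts = {}
    for nt, alternatives in grammar.items():
        sets = [first_of_symbol(alt[0], first, grammar) for alt in alternatives]
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                overlap = sets[i] & sets[j]
                if overlap:
                    conflicts.setdefault(nt, set()).update(overlap)
    return conflicts


conflicts = ll1_conflicts(GRAMMAR)
for nt in sorted(conflicts):
    print(nt, sorted(conflicts[nt]))
```

Only A, B, C and D are reported; E, F, G and H are fine as written. The usual remedy for this kind of conflict is left factoring, for example rewriting A as B followed by a new non-terminal that is either <-> A or empty, and likewise for B, C and D.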
This XPath expression:
for $n in 1 to 5 return $n
Returns
1 2 3 4 5
Is it possible to do something similar with alphabetic characters?
Yep:
for $n in 65 to 69 return fn:codepoints-to-string($n)
returns:
A
B
C
D
E
In ascii/iso-8859-1 at least.
for $n in fn:string-to-codepoints('A') to fn:string-to-codepoints('E')
return fn:codepoints-to-string($n)
should work in any locale.
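For readers who want to try the code-point idea outside XPath, here is the same thing sketched in Python, where ord and chr play the roles of string-to-codepoints and codepoints-to-string. The char_range helper is a made-up name, not part of any library.

```python
def char_range(first, last):
    """Characters from first to last inclusive, by walking their code points."""
    return [chr(cp) for cp in range(ord(first), ord(last) + 1)]


letters = char_range("A", "E")
print(" ".join(letters))
```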
Or, in XPath 3.0 (XSLT 3.0):
((32 to 127) ! codepoints-to-string(.))[matches(., '[A-Z]')]
This version does not assume that the wanted characters have adjacent character codes (and in many real cases they wouldn't).
A complete XSLT 3.0 transformation using this XPath 3.0 expression:
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:sequence select=
"((32 to 127) ! codepoints-to-string(.))[matches(., '[A-Z]')]
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied (I am using Saxon-EE 9.4.0.6J) on any XML document (not used), the wanted, correct result is produced:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
In case we know the wanted result characters have all-adjacent character codes, then:
(string-to-codepoints('A') to string-to-codepoints('Z')) ! codepoints-to-string(.)
Explanation:
Use of the new XPath 3.0 simple map operator !.
fsyacc is emitting shift/reduce errors for all binary ops.
I have this recursive production:
scalar_expr:
| scalar_expr binary_op scalar_expr { Binary($2, $1, $3) }
Changing it to
scalar_expr:
| constant binary_op constant { Binary($2, Constant($1), Constant($3)) }
eliminates the errors (but isn't what I want). Precedence and associativity are defined as follows:
%left BITAND BITOR BITXOR
%left ADD SUB
%left MUL DIV MOD
Here's an excerpt from the listing file showing the state that produces the errors (one other state has the same errors).
state 42:
items:
scalar_expr -> scalar_expr . binary_op scalar_expr
scalar_expr -> scalar_expr binary_op scalar_expr .
actions:
action 'EOF' (noprec): reduce scalar_expr --> scalar_expr binary_op scalar_expr
action 'MUL' (explicit left 9999): shift 8
action 'DIV' (explicit left 9999): shift 9
action 'MOD' (explicit left 9999): shift 10
action 'ADD' (explicit left 9998): shift 6
action 'SUB' (explicit left 9998): shift 7
action 'BITAND' (explicit left 9997): shift 11
action 'BITOR' (explicit left 9997): shift 12
action 'BITXOR' (explicit left 9997): shift 13
You can see the parser shifts in all cases, which is correct, I think. I haven't found a case where the behavior is incorrect, at least.
How can I restate the grammar to eliminate these errors?
Is binary_op actually a production, i.e. you have something like:
binary_op:
| ADD { OpDU.Add }
| SUB { OpDU.Sub }
...
If so I think that is the problem, since I assume the precedence rules you defined wouldn't be honored in constant binary_op constant. You need to enumerate each scalar_expr pattern explicitly, e.g.
scalar_expr:
| scalar_expr ADD scalar_expr { Binary(OpDU.Add, $1, $3) }
| scalar_expr SUB scalar_expr { Binary(OpDU.Sub, $1, $3) }
...
(I don't think there is any way to abstract away this repetitiveness with FsYacc)
As Stephen pointed out in his answer, the precedence rules for your operators won't apply if you move them to a separate production. This is strange, since the state you posted seems to honor them correctly.
However, you should be able to declare and apply explicit precedence rules, e.g. you could define them as:
%left ADD_SUB_OP
%left MUL_DIV_OP
and apply them like this:
scalar_expr:
| scalar_expr add_sub_ops scalar_expr %prec ADD_SUB_OP { Binary($2, $1, $3) }
| scalar_expr mul_div_ops scalar_expr %prec MUL_DIV_OP { Binary($2, $1, $3) }
This way you still have to define multiple rules, but you can collect all operators with the same precedence in one group. (Note that the names I chose are a poor example; you may want to use names that reflect the precedence or describe the operators within, so it's clear what they indicate.)
Whether or not this solution is worth it depends on the number of operators per group I suppose.
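To make the idea of one precedence level per operator group concrete, here is a small precedence-climbing sketch in Python. It is not FsYacc and says nothing about how FsYacc resolves conflicts internally; it only shows three left-associative levels mirroring the %left lines in the question. The operator spellings, the tuple-based tree and the function names are assumptions made up for the illustration.

```python
# One level per %left line in the question, lowest precedence first.
PRECEDENCE = {
    "&": 1, "|": 1, "^": 1,   # BITAND BITOR BITXOR
    "+": 2, "-": 2,           # ADD SUB
    "*": 3, "/": 3, "%": 3,   # MUL DIV MOD
}


def parse(tokens):
    """Parse a flat token list (operand, operator, operand, ...) into nested tuples."""
    pos = 0

    def parse_expr(min_level):
        nonlocal pos
        left = tokens[pos]  # an operand; this sketch has no parentheses or unary ops
        pos += 1
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_level:
            op = tokens[pos]
            pos += 1
            # Requiring a strictly higher level on the right makes the operator
            # left-associative, like %left.
            right = parse_expr(PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left

    return parse_expr(1)


tree_mixed = parse(["1", "+", "2", "*", "3"])
tree_left = parse(["8", "-", "3", "-", "2"])
print(tree_mixed)
print(tree_left)
```

Here the table is keyed by operator but the decisions depend only on the level, which is the same trade-off as the grouped add_sub_ops / mul_div_ops rules: one rule per level instead of one per operator.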