Convert a large XML file (over 3 GB) to a comma-delimited file with Saxon

I need to convert a large XML file (over 3 GB) into a comma-delimited file. I created an XSL file to convert it. Unfortunately, the file is too large to process using XSLT 1.0. I tried using XSLT 3.0 (Saxon), but I get the error "XTSE3430: Template rule is not streamable".
Script:
java -cp saxon9ee.jar net.sf.saxon.Transform -t -s:costing.xml -xsl:costing.xsl -o:costing.csv
Error Message:
Java version 1.8.0_191
Using license serial number
Stylesheet compilation time: 345.113654ms
Processing file:costing.xml
Streaming file:costing.xml
Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
URIResolver.resolve href="" base="file:costing.xsl"
Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
Building tree for file:costing.xsl using class net.sf.saxon.tree.tiny.TinyBuilder
Tree built in 5.206935ms
Tree size: 237 nodes, 104 characters, 25 attributes
Error on line 71 of costing.xsl:
XTSE3430: Template rule is not streamable
* Operand {($currNode/element())/element()} of {let $vv:v0 := ...} selects streamed
nodes in a context that allows arbitrary navigation (line 86)
Template rule is not streamable
* Operand {($currNode/element())/element()} of {let $vv:v0 := ...} selects streamed nodes in a context that allows arbitrary navigation (line 86)
XML Structure:
<?xml version="1.0" encoding="UTF-8"?>
<DATA_DS>
<COSTREPORT>
<DR>
<PSU>ABC</PSU>
<TRU>ABC</TRU>
<CA>0</CA>
<DA>0.00</DA>
<UOM>ABC</UOM>
<FN>0</FN>
<RID>0</RID>
<SD>2018-10-25</SD>
<DN>ABC</DN>
<ETD>2018-10-31</ETD>
<DID>0</DID>
<LN>ABC</LN>
<LID>0</LID>
<PN>ABC</PN>
<EN>Jane Doe</EN>
<EID>0</EID>
<ELN>ABC</ELN>
<ELV>ABC</ELV>
<RELA>1234</RELA>
<ETM>A0</ETM>
<ASG>A0</ASG>
<MN>ABC</MN>
<CRY>ABC</CRY>
<IVN>ABC</IVN>
<AD>2018-10-31</AD>
<CID>0</CID>
<CCN>ABC</CCN>
<BOC>ABC</BOC>
<SG1>0</SG1>
<SG2>0</SG2>
<SG3>0</SG3>
<SG4>0</SG4>
<SG5>0</SG5>
<SG9>0</SG9>
<SG10>0</SG10>
<TRUID>0</TRUID>
</DR>
<DR>
[...]
</DR>
[...]
</COSTREPORT>
</DATA_DS>
XSL file:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:mode streamable="yes" />
<xsl:output method="text" />
<xsl:variable name="delimiter" select="','" />
<!-- define an array containing the fields we are interested in -->
<xsl:variable name="fieldArray">
<field>PSU</field> <!-- string -->
<field>TRU</field> <!-- string -->
<field>CA</field> <!-- number -->
<field>DA</field> <!-- number -->
<field>UOM</field> <!-- string -->
<field>FN</field> <!-- number -->
<field>RID</field> <!-- number -->
<field>SD</field> <!-- date -->
<field>DN</field> <!-- string -->
<field>ETD</field> <!-- date -->
<field>DID</field> <!-- number -->
<field>LN</field> <!-- string -->
<field>LID</field> <!-- number -->
<field>PN</field> <!-- string -->
<field>EN</field> <!-- string -->
<field>EID</field> <!-- number -->
<field>ELN</field> <!-- string -->
<field>ELV</field> <!-- string -->
<field>RELA</field> <!-- number -->
<field>ETM</field> <!-- string -->
<field>ASG</field> <!-- string -->
<field>MN</field> <!-- string -->
<field>CRY</field> <!-- string -->
<field>IVN</field> <!-- string -->
<field>AD</field> <!-- date -->
<field>CID</field> <!-- number -->
<field>CCN</field> <!-- string -->
<field>BOC</field> <!-- string -->
<field>SG1</field> <!-- number -->
<field>SG2</field> <!-- number -->
<field>SG3</field> <!-- number -->
<field>SG4</field> <!-- number -->
<field>SG5</field> <!-- number -->
<field>SG9</field> <!-- number -->
<field>SG10</field> <!-- number -->
<field>TRUID</field> <!-- number -->
</xsl:variable>
<xsl:param name="fields" select="document('')/*/xsl:variable[@name='fieldArray']/*" />
<!-- HEADER -->
<xsl:template match="/">
<!-- output the header row -->
<xsl:for-each select="$fields">
<xsl:if test="position() != 1">
<xsl:value-of select="$delimiter"/>
</xsl:if>
<xsl:value-of select="." />
</xsl:for-each>
<!-- output newline -->
<xsl:text>
</xsl:text>
<xsl:apply-templates select="DATA_DS/COSTREPORT/DR"/>
</xsl:template>
<!-- BODY -->
<xsl:template match="DR">
<xsl:variable name="currNode" select="." />
<!-- output the data row -->
<!-- loop over the field names and find the value of each one in the xml -->
<xsl:for-each select="$fields">
<xsl:if test="position() != 1">
<xsl:value-of select="$delimiter"/>
</xsl:if>
<xsl:value-of select="$currNode/*/*[name() = current()]" />
</xsl:for-each>
<!-- output newline -->
<xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>

The problem is the variable:
<xsl:variable name="currNode" select="." />
This binds a variable to a streamed input node, which isn't allowed because there's no way Saxon can ensure that your selections from this input node are done "in the right order"; you select children/descendants of this node by name, and the streamability analysis can't establish that these descendants are selected in the order they appear in the input.
The answer is actually simple: change the variable to
<xsl:variable name="currNode" select="copy-of(.)" />
This way, every time you hit a DR element, Saxon will read the subtree rooted at that element and hold it as a tree in memory. Because the variable is now a regular in-memory node, rather than a streamed node, there are no restrictions on how it is used.
Allow me a couple of other comments on your code.
Firstly, the document('') construct that was popular in XSLT 1.0 is now thoroughly obsolete. It's much better to put your lookup data in a global variable and access it directly, using
<xsl:param name="fields" select="$fieldArray/*"/>
The document('') call will actually fail if you try to compile the stylesheet and execute it somewhere other than the original source code location.
Secondly, the code to output the header row:
<xsl:for-each select="$fields">
<xsl:if test="position() != 1">
<xsl:value-of select="$delimiter"/>
</xsl:if>
<xsl:value-of select="." />
</xsl:for-each>
can be simplified to
<xsl:value-of select="$fields" separator="{$delimiter}"/>
Similarly, the code for the data rows:
<xsl:for-each select="$fields">
<xsl:if test="position() != 1">
<xsl:value-of select="$delimiter"/>
</xsl:if>
<xsl:value-of select="$currNode/*/*[name() = current()]" />
</xsl:for-each>
simplifies to
<xsl:value-of select="for $f in $fields return $currNode/*/*[name()=$f]"
separator="{$delimiter}"/>
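Putting these suggestions together, the whole stylesheet might look something like the sketch below. It is untested against the real 3 GB file and makes two assumptions beyond the answer: the field elements are direct children of DR, as in the sample XML (so the lookup is `$currNode/*[name() = $f]` rather than `$currNode/*/*`), and each lookup is wrapped in `string()` so that a row missing a field still gets an empty column instead of shifting the later columns left. Only the first and last field names are repeated here; the rest would follow the question's list.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Stream the unnamed mode (requires a streaming processor such as Saxon-EE) -->
  <xsl:mode streamable="yes"/>
  <xsl:output method="text"/>

  <xsl:variable name="delimiter" select="','"/>

  <!-- Lookup data lives in an ordinary in-memory tree: no document('') needed -->
  <xsl:variable name="fieldArray">
    <field>PSU</field>
    <field>TRU</field>
    <!-- ...remaining field names, in the order given in the question... -->
    <field>TRUID</field>
  </xsl:variable>
  <xsl:param name="fields" select="$fieldArray/*"/>

  <xsl:template match="/">
    <!-- Header row: the field names joined by the delimiter -->
    <xsl:value-of select="$fields" separator="{$delimiter}"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Downward-only selection, so the input can be streamed -->
    <xsl:apply-templates select="DATA_DS/COSTREPORT/DR"/>
  </xsl:template>

  <xsl:template match="DR">
    <!-- copy-of(.) buffers only this DR subtree in memory; $currNode is then
         an ordinary node that can be navigated in any order -->
    <xsl:variable name="currNode" select="copy-of(.)"/>
    <!-- One value per field; string() turns a missing field into "" -->
    <xsl:value-of select="for $f in $fields
                          return string(($currNode/*[name() = $f])[1])"
                  separator="{$delimiter}"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

It can be run with the same command as in the question (`java -cp saxon9ee.jar net.sf.saxon.Transform -s:costing.xml -xsl:costing.xsl -o:costing.csv`); streaming is a Saxon-EE feature, so the EE jar is needed.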

Related

Duplicate cells being exported as one cell

I'm trying to export a simple spreadsheet where some of the cells are empty and some are full.
Every time this XSL exporter reads a cell that is the same as the cell before it, it doesn't write it: if cells with identical values are next to each other, the repeats are missed out.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
exclude-result-prefixes="office table text">
<xsl:output method = "xml" indent = "yes" encoding = "UTF-8" omit-xml-declaration = "yes"/>
<xsl:param name="targetURL"/>
<!-- Process the document model -->
<xsl:template match="/">
<data name="SiteData"><xsl:apply-templates select="//table:table"/></data>
</xsl:template>
<xsl:template match="table:table">
<!-- Process all table-rows after the column labels in table-row 1 -->
<xsl:for-each select="table:table-row">
<!-- Process the columns -->
<xsl:for-each select="table:table-cell">
<xsl:value-of select="."/>:</xsl:for-each>
<xsl:value-of select="text:p"/> ;</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
I've tried every possible solution but I just can't work this out. Does anybody have an idea how to write ':' for every cell, regardless of whether it is empty or duplicated?
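No answer is included here, but a likely cause (an assumption, not confirmed in this thread) is how OpenDocument stores repeated cells: a run of adjacent cells with identical content, including empty ones, is written once as a single `table:table-cell` carrying a `table:number-columns-repeated` attribute with the run length. A minimal XSLT 2.0 sketch that writes each cell as many times as that attribute says:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
    exclude-result-prefixes="xs table">
  <xsl:output method="xml" indent="yes" encoding="UTF-8" omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <data name="SiteData"><xsl:apply-templates select="//table:table"/></data>
  </xsl:template>

  <xsl:template match="table:table">
    <xsl:for-each select="table:table-row">
      <xsl:for-each select="table:table-cell">
        <xsl:variable name="cell" select="."/>
        <!-- Repeat the cell once per column it covers (1 if the attribute is absent) -->
        <xsl:for-each select="1 to xs:integer((@table:number-columns-repeated, 1)[1])">
          <xsl:value-of select="$cell"/>
          <xsl:text>:</xsl:text>
        </xsl:for-each>
      </xsl:for-each>
      <xsl:text> ;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Rows use the same scheme (`table:number-rows-repeated`). Trailing empty cells at the end of a row can be stored with a large repeat count, so in practice you may want to cap or skip empty repeated cells.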

Using Saxon-CE to display filtered output via a combobox

I am trying to set up a combobox with Saxon-CE whose value filters the display of items in an XML file. Here is a sample of the code I have so far.
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ext="http://exslt.org/common"
xmlns:ixsl="http://saxonica.com/ns/interactiveXSLT"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
extension-element-prefixes="ixsl">
<xsl:template match="/">
<!-- set up combo box -->
<xsl:result-document href="#typeControl">
<xsl:text>Type: </xsl:text>
<select class="menuSub_monster" id="typeBox">
<option value="all" selected="'all'">Unsorted</option>
<option value="group">Sorted</option>
<option value="value1">Value1</option>
<option value="value2">Value2</option>
</select>
</xsl:result-document>
<!-- inital display -->
<xsl:call-template name="displayTemplates">
<xsl:with-param name="typeSel" select="'all'"/>
</xsl:call-template>
</xsl:template>
<!-- Change display when we change combo box -->
<xsl:template match="select[@id='typeBox']" mode="ixsl:onchange">
<xsl:variable name="control" select="."/>
<xsl:variable name="typeValue" select="ixsl:get($control,'value')" />
<xsl:call-template name="displayTemplates">
<xsl:with-param name="typeSel" select="$typeValue"/>
</xsl:call-template>
</xsl:template>
<!-- display routine -->
<xsl:template name="displayTemplates">
<xsl:param name="typeSel"/>
<xsl:result-document href="#display" method="ixsl:replace-content" >
<div>
<xsl:choose>
<xsl:when test="$typeSel='all'">
<xsl:for-each select="templates/template">
<xsl:sort select="sort_name"/>
... CODE FOR DISPLAY
</xsl:for-each>
</xsl:when>
<xsl:when test="$typeSel='group'">
... CODE FOR DISPLAY
</xsl:when>
<xsl:otherwise>
<xsl:for-each select="templates/template/types[type[text()=$typeSel]]">
<xsl:sort select="../sort_name"/>
...CODE FOR DISPLAY
</xsl:for-each>
</xsl:otherwise>
</xsl:choose>
</div>
</xsl:result-document>
</xsl:template>
The issue I am having is calling the template displayTemplates when the combobox changes value. The navigation to the context templates/template does not work: templates/template is part of the XML I am running this stylesheet against, but match="select[@id='typeBox']" sets the context to an HTML element that is not in the XML. How can I switch the context from the combobox back to my XML so that the for-each statements in the display routine work correctly?
One way: bind a global variable (which is evaluated with your XML as its context) to some node in your XML from which you can navigate to where you need to go. Then use a reference to that variable to move the context back to the XML document. (There may be more elegant ways.)
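A minimal sketch of that idea, assuming the XML was supplied as the transformation's source document: a global variable is evaluated with the source document as its context item, so `$source` still points at the XML when a handler's context is an HTML element. Wrapping the display code in `xsl:for-each select="$source"` moves the context back, so the existing relative paths such as `templates/template` keep working. The variable name `source` and the `<p>` display markup are placeholders, and the combo-box setup and the 'group' branch are left out.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ixsl="http://saxonica.com/ns/interactiveXSLT"
    extension-element-prefixes="ixsl">

  <!-- Evaluated with the XML source document as context item -->
  <xsl:variable name="source" select="/"/>

  <xsl:template match="/">
    <xsl:call-template name="displayTemplates">
      <xsl:with-param name="typeSel" select="'all'"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Here the context is the HTML select element, not the XML -->
  <xsl:template match="select[@id='typeBox']" mode="ixsl:onchange">
    <xsl:call-template name="displayTemplates">
      <xsl:with-param name="typeSel" select="ixsl:get(., 'value')"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="displayTemplates">
    <xsl:param name="typeSel"/>
    <xsl:result-document href="#display" method="ixsl:replace-content">
      <div>
        <!-- Switch the context back to the XML document -->
        <xsl:for-each select="$source">
          <xsl:choose>
            <xsl:when test="$typeSel = 'all'">
              <xsl:for-each select="templates/template">
                <xsl:sort select="sort_name"/>
                <p><xsl:value-of select="sort_name"/></p>
              </xsl:for-each>
            </xsl:when>
            <xsl:otherwise>
              <xsl:for-each select="templates/template/types[type[text()=$typeSel]]">
                <xsl:sort select="../sort_name"/>
                <p><xsl:value-of select="../sort_name"/></p>
              </xsl:for-each>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each>
      </div>
    </xsl:result-document>
  </xsl:template>
</xsl:stylesheet>
```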

Dynamic line wrapping based on conditions in XSLT 1 & 2

My output type is text; I am preparing reports.
My text output may be at most 50 characters wide; anything after that has to be wrapped onto the next line.
I have a solution for wrapping the individual elements in the text.
Is there any way to wrap the entire report instead of doing it for every line?
Can I do it for the whole document?
I have solutions for line wrapping; my problem is that I have many conditions, like below:
Firstname lastname route (condition1) (condition2) (condition3)
(condition4) ... and so on ...
Let us assume:
First name has a fixed width of 15,
last name has a fixed width of 15, city has a fixed width of 3...
after that condition1 has a width of 10, condition2 a fixed width of 15, and so on...
Importantly, these conditions are optional.
So 15 + space + 15 + space + 3 = 35 characters, and my conditions start from the 36th column.
After the first wrap I have to continue on the same line with the upcoming conditions.
So for each following item I have to find the start and end positions.
How can I solve this problem?
xml input:
<?xml version="1.0" encoding="UTF-8"?>
<passengerlist>
<passengers>
<Firstname>JOHNNNNNNNNNNNN</Firstname>
<lastname>MARKKKKKKKKKKKK</lastname>
<comments>abcdefh abc abcde abc dekf jl</comments>
<route>air</route>
</passengers>
<!-- <passengers>
<Firstname>ANTONYYYYYYYYYYY</Firstname>
<lastname>NORMAN</lastname>
<comments>abcdefddddddddghhhhhhhhhhhhhh</comments>
<route>air</route>
</passengers>
<passengers>
<Firstname>BRITTOOOOOOOOOO</Firstname>
<lastname>MARKKKKKKK</lastname>
<comments>abcdedfffghghghghghghghghghghghghgh</comments>
<route>cruise</route>
</passengers> -->
</passengerlist>
XSLT Code:
<!-- For line Wrapping -->
<xsl:template name="callEmpty">
<xsl:param name="callEmpty"/>
<xsl:variable name="LNemptyCheck" select="$callEmpty"></xsl:variable>
</xsl:template>
<xsl:template name="text_wrapper">
<xsl:param name="Text"/>
<xsl:choose>
<xsl:when test="string-length($Text)">
<xsl:value-of select="substring($Text,1,15)"/>
<xsl:if test="string-length($Text) > 15">
<xsl:value-of select="$newline"/>
</xsl:if>
<xsl:call-template name="wrapper_helper">
<xsl:with-param name="Text" select="substring($Text,16)"/>
</xsl:call-template>
</xsl:when>
</xsl:choose>
</xsl:template>
<xsl:template name="wrapper_helper">
<xsl:param name="Text"/>
<xsl:value-of select="substring($Text,1,15)"/>
<xsl:text>
</xsl:text>
<xsl:call-template name="text_wrapper">
<xsl:with-param name="Text" select="substring($Text,15)"/>
</xsl:call-template>
</xsl:template>
<!-- Template for Line wrapping -->
<xsl:template match="/">
<xsl:for-each select="passengerlist/passengers">
<xsl:value-of select="Firstname"/>
<xsl:text> </xsl:text>
<xsl:value-of select="lastname"/>
<xsl:text> </xsl:text>
<xsl:value-of select="route"/>
<xsl:text> </xsl:text>
<xsl:variable name="firstwrap">
<xsl:if test="route='air'">
<xsl:value-of select="Firstname"/>
<xsl:text> </xsl:text>
<xsl:value-of select="comments"/>
</xsl:if>
</xsl:variable>
<xsl:call-template name="text_wrapper">
<xsl:with-param name="Text" select="$firstwrap"/>
</xsl:call-template>
</xsl:for-each>
</xsl:template>
Output:
JOHNNNNNNNNNNNN MARKKKKKKKKKKKK air JOHNNNNNNNNNNNN
abcdefh abc ab
bcde abc dekf jl
MARKKKKKKKKKKKK abcdefh abc ab bcde abc dekf jl
Expected output:
JOHNNNNNNNNNNNN MARKKKKKKKKKKKK air JOHNNNNNNNNNNNN abcdefh abc ab
bcde abc dekf jl MARKKKKKKKKKKKK abcdefh abc abbcde abc dekf jl
Please help me sort out my problem, or tell me whether this is possible in XSLT.
I'm not sure what exactly your problem is (I cannot see any significant difference between the output you got and the output you expected). But I think it can be made simpler. I prepared some test input XML (very simple, just for demonstration).
<?xml version="1.0" encoding="UTF-8"?>
<Input>
<Line>Some long text is on the first line.</Line>
<Line>Some longer text is on the second line.</Line>
<Line>But the longest text occures on the third line.</Line>
</Input>
In the following XSLT I store the result of processing each line (i.e. a copy of its text, with additional text appended based on some conditions) in a variable. Then I wrap this variable in one go using a user-defined function (it could be done with a named template as well).
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:my="my-ns">
<xsl:output method="text" />
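<!-- &#10; rather than a literal line break: XML attribute normalization would turn a literal line break into a space -->
<xsl:variable name="newLineCharacter" select="'&#10;'" />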
<xsl:variable name="maxLineWidth" select="50" />
<xsl:template match="/">
<xsl:apply-templates select="Input/Line" />
</xsl:template>
<xsl:template match="Line">
<!-- Process the line and store the result into variable-->
<xsl:variable name="processedText">
<xsl:value-of select="." />
<xsl:text> </xsl:text>
<xsl:if test="position() >= 1">
<xsl:text>First condition is true. </xsl:text>
</xsl:if>
<xsl:if test="position() >= 2">
<xsl:text>Second condition is true. </xsl:text>
</xsl:if>
<xsl:if test="position() >= 3">
<xsl:text>Third condition is true. </xsl:text>
</xsl:if>
<!-- et cetera, et cetera ...-->
</xsl:variable>
<!-- Wrap the text stored in a variable -->
<xsl:value-of select="my:wrapText($processedText, $maxLineWidth)" />
</xsl:template>
<xsl:function name="my:wrapText">
<xsl:param name="textToBeWrapped" />
<xsl:param name="maximumWidth" />
<xsl:value-of select="substring($textToBeWrapped,1,$maximumWidth)" />
<xsl:value-of select="$newLineCharacter" />
<xsl:if test="string-length($textToBeWrapped) > $maximumWidth">
<!-- Recursive call of my:wrapText to wrap the rest of the text -->
<xsl:value-of select="my:wrapText(substring($textToBeWrapped,$maximumWidth+1), $maximumWidth)" />
</xsl:if>
</xsl:function>
</xsl:stylesheet>
And the output is
Some long text is on the first line. First conditi
on is true.
Some longer text is on the second line. First cond
ition is true. Second condition is true.
But the longest text occures on the third line. Fi
rst condition is true. Second condition is true. T
hird condition is true.
I hope it will meet your needs.
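In XSLT 2.0 the same fixed-width wrapping can also be done without recursion, using `replace()` with a regular expression that matches runs of up to `$maxLineWidth` characters; in the replacement string `$0` stands for the whole match, so each run is written out followed by a newline. A minimal sketch, assuming the processed text contains no line breaks of its own (`.` does not match a newline in XPath regular expressions):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:param name="maxLineWidth" select="50"/>

  <xsl:template match="/">
    <xsl:apply-templates select="Input/Line"/>
  </xsl:template>

  <xsl:template match="Line">
    <!-- Line text plus the conditional additions, built as in the answer -->
    <xsl:variable name="processedText">
      <xsl:value-of select="."/>
      <xsl:text> </xsl:text>
      <xsl:if test="position() >= 1">
        <xsl:text>First condition is true. </xsl:text>
      </xsl:if>
      <xsl:if test="position() >= 2">
        <xsl:text>Second condition is true. </xsl:text>
      </xsl:if>
      <xsl:if test="position() >= 3">
        <xsl:text>Third condition is true. </xsl:text>
      </xsl:if>
    </xsl:variable>
    <!-- '.{1,50}' matches runs of up to 50 characters; '$0' is the whole
         match, so every run is followed by a newline -->
    <xsl:value-of select="replace($processedText,
                                  concat('.{1,', $maxLineWidth, '}'),
                                  '$0&#10;')"/>
  </xsl:template>
</xsl:stylesheet>
```

Both versions cut after every `$maxLineWidth` characters, so for the sample input this should give the same line breaks as `my:wrapText`.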

Unsure whether I need a group or a sort or something else

Hi, I am an occasional user of XSLT, so I am probably missing something obvious, but hopefully someone can point it out!
The original XML has the structure:
<test>
<input>a</input>
<input>b</input>
<input>c</input>
<input>d</input>
<input>e</input>
</test>
The XSL file contains the following processing commands:
<xsl:template name="convertInputToNumeric">
<xsl:param name="inputs" />
<xsl:for-each select="input">
<NumericCode>
<xsl:call-template name="toNumericCode">
<xsl:with-param name="type">Input</xsl:with-param>
<xsl:with-param name="" select="." />
</xsl:call-template>
</NumericCode>
</xsl:for-each>
</xsl:template>
The called template 'toNumericCode' takes the current input and looks up a numeric representation for it in another XML file, e.g. the input 'a' returns the value '001':
<Conversion type="Input">
<Convert>
<FROM>a</FROM>
<TO>001</TO>
</Convert>
<Convert>
<FROM>b</FROM>
<TO>002</TO>
</Convert>
<Convert>
<FROM>c</FROM>
<TO>001</TO>
</Convert>
<Convert>
<FROM>d</FROM>
<TO>001</TO>
</Convert>
<Convert>
<FROM>e</FROM>
<TO>002</TO>
</Convert>
</Conversion>
So running the XSL, I currently get:
<test>
<NumericCode>001</NumericCode>
<NumericCode>002</NumericCode>
<NumericCode>001</NumericCode>
<NumericCode>001</NumericCode>
<NumericCode>002</NumericCode>
</test>
but what I actually want is to get only the distinct nodes, e.g.
<test>
<NumericCode>001</NumericCode>
<NumericCode>002</NumericCode>
</test>
I don't know how best to do this, as I want to group based on the numeric code value returned by the template 'toNumericCode' rather than on the initial input value.
You can use distinct-values(). Have a look: I have changed your shared template to this one:
<xsl:template name="convertInputToNumeric">
<xsl:param name="inputs" />
<!-- Collect all converted codes in a temporary tree first -->
<xsl:variable name="abc"><xsl:for-each select="input">
<NumericCode>
<xsl:call-template name="toNumericCode">
<xsl:with-param name="type">Input</xsl:with-param>
<xsl:with-param name="" select="." />
</xsl:call-template>
</NumericCode>
</xsl:for-each>
</xsl:variable>
<xsl:for-each select="distinct-values($abc/NumericCode)">
<NumericCode><xsl:value-of select="."/></NumericCode>
</xsl:for-each>
</xsl:template>
output:
<NumericCode>001</NumericCode><NumericCode>002</NumericCode>
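Since the question asks to group on the converted value rather than on the input, `xsl:for-each-group` with a computed `group-by` key is another XSLT 2.0 option; unlike `distinct-values()`, whose result order the specification leaves implementation-dependent, groups come out in order of first appearance. The `toNumericCode` template is not shown, so this sketch replaces it with a key over a copy of the conversion table held in a variable; in practice the table could be loaded with `doc()` from the external file (whose name is not given). The key name `convert-by-from` is a placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Stand-in for the external conversion file -->
  <xsl:variable name="conversion">
    <Conversion type="Input">
      <Convert><FROM>a</FROM><TO>001</TO></Convert>
      <Convert><FROM>b</FROM><TO>002</TO></Convert>
      <Convert><FROM>c</FROM><TO>001</TO></Convert>
      <Convert><FROM>d</FROM><TO>001</TO></Convert>
      <Convert><FROM>e</FROM><TO>002</TO></Convert>
    </Conversion>
  </xsl:variable>

  <!-- Look up a Convert entry by its FROM value -->
  <xsl:key name="convert-by-from" match="Convert" use="FROM"/>

  <xsl:template match="/test">
    <test>
      <!-- Group the inputs by their converted code, not by the input value -->
      <xsl:for-each-group select="input"
          group-by="key('convert-by-from', ., $conversion)/TO">
        <NumericCode><xsl:value-of select="current-grouping-key()"/></NumericCode>
      </xsl:for-each-group>
    </test>
  </xsl:template>
</xsl:stylesheet>
```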

Use a dynamic match in XSLT

I have an external document with a list of several XPath expressions, like this:
<EncrypRqField>
<EncrypFieldRqXPath01>xpath1</EncrypFieldRqXPath01>
<EncrypFieldRqXPath02>xpath2</EncrypFieldRqXPath02>
</EncrypRqField>
I use this document to obtain the XPaths of the nodes I want to modify.
The input XML is:
<Employees>
<Employee>
<id>1</id>
<firstname>xyz</firstname>
<lastname>abc</lastname>
<age>32</age>
<department>xyz</department>
</Employee>
</Employees>
I want to obtain something like this:
<Employees>
<Employee>
<id>XXX</id>
<firstname>xyz</firstname>
<lastname>abc</lastname>
<age>XXX</age>
<department>xyz</department>
</Employee>
</Employees>
The XXX values are the result of data encryption; I want to obtain the XPath dynamically from the document and change the value of the node it selects.
Thanks.
I'm not sure whether something like this is possible in XSLT 2.0. XSLT 3.0 adds an xsl:evaluate instruction for evaluating an XPath expression held in a string, but I don't know the details.
However, I tried a workaround and it seems to work. Of course it is not perfect and has many limitations in this form (e.g. you need to specify an absolute path, and you cannot use more complex XPath such as //, [], etc.), so consider it just an idea. But it could be the way to go in some simpler cases.
It is based on comparing two strings instead of evaluating a string as XPath.
Simplified XML with the XPaths to encrypt (I omit the numbers for simplicity):
<?xml version="1.0" encoding="UTF-8"?>
<EncrypRqField>
<EncrypFieldRqXPath>/Employees/Employee/id</EncrypFieldRqXPath>
<EncrypFieldRqXPath>/Employees/Employee/age</EncrypFieldRqXPath>
</EncrypRqField>
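And my transformation:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- The list of paths to encrypt; "xpaths.xml" is a placeholder name for the external file -->
<xsl:variable name="xpaths" select="document('xpaths.xml')/EncrypRqField" />
<xsl:template match="element()">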
<xsl:variable name="pathToElement">
<xsl:call-template name="getPath">
<xsl:with-param name="element" select="." />
</xsl:call-template>
</xsl:variable>
<xsl:choose>
<xsl:when test="$xpaths/EncrypFieldRqXPath[text() = $pathToElement]">
<!-- If an element with exactly the same value as the constructed "XPath" exists, then "encrypt" the content of the element -->
<xsl:copy>
<xsl:text>XXX</xsl:text>
</xsl:copy>
</xsl:when>
<xsl:otherwise>
<xsl:copy>
<xsl:apply-templates />
</xsl:copy>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<!-- This template will "construct" the XPath for element under investigation. -->
<!-- There might be an easier way (e.g. some built-in function), but it is beyond my skill. -->
<xsl:template name="getPath">
<xsl:param name="element" />
<xsl:choose>
<xsl:when test="$element/parent::node()">
<xsl:call-template name="getPath">
<xsl:with-param name="element" select="$element/parent::node()" />
</xsl:call-template>
<xsl:text>/</xsl:text>
<xsl:value-of select="$element/name()" />
</xsl:when>
<xsl:otherwise />
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
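With an XSLT 3.0 processor, the `xsl:evaluate` instruction can evaluate the stored paths directly, which also removes the restriction to simple absolute paths. A sketch, assuming your processor supports `xsl:evaluate` (check this for your Saxon edition and version) and that the path list is in a file called `xpaths.xml` (a placeholder name); the variable names are also placeholders, and `XXX` stands in for the real encryption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Anything not matched below is copied unchanged -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- xpaths.xml is a placeholder name for the external list of paths -->
  <xsl:variable name="xpaths" select="doc('xpaths.xml')/EncrypRqField/*"/>

  <!-- Root of the input document, used as the context for each path -->
  <xsl:variable name="root" select="/"/>

  <!-- Evaluate every stored path once and keep the selected input nodes -->
  <xsl:variable name="toEncrypt" as="node()*">
    <xsl:for-each select="$xpaths">
      <xsl:evaluate xpath="string(.)" context-item="$root"/>
    </xsl:for-each>
  </xsl:variable>

  <!-- An element selected by one of the paths gets its content replaced -->
  <xsl:template match="*[exists(. intersect $toEncrypt)]">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:text>XXX</xsl:text>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Because `EncrypRqField/*` selects every child, this works with the numbered element names from the question (`EncrypFieldRqXPath01`, `EncrypFieldRqXPath02`) as well as the simplified ones.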
