Unable to parse XML with multiple namespaces using tcl and tdom - xml-parsing

I am trying to parse an XML document using Tcl and the tdom package. I am having trouble because the node I want is a child of a node that declares multiple namespaces. How can I select the realmCode or title element? Below is what I have tried:
package require tdom
set XML {<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="http://www.cerner.com/cda_stylesheet/" ?>
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:sdtc="urn:hl7-org:sdtc" xsi:schemaLocation="urn:hl7-org:v3 ../../../CDA%20R2/cda-schemas-and-samples/infrastructure/cda/CDA.xsd" classCode="DOCCLIN" moodCode="EVN">
<realmCode code="US" />
<title>Discharge Summary</title>
</ClinicalDocument>}
set nsmap {
a urn:hl7-org:v3
x http://www.w3.org/2001/XMLSchema-instance
s urn:hl7-org:sdtc
}
set doc [dom parse $XML]
set root [$doc documentElement]
set node [$root selectNodes -namespaces $nsmap "/a:ClinicalDocument/title"]
#set node [$root selectNodes "/ClinicalDocument/title"] ;# tried this as well - does not work
$doc delete

You need to specify the namespace for every level of the path, not just the root. Use
set title [$root selectNodes -namespaces $nsmap /a:ClinicalDocument/a:title]
set realm [$root selectNodes -namespaces $nsmap /a:ClinicalDocument/a:realmCode/@code]
etc.
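The same rule (every element step needs the namespace, not just the root) can be checked outside tDOM. Below is a minimal sketch using Python's standard xml.etree.ElementTree instead of tDOM; the prefix "a" and the trimmed-down document are assumptions for illustration, and the API differs from tDOM's selectNodes.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the document from the question (stylesheet PI and
# schemaLocation dropped for brevity).
XML = """<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:sdtc="urn:hl7-org:sdtc" classCode="DOCCLIN" moodCode="EVN">
<realmCode code="US" />
<title>Discharge Summary</title>
</ClinicalDocument>"""

# Prefix-to-URI map; the prefix "a" is arbitrary, only the URI matters.
NS = {"a": "urn:hl7-org:v3"}

root = ET.fromstring(XML)

# Without a prefix the child is not found: it lives in the default namespace.
unprefixed = root.find("title")

# With the prefix on the child step, the lookup succeeds.
title = root.find("a:title", NS)
realm = root.find("a:realmCode", NS)

print(unprefixed)         # None
print(title.text)         # Discharge Summary
print(realm.get("code"))  # US (the attribute itself is not namespaced)
```

Note that the code attribute needs no prefix: unprefixed attributes do not inherit the default namespace.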

This is more for the sake of completeness, and not necessarily recommended. You may instruct tDOM to simply ignore the namespaces. See -ignorexmlns.
Watch:
set doc [dom parse -ignorexmlns $XML]
set root [$doc documentElement]
$root selectNodes "/ClinicalDocument/title"
$root selectNodes "/ClinicalDocument/realmCode/@code"
The cost is ambiguity: elements with the same local name from different namespaces can no longer be told apart.

Related

Saxon 9.8: Which patterns are supported in EXPath File Module function file:list?

Good afternoon,
I am working with Java Saxon 9.8.0.4. I would like to use the EXPath File Module function "file:list" with its third "pattern" parameter, but I am not sure which style of pattern is supported.
I have read both the Saxon documentation and the EXPath documentation, but I still do not know which patterns are supported in Saxon 9.8.0.4. It would be great if regular expressions were supported, but I understand that is overkill for most users. I tried several blind tests, but only the * and ? wildcards work for me, as defined in the EXPath documentation.
Yes, I can quite easily do regexp post-processing in a for-each, but knowing more about the list function would help.
Thank You in advance for Your help, Stepan
P.S.: My use case is to get all files without an extension ("test" and not "test.txt") recursively from a large and deep directory structure and process all matching files with XSLT 3.0. Most of these files have identical file names, so I cannot pre-process them by copying to one folder for a single Saxon -s:directory -o:directory invocation, and invoking Java (Saxon) once per file is of course a terrible time overhead. So I would like to read all matching files into a sequence and process each item using for-each (the files are text files and I read them using unparsed-text). And no, GAWK is not a solution, as I already have all the XML-to-SQL transformation infrastructure in XSLT, because 95 % of the files are XML.
--ADDED code and explanation below:
Example of my test files.
XML file "a.xml":
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="a.xsl"?>
<root/>
XSL-T file "a.xsl":
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:saxon="http://saxon.sf.net/"
xmlns:expathFile="http://expath.org/ns/file"
exclude-result-prefixes="xs saxon"
version="3.0">
<xsl:output method="text" />
<xsl:template match="/root">
<xsl:variable name="list" select="expathFile:list('C:\temp\temp\test\', false(), '^.*$')"/>
<xsl:for-each select="$list">
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
My folder "C:\temp\temp\test\" contains 6 test files: "a.txt", "b.txt", "c.txt", "e", "f", "g".
But after testing with the online Java RegExp tester at "http://www.regexplanet.com/advanced/java/index.html" I found that the problem is solely on my side, because Java regular expressions behave a little differently from PCRE (Perl), sed and gawk regular expressions. So it is my fault and I need to learn Java regular expressions.
Saxon uses the same code for this pattern as for the filter in select="pattern" in collection URIs, which is described at http://www.saxonica.com/documentation/index.html#!sourcedocs/collections
Extracting the relevant details:
The pattern used in the select parameter can use glob-like syntax, for
example *.xml selects all files with extension "xml". More generally,
the pattern is converted to a regular expression by prepending "^",
appending "$", replacing "." by "\.", "*" by ".*", and "?" by ".?",
and it is then used to match the file names appearing in the directory
using the Java regular expression rules. So, for example, you can
write ?select=*.(xml|xhtml) to match files with either of these two
file extensions. Note however, that special characters used in the URL
(that is, characters such as backslash and curly braces that are not
allowed in the query part of a URI) must be escaped using the %HH
convention. For example, vertical bar needs to be written as %7C. This
escaping can be achieved using the encode-for-uri() function.
Note that Saxon's collection() function now also supports match=pattern in the URI, where the pattern is a standard XPath 3.1 regular expression.
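The conversion rules quoted above also suggest why the pattern '^.*$' from the question matches nothing: it is treated as a glob, not as a regular expression. The following is a rough sketch of those rules only, not Saxon's actual code; the function name glob_to_regex is hypothetical, and Python's re module stands in for Java's regex engine on the assumption that the two agree for patterns this simple.

```python
import re

def glob_to_regex(pattern: str) -> str:
    """Apply the documented rewrite: escape '.', expand '*' and '?', anchor."""
    # Escape literal dots first, so the dots introduced below stay unescaped.
    escaped = pattern.replace(".", r"\.")
    expanded = escaped.replace("*", ".*").replace("?", ".?")
    return "^" + expanded + "$"

# The six test files from the question.
files = ["a.txt", "b.txt", "c.txt", "e", "f", "g"]

def matching(pattern: str) -> list:
    """Return the file names matched by the converted pattern."""
    rx = re.compile(glob_to_regex(pattern))
    return [name for name in files if rx.match(name)]

print(glob_to_regex("*.txt"))  # ^.*\.txt$
print(matching("*.txt"))       # ['a.txt', 'b.txt', 'c.txt']

# The pattern from the question is rewritten into something quite different:
print(glob_to_regex("^.*$"))   # ^^\..*$$
print(matching("^.*$"))        # []
```

Under these rules the "." in '^.*$' becomes a literal dot, so the result only matches names that start with a dot, and none of the six test files do.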

How to read xml attribute value on apache ant?

I have an xml like below.
<Students college="SGS">
<Student id="001" name="ABC"/>
<Student id="002" name="XYZ"/>
</Students>
<Students college="SPM">
<Student id="001" name="PQR"/>
<Student id="002" name="LMN"/>
</Students>
I want to get the name of the student of the SGS college whose id is 001, using Apache Ant.
How can I get this without using an extra jar such as xmltask.jar?
The simplest solution is to use XPath to get this information. In Ant there is no built-in task to fetch XML data using XPath expressions. You would need to use tasks provided in external libraries:
https://code.google.com/p/ant-xpath-task/wiki/Introduction
http://ant.apache.org/external.html
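Whichever task is used, the XPath expression itself is the same. Below is a minimal sketch of that expression, evaluated with Python's standard xml.etree.ElementTree purely to demonstrate it; the wrapping Colleges root element is an assumption, added because a well-formed document needs a single root, and the closing tags from the question are corrected.

```python
import xml.etree.ElementTree as ET

# The data from the question, wrapped in a hypothetical <Colleges> root.
XML = """<Colleges>
  <Students college="SGS">
    <Student id="001" name="ABC"/>
    <Student id="002" name="XYZ"/>
  </Students>
  <Students college="SPM">
    <Student id="001" name="PQR"/>
    <Student id="002" name="LMN"/>
  </Students>
</Colleges>"""

root = ET.fromstring(XML)

# Select the Student with id 001 inside the Students element for college SGS.
XPATH = "Students[@college='SGS']/Student[@id='001']"
student = root.find(XPATH)
name = student.get("name")

print(name)  # ABC
```

In full XPath the attribute could be selected directly by appending /@name to the same path; ElementTree supports only a subset of XPath, hence the separate get() call.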

How can we use variables in wxl file [duplicate]

I need to use variable in WIX localization file WIXUI_en-us.wxl.
I tried to use it like this:
<String Id="Message_SomeVersionAlreadyInstalled" Overridable="yes">A another version of product $(var.InstallationVersionForGUI) is already installed</String>
But it doesn't work. And when I declared property and used it this way:
<String Id="Message_SomeVersionAlreadyInstalled" Overridable="yes">A another version of product [InstallationVersionForGUI] is already installed</String>
doesn't work either.
Where was I wrong?
Thanks for help and your time.
Localization strings are processed at link time, so you can't use $(var) preprocessor variables. Using a [property] reference is supported, as long as the place where the localization string is used supports run-time formatting (e.g., using the Formatted field type).
Your second method should work just fine. This is the same method used by the default .wxl files.
For example, in your .wxl file you would declare your string:
<String Id="Message_Foo">Foo blah blah [Property1]</String>
And in your .wxs file, you declare the property. If you wish, you can declare the property to match a WiX variable (which it sounds like you're trying to do)
<Property Id="Property1">$(var.Property1)</Property>
I was trying to get localization file to use variables. Came across this post:
There are different layers of variables in WiX (candle's preprocessor
variables, Light's WixVariables/localization variables/binder
variables, and MSI's properties). Each have different syntax and are
evaluated at different times:
Candle's preprocessor variables "$(var.VariableName)" are evaluated
when candle runs, and can be set from candle's commandline and from
"<?define?>" statements. Buildtime environment
properties as well as custom variables can also be accessed similarly
(changing the "var." prefix with other values).
Light's variables accessible from the command-line are the
WixVariables, and accessing them is via the "!(wix.VariableName)"
syntax. To access your variable from your commandline, you would need
to change your String to: This build was prepared on
!(wix.BuildMachine)
If you instead need to have the BuildMachine value exist as an MSI
property at installation time (which is the "[VariableName]" syntax)
you would need to add the following to one of your wxs files in a
fragment that is already linked in:
Now, the environment variable COMPUTERNAME always has held the name of
my build machines in the past, and you can access that this way:
$(env.COMPUTERNAME). So, you can get rid of the commandline addition
to light.exe and change your wxs file like this:
<WixVariable Id="BuildMachine" Value="$(env.COMPUTERNAME)"/>
Preprocessor variables $(var.VariableName) are resolved at compile time (by candle), before localization strings are processed at link time, so ideally you would use [PropertyName], with the property defined under the main Product element.
The issue is that sometimes the property is not yet defined; for instance, using the product name in the localization file does not seem possible.
This solution was done aiming to only type the product name once given "Super product" as product name:
In case of running through visual studio extension:
Project properties -> Build -> Define variables -> "MyProductName=Super product" (No quotes)
In case of running from cmd or some other place:
On Light.exe, add -d"MyProductName=Super product"
Into the localization .wxl file:
<String Id="Description" Overridable="yes">Description of !(wix.MyProductName)
to make it more interesting</String>
I have an additional config file (.wxi) that I include in other files to define some variables. Previously I had hardcoded the value here; now it is hardcoded only in the variable definition and I use the given value:
<?xml version="1.0" encoding="utf-8"?>
<Include>
<!-- Define the product name preprocessor variable -->
<?define ProductName="!(wix.ProductNameDefVar)" ?>
<!-- From this point on, the preprocessor variable can be used -->
<?define ProductName_x64="$(var.ProductName) (64bit)" ?>
<?define ProductName_x32="$(var.ProductName) (32bit)" ?>
<?define CompanyDirName = "My company name" ?>
</Include>
Finally, the place where the localization text was not being interpolated looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<Wix xmlns="http://schemas.microsoft.com/wix/2006/wi">
<!-- Include the config file with the preprocessor variable -->
<?include $(sys.CURRENTDIR)\Config.wxi?>
<!-- Main product definition -->
<Product Id="$(var.ProductCode)"
Name="$(var.ProductName)"
Language="!(loc.Language)"
Version="$(var.BuildVersion)"
Manufacturer="!(loc.Company)"
UpgradeCode="$(var.UpgradeCode)">
<!-- Package details -->
<!-- Here, Description was not interpolating -->
<Package InstallerVersion="200"
Compressed="yes"
InstallScope="perMachine"
Platform="$(var.Platform)"
Manufacturer="!(loc.Company)"
Description="!(loc.Description)"
Keywords="!(loc.Keywords)"
Comments="!(loc.Comments)"
Languages="!(loc.Language)"
/>
[...]

Validating XML with an in-memory DTD in C using libxml2

I need to validate XML using DTD stored in memory, i.e. something like the following:
static const char *dtd_str = "<!ELEMENT ...>";
xmlDtdPtr dtd;
dtd = xmlParseMemoryDtd(dtd_str);
The XML_PARSE_DTDVALID parser option validates against a DTD embedded in the XML:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE some_tag[
<!ELEMENT some_tag ...>
...
]>
<some_tag>...</some_tag>
So a workaround is to modify the in-memory XML. Things become more complicated with
a parser used in "push mode". In push mode we have to detect the end of the XML
declaration (<?xml ...?>) or the start of the root element, and then insert our inline
DTD between them.
Could you suggest a better solution?
EDIT
A workaround is to validate the parsed XML a posteriori, as Daniel(_DV) suggested below.
Example: main.c, response.xml.
But I was searching for a way to "embed" a DTD and validate XML "on-the-fly" while libxml2 parses the XML chunk-by-chunk.
The following approach doesn't work for me:
xmlCtxtUseOptions(ctxt, XML_PARSE_NOENT | XML_PARSE_NOWARNING | XML_PARSE_DTDVALID);
ctxt->sax->internalSubset = ngx_http_file_chunks_sax_internal_subset;
ctxt->sax->externalSubset = NULL;
$ ./parsexml
validity error : Validation failed: no DTD found !
<response>
^
Document is not valid
xmlValidateDtd performs DTD validation a posteriori, on an already parsed XML document,
to make sure it validates against the DTD. This will not use the internal subset...
http://xmlsoft.org/html/libxml-valid.html#xmlValidateDtd
See xmllint.c code in libxml2 for a full example of how to use it,
Daniel

How to use multiple dtd files in a single xul?

In my Firefox addon, I have a few !ENTITYs that I want to put in a "global.dtd" file. Then, in all of my .xul files, I want to access both that global.dtd and that .xul file's own .dtd file.
Thus, for code1.xul, I would load code1.dtd and global.dtd.
Then, for code2.xul, I would load code2.dtd and global.dtd.
That way, the shared strings would be defined once, in global.dtd, and reused everywhere.
Can I do this? How do I write the definition?
You can't have two DOCTYPE declarations in one file, so this does not work:
<!DOCTYPE overlay SYSTEM "chrome://myaddon/locale/global.dtd">
<!DOCTYPE overlay SYSTEM "chrome://myaddon/locale/code1.dtd">
Use a parameter entity in any DTD that you want to use global.dtd in.
For example, you would add this to code1.dtd and code2.dtd:
<!ENTITY % global SYSTEM "global.dtd">
%global;
You'll have to adjust the SYSTEM identifier to point to the location of global.dtd.
You can also import multiple DTDs in the same XUL file. It looks like this:
<!DOCTYPE some_name [
<!ENTITY % firstDTD SYSTEM "chrome://extension/locale/first.dtd">
%firstDTD;
<!ENTITY % secondDTD SYSTEM "chrome://extension/locale/pref/second.dtd">
%secondDTD;
]>
