VBA force SAX to encode UTF-8 and indent - xml-parsing

I'm writing VBA to export XML.
I use SAXXMLReader with MXXMLWriter because I need pretty, indented output.
This is what I want the declaration to look like:
<?xml version="1.0" encoding="UTF-8"?>
This is what it turns out as:
<?xml version="1.0" encoding="UTF-16" standalone="no"?>
Why is SAX ignoring my encoding selection, and how do I force it to use UTF-8?
Sub XML_Format_Indent_Save(xmlDoc1 As MSXML2.DOMDocument60, strOutputFile As String)
    Dim xmlWriter As MSXML2.MXXMLWriter60
    Dim strWhole As String
    Set xmlWriter = CreateObject("MSXML2.MXXMLWriter")
    xmlWriter.omitXMLDeclaration = False
    xmlWriter.Indent = True
    xmlWriter.Encoding = "utf-8"
    With CreateObject("MSXML2.SAXXMLReader")
        Set .contentHandler = xmlWriter
        .putProperty "http://xml.org/sax/properties/lexical-handler", xmlWriter
        .Parse xmlDoc1
    End With
    strWhole = xmlWriter.output
    ExportStringToTextFile strOutputFile, strWhole
End Sub

From what I read in another post, you need to set .byteOrderMark explicitly (to either True or False); otherwise the .encoding setting is ignored.
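For comparison only (this is not a fix for the MSXML code above), here is a minimal Python sketch of the same goal: indented output whose declaration names UTF-8. It assumes the standard library's xml.dom.minidom, where the declared encoding is taken from the `encoding` argument when serializing to bytes. The sample document is made up for illustration.

```python
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
doc = minidom.parseString("<records><rec><name>ZYSRQPO</name></rec></records>")

# Passing an encoding makes toprettyxml() return bytes and write that
# encoding name into the XML declaration.
pretty_bytes = doc.toprettyxml(indent="  ", encoding="UTF-8")

print(pretty_bytes.decode("utf-8"))
```

The result is a bytes object, so it can be written to a file opened in binary mode without any further re-encoding step.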

Related

Using Python 3.6 to parse XML how can I determine if an XML tag contains no data

I am trying to learn Python by writing a script that extracts data from multiple records in an XML file. I have found answers to most of my questions by searching the web, but I have not found a way to determine whether an XML tag contains no data before getElementsByTagName("tagname")[0].firstChild.data is evaluated and an AttributeError is thrown because no data is present. I realize that I could wrap the code in a try block and handle the AttributeError, but I would rather know that the tag is empty before I try to extract the data, and not have to handle the exception.
Here is an example of an XML file that contains two records: one with data in every tag and one with an empty tag.
<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<records>
  <rec>
    <name>ZYSRQPO</name>
    <state>Washington</state>
    <country>United States</country>
  </rec>
  <rec>
    <name>ZYXWVUT</name>
    <state></state>
    <country>Mexico</country>
  </rec>
</records>
Here is a sample of the code that I might use to extract the data:
from xml.dom import minidom
import sys
mydoc = minidom.parse('mydataFile.xml')
records = mydoc.getElementsByTagName("rec")
for rec in records:
    try:
        name = rec.getElementsByTagName("name")[0].firstChild.data
        state = rec.getElementsByTagName("state")[0].firstChild.data
        country = rec.getElementsByTagName("country")[0].firstChild.data
        print('{}\t{}\t{}'.format(name, state, country))
    except (AttributeError):
        print('AttributeError encountered in record {}'.format(name), file=sys.stderr)
        continue
When this file is processed, nothing is printed for the record named ZYXWVUT except the message that an exception was encountered. I would like a null value to be used for the state name and the rest of the record's information to be printed. Is there a method I could call in an if statement to determine whether the tag contains no data, before reading firstChild.data and hitting an error when no data is found?
An empty element has no child nodes, so you can check childNodes before reading firstChild.data:
from xml.dom import minidom
import sys
mydoc = minidom.parse('mydataFile.xml')
records = mydoc.getElementsByTagName("rec")
for rec in records:
    name = rec.getElementsByTagName("name")[0].firstChild.data
    state = None if len(rec.getElementsByTagName("state")[0].childNodes) == 0 else rec.getElementsByTagName("state")[0].firstChild.data
    country = rec.getElementsByTagName("country")[0].firstChild.data
    print('{}\t{}\t{}'.format(name, state, country))
Or, if there is any chance that name and country are empty too:
from xml.dom import minidom
import sys
def get_node_data(node):
    if len(node.childNodes) == 0:
        result = None
    else:
        result = node.firstChild.data
    return result
mydoc = minidom.parse('mydataFile.xml')
records = mydoc.getElementsByTagName("rec")
for rec in records:
    name = get_node_data(rec.getElementsByTagName("name")[0])
    state = get_node_data(rec.getElementsByTagName("state")[0])
    country = get_node_data(rec.getElementsByTagName("country")[0])
    print('{}\t{}\t{}'.format(name, state, country))
I tried reedcourty's second suggestion and found that it worked great. But I decided that I really did not want None to be returned when the element is empty. Here is what I came up with:
from xml.dom import minidom
import sys
def get_node_data(node):
    if len(node.childNodes) == 0:
        result = '*->No ' + node.nodeName + '<-*'
    else:
        result = node.firstChild.data
    return result
mydoc = minidom.parse(dataFileSpec)
records = mydoc.getElementsByTagName("rec")
for rec in records:
    name = get_node_data(rec.getElementsByTagName("name")[0])
    state = get_node_data(rec.getElementsByTagName("state")[0])
    country = get_node_data(rec.getElementsByTagName("country")[0])
    print('{}\t{}\t{}'.format(name, state, country))
When this is run against this XML:
<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<records>
  <rec>
    <name>ZYSRQPO</name>
    <country>United States</country>
    <state>Washington</state>
  </rec>
  <rec>
    <name></name>
    <country>United States</country>
    <state>Washington</state>
  </rec>
  <rec>
    <name>ZYXWVUT</name>
    <country>Mexico</country>
    <state></state>
  </rec>
  <rec>
    <name>ZYNMLKJ</name>
    <country></country>
    <state>Washington</state>
  </rec>
</records>
It produces this output:
ZYSRQPO Washington United States
*->No name<-* Washington United States
ZYXWVUT *->No state<-* Mexico
ZYNMLKJ Washington *->No country<-*
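As an alternative to the minidom answers above, here is a minimal sketch using xml.etree.ElementTree from the standard library. It relies on findtext(), which returns an empty string for an element with no text and None when the element is absent. The helper name child_text and the inline SAMPLE string are made up for illustration and are not from the original posts.

```python
import xml.etree.ElementTree as ET

# Inline copy of the two-record sample so the sketch is self-contained.
SAMPLE = """<records>
  <rec><name>ZYSRQPO</name><state>Washington</state><country>United States</country></rec>
  <rec><name>ZYXWVUT</name><state></state><country>Mexico</country></rec>
</records>"""


def child_text(rec, tag, default=""):
    """Return the stripped text of rec's child <tag>, or default if it is missing or empty."""
    text = rec.findtext(tag)  # None if the child is absent, "" if it has no text
    if text is None or not text.strip():
        return default
    return text.strip()


root = ET.fromstring(SAMPLE)
rows = [
    (child_text(rec, "name"), child_text(rec, "state"), child_text(rec, "country"))
    for rec in root.iter("rec")
]
for row in rows:
    print("\t".join(row))
```

Passing a different default (for example a placeholder string) reproduces the behaviour of the last minidom version without touching the loop.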

How to use the dart:html library to write files?

Can someone please tell me how to create a file and make it available for download via a browser button?
I have read about FileWriter but have not found a proper example of how to use it.
I also found the post "How to use the dart:html library to write html files?",
but its answer just refers to html5lib, which covers parsing a String into HTML, not saving it as a file.
Any help is appreciated. Am I missing something, or is there no example for this use case?
Personally, I use a combination of Blob, Url.createObjectUrlFromBlob, and AnchorElement (with its download and href properties) to create a downloadable file.
Very simple example:
// Assuming your HTML has an empty anchor with ID 'myLink'
var link = querySelector('a#myLink') as AnchorElement;
var myData = [ "Line 1\n", "Line 2\n", "Line 3\n"];
// Plain text type, 'native' line endings
var blob = new Blob(myData, 'text/plain', 'native');
link.download = "file-name-to-save.txt";
link.href = Url.createObjectUrlFromBlob(blob).toString();
link.text = "Download Now!";
As alluded to by Günter Zöchbauer's comment, an alternative to using Blobs is to base64-encode the data and generate a data URL yourself (at least for file sizes that are well within the data URL limits of most browsers). For example:
import 'dart:html';
import 'dart:typed_data'; // for Uint8List
Uint8List generateFileContents() { ... }
void main() {
  var bytes = generateFileContents();
  (querySelector('#download-link') as AnchorElement)
    ..text = 'Download File'
    ..href = UriData.fromBytes(bytes).toString()
    ..download = 'filename';
}
Also note that if you do want to use a Blob and want to write binary data from a Uint8List, you are expected to use Blob([bytes], ...) and not Blob(bytes, ...). For example:
var bytes = generateFileContents();
var blob = Blob([bytes], 'application/octet-stream');
(querySelector('#download-link') as AnchorElement)
  ..text = 'Download File'
  ..href = Url.createObjectUrlFromBlob(blob)
  ..download = 'filename';
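To make the data URL approach mentioned above concrete, here is a language-neutral illustration written in Python: a data URL is the literal prefix "data:", a MIME type, the marker ";base64,", and the base64-encoded bytes. The function name make_data_url is hypothetical; the Dart snippets above remain the way to do this in a browser.

```python
import base64


def make_data_url(data: bytes, mime_type: str = "application/octet-stream") -> str:
    """Build a base64 data URL for the given bytes (illustrative helper)."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


url = make_data_url(b"Line 1\nLine 2\n", "text/plain")
print(url)
```

Because the whole file is embedded in the URL, this only suits small payloads, which matches the size caveat in the answer.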

Parsing response.getEntity(String.class) string to xml with DocumentHelper.parseText() in dom4j

dom4j has no trouble doing
String text = "<person> <name>James</name> </person>";
Document document = DocumentHelper.parseText(text);
What I need is this
String text = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"+
"<person> <name>James</name> </person>";
Document document = DocumentHelper.parseText(text);
But it throws an exception.
org.dom4j.DocumentException: Error on line 1 of document : parsing initialization error: org.gjt.xpp.XmlPullParserException: only whitespace content allowed outside root element at line 1 and column 1 seen
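I found the problem. The BEFORE line below is the one that fails; the AFTER line works because trim() strips the whitespace around the response body, and an XML declaration must be the very first thing in the document.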
BEFORE
Document output = DocumentHelper.parseText(response.getEntity(String.class));
AFTER
Document output = DocumentHelper.parseText(response.getEntity(String.class).trim());
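The same failure mode can be reproduced outside dom4j. The sketch below uses Python's xml.etree.ElementTree as a stand-in parser (an assumption for illustration, not what the original code uses) to show that whitespace before the XML declaration is rejected and that stripping it fixes the parse. The helper try_parse is hypothetical.

```python
import xml.etree.ElementTree as ET

DECL = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
BODY = "<person> <name>James</name> </person>"


def try_parse(text):
    """Return the root element, or None if the text is not well-formed XML."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


# Simulate a response body that arrives with leading whitespace.
padded = "\n  " + DECL + BODY

failed = try_parse(padded)      # declaration is not at the very start
ok = try_parse(padded.strip())  # stripping the whitespace fixes it

print(failed is None, ok.find("name").text)
```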

reading xml with Linq

I cannot figure out how to get all the ItemDetails nodes in the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<AssessmentMetadata xmlns="http://tempuri.org/AssessmentMetadata.xsd">
  <ItemDetails>
    <ItemName>I1200</ItemName>
    <ISC_Inactive_Codes>NS,NSD,NO,NOD,ND,NT,SP,SS,SSD,SO,SOD,SD,ST,XX</ISC_Inactive_Codes>
    <ISC_StateOptional_Codes>NQ,NP</ISC_StateOptional_Codes>
  </ItemDetails>
  <ItemDetails>
    <ItemName>I1300</ItemName>
    <ISC_Inactive_Codes>NS,NSD,NO,NOD,ND,NT,SP,SS,SSD,SO,SOD,SD,ST,XX</ISC_Inactive_Codes>
    <ISC_StateOptional_Codes>NQ,NP</ISC_StateOptional_Codes>
  </ItemDetails>
  <ItemDetails>
    <ItemName>I1400</ItemName>
    <ISC_Active_Codes>NC</ISC_Active_Codes>
    <ISC_Inactive_Codes>NS,NSD,NO,NOD,ND,NT,SP,SS,SSD,SO,SOD,SD,ST,XX</ISC_Inactive_Codes>
    <ISC_StateOptional_Codes>NQ,NP</ISC_StateOptional_Codes>
  </ItemDetails>
</AssessmentMetadata>
I have tried a number of things. I suspect it is a namespace issue; this is my latest attempt:
var xdoc = XDocument.Load(asmtMetadata.Filepath);
var assessmentMetadata = xdoc.XPathSelectElement("/AssessmentMetadata");
You need to get the default namespace and use it when querying:
var ns = xdoc.Root.GetDefaultNamespace();
var query = xdoc.Root.Elements(ns + "ItemDetails");
You need to prefix every element name with the namespace. For example, the following query retrieves all ItemName values:
var itemNames = xdoc.Root.Elements(ns + "ItemDetails")
.Elements(ns + "ItemName")
.Select(n => n.Value);
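The effect of a default namespace is not specific to LINQ to XML. As a rough parallel, this Python sketch with xml.etree.ElementTree shows an unqualified lookup finding nothing and a namespace-qualified lookup finding the elements. The trimmed SAMPLE document and the variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = "http://tempuri.org/AssessmentMetadata.xsd"

# Trimmed-down version of the document from the question.
SAMPLE = f"""<AssessmentMetadata xmlns="{NS}">
  <ItemDetails><ItemName>I1200</ItemName></ItemDetails>
  <ItemDetails><ItemName>I1300</ItemName></ItemDetails>
</AssessmentMetadata>"""

root = ET.fromstring(SAMPLE)

# Without the namespace, nothing matches: every element lives in NS.
unqualified = root.findall("ItemDetails")

# ElementTree spells a qualified name as "{namespace-uri}local-name".
qualified = root.findall(f"{{{NS}}}ItemDetails")
item_names = [d.findtext(f"{{{NS}}}ItemName") for d in qualified]

print(len(unqualified), item_names)
```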

Nokogiri with error parsing xml

Why does this code always return zero?
doc = Nokogiri::XML('<?xml version="1.0" encoding="UTF-8"?><root><l1><x:Menu xmlns:x="http://www.xworld.org/">OK</x:Menu></l1></root>')
ret = doc.xpath("//Menu")
ret.size() # returns zero
I figured out that I have to declare the namespace.
doc.xpath("//x:Menu", "x" => "http://www.xworld.org/").text()
:)
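The same behaviour shows up in other parsers. Here is a small Python sketch with xml.etree.ElementTree, assuming the sample document with its closing tag written as </x:Menu> (Python's parser rejects a mismatched closing tag): the bare name finds nothing, while a prefix-to-URI mapping finds the element.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the document from the question.
doc = ET.fromstring(
    '<root><l1><x:Menu xmlns:x="http://www.xworld.org/">OK</x:Menu></l1></root>'
)

# The bare name does not match, because Menu belongs to a namespace.
plain = doc.findall(".//Menu")

# Supplying a prefix-to-URI mapping lets the path use the x: prefix.
prefixed = doc.findall(".//x:Menu", {"x": "http://www.xworld.org/"})

print(len(plain), [m.text for m in prefixed])
```

The prefix used in the query does not have to match the one in the document; only the namespace URI matters.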
