xml parsing node value - xml-parsing

I have an XML file that I need to load into an SQL database.
The XML looks like this:
<catalog>
<products>
<product>
<ID>0079</ID>
<NAME>Casa</NAME>
<feature name="material">cemento</feature>
</product>
</products>
</catalog>
I do:
$xml = simplexml_load_file('prova.xml');
$listProducts = $xml->products;
foreach ($listProducts->product as $product)
{
$name = $product->NAME;
$id= $product->ID;
....................
But the problem comes when I have to define the variable for "feature".
I want to insert the value "cemento" into my SQL table.
How can I do that?

See if this works:
$feature = (string) $product->feature; // element text, e.g. "cemento"
$featureName = (string) $product->feature->attributes()->name; // attribute value, e.g. "material"
UPDATE
foreach ($listProducts->product as $product)
{
$name = $product->NAME;
$id= $product->ID;
foreach ($product->feature as $feature)
{
$featureName = (string) $feature->attributes()->name; // e.g. "material"
$featureValue = (string) $feature; // e.g. "cemento"
}
...
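For comparison, here is a minimal sketch of the same idea in Python with the standard-library xml.etree.ElementTree, assuming the catalog structure shown in the question. The function name read_products and the dictionary layout are made up for illustration; this is not the PHP code from the answer.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question.
SAMPLE_XML = """<catalog>
<products>
<product>
<ID>0079</ID>
<NAME>Casa</NAME>
<feature name="material">cemento</feature>
</product>
</products>
</catalog>"""


def read_products(xml_text):
    """Return one dict per <product>, with features keyed by their name attribute.

    Hypothetical helper: the name and the returned layout are assumptions.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for product in root.findall("./products/product"):
        # The attribute holds the feature name; the element text holds its value.
        features = {f.get("name"): (f.text or "") for f in product.findall("feature")}
        rows.append({
            "id": product.findtext("ID"),
            "name": product.findtext("NAME"),
            "features": features,
        })
    return rows
```

With the sample above, read_products(SAMPLE_XML)[0]["features"]["material"] is the string that would go into the SQL insert.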

Related

replace all double quotes with nothing in csv file in BIML script

I am importing flatfile connections using BIML.
" is used around text and ; is used as delimiter.
However, in some of the files I see this:
;"this is valid text""";
There are doubled double quotes with nothing between them. If I edit the file and replace every doubled double quote with nothing, the import runs well. So, is it possible to do this automatically in BIML, that is, to search for all instances of "" and replace them with nothing?
<#
string[] myFiles = Directory.GetFiles(path, extension);
string[] myColumns;
// Loop through the files
int TableCount = 0;
foreach (string filePath in myFiles)
{
TableCount++;
fileName = Path.GetFileNameWithoutExtension(filePath);
#>
<Package Name="stg_<#=prefix#>_<#=TableCount.ToString()#>_<#=fileName#>" ConstraintMode="Linear" AutoCreateConfigurationsType="None" ProtectionLevel="<#=protectionlevel#>" PackagePassword="<#=packagepassword#>">
<Variables>
<Variable Name="CountStage" DataType="Int32" Namespace="User">0</Variable>
</Variables>
<Tasks>
<ExecuteSQL ConnectionName="STG_<#=application#>" Name="SQL-Truncate <#=fileName#>">
<DirectInput>TRUNCATE TABLE <#=dest_schema#>.<#=fileName#></DirectInput>
</ExecuteSQL>
<Dataflow Name="DFT-Transport CSV_<#=fileName#>">
<Transformations>
<FlatFileSource Name="SRC_FF-<#=fileName#> " ConnectionName="FF_CSV-<#=Path.GetFileNameWithoutExtension(filePath)#>">
</FlatFileSource>
<OleDbDestination ConnectionName="STG_<#=application#>" Name="OLE_DST-<#=fileName#>" >
<ExternalTableOutput Table="<#=dest_schema#>.<#=fileName#>"/>
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
</Package>
<# } #>
It turns out I was looking in completely the wrong place for this.
I went to the part where the file is read and added .Replace("\"\"","") there:
myColumns = myFile.ReadLine().Replace("\"\"","").Replace(separator,"").Split(delimiter);
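The clean-up step in that line can be sketched in Python as follows. This is only an illustration of the idea, assuming ; as the delimiter and " as the text qualifier as described in the question; the function name split_header and its parameters are hypothetical and not part of BIML.

```python
def split_header(line, delimiter=";", quote='"'):
    """Strip quote characters from a header line and split it into column names.

    Mirrors the two Replace calls from the answer: doubled quotes are removed
    first, then any remaining single quote characters.
    """
    cleaned = line.rstrip("\r\n")
    cleaned = cleaned.replace(quote * 2, "")  # drop doubled quotes
    cleaned = cleaned.replace(quote, "")      # drop the remaining qualifiers
    return cleaned.split(delimiter)
```

Since the second replacement already removes every quote character, the first one is strictly redundant here; it is kept only to match the order of operations in the answer.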

PHP XML DOM Document move sub-sub-nodes within sub-node

I have an xml like this:
<?xml version="1.0" encoding="UTF-8"?>
<OrderListResponse>
<OrderListResponseContainer>
<DateFrom>2018-07-01T00:00:00+00:00</DateFrom>
<DateTo>2018-07-19T00:00:00+00:00</DateTo>
<Page>1</Page>
<TotalNumberOfPages>4</TotalNumberOfPages>
<Orders>
<Order>
<OrderID>158772</OrderID>
<Customer>
<Name><![CDATA[John Smith]]></Name>
<StreetAddress><![CDATA[33, Sunset Boulevrd]]></StreetAddress>
</Customer>
<Delivery>
<Name><![CDATA[John Smith]]></Name>
<StreetAddress><![CDATA[47, Rodeo Drive]]></StreetAddress>
</Delivery>
<Billing>
<Name><![CDATA[John Smith]]></Name>
<StreetAddress><![CDATA[33, Sunset Boulevrd]]></StreetAddress>
</Billing>
<Payment>
<Module>paypal</Module>
<TransactionID/>
</Payment>
<DatePurchased>2018-07-01 16:30:42</DatePurchased>
<DateLastModified>2018-07-02 21:08:28</DateLastModified>
<CheckoutMessage><![CDATA[]]></CheckoutMessage>
<Status>cancelled</Status>
<Currency>EUR</Currency>
<Products>
<Product>
<MxpID>44237</MxpID>
<SKU>IRF 8707TR</SKU>
<Quantity>3</Quantity>
<Price>2.46</Price>
</Product>
</Products>
<Total>
<SubTotal>7.38</SubTotal>
<Shipping>2.7</Shipping>
<Cod>0</Cod>
<Insurance>0</Insurance>
<Tax>1.62</Tax>
<Total>11.7</Total>
</Total>
</Order>
<Order>...</Order>
</Orders>
</OrderListResponseContainer>
</OrderListResponse>
Although there is surely a better way to do it,
I built a routine like this to parse all the orders:
$xmlDoc = new DOMDocument();
$xmlDoc->preserveWhiteSpace = false;
$xmlDoc->loadXML($response);
$xpath = new DOMXPath($xmlDoc);
$rootNode = $xpath->query('//OrderListResponseContainer/Orders')->item(0);
foreach($rootNode->childNodes as $node)
{
foreach($node->childNodes as $subnode)
{
// Process User
foreach($subnode->childNodes as $subsubnode)
{
foreach($subsubnode->childNodes as $subsubsubnode)
{
// Process Products and Sales
}
}
}
}
**** ADDED ****
I use the nested loops to create one XML document for each product (each one contains details about the buyer, the item and the sale), and this XML is then passed to a stored procedure that generates the user/item/sale records. For several reasons I cannot bulk import users first, then items and then sales. While building the sale XML I need some details from the Total node, and one way to get them is to move the Total node to the top of the XML, but still within the Order node.
**** ADDED ****
I need to access some Total subnodes before processing Products.
The only solution I found is to move the Total node to the beginning, but despite many attempts I have not been able to make it work.
The idea was to clone the Total node and insert it before the OrderID node.
The problem is that I need to work on subdocuments and select the node to clone from a node itself, while all the examples I found clone the full DocumentElement.
Perhaps an easier solution can be achieved using XSLT?
Can anyone suggest a solution?
I don't completely understand what you are trying to say about the cloning part. Perhaps you can edit your question and clarify what you mean.
However, about accessing the Total nodes... you could simply use XPath for this as well.
$xmlDoc = new DOMDocument();
$xmlDoc->preserveWhiteSpace = false;
$xmlDoc->loadXML($response);
$xpath = new DOMXPath($xmlDoc);
// first, let's fetch all <Order> elements
$orders = $xpath->query('//OrderListResponseContainer/Orders/Order');
// loop through all <Order> elements
foreach( $orders as $order ) {
/*
There's all sorts of ways you could convert <Total> to something useful
*/
// Example 1.
// fetch <Total> that is a direct child (./) of our context node (second argument) $order
$total = $xpath->query( './Total', $order )->item( 0 );
// then do something like
$subTotal = $total->getElementsByTagName( 'SubTotal' )->item( 0 );
$shipping = $total->getElementsByTagName( 'Shipping' )->item( 0 );
// ... etc. for each child node of <Total>
// or perhaps simply convert it to a SimpleXMLElement
$total = simplexml_import_dom( $total );
var_dump( $total );
// and then access the values like this:
$total->SubTotal;
$total->Shipping;
// ... etc.
// Example 2.1
// fetch all children of <Total> into an array
$total = [];
foreach( $xpath->query( './Total/*', $order ) as $totalNode ) {
$total[ $totalNode->nodeName ] = $totalNode->textContent;
}
var_dump( $total );
// Example 2.2
// fetch all children of <Total> into a stdClass object
$total = new \stdClass;
foreach( $xpath->query( './Total/*', $order ) as $totalNode ) {
$total->{ $totalNode->nodeName } = $totalNode->textContent;
}
var_dump( $total );
/*
Now, after this you can create and process the Customer and Products data
in a similar fashion as I've shown how to process the Total data above
*/
}
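The same approach as Example 2.1 (collect the children of Total per order before touching the products) can be sketched in Python with the standard-library ElementTree. This assumes the OrderListResponse layout from the question; the function name totals_by_order is made up for illustration.

```python
import xml.etree.ElementTree as ET


def totals_by_order(xml_text):
    """Map each OrderID to a dict of its <Total> children (tag -> text).

    Hypothetical helper; assumes the root element is <OrderListResponse>.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for order in root.findall("./OrderListResponseContainer/Orders/Order"):
        order_id = order.findtext("OrderID")
        # "./Total/*" selects the children of the <Total> that is a direct
        # child of this order, including the inner <Total> amount.
        result[order_id] = {child.tag: child.text for child in order.findall("./Total/*")}
    return result
```

Because the totals are looked up by path, the position of Total inside Order does not matter, so nothing has to be moved or cloned before the products are processed.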

Using Python 3.6 to parse XML how can I determine if an XML tag contains no data

I am trying to learn Python by writing a script that extracts data from multiple records in an XML file. I have been able to find the answers to most of my questions by searching the web, but I have not found a way to determine whether an XML tag contains no data before getElementsByTagName("tagname")[0].firstChild.data is used and an AttributeError is thrown because no data is present. I realize that I could wrap my code in a try and handle the AttributeError, but I would rather know that the tag is empty before I try to extract the data and not have to handle the exception.
Here is an example of an XML file that contains two records one with data in the tags and one with an empty tag.
<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<records>
<rec>
<name>ZYSRQPO</name>
<state>Washington</state>
<country>United States</country>
</rec>
<rec>
<name>ZYXWVUT</name>
<state></state>
<country>Mexico</country>
</rec>
</records>
Here is a sample of the code that I might use to extract the data:
from xml.dom import minidom
import sys
mydoc = minidom.parse('mydataFile.xml')
records = mydoc.getElementsByTagName("rec")
for rec in records:
try:
name = rec.getElementsByTagName("name")[0].firstChild.data
state = rec.getElementsByTagName("state")[0].firstChild.data
country = rec.getElementsByTagName("country")[0].firstChild.data
print('{}\t{}\t{}'.format(name, state, country))
except (AttributeError):
print('AttributeError encountered in record {}'.format(name), file=sys.stderr)
continue
When processing this file no information for the record named ZYXWVUT will be printed except that an exception was encountered. I would like to be able to have a null value for the state name used and the rest of the information printed about this record. Is there a method that can be used to do what I want, so that I could use an if statement to determine whether the tag contained no data before using getElementsByTagName and encountering an error when no data is found?
from xml.dom import minidom
import sys
mydoc = minidom.parse('mydataFile.xml')
records = mydoc.getElementsByTagName("rec")
for rec in records:
name = rec.getElementsByTagName("name")[0].firstChild.data
state = None if len(rec.getElementsByTagName("state")[0].childNodes) == 0 else rec.getElementsByTagName("state")[0].firstChild.data
country = rec.getElementsByTagName("country")[0].firstChild.data
print('{}\t{}\t{}'.format(name, state, country))
Or, if there is any chance that name and country are empty too:
from xml.dom import minidom
import sys
def get_node_data(node):
if len(node.childNodes) == 0:
result = None
else:
result = node.firstChild.data
return result
mydoc = minidom.parse('mydataFile.xml')
records = mydoc.getElementsByTagName("rec")
for rec in records:
name = get_node_data(rec.getElementsByTagName("name")[0])
state = get_node_data(rec.getElementsByTagName("state")[0])
country = get_node_data(rec.getElementsByTagName("country")[0])
print('{}\t{}\t{}'.format(name, state, country))
I tried reedcourty's second suggestion and found that it worked great. But I decided that I really did not want None to be returned if the element was empty. Here is what I came up with:
from xml.dom import minidom
import sys
def get_node_data(node):
if len(node.childNodes) == 0:
result = '*->No ' + node.nodeName + '<-*'
else:
result = node.firstChild.data
return result
mydoc = minidom.parse(dataFileSpec)
records = mydoc.getElementsByTagName("rec")
for rec in records:
name = get_node_data(rec.getElementsByTagName("name")[0])
state = get_node_data(rec.getElementsByTagName("state")[0])
country = get_node_data(rec.getElementsByTagName("country")[0])
print('{}\t{}\t{}'.format(name, state, country))
When this is run against this XML:
<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<records>
<rec>
<name>ZYSRQPO</name>
<country>United States</country>
<state>Washington</state>
</rec>
<rec>
<name></name>
<country>United States</country>
<state>Washington</state>
</rec>
<rec>
<name>ZYXWVUT</name>
<country>Mexico</country>
<state></state>
</rec>
<rec>
<name>ZYNMLKJ</name>
<country></country>
<state>Washington</state>
</rec>
</records>
It produces this output:
ZYSRQPO Washington United States
*->No name<-* Washington United States
ZYXWVUT *->No state<-* Mexico
ZYNMLKJ Washington *->No country<-*
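As an alternative to checking childNodes with minidom, the same emptiness test can be sketched with the standard-library xml.etree.ElementTree, whose findtext returns an empty string for an element that is present but empty and None for one that is missing. The helper names text_or_default and read_records are hypothetical.

```python
import xml.etree.ElementTree as ET


def text_or_default(parent, tag, default=None):
    """Return the text of the first child named tag, or default if it is empty or missing."""
    value = parent.findtext(tag)  # None if missing, "" if present but empty
    return value if value else default


def read_records(xml_text, default=None):
    """Return (name, state, country) tuples for every <rec> under the root."""
    root = ET.fromstring(xml_text)
    return [
        tuple(text_or_default(rec, tag, default) for tag in ("name", "state", "country"))
        for rec in root.findall("rec")
    ]
```

Passing a placeholder string as default gives the same effect as the '*->No ...<-*' marker used above, without raising an exception for empty tags.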

Nokogiri get xpath from Nokogiri::XML::Element

How can I get the XPath for rc, an element returned by search?
f=File.open('/media/cc.xml')
doc = Nokogiri::XML f
rc = doc.search('realmCode')
[#<Nokogiri::XML::Element:0x15a4d714 name="realmCode" namespace=#<Nokogiri::XML::Namespace:0x15a4dafc href="urn:hl7-org:v3"> attributes=[#<Nokogiri::XML::Attr:0x15a4d5c0 name="code" value="US">]>]
Is there a way to extract the XPath from rc?
Updated with the XML:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/ccda.xsl"?>
<ClinicalDocument xmlns:sdtc="urn:hl7-org:sdtc" xmlns:vocdo="urn:hl7-org:v3/voc" xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:mif="urn:hl7-org:v3/mif">
<realmCode code="US"/>
<typeId root="2.16.840.1.113883.1.3" extension="POCD_HD000040"/>
<templateId root="2.16.840.1.113883.10.20.22.1.1"/>
<templateId root="2.16.840.1.113883.10.20.22.1.2"/>
<id root="2.16.840.1.113883.19.5.99999.1" extension="Test CCDA"/>
<code codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC" code="34133-9" displayName="Summarization of Episode Note"/>
<title>Continuity of Care Document (C-CDA)</title>
<effectiveTime value="20130809043133+0000"/>
<confidentialityCode codeSystem="2.16.840.1.113883.5.25" codeSystemName="HL7 Confidentiality" code="N" displayName="Normal"/>
<languageCode code="en-US"/>
<recordTarget>
<patientRole>
<id root="2.16.840.1.113883.4.6" extension="1"/>
<id root="2.16.840.1.113883.4.1" extension="123-101-5230"/>
<addr use="HP">
<streetAddressLine>1357 Amber Drive</streetAddressLine>
<city nullFlavor="UNK"/>
<state nullFlavor="UNK"/>
<postalCode>97006</postalCode>
<country nullFlavor="UNK"/>
</addr>
<telecom value="3545345" use="HP"/>
<patient>
<name use="L">
<given qualifier="BR">test</given>
<family qualifier="CL">overall</family>
<prefix qualifier="IN">Mr</prefix>
</name>
<administrativeGenderCode codeSystem="2.16.840.1.113883.5.1" codeSystemName="HL7 AdministrativeGender" code="M" displayName="Male"/>
<birthTime value="19770429"/>
<raceCode codeSystem="2.16.840.1.113883.6.238" codeSystemName="Race and Ethnicity - CDC" code="2028-9" displayName="Asian"/>
<ethnicGroupCode codeSystem="2.16.840.1.113883.6.238" codeSystemName="Race and Ethnicity - CDC" code="2135-2" displayName="Hispanic or Latino"/>
<languageCommunication>
<languageCode code="eng"/>
<preferenceInd value="true"/>
</languageCommunication>
</patient>
</patientRole>
</recordTarget>
</ClinicalDocument>
rc is not an element; it is a collection (a NodeSet) of matching elements:
results = rc.map do |node|
Nokogiri::CSS.xpath_for node.css_path
end
p results
Or, if you know there is only one matching element:
xpath = Nokogiri::CSS.xpath_for rc[0].css_path
Note that xpath_for returns an array, so you will need to extract the first element of the array:
xpath.first
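To illustrate what such a path contains, here is a rough Python sketch that builds a positional path for an element by walking up its parents with the standard-library minidom. It is not Nokogiri's implementation; the function name element_path is made up, and namespaces are ignored (tag names are used as written), so for the namespaced document in the question the result would still need prefix handling before it could be used as a query.

```python
from xml.dom import minidom


def element_path(node):
    """Build a simple positional path such as /a/b[2]/d for a minidom element.

    Hypothetical helper; ignores namespaces and only handles element nodes.
    """
    steps = []
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        # Siblings with the same tag name decide whether an index is needed.
        same = [s for s in node.parentNode.childNodes
                if s.nodeType == s.ELEMENT_NODE and s.tagName == node.tagName]
        if len(same) > 1:
            # XPath positions are 1-based.
            steps.append("{}[{}]".format(node.tagName, same.index(node) + 1))
        else:
            steps.append(node.tagName)
        node = node.parentNode
    return "/" + "/".join(reversed(steps))
```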

Parse Xml tags with attributes

I have this xml :
<document-display>
<name>
<entry lang="nl">nl Text</entry>
<entry lang="fr">fr Text</entry>
<entry lang="en">en Text</entry>
</name>
</document-display>
I would like to get the text according to the language.
I'm using XmlSlurper.
With my current code:
def parsedD = new XmlSlurper().parse(xml)
parsedD."document-display".name.entry.each {it.@lang == 'fr'}
I get a bad result, which is the concatenation of the three text contents:
nl Textfr Texten Text
Thanks for helping.
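Try find, which returns the first entry matching the condition (each just returns the whole collection, hence the concatenated text). Note that parsedD is already the root document-display element, so it does not appear in the path:
parsedD.name.entry.find { it.@lang == 'fr' }?.text()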
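The same attribute-based lookup can be sketched in Python with the standard-library ElementTree, which supports the [@attr='value'] predicate in its limited XPath subset. The function name entry_text is hypothetical, and the lang value is assumed not to contain a quote character.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question.
SAMPLE_XML = """<document-display>
<name>
<entry lang="nl">nl Text</entry>
<entry lang="fr">fr Text</entry>
<entry lang="en">en Text</entry>
</name>
</document-display>"""


def entry_text(xml_text, lang):
    """Return the text of the first <entry> whose lang attribute equals lang, or None."""
    root = ET.fromstring(xml_text)  # root is <document-display>
    entry = root.find("./name/entry[@lang='{}']".format(lang))
    return entry.text if entry is not None else None
```

As with the Groovy version, a language with no matching entry gives None instead of an error.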
