I've recently started using Nokogiri to parse data into a Rails 3 application. The problem is that I don't fully understand how to do it, because the XML I am parsing appears to be 'non-standard'. Take a look at the snippet below:
<?xml version="1.0" encoding="utf-8"?>
<dataset xmlns="http://.com/schemas/xmldata/1/" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
<!--
<dataset
xmlns="http://.com/schemas/xmldata/1/"
xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
xs:schemaLocation="http://.com/schemas/xmldata/1/ xmldata.xsd"
>
-->
<metadata>
<item name="Problem ID" type="xs:string" length="32"/>
<item name="Account Title" type="xs:string" length="162"/>
<item name="Account Name" type="xs:string" length="162"/>
<item name="Reassignment" type="xs:int" precision="1"/>
<item name="Initial Severity" type="xs:int" precision="1"/>
<item name="Resolution Desc" type="xs:string" length="510"/>
<item name="Resolver Name" type="xs:string" length="82"/>
<item name="Problem Code" type="xs:string" length="32"/>
<item name="Status" type="xs:string" length="32"/>
</metadata>
<data>
<row>
<value>AP-06684768 </value>
<value>ESA</value>
<value>1</value>
<value>8</value>
<value>8</value>
<value xs:nil="true" />
<value xs:nil="true" />
<value>ADDITION TO EXISTING FIREWALL</value>
<value></value>
<value>ESA BRIDGE </value>
<value>CLOSED </value>
<value>CLOSED </value>
</row>
<row>
<value>AP-06720564 </value>
<value>ESA</value>
<value>2011-01-19T12:02:47</value>
<value>2011-01-19T12:02:49</value>
<value>0</value>
<value>776</value>
<value>SCP UESCADADEV -> UESCADAPW/BW</value>
<value>NETAU_NETMGTS </value>
<value>N/A</value>
<value>ESA BRIDGE </value>
<value>CLOSED </value>
<value>CLOSED </value>
</row>
</data>
</dataset>
Instead of having named nodes and attributes, it seems to have a 'metadata' section followed by rows, much like a table. How would I parse all this data?
require 'rubygems'
require 'nokogiri'
require 'pp'
doc = Nokogiri::XML(DATA)
column_names = doc.css('dataset > metadata > item').map {|a| a['name']}
result = doc.css('dataset > data > row').map do |row|
values = row.css('value').map { |value| value[:nil] == 'true' ? nil : value.content }
Hash[column_names.zip(values)]
end
pp result
results in
[{"Problem Code"=>"ADDITION TO EXISTING FIREWALL",
"Resolution Desc"=>nil,
"Reassignment"=>"8",
"Resolver Name"=>nil,
"Status"=>"",
"Problem ID"=>"AP-06684768 ",
"Account Name"=>"1",
"Initial Severity"=>"8",
"Account Title"=>"ESA"},
{"Problem Code"=>"NETAU_NETMGTS ",
"Resolution Desc"=>"776",
"Reassignment"=>"2011-01-19T12:02:49",
"Resolver Name"=>"SCP UESCADADEV -> UESCADAPW/BW",
"Status"=>"N/A",
"Problem ID"=>"AP-06720564 ",
"Account Name"=>"2011-01-19T12:02:47",
"Initial Severity"=>"0",
"Account Title"=>"ESA"}]
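One thing worth checking in the result above: each sample row carries 12 value elements while the metadata declares only 9 items, and zip pairs by position and silently drops whatever is left over. The following is a minimal sketch, not part of the answer, of how a row could be checked and cleaned up once the raw strings have been extracted. The helper cast_cell and the columns/raw_values data are hypothetical stand-ins; the only thing taken from the XML is the type attribute (xs:string, xs:int).

```ruby
# Hypothetical helper (not part of Nokogiri): cast one raw cell using the
# "type" attribute declared for its column in <metadata>.
def cast_cell(raw, type)
  return nil if raw.nil? # xs:nil cells were already mapped to nil

  text = raw.strip # drop padding such as "CLOSED "
  type == 'xs:int' ? Integer(text, exception: false) : text
end

# Stand-ins for what the Nokogiri code above would extract.
columns = [
  { name: 'Problem ID',   type: 'xs:string' },
  { name: 'Reassignment', type: 'xs:int' },
  { name: 'Status',       type: 'xs:string' }
]
raw_values = ['AP-06684768 ', '8', nil, 'CLOSED '] # one cell too many

# Make the mismatch visible instead of letting zip hide it.
if raw_values.size != columns.size
  warn "row has #{raw_values.size} values for #{columns.size} columns"
end

record = columns.zip(raw_values).to_h do |column, raw|
  [column[:name], cast_cell(raw, column[:type])]
end
```

Whether a mismatched row should be skipped, padded, or raised on depends on how far the feed can be trusted; the warning here is just the least intrusive option.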
Here's working code that I hacked out and tested:
require 'rubygems'
require 'nokogiri'
class Item
attr_accessor :name
def initialize(name)
@name = name
end
end
file = File.open("data.xml")
document = Nokogiri::XML(file)
file.close
metadata = document.root.children[3]
items = metadata.children.reject{|child| child.attribute('name').nil?}.map do |child|
Item.new(child.attribute('name').value)
end
puts "#{items.size} items"
puts items.inspect
Results:
[~/stackoverflow/graphML] ruby parse.rb
9 items
[#<Item:0x007fc01c0fbd90 @id="Problem ID">, #<Item:0x007fc01c0fbca0 @id="Account Title">, #<Item:0x007fc01c0fbc28 @id="Account Name">, #<Item:0x007fc01c0fbbb0 @id="Reassignment">, #<Item:0x007fc01c0fbb38 @id="Initial Severity">, #<Item:0x007fc01c0fbac0 @id="Resolution Desc">, #<Item:0x007fc01c0fba48 @id="Resolver Name">, #<Item:0x007fc01c0fb9d0 @id="Problem Code">, #<Item:0x007fc01c0fb868 @id="Status">]
Here's the full project on GitHub: https://github.com/endymion/GraphML-parsing-exercise/tree/metadata-key-names
(It's a branch of a GraphML parsing exercise that I hacked out earlier tonight for somebody else on Stack Overflow.)
I want to sort this XML so that demographics of the same type appear together: all stat_type="REACH" on top, then all the clicks, and so on.
Here is an example object:
<?xml version="1.0"?>
<properties date="2020-06-23">
<property>
<order start="2020-06-23" end="2020-06-23">52658</order>
<demographics demographic="Age" stat_type="REACH">
<value category="18-24">36</value>
<value category="25-34">149</value>
</demographics>
<demographics demographic="Age" stat_type="CLICK">
<value category="18-24">6</value>
<value category="25-34">37</value>
</demographics>
<demographics demographic="Gender" stat_type="REACH">
<value category="female">402</value>
<value category="male">188</value>
</demographics>
<demographics demographic="Gender" stat_type="CLICK">
<value category="female">107</value>
<value category="male">44</value>
</demographics>
</property>
</properties>
I'm able to iterate over the XML, but I'm unable to sort it.
@doc = Nokogiri::XML(File.open("public/test.xml"))
builder = @doc.xpath("//property")
builder.search('./demographics').sort_by{|t| puts t['stat_type']}.each do |table|
puts table.to_s
end
I need the final XML in this form.
<?xml version="1.0"?>
<properties date="2020-06-23">
<property>
<order start="2020-06-23" end="2020-06-23">PBNI152658</order>
<demographics demographic="Age" stat_type="REACH">
<value category="18-24">36</value>
<value category="25-34">149</value>
</demographics>
<demographics demographic="Gender" stat_type="REACH">
<value category="female">402</value>
<value category="male">188</value>
</demographics>
<demographics demographic="Age" stat_type="CLICK">
<value category="18-24">6</value>
<value category="25-34">37</value>
</demographics>
<demographics demographic="Gender" stat_type="CLICK">
<value category="female">107</value>
<value category="male">44</value>
</demographics>
</property>
</properties>
When you do things like builder.search('./demographics') you just create a new nodeset with some nodes filtered from the initial XML document. Even if you sort this new nodeset you don't affect the initial document itself.
To sort the nodes of the initial document you have to rebuild the children of the node in question (<property> in your case). And here comes a tiny additional challenge - there are more nodes parsed by Nokogiri to take into account, not only the ones to sort:
pry(main)> @doc.at_xpath("//property").children.map(&:node_name)
=> ["text", "order", "text", "demographics", "text", "demographics", "text", "demographics", "text", "demographics", "text"]
So, what we have to do is to sort demographics nodes only and keep everything else untouched. One of the ways to do this is:
property_node = @doc.at_xpath("//property")
nodes_to_sort = property_node.children.dup
# My sorting logic is dumb here, apply your own as necessary
sorted_demographics = nodes_to_sort.select { |n| n.node_name == "demographics" }.sort_by { |n| n.attr("stat_type") }.reverse
# Create an empty nodeset. There should be a more idiomatic and readable way but this trick works too
new_nodeset = nodes_to_sort - nodes_to_sort
nodes_to_sort.each do |n|
case n.node_name
when "demographics"
new_nodeset << sorted_demographics.shift
else
new_nodeset << n
end
end
property_node.children = new_nodeset
And voila! - we are sorted now:
pry(main)> puts @doc
<?xml version="1.0"?>
<properties date="2020-06-23">
<property>
<order start="2020-06-23" end="2020-06-23">52658</order>
<demographics demographic="Gender" stat_type="REACH">
<value category="female">402</value>
<value category="male">188</value>
</demographics>
<demographics demographic="Age" stat_type="REACH">
<value category="18-24">36</value>
<value category="25-34">149</value>
</demographics>
<demographics demographic="Gender" stat_type="CLICK">
<value category="female">107</value>
<value category="male">44</value>
</demographics>
<demographics demographic="Age" stat_type="CLICK">
<value category="18-24">6</value>
<value category="25-34">37</value>
</demographics>
</property>
</properties>
NB. Take the solution above with a grain of salt - I don't know nokogiri's XML building capabilities well, so chances are there are some ways to achieve the same result with less code/in a more idiomatic way.
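The slot-filling idea above does not depend on Nokogiri, so it can be sketched on plain Ruby hashes standing in for the children of the property node. This version makes two assumptions that are not in the answer: an explicit priority list (stat_order) instead of reverse alphabetical order, and a tie-break on the original position so that Age stays ahead of Gender within each stat type, as in the output the question asks for.

```ruby
# Plain hashes stand in for the children of <property>; with Nokogiri the
# same checks would use node_name and attr("stat_type").
children = [
  { name: 'text' },
  { name: 'order' },
  { name: 'demographics', stat_type: 'REACH', demographic: 'Age' },
  { name: 'text' },
  { name: 'demographics', stat_type: 'CLICK', demographic: 'Age' },
  { name: 'demographics', stat_type: 'REACH', demographic: 'Gender' },
  { name: 'demographics', stat_type: 'CLICK', demographic: 'Gender' }
]

# Assumed priority list; extend it with whatever stat types the feed contains.
stat_order = %w[REACH CLICK]

# Sort by priority, then by original position. sort_by is not guaranteed to
# be stable, so the index is what keeps equal stat types in document order.
sorted = children
         .select { |node| node[:name] == 'demographics' }
         .each_with_index
         .sort_by { |node, index| [stat_order.index(node[:stat_type]) || stat_order.size, index] }
         .map(&:first)

# Refill each demographics slot in order; every other node keeps its position.
queue = sorted.dup
rebuilt = children.map { |node| node[:name] == 'demographics' ? queue.shift : node }
```

Unknown stat types sort after the listed ones here; that is a choice made for the sketch, not something the question specifies.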
I have some XML that I want to convert into a Hash:
<meta_description><language id="1"></language><language id="2"></language></meta_description>
<meta_keywords><language id="1"></language><language id="2"></language></meta_keywords>
<meta_title><language id="1"></language><language id="2" ></language></meta_title>
<link_rewrite><language id="1" >konsult-500-krtim</language><language id="2" >konsult-500-krtim</language></link_rewrite>
<name><language id="1" >Konsult 500 kr/tim</language><language id="2" >Konsult 500 kr/tim</language></name>
<description><language id="1" ></language><language id="2" ></language></description>
<description_short><language id="1" ></language><language id="2" ></language></description_short>
<available_now><language id="1" ></language><language id="2" ></language></available_now>
<available_later><language id="1" ></language><language id="2" ></language></available_later>
<associations>
<categories nodeType="category" api="categories">
<category>
<id>2</id>
</category>
</categories>
<images nodeType="image" api="images"/>
<combinations nodeType="combination" api="combinations"/>
<product_option_values nodeType="product_option_value" api="product_option_values"/>
<product_features nodeType="product_feature" api="product_features"/>
<tags nodeType="tag" api="tags"/>
<stock_availables nodeType="stock_available" api="stock_availables">
<stock_available>
<id>111</id>
<id_product_attribute>0</id_product_attribute>
</stock_available>
</stock_availables>
<accessories nodeType="product" api="products"/>
<product_bundle nodeType="product" api="products"/>
</associations>
I want to convert this XML into a Hash.
I tried to find a function that converts this XML into a Hash (h = Hash.new).
How do I do this?
There is ActiveSupport's Hash#from_xml method that you can use:
xml = File.read("data.xml") # if your XML is in the 'data.xml' file
Hash.from_xml(xml)
If you are using Rails you can use the answer above; otherwise you can require the ActiveSupport gem:
require 'active_support/core_ext/hash'
xml = '<foo>bar</foo>'
hash = Hash.from_xml(xml)
=>{"foo"=>"bar"}
Note this will only work with valid XML. See the comments on the question. Also note that element attributes like id="1" won't convert back the same way. For example:
xml = %q(
<root>
<foo id="1"></foo>
<bar id="2"></bar>
</root>).strip
hash = Hash.from_xml(xml)
=>{"root"=>{"foo"=>{"id"=>"1"}, "bar"=>{"id"=>"2"}}}
puts hash.to_xml
# will output
<?xml version="1.0" encoding="UTF-8"?>
<hash>
<root>
<foo>
<id>1</id>
</foo>
<bar>
<id>2</id>
</bar>
</root>
</hash>
Use Nokogiri to parse the XML response into a Ruby hash. It's pretty fast.
require 'active_support/core_ext/hash' #from_xml
require 'nokogiri'
doc = Nokogiri::XML(response_body)
Hash.from_xml(doc.to_s)
I set up a simple XML feed for a vendor we're using (who refuses to read JSON).
<recipes type="array">
<recipe>
<id type="integer">1</id>
<name>
Hamburgers
</name>
<producturl>
http://test.com
</producturl>
...
</recipe>
...
<recipe>
However, the vendor requests that instead of having an id node, id is an attribute in the parent node. e.g.
<recipes type="array">
<recipe id="1">
<name>
Hamburgers
</name>
<producturl>
http://test.com
</producturl>
...
</recipe>
...
<recipe>
I'm building this with (basically)
xml_feed = []
recipes.each do |recipe|
xml_feed <<{id: recipe.id, name: recipe.name, ...}
end
...
render xml: xml_feed.to_xml(root: 'recipes')
But I'm unsure of how to include the id (or any field) as an attribute in the parent node like that. I googled around and couldn't find anything, nor were the http://api.rubyonrails.org/classes/ActiveRecord/Serialization.html docs very helpful
Thanks!
I would suggest you use the nokogiri gem. It provides all you could possibly need for handling XML.
builder = Nokogiri::XML::Builder.new do |xml|
xml.root {
xml.objects {
xml.object.classy.thing!
}
}
end
puts builder.to_xml
<?xml version="1.0"?>
<root>
<objects>
<object class="classy" id="thing"/>
</objects>
</root>
The suggestion to use Nokogiri is fine. The syntax just needs to be a little different to achieve what you have requested:
builder = Nokogiri::XML::Builder.new do |xml|
xml.root {
xml.object('type' => 'Client') {
xml.name 'John'
}
}
end
puts builder.to_xml
<?xml version="1.0"?>
<root>
<object type="Client">
<name>John</name>
</object>
</root>
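Applied to the recipe feed from the question, the same pattern (attributes passed as a hash when the element is created, children added inside it) might look like the sketch below. It uses REXML rather than Nokogiri, purely as a swap-in so the example runs with only the libraries bundled with Ruby; the recipes array is made-up sample data standing in for the real records.

```ruby
require 'rexml/document'

# Made-up sample data standing in for the records in the question.
recipes = [
  { id: 1, name: 'Hamburgers', producturl: 'http://test.com' },
  { id: 2, name: 'Hot Dogs',   producturl: 'http://test.com' }
]

doc = REXML::Document.new
root = doc.add_element('recipes', 'type' => 'array')

recipes.each do |recipe|
  # The id goes on the <recipe> element itself as an attribute...
  node = root.add_element('recipe', 'id' => recipe[:id].to_s)
  # ...and the remaining fields become child elements.
  node.add_element('name').text = recipe[:name]
  node.add_element('producturl').text = recipe[:producturl]
end

xml_out = doc.to_s
puts xml_out
```

In a Rails controller the resulting string could then be passed to render xml: in place of the to_xml call.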
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<MasterDetailsResponse xmlns="http://192.168.100.173/ArvindMill/">
<MasterDetailsResult>
<GetAllMasterDetail>
<TABLENAME>item_master</TABLENAME>
<ITEMID>1</ITEMID>
<ITEMTYPE>CTS</ITEMTYPE>
<GROUP />
<VARIETY />
<FORM />
<STATUS />
<ITEM />
<GRADE />
<TYPE />
</GetAllMasterDetail>
<GetAllMasterDetail>
<TABLENAME>item_master</TABLENAME>
<ITEMID>2</ITEMID>
<ITEMTYPE>AGS</ITEMTYPE>
<GROUP /><VARIETY />
<FORM />
<STATUS />
<ITEM />
<GRADE />
<TYPE />
</GetAllMasterDetail>
<GetAllMasterDetail>
<TABLENAME>tablet_taluka_master</TABLENAME>
<VILLAGE>Anturli</VILLAGE>
<TALUKA>Anturli</TALUKA>
<TABLETUSERCODE />
<TABLETUSERNAME /><TABLETCODE />
<TABLETTALUKAID />
</GetAllMasterDetail>
<GetAllMasterDetail>
<TABLENAME>tablet_taluka_master</TABLENAME>
<VILLAGE>Bortha</VILLAGE>
<TALUKA>Sadgavan</TALUKA>
<TABLETUSERCODE /><TABLETUSERNAME />
<TABLETCODE />
<TABLETTALUKAID /></GetAllMasterDetail>
<GetAllMasterDetail>
<TABLENAME>tablet_taluka_master</TABLENAME>
<VILLAGE>Kukarmunda</VILLAGE>
<TALUKA>Kukarmunda</TALUKA>
<TABLETUSERCODE />
<TABLETUSERNAME /><TABLETCODE /><TABLETTALUKAID />
</GetAllMasterDetail>
The above is the response returned from a SoapObject in Android. How do I retrieve the data from this XML? I want to display it in a list view, and the response contains more than one table, so how do I retrieve each of them?
If you are using Java, use a SAX parser.
How do you define a list of complex type items in WSDL?
I have a rather simple WSDL with 2 complex types
<xsd:complexType name="itemProperty">
<xsd:all>
<xsd:element name="name" type="xsd:string" />
<xsd:element name="value" type="xsd:string" />
<xsd:element name="type" type="xsd:string" />
</xsd:all>
</xsd:complexType>
Then I'm trying to make a list of this complexType like this:
<xsd:complexType name="itemPropertyList">
<xsd:complexContent>
<xsd:restriction base="SOAP-ENC:Array">
<xsd:sequence>
<xsd:element name="item" type="tns:itemProperty"
maxOccurs="unbounded" minOccurs="0" />
</xsd:sequence>
</xsd:restriction>
</xsd:complexContent>
</xsd:complexType>
I intend to use this list
<message name="getListRequest"></message>
<message name="getListResponse">
<part name="return" type="tns:itemPropertyList" />
</message>
<operation name="getList">
<documentation>Returns an array.</documentation>
<input message="tns:getListRequest" />
<output message="tns:getListResponse" />
</operation>
Instead of a list of elements of type itemProperty, I get this reply, no matter what variations I've tried (including replacing the base item with the explicit string elements):
<SOAP-ENV:Envelope SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<SOAP-ENV:Body>
<ns1:getListResponse>
<return SOAP-ENC:arrayType="ns2:Map[1]" xsi:type="SOAP-ENC:Array">
<item xsi:type="ns2:Map">
<item>
<key xsi:type="xsd:string">name</key>
<value xsi:type="xsd:string">name_4c3b38b0b77ae</value>
</item>
<item>
<key xsi:type="xsd:string">value</key>
<value xsi:type="xsd:string">name_4c3b38b0b77ee</value>
</item>
<item>
<key xsi:type="xsd:string">type</key>
<value xsi:type="xsd:string">name_4c3b38b0b782b</value>
</item>
</item>
</return>
</ns1:getListResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
Any ideas? What is this ns2:Map thing? It's been haunting me for over a week!
Solved.
I used the AXIS model for delivering lists. This involved extending the namespace attributes to include some extra encodings. I don't know which one did the trick; I just added as many as possible while resolving conflicts with the help of Eclipse's WSDL editor.
<definitions xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:tns="urn:mynamespace"
xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
xmlns="http://schemas.xmlsoap.org/wsdl/"
targetNamespace="urn:mynamespace"
xmlns:ns1="http://org.apache.axis2/xsd"
xmlns:wsaw="http://www.w3.org/2006/05/addressing/wsdl"
xmlns:http="http://schemas.xmlsoap.org/wsdl/http/"
xmlns:ax21="http://example.org/xsd"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:mime="http://schemas.xmlsoap.org/wsdl/mime/"
xmlns:soap12="http://schemas.xmlsoap.org/wsdl/soap12/"
xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/">
Also I added 2 extra attributes to declare qualified-form attributes and elements within the schema
<xsd:schema targetNamespace="urn:mynamespace" attributeFormDefault="qualified" elementFormDefault="qualified">
...
</xsd:schema>
Instead of relying on the ComplexType declaration to make a "nillable" unbounded sequence of a complex type within my schema, I switched to declare an element like this:
<xsd:element name="getListResponse">
<xsd:complexType>
<xsd:sequence>
<xsd:element maxOccurs="unbounded" minOccurs="0" name="return" nillable="true" type="tns:itemProperty" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
Then, when defining the message part for the operation I used
<message name="getListResponse">
<part name="parameters" element="tns:getListResponse" />
</message>
instead of
<message name="getListResponse">
<part name="return" type="tns:itemPropertyList" />
</message>
This resulted in a correct envelope being returned:
<SOAP-ENV:Envelope SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="urn:mynamespace" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/">
<SOAP-ENV:Body>
<ns1:getListResponse>
<parameters xsi:type="ns1:getListResponse">
<return xsi:type="ns1:itemProperty">
<name xsi:type="xsd:string">name4c4417b644a8e</name>
<value xsi:type="xsd:string">value4c4417b644aaa</value>
<type xsi:type="xsd:string">type4c4417b644ae8</type>
</return>
<return xsi:type="ns1:itemProperty">
<name xsi:type="xsd:string">name4c4417b644b26</name>
<value xsi:type="xsd:string">value4c4417b644b64</value>
<type xsi:type="xsd:string">type4c4417b644ba1</type>
</return>
<return xsi:type="ns1:itemProperty">
<name xsi:type="xsd:string">name4c4417b644bdf</name>
<value xsi:type="xsd:string">value4c4417b644c1c</value>
<type xsi:type="xsd:string">type4c4417b644c59</type>
</return>
</parameters>
</ns1:getListResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>