I am working on a project that parses an XML file into a certain table structure with Nokogiri. At the moment I have this in my controller:
def new
  doc = Nokogiri::HTML(open('sample3.xml'))
  @home = doc.xpath('//match').map do |i|
    {'title' => i.at('home')['name'], 'away' => i.at('away')['name']}
  end
end
And this is the format of the XML file:
<league country="worldcup" cup="True" id="2889" name="World: World Cup" sub_id="63638038137">
<matches date="12.06.2014">
<match alternate_id="3844428" alternate_id_2="4013768" date="12.06.2014" id="3551903" status="20:00" time="20:00">
<home alternate_id="536380381512" id="2338917" name="Brazil"/>
<away alternate_id="536380381513" id="2340076" name="Croatia"/>
<odds>
<type id="766" name="1x2">
<bookmaker id="947" name="12Bet">
<odd name="1" value="1.27"/>
<odd name="2" value="9.56"/>
<odd name="X" value="5.32"/>
</bookmaker>
</type>
<type id="767" name="Home/Away">
<bookmaker id="821" name="188Bet">
<odd name="1" value="1.04"/>
<odd name="2" value="8.50"/>
</bookmaker>
</type>
</odds>
</match>
</matches>
</league>
The code above selects the home team and the away team. But how can I write code that selects the odd values of the type with name="1x2"?
Thanks.
Regards,
Yam
Try this; it may be useful for you:
>> f = File.open("sample3.xml")
=> #<File:sample3.xml>
>> doc = Nokogiri::XML(f)
>> root = doc.root
>> # again here you'll see the complete XML document output to the console.
>> root["id"]
=> "2889"
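To reach the odds themselves, an XPath predicate on the name attribute is one option. The following is only a sketch under a few assumptions: it uses REXML (which ships with Ruby) so that it runs without extra gems, it inlines a trimmed copy of the sample, and it assumes the first <type> element is closed before the second one starts. The same XPath string should also work with Nokogiri through doc.xpath.

```ruby
require "rexml/document"

# Trimmed copy of the sample feed; the first <type> is closed here
# (an assumption) so that the XML is well-formed.
xml = <<~XML
  <league name="World: World Cup">
    <matches date="12.06.2014">
      <match id="3551903">
        <home name="Brazil"/>
        <away name="Croatia"/>
        <odds>
          <type id="766" name="1x2">
            <bookmaker id="947" name="12Bet">
              <odd name="1" value="1.27"/>
              <odd name="2" value="9.56"/>
              <odd name="X" value="5.32"/>
            </bookmaker>
          </type>
          <type id="767" name="Home/Away">
            <bookmaker id="821" name="188Bet">
              <odd name="1" value="1.04"/>
              <odd name="2" value="8.50"/>
            </bookmaker>
          </type>
        </odds>
      </match>
    </matches>
  </league>
XML

doc = REXML::Document.new(xml)

# Select only the <odd> elements below the <type> whose name attribute is "1x2".
odds = REXML::XPath.match(doc, "//type[@name='1x2']/bookmaker/odd")

# Map each odd's name ("1", "2", "X") to its value.
odds_1x2 = odds.to_h { |odd| [odd.attributes["name"], odd.attributes["value"]] }
p odds_1x2
```

With Nokogiri the equivalent query would be doc.xpath("//type[@name='1x2']/bookmaker/odd"), reading odd['value'] from each node.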
I have an array of database objects, @configs, that I want to convert to XML, but the output is not what I expected. Every entry is enclosed in a <map> tag instead of an <entry> tag; I only want <map> to be the XML root. How do I build the XML with a <map> root and put each entry in an <entry> tag?
Thank you in advance for your help and time!
Here is my code:
entries = Array.new
entry = Hash.new
conf = Hash.new
@configs.each do |config|
entry.store('string', config.key)
conf.store('value', config.value)
conf.store('comment', config.comment)
entry.store('com.mirth.connect.util.ConfigurationProperty', conf)
entries << entry
end
pp entries.to_xml(:root => 'map', :indent => 0, :skip_types => true)
And the result is:
<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<map>
<map>
<string>PNB_ALERTLOG_RECEIVER_CHANNEL</string>
<com.mirth.connect.util.ConfigurationProperty>
<value>PNB_ALERTLOG_RECEIVER</value>
<comment>Canal que irá receber tudo o que for logged com Warning e Error</comment>
</com.mirth.connect.util.ConfigurationProperty>
</map>
<map>
<string>PNB_CFG_FILE_ACCESS_CONTROL</string>
<com.mirth.connect.util.ConfigurationProperty>
<value>resources/configPnbDev/pnbAccessControl.json</value>
<comment>Este ficheiro permite configurar Autenticação e Controlo de Acessos.</comment>
</com.mirth.connect.util.ConfigurationProperty>
</map>
<map>
<string>PNB_CFG_FILE_CONNECTION_POOLS</string>
<com.mirth.connect.util.ConfigurationProperty>
<value>resources/configPnbDev/pnbConnectionPools.json</value>
<comment>Configuração de Oracle Universal Connection Pools usadas pelo PNB (PEM, RCU2)</comment>
</com.mirth.connect.util.ConfigurationProperty>
</map>
<map>
<string>PNB_CFG_FILE_CSP_MC_EXCLUSIONS</string>
<com.mirth.connect.util.ConfigurationProperty>
<value>resources/configPnbDev/medCronExclusions/mcExclCurrentRevision.json</value>
<comment>N/A</comment>
</com.mirth.connect.util.ConfigurationProperty>
</map>
<map>
<string>PNB_CFG_FILE_FACILITIES_ALIAS</string>
<com.mirth.connect.util.ConfigurationProperty>
<value>resources/configPnbDev/snsFacilitiesAlias.json</value>
<comment>Mapa de alias do codigo das instituicoes do SNS.</comment>
</com.mirth.connect.util.ConfigurationProperty>
</map>
</map>
What I wanted:
<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<map>
<entry>
<string>PNB_ALERTLOG_RECEIVER_CHANNEL</string>
<com.mirth.connect.util.ConfigurationProperty>
<value>PNB_ALERTLOG_RECEIVER</value>
<comment>Canal que irá receber tudo o que for logged com Warning e Error</comment>
</com.mirth.connect.util.ConfigurationProperty>
</entry>
<entry>
<string>PNB_CFG_FILE_ACCESS_CONTROL</string>
<com.mirth.connect.util.ConfigurationProperty>
<value>resources/configPnbDev/pnbAccessControl.json</value>
<comment>Este ficheiro permite configurar Autenticação e Controlo de Acessos.</comment>
</com.mirth.connect.util.ConfigurationProperty>
</entry>
<entry>
<string>PNB_CFG_FILE_CONNECTION_POOLS</string>
<com.mirth.connect.util.ConfigurationProperty>
<value>resources/configPnbDev/pnbConnectionPools.json</value>
<comment>Configuração de Oracle Universal Connection Pools usadas pelo PNB (PEM, RCU2)</comment>
</com.mirth.connect.util.ConfigurationProperty>
</entry>
<entry>
<string>PNB_CFG_FILE_CSP_MC_EXCLUSIONS</string>
<com.mirth.connect.util.ConfigurationProperty>
<value>resources/configPnbDev/medCronExclusions/mcExclCurrentRevision.json</value>
<comment>N/A</comment>
</com.mirth.connect.util.ConfigurationProperty>
</entry>
<entry>
<string>PNB_CFG_FILE_FACILITIES_ALIAS</string>
<com.mirth.connect.util.ConfigurationProperty>
<value>resources/configPnbDev/snsFacilitiesAlias.json</value>
<comment>Mapa de alias do codigo das instituicoes do SNS.</comment>
</com.mirth.connect.util.ConfigurationProperty>
</entry>
</map>
Try this:
pp entries.to_xml(:root => 'map', :children => 'entry', :indent => 0, :skip_types => true)
source: http://apidock.com/rails/Array/to_xml
Suppose entry is the following hash:
entry = {
  a: "hello",
  b: "goodbye",
}
If you write:
entries = []
entries << entry
p entries
then the output is:
[{:a=>"hello", :b=>"goodbye"}]
So if you then write:
p entries.to_xml
how do you suppose the word “entry” will ever appear in the output? That's sort of like expecting the output of:
x = 10
y = 20
puts x+y
to include the letters "x" and "y" somewhere.
According to the to_xml() docs for arrays:
Returns a string ... by invoking to_xml on each element.
The options hash is passed downwards.
http://apidock.com/rails/Array/to_xml
Because the options hash is passed downwards, specifying {root: "map"} in the to_xml() call on the array does two things: <map> becomes the root of the XML, and to_xml() is then called on each array element with the same option {root: "map"}, which wraps each element in a <map> tag as well. For instance:
puts [{a: 10, b: 20}, {a: 100, b: 200}].to_xml({root: "map"})
--output:--
<?xml version="1.0" encoding="UTF-8"?>
<map type="array">
<map>
<a type="integer">10</a>
<b type="integer">20</b>
</map>
<map>
<a type="integer">100</a>
<b type="integer">200</b>
</map>
</map>
The nested <map> tags are a side effect of a feature built into to_xml(): if you specify a plural name for the :root option when calling to_xml() on an array, e.g. "maps", then when Rails calls to_xml() on each element of the array it passes the singular "map" as the :root option. That makes some sense: if you call to_xml() on an array with :root set to "maps", each array element is probably a "map". Here the root "map" is already singular, so the elements get the same name as the root, which isn't what you want.
Luckily, as mr_sudaca pointed out, there is this:
By default name of the node for the children of root is
root.singularize. You can change it with the :children option.
http://apidock.com/rails/Array/to_xml
As a result, this code:
require 'ostruct'
configs = [
OpenStruct.new(
key: "PNB_ALERTLOG_RECEIVER_CHANNEL",
value: "PNB_ALERTLOG_RECEIVER",
comment: "Canal que...",
),
OpenStruct.new(
key: "PNB_CFG_FILE_ACCESS_CONTROL",
value: "resources/configPnbDev/pnbAccessControl.json",
comment: "Este ficheiro...",
)
]
entries = []
configs.each do |config|
entry = {}
conf = {}
entry.store('string', config.key)
conf.store('value', config.value)
conf.store('comment', config.comment)
entry.store('com.mirth.connect.util.ConfigurationProperty', conf)
entries << entry
end
p entries
puts entries.to_xml(:root => 'map', children: "entry", :skip_types => true)
produces the output:
<?xml version="1.0" encoding="UTF-8"?>
<map>
<entry>
<string>PNB_ALERTLOG_RECEIVER_CHANNEL</string>
<com.mirth.connect.util.ConfigurationProperty>
<value>PNB_ALERTLOG_RECEIVER</value>
<comment>Canal que...</comment>
</com.mirth.connect.util.ConfigurationProperty>
</entry>
<entry>
<string>PNB_CFG_FILE_ACCESS_CONTROL</string>
<com.mirth.connect.util.ConfigurationProperty>
<value>resources/configPnbDev/pnbAccessControl.json</value>
<comment>Este ficheiro...</comment>
</com.mirth.connect.util.ConfigurationProperty>
</entry>
</map>
It looks to me like you also have a problem with your entry and conf hashes: they are created once, outside the loop, so every element of the entries array refers to the same entry hash and the same conf hash. Because the loop keeps overwriting those hashes, every element ends up showing the key/values set in the last iteration.
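The aliasing issue can be reproduced without Rails. This is a minimal sketch with made-up data; ConfigRow and its two rows are hypothetical stand-ins for the database objects.

```ruby
# Hypothetical stand-in for the database objects.
ConfigRow = Struct.new(:key, :value)
configs = [ConfigRow.new("A", "1"), ConfigRow.new("B", "2")]

# Buggy: one hash is created outside the loop and pushed repeatedly.
shared = {}
aliased = []
configs.each do |config|
  shared.store("string", config.key)
  aliased << shared
end

# Fixed: a fresh hash is built on every iteration.
fresh = configs.map { |config| { "string" => config.key } }

p aliased # every element is the same object, holding the last key
p fresh   # one distinct hash per config
```

Creating the hashes inside the loop, as the code above the output does with entry = {} and conf = {}, avoids the problem.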
I'm trying to parse some simple XML data with Nokogiri.
This is my XML:
POST /.... HTTP/1.1
Host: ....
Content-Type: text/xml; charset=utf-8
Content-Length: length
SOAPAction: "http://...."
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="...." xmlns:xsd="...." xmlns:soap="....">
<soap:Body>
<WS_QueryOnSec xmlns="......">
<type>string</type>
<ID>string</ID>
</WS_QueryOnSec>
</soap:Body>
</soap:Envelope>
and this is my simple request:
require "nokogiri"
@doc = Nokogiri::XML(request.body.read)
@something = @doc.at('type').inner_html
But Nokogiri cannot find the type or ID node.
When I change the data to this, everything works fine:
<soap:Body>
<type>string</type>
<ID>string</ID>
</soap:Body>
It seems the problem is the raw text above the data, and the nodes with xmlns or other attributes.
What do you recommend to resolve this?
The first "XML" isn't XML. It's text that contains XML. Remove the header information down to the blank line and try it again.
I think it'd help you to read the XML spec or to read some tutorials about creating XML which will help you understand how it's defined. XML is a tight specification and doesn't allow any deviation. The syntax is pretty flexible, but you have to play by its rules.
Consider these examples:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
foo
<root>
<node />
</root>
EOT
doc.errors # => [#<Nokogiri::XML::SyntaxError: Start tag expected, '<' not found>]
Removing the text that is outside the root tag results in a proper parse:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<root>
<node />
</root>
EOT
doc.errors # => []
<root> isn't necessarily the name of the "root" node; the root is just the outermost tag:
doc = Nokogiri::XML(<<EOT)
<foo>
<node />
</foo>
EOT
doc.errors # => []
and still results in a valid DOM/internal representation of the document:
puts doc.to_html
# >> <foo>
# >> <node></node>
# >> </foo>
Your XML sample is using namespaces, which complicate matters somewhat. The Nokogiri documentation talks about how to deal with them, so you'll want to understand that part of parsing XML because you'll encounter it again. Here's the easy way of working with them:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="utf-8"?>
<Envelope xmlns:xsi="...." xmlns:xsd="...." xmlns:soap="....">
<Body>
<WS_QueryOnSec xmlns="......">
<type>string</type>
<ID>string</ID>
</WS_QueryOnSec>
</Body>
</Envelope>
EOT
namespaces = doc.collect_namespaces
doc.at('type', namespaces).text # => "string"
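Both steps, dropping the header text and then doing a namespace-aware lookup, can be sketched together. This is an illustration under assumptions, not a drop-in fix: it uses REXML (bundled with Ruby) instead of Nokogiri so it runs without extra gems, the request path and the urn:example:* namespace URIs are placeholders because the question elides the real ones, and it only applies if the text being parsed really does start with the header lines.

```ruby
require "rexml/document"

# Raw request text: HTTP headers, a blank line, then the XML body.
# The path and the namespace URIs are placeholders.
raw = <<~REQUEST
  POST /service HTTP/1.1
  Content-Type: text/xml; charset=utf-8

  <?xml version="1.0" encoding="utf-8"?>
  <soap:Envelope xmlns:soap="urn:example:soap">
    <soap:Body>
      <WS_QueryOnSec xmlns="urn:example:query">
        <type>string</type>
        <ID>string</ID>
      </WS_QueryOnSec>
    </soap:Body>
  </soap:Envelope>
REQUEST

# Keep only what follows the first blank line (the header/body separator).
_headers, body = raw.split(/\r?\n\r?\n/, 2)

doc = REXML::Document.new(body)

# <type> and <ID> live in the default namespace declared on WS_QueryOnSec,
# so the XPath binds a prefix to that URI ("q" is an arbitrary local name).
namespaces = { "q" => "urn:example:query" }
type_node = REXML::XPath.first(doc, "//q:type", namespaces)
id_node   = REXML::XPath.first(doc, "//q:ID", namespaces)

puts type_node.text
puts id_node.text
```

Binding an explicit prefix is more verbose than collect_namespaces, but it keeps the lookup independent of which prefixes the sender happens to use.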
I have some XML that I want to convert into a Hash:
<meta_description><language id="1"></language><language id="2"></language></meta_description>
<meta_keywords><language id="1"></language><language id="2"></language></meta_keywords>
<meta_title><language id="1"></language><language id="2" ></language></meta_title>
<link_rewrite><language id="1" >konsult-500-krtim</language><language id="2" >konsult-500-krtim</language></link_rewrite>
<name><language id="1" >Konsult 500 kr/tim</language><language id="2" >Konsult 500 kr/tim</language></name>
<description><language id="1" ></language><language id="2" ></language></description>
<description_short><language id="1" ></language><language id="2" ></language></description_short>
<available_now><language id="1" ></language><language id="2" ></language></available_now>
<available_later><language id="1" ></language><language id="2" ></language></available_later>
<associations>
<categories nodeType="category" api="categories">
<category>
<id>2</id>
</category>
</categories>
<images nodeType="image" api="images"/>
<combinations nodeType="combination" api="combinations"/>
<product_option_values nodeType="product_option_value" api="product_option_values"/>
<product_features nodeType="product_feature" api="product_features"/>
<tags nodeType="tag" api="tags"/>
<stock_availables nodeType="stock_available" api="stock_availables">
<stock_available>
<id>111</id>
<id_product_attribute>0</id_product_attribute>
</stock_available>
</stock_availables>
<accessories nodeType="product" api="products"/>
<product_bundle nodeType="product" api="products"/>
</associations>
I want to convert this XML into a Hash.
I tried to find a function that converts this XML into a Hash (h = Hash.new).
How do I do this?
There is ActiveSupport's Hash#from_xml method that you can use:
xml = File.read("data.xml") # if your XML is in the 'data.xml' file
Hash.from_xml(xml)
If you are using Rails, you can use the answer provided above; otherwise, you can require the relevant part of the ActiveSupport gem:
require 'active_support/core_ext/hash'
xml = '<foo>bar</foo>'
hash = Hash.from_xml(xml)
=>{"foo"=>"bar"}
Note that this only works with valid XML (see the comments on the original post). Also note that element attributes such as id="1" do not survive a round trip: they come back as child elements. For example:
xml = %q(
<root>
<foo id="1"></foo>
<bar id="2"></bar>
</root>).strip
hash = Hash.from_xml(xml)
=>{"root"=>{"foo"=>{"id"=>"1"}, "bar"=>{"id"=>"2"}}}
puts hash.to_xml
# will output
<?xml version="1.0" encoding="UTF-8"?>
<hash>
<root>
<foo>
<id>1</id>
</foo>
<bar>
<id>2</id>
</bar>
</root>
</hash>
Use Nokogiri to parse the XML response and then convert it to a Ruby hash. It's pretty fast.
require 'active_support/core_ext/hash' #from_xml
require 'nokogiri'
doc = Nokogiri::XML(response_body)
Hash.from_xml(doc.to_s)
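If pulling in ActiveSupport is not an option, the conversion can also be hand-rolled. The sketch below is an assumption-heavy illustration and not a replacement for Hash.from_xml: it uses REXML (bundled with Ruby), the element_to_hash helper and the "text" key are this example's own inventions, and the sample is wrapped in a hypothetical <product> root because the fragment in the question has no single root element.

```ruby
require "rexml/document"

# Recursively turn a REXML element into nested Ruby values.
# Attributes become string keys; repeated child names collect into an array.
def element_to_hash(element)
  children = element.elements.to_a
  text = element.text.to_s.strip
  # Leaf element without attributes: just return its text.
  return text if children.empty? && !element.has_attributes?

  hash = {}
  element.attributes.each { |name, value| hash[name] = value }
  hash["text"] = text unless text.empty? # "text" key is this sketch's convention
  children.each do |child|
    value = element_to_hash(child)
    if hash.key?(child.name)
      # Second occurrence of a name: promote the entry to an array.
      hash[child.name] = [hash[child.name]] unless hash[child.name].is_a?(Array)
      hash[child.name] << value
    else
      hash[child.name] = value
    end
  end
  hash
end

# Shortened sample wrapped in a hypothetical <product> root.
xml = <<~XML
  <product>
    <name><language id="1">Konsult 500 kr/tim</language><language id="2">Konsult 500 kr/tim</language></name>
    <associations>
      <categories nodeType="category" api="categories">
        <category><id>2</id></category>
      </categories>
    </associations>
  </product>
XML

root = REXML::Document.new(xml).root
product = { root.name => element_to_hash(root) }
p product
```

The helper makes no attempt at type casting or at resolving clashes between an attribute and a child element of the same name, so treat it as a starting point only.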
I set up a simple XML feed for a vendor we're using (who refuses to read JSON).
<recipes type="array">
<recipe>
<id type="integer">1</id>
<name>
Hamburgers
</name>
<producturl>
http://test.com
</producturl>
...
</recipe>
...
</recipes>
However, the vendor requests that instead of having an id node, id is an attribute in the parent node. e.g.
<recipes type="array">
<recipe id="1">
<name>
Hamburgers
</name>
<producturl>
http://test.com
</producturl>
...
</recipe>
...
</recipes>
I'm building this with (basically)
xml_feed = []
recipes.each do |recipe|
  xml_feed << {id: recipe.id, name: recipe.name, ...}
end
...
render xml: xml_feed.to_xml(root: 'recipes')
But I'm unsure how to include the id (or any field) as an attribute on the parent node like that. I googled around and couldn't find anything, nor were the http://api.rubyonrails.org/classes/ActiveRecord/Serialization.html docs very helpful.
Thanks!
I would suggest you use the Nokogiri gem. It provides everything you could possibly need for handling XML.
builder = Nokogiri::XML::Builder.new do |xml|
xml.root {
xml.objects {
xml.object.classy.thing!
}
}
end
puts builder.to_xml
<?xml version="1.0"?>
<root>
<objects>
<object class="classy" id="thing"/>
</objects>
</root>
The suggestion to use Nokogiri is fine. The syntax just needs to be a little different to achieve what you have requested:
builder = Nokogiri::XML::Builder.new do |xml|
xml.root {
xml.object('type' => 'Client') {
xml.name 'John'
}
}
end
puts builder.to_xml
<?xml version="1.0"?>
<root>
<object type="Client">
<name>John</name>
</object>
</root>
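Applied to the recipe feed, the same idea (attributes passed alongside the element name) might look like the sketch below. It is written with REXML, which ships with Ruby, rather than Nokogiri, so that it runs without extra gems; the Recipe struct and the second record are made-up stand-ins for the real model.

```ruby
require "rexml/document"

# Stand-in for the recipe records; the second row is invented sample data.
Recipe = Struct.new(:id, :name, :producturl)
recipes = [
  Recipe.new(1, "Hamburgers", "http://test.com"),
  Recipe.new(2, "Hot Dogs", "http://test.com/hot-dogs"),
]

doc = REXML::Document.new
root = doc.add_element("recipes", { "type" => "array" })

recipes.each do |recipe|
  # The second argument to add_element is a hash of attributes, so the id
  # ends up on the <recipe> tag itself instead of in a child <id> node.
  node = root.add_element("recipe", { "id" => recipe.id.to_s })
  node.add_element("name").text = recipe.name
  node.add_element("producturl").text = recipe.producturl
end

puts doc.to_s
```

With the Nokogiri builder shown above, the corresponding call would be xml.recipe('id' => recipe.id) { ... } inside the loop.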
I've recently started using Nokogiri to parse data into a Rails 3 application. The problem I'm having is that I don't fully understand how to do it, as the XML I am parsing appears to be 'non-standard'. Take a look at the snippet below:
<?xml version="1.0" encoding="utf-8"?>
<dataset xmlns="http://.com/schemas/xmldata/1/" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
<!--
<dataset
xmlns="http://.com/schemas/xmldata/1/"
xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
xs:schemaLocation="http://.com/schemas/xmldata/1/ xmldata.xsd"
>
-->
<metadata>
<item name="Problem ID" type="xs:string" length="32"/>
<item name="Account Title" type="xs:string" length="162"/>
<item name="Account Name" type="xs:string" length="162"/>
<item name="Reassignment" type="xs:int" precision="1"/>
<item name="Initial Severity" type="xs:int" precision="1"/>
<item name="Resolution Desc" type="xs:string" length="510"/>
<item name="Resolver Name" type="xs:string" length="82"/>
<item name="Problem Code" type="xs:string" length="32"/>
<item name="Status" type="xs:string" length="32"/>
</metadata>
<data>
<row>
<value>AP-06684768 </value>
<value>ESA</value>
<value>1</value>
<value>8</value>
<value>8</value>
<value xs:nil="true" />
<value xs:nil="true" />
<value>ADDITION TO EXISTING FIREWALL</value>
<value></value>
<value>ESA BRIDGE </value>
<value>CLOSED </value>
<value>CLOSED </value>
</row>
<row>
<value>AP-06720564 </value>
<value>ESA</value>
<value>2011-01-19T12:02:47</value>
<value>2011-01-19T12:02:49</value>
<value>0</value>
<value>776</value>
<value>SCP UESCADADEV -> UESCADAPW/BW</value>
<value>NETAU_NETMGTS </value>
<value>N/A</value>
<value>ESA BRIDGE </value>
<value>CLOSED </value>
<value>CLOSED </value>
</row>
</data>
</dataset>
Instead of having named nodes and attributes it seems to be a 'metadata' section and then rows, much like a table really. How would I parse all this data?
require 'rubygems'
require 'nokogiri'
require 'pp'
doc = Nokogiri::XML(DATA)
column_names = doc.css('dataset > metadata > item').map {|a| a['name']}
result = doc.css('dataset > data > row').map do |row|
values = row.css('value').map { |value| value[:nil] == 'true' ? nil : value.content }
Hash[column_names.zip(values)]
end
pp result
results in
[{"Problem Code"=>"ADDITION TO EXISTING FIREWALL",
"Resolution Desc"=>nil,
"Reassignment"=>"8",
"Resolver Name"=>nil,
"Status"=>"",
"Problem ID"=>"AP-06684768 ",
"Account Name"=>"1",
"Initial Severity"=>"8",
"Account Title"=>"ESA"},
{"Problem Code"=>"NETAU_NETMGTS ",
"Resolution Desc"=>"776",
"Reassignment"=>"2011-01-19T12:02:49",
"Resolver Name"=>"SCP UESCADADEV -> UESCADAPW/BW",
"Status"=>"N/A",
"Problem ID"=>"AP-06720564 ",
"Account Name"=>"2011-01-19T12:02:47",
"Initial Severity"=>"0",
"Account Title"=>"ESA"}]
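One thing worth noting about the result above: the sample lists 9 <item> names but 12 <value> elements per row, and zip pairs strictly by position, which is why some values appear under unexpected column names. A small plain-Ruby sketch with shortened data (three names, five values) shows the behavior:

```ruby
# Column names from <metadata> (shortened) and one row's <value> texts.
column_names = ["Problem ID", "Account Title", "Account Name"]
values       = ["AP-06684768", "ESA", "1", "8", "8"]

# zip pairs by position and stops at the receiver's length, so the two
# surplus values are silently dropped.
row = Hash[column_names.zip(values)]
p row

# A cheap guard that makes such a mismatch visible on stderr.
if column_names.size != values.size
  warn "expected #{column_names.size} values, got #{values.size}"
end
```

If the real feed has matching counts, the pairing is exact; if not, a check like this at least flags the rows that need attention.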
Here's working code that I hacked out and tested:
require 'rubygems'
require 'nokogiri'
class Item
attr_accessor :name
def initialize(name)
    @name = name
end
end
file = File.open("data.xml")
document = Nokogiri::XML(file)
file.close
metadata = document.root.children[3]
items = metadata.children.reject{|child| child.attribute('name').nil?}.map do |child|
Item.new(child.attribute('name').value)
end
puts "#{items.size} items"
puts items.inspect
Results:
[~/stackoverflow/graphML] ruby parse.rb
9 items
[#<Item:0x007fc01c0fbd90 @id="Problem ID">, #<Item:0x007fc01c0fbca0 @id="Account Title">, #<Item:0x007fc01c0fbc28 @id="Account Name">, #<Item:0x007fc01c0fbbb0 @id="Reassignment">, #<Item:0x007fc01c0fbb38 @id="Initial Severity">, #<Item:0x007fc01c0fbac0 @id="Resolution Desc">, #<Item:0x007fc01c0fba48 @id="Resolver Name">, #<Item:0x007fc01c0fb9d0 @id="Problem Code">, #<Item:0x007fc01c0fb868 @id="Status">]
Here's the full project on GitHub: https://github.com/endymion/GraphML-parsing-exercise/tree/metadata-key-names
(It's a branch of a GraphML parsing exercise that I hacked out earlier tonight for somebody else on Stack Overflow.)