I'm selecting a nodeset according to some condition. The resulting nodeset is correct. But if I do an xpath on it I get all the nodes from the document. I must be missing something here. Explanation and solution would be appreciated.
require 'nokogiri'
doc = Nokogiri::XML(DATA)
selection = doc.xpath("//listing[code[contains(text(), '34')]]")
p selection.length ## 2
p selection.xpath("//id").inner_text ##34567 (ids of all nodes), I'm trying to get 35 instead
__END__
<?xml version="1.0" encoding="UTF-8"?>
<listings>
<listing>
<id>3</id>
<code>3,4,55,34</code>
</listing>
<listing>
<id>4</id>
<code>3,4,55,33</code>
</listing>
<listing>
<id>5</id>
<code>3,4,55,34</code>
</listing>
<listing>
<id>6</id>
<code>3,4,55</code>
</listing>
<listing>
<id>7</id>
<code>3,14</code>
</listing>
</listings>
Controller response includes "spec?" field:
r = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<hash type=\"array\">\n <item><spec? type=\"boolean\">false</spec?>\n </item>\n <hash>\n"
When I try to create XML from it with Nokogiri.xml(r), I literally get:
<?xml version="1.0" encoding="UTF-8"?>
<hash type="array">
<item><spec type=" type="boolean">false/spec">
</spec>item>
<hash>
</hash></item></hash>
which looks wrong.
My question is:
Is it possible to create XML from a string using Nokogiri, handling or removing ? and other characters that are not valid in XML names, at the Nokogiri.XML() stage?
Desired result:
Nokogiri.xml(r) do |config|
config.maybe_some_configs?
end #=>
<?xml version="1.0" encoding="UTF-8"?>
<hash type="array">
<item><spec type="boolean">false</spec></item>
</hash>
The proper way to parse a string into an XML DOM is Nokogiri::XML, Nokogiri.XML or Nokogiri::XML.parse, not the lowercase Nokogiri.xml.
Also, XML tag names can't contain ?. See the spec for more information. You'll have to dig through the "Names and Tokens" section and decode hexadecimal character descriptions to figure out the ranges of characters allowed, but a hint is that ? is character code 0x3f.
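As a rough illustration of that rule, here is a minimal sketch of a name check. It is an ASCII-only approximation: the real Name production in the spec also allows many non-ASCII ranges that are left out here, and the constant `ASCII_XML_NAME` and the method `ascii_xml_name?` are hypothetical names, not part of Nokogiri.

```ruby
# ASCII-only approximation of the XML 1.0 Name production.
# Assumption: non-ASCII name characters are ignored for simplicity.
# First character: letter, underscore or colon.
# Following characters: the same, plus digits, "." and "-".
ASCII_XML_NAME = /\A[A-Za-z_:][A-Za-z0-9_:.\-]*\z/

# Hypothetical helper: true if the string is usable as an (ASCII) XML name.
def ascii_xml_name?(name)
  ASCII_XML_NAME.match?(name)
end

question_mark_code = '?'.ord        # 0x3f, not in any of the allowed ranges
spec_ok = ascii_xml_name?('spec')
spec_q_ok = ascii_xml_name?('spec?')

p question_mark_code == 0x3f
p spec_ok
p spec_q_ok
```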
Which leads to the fact that the XML in r is invalid:
<?xml version="1.0" encoding="UTF-8"?>
<hash type="array">
<item><spec? type="boolean">false</spec?>
</item>
<hash>
Which, when parsed, results in:
irb(main):012:0> doc = Nokogiri::XML(r)
#<Nokogiri::XML::Document:0x80c8014c name="document" children=[#<Nokogiri::XML::Element:0x80c7399c name="hash" attributes=[#<Nokogiri::XML::Attr:0x80c733e8 name="type" value="array">] children=[#<Nokogiri::XML::Text:0x80c6e26c "\n ">, #<Nokogiri::XML::Element:0x80c6df60 name="item" children=[#<Nokogiri::XML::Element:0x80c6d970 name="spec">, #<Nokogiri::XML::Text:0x80c6d09c "? type=\"boolean\">false">]>, #<Nokogiri::XML::Text:0x80c6ca34 "?>\n ">]>]>
irb(main):013:0> doc.errors
[
[0] #<Nokogiri::XML::SyntaxError: error parsing attribute name>,
[1] #<Nokogiri::XML::SyntaxError: attributes construct error>,
[2] #<Nokogiri::XML::SyntaxError: Couldn't find end of Start Tag spec line 3>,
[3] #<Nokogiri::XML::SyntaxError: expected '>'>,
[4] #<Nokogiri::XML::SyntaxError: Opening and ending tag mismatch: item line 3 and spec>,
[5] #<Nokogiri::XML::SyntaxError: Opening and ending tag mismatch: hash line 2 and item>,
[6] #<Nokogiri::XML::SyntaxError: Extra content at the end of the document>
]
As a result, Nokogiri is having to do some fixup in the DOM to try to make sense of it. The resulting XML looks like:
irb(main):014:0> puts doc.to_xml
<?xml version="1.0" encoding="UTF-8"?>
<hash type="array">
<item><spec/>? type="boolean">false</item>?>
</hash>
The way to fix it is to give Nokogiri valid XML. Either fix the source of the XML, if you control it, or fix the problems in the string before passing it to Nokogiri.
By definition, XML is a strict format. Nokogiri honors that but, trying to be friendly, repairs what it can and lets you check doc.errors to see whether it's empty?. If it isn't, odds are good you shouldn't continue using the source until you've determined what causes the parsing problems and fixed it. Sometimes the problem is fairly benign and you can ignore it, but either way you should at least be aware of it.
Pre-massaging the data before Nokogiri sees it isn't hard:
doc = Nokogiri::XML(r.gsub('spec?', 'spec'))
irb(main):024:0> puts doc.to_xml
<?xml version="1.0" encoding="UTF-8"?>
<hash type="array">
<item><spec type="boolean">false</spec>
</item>
<hash>
</hash></hash>
nil
irb(main):025:0> doc.errors
[
[0] #<Nokogiri::XML::SyntaxError: Premature end of data in tag hash line 5>,
[1] #<Nokogiri::XML::SyntaxError: Premature end of data in tag hash line 2>
]
That's a start, but not an attempt to fix it for you completely. I'm teaching you to fish, not handing out fish.
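If more than one field name can end in ?, the same pre-massaging idea can be generalized with a regular expression. This is only a sketch: the helper name `strip_question_marks_from_tags` is made up, it handles ASCII names only, and it would also rewrite anything that merely looks like a tag inside comments or CDATA sections. It deliberately leaves the unclosed hash element alone.

```ruby
# Hypothetical helper: drop a "?" that directly follows an element name in an
# opening or closing tag. "<?xml ...?>" is untouched because no name sits
# between "<" and "?" there.
def strip_question_marks_from_tags(xml)
  xml.gsub(%r{<(/?)([A-Za-z_][\w.\-]*)\?}, '<\1\2')
end

# The string from the question.
r = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<hash type=\"array\">\n <item><spec? type=\"boolean\">false</spec?>\n </item>\n <hash>\n"

cleaned = strip_question_marks_from_tags(r)
puts cleaned
```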
I have the XML below in my code:
XML Parsing Error: not well-formed
Location: http://localhost:3000/api/client?client=test1
Line Number 1, Column 1111:
<?xml version="1.0" encoding="UTF-8"?>
<application>
<name><![CDATA[TESTapp2]]></name>
<application-identifier>wac-8c28afa4-0f6e-11e1-8885-7071bc62c7bc</application-identifier>
<clients>
<pricepoint id="1" name=<![CDATA[TEST-price]]> currency="dollar" locale="la" country="india" price="50" text="this is a TEST" receipt="oi120934" operator-reference="1213w" operator-id="1"></pricepoint></pricepoints><product-image></product-image>
</clients>
</application>
<name><![CDATA[TESTapp2]]></name> (this works)
<name=\"[CDATA[TESTapp2]]\"> (this does not work; it throws an encoding error)
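AFAIK, using a CDATA section as an attribute value is forbidden. CDATA sections can only be used in text nodes, so an attribute value has to be a quoted string with any special characters escaped.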
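Assuming the XML is built on the Ruby side, as in the other threads here (the question does not say), one way to get a safe attribute value without CDATA is to escape it. The sketch below uses Ruby's built-in `String#encode` with `xml: :attr`; the sample value is made up.

```ruby
# Hypothetical attribute value containing characters that need escaping.
name = 'TEST & "price" <1>'

# xml: :attr escapes &, <, > and " and wraps the result in double quotes,
# so it can be dropped straight after name= in the tag.
quoted = name.encode(xml: :attr)

element = "<pricepoint id=\"1\" name=#{quoted}></pricepoint>"
puts element
```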
Using OmniXML and Delphi, I would like to locate an element and change another element in the same node. For example, in the XML listing below, I would like to locate /first-name = 'Joe1' and then locate and change the /price from 1200 to 10.
I've tried using XPathSelect but I cannot seem to specify the /first-name.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="myfile.xsl" ?>
<bookstore specialty="novel">
<book style="autobiography">
<author>
<first-name>Joe1</first-name>
<last-name>Bob</last-name>
<award>Trenton Literary Review Honorable Mention</award>
</author>
<price>1200</price>
</book>
<book style="textbook">
<author>
<first-name>Mary</first-name>
<last-name>Bob</last-name>
<publication>Selected Short Stories of
<first-name>Mary</first-name>
<last-name>Bob</last-name>
</publication>
</author>
<editor>
<first-name>Britney</first-name>
<last-name>Bob</last-name>
</editor>
<price>55</price>
</book>
</bookstore>
Use //book[author/first-name = "Joe1"] as your XPathSelect query to get the book node, and then access the price subnode of that node to change it.
Alright, so the ultimate goal here is to parse the data inside of an XML response. The response comes in the form of a Ruby string. The problem is that I'm getting an error when creating the XML file from that string. I know for a fact that response.body.to_s is a valid string of XML:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<CardTxn>
<authcode>123</authcode>
<card_scheme>Mastercard</card_scheme>
<country>United Kingdom</country>
</CardTxn>
<datacash_reference>XXXX</datacash_reference>
<merchantreference>XX0001</merchantreference>
<mode>TEST</mode>
<reason>ACCEPTED</reason>
<status>1</status>
<time>1286477267</time>
</Response>
Inside the Ruby method I try to generate an XML file:
doc = Nokogiri::XML(response.body.to_s)
The output of doc.to_s after the above code executes is:
<?xml version="1.0"?>
Any ideas why the file is not getting generated correctly?
This works for me on Ruby 1.9.2. Notice it's Nokogiri::XML.parse().
require 'nokogiri'
asdf = %q{<?xml version="1.0" encoding="UTF-8"?>
<Response>
<CardTxn>
<authcode>123</authcode>
<card_scheme>Mastercard</card_scheme>
<country>United Kingdom</country>
</CardTxn>
<datacash_reference>XXXX</datacash_reference>
<merchantreference>XX0001</merchantreference>
<mode>TEST</mode>
<reason>ACCEPTED</reason>
<status>1</status>
<time>1286477267</time>
</Response>
}
doc = Nokogiri::XML.parse(asdf)
print doc.to_s
This parses the XML into a Nokogiri XML document, but doesn't create a file. doc.to_s only shows you what the document would look like if you printed it.
To create a file, replace "print doc.to_s" with:
File.open('xml.out', 'w') do |fo|
fo.print doc.to_s
end
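A shorter variant, as a sketch: `File.write` opens, writes and closes the file in one call. To keep the example free of gems and stray files, a plain string (`xml_text`) stands in for doc.to_s and the file goes into a temporary directory; in real code you would pass doc.to_s and your own path.

```ruby
require 'tmpdir'
require 'fileutils'

# Stand-in for doc.to_s (assumption: any string works the same way).
xml_text = "<?xml version=\"1.0\"?>\n<Response><status>1</status></Response>\n"

# Write into a temporary directory so the example leaves nothing behind.
dir = Dir.mktmpdir
path = File.join(dir, 'xml.out')

bytes_written = File.write(path, xml_text)  # returns the number of bytes written
written = File.read(path)

FileUtils.remove_entry(dir)                 # clean up the temporary directory
puts written
```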
I have used XmlSlurper successfully before, but am struggling to parse the following XML, which is a response from a call to the Pingdom API. I have tried following the namespace declaration examples, but I just get an error message on the ns1 and ns2 values. Can anybody help point me in the right direction? The XML looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:ns1="urn:methods"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ns2="urn:PingdomAPI"
xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<SOAP-ENV:Body>
<ns1:Auth_loginResponse>
<return xsi:type="ns2:Auth_LoginResponse">
<status xsi:type="xsd:int">0</status>
<sessionId xsi:type="xsd:string">mysessionId</sessionId>
</return>
</ns1:Auth_loginResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
After parsing with XmlSlurper, it just concatenates the 0 and mysessionId into one string:
0mysessionId
Try this, given that your XML is stored in the xml variable:
def records = new XmlSlurper().parseText(xml).declareNamespace(
'SOAP-ENV':'http://schemas.xmlsoap.org/soap/envelope/',
ns1:'urn:methods',
xsd:'http://www.w3.org/2001/XMLSchema',
xsi:'http://www.w3.org/2001/XMLSchema-instance',
ns2:'urn:PingdomAPI'
)
println records.'SOAP-ENV:Body'.'ns1:Auth_loginResponse'.return.status
println records.'SOAP-ENV:Body'.'ns1:Auth_loginResponse'.return.sessionId