How to find text in XML and display its sibling with Nokogiri? - ruby-on-rails

I have the following XML consumed from a REST API:
<dataitems>
<dataitem colour="null">
<value>
<label>Intel</label>
<count>43</count>
</value>
<value>
<label>AMD</label>
<count>39</count>
</value>
<value>
<label>ARM</label>
<count>28</count>
</value>
</dataitem>
</dataitems>
I would like to search for the text in the <label> tag and display the matching value for <count> in a table.
In the controller I have: #post_count = Nokogiri::XML(Post.item_data.to_xml).
In the view I'm not sure whether I need to use #post_count.xpath or #post_count.search.
Can someone please point me in the right direction on the right method and syntax for it?
Thanks in advance.

Although I'm not sure what information you are looking for, I have a few suggestions.
1) If you know the value in the element before you do your searching:
doc = Nokogiri.XML(open(source_xml))
# Assuming there is only one of each label
node = doc.xpath('//label[text()="Intel"]').first
count = node.next_element.text
# or if there are many of each label
nodes = doc.xpath('//label[text()="Intel"]')
nodes.each {|node|
count = node.next_element.text
# do something with count here
}
2) Assuming that you don't know the names within the tag in advance
doc = Nokogiri.XML(open(source_xml))
labels = {}
doc.xpath('//label').each {|node|
labels[node.text] = node.next_element.text
}
# labels => {"Intel"=>"43", "AMD"=>"39", "ARM"=>"28"}
I personally like the second solution better because it gives you a clean hash, but I prefer to work with hashes and arrays as quickly as possible.

Related

Xquery to map the value of a specific name attribute

I am trying to create an xquery in jdeveloper . I am stuck at a small portion of it. It would be great if I get some suggestions.
Below is the part I am stuck at
The request is:
`<variables>
<variable name="StartTime" value="01:00:00"/>
<variable name= "EndTime" value="05:00:00"/>
The response I want to map is a single element with two values looks like below:
<ns2:time ns2:startTime="01:00:00" ns2:endTime="05:00:00"/>
Below is the xquery I tried. But I get only the start time at both places. I want some way by which I can correctly assign the values looking at the name value in the request.
if (fn:data($Prefereces/ns1:variables/ns1:variable/#name="StartTime")or fn:data($Prefereces/ns1:variables/ns1:variable/#name="EndTime")) then
( <ns2:time ns2:startTime="{fn:data($Prefereces/ns1:variables/ns1:variable/#value)}" ns2:endTime="{fn:data($Prefereces/ns1:variables/ns1:variable/#value)}">
</ns2:time>)
else
()
Thanks in advance.
You can use this :
<ns2:time ns2:startTime="{fn:data($Prefereces/variables/variable[#name = 'StartTime']/#value)}" ns2:endTime="{fn:data($Prefereces/variables/variable[#name = 'EndTime']/#value)}"/>
Note that the ns2 prefix has to be defined beforehand.
XQuery's predicates are specified using brackets ([condition]), which were missing from your tries.

Rails nokogiri parse XML file

I'm a little bit confused: could not find in web good examples of parsing xml with nokogiri...
example of my data:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<rows SessionGUID="6448680D1">
<row>
<AnalogueCode>0451103079</AnalogueCode>
<AnalogueCodeAsIs>0451103079</AnalogueCodeAsIs>
<AnalogueManufacturerName>BOSCH</AnalogueManufacturerName>
<AnalogueWeight>0.000</AnalogueWeight>
<CodeAsIs>OC90</CodeAsIs>
<DeliveryVariantPriceAKiloForClientDescription />
<DeliveryVariantPriceAKiloForClientPrice>0.00</DeliveryVariantPriceAKiloForClientPrice>
<DeliveryVariantPriceNote />
<PriceListItemDescription />
<PriceListItemNote />
<IsAvailability>1</IsAvailability>
<IsCross>1</IsCross>
<LotBase>1</LotBase>
<LotType>1</LotType>
<ManufacturerName>KNECHT/MAHLE</ManufacturerName>
<OfferName>MSC-STC-58</OfferName>
<PeriodMin>2</PeriodMin>
<PeriodMax>4</PeriodMax>
<PriceListDiscountCode>31087</PriceListDiscountCode>
<ProductName>Фильтр масляный</ProductName>
<Quantity>41</Quantity>
<SupplierID>30</SupplierID>
<GroupTitle>Замена</GroupTitle>
<Price>203.35</Price>
</row>
<row>
<AnalogueCode>0451103079</AnalogueCode>
<AnalogueCodeAsIs>0451103079</AnalogueCodeAsIs>
<AnalogueManufacturerName>BOSCH</AnalogueManufacturerName>
<AnalogueWeight>0.000</AnalogueWeight>
<CodeAsIs>OC90</CodeAsIs>
<DeliveryVariantPriceAKiloForClientDescription />
<DeliveryVariantPriceAKiloForClientPrice>0.00</DeliveryVariantPriceAKiloForClientPrice>
<DeliveryVariantPriceNote />
<PriceListItemDescription />
<PriceListItemNote>[0451103079] Bosch,MTGC#0451103079</PriceListItemNote>
<IsAvailability>1</IsAvailability>
<IsCross>1</IsCross>
<LotBase>1</LotBase>
<LotType>0</LotType>
<ManufacturerName>KNECHT/MAHLE</ManufacturerName>
<OfferName>MSC-STC-1303</OfferName>
<PeriodMin>3</PeriodMin>
<PeriodMax>5</PeriodMax>
<PriceListDiscountCode>102134</PriceListDiscountCode>
<ProductName>Фильтр масляный</ProductName>
<Quantity>5</Quantity>
<SupplierID>666</SupplierID>
<GroupTitle>Замена</GroupTitle>
<Price>172.99</Price>
</row>
</rows>
</root>
and ruby code:
...
xml_doc = Nokogiri::XML(response.body)
parts = xml_doc.xpath('/root/rows/row')
with the help of xpath i could do this? also how to get this parts object (row)?
You're on the right track. parts = xml_doc.xpath('/root/rows/row') gives you back a NodeSet i.e. a list of the <row> elements.
You can loop through these using each or use row indexes like parts[0], parts[1] to access specific rows. You can then get the values of child nodes using xpath on the individual rows.
e.g. you could build a list of the AnalogueCode for each part with:
codes = []
parts.each do |row|
codes << row.xpath('AnalogueCode').text
end
Looking at the full example of the XML you're processing there are 2 issues preventing your XPath from matching:
the <root> tag isn't actually the root element of the XML so /root/.. doesn't match
The XML is using namespaces so you need to include these in your XPaths
so there are a couple of possible solutions:
use CSS selectors rather than XPaths (i.e. use search) as suggested by the Tin Man
after xml_doc = Nokogiri::XML(response.body) do xml_doc.remove_namespaces! and then use parts = xml_doc.xpath('//root/rows/row') where the double slash is XPath syntax to locate the root node anywhere in the document
specify the namespaces:
e.g.
xml_doc = Nokogiri::XML(response.body)
ns = xml_doc.collect_namespaces
parts = xml_doc.xpath('//xmlns:rows/xmlns:row', ns)
codes = []
parts.each do |row|
codes << xpath('xmlns:AnalogueCode', ns).text
end
I would go with 1. or 2. :-)
First, Nokogiri supports XPath AND CSS. I recommend using CSS because it's more easily read:
doc.search('row')
will return a NodeSet of every <row> in the document.
The equivalent XPath is:
doc.search('//row')
...how to get this parts object (row)?
I'm not sure what that means, but if you want to access individual elements inside a <row>, it's easily done several ways.
If you only want one node inside each of the row nodes:
doc.search('row Price').map(&:to_xml)
# => ["<Price>203.35</Price>", "<Price>172.99</Price>"]
doc.search('//row/Price').map(&:to_xml)
# => ["<Price>203.35</Price>", "<Price>172.99</Price>"]
If you only want the first such occurrence, use at, which is the equivalent of search(...).first:
doc.at('row Price').to_xml
# => "<Price>203.35</Price>"
Typically we want to iterate over a number of blocks and return an array of hashes of the data found:
row_hash = doc.search('row').map{ |row|
{
AnalogueCode: row.at('AnalogueCode').text,
Price: row.at('Price').text,
}
}
row_hash
# => [{:AnalogueCode=>"0451103079", :Price=>"203.35"},
# {:AnalogueCode=>"0451103079", :Price=>"172.99"}]
These are ALL covered in Nokogiri's tutorials and are answered many times here on Stack Overflow, so take the time to read and search.

Find child of child which attribute code is equal to the parameter passed on the url - XSL

On this dynamic website,
The url looks something like this : departments/CHEM.html
CHEM is a parameter.
<xsl:param name="dep" select="'CHEM'" />
a piece of the xml is below
<course acad_year="2012" cat_num="5085" offered="Y">
<term term_pattern_code="1" fall_term="Y" spring_term="N">fall term</term>
<department code="CHEM">
<dept_long_name>Department of Chemistry and Chemical Biology</dept_long_name>
<dept_short_name>Chemistry and Chemical Biology</dept_short_name>
</department>
</course> ....
I am trying to get the dept_short_name to use on my H1 tag, but I have not been successful.So far I tried
<h2><xsl:value-of select="course/department/[code={#$dep}]"/></h2>
Any suggestions??? Thanks!
Just use:
<xsl:value-of select="course/department[#code eq $dep]/dept_short_name"/>
Remember:
In XPath 2.0 (XSLT 2.0) use the eq operator for value comparissons -- it is more efficient than the general comparisson operator = which really, only, needs to be used when at least one of its operands is a sequence.
I would try this:
<xsl:value-of select="course/department[#code=$dep]/dept_short_name/text()"/>
That says: find the department element (inside a course element) whose code attribute is the value of parameter "dep", then find the dept_short_name child element, then get the text inside that element.
You have to use the # to say that "code" is an attribute, but "dep" should not have it. I think the {} notation is for use inside attributes of the non-XSLT elements of your stylesheet, so I wouldn't use it inside a value-of expression.

Nokogiri pull parser (Nokogiri::XML::Reader) issue with self closing tag

I have a huge XML(>400MB) containing products. Using a DOM parser is therefore excluded, so i tried to parse and process it using a pull parser. Below is a snippet from the each_product(&block) method where i iterate over the product list.
Basically, using a stack, i transform each <product> ... </product> node into a hash and process it.
while (reader.read)
case reader.node_type
#start element
when Nokogiri::XML::Node::ELEMENT_NODE
elem_name = reader.name.to_s
stack.push([elem_name, {}])
#text element
when Nokogiri::XML::Node::TEXT_NODE, Nokogiri::XML::Node::CDATA_SECTION_NODE
stack.last[1] = reader.value
#end element
when Nokogiri::XML::Node::ELEMENT_DECL
return if stack.empty?
elem = stack.pop
parent = stack.last
if parent.nil?
yield(elem[1])
elem = nil
next
end
key = elem[0]
parent_childs = parent[1]
# ...
parent_childs[key] = elem[1]
end
The issue is on self-closing tags (EG <country/>), as i can not make the difference between a 'normal' and a 'self-closing' tag. They both are of type Nokogiri::XML::Node::ELEMENT_NODE and i am not able to find any other discriminator in the documentation.
Any ideas on how to solve this issue?
There is a feature request on project page regarding this issue (with the corresponding failing test).
Until it will be fixed and pushed into the current version, we'll stick with good'ol
input_text.gsub! /<([^<>]+)\/>/, '<\1></\1>'
Hey Vlad, well I am not a Nokogiri expert but I've done a test and saw that the self_closing?() method works fine on determining the self closing tags. Give it a try.
P.S. : I know this is an old post :P / the documentation is HERE

Hpricot Element intersection

I want to remove all images from a HTML page (actually tinymce user input) which do not meet certain criteria (class = "int" or class = "ext") and I'm struggeling with the correct approach. That's what I'm doing so far:
hbody = Hpricot(input)
#internal_images = hbody.search("//img[#class='int']")
#external_images = hbody.search("//img[#class='ext']")
But I don't know how to find images where the class has the wrong value (not "int" or "ext").
I also have to loop over the elements to check other attributes which are not standard html (I use them for setting internal values like the DB id, which I set in the attribute dbsrc). Can I access these attributes too and is there a way to remove certain elements (which are in the hpricot search result) when they don't meet my criteria?
Thanks for your help!
>> doc = Hpricot.parse('<html><img src="foo" class="int" /><img src="bar" bar="42" /><img src="foobar" class="int"></html>')
=> #<Hpricot::Doc {elem <html> {emptyelem <img class="int" src="foo">} {emptyelem <img src="bar" bar="42">} {emptyelem <img class="int" src="foobar">} </html>}>
>> doc.search("img")[1][:bar]
=> "42"
>> doc.search("img") - doc.search("img.int")
=> [{emptyelem img src"bar" bar"42"}]
Once you have results from search you can use normal array operations. nonstandard attributes are accessible through [].
Check out the not CSS selector.
(hbody."img:not(.int)")
(hbody."img:not(.ext)")
Unfortunately, it doesn't seem you can concat not expressions. You might want to fetch all img nodes and remove those where the .css selector doesn't include neither .int nor .ext.
Additionally, you could use the difference operator to calculate which elements are not part of both collections.
Use the .remove method to remove nodes or elements: Hpricot Altering documentation.

Resources