How can I search by the content of a node using Nokogiri::XML?
Let's say I have the following XML:
<parts>
<part>
<name>foo</name>
<madein>
<city>ABC</city>
<country>XYZ</country>
</madein>
</part>
<part>
<name>foo</name>
<madein>
<city>PQR</city>
<country>XYZ</country>
</madein>
</part>
<part>
<name>foo</name>
<madein>
<city>ABC</city>
<country>XYZ</country>
</madein>
</part>
</parts>
I want to get all parts for which madein/city is ABC. What is the best way to get those part nodes?
I am using nokogiri gem.
Thanks
XPath is a query language for XML, and it is very flexible.
To get you started:
doc = Nokogiri::XML::Document.parse( xml_string )
parts_from_abc = doc.xpath( '/parts/part[madein/city="ABC"]' )
As bdares suggested in the comments, if you want to do more, take a look at the tutorials.
Related
I am trying to create an XQuery in JDeveloper and I am stuck on a small portion of it. It would be great to get some suggestions.
Below is the part I am stuck at
The request is:
<variables>
<variable name="StartTime" value="01:00:00"/>
<variable name="EndTime" value="05:00:00"/>
</variables>
The response I want to map to is a single element with two values, like this:
<ns2:time ns2:startTime="01:00:00" ns2:endTime="05:00:00"/>
Below is the XQuery I tried, but I get only the start time in both places. I want a way to assign the values correctly based on the name attribute in the request.
if (fn:data($Prefereces/ns1:variables/ns1:variable/@name="StartTime") or fn:data($Prefereces/ns1:variables/ns1:variable/@name="EndTime")) then
( <ns2:time ns2:startTime="{fn:data($Prefereces/ns1:variables/ns1:variable/@value)}" ns2:endTime="{fn:data($Prefereces/ns1:variables/ns1:variable/@value)}">
</ns2:time>)
else
()
Thanks in advance.
You can use this:
<ns2:time ns2:startTime="{fn:data($Prefereces/variables/variable[@name = 'StartTime']/@value)}" ns2:endTime="{fn:data($Prefereces/variables/variable[@name = 'EndTime']/@value)}"/>
Note that the ns2 prefix has to be defined beforehand.
XQuery predicates are written in brackets ([condition]), which were missing from your attempts.
I'm a little confused: I could not find good examples on the web of parsing XML with Nokogiri.
example of my data:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<rows SessionGUID="6448680D1">
<row>
<AnalogueCode>0451103079</AnalogueCode>
<AnalogueCodeAsIs>0451103079</AnalogueCodeAsIs>
<AnalogueManufacturerName>BOSCH</AnalogueManufacturerName>
<AnalogueWeight>0.000</AnalogueWeight>
<CodeAsIs>OC90</CodeAsIs>
<DeliveryVariantPriceAKiloForClientDescription />
<DeliveryVariantPriceAKiloForClientPrice>0.00</DeliveryVariantPriceAKiloForClientPrice>
<DeliveryVariantPriceNote />
<PriceListItemDescription />
<PriceListItemNote />
<IsAvailability>1</IsAvailability>
<IsCross>1</IsCross>
<LotBase>1</LotBase>
<LotType>1</LotType>
<ManufacturerName>KNECHT/MAHLE</ManufacturerName>
<OfferName>MSC-STC-58</OfferName>
<PeriodMin>2</PeriodMin>
<PeriodMax>4</PeriodMax>
<PriceListDiscountCode>31087</PriceListDiscountCode>
<ProductName>Фильтр масляный</ProductName>
<Quantity>41</Quantity>
<SupplierID>30</SupplierID>
<GroupTitle>Замена</GroupTitle>
<Price>203.35</Price>
</row>
<row>
<AnalogueCode>0451103079</AnalogueCode>
<AnalogueCodeAsIs>0451103079</AnalogueCodeAsIs>
<AnalogueManufacturerName>BOSCH</AnalogueManufacturerName>
<AnalogueWeight>0.000</AnalogueWeight>
<CodeAsIs>OC90</CodeAsIs>
<DeliveryVariantPriceAKiloForClientDescription />
<DeliveryVariantPriceAKiloForClientPrice>0.00</DeliveryVariantPriceAKiloForClientPrice>
<DeliveryVariantPriceNote />
<PriceListItemDescription />
<PriceListItemNote>[0451103079] Bosch,MTGC#0451103079</PriceListItemNote>
<IsAvailability>1</IsAvailability>
<IsCross>1</IsCross>
<LotBase>1</LotBase>
<LotType>0</LotType>
<ManufacturerName>KNECHT/MAHLE</ManufacturerName>
<OfferName>MSC-STC-1303</OfferName>
<PeriodMin>3</PeriodMin>
<PeriodMax>5</PeriodMax>
<PriceListDiscountCode>102134</PriceListDiscountCode>
<ProductName>Фильтр масляный</ProductName>
<Quantity>5</Quantity>
<SupplierID>666</SupplierID>
<GroupTitle>Замена</GroupTitle>
<Price>172.99</Price>
</row>
</rows>
</root>
and ruby code:
...
xml_doc = Nokogiri::XML(response.body)
parts = xml_doc.xpath('/root/rows/row')
Can I do this with the help of XPath? Also, how do I get at this parts object (row)?
You're on the right track. parts = xml_doc.xpath('/root/rows/row') gives you back a NodeSet i.e. a list of the <row> elements.
You can loop through these using each or use row indexes like parts[0], parts[1] to access specific rows. You can then get the values of child nodes using xpath on the individual rows.
e.g. you could build a list of the AnalogueCode for each part with:
codes = []
parts.each do |row|
codes << row.xpath('AnalogueCode').text
end
Looking at the full example of the XML you're processing there are 2 issues preventing your XPath from matching:
the <root> tag isn't actually the root element of the XML so /root/.. doesn't match
The XML is using namespaces so you need to include these in your XPaths
so there are a couple of possible solutions:
1. use CSS selectors rather than XPaths (i.e. use search) as suggested by the Tin Man
2. after xml_doc = Nokogiri::XML(response.body) do xml_doc.remove_namespaces! and then use parts = xml_doc.xpath('//root/rows/row') where the double slash is XPath syntax for matching <root> anywhere in the document
3. specify the namespaces:
e.g.
xml_doc = Nokogiri::XML(response.body)
ns = xml_doc.collect_namespaces
parts = xml_doc.xpath('//xmlns:rows/xmlns:row', ns)
codes = []
parts.each do |row|
codes << row.xpath('xmlns:AnalogueCode', ns).text
end
I would go with 1. or 2. :-)
First, Nokogiri supports XPath AND CSS. I recommend using CSS because it's more easily read:
doc.search('row')
will return a NodeSet of every <row> in the document.
The equivalent XPath is:
doc.search('//row')
...how to get this parts object (row)?
I'm not sure what that means, but if you want to access individual elements inside a <row>, it's easily done several ways.
If you only want one node inside each of the row nodes:
doc.search('row Price').map(&:to_xml)
# => ["<Price>203.35</Price>", "<Price>172.99</Price>"]
doc.search('//row/Price').map(&:to_xml)
# => ["<Price>203.35</Price>", "<Price>172.99</Price>"]
If you only want the first such occurrence, use at, which is the equivalent of search(...).first:
doc.at('row Price').to_xml
# => "<Price>203.35</Price>"
Typically we want to iterate over a number of blocks and return an array of hashes of the data found:
row_hash = doc.search('row').map{ |row|
{
AnalogueCode: row.at('AnalogueCode').text,
Price: row.at('Price').text,
}
}
row_hash
# => [{:AnalogueCode=>"0451103079", :Price=>"203.35"},
# {:AnalogueCode=>"0451103079", :Price=>"172.99"}]
These are ALL covered in Nokogiri's tutorials and are answered many times here on Stack Overflow, so take the time to read and search.
I have been trying to parse an XML file and all is going well except for one thing.
this is what my XML looks like:
<portfolio>
<item>
<image url="http://www.google.com" />
<title>my first title here.</title>
<desc>my first description here...</desc>
<date>15/07/2010</date>
<skills>skills 1, skills 2, skills 3</skills>
</item>
</portfolio>
I have been parsing: title, desc, date, and skills perfectly. The only issue I am having is parsing the image url. I am using this simple parser: https://github.com/robertmryan/Simple-XML-Parser
Anyway this is how I am setting up the element names to parse:
parser.elementNames = @[@"image", @"title", @"desc", @"date", @"skills"];
Anyway what do I feed into the element name for the image url based upon the XML snippet I gave above?
Thanks!
Edit:
I logged the dictionary it returns after trying the following 3 bits of code:
parser.attributeNames = @[@"image url"];
parser.attributeNames = @[@"image"];
parser.attributeNames = @[@"url"];
Each one of those (after being parsed), returns a dictionary which I logged as this:
dict keys: (
title,
skills,
desc,
date
)
So something is not working right.
The image element has a url attribute so you need to specify that you want the attribute to be parsed out too. Do this by setting the value of the attributeNames property on your parser.
This parser is really basic though so it has some limitations. Most important for you is that attributeNames is only used on the 'main' element (specified with rowElementName) so to do what you want to do you will need to edit the parser class to change that.
I have access to the HL7 Clinical Document Architecture, Release 2.0, which states that entryRelationship is used essentially to link entries with each other in a CDA document. Specifically, it links what are called the "source" and the "target" entries. I also read about the different types of relationships (CAUS, COMP, GEVL, MFST, REFR, RSON, SAS, SPRT, SUBJ, XCRPT) and somewhat understand those.
My main question: what are the "source" and "target" elements? Are they the element containing the entryRelationship, and the element contained by entryRelationship?
For example:
<entry typeCode="DRIV">
<act classCode="ACT" moodCode="EVN">
...
<entryRelationship typeCode="SUBJ">
<observation classCode="OBS" moodCode="EVN">
...
<entryRelationship typeCode="REFR">
<observation classCode="OBS" moodCode="EVN">
...
</observation>
</entryRelationship>
</observation>
</entryRelationship>
</act>
</entry>
In the above snippet, according to my understanding, there is a SUBJ relationship between the act and the first observation, and there is a REFR relationship between the two observations. Is this correct?
The source of an entryRelationship is the entry that contains the entryRelationship element in its body. In your example the source entry is
<act classCode="ACT" moodCode="EVN">
and the target is
<observation classCode="OBS" moodCode="EVN">
It is possible to indicate an inverse relationship using the "inversionInd" attribute of the entryRelationship element. If this attribute is set to true, source and target are inverted.
I have a file with the following structure
<admin>
<sampleName>Willow oak leaf</sampleName>
<sampleDescription comment="Total genes">
<cvParam cvLabel="Bob" accession="123" name="Oak" />
</sampleDescription>
</admin>
I'm trying to get out the text "Total genes" after the sampleDescription comment, and I have used the following code:
sampleDescription = doc.xpath( "/admin/Description/@comment" )
sampleDescription = doc.xpath( "/admin/Description" ).text
But neither work. What am I missing?
Might be a typo... have you tried doc.xpath("/admin/sampleDescription/@comment").text?
It's not working because there's no Description element. As mentioned by Iwe, you need to do something like sampleDescription = doc.xpath("/admin/sampleDescription/@comment").to_s
Also, if it were me, I would just do sampleDescription = doc.xpath("//sampleDescription/@comment").to_s. It's a simpler XPath, but it might be slower.
And as a note, something that trips up a lot of people is namespaces. If your XML document declares a default namespace, every element in the path needs the prefix: sampleDescription = doc.xpath("/xmlns:admin/xmlns:sampleDescription/@comment").to_s. If your doc uses namespaces and you don't specify them with xmlns:, Nokogiri returns an empty result.
Try this:
doc.xpath("//admin/sampleDescription/@comment").to_s
doc.xpath returns a NodeSet which acts a bit like an array. So you need to grab the first element
doc.xpath("//admin/sampleDescription").first['comment']
You could also use at_xpath which is equivalent to xpath(foo).first
doc.at_xpath("//admin/sampleDescription")['comment']
Another thing to note is that attributes on nodes are accessed like hash elements, with [<key>].