PHP Error: DOMDocument::loadXML() [domdocument.loadxml]: Start tag expected, '<' not found in Entity - domdocument

Please check the program below.
::Program::
<?php
$xml='
&lt;books&gt;
&lt;book&gt;
&lt;name&gt;Java complete reference&lt;/name&gt;
&lt;cost&gt;256&lt;/cost&gt;
&lt;/book&gt;
&lt;book&gt;
&lt;name&gt;Head First PHP and Mysql&lt;/name&gt;
&lt;cost&gt;389&lt;/cost&gt;
&lt;/book&gt;
&lt;/books&gt;';
$dom=new DOMDocument();
$dom->loadXML($xml);
foreach ($dom->getElementsByTagName('book') as $book)
{
foreach($book->getElementsByTagName('name') as $name)
{
$names[]=$name->nodeValue;
}
foreach($book->getElementsByTagName('cost') as $cost)
{
$costs[]=$cost->nodeValue;
}
}
print_r($names);
?>
It shows this error:
DOMDocument::loadXML() [domdocument.loadxml]: Start tag expected, '<' not found in Entity
Is this the correct way to do this?
If it is correct, is there any way to get the proper result without changing &lt; to < and &gt; to >?

You should not be using character entities (&lt; and &gt;) for the < and > of things that are actually XML tags in the string that represents your XML. It should be this:
$xml='
<books>
<book>
...
Do that and the warning goes away.
You only need to use character entities (&lt; and &gt;) for < and > when they are part of the actual data rather than delimiting an XML tag.

DOMDocument expects the string to be well-formed XML.
Your string isn't. You should just use < instead of &lt;.
Why would you have HTML entities in that string?

Aren't you supposed to start with something like this
<?xml version="1.0" encoding="UTF-8" ?>
to create valid XML? That might be the missing start tag your error is talking about.

Thank you all. I have just tried html_entity_decode(), and it worked for me for this example.
::Code::
<?php
$xml='
&lt;books&gt;
&lt;book&gt;
&lt;name&gt;Java complete reference&lt;/name&gt;
&lt;cost&gt;256&lt;/cost&gt;
&lt;/book&gt;
&lt;book&gt;
&lt;name&gt;Head First PHP and Mysql&lt;/name&gt;
&lt;cost&gt;389&lt;/cost&gt;
&lt;/book&gt;
&lt;/books&gt;';
$xml=html_entity_decode($xml);
$dom=new DOMDocument();
$dom->loadXML($xml);
foreach ($dom->getElementsByTagName('book') as $book)
{
foreach($book->getElementsByTagName('name') as $name)
{
$names[]=$name->nodeValue;
}
foreach($book->getElementsByTagName('cost') as $cost)
{
$costs[]=$cost->nodeValue;
}
}
print_r($names);
?>
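For comparison, here is the same decode-then-parse step as a minimal Ruby sketch. It is not the PHP code above: CGI.unescapeHTML stands in for html_entity_decode() and REXML stands in for DOMDocument, and the names ENCODED_XML and book_names are made up for this illustration. It only tries to show why the encoded string has no start tag for the parser to find.

```ruby
require 'cgi/escape'      # CGI.unescapeHTML, standing in for html_entity_decode()
require 'rexml/document'  # REXML, standing in for DOMDocument

# Entity-encoded markup like the string in the question: it contains no '<' at all.
ENCODED_XML = '&lt;books&gt;&lt;book&gt;&lt;name&gt;Java complete reference&lt;/name&gt;' \
              '&lt;cost&gt;256&lt;/cost&gt;&lt;/book&gt;&lt;/books&gt;'

# Returns the text of every <name> under a <book>,
# or nil when the string does not contain a root element.
def book_names(xml)
  doc = REXML::Document.new(xml)
  return nil if doc.root.nil?

  REXML::XPath.match(doc, '//book/name').map(&:text)
rescue REXML::ParseException
  # Some REXML versions reject stray text outside a root element outright.
  nil
end

p book_names(ENCODED_XML)                     # still encoded: nothing to parse
p book_names(CGI.unescapeHTML(ENCODED_XML))   # decoded once: real tags
```

Decoding is only needed when the markup arrives already escaped; if you control the string, writing real < and > in the first place avoids the extra step.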

Related

Xml parsing in rails

I have this XML data:
<?xml version="1.0" encoding="UTF-8"?>
<responseParam>
<RESULT>-1</RESULT>
<ERROR_CODE>509</ERROR_CODE>
</responseParam>
How can I fetch the value of error code only?
I have tried this :
result = Net::HTTP.get(URI.parse(otpUrl))
data = Hash.from_xml(result)
puts "#{data['ERROR_CODE']}"
puts data[:ERROR_CODE]
Printing only data gives me the whole hash. I am not able to get only the value of ERROR_CODE.
Any help?
You can use Nokogiri here.
Suppose this is your error.xml:
<?xml version="1.0" encoding="UTF-8"?>
<responseParam>
<RESULT>-1</RESULT>
<ERROR_CODE>509</ERROR_CODE>
</responseParam>
You can do something like:
@doc = Nokogiri::XML(File.open("error.xml"))
@doc.xpath("//ERROR_CODE")
which will give you something like:
# => ["<ERROR_CODE>509</ERROR_CODE>"]
The Node methods xpath and css actually return a NodeSet, which acts very much like an array, and contains matching nodes from the document.
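To get just the value rather than the node, ask the matched element for its text. The sketch below uses REXML from Ruby's standard distribution so that it runs without any gems; that is a substitution, not what the answer above uses. With Nokogiri the same idea would be @doc.at_xpath("//ERROR_CODE").text. RESPONSE_XML and error_code are names invented for the example.

```ruby
require 'rexml/document'

# The response from the question, inlined here instead of fetched over HTTP.
RESPONSE_XML = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <responseParam>
  <RESULT>-1</RESULT>
  <ERROR_CODE>509</ERROR_CODE>
  </responseParam>
XML

# Returns the text of the first <ERROR_CODE> element, or nil if there is none.
def error_code(xml)
  node = REXML::XPath.first(REXML::Document.new(xml), '//ERROR_CODE')
  node && node.text
end

puts error_code(RESPONSE_XML)
```

As for the Hash.from_xml attempt in the question: the resulting hash is keyed by the root element first, so the lookup there would most likely need to be data['responseParam']['ERROR_CODE'] rather than data['ERROR_CODE'].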

Rails nokogiri parse XML file

I'm a little bit confused: I could not find good examples on the web of parsing XML with Nokogiri.
An example of my data:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<rows SessionGUID="6448680D1">
<row>
<AnalogueCode>0451103079</AnalogueCode>
<AnalogueCodeAsIs>0451103079</AnalogueCodeAsIs>
<AnalogueManufacturerName>BOSCH</AnalogueManufacturerName>
<AnalogueWeight>0.000</AnalogueWeight>
<CodeAsIs>OC90</CodeAsIs>
<DeliveryVariantPriceAKiloForClientDescription />
<DeliveryVariantPriceAKiloForClientPrice>0.00</DeliveryVariantPriceAKiloForClientPrice>
<DeliveryVariantPriceNote />
<PriceListItemDescription />
<PriceListItemNote />
<IsAvailability>1</IsAvailability>
<IsCross>1</IsCross>
<LotBase>1</LotBase>
<LotType>1</LotType>
<ManufacturerName>KNECHT/MAHLE</ManufacturerName>
<OfferName>MSC-STC-58</OfferName>
<PeriodMin>2</PeriodMin>
<PeriodMax>4</PeriodMax>
<PriceListDiscountCode>31087</PriceListDiscountCode>
<ProductName>Фильтр масляный</ProductName>
<Quantity>41</Quantity>
<SupplierID>30</SupplierID>
<GroupTitle>Замена</GroupTitle>
<Price>203.35</Price>
</row>
<row>
<AnalogueCode>0451103079</AnalogueCode>
<AnalogueCodeAsIs>0451103079</AnalogueCodeAsIs>
<AnalogueManufacturerName>BOSCH</AnalogueManufacturerName>
<AnalogueWeight>0.000</AnalogueWeight>
<CodeAsIs>OC90</CodeAsIs>
<DeliveryVariantPriceAKiloForClientDescription />
<DeliveryVariantPriceAKiloForClientPrice>0.00</DeliveryVariantPriceAKiloForClientPrice>
<DeliveryVariantPriceNote />
<PriceListItemDescription />
<PriceListItemNote>[0451103079] Bosch,MTGC#0451103079</PriceListItemNote>
<IsAvailability>1</IsAvailability>
<IsCross>1</IsCross>
<LotBase>1</LotBase>
<LotType>0</LotType>
<ManufacturerName>KNECHT/MAHLE</ManufacturerName>
<OfferName>MSC-STC-1303</OfferName>
<PeriodMin>3</PeriodMin>
<PeriodMax>5</PeriodMax>
<PriceListDiscountCode>102134</PriceListDiscountCode>
<ProductName>Фильтр масляный</ProductName>
<Quantity>5</Quantity>
<SupplierID>666</SupplierID>
<GroupTitle>Замена</GroupTitle>
<Price>172.99</Price>
</row>
</rows>
</root>
and ruby code:
...
xml_doc = Nokogiri::XML(response.body)
parts = xml_doc.xpath('/root/rows/row')
Could I do this with the help of XPath? Also, how to get this parts object (row)?
You're on the right track. parts = xml_doc.xpath('/root/rows/row') gives you back a NodeSet i.e. a list of the <row> elements.
You can loop through these using each or use row indexes like parts[0], parts[1] to access specific rows. You can then get the values of child nodes using xpath on the individual rows.
e.g. you could build a list of the AnalogueCode for each part with:
codes = []
parts.each do |row|
codes << row.xpath('AnalogueCode').text
end
Looking at the full example of the XML you're processing there are 2 issues preventing your XPath from matching:
the <root> tag isn't actually the root element of the XML so /root/.. doesn't match
The XML is using namespaces so you need to include these in your XPaths
so there are a couple of possible solutions:
1. use CSS selectors rather than XPaths (i.e. use search) as suggested by the Tin Man
2. after xml_doc = Nokogiri::XML(response.body) do xml_doc.remove_namespaces! and then use parts = xml_doc.xpath('//root/rows/row'), where the double slash is XPath syntax to locate the root node anywhere in the document
3. specify the namespaces, e.g.:
xml_doc = Nokogiri::XML(response.body)
ns = xml_doc.collect_namespaces
parts = xml_doc.xpath('//xmlns:rows/xmlns:row', ns)
codes = []
parts.each do |row|
codes << row.xpath('xmlns:AnalogueCode', ns).text
end
I would go with 1. or 2. :-)
First, Nokogiri supports XPath AND CSS. I recommend using CSS because it's more easily read:
doc.search('row')
will return a NodeSet of every <row> in the document.
The equivalent XPath is:
doc.search('//row')
...how to get this parts object (row)?
I'm not sure what that means, but if you want to access individual elements inside a <row>, it's easily done several ways.
If you only want one node inside each of the row nodes:
doc.search('row Price').map(&:to_xml)
# => ["<Price>203.35</Price>", "<Price>172.99</Price>"]
doc.search('//row/Price').map(&:to_xml)
# => ["<Price>203.35</Price>", "<Price>172.99</Price>"]
If you only want the first such occurrence, use at, which is the equivalent of search(...).first:
doc.at('row Price').to_xml
# => "<Price>203.35</Price>"
Typically we want to iterate over a number of blocks and return an array of hashes of the data found:
row_hash = doc.search('row').map{ |row|
{
AnalogueCode: row.at('AnalogueCode').text,
Price: row.at('Price').text,
}
}
row_hash
# => [{:AnalogueCode=>"0451103079", :Price=>"203.35"},
# {:AnalogueCode=>"0451103079", :Price=>"172.99"}]
These are ALL covered in Nokogiri's tutorials and are answered many times here on Stack Overflow, so take the time to read and search.

Read XML file with Nokogiri

I currently have an XML file that is reading correctly except for one part. It is an item list and sometimes one item has multiple barcodes. My code only pulls out the first. How can I iterate over multiple barcodes? Please see the code below:
def self.pos_import(xml)
Plu.transaction do
Plu.delete_all
xml.xpath('//Item').each do |xml|
plu_import = Plu.new
plu_import.update_pointer = xml.at('Update_Type').content
plu_import.plu = xml.at('item_no').content
plu_import.dept = xml.at('department').content
plu_import.item_description = xml.at('item_description').content
plu_import.price = xml.at('item_price').content
plu_import.barcodes = xml.at('UPC_Code').content
plu_import.sync_date = Time.now
plu_import.save!
end
end
end
My test XML file looks like this:
<?xml version="1.0" encoding="UTF-16" standalone="no"?>
<Items>
<Item>
<Update_Type>2</Update_Type>
<item_no>0000005110</item_no>
<department>2</department>
<item_description>DISC-ALCOHOL PAD STERIL 200CT</item_description>
<item_price>7.99</item_price>
<taxable>No</taxable>
<Barcode>
<UPC_Code>0000005110</UPC_Code>
<UPC_Code>1234567890</UPC_Code>
</Barcode>
</Item>
</Items>
Any ideas how to pull both UPC_Code fields out and write them to my database?
.at will always return a single element. To get an array of elements use xpath like you do to get the list of Item elements.
plu_import.barcodes = xml.xpath('//UPC_Code').map(&:content)
Thanks for all the great tips. It definitely led me in the right direction. The way that I got it to work was just adding a period before the double //.
plu_import.barcodes = xml.xpath('.//UPC_Code').map(&:content)
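The reason the leading period matters: an XPath starting with // searches from the document root even when it is called on a single Item, while .// searches only below the current node. Here is a small sketch of the difference. It uses REXML from the standard distribution rather than Nokogiri so it runs without gems, the second Item is invented to make the difference visible, and ITEMS_XML, ITEMS_DOC, FIRST_ITEM and upc_codes are names made up for the example.

```ruby
require 'rexml/document'

# Trimmed-down test file; the second <Item> is hypothetical.
ITEMS_XML = <<~XML
  <Items>
    <Item>
      <item_no>0000005110</item_no>
      <Barcode><UPC_Code>0000005110</UPC_Code><UPC_Code>1234567890</UPC_Code></Barcode>
    </Item>
    <Item>
      <item_no>0000005111</item_no>
      <Barcode><UPC_Code>9999999999</UPC_Code></Barcode>
    </Item>
  </Items>
XML

# Evaluates an XPath with the given node as context and returns the matched text values.
def upc_codes(node, path)
  REXML::XPath.match(node, path).map(&:text)
end

ITEMS_DOC  = REXML::Document.new(ITEMS_XML)
FIRST_ITEM = REXML::XPath.first(ITEMS_DOC, '//Item')

p upc_codes(FIRST_ITEM, '//UPC_Code')   # absolute: every UPC_Code in the document
p upc_codes(FIRST_ITEM, './/UPC_Code')  # relative: only those inside this Item
```

With a single Item in the file both forms happen to return the same codes, which is probably why the absolute version looks fine in a quick test.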

RSS feed parsing in Ruby using Nokogiri when the feed has `CDATA` tags [duplicate]

I have seen several things on this, but nothing has seemed to work so far. I am parsing an xml via a url using nokogiri on rails 3 ruby 1.9.2.
A snippet of the xml looks like this:
<NewsLineText>
<![CDATA[
Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.
]]>
</NewsLineText>
I am trying to parse this out to get the text associated with the NewsLineText
r = node.at_xpath('.//newslinetext') if node.at_xpath('.//newslinetext')
s = node.at_xpath('.//newslinetext').text if node.at_xpath('.//newslinetext')
t = node.at_xpath('.//newslinetext').content if node.at_xpath('.//newslinetext')
puts r
puts s.blank? ? 'NOTHING' : s
puts t.blank? ? 'NOTHING' : t
What I get in return is
<newslinetext></newslinetext>
NOTHING
NOTHING
So I know my tags are named/spelled correctly to get at the newslinetext data, but the cdata text never shows up.
What do I need to do with nokogiri to get this text?
You're trying to parse XML using Nokogiri's HTML parser. If node came from the XML parser then r would be nil, since XML is case sensitive; your r is not nil, so you're using the HTML parser, which is case insensitive.
Use Nokogiri's XML parser and you will get things like this:
>> r = doc.at_xpath('.//NewsLineText')
=> #<Nokogiri::XML::Element:0x8066ad34 name="NewsLineText" children=[#<Nokogiri::XML::Text:0x8066aac8 "\n ">, #<Nokogiri::XML::CDATA:0x8066a9c4 "\n Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.\n ">, #<Nokogiri::XML::Text:0x8066a8d4 "\n">]>
>> r.text
=> "\n \n Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.\n \n"
and you'll be able to get at the CDATA through r.text or r.children.
Ah I see. What @mu said is correct. But to get at the cdata directly, maybe:
xml =<<EOF
<NewsLineText>
<![CDATA[
Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.
]]>
</NewsLineText>
EOF
node = Nokogiri::XML xml
cdata = node.search('NewsLineText').children.find{|e| e.cdata?}
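Both points (case-sensitive element names in XML, and picking out the CDATA child) can be checked in a short sketch. It uses REXML from the standard distribution instead of Nokogiri so that it runs without gems, so treat it as an illustration of the idea rather than of Nokogiri's API. NEWS_XML, NEWS_DOC and cdata_text are names invented here.

```ruby
require 'rexml/document'

NEWS_XML = <<~XML
  <NewsLineText>
  <![CDATA[
  Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.
  ]]>
  </NewsLineText>
XML

NEWS_DOC = REXML::Document.new(NEWS_XML)

# Returns the stripped text of the CDATA children of the first element
# matching path, or nil when nothing matches.
def cdata_text(doc, path)
  element = REXML::XPath.first(doc, path)
  return nil if element.nil?

  element.cdatas.map(&:value).join.strip
end

p cdata_text(NEWS_DOC, '//newslinetext')  # lowercase name: no match in an XML parser
p cdata_text(NEWS_DOC, '//NewsLineText')  # exact case: the CDATA content
```

The strip is there because the CDATA section in the feed starts and ends with a newline.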

XML MarkupBuilder in Grails changing single quotes in attribute value to &apos;

I am using Groovy's XML MarkupBuilder to generate my XML. I have an attribute of a tag which has a single quote (') as part of its value, and when I set it in the code and do a printout, I see the generated XML has the single quote changed to &apos;.
Is this automatically converted to a single quote when I render this XML string in a GSP?
If not, how do I retain the single quote in the attribute value?
I tried to escape the single quote using \ but it still shows &apos; in the output log.
Here is the MarkupBuilder code I have:
xml.map(id:"worldmap",name:"worldmap"){
res_row.each{
area(shape:"circle",alt:it.key,title:it.key,onclick:"loadActivity(\'"+it.key+"\')")
}
}
the final attribute should be onclick="loadActivity('New York')"
Thanks
You can configure the MarkupBuilder to use double quotes:
xml.setDoubleQuotes(true)
complete example:
import groovy.xml.MarkupBuilder
def xml = new MarkupBuilder()
xml.setDoubleQuotes(true)
def res_row = [a:1, b:2]
def text= xml.map(id:"worldmap",name:"worldmap"){
res_row.each{
area(shape:"circle",alt:it.key,title:it.key,onclick:"loadActivity('${it.key}')")
}
}
println text
prints:
<map id="worldmap" name="worldmap">
<area shape="circle" alt="a" title="a" onclick="loadActivity('a')" />
<area shape="circle" alt="b" title="b" onclick="loadActivity('b')" />

Resources