The document "doc.xml" does not have a valid root (REXML::ParseException) - ruby-on-rails

I'm trying to convert an XML document into a Ruby hash for the first time, and having no success. I have my XML document, doc.xml, in a folder along with my script hashrunner.rb.
In hashrunner.rb:
require 'active_support/core_ext/hash'
hash = Hash.from_xml("doc.xml")
puts hash
The first line of the XML document is <?xml version="1.0" encoding="US-ASCII"?>, if that is helpful.
In my console, when I run ruby hashrunner.rb, I get the error message:
/Users/me/.rvm/gems/ruby-1.9.3-p374/gems/activesupport-4.0.0/lib/active_support/xml_mini/rexml.rb:34:in `parse':The document "doc.xml" does not have a valid root (REXML::ParseException)
As someone relatively new to Ruby, I don't understand what this means, and some internet searching didn't turn up an explanation, either. To start, I'm not even sure if I'm calling the XML file correctly in the from_xml method, so please let me know if that's the case. I'd be open to using different gems or a different approach if that would help.

I'm pretty sure Hash::from_xml has to take an XML string, not a filename string. Try:
hash = Hash.from_xml(File.read("doc.xml"))
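To see why passing the filename fails, here is a minimal sketch that uses REXML directly, since that is the backend named in the stack trace above. The helper name root_name and the sample XML are made up for illustration, and it assumes the rexml library is available in your Ruby install. Depending on the REXML version, text with no root element either yields a document whose root is nil or raises a parse error, so the helper treats both the same way.

```ruby
require 'rexml/document'
require 'tempfile'

# Hypothetical helper: returns the root element's name, or nil when the
# text has no root element or cannot be parsed as XML at all.
def root_name(text)
  root = REXML::Document.new(text).root
  root && root.name
rescue REXML::ParseException
  nil
end

# Write a made-up sample document to a temp file so the sketch is self-contained.
file = Tempfile.new(['doc', '.xml'])
file.write(%(<?xml version="1.0" encoding="UTF-8"?>\n<note><to>Ann</to></note>\n))
file.close

from_name     = root_name(file.path)            # parses the path text itself
from_contents = root_name(File.read(file.path)) # parses what is inside the file

puts from_name.inspect
puts from_contents.inspect
```

The first call hands the parser nothing but the path string, which is the same situation as Hash.from_xml("doc.xml"); the second hands it the actual markup.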

Related

Take a string representation of JSON and render it as JSON in the browser

I'm playing around with the Twitter API, and have gotten back a super long JSON response. I saved the response as a string in a separate file, and I want to have Chrome display that string as JSON, so I can collapse/expand the nested parts in JSON view.
I feel like there should be an easier way to do this rather than temporarily changing my api controller in Rails...any suggestions? This is for a Rails 4 app using Backbone.js in the front end.
Ah, stupid mistake on my part: I was already using one of the Chrome extensions mentioned here, JSONView, and asked this question after being surprised that it wasn't working.
It wasn't working because the contents of the file were not actually JSON; they were the printed form of a Ruby hash.
I was able to fix it by replacing this:
File.open('exampleResponse', 'w') do |file|
  file.write(Twitter::SearchResults.new(request).attrs)
end
with this:
File.open('exampleResponse', 'w') do |file|
  file.write(Twitter::SearchResults.new(request).attrs.to_json)
end
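The difference is easy to reproduce without the Twitter gem. This is just a sketch: the attrs hash below is a made-up stand-in for what Twitter::SearchResults#attrs returns, and valid_json? is a hypothetical helper. Writing a hash to a file calls to_s on it, which produces Ruby's "=>" notation rather than JSON.

```ruby
require 'json'

# Made-up stand-in for the hash returned by Twitter::SearchResults#attrs.
attrs = { "statuses" => [{ "text" => "hello world", "retweeted" => false }] }

as_ruby = attrs.to_s     # Ruby hash notation with "=>", which is not JSON
as_json = attrs.to_json  # valid JSON text

# Hypothetical helper: true when the text parses as JSON.
def valid_json?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

puts valid_json?(as_ruby)
puts valid_json?(as_json)
puts JSON.pretty_generate(attrs) # indented form, readable even without an extension
```

If you only want to read the file in an editor, JSON.pretty_generate may be enough on its own.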
There are many Chrome extensions for viewing formatted JSON. I use JSONView and it works fine, but I imagine there are dozens if not hundreds to choose from.

How to parse Nokogiri/libXML XML errors to human-friendly errors?

We are using Nokogiri to validate XML files against an XSD. The problem is that the error messages Nokogiri generates are not very friendly and are very hard to translate:
"Element '{http://www.portalfiscal.inf.br/nfe}infNFe': The attribute 'Id' is required but missing."
Does anyone know of a parser or any other way to capture the info needed from the error to generate a more human friendly error?
Until then, we will be doing a custom parser for them... ouch!
I created a gem for this that is now open source: https://rubygems.org/gems/xml_errors_parser
It seems to work pretty well so far, but it only recognizes a small number of errors for now. It is, however, very easy to add new errors, so we will be adding them as needed.
Code reviews and pull requests are always great :)

Ruby on rails string parsing

I have a string that is a bunch of XML tags.
Basically, I want the contents of one tag and want to ignore everything else.
The input would look like:
<Some><XML><stuff>
<title type='text'>key</title>
<Some><other><XML><stuff>
The output would look like:
key
I'm not sure an XML parser is appropriate, since there doesn't seem to be much structure to this particular XML.
Can a regex do this in RoR, or is regex matching in Ruby on Rails only a true-or-false test?
Thanks so much!
Cheers,
Zigu
No. If your source might not be strictly valid XML, I strongly suggest using Nokogiri.
Handle the source as an HTML document and extract the info you need this way:
require 'nokogiri'
doc = Nokogiri::HTML("Your string with <key>some value</key>")
doc.search('key').each do |value|
  puts value.content # do whatever you want
end
Here's why you don't parse xml with regexen: RegEx match open tags except XHTML self-contained tags

In Ruby on Rails, how can I convert html to word?

How can I convert HTML to Word?
Thanks.
I have created a Ruby HTML-to-Word gem that should help you do just that. You can check it out at https://github.com/nickfrandsen/htmltoword - you simply pass it an HTML string and it will create a corresponding Word docx file.
def show
  respond_to do |format|
    format.docx do
      file = Htmltoword::Document.create params[:docx_html_source], "file_name.docx"
      send_file file.path, :disposition => "attachment"
    end
  end
end
Hope you find it helpful.
I am not aware of any solution which does this, i.e. convert HTML to Word format. If you literally mean that, you will have to parse the HTML document first using something like Nokogiri. If you mean you want to output data persisted in your model objects, there is obviously no need to parse HTML! As far as outputting to Word, I'm afraid it looks as if you will have to directly interface with a running instance of Microsoft Word via OLE!
A quick google search for win32ole ruby word will get you started:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/241606
Good luck!
I agree with CodeJoust that it is better to generate a PDF. However, if you really need to generate a Word document then you can do the following:
If your server is a Windows machine, you can install Office on it and use Ruby's OLE binding to generate the Word document in the public folder, then deliver the file in the response.
To use ruby's OLE binding, see the "Programming Ruby" ebook that comes with the one-click ruby installer for Windows. You may have to use custom logic to convert from HTML to Word unless you can find a function in the OLE api of Word to do that.
http://prawn.majesticseacreature.com/
You could allow the user to download a PDF or an .html file instead, because there aren't any helpful Ruby libraries for producing Word documents. You're better off generating a 'printable and downloadable' version, without much styling, and/or a PDF version using a library like Prawn.
You could always generate a simple .rtf file; I think Word will be pretty happy reading that...
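For the .rtf route, something like the following might be enough for plain paragraphs. It is only a sketch: rtf_escape and rtf_document are made-up helper names, the font and size are arbitrary choices, and it assumes ASCII-only text with no formatting carried over from the HTML.

```ruby
require 'tmpdir'

# Hypothetical helper: escapes the three characters RTF treats specially.
def rtf_escape(text)
  text.gsub(/[\\{}]/) { |ch| "\\#{ch}" }
end

# Hypothetical helper: wraps plain-text paragraphs in a minimal RTF document.
# Assumes ASCII-only input; other characters would need RTF escapes.
def rtf_document(paragraphs)
  body = paragraphs.map { |para| "#{rtf_escape(para)}\\par" }.join("\n")
  "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\n\\f0\\fs24\n#{body}\n}"
end

rtf = rtf_document(["Quarterly report", "Totals are in {braces}."])

# Written to the temp directory here; a Rails app would pick its own location.
path = File.join(Dir.tmpdir, "report.rtf")
File.write(path, rtf)
puts path
```

In a controller, the resulting file could then be delivered with send_file, the same way as any other download.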

How to use ruby to get string between HTML <cite> tags?

Greetings everyone:
I would love to get some information from a huge collection of Google Search Result pages.
The only thing I need is the URLs inside a bunch of <cite></cite> HTML tags.
I couldn't find a proper way to handle this with other tools, so now I am moving to Ruby.
This is so far what I have written:
require 'net/http'
require 'uri'
url = URI.parse('http://www.google.com.au')
res = Net::HTTP.start(url.host, url.port) { |http|
  http.get('/#hl=en&q=helloworld')
}
puts res.body
Unfortunately I cannot use the recommended hpricot ruby gem (because it misses a make command or something?)
So I would like to stick with this approach.
Now that I can get the response body as a string, the only thing I need is to retrieve whatever is inside the <cite> HTML tags.
How should I do that? Using a regular expression? Can anyone give me an example?
Here's one way to do it using Nokogiri:
Nokogiri::HTML(res.body).css("cite").map {|cite| cite.content}
I think this will solve it:
# res is the response object, so scan its body string:
res.body.scan(/<cite>([^<>]*)<\/cite>/imu).flatten
# This one ignores empty tags:
res.body.scan(/<cite>([^<>]*)<\/cite>/imu).flatten.reject { |x| x.empty? }
If you're having problems with hpricot, you could also try nokogiri which is very similar, and allows you to do the same things.
Split the string on the tag you want. Assuming only one instance of tag (or specify only one split) you'll have two pieces I'll call head and tail. Take tail and split it on the closing tag (once), so you'll now have two pieces in your new array. The new head is what was between your tags, and the new tail is the remainder of the string, which you may process again if the tag could appear more than once.
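An example that may not be exactly correct, but you get the idea (a limit of 2 makes split stop at the first match and return two pieces):
head1, tail1 = str.split('<tag>', 2) # head1 is the text before the opening tag
head2, tail2 = tail1.split('</tag>', 2) # head2 is the text between the tags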
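The split idea can be repeated in a loop to pick up every occurrence. Here is a minimal sketch with a made-up helper name, between_tags, and made-up sample markup. It passes a limit of 2 to split, because a limit of 1 returns the whole string as a single piece. It matches the tag strings literally, so an opening tag with attributes would not be found.

```ruby
# Hypothetical helper: collects the text between each open/close tag pair.
def between_tags(str, open_tag, close_tag)
  found = []
  rest = str
  loop do
    # Limit 2: the text before the first opening tag, and everything after it.
    _head, tail = rest.split(open_tag, 2)
    break if tail.nil?                      # no opening tag left

    # Limit 2 again: the text inside the tag, and the remainder to process.
    inner, rest = tail.split(close_tag, 2)
    break if rest.nil?                      # opening tag was never closed

    found << inner
  end
  found
end

html = "<p><cite>www.example.com/a</cite></p><p><cite>www.example.org/b</cite></p>"
urls = between_tags(html, "<cite>", "</cite>")
puts urls
```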
