REXML::Document.new take a simple string as good doc? - ruby-on-rails

I would like to check if the xml is valid. So, here is my code
require 'rexml/document'
begin
def valid_xml?(xml)
REXML::Document.new(xml)
rescue REXML::ParseException
return nil
end
bad_xml_2=%{aasdasdasd}
if(valid_xml?(bad_xml_2) == nil)
puts("bad xml")
raise "bad xml"
end
puts("good_xml")
rescue Exception => e
puts("exception" + e.message)
end
and it returns good_xml as result. Did I do something wrong? It will return bad_xml if the string is
bad_xml = %{
<tasks>
<pending>
<entry>Grocery Shopping</entry>
<done>
<entry>Dry Cleaning</entry>
</tasks>}

Personally, I'd recommend using Nokogiri, as it's the defacto standard for XML/HTML parsing in Ruby. Using it to parse a malformed document:
require 'nokogiri'
doc = Nokogiri::XML('<xml><foo><bar></xml>')
doc.errors # => [#<Nokogiri::XML::SyntaxError: Opening and ending tag mismatch: bar line 1 and xml>, #<Nokogiri::XML::SyntaxError: Premature end of data in tag foo line 1>, #<Nokogiri::XML::SyntaxError: Premature end of data in tag xml line 1>]
If I parse a document that is well-formed:
doc = Nokogiri::XML('<xml><foo/><bar/></xml>')
doc.errors # => []

REXML treats a simple string as a valid XML with no root node:
xml = REXML::Document.new('aasdasdasd')
# => <UNDEFINED> ... </>
It does not however treat illegal XML (with mismatching tags, for example) as a valid XML, and throws an exception.
REXML::Document.new(bad_xml)
# REXML::ParseException: #<REXML::ParseException: Missing end tag for 'done' (got "tasks")
It is missing an end-tag to <done> - so it is not valid.

Related

Ruby - checking if file is a CSV

I have just wrote a code where I get a csv file passed in argument and treat it line by line ; so far, everything is okay. Now, I would like to secure my code by making sure that what we receive in argument is a .csv file.
I saw in the Ruby doc that it exist a == "--file" option but using it generate an error : the way I understood it, it seems this option only work for the txt files.
Is there a method specific that allowed to check if my file is a csv ? Here some of my code :
if ARGV.empty?
puts "j'ai rien reçu"
# option to check, don't work
elsif ARGV[0].shift == "--file"
# my code so far, whithout checking
else CSV.foreach(ARGV.shift) do |row|
etc, etc...
I think it is unpossible to make a real safe test without additional information.
Just some notes what you can do:
You get a filename in a variable filename.
First, check if it is a file:
File.exist?
Then you could check, if the encoding is correct:
raise "Wrong encoding" unless content.valid_encoding?
Has your csv always the same number of columns? And do you have only one liner?
This can be a possibility to make the next check:
content.each_line{|line|
return false if line.count(sep) < columns - 1
}
This check can be modified for your case, e.g. if you have always an exact number of rows.
In total you can define something like:
require 'csv'
#columns defines the expected numer of columns per line
def csv?(filename, sep: ';', columns: 3)
return false unless File.exist?(filename) #"No file"
content = File.read(filename, :encoding => 'utf-8')
return false unless content.valid_encoding? #"Wrong encoding"
content.each_line{|line|
return false if line.count(sep) < columns - 1
}
CSV.parse(content, :col_sep => sep)
end
if csv = csv?('test.csv')
csv.each do |row|
p row
end
end
You can use ruby-filemagic gem
gem install ruby-filemagic
Usage:
$ irb
irb(main):001:0> require 'filemagic'
=> true
irb(main):002:0> fm = FileMagic.new
=> #<FileMagic:0x7fd4afb0>
irb(main):003:0> fm.file('foo.zip')
=> "Zip archive data, at least v2.0 to extract"
irb(main):004:0>
https://github.com/ricardochimal/ruby-filemagic
Use File.extname() to check the origin file
File.extname("test.rb") #=> ".rb"

Open filename with embedded spaces with Roo

Ruby 2.0.0, Rails 4.0.3, Windows 8.1 Update, Roo 1.13.2
I am trying to open an Excel spreadsheet with embedded spaces using Roo. So far, I am unable to do that. I don't really know if this problem is restricted to Roo. If I rename it to eliminate the spaces, I have no problem with it. I tried encoding it, but then it simply said the file doesn't exist. Can I open the file while it contains spaces?
Code sample:
exceptions = [URI::InvalidURIError, IOError]
puts "f is #{f}"
puts "f exist? #{File.exist?(f)}"
begin
xls = Roo::Spreadsheet.open(f)
rescue *exceptions => e
puts e.message
end
encoded_f = URI.encode(f).to_s
puts "encoded_f is #{encoded_f}"
puts "encoded_f exist? #{File.exist?(encoded_f)}"
begin
xls = Roo::Spreadsheet.open(encoded_f)
rescue *exceptions => e
puts e.message
end
gsub_f = f.gsub(" ", "") # Rename file without spaces
File.rename(f, gsub_f)
puts "gsub_f is #{gsub_f}"
puts "gsub_f exist? #{File.exist?(gsub_f)}"
begin
xls = Roo::Spreadsheet.open(gsub_f)
rescue *exceptions => e
puts e.message
end
Output sample:
f is Whitt Report 2014-07-28-0803.xls
f exist? true
bad URI(is not URI?): Whitt Report 2014-07-28-0803.xls
encoded_f is Whitt%20Report%202014-07-28-0803.xls
encoded_f exist? false
file Whitt%20Report%202014-07-28-0803.xls does not exist
gsub_f is WhittReport2014-07-28-0803.xls
gsub_f exist? true
No message is given in the end because the file opens successfully.
This is caused by the way in which the URI module is called in the Roo::Spreadsheet#open method.
I posted a fix to this problem which has now been merged. If you update your Roo gem you should no longer have this issue.

ruby net_dav sample puts

I'm trying to put a file on a site with WEB_DAV. (a ruby gem)
When I follow the example, I get a nil exception
#### GEMS
require 'rubygems'
begin
gem "net_dav"
rescue LoadError
system("gem install net_dav")
Gem.clear_paths
end
require 'net/dav'
uri = URI('https://staging.web.mysite');
user = "dave"
pasw = "correcthorsebatterystaple"
dav = Net::DAV.new(uri, :curl => false)
dav.verify_server = false
dav.credentials(user, pasw)
cargo = ("testing.txt")
File.open(cargo, "rb") { |stream|
dav.put(urI.path +'/'+ cargo, stream, File.size(cargo))
}
when I run this I get
`digest_auth': can't convert nil into String (TypeError)
this relates to line 197 in my nav.rb file.
request_digest << ':' << params['nonce']
So what I'm wondering is what step did I not add?
Is there a reasonable example of the correct use of this gem? Something that does something that works would be sweet :)
SIDE QUESTION: Is this the correct gem to use to do web_DAV? It seems an old unmaintained gem, perhaps there's something used by more to accomplish the task?
Try referencing the hash with a symbol rather than a string, i.e.
request_digest << ':' << params[:nonce]
In a simple test
baz = "baz"
params = {:foo => "bar"}
baz << ':' << params['foo']
results in the same error as you're getting.

Parse a malformed CSV line

I am parsing the following CSV lines. I need to rescue malformed lines that look like "Malformed" below. What is a regular expression I can use to do this? What considerations do I need to make?
body = %(
"Sensitive",2416,159,"Test "Malformed" Failure",2789,111,7-24-11,1800,0600,"R2","12323","",""
"Sensitive",2742,107,"Test",2791,112,7-24-11,1800,0600,"R1","","",""
"Sensitive",2700,135,"Test",2792,113,7-24-11,1800,0600,"R1","12110","","")
rows = []
body.each_line do |line|
begin
rows << FasterCSV.parse_line(line)
rescue FasterCSV::MalformedCSVError => e
rows << line if rescue_from_malformed_line(line)
rescue => e
Rails.logger.error(e.to_s)
Rails.logger.info(line)
end
end
I am not sure how malformed your data is malformed, but here is one approach for that line.
> puts line
"Sensitive",2416,159,"Test "Malformed" Failure",2789,111,7-24-11,1800,0600,"R2","12323","",""
>
> puts line.scan /[\d.-]+|(?:"[^"]*"[^",]*)+/
"Sensitive"
2416
159
"Test "Malformed" Failure"
2789
111
7-24-11
1800
0600
"R2"
"12323"
""
""
Note: Tested on ruby 1.9.2p290
You could use a regex to replace the nested double quotes with single quotes before it's passed to the parser.
Something like
.gsub(/(?<!^|,)"(?!,|$)/,"'")

Illegal quoting on line using FasterCSV in ruby 1.8.7

I am facing "Illegal quoting" error when parse the content from SQL dump and the dump file is in the format of TXT with tab (\t) separator.
require 'rubygems'
require 'faster_csv'
begin
FasterCSV.foreach(excel_file, :quote_char => '"',:col_sep =>'\t', :row_sep =>:auto, :headers => :first_row) do |row|
col= row.to_s.split(/\t/)
if col[3]!="" or !col[3].empty?
color_value=col[3].to_s.capitalize
#Inser Color
color=Color.find_or_create_by_name(:name=>color_value)
elsif col[3].empty?
color_id= nil
end
end
rescue Exception => e
puts e
end
The program executed and run successfully but there is an invalid data present like
below (#font-face ...) mean execution terminated with error of "Illegal quoting on line 3.
ID Name code comments
1 white 234 good
2 Black 222
3 red 343 #font-face { font-family: "Verdana"; .....}
Can any one suggest me how to skip when invalid data occurs in column ?
Thanks in advance.
I'm not sure if this will solve the error you are seeing, but you need to use double quotes around escaped characters, e.g.:
:col_sep => "\t"
FasterCSV isn't very kind to badly formatted data.
I don't know that there is a solution for this.
However - if your example file doesn't actually contain any quoting using "
then perhaps just use a different quot_char (eg ')
You can use the ASCII code for the NULL character -- \0x00 -- as such:
FasterCSV.foreach(excel_file, :quote_char => '\0x00',:col_sep =>'\t', :row_sep =>:auto, :headers => :first_row) do |row|
...
end
You can find a chart of some ASCII chars here: http://www.bluesock.org/~willg/dev/ascii.html

Resources