Illegal quoting on line using FasterCSV in ruby 1.8.7 - ruby-on-rails

I am facing "Illegal quoting" error when parse the content from SQL dump and the dump file is in the format of TXT with tab (\t) separator.
require 'rubygems'
require 'faster_csv'
begin
FasterCSV.foreach(excel_file, :quote_char => '"',:col_sep =>'\t', :row_sep =>:auto, :headers => :first_row) do |row|
col= row.to_s.split(/\t/)
if col[3]!="" or !col[3].empty?
color_value=col[3].to_s.capitalize
#Inser Color
color=Color.find_or_create_by_name(:name=>color_value)
elsif col[3].empty?
color_id= nil
end
end
rescue Exception => e
puts e
end
The program executed and run successfully but there is an invalid data present like
below (#font-face ...) mean execution terminated with error of "Illegal quoting on line 3.
ID Name code comments
1 white 234 good
2 Black 222
3 red 343 #font-face { font-family: "Verdana"; .....}
Can any one suggest me how to skip when invalid data occurs in column ?
Thanks in advance.

I'm not sure if this will solve the error you are seeing, but you need to use double quotes around escaped characters, e.g.:
:col_sep => "\t"

FasterCSV isn't very kind to badly formatted data.
I don't know that there is a solution for this.
However - if your example file doesn't actually contain any quoting using "
then perhaps just use a different quot_char (eg ')

You can use the ASCII code for the NULL character -- \0x00 -- as such:
FasterCSV.foreach(excel_file, :quote_char => '\0x00',:col_sep =>'\t', :row_sep =>:auto, :headers => :first_row) do |row|
...
end
You can find a chart of some ASCII chars here: http://www.bluesock.org/~willg/dev/ascii.html

Related

Ruby - checking if file is a CSV

I have just wrote a code where I get a csv file passed in argument and treat it line by line ; so far, everything is okay. Now, I would like to secure my code by making sure that what we receive in argument is a .csv file.
I saw in the Ruby doc that it exist a == "--file" option but using it generate an error : the way I understood it, it seems this option only work for the txt files.
Is there a method specific that allowed to check if my file is a csv ? Here some of my code :
if ARGV.empty?
puts "j'ai rien reçu"
# option to check, don't work
elsif ARGV[0].shift == "--file"
# my code so far, whithout checking
else CSV.foreach(ARGV.shift) do |row|
etc, etc...
I think it is unpossible to make a real safe test without additional information.
Just some notes what you can do:
You get a filename in a variable filename.
First, check if it is a file:
File.exist?
Then you could check, if the encoding is correct:
raise "Wrong encoding" unless content.valid_encoding?
Has your csv always the same number of columns? And do you have only one liner?
This can be a possibility to make the next check:
content.each_line{|line|
return false if line.count(sep) < columns - 1
}
This check can be modified for your case, e.g. if you have always an exact number of rows.
In total you can define something like:
require 'csv'
#columns defines the expected numer of columns per line
def csv?(filename, sep: ';', columns: 3)
return false unless File.exist?(filename) #"No file"
content = File.read(filename, :encoding => 'utf-8')
return false unless content.valid_encoding? #"Wrong encoding"
content.each_line{|line|
return false if line.count(sep) < columns - 1
}
CSV.parse(content, :col_sep => sep)
end
if csv = csv?('test.csv')
csv.each do |row|
p row
end
end
You can use ruby-filemagic gem
gem install ruby-filemagic
Usage:
$ irb
irb(main):001:0> require 'filemagic'
=> true
irb(main):002:0> fm = FileMagic.new
=> #<FileMagic:0x7fd4afb0>
irb(main):003:0> fm.file('foo.zip')
=> "Zip archive data, at least v2.0 to extract"
irb(main):004:0>
https://github.com/ricardochimal/ruby-filemagic
Use File.extname() to check the origin file
File.extname("test.rb") #=> ".rb"

Ruby CSV File Parsing, Headers won't format?

My rb file reads:
require "csv"
puts "Program1 initialized."
contents = CSV.open "data.csv", headers: true
contents.each do |row|
name = row[4]
puts name
end
...but when i run it in ruby it wont load the program. it gives me the error message about the headers:
syntax error, unexpected ':', expecting $end
contents = CSV.open "data.csv", headers: true
so I'm trying to figure out, why won't ruby let me parse this file? I've tried using other csv files I have and it won't load, and gives me an error message. I'm trying just to get the beginning of the program going! I feel like it has to do with the headers. I've updated as much as I can, mind you I'm using ruby 1.8.7. I read somewhere else that I could try to run the program in irb but it didn't seem like it needed it. so yeah... thank you in advance!!!!
Since you are using this with Ruby 1.8.7, :headers => true won't work in this way.
The simplest way to ignore the headers and get your data is to shift the first row in the data, which would be the headers:
require 'csv'
contents = CSV.open("data.csv", 'r')
contents.shift
contents.each do |row|
name = row[4]
puts name
end
If you do want to use the syntax with headers in ruby 1.8, you would need to use FasterCSV, something similar to this:
require 'fastercsv'
FasterCSV.foreach("data.csv", :headers => true) do |fcsv_obj|
puts fcsv_obj['name']
end
(Refer this question for further read: Parse CSV file with header fields as attributes for each row)

Rails: image_tag helper / umlaut in file name throws error in production

I am uploading an image with a file name containing an umlaut via dragonfly in a Rails 3 app on Heroku. Then I'm trying to display the image using
image_tag #model.image.url, …
In development everything works just fine, but in production I'm getting:
incompatible character encodings: UTF-8 and ASCII-8BIT
.bundle/gems/ruby/1.9.1/gems/actionpack-3.0.7/lib/action_view/helpers/tag_helper.rb:129:in `*'
After reading a bit I've added
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8
in environment.rb but the problem remains.
What is the proper way to go about this? Do I have to fix the file name when uploading? I was under the impression this should work just fine in Rails 3?
Well, you could try something like url.force_encoding('utf8')
You could also simply sanitize the url in the model before saving it to the database - that's what I did. And, yes, I sometimes stumble over this in the weirdest places, too.
This is what my model looked like:
# encoding: UTF-8
class Page < ActiveRecord::Base
before_save :sanitize_title
private
def sanitize_title
self.title = self.title.force_encoding('UTF-8').downcase.gsub(/[ \-äöüß]/, ' ' => '_', '-' => '_', 'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss').gsub(/[^a-z_]/,'')
end
end
This will replace the German umlaute with their ASCII counterparts, convert spaces to underscores and drop everything else.
The first line # encoding: UTF-8 is important or ruby will complain of non-ASCII characters in the model.rb file...
In addition to #Rhywden's answer, here my solution specific for Dragonfly:
image_accessor :image do :after_assign
after_assign{|i| i.name = sanitize_filename(image.name) }
end
def sanitize_filename(filename)
filename.strip.tap do |name|
name.sub! /\A.*(\\|\/)/, ''
name.gsub! /[^\w\.\-]/, '_'
end
end
Details here http://markevans.github.com/dragonfly/file.Models.html and here http://guides.rubyonrails.org/security.html#file-uploads .

Parse a malformed CSV line

I am parsing the following CSV lines. I need to rescue malformed lines that look like "Malformed" below. What is a regular expression I can use to do this? What considerations do I need to make?
body = %(
"Sensitive",2416,159,"Test "Malformed" Failure",2789,111,7-24-11,1800,0600,"R2","12323","",""
"Sensitive",2742,107,"Test",2791,112,7-24-11,1800,0600,"R1","","",""
"Sensitive",2700,135,"Test",2792,113,7-24-11,1800,0600,"R1","12110","","")
rows = []
body.each_line do |line|
begin
rows << FasterCSV.parse_line(line)
rescue FasterCSV::MalformedCSVError => e
rows << line if rescue_from_malformed_line(line)
rescue => e
Rails.logger.error(e.to_s)
Rails.logger.info(line)
end
end
I am not sure how malformed your data is malformed, but here is one approach for that line.
> puts line
"Sensitive",2416,159,"Test "Malformed" Failure",2789,111,7-24-11,1800,0600,"R2","12323","",""
>
> puts line.scan /[\d.-]+|(?:"[^"]*"[^",]*)+/
"Sensitive"
2416
159
"Test "Malformed" Failure"
2789
111
7-24-11
1800
0600
"R2"
"12323"
""
""
Note: Tested on ruby 1.9.2p290
You could use a regex to replace the nested double quotes with single quotes before it's passed to the parser.
Something like
.gsub(/(?<!^|,)"(?!,|$)/,"'")

How to change the encoding during CSV parsing in Rails

I would like to know how can I change the encoding of my CSV file when I import it and parse it. I have this code:
csv = CSV.parse(output, :headers => true, :col_sep => ";")
csv.each do |row|
row = row.to_hash.with_indifferent_access
insert_data_method(row)
end
When I read my file, I get this error:
Encoding::CompatibilityError in FileImportingController#load_file
incompatible character encodings: ASCII-8BIT and UTF-8
I read about row.force_encoding('utf-8') but it does not work:
NoMethodError in FileImportingController#load_file
undefined method `force_encoding' for #<ActiveSupport::HashWithIndifferentAccess:0x2905ad0>
Thanks.
I had to read CSV files encoded in ISO-8859-1.
Doing the documented
CSV.foreach(filename, encoding:'iso-8859-1:utf-8', col_sep: ';', headers: true) do |row|
threw the exception
ArgumentError: invalid byte sequence in UTF-8
from csv.rb:2027:in '=~'
from csv.rb:2027:in 'init_separators'
from csv.rb:1570:in 'initialize'
from csv.rb:1335:in 'new'
from csv.rb:1335:in 'open'
from csv.rb:1201:in 'foreach'
so I ended up reading the file and converting it to UTF-8 while reading, then parsing the string:
CSV.parse(File.open(filename, 'r:iso-8859-1:utf-8'){|f| f.read}, col_sep: ';', headers: true, header_converters: :symbol) do |row|
pp row
end
force_encoding is meant to be run on a string, but it looks like you're calling it on a hash. You could say:
output.force_encoding('utf-8')
csv = CSV.parse(output, :headers => true, :col_sep => ";")
...
Hey I wrote a little blog post about what I did, but it's slightly more verbose than what's already been posted. For whatever reason, I couldn't get those solutions to work and this did.
This gist is that I simply replace (or in my case, remove) the invalid/undefined characters in my file then rewrite it. I used this method to convert the files:
def convert_to_utf8_encoding(original_file)
original_string = original_file.read
final_string = original_string.encode(invalid: :replace, undef: :replace, replace: '') #If you'd rather invalid characters be replaced with something else, do so here.
final_file = Tempfile.new('import') #No need to save a real File
final_file.write(final_string)
final_file.close #Don't forget me
final_file
end
Hope this helps.
Edit: No destination encoding is specified here because encode assumes that you're encoding to your default encoding which for most Rails applications is UTF-8 (I believe)

Resources