Ruby CSV File Parsing, Headers won't format? - ruby-on-rails

My rb file reads:
require "csv"
puts "Program1 initialized."
contents = CSV.open "data.csv", headers: true
contents.each do |row|
name = row[4]
puts name
end
...but when i run it in ruby it wont load the program. it gives me the error message about the headers:
syntax error, unexpected ':', expecting $end
contents = CSV.open "data.csv", headers: true
so I'm trying to figure out, why won't ruby let me parse this file? I've tried using other csv files I have and it won't load, and gives me an error message. I'm trying just to get the beginning of the program going! I feel like it has to do with the headers. I've updated as much as I can, mind you I'm using ruby 1.8.7. I read somewhere else that I could try to run the program in irb but it didn't seem like it needed it. so yeah... thank you in advance!!!!

Since you are using this with Ruby 1.8.7, :headers => true won't work in this way.
The simplest way to ignore the headers and get your data is to shift the first row in the data, which would be the headers:
require 'csv'
contents = CSV.open("data.csv", 'r')
contents.shift
contents.each do |row|
name = row[4]
puts name
end
If you do want to use the syntax with headers in ruby 1.8, you would need to use FasterCSV, something similar to this:
require 'fastercsv'
FasterCSV.foreach("data.csv", :headers => true) do |fcsv_obj|
puts fcsv_obj['name']
end
(Refer this question for further read: Parse CSV file with header fields as attributes for each row)

Related

Invalid Byte Sequence in UTF-8 from Excel file

(Ruby 2.5) I have a method that reads and parses a csv file that's being uploaded via Alchemy CMS
def process_csv(csv_file, current_user_id, original_filename)
lock_importer
errors = []
index = 0
string_converter = lambda { |field| field.strip }
total = CSV.foreach(csv_file, headers: true).count
csv_string = csv_file.read.encode!("UTF-8", "iso-8859-1", invalid: :replace)
CSV.parse(csv_string, headers: true, header_converters: :symbol, skip_blanks: true, converters: [string_converter] ) do |row|
# do other stuff
end
but when I try to upload a csv file that has a column (name) with a string that contains special characters then I receive the Invalid Byte Sequence in UTF-8 error. I'm trying to test the value N'öt Réal Stô'rë.
I've tried a few solutions that I found on the web but no luck - any suggestions?
It's unclear what your csv_fileis. I guess it is a File-object.
Sometimes I got csv from Excel as a UTF-16. So let's try an example:
I have a csv-file stored in UTF-16BE with the following content:
line;comment;UmlautÄ
1;Das ist UTF-16 BE;Ä
2;öüäÖÄÜ;Ä
If I execute the following code:
require 'csv'
def process_csv(csv_file)
csv_string = csv_file.read#.encode!("UTF-8", "iso-8859-1", invalid: :replace)
CSV.parse(csv_string, headers: true, skip_blanks: true, col_sep: ';') do |row|
p row # do other stuff
end
end
process_csv(File.open('example_utf16BE.txt'))
then I get also a Invalid byte sequence in UTF-8-error.
If I use
process_csv(File.open('example_utf16BE.txt', 'rb', encoding: 'BOM|utf-16BE'))
then everything works.
So I guess, you get a File-object in a wron encoding and the code csv_file.read.encode!("UTF-8", "iso-8859-1", invalid: :replace) is a code part to repair this problem.
What you can do:
Add to you code:
p csv_file
p csv_file.external_encoding
You should get
#<File:example_utf16BE.txt>
#<Encoding:UTF-16BE>
Now check, if the file (in my example: example_utf16BE.txt has really the encoding of the 2nd line.
If not, try to adapt the File-object creation.
If this is not possible, then you can try to use csv_file.set_encoding 'utf-8' to change the encoding before you read the content.

Ruby - checking if file is a CSV

I have just wrote a code where I get a csv file passed in argument and treat it line by line ; so far, everything is okay. Now, I would like to secure my code by making sure that what we receive in argument is a .csv file.
I saw in the Ruby doc that it exist a == "--file" option but using it generate an error : the way I understood it, it seems this option only work for the txt files.
Is there a method specific that allowed to check if my file is a csv ? Here some of my code :
if ARGV.empty?
puts "j'ai rien reçu"
# option to check, don't work
elsif ARGV[0].shift == "--file"
# my code so far, whithout checking
else CSV.foreach(ARGV.shift) do |row|
etc, etc...
I think it is unpossible to make a real safe test without additional information.
Just some notes what you can do:
You get a filename in a variable filename.
First, check if it is a file:
File.exist?
Then you could check, if the encoding is correct:
raise "Wrong encoding" unless content.valid_encoding?
Has your csv always the same number of columns? And do you have only one liner?
This can be a possibility to make the next check:
content.each_line{|line|
return false if line.count(sep) < columns - 1
}
This check can be modified for your case, e.g. if you have always an exact number of rows.
In total you can define something like:
require 'csv'
#columns defines the expected numer of columns per line
def csv?(filename, sep: ';', columns: 3)
return false unless File.exist?(filename) #"No file"
content = File.read(filename, :encoding => 'utf-8')
return false unless content.valid_encoding? #"Wrong encoding"
content.each_line{|line|
return false if line.count(sep) < columns - 1
}
CSV.parse(content, :col_sep => sep)
end
if csv = csv?('test.csv')
csv.each do |row|
p row
end
end
You can use ruby-filemagic gem
gem install ruby-filemagic
Usage:
$ irb
irb(main):001:0> require 'filemagic'
=> true
irb(main):002:0> fm = FileMagic.new
=> #<FileMagic:0x7fd4afb0>
irb(main):003:0> fm.file('foo.zip')
=> "Zip archive data, at least v2.0 to extract"
irb(main):004:0>
https://github.com/ricardochimal/ruby-filemagic
Use File.extname() to check the origin file
File.extname("test.rb") #=> ".rb"

ruby net-sftp read file line by line

I am using ruby 2.0.0 and rails 4.0.0. I have something similar to this:
require 'net/sftp'
sftp = Net::SFTP.start('ftp.app.com','username', :password => 'password')
sftp.file.open("/path/to/remote/file.csv", "r") do |f|
puts f.gets
end
This opens the file on the FTP site, but it only puts the first line of the csv file. I need to read this file row by row, preferably ignoring the header.
How can I read the file row by row, without downloading the file locally?
I solved this by doing this:
data = sftp.download!("/path/to/remote/file.csv").split(/\r\n/)
data.each do |line|
puts line
end
The proper answer for this would actually be to use the file.eof? value.
The code would look like:
require 'net/sftp'
sftp = Net::SFTP.start('ftp.app.com','username', :password => 'password')
sftp.file.open("/path/to/remote/file.csv", "r") do |f|
while !f.eof?
puts f.gets
end
end
Documentation can be found here
In my case something like this worked:
data = sftp.download!("/path/to/remote/file.csv").split(/\n/).map{ |e| e.split(/,/).map{ |x| x.gsub(/"/, "")} }
data.each do |line|
puts line
end
Will also split each row of the .csv into different array columns and remove any excess of "". Note this is for mac where line breaks are \n.

How to change the encoding during CSV parsing in Rails

I would like to know how can I change the encoding of my CSV file when I import it and parse it. I have this code:
csv = CSV.parse(output, :headers => true, :col_sep => ";")
csv.each do |row|
row = row.to_hash.with_indifferent_access
insert_data_method(row)
end
When I read my file, I get this error:
Encoding::CompatibilityError in FileImportingController#load_file
incompatible character encodings: ASCII-8BIT and UTF-8
I read about row.force_encoding('utf-8') but it does not work:
NoMethodError in FileImportingController#load_file
undefined method `force_encoding' for #<ActiveSupport::HashWithIndifferentAccess:0x2905ad0>
Thanks.
I had to read CSV files encoded in ISO-8859-1.
Doing the documented
CSV.foreach(filename, encoding:'iso-8859-1:utf-8', col_sep: ';', headers: true) do |row|
threw the exception
ArgumentError: invalid byte sequence in UTF-8
from csv.rb:2027:in '=~'
from csv.rb:2027:in 'init_separators'
from csv.rb:1570:in 'initialize'
from csv.rb:1335:in 'new'
from csv.rb:1335:in 'open'
from csv.rb:1201:in 'foreach'
so I ended up reading the file and converting it to UTF-8 while reading, then parsing the string:
CSV.parse(File.open(filename, 'r:iso-8859-1:utf-8'){|f| f.read}, col_sep: ';', headers: true, header_converters: :symbol) do |row|
pp row
end
force_encoding is meant to be run on a string, but it looks like you're calling it on a hash. You could say:
output.force_encoding('utf-8')
csv = CSV.parse(output, :headers => true, :col_sep => ";")
...
Hey I wrote a little blog post about what I did, but it's slightly more verbose than what's already been posted. For whatever reason, I couldn't get those solutions to work and this did.
This gist is that I simply replace (or in my case, remove) the invalid/undefined characters in my file then rewrite it. I used this method to convert the files:
def convert_to_utf8_encoding(original_file)
original_string = original_file.read
final_string = original_string.encode(invalid: :replace, undef: :replace, replace: '') #If you'd rather invalid characters be replaced with something else, do so here.
final_file = Tempfile.new('import') #No need to save a real File
final_file.write(final_string)
final_file.close #Don't forget me
final_file
end
Hope this helps.
Edit: No destination encoding is specified here because encode assumes that you're encoding to your default encoding which for most Rails applications is UTF-8 (I believe)

Illegal quoting on line using FasterCSV in ruby 1.8.7

I am facing "Illegal quoting" error when parse the content from SQL dump and the dump file is in the format of TXT with tab (\t) separator.
require 'rubygems'
require 'faster_csv'
begin
FasterCSV.foreach(excel_file, :quote_char => '"',:col_sep =>'\t', :row_sep =>:auto, :headers => :first_row) do |row|
col= row.to_s.split(/\t/)
if col[3]!="" or !col[3].empty?
color_value=col[3].to_s.capitalize
#Inser Color
color=Color.find_or_create_by_name(:name=>color_value)
elsif col[3].empty?
color_id= nil
end
end
rescue Exception => e
puts e
end
The program executed and run successfully but there is an invalid data present like
below (#font-face ...) mean execution terminated with error of "Illegal quoting on line 3.
ID Name code comments
1 white 234 good
2 Black 222
3 red 343 #font-face { font-family: "Verdana"; .....}
Can any one suggest me how to skip when invalid data occurs in column ?
Thanks in advance.
I'm not sure if this will solve the error you are seeing, but you need to use double quotes around escaped characters, e.g.:
:col_sep => "\t"
FasterCSV isn't very kind to badly formatted data.
I don't know that there is a solution for this.
However - if your example file doesn't actually contain any quoting using "
then perhaps just use a different quot_char (eg ')
You can use the ASCII code for the NULL character -- \0x00 -- as such:
FasterCSV.foreach(excel_file, :quote_char => '\0x00',:col_sep =>'\t', :row_sep =>:auto, :headers => :first_row) do |row|
...
end
You can find a chart of some ASCII chars here: http://www.bluesock.org/~willg/dev/ascii.html

Resources