How to continue processing a block in Ruby after an exception? - ruby-on-rails

I am trying to process some very large tab-separated files. The process is:
begin
Dir["#{#data_path}*.tsv"].each do |file|
begin
CSV.foreach(file, :col_sep => "\t") do |row|
# assign columns to model and save
end
#log.info("Loaded all files into MySQL database illu.datafeeds")
rescue Exception => e
#log.warn("Unable to process the data feed: #{file} because #{e.message}")
next
end
end
However, when I execute this I get the following error:
Unable to process the file: /Users/XXXXX_2013-06-12.tsv because Illegal quoting in line 153.
The files are too big for me to go in and fix the error rows. I would like the process to continue the loop and process the file even if there are error rows.
Any suggestions?
Thanks.

just ... rescue nil the row causing the error
you can even log it with logger
before the loop:
error_log ||= Logger.new("#{Rails.root}/log/my.log")
inside the loop instead of just rescue nil use
rescue error_log.info(row.to_s)
in case you get the error before file begins to parse (before .foreach procedure) you can open it as raw file and read it as CSV later - inside the loop (like mentioned here)
..or just rescue full file parsing procedure
CSV.foreach(file, :col_sep => "\t") do |row|
...
end rescue error_log.info(row.to_s)

Related

Rails console vs a rake task: returning File.size is not consistent

I'm having a strange issue where when I check the File.size of a particular file in Rails console, it returns the correct size. However when I run the same code in a rake task, it returns 0. Here is the code in question (I've tidied it up a bit to help with readability):
def sum_close
daily_closed_tickets = Fst.sum_retrieve_closed_tickets
daily_closed_tickets.each do |ticket|
CSV.open("FILE_NAME_HERE", "w+", {force_quotes: false}) do |csv|
if (FileCopyReceipt.exists?(path: "#{ticket.attributes['TroubleTicketNumber']}_sum.txt"))
csv << ["GENERATE CSV WITH ATTRIBUTES HERE"]
files = Dir.glob("/var/www/html/harmonize/public/close/CLOSED_#{ticket.attributes['TroubleTicketNumber']}_sum.txt")
files.each do |f|
Rails.logger.info "File size (should return non-0): #{File.size(f)}" #returns 0, but not in Rails Console
Rails.logger.info "File size true or false, should be true: #{File.size(f) != 0}" #returns false, should return true
Rails.logger.info "Rails Environment: #{Rails.env}" #returns production
if(!FileCopyReceipt.exists?(path: f) && (File.size(f) != 0))
Rails.logger.info("SUM CLOSE, GOOD => FileUtils.cp_r occurred and FileCopyReceipt object created")
else
Rails.logger.info("SUM CLOSE, WARNING: => no data transfer occurred")
end
end
else
Rails.logger.info("SUM CLOSE => DID NOT make it into initial if ClosedDate.present? if block")
end
end
end
close_tickets.rake
task :close_tickets => :environment do
tickets = FstController.new
tickets.sum_close
tickets.dais_close
end
It is beyond me why this File.size comes back as 0 when this is run as a rake task. I thought it may be a environment issue, but that does not seem to be the case.
Any insight on the matter is appreciated.
The CSV.open block and everything being wrapped in there was causing issues. So I just made CSV generation it's own snippet instead of wrapping everything in there.
daily_closed_tickets.each do |ticket|
CSV.open("generate csv here.txt") do |csv|
#enter ticket.attributes here for the csv
end
#continue on with the rest of the code and File.size() works properly
end

Ruby task uses 97 %CPU

My ruby application uses about 97 %CPU which eventually gets killed. The program is reading files from the folder and if a file name exists in the database, it skips it and checks another file. While executing this procedure, application usually gets killed.
COMMAND %CPU
ruby 96.5
Even if I insert almost all files and try to lunch an application again (because it was killed), system tends to kill it even sooner. How can I decrease the %CPU?
task :process_data, [:data_directory] => :environment do |_task, args|
# add data to a database
saver = CsvToSqlSaver.new
saver.fill_files_names
Dir.foreach(args.data_directory) do |filename|
# if not present in records already we read it
Base.logger.info "> Found #{filename}."
next if saver.files_names.to_s.include?(filename) ||
!filename.include?('csv')
Base.logger.info "> Reading #{filename}."
begin
saver.generate_db_rows_from_csv_file(args.data_directory, filename)
# handle Malformed .csv exception
rescue CSV::ArgumentError, CSV::MalformedCSVError => e
Base.logger.info e.message
next
end # we continue csv file loop?
unless integrator.insert_data_to_database
Base.logger.info '> No new data saved.'
end
end
end
This is fill_files_names:
def fill_files_names
#files_names = []
files_names = MyFilesTable.select(:filename).distinct
files_names.each do |row|
#files_names.push(row[:filename])
end
end
This is Base:
class Base
class << self
attr_accessor :logger
end
#logger ||= Logger.new(STDERR)
end
This is generate_db_rows_from_csv_file
def generate_db_rows_from_csv_file(directory, filename)
#incoming_data = []
CSV.foreach("#{directory}/#{filename}",
headers: true, quote_char: "\x00") do |csv_record|
# if invalid record, go further
next if record_invalid?(csv_record)
generate_row_in_the_database(csv_record, filename)
end
end

Ruby - checking if file is a CSV

I have just wrote a code where I get a csv file passed in argument and treat it line by line ; so far, everything is okay. Now, I would like to secure my code by making sure that what we receive in argument is a .csv file.
I saw in the Ruby doc that it exist a == "--file" option but using it generate an error : the way I understood it, it seems this option only work for the txt files.
Is there a method specific that allowed to check if my file is a csv ? Here some of my code :
if ARGV.empty?
puts "j'ai rien reçu"
# option to check, don't work
elsif ARGV[0].shift == "--file"
# my code so far, whithout checking
else CSV.foreach(ARGV.shift) do |row|
etc, etc...
I think it is unpossible to make a real safe test without additional information.
Just some notes what you can do:
You get a filename in a variable filename.
First, check if it is a file:
File.exist?
Then you could check, if the encoding is correct:
raise "Wrong encoding" unless content.valid_encoding?
Has your csv always the same number of columns? And do you have only one liner?
This can be a possibility to make the next check:
content.each_line{|line|
return false if line.count(sep) < columns - 1
}
This check can be modified for your case, e.g. if you have always an exact number of rows.
In total you can define something like:
require 'csv'
#columns defines the expected numer of columns per line
def csv?(filename, sep: ';', columns: 3)
return false unless File.exist?(filename) #"No file"
content = File.read(filename, :encoding => 'utf-8')
return false unless content.valid_encoding? #"Wrong encoding"
content.each_line{|line|
return false if line.count(sep) < columns - 1
}
CSV.parse(content, :col_sep => sep)
end
if csv = csv?('test.csv')
csv.each do |row|
p row
end
end
You can use ruby-filemagic gem
gem install ruby-filemagic
Usage:
$ irb
irb(main):001:0> require 'filemagic'
=> true
irb(main):002:0> fm = FileMagic.new
=> #<FileMagic:0x7fd4afb0>
irb(main):003:0> fm.file('foo.zip')
=> "Zip archive data, at least v2.0 to extract"
irb(main):004:0>
https://github.com/ricardochimal/ruby-filemagic
Use File.extname() to check the origin file
File.extname("test.rb") #=> ".rb"

Open filename with embedded spaces with Roo

Ruby 2.0.0, Rails 4.0.3, Windows 8.1 Update, Roo 1.13.2
I am trying to open an Excel spreadsheet with embedded spaces using Roo. So far, I am unable to do that. I don't really know if this problem is restricted to Roo. If I rename it to eliminate the spaces, I have no problem with it. I tried encoding it, but then it simply said the file doesn't exist. Can I open the file while it contains spaces?
Code sample:
exceptions = [URI::InvalidURIError, IOError]
puts "f is #{f}"
puts "f exist? #{File.exist?(f)}"
begin
xls = Roo::Spreadsheet.open(f)
rescue *exceptions => e
puts e.message
end
encoded_f = URI.encode(f).to_s
puts "encoded_f is #{encoded_f}"
puts "encoded_f exist? #{File.exist?(encoded_f)}"
begin
xls = Roo::Spreadsheet.open(encoded_f)
rescue *exceptions => e
puts e.message
end
gsub_f = f.gsub(" ", "") # Rename file without spaces
File.rename(f, gsub_f)
puts "gsub_f is #{gsub_f}"
puts "gsub_f exist? #{File.exist?(gsub_f)}"
begin
xls = Roo::Spreadsheet.open(gsub_f)
rescue *exceptions => e
puts e.message
end
Output sample:
f is Whitt Report 2014-07-28-0803.xls
f exist? true
bad URI(is not URI?): Whitt Report 2014-07-28-0803.xls
encoded_f is Whitt%20Report%202014-07-28-0803.xls
encoded_f exist? false
file Whitt%20Report%202014-07-28-0803.xls does not exist
gsub_f is WhittReport2014-07-28-0803.xls
gsub_f exist? true
No message is given in the end because the file opens successfully.
This is caused by the way in which the URI module is called in the Roo::Spreadsheet#open method.
I posted a fix to this problem which has now been merged. If you update your Roo gem you should no longer have this issue.

Ruby CSV File Parsing, Headers won't format?

My rb file reads:
require "csv"
puts "Program1 initialized."
contents = CSV.open "data.csv", headers: true
contents.each do |row|
name = row[4]
puts name
end
...but when i run it in ruby it wont load the program. it gives me the error message about the headers:
syntax error, unexpected ':', expecting $end
contents = CSV.open "data.csv", headers: true
so I'm trying to figure out, why won't ruby let me parse this file? I've tried using other csv files I have and it won't load, and gives me an error message. I'm trying just to get the beginning of the program going! I feel like it has to do with the headers. I've updated as much as I can, mind you I'm using ruby 1.8.7. I read somewhere else that I could try to run the program in irb but it didn't seem like it needed it. so yeah... thank you in advance!!!!
Since you are using this with Ruby 1.8.7, :headers => true won't work in this way.
The simplest way to ignore the headers and get your data is to shift the first row in the data, which would be the headers:
require 'csv'
contents = CSV.open("data.csv", 'r')
contents.shift
contents.each do |row|
name = row[4]
puts name
end
If you do want to use the syntax with headers in ruby 1.8, you would need to use FasterCSV, something similar to this:
require 'fastercsv'
FasterCSV.foreach("data.csv", :headers => true) do |fcsv_obj|
puts fcsv_obj['name']
end
(Refer this question for further read: Parse CSV file with header fields as attributes for each row)

Resources