Rails importing CSV fails due to mal-formation - ruby-on-rails

I get a CSV::MalformedCSVError when I try to import a file using the following code:
def import_csv(filename, model)
  CSV.foreach(filename, :headers => true) do |row|
    item = {}
    row.to_hash.each_pair do |k,v|
      item.merge!({k.downcase => v})
    end
    model.create!(item)
  end
end
The CSV files are HUGE, so is there a way I can just log the badly formatted lines and CONTINUE EXECUTION with the remainder of the CSV file?

You could try handling the file reading yourself and let CSV work on one line at a time. Something like this:
File.foreach(filename) do |line|
  begin
    CSV.parse(line) do |row|
      # Do something with row...
    end
  rescue CSV::MalformedCSVError => e
    # complain about line
  end
end
You'd have to do something with the header line yourself of course. Also, this won't work if you have embedded newlines in your CSV.
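As a rough sketch of the line-at-a-time idea with the header handled by hand, something like the following might do. The helper name parse_line_by_line and its return shape are made up for illustration, and it assumes no field contains an embedded newline:

```ruby
require "csv"
require "stringio"

# Hypothetical helper (not part of the CSV library): reads an IO one physical
# line at a time, treats the first parsable line as the header, and collects
# malformed lines instead of aborting.
def parse_line_by_line(io)
  headers = nil
  rows = []
  bad_lines = []

  io.each_line.with_index(1) do |line, lineno|
    begin
      fields = CSV.parse_line(line)
    rescue CSV::MalformedCSVError => e
      # Remember the line number, the raw text and the parser's complaint.
      bad_lines << [lineno, line.chomp, e.message]
      next
    end

    # Skip blank lines.
    next if fields.nil? || fields.empty?

    if headers.nil?
      headers = fields.map { |name| name.to_s.downcase }
    else
      rows << headers.zip(fields).to_h
    end
  end

  [rows, bad_lines]
end
```

With a real file you could call it as File.open(filename) { |f| parse_line_by_line(f) }, then pass each hash in rows to model.create! and write bad_lines to your log.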

One problem with using File to go through the file line by line is that CSV fields can contain \n (newline) characters. File.foreach will treat such a character as the end of a line, and you will end up trying to parse a partial row.
Here is another approach that might work for you:
# CSV.open reads from the file; CSV.new with a String would parse the path itself as data
@csv = CSV.open('path/to/file.csv')
loop do
  begin
    row = @csv.shift
    break unless row
    # do stuff
  rescue CSV::MalformedCSVError => error
    # handle the error
    next
  end
end
The main downside that I see with this approach is that you don't have access to the CSV row string when handling the error, just the CSV::MalformedCSVError itself.

Related

How to check whether the header exists before importing data with Ruby CSV?

I want to write the header only once, in the first row, when exporting data to CSV in Ruby, but the header is written many times in the output file.
job_datas.each do |job_data|
  # @company_job = job data converted etc....
  save_job_to_csv(@company_job)
end
def save_job_to_csv(job_data)
  filepath = "tmp/jobs/jobs.csv"
  CSV.open(filepath, "a", :headers => true) do |csv|
    if csv.blank?
      csv << CompanyJob.attribute_names
    end
    csv << job_data.attributes.values
  end
end
Can anyone give me a solution? Thank you so much!
You are calling the save_job_to_csv method for each job_data and pushing the header every time with csv << CompanyJob.attribute_names. Open the file once instead, write the header, then append every record:
filepath = "tmp/jobs/jobs.csv"
CSV.open(filepath, "a", :headers => true) do |csv|
  # push header once
  csv << CompanyJob.attribute_names
  # push every job record
  job_datas.each do |job_data|
    # @company_job = job data converted etc....
    csv << @company_job.attributes.values
  end
end
The script above can be wrapped in a method. If you would rather have a separate method that only saves the CSV, refactor so that you first build an array of rows, with the header as its first element, and pass that array to the method that writes the file.
You could do something similar to this:
def save_job_to_csv(job_data)
  filepath = "tmp/jobs/jobs.csv"
  unless File.file?(filepath)
    File.open(filepath, 'w') do |file|
      file.puts(job_data.attribute_names.join(','))
    end
  end
  CSV.open(filepath, "a", :headers => true) do |csv|
    csv << job_data.attributes.values
  end
end
It just checks beforehand if the file exists and if not it adds the header. If you want tabs as column separators, you just have to change the value for the join function and add the col_sep parameter to CSV.open():
file.puts(job_data.attribute_names.join("\t"))
CSV.open(filepath, "a", :headers => true, col_sep: "\t") do |csv|
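A variation on the same idea, as a minimal sketch: let CSV write the header row too, so header names containing the separator get quoted, and also treat an existing but empty file as needing a header. The helper name append_csv_row and its parameters are hypothetical:

```ruby
require "csv"

# Hypothetical helper: appends one row to the file at `path`, writing the
# header first only when the file does not exist yet or is still empty.
def append_csv_row(path, header, values)
  needs_header = !File.exist?(path) || File.zero?(path)
  CSV.open(path, "a") do |csv|
    csv << header if needs_header
    csv << values
  end
end
```

In the question's setting this might be called as append_csv_row("tmp/jobs/jobs.csv", CompanyJob.attribute_names, job_data.attributes.values). Reopening the file for every record is slower than opening it once around the loop, so this only makes sense when rows really do arrive one at a time.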

Ruby Rails Remove Comments from CSV

I'm new to Rails and am trying to process a CSV file. Some files will have comments at the start, marked with #. Is there a way I can delete these rows? I don't want to just ignore them, because I want to save the file without the comments.
sample file:
#-----------------------
# report --------------
#-----------------------
Date, transctions
20100923, 34
20200110, 56
Thanks.
The CSV library has a skip_lines option:
When setting an object responding to match, every line matching it is considered a comment and ignored during parsing. When set to a String, it is first converted to a Regexp. When set to nil no line is considered a comment. If the passed object does not respond to match, ArgumentError is thrown.
This should work for you:
CSV.foreach(file, skip_lines: /^#/, headers: true) do |row|
  # ...
end
/^#/ matches lines starting with #.
Adding something to @Stefan's answer (all credit goes to him for the skip_lines tip), assuming your CSV file is input.csv:
require "csv"
CSV.open("output.csv", "wb") do |output_csv|
  CSV.foreach("input.csv", skip_lines: /^#/, headers: true) do |row|
    # ...
    output_csv << row
  end
end
This way you will end up with a file output.csv without those comments.
EDIT:
If you also want the header, you can do:
CSV.open("output.csv", "wb") do |output_csv|
  CSV.foreach("input.csv", skip_lines: /^#/, headers: true).with_index(0) do |row, i|
    output_csv << row.headers if i == 0
    puts row
    output_csv << row
  end
end
...It's not as clean as I want but fits your needs ;)
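If the only goal is to save the file without its comment lines, a plain line filter that never invokes the CSV parser may be enough, and it leaves the header and data rows untouched. This is a sketch under the assumption that no quoted field spans several lines (a continuation line starting with # would be dropped by mistake); strip_comment_lines is a made-up name:

```ruby
# Hypothetical helper: returns the text with every line that starts with "#"
# removed. All other lines, including the header, are kept verbatim.
def strip_comment_lines(text)
  text.each_line.reject { |line| line.start_with?("#") }.join
end
```

For files on disk this could be used as File.write("output.csv", strip_comment_lines(File.read("input.csv"))).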

how to skip/ignore malformed CSV when using CSV.foreach?

I am trying to read a large CSV file, but the data is in bad condition and some of its lines throw CSV::MalformedCSVError.
I just want to ignore the offending line and move on to the next one.
I tried adding begin/rescue, but it does not seem to work: the code stops at the error.
My current code:
require 'csv'
begin
  CSV.foreach(filename, :headers => true) do |row|
    Moulding.create!(row.to_hash)
  end
rescue
  next
end
I don't think you can do it with the foreach method because the exception does not seem to be raised within the block but rather within the foreach method itself, but something like this should work. In this case the exception is raised on the call to shift, which you can then rescue out of.
require 'csv'
csv_file = CSV.open("test.csv", :headers => true)
loop do
  begin
    row = csv_file.shift
    break unless row
    p row
  rescue CSV::MalformedCSVError
    puts "skipping bad row"
  end
end
BTW, your code above does not run because once begin/rescue surrounds the foreach call, next is no longer valid in that context. With the next statement commented out the code runs, but when the exception is raised inside foreach, the method simply ends, the program moves on to the rescue block, and no more lines are read from the file.
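If the rows are rejected only because of stray quote characters inside unquoted fields, the liberal_parsing option of Ruby's CSV library (available from Ruby 2.4) may let them through instead of raising. It does not repair every kind of malformed input, so treat the following as an illustration of that one case, using the sample line from the CSV documentation:

```ruby
require "csv"

line = 'is,this "three, or four",fields'

# Strict parsing (the default) rejects the stray quotes.
strict = begin
  CSV.parse_line(line)
rescue CSV::MalformedCSVError => e
  "rejected: #{e.class}"
end

# Liberal parsing keeps the quotes as ordinary characters.
lenient = CSV.parse_line(line, liberal_parsing: true)
```

The same option can be passed to CSV.foreach or CSV.open alongside :headers.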

Move line from one text file to another

I have a list of names (names.txt) separated by line. After I loop through each line, I'd like to move it to another file (processed.txt).
My current implementation to loop through each line:
open("names.txt") do |csv|
  csv.each_line do |line|
    url = line.split("\n")
    puts url
    # Remove line from this file and move it to processed.txt
  end
end
def readput
  @names = File.readlines("names.txt")
  File.open("processed.txt", "w+") do |f|
    f.puts(@names)
  end
end
You can do it like this:
File.open('processed.txt', 'a') do |file|
  open("names.txt") do |csv|
    csv.each_line do |line|
      url = line.chomp
      # Do something interesting with url...
      file.puts url
    end
  end
end
This will result in processed.txt containing all of the urls that were processed with this code.
Note: Removing the line from names.txt is not practical using this method. See How do I remove lines of data in the middle of a text file with Ruby for more information. If this is a real goal of this solution, it will be a much larger implementation with some design considerations that need to be defined.
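If it is acceptable to empty names.txt only once every line has been handled, rather than deleting lines one at a time, a minimal sketch could look like this. The method name move_lines and the block interface are assumptions made for illustration:

```ruby
# Hypothetical helper: yields each non-empty line of `source` to the caller's
# block, appends it to `destination`, and truncates `source` at the very end.
def move_lines(source, destination)
  File.open(destination, "a") do |out|
    File.foreach(source) do |line|
      name = line.chomp
      next if name.empty?
      yield name if block_given?
      out.puts name
    end
  end
  # Only reached when every line was processed without an exception.
  File.write(source, "")
end
```

Usage might be move_lines("names.txt", "processed.txt") { |name| puts name }. If the block raises halfway through, names.txt is left intact while processed.txt already holds the lines handled so far, so a rerun would process those lines again.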

Rails - New line character at the end of row in CSV import causing errors

I'm running a rake task to import some file attributes, and I'm receiving an error that leads me to believe that the string created for each line contains some sort of newline character (e.g. \n).
EDIT - New-line character has been confirmed to be the issue.
Here is a sample of what my CSV file might look like:
1|type1,type2|category1
2|type2|category1,category2,category3
3|type2,type4|category3,category8
And here is my code to deal with it:
namespace :data do
  desc "import"
  task :import => :environment do
    file = File.open(Rails.root.join('lib/assets/data.csv'), 'r')
    file.each do |line|
      attrs = line.split("|")
      foo = Model.find(attrs[0])
      attrs[1].split(",").each do |type|
        foo.add_type!(ModelType.find_by_name(type))
      end
      attrs[2].split(",").each do |category|
        foo.categorize!(ModelCategory.find_by_name(category))
      end
    end
  end
end
ModelType and ModelCategory are both separate models with a :through relationship to Model, built with the methods Model.add_type! and Model.categorize!.
When I run rake data:import, everything works fine up until the final category is reached at the end of the first line. It doesn't matter which category it is, nor how many categories are present in attrs[2] - it only fails on the last one. This is the error I receive:
Called id for nil, which would mistakenly be 4 -- if you really wanted the id of nil, use object_id
Any thoughts on how to fix this or avoid this error?
You can use chomp:
attrs = line.chomp.split("|")
attrs = line.split("|")
if attrs.length > 0
  foo = Model.find(attrs[0])
  ...
end
You probably have an empty line at the end of your CSV
UPDATE
file = File.read(Rails.root.join('lib/assets/data.csv'))
file.split("\r\n").each do |line|
or
file = File.read(Rails.root.join('lib/assets/data.csv'))
file.split("\r").each do |line|
or
file = File.read(Rails.root.join('lib/assets/data.csv'))
file.split("\n").each do |line|
depending on how the CSV was originally generated! (File.read returns the contents as a String; a File object has no split method.)
Use String#encode(universal_newline: true) instead of gsub.
It converts CRLF and CR to LF, so lines always break with \n.
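Putting the suggestions from this thread together (normalize line endings, chomp each line, skip empty lines), a rough sketch for the pipe-separated format in the question might look like this. The method name parse_pipe_records and the returned hash keys are hypothetical, and the model lookups are left out:

```ruby
# Hypothetical helper: turns "id|type,type|category,category" text into an
# array of hashes, tolerating CRLF, CR or LF line endings and blank lines.
def parse_pipe_records(text)
  records = []
  text.encode(universal_newline: true).each_line do |line|
    line = line.chomp
    next if line.empty?
    id, types, categories = line.split("|")
    records << {
      id: id,
      types: types.to_s.split(","),
      categories: categories.to_s.split(",")
    }
  end
  records
end
```

Each record could then drive Model.find(record[:id]) and the add_type! and categorize! calls, with no trailing newline left on the last category name.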
