I'm new to Rails and am trying to process a CSV file. Some files will have comments at the start of the file, marked with #. Is there a way I can delete these rows? I can't just ignore them, because I want to save the file without the comments.
sample file:
#-----------------------
# report --------------
#-----------------------
Date, transactions
20100923, 34
20200110, 56
Thanks.
The CSV library has a skip_lines option:
When setting an object responding to match, every line matching it is considered a comment and ignored during parsing. When set to a String, it is first converted to a Regexp. When set to nil no line is considered a comment. If the passed object does not respond to match, ArgumentError is thrown.
This should work for you:
CSV.foreach(file, skip_lines: /^#/, headers: true) do |row|
  # ...
end
/^#/ matches lines starting with #.
Adding something to @Stefan's answer (all credit goes to him for the skip_lines tip), assuming your CSV file is input.csv:
require "csv"
CSV.open("output.csv", "wb") do |output_csv|
CSV.foreach("input.csv", skip_lines: /^#/, headers: true) do |row|
# ...
output_csv << row
end
end
This way you will end up with an output.csv file without those comments.
EDIT:
If you want also the header, you can do:
CSV.open("output.csv", "wb") do |output_csv|
CSV.foreach("input.csv", skip_lines: /^#/, headers: true).with_index(0) do |row, i|
output_csv << row.headers if i == 0
puts row
output_csv << row
end
end
It's not as clean as I'd like, but it fits your needs ;)
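If all you need is to strip the comment lines and keep the rest of the file untouched (header included), a plain line filter is a simpler alternative. A minimal sketch, assuming the same input.csv / output.csv names:
File.open("output.csv", "w") do |out|
  File.foreach("input.csv") do |line|
    # copy every line that is not a comment, byte for byte
    out.write(line) unless line.start_with?("#")
  end
end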
I want to write the header only once, in the first row, when importing data to CSV in Ruby, but the header is written many times in the output file.
job_datas.each do |job_data|
  # @company_job = job_data converted, etc....
  save_job_to_csv(@company_job)
end

def save_job_to_csv(job_data)
  filepath = "tmp/jobs/jobs.csv"
  CSV.open(filepath, "a", :headers => true) do |csv|
    if csv.blank?
      csv << CompanyJob.attribute_names
    end
    csv << job_data.attributes.values
  end
end
Can anyone give me a solution? Thank you so much!
You are calling the save_job_to_csv method for each job_data, and it pushes the header every time via csv << CompanyJob.attribute_names. Open the CSV once and push the header a single time instead:
filepath = "tmp/jobs/jobs.csv"
CSV.open(filepath, "a", :headers => true) do |csv|
# push header once
csv << CompanyJob.attribute_names
# push every job record
job_datas.each do |job_data|
#company_job = job data coverted etc....
csv << #company_job.attributes.values
end
end
The above script can be wrapped in a method. But if you would like a separate method that only saves the CSV, you need to refactor it so that you first prepare an array of rows (header included) and then pass that array to a method that just writes it out.
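A rough sketch of that refactor (save_rows_to_csv and the conversion step are illustrative, not from the original code):
def save_rows_to_csv(rows, filepath = "tmp/jobs/jobs.csv")
  CSV.open(filepath, "a") do |csv|
    rows.each { |row| csv << row }
  end
end

# prepare all rows first, header included, then write once
rows = [CompanyJob.attribute_names]
job_datas.each do |job_data|
  # @company_job = job_data converted, etc....
  rows << @company_job.attributes.values
end

save_rows_to_csv(rows)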
You could do something similar to this:
def save_job_to_csv(job_data)
  filepath = "tmp/jobs/jobs.csv"

  unless File.file?(filepath)
    File.open(filepath, 'w') do |file|
      file.puts(job_data.attribute_names.join(','))
    end
  end

  CSV.open(filepath, "a", :headers => true) do |csv|
    csv << job_data.attributes.values
  end
end
It just checks beforehand whether the file exists, and if not, it adds the header. If you want tabs as column separators, you just have to change the separator passed to join and add the col_sep parameter to CSV.open():
file.puts(job_data.attribute_names.join("\t"))
CSV.open(filepath, "a", :headers => true, col_sep: "\t") do |csv|
I have a list of names (names.txt), one per line. After I loop through each line, I'd like to move it to another file (processed.txt).
My current implementation to loop through each line:
open("names.txt") do |csv|
csv.each_line do |line|
url = line.split("\n")
puts url
# Remove line from this file amd move it to processed.txt
end
end
def readput
#names = File.readlines("names.txt")
File.open("processed.txt", "w+") do |f|
f.puts(#names)
end
end
You can do it like this:
File.open('processed.txt', 'a') do |file|
  open("names.txt") do |csv|
    csv.each_line do |line|
      url = line.chomp
      # Do something interesting with url...
      file.puts url
    end
  end
end
This will result in processed.txt containing all of the urls that were processed with this code.
Note: Removing the line from names.txt is not practical using this method. See "How do I remove lines of data in the middle of a text file with Ruby" for more information. If this is a real goal of the solution, it will require a much larger implementation with some design considerations that need to be defined.
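If names.txt really does need to shrink as lines are handled, the usual pattern is to rewrite the whole file after processing rather than editing it in place. A minimal sketch (it processes everything in one pass and then empties the source file):
lines = File.readlines("names.txt")

File.open("processed.txt", "a") do |file|
  lines.each do |line|
    url = line.chomp
    # Do something interesting with url...
    file.puts url
  end
end

# everything was processed, so the source file can be emptied
File.write("names.txt", "")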
I followed RailsCasts #396, Importing CSV, and implemented CSV upload in my Rails project.
This is my view file:
<%= form_tag import_customers_path, multipart: true do %>
  <%= file_field_tag :file %>
  <%= submit_tag "Import" %>
<% end %>
This is my controller action:
def import
  current_user.customers.import(params[:file])
  redirect_to customers_path, notice: "Users imported."
end
And these are my model methods:
def self.to_csv(options = {})
  CSV.generate(options) do |csv|
    csv << column_names
    all.each do |customer|
      csv << customer.attributes.values_at(*column_names)
    end
  end
end

def self.import(file)
  CSV.foreach(file.path, headers: true) do |row|
    Customer.create! row.to_hash
  end
end
Here I don't want the user to include a header in the CSV. When I replace headers: true with headers: false, I get this error:
NoMethodError in CustomersController#import
undefined method `to_hash' for ["abc@wer.com"]:Array
Can anybody tell me how to upload CSV files without needing a header line?
As far as the upload and handling of the CSV file go, you're very, very close. You just have an issue with reading the rows of data that populate the database via the Customer.create! call.
It looks like you've been testing with a CSV file that only has a single line of data. With headers: true, that single line was converted to headers and subsequently ignored by the CSV.foreach iterator. So, in effect, you had no data in the file, and no iterations occurred. If you had two rows of data in the input file, you'd have encountered the error anyway.
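You can see that behaviour with a quick inline example (the data here is made up):
require 'csv'

single_line = "jane@example.com,Jane\n"

CSV.parse(single_line, headers: true) do |row|
  puts row.inspect # never reached: the only line became the header row
end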
Now, when you use headers: false, that line of data is treated as data. And that's where the issue lies: handling the data isn't done correctly.
Since there's no schema in your question, I'll assume a little bit of leeway on fields; you should be able to extrapolate pretty easily to make it work in your situation. This code shows how it works:
CSV.parse(csv_data, headers: false) do |row|
  hash = {
    first_name: row[0],
    last_name: row[1],
    age: row[2],
    phone: row[3],
    address: row[4]
  }

  Customer.create!(hash)
end
If you wanted a CSV version with headers, this would work well in this case, and has the benefit of not allowing arbitrary access to columns that shouldn't be assigned from an outside source:
CSV.parse(csv_data, headers: true, header_converters: :symbol) do |row|
  hash = {
    first_name: row[:first_name],
    surname: row[:last_name],
    birth_year: Date.today.year - row[:age].to_i, # age converted to an integer before computing the birth year
    phone: row[:phone],
    street_address: row[:address]
  }

  Customer.create!(hash)
end
Note that the Customer#to_csv in your model is not quite correct, either. First, it creates the CSV file with a header, so you wouldn't be able to export and then import again with this implementation. Next, the header fields variable column_names is not actually defined in this code. Finally, the code doesn't control the order of columns written to the CSV, which means that the headers and values could possibly go out of sync. A correct (non-header) version of this is very simple:
csv_data = CSV.generate do |csv|
  all.each do |customer|
    csv << [customer.first_name, customer.last_name, customer.age, customer.phone, customer.address]
  end
end
The header-based version is this:
csv_data = CSV.generate do |csv|
  csv << ["First Name", "Last Name", "Age", "Phone", "Address"]
  all.each do |customer|
    csv << [customer.first_name, customer.last_name, customer.age, customer.phone, customer.address]
  end
end
Personally, I'd use the header-based version, because it's far more robust, and it's easy to understand which columns are which. If you've ever received a headerless CSV file and had to figure out how to make sense of it without any keys, you'd know why the header is important.
You could just load the CSV file into an array of arrays and remove the first row:
data = CSV.read("path/to/file.csv")
data = data[1..-1]
However, this gives you plain arrays of values only.
When you use headers: true, you instead get rows that behave like hashes, with the column header names as keys.
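A quick comparison of the two shapes (the file path and column names are just for illustration):
require 'csv'

plain = CSV.read("path/to/file.csv")
plain.first            # => ["Name", "Age"]  -- the header is just another array

table = CSV.read("path/to/file.csv", headers: true)
table.first            # => #<CSV::Row "Name":"Jane" "Age":"34">
table.first["Name"]    # cells are looked up by header name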
I get a CSV::MalformedCSVError when I try to import a file using the following code:
def import_csv(filename, model)
  CSV.foreach(filename, :headers => true) do |row|
    item = {}
    row.to_hash.each_pair do |k, v|
      item.merge!({k.downcase => v})
    end
    model.create!(item)
  end
end
The CSV files are HUGE, so is there a way I can just log the badly formatted lines and continue execution with the remainder of the file?
You could try handling the file reading yourself and let CSV work on one line at a time. Something like this:
File.foreach(filename) do |line|
  begin
    CSV.parse(line) do |row|
      # Do something with row...
    end
  rescue CSV::MalformedCSVError => e
    # complain about line
  end
end
You'd have to do something with the header line yourself of course. Also, this won't work if you have embedded newlines in your CSV.
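For example, one way to deal with the header yourself (and to log the bad lines, which is what the question asks about) might look like this; the downcased column keys and the use of Rails.logger are assumptions, not part of the original code:
def import_csv(filename, model)
  header = nil

  File.foreach(filename).with_index do |line, index|
    begin
      CSV.parse(line) do |row|
        if index.zero?
          header = row.map(&:downcase)   # keep the header for later rows
        else
          model.create!(header.zip(row).to_h)
        end
      end
    rescue CSV::MalformedCSVError => e
      Rails.logger.warn("Skipping malformed CSV line #{index + 1}: #{e.message}")
    end
  end
end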
One problem with using File to manually go through each line in the file is that CSV files can contain fields with \n (newline character) in them. File will take that to indicate a newline and you will end up trying to parse a partial row.
Here is another approach that might work for you:
# CSV.new needs an IO (or a data String), so open the file first
@csv = CSV.new(File.open('path/to/file.csv'))

loop do
  begin
    row = @csv.shift
    break unless row
    # do stuff
  rescue CSV::MalformedCSVError => error
    # handle the error
    next
  end
end
The main downside that I see with this approach is that you don't have access to the CSV row string when handling the error, just the CSV::MalformedCSVError itself.
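If the exception message and the reader's position are enough for your logging needs, a sketch could look like this (bad_rows.log is just an illustrative name):
@csv = CSV.new(File.open('path/to/file.csv'))

File.open('bad_rows.log', 'a') do |log|
  loop do
    begin
      row = @csv.shift
      break unless row
      # do stuff
    rescue CSV::MalformedCSVError => error
      # CSV#lineno says how many lines have been read so far,
      # which at least narrows down where the bad data lives
      log.puts "around line #{@csv.lineno}: #{error.message}"
      next
    end
  end
end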
I am using the code from this tutorial to parse a CSV file and add the contents to a database table. How would I ignore the first line of the CSV file? The controller code is below:
def csv_import
  @parsed_file = CSV::Reader.parse(params[:dump][:file])
  n = 0
  @parsed_file.each do |row|
    s = Student.new
    s.name = row[0]
    s.cid = row[1]
    s.year_id = find_year_id_from_year_title(row[2])
    if s.save
      n = n + 1
      GC.start if n % 50 == 0
    end
    flash.now[:message] = "CSV Import Successful, #{n} new students added to the database."
  end
  redirect_to(students_url)
end
This question kept popping up when I was searching for how to skip the first line with the CSV / FasterCSV libraries, so here's the solution in case you end up here too.
The solution is:
CSV.foreach("path/to/file.csv",{:headers=>:first_row}) do |row|
HTH.
@parsed_file.each_with_index do |row, i|
  next if i == 0
  ....
If you identify your first line as headers then you get back a Row object instead of a simple Array.
When you grab cell values, it seems like you need to use .fetch("Row Title") on the Row object.
This is what I came up with. I'm skipping nil with my if conditional.
CSV.foreach("GitHubUsersToAdd.csv",{:headers=>:first_row}) do |row|
username = row.fetch("GitHub Username")
if username
puts username.inspect
end
end
Using this simple code, you can read a CSV file and ignore the first line which is the header or field names:
CSV.foreach(File.join(File.dirname(__FILE__), filepath), headers: true) do |row|
  puts row.inspect
end
You can do whatever you want with row. Don't forget headers: true.
require 'csv'

csv_content = <<EOF
lesson_id,user_id
5,3
69,95
EOF

parse_1 = CSV.parse csv_content
parse_1.size # => 3 # it treats all lines as equal data

parse_2 = CSV.parse csv_content, headers: true
parse_2.size # => 2 # it skips the first line, treating it as the header

parse_1
# => [["lesson_id", "user_id"], ["5", "3"], ["69", "95"]]

parse_2
# => #<CSV::Table mode:col_or_row row_count:3>
Here's the fun part:
parse_1.each do |line|
  puts line.inspect # the object is an Array
end
# ["lesson_id", "user_id"]
# ["5", "3"]
# ["69", "95"]

parse_2.each do |line|
  puts line.inspect # the objects are CSV::Row objects
end
# #<CSV::Row "lesson_id":"5" "user_id":"3">
# #<CSV::Row "lesson_id":"69" "user_id":"95">
So I can do:
parse_2.each do |line|
  puts "I'm processing Lesson #{line['lesson_id']} the User #{line['user_id']}"
end
# I'm processing Lesson 5 the User 3
# I'm processing Lesson 69 the User 95
data_rows_only = csv.drop(1)
will do it
csv.drop(1).each do |row|
  # ...
end
will loop it
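For completeness, csv here is assumed to be the array returned by CSV.read, as in the earlier answer:
require 'csv'

csv = CSV.read("path/to/file.csv")   # array of arrays, header row included

csv.drop(1).each do |row|
  puts row.inspect                   # each row is a plain array, header excluded
end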