I am receiving a CSV file that has some blank headers, but data exists in those columns. I want to remove each blank header and its associated column in Rails.
Sample csv
#report
,Code,Price,Orders,,Mark,
1,X91,4.55,4,xxx,F,23
What I'd like returned:
Code,Price,Orders,Mark
X91,4.55,4,F
This is what I have so far; the CSV also contains comment lines, which I am ignoring.
CSV.open("output.csv", "w") do |output_csv|
CSV.foreach("out.csv", encoding: "bom|utf-8", skip_lines: /^#/, headers: true).with_index(0) do |row, i|
output_csv << row.headers if i == 0
output_csv << row
end
end
You can use CSV::Row's delete_if method https://ruby-doc.org/stdlib-2.4.1/libdoc/csv/rdoc/CSV/Row.html#method-i-delete_if, something like:
CSV.open("output.csv", "w") do |output_csv|
CSV.foreach("out.csv", encoding: "bom|utf-8", skip_lines: /^#/, headers: true) do |row|
clean_row = row.delete_if { |header, _field| header.blank? }
output_csv << clean_row.headers if row.header_row?
output_csv << clean_row
end
end
Although I largely agree with the answer of arieljuod, there are a few things that might go wrong. row.header_row? will always return false, since the return_headers: true option isn't set, thus leaving out the header. delete_if is a mutating method, so there is no need to save the result in a variable. This only returns itself so you can chain it with other methods.
The following would be enough:
read_options = {
encoding: "bom|utf-8",
skip_lines: /^#/,
headers: true,
return_headers: true,
}
CSV.open("output.csv", "w") do |output_csv|
CSV.foreach("out.csv", **read_options) do |row|
row.delete_if { |header, _field| header.blank? }
output_csv << row
end
end
Note that blank? is a Ruby on Rails method, but since you've tagged the question with ruby-on-rails this should be fine.
From the CSV::new documentation (also describing CSV::foreach) options:
:return_headers
When false, header rows are silently swallowed. If set to true,
header rows are returned in a CSV::Row object with identical
headers and fields (save that the fields do not go through the
converters).
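For readers without Rails at hand, here is a minimal sketch of the same idea in plain Ruby. It assumes the sample from the question is held in a string (the `input` variable is made up for illustration) and replaces `blank?` with an explicit nil/empty check:

```ruby
require "csv"

# Hypothetical in-memory copy of the sample file from the question.
input = <<~DATA
  #report
  ,Code,Price,Orders,,Mark,
  1,X91,4.55,4,xxx,F,23
DATA

cleaned = CSV.generate do |out|
  # return_headers: true makes the header line arrive as a regular CSV::Row.
  CSV.parse(input, skip_lines: /^#/, headers: true, return_headers: true) do |row|
    # Plain-Ruby stand-in for Rails' blank?: drop a pair when its header is nil or whitespace.
    row.delete_if { |header, _field| header.nil? || header.strip.empty? }
    out << row
  end
end

puts cleaned
# Code,Price,Orders,Mark
# X91,4.55,4,F
```

Because the header row goes through the same `delete_if` call as the data rows, no separate header handling is needed.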
Please forgive me if I'm unclear, but this is pretty difficult to describe in words. I'm using Ruby in a Rails application to take in values from a CSV file row by row, using to_hash.select to build a hash of the key-value pairs for each row, and then using create to insert each row into a database table.
The code works for creating a database table from a CSV, but many records within the CSV have null values for some of the fields/columns. I'd like to have those null values converted to a string like "null" when inserting each row in the CSV to the hash table.
I've tried using a regex to replace the null values with a string, but it hasn't worked. I very well may just be doing it wrong.
require 'csv'
fields = %w{lVoterUniqueID sAffNumber szStateVoterID sVoterTitle szNameLast szNameFirst szNameMiddle sNameSuffix sGender szSitusAddress szSitusCity sSitusState sSitusZip sHouseNum sUnitAbbr sUnitNum szStreetName sStreetSuffix sPreDir sPostDir szMailAddress1 szMailAddress2 szMailAddress3 szMailAddress4 szMailZip szPhone szEmailAddress dtBirthDate sBirthPlace dtRegDate dtOrigRegDate dtLastUpdate_dt sStatusCode szStatusReasonDesc sUserCode1 sUserCode2 iDuplicateIDFlag szLanguageName szPartyName szAVStatusAbbr szAVStatusDesc szPrecinctName sPrecinctID sPrecinctPortion sDistrictID_0 iSubDistrict_0 szDistrictName_0 sDistrictID_1 iSubDistrict_1 szDistrictName_1 sDistrictID_2 iSubDistrict_2 szDistrictName_2 sDistrictID_3 iSubDistrict_3 szDistrictName_3 sDistrictID_4 iSubDistrict_4 szDistrictName_4 sDistrictID_5 iSubDistrict_5 szDistrictName_5}
if Rails.env.production?
CSV.foreach(Dir.pwd + "/db/prod.csv", encoding: 'iso-8859-1:utf-8', headers: true) do |row|
voter_row = row.to_hash.select { |k, v| fields.include?(k)}
Voter.create!(voter_row.to_hash.symbolize_keys)
end
elsif Rails.env.development?
CSV.foreach(Dir.pwd + "/db/Cntywd_020819.csv", headers: true) do |row|
voter_row = row.to_hash.select { |k, v| fields.include?(k)}
Voter.create!(voter_row.to_hash.symbolize_keys)
end
else
CSV.foreach(Dir.pwd + "/db/Cntywd_020819.csv", headers: true) do |row|
voter_row = row.to_hash.select { |k, v| fields.include?(k)}
Voter.create!(voter_row.to_hash.symbolize_keys)
end
end
Wherever I am using row.to_hash.select, I'd like to replace null values with a string, so that every key in the hash has a corresponding string ("null" if there is no value).
There is Hash#transform_values method that does the job in a clean and idiomatic way. I'd also suggest using Hash#slice instead of #select:
...
CSV.foreach(Dir.pwd + "/db/prod.csv", encoding: 'iso-8859-1:utf-8', headers: true) do |row|
attrs = row.to_hash.slice(*fields).transform_values { |v| v || "null" }
Voter.create!(attrs)
end
...
But to be honest, in practice, I'd propose another solution - using default values for database columns if possible instead of "normalizing" the data on the app level.
You have to iterate over the values and set them where appropriate.
if Rails.env.production?
CSV.foreach(Dir.pwd + "/db/prod.csv", encoding: 'utf-8', headers: true) do |row|
voter_row = row.to_hash.select { |k, v| fields.include?(k)}
voter_row.each do |key, value|
if value.nil?
voter_row[key] = "null"
end
end
Voter.create!(voter_row.to_hash.symbolize_keys)
end
else
CSV.foreach(Dir.pwd + "/db/Cntywd_020819.csv", headers: true) do |row|
voter_row = row.to_hash.select { |k, v| fields.include?(k)}
voter_row.each do |key, value|
if value.nil?
voter_row[key] = "null"
end
end
Voter.create!(voter_row.to_hash.symbolize_keys)
end
end
I also think your elsif/else is redundant, unless I'm missing something.
This sounds like a job for Hash#transform_values:
h = voter_row.transform_values { |v| v.nil? ? 'null' : v }
Couple other things:
You might want to use Hash#slice instead of #select:
voter_row = row.to_h.slice(*fields)
create is happy with string keys so you don't need to call #symbolize_keys.
You can simplify your CSV.foreach blocks to just this:
Voter.create!(row.to_h.slice(*fields))
You could go further and write:
opts = { headers: true }
if Rails.env.production?
csv_file = 'db/prod.csv'
opts[:encoding] = 'iso-8859-1:utf-8'
elsif Rails.env.development?
csv_file = 'db/Cntywd_020819.csv'
else
csv_file = 'db/Cntywd_020819.csv'
end
CSV.foreach(Rails.root.join(csv_file), **opts) do |row|
Voter.create!(row.to_h.slice(*fields))
end
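The slice plus transform_values combination suggested in the answers can be tried without a database. The sketch below is only an illustration: it uses a made-up three-column subset of `fields` and an in-memory `csv_string` instead of the real voter file, and stops at building the attribute hashes rather than calling `Voter.create!`:

```ruby
require "csv"

# Hypothetical subset of the real field list, for illustration only.
fields = %w[szNameLast szNameFirst szPhone]

# Hypothetical sample data; unquoted empty fields are parsed as nil.
csv_string = <<~DATA
  szNameLast,szNameFirst,szPhone,ignored
  Doe,Jane,,x
  Roe,,555-0100,y
DATA

attrs_list = CSV.parse(csv_string, headers: true).map do |row|
  # Keep only the wanted columns, then replace nil with the string "null".
  row.to_h.slice(*fields).transform_values { |v| v.nil? ? "null" : v }
end

p attrs_list
```

Each element of `attrs_list` is a hash with string keys, which is the shape the answers pass to `create!`.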
I want to add a new column and update existing values in a CSV response. Is there a simpler and better way to do the transformations below?
Input
id,name,country
1,John,US
2,Jack,UK
3,Sam,UK
I am using following method to parse the csv string and add new column
# Parse original CSV
rows = CSV.parse(csv_string, headers: true).collect do |row|
hash = row.to_hash
# Merge additional data as a hash.
hash.merge('email' => 'sample#gmail.com')
end
# Extract column names from first row of data
column_names = rows.first.keys
# Generate CSV after transformation of csv
csv_response = CSV.generate do |csv|
csv << column_names
rows.each do |row|
# Extract values for row of data
csv << row.values_at(*column_names)
end
end
I am using following method to parse the csv and update existing values
name_hash = {"John" => "Johnny", "Jack" => "Jackie"}
# Parse original CSV
rows = CSV.parse(csv_string, headers: true).collect do |row|
hash = row.to_hash
hash['name'] = name_hash[hash['name']] if name_hash[hash['name']] != nil
hash
end
# Extract column names from first row of data
column_names = rows.first.keys
# Generate CSV after transformation of csv
csv_response = CSV.generate do |csv|
csv << column_names
rows.each do |row|
# Extract values for row of data
csv << row.values_at(*column_names)
end
end
One possible option given the following reference data to be used for modifying the table:
name_hash = {"John" => "Johnny", "Jack" => "Jackie"}
sample_email = {'email' => 'sample#gmail.com'}
Just store in rows the table converted to hash:
rows = CSV.parse(csv_string, headers: true).map(&:to_h)
#=> [{"id"=>"1", "name"=>"John", "country"=>"US"}, {"id"=>"2", "name"=>"Jack", "country"=>"UK"}, {"id"=>"3", "name"=>"Sam", "country"=>"UK"}]
Then modify the hash based on the reference data (I used Object#then, available since Ruby 2.6 as an alias of Object#yield_self, which was introduced in Ruby 2.5):
rows.each { |h| h.merge!(sample_email).then {|h| h['name'] = name_hash[h['name']] if name_hash[h['name']] } }
#=> [{"id"=>"1", "name"=>"Johnny", "country"=>"US", "email"=>"sample#gmail.com"}, {"id"=>"2", "name"=>"Jackie", "country"=>"UK", "email"=>"sample#gmail.com"}, {"id"=>"3", "name"=>"Sam", "country"=>"UK", "email"=>"sample#gmail.com"}]
Finally restore the table:
csv_response = CSV.generate(headers: rows.first.keys, write_headers: true) { |csv| rows.map(&:values).each { |v| csv << v } }
So you now have:
puts csv_response
# id,name,country,email
# 1,Johnny,US,sample#gmail.com
# 2,Jackie,UK,sample#gmail.com
# 3,Sam,UK,sample#gmail.com
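As an alternative sketch, both transformations can also be applied directly to the CSV::Table returned by CSV.parse, without converting rows to hashes first. This is not the answer's method, just another way to reach the same output under the same sample data:

```ruby
require "csv"

csv_string = "id,name,country\n1,John,US\n2,Jack,UK\n3,Sam,UK\n"
name_hash  = { "John" => "Johnny", "Jack" => "Jackie" }

table = CSV.parse(csv_string, headers: true)
table.each do |row|
  # Assigning to a header the row does not have yet appends a new column.
  row["email"] = "sample#gmail.com"
  # Keep the original name when there is no replacement in name_hash.
  row["name"] = name_hash.fetch(row["name"], row["name"])
end

csv_response = table.to_csv
puts csv_response
```

CSV::Table#to_csv writes the header line itself, so the column names do not have to be extracted separately.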
I'm trying to figure out why I keep getting the following error:
From the following code:
def information_transfer()
file_contents = CSV.read("test.csv", col_sep: ",", encoding: "ISO8859-1")
file_contents2 = CSV.read("applicantinfo.csv", col_sep: ",", encoding:"ISO8859-1")
arraysize = file_contents.length
arraysize1 = file_contents2.length
for i in 1..arraysize
for x in 1..arraysize1
if file_contents[i][0] == file_contents2[x][0]
CSV.open("language_output.csv", "wb") do |csv|
csv << [file_contents[i][0], file_contents[i][1], file_contents[i][2],file_contents[i][3], file_contents[i][4],
file_contents[i][5], file_contents[i][6], file_contents[i][7], file_contents[i][8],file_contents[i][9],
file_contents[i][10], file_contents[i][11], file_contents[i][12], file_contents[i][13], file_contents[i][14],
file_contents[i][15], file_contents[i][16], file_contents[i][17], file_contents[i][18], file_contents2[i][24],file_contents2[i][25],
file_contents2[i][26],file_contents2[i][27], file_contents2[i][28], file_contents2[i][29], file_contents2[i][30], file_contents2[i][31], file_contents2[i][32], file_contents2[i][33]]
end
end
end
end
end
I'm basically trying to take two individual .csv files and merge certain columns together. I have two arrays (file_contents and file_contents2) that hold the contents of the individual CSV files. For some reason I'm getting an error on my if statement. I was hoping someone could help me figure out why the if statement that I wrote isn't valid. I figured it would be. Any help is appreciated. Thanks!
Seems like one of file_contents or file_contents2 is empty.
You can skip the loop if you don't want to raise the error on that specific line.
next if file_contents[i].blank? || file_contents2[x].blank?
if file_contents[i][0] == file_contents2[x][0]
One of your arrays file_contents or file_contents2 might be empty. Output both, as well as printing file_contents[i][0] and file_contents2[x][0] before your if statement.
You can make a simple change that should work:
for i in 0...arraysize
for x in 0...arraysize1
And add an error check:
if !file_contents[i].blank? and !file_contents2[x].blank? and file_contents[i][0] == file_contents2[x][0]
for i in 1..arraysize
for x in 1..arraysize1
Array indexes run from 0 to length − 1 in Ruby; loop in 0...arraysize instead.
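A small sketch of the difference, using a made-up three-row array in place of the CSV contents:

```ruby
rows = [["a"], ["b"], ["c"]]   # stand-in for file_contents
n = rows.length

inclusive = (0..n).to_a    # two dots: the end value n is included
exclusive = (0...n).to_a   # three dots: the end value n is excluded

past_end = rows[n]         # indexing one past the last element gives nil

error_class =
  begin
    rows[n][0]             # calling [] on nil is what raises
    nil
  rescue NoMethodError => e
    e.class
  end

p inclusive, exclusive, past_end, error_class
```

The inclusive range visits index 3, which does not exist in a three-element array, so the lookup returns nil and the following `[0]` raises NoMethodError.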
If file_contents2[i] can or should be written as file_contents2[x], you can just loop over the arrays’ contents directly:
for a in file_contents
for b in file_contents2
and use slices to get consecutive array elements into another array:
def information_transfer()
file_contents = CSV.read("test.csv", col_sep: ",", encoding: "ISO8859-1")
file_contents2 = CSV.read("applicantinfo.csv", col_sep: ",", encoding: "ISO8859-1")
for a in file_contents
for b in file_contents2
if a[0] == b[0]
CSV.open("language_output.csv", "wb") do |csv|
csv << a[0..18] + b[24..33]
end
end
end
end
end
and if you’re trying to join the two files one-to-one, you can do that more efficiently by putting the key into a hash. You also probably didn’t mean to reopen the output file every time.
def information_transfer()
file_contents = CSV.read("test.csv", col_sep: ",", encoding: "ISO8859-1")
file_contents2 = CSV.read("applicantinfo.csv", col_sep: ",", encoding: "ISO8859-1")
h = Hash[file_contents.collect { |row| [row[0], row] }]
CSV.open("language_output.csv", "wb") do |csv|
for b in file_contents2
a = h[b[0]]
next unless a # skip rows that have no match in test.csv
csv << a[0..18] + b[24..33]
end
end
end
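The hash-based join can be tried on small in-memory data. The two tiny tables and their column layout below are invented for illustration, and rows without a match are skipped, which is an assumption about the desired behaviour:

```ruby
require "csv"

# Hypothetical miniature versions of the two input files.
people = CSV.parse("id,name\n1,Ann\n2,Bob\n")
langs  = CSV.parse("id,lang\n2,Ruby\n3,Go\n1,C\n")

# Index the first table by its key column so each lookup is a single hash access.
by_id = people.each_with_object({}) { |row, h| h[row[0]] = row }

merged = CSV.generate do |csv|
  langs.each do |b|
    a = by_id[b[0]]
    next unless a            # id 3 has no partner in people, so it is skipped
    csv << a + b[1..-1]      # all columns of a, then b without its key column
  end
end

puts merged
```

The header rows join like any other row here because both start with the same key, "id".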
I have a CSV file with two columns:
PPS_Id Amount
123 100
1234 150
I read data from this file and insert it into an array using the code below:
CSV.foreach("filename.CSV", headers: true) do |row|
file_details << row.inspect # hash
end
I am then trying to push the data in file_details into a hash with PPS_Id as the key and Amount as the value, using the code below:
file_details_hash = Hash.new
file_details.each { |x|
file_details_hash[x['PPS_Id']] = x['Amount']
}
But when I print the result, I get nothing but {"PPS_Id"=>"Amount"}.
Can you please help?
Your code, modified to work
You need to specify the column separator for your csv, and remove inspect.
require 'csv'
file_details = []
CSV.foreach("filename.CSV", headers: true, col_sep: "\s" ) do |row|
file_details << row
end
file_details_hash = Hash.new
file_details.each { |x|
file_details_hash[x['PPS_Id']] = x['Amount']
}
p file_details_hash
#=> {"123"=>"100", "1234"=>"150"}
It now returns what you expected to get.
Shorter solution
Read the csv, drop the first line (header) and convert to a Hash :
p CSV.read("filename.CSV", col_sep: "\s").drop(1).to_h
#=> {"123"=>"100", "1234"=>"150"}
First of all, you are collecting strings into an array (CSV::Row#inspect returns a String):
file_details << row.inspect
After that you call (sic!) String#[] on those strings:
x['PPS_Id'] #⇒ "PPS_Id", because string contains this substring
That said, your code has nothing but errors. You might achieve what you want with:
csv = CSV.parse(File.read("filename.CSV"), col_sep: "\s")
csv[1..-1].to_h
#⇒ {
# "123" => "100",
# "1234" => "150"
# }
Using inspect will save your CSV rows as strings, so obviously you won't be able to get what you need. Instead try this (col_sep is needed because the sample file is space-separated):
file_details = CSV.read("filename.CSV", col_sep: " ")
Reading the CSV directly creates a 2D array that you can then iterate over, which will look like this: [["PPS_Id", "Amount"], ["123", "100"], ["1234", "150"]]
From there you can slightly modify your approach:
file_details.each do |key, value|
file_details_hash[key] = value
end
To receive a hash like this: {"PPS_Id"=>"Amount", "123"=>"100", "1234"=>"150"}
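To see both the failure and a fix side by side, here is a sketch that parses the sample from a string (the `data` variable is made up) rather than from filename.CSV:

```ruby
require "csv"

# Hypothetical in-memory copy of the space-separated sample.
data = "PPS_Id Amount\n123 100\n1234 150\n"

table = CSV.parse(data, headers: true, col_sep: " ")

# Why the original code failed: inspect turns the row into a String,
# and String#[] with a substring argument just returns that substring.
as_string = table.first.inspect
lookup    = as_string["PPS_Id"]

# Keeping the rows as CSV::Row objects lets the headers act as keys.
amounts = table.each_with_object({}) { |row, h| h[row["PPS_Id"]] = row["Amount"] }

p lookup
p amounts
```

With headers: true the header line is consumed by the parser, so it does not end up as an entry in `amounts`.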
I'm trying to upload rows from a CSV file into my database, but the spaces in the headers keep messing me up. So for example, the header will be "Order Item Id" and I want the hash key to be "order_item_id". Here's what my code looks like now:
CSV.foreach(file.path, headers:true, :header_converters => lambda { |h| h.try(:downcase) }, col_sep: ';') do |row|
product_hash = row.to_hash
product = OrderCsv.where(id: product_hash["id"])
if product.count ==1
product.first.update_attributes(product_hash)
else
user.order_csvs.create!(product_hash)
end
end
I've tried editing the product_hash with product_hash.keys.each { |k| k = "..." }
but it doesn't do anything. I've also tried creating a header converter like the one that does the downcasing, but I wasn't able to make that work either. Sorry if this is a newb question, but I've been looking everywhere for an answer and none of them have been working for me. Thanks a lot!
You can concatenate the replacement after the downcase, in the :header_converters, like this:
lambda { |h| h.try(:downcase).try(:gsub,' ', '_') }
Try this:
product_hash = { "Order Item Id" => 2 }
product_hash = product_hash.each_with_object({}) do |(k, v), h|
h[k.parameterize.underscore] = v
end
puts product_hash # {"order_item_id"=>2}
In case anyone else stumbles upon this question, you can make use of the built-in :symbol header converter:
https://docs.ruby-lang.org/en/2.1.0/CSV.html#HeaderConverters
The header String is downcased, spaces are replaced with underscores, non-word characters are dropped, and finally to_sym() is called.
Example:
CSV.foreach(csv_path, headers: true, header_converters: :symbol) do |row|
# do stuff
end
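A short sketch comparing the built-in :symbol converter with a custom lambda, on an invented semicolon-separated sample (the `data` string and the `underscore` lambda name are assumptions for illustration):

```ruby
require "csv"

# Hypothetical sample with a space-containing header, using ';' as in the question.
data = "Order Item Id;Quantity\n42;3\n"

# Built-in converter: downcases, replaces spaces with underscores, returns symbols.
sym_table   = CSV.parse(data, headers: true, col_sep: ";", header_converters: :symbol)
sym_headers = sym_table.headers

# Custom lambda: same idea, but the keys stay strings.
underscore = ->(h) { h.downcase.gsub(" ", "_") }
str_table  = CSV.parse(data, headers: true, col_sep: ";", header_converters: underscore)
first_row  = str_table.first.to_h

p sym_headers
p first_row
```

String keys from the lambda variant can be passed straight to a mass-assignment call, while the :symbol variant is convenient when the rest of the code already works with symbol keys.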
If you wish to convert a hash with keys containing spaces to a hash with keys containing underscores, you can do the following:
hash_with_spaces = {"order item id" => '1', "some other id" => '2'}
new_hash = hash_with_spaces.inject({}) do |h, (k, v)|
h[k.gsub(' ', '_')] = v ; h
end
new_hash
#=> {"order_item_id"=>"1", "some_other_id"=>"2"}