Rails Import CSV Error: invalid byte sequence in UTF-8 - ruby-on-rails

I'm getting the error invalid byte sequence in UTF-8 when trying to import a CSV file in my Rails application. Everything was working fine until I added a gsub method to compare one of the CSV columns to a field in my database.
When I import a CSV file, I want to check whether the address for each row is included in an array of different addresses for a specific client. I have a client model with an alt_addresses property which contains a few different possible formats for the client's address.
I then have a citation model (if you're familiar with local SEO you'll know this term). The citation model doesn't have an address field, but it has a nap_correct? field (NAP stands for "Name", "Address", "Phone Number"). If the name, address, and phone number for a CSV row match what I have in the database for that client, the nap_correct? field for that citation gets set to "correct".
Here's what the import method looks like in my citation model:
def self.import(file, client_id)
  @client = Client.find(client_id)
  CSV.foreach(file.path, headers: true) do |row|
    @row = row.to_hash
    @citation = Citation.new
    if @row["Address"]
      if @client.alt_addresses.include?(@row["Address"].to_s.downcase.gsub(/\W+/, '')) && self.phone == @row["Phone Number"].gsub(/[^0-9]/, '')
        @citation.nap_correct = true
      end
    end
    @citation.name = @row["Domain"]
    @citation.listing_url = @row["Citation Link"]
    @citation.save
  end
end
And then here's what the alt_addresses property looks like in my client model:
def alt_addresses
  address = self.address.downcase.gsub(/\W+/, '')
  address_with_zip = (self.address + self.zip_code).downcase.gsub(/\W+/, '')
  return [address, address_with_zip]
end
I'm using gsub to reformat the address column in the CSV as well as the field in my client database table so I can compare the two values. This is where the problem comes in. As soon as I added the gsub method I started getting the invalid byte-sequence error.
I'm using Ruby 2.1.3. I've noticed a lot of the similar errors I find searching Stack Overflow are related to an older version of Ruby.

Specify the file's encoding with the encoding option:
CSV.foreach(file.path, headers: true, encoding: 'iso-8859-1:utf-8') do |row|
  # your code here
end
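As a rough illustration of what that option does, the sketch below writes a small ISO-8859-1 file and reads it back with the same encoding string. The sample data and the temp file name are made up for the example, and it assumes the real upload is actually ISO-8859-1; a file in some other encoding would need a different source encoding.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data: the address contains "é" stored as the single
# ISO-8859-1 byte 0xE9, which is not a valid UTF-8 sequence.
latin1_csv = "Domain,Address\nexample.com,12 Rue Caf\xE9\n".b

file = Tempfile.new(['citations', '.csv'])
file.binmode
file.write(latin1_csv)
file.close

addresses = []
CSV.foreach(file.path, headers: true, encoding: 'iso-8859-1:utf-8') do |row|
  # The field has been transcoded to UTF-8, so gsub can scan it safely.
  row['Address'].downcase.gsub(/\W+/, '')
  addresses << row['Address']
end
file.unlink
```

The part before the colon names the encoding of the file on disk, and the part after it names the encoding the strings should have inside Ruby.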

One workaround I've found is to open the file in OpenOffice or LibreOffice, choose "Save As", click "Edit Filter Settings", make sure the character set is UTF-8, and save. Bottom line: use an external tool to convert the file to UTF-8 before loading it into Ruby. Sorting this out within Ruby alone can be a real labyrinth.
A Unix tool called iconv can apparently do this sort of thing as well. https://superuser.com/questions/588048/is-there-any-tools-which-can-convert-any-strings-to-utf-8-encoded-values-in-linu
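For comparison, the same kind of conversion can be sketched in Ruby with String#force_encoding, String#encode, and String#scrub. This is only a sketch: the sample bytes are invented, and the first variant assumes the source really is ISO-8859-1, which is not something Ruby can detect for you.

```ruby
# Hypothetical raw bytes as they might come out of an uploaded file.
raw = "12 Rue Caf\xE9".b

# If the source encoding is known, label the bytes and transcode them.
as_utf8 = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

# If it is unknown, treat the bytes as UTF-8 and replace whatever is invalid.
scrubbed = raw.dup.force_encoding(Encoding::UTF_8).scrub('?')
```

The first variant keeps the accented character; the second loses it but at least yields a string that gsub will accept.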

Related

Using Roo with Ruby(Rails) to parse Excel

I'm trying to allow users to upload a CSV/Excel document and parse it using Roo (the most frequently suggested library I've seen), but I'm having some trouble figuring it out.
Current Script
require 'roo'
xlsx = Roo::Excelx.new("./TestAppXL.xlsx")
xlsx.each_row_streaming do |row|
  puts row.inspect # Array of Excelx::Cell objects
end
This was the only one I was able to get to work. It returns what looks to be JSONB.
What I'm trying to do is a few part process:
A) The user uploads a list of 'cards' to my website. (I'm trying to allow as many formats as possible: CSV, Excel, etc.)
B) It instantly returns a list of the headers and asks 'Which header is name, quantity, etc.?'
C) I parse the data for specific headers and do 'X'.
B is what I primarily need help with. I'm struggling to figure out Roo exactly. I won't have control over the headers so I can't use numerical column numbers.
(Adding in Rails tag since this will be in a controller in the end, maybe an easier way to do it.)
Updated Script
I've actually made a lot of progress. Still trying to get closer to my original request.
require 'roo'
require 'roo-xls'
xlsx = Roo::Spreadsheet.open('Demo.xls')
headers = xlsx.first_row
puts xlsx.row(headers)
puts "Which number header is the Card Name?"
CardName = gets
puts xlsx.column(CardName.to_i)
# => Returns basic info about the spreadsheet file
Need a lot more logic on the gets but currently if I put in '3' it will return all content of Column 'CardName'. Working on iterating over the rows now.
Pseudo working script
require 'roo'
require 'roo-xls'
xlsx = Roo::Spreadsheet.open('Demo.xls')
headers = xlsx.first_row
puts xlsx.row(headers)
puts "Which number header is the Card Name?"
CardName = gets.to_i
specHeader = xlsx.cell(headers,CardName)
xlsx.column(CardName).drop(0).each_with_index do |item, index|
  next if index == 0 # skip the header cell
  puts item
end
This is actually performing as expected, and I can start feeding the file into a Rake job now. Still working on some of the iteration but this is very close.
I made you a generic way to extract data out of a Roo spreadsheet based on a few header names which would be the convention to use by your uploaders.
require 'roo'
require 'roo-xls'
xlsx = Roo::Spreadsheet.open('Demo.xls')
first_row = xlsx.first_row
headers = ['CardName', 'Item']
headers.each{|h|Kernel.const_set(h, xlsx.row(first_row).index{|e| e =~ /#{h}/i})}
begin
  xlsx.drop(first_row).each do |row|
    p [row[CardName], row[Item]]
  end
rescue
  # the required headers are not all present
end
I suppose the only line that needs explaining is headers.each{|h|Kernel.const_set(h, xlsx.row(first_row).index{|e| e =~ /#{h}/i})}
For each header name, const_set defines a constant with that name whose value is the index of the matching cell in xlsx.row(first_row) (our header row). The #{} around h interpolates the value of h into the regular expression ('CardName' in the first case), and the i at the end of /#{h}/i makes the match case-insensitive. So the constant CardName is assigned the index of the string CardName in the header row.
Instead of the rather clumsy begin rescue structure you could check if all required constants are present with const_get and act upon that instead of catching the error.
EDIT
Instead of the p [row[CardName], row[Item]] you could check the values and do anything with them. Just keep in mind that if this is going to be part of a Rails app or other website, the interaction with the user is going to be trickier than in your puts and gets example. E.g. something like
headers = ['CardName', 'Item', 'Condition', 'Collection']
...
xlsx.drop(first_row).each do |row|
  if row[CardName].nil? || row[Item].nil?
    # let the user know or skip
  else
    condition, collection = row[Condition], row[Collection]
    # and do something with it
  end
end
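The same header lookup can also be sketched with a plain Hash instead of constants, which makes the "are all required headers present?" check explicit. This is a variation on the approach above, not the code from the answer: the rows array is made-up data standing in for what a spreadsheet reader yields, header_indexes is a hypothetical helper, and it matches normalized header names exactly rather than by regular expression.

```ruby
# Hypothetical rows; the first one is the header row.
rows = [
  ['Qty', 'Card Name', 'Item'],
  [2, 'Black Lotus', 'Single'],
  [1, nil, 'Booster']
]

wanted = %w[CardName Item]

# Map each wanted header to its column index, ignoring case, spaces and
# punctuation in the uploaded header row. Missing headers map to nil.
def header_indexes(header_row, wanted)
  normalized = header_row.map { |cell| cell.to_s.downcase.gsub(/[^a-z0-9]/, '') }
  wanted.each_with_object({}) do |name, map|
    map[name] = normalized.index(name.downcase)
  end
end

columns = header_indexes(rows.first, wanted)
missing = columns.select { |_, index| index.nil? }.keys
raise ArgumentError, "missing headers: #{missing.join(', ')}" unless missing.empty?

cards = rows.drop(1).map { |row| [row[columns['CardName']], row[columns['Item']]] }
```

In a controller, the missing list could be turned into a message for the user instead of an exception.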

How to map data imported via xlsx with no headers in rails 4

I want to add an import function for the users of my Rails app. However, the files that they will import won't have a header, and the interesting data starts at row 8. From each row I only need 2 fields.
Here is an example of a line in the xlsx file:
751,"01/17/2015","11:17:32","60","TDFSRDSK","2","10","-1","0","3","","26","3","","","1","0"
I only need the date and the number in the 4th field (60), which I want to add to an SQL table.
My problem is with the mapping and how to do it. I've tried to follow the Railscast tutorial and the Roo documentation, but I can't manage to make it work.
def self.import(file)
  xlsx = Roo::Excelx.new(file)
  xlsx.each_row do |row|
    date = row[2]
    value = row[4]
    user_id = current_user.id
    product.create(:date => date, :valeur => value, :user_id => user_id)
  end
end
And the error I get :
no implicit conversion of ActionDispatch::Http::UploadedFile into String
I'm really new to rails/ruby so I'm not even sure the mapping code is supposed to be like that.
It seems like Roo is being handed the ActionDispatch::Http::UploadedFile object itself where it expects a String. Try passing the path of the uploaded temp file instead:
xlsx = Roo::Excelx.new(file.path)
You can refer to the relevant Rails guide for details on how this works.

Using gsub to clean string and then truncate

I need to remove some characters from a string (computer mac address + junk...) and truncate to leave the first 18 characters.
Currently, I have the following in my model:
def popular_locations
  popular_locations = Radacct.group(calledstationid).order('calledstationid desc').count
end
That outputs a list and their count but the format needs adjustment for a search I'll do.
I tried adding this:
clean_mac_address = :calledstationid.gsub(/[^0-9a-z]/i, '')
But I get an error undefined method `gsub' for :calledstationid:Symbol
-- EDIT --
Initially the calledstationid is stored in db (radacct model) with the following format:
00-18-0A-21-44-7B:Home Office
This is basically a mac address plus an SSID name.
I need to strip out the dashes and SSID because we have another model (locations) which has a list of mac addresses in this format:
00:18:0A:21:44:7B
locations and radacct are unrelated (radacct is a model where all sessions are dumped into). Eventually what I need to do is do a count of all sessions and group by calledstationid (as seen above). Then we'll query the locations table and work out the location name. I should be left with something like this:
location_name session_count
School 2
Home 12
Office 89
I'm not sure how you put the errant : in there when the model definition doesn't have it:
clean_mac_address = calledstationid.gsub(/[^0-9a-z]/i, '')
What you probably mean to do is clean up that variable before passing it in:
def popular_locations
  # Clean up calledstationid
  calledstationid.gsub!(/[^0-9a-z]/i, '')
  # Find it and return
  Radacct.group(calledstationid).order('calledstationid desc').count
end
I think the value returned by popular_locations is an ActiveSupport::OrderedHash with the calledstationid values as keys.
So if I understood you right, try something like
result = Radacct.group(calledstationid).order('calledstationid desc').count
result.each do |key, value|
  puts key.gsub(/[^0-9a-z]/i, '') # formatted key
  puts value # count
end
I think there is also a way to do it in SQL: select a substring of calledstationid and then group by it.
See this article: http://www.ke-cai.net/2009/06/mysql-group-by-substring.html
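To make the key cleanup concrete, here is a minimal sketch that takes a hash shaped like the grouped count above, converts each key to the colon-separated format used by the locations model, and merges the counts of keys that share a MAC address. normalize_mac and the sample counts are hypothetical, and the sketch assumes every key starts with a 17-character MAC address written with dashes.

```ruby
MAC_LENGTH = 17 # "00-18-0A-21-44-7B" is 17 characters long

# Keep only the MAC part of "00-18-0A-21-44-7B:Home Office" and switch
# the dashes to colons so it matches the locations table format.
def normalize_mac(calledstationid)
  calledstationid.to_s[0, MAC_LENGTH].tr('-', ':').upcase
end

# Hypothetical data shaped like the hash returned by the grouped count.
raw_counts = {
  '00-18-0A-21-44-7B:Home Office' => 7,
  '00-18-0A-21-44-7B:Guest' => 5,
  '00-18-0A-21-44-7C:School' => 2
}

# Sum the counts of all keys that normalize to the same MAC address.
session_counts = raw_counts.each_with_object(Hash.new(0)) do |(key, count), totals|
  totals[normalize_mac(key)] += count
end
```

The resulting keys can then be used to look up location names in the locations table.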

What is the best way to replace smart quotes, smart apostrophe and ellipsis in Rails 3?

My application allows the user to enter text. When they copy and paste from MS Word, it pastes smart quotes, smart apostrophes and ellipsis. These characters get saved into the database and cause problems. What is the best way to replace these non-UTF-8 characters with normal quotes("), apostrophe(') and periods(...)?
Also, how do you test this functionality? I added a test with these special characters and # encoding: ISO-8859-1 at the top of the file. The special characters caused the tests to stop running: /home/george/.rvm/gems/ruby-1.9.2-p180/gems/redgreen-1.2.2/lib/redgreen.rb:62:in 'sub': invalid byte sequence in UTF-8 (ArgumentError)... Apparently the redgreen gem is incompatible with these characters?
Thanks.
You can add a before_save method that converts your text to the corresponding UTF-8 characters. If you have just one field that might contain non-UTF-8 characters, it's simple. If you have many fields, it is better to iterate dynamically over the text/string fields and fix the UTF-8 problem in each. Either way you need to use String#encode. Here is an example:
before_save :fix_utf8_encoding
def fix_utf8_encoding
  # Collect the names of all text and string columns on this model.
  columns = self.class.columns.select{|col| [:text,:string].include?(col.type)}.map{|col| col.name.to_sym}
  columns.each do |col|
    self[col] = self[col].encode('UTF-8', :invalid => :replace, :undef => :replace) if self[col].kind_of?(String) # Double checking just in case.
  end
end
private :fix_utf8_encoding
And for bonus points you can also check whether the field was changed, using Rails' handy changed? helpers, before fixing it.
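String#encode only deals with bytes that are invalid or undefined in the target encoding. If the pasted text is already valid UTF-8, the smart quotes and the ellipsis are ordinary characters and would have to be swapped explicitly. A minimal sketch of that substitution, where SMART_PUNCTUATION and replace_smart_punctuation are hypothetical names and the mapping covers only the characters mentioned in the question:

```ruby
# Word-style punctuation mapped to plain ASCII equivalents.
SMART_PUNCTUATION = {
  "\u2018" => "'",   # left single quotation mark
  "\u2019" => "'",   # right single quotation mark / apostrophe
  "\u201C" => '"',   # left double quotation mark
  "\u201D" => '"',   # right double quotation mark
  "\u2026" => '...'  # horizontal ellipsis
}.freeze

SMART_PUNCTUATION_PATTERN = Regexp.union(SMART_PUNCTUATION.keys)

# gsub looks each match up in the hash and substitutes the mapped value.
def replace_smart_punctuation(text)
  text.gsub(SMART_PUNCTUATION_PATTERN, SMART_PUNCTUATION)
end
```

Writing the characters as \u escapes keeps the source file pure ASCII, which also sidesteps the encoding trouble in the test file.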

rails read the file and store the database

I saved a .rb file at app/component/text.rb. I want to read it and store its contents in the database.
The file has 5 rows and my table also has 5 rows. I want to save it line by line. I hope you can understand my question. I use a MySQL database.
Please help me.
thanks,
kingston.s
Your question is missing the Rails and Ruby versions, the operating system you use, and a clear problem description. Anyway:
Assuming you meant that your database table has 5 fields and you want to read a record from, and write a record to, a file.
Assuming the file contains a record for the model "Apple" with the fields id, title and description.
Read new Apple from file:
indexes = [:id, :title, :desc].reverse # reversed because pop takes the last element first
record_params = {}
handle = File.open("Apple.record","r")
handle.each_line do |line|
  record_params[indexes.pop] = line.chomp # use chomp to get rid of the trailing \n
end
handle.close
Apple.create!(record_params)
Write Apple with id 7 to a File
indexes = [:id, :title, :desc]
record = Apple.find(7)
to_save = indexes.map{|i| record.send i}
handle = File.open("Apple.record","w")
handle.write(to_save.join("\n"))
handle.close
Be careful to escape any line breaks that may occur in your data.
Anyway, it would be much easier to use YAML:
require 'yaml'
# write the first apple's attributes to the file
handle = File.open("Apple.record", "w")
handle.write(Apple.first.attributes.to_yaml)
handle.close
# read the file into a new record
record_params = YAML.load File.read("Apple.record")
Apple.create!(record_params)
All this worked in Rails 3, but remember that you are saving the id as well. Saving and then loading will therefore cause the id to change, because a record with the given id already exists.
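The YAML round trip can be tried without a database. In this sketch a plain Hash with made-up values stands in for the record's attributes, and the id is dropped after loading so that a new record would get a fresh one. It also shows that YAML takes care of line breaks inside values.

```ruby
require 'yaml'
require 'tempfile'

# Hypothetical attributes standing in for an Apple record.
attributes = { 'id' => 7, 'title' => 'Granny Smith', 'desc' => "Green\nand tart" }

file = Tempfile.new(['apple', '.record'])
file.write(attributes.to_yaml)
file.close

# safe_load only builds basic types such as strings, numbers and hashes.
loaded = YAML.safe_load(File.read(file.path))
file.unlink

# Drop the id so the database can assign a new one on create.
record_params = loaded.reject { |key, _value| key == 'id' }
```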
