I am trying to parse a CSV file generated from an Excel spreadsheet.
Here is my code
require 'csv'
file = File.open("input_file")
csv = CSV.parse(file)
But I get this error
ArgumentError: invalid byte sequence in UTF-8
I think the error occurs because Excel encodes the file as ISO 8859-1 (Latin-1) rather than UTF-8.
Can someone help me with a workaround for this issue, please?
Thanks in advance.
You need to tell Ruby that the file is in ISO-8859-1. Change your file open line to this:
file=File.open("input_file", "r:ISO-8859-1")
The second argument tells Ruby to open the file read-only and to treat its contents as ISO-8859-1.
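As a minimal sketch of the idea above: the mode string can also name an internal encoding, so the data is transcoded to UTF-8 while it is read. The sample file and its contents are made up for illustration, and the semicolon separator is an assumption.

```ruby
require 'csv'
require 'tempfile'

# Build a small Latin-1 sample so the sketch is self-contained.
# (File name and contents are hypothetical; \u00E9 is e-acute, \u00E1 is a-acute.)
sample = Tempfile.new(['latin1_sample', '.csv'])
sample.binmode
sample.write("name;city\nJos\u00E9;M\u00E1laga\n".encode('ISO-8859-1').b)
sample.close

# "r:ISO-8859-1:UTF-8" = read-only, external encoding ISO-8859-1,
# transcoded to UTF-8 as the data is read.
text = File.open(sample.path, 'r:ISO-8859-1:UTF-8') { |f| f.read }
rows = CSV.parse(text, col_sep: ';')

puts rows.inspect
```

With only "r:ISO-8859-1" the strings stay tagged as ISO-8859-1; adding ":UTF-8" is optional and only matters if the rest of the program expects UTF-8.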
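Specify the encoding with the encoding option. The value has the form external:internal, so here the file is read as ISO-8859-1 and transcoded to UTF-8: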
CSV.foreach(file.path, headers: true, encoding:'iso-8859-1:utf-8') do |row|
...
end
You can supply the source encoding directly in the file mode parameter:
CSV.foreach("file.csv", "r:windows-1250") do |row|
  # your code here
end
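The two CSV.foreach variants above can be tried on a throwaway file. This is only a sketch: the Windows-1250 sample data and the column names are invented for illustration.

```ruby
require 'csv'
require 'tempfile'

# Create a Windows-1250 sample file (hypothetical contents).
# The city is written with \u escapes: U+0141 U+00F3 d U+017A.
sample = Tempfile.new(['cp1250_sample', '.csv'])
sample.binmode
sample.write("name,city\nAnna,\u0141\u00F3d\u017A\n".encode('Windows-1250').b)
sample.close

cities = []
# encoding: 'external:internal' -> read as Windows-1250, yield UTF-8 strings.
CSV.foreach(sample.path, headers: true, encoding: 'Windows-1250:UTF-8') do |row|
  cities << row['city']
end

puts cities.inspect
```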
If you only have one file (or a few), so you don't need to detect the encoding automatically for whatever input you receive, and you can view the file's contents as plain text (txt, csv, etc.) separated by, e.g., semicolons, you can manually create a new file with a .csv extension, paste the contents into it, and then parse it as usual.
Keep in mind that this is a workaround, but when you only need to parse one big Excel file on Linux, converted to some flavour of CSV, it saves the time you would spend experimenting with all those fancy encodings.
Save the file as UTF-8, unless for some reason you need to save it differently, in which case you can specify the encoding when reading the file:
add "r:ISO-8859-1" as the second argument, as in File.open("input_file", "r:ISO-8859-1")
I had this same problem and worked around it by using Google Spreadsheets and then downloading the sheet as a CSV. That was the easiest solution.
Then I came across this gem
https://github.com/singlebrook/utf8-cleaner
Now I don't need to worry about this issue at all. Hope this helps!
Related
I am working in Rails. I downloaded a Word document from OneDrive through the Graph API, and it returns a binary string which is a collection of files. I need to save this string as a .docx file, but whether I save it directly or Base64-decode it first and write it as a binary file, it isn't saved in the right format: the content of the file looks garbled.
Any help in this regard will be appreciated.
Thanks
Can you not just save the binary string to a file?
data = <binary string>
File.open('document.docx', 'wb') do |f|
f.write(data)
end
A docx file is actually a ZIP archive containing a collection of files, with the file extension .docx in place of .zip. There should be no conversion necessary, and there should be no encoding necessary in order to download it across the 'net.
You should be able to change the file extension to .zip and then extract it using unzip, with the result being a collection of XML files (text) and directories. If you can't do this, then you haven't correctly decoded it, so you should figure out what encoding you have requested, and reverse that, or better, don't request encoding at all.
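A quick way to check whether the bytes are usable before writing them: a .docx is a ZIP container, and a non-empty ZIP archive normally starts with the signature "PK\x03\x04". The sketch below uses that check and falls back to a Base64 decode. The helper name normalize_docx_bytes is hypothetical, and the assumption that the payload is either raw or Base64-encoded is mine, not something the question confirms.

```ruby
# Signature found at the start of a non-empty ZIP archive (and so of a .docx).
ZIP_MAGIC = "PK\x03\x04".b

# Hypothetical helper: returns raw ZIP bytes, decoding Base64 if needed.
def normalize_docx_bytes(data)
  bytes = data.b
  return bytes if bytes.start_with?(ZIP_MAGIC)

  # unpack1('m') is a lenient Base64 decode from Ruby core.
  decoded = bytes.unpack1('m')
  return decoded if decoded.start_with?(ZIP_MAGIC)

  raise ArgumentError, 'data does not look like a ZIP/docx payload'
end

# Usage with a fake payload (not a real document):
fake = "PK\x03\x04 rest of archive".b
puts normalize_docx_bytes(fake) == fake
puts normalize_docx_bytes([fake].pack('m')) == fake
```

Once the bytes pass the check, File.binwrite('document.docx', bytes) writes them without any newline or encoding conversion.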
I am trying to figure out how to take a string that holds the contents of an MP4 file and write a properly formatted MP4 file. Currently I am just throwing the string into a file and slapping a .mp4 extension on it, but the resulting file cannot be played by any video player (I am assuming because of all the missing metadata).
def write_mp4(mp4_string)
file = File.new('movie.mp4', 'w')
file.puts(mp4_string)
file.close
end
For context, I am doing this in a Ruby on Rails application; I'm not sure if that changes anything. Please help, thanks.
Use "wb" mode, which suppresses EOL conversions and sets the encoding properly.
Use write, not puts, as the latter appends an extra EOL.
You could use a shortcut: File.write('movie.mp4', mp4_string, mode: 'wb') - or even File.binwrite('movie.mp4', mp4_string).
Of course, make sure the string actually contains a correct file before - for example, if mp4_string.encoding doesn't return #<Encoding:ASCII-8BIT>, you probably done goofed somewhere before the writing step, too :)
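To illustrate the points above, here is a small round-trip sketch. The payload is a handful of made-up bytes, not a playable video, and write_binary is a hypothetical wrapper name.

```ruby
require 'tmpdir'

# Hypothetical wrapper: writes the bytes untouched and returns the byte count.
def write_binary(path, bytes)
  File.binwrite(path, bytes)
end

# Made-up bytes that include NUL, CR/LF and a non-UTF-8 byte.
payload = "\x00\x00\x00\x18ftypmp42\r\n\xFF".b

written = nil
round_trip = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, 'movie.mp4')
  written = write_binary(path, payload)
  round_trip = File.binread(path)
end

puts written == payload.bytesize
puts round_trip == payload
```

If the file on disk is byte-for-byte identical to the string and still won't play, the problem is probably in how the string was obtained rather than in the write step.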
I generate a CSV text file in Rails like this:
CSV.generate(col_sep: ';') do |csv|
  csv.add_row ['1st line']
  csv.add_row ['2nd line']
end
When I open the text file, the two lines are there as expected. Unfortunately, this file now has to be used by a program that reads it, and I get an error message saying that the second line is missing. I have a sample file that looks exactly like the file I generated and works fine, but my file can't be read properly. It also has the same encoding. Any suggestions on where to look? Anything concerning line breaks?
I'm not sure this is a question that can be answered as asked. You said that a 3rd party program is having trouble reading a text file generated by Ruby, but provided no information on that error and how you think Ruby is related to this error.
Could you please update your original post with the plaintext version of your CSV file and what program you're trying to open it in?
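While waiting for those details, one thing that is cheap to rule out is the line terminator. When writing, CSV uses "\n" by default, and some consumers expect "\r\n". The sketch below assumes, without confirmation from the question, that the reading program wants CRLF; comparing the bytes of the working sample file with the generated one would settle it.

```ruby
require 'csv'

# Default row separator when generating: "\n".
default_output = CSV.generate(col_sep: ';') do |csv|
  csv << ['1st line']
  csv << ['2nd line']
end

# Explicit CRLF row separator (assumption: the consumer expects it).
crlf_output = CSV.generate(col_sep: ';', row_sep: "\r\n") do |csv|
  csv << ['1st line']
  csv << ['2nd line']
end

# p shows the escape sequences, which makes the difference visible.
p default_output
p crlf_output
```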
How can I display the Chinese characters? Thanks.
This is my CSV file.
How can I solve this? Thank you.
LOAD CSV requires that the CSV file use UTF-8 character encoding. Your file may be using the wrong encoding.
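If the file turns out to be in a legacy encoding, it can be re-saved as UTF-8 before importing. The sketch below assumes the source is GBK, which is only a guess (the question does not say); convert_to_utf8 is a hypothetical helper and the sample contents are made up.

```ruby
require 'tmpdir'

# Hypothetical helper: reads src_path in source_encoding, writes UTF-8 to dst_path.
def convert_to_utf8(src_path, dst_path, source_encoding)
  text = File.read(src_path, encoding: "#{source_encoding}:UTF-8")
  File.write(dst_path, text, encoding: 'UTF-8')
end

converted = nil
Dir.mktmpdir do |dir|
  src = File.join(dir, 'input_gbk.csv')
  dst = File.join(dir, 'input_utf8.csv')

  # Sample GBK file; \u4E2D\u6587 are the two characters for "Chinese".
  File.binwrite(src, "id,name\n1,\u4E2D\u6587\n".encode('GBK').b)

  convert_to_utf8(src, dst, 'GBK') # 'GBK' is an assumption about the source file
  converted = File.read(dst, encoding: 'UTF-8')
end

puts converted
```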
I have a .xlsx file converted to .csv. I need to write a script to modify this file (change/rename columns, etc.). How can I open this .csv file and save it from within the script?
Thanks!
Open the CSV file just like you would open any other file in Ruby, using the standard File API:
csv_file = File.open('data.csv', 'r')
Parse it manually or use a library like FasterCSV (which became the standard csv library in Ruby 1.9). Make your modifications, write back to the file, and close it. There is nothing inherently special about a CSV file; work with it like you would with any file in Ruby.
You should probably work with a CSV library (or, in the Ruby world, a gem). So install the gem,
and your code will look something like this:
FasterCSV.foreach("path/to/file.csv") do |row|
# use row here...
end
http://fastercsv.rubyforge.org/
As far as I know, you cannot make inline modifications to the CSV file. You would have to output via another file.
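A minimal sketch of that read-modify-write approach, using the csv library that ships with current Ruby instead of the FasterCSV gem. The helper rename_column, the file names, and the column names are all hypothetical.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper: copies src_path to dst_path with one header renamed.
def rename_column(src_path, dst_path, old_name, new_name)
  table = CSV.read(src_path, headers: true)
  new_headers = table.headers.map { |h| h == old_name ? new_name : h }

  CSV.open(dst_path, 'w') do |out|
    out << new_headers
    table.each { |row| out << row.fields }
  end
end

result = nil
Dir.mktmpdir do |dir|
  src = File.join(dir, 'data.csv')
  dst = File.join(dir, 'data_renamed.csv')
  File.write(src, "id,name\n1,Ann\n2,Bob\n")

  rename_column(src, dst, 'name', 'full_name')
  result = File.read(dst)
end

puts result
```

To end up with the changes under the original file name, the output file can be moved over the input afterwards, for example with FileUtils.mv.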