How to convert ascii-8bit file data to readable string - ruby-on-rails

I'm trying to parse the contents of a CSV file (saved on Windows Excel then uploaded to dropbox) from my dropbox via the Dropbox Core api.
I created a rake task (part of a Rails app) with the following code. It writes a magnum-opus.csv file to my local hard drive containing the original text from the Excel file. Calling contents.encoding reports that the contents are ASCII-8BIT.
contents, metadata = client.get_file_and_metadata('/magnum-opus.csv')
open('magnum-opus.csv', 'w') {|f| f.puts contents }
Instead of creating a local file, I'd like to convert the binary data in "contents" to readable text on the fly and parse through it. I don't want to save it anywhere and then have to open it.
How do I go about doing that?
If I do
p contents
I get unreadable output such as ... \x00e\x00d\x00u
1) How do I convert this into a string I can parse through with Ruby?
2) The other thing I'm wondering about is that if I do:
puts contents
the original, human-readable text of the CSV file is printed to STDOUT. What is puts doing?
I tried calling CSV.parse on contents.encode( "UTF-8", "binary", :invalid => :replace, :undef => :replace, :replace => ''), but I end up getting an error such as
CSV::MalformedCSVError: Unquoted fields do not allow \r or \n
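The \x00e\x00d\x00u pattern in the p output, with a NUL byte next to every letter, is typical of UTF-16 text, which some Excel export options produce. If that is what the file holds, the bytes can be relabelled as UTF-16 and transcoded to UTF-8 in memory before parsing. The sketch below is a guess under that assumption, not a confirmed diagnosis: utf16_bytes_to_utf8 is a hypothetical helper, the sample data is made up, and falling back to UTF-16LE when there is no byte order mark is also an assumption.

```ruby
require "csv"

# Hypothetical helper: decode a binary (ASCII-8BIT) string that is assumed
# to contain UTF-16 text and return it as UTF-8.
def utf16_bytes_to_utf8(bytes)
  data = bytes.b # binary copy, so the caller's string is left untouched

  # A byte order mark, if present, tells us the endianness.
  if data.start_with?("\xFF\xFE".b)
    source = Encoding::UTF_16LE
    data = data.byteslice(2, data.bytesize - 2)
  elsif data.start_with?("\xFE\xFF".b)
    source = Encoding::UTF_16BE
    data = data.byteslice(2, data.bytesize - 2)
  else
    source = Encoding::UTF_16LE # assumption: little-endian when no BOM
  end

  # force_encoding only relabels the bytes; encode does the real conversion.
  data.force_encoding(source).encode(Encoding::UTF_8)
end

# Made-up stand-in for the downloaded contents: UTF-16LE bytes with a BOM.
sample = "\uFEFFname,school\r\nann,edu\r\n".encode(Encoding::UTF_16LE).b

text = utf16_bytes_to_utf8(sample)
rows = CSV.parse(text)
```

As for puts: it writes the bytes unchanged, and a terminal typically does not display NUL bytes, which would explain why the same data looks readable on STDOUT while p shows the escapes.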

Related

Ruby: Is there a way to specify your encoding in File.write?

TL;DR
How would I specify the mode of encoding on File.write, or how would one save image binary to a file in a similar fashion?
More Details
I'm trying to download an image from a Trello card and then upload that image to S3 so it has an accessible URL. I have been able to download the image from Trello as binary (I believe it is some form of binary), but I have been having issues saving this as a .jpeg using File.write. Every time I attempt that, it gives me this error in my Rails console:
Encoding::UndefinedConversionError: "\xFF" from ASCII-8BIT to UTF-8
from /app/app/services/customer_order_status_notifier/card.rb:181:in `write'
And here is the code that triggers that:
def trello_pics
  @trello_pics ||=
    card.attachments.last(config_pics_number)&.map(&:url).map do |url|
      binary = Faraday.get(url, nil, {'Authorization' => "OAuth oauth_consumer_key=\"#{ENV['TRELLO_PUBLIC_KEY']}\", oauth_token=\"#{ENV['TRELLO_TOKEN']}\""}).body
      File.write(FILE_LOCATION, binary) # doesn't work
      run_me
    end
end
So I figure this must be an issue with the way that File.write converts the input into a file. Is there a way to specify encoding?
AFAIK you can't do it at the time of performing the write, but you can do it when creating the File object. Here is an example using UTF-8 encoding:
File.open(FILE_LOCATION, "w:UTF-8") do |f|
  f.write(....)
end
Another possibility would be to use the external_encoding option:
File.open(FILE_LOCATION, "w", external_encoding: Encoding::UTF_8)
Of course, this assumes that the data being written is text. If you have (packed) binary data, open the file with "wb" and use syswrite instead of write to write the data to the file.
UPDATE: As engineersmnky points out in a comment, the encoding arguments can also be passed as parameters to the write method itself, for instance:
IO::write(FILE_LOCATION, data_to_write, external_encoding: Encoding::UTF_8)
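For raw bytes such as the downloaded image, no encoding conversion should take place at all. A minimal sketch of the binary-mode approach follows. File.binwrite and File.binread are core Ruby methods that open the file in binary mode; the byte string and the temporary path are made up for illustration.

```ruby
require "tmpdir"

# Made-up stand-in for the response body: the first bytes of a JPEG file,
# tagged ASCII-8BIT (binary) like the body in the question.
image_bytes = "\xFF\xD8\xFF\xE0\x00\x10JFIF".b

round_trip = Dir.mktmpdir do |dir|
  path = File.join(dir, "picture.jpeg") # hypothetical location

  # binwrite opens the file in binary mode, so the bytes are written
  # as-is, with no transcoding and no newline translation.
  File.binwrite(path, image_bytes)

  # Read the file back in binary mode to confirm nothing changed.
  File.binread(path)
end

puts round_trip == image_bytes
```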

How to convert Base64 string to pdf file using prawn gem

I want to generate a PDF file from a DB record, encode it as a Base64 string, and store it in the DB. That works fine. Now I want the reverse: how can I decode the Base64 string and generate the PDF file again?
Here is what I have tried so far:
def data_pdf_base64
  begin
    # Create Prawn Object
    my_pdf = Prawn::Document.new
    # write text to pdf
    my_pdf.text("Hello Gagan, How are you?")
    # Save at tmp folder as pdf file
    my_pdf.render_file("#{Rails.root}/tmp/pdf/gagan.pdf")
    # Read pdf file and encode to Base64
    encoded_string = Base64.encode64(File.open("#{Rails.root}/tmp/pdf/gagan.pdf"){|i| i.read})
    # Delete generated pdf file from tmp folder
    File.delete("#{Rails.root}/tmp/pdf/gagan.pdf") if File.exist?("#{Rails.root}/tmp/pdf/gagan.pdf")
    # Now converting Base64 to pdf again
    pdf = Prawn::Document.new
    # I have used ttf font because it was giving me below error
    # Your document includes text that's not compatible with the Windows-1252 character set. If you need full UTF-8 support, use TTF fonts instead of PDF's built-in fonts.
    pdf.font Rails.root.join("app/assets/fonts/fontawesome-webfont.ttf")
    pdf.text Base64.decode64 encoded_string
    pdf.render_file("#{Rails.root}/tmp/pdf/gagan2.pdf")
  rescue => e
    return render :text => "Error: #{e}"
  end
end
Now I am getting the error below:
Encoding ASCII-8BIT can not be transparently converted to UTF-8.
Please ensure the encoding of the string you are attempting to use is
set correctly
I have tried the approach from "How to convert base64 string to PNG using Prawn without saving on server in Rails", but it gives me the error:
"\xFF" from ASCII-8BIT to UTF-8
Can anyone point out what I am missing?
The answer is to decode the Base64 encoded string and either send it directly or save it directly to disk (naming it as a PDF file, but without using prawn).
The decoded string is a binary representation of the PDF file data, so there's no need to use Prawn or to re-calculate the content of the PDF data.
i.e.
raw_pdf_str = Base64.decode64 encoded_string
render :text, raw_pdf_str # <= this isn't the correct rendering pattern, but it's good enough as an example.
EDIT
To clarify some of the information given in the comments:
It's possible to send the string as an attachment without saving it to disk, either using render text: raw_pdf_str or the #send_data method (these are 4.x API versions, I don't remember the 5.x API style).
It's possible to encode the string (from the Prawn object) without saving the rendered PDF data to a file (save it to a String object instead). i.e.:
encoded_string = Base64.encode64(my_pdf.render)
The String data could be used directly as an email attachment, passing the String itself instead of reading any data from a file, i.e.:
# inside a method in the Mailer class
attachments['my_pdf.pdf'] = { :mime_type => 'application/pdf',
                              :content => raw_pdf_str }
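The round trip described in this answer can be sketched without Prawn or Rails. This is a minimal illustration, not the answerer's code: the sample bytes merely imitate the start of a PDF file and stand in for my_pdf.render, and the output file name is hypothetical.

```ruby
require "base64"
require "tmpdir"

# Stand-in for my_pdf.render: a few bytes that look like the start of a PDF.
# (Made-up sample data, not a valid PDF document.)
pdf_bytes = "%PDF-1.4\n\xE2\xE3\xCF\xD3\n".b

# Encode for storage in a text column.
encoded_string = Base64.encode64(pdf_bytes)

# Decode later: the result is a binary string holding the original bytes.
raw_pdf_str = Base64.decode64(encoded_string)

# Write the bytes straight to disk in binary mode; Prawn is not involved.
bytes_on_disk = Dir.mktmpdir do |dir|
  path = File.join(dir, "restored.pdf") # hypothetical file name
  File.binwrite(path, raw_pdf_str)
  File.binread(path)
end

puts bytes_on_disk == pdf_bytes
```

The decoded string is tagged ASCII-8BIT, which is why passing it to pdf.text, which expects UTF-8 text, produces the conversion error from the question.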

Writing valid Excel file

I have an endpoint that receives emails with .xlsx files as attachments. I want to save these files in my app so I can access the data later.
After receiving the mail and its attachment, which has a mime_type of application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, I call
path = "data/emails/#{attachment.filename}"
File.write(path, attachment.body.decoded)
but I get this error:
Encoding::UndefinedConversionError: "\x85" from ASCII-8BIT to UTF-8
When I add .force_encoding('utf-8') to the decoded body, the write succeeds, but the file it produces is invalid. I cannot open it normally, nor access its data.
How do I write a normal Excel file?
Does this work?
File.open( path, "w+b", 0644 ) { |f| f.write attachment.body.decoded }
Taken from here:
https://cbpowell.wordpress.com/2011/01/17/saving-attachments-with-ruby-1-9-2-rails-3-and-the-mail-gem/
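To illustrate why relabelling the string is not a real fix, here is a small sketch of the difference between force_encoding and encode on binary data. The byte string is a made-up stand-in for attachment.body.decoded; it only needs to contain \x85, the byte named in the error message.

```ruby
# Made-up stand-in for attachment.body.decoded: binary bytes containing
# \x85, the byte named in the error message above.
decoded = "PK\x03\x04\x85\x00".b

# force_encoding only changes the label on the string; the bytes stay the
# same, but they do not form valid UTF-8.
relabelled = decoded.dup.force_encoding(Encoding::UTF_8)
same_bytes = relabelled.bytes == decoded.bytes
valid_utf8 = relabelled.valid_encoding?

# encode tries to convert each byte to a UTF-8 character, which is
# impossible for arbitrary binary data.
conversion_error =
  begin
    decoded.encode(Encoding::UTF_8)
    nil
  rescue Encoding::UndefinedConversionError => e
    e
  end

puts same_bytes, valid_utf8, conversion_error.class
```

Since neither call turns spreadsheet bytes into meaningful text, opening the file in binary mode, as in the answer above, sidesteps the question of encodings entirely.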

MalformedCSVError with rails CSV (FasterCSV)

I'm having serious issues trying to parse some CSV in rails right now.
Basically my app gets a user to upload a CSV file. The app then converts the file to ensure it is in UTF-8 format, then attempts to parse it and process it. Whenever the app attempts to parse it however, I get the MalformedCSVError stating "Illegal quoting on line 1"
Now what I don't get, is if I copy the original file into a new document and save it, then I can parse it on a rails console without a problem.
If I attempt to parse the original file, it complains about an invalid character for UTF-8 encoding (the file isn't in UTF-8 hence the app converts it)
If I attempt to parse the file which the app has converted to UTF-8 and changed the line endings to LF, it fails to parse.
If I do a file diff between the version the app has produced, and the copy/paste version that I have made (which works) there are 0 differences so I really can't figure out why one is parsable, and one is not.
Any suggestions? My app is processing the file as follows :
def create
  @survey = Survey.new(params[:survey])
  # Now we need to try and convert this to UTF-8 if it isn't already
  encoded = File.read(@survey.survey_data.current_path)
  encoding = CharlockHolmes::EncodingDetector.detect(encoded)
  # We've got a guess at the encoding,
  # so we can try and convert it but it
  # may still fail so we need to handle
  # that
  begin
    re_encoded = CharlockHolmes::Converter.convert(encoded, encoding[:encoding], 'UTF-8')
    re_encoded = re_encoded.gsub(/\r\n?/, "\n")
    # Now replace the uploaded file
    File.open(@survey.survey_data.current_path, 'w') { |f|
      f.write(re_encoded)
    }
  rescue ArgumentError
    puts "UH OH!!!!!"
  end
  puts "#{@survey.survey_data.current_path}"
  @parsed = CSV.read(@survey.survey_data.current_path)
end
The file uploading gem is CarrierWave if that makes any difference.
Please can someone help me as this is driving me insane!
Edit
The error says it's on line 1. Line 1 (assuming it doesn't index from 0) is
"Survey","RD","GarrysMDs","NigelsMDs","PaulsMDs","StephensMDs","BrinleyJ","CarolineP","DaveL","GrantR","GregS","Kent","NeilC","NicolaP","AndyC","DarrenS","DeanB","KarenF","PaulR","RichardF","SteveG","BrianG","GordonA","NickD","NickR","NickT","RayL","SimonH","EdmondH","JasonF","MikeS","SamanthaN","TimB","TravisF","AlanS","Q1","Q2","Q3","Q4","Q5","Q6","Q7","Q8PM","Q8N","Q9","Q10","Q11","Q12","Q13","Q14","Q15","Q16PM","Q16N","Q17PM","Q17N","Q18PM","Q18N","Q19","Q20","Q21","Q22","comment","Q23.1","Q23.2","Q23.3","TQ23.1","TQ23.2","VPM","VN","VQ1","VQ2","VQ3","VQ4","VQ5","VQ6","VQ7","VQ8N","VQ8PM","VQ9","VQ10","VQ11","VQ12","VQ13","VQ14","VQ15","VQ16","VQ16N","VQ16PM","VQ17","VQ17N","VQ17PM","VQ18","VQ18N","VQ18PM","VQ19","VQ20","VQ21","VQ22","VQ23.1","VQ23.2","VQ23.3","VRD","XQ16","XQ17","XQ18"
Well that was irritating!
Turns out the file had a BOM which was causing the CSV parser to break. Loading the file with
CSV.open("path/to/file.csv", "rb:bom|encoding")
allowed it to parse perfectly! It took annoyingly long to track down, but it's now working, with no need to convert to UTF-8 either!
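The effect of the BOM can be reproduced with a small sketch. It assumes a UTF-8 file and uses "bom|utf-8" as the concrete encoding; the file name and contents are made up for illustration.

```ruby
require "csv"
require "tmpdir"

rows, naive_text = Dir.mktmpdir do |dir|
  path = File.join(dir, "survey.csv") # hypothetical file name

  # Made-up sample: a UTF-8 BOM (EF BB BF) followed by quoted CSV fields.
  File.binwrite(path, "\xEF\xBB\xBF\"Survey\",\"RD\"\n\"a\",\"b\"\n".b)

  # Plain UTF-8 read: the BOM survives as an invisible first character,
  # so the first field no longer starts with a quote.
  naive = File.read(path, encoding: "UTF-8")

  # "bom|utf-8" tells Ruby to skip a leading BOM, if there is one.
  clean = File.open(path, "r:bom|utf-8", &:read)

  [CSV.parse(clean), naive]
end

p rows
p naive_text[0]
```

Because the BOM is invisible in most editors and diff tools, two files can look identical while only one of them parses, which matches the symptoms described in the question.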

Rails 3, check CSV file encoding before import

In my app (Rails 3.0.5, Ruby 1.8.7), I created an import tool to import CSV data from file.
Problem: I asked my users to export the CSV file from Excel in UTF-8 encoding, but most of the time they don't.
How can I verify that the file is UTF-8 before importing? Otherwise the import runs but gives strange results. I use FasterCSV to import.
Example of a bad CSV file:
;VallÈe du RhÙne;CÙte Rotie;
Thanks.
You can use Charlock Holmes, a character encoding detecting library for Ruby.
https://github.com/brianmario/charlock_holmes
To use it, you just read the file, and use the detect method.
contents = File.read('test.xml')
detection = CharlockHolmes::EncodingDetector.detect(contents)
# => {:encoding => 'UTF-8', :confidence => 100, :type => :text}
You can also convert the encoding to UTF-8 if it is not in the correct format:
utf8_encoded_content = CharlockHolmes::Converter.convert contents, detection[:encoding], 'UTF-8'
This saves users from having to do it themselves before uploading it again.
For Ruby 1.9 it's straightforward: tell it to expect UTF-8 and it will raise an error if the file isn't:
begin
  lines = CSV.read('bad.csv', :encoding => 'utf-8')
rescue ArgumentError
  puts "My users don't listen to me!"
end
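On Ruby 1.9 or later, a check that needs no extra gem is to label the bytes as UTF-8 and ask whether they are well-formed. This is a different technique from the two answers above, shown as a minimal sketch; utf8? is a hypothetical helper name and the sample strings are made up.

```ruby
# Hypothetical helper: true when the bytes are well-formed UTF-8.
# dup keeps the caller's string and its encoding label untouched.
def utf8?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# The same text exported two ways, reduced to raw bytes as if read from disk.
good = "Vallée du Rhône".encode(Encoding::UTF_8).b
bad  = "Vallée du Rhône".encode(Encoding::ISO_8859_1).b

puts utf8?(good)
puts utf8?(bad)
```

A passing check only means the bytes are valid UTF-8. Pure ASCII files always pass, so this catches wrongly exported files only when they contain accented or other non-ASCII characters.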
