Ruby: Is there a way to specify your encoding in File.write? - ruby-on-rails

TL;DR
How would I specify the mode of encoding on File.write, or how would one save image binary to a file in a similar fashion?
More Details
I'm trying to download an image from a Trello card and then upload that image to S3 so it has an accessible URL. I have been able to download the image from Trello as binary (I believe it is some form of binary), but I have been having issues saving this as a .jpeg using File.write. Every time I attempt that, it gives me this error in my Rails console:
Encoding::UndefinedConversionError: "\xFF" from ASCII-8BIT to UTF-8
from /app/app/services/customer_order_status_notifier/card.rb:181:in `write'
And here is the code that triggers that:
def trello_pics
#trello_pics ||=
card.attachments.last(config_pics_number)&.map(&:url).map do |url|
binary = Faraday.get(url, nil, {'Authorization' => "OAuth oauth_consumer_key=\"#{ENV['TRELLO_PUBLIC_KEY']}\", oauth_token=\"#{ENV['TRELLO_TOKEN']}\""}).body
File.write(FILE_LOCATION, binary) # doesn't work
run_me
end
end
So I figure this must be an issue with the way that File.write converts the input into a file. Is there a way to specify encoding?

AFIK you can't do it at the time of performing the write, but you can do it at the time of creating the File object; here an example of UTF8 encoding:
File.open(FILE_LOCATION, "w:UTF-8") do
|f|
f.write(....)
end
Another possibility would be to use the external_encoding option:
File.open(FILE_LOCATION, "w", external_encoding: Encoding::UTF_8)
Of course this assumes that the data which is written, is a String. If you have (packed) binary data, you would use "wb" for openeing the file, and syswrite instead of write to write the data to the file.
UPDATE As engineersmnky points out in a comment, the arguments for the encoding can also be passed as parameter to the write method itself, for instance
IO::write(FILE_LOCATION, data_to_write, external_encoding: Encoding::UTF_8)

Related

How to write a Tempfile as binary

When trying to write a string / unzipped file to a Tempfile by doing:
temp_file = Tempfile.new([name, extension])
temp_file.write(unzipped_io.read)
Which throws the following error when I do this with an image:
Encoding::UndefinedConversionError - "\xFF" from ASCII-8BIT to UTF-8
When researching it I found out that this is caused because Ruby tries to write files with an encoding by default (UTF-8). But the file should be written as binary, so it ignores any file specific behavior.
Writing regular File you would be able to do this as following:
File.open('/tmp/test.jpg', 'rb') do |file|
file.write(unzipped_io.read)
end
How to do this in Tempfile
Tempfile.new passes options to File.open which accepts the options from IO.new, in particular:
:binmode
If the value is truth value, same as “b” in argument mode.
So to open a tempfile in binary mode, you'd use:
temp_file = Tempfile.new([name, extension], binmode: true)
temp_file.binmode? #=> true
temp_file.external_encoding #=> #<Encoding:ASCII-8BIT>
In addition, you might want to use Tempfile.create which takes a block and automatically closes and removes the file afterwards:
Tempfile.create([name, extension], binmode: true) do |temp_file|
temp_file.write(unzipped_io.read)
# ...
end
I have encountered the solution in an old Ruby forum post, so I thought I would share it here, making it easier for people to find:
https://www.ruby-forum.com/t/ruby-binary-temp-file/116791
Apparently Tempfile has an undocumented method binmode, which changes the writing mode to binary and thus ignoring any encoding issues:
temp_file = Tempfile.new([name, extension])
temp_file.binmode
temp_file.write(unzipped_io.read)
Thanks unknown person who mentioned it on ruby-forums.com in 2007!
Another alternative is IO.binwrite(path, file_content)

How to convert Base64 string to pdf file using prawn gem

I want to generate pdf file from DB record. Encode it to Base64 string and store it to DB. Which works fine. Now I want reverse action, How can I decode Base64 string and generate pdf file again?
here is what I tried so far.
def data_pdf_base64
begin
# Create Prawn Object
my_pdf = Prawn::Document.new
# write text to pdf
my_pdf.text("Hello Gagan, How are you?")
# Save at tmp folder as pdf file
my_pdf.render_file("#{Rails.root}/tmp/pdf/gagan.pdf")
# Read pdf file and encode to Base64
encoded_string = Base64.encode64(File.open("#{Rails.root}/tmp/pdf/gagan.pdf"){|i| i.read})
# Delete generated pdf file from tmp folder
File.delete("#{Rails.root}/tmp/pdf/gagan.pdf") if File.exist?("#{Rails.root}/tmp/pdf/gagan.pdf")
# Now converting Base64 to pdf again
pdf = Prawn::Document.new
# I have used ttf font because it was giving me below error
# Your document includes text that's not compatible with the Windows-1252 character set. If you need full UTF-8 support, use TTF fonts instead of PDF's built-in fonts.
pdf.font Rails.root.join("app/assets/fonts/fontawesome-webfont.ttf")
pdf.text Base64.decode64 encoded_string
pdf.render_file("#{Rails.root}/tmp/pdf/gagan2.pdf")
rescue => e
return render :text => "Error: #{e}"
end
end
Now I am getting below error:
Encoding ASCII-8BIT can not be transparently converted to UTF-8.
Please ensure the encoding of the string you are attempting to use is
set correctly
I have tried How to convert base64 string to PNG using Prawn without saving on server in Rails but it gives me error:
"\xFF" from ASCII-8BIT to UTF-8
Can anyone point me what I am missing?
The answer is to decode the Base64 encoded string and either send it directly or save it directly to disk (naming it as a PDF file, but without using prawn).
The decoded string is a binary representation of the PDF file data, so there's no need to use Prawn or to re-calculate the content of the PDF data.
i.e.
raw_pdf_str = Base64.decode64 encoded_string
render :text, raw_pdf_str # <= this isn't the correct rendering pattern, but it's good enough as an example.
EDIT
To clarify some of the information given in the comments:
It's possible to send the string as an attachment without saving it to disk, either using render text: raw_pdf_str or the #send_data method (these are 4.x API versions, I don't remember the 5.x API style).
It's possible to encode the string (from the Prawn object) without saving the rendered PDF data to a file (save it to a String object instead). i.e.:
encoded_string = Base64.encode64(my_pdf.render)
The String data could be used directly as an email attachment, similarly to the pattern provided here only using the String directly instead of reading any data from a file. i.e.:
# inside a method in the Mailer class
attachments['my_pdf.pdf'] = { :mime_type => 'application/pdf',
:content => raw_pdf_str }

Extract text from document in memory using docsplit

With the docsplit gem I can extract the text from a PDF or any other file type. For example, with the line:
Docsplit.extract_pages('doc.pdf')
I can have the text content of a PDF file.
I'm currently using Rails, and the PDF is sent through a request and lives in memory. Looking in the API and in the source code I couldn't find a way to extract the text from memory, only from a file.
Is there a way to get the text of this PDF avoiding the creation of a temporary file?
I'm using attachment_fu if it matters.
Use a temporary directory:
require 'docsplit'
def pdf_to_text(pdf_filename)
Docsplit.extract_text([pdf_filename], ocr: false, output: Dir.tmpdir)
txt_file = File.basename(pdf_filename, File.extname(pdf_filename)) + '.txt'
txt_filename = Dir.tmpdir + '/' + txt_file
extracted_text = File.read(txt_filename)
File.delete(txt_filename)
extracted_text
end
pdf_to_text('doc.pdf')
If you have the content in a string, use StringIO to create a File-like object that IO can read. In StringIO, it doesn't matter if the content is true text, or binary, it's all the same.
Look at either of:
new(string=""[, mode])
Creates new StringIO instance from with string and mode.
open(string=""[, mode]) {|strio| ...}
Equivalent to ::new except that when it is called with a block, it yields with the new instance and closes it, and returns the result which returned from the block.

Can't parse files uploaded in UTF-16 on Heroku

I'm writing a Rails app that allows the user to upload TSV (Tab-separated values) files to be parsed on the server. Those files are encoded in UTF-16. All goes fine in local, but when I try to open the file with such encoding on Heroku, I'm getting a warning that says warning: Unsupported encoding utf-16 ignored. When later I try to read such file, it obviously fails stating invalid byte sequence in UTF-8. See an excerpt of the code below:
File.open(params[:batch_import][:file].path, 'r:utf-16') do |f|
#recipients = Recipient.from_tsv(f.read)
end
Is there any walkaround that I can do?
UTF-16 files must be opened with the binary mode. Try this:
File.open(params[:batch_import][:file].path, 'rb:utf-16') do |f|
#recipients = Recipient.from_tsv(f.read)
end

MalformedCSVError with rails CSV (FasterCSV)

I'm having serious issues trying to parse some CSV in rails right now.
Basically my app gets a user to upload a CSV file. The app then converts the file to ensure it is in UTF-8 format, then attempts to parse it and process it. Whenever the app attempts to parse it however, I get the MalformedCSVError stating "Illegal quoting on line 1"
Now what I don't get, is if I copy the original file into a new document and save it, then I can parse it on a rails console without a problem.
If I attempt to parse the original file, it complains about an invalid character for UTF-8 encoding (the file isn't in UTF-8 hence the app converts it)
If I attempt to parse the file which the app has converted to UTF-8 and changed the line endings to LF, it fails to parse.
If I do a file diff between the version the app has produced, and the copy/paste version that I have made (which works) there are 0 differences so I really can't figure out why one is parsable, and one is not.
Any suggestions? My app is processing the file as follows :
def create
#survey = Survey.new(params[:survey])
# Now we need to try and convert this to UTF-8 if it isn't already
encoded = File.read(#survey.survey_data.current_path)
encoding = CharlockHolmes::EncodingDetector.detect(encoded)
# We've got a guess at the encoding,
# so we can try and convert it but it
# may still fail so we need to handle
# that
begin
re_encoded = CharlockHolmes::Converter.convert(encoded, encoding[:encoding], 'UTF-8')
re_encoded = re_encoded.gsub(/\r\n?/, "\n")
# Now replace the uploaded file
File.open(#survey.survey_data.current_path, 'w') { |f|
f.write(re_encoded)
}
rescue ArgumentError
puts "UH OH!!!!!"
end
puts "#{#survey.survey_data.current_path}"
#parsed = CSV.read(#survey.survey_data.current_path)
end
The file uploading gem is CarrierWave if that makes any difference.
Please can someone help me as this is driving me insane!
Edit
The error says it's on line 1. Line 1 (assuming it doesn't index from 0) is
"Survey","RD","GarrysMDs","NigelsMDs","PaulsMDs","StephensMDs","BrinleyJ","CarolineP","DaveL","GrantR","GregS","Kent","NeilC","NicolaP","AndyC","DarrenS","DeanB","KarenF","PaulR","RichardF","SteveG","BrianG","GordonA","NickD","NickR","NickT","RayL","SimonH","EdmondH","JasonF","MikeS","SamanthaN","TimB","TravisF","AlanS","Q1","Q2","Q3","Q4","Q5","Q6","Q7","Q8PM","Q8N","Q9","Q10","Q11","Q12","Q13","Q14","Q15","Q16PM","Q16N","Q17PM","Q17N","Q18PM","Q18N","Q19","Q20","Q21","Q22","comment","Q23.1","Q23.2","Q23.3","TQ23.1","TQ23.2","VPM","VN","VQ1","VQ2","VQ3","VQ4","VQ5","VQ6","VQ7","VQ8N","VQ8PM","VQ9","VQ10","VQ11","VQ12","VQ13","VQ14","VQ15","VQ16","VQ16N","VQ16PM","VQ17","VQ17N","VQ17PM","VQ18","VQ18N","VQ18PM","VQ19","VQ20","VQ21","VQ22","VQ23.1","VQ23.2","VQ23.3","VRD","XQ16","XQ17","XQ18"
Well that was irritating!
Turns out the file had a BOM which was causing the CSV parser to break. Loading the file with
CSV.open("path/to/file.csv", "rb:bom|encoding")
allowed it to parse it perfectly! So annoyed how long it took to track down but it's now working and with no need to convert to UTF-8 now either!

Resources