How to convert Base64 string to pdf file using prawn gem - ruby-on-rails

I want to generate a PDF file from a DB record, encode it as a Base64 string, and store it in the DB. That part works fine. Now I want the reverse: how can I decode the Base64 string and generate the PDF file again?
Here is what I tried so far:
def data_pdf_base64
  begin
    # Create Prawn object
    my_pdf = Prawn::Document.new
    # Write text to the PDF
    my_pdf.text("Hello Gagan, How are you?")
    # Save in the tmp folder as a PDF file
    my_pdf.render_file("#{Rails.root}/tmp/pdf/gagan.pdf")
    # Read the PDF file and encode it to Base64
    encoded_string = Base64.encode64(File.open("#{Rails.root}/tmp/pdf/gagan.pdf"){|i| i.read})
    # Delete the generated PDF file from the tmp folder
    File.delete("#{Rails.root}/tmp/pdf/gagan.pdf") if File.exist?("#{Rails.root}/tmp/pdf/gagan.pdf")
    # Now converting Base64 to PDF again
    pdf = Prawn::Document.new
    # I have used a TTF font because it was giving me the error below:
    # Your document includes text that's not compatible with the Windows-1252 character set. If you need full UTF-8 support, use TTF fonts instead of PDF's built-in fonts.
    pdf.font Rails.root.join("app/assets/fonts/fontawesome-webfont.ttf")
    pdf.text Base64.decode64 encoded_string
    pdf.render_file("#{Rails.root}/tmp/pdf/gagan2.pdf")
  rescue => e
    return render :text => "Error: #{e}"
  end
end
Now I am getting the error below:
Encoding ASCII-8BIT can not be transparently converted to UTF-8.
Please ensure the encoding of the string you are attempting to use is
set correctly
I have also tried the approach from "How to convert base64 string to PNG using Prawn without saving on server in Rails", but it gives me this error:
"\xFF" from ASCII-8BIT to UTF-8
Can anyone point out what I am missing?

The answer is to decode the Base64 encoded string and either send it directly or save it directly to disk (naming it as a PDF file, but without using prawn).
The decoded string is a binary representation of the PDF file data, so there's no need to use Prawn or to re-calculate the content of the PDF data.
i.e.
raw_pdf_str = Base64.decode64 encoded_string
render :text => raw_pdf_str # <= this isn't the correct rendering pattern, but it's good enough as an example.
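To make the decode-and-save path concrete, here is a minimal sketch using only Ruby's standard library. The `pdf_bytes` stand-in and the output path are assumptions for illustration; in the question, the bytes would come from the rendered Prawn document.

```ruby
require "base64"
require "tmpdir"

# Stand-in for real PDF data (assumption): any binary string behaves the same.
pdf_bytes = "%PDF-1.4\n\xFF\xD8 binary payload\n%%EOF\n".b
encoded_string = Base64.encode64(pdf_bytes)

# Decoding returns the original bytes as an ASCII-8BIT (binary) string.
raw_pdf_str = Base64.decode64(encoded_string)

# Write the bytes unchanged. File.binwrite opens the file in binary mode,
# so no text encoding conversion is attempted.
output_path = File.join(Dir.tmpdir, "decoded_example.pdf") # hypothetical path
File.binwrite(output_path, raw_pdf_str)
```

The decoded string is never passed through Prawn here; it already is the PDF. In a Rails controller the same string could presumably be handed to send_data with type "application/pdf" instead of being written to disk.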
EDIT
To clarify some of the information given in the comments:
It's possible to send the string as an attachment without saving it to disk, either using render text: raw_pdf_str or the #send_data method (these are 4.x API versions, I don't remember the 5.x API style).
It's possible to encode the string (from the Prawn object) without saving the rendered PDF data to a file (save it to a String object instead). i.e.:
encoded_string = Base64.encode64(my_pdf.render)
The String data could be used directly as an email attachment, similarly to the pattern provided here only using the String directly instead of reading any data from a file. i.e.:
# inside a method in the Mailer class
attachments['my_pdf.pdf'] = { :mime_type => 'application/pdf',
:content => raw_pdf_str }

Related

Ruby: Is there a way to specify your encoding in File.write?

TL;DR
How would I specify the mode of encoding on File.write, or how would one save image binary to a file in a similar fashion?
More Details
I'm trying to download an image from a Trello card and then upload that image to S3 so it has an accessible URL. I have been able to download the image from Trello as binary (I believe it is some form of binary), but I have been having issues saving this as a .jpeg using File.write. Every time I attempt that, it gives me this error in my Rails console:
Encoding::UndefinedConversionError: "\xFF" from ASCII-8BIT to UTF-8
from /app/app/services/customer_order_status_notifier/card.rb:181:in `write'
And here is the code that triggers that:
def trello_pics
  @trello_pics ||=
    card.attachments.last(config_pics_number)&.map(&:url).map do |url|
      binary = Faraday.get(url, nil, {'Authorization' => "OAuth oauth_consumer_key=\"#{ENV['TRELLO_PUBLIC_KEY']}\", oauth_token=\"#{ENV['TRELLO_TOKEN']}\""}).body
      File.write(FILE_LOCATION, binary) # doesn't work
      run_me
    end
end
So I figure this must be an issue with the way that File.write converts the input into a file. Is there a way to specify encoding?
AFAIK you can't do it at the time of performing the write, but you can do it when creating the File object. Here is an example using UTF-8 encoding:
File.open(FILE_LOCATION, "w:UTF-8") do |f|
  f.write(....)
end
Another possibility would be to use the external_encoding option:
File.open(FILE_LOCATION, "w", external_encoding: Encoding::UTF_8)
Of course, this assumes that the data being written is a String. If you have (packed) binary data, you would use "wb" for opening the file, and syswrite instead of write to write the data to the file.
UPDATE As engineersmnky points out in a comment, the arguments for the encoding can also be passed as parameter to the write method itself, for instance
IO::write(FILE_LOCATION, data_to_write, external_encoding: Encoding::UTF_8)
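For binary data such as a JPEG body, one option that sidesteps encoding conversion altogether is to write in binary mode. This is a minimal sketch, assuming the downloaded bytes are already in a String; the sample bytes and the path are made up for illustration and are not from the question.

```ruby
require "tmpdir"

# Hypothetical stand-in for the downloaded image body (starts with JPEG magic bytes).
binary = "\xFF\xD8\xFF\xE0 not a real image".b
file_location = File.join(Dir.tmpdir, "example_download.jpeg") # assumed path

# File.binwrite opens the file in binary mode, so the bytes are written as-is.
File.binwrite(file_location, binary)

# Equivalent block form with an explicit "wb" mode.
File.open(file_location, "wb") { |f| f.write(binary) }
```

Both forms leave the same bytes on disk; the block form is handy when more than one write is needed.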

Convert pdf file to base64 string

I have the Paperclip gem working in my app for documents (pdf, doc). I need to pass a document to a third-party application via a POST request.
I tried to convert the Paperclip attachment with Base64, but it throws an error:
no implicit conversion of Tempfile into String
Here is how I did it:
# get url from the paperclip file
url = document.doc.url # https://s3-ap-southeast-1.amazonaws.com/xx-eng/documents/xx/000/000/xx/original/doc.pdf
file_data = open(url)
# Encode the bytes to base64 - this line throws the error
base_64_file = Base64.encode64(file_data)
Do you have any suggestions on how to avoid the Tempfile error?
You need to read the file first.
base_64_file = Base64.encode64(file_data.read)
Here is working example:
$ bundle exec rails c
=> file = open("tmp/file.pdf")
#> #<File:tmp/receipts.pdf>
=> base_64 = Base64.encode64(file)
#> TypeError: no implicit conversion of File into String
=> base_64 = Base64.encode64(file.read)
#> "JVBERi0xLjQKMSAwIG9iago8PAovVGl0b/BBQEPgQ ......J0ZgozMDM0OQolJUVPRgo=\n"
The answer from @3елёный didn't work for me, maybe because it's an S3 file.
However I managed to find a way with Paperclip method:
file_data = Paperclip.io_adapters.for(url).read
base_64_file = Base64.encode64(file_data)
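The point common to both answers is that Base64.encode64 needs a String, not an IO object. A small standalone sketch of that, using a Tempfile as a stand-in for the downloaded document (the file contents here are made up):

```ruby
require "base64"
require "tempfile"

# Hypothetical stand-in for the object returned by open(url).
file_data = Tempfile.new(["doc", ".pdf"])
file_data.binmode
file_data.write("%PDF-1.4\nexample bytes\n")
file_data.rewind

# Read the IO object into a String first, then encode the String.
bytes = file_data.read
base_64_file = Base64.encode64(bytes)

# strict_encode64 produces the same data without the line breaks that
# encode64 inserts, which may suit a JSON or form payload better.
single_line = Base64.strict_encode64(bytes)

file_data.close! # close and delete the temporary file
```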

combine_pdf not combining the pdfs

I think I am missing something simple. Using combine_pdf: I am attempting to combine two pdf files into one pdf, and then send that resulting pdf with send_data in my rails app.
Here is my code in a controller:
pdf = CombinePDF.new
# returns an array, each element is a string of an absolute path
# to the file I want to upload
absolute_upload_paths = @obj.attachments.collect {|obj| obj.my_attachment.path}
absolute_upload_paths.each {|upload_path| pdf << CombinePDF.load(upload_path)}
send_data pdf, filename: "my_combined_pdf", type: "application/pdf"
What results is that a damaged pdf file gets sent which cannot be opened:
Adobe Acrobat Reader could not open 'VR_Voc_Eval-51.pdf' because it is either not a supported file type or because the file has been damaged (for example, it was sent as an email attachment and wasn't correctly decoded).
What am I missing? How can I use this gem to combine two existing pdf files into one pdf and then send it to the user?
It looks like the README for that library calls .to_pdf when sending the data. Hopefully calling #to_pdf on the pdf object like in the example will fix your issue.
send_data pdf.to_pdf, filename: "my_combined_pdf", type: "application/pdf"
https://github.com/boazsegev/combine_pdf#rendering-pdf-data

How to convert ascii-8bit file data to readable string

I'm trying to parse the contents of a CSV file (saved on Windows Excel then uploaded to dropbox) from my dropbox via the Dropbox Core api.
I created a rake task (part of a Rails app) with the following code, and it creates a magnum-opus.csv file on my local hard drive that has the original text from the Excel file. Calling contents.encoding reports ASCII-8BIT.
contents, metadata = client.get_file_and_metadata('/magnum-opus.csv')
open('magnum-opus.csv', 'w') {|f| f.puts contents }
Instead of creating a local file, I'd like to convert the binary data in "contents" to readable text on the fly and parse through it. I don't want to save it anywhere and then have to open it.
How do I go about doing that?
If I do
p contents
I end up with some unreadable data format ... \x00e\x00d\x00u
1) How do I convert this into a string I can parse through with Ruby?
2) The other thing I'm wondering: if I do
puts contents
the original, human-readable text of the CSV file is output to STDOUT. What is puts doing?
I tried calling CSV.parse on contents.encode( "UTF-8", "binary", :invalid => :replace, :undef => :replace, :replace => '') but ended up getting an error such as
CSV::MalformedCSVError: Unquoted fields do not allow \r or \n
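The interleaved \x00 bytes in the p output suggest the file may be UTF-16 rather than arbitrary binary. The sketch below is only an illustration under that assumption (UTF-16LE with a byte order mark); the sample data is invented, and the real file's encoding, delimiter, and line endings would need to be checked.

```ruby
require "csv"

# Hypothetical sample standing in for `contents`: UTF-16LE text with a BOM,
# tagged as binary the way a raw download would be.
contents = "\uFEFFname,school\r\nGagan,edu\r\n"
             .encode("UTF-16LE")
             .force_encoding(Encoding::BINARY)

# Relabel the bytes as UTF-16LE (assumption), then transcode to UTF-8.
text = contents.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
text = text.delete_prefix("\uFEFF") # drop the byte order mark, if present

rows = CSV.parse(text)
```

Relabeling with force_encoding changes no bytes; only the encode call converts them. Encoding from "binary" with :replace => '' instead throws away every byte above 0x7F and keeps the NUL bytes, which would leave text that CSV.parse may reject.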

Extract text from document in memory using docsplit

With the docsplit gem I can extract the text from a PDF or any other file type. For example, with the line:
Docsplit.extract_pages('doc.pdf')
I can have the text content of a PDF file.
I'm currently using Rails, and the PDF is sent through a request and lives in memory. Looking in the API and in the source code I couldn't find a way to extract the text from memory, only from a file.
Is there a way to get the text of this PDF avoiding the creation of a temporary file?
I'm using attachment_fu if it matters.
Use a temporary directory:
require 'docsplit'
def pdf_to_text(pdf_filename)
  Docsplit.extract_text([pdf_filename], ocr: false, output: Dir.tmpdir)
  txt_file = File.basename(pdf_filename, File.extname(pdf_filename)) + '.txt'
  txt_filename = Dir.tmpdir + '/' + txt_file
  extracted_text = File.read(txt_filename)
  File.delete(txt_filename)
  extracted_text
end
pdf_to_text('doc.pdf')
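If the PDF only exists in memory, a short-lived temp file can bridge it to a path-based helper like the one above. This is a sketch using only the standard library; with_temp_pdf is a hypothetical helper name, and the bytes are a stand-in for the uploaded data.

```ruby
require "tempfile"

# Hypothetical helper: writes in-memory bytes to a temporary file, yields its
# path, and removes the file once the block returns.
def with_temp_pdf(pdf_bytes)
  Tempfile.create(["upload", ".pdf"]) do |file|
    file.binmode
    file.write(pdf_bytes)
    file.flush # make sure the bytes are on disk before handing out the path
    yield file.path
  end
end

# Usage with stand-in bytes; a real caller would pass the path to
# pdf_to_text inside the block instead of measuring the file.
size_on_disk = with_temp_pdf("%PDF-1.4 stand-in".b) { |path| File.size(path) }
```

This does not avoid the temporary file, but it keeps creation and cleanup in one place.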
If you have the content in a string, use StringIO to create a File-like object that IO can read. In StringIO, it doesn't matter if the content is true text, or binary, it's all the same.
Look at either of:
new(string=""[, mode])
Creates a new StringIO instance from the given string and mode.
open(string=""[, mode]) {|strio| ...}
Equivalent to ::new except that when it is called with a block, it yields with the new instance and closes it, and returns the result which returned from the block.
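A short sketch of both forms with binary content follows. The bytes are a made-up stand-in, and whether a given library accepts an IO-like object in place of a file path depends on that library; this only shows the StringIO side.

```ruby
require "stringio"

# Hypothetical in-memory content; StringIO treats text and binary alike.
pdf_bytes = "%PDF-1.4\n\xFF\xFE stand-in\n".b

# ::new wraps the string in an object with the usual IO reading methods.
io = StringIO.new(pdf_bytes)
header = io.read(8) # first 8 bytes
io.rewind
everything = io.read

# ::open with a block yields the instance, closes it afterwards,
# and returns the block's result.
line_count = StringIO.open(pdf_bytes) { |strio| strio.each_line.count }
```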
