Validate PDF is stampable - Rails, Prawn, CombinePDF - ruby-on-rails

I work at a company where we upload a good number of PDFs, which we later stamp using Prawn. Occasionally these PDFs upload and save fine, but stamping them later fails, and our managers have to re-convert the file and re-enter a bunch of data.
As such, we're looking for ways to validate the PDFs before they're attached to ensure they will be stampable later, or to convert them to a PDF format we know will work with Prawn.
I have two questions:
1. Is there anything wrong with our stamping code (posted below)?
2. Is there any way to do that sort of validation? For example:
   - converting to a Prawn doc before uploading
   - converting to a Prawn doc and attempting some trivial operation before uploading
   - any other solutions
begin
  paid_stamp_pdf_file = Tempfile.new('paid')
  Prawn::Document.generate(paid_stamp_pdf_file.path) do |pdf|
    if self.is_paid_by_trust? && self.submitted_to_trust_date.present?
      text = "Submitted to Trust - " + self.submitted_to_trust_date.strftime('%m/%d/%Y') + "\nPAID #{Date.parse(paid_on_date).strftime('%m/%d/%Y')}" + " - $#{'%.2f' % amount}" + payment_method_text
    else
      text = "PAID #{Date.parse(paid_on_date).strftime('%m/%d/%Y')}" + " - $#{'%.2f' % amount}" + payment_method_text
    end
    pdf.transparent(0.6) do
      pdf.fill_color "ff0000"
      pdf.text text, :size => 30, style: :bold, align: :center, valign: :center
    end
  end
  # Stamp "PAID" to every page of the file
  paid_stamp = CombinePDF.load(paid_stamp_pdf_file.path).pages[0]
  URI.open(self.account_statement_file.blob.url) do |tmp_pdf_file|
    pdf = CombinePDF.load tmp_pdf_file.path
    pdf.pages.each {|page| page << paid_stamp}
    ActiveRecord::Base.transaction do
      if pdf.save tmp_pdf_file.path
        file_name = self.account_statement_file.filename
        self.account_statement_file.purge
        self.account_statement_file.attach(io: File.open(tmp_pdf_file.path), filename: file_name, content_type: 'application/pdf')
        self.update(is_paid: true, paid_date: paid_on_date, marked_paid_by_user_id: user.id)
        return true
      else
        return false
      end
    end
  end
rescue Exception => e
  Rails.logger.error("Failed to mark statement ID #{self.id}: #{e.message}")
  return false
end
Any help is greatly appreciated!
ruby 2.7.2
rails 6.1.1
prawn 2.4.0
combine_pdf 1.0.21
Edit:
I was able to replicate the error when trying to load from the file URL.
Occurs at line
Same error occurs when trying to parse downloaded file

For anyone else who sees this: it was caused by CombinePDF parsing only up to the length declared in the file's metadata. Some files lie about that length, so parsing fails with a RangeError: index out of range. Adding the workaround from the issue below, then using the relaxed option it adds, fixed the problem for me. Hopefully it gets merged into the gem itself soon.
https://github.com/boazsegev/combine_pdf/issues/191
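One way to act on this before attaching an upload is to try parsing the file up front and reject it if the parser raises. The following is a minimal sketch, not the asker's code: `stampable_pdf?` and its `parser:` argument are hypothetical names, and the default parser assumes the combine_pdf gem is installed. It only shows that the file parses and has at least one page, which is a proxy for "can be stamped later" rather than a guarantee.

```ruby
# Hypothetical helper: true when the PDF at `path` parses and has at least one page.
# `parser` is injectable so the check can be exercised without real PDFs;
# by default it loads the file with CombinePDF (assumes the combine_pdf gem is installed).
def stampable_pdf?(path, parser: nil)
  parser ||= lambda do |file_path|
    require 'combine_pdf'
    CombinePDF.load(file_path)
  end

  !parser.call(path).pages.empty?
rescue StandardError => e
  # The RangeError (index out of range) described above is a StandardError.
  warn "PDF rejected (#{e.class}): #{e.message}"
  false
end
```

In a Rails app this could be called on the uploaded file's tempfile path before `attach`; where exactly to hook it in depends on the application.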

Related

Reading text from a PDF works in Rails console but not in Rails application

I have a simple one-page searchable PDF that is uploaded to a Rails 6 application model (Car) using Active Storage. I can extract the text from the PDF using the 'tempfile' and 'pdf-reader' gems in the Rails console:
> @car.creport.attached?
=> true
> f = Tempfile.new(['file', '.pdf'])
> f.binmode
> f.write(@car.creport.blob.download)
> r = PDF::Reader.new(f.path.to_s)
> r.pages[1].text
=> "Welcome to the ABC Car Report for January 16, 20...
But, if I try the same thing in the create method of my cars_controller.rb, it doesn't work:
# cars_controller.rb
...
def create
  @car = Car.new(car_params)
  @car.filetext = ""
  f = Tempfile.new(['file', '.pdf'])
  f.binmode
  f.write(@car.creport.blob.download)
  r = PDF::Reader.new(f.path.to_s)
  @car.filetext = r.pages[1].text
  ...
end
When I run the Rails application I can create a new Car and select a PDF file to attach. But when I click 'Submit' I get a FileNotFoundError in cars_controller.rb at the f.write() line.
My gut instinct is that the controller is trying to read the blob in order to write it to the temp file too soon (i.e., before the blob has even been written). I tried inserting a sleep(2) to give it time, but I get the same FileNotFoundError.
Any ideas?
Thank you!
I don't get why you're jumping through so many hoops. Using .download without a block also loads the entire file into memory (yikes). If @car.creport is an ActiveStorage attachment you can just use the open method instead:
@car.creport.blob.open do |file|
  file.binmode
  r = PDF::Reader.new(file) # just pass the IO object
  @car.filetext = r.pages[1].text
end if @car.creport
This streams the file to disk instead (as a tempfile).
If you're just taking file input via a plain old file input, you will get an ActionDispatch::Http::UploadedFile in the parameters, which is also extremely easy to open:
params[:file].open do |file|
  file.binmode
  r = PDF::Reader.new(file) # just pass the IO object
  @car.filetext = r.pages[1].text
end if params[:file].respond_to?(:open)
The difference looks like it's with your @car variable.
In the console you have a blob attached (@car.creport.attached? => true). In your controller, you're initializing a new instance of the Car class, so unless you have some initialization going on that attaches something in the background, that will be nil.
Why that would return a 'file not found' error I'm not sure, but from what I can see that's the only difference between the code samples. You're trying to write @car.creport.blob.download, which is present on @car in the console but nil in your controller.

Rails: Generate QR Code in Prawn PDF

I'm trying to generate a pdf document with different QRcodes and am using the following gems
gem 'prawn'
gem 'prawn-qrcode'
Issue is, I can't seem to print out the QRcode with the following method:
require 'prawn/qrcode'
class QrcodePdf < Prawn::Document
  def initialize(deal)
    super()
    @deal = deal
    title
    @deal.venues.each do |venue|
      text "QR Code for: #{venue.name}"
      qrcode = RQRCode::QRCode.new(@deal.id.to_s + "_" + venue.id.to_s + "_" + @deal.created_at.to_s)
      render_qr_code(qrcode)
    end
  end

  def title
    text "Title of deal: #{@deal.title}", size: 16, style: :bold
  end
end
Any help will be appreciated! Thank you!
Edit: Additional Info
Sorry, I forgot to mention that the PDF is actually being generated, but the QR code section is blank.
So it just prints the "QR Code for ..." text in the loop.
Also, I've printed out the string built from @deal.id.to_s ... and it does contain the data I want, so I'm not sure what went wrong.
I've also referred to the usage section of https://github.com/jabbrwcky/prawn-qrcode.
It's probably a bug in prawn-qrcode, but render_qr_code is not working, at least for Ruby 2.2.3 (I haven't tested it with older versions).
You can still use the print_qr_code method, which works:
pdf = Prawn::Document.new(:page_size => "A4") do
  print_qr_code("some-text", extent: 96, stroke: false)
end

How to handle a file_as_string (generated by Prawn) so that it is accepted by Carrierwave?

I'm using Prawn to generate a PDF from the controller of a Rails app,
...
respond_to do |format|
  format.pdf do
    pdf = GenerateReportPdf.new(@object, view_context)
    send_data pdf.render, filename: "Report", type: "application/pdf", disposition: "inline"
  end
end
This works fine, but I now want to move GenerateReportPdf into a background task, and pass the resulting object to Carrierwave to upload directly to S3.
The worker looks like this
def perform
  pdf = GenerateReportPdf.new(@object)
  fileString = ???????
  document = Document.new(
    object_id: @object.id,
    file: fileString )
  # file is field used by Carrierwave
end
How do I handle the object returned by Prawn (the ??????? above) so that it is in a format Carrierwave can read?
fileString = pdf.render_file 'filename' writes the file to the root directory of the app. As I'm on Heroku this is not possible.
file = pdf.render returns ArgumentError: string contains null byte
fileString = StringIO.new( pdf.render_file 'filename' ) returns TypeError: no implicit conversion of nil into String
fileString = StringIO.new( pdf.render ) returns ActiveRecord::RecordInvalid: Validation failed: File You are not allowed to upload nil files, allowed types: jpg, jpeg, gif, png, pdf, doc, docx, xls, xlsx
fileString = File.open( pdf.render ) returns ArgumentError: string contains null byte
....and so on.
What am I missing? StringIO.new( pdf.render ) seems like it should work, but I'm unclear why it's generating this error.
It turns out StringIO.new( pdf.render ) should indeed work.
The problem I was having was that the filename was being set incorrectly: despite following the advice on Carrierwave's wiki (linked below), a bug elsewhere in the code meant that the filename was returned as an empty string. I'd overlooked this and assumed that something else was needed.
https://github.com/carrierwaveuploader/carrierwave/wiki/How-to:-Upload-from-a-string-in-Rails-3
My code ended up looking like this:
def perform
  pdf = GenerateReportPdf.new(@object)
  s = StringIO.new(pdf.render)
  def s.original_filename; "my file name"; end
  document = Document.new(
    object_id: @object.id
  )
  document.file = s
  document.save!
end
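The two `ArgumentError: string contains null byte` failures in the question appear to come from handing the rendered PDF bytes to something that expects a file path, whereas `StringIO` wraps the same bytes as an IO object. A small standard-library sketch of the difference, using made-up bytes in place of `pdf.render` and a hypothetical filename:

```ruby
require 'stringio'

# Stand-in for the bytes returned by pdf.render (assumption: not a real PDF,
# but like real PDFs it contains a NUL byte).
pdf_bytes = "%PDF-1.4\n\x00binary\n%%EOF".b

# File.open treats its argument as a path, so raw PDF bytes are rejected.
path_error = nil
begin
  File.open(pdf_bytes)
rescue ArgumentError => e
  path_error = e.message
end

# StringIO wraps the bytes as an IO; the singleton method supplies the
# filename that an uploader can ask for.
io = StringIO.new(pdf_bytes)
def io.original_filename; 'report.pdf'; end

puts path_error            # the path-based failure
puts io.original_filename  # the name attached to the in-memory file
```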
You want to create a tempfile (which is fine on Heroku as long as you don't expect it to persist across requests).
def perform
  # Create instance of your Carrierwave Uploader
  uploader = MyUploader.new
  # Generate your PDF
  pdf = GenerateReportPdf.new(@object)
  # Create a tempfile
  tmpfile = Tempfile.new("my_filename")
  # set to binary mode to avoid UTF-8 conversion errors
  tmpfile.binmode
  # Use render to write the file contents
  tmpfile.write pdf.render
  # Upload the tempfile with your Carrierwave uploader
  uploader.store! tmpfile
  # Close the tempfile and delete it
  tmpfile.close
  tmpfile.unlink
end
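A detail that matters with the tempfile route: writes are buffered and leave the read position at the end of the file, so anything that reads the tempfile afterwards, by IO or by path, should see it flushed and rewound first. A standard-library sketch with placeholder bytes standing in for `pdf.render` (whether a given uploader reads the IO or the path is not something this example assumes):

```ruby
require 'tempfile'

# Placeholder for pdf.render output (assumption, not a real PDF).
pdf_bytes = "%PDF-1.4\n\x00binary\n%%EOF".b

tmpfile = Tempfile.new(['report', '.pdf'])
begin
  tmpfile.binmode           # avoid encoding conversion of binary data
  tmpfile.write(pdf_bytes)
  tmpfile.flush             # push buffered bytes to disk so path-based readers see them
  tmpfile.rewind            # move the read position back to the start for IO-based readers

  read_via_io = tmpfile.read
  read_via_path = File.binread(tmpfile.path)
ensure
  tmpfile.close
  tmpfile.unlink            # remove the file explicitly instead of waiting for GC
end
```

The `ensure` block keeps the cleanup from being skipped if the write or the upload raises.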
Here's a way you can use StringIO like Andy Harvey mentioned, but without adding a method to the StringIO instance's eigenclass.
class VirtualFile < StringIO
  attr_accessor :original_filename

  def initialize(string, original_filename)
    @original_filename = original_filename
    super(string)
  end
end

def perform
  # render returns the PDF as a binary string
  pdf_string = GenerateReportPdf.new(@object).render
  file = VirtualFile.new(pdf_string, 'filename.pdf')
  document = Document.new(object_id: @object.id, file: file)
end
This one took me a couple of days. The key is to call render_file with a file path you control so you can keep track of the file, something like this:
In one of my models (e.g. Policy) I have a list of documents. This is just the method for updating the model connected to Carrierwave, e.g. PolicyDocument < ApplicationRecord with mount_uploader :pdf_file, PdfDocumentUploader:
def upload_pdf_document_file_to_s3_bucket(document_type, file_path)
  policy_document = self.policy_documents.where(policy_document_type: document_type)
                        .where(status: 'processing')
                        .where(pdf_file: nil).last
  policy_document.pdf_file = File.open(file_path, "r")
  policy_document.status = 's3_uploaded'
  policy_document.save(validate: false)
  policy_document
rescue => e
  policy_document.status = 's3_uploaded_failed'
  policy_document.save(validate: false)
  Rails.logger.error "Error uploading policy documents: #{e.inspect}"
end
In one of my Prawn PDF generators (e.g. PolicyPdfDocumentX), note how I render the file and return the file path so the worker can pick it up:
def generate_prawn_pdf_document
  Prawn::Document.new do |pdf|
    pdf.draw_text "Hello World PDF File", size: 8, at: [370, 462]
    pdf.start_new_page
    pdf.image Rails.root.join('app', 'assets', 'images', 'hello-world.png'), width: 550
  end
end

def generate_tmp_file(filename)
  file_path = File.join(Rails.root, "tmp/pdfs", filename)
  self.generate_prawn_pdf_document.render_file(file_path)
  return file_path
end
In the "global" worker for creating files and uploading them to the S3 bucket (e.g. PolicyDocumentGeneratorWorker):
def perform(filename, document_type, policy)
  # here we create the instance of the prawn pdf generator class
  pdf_generator_class = document_type.constantize.new
  # here we are creating the file, but also returning the file path
  file_path = pdf_generator_class.generate_tmp_file(filename)
  # here we are simply updating the model with the new file created
  policy.upload_pdf_document_file_to_s3_bucket(document_type, file_path)
end
Finally, to test, run rails c and:
the_policy = Policies.where....
PolicyDocumentGeneratorWorker.new.perform('report_x.pdf', 'PolicyPdfDocumentX', the_policy)
NOTE: I'm using metaprogramming in case we have multiple different file generators. constantize.new just creates a new Prawn PDF generator instance, so it is similar to PolicyPdfDocument.new. That way a single worker class can handle all of your Prawn PDF documents; for instance, if you need a new document you can simply call PolicyDocumentGeneratorWorker.new.perform('report_y.pdf', 'PolicyPdfDocumentY', the_policy)
:D
Hope this helps someone save some time.

prawn pdf group, transaction and rollback method problems

I'm trying to create a PDF report using Prawn in a Rails application. There are lots of sections that contain user-generated content that I want to try to group together. Sometimes this will go over more than one page, which results in a "cannot group" error. I then tried to use a transaction so that in the event of an error I can roll back and then output the content without using the group method.
The problem is that the rollback messes up the pages. It removes the extra page from the PDF but still has the wrong page count and outputs overlapping content when I try to redo it. I reset the y position after the rollback, as per the Prawn documentation, but I still get the problems.
E.g. the following test code writes 2 pages of numbers, does a rollback to the start and then tries to write the same numbers again. It results in a single-page PDF with the second page of numbers overlapping the first and a page count of 2. The page counts at the bottom of the page also overlap one another even though I'm using the Prawn number_pages method.
class TestReport < Prawn::Document
  def to_pdf
    font('Helvetica')
    bounding_box([bounds.left, bounds.top - 50], :width => bounds.width, :height => bounds.height - 100) do
      text 'begin'
      y_pos = y
      transaction do
        begin
          group do
            64.times do |i|
              text i.to_s
            end
          end
        rescue
          rollback
        end
      end
      self.y = y_pos
      64.times do |i|
        text i.to_s
      end
      text 'end'
      text page_number.to_s
    end
    page_numbers(1)
    #render
  end

  def page_numbers(start)
    string = "page <page> of <total>"
    options = { :at => [bounds.right - 150, 40],
                :width => 150,
                :align => :right,
                :start_count_at => start,
                :color => "000000" }
    number_pages string, options
  end
end

def test_report
  pdf = TestReport.new()
  pdf.to_pdf
  send_data pdf.render, filename: "test.pdf",
                        type: "application/pdf",
                        disposition: "inline"
end
The problems seem to be with transaction rollbacks. The main thing I want is to be able to use the group method. Is there another way?
Is my code wrong? Am I missing something, or do transactions not currently work?
I'm currently using the master Prawn branch in a Ruby on Rails application (gem 'prawn', :git => 'git://github.com/prawnpdf/prawn.git', :branch => 'master').
This question is quite old now, but I'll post an answer since it is one of the first hits on Google when searching for the exception.
Transactions still don't work with page breaks (v 1.0.0.rc2), so I created a helper method that tries grouping first and, if the exception occurs, retries without grouping, letting the content span more than one page.
def group_if_possible(pdf, &block)
  begin
    pdf.group { block.call }
  rescue Prawn::Errors::CannotGroup
    block.call
  end
end
Example: Using it when creating a table:
group_if_possible(pdf) do
  pdf.table(rows)
end
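The fallback pattern can be tried out without Prawn by using a stand-in document. In the sketch below, `FakeDocument` and its `CannotGroup` error are hypothetical substitutes for the Prawn document and `Prawn::Errors::CannotGroup`. The point it illustrates is that the block runs a second time when grouping fails, so it should only draw to the document and not carry other side effects.

```ruby
# Hypothetical stand-in for a Prawn document whose `group` fails on long content.
class FakeDocument
  class CannotGroup < StandardError; end

  attr_reader :lines

  def initialize(max_group_lines)
    @max_group_lines = max_group_lines
    @lines = []
  end

  def text(string)
    @lines << string
  end

  # Runs the block, then discards its output and raises if it did not fit.
  def group
    start = @lines.length
    yield
    added = @lines.length - start
    return if added <= @max_group_lines

    @lines.slice!(start, added) # roll back what the block drew
    raise CannotGroup
  end
end

# Same shape as the helper above, written against the stand-in error class.
def group_if_possible(pdf, &block)
  pdf.group { block.call }
rescue FakeDocument::CannotGroup
  block.call
end

calls = 0
pdf = FakeDocument.new(3)
group_if_possible(pdf) do
  calls += 1
  5.times { |i| pdf.text(i.to_s) }
end

puts "block ran #{calls} time(s), #{pdf.lines.length} lines drawn"
```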
EDIT:
Grouping was removed from Prawn 1.x, but there is an unofficial grouping gem that works well for Prawn 2:
https://github.com/ddengler/prawn-grouping
Looks like Brad Ediger answered your question on Google Groups, but for the benefit of anyone else looking for help with this, here's his response:
Sadly, transactions do not yet work correctly when they start new
pages or change the pages collection. It's a known issue:
https://github.com/prawnpdf/prawn/issues/268
-be

Rails 3.0.7 ActionMailer attachment issue

I'm trying to attach a file to an outgoing email, but the attachment ends up being 1 byte in size. It doesn't matter which attachment I'm forwarding; it always arrives in the email as 1 byte (corrupt). Everything else looks OK to me.
The email information is pulled from an IMAP account and stored in the database for browsing purposes. Attachments are stored on the file system, and each file name is stored as an associated record for the Email.
In the view there's an option to forward the email to another recipient. It worked in Rails 2.3.8, but for Rails 3 I've had to change the attachment part of the method, so now it looks like...
def forward_email(email_id, from_address, to_address)
  @email = Email.find(email_id)
  @recipients = to_address
  @from = from_address
  @subject = @email.subject
  @sent_on = Time.now
  @body = @email.body + "\n\n"
  @email.attachments.each do |file|
    if File.exist?(file.full_path)
      attachment :filename => file.file_name, :body => File.read(file.full_path)
    else
      @body += "ATTACHMENT NOT FOUND: #{file.file_name}\n\n"
    end
  end
end
I've also tried it with...
attachments[file.file_name] = File.read(file.full_path)
and adding :mime_type and :content_type to no avail.
Any help would be appreciated.
Thanks!
This is what I tried and worked for me
attachments.each do |file|
  attachment :content_type => MIME::Types.type_for(file.path).first.content_type, :body => File.read(file.path)
end
Is the file readable? Can you debug the issue by placing something like this?
logger.debug "File: #{file.full_path.inspect} : #{File.read(file.full_path).inspect[0..100]}"
Is there anything in your development.log?
Well, someone from the Rails team answered my question. The problem lies with adding body content (@body) other than the attachment inside the method. If you're going to attach files you have to use a view template.

Resources