Right now I generate multiple PDF files and send them one at a time for the user to download, but the problem is that sometimes they end up with too many files.
How can I merge all the PDFs into a single file before sending it to the user?
I use the combine_pdf gem.
To combine PDF files:
pdf = CombinePDF.new
pdf << CombinePDF.load("file1.pdf") # one way to combine, very fast.
pdf << CombinePDF.load("file2.pdf")
pdf.save "combined.pdf"
You can also parse PDF files from memory. Loading from memory is
especially effective for importing PDF data received through the
internet or from a different authoring library such as Prawn:
pdf_data = prawn_pdf_document.render # Import PDF data from Prawn
pdf = CombinePDF.parse(pdf_data)
If you want to use some tool like PDFTk or CombinePDF, all you should need to do is prerender your individual PDFs by using something like:
pdf1 = render_to_string(pdf: 'pdf1', template: 'pdf1')
pdf2 = render_to_string(pdf: 'pdf2', template: 'pdf2')
or
pdf1 = WickedPdf.new.pdf_from_string(some_html_string)
pdf2 = WickedPdf.new.pdf_from_string(another_html_string)
If those tools won't take a PDF as a string, you may need to write them to tempfiles first.
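A minimal sketch of the tempfile step, assuming the rendered PDF is already in a string. The helper name write_binary_tempfile and the sample bytes are made up for illustration; they are not part of WickedPdf or CombinePDF:

```ruby
require 'tempfile'

# Hypothetical helper: writes a binary string (e.g. rendered PDF data) to a
# Tempfile and returns the Tempfile, so the caller keeps it alive while the
# path is in use.
def write_binary_tempfile(data, basename = 'document', ext = '.pdf')
  file = Tempfile.new([basename, ext])
  file.binmode   # no newline or encoding translation
  file.write(data)
  file.flush     # make sure the bytes are on disk before another tool reads the path
  file
end

pdf_bytes = "%PDF-1.4\n%\xE2\xE3\xCF\xD3\n".b  # stand-in for real PDF data
tempfile  = write_binary_tempfile(pdf_bytes)
puts tempfile.path
```

The path can then be handed to whichever tool needs a file on disk. Hold on to the returned Tempfile object until that tool has finished, since the file is removed when the object is closed with close! or garbage collected.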
If you don't want to introduce another dependency to merge things, wkhtmltopdf can take multiple HTML files (or URLs) and render them all as one PDF with a command similar to this:
wkhtmltopdf tmp/tempfile1.html tmp/tempfile2.html tmp/output.pdf
Knowing this, you could prerender your templates, with layouts and everything, out to HTML strings, then pass them into wkhtmltopdf something like this:
# app/models/concerns/multipage_pdf_renderer.rb
require 'open3'

class MultipagePdfRenderer
  def self.combine(documents)
    outfile = WickedPdfTempfile.new('multipage_pdf_renderer.pdf')

    tempfiles = documents.each_with_index.map do |doc, index|
      file = WickedPdfTempfile.new("multipage_pdf_doc_#{index}.html")
      file.binmode
      file.write(doc)
      file.rewind
      file
    end

    filepaths = tempfiles.map { |tf| tf.path.to_s }

    binary = WickedPdf.new.send(:find_wkhtmltopdf_binary_path)

    command = [binary, '-q']
    filepaths.each { |fp| command << fp }
    command << outfile.path.to_s
    err = Open3.popen3(*command) do |stdin, stdout, stderr|
      stderr.read
    end
    raise "Problem generating multipage pdf: #{err}" if err.present?
    return outfile.read
  ensure
    tempfiles.each(&:close!)
  end
end
And call in your controller something like this:
def fancy_report
  respond_to do |format|
    format.pdf do
      doc1 = render_to_string(template: 'pages/_page1')
      doc2 = render_to_string(template: 'pages/_page2')
      pdf_file = MultipagePdfRenderer.combine([doc1, doc2])
      send_data pdf_file, type: 'application/pdf', disposition: 'inline'
    end
  end
end
However, this only covers the simplest of cases, you'll have to do the work of rendering the headers and footers if you need them, parsing (or adding) any options you might need.
This solution originally came from https://github.com/mileszs/wicked_pdf/issues/339 so it may be helpful to look there for more details on this strategy.
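One thing to watch in a renderer like the one above: it raises on any stderr output, and command-line tools can print warnings to stderr even when they succeed. A possible alternative, sketched here with a child Ruby process standing in for wkhtmltopdf (the run_command helper is hypothetical), is to check the exit status instead:

```ruby
require 'open3'
require 'rbconfig'

# Runs a command given as an argument list (no shell involved) and raises
# only when the exit status reports failure.
def run_command(*command)
  stdout, stderr, status = Open3.capture3(*command)
  raise "Command failed (#{status.exitstatus}): #{stderr}" unless status.success?
  [stdout, stderr]
end

# Stand-in for wkhtmltopdf: a child process that warns on stderr but succeeds.
out, err = run_command(RbConfig.ruby, '-e', 'warn "just a warning"; print "ok"')
puts out
puts err
```

Open3.capture3 also drains stdout and stderr together, which avoids a blocked child when one of the pipes fills up.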
Try PDFtk. In my opinion, it is the best tool for editing PDF files, and there are some gems that wrap it for access from Ruby.
Ok - I have the following in my test/test_helper.rb:
def read_pdf_from_response(response)
  file = Tempfile.new
  file.write response.body.force_encoding('UTF-8')
  begin
    reader = PDF::Reader.new(file)
    reader.pages.map(&:text).join.squeeze("\n")
  ensure
    file.close
    file.unlink
  end
end
I use it like this in an integration test:
get project_path(project, format: 'pdf')
read_pdf_from_response(@response).tap do |pdf|
  assert_match(/whatever/, pdf)
end
This works fine as long as I run a test singly or when running all tests with only one worker, e.g. PARALLEL_WORKERS=1. But tests that use this method will fail intermittently when I run my suite with more than 1 parallel worker. My laptop has 8 cores, so that's normally what it's running with.
Here's the error:
PDF::Reader::MalformedPDFError: PDF malformed, expected 5 but found 96 instead
or sometimes: PDF::Reader::MalformedPDFError: PDF file is empty
The PDF reader is https://github.com/yob/pdf-reader which hasn't given any problems.
The controller that sends the PDF returns like so:
send_file out_file,
  filename: "#{@project.name}.pdf",
  type: 'application/pdf',
  disposition: (params[:download] ? 'attachment' : 'inline')
I can't see why this isn't working. No files should ever have the same name at the same time, since I'm using Tempfile, right? How can I make all this run with parallel tests?
While I cannot confirm why this is happening, the issue may be one of the following:
You are forcing the encoding to "UTF-8", but PDF documents are binary files, so this could be damaging the PDF.
Some of the responses you are receiving are truly empty or malformed.
Maybe try this instead:
def read_pdf_from_response(response)
  doc = StringIO.new(response.body.to_s)
  begin
    PDF::Reader.new(doc)
      .pages
      .map(&:text)
      .join
      .squeeze("\n")
  rescue PDF::Reader::MalformedPDFError => e
    # handle issues with the pdf itself
  end
end
This will avoid the file system altogether while still using a compatible IO object and will make sure that the response is read as binary to avoid any conversion conflicts.
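As a rough illustration of treating the body as raw bytes, here is a sketch using only the standard library. The byte string is a made-up stand-in for response.body, not a real PDF:

```ruby
require 'stringio'

# Stand-in for response.body: a PDF-like header followed by bytes that are
# not valid UTF-8.
body = "%PDF-1.4\n\xE2\xE3\xCF\xD3".b

# Relabelling the bytes as UTF-8 does not make them valid text.
utf8 = body.dup.force_encoding('UTF-8')
puts utf8.valid_encoding?   # false

# StringIO gives a seekable, IO-like view of the same bytes, no file needed.
io = StringIO.new(body)
header = io.read(8)
io.rewind
puts header                 # %PDF-1.4
```

Because StringIO keeps everything in memory, there is also no temp file that could be read before it has been flushed.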
I am trying to create a new file and then write some content to it just to create a basic backup of a template.
When I log out the values of filename and file_content they are correct, but when I send the data all I get is a file named after the method (download_include) and a fixnum inside the file, the last one made was 15.
# POST /download_include/:id
def download_include
  @include = Include.find(params[:id])
  version_to_download = @include.latest_version_record
  filename = "#{version_to_download.name}"
  file_content = "#{version_to_download.liquid_code.to_s}"
  file = File.open(filename, "w") { |f| f.write (file_content) }
  send_data file
end
I also tried send_file but that produces the error
no implicit conversion of Fixnum into String
I also tried to just write dummy values like below, and it still produced a file named after the method with a fixnum inside it.
file = File.open("DOES THIS CHANGE THE FILENAME?", "w") { |f| f.write ("FILE CONTENT?") }
I feel I am missing something obvious but I cannot figure it out after looking at many examples here and in blogs.
If you don't send along the filename as an option for send_data, it defaults to the method name.
Secondly, send_data wants the data itself. File.open with a block returns the block's value, which here is the byte count returned by f.write, so file is an integer rather than the file's contents.
Try this...
send_data(File.read(filename), filename: filename)
Or skip the intermediate file and try...
send_data(version_to_download.liquid_code.to_s, filename: filename)
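The fixnum in the downloaded file can be reproduced outside Rails. A small sketch (the file name backup.liquid is arbitrary):

```ruby
require 'tmpdir'
require 'fileutils'

dir  = Dir.mktmpdir
path = File.join(dir, 'backup.liquid')

# With a block, File.open returns the block's value, and write returns
# the number of bytes written.
result = File.open(path, 'w') { |f| f.write('FILE CONTENT?') }
p result        # 13

# The actual contents have to be read back separately.
content = File.read(path)
puts content    # FILE CONTENT?

FileUtils.remove_entry(dir)
```

So send_data was being handed the integer, which is why the download contained a number.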
I think I am missing something simple. Using combine_pdf: I am attempting to combine two pdf files into one pdf, and then send that resulting pdf with send_data in my rails app.
Here is my code in a controller:
pdf = CombinePDF.new
# returns an array, each element is a string of an absolute path
# to the file I want to upload
absolute_upload_paths = @obj.attachments.collect { |obj| obj.my_attachment.path }
absolute_upload_paths.each { |upload_path| pdf << CombinePDF.load(upload_path) }
send_data pdf, filename: "my_combined_pdf", type: "application/pdf"
What results is that a damaged pdf file gets sent which cannot be opened:
Adobe Acrobat Reader could not open 'VR_Voc_Eval-51.pdf' because it is either not a supported file type or because the file has been damaged (for example, it was sent as an email attachment and wasn't correctly decoded).
What am I missing? How can I use this gem to combine two existing pdf files into one pdf and then send it to the user?
It looks like the README for that library calls .to_pdf when sending the data. Hopefully calling #to_pdf on the pdf object like in the example will fix your issue.
send_data pdf.to_pdf, filename: "my_combined_pdf", type: "application/pdf"
https://github.com/boazsegev/combine_pdf#rendering-pdf-data
I need to serve some data from my database in a zip file, streaming it on the fly such that:
I do not write a temporary file to disk
I do not compose the whole file in RAM
I know that I can do streaming generation of zip files to the filesystem using ZipOutputStream as here. I also know that I can do streaming output from a rails controller by setting response_body to a Proc as here. What I need (I think) is a way of plugging those two things together. Can I make rails serve a response from a ZipOutputStream? Can I get ZipOutputStream to give me incremental chunks of data that I can feed into my response_body Proc? Or is there another way?
Short Version
https://github.com/fringd/zipline
Long Version
so jo5h's answer didn't work for me in rails 3.1.1
i found a youtube video that helped, though.
http://www.youtube.com/watch?v=K0XvnspdPsc
the crux of it is creating an object that responds to each... this is what i did:
class ZipGenerator
  def initialize(model)
    @model = model
  end

  def each( &block )
    output = Object.new
    output.define_singleton_method :tell, Proc.new { 0 }
    output.define_singleton_method :pos=, Proc.new { |x| 0 }
    output.define_singleton_method :<<, Proc.new { |x| block.call(x) }
    output.define_singleton_method :close, Proc.new { nil }
    Zip::IoZip.open(output) do |zip|
      @model.attachments.all.each do |attachment|
        zip.put_next_entry "#{attachment.name}.pdf"
        file = attachment.file.file.send :file
        file = File.open(file) if file.is_a? String
        while buffer = file.read(2048)
          zip << buffer
        end
      end
    end
    sleep 10
  end
end
def getzip
  self.response_body = ZipGenerator.new(@model)
  # this is a hack to prevent middleware from buffering
  headers['Last-Modified'] = Time.now.to_s
end
EDIT:
the above solution didn't ACTUALLY work... the problem is that rubyzip needs to jump around the file to rewrite the headers for entries as it goes. in particular it needs to write the compressed size BEFORE it writes the data. this is just not possible in a truly streaming situation... so ultimately this task may be impossible. there is a chance that it might be possible to buffer a whole file at a time, but this seemed less worth it. ultimately i just wrote to a tmp file... on heroku i can write to Rails.root/tmp. less instant feedback, and not ideal, but necessary.
ANOTHER EDIT:
i got another idea recently... we COULD know the compressed size of the files if we do not compress them. the plan goes something like this:
subclass the ZipOutputStream class as follows:
always use the "stored" compression method, in other words do not compress
ensure we never seek backwards to change file headers, get it all right up front
rewrite any code related to TOC that seeks
I haven't tried to implement this yet, but will report back if there's any success.
OK ONE LAST EDIT:
In the zip standard: http://en.wikipedia.org/wiki/Zip_(file_format)#File_headers
they mention that there's a bit you can flip to put the size, compressed size and crc AFTER a file. so my new plan was to subclass ZipOutputStream so that it
sets this flag
writes sizes and CRCs after the data
never rewinds output
furthermore i needed to get all the hacks in order to stream output in rails fixed up...
anyways it all worked!
here's a gem!
https://github.com/fringd/zipline
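For the curious, the "sizes after the data" idea can be sketched in plain Ruby. This is not zipline's or rubyzip's code, only an illustration under simplifying assumptions: entries are stored without compression, names are ASCII, and there is no Zip64 support. StoredZipStreamer is a made-up name:

```ruby
require 'zlib'

# Append-only ZIP writer sketch: flag bit 3 is set, so CRC and sizes follow
# each entry's data in a data descriptor and the sink is never rewound.
class StoredZipStreamer
  FLAGS    = 0x0008   # bit 3: CRC and sizes come after the file data
  DOS_TIME = 0
  DOS_DATE = 0x0021   # 1980-01-01

  def initialize(sink)
    @sink = sink      # anything that responds to <<
    @offset = 0
    @entries = []
  end

  def add(name, chunks)
    start = @offset
    # Local file header with CRC and sizes left at zero.
    emit([0x04034b50, 20, FLAGS, 0, DOS_TIME, DOS_DATE, 0, 0, 0,
          name.bytesize, 0].pack('VvvvvvVVVvv'))
    emit(name.b)
    crc = 0
    size = 0
    chunks.each do |chunk|
      crc = Zlib.crc32(chunk, crc)
      size += chunk.bytesize
      emit(chunk.b)
    end
    # Data descriptor: signature, CRC, compressed size, uncompressed size.
    emit([0x08074b50, crc, size, size].pack('VVVV'))
    @entries << [name, crc, size, start]
  end

  def finish
    cd_start = @offset
    @entries.each do |name, crc, size, start|
      emit([0x02014b50, 20, 20, FLAGS, 0, DOS_TIME, DOS_DATE, crc, size, size,
            name.bytesize, 0, 0, 0, 0, 0, start].pack('VvvvvvvVVVvvvvvVV'))
      emit(name.b)
    end
    cd_size = @offset - cd_start
    emit([0x06054b50, 0, 0, @entries.size, @entries.size,
          cd_size, cd_start, 0].pack('VvvvvVVv'))
  end

  private

  def emit(bytes)
    @sink << bytes
    @offset += bytes.bytesize
  end
end

out = String.new   # binary string standing in for the response stream
zip = StoredZipStreamer.new(out)
zip.add('hello.txt', ['Hello, ', 'world!'])
zip.finish
puts out.bytesize
```

Readers that rely on the central directory at the end of the file should cope with this layout; tools that parse a zip strictly front to back may not, since a stored entry gives no hint where its data ends.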
I had a similar issue. I didn't need to stream directly, but only had your first case of not wanting to write a temp file. You can easily modify ZipOutputStream to accept an IO object instead of just a filename.
module Zip
  class IOOutputStream < ZipOutputStream
    def initialize io
      super '-'
      @outputStream = io
    end

    def stream
      @outputStream
    end
  end
end
From there, it should just be a matter of using the new Zip::IOOutputStream in your Proc. In your controller, you'd probably do something like:
self.response_body = proc do |response, output|
  Zip::IOOutputStream.open(output) do |zip|
    my_files.each do |file|
      zip.put_next_entry file
      zip << IO.read(file)
    end
  end
end
It is now possible to do this directly:
class SomeController < ApplicationController
  def some_action
    compressed_filestream = Zip::ZipOutputStream.write_buffer do |zos|
      zos.put_next_entry "some/filename.ext"
      zos.print data
    end
    compressed_filestream.rewind
    respond_to do |format|
      format.zip do
        send_data compressed_filestream.read, filename: "some.zip"
      end
    end
    # or some other return of send_data
  end
end
This is the link you want:
http://info.michael-simons.eu/2008/01/21/using-rubyzip-to-create-zip-files-on-the-fly/
It builds and generates the zipfile using ZipOutputStream and then uses send_file to send it directly out from the controller.
Use chunked HTTP transfer encoding for the output: send the HTTP header "Transfer-Encoding: chunked" and structure the output according to the chunked encoding specification, so there is no need to know the resulting ZIP file size at the beginning of the transfer. This can be coded fairly easily in Ruby with the help of Open3.popen3 and threads.
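The framing itself is simple: each chunk is sent as its size in hex, CRLF, the data, CRLF, and a zero-size chunk ends the body. A minimal sketch (the Chunked module is a made-up name; in practice the web server usually does this framing for you):

```ruby
module Chunked
  # Yields each non-empty chunk framed as "<hex size>\r\n<data>\r\n",
  # then the terminating zero-size chunk.
  def self.each_frame(chunks)
    chunks.each do |chunk|
      next if chunk.empty?   # an empty chunk would end the body early
      yield "#{chunk.bytesize.to_s(16)}\r\n#{chunk}\r\n"
    end
    yield "0\r\n\r\n"
  end
end

wire = +''
Chunked.each_frame(%w[Wiki pedia]) { |frame| wire << frame }
p wire   # "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
```

Since every frame carries its own length, the total size never has to be known up front.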
I have a Dragonfly processor which should take a given PDF and return a PNG of the first page of the document.
When I run this processor via the console, I get back the PNG as expected, however, when in the context of Rails, I'm getting it as a PDF.
My code is roughly similar to this:
def to_pdf_thumbnail(temp_object)
  tempfile = new_tempfile('png')
  args = "'#{temp_object.path}[0]' '#{tempfile.path}'"
  full_command = "convert #{args}"
  result = `#{full_command}`
  tempfile
end

def new_tempfile(ext=nil)
  tempfile = ext ? Tempfile.new(['dragonfly', ".#{ext}"]) : Tempfile.new('dragonfly')
  tempfile.binmode
  tempfile.close
  tempfile
end
Now, tempfile is definitely creating a .png file, but the convert is generating a PDF (when run from within Rails 3).
Any ideas as to what the issue might be here? Is something getting confused about the content type?
I should add that both this and a standard conversion (asset.png.url) both yield a PDF with the PDF content as a small block in the middle of the (A4) image.
An approach I’m using for this is to generate the thumbnail PNG on the fly via the thumb method from Dragonfly’s ImageMagick plugin:
<%= image_tag rails_model.file.thumb('100x100#', format: 'png', frame: 0).url %>
So long as Ghostscript is installed, ImageMagick/Dragonfly will honour the format / frame (i.e. page of the PDF) settings. If file is an image rather than a PDF, it will be converted to a PNG, and the frame number ignored (unless it’s a GIF).
Try this
def to_pdf_thumbnail(temp_object)
  tempfile = new_tempfile('png')
  # Pass the input path (first page only) and the output path as separate arguments.
  system('convert', "#{temp_object.path}[0]", tempfile.path)
  File.binread(tempfile.path)
end
The problem is you are likely handing convert ONE argument not two
Doesn't convert rely on the extension to determine the type? Are you sure the tempfiles have the proper extensions?
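Both points are easy to check without running convert at all. A small sketch (the input path is hypothetical):

```ruby
require 'tempfile'

# Passing an array as the first argument makes Tempfile keep the extension.
with_ext    = Tempfile.new(['dragonfly', '.png'])
without_ext = Tempfile.new('dragonfly')

puts File.extname(with_ext.path)   # .png
puts without_ext.path

# Building the command as an argument list keeps the input and output paths
# as two distinct arguments, even if a path contains spaces or quotes.
input = '/tmp/some document.pdf'   # hypothetical input path
argv  = ['convert', "#{input}[0]", with_ext.path]
p argv
# system(*argv) would run it without going through a shell.
```

If the output path printed for your processor does not end in .png, that would explain convert writing a PDF.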