Convert PDF/DOC/DOCX to Images in Ruby on Rails

Part 1: I have PDF, DOC and DOCX files stored in S3. When I download them, I want them first converted to images (PNG or JPG) and only then downloaded (as images or as image thumbnails).
How can I achieve this?
Part 2: I have used MiniMagick to convert a PDF to an image, and it partly works like this:
require "mini_magick"
im = MiniMagick::Image.open("path/to_my_pdf.pdf")
# format(ext, page): the second argument selects which page to convert (here 0)
im.format("png", 0)
im.write("some_thumbnail.png")
The problem is that a PDF can have multiple pages, and I need every page converted to an image (maybe an array of images). So far I can only convert a single page of the PDF. I'm stuck here; any help is appreciated.
Feel free to answer either part of the question.

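You can do that with the RMagick gem as follows:
require 'rmagick'

# ImageList reads every page of the PDF as a separate image (ImageMagick uses Ghostscript for PDFs)
pdf = Magick::ImageList.new("path/to_my_pdf.pdf")
pdf.each_with_index do |page, i|
  # Writes 0_thumbnail.png, 1_thumbnail.png, ...
  page.write("#{i}_thumbnail.png")
end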
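ImageMagick cannot read Word files directly, so for the DOC/DOCX half of Part 1 one option is to convert them to PDF with LibreOffice in headless mode first and then rasterize that PDF as above. A minimal sketch, assuming a `libreoffice` binary on the PATH (some installs name it `soffice`) and a locally downloaded copy of the S3 object; the helper names are hypothetical:

```ruby
# Hypothetical helpers: convert a DOC/DOCX to PDF with headless LibreOffice,
# so the resulting PDF can then be split into page images with RMagick.

# Array form of the command, so paths with spaces need no shell quoting.
def libreoffice_pdf_command(input_path, out_dir)
  ["libreoffice", "--headless", "--convert-to", "pdf", "--outdir", out_dir, input_path]
end

# LibreOffice keeps the input's base name and swaps the extension for .pdf.
def converted_pdf_path(input_path, out_dir)
  File.join(out_dir, "#{File.basename(input_path, File.extname(input_path))}.pdf")
end

def convert_to_pdf(input_path, out_dir)
  # system returns true only when the command runs and exits with status 0
  ok = system(*libreoffice_pdf_command(input_path, out_dir))
  raise "LibreOffice could not convert #{input_path}" unless ok
  converted_pdf_path(input_path, out_dir)
end
```

The path returned by `convert_to_pdf("report.docx", "/tmp/out")` can then be passed to `Magick::ImageList.new` exactly like a PDF that was stored as a PDF.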

Related

Display a converted PDF to a rails page

I was looking into how to display PDF thumbnails in Rails, since for some reason creating a thumbnail version of the file in my uploader doesn't work, and that led me to this question:
Convert a .doc or .pdf to an image and display a thumbnail in Ruby?
So I got up to this:
def show_thumbnail
require 'rmagick'
pdf = Magick::ImageList.new(self.pdf_file.file.path)
first_page = pdf.first
scaled_page = first_page.scale(300, 450)
end
But how do I display scaled_page to a webpage?
I added this function in the decorator so I can do something like this:
= image_tag(pdf.pdf_file.show_thumbnail)
But it results in this error:
Can't resolve image into URL: undefined method `to_model' for #<Magick::Image:0x0000000122c4b300>
To display the image, the browser only needs a URL to it. If you don't want to store the image on your HDD, you can encode it into a data URL.
...
require 'base64'
scaled_page = first_page.scale(300, 450)
# Set the output format to PNG (in RMagick, format is an attribute: use format=)
scaled_page.format = 'PNG'
# Quantize (reduce the number of colors) to make the PNG smaller
scaled_page = scaled_page.quantize
# A blob is just a binary string
blob = scaled_page.to_blob
# Base64-encode the blob without any line feeds
base64 = Base64.strict_encode64(blob)
data_url = "data:image/png;base64,#{base64}"
# You need to find a way to send the data URL to the browser,
# e.g. `<%= image_tag data_url %>`
But I highly recommend that you persist the thumbnails on your HDD, or better on a CDN, because such images are expensive to generate but frequently requested by browsers. If you decide to do so, you need a strategy for generating unique URLs for those thumbnails and a way to associate those URLs with your PDF files.
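One way to get unique, stable URLs is to derive the storage key from a digest of the PDF's contents plus the thumbnail parameters: an unchanged file always maps to the same key, so its thumbnail only has to be generated once. A sketch; the `thumbnails/...` layout and the method name are made up for illustration:

```ruby
require "digest"

# Hypothetical key scheme: identical PDF bytes + page + size => identical key,
# so a thumbnail can be generated once and then served from disk or a CDN.
def thumbnail_key(pdf_bytes, page: 0, width: 300, height: 450)
  digest = Digest::SHA256.hexdigest(pdf_bytes)
  "thumbnails/#{digest}/page-#{page}-#{width}x#{height}.png"
end
```

Before rendering, check whether an object with that key already exists in your storage and only run RMagick if it does not; storing the key (or the digest) on the PDF's record gives you the association between the file and its thumbnails.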

I want to convert pdf to image to pdf using rmagick in rails and then upload using activestorage

pdf = Magick::ImageList.new(self.checklist_question.resource_file.service_url) do
self.quality = 100
end
This converts the PDF to images, but I don't see how to merge them back into a single PDF, save it, and then upload it to Active Storage.
You could use a different gem such as CarrierWave and write the data as explained here: https://github.com/carrierwaveuploader/carrierwave/wiki/How-to:-Upload-from-a-string-in-Rails-3
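Staying with Active Storage is also possible: RMagick writes all images of an `ImageList` into one multi-page file when the target name ends in `.pdf`, and `attach` accepts a Hash with `io:`, `filename:` and `content_type:`. A sketch; `merged_pdf_attachable` is a hypothetical helper that works with any object responding to `write(path)`:

```ruby
require "stringio"
require "tmpdir"

# Hypothetical helper: `pages` is expected to be a Magick::ImageList (or any
# object with write(path)). With a .pdf target, RMagick writes every image as
# one page of a single PDF.
def merged_pdf_attachable(pages, basename)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "#{basename}.pdf")
    pages.write(path)
    # Read the bytes now, because mktmpdir deletes the directory afterwards
    {
      io: StringIO.new(File.binread(path)),
      filename: "#{basename}.pdf",
      content_type: "application/pdf"
    }
  end
end
```

Inside the model this could look like `pages = resource_file.open { |file| Magick::ImageList.new(file.path) }` followed by `resource_file.attach(merged_pdf_attachable(pages, "checklist"))`. `open` on an attachment (Rails 6 and later) downloads the blob to a temporary file, so ImageMagick reads a local path instead of a service URL.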

how to convert pdf file into xlsx file in ruby on rails

I upload a PDF and then convert it to an XLSX file. I have tried different ways but don't get the expected output: pdf2xls only outputs a single line instead of the whole file's data. I want all of the PDF's data to appear in the XLSX file.
I have one method that converts the PDF to XLSX, but the output is not properly formatted.
def do_excel_to_pdf
  @user = User.create!(pdf: params[:pdf])
  @path_in = @user.pdf.path
  # Directory of the uploaded file, including the trailing slash
  @path_out = @user.pdf.path.slice(0..@user.pdf.path.rindex(/\//))
  # Run LibreOffice headless; the array form avoids shell-quoting problems with paths
  query = ["libreoffice", "--headless", "--invisible", "--convert-to", "pdf", @path_in, "--outdir", @path_out]
  system(*query)
  file = @path_out + @user.pdf.original_filename.slice(0..@user.pdf.original_filename.rindex('.') - 1) + ".pdf"
  send_file file, :type => "application/msexcel", :x_sendfile => true
end
If anyone can help, please do; any gem or script is welcome.
I would start with reading the data from the PDF; inserting it into the XLSX is the easy part. If you have problems with that, ask another question and specify which gem you use and what you tried for that part.
You use LibreOffice to read the PDF, but according to its FAQ the PDF needs to be a hybrid PDF (one with the editable ODF document embedded); perhaps that is the problem.
As an alternative, you could try an ebook conversion tool such as the one in Calibre, but I'm afraid you would lose too much formatting to recover the data you need.
Everything depends on how the data in your PDF is structured. If it is regular text without much formatting and positioning, it can be as easy as using the pdf-reader gem.
I have used it in the past, and my data had a lot of formatting (you would be surprised how complicated the PDF structure is), so I had to specify for each field exactly which location its data had to be read from. Not for the faint of heart.
Here is a simple example:
require 'pdf/reader' # gem install pdf-reader
reader = PDF::Reader.new("my.pdf")
reader.pages.each do |page|
  # Text of the page, laid out roughly as it appears in the PDF
  puts page.text
  # The raw content stream, if you need the positioning operators
  # puts page.raw_content
end
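Once you have the text, filling the XLSX is mostly a matter of splitting it into cells. For simple tabular PDFs, one heuristic is to treat runs of two or more spaces in `page.text` as column separators. A sketch under that assumption (it will not handle wrapped cells or merged columns):

```ruby
# Heuristic sketch: turn the layout text produced by pdf-reader's page.text
# into rows of cells, assuming columns are separated by 2 or more spaces.
def rows_from_text(text)
  text.each_line
      .map(&:strip)
      .reject(&:empty?)
      .map { |line| line.split(/\s{2,}/) }
end
```

The rows can then be written with an XLSX gem. With caxlsx, for example, that would be `package = Axlsx::Package.new`, then `package.workbook.add_worksheet(name: "PDF") { |sheet| rows.each { |row| sheet.add_row(row) } }` and finally `package.serialize("output.xlsx")`.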
I couldn't find an option to convert PDF to XLSX, but the API linked below offers PDF-to-image and PDF-to-PowerPoint conversion.
I'm not sure whether you can change the requirement to accept results in another format.
http://www.convertapi.com/

Inserting external PDF into Prawn generated document

How can I insert an existing PDF into a Prawn generated document? I am generating a pdf for a bill (as a view), and that bill can have many attachments (png, jpg, or pdf). How can I insert/embed/include those external pdf attachments in my generated document? I've read the manual, looked over the source code, and searched online, but no luck so far.
The closest hint I've found is to use ImageMagick or something similar to convert the pdf to another format, but since I don't need to resize/manipulate the document, that seems wasteful. The old way to do it seems to be through templates, but my understanding is that the code for templating is unstable.
Does anyone know how to include PDF pages in a Prawn generated PDF? If Prawn won't do this, do you know of any supplementary gems that will? If someone can point me towards something like prawn-templates but more reliable, that would be awesome.
Edit: I am using prawnto and prawn to render PDF views in Rails 4.2.0 with Ruby 2.2.0.
Strategies that I've found but that seem inapplicable/too messy:
Create a jpg preview of a PDF on upload, include that in the generated document (downsides: no text selection/searching, expensive). This is currently my favorite option, but I don't like it.
prawn-templates (downside: unstable, unmaintained codebase; this is a business-critical application)
Merge PDFs through a gem like 'combine-pdf': I can't figure out how to make this work for rendering a view with the external PDFs inserted at specific places (the generated PDF is a collection of bills, and I need the attachments to follow the bill they're attached to).
You're right about the lack of existing documentation for this - I found only this issue from 2010 which uses the outdated methods you describe. I also found this SO answer which does not work now since Prawn dropped support for templates.
However, the good news is that there is a way to do what you want with Ruby! What you will be doing is merging the PDFs together, not "inserting" PDFs into the original PDF.
I would recommend this library, combine_pdf, to do so. The documentation is good, so doing what you want would be as simple as:
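require 'combine_pdf'
my_prawn_pdf = CombinePDF.new
# CombinePDF.load reads a file; CombinePDF.parse reads PDF data from memory (e.g. Prawn's render output)
my_prawn_pdf << CombinePDF.load("my_bill_pdf.pdf")
my_prawn_pdf << CombinePDF.load("attachment.pdf")
my_prawn_pdf.save "combined.pdf"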
Edit
In response to your questions:
I'm using Prawn to render a pdf view in Rails, which means that I don't think I get that kind of post-processing
You do! If you look at the documentation for combine_pdf, you'll see that loading from memory is the fastest way to use the gem; the documentation even explicitly says that Prawn output can be used as input.
I'm not just tacking the PDFs to the end: a bill attachment must directly follow the generated page(s) for a bill
The combine_pdf gem isn't just for adding pages on the end. As the documentation shows, you can cycle through a PDF adding pages when you want to, for example:
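my_pdf # previously defined, e.g. CombinePDF.parse(prawn_document.render)
attachment = CombinePDF.load("attachment.pdf")
new_pdf = CombinePDF.new
my_pdf.pages.each_with_index do |page, i|
  new_pdf << page
  # bill_page_index is a placeholder: the index of the bill's last page
  new_pdf << attachment if i == bill_page_index # or however you want to handle this logic
end
new_pdf.save "new_pdf.pdf"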
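The ordering requirement from the question (each bill's attachments directly follow that bill's generated pages) can be worked out separately from combine_pdf: compute the sequence first, then append pages or loaded attachments in that order. A pure-Ruby sketch; the `bills` structure is an assumption, and you would have to record each bill's page count yourself, for example by comparing Prawn's `page_number` before and after rendering the bill:

```ruby
# Hypothetical input: one entry per bill, in render order, with the number of
# pages Prawn generated for it and the paths of its PDF attachments.
# Returns the merge order as [:generated, page_index] / [:attachment, path] pairs.
def merge_plan(bills)
  plan = []
  next_page = 0
  bills.each do |bill|
    bill[:pages].times do
      plan << [:generated, next_page]
      next_page += 1
    end
    bill[:attachments].each { |path| plan << [:attachment, path] }
  end
  plan
end
```

With combine_pdf, walking the plan could look like this: parse the Prawn output once with `CombinePDF.parse(prawn_document.render)`, then for each step append either `generated.pages[index]` or `CombinePDF.load(path)` to a fresh `CombinePDF.new`, and save the result.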

ruby on rails: Convert an image file to byte array

I need to upload an image from a Rails form using Ajax and convert it into a byte array to show an HTML preview of the image.
When I read the file, it returns me binary data which is not readable by img tag. I'm sure I'm doing something very silly and this might have an obvious solution. Here is the code snippet. Please help.
rails
tmp = File.open(params[:file_upload][:my_file].tempfile, 'rb').read
render :text => tmp
jquery
$("#item_detail_image").attr("src", "data:image/png;base64," + data.responseText);
I'm not using Paperclip because my Rails application has no database connection (I'm using web services), and I'm not sure how to use Paperclip without ActiveRecord.
You need to Base64-encode the data.
See http://www.ruby-doc.org/stdlib-1.9.3/libdoc/base64/rdoc/Base64.html#method-i-encode64
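A minimal sketch of the server side: Base64-encode the uploaded bytes without line breaks and return a complete data URL. `to_data_url` is a hypothetical helper; `pack("m0")` is core Ruby's strict Base64 encoding, equivalent to `Base64.strict_encode64`:

```ruby
# Hypothetical helper: wrap raw file bytes in a data URL that an <img> can use.
# pack("m0") produces strict Base64 (no newlines), like Base64.strict_encode64.
def to_data_url(bytes, mime_type)
  "data:#{mime_type};base64,#{[bytes].pack('m0')}"
end
```

In the controller, `upload = params[:file_upload][:my_file]` followed by `render plain: to_data_url(upload.read, upload.content_type)` would let the jQuery side assign `data.responseText` directly to `src`, without hard-coding `image/png`.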
