Inserting external PDF into Prawn generated document - ruby-on-rails

How can I insert an existing PDF into a Prawn generated document? I am generating a pdf for a bill (as a view), and that bill can have many attachments (png, jpg, or pdf). How can I insert/embed/include those external pdf attachments in my generated document? I've read the manual, looked over the source code, and searched online, but no luck so far.
The closest hint I've found is to use ImageMagick or something similar to convert the pdf to another format, but since I don't need to resize/manipulate the document, that seems wasteful. The old way to do it seems to be through templates, but my understanding is that the code for templating is unstable.
Does anyone know how to include PDF pages in a Prawn generated PDF? If Prawn won't do this, do you know of any supplementary gems that will? If someone can point me towards something like prawn-templates but more reliable, that would be awesome.
Edit: I am using prawnto and prawn to render PDF views in Rails 4.2.0 with Ruby 2.2.0.
Strategies that I've found but that seem inapplicable/too messy:
Create a jpg preview of a PDF on upload, include that in the generated document (downsides: no text selection/searching, expensive). This is currently my favorite option, but I don't like it.
prawn-templates (downside: unstable, unmaintained codebase; this is a business-critical application)
Merge PDFs through a gem like 'combine_pdf' (downside: I can't figure out how to make this work for rendering a view with the external PDFs inserted at specific places; the generated pdf is a collection of bills, and I need the attachments to follow the bill they're attached to)

You're right about the lack of existing documentation for this - I found only this issue from 2010 which uses the outdated methods you describe. I also found this SO answer which does not work now since Prawn dropped support for templates.
However, the good news is that there is a way to do what you want with Ruby! What you will be doing is merging the PDFs together, not "inserting" PDFs into the original PDF.
I would recommend this library, combine_pdf, to do so. The documentation is good, so doing what you want would be as simple as:
my_prawn_pdf = CombinePDF.new
my_prawn_pdf << CombinePDF.new("my_bill_pdf.pdf")
my_prawn_pdf << CombinePDF.new("attachment.pdf")
my_prawn_pdf.save "combined.pdf"
Edit
In response to your questions:
I'm using Prawn to render a pdf view in Rails, which means that I don't think I get that kind of post-processing
You do! If you look at the documentation for combine_pdf, you'll see that loading from memory is the fastest way to use the gem - the documentation even explicitly says that Prawn can be used as input.
I'm not just tacking the PDFs to the end: a bill attachment must directly follow the generated page(s) for a bill
The combine_pdf gem isn't just for adding pages on the end. As the documentation shows, you can cycle through a PDF adding pages when you want to, for example:
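# my_pdf and attachment (a hypothetical name) were previously defined, e.g. with CombinePDF.parse or CombinePDF.load
new_pdf = CombinePDF.new
my_pdf.pages.each_with_index do |page, i|
  new_pdf << page
  # add the attachment right after the page of the bill it belongs to
  new_pdf << attachment if i + 1 == bill_number # or however you want to handle this logic
end
new_pdf.save "new_pdf.pdf"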
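The in-memory route and the "attachment follows its bill" ordering can be sketched together. This is only a sketch under assumptions: each bill is taken to be a hash with `:pdf_data` (the string returned by Prawn's `render`) and `:attachment_paths` (PDF files on disk). Those keys and both helper names are hypothetical, not part of either gem; `CombinePDF.parse`, `CombinePDF.load` and `<<` are the calls shown in the combine_pdf README.

```ruby
# Hypothetical helper: flattens the bills into the order the pieces should
# appear, so every attachment directly follows the bill it belongs to.
def merge_order(bills)
  bills.flat_map do |bill|
    [[:generated, bill[:pdf_data]]] +
      bill[:attachment_paths].map { |path| [:file, path] }
  end
end

# Hypothetical helper: builds one CombinePDF object from the ordered pieces.
def combine_bills(bills)
  require 'combine_pdf' # gem install combine_pdf; loaded on first call

  combined = CombinePDF.new
  merge_order(bills).each do |kind, source|
    # Prawn output is parsed from memory, attachments are loaded from disk.
    combined << (kind == :generated ? CombinePDF.parse(source) : CombinePDF.load(source))
  end
  combined
end

# Placeholder data, only to show the resulting order.
bills = [
  { pdf_data: "%PDF bill 1 (placeholder)", attachment_paths: ["a.pdf", "b.pdf"] },
  { pdf_data: "%PDF bill 2 (placeholder)", attachment_paths: [] }
]
p merge_order(bills).map(&:first) # => [:generated, :file, :file, :generated]
```

In a Rails controller the result could then be sent with something like `send_data combine_bills(bills).to_pdf, filename: "bills.pdf", type: "application/pdf"`. PNG and JPG attachments would still go through Prawn's own `image` method before rendering.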

Related

Insert existing PDF at position in PrawnPdf created PDF

I am creating a one-page PDF with PrawnPDF in Ruby (Rails). I want to insert an existing PDF at a certain position near the top of the PDF I am creating with PrawnPDF.
This is doable with an image, but it would be good if it were possible with a PDF.
Does anyone know if this is possible?
I don't think you can do this with Prawn, but the combine_pdf gem allows you to merge pdf documents and if you really need to sandwich something in the middle, you could create partial files and merge them in the order you need.
From the doc of https://github.com/boazsegev/combine_pdf:
pdf = CombinePDF.new
pdf << CombinePDF.load("file1.pdf") # one way to combine, very fast.
pdf << CombinePDF.load("file2.pdf")
pdf.save "combined.pdf"

how to convert pdf file into xlsx file in ruby on rails

I upload a PDF and then want to convert it to an xlsx file. I have tried different ways but am not getting the expected output. pdf2xls only outputs a single line, not the whole file's data. I want the data of the whole PDF file to appear in the xlsx file.
I have one method that converts the PDF to xlsx, but it does not produce the proper format.
def do_excel_to_pdf
  @user = User.create!(pdf: params[:pdf])
  @path_in = @user.pdf.path
  temp1 = @user.pdf.path
  @path_out = @user.pdf.path.slice(0..@user.pdf.path.rindex(/\//))
  query = "libreoffice --headless --invisible --convert-to pdf " + @path_in + " --outdir " + @path_out
  system(query)
  file = @path_out + @user.pdf.original_filename.slice(0..@user.pdf.original_filename.rindex('.') - 1) + ".pdf"
  send_file file, :type => "application/msexcel", :x_sendfile => true
end
If anyone has done this, please help me; any gem or script is welcome.
I would start with reading from the PDF; inserting the data into the XLSX is easy. If you have problems with that part, ask another question and specify which gem you use and what you tried.
You use LibreOffice to read the PDF, but according to the FAQ your PDF needs to be a hybrid PDF; perhaps that is the problem.
As an alternative, you could try a conversion tool for ebooks, like the one in Calibre, but I'm afraid you will lose too much formatting to recover the data you need.
It all depends on how the data in your PDF is structured. If it is regular text without much formatting and positioning, it can be as easy as using the pdf-reader gem.
I used it in the past and my data had a lot of formatting (you would be surprised how complicated the PDF structure is), so I had to specify for each field exactly which data had to be read at which location. Not for the faint of heart.
Here is a simple example.
require 'pdf/reader' # gem install pdf-reader
reader = PDF::Reader.new("my.pdf")
reader.pages.each do |page|
  puts page.text          # extracted text of the page
  # puts page.raw_content # raw content stream, for position-based parsing
end
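Once the text is out of the PDF, the spreadsheet half is mostly a matter of splitting lines into cells. Here is a rough sketch, assuming the PDF holds plain text whose columns are separated by runs of two or more spaces (this will not hold for heavily positioned layouts) and assuming the caxlsx gem for writing. `rows_from_text` and `pdf_to_xlsx` are hypothetical names.

```ruby
# Splits extracted page text into rows of cells.
# Assumption: columns are separated by two or more whitespace characters.
def rows_from_text(text)
  text.each_line
      .map(&:strip)
      .reject(&:empty?)
      .map { |line| line.split(/\s{2,}/) }
end

# Reads every page with pdf-reader and writes all rows to one worksheet.
def pdf_to_xlsx(pdf_path, xlsx_path)
  require 'pdf/reader' # gem install pdf-reader
  require 'axlsx'      # gem install caxlsx

  package = Axlsx::Package.new
  package.workbook.add_worksheet(name: "PDF data") do |sheet|
    PDF::Reader.new(pdf_path).pages.each do |page|
      rows_from_text(page.text).each { |row| sheet.add_row(row) }
    end
  end
  package.serialize(xlsx_path)
end

# The splitting step on its own, with made-up sample text:
sample = "Item      Qty   Price\nWidget    2     9.99\n\nGadget    10    1.50\n"
rows_from_text(sample).each { |row| p row }
```

With both gems installed, `pdf_to_xlsx("my.pdf", "my.xlsx")` would write the workbook. If the columns come out misaligned, the splitting rule in `rows_from_text` is the part to adapt to your data.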
I was not able to find options for converting from PDF to XLSX, but there are API options for converting PDF to image and PDF to PowerPoint (link below).
Not sure whether you can change the requirement to show the results in another format.
http://www.convertapi.com/

simple formatting/parsing in markdown for blockquotes

I'm using markdown in my site and I would like to do some simple parsing for news articles.
How can I parse markdown to pull all blockquotes and links, so I can highlight them separately from the rest of the document
For example I would like to parse the first blockquote ( >) in the document so I can push it to the top no matter where it occurs in the document. (Similar to what many news sites do, to highlight certain parts of an article.) but then de-blockquote it for the main body. So it occurs twice (once in the highlighted always at the top and then normally as it occurs in the document).
I will assume you're trying to do this at render-time, when the markdown is going to be converted to HTML. To point you in the right direction, one way you could go about this would be to
Convert the markdown to HTML
Pass the HTML to Nokogiri
Grab the first <blockquote>, copy it, and inject it into the top of the Nokogiri node tree
The result would be a duplicate of the first <blockquote>.
Redcarpet 2 is a great gem for converting Markdown to HTML. Nokogiri is your best bet for HTML parsing.
I can write sample code if necessary, but the documentation for both gems is thorough and this task is trivial enough to just piece together bits from examples within the docs. This at least answers your question of how to go about doing it.
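For reference, the three steps above can be sketched as follows. This is a minimal sketch, not a tested production helper: the method name is hypothetical, and the gems are required inside the method so the file still loads where they are not installed.

```ruby
# Renders Markdown to HTML and copies the first <blockquote> to the top.
# Requires the redcarpet and nokogiri gems.
def highlight_first_blockquote(markdown_text)
  require 'redcarpet'
  require 'nokogiri'

  # 1. Convert the markdown to HTML.
  html = Redcarpet::Markdown.new(Redcarpet::Render::HTML).render(markdown_text)

  # 2. Pass the HTML to Nokogiri (as a fragment, so no <html>/<body> is added).
  fragment = Nokogiri::HTML.fragment(html)

  # 3. Copy the first <blockquote> and inject it at the top of the node tree.
  quote = fragment.at_css('blockquote')
  return html unless quote # nothing to highlight

  fragment.children.first.add_previous_sibling(quote.dup)
  fragment.to_html
end
```

Calling `highlight_first_blockquote("Intro paragraph.\n\n> Pull quote.\n\nMore text.")` should return HTML that starts with a copy of the blockquote while leaving the original in place. Adding a class to the copy (for example `copy['class'] = 'highlight'` on the duplicated node) would let you style it separately.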
Edit
Depending on the need, this could be done with a line of jQuery too.
$('article').prepend($($('article blockquote').get(0)).clone())
Given the <article> DOM element for an article on your page, grab the first <blockquote>, clone it, and prepend it to the top of the <article>.
I know wiki markup (i.e. wikicloth for ruby) has similar implementations as you're after for parsing links, categories, and references. Though I'm not sure about block quotes, but it may be better suited.
Something like:
data = "[[ this ]] is a [[ link ]] and another [http://www.google.com Google]. This is a <ref>reference</ref>, but this is a [[Category:Test]]. This is in another [[de:Sprache]]"
wiki = WikiCloth::Parser.new(:data => data)
wiki.to_html
puts "Internal Links: #{wiki.internal_links.size}"
puts "External Links: #{wiki.external_links.size}"
puts "References: #{wiki.references.size}"
puts "Categories: #{wiki.categories.size} [#{wiki.categories.join(",")}]"
puts "Languages: #{wiki.languages.size} [#{wiki.languages.keys.join(",")}]"
I haven't seen any such parsers available for markdown. Using redcarpet, converting to HTML, then using Nokogiri does seem a bit convoluted.

How to generate a PDF mail merge with Ruby on Rails?

I'm working on a very simple mail merge system with Ruby on Rails. I have 2 models (members and letters).
First I create an HTML-formatted letter with some special fields (like {NAME}, {ADDRESS} or {CITY}).
Then, I must generate a PDF with one letter per member. So, if I have 100 members in the database, the PDF must have 100 letters, each one customized (replacing the special fields with the data in the database).
I know how to create the letters in html, but I don't know how to generate the PDF. Can you help me with that? I know there are some gems to create PDFs, but I don't know how to add the letters on different pages.
Thanks for your help!
I have used Prawn to generate PDFs and it sounds like it would work for you. There’s a wiki article on using Prawn with Rails.
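The start_new_page method adds a page to the document, so calling it between letters puts each letter on its own page. The go_to_page method will let you position elements on a particular page after the pages have been created.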
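A minimal sketch of the whole merge, assuming the letter is plain text (Prawn does not render HTML; its `inline_format: true` option only understands a small set of tags) and that each member can be turned into a hash of field values. `fill_template` and `letters_to_pdf` are hypothetical helper names, and the sample members are made up.

```ruby
# Replaces {FIELD} placeholders with values from a member's fields.
# Unknown placeholders are left untouched so they are easy to spot.
def fill_template(template, fields)
  template.gsub(/\{([A-Z_]+)\}/) { |match| fields.fetch($1.downcase.to_sym, match).to_s }
end

# Writes one letter per page with Prawn.
def letters_to_pdf(letters, path)
  require 'prawn' # gem install prawn

  Prawn::Document.generate(path) do |pdf|
    letters.each_with_index do |letter, index|
      pdf.start_new_page unless index.zero? # the document already has a first page
      pdf.text letter
    end
  end
end

template = "Dear {NAME},\nWe have your address as {ADDRESS}, {CITY}."
members = [
  { name: "Ana", address: "1 Main St", city: "Lima" },
  { name: "Ben", address: "2 High St", city: "Leeds" }
]
letters = members.map { |member| fill_template(template, member) }
puts letters.first
```

With Prawn installed, `letters_to_pdf(letters, "letters.pdf")` would produce one page per member. In a Rails app the field hash could come from something like `member.attributes.symbolize_keys`, which is an assumption about your models.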

In Ruby on Rails, how can I convert html to word?

How can I convert HTML to Word?
Thanks.
I have created a Ruby html to word gem that should help you do just that. You can check it out at https://github.com/nickfrandsen/htmltoword - You simply pass it a html string and it will create a corresponding word docx file.
def show
respond_to do |format|
format.docx do
file = Htmltoword::Document.create params[:docx_html_source], "file_name.docx"
send_file file.path, :disposition => "attachment"
end
end
end
Hope you find it helpful.
I am not aware of any solution which does this, i.e. convert HTML to Word format. If you literally mean that, you will have to parse the HTML document first using something like Nokogiri. If you mean you want to output data persisted in your model objects, there is obviously no need to parse HTML! As far as outputting to Word, I'm afraid it looks as if you will have to directly interface with a running instance of Microsoft Word via OLE!
A quick google search for win32ole ruby word will get you started:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/241606
Good luck!
I agree with CodeJoust that it is better to generate a PDF. However, if you really need to generate a Word document then you can do the following:
If your server is a Windows machine, you can install Office in it and use ruby's OLE binding to generate the Word document into the public folder and then deliver the file in the response.
To use ruby's OLE binding, see the "Programming Ruby" ebook that comes with the one-click ruby installer for Windows. You may have to use custom logic to convert from HTML to Word unless you can find a function in the OLE api of Word to do that.
http://prawn.majesticseacreature.com/
There aren't any helpful Ruby libraries for converting to Word, so you could let the user download a PDF or an .html file instead. You're better off generating a 'printable and downloadable' version, without much styling, and/or a PDF version using a library like Prawn.
You could always generate a simple .rtf file; I think Word will be pretty happy reading that.
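As a rough illustration of the .rtf idea, here is a small sketch that builds a minimal RTF document by hand. It assumes plain ASCII paragraphs with no styling; both helper names are hypothetical, and anything richer (fonts, tables, non-ASCII text) would need more of the RTF format than is shown here.

```ruby
# Escapes the three characters that are special in RTF.
def rtf_escape(text)
  text.gsub(/[\\{}]/) { |char| "\\#{char}" }
end

# Builds a minimal RTF document, ending each paragraph with \par.
# Assumption: plain ASCII text; other characters would need \u escapes.
def simple_rtf(paragraphs)
  body = paragraphs.map { |para| "#{rtf_escape(para)}\\par" }.join("\n")
  "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\n\\f0\\fs24 #{body}\n}"
end

rtf = simple_rtf(["Hello from Rails.", "Braces {like these} are escaped."])
puts rtf
```

In a controller the string could be delivered with something like `send_data rtf, filename: "document.rtf", type: "application/rtf"`.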
