I'm looking for a way to parse a PDF (containing only text) into plain text. I see that PDF parsing with Ruby has been asked before, but the answers are several years old and not suited to a Rails app.
Is there a gem that can assist with this?
This is what the docsplit gem is all about. Usage example:
require 'docsplit'
pdfs = Dir['storage/originals/*.pdf']
# The PDFs already contain text, so skip OCR.
Docsplit.extract_text(pdfs, :ocr => false)
What's great about this gem is that it can also convert .doc, .odt, etc. and extract their text.
Plus it's backed by a very specialised company: http://www.documentcloud.org/
This seems to be quite famous. I haven't tried it but it seems relevant.
Related
I've been using the awesome prawn gem in my last few projects, but this time I have to convert an odt file (in odf format) to PDF.
I know that there are many gems that can do this, for example docsplit and others, but since I am already using prawn to generate other PDFs in the same app I would really like to know if I could get a way without adding yet another pdf creator type gem...
Does anyone know any resources that could help? Or at least a (really) simple gem that converts odt to PDF (ideally without having to install anything)?
Thanks in advance...
No, Prawn can't do that. Prawn is just a library for programmatically generating PDF documents.
The best way would be to run LibreOffice in headless mode to convert ODF files into PDF.
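For what it's worth, here is a minimal sketch of calling LibreOffice from Ruby. It assumes the soffice binary is installed and on the PATH; the helper names libreoffice_pdf_command and convert_to_pdf are made up for illustration and are not part of any gem.

```ruby
# Hypothetical helper: builds the argument list for a headless LibreOffice conversion.
# Assumes the LibreOffice binary is called "soffice" and is on the PATH.
def libreoffice_pdf_command(input_path, output_dir, binary: "soffice")
  [binary, "--headless", "--convert-to", "pdf", "--outdir", output_dir, input_path]
end

# Hypothetical helper: runs the conversion and returns the expected PDF path,
# or nil if the command failed or the binary could not be found.
def convert_to_pdf(input_path, output_dir, binary: "soffice")
  ok = system(*libreoffice_pdf_command(input_path, output_dir, binary: binary))
  return nil unless ok
  File.join(output_dir, File.basename(input_path, ".*") + ".pdf")
end
```

Passing the arguments to system as separate strings avoids shell quoting problems with file names that contain spaces.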
Update:
After looking more closely into the issue, I think I misunderstood the problem. Since an epub is essentially a zipped file, I have to generate files at some point.
The actual question would be how to do this efficiently in production if the number of files and file size I need to generate become large?
The ebook content will be generated from entries in the database as html files. I am thinking about storing those files with Amazon S3 but I am not sure if that's the best option out there.
Original Question
I am trying to create a web-based epub generation application with Ruby On Rails.
Currently I am looking into the eeepub gem: https://github.com/jugyo/eeepub.
I am wondering if there is a way to feed the epub content from database without declaring files as shown in the example.
files [File.join(dir, 'foo.html'), File.join(dir, 'bar.html')]
There is an open issue regarding this, from years ago: https://github.com/jugyo/eeepub/issues/17
I know the gem is very old and does not seem to be active at all. I have looked through the source code and still don't see a solution. If anyone has any pointers on how to achieve this through eeepub or a better tool, please help me out! Thanks in advance.
Hi #voidwalker, you can check the best gems for e-publishing on Ruby Toolbox, where you can compare gems by their popularity and activity.
From that list, I think git-scribe is the best gem for your requirements. Please try it and let me know if it's helpful.
Thanks
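If the gem really does need file paths, one possible workaround is to write the database content into a temporary directory first and pass those paths along. This is only a sketch using Ruby's standard library; write_chapters is a made-up helper name and the chapters hash stands in for whatever the database returns.

```ruby
require "tmpdir"

# Hypothetical helper: writes {filename => html} pairs into dir
# and returns the list of file paths that were written.
def write_chapters(dir, chapters)
  chapters.map do |name, html|
    path = File.join(dir, name)
    File.write(path, html)
    path
  end
end

# The temporary directory is removed automatically when the block exits.
Dir.mktmpdir("epub") do |dir|
  # Placeholder content; in the app this would come from database records.
  chapters = { "foo.html" => "<h1>Foo</h1>", "bar.html" => "<h1>Bar</h1>" }
  paths = write_chapters(dir, chapters)
  # paths is an array like the one passed to `files` in the question,
  # so the epub would have to be built here, before the block ends.
  puts paths.map { |p| File.basename(p) }.inspect
end
```

Because the directory is cleaned up after the block, nothing accumulates on disk between requests; only the finished epub would need to be stored somewhere.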
I would like to generate PDF forms with radio buttons and submit buttons in it by using Ruby on Rails. Does anyone know if there is a Gem that can help with this task?
I've looked into
Prawn,
Wicked PDF, and
PDFKit
but they don't seem to have this feature. Currently I am just using Acrobat Pro to create my PDF and insert the form manually but would like to automate this with a Gem if possible.
Any help would be appreciated, thanks.
EDIT
I just found 2 gems that can help insert radio buttons, check boxes, etc. while generating a PDF in Rails: prawn-blank and prawn-forms. They don't seem to be maintained anymore, but they should still be useful. Hope this helps others attempting to automate generating interactive PDF files too.
There's also RTeX. That works well if you're willing to translate to LaTeX first. LaTeX is a very good way to store marked-up documents. It just depends on how static each document is.
You can use right-signature to complete your task
https://github.com/rightsignature/rightsignature-api
http://www.gsubbarao.com/2013/03/ruby-rightsignature-api-to-prefill.html
I need a PDF generator for a Rails 3.1 application. Which one can you suggest?
I've tried Prawn, but it doesn't seem easy for a beginner like me, while PdfKit gives me several errors :|
I don't know if it is the best fit for you, but I've heard good things about Wicked PDF/wkhtmltopdf.
I've tried both Wicked Pdf and PdfKit, but in development mode they are extremely slow :( Any other suggestions?
Another (maybe dumb) question: using PdfKit, for example, I've seen that it generates the PDF file and opens it in the web browser, but is there a way to download the file to my desktop?
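On the download question: whether the browser displays or saves a PDF is generally decided by the Content-Disposition response header. A rough sketch of the headers involved is below; pdf_download_headers is a made-up helper for illustration. In a Rails controller, send_data with disposition: "attachment" sets the equivalent header for you.

```ruby
# Hypothetical helper: returns response headers that ask the browser to
# save the PDF as a file instead of displaying it inline.
def pdf_download_headers(filename)
  # Replace anything outside letters, digits, underscore, dot and dash,
  # so the quoted filename cannot break the header.
  safe = filename.gsub(/[^\w.\-]/, "_")
  {
    "Content-Type" => "application/pdf",
    "Content-Disposition" => %(attachment; filename="#{safe}")
  }
end

# In a Rails controller action the same effect would come from something like:
#   send_data pdf_bytes, filename: "report.pdf",
#             type: "application/pdf", disposition: "attachment"
# where pdf_bytes is a placeholder for the generated PDF string.
```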
Can anybody show me (maybe copy/paste a simple code example) how to create the .pdf file from an existing (.xls) file, using the Prawn gem? (Basically, I'd need the command that "opens" the existing file.)
(I'm asking because the Prawn documentation (http://prawn.majesticseacreature.com/docs/) seems to have been gone for quite a while; it's not even available via Google cache...)
Thanks a lot for any help with this!
Tom
I'd suggest that you break the problem down.
Can you read xls with Ruby? Possibly, but it's flaky at best. However, you can easily read csv, and xls exports nicely to that format.
Can you write a 'table' of values to a prawn pdf? Yes
So, (almost) all you need is a little program that can parse a csv file into a prawn-friendly table-structure and then hand it off to Prawn for generation.
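A rough sketch of that little program, assuming the spreadsheet has already been exported to CSV. csv_to_rows is a made-up helper name, and the Prawn part is left as a comment because it needs the prawn and prawn-table gems installed.

```ruby
require "csv"

# Hypothetical helper: turns CSV text into an array of rows (arrays of strings),
# which is the kind of structure a Prawn table is built from.
# Empty cells come back from CSV.parse as nil, so they are turned into "".
def csv_to_rows(csv_text)
  CSV.parse(csv_text).map { |row| row.map { |cell| cell.to_s } }
end

rows = csv_to_rows("name,qty\nwidget,3\ngadget,\n")

# With the prawn and prawn-table gems installed, the rows could then be rendered:
#   require "prawn"
#   require "prawn/table"
#   Prawn::Document.generate("table.pdf") { table(rows) }
```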
Turns out the Prawn gem cannot handle existing files...
Prawn can be used to render content on top of a PDF. You're talking about .xls, a completely different format.