Parse doc and xls files in Ruby on Rails

In my Rails application, I need to upload some doc/xls files, parse their structure and extract information from them. How can I get the data out of *.doc or *.xls files, perhaps as XML or in some other format I can read and parse?

You can parse different types of spreadsheets using the Roo gem. It supports:
OpenOffice
Excel
Google spreadsheets
Excelx
LibreOffice
CSV
In my experience it has some issues parsing .xls files, but it handles .xlsx files well.
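A minimal sketch of the Roo approach, assuming the first row of the sheet holds column headers. `read_rows` and `rows_to_hashes` are hypothetical helper names; `Roo::Spreadsheet.open` (with its `extension:` option), `sheet`, `first_row`, `last_row` and `row` are Roo calls, but check them against the Roo version you install.

```ruby
# Turns rows (arrays of cell values) into hashes keyed by the first row.
# Assumption: the first row of the sheet contains the column headers.
def rows_to_hashes(rows)
  header, *body = rows
  return [] if header.nil?

  keys = header.map { |h| h.to_s.strip }
  # zip pads short rows with nil, so missing trailing cells become nil values
  body.map { |row| keys.zip(row).to_h }
end

# Reads every row of the first sheet with Roo. Pass `extension:` for uploads
# whose tempfile path may not end in .xlsx/.xls.
# Note: since Roo 2.0, .xls support lives in the separate roo-xls gem.
def read_rows(path, extension: nil)
  require "roo" # loaded lazily so rows_to_hashes works without the gem
  book = extension ? Roo::Spreadsheet.open(path, extension: extension) : Roo::Spreadsheet.open(path)
  sheet = book.sheet(0)
  return [] if sheet.first_row.nil? # an empty sheet may report nil bounds
  (sheet.first_row..sheet.last_row).map { |i| sheet.row(i) }
end
```

In a controller this might be used as `rows_to_hashes(read_rows(params[:file].path, extension: :xlsx))`, where `params[:file]` is an assumed upload field name.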
As for .doc files, you may try using msworddoc-extractor gem or try one of the solutions proposed here.
Update: for *.docx files, have a look at the docx and docx-html gems.
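Since a .docx file is a ZIP archive whose body text sits in `word/document.xml`, the text can also be pulled out at the XML level. This sketch swaps in REXML (shipped with Ruby) and the rubyzip gem instead of the docx gems; `docx_paragraphs` and `read_document_xml` are hypothetical names, and the sketch flattens tables into plain paragraphs and ignores headers, footers and formatting.

```ruby
require "rexml/document"

# WordprocessingML namespace bound to the w: prefix in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Returns one string per <w:p> paragraph, joining its <w:t> text runs.
# Nested paragraphs (e.g. inside text boxes) also contribute to their parent.
def docx_paragraphs(document_xml)
  doc = REXML::Document.new(document_xml)
  REXML::XPath.match(doc, "//w:p", "w" => W_NS).map do |para|
    REXML::XPath.match(para, ".//w:t", "w" => W_NS).map { |t| t.text.to_s }.join
  end
end

# Reads word/document.xml out of the .docx ZIP archive (needs the rubyzip gem).
def read_document_xml(path)
  require "zip" # loaded lazily so docx_paragraphs works without the gem
  Zip::File.open(path) { |zip| zip.read("word/document.xml") }
end
```

Something like `docx_paragraphs(read_document_xml("letter.docx"))` should return an array of paragraph strings. The docx gem wraps the same idea behind `Docx::Document.open(path).paragraphs`.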

Have you seen the Nokogiri gem? http://nokogiri.org/
It's very useful for XML parsing.
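If you can get the spreadsheet as XML, the parsing itself is straightforward. Excel can save workbooks as "XML Spreadsheet 2003" (SpreadsheetML); this sketch reads its rows with REXML from Ruby's standard distribution so it runs without extra gems, and the equivalent Nokogiri lookup is noted in a comment. `spreadsheetml_rows` is a hypothetical helper, and it does not reconstruct empty cells skipped via `ss:Index`.

```ruby
require "rexml/document"

# With Nokogiri the row lookup would be something like:
#   Nokogiri::XML(xml).xpath("//ss:Row", "ss" => "urn:schemas-microsoft-com:office:spreadsheet")

# Depth-first list of descendant elements with the given local name.
def elements_named(element, local_name, found = [])
  element.elements.each do |child|
    found << child if child.name == local_name
    elements_named(child, local_name, found)
  end
  found
end

# Returns each <Row> as an array of the text of its <Data> elements.
# Limitation: cells without <Data> and ss:Index gaps are simply skipped.
def spreadsheetml_rows(xml)
  root = REXML::Document.new(xml).root
  elements_named(root, "Row").map do |row|
    elements_named(row, "Data").map { |data| data.text.to_s }
  end
end
```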

The spreadsheet gem is nice for Excel (.xls) files.
https://github.com/zdavatz/spreadsheet

Related

Best way to deal in reading very old xls files

I'm transferring an old app to a new one. Some of the reports are generated by a very old desktop app that works with old xls files. The new app is built with Rails 3.x, and the only problem I have is that it keeps raising an OLE signature error. When I convert the files manually to xlsx in Excel, all the Rails xls gems start reading them. What would be the best way to handle old xls files?
Are there actually gems that can read very old xls files? I've already tested roo, spreadsheet and rubyXL (I can't get simple-spreadsheet to work due to version conflicts with the roo-xls and spreadsheet requirements).
Alternatively, is there a gem that would let me simply re-save the file as xlsx, so the latest gems can read it from there on?
Take a look at this gem https://github.com/roo-rb/roo-xls.
From the README:
This library extends Roo to add support for handling classic Excel files, including:
.xls files
.xml files in the SpreadsheetML format (circa 2003)
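Before choosing a gem it can help to check what an old ".xls" upload actually contains, by inspecting its first bytes. One plausible cause of the "OLE signature" error, offered here as an assumption, is that the gems expect the OLE2 compound-document header used by Excel 5 and later, while pre-Excel 5 (BIFF2 to BIFF4) files are bare BIFF record streams without that header. `sniff_spreadsheet` is a hypothetical helper, and the BOF values should be double-checked against real files.

```ruby
# Compound-document (OLE2) header used by Excel 5+ .xls and Word .doc files.
OLE2_SIGNATURE = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b
# ZIP local-file header used by .xlsx and .docx packages.
ZIP_SIGNATURE = "PK\x03\x04".b
# BOF record ids (16-bit little-endian) that open BIFF2, BIFF3 and BIFF4 files.
# Assumption: taken from the BIFF record layout, not verified on the asker's files.
LEGACY_BIFF_BOF = {
  "\x09\x00".b => :biff2,
  "\x09\x02".b => :biff3,
  "\x09\x04".b => :biff4
}.freeze

# Classifies a file from its first bytes.
def sniff_spreadsheet(bytes)
  head = bytes.b
  return :ole2 if head.start_with?(OLE2_SIGNATURE)
  return :zip if head.start_with?(ZIP_SIGNATURE)

  legacy = LEGACY_BIFF_BOF.find { |bof, _kind| head.start_with?(bof) }
  return legacy.last if legacy
  # SpreadsheetML or HTML saved with an .xls extension.
  return :xml if head.lstrip.start_with?("<")

  :unknown
end
```

For example, `sniff_spreadsheet(File.binread(path, 8))`. Files reported as `:biff2`, `:biff3` or `:biff4` are the ones that would need re-saving (as the question describes doing manually in Excel) before OLE-based gems can read them.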

Convert Excel and Word files to PDF using Ruby

I want to convert Excel with multiple sheets and Word files to PDF format as a single file using Ruby.
Is there any Script/Gems/Plugins available to achieve this?
You can use libreconv
https://github.com/ricn/libreconv
You'll need to install LibreOffice on your server, which is straightforward.
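libreconv works by driving LibreOffice, so the same conversion can be sketched directly with LibreOffice's headless command line. This assumes the `soffice` binary is on the PATH (some distributions name it `libreoffice`); `soffice_command` and `convert_with_libreoffice` are hypothetical helpers.

```ruby
require "open3"

# Builds the LibreOffice argument list; `format` is the target extension.
def soffice_command(input_path, out_dir, format: "pdf", binary: "soffice")
  [binary, "--headless", "--convert-to", format, "--outdir", out_dir, input_path]
end

# Runs the conversion and returns the path LibreOffice should have written
# (same basename as the input, new extension, inside out_dir).
def convert_with_libreoffice(input_path, out_dir, format: "pdf")
  stdout, stderr, status = Open3.capture3(*soffice_command(input_path, out_dir, format: format))
  raise "LibreOffice failed: #{stderr.empty? ? stdout : stderr}" unless status.success?

  File.join(out_dir, File.basename(input_path, ".*") + ".#{format}")
end
```

With libreconv itself the call is along the lines of `Libreconv.convert("report.xls", "/tmp/report.pdf")`. Each input file produces its own output file. Passing `format: "xlsx"` should likewise re-save an old .xls (if LibreOffice can open it) as .xlsx, which is the conversion asked about in the old-xls question above.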
There isn't a single gem that does the whole job, but you can combine several:
For Excel files, read the data using the roo gem: http://roo.rubyforge.org/
For Word files, see "Opening .doc files in Ruby".
Convert the data read in the previous steps into HTML.
Then convert it to pdf using: https://github.com/pdfkit/PDFKit
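The "convert to HTML, then to PDF" steps can be sketched as follows, assuming the rows arrive as arrays of cell values (for example from Roo's `row` method) and that the first row is a header. `rows_to_html` and `html_to_pdf` are hypothetical names; PDFKit additionally needs the wkhtmltopdf binary installed.

```ruby
require "erb"

# Escapes a cell value for safe inclusion in HTML.
def html_cell(value)
  ERB::Util.html_escape(value.to_s)
end

# Renders rows as an HTML table, treating the first row as the header.
def rows_to_html(rows, title: "Report")
  header, *body = rows
  head = header ? "<tr>#{header.map { |h| "<th>#{html_cell(h)}</th>" }.join}</tr>" : ""
  cells = body.map { |row| "<tr>#{row.map { |c| "<td>#{html_cell(c)}</td>" }.join}</tr>" }.join
  "<html><head><meta charset=\"utf-8\"><title>#{html_cell(title)}</title></head>" \
    "<body><table>#{head}#{cells}</table></body></html>"
end

# Writes the HTML out as a PDF file (needs the pdfkit gem and wkhtmltopdf).
def html_to_pdf(html, path)
  require "pdfkit" # loaded lazily so rows_to_html works without the gem
  PDFKit.new(html, page_size: "A4").to_file(path)
end
```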

Single Ruby Gem that parses BOTH xlsx and xls Excel files?

I'm a bit frustrated with the gems out there. It seems like each one does one thing well but not others.
roo parses both xlsx and xls, but it doesn't seem to read certain fields correctly and doesn't work in every case I need it to.
spreadsheet gem doesn't parse xlsx
rubyXL doesn't parse xls files
Any other suggestions?
Thanks
I would just combine the rubyXL gem and the spreadsheet gem if you're happy with the individual results both provide.
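Combining the two gems could look like this sketch, which picks a reader by file extension and normalizes the first worksheet into plain arrays. `reader_for` and `first_sheet_rows` are hypothetical names; the gem calls follow the usage shown in the rubyXL and spreadsheet READMEs, but verify them against the versions you install.

```ruby
# Chooses a reader gem from the file extension.
def reader_for(path)
  case File.extname(path).downcase
  when ".xlsx" then :rubyxl
  when ".xls" then :spreadsheet
  else raise ArgumentError, "unsupported spreadsheet: #{path}"
  end
end

# Returns the first worksheet as an array of row arrays.
# The gems are required lazily, so only the one actually needed is loaded.
def first_sheet_rows(path)
  rows = []
  case reader_for(path)
  when :rubyxl
    require "rubyXL"
    # rubyXL may yield nil for missing rows and nil for missing cells.
    RubyXL::Parser.parse(path)[0].each do |row|
      rows << (row ? row.cells.map { |cell| cell&.value } : [])
    end
  when :spreadsheet
    require "spreadsheet"
    Spreadsheet.open(path).worksheet(0).each { |row| rows << row.to_a }
  end
  rows
end
```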

Parsing data from excel sheet

Is there any gem or plugin for Ruby on Rails that reads data from an Excel sheet?
Try the Spreadsheet gem. It can read xls files. You can also try Roo.

Prawn gem: How to create the .pdf from an *existing* file (.xls)

Can anybody show me (maybe copy/paste a simple code example) how to create the .pdf file from an existing (.xls) file, using the Prawn gem? (Basically, I'd need the command that "opens" the existing file.)
(I'm asking because the Prawn documentation (http://prawn.majesticseacreature.com/docs/) seems to have been gone for quite a while; it's not even available via Google cache...)
Thanks a lot for any help with this!
Tom
I'd suggest that you break the problem down.
Can you read xls with Ruby? Possibly, but it's flaky at best. However, you can easily read csv, and xls exports nicely to that format.
Can you write a 'table' of values to a prawn pdf? Yes
So, (almost) all you need is a little program that can parse a csv file into a prawn-friendly table-structure and then hand it off to Prawn for generation.
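The csv-to-Prawn plan can be sketched like this. Parsing uses Ruby's CSV library; the rendering step assumes the prawn and prawn-table gems (Prawn's `table` method lives in prawn-table in current versions). `csv_to_table_data` and `render_pdf` are hypothetical names, and padding short rows is a simplifying choice, not something Prawn is known to require.

```ruby
require "csv"

# Parses CSV text into a rectangular array of string rows:
# blank lines are dropped and short rows are padded with "".
def csv_to_table_data(csv_text)
  rows = CSV.parse(csv_text).reject { |row| row.all? { |cell| cell.to_s.strip.empty? } }
  width = rows.map(&:size).max || 0
  rows.map { |row| row.map(&:to_s) + [""] * (width - row.size) }
end

# Renders the rows as a table in a new PDF (needs prawn and prawn-table).
def render_pdf(table_data, path)
  require "prawn"
  require "prawn/table"
  Prawn::Document.generate(path) { |pdf| pdf.table(table_data, header: true) }
end
```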
Turns out the Prawn gem cannot handle existing files...
Prawn can be used to render content on top of a PDF. You're talking about .xls, a completely different format.
