Reading Excel formulae using Ruby - ruby-on-rails

I'm trying to use the Spreadsheet gem to parse XLS files that store information about school courses. These XLS files are automatically generated, so I cannot change the presentation of data.
Course schedules are saved as a list of characters, with dashes representing days in which the class does not meet. An example would be "3--33--", meaning the class meets during block 3 on days 1, 4, and 5 in the rotation. Excel parses some of these schedules as formulae, meaning that I need to read the formula itself from certain cells.
The problem is that when I try to read the data from a formula cell, using cell.data, the result is a string like \r\x00\x1F\x00\x00\x00\x00\x00\xD0\x84\xC0\x1EB\x00\x04. I'm assuming that this is Ruby's attempt to print the data as ASCII text. After some research, I have learned that Excel stores formulae in RPN format.
In short: I'm not sure how to go about reading a formula (the formula itself, not the formula's calculated value) from an Excel spreadsheet. I can't change the input Excel spreadsheet, and having a purely Ruby solution would be nice, since I'm planning on using this with Rails.

A different approach is:
Convert it to CSV using xls2csv: http://linux.die.net/man/1/xls2csv
Read it using the Ruby standard library: http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV.html
I hope this can help you.
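As a rough sketch of the second step, assuming the conversion has already produced CSV text: the column names, course names, and the parse_schedule helper below are hypothetical, and whether xls2csv emits the schedule text for the cells Excel treats as formulae would need to be checked against a real file.

```ruby
require "csv"

# Hypothetical CSV text standing in for the output of xls2csv.
csv_text = <<~CSV
  course,schedule
  Algebra II,3--33--
  Chemistry,-1--1-1
CSV

# Hypothetical helper: turn a schedule such as "3--33--" into a
# { day => block } hash, skipping the dashes (days without a meeting).
def parse_schedule(schedule)
  schedule.each_char.with_index(1).each_with_object({}) do |(char, day), meetings|
    meetings[day] = char unless char == "-"
  end
end

# Read every row using the standard library CSV parser.
courses = CSV.parse(csv_text, headers: true).map do |row|
  { name: row["course"], meetings: parse_schedule(row["schedule"]) }
end

courses.each { |course| puts "#{course[:name]}: #{course[:meetings]}" }
```

For a file on disk, CSV.foreach(path, headers: true) reads row by row instead of loading everything at once.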

Related

QUERY function not including text cells along with number cells in result

I'm currently trying to copy a list using the QUERY function in google-sheets.
The problem I'm now facing is that words and letters are not included in the result.
I'm using the function "=QUERY(E2:F5;)" but the words are not included.
Is there any way to include these words, using the formula above as a guide?
In google-sheets, use Format, Number, Plain Text on your source range of E2:F5 and your original formula will work.
=QUERY(E2:F5)
From Docs Editor Help - QUERY function
In case of mixed data types in a single column, the majority data type determines the data type of the column for query purposes. Minority data types are considered null values.

How to convert HTML formatting inside google sheets rows to their correct formatted equivalent

I have been looking for a solution to convert a database I have with HTML formatting in one of the columns to its "normal" text equivalency in google sheets. A lot of the solutions I've found dealt with writing programs to do this or using Excel, so they unfortunately didn't pertain well enough.
For example, in one of my columns I have:
Fast (<i> This character deals damage before non-<b>Fast</b> characters in combat.</i>)
But I would like to be able to have a somewhat streamlined solution to convert the above to:
Fast (This character deals damage before non-Fast characters in combat.)
AFAIK, with the Google Sheets API, you can only format text within a cell. As mentioned in the documentation, conditional formatting lets you format cells so that their appearance changes dynamically according to the value they contain, or to values in other cells.
You may want to file a feature request for this.

How to add an identifier in a row if any cell for that row is changed in excel?

Is there a way in Excel to mark a row with an identifier if any cell in that row is changed, using a formula rather than VB scripts?
That way, while parsing the Excel file, I can pick out only the rows that changed and then easily compare those rows with the values in the database.
The reason I need to do that is because:
It is a bulk import and each sheet can have 50,000 to a million rows.
The data in each row needs to be compared with 3-4 database tables.
I cannot add VB scripts to those Excel sheets because the sheets are exported through the same application.
Or is there any other way to do the bulk import efficiently? I'm using the Roo gem and already using the each_row_streaming method.
AFAIK no such functionality exists. Even if you used VBA to mark changed rows, you would run into a validation issue. Let me explain a bit:
Let A be the one changing the data. If she is not such a nice lady, she will make her necessary changes but fiddle with the change indicators to break your logic. Why? Because she can. Or because it gives her a business advantage, or ... Even if she is nice, how do you know, for certain, that no change went unnoticed?
I would say your safe option is to always do a full compare on each workbook/row against the database to be sure no change goes unnoticed.
It might be sensible to calculate a hash for each row and store it somewhere in the database. That way you would only need to recompute and compare the hashes. But this depends a lot on your data.
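A minimal sketch of the row-hash idea, assuming each row has a stable identifier and that previously computed hashes can be looked up by it. The row_fingerprint helper, the sample rows, and the in-memory stored hash (standing in for a database table) are all hypothetical.

```ruby
require "digest"

# Hypothetical helper: build a fingerprint for one row of cell values.
# Cells are joined with a unit separator so that ["ab", "c"] and
# ["a", "bc"] do not produce the same hash.
def row_fingerprint(cells)
  Digest::SHA256.hexdigest(cells.map(&:to_s).join("\x1F"))
end

# Hashes saved during the previous import, keyed by row identifier.
stored = {
  1 => row_fingerprint(["Alice", 42, "NY"]),
  2 => row_fingerprint(["Bob", 17, "LA"])
}

# Rows freshly read from the spreadsheet (e.g. while streaming with Roo).
incoming = {
  1 => ["Alice", 42, "NY"],
  2 => ["Bob", 18, "LA"],
  3 => ["Cara", 30, "SF"]
}

# Keep only the rows that are new or whose fingerprint differs.
changed = incoming.reject { |id, cells| stored[id] == row_fingerprint(cells) }.keys
puts changed.inspect
```

Only the rows listed in changed would then need the full comparison against the database tables.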

Insert formula to excel with Ruby on Rails format xls support

I am using the built-in "format.xls" support that comes with Ruby on Rails to export data to Excel.
I have created my .xls.erb and referenced the values that I need. However, is it possible to insert a formula into a cell, e.g. "=SUM(A1:A20)"? I have not seen this used anywhere.
If possible, could an example be given?
The format used is
<Cell ss:Index="2" ss:Formula="=SUM(R[-3]C[0]:R[-1]C[0])">
<Data ss:Type="Number"></Data>
</Cell>
This will sum the values in the range 3 rows above to 1 row above, in the current column. The formula uses R1C1 format rather than A1-style formatting. This link is particularly useful for getting to know the elements and attributes that make up the schema - http://msdn.microsoft.com/en-us/library/aa140066(v=office.10).aspx
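As an illustration only, a formula cell like this could be rendered from an ERB template along the following lines. The values array and the surrounding Table/Row markup are assumptions for the sketch; in a real .xls.erb view the data would come from the controller and the template would also need the full workbook and worksheet wrapper.

```ruby
require "erb"

# Hypothetical numbers to total; one spreadsheet row is emitted per value.
values = [10, 20, 30]

# The last row holds the formula cell. The relative row offset is derived
# from the number of data rows, so the range always covers all of them.
template = <<~XML
  <Table>
  <% values.each do |value| %>
    <Row><Cell><Data ss:Type="Number"><%= value %></Data></Cell></Row>
  <% end %>
    <Row><Cell ss:Formula="=SUM(R[-<%= values.size %>]C[0]:R[-1]C[0])"><Data ss:Type="Number"></Data></Cell></Row>
  </Table>
XML

xml = ERB.new(template).result(binding)
puts xml
```

Deriving the offset from values.size avoids hard-coding the range when the number of exported rows varies.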

Tika and parsing data with row and col spans

I have been searching for this for the past 2 days, but it is difficult to find: searching Google for col spans with any other word returns many unrelated documents.
The question: is it possible to use the Apache Tika parser to retrieve parsed data from different types of documents, with the col spans and row spans preserved as XHTML? If yes, is there a tutorial or any document that can help me with that?
Unfortunately not, out of the box.
You would need to extend the base library used to parse spreadsheets to get this information into the Tika output.
An alternative would be to use EPPlus
