Is it possible to optimise SpreadsheetGear performance?

I am using the SpreadsheetGear library in a web application that manages some large spreadsheets, populating cell values and extracting the results. It runs extremely fast for small spreadsheets, but we are noticing problems with more sophisticated ones. One possible improvement springs to mind: as we set each parameter cell value, I imagine the other cell values are being recalculated immediately (possibly?), so if I set a lot of cell values before extracting the results, those calculations are run redundantly (number of parameters - 1) times. Of course I don't really know how it works internally; maybe it only calculates the relevant values when they are inspected. Can someone let me know if that is the case, and if so, whether there is anything that can be done to delay the recalculation?

It looks like values are calculated lazily, when they are pulled from a formula cell, rather than every time a cell is set. The documentation for the relevant configuration setting, IWorkbookSet.CalculationOnDemand, is here:
http://www.spreadsheetgear.com/support/help/spreadsheetgear.net.3.0/SpreadsheetGear~SpreadsheetGear.IWorkbookSet~CalculationOnDemand.html

Related

Abaqus: efficiently export large xy data set from Abaqus

I am trying to export XY data objects from element sets containing 20-40k elements, but Abaqus slows down considerably and even crashes. In fact, when I create the XY data, Abaqus warns me that the number of xyDataObjects is very large and might cause performance issues. And so it does.
My usual procedure is to create the XY data and then export it in rpt format. Can someone suggest another method that is less prone to crashing? Would it be more efficient to divide the output element set into two or more subsets, and concatenate them after exporting?
The method recommended by @hgazibara in the comments is certainly sufficient, but it is laborious.
An easier method, I found, is a package called Abaqus2Matlab, which extracts any variable you want from the odb. See here: http://www.abaqus2matlab.com/
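If you would rather avoid the detour through MATLAB, another option is to skip the xyDataObjects (and the rpt export) entirely and read the results straight from the odb with a small script run via abaqus python. The sketch below is only illustrative: the job, step, set and field names are placeholders, and it assumes you want the Mises stress for one element set.

# export_field.py -- run with:  abaqus python export_field.py
# Minimal sketch: read field output for an element set directly from the odb
# and dump it to a tab-separated file, so no session xyDataObjects are ever
# created. 'Job-1.odb', 'Step-1', 'MY-SET' and 'S' are placeholder names.
from odbAccess import openOdb

odb = openOdb('Job-1.odb', readOnly=True)
elset = odb.rootAssembly.elementSets['MY-SET']

out = open('field_dump.txt', 'w')
out.write('frame_time\telement\tmises\n')
for frame in odb.steps['Step-1'].frames:
    field = frame.fieldOutputs['S'].getSubset(region=elset)
    for v in field.values:
        out.write('%g\t%d\t%g\n' % (frame.frameValue, v.elementLabel, v.mises))
out.close()
odb.close()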

What is the proper way of putting a tab-delimited file into an HDF5 dataset?

The file has 26 columns and a very large number of rows. Would the proper way, using HDF5, be to read the file one line at a time, reading the contents into a 1x26 memory space, and then extending the dataset by 1x26 and copying the memory space contents to the dataset's newly added row?
I'm not sure how efficient this would be, or even if this is the right way to do it; I am really new to this.
Thanks.
The answer is pretty dependent on your exact use case. It's certainly not wrong to do it the way you suggest, but there may be more efficient or faster ways. In general, you want to adapt the size of your chunks to how you read and write the data.
If you know roughly the number of rows ahead of time, it's likely to be much quicker to use relatively large chunks with compression. For example, if you know you're likely to have somewhere between 1000 and 2000 rows, use chunks of 100 rows and enable compression. This will result in far fewer I/O operations than writing a single row at a time.
On the other hand, if the dataset is likely to grow in time, one row at a time, then your way is probably better.
The other consideration is how you're going to read the data. If you're only ever going to read a single row at a time, then 1x26 chunks would be a good idea. If you're going to be reading the whole dataset at once and only a few times, however, use larger chunks.
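To make the chunking suggestion concrete, here is a minimal sketch using h5py. The file names, chunk size, block size and the assumption that all 26 columns are numeric are illustrative choices, not requirements; tune the chunk shape to how you actually read the data.

import numpy as np
import h5py

with h5py.File('table.h5', 'w') as f:
    dset = f.create_dataset(
        'table',
        shape=(0, 26),          # start empty
        maxshape=(None, 26),    # unlimited number of rows
        chunks=(128, 26),       # one chunk holds 128 rows
        compression='gzip',
        dtype='f8',
    )

    block = []
    with open('data.tsv') as tsv:
        for line in tsv:
            line = line.strip()
            if not line:
                continue
            block.append([float(x) for x in line.split('\t')])
            if len(block) == 1024:              # append in blocks, not per row
                start = dset.shape[0]
                dset.resize(start + len(block), axis=0)
                dset[start:] = np.asarray(block)
                block = []
    if block:                                   # flush the remainder
        start = dset.shape[0]
        dset.resize(start + len(block), axis=0)
        dset[start:] = np.asarray(block)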

Assigning labels to triples

I am currently trying to do stream reasoning using Jena, so I want to be able to reason over a certain set of triples that have occurred in a particular window of time, also taking into account some background static knowledge.
My problem is that I have an ontology that I read from several files; however, I want the triples I obtain to carry a time stamp for when I receive them, which I thought I could do by applying labels to the triples (for now I am just giving them all random time stamps, as this is only a test).
While I didn't think this would be a problem, I am stuck at the initial step of simply applying a label to an existing triple and then selecting it. I cannot seem to access triples from the OntModel without transforming it into a Graph, and while I could then create quads whose extra value is some literal for time, I can't find a way to reason over that graph afterwards.
Any light that people can shed on this issue would help. I hope I am being clear.
I'm not sure exactly how you're putting labels on your triples, but you can get Statements from an OntModel, and Statement implements FrontsTriple through which you can access a corresponding Triple.

Store Redundant Info vs. Repeated Conversions

Is it preferable to store redundant information (which could otherwise be generated from existing data), or to instead convert the existing data each time you need access?
I've simplified my specific problem as best as I can below, hoping that the provided answers are useful as future-reference material.
Example:
Let's say we've developed a program that places data into Squares on a grid (like a super-descriptive game of Tic-Tac-Toe or something) and assigns various details, and a unique identification number to each:
Throughout our program, we often perform logic based on a square's X and/or Y coordinates (checking for 3 in a row) and other times we only need the ID (perhaps to access a string at "SquareName[ID]") - We aren't exactly certain which of these two is accessed more often, but it's a rather close competition.
Up until now we've simply stored the ID inside the square class, and converted it with some simple formulas whenever just the X or Y are needed. Say we want to get coordinates for one square in particular:
int CurrentX = ((this.Square.ID - 1) % 3) + 1; // X coordinate, 1 through 3
int CurrentY = ((this.Square.ID - 1) / 3) + 1; // Y coordinate, 1 through 3
Since the squares don't move around or change ID after setup, part of me believes it would be simpler just to store all 3 values inside the Square class, but my other part cringes at the redundancy since access to X and Y is already easy enough to calculate from the existing ID.
(Note, This program itself is not very memory or resource intensive, nor does the size of the grid get much larger, so it mostly comes down to which option is a better practice or rule of thumb.)
What would you do?
As a rule of thumb, for a system where the data is read/write, store your basic data without redundancy.
When performance or other considerations become a practical issue, then you should denormalize as necessary. (i.e. wait for it to be a problem, don't pre-optimize overly much).
Your goal should be the most maintainable code possible. That usually means writing the least code possible. Having extra code to maintain redundant copies of data points will make your code more brittle.
If those values can be determined at the moment of creation and then never change, I would go for fields populated in the constructor. It's not redundant info insofar as it isn't stored anywhere else, but that's not my main point. When reading code, I usually expect that whatever is computed at the time of a request might change per request. It is easy to find the point in the source where a field is populated and where it is changed, especially if it never changes, but you can end up slightly confused when looking at a calculation that always returns the same result, because its inputs can't change, and wonder whether you're missing a case or it really is static.
Also, with descriptive variable names you can get rid of the comments. Not that I generally aim at not commenting, but source code that doesn't even need comments is a pretty safe sign of code that is easy to understand, which might (and should) be your aim.
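As a rough sketch of that approach (written in Python here purely for brevity, with hypothetical names), the coordinates are computed exactly once in the constructor and given descriptive names, so the inline comments become largely unnecessary:

class Square:
    GRID_WIDTH = 3

    def __init__(self, square_id):
        self.square_id = square_id
        # Derived once at creation; nothing else ever writes these,
        # so they cannot drift out of sync with the ID.
        self.column = (square_id - 1) % Square.GRID_WIDTH + 1   # 1..3, left to right
        self.row = (square_id - 1) // Square.GRID_WIDTH + 1     # 1..3, top to bottom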

upper bound - display

This is an idea that got into my mind:
All display devices (screens made of pixels, etc.) have an upper bound on the number of distinct images they can show.
For example, a 1024*768 display with 32-bit pixels can show at most (2^32)^(1024*768) distinct frames before it has to repeat a scene (view).
The funny thing is, it's as if we could pre-generate every film and every window we will ever see in our lives through screens.
The question here is: can anybody use this abstract idea to create something useful? :D
You're talking about a number of the order of
(2^32)^(1024*768) = 2^(32 * 786,432) = 2^25,165,824 ≈ 10^7,600,000.
The number of atoms in the observable universe is about
10^80 // http://en.wikipedia.org/wiki/Observable_universe#Matter_content
I think that there is no way we could pre-generate all the screens in our life.
Let me formulate another question. From a number this big, what can we do to reduce it? How to aggregate similar pictures in order to reduce the complexity?
Another nice question is: what kind of data structure would we need to store all this information? Suppose we aggregate similar images and get the count down to 10^10. What kind of structure can handle that many different pictures efficiently?
So, given some extra information about the scenes you could generate, you might be able to single out the scenes that no one has ever seen.
If you could take all the pictures out on the internet, together with statistics about what was popular or viewed a lot, and then compute your set of all possible screens, you could pick out the ones that were hardly viewed at all.
With some basic rules about the complexity of an image, you might be able to come up with images that have not been seen before. Think: 80% flesh tones, coupled with enough variance to show some range, might render people naked. :-)
Of course the computation behind such an idea is vastly outside our potential. (2^32)^(1024*768) is in the super-exponential range, well outside the bounds of anything practical. I tried to compute it in Ruby, and it just died. It would have been fun if it had actually worked. :-)
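For what it's worth, you don't need to materialise the integer itself to get a feel for its size; a few lines of Python (used here instead of Ruby purely as a sketch), working with logarithms, give the order of magnitude instantly:

import math

pixels = 1024 * 768          # 786,432 pixels
bits_per_pixel = 32

# log10 of (2^32)^(1024*768) = pixels * bits_per_pixel * log10(2)
digits = pixels * bits_per_pixel * math.log10(2)
print("log10 of the frame count is about %d" % round(digits))
# -> about 7.6 million, i.e. a number with roughly 7.6 million decimal digits,
#    versus ~10^80 atoms in the observable universe.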
