How can I convert a Bullet transform to a GLM matrix without copying data? - glm-math

I have a Bullet transform and I would like to make it accessible as a glm::mat3.
However, I am wondering if there is a good way to do that without copying (as make_mat3x3 does).

After skimming GLM, I concluded that, without modifying the source code, it is impossible.
The copy is required.
Both Bullet and GLM store the matrix elements by value, not through a pointer or reference.
For Bullet, see the evidence: http://bulletphysics.org/Bullet/BulletFull/btMatrix3x3_8h_source.html
For GLM, see an example: https://glm.g-truc.net/0.9.2/api/a00132_source.html.
It might be faster to use memcpy, but I am not sure that is possible.
It depends on how the values are laid out in memory.
(I have limited knowledge of GLM.)
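To make the layout point concrete, here is a minimal Go sketch (not Bullet or GLM code). It assumes the source is stored row-major as three rows of four floats with the fourth float unused, which is how btVector3-based rows are typically laid out, and that the destination is tightly packed and column-major, as a GLM mat3 is by default. Under those assumptions a single memcpy would give a wrong matrix, and a small element-wise copy that transposes is needed. The names rowMajorPadded, columnMajor and toColumnMajor are hypothetical.

```go
package main

// rowMajorPadded models a 3x3 matrix stored as three rows of four floats,
// where the fourth float of each row is padding (assumed btVector3-style rows).
type rowMajorPadded [3][4]float32

// columnMajor models a tightly packed 3x3 matrix stored column by column,
// the way GLM lays out a mat3 by default.
type columnMajor [9]float32

// toColumnMajor copies the nine meaningful elements, transposing from
// row-major to column-major and skipping the padding. Both the element
// order and the stride differ, so a raw byte copy cannot do this.
func toColumnMajor(src *rowMajorPadded) columnMajor {
	var dst columnMajor
	for row := 0; row < 3; row++ {
		for col := 0; col < 3; col++ {
			dst[col*3+row] = src[row][col]
		}
	}
	return dst
}
```

Nine scalar assignments per transform is the whole cost, which is why it rarely shows up in a profile.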
Even if you manage to make the two objects share the same address,
you will run into nasty issues that are hard to manage (e.g. double delete).
However, before you try to avoid copying, did you profile it?
Copying is not expensive, really.
A few years ago, I wasted a few hours on a similar problem.
In my case, I wanted to copy Bullet's matrix into an OpenGL buffer.
Nonetheless, after I profiled it, I found that
in all of my game prototypes, this operation cost less than 1% of the whole logic.
Not worth the effort, really.
Premature optimization is the root of all evil.

Related

ELKI: Normalization undo for result

I am using the ELKI MiniGUI to run LOF. I have found out how to normalize the data before running by -dbc.filter, but I would like to look at the original data records and not the normalized ones in the output.
It seems that there is some flag called -normUndo, which can be set if using the command-line, but I cannot figure out how to use it in the MiniGUI.
This functionality used to exist in ELKI, but has effectively been removed (for now), for several reasons:
- Only a few normalizations ever supported this; most would fail.
- There is no longer a well-defined "end" with the visualization. Some users will want to visualize the normalized data, others not.
- It requires carrying normalization information along, which makes data structures more complex (although the hierarchical approach we have now would allow this again).
- Due to the numerical imprecision of floating-point math, you would frequently not get back exactly the values you put in.
- Keeping the original data in memory may be too expensive for some use cases, so we would need to add another parameter, "keep non-normalized data"; furthermore, you would need to choose which (normalized or non-normalized) to use for analysis, and which for visualization. This would not be hard with a full-blown GUI, but you are looking at a command-line interface. (This is easy to do with Java, too...)
We would of course appreciate patches that contribute such functionality to ELKI.
The easiest way is this: Add a (non-numerical) label column, and you can identify the original objects, in your original data, by this label.
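The label-column idea can be sketched outside ELKI. The following Go code is an illustration only, not ELKI's implementation; the record type, minMaxNormalize and byLabel are hypothetical, and it assumes every record has the same number of features and a unique label. Normalization touches only the numeric features, so a label reported in the output leads straight back to the original, un-normalized values.

```go
package main

// record is one input row: a non-numerical label plus numeric features.
type record struct {
	Label    string
	Features []float64
}

// minMaxNormalize returns normalized copies scaled to [0, 1] per feature.
// Labels are carried over unchanged; constant columns map to 0.
func minMaxNormalize(data []record) []record {
	if len(data) == 0 {
		return nil
	}
	dims := len(data[0].Features)
	lo := make([]float64, dims)
	hi := make([]float64, dims)
	copy(lo, data[0].Features)
	copy(hi, data[0].Features)
	for _, r := range data {
		for d, v := range r.Features {
			if v < lo[d] {
				lo[d] = v
			}
			if v > hi[d] {
				hi[d] = v
			}
		}
	}
	out := make([]record, len(data))
	for i, r := range data {
		f := make([]float64, dims)
		for d, v := range r.Features {
			if span := hi[d] - lo[d]; span > 0 {
				f[d] = (v - lo[d]) / span
			}
		}
		out[i] = record{Label: r.Label, Features: f}
	}
	return out
}

// byLabel indexes the original records by their (assumed unique) label.
func byLabel(data []record) map[string]record {
	m := make(map[string]record, len(data))
	for _, r := range data {
		m[r.Label] = r
	}
	return m
}
```

Because the originals are looked up rather than reconstructed, this also sidesteps the floating-point round-trip problem mentioned above.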

Best practice for dealing with package allocation in Go

I'm writing a package which makes heavy use of buffers internally for temporary storage. I have a single global (but not exported) byte slice which I start with 1024 elements and grow by doubling as needed.
However, it's very possible that a user of my package would use it in such a way that caused a large buffer to be allocated, but then stop using the package, thus wasting a large amount of allocated heap space, and I would have no way of knowing whether to free the buffer (or, since this is Go, let it be GC'd).
I've thought of three possible solutions, none of which is ideal. My question is: are any of these solutions, or maybe ones I haven't thought of, standard practice in situations like this? Is there any standard practice? Any other ideas?
1. Screw it.
Oh well. It's too hard to deal with this, and leaving allocated memory lying around isn't so bad.
The problem with this approach is obvious: it doesn't solve the problem.
2. Exported "I'm done" or "Shrink internal memory usage" function.
Export a function which the user can call (and calling it intelligently is obviously up to them) which will free the internal storage used by the package.
The problem with this approach is twofold. First, it makes for a more complex, less clean interface to the user. Second, it may not be possible or practical for the user to know when calling such a function is wise, so it may be useless anyway.
3. Background goroutine.
Run a goroutine which frees the buffer after a certain period of the package going unused, or which shrinks the buffer (perhaps halving the length) whenever its size hasn't been increased in a while.
The problem with this approach is primarily that it puts unnecessary strain on the scheduler. Obviously a single goroutine isn't so bad, but if this were accepted practice, it wouldn't scale well if every package you imported were doing this under the hood. Also, if you have a time-sensitive application, you may not want code running when you're not aware of it (that is, you may assume that the package isn't doing any work when its functions are not being called, a reasonable assumption, I'd say).
So... any ideas?
NOTE: You can see the existing project here (the relevant code is only a few tens of lines).
A common approach to this is letting the client pass an existing []byte (or whatever) as an argument to some call/function/method. For example:
// The returned slice may be a sub-slice of dst if dst was large enough
// to hold the entire encoded block. Otherwise, a newly allocated slice
// will be returned. It is valid to pass a nil dst.
func Foo(dst []byte, whatever Bar) (ret []byte, err error)
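A minimal Go sketch of that dst convention, using a hypothetical upperASCII function in place of Foo (the encoding itself is just a stand-in): it writes into dst when dst has enough capacity and allocates only otherwise, and a nil dst is valid.

```go
package main

// upperASCII writes an ASCII-uppercased copy of src into dst when cap(dst)
// is large enough, returning a sub-slice of dst; otherwise it allocates.
// Passing a nil dst is valid and simply forces an allocation.
func upperASCII(dst, src []byte) []byte {
	if cap(dst) < len(src) {
		dst = make([]byte, len(src))
	}
	dst = dst[:len(src)]
	for i, c := range src {
		if 'a' <= c && c <= 'z' {
			c -= 'a' - 'A'
		}
		dst[i] = c
	}
	return dst
}
```

A caller who cares about allocations keeps one buffer and writes `buf = upperASCII(buf[:0], next)` in a loop; the buffer then lives exactly as long as the caller holds it, so the package never has to guess when to free it.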
Another approach is to get a new []byte from, for example, a cache or a pool (whichever name you prefer for the concept) and rely on clients to return used buffers to that "recycle bin".
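Go's standard library already provides such a recycle bin in sync.Pool. The sketch below is one possible shape, with a hypothetical render function that borrows and returns the buffer itself; the Put could equally be left to clients. A useful property for the original concern is that the Pool may drop idle items at any time (in practice during garbage collection), so buffers nobody is using are eventually reclaimed without an explicit "I'm done" call.

```go
package main

import (
	"bytes"
	"sync"
)

// bufPool hands out reusable *bytes.Buffer values.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// render borrows a buffer for temporary work and returns it when finished.
func render(parts []string) string {
	b := bufPool.Get().(*bytes.Buffer)
	defer func() {
		b.Reset() // clear contents but keep capacity for the next user
		bufPool.Put(b)
	}()
	for i, p := range parts {
		if i > 0 {
			b.WriteByte(',')
		}
		b.WriteString(p)
	}
	return b.String() // copies the bytes, so the buffer can be reused safely
}
```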
BTW: You're doing it right by thinking about this. Where it's possible to reasonably reuse []byte buffers, there's potential for lowering the GC load and thus improving your program's performance. Sometimes the difference can be critical.
You could reslice your buffer at the end of every operation.
buffer = buffer[:0]
Then your function extendAndSliceBuffer would have the original backing array most likely available if it needs to grow. If not, you would suffer a new allocation, which you might get anyway when you do extendAndSliceBuffer.
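A small Go sketch of the reslicing idea, with a hypothetical process function standing in for the package's operations (extendAndSliceBuffer itself is not reproduced here). Note the trade-off: reslicing to zero length keeps the backing array and its capacity, which avoids repeated allocations but does not give memory back.

```go
package main

// scratch is a package-level buffer like the one described in the question.
var scratch = make([]byte, 0, 1024)

// process appends into scratch, uses it, then reslices it to length zero.
// The backing array is kept, so later calls can append without
// reallocating until they need more than the current capacity.
func process(data []byte) int {
	scratch = append(scratch, data...)
	n := len(scratch)
	scratch = scratch[:0]
	return n
}
```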
Overall, I think a cleaner solution is to do as @jnml said and let users pass their own buffer if they care about performance. If they don't care about performance, then you should not use a global var; simply allocate the buffer as you need it and let it be collected when it goes out of scope.
"I have a single global (but not exported) byte slice which I start with 1024 elements and grow by doubling as needed."
And there's your problem. You shouldn't have a global like this in your package.
Generally the best approach is to have an exported struct with attached functions. The buffer should reside in this struct unexported. That way the user can instantiate it and let the garbage collector clean it up when they let go of it.
You also want to avoid requiring globals like this as it can hamper unit tests. A unit test should be able to instantiate the exported struct, as the user can, and do it each time for every test.
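A minimal Go sketch of that structure. Encoder, NewEncoder and the single-byte length-prefix format are hypothetical (messages are assumed shorter than 256 bytes); the point is only that the buffer belongs to a value the user creates and drops.

```go
package main

// Encoder owns its scratch buffer. When an Encoder becomes unreachable,
// the garbage collector reclaims the buffer along with it.
type Encoder struct {
	buf []byte // unexported, so callers cannot depend on it
}

// NewEncoder returns an Encoder with a small initial buffer.
func NewEncoder() *Encoder {
	return &Encoder{buf: make([]byte, 0, 1024)}
}

// Encode writes a length-prefixed copy of msg into the Encoder's buffer.
// The returned slice is only valid until the next call on this Encoder.
func (e *Encoder) Encode(msg []byte) []byte {
	e.buf = e.buf[:0]
	e.buf = append(e.buf, byte(len(msg)))
	e.buf = append(e.buf, msg...)
	return e.buf
}
```

Each unit test can call NewEncoder and get fresh state, with nothing shared between tests.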
Also, depending on what kind of buffer you need, bytes.Buffer may be useful, as it already implements io.Reader and io.Writer. bytes.Buffer grows automatically, and when it is empty it resets itself so the existing space is reused: in buffer.go you'll see various calls to b.Truncate(0) doing this, with the comment "reset to recover space". It does not shrink its backing array or hand it back to the garbage collector, though.
It's generally really really bad form to write Go code that is not thread-safe. If two different goroutines call functions that modify the buffer at the same time, who knows what state the buffer will be in when they finish? Just let the user provide a scratch-space buffer if they decide that the allocation performance is a bottleneck.

How to bulk-load an r-tree in C#?

I am looking for C# code to construct an r-tree. I have code that builds an r-tree incrementally i.e. items are added one by one to the tree, but I guess a better r-tree could be built if all items are given all at once to the tree creation algorithm. Please let me know if anyone knows how to bulk-load an r-tree in this manner. I tried doing some search but couldn't find anything very useful.
The most common method for low-dimensional point data is sort-tile-recursive (STR). It does exactly that: sort the data, tile it into the optimal number of slices, then recurse if necessary.
The leaf level of an STR-loaded tree with point data will have no overlap, so it is really good. Higher levels may have overlap, as STR does not take the extent of objects into account.
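A minimal Go sketch of the leaf level of STR for 2-D points, written from the description above rather than taken from any library (the question asks for C#, but the structure carries over directly). Point, Rect, strLeaves and mbr are illustrative names; upper levels would be built by repeating the same procedure on the leaf rectangles, for example on their centers.

```go
package main

import (
	"math"
	"sort"
)

// Point is a 2-D data point.
type Point struct{ X, Y float64 }

// Rect is an axis-aligned minimum bounding rectangle.
type Rect struct{ MinX, MinY, MaxX, MaxY float64 }

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// strLeaves packs points into leaves of at most capacity entries:
// sort by x, cut into vertical strips of strips*capacity points,
// sort each strip by y, and fill leaves from each strip in order.
func strLeaves(pts []Point, capacity int) [][]Point {
	n := len(pts)
	if n == 0 {
		return nil
	}
	leafCount := int(math.Ceil(float64(n) / float64(capacity)))
	strips := int(math.Ceil(math.Sqrt(float64(leafCount))))
	perStrip := strips * capacity

	sorted := append([]Point(nil), pts...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].X < sorted[j].X })

	var leaves [][]Point
	for start := 0; start < n; start += perStrip {
		strip := sorted[start:minInt(start+perStrip, n)]
		sort.Slice(strip, func(i, j int) bool { return strip[i].Y < strip[j].Y })
		for s := 0; s < len(strip); s += capacity {
			leaves = append(leaves, strip[s:minInt(s+capacity, len(strip))])
		}
	}
	return leaves
}

// mbr returns the bounding rectangle of a non-empty leaf.
func mbr(leaf []Point) Rect {
	r := Rect{leaf[0].X, leaf[0].Y, leaf[0].X, leaf[0].Y}
	for _, p := range leaf[1:] {
		r.MinX = math.Min(r.MinX, p.X)
		r.MinY = math.Min(r.MinY, p.Y)
		r.MaxX = math.Max(r.MaxX, p.X)
		r.MaxY = math.Max(r.MaxY, p.Y)
	}
	return r
}
```

Every strip except the last yields exactly `strips` full leaves, so the total is the minimum possible, ceil(n/capacity).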
A bulk-loading strategy with proven guarantees is a key component of the Priority R-tree, too.
And even when not bulk-loading, the insertion strategy makes a big difference. R-trees built with linear splits such as Guttman's or Ang-Tan will usually be worse than those built with the R*-tree split heuristics. In particular, Ang-Tan tends to produce "sliced" pages that are very unbalanced in their spatial extent. It is a fast split strategy and probably the simplest, but the results aren't good.
A paper by Achakeev et al., "Sort-based Parallel Loading of R-trees", might be of some help. You could also find other methods in its references.

Why do we use data structures? (when no dynamic allocation is needed)

I'm pretty sure this is a silly newbie question but I didn't know it so I had to ask...
Why do we use data structures, like Linked List, Binary Search Tree, etc? (when no dynamic allocation is needed)
I mean: wouldn't it be faster if we kept a single variable for a single object? Wouldn't that speed up access time? E.g., a BST may have to follow several pointers before it gets to the actual data.
Except when dynamic allocation is needed, is there a reason to use them?
E.g., using a linked list / BST / std::vector in a situation where a simple (non-dynamic) array could be used.
Each thing you are storing is being kept in its own variable (or storage location). Data structures apply organization to your data. Imagine you had 10,000 things you were trying to track. You could store them in 10,000 separate variables. If you did that, you'd always be limited to 10,000 different things. If you wanted more, you'd have to modify your program and recompile it each time you wanted to increase the number. You might also have to change the way the calculations are done if the order of the items changes because a new one is introduced in the middle.
Using data structures, from simple arrays to more complex trees, hash tables, or custom data structures, allows your code to be both more organized and extensible. Using an array, which can either be created to hold the required number of elements or extended to hold more after it's first created, keeps you from having to rewrite your code each time the number of data items changes. Using an appropriate data structure allows you to design algorithms based on the relationships between the data elements rather than some fixed ordering, giving you more flexibility.
A simple analogy might help. You could, for example, organize all of your important papers by putting each of them into a separate filing cabinet. If you did that, you'd have to memorize (i.e., hard-code) the cabinet in which each item can be found in order to use them effectively. Alternatively, you could store them all in the same filing cabinet (like a generic array). This is better in that they're all in one place, but still not optimal, since you have to search through them all each time you want to find one. Better yet would be to organize them by subject, putting like subjects in the same file folder (separate arrays, different structures). That way you can look for the folder for the correct subject, then find the item you're looking for in it. Depending on your needs, you can use different filing methods (data structures/algorithms) to better organize your information for its intended use.
I'll also note that there are times when it does make sense to use individual variables for each data item you are using. Frequently there is a mixture of individual variables and more complex structures, using the appropriate method depending on the use of the particular item. For example, you might store the sum of a collection of integers in a variable while the integers themselves are stored in an array. A program would need to be pretty simple though before the introduction of data structures wouldn't be appropriate.
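The mixture described above can be sketched in a few lines of Go. The stats type and numItems constant are hypothetical: the items live in a fixed-size array sized by one named constant, and their sum lives in an individual variable that is kept in sync on every update.

```go
package main

// numItems is a compile-time constant; changing the capacity means editing
// this one line, not adding or removing variables by hand.
const numItems = 5

// stats keeps the items in an array and their sum in a separate field.
type stats struct {
	items [numItems]int
	sum   int
}

// set stores v at index i and keeps the running sum consistent.
func (s *stats) set(i, v int) {
	s.sum += v - s.items[i]
	s.items[i] = v
}
```

Code that loops over `s.items` keeps working unchanged when numItems changes, which is the maintainability argument made below.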
Sorry, but you didn't just find a great new way of doing things ;) There are several huge problems with this approach.
How could this be done without requiring programmers to massively (and nontrivially) rewrite tons of code as soon as the number of allowed items changes? Even when you have to fix your data structure sizes at compile time (e.g. arrays in C), you can use a constant. Then, changing a single constant and recompiling is sufficient for changes to that size (if the code was written with this in mind). With your approach, we'd have to type hundreds or even thousands of lines every time some size changes. Not to mention that all this code would be incredibly hard to read, write, maintain and verify. The old truism "more lines of code = more space for bugs" is taken up to eleven in such a setting.
Then there's the fact that the number is almost never set in stone. Even when it is a compile time constant, changes are still likely. Writing hundreds of lines of code for a minor (if it exists at all) performance gain is hardly ever worth it. This goes thrice if you'd have to do the same amount of work again every time you want to change something. Not to mention that it isn't possible at all once there is any remotely dynamic component in the size of the data structures. That is to say, it's very rarely possible.
Also consider the concept of implicit and succinct data structures. If you use a set of hard-coded variables instead of abstracting over the size, you still have a data structure. You merely made it implicit, unrolled the algorithms operating on it, and set its size in stone. Philosophically, you changed nothing.
But surely it has a performance benefit? Well, possibly, although it will be tiny. And it isn't guaranteed to be there. You'd save some space on data, but code size would explode. As everyone informed about inlining should know, small code size is very useful for performance because it lets the code stay in the cache. Also, argument passing would result in excessive copying unless you figured out a trick to derive the location of most variables from a few pointers. Needless to say, this would be nonportable, very tricky to get right even on a single platform, and liable to be broken by any change to the code or the compiler invocation.
Finally, note that a weaker form is sometimes done. The Wikipedia page on implicit and succinct data structures has some examples. On a smaller scale, some data structures store much data in one place, such that it can be accessed with less pointer chasing and is more likely to be in the cache (e.g. cache-aware and cache-oblivious data structures). It's just not viable for 99% of all code and taking it to the extreme adds only a tiny, if any, benefit.
The main benefit of data structures, in my opinion, is that you are relationally grouping your data. For instance, instead of having 10 separate variables of class MyClass, you can have a data structure that groups them all. This grouping allows certain operations to be performed because the items are structured together.
Not to mention, data structures can potentially enforce type safety, which is powerful and necessary in many cases.
And last but not least, what would you rather do?
string string1 = "string1";
string string2 = "string2";
string string3 = "string3";
string string4 = "string4";
string string5 = "string5";
Console.WriteLine(string1);
Console.WriteLine(string2);
Console.WriteLine(string3);
Console.WriteLine(string4);
Console.WriteLine(string5);
Or...
List<string> myStringList = new List<string>() { "string1", "string2", "string3", "string4", "string5" };
foreach (string s in myStringList)
    Console.WriteLine(s);

Subkernel memory control in Mathematica

My question is somewhat similar to:
Mathematica running out of memory
I am interested in something like this:
ParallelTable[F[i], {i, 0, 14.9, 0.001}]
where F[i] is a complicated numerical integral (I haven't yet found an easy way to reproduce the problem without page-filling definitions for the integral).
My problem is that the subkernels blow up in memory, and I have to stop the evaluation if I don't want the machine to start swapping.
But even after I have stopped the evaluation, the kernels don't free their occupied memory.
I tried
ClearSystemCache[]
and I have even tried
ParallelEvaluate[ClearSystemCache[]]
but
ParallelEvaluate[MemoryInUse[]]
stays at
{823185944, 833146832, 812429208, 840150336, 850057024, 834441704, 847068768, 850424224}
It seems that all memory control only works for the main kernel?
So far, the only way is to shut down all the kernels and launch them again.
I really hope there are some solutions out there...
Thanks a lot.
Memory control works for the kernel in which control expressions involving functions such as MemoryConstrained, MemoryInUse, Clear, Unset, Remove, $HistoryLength, ClearSystemCache etc. are evaluated. It seems that in your case the memory leaks are not caused by Mathematica's internal caching mechanism (thanks for the link, BTW!).
Have you tried evaluating $HistoryLength=0; in all subkernels before using them for computations? If you have not yet, I strongly recommend trying it.
Since you are working with numerical integration functions, I also suggest optimizing how you use them. For example, if you integrate numerically with NDSolve and need only a limited set of computed points (or even a single point), you should use the form NDSolve[eqns,y,{x,x_needed_min,x_needed_max}] (or even NDSolve[eqns,y,{x,x_max,x_max}]) instead of NDSolve[eqns,y,{x,x_min,x_max}] or NDSolve[eqns,y,{x,0,x_max}]. This can dramatically reduce memory usage in some cases! You can also use EventLocator for memory control.
I was (am?) having the exact same problem, almost word for word. I just had some good luck adding this option to the problematic integral:
Method -> {"GlobalAdaptive", "SymbolicProcessing" -> False}
You can probably choose any other method if you'd like, but I had success with this within the last few minutes. Also, a lot of nasty inconsistencies I used to be getting are gone, and integration proceeds MUCH faster.
