armadillo c++: Usage of cube data structure

Is there a method to define a cube of vectors? For example, if I define cube(1,2,4), can I have every entry of the cube be a vector of float entries (fcolvec)?

Reading the Armadillo documentation always helps before posting questions.
To have vectors in a cube-like layout, use the field class:
field<vec> X(2,3,4);
Each of the elements in the field is then an instance of the vec class. You will still need to set the size of each vector and manipulate its contents. For example:
X(1,2,3).set_size(10);
X(1,2,3).fill(456);
If on the other hand you want to access the columns of a slice in a cube, use:
cube C(4,3,2, fill::zeros);
C.slice(1).col(2).fill(456);
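Putting the two together, a minimal sketch of both approaches (the sizes and fill values here are just illustrative):
#include <armadillo>
using namespace arma;

int main()
{
    // field: a 2x3x4 cube-like layout whose elements are themselves vectors
    field<vec> X(2, 3, 4);
    X(1, 2, 3).set_size(10);   // each element's size must be set individually
    X(1, 2, 3).fill(456);

    // cube: a dense 4x3x2 array of scalars; each slice is a matrix
    cube C(4, 3, 2, fill::zeros);
    C.slice(1).col(2).fill(456);

    X(1, 2, 3).print("X(1,2,3):");
    C.slice(1).print("C.slice(1):");
    return 0;
}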

Related

What is the meaning of Caffe - Blob Class - member variables?

In Caffe, as we can see in blob.hpp, there are 6 member variables in each blob object:
data_
diff_
shape_data_
shape_
count_
capacity_
data_ contains the normal data that we pass along
diff_ is the gradient computed by the network
Since there are no comments in the source code and official documentation is lacking, I wanted to know: what is the exact meaning of the others?
Thanks.
shape_data_ and shape_ represent the same thing; the only difference is their type. shape_ is a vector of integers holding the dimensions of the data, whereas shape_data_ is a shared pointer.
count_ is the total number of elements in data_, i.e. the product of all the dimensions in shape_.
capacity_ is the maximum size of data_ that can be accommodated in the Blob.
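To make the count_ relationship concrete, here is a small sketch (not Caffe's actual code) of how the element count follows from the shape:
#include <cstdio>
#include <vector>

int main()
{
    // shape_ for a blob of e.g. 2 images, 3 channels, 4x5 pixels
    std::vector<int> shape = {2, 3, 4, 5};

    // count_ is the product of all the dimensions
    int count = 1;
    for (int dim : shape)
        count *= dim;

    std::printf("count_ = %d\n", count); // prints 120
    return 0;
}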
References:
http://blog.luoyetx.com/2015/10/reading-caffe-2/
http://imbinwang.github.io/blog/inside-caffe-code-blob

Why does ELKI need db.in file in addition to distance matrix? Also what should db.in file contain?

I tried to follow this tutorial on using ELKI with pre-computed distances for clustering.
http://elki.dbs.ifi.lmu.de/wiki/HowTo/PrecomputedDistances
I used the following set of command line options:
-dbc.filter FixedDBIDsFilter -dbc.startid 0 -algorithm clustering.OPTICS
-algorithm.distancefunction external.FileBasedDoubleDistanceFunction
-distance.matrix /path/to/matrix -optics.minpts 5 -resulthandler ResultWriter
ELKI fails with a configuration error saying the db.in file is needed for the computation.
The following configuration errors prevented execution:
No value given for parameter "dbc.in":
Expected: The name of the input file to be parsed.
No value given for parameter "parser.distancefunction":
Expected: Distance function used for parsing values.
My question is: what is the db.in file? Why should I provide it in addition to the distance matrix, since the pairwise distance matrix completely specifies all the information about the point cloud? (I also don't have access to any information other than the pairwise distances.)
What should I do about db.in? Should I override it, specify some dummy information, etc.? Kindly help me understand.
Thank you.
This is documented in the ELKI HowTos:
http://elki.dbs.ifi.lmu.de/wiki/HowTo/PrecomputedDistances
Using without primary data
-dbc DBIDRangeDatabaseConnection -idgen.count 100
However, there is a bug (a patch is on the HowTo page and will be in the next release), so right now you can't fully use this; as a workaround you can use a text file that enumerates the objects.
The reason for this is that ELKI is designed to work on multi-relational data; it's not just processing matrices. Some algorithms may, for example, need a geographic representation of an object, some measurements for this object, and a label for evaluation. That is three relations.
What the DBIDRange data source essentially does is create a single "fake" relation that is just the DBIDs 0 to 99. On algorithms that don't need actual data, but only distances (e.g. LOF or DBSCAN or OPTICS), it is sufficient to have object IDs and a distance matrix.
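Putting the answer's options together with those from the question, a full invocation would look roughly like this (assuming the patch from the HowTo page is applied; -idgen.count must match the number of objects in your distance matrix, and the file-based filter options from the question should no longer be needed since no input file is parsed):
-dbc DBIDRangeDatabaseConnection -idgen.count 100
-algorithm clustering.OPTICS
-algorithm.distancefunction external.FileBasedDoubleDistanceFunction
-distance.matrix /path/to/matrix -optics.minpts 5 -resulthandler ResultWriter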

Efficient matrix copying in OpenCV

I have no idea how to implement this matrix operation efficiently in OpenCV.
I have binary Mat nz(150,600) with 0 and 1 elements.
I have Mat mk(150,600) with double values.
I would like to do the equivalent of this Matlab command:
sk = mk(nz);
That command copies the elements of mk to sk only at the locations where nz is 1, and then makes sk a row matrix.
How can I implement it in OpenCV efficiently for speed and memory?
You should take a look at Mat::copyTo and Mat::clone.
copyTo will make a copy with an optional mask, where the mask's non-zero elements indicate which matrix elements need to be copied.
mk.copyTo(sk, nz);
And if you really want a row matrix then call sk.reshape() as member sansuiso already suggested. This method ...
creates alternative matrix header for the same data, with different
number of channels and/or different number of rows.
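Putting both steps together, a minimal sketch (note that with a mask, positions where nz is 0 are left as zeros in the freshly allocated destination, so the result keeps all of mk's elements rather than compacting them):
#include <opencv2/core/core.hpp>

// mk holds doubles (CV_64F); nz is a 0/1 mask (CV_8U), as in the question
cv::Mat maskedCopyAsRow(const cv::Mat& mk, const cv::Mat& nz)
{
    cv::Mat sk;
    mk.copyTo(sk, nz);       // positions where nz == 0 stay zero in the new sk
    return sk.reshape(1, 1); // 1 channel, 1 row; only the header changes
}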
bkausbk gave the best answer. However, here is a second way:
cv::Mat A = mk.mul(nz); // element-wise product with the 0/1 mask; nz must first be converted to mk's type (bitwise_and is not defined for doubles)
If you access A, you can copy the non-zero elements into a std::vector. If you want your output to be a cv::Mat instance, then you have to allocate the memory first:
int S = countNonZero(A); // number of elements in the final output matrix
Now, fast element access is a topic of its own; Google it, or have a look at opencv/modules/core/src/stat.cpp, where countNonZero() is implemented, to get some ideas.
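A sketch of that manual approach, assuming nz is CV_8U and mk is CV_64F as described; unlike copyTo with a mask, this compacts the selected elements into a row, matching the Matlab semantics of mk(nz):
#include <opencv2/core/core.hpp>

cv::Mat maskedElementsAsRow(const cv::Mat& mk, const cv::Mat& nz)
{
    // allocate the output first: one row, one column per mask hit
    int S = cv::countNonZero(nz);
    cv::Mat sk(1, S, CV_64F);

    // copy only the elements where the mask is non-zero
    int k = 0;
    for (int i = 0; i < mk.rows; ++i)
        for (int j = 0; j < mk.cols; ++j)
            if (nz.at<uchar>(i, j))
                sk.at<double>(0, k++) = mk.at<double>(i, j);
    return sk;
}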
There are two steps involved in your task.
First, convert the input matrix to double:
cv::Mat binaryMat; // source matrix, filled somewhere
cv::Mat doubleMat; // target matrix (with doubles)
binaryMat.convertTo(doubleMat, CV_64F); // perform the conversion
Then, reshape the result as a row matrix:
doubleMat = doubleMat.reshape(1, 1);
// Alternatively:
cv::Mat doubleRow = doubleMat.reshape(1, 1);
Note that reshape is a member function of cv::Mat; it is efficient in the sense that the data is not copied, only the matrix header changes.
It returns a new matrix header rather than modifying the matrix in place, so you should not forget to assign its result.

How to read Mahout clustering output

I have run the k-Means clustering algorithm on the synthetic control data from the Mahout tutorial, and was wondering if someone could explain how to interpret the output. I ran clusterdump and received output that looks something like this (truncated to save space):
CL-592{n=57 c=[30.726, 29.813...] r=[3.528, 3.597...]}
Weight : [props - optional]: Point:
1.0 : [distance=27.453962995925863]: [24.672, 35.261, 30.486...]
1.0 : [distance=27.675053294846002]: [25.592, 29.951, 34.188...]
1.0 : [distance=28.97727289419493]: [30.696, 32.667, 34.223...]
1.0 : [distance=21.999685652862784]: [32.702, 35.219, 30.143...]
...
CL-598{n=50 c=[29.611, 29.769...] r=[3.166, 3.561...]}
Weight : [props - optional]: Point:
1.0 : [distance=27.266203490250472]: [27.679, 33.506, 23.594...]
1.0 : [distance=28.749781351838173]: [34.727, 28.325, 30.331...]
1.0 : [distance=32.635136046420186]: [27.758, 33.859, 29.879...]
1.0 : [distance=29.328974057024624]: [29.356, 26.793, 25.575...]
Could someone explain to me how to read this? From what I understand, CL-__ is a cluster ID, followed by n=number of points in the cluster, c=centroid as a vector, r=radius as a vector, and then each point in the cluster. Is this correct? Furthermore, how do I know which clustered point matches up with which input point? i.e. are the points described as a key-value pair where the key is some kind of ID for the point and the value is the vector? If not is there some way I can set it up so it is?
I believe your interpretation of the data is correct (I've only been working with Mahout for ~3 weeks, so someone more seasoned should probably weigh in on this).
As far as linking points back to the input that created them I've used NamedVector, where the name is the key for the vector. When you read one of the generated points files (clusteredPoints) you can convert each row (point vector) back into a NamedVector and retrieve the name using .getName().
Update in response to comment
When you initially read your data into Mahout, you convert it into a collection of vectors, which you then write to a file (points) for use in the clustering algorithms later. Mahout gives you several Vector types to use, and it also gives you access to a Vector wrapper class called NamedVector, which lets you identify each vector.
For example, you could create each NamedVector as follows:
NamedVector nVec = new NamedVector(
new SequentialAccessSparseVector(vectorDimensions),
vectorName
);
Then you write your collection of NamedVectors to file with something like:
SequenceFile.Writer writer = new SequenceFile.Writer(...);
VectorWritable writable = new VectorWritable();
// the next two lines will be in a loop, but I'm omitting it for clarity
writable.set(nVec);
writer.append(new Text(nVec.getName()), writable);
You can now use this file as input to one of the clustering algorithms.
After having run one of the clustering algorithms with your points file, it will have generated yet another points file, but it will be in a directory named clusteredPoints.
You can then read in this points file and extract the name you associated to each vector. It'll look something like this:
// reader is a SequenceFile.Reader opened on the clusteredPoints output
IntWritable clusterId = new IntWritable();
WeightedPropertyVectorWritable vector = new WeightedPropertyVectorWritable();
while (reader.next(clusterId, vector))
{
NamedVector nVec = (NamedVector)vector.getVector();
// you now have access to the original name using nVec.getName()
}
Try adding the option -of CSV to clusterdump; the CSV output is easier to process further.
I have the same problem (using Mahout 0.6). I am also a beginner. I need to display the documents in the form of clusters to the users, so I need document names rather than the words corresponding to the clusters. I have been clustering the text documents from a shell script.

Mapping points from Euclidean 2-space onto a Poincaré disc

For some reason it seems that everyone writing webpages about Poincaré discs is only concerned with how to represent lines and measure distances.
I'd like to morph a collection of 2D points (as defined by x,y coordinates in the Euclidean plane) onto a Poincaré disc, but I have no idea what the algorithm is supposed to be like. At this point I don't even know if it's possible to create a mapping between Euclidean 2-space and a Poincaré disc...
Any pointers?
Goodwill,
David
You describe your data as a collection of points. But from your comments, you want to make lines in the plane still map to lines in the disk. You seem to want to preserve the "structure" of the space somehow, which is probably why you use the term "morph". I think that you want a conformal map.
There is no conformal bijection between the disk and the plane. There is such a mapping between the half-plane and the disk, and it preserves "lines", but not the kind that you want, unfortunately.
You said "I don't even know if it's possible to create a mapping" ... there are a number of mappings for you to choose from (see the Unit Disk page for an example) but there are none with all the features you seem to want.
If I understand everything correctly, the answer you got on the other forum is for the Beltrami–Klein model. Once you have that, you can get to the coordinates in the Poincaré disk with
p = b / (1 + sqrt(1 - b * b))
where p is the vector of coordinates in the Poincaré disk (i.e. what you need), b is the one in the Beltrami–Klein model (i.e. what you get from the other answer), and b * b denotes the dot product of b with itself.
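A small sketch of that conversion in code (the struct and function names here are just illustrative):
#include <cmath>

struct Point2D { double x, y; };

// Convert Beltrami-Klein disk coordinates to Poincare disk coordinates:
// p = b / (1 + sqrt(1 - b . b))
Point2D kleinToPoincare(const Point2D& b)
{
    double dot = b.x * b.x + b.y * b.y;     // b . b, must be < 1 inside the disk
    double s = 1.0 + std::sqrt(1.0 - dot);
    return { b.x / s, b.y / s };
}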
