Drake flattens by channel - drake

Why does Drake flatten along the channel dimension instead of along the batch dimension?
For a [n, c] vector, numpy/torch/tensorflow have a flatten operation and its inverse, a reshape operation
This flatten operation flattens along the batch dimension "n", whereas Drake flattens along "c"
Is there some built in function to get the flattening along the batch dimension?
For example, I want to access:
prog.initial_guess()
But I want it to be flattened along the batch dimension, according to the original matrix-shaped decision variables

Drake is written in C++ with Python bindings. In C++ we use Eigen as our linear algebra library. Eigen::Matrix defaults to column-major storage, namely it stores data in the order mat[0, 0], mat[1, 0], mat[2, 0], ..., mat[0, 1], mat[1, 1], .... This differs from numpy/pytorch, which store matrices in row-major order by default.
Specifically for your question about getting the initial guess of a matrix variable, you can call the function
initial_guess = prog.GetInitialGuess(variable_matrix)
It returns a numpy matrix of floats, where initial_guess[i, j] is the initial guess value of the variable variable_matrix[i, j], so you don't need to worry about matrix flattening/reshaping.
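To see what the two storage orders mean for flattening, here is a minimal NumPy sketch. It does not call Drake; the matrix is a made-up stand-in for matrix-shaped decision variables, and it only assumes that a flat vector follows the column-major order described above.

```python
import numpy as np

# A hypothetical 2x3 matrix standing in for matrix-shaped decision variables.
mat = np.array([[1, 2, 3],
                [4, 5, 6]])

row_major = mat.flatten(order="C")   # NumPy default: walk along each row
col_major = mat.flatten(order="F")   # Eigen default: walk down each column

print(row_major)  # [1 2 3 4 5 6]
print(col_major)  # [1 4 2 5 3 6]

# Undo a column-major flattening to recover the original matrix layout.
restored = col_major.reshape(mat.shape, order="F")
assert np.array_equal(restored, mat)
```

Once the matrix shape is restored this way, a plain flatten() gives the row-major order that numpy/torch users expect.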

Related

How to create Polynomial Trajectories of Linear/Affine Systems in Python? i.e Time varying A and B Matrices

I am trying to create a Trajectory of a linear system for use in TrajectoryAffineSystem. But I am not able to understand the type of NumPy arrays needed for this.
For a simple Pendulum system, the state trajectory is a 2xN NumPy array where row 1 is theta, row 2 is theta_dot, and column k is the state at time t[k]. This NumPy array is clear/easy to create. Similarly, systems with a 1-D state vector are easy to create. However, this is not as clear when the state matrix itself is a 2D matrix.
For a linear pendulum system, the A matrix is a 2x2 matrix and A[k] is the (linearized) state matrix at t[k]. Which type of NumPy array should then be used to represent A[0..N]? Is it 2x2xN, Nx2x2, or is there another way of representing the time-varying linear dynamics that can be used to create an object of the Trajectory class?
Reproduction of the problem on google colab here
You cannot pass a numpy array directly into the TrajectoryAffineSystem constructor. Those constructors need a trajectory object, for instance a PiecewisePolynomial (see here). There are a number of static methods on PiecewisePolynomial, like FirstOrderHold, that provide the semantics you need to go from a list of A matrices to a trajectory.
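As a rough illustration of what a first-order hold over matrix samples means, the sketch below linearly interpolates between A matrices in plain NumPy. It is not Drake's API: the function first_order_hold, the break times, and the pendulum-like matrices are all hypothetical and only mirror the idea of one matrix sample per break time.

```python
import numpy as np

def first_order_hold(breaks, samples, t):
    """Linearly interpolate between matrix samples at time t.

    breaks:  increasing times, shape (N,)
    samples: one matrix per break, shape (N, rows, cols)
    """
    breaks = np.asarray(breaks, dtype=float)
    samples = np.asarray(samples, dtype=float)
    t = float(np.clip(t, breaks[0], breaks[-1]))
    # Index of the segment [breaks[i], breaks[i + 1]] that contains t.
    i = int(np.searchsorted(breaks, t, side="right")) - 1
    i = min(max(i, 0), len(breaks) - 2)
    alpha = (t - breaks[i]) / (breaks[i + 1] - breaks[i])
    return (1.0 - alpha) * samples[i] + alpha * samples[i + 1]

# Hypothetical time-varying A matrices, one 2x2 matrix per break time.
breaks = [0.0, 1.0, 2.0]
A_list = [np.array([[0.0, 1.0], [-k, 0.0]]) for k in (1.0, 2.0, 4.0)]

A_mid = first_order_hold(breaks, A_list, 0.5)
print(A_mid)  # halfway between A_list[0] and A_list[1]
```

The point of the sketch is the data layout: a list of N matrices plus N break times, rather than a single 2x2xN array.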

How to understand SpatialDropout1D and when to use it?

Occasionally I see some models are using SpatialDropout1D instead of Dropout. For example, in the Part of speech tagging neural network, they use:
model = Sequential()
model.add(Embedding(s_vocabsize, EMBED_SIZE,
                    input_length=MAX_SEQLEN))
model.add(SpatialDropout1D(0.2)) ##This
model.add(GRU(HIDDEN_SIZE, dropout=0.2, recurrent_dropout=0.2))
model.add(RepeatVector(MAX_SEQLEN))
model.add(GRU(HIDDEN_SIZE, return_sequences=True))
model.add(TimeDistributed(Dense(t_vocabsize)))
model.add(Activation("softmax"))
According to Keras' documentation, it says:
This version performs the same function as Dropout, however it drops
entire 1D feature maps instead of individual elements.
However, I am unable to understand the meaning of "entire 1D feature maps". More specifically, I am unable to visualize SpatialDropout1D in the same model explained on Quora.
Can someone explain this concept by using the same model as on Quora?
Also, in what situations would we use SpatialDropout1D instead of Dropout?
To make it simple, I would first note that the so-called feature maps (1D, 2D, etc.) are our regular channels. Let's look at examples:
Dropout(): Let's define 2D input: [[1, 1, 1], [2, 2, 2]]. Dropout will consider every element independently, and may result in something like [[1, 0, 1], [0, 2, 2]]
SpatialDropout1D(): In this case the result will look like [[1, 0, 1], [2, 0, 2]]. Notice that the 2nd column, i.e. one whole channel, was zeroed in every row.
The noise shape
In order to understand SpatialDropout1D, you should get used to the notion of the noise shape. In plain vanilla dropout, each element is kept or dropped independently. For example, if the tensor has shape [2, 2, 2], each of its 8 elements can be zeroed out depending on a random coin flip (with a certain "heads" probability); in total, there will be 8 independent coin flips and any number of values may become zero, from 0 to 8.
Sometimes there is a need to do more than that. For example, one may need to drop a whole slice along axis 0. The noise_shape in this case is [1, 2, 2] and the dropout involves only 4 independent random coin flips. The two elements that share a coin flip will either be kept together or be dropped together. The number of zeroed elements can be 0, 2, 4, 6 or 8. It cannot be 1 or 5.
Another way to view this is to imagine that input tensor is in fact [2, 2], but each value is double-precision (or multi-precision). Instead of dropping the bytes in the middle, the layer drops the full multi-byte value.
Why is it useful?
The example above is just for illustration and isn't common in real applications. A more realistic example is this: shape(x) = [k, l, m, n] and noise_shape = [k, 1, 1, n]. In this case, each batch and channel component will be kept independently, but each row and column will be kept or dropped together. In other words, the whole [l, m] feature map will be either kept or dropped.
You may want to do this to account for the correlation between adjacent pixels, especially in the early convolutional layers. Effectively, you want to prevent co-adaptation of pixels with their neighbors across the feature maps, and make them learn as if no other feature maps exist. This is exactly what SpatialDropout2D is doing: it promotes independence between feature maps.
The SpatialDropout1D is very similar: given shape(x) = [k, l, m] it uses noise_shape = [k, 1, m] and drops entire 1-D feature maps.
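The noise-shape idea can be sketched in NumPy by drawing the keep/drop mask with shape noise_shape and letting broadcasting share it along the axes of size 1. This is an illustrative sketch, not the Keras implementation; the dropout helper and the tensor sizes are hypothetical, and the 1 / (1 - rate) rescaling of kept values is included on the assumption that the layer uses the usual inverted-dropout convention.

```python
import numpy as np

def dropout(x, rate, noise_shape=None, rng=None):
    """Dropout whose keep/drop mask has shape noise_shape and is broadcast over x."""
    rng = np.random.default_rng() if rng is None else rng
    shape = x.shape if noise_shape is None else noise_shape
    keep = rng.random(shape) >= rate      # one coin flip per mask entry
    return x * keep / (1.0 - rate)        # rescale the values that are kept

rng = np.random.default_rng(0)
x = np.ones((4, 5, 3))                    # (batch k, steps l, channels m)

# Plain dropout: 4 * 5 * 3 independent coin flips.
plain = dropout(x, 0.5, rng=rng)

# SpatialDropout1D-style: 4 * 3 coin flips, each shared by all 5 steps.
spatial = dropout(x, 0.5, noise_shape=(4, 1, 3), rng=rng)

# Every (sample, channel) column is either entirely zero or entirely kept.
dropped = spatial == 0
assert np.all(dropped.all(axis=1) | (~dropped).all(axis=1))
```

Passing noise_shape=(1, 2, 2) for an input of shape (2, 2, 2) reproduces the earlier illustration, where the number of zeroed elements is always even.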
Reference: Efficient Object Localization Using Convolutional Networks by Jonathan Tompson et al.

Can caffe reshape layer do transpose

Caffe has a reshape layer implemented, but say I want to first reshape a blob of shape (1, n, k, p) to (1, a, b, k, p), where n = a*b, and then transpose it to shape (1, b, a, k, p). How can I implement this operation? I know I can write a separate Python layer and do all this with numpy.reshape and numpy.transpose, but that would not be efficient, would it?
transpose and reshape are two fundamentally different operations:
reshape only changes the shape of a blob; it does not affect its internal structure (and thus can be executed very efficiently). transpose, on the other hand, re-arranges the blob's data.
Let's look at a simple example.
Suppose you have a 2x2 blob with values
[[0, 1], [2, 3]]
In memory the values are stored in a 1D contiguous way (row-major):
[0, 1, 2, 3]
If you reshape the blob to 4x1
[[0], [1], [2], [3]]
The underlying arrangement of the elements in memory is not changed.
However, if you transpose the blob to get
[[0, 2], [1, 3]]
The underlying arrangement is also changed to
[0, 2, 1, 3]
Therefore, you cannot use "Reshape" layer to transpose a blob.
The Caffe SSD branch (by Wei Liu) has a "Permute" layer which is equivalent to transpose.
A note about performance:
While reshape only changes the blob's header (O(1) runtime and space), transpose needs to re-arrange elements in memory thus taking O(n) time and space.
To make things worse, if you use numpy.transpose to perform the task, you transpose on the CPU (host memory), thus adding two sync operations between CPU and GPU memory (sync GPU->CPU, transpose on the CPU, sync CPU->GPU).
So, if you have no alternative but to transpose (aka "Permute") make sure you have a GPU implementation.
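The 2x2 example and the reshape-then-permute from the question can be checked in NumPy. This is only a CPU-side sketch of the semantics, not a Caffe layer; the sizes a, b, k, p are hypothetical.

```python
import numpy as np

blob = np.arange(4).reshape(2, 2)            # [[0, 1], [2, 3]], stored as 0 1 2 3

reshaped = blob.reshape(4, 1)                # same memory order, new shape
transposed = np.ascontiguousarray(blob.T)    # forces the data to be re-arranged

print(reshaped.ravel())    # [0 1 2 3]
print(transposed.ravel())  # [0 2 1 3]

# The operation from the question, with hypothetical sizes.
a, b, k, p = 2, 3, 4, 5
x = np.arange(a * b * k * p).reshape(1, a * b, k, p)
y = x.reshape(1, a, b, k, p).transpose(0, 2, 1, 3, 4)   # -> (1, b, a, k, p)
print(y.shape)             # (1, 3, 2, 4, 5)
```

After the permutation, y[0, j, i] holds the [k, p] slice that was x[0, i * b + j] in the original blob.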

Multi Label classification with Sklearn

I have tried using OneVsRest with Logistic Regression from Sklearn, but it gives empty labels for some samples (i.e. it doesn't predict any label for them), even though I do not have any unlabelled training data.
Any idea what might be causing this or how to fix this?
clf = OneVsRestClassifier(LogisticRegression(multi_class='ovr',max_iter=1000,solver='lbfgs'))
clf.fit(X,Y)
self.classifier=clf
self.classifier.predict(test_data)
Whenever you are performing MultiLabel classification, according to the OneVsRestClassifier the targets need to be "a sequence of sequences of labels".
Moreover, depending on how you encode these labels you may get the following warning: "DeprecationWarning: Direct support for sequence of sequences multilabel representation will be unavailable from version 0.17. Use sklearn.preprocessing.MultiLabelBinarizer to convert to a label indicator representation."
So, a neat way to encode your labels:
from sklearn import preprocessing
mlb = preprocessing.MultiLabelBinarizer()
Y = mlb.fit_transform([(1, 2), (1,2), (1,2),(4,)])
# this means sample one belongs to classes {1,2} and so on.
# Take into account the format if only one class is needed, (4,) not (4)
so Y turns out to be:
array([[1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [0, 0, 1]])
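Separately from the label encoding, empty predictions can arise because a one-vs-rest model decides each label with its own binary classifier, so nothing forces at least one label to be positive for a sample. The sketch below uses made-up probabilities rather than a fitted model, and the fallback at the end is one possible workaround (an assumption on my part, not scikit-learn behaviour).

```python
import numpy as np

# Hypothetical per-label probabilities from three independent binary classifiers
# (rows are samples, columns are labels). The numbers are made up.
proba = np.array([[0.9, 0.2, 0.1],
                  [0.4, 0.3, 0.2],    # no label reaches 0.5
                  [0.6, 0.7, 0.1]])

pred = (proba >= 0.5).astype(int)     # each label is thresholded on its own
empty = pred.sum(axis=1) == 0
print(empty)   # [False  True False]

# Possible workaround: give empty rows their single most probable label.
fallback = pred.copy()
fallback[empty, proba[empty].argmax(axis=1)] = 1
print(fallback)
```

Whether such a fallback is appropriate depends on the problem; for some datasets an empty label set is a legitimate prediction.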

scikit multilabel classification: ValueError: bad input shape

I believe SGDClassifier() with loss='log' supports Multilabel classification and I do not have to use OneVsRestClassifier. Check this
Now, my dataset is quite big and I am using HashingVectorizer and passing result as input to SGDClassifier. My target has 42048 features.
When I run this, as follows:
clf.partial_fit(X_train_batch, y)
I get: ValueError: bad input shape (300000, 42048).
I have also passed classes as a parameter as follows, but I still get the same problem.
clf.partial_fit(X_train_batch, y, classes=np.arange(42048))
In the documentation of SGDClassifier, it says y : numpy array of shape [n_samples]
No, SGDClassifier does not do multilabel classification -- it does multiclass classification, which is a different problem, although both are solved using a one-vs-all problem reduction.
Then, neither SGD nor OneVsRestClassifier.fit will accept a sparse matrix for y. The former wants an array of labels, as you've already found out. The latter wants, for multilabel purposes, a list of lists of labels, e.g.
y = [[1], [2, 3], [1, 3]]
to denote that X[0] has label 1, X[1] has labels {2,3} and X[2] has labels {1,3}.
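The shape difference behind the error can be made concrete: a multiclass target is 1-D with one label per sample, while a multilabel target becomes a 2-D indicator matrix with one column per label, like the (300000, 42048) array in the question. The sketch below builds that matrix by hand so it needs only NumPy; it is an illustration, not how scikit-learn does it internally.

```python
import numpy as np

y_lists = [[1], [2, 3], [1, 3]]       # labels per sample, as in the example above
classes = sorted({label for labels in y_lists for label in labels})
column = {label: j for j, label in enumerate(classes)}

# One row per sample, one column per label; 1 marks "sample has this label".
indicator = np.zeros((len(y_lists), len(classes)), dtype=int)
for i, labels in enumerate(y_lists):
    for label in labels:
        indicator[i, column[label]] = 1

print(indicator)
print(indicator.shape)   # (3, 3)
```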
