Sklearn throws ValueError when given a sparse matrix

My SVM classifier throws a ValueError when the features are represented as a sparse matrix, but no error when they are represented as a dense one.
I have code that performs One Hot Encoding on my feature sets, and adds the encoded output to a new list of features. When the output of the One Hot Encoding is converted to a dense array using .toarray(), my SVM classifier runs fine.
However, dense arrays are not ideal: I have thousands of data points and my computer runs out of memory very quickly, so sparse arrays are needed. If I simply remove the .toarray() from the code below, enc.transform(features) returns a sparse matrix. However, when I then run my SVM classifier, I get the following error:
ValueError: setting an array element with a sequence.
It seems as though something is failing when my SVM tries to fit the data. Sklearn SVMs accept sparse vectors, so I don't understand what is going wrong.
# Perform One Hot Encoding
transformedFeatureList = []
for features in featureList:
    features = np.asarray(features)
    features = features.reshape(1, -1)
    transformedFeatures = enc.transform(features).toarray()  # <--- without toarray() the ValueError happens
    transformedFeatureList.append(transformedFeatures)
featureList = transformedFeatureList
# Separate data into training and testing sets
trainingSet = [[], []]
testSet = [[], []]
if len(featureList) == len(classList):
    for index in range(len(featureList)):
        if random.randint(1, 10) <= 7:
            trainingSet[0].append(featureList[index])
            trainingSet[1].append(classList[index])
        else:
            testSet[0].append(featureList[index])
            testSet[1].append(classList[index])
# Train model and attempt classification
from sklearn import svm
X = trainingSet[0]
y = trainingSet[1]
clf = svm.SVC()
clf.fit(X, y)
results = {}
for iclass in set(classList):
    results[iclass] = [0, 0]  # index 0 = correct, index 1 = incorrect
if len(testSet[0]) == len(testSet[1]):
    for index in range(len(testSet[0])):
        features = testSet[0][index]
        iclass = testSet[1][index]
        predictedClass = clf.predict([features])[0]
        if predictedClass == iclass:
            results[iclass][0] += 1
        else:
            results[iclass][1] += 1

I found the source of the ValueError. Essentially, my "sparse matrix" was not a matrix at all. Apparently a dense matrix represented as:
dense = [[0,0], [1,1], [2,2]]
is a legitimate matrix representation, but representing a sparse matrix as:
sparse = [sparse1, sparse2, sparse3]
where each sparseN is the output of a function that returns a sparse matrix
is not a legitimate matrix representation. It is simply a list of matrices.
The solution that I found is to use scipy.sparse.vstack to add sparse rows one by one to create the total sparse matrix that I was going for.
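A minimal sketch of that fix, assuming each encoded sample is a 1 x n SciPy sparse row. The three hand-built rows are hypothetical stand-ins for the output of enc.transform(features). Here the rows are collected in a list and stacked once, which should give the same matrix as stacking them one by one.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins for the 1 x n sparse rows returned by enc.transform(...)
rows = [
    sparse.csr_matrix(np.array([[0, 1, 0, 0]])),
    sparse.csr_matrix(np.array([[1, 0, 0, 0]])),
    sparse.csr_matrix(np.array([[0, 0, 0, 1]])),
]

# A plain Python list of sparse rows is not a matrix; stack the rows instead.
X = sparse.vstack(rows, format="csr")

print(type(X).__name__, X.shape)
```

The resulting X is a single sparse matrix with one row per sample, which is the kind of input an estimator's fit method expects.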


MLJ: selecting rows and columns for training in evaluate

I want to implement a kernel ridge regression that also works within MLJ. Moreover, I want to have the option to use either feature vectors or a predefined kernel matrix as in Python sklearn.
When I run this code
const MMI = MLJModelInterface
MMI.@mlj_model mutable struct KRRModel <: MLJModelInterface.Deterministic
    mu::Float64 = 1::(_ > 0)
    kernel::String = "linear"
end
function MMI.fit(m::KRRModel, verbosity::Int, K, y)
    K = MLJBase.matrix(K)
    fitresult = inv(K + m.mu*I)*y
    cache = nothing
    report = nothing
    return (fitresult, cache, report)
end
N = 10
K = randn(N,N)
K = K*K' # make K a symmetric positive semidefinite kernel matrix
a = randn(N)
y = K*a + 0.2*randn(N)
m = KRRModel()
kregressor = machine(m,K,y)
cv = CV(; nfolds=6, shuffle=nothing, rng=nothing)
evaluate!(kregressor, resampling=cv, measure=rms, verbosity=1)
the evaluate! function evaluates the machine on different subsets of rows of K. Due to the Representer Theorem, a kernel ridge regression has a number of nonzero coefficients equal to the number of samples. Hence, a reduced size matrix K[train_rows,train_rows] can be used instead of K[train_rows,:].
To indicate that I am passing a kernel matrix, I would set m.kernel = "". How do I make evaluate! select the columns as well as the rows to form a smaller matrix when m.kernel = ""?
This is my first time using MLJ and I'd like to make as few modifications as possible.
Quoting the answer I got on the Julia Discourse from @ablaom:
The intended use of evaluate! is to estimate the generalisation error associated with some supervised learning model, by subsampling observations, as in cross-validation, a common use-case. I'm afraid there is no natural way for evaluate! to do feature subsampling.
FYI: There is a version of kernel regression implementing the MLJ model interface, namely kernel partial least squares regression from the package lalvim/PartialLeastSquaresRegressor.jl.
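For comparison, the row-and-column selection described in the question can be done by hand outside of evaluate!. The following is only a sketch in Python with NumPy, not MLJ code. The function name krr_precomputed_cv is hypothetical, and the unshuffled fold layout is an assumption. Each fold trains on K[train, train] and predicts with K[test, train].

```python
import numpy as np

def krr_precomputed_cv(K, y, mu=1.0, nfolds=6):
    """Cross-validate kernel ridge regression on a precomputed kernel matrix.

    Returns the root-mean-square error of each fold.
    """
    n = K.shape[0]
    all_rows = np.arange(n)
    errors = []
    for test in np.array_split(all_rows, nfolds):
        train = np.setdiff1d(all_rows, test)
        # Reduced square kernel: training rows AND training columns.
        K_train = K[np.ix_(train, train)]
        alpha = np.linalg.solve(K_train + mu * np.eye(len(train)), y[train])
        # Test rows against training columns.
        pred = K[np.ix_(test, train)] @ alpha
        errors.append(float(np.sqrt(np.mean((pred - y[test]) ** 2))))
    return errors

rng = np.random.default_rng(0)
N = 10
A = rng.standard_normal((N, N))
K = A @ A.T                      # symmetric positive semidefinite
a = rng.standard_normal(N)
y = K @ a + 0.2 * rng.standard_normal(N)

fold_rms = krr_precomputed_cv(K, y, mu=1.0, nfolds=6)
print(fold_rms)
```

Using a linear solve instead of an explicit inverse is a design choice of this sketch; it computes the same coefficients.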

Predicting sequence of grid coordinates with PyTorch

I have a similar open question here on Cross Validated (though not implementation focused, which I intend this question to be, so I think they are both valid).
I'm working on a project that uses sensors to monitor a person's GPS location. The coordinates will then be converted to a simple grid representation. What I want to do is, after recording a user's routes, train a neural network to predict the next coordinates, i.e. take the example below where a user repeats only two routes over time, Home->A and Home->B.
I want to train an RNN/LSTM with sequences of varying lengths e.g. (14,3), (13,3), (12,3), (11,3), (10,3), (9,3), (8,3), (7,3), (6,3), (5,3), (4,3), (3,3), (2,3), (1,3) and then also predict with sequences of varying lengths e.g. for this example route if I called
route = [(14,3), (13,3), (12,3), (11,3), (10,3)] //pseudocode
pred = model.predict(route)
pred should give me (9,3), or ideally an even longer prediction, e.g. ((9,3), (8,3), (7,3), (6,3), (5,3), (4,3), (3,3), (2,3), (1,3)).
How do I feed such training sequences to the init and forward operations identified below?
self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
out, hidden = self.rnn(x, hidden)
Also, should the entire route be a tensor or each set of coordinates within the route a tensor?
I'm not very experienced with RNNs, but I'll give it a try.
A few things to pay attention to before we start:
1. Your data is not normalized.
2. The output prediction you want (even after normalization) is not bounded to the [-1, 1] range, so you cannot apply a tanh activation (whose output is confined to [-1, 1]) or a ReLU activation (whose output is non-negative) to the output predictions.
To address your problem, I propose a recurrent net that, given a current state (2D coordinates), predicts the next state (2D coordinates). Note that since this is a recurrent net, there is also a hidden state associated with each location. At first, the hidden state is zero, but as the net sees more steps, it updates its hidden state.
I propose a simple net to address your problem. It has a single RNN layer with 8 hidden units, and a fully connected layer on top to output the prediction.
import torch
import torch.nn as nn
class MyRnn(nn.Module):
    def __init__(self, in_d=2, out_d=2, hidden_d=8, num_hidden=1):
        super(MyRnn, self).__init__()
        self.rnn = nn.RNN(input_size=in_d, hidden_size=hidden_d, num_layers=num_hidden)
        self.fc = nn.Linear(hidden_d, out_d)
    def forward(self, x, h0):
        r, h = self.rnn(x, h0)
        y = self.fc(r) # no activation on the output
        return y, h
You can use your two sequences as training data. Each sequence is a tensor of shape Tx1x2, where T is the sequence length, 1 is the batch size, and each entry is two-dimensional (x-y).
To predict (during training):
rnn = MyRnn()
pred, out_h = rnn(seq[:-1, ...], torch.zeros(1, 1, 8)) # given time t predict t+1
err = criterion(pred, seq[1:, ...]) # compare prediction to t+1
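The prediction step above can be wrapped into a training loop roughly as follows. This is a minimal sketch under assumptions that are not in the answer: criterion is taken to be nn.MSELoss, the optimizer is Adam with a learning rate of 1e-2, and the route is crudely scaled by dividing by 14 in place of proper normalization. Routes of different lengths can each be kept as their own Tx1x2 tensor and fed one at a time.

```python
import torch
import torch.nn as nn

class MyRnn(nn.Module):
    def __init__(self, in_d=2, out_d=2, hidden_d=8, num_hidden=1):
        super().__init__()
        self.rnn = nn.RNN(input_size=in_d, hidden_size=hidden_d, num_layers=num_hidden)
        self.fc = nn.Linear(hidden_d, out_d)

    def forward(self, x, h0):
        r, h = self.rnn(x, h0)
        return self.fc(r), h  # no activation on the output

torch.manual_seed(0)

# The example route from the question as a T x 1 x 2 tensor (assumed scaling by 14).
route = [(x, 3) for x in range(14, 0, -1)]
seq = torch.tensor(route, dtype=torch.float).unsqueeze(1) / 14.0

rnn = MyRnn()
criterion = nn.MSELoss()                                   # assumed loss
optimizer = torch.optim.Adam(rnn.parameters(), lr=1e-2)    # assumed optimizer

losses = []
for epoch in range(200):
    optimizer.zero_grad()
    # Given steps 0..T-2, predict steps 1..T-1.
    pred, _ = rnn(seq[:-1, ...], torch.zeros(1, 1, 8))
    loss = criterion(pred, seq[1:, ...])
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(losses[0], losses[-1])
```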
Once the model is trained, you can show it first k steps and continue to predict the next steps:
with torch.no_grad():
    pred, h = rnn(s[:k,...], torch.zeros(1, 1, 8, dtype=torch.float))
    # pred[-1, ...] is the predicted next step
    prev = pred[-1:, ...]
    for j in range(k+1, s.shape[0]):
        pred, h = rnn(prev, h) # note how we keep track of the hidden state of the model. it is no longer init to zero.
        prev = pred
I put everything together in a colab notebook so you can play with it.
For simplicity, I ignored the data normalization here, but you can find it in the colab notebook.
What's next?
These types of predictions are prone to error accumulation. This should be addressed during training, by shifting the inputs from the ground truth "clean" sequences to the actual predicted sequences, so the model will be able to compensate for its errors.
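One way to sketch that idea is to step through the sequence one element at a time and, with some probability, feed the model its own previous prediction instead of the ground truth (an approach often referred to as scheduled sampling). The helper name rollout_loss, the probability p_use_pred, and the MSE loss are assumptions of this sketch, not part of the answer or the notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn_layer = nn.RNN(input_size=2, hidden_size=8)
fc = nn.Linear(8, 2)
criterion = nn.MSELoss()  # assumed loss

def rollout_loss(seq, p_use_pred=0.5):
    """Average one-step loss over seq (T x 1 x 2), mixing true and predicted inputs."""
    h = torch.zeros(1, 1, 8)
    inp = seq[:1]                      # always start from the true first step
    total = 0.0
    for t in range(1, seq.shape[0]):
        r, h = rnn_layer(inp, h)
        pred = fc(r)
        total = total + criterion(pred, seq[t:t + 1])
        # With probability p_use_pred, the next input is the model's own output.
        if torch.rand(1).item() < p_use_pred:
            inp = pred.detach()
        else:
            inp = seq[t:t + 1]
    return total / (seq.shape[0] - 1)

route = [(x, 3) for x in range(14, 0, -1)]
seq = torch.tensor(route, dtype=torch.float).unsqueeze(1) / 14.0
loss = rollout_loss(seq, p_use_pred=0.5)
loss.backward()
```

In practice p_use_pred would typically start near zero and be increased as training progresses, so that the model first learns from clean inputs.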

Keras: model with one input and two outputs, trained jointly on different data (semi-supervised learning)

I would like to code with Keras a neural network that acts both as an autoencoder AND a classifier for semi-supervised learning. Take for example this dataset where there are a few labeled images and a lot of unlabeled images:
Some papers listed here achieved that, or very similar things, successfully.
To sum up: the model would have a single input and shared "encoding" convolutional layers, but would then split into two heads (fork-style), a classification head and a decoding head, so that the unsupervised autoencoder contributes to better learning for the classification head.
With TensorFlow there would be no problem doing that as we have full control over the computational graph.
But with Keras, things are more high-level and I feel that all the calls to ".fit" must always provide all the data at once (so it would force me to tie together the classification head and the autoencoding head into one time-step).
One way in keras to almost do that would be with something that goes like this:
input = Input(shape=(32, 32, 3))
cnn_feature_map = sequential_cnn_trunk(input)
classification_predictions = Dense(10, activation='sigmoid')(cnn_feature_map)
autoencoded_predictions = decode_cnn_head_sequential(cnn_feature_map)
model = Model(inputs=[input], outputs=[classification_predictions, autoencoded_predictions])
model.compile(..., metrics=['accuracy'])
model.fit([images], [labels, images], epochs=10)
However, I think and I fear that if I just want to fit things in that way it will fail and ask for the missing head:
for epoch in range(10):
    # classification step
    model.fit([images], [labels, None], epochs=1)
    # "semi-unsupervised" autoencoding step
    model.fit([images], [None, images], epochs=1)
    # note: ".train_on_batch" could probably be used rather than ".fit" to avoid doing a whole epoch each time.
How should one implement that behavior with Keras? And could the training be done jointly without having to split the two calls to the ".fit" function?
Sometimes when you don't have a label you can pass a zero vector instead of a one-hot encoded vector. It should not change your result, because a zero vector produces no error signal with the categorical cross-entropy loss.
My custom to_categorical function looks like this:
def tricky_to_categorical(y, translator_dict):
    encoded = np.zeros((y.shape[0], len(translator_dict)))
    for i in range(y.shape[0]):
        if y[i] in translator_dict:
            encoded[i][translator_dict[y[i]]] = 1
    return encoded
Here y contains the labels, and translator_dict is a Python dictionary which maps each label to its unique index, like this:
{'unisex':2, 'female': 1, 'male': 0}
If a label such as UNK can't be found in this dictionary, then its encoded label will be a zero vector.
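To see why a zero target vector carries no error signal, here is a small NumPy sketch. The categorical_crossentropy helper is a hand-written stand-in for the usual formula, -sum(y_true * log(y_pred)), not the Keras implementation, and the predicted probabilities are made up for illustration.

```python
import numpy as np

def tricky_to_categorical(y, translator_dict):
    encoded = np.zeros((y.shape[0], len(translator_dict)))
    for i in range(y.shape[0]):
        if y[i] in translator_dict:
            encoded[i][translator_dict[y[i]]] = 1
    return encoded

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Per-sample loss: -sum_k y_true[k] * log(y_pred[k])
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

translator = {'unisex': 2, 'female': 1, 'male': 0}
labels = np.array(['male', 'UNK', 'unisex'])
y_true = tricky_to_categorical(labels, translator)

# Made-up predicted class probabilities, one row per sample.
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.3, 0.3, 0.4],
                   [0.1, 0.1, 0.8]])

per_sample_loss = categorical_crossentropy(y_true, y_pred)
print(y_true)
print(per_sample_loss)
```

The middle sample (UNK) is encoded as all zeros, so every term of its loss is multiplied by zero and it contributes nothing to the total.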
If you use this trick you also have to modify your accuracy function to see real accuracy numbers: you have to filter out all zero vectors from the metric.
import tensorflow as tf
from keras import backend as K
def tricky_accuracy(y_true, y_pred):
    mask = K.not_equal(K.sum(y_true, axis=-1), K.constant(0)) # zero vector mask
    y_true = tf.boolean_mask(y_true, mask)
    y_pred = tf.boolean_mask(y_pred, mask)
    return K.cast(K.equal(K.argmax(y_true, axis=-1), K.argmax(y_pred, axis=-1)), K.floatx())
Note: You have to use larger batches (e.g. 32) in order to avoid a batch whose label matrix is all zeros, because that can make your accuracy metric go crazy. I don't know why.
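One plausible explanation, offered as an assumption rather than something confirmed in this thread, is that a batch with no labeled sample leaves the mask empty, so there is nothing to average. The NumPy sketch below mirrors the masking logic and makes that case explicit; masked_accuracy is a hypothetical helper, not a Keras function.

```python
import numpy as np

def masked_accuracy(y_true, y_pred):
    """Accuracy over rows whose target is not the all-zero vector."""
    mask = y_true.sum(axis=-1) != 0
    if not mask.any():
        return float('nan')  # no labeled sample in this batch
    hits = y_true[mask].argmax(axis=-1) == y_pred[mask].argmax(axis=-1)
    return float(hits.mean())

y_pred = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])

mixed_batch = np.array([[1, 0], [0, 0], [0, 1]])   # second sample is unlabeled
unlabeled_batch = np.zeros((3, 2))                 # no labels at all

acc_mixed = masked_accuracy(mixed_batch, y_pred)
acc_unlabeled = masked_accuracy(unlabeled_batch, y_pred)
print(acc_mixed, acc_unlabeled)
```

With larger batches it becomes less likely that every sample in a batch is unlabeled, which would be consistent with the advice above.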
Alternative solution
Use Pseudo Labeling :)
You can train jointly; you have to pass an array of label batches instead of a single label.
I used fit_generator, e.g.
model.fit_generator(batch_generator(), steps_per_epoch=len(dataset) / batch_size, ...)
def batch_generator():
    batch_x = np.empty((batch_size, img_height, img_width, 3))
    gender_label_batch = np.empty((batch_size, len(gender_dict)))
    category_label_batch = np.empty((batch_size, len(category_dict)))
    while True:
        i = 0
        for idx in np.random.choice(len(dataset), batch_size):
            image_id = dataset[idx][0]
            batch_x[i] = load_and_convert_image(image_id)
            gender_label_batch[i] = gender_labels[idx]
            category_label_batch[i] = category_labels[idx]
            i += 1
        yield batch_x, [gender_label_batch, category_label_batch]
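The two ideas can be combined for the classifier-plus-autoencoder model in the question: the generator yields the images as input, a class target that is a zero vector for unlabeled samples, and the images again as the reconstruction target. This is a NumPy-only sketch. The function name semi_supervised_batches and the convention that a label of -1 means "unlabeled" are assumptions made here for illustration.

```python
import numpy as np

def semi_supervised_batches(images, labels, num_classes, batch_size, rng):
    """Yield (x, [class_targets, x]); unlabeled samples (label -1) get a zero vector."""
    while True:
        idx = rng.choice(len(images), batch_size)
        batch_x = images[idx]
        class_targets = np.zeros((batch_size, num_classes))
        for row, label in enumerate(labels[idx]):
            if label >= 0:
                class_targets[row, label] = 1
        # Second target is the input itself, for the decoding head.
        yield batch_x, [class_targets, batch_x]

rng = np.random.default_rng(0)
images = rng.random((20, 32, 32, 3))          # made-up image data
labels = np.full(20, -1)                      # mostly unlabeled
labels[:5] = rng.integers(0, 10, size=5)      # a few labeled samples

gen = semi_supervised_batches(images, labels, num_classes=10, batch_size=8, rng=rng)
batch_x, (class_targets, recon_targets) = next(gen)
print(batch_x.shape, class_targets.shape, recon_targets.shape)
```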

Feature Vectors in Radial Basis Function Network

I am trying to use RBFNN for point cloud to surface reconstruction but I couldn't understand what would be my feature vectors in RBFNN.
Can anyone please help me understand this?
A goal to get to this:
From inputs like this:
An RBF network essentially involves fitting data with a linear combination of functions that obey a set of core properties; chief among these is radial symmetry. The parameters of each of these functions are learned by incremental adjustment based on errors generated through repeated presentation of inputs.
If I understand (it's been a very long time since I used one of these networks), your question pertains to preprocessing of the data in the point cloud. I believe that each of the points in your point cloud should serve as one input. If I understand properly, the features are your three dimensions, and as such each point can already be considered a "feature vector."
You have other choices that remain, namely the number of radial basis neurons in your hidden layer, and the radial basis functions to use (a Gaussian is a popular first choice). The training of the network and the surface reconstruction can be done in a number of ways but I believe this is beyond the scope of the question.
I don't know if it will help, but here's a simple python implementation of an RBF network performing function approximation, with one-dimensional inputs:
import numpy as np
import matplotlib.pyplot as plt
def fit_me(x):
    return (x-2) * (2*x+1) / (1+x**2)
def rbf(x, mu, sigma=1.5):
    return np.exp(-(x-mu)**2 / (2*sigma**2))
# Core parameters including number of training
# and testing points, minimum and maximum x values
# for training and testing points, and the number
# of rbf (hidden) nodes to use
num_points = 100 # number of inputs (each 1D)
num_rbfs = 20 # number of centers (must be an int for np.linspace)
x_min = -5
x_max = 10
# Training data, evenly spaced points
x_train = np.linspace(x_min, x_max, num_points)
y_train = fit_me(x_train)
# Testing data, more evenly spaced points
x_test = np.linspace(x_min, x_max, num_points*3)
y_test = fit_me(x_test)
# Centers of each of the rbf nodes
centers = np.linspace(-5, 10, num_rbfs)
# Everything is in place to train the network
# and attempt to approximate the function 'fit_me'.
# Start by creating a matrix G in which each row
# corresponds to an x value within the domain and each
# column i contains the values of rbf_i(x).
center_cols, x_rows = np.meshgrid(centers, x_train)
G = rbf(center_cols, x_rows)
plt.plot(x_train, G)
plt.title('Radial Basis Functions')
plt.show()
# Simple training in this case: use pseudoinverse to get weights
weights = np.dot(np.linalg.pinv(G), y_train)
# To test, create meshgrid for test points
center_cols, x_rows = np.meshgrid(centers, x_test)
G_test = rbf(center_cols, x_rows)
# apply weights to G_test
y_predict = np.dot(G_test, weights)
plt.plot(x_test, y_predict)
plt.title('Predicted function')
plt.show()
error = y_predict - y_test
plt.plot(x_test, error)
plt.title('Function approximation error')
plt.show()
First, you can explore the way in which inputs are provided to the network and how the RBF nodes are used. This should extend to 2D inputs in a straightforward way, though training may get a bit more involved.
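As a rough illustration of that extension, the sketch below uses 2D input points, a Gaussian RBF of the Euclidean distance to each center, and the same pseudoinverse training. The target function, the 6 x 6 grid of centers, and the helper name rbf_design_matrix are all assumptions chosen for the example.

```python
import numpy as np

def rbf_design_matrix(points, centers, sigma=1.5):
    """G[i, j] = exp(-||points[i] - centers[j]||^2 / (2 sigma^2))."""
    diff = points[:, None, :] - centers[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def target(xy):
    # Made-up smooth function of two variables to approximate.
    return np.sin(xy[:, 0]) * np.cos(xy[:, 1])

rng = np.random.default_rng(0)
train_pts = rng.uniform(-3, 3, size=(400, 2))   # each row is one 2D input
test_pts = rng.uniform(-3, 3, size=(200, 2))

# Centers on a regular 6 x 6 grid covering the input domain.
gx, gy = np.meshgrid(np.linspace(-3, 3, 6), np.linspace(-3, 3, 6))
centers = np.column_stack([gx.ravel(), gy.ravel()])

G = rbf_design_matrix(train_pts, centers)
weights = np.linalg.pinv(G) @ target(train_pts)

y_predict = rbf_design_matrix(test_pts, centers) @ weights
rmse = float(np.sqrt(np.mean((y_predict - target(test_pts)) ** 2)))
print(G.shape, rmse)
```

For a point cloud, each row of the input array would hold the coordinates of one point, in line with the "each point is a feature vector" reading above.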
To do proper surface reconstruction you'll likely need a representation of the surface that is altogether different than the representation of the function that's learned here. Not sure how to take this last step.

Label Propagation in sklearn is classifying every vector as 1

I have 2000 labelled data (7 different labels) and about 100K unlabeled data and I am trying to use sklearn.semi_supervised.LabelPropagation. The data has 1024 dimensions. My problem is that the classifier is labeling everything as 1. My code looks like this:
X_unlabeled = X_unlabeled[:10000, :]
X_both = np.vstack((X_train, X_unlabeled))
y_both = np.append(y_train, -np.ones((X_unlabeled.shape[0],)))
clf = LabelPropagation(max_iter=100).fit(X_both, y_both)
y_pred = clf.predict(X_test)
y_pred is all ones. Also, X_train is 2000x1024 and X_unlabeled is a subset of the unlabeled data which is 10000x1024.
I also get this error upon calling fit on the classifier:
/usr/local/lib/python2.7/site-packages/sklearn/semi_supervised/ RuntimeWarning: invalid value encountered in divide
self.label_distributions_ /= normalizer
Have you tried different values for the gamma parameter? As the graph is constructed by computing an RBF kernel, the computation includes an exponential, and Python's exponential functions return 0 if the argument is a negative number of too large a magnitude. If the graph is filled with 0, label_distributions_ is filled with "nan" (because of the normalization) and a warning appears. (Be careful: in the scikit-learn implementation the gamma value multiplies the Euclidean distance, which is not the same thing as in the Zhu paper.)
The LabelPropagation will finally be fixed in version 0.19
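The underflow can be reproduced with NumPy alone. This is a sketch under assumptions, not the scikit-learn internals: the weights are taken to be exp(-gamma * squared distance), the data are random 1024-dimensional standard normal vectors, and rbf_weights is a hypothetical helper. The value gamma=20 matches the default of LabelPropagation.

```python
import numpy as np

def rbf_weights(query, points, gamma):
    """RBF weights between one query vector and each row of points."""
    d2 = ((points - query) ** 2).sum(axis=1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
points = rng.standard_normal((50, 1024))
query = rng.standard_normal(1024)

w_large = rbf_weights(query, points, gamma=20.0)   # exponent far too negative
w_small = rbf_weights(query, points, gamma=1e-4)

with np.errstate(invalid='ignore', divide='ignore'):
    normalized_large = w_large / w_large.sum()     # 0 / 0 gives nan
normalized_small = w_small / w_small.sum()

print(w_large.sum(), normalized_large[:3], normalized_small.sum())
```

In high dimensions the squared distances are large, so a much smaller gamma is needed before any weight survives the exponential.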
