LIBSVM gives same prediction for samples of untrained class. Why?

I have trained an SVM on two classes. One is the genuine user's samples; the other is a set of negative samples of equal size. I have tested this system on a class that was not used for training. The results are interesting and I cannot explain them; I don't know whether this is expected behaviour, an SVM issue, or something else entirely.
(0:0.9104172110162648)(1:0.08958278898373527)(Actual:1.0 Prediction:0.0)
(0:0.9104172110162648)(1:0.08958278898373527)(Actual:1.0 Prediction:0.0)
(0:0.9104172110162648)(1:0.08958278898373527)(Actual:1.0 Prediction:0.0)
(0:0.9104172110162648)(1:0.08958278898373527)(Actual:1.0 Prediction:0.0)
(0:0.9104172110162648)(1:0.08958278898373527)(Actual:1.0 Prediction:0.0)
(0:0.9104172110162648)(1:0.08958278898373527)(Actual:1.0 Prediction:0.0)
(0:0.9104172110162648)(1:0.08958278898373527)(Actual:1.0 Prediction:0.0)
(0:0.9104172110162648)(1:0.08958278898373527)(Actual:1.0 Prediction:0.0)
(0:0.9104172110162648)(1:0.08958278898373527)(Actual:1.0 Prediction:0.0)
(0:0.9104172110162648)(1:0.08958278898373527)(Actual:1.0 Prediction:0.0)
(0:0.9104172110162648)(1:0.08958278898373527)(Actual:1.0 Prediction:0.0)
(0:0.9104172110162648)(1:0.08958278898373527)(Actual:1.0 Prediction:0.0)
(0:0.9104172110162648)(1:0.08958278898373527)(Actual:1.0 Prediction:0.0)
(0:0.9104172110162648)(1:0.08958278898373527)(Actual:1.0 Prediction:0.0)
(0:0.9104172110162648)(1:0.08958278898373527)(Actual:1.0 Prediction:0.0)
(0:0.9104172110162648)(1:0.08958278898373527)(Actual:1.0 Prediction:0.0)
(0:0.9104172110162648)(1:0.08958278898373527)(Actual:1.0 Prediction:0.0)
Above is an example of the sort of output I get for different samples of an untrained and unseen class. It's exactly the same for each sample. I would expect them to be closer to class 1.0, and I would also expect at least a change in the probabilities!

Have you cross-validated your model's performance at all? Have you done a grid search for the hyperparameters?
Your output could quite possibly be explained by a poorly tuned model. If you are using the RBF kernel and its width is too small, then the only factor contributing to the classification decision is the bias term. The bias term is, by its nature, the same for all inputs. Thus you would get exactly (or almost exactly) the same output for all unseen test data, unless a test point happened to be very, very close to a training datum.
I can't say for sure this is what happened without knowing more details about your data and what you actually did. But this seems a likely scenario.
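If it would help, here is a minimal sketch of such a cross-validated grid search using scikit-learn's SVC (which wraps LIBSVM); the parameter ranges and the X_train / y_train names are placeholders, not taken from your setup:
# Hypothetical sketch: grid search over C and gamma for an RBF SVM.
# X_train / y_train stand for the genuine-user and negative samples described above.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    'C': np.logspace(-2, 3, 6),      # regularisation strength
    'gamma': np.logspace(-4, 1, 6),  # RBF kernel width; larger gamma = narrower kernel
}
search = GridSearchCV(SVC(kernel='rbf', probability=True), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)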

Related

TFF : change the code have no effect in changing test accuracy values

To improve on this tutorial and test other things, I pretrained the network in a centralized way on the EMNIST database. Then I would like to fine-tune the pretrained network with the federated code above.
So, I only added:
def create_keras_model():
    return tf.keras.models.Sequential([
        tf.keras.models.load_model('path/to/model', compile=False),
        tf.keras.layers.Dense(10, kernel_initializer='zeros'),
        tf.keras.layers.Softmax(),
    ])
The problem is that I get the same test accuracy values as I do without fine-tuning from the pretrained network.
Can you please suggest a solution?

Keras Conv1D on ECG Signal

I am trying to classify different ECG signals. I am using Keras' Conv1D, but am not getting any good results.
I have tried changing the number of layers, window size, etc, but every time I run this I get predictions all of the same class (the classes are 0,1,2, so I get a prediction output of something like [1,1,1,1,1,1,1,1,1,1,1,1,1,1], but the class changes each time I run the script).
The ECG signals are in 1000 point numpy arrays.
Are there any glaringly obvious things I am doing wrong here? I was thinking a few layers would work well for classifying just these 3 different ECG signals.
#arrange and randomize data
y1=[[0]]*len(lead1)
y2=[[1]]*len(lead2)
y3=[[2]]*len(lead3)
y=np.concatenate((y1,y2,y3))
data=np.concatenate((lead1,lead2,lead3))
data = keras.utils.normalize(data)
data=np.concatenate((data,y),axis=1)
data=np.random.permutation((data))
print(data)
#separate data and create categories
Xtrain=data[0:130,0:-1]
Xtrain=np.reshape(Xtrain,(len(Xtrain),1000,1))
Xpred=data[130:,0:-1]
Xpred=np.reshape(Xpred,(len(Xpred),1000,1))
Ytrain=data[0:130,-1]
Yt=to_categorical(Ytrain)
Ypred=data[130:,-1]
Yp=to_categorical(Ypred)
#create CNN model
model = Sequential()
model.add(Conv1D(20,20,activation='relu',input_shape=(1000,1)))
model.add(MaxPooling1D(3))
model.add(Conv1D(20,10,activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(20,10,activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dense(3,activation='relu',use_bias=False))
model.compile(optimizer='adam', loss='categorical_crossentropy',metrics=['accuracy'])
model.fit(Xtrain,Yt)
#test model
print(model.evaluate(Xpred,Yp))
print(model.predict_classes(Xpred,verbose=1))
Are there any glaringly obvious things I am doing wrong here?
Indeed there is: the output you report is not surprising, given that you are currently using the ReLU as activation for your last layer, which does not make any sense.
In multi-class settings, such as yours, the activation of the last layer must be the softmax, and certainly not the ReLU; change your last layer to:
model.add(Dense(3, activation='softmax'))
Not quite sure why you ask for use_bias=False, but you can try both with and without it and experiment...

Tensorflow single sigmoid output with log loss vs two linear outputs with sparse softmax cross entropy loss for binary classification

I am experimenting with a binary classifier implementation in TensorFlow. If I have two plain outputs (i.e. no activation) in the final layer and use tf.losses.sparse_softmax_cross_entropy, my network trains as expected. However, if I change the output layer to produce a single output with a tf.sigmoid activation and use tf.losses.log_loss as the loss function, my network does not train (i.e. loss/accuracy does not improve).
Here is what my output layer/loss function looks like in the first (i.e. working) case:
out = tf.layers.dense(prev, 2)
loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=out)
In the second case, I have the following:
out = tf.layers.dense(prev, 1, activation=tf.sigmoid)
loss = tf.losses.log_loss(labels=y, predictions=out)
Tensor y is a vector of 0/1 values; it is not one-hot encoded. The network learns as expected in the first case, but not in the second case. Apart from these two lines, everything else is kept the same.
I do not understand why the second set-up does not work. Interestingly, if I express the same network in Keras and use the second set-up, it works. Am I using the wrong TensorFlow functions to express my intent in the second case? I'd like to produce a single sigmoid output and use binary cross-entropy loss to train a simple binary classifier.
I'm using Python 3.6 and TensorFlow 1.4.
Here is a small, runnable Python script to demonstrate the issue. Note that you need to have downloaded the StatOil/C-CORE dataset from Kaggle to be able to run the script as is.
Thanks!
Using a sigmoid activation on two outputs doesn't give you a probability distribution:
import tensorflow as tf
import tensorflow.contrib.eager as tfe
tfe.enable_eager_execution()
start = tf.constant([[4., 5.]])
out_dense = tf.layers.dense(start, units=2)
print("Logits (un-transformed)", out_dense)
out_sigmoid = tf.layers.dense(start, units=2, activation=tf.sigmoid)
print("Elementwise sigmoid", out_sigmoid)
out_softmax = tf.nn.softmax(tf.layers.dense(start, units=2))
print("Softmax (probability distribution)", out_softmax)
Prints:
Logits (un-transformed) tf.Tensor([[-3.64021587 6.90115976]], shape=(1, 2), dtype=float32)
Elementwise sigmoid tf.Tensor([[ 0.94315267 0.99705648]], shape=(1, 2), dtype=float32)
Softmax (probability distribution) tf.Tensor([[ 0.05623185 0.9437682 ]], shape=(1, 2), dtype=float32)
Instead of tf.nn.softmax, you could also use tf.sigmoid on a single logit, then set the other output to one minus that.
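As a rough sketch of that last suggestion, reusing the eager-mode setup and the start tensor from the snippet above:
logit = tf.layers.dense(start, units=1)    # single un-activated output
p_one = tf.sigmoid(logit)                  # probability of class 1
p_zero = 1.0 - p_one                       # probability of class 0
print("Two-class distribution from one sigmoid", tf.concat([p_zero, p_one], axis=1))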

Layers for predicting financial data using Tensorflow/tflearn

I'd like to predict the interest rate, and I have some relevant factors such as stock indices and money supply figures; the number of factors may be up to 200.
For example, in the training data below, X contains the factors and y is the interest rate I want to train on and predict.
factor1 factor2 factor3 ... factor176 factor177 factor178
X= [[ 2.1428 6.1557 5.4101 ..., 5.86 6.0735 6.191 ]
[ 2.168 6.1533 5.2315 ..., 5.8185 6.0591 6.189 ]
[ 2.125 4.7965 3.9443 ..., 5.7845 5.9873 6.1283]...]
y= [[ 3.5593]
[ 3.014 ]
[ 2.7125]...]
So I want to use tensorflow/tflearn to train this model, but I don't really know exactly which method I should choose for regression. I have tried LinearRegression from tflearn before, but the results were not great.
For now, I just use the code I found online.
net = tflearn.input_data([None, 178])
net = tflearn.fully_connected(net, 64, activation='linear',
                              weight_decay=0.0005)
net = tflearn.fully_connected(net, 1, activation='linear')
net = tflearn.regression(net,
                         optimizer=tflearn.optimizers.AdaGrad(learning_rate=0.01,
                                                              initial_accumulator_value=0.01),
                         loss='mean_square', learning_rate=0.05)
model = tflearn.DNN(net, tensorboard_verbose=0, checkpoint_path='tmp/')
model.fit(X, y, show_metric=True,
          batch_size=1, n_epoch=100)
The result is that only about 50% of predictions fall within a ±10% error range.
I have tried using a 7-day window, but the result is still poor. So I want to know what additional layers I can use to make this network better.
First of all, this network makes no sense: if you do not have any activations on your hidden units, your network is equivalent to linear regression.
So, first, change
net = tflearn.fully_connected(net, 64, activation='linear',
                              weight_decay=0.0005)
to
net = tflearn.fully_connected(net, 64, activation='relu',
                              weight_decay=0.0005)
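To see why the purely linear hidden layer adds nothing, here is a small numpy sketch (arbitrary random weights) showing that two stacked linear layers collapse into a single linear map:
import numpy as np

x = np.array([1.0, 2.0, 3.0])
W1, b1 = np.random.randn(4, 3), np.random.randn(4)
W2, b2 = np.random.randn(1, 4), np.random.randn(1)

deep_linear = W2 @ (W1 @ x + b1) + b2           # two "linear" layers stacked
single_linear = (W2 @ W1) @ x + (W2 @ b1 + b2)  # one equivalent linear layer
print(np.allclose(deep_linear, single_linear))  # True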
Another general point: always normalise your data. Your X values are large and so are your y values; make sure they aren't, for example by whitening them (making them zero mean and unit standard deviation).
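As a minimal sketch of that whitening step (assuming X and y are NumPy arrays like those shown in the question):
# Standardise each factor column and the target to zero mean / unit std.
X_mean, X_std = X.mean(axis=0), X.std(axis=0)
y_mean, y_std = y.mean(), y.std()

X_norm = (X - X_mean) / X_std
y_norm = (y - y_mean) / y_std
# Remember to undo the scaling on predictions: y_pred = network_output * y_std + y_mean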
Finding the right architecture is a hard problem and you will not find any "magical recipes" for it. Start by understanding what you are doing: log your training and check whether the training loss converges to small values. If it does not, you are either not training long enough, the network is too small, or the training hyperparameters are off (for example, too large a learning rate or too much regularisation).

Encog Neural Network - How to actually run testing data

I've been able to train the network and get it down to the minimal error I want...
I don't actually see anywhere, even when I looked through the guide book, how to test the trained network on new data... I split off part of my training data so that I can test the network's results on data it was not trained on, since I'm using it for classification. Here is the code I've got; I'm not sure what to do with the MLData output. For classification, I just want to take the output neuron with the highest value, i.e. the one most likely to be the correct classification node.
MLDataSet testingSet = new BasicMLDataSet(testingTraining, testingIdeal);
System.out.println("Test Results:");
for (MLDataPair pair : testingSet) {
    final MLData output = network.compute(pair.getInput());
    // what do I do with this output?
}
(My testing data is obviously tagged with the correct classifications...)
Well, it depends on the problem you have at hand, but the idea is that your output should be as close as possible to the test dataset's output, so I suggest comparing the two. For example, if this is a classification task, your output will be iterable, and you should be able to work out which output class was selected and compare it to the target. You can then work out a misclassification rate, or any other measure of accuracy (precision, recall, F1-score, ...). So, something like:
int bad = 0;
for (MLDataPair pair : testingSet) {
    MLData output = network.compute(pair.getInput());
    if (outputClass(output) != outputClass(pair.getIdeal()))
        bad++;
}
double misclassificationRate = (double) bad / testingSet.size();
You would have to write outputClass appropriately so that it returns the classification output, of course.
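For example, one possible outputClass, just a helper you would add yourself (not part of Encog), is a simple argmax over the output neurons:
// Return the index of the output neuron with the highest value.
private static int outputClass(MLData data) {
    int best = 0;
    for (int i = 1; i < data.size(); i++) {
        if (data.getData(i) > data.getData(best)) {
            best = i;
        }
    }
    return best;
}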
For regression you can do something similar, but instead of mapping to a class you would look at some distance measure between the two outputs to work out your error.
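And a rough sketch of the regression case, accumulating a mean squared error over the test set with the same Encog calls as above:
// Mean squared error between network outputs and the ideal (target) outputs.
double sumSquaredError = 0;
int count = 0;
for (MLDataPair pair : testingSet) {
    MLData output = network.compute(pair.getInput());
    for (int i = 0; i < output.size(); i++) {
        double diff = output.getData(i) - pair.getIdeal().getData(i);
        sumSquaredError += diff * diff;
        count++;
    }
}
double mse = sumSquaredError / count;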

Resources