I am trying to classify different ECG signals. I am using Keras' Conv1D, but am not getting any good results.
I have tried changing the number of layers, the window size, etc., but every time I run this I get predictions that are all the same class (the classes are 0, 1, 2, so I get a prediction output like [1,1,1,1,1,1,1,1,1,1,1,1,1,1], though which class it collapses to changes each time I run the script).
The ECG signals are in 1000 point numpy arrays.
Are there any glaringly obvious things I am doing wrong here? I was thinking a few layers would work well to classify the signals into 3 different classes.
#arrange and randomize data
y1=[[0]]*len(lead1)
y2=[[1]]*len(lead2)
y3=[[2]]*len(lead3)
y=np.concatenate((y1,y2,y3))
data=np.concatenate((lead1,lead2,lead3))
data = keras.utils.normalize(data)
data=np.concatenate((data,y),axis=1)
data=np.random.permutation((data))
print(data)
#separate data and create categories
Xtrain=data[0:130,0:-1]
Xtrain=np.reshape(Xtrain,(len(Xtrain),1000,1))
Xpred=data[130:,0:-1]
Xpred=np.reshape(Xpred,(len(Xpred),1000,1))
Ytrain=data[0:130,-1]
Yt=to_categorical(Ytrain)
Ypred=data[130:,-1]
Yp=to_categorical(Ypred)
#create CNN model
model = Sequential()
model.add(Conv1D(20,20,activation='relu',input_shape=(1000,1)))
model.add(MaxPooling1D(3))
model.add(Conv1D(20,10,activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(20,10,activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dense(3,activation='relu',use_bias=False))
model.compile(optimizer='adam', loss='categorical_crossentropy',metrics=['accuracy'])
model.fit(Xtrain,Yt)
#test model
print(model.evaluate(Xpred,Yp))
print(model.predict_classes(Xpred,verbose=1))
Are there any glaringly obvious things I am doing wrong here?
Indeed there is: the output you report is not surprising, given that you are currently using the ReLU as activation for your last layer, which does not make any sense.
In multi-class settings, such as yours, the activation of the last layer must be the softmax, and certainly not the ReLU; change your last layer to:
model.add(Dense(3, activation='softmax'))
Not quite sure why you ask for use_bias=False, but you can try both with and without it and experiment...
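For reference, the corrected tail of the model then looks something like this (everything before GlobalAveragePooling1D stays as it is; whether to keep use_bias=False is up to you, as noted above):
model.add(GlobalAveragePooling1D())
model.add(Dense(3, activation='softmax'))   # one output per class, probabilities summing to 1
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])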
I am a deep learning beginner (about 3 months in) who is doing small NLP projects with PyTorch.
Recently I have been trying to reproduce a GAN introduced by a paper, using my own text data, to generate some specific kinds of question sentences.
Here is some background... if you have no time or interest in it, feel free to skip straight to the question below.
As the paper describes, the generator is first trained normally on ordinary question data so that its output at least looks like a real question. Then, using the result of an auxiliary classifier (which classifies the generator's outputs), the generator is trained again to generate only the specific (several unique categories of) questions.
However, since the paper does not release its code, I have to write it all myself. I have these three training ideas, but I do not know how they differ; could you kindly explain?
If they have almost the same effect, could you tell me which is more idiomatic in PyTorch? Thank you very much!
Suppose the discriminator's loss with respect to the generator is loss_G_D, the classifier's loss with respect to the generator is loss_G_C, and loss_G_D and loss_G_C have the same shape, i.e. [batch_size, loss value]. What is the difference between the following?
1.
optimizer.zero_grad()
loss_G_D = loss_func1(discriminator(generated_data))
loss_G_C = loss_func2(classifier(generated_data))
loss = loss_G_D + loss_G_C
loss.backward()
optimizer.step()
2.
optimizer.zero_grad()
loss_G_D = loss_func1(discriminator(generated_data))
loss_G_D.backward()
loss_G_C = loss_func2(classifier(generated_data))
loss_G_C.backward()
optimizer.step()
3.
optimizer.zero_grad()
loss_G_D = loss_func1(discriminator(generated_data))
loss_G_D.backward()
optimizer.step()
optimizer.zero_grad()
loss_G_C = loss_func2(classifier(generated_data))
loss_G_C.backward()
optimizer.step()
Additional info: I have observed that the classifier's classification loss is always very large compared with the generator's loss, e.g. -300 vs 3. So maybe the third one is better?
First of all:
loss.backward() backpropagates the error and assigns a gradient for every parameter along the way that has requires_grad=True.
optimizer.step() updates the model parameters using their stored gradients.
optimizer.zero_grad() sets the gradients to 0, so that you can backpropagate your loss and update your model parameters for each batch without interfering with other batches.
1 and 2 are quite similar, but if your model uses batch statistics or you have an adaptive optimizer they will probably perform differently. However, for instance, if your model doesn't use batch statistics and you have a plain old SGD optimizer, they will produce the same result, even though 1 would be faster since you do the backprop only once.
3 is a completely different case, since you update your model parameters with loss_G_D.backward() and optimizer.step() before processing and backpropagating loss_G_C.
Given all of these, it's up to you which one to choose depending on your application.
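For concreteness, here is a minimal sketch of options 1 and 2 in PyTorch, assuming the names from your snippets (generator, discriminator, classifier, loss_func1, loss_func2, optimizer, noise) already exist; with a plain SGD optimizer both variants accumulate the same gradients before the step:
# Option 1: sum the two losses and backpropagate once.
optimizer.zero_grad()
generated_data = generator(noise)
loss = loss_func1(discriminator(generated_data)) + loss_func2(classifier(generated_data))
loss.backward()
optimizer.step()
# Option 2: backpropagate each loss separately, then take a single step.
# The first backward must keep the graph alive, because both losses share the generator part of the graph.
optimizer.zero_grad()
generated_data = generator(noise)
loss_func1(discriminator(generated_data)).backward(retain_graph=True)
loss_func2(classifier(generated_data)).backward()   # gradients accumulate in the same .grad buffers
optimizer.step()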
After a few tries, I trained a GAN to produce semi-sensible output. In that model, it almost instantly found a solution and got stuck there. The loss for both the discriminator and generator was 0.68 (I used a BCE loss), and the accuracy for both hovered around 50%. The output of the generator looked at first glance good enough to be real data, but after analysing it I could see it was still not very good.
My solution was to increase the power of the discriminator (by increasing its size) and re-train. I hoped that making it larger would force the generator to create better samples. I got the following output.
It seems that as the generator loss increases and the generator produces worse samples, the discriminator can pick them out more easily.
When I check my output from the trained generator I see it follows some basic rules the real data is following, but again under closer scrutiny, they fail more complex tests the real data would pass. I would like to improve this.
My questions are:
Is my above interpretation of the plots correct?
For this run, have I made the discriminator too powerful? Should I increase the power of the generator?
Is there another technique I should investigate to stop this form of mode collapse?
EDIT: The architecture I am using is a form of Graph GAN. The generator is just a series of linear layers. The discriminator is 3 Graph Conv Layers, then some linear layers. Slightly similar to this paper. Two potentially unconventional things I am doing:
There is no batch normalisation; I have found it has a very negative effect on training, though I could try to persevere with it.
I am using StandardScaler to scale my data. This choice was made because it easily allows you to unscale data, which is useful: I can take the output of the generator and easily transform it back to the original scale. However, StandardScaler does not scale things between -1 and 1, so I cannot use tanh as the final activation function of my generator; instead, the final layer of the generator is just linear.
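For reference, a minimal sketch of that scaling/unscaling step, assuming the samples are flattened into rows of a 2-D array (feature_matrix and the shapes here are hypothetical):
import numpy as np
from sklearn.preprocessing import StandardScaler

feature_matrix = np.random.randn(100, 12)       # stand-in for the real training data, one sample per row

scaler = StandardScaler()
scaled = scaler.fit_transform(feature_matrix)   # zero mean, unit variance per column (not bounded in [-1, 1])

# generator outputs produced in the scaled space can be mapped back to the original scale:
generated_scaled = np.random.randn(5, 12)       # stand-in for generator output
generated_original = scaler.inverse_transform(generated_scaled)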
The outputs of the GAN (once rescaled and the shape has been changed) are similar to:
[[ 46.09169 -25.462175 20.705683 -31.696495 ]
[ 35.10637 -18.956036 15.20579 -24.803787 ]
[ 10.253135 -5.759581 5.9068713 -6.3003526]]
An example of the truth is:
[[ 45.6 30.294546 -17.218746 -29.41284 ]
[ 1.8186008 1.7064333 0.5984112 0.19312467]
[ 44.31433 28.234058 -17.615921 -29.262213 ]]
Notably, the top-left value in the matrix will always be 45.6. My Generator does not even consistently produce this.
I've created a model using Google Cloud's Vision API. I spent countless hours labeling data and trained a model. After almost 20 hours of "training", the model is still hit and miss.
How can I iterate on this model? I don't want to lose the "learning" it has done so far. It works about 3 out of 5 times.
My best guess is that I should loop over the objects again, find where it's wrong, and label accordingly. But I'm not sure of the best method for that. Should I be labeling all images where it "misses" as TEST data images? Are there best practices or resources I can read on this topic?
I'm by no means an expert, but here's what I'd suggest in order of most to least important:
1) Add more data if possible. More data is almost always a good thing and helps make your network's predictions more robust.
2) Add dropout layers to prevent over-fitting.
3) Have a tinker with the kernel and bias initialisers (a short sketch illustrating points 2 and 3 follows this list).
4) [The most relevant answer to your question] Save the training weights of your model and reload them into a new model prior to training.
5) Change up the type of model architecture you're using. Then, have a tinker with epoch numbers, validation splits, loss evaluation formulas, etc.
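As promised above, a minimal Keras illustration of points 2 and 3 (the layer sizes, dropout rate and he_normal initialiser are arbitrary example choices, not recommendations tuned to your data):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(128, activation='relu', kernel_initializer='he_normal',
          bias_initializer='zeros', input_shape=(784,)),
    Dropout(0.3),   # randomly zeroes 30% of activations during training to reduce over-fitting
    Dense(10, activation='softmax'),
])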
Hope this helps!
EDIT: More information about number 4
So you can save and load your model weights during or after the model has trained. See here for some more in-depth information about saving.
Broadly, let's cover the basics. I'm assuming you're going through Keras, but the same applies for tf.keras:
Saving the model after training
Simply call:
model_json = model.to_json()
with open("{Your_Model}.json", "w") as json_file:
    json_file.write(model_json)
# serialize weights to HDF5
model.save_weights("{Your_Model}.h5")
print("Saved model to disk")
Loading the model
You can load the model structure from json like so:
from keras.models import model_from_json
json_file = open('{Your_Model}.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
model = model_from_json(loaded_model_json)
And load the weights if you want to:
model.load_weights('{Your_Weights}.h5', by_name=True)
Then compile the model and you're ready to retrain/predict. by_name for me was essential to re-load the weights back into the same model architecture; leaving this out may cause an error.
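For example, a minimal continuation might look like this (the optimizer, loss and new_images/new_labels are placeholders; reuse whatever your original model was compiled and trained with):
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# continue training from the loaded weights rather than from scratch
model.fit(new_images, new_labels, epochs=5, validation_split=0.2)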
Checkpointing the model during training
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath={checkpoint_path},
                                                 save_weights_only=True,
                                                 verbose=1)
# Train the model with the new callback
model.fit(train_images,
          train_labels,
          epochs=10,
          validation_data=(test_images, test_labels),
          callbacks=[cp_callback])  # Pass callback to training
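Later you can restore the checkpointed weights into a model with the same architecture (checkpoint_path is the same placeholder used above):
# rebuild and compile the model first, then:
model.load_weights(checkpoint_path)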
I've built an LSTM in Keras with the goal of predicting future values of a time series from a high-dimensional, time-indexed input.
However, there's a unique requirement: for certain time points in the future, we know with certainty what some values of the input series will be. For example:
model = SomeLSTM()
trained_model = model.train(train_data)
known_data = [(24, {feature: 2, val: 7.0}), (25, {feature: 2, val: 8.0})]
predictions = trained_model(look_ahead=48, known_data=known_data)
Which would train the model up to time t (the end of training), and predict forward 48 time periods from time t, but substituting known_data values for feature 2 at times 24 and 25.
How exactly can I explicitly inject this into the LSTM at some time?
For reference, here's the model:
model = Sequential()
model.add(LSTM(hidden, input_shape=(look_back, num_features)))
model.add(Dropout(dropout))
model.add(Dense(look_ahead))
model.add(Activation('linear'))
This may be a result of my unintuitive grasp of LSTMs, and I'd appreciate any clarification. I've dived into the Keras source code, and my first guess is to inject the known values directly into the LSTM state variable, but I'm unsure how to do that at time t (or even whether that is correct).
I think a clean way of doing this is to introduce 2*look_ahead new features, where, for each 0 <= i < look_ahead, the 2*i-th feature is an indicator of whether the value at the i-th future time step is known and the (2*i+1)-th feature is the value itself (0 if not known). Accordingly, you can generate training data with these features so that your model learns to take the known values into account.
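A minimal sketch of how such features could be built for one sample, assuming known_data has the (time, {feature, val}) structure from the question and only the known value itself is encoded (build_known_features is a hypothetical helper, not part of any library):
import numpy as np

def build_known_features(known_data, look_ahead):
    # 2*look_ahead extra features: [known_flag_0, value_0, known_flag_1, value_1, ...]
    extra = np.zeros(2 * look_ahead)
    for t, info in known_data:
        if t < look_ahead:
            extra[2 * t] = 1.0           # indicator: the value at future step t is known
            extra[2 * t + 1] = info['val']
    return extra

known_data = [(24, {'feature': 2, 'val': 7.0}), (25, {'feature': 2, 'val': 8.0})]
extra_features = build_known_features(known_data, look_ahead=48)   # shape (96,)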
I am not exactly sure what you are trying to do, but maybe create your own layer to go at the end that sets the data to the known values, similar to how dropout sets random values to zero. As a side note, I have had better results with pooling than dropout, so maybe try switching that out and training it. Here is a good guide on how to do it. https://www.tutorialspoint.com/keras/keras_customized_layer.htm
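If you go down that route, a rough sketch of such a layer in tf.keras might look like the following (mask and values are hypothetical constants of shape (look_ahead,), with mask set to 1 at the time steps whose values are known); this is only an illustration, not tested against your setup:
import tensorflow as tf

class OverwriteKnown(tf.keras.layers.Layer):
    # Replaces the positions flagged in `mask` with the corresponding entries of `values`.
    def __init__(self, mask, values, **kwargs):
        super().__init__(**kwargs)
        self.mask = tf.constant(mask, dtype=tf.float32)
        self.values = tf.constant(values, dtype=tf.float32)

    def call(self, inputs):
        # keep the model's predictions where mask == 0, substitute the known values where mask == 1
        return inputs * (1.0 - self.mask) + self.values * self.mask

# usage: model.add(OverwriteKnown(mask, values)) after the final Dense(look_ahead) layer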
First of all: I'm completely new to Machine Learning and TensorFlow - I'm just playing around with this technology for a few weeks - and I really like it.
But I have a (maybe simple) question about the MNIST data set in combination with TensorFlow: I'm currently working through the "MNIST for ML Beginners" tutorial (https://www.tensorflow.org/versions/r0.11/tutorials/mnist/beginners/index.html#mnist-for-ml-beginners). I fully understand how the whole thing works and what the source code accomplishes.
My question is now the following:
Is it possible to see the individual weight parameters for each pixel? As far as I understand, I can't access the individual weight parameters for each pixel directly, because the tf.matmul() operation returns the sum over all weight parameters for a given class.
I want to access the individual weight parameters, because I want to see how these values are changing through the training process of the Neural Network.
Thanks for your help,
-Klaus
You can get the actual weights by just doing something like:
w = sess.run(W, feed_dict={x: batch_xs, y_: batch_ys})
print(w.shape)
If you want the per-pixel results, just do an element-wise multiply of batch_xs * w (reshaped appropriately).
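For instance, a rough sketch of those per-pixel contributions with NumPy, assuming the shapes from the MNIST tutorial (W is (784, 10), batch_xs is (batch_size, 784)):
import numpy as np

# w: (784, 10) weight matrix fetched with sess.run(W)
# batch_xs: (batch_size, 784) flattened images
# contributions[i, p, c] = contribution of pixel p of image i to the logit of class c
contributions = batch_xs[:, :, np.newaxis] * w[np.newaxis, :, :]   # shape (batch_size, 784, 10)

# the logits from tf.matmul(x, W) are the sums of these contributions over the pixels
logits = contributions.sum(axis=1)                                  # shape (batch_size, 10)

# to see how the weights for, say, class 0 evolve during training, reshape them into an image
weights_class0 = w[:, 0].reshape(28, 28)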