I'm a beginner with time-series neural networks (LSTM). I'm using one through pythonnet in Visual Studio. I've tried a lot of training runs without success: the accuracy never increases (it stays very small) and the loss is NaN. Is this caused by the data or by the network architecture? Can anyone help?
var trainx_data_numpy = data.trainX.reshape(218, 201, 1);
var trainY_data_numpy = data.trainY.reshape(218, 201, 1);
var model = new Sequential();
model.Add(new LSTM(128, activation: "relu",
    input_shape: new Shape(data.inputDimention.FD, data.inputDimention.SD),
    return_sequences: true));
model.Add(new Dropout(0.6));
model.Add(new LSTM(128, return_sequences: true));
model.Add(new Dropout(0.6));
model.Add(new LSTM(128));
model.Add(new Dense(1));
model.Compile(optimizer: "adam", loss: "mse", metrics: new string[] { "accuracy" });
var result = model.Fit(trainx_data_numpy, trainY_data_numpy,
    batch_size: 1, epochs: 10, verbose: 1, validation_split: 0.1f);
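For readers hitting the same symptoms, two common causes of a NaN loss are NaN or infinite values in the data and large unscaled values, which `activation: "relu"` inside an LSTM can amplify. Separately, "accuracy" is not a meaningful metric for a regression trained with MSE. Because the last LSTM has no `return_sequences` and feeds `Dense(1)`, the model outputs one value per sample, so a target shaped (samples, 1) would match it better than (218, 201, 1). The pure-Python sketch below shows hypothetical preprocessing helpers for these checks. The names are illustrative and do not come from Keras.NET, and it assumes a univariate series that is checked, scaled to [0, 1], and cut into many-to-one windows.

```python
import math

def non_finite_indices(values):
    """Return the positions of NaN or infinite entries; any hit here can turn the loss into NaN."""
    return [i for i, v in enumerate(values) if not math.isfinite(v)]

def min_max_scale(values):
    """Scale a flat sequence to [0, 1]; (lo, hi) is returned so predictions can be un-scaled later."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:  # constant series: avoid dividing by zero
        return [0.0 for _ in values], (lo, hi)
    return [(v - lo) / span for v in values], (lo, hi)

def make_windows(series, window):
    """Many-to-one samples: `window` consecutive steps as input, the following step as the single target."""
    xs, ys = [], []
    for start in range(len(series) - window):
        xs.append(series[start:start + window])
        ys.append(series[start + window])
    return xs, ys

raw = [10.0, 20.0, 30.0, 40.0, 50.0]
assert not non_finite_indices(raw)
scaled, bounds = min_max_scale(raw)
xs, ys = make_windows(scaled, window=3)
print(len(xs), len(xs[0]), len(ys))  # 2 samples, 3 steps each, 2 targets
```

The windows would then be reshaped to (samples, window, 1) for the LSTM input, and the targets to (samples, 1).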
Related
I'm creating an anti-phishing algorithm and I'm having a problem with the final labeling.
The possible labels are -1, 0, 1.
My network is built like this:
def create_model(self, n_hidden = 1, n_neurons = 30, learning_rate = 3e-3, input_shape = 30):
    model = Sequential()
    model.add(InputLayer(input_shape = input_shape))
    for layer in range(n_hidden):
        model.add(Dense(n_neurons, activation = "relu"))
    model.add(Dense(n_neurons, activation="softmax"))
    model.add(Dense(1))
    optimizer = SGD(learning_rate = learning_rate)
    model.compile(loss="mse", optimizer = optimizer, metrics=BinaryAccuracy())
    return model
The results I am getting look like this:
[ 0.9413019 0.23881185 0.34624332 ... 1.0277238 -1.0253198 -1.0065029 ]
How can I make my network round each result to one of the labels mentioned above?
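The model ends in a linear `Dense(1)` trained with MSE, which makes it a regressor, so continuous outputs are expected. Rounding to the nearest label is a quick fix. For three discrete labels, the more usual setup is classification; this is offered as a suggestion and is not the asker's code. Remap -1, 0, 1 to class indices 0, 1, 2, end the network with `Dense(3, activation="softmax")`, compile with `loss="sparse_categorical_crossentropy"`, and take the argmax of the predicted probabilities. The pure-Python sketch below shows only the label bookkeeping for both options. The helper names are hypothetical.

```python
LABELS = (-1, 0, 1)  # the three allowed labels from the question

def snap_to_label(value, labels=LABELS):
    """Quick fix for a regression output: return the nearest allowed label."""
    return min(labels, key=lambda lab: abs(lab - value))

def label_to_index(label, labels=LABELS):
    """Map -1, 0, 1 to class indices 0, 1, 2 for sparse_categorical_crossentropy."""
    return labels.index(label)

def probs_to_label(probs, labels=LABELS):
    """Decode one softmax row (one probability per class) back to -1, 0 or 1 via argmax."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best]

outputs = [0.9413019, 0.23881185, 0.34624332, 1.0277238, -1.0253198, -1.0065029]
print([snap_to_label(v) for v in outputs])  # [1, 0, 0, 1, -1, -1]
```

With the classification variant, `label_to_index` is applied to the training labels before `fit`, and `probs_to_label` is applied to each row returned by `predict`.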
I'm trying to classify 'news' data with a many-to-one LSTM model in Keras. What should I change in my model to increase accuracy?
Current accuracy: 55%
Number of labels: 68
Data dimensions:
train_seq_x (16254, 499)
encoded_train_y (16254, 68)
test_seq_x (1807, 499)
test_y (1807,)
Model definition:
def train_model(classifier, feature_vector_train, label, feature_vector_valid, is_neural_net):
    classifier.fit(feature_vector_train, label, epochs=10, batch_size=32, validation_split=0.05, shuffle=False)
    # predict the labels on validation dataset
    predictions = classifier.predict(feature_vector_valid)
    if is_neural_net:
        predictions = predictions.argmax(axis=-1)
    return metrics.accuracy_score(predictions, test_y)

def create_rnn_lstm():
    input_layer = layers.Input((train_seq_x.shape[1], ))
    embedding_layer = layers.Embedding(len(word_index) + 1, 300, weights=[embedding_matrix], trainable=False)(input_layer)
    lstm_layer1 = layers.LSTM(128)(embedding_layer)
    output_layer2 = layers.Dense(68, activation="softmax")(lstm_layer1)
    model = models.Model(inputs=input_layer, outputs=output_layer2)
    model.compile(optimizer=optimizers.Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

classifier = create_rnn_lstm()
classifier.summary()
accuracy = train_model(classifier, train_seq_x, encoded_train_y, test_seq_x, is_neural_net=True)
print("LSTM, Word Embeddings", accuracy)
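Two details in this code are worth checking before changing the architecture. First, `predictions.argmax(axis=-1)` returns column indices (0 to 67). Comparing them with `test_y` is only meaningful if `test_y` was encoded with the same label-to-column mapping that produced `encoded_train_y`. Second, Keras takes `validation_split` from the last samples of the arrays before any shuffling. If the data is ordered by category, the 5% validation slice may therefore cover only a few of the 68 classes. The pure-Python sketch below builds one mapping once and reuses it for encoding and decoding. The helper names are hypothetical and the category names are made up.

```python
def build_label_index(labels):
    """Fix one label -> column mapping; reuse it for training targets and for decoding predictions."""
    return {lab: i for i, lab in enumerate(sorted(set(labels)))}

def one_hot(labels, index):
    """Encode labels as one-hot rows using the shared mapping."""
    rows = []
    for lab in labels:
        row = [0] * len(index)
        row[index[lab]] = 1
        rows.append(row)
    return rows

def decode(prob_rows, index):
    """argmax each probability row, then translate the column back to the original label."""
    inverse = {i: lab for lab, i in index.items()}
    return [inverse[max(range(len(row)), key=row.__getitem__)] for row in prob_rows]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up category names, for illustration only.
index = build_label_index(["sport", "politics", "tech", "sport"])
preds = decode([[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]], index)
print(preds, accuracy(["sport", "tech"], preds))  # ['sport', 'politics'] 0.5
```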
I am trying to construct a siamese neural network that takes in two facial expressions and outputs the probability that the two images are similar. I have 5 people with 10 expressions per person, so 50 images in total, but with a siamese setup I can generate 2500 pairs (with repetition). I have already run dlib's facial landmark detection on each of the 50 images, so each of the two inputs to the siamese net is a flattened 136-element array. The siamese structure is below:
input_shape = (136,)
left_input = Input(input_shape, name = 'left')
right_input = Input(input_shape, name = 'right')
convnet = Sequential()
convnet.add(Dense(50,activation="relu"))
encoded_l = convnet(left_input)
encoded_r = convnet(right_input)
L1_layer = Lambda(lambda tensors:K.abs(tensors[0] - tensors[1]))
L1_distance = L1_layer([encoded_l, encoded_r])
prediction = Dense(1,activation='relu')(L1_distance)
siamese_net = Model(inputs=[left_input,right_input],outputs=prediction)
optimizer = Adam()
siamese_net.compile(loss="binary_crossentropy",optimizer=optimizer)
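One detail in this structure stands out. The final layer is `Dense(1, activation='relu')`, but the loss is `binary_crossentropy`, which expects a probability in (0, 1). A ReLU output can be exactly 0 or larger than 1. Keras clips predictions into [epsilon, 1 - epsilon] before taking logs, so those outputs get a very large loss and no useful gradient, which may contribute to a loss that stops moving. The usual choice for a siamese similarity score is `activation='sigmoid'`. The pure-Python sketch below compares the two output activations under a clipped binary cross-entropy. `EPS = 1e-7` is an assumed value chosen to mirror Keras' default epsilon.

```python
import math

EPS = 1e-7  # assumed clipping constant, mirroring Keras' default backend epsilon

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def binary_crossentropy(y_true, y_pred):
    """Binary cross-entropy for one prediction, clipped so log() stays finite."""
    p = min(max(y_pred, EPS), 1.0 - EPS)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# A ReLU head can emit 0 for a matching pair or 2.0 for a non-matching one:
print(binary_crossentropy(1, relu(-3.0)))    # about 16.1, clipped at the EPS floor
print(binary_crossentropy(0, relu(2.0)))     # about 16.1, clipped at 1 - EPS
# A sigmoid head always stays strictly inside (0, 1):
print(binary_crossentropy(1, sigmoid(0.0)))  # about 0.693 (ln 2)
```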
I have an array called x_train that holds 80% of all possible pairs; each element is a list of lists. The data is a 5x10x68x2 array: 5 people, 10 expressions per person, 68 facial landmarks, and 2 positions (x, y) per landmark.
x_train = [ [ [Person, Expression] , [Person, Expression] ], ...]
data = np.load('data.npy')
My training loop looks like this:
from collections import defaultdict
from random import shuffle

def train(data, labels, network, epochs, batch_size):
    track_loss = defaultdict(list)
    for i in range(0, epochs):
        iterations = len(labels)//batch_size
        remain = len(labels)%batch_size
        shuffle(labels)
        print('Epoch%s-----' %(i + 1))
        for j in range(0, iterations):
            batch = [j*batch_size, j*batch_size + batch_size]
            if(j == iterations - 1):
                batch[1] += remain
            mini_batch = np.zeros(shape = (batch[1] - batch[0], 2, 136))
            for k in range(batch[0], batch[1]):
                prepx = data[labels[k][0][0],labels[k][0][1],:,:]
                prepy = data[labels[k][1][0],labels[k][1][1],:,:]
                mini_batch[k - batch[0]][0] = prepx.flatten()
                mini_batch[k - batch[0]][1] = prepy.flatten()
            targets = np.array([1 if(labels[i][0][1] == labels[i][1][1]) else 0 for i in range(batch[0], batch[1])])
            new_batch = mini_batch.reshape(batch[1] - batch[0], 2, 136, 1)
            new_targets = targets.reshape(batch[1] - batch[0], 1)
            #print(mini_batch.shape, targets.shape)
            loss=siamese_net.train_on_batch(
                {
                    'left': mini_batch[:, 0, :],
                    'right': mini_batch[:, 1, :]
                },targets)
            track_loss['Epoch%s'%(i)].append(loss)
    return network, track_loss
siamese_net, track_loss = train(data, x_train,siamese_net, 20, 30)
The value of each element in the target array is either a 0 or 1, depending on whether the two expressions inputted into the net are different or the same.
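Under this rule only 1 in 10 of the 2500 pairs is a positive, so a model that always outputs 0 is already right about 90% of the time. The 50 pairs of an image with itself are also included. Balancing the classes is a common remedy; this is a suggestion, not a diagnosis. The pure-Python sketch below enumerates the pairs the way the question describes them (5 people, 10 expressions, ordered pairs with repetition). It then applies the same target rule as the training loop and draws a balanced subset. All helper names are hypothetical.

```python
import random
from itertools import product

N_PEOPLE, N_EXPRESSIONS = 5, 10

def all_pairs(n_people=N_PEOPLE, n_expr=N_EXPRESSIONS):
    """Every ordered [Person, Expression] pair, with repetition: 50 x 50 = 2500."""
    images = [(p, e) for p in range(n_people) for e in range(n_expr)]
    return list(product(images, repeat=2))

def target(pair):
    """Same rule as the training loop: 1 if both images share the expression index."""
    (_, e1), (_, e2) = pair
    return 1 if e1 == e2 else 0

def balanced_pairs(pairs, seed=0):
    """Keep every positive pair and an equally sized random sample of negatives."""
    pos = [p for p in pairs if target(p) == 1]
    neg = [p for p in pairs if target(p) == 0]
    rng = random.Random(seed)
    return pos + rng.sample(neg, len(pos))

pairs = all_pairs()
print(len(pairs), sum(target(p) for p in pairs))        # 2500 250
balanced = balanced_pairs(pairs)
print(len(balanced), sum(target(p) for p in balanced))  # 500 250
```

Each tuple indexes `data[person, expression]` the same way the inner lists of x_train do.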
I know the Omniglot example uses far more training and test images, but my network's loss doesn't decrease at all.
EDIT:
Here is the new loss with the fixed target tensor:
Epoch1-----Loss: 1.979214
Epoch2-----Loss: 1.631347
Epoch3-----Loss: 1.628090
Epoch4-----Loss: 1.634603
Epoch5-----Loss: 1.621578
Epoch6-----Loss: 1.631347
Epoch7-----Loss: 1.631347
Epoch8-----Loss: 1.631347
Epoch9-----Loss: 1.621578
Epoch10-----Loss: 1.634603
Epoch11-----Loss: 1.634603
Epoch12-----Loss: 1.621578
Epoch13-----Loss: 1.628090
Epoch14-----Loss: 1.624834
Epoch15-----Loss: 1.631347
Epoch16-----Loss: 1.634603
Epoch17-----Loss: 1.628090
Epoch18-----Loss: 1.631347
Epoch19-----Loss: 1.624834
Epoch20-----Loss: 1.624834
I want to know how to improve my architecture, my training procedure, or even my data preparation to improve the network's performance. I assumed that using dlib's facial landmark detection would reduce the complexity the network has to handle, but I am starting to doubt that hypothesis.
I want to use FanChenLinSupportVectorRegression in Accord.NET. The predictions are correct for the training inputs, but the model doesn't work for other inputs. What is my mistake?
In the example below the first prediction is good. However, for a configuration that was not in the training set, the prediction is always the same regardless of the input:
// Declare a very simple regression problem
// with only 2 input variables (x and y):
double[][] inputs =
{
new[] { 3.0, 1.0 },
new[] { 7.0, 1.0 },
new[] { 3.0, 1.0 },
new[] { 3.0, 2.0 },
new[] { 6.0, 1.0 },
};
// The task is to output a weighted sum of those numbers
// plus an independent constant term: 7.4x + 1.1y + 42
double[] outputs =
{
7.4*3.0 + 1.1*1.0 + 42.0,
7.4*7.0 + 1.1*1.0 + 42.0,
7.4*3.0 + 1.1*1.0 + 42.0,
7.4*3.0 + 1.1*2.0 + 42.0,
7.4*6.0 + 1.1*1.0 + 42.0,
};
// Create a LibSVM-based support vector regression algorithm
var teacher = new FanChenLinSupportVectorRegression<Gaussian>()
{
Tolerance = 1e-5,
// UseKernelEstimation = true,
// UseComplexityHeuristic = true
Complexity = 10000,
Kernel = new Gaussian(0.1)
};
// Use the algorithm to learn the machine
var svm = teacher.Learn(inputs, outputs);
// Get machine's predictions for inputs
double[] prediction = svm.Score(inputs);
// OK: the predictions are correct for the training inputs
double[][] inputs1 =
{
new[] { 2.0, 2.0 },
new[] { 5.0, 1.0 },
};
prediction = svm.Score(inputs1);
// predictions are wrong! what is my mistake?
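A likely explanation rests on one assumption: that Accord's `Gaussian(sigma)` kernel computes k(x, y) = exp(-||x - y||² / (2σ²)). With sigma = 0.1, two inputs only 1 unit apart already have a similarity of exp(-50), about 2e-22. An SVR prediction is a weighted sum of kernel values against the support vectors plus a bias. For any input that is not almost identical to a training point, every kernel term vanishes and the output collapses to the bias. That is why the prediction is the same for every unseen input. The pure-Python sketch below evaluates that kernel for the two new inputs from the question. Things worth trying are a wider sigma (or the commented-out `UseKernelEstimation`), or a linear kernel, since the target function here is linear.

```python
import math

# Training inputs from the question
TRAIN = [(3.0, 1.0), (7.0, 1.0), (3.0, 1.0), (3.0, 2.0), (6.0, 1.0)]

def gaussian_kernel(x, y, sigma):
    """exp(-||x - y||^2 / (2 * sigma^2)), the assumed form of the Gaussian kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def max_similarity(x, sigma, train=TRAIN):
    """Largest kernel value between x and any training point."""
    return max(gaussian_kernel(x, t, sigma) for t in train)

for x in [(2.0, 2.0), (5.0, 1.0)]:
    # sigma = 0.1 (as in the question) vs. an illustrative wider sigma = 2.0
    print(x, max_similarity(x, 0.1), max_similarity(x, 2.0))
```

With sigma = 0.1 both similarities are vanishingly small. With sigma = 2.0 the nearest training point still contributes strongly.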
I am going to do anomaly detection on my own images using the example from the deeplearning4j platform, and I changed the code like this:
int rngSeed=123;
Random rnd = new Random(rngSeed);
int width=28;
int height=28;
int batchSize = 128;
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.seed(12345)
.iterations(1)
.weightInit(WeightInit.XAVIER)
.updater(Updater.ADAGRAD)
.activation(Activation.RELU)
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
.learningRate(0.05)
.regularization(true).l2(0.0001)
.list()
.layer(0, new DenseLayer.Builder().nIn(784).nOut(250)
.build())
.layer(1, new DenseLayer.Builder().nIn(250).nOut(10)
.build())
.layer(2, new DenseLayer.Builder().nIn(10).nOut(250)
.build())
.layer(3, new OutputLayer.Builder().nIn(250).nOut(784)
.lossFunction(LossFunctions.LossFunction.MSE)
.build())
.pretrain(false).backprop(true)
.build();
MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.setListeners(Collections.singletonList((IterationListener) new ScoreIterationListener(1)));
File trainData = new File("mnist_png/training");
FileSplit fsTrain = new FileSplit(trainData, NativeImageLoader.ALLOWED_FORMATS, rnd);
ImageRecordReader recorderReader = new ImageRecordReader(height, width);
recorderReader.initialize(fsTrain);
DataSetIterator dataIt = new RecordReaderDataSetIterator(recorderReader, batchSize);
List<INDArray> featuresTrain = new ArrayList<>();
while(dataIt.hasNext()){
DataSet ds = dataIt.next();
featuresTrain.add(ds.getFeatureMatrix());
}
System.out.println("************ training **************");
int nEpochs = 30;
for( int epoch=0; epoch<nEpochs; epoch++ ){
for(INDArray data : featuresTrain){
net.fit(data,data);
}
System.out.println("Epoch " + epoch + " complete");
}
And it threw an exception while training:
Exception in thread "main" org.deeplearning4j.exception.DL4JInvalidInputException: Input that is not a matrix; expected matrix (rank 2), got rank 4 array with shape [128, 1, 28, 28]
at org.deeplearning4j.nn.layers.BaseLayer.preOutput(BaseLayer.java:363)
at org.deeplearning4j.nn.layers.BaseLayer.activate(BaseLayer.java:384)
at org.deeplearning4j.nn.layers.BaseLayer.activate(BaseLayer.java:405)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.activationFromPrevLayer(MultiLayerNetwork.java:590)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.feedForwardToLayer(MultiLayerNetwork.java:713)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(MultiLayerNetwork.java:1821)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:151)
at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:54)
at org.deeplearning4j.optimize.Solver.optimize(Solver.java:51)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1443)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1408)
at org.deeplearning4j.examples.dataExamples.AnomalyTest.main(AnomalyTest.java:86)
It seems that my input is a rank-4 array while the network expects a rank-2 matrix. So the question is: how do I convert the ImageRecordReader output (or use something else) to make it run properly?
So first of all, you may want to understand what a tensor is:
http://nd4j.org/tensor
The record reader returns a multi-dimensional image. You need to flatten it before it can be used with a 2d neural net, unless you plan on using CNNs for part of your training.
If you take a look at the exception (again, you really should be familiar with ndarrays: they aren't new and are used in every deep learning library), you'll see a shape of:
[128, 1, 28, 28]
That is batch size × channels × rows × columns. You need to add:
.setInputType(InputType.convolutional(28,28,1))
This tells DL4J that it needs to flatten the 4d input to 2d. Here it indicates rows × columns × channels of 28 x 28 x 1.
If you add this at the end of your configuration (on the list builder, before the final .build()), it will work.
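Conceptually, setInputType makes DL4J add a preprocessor between the input and the first DenseLayer. That preprocessor reshapes every [1, 28, 28] image into one 784-long row, turning the [128, 1, 28, 28] batch into the [128, 784] matrix that nIn(784) expects. The pure-Python sketch below illustrates only that shape arithmetic. It is not DL4J code, and the element ordering shown is an assumption.

```python
def flatten_nchw(batch):
    """[N][C][H][W] nested lists -> [N][C*H*W]: channel, then row, then column within each sample."""
    return [[v for channel in sample for row in channel for v in row] for sample in batch]

def shape(nested):
    """Dimensions of a rectangular nested list, read from the first element at each level."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0] if nested else None
    return dims

# Two fake 1-channel 28x28 "images" standing in for a [128, 1, 28, 28] batch
batch = [[[[0.0] * 28 for _ in range(28)]] for _ in range(2)]
print(shape(batch), "->", shape(flatten_nchw(batch)))  # [2, 1, 28, 28] -> [2, 784]
```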
Also, if you are trying to do anomaly detection, note that DL4J has variational autoencoders you may want to look into.