Anomaly detection on my own images using deeplearning4j

I am going to do anomaly detection on my own images, using the example from the deeplearning4j platform. I changed the code like this:
int rngSeed = 123;
Random rnd = new Random(rngSeed);
int width = 28;
int height = 28;
int batchSize = 128;

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(12345)
        .iterations(1)
        .weightInit(WeightInit.XAVIER)
        .updater(Updater.ADAGRAD)
        .activation(Activation.RELU)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .learningRate(0.05)
        .regularization(true).l2(0.0001)
        .list()
        .layer(0, new DenseLayer.Builder().nIn(784).nOut(250).build())
        .layer(1, new DenseLayer.Builder().nIn(250).nOut(10).build())
        .layer(2, new DenseLayer.Builder().nIn(10).nOut(250).build())
        .layer(3, new OutputLayer.Builder().nIn(250).nOut(784)
                .lossFunction(LossFunctions.LossFunction.MSE)
                .build())
        .pretrain(false).backprop(true)
        .build();

MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.setListeners(Collections.singletonList((IterationListener) new ScoreIterationListener(1)));

File trainData = new File("mnist_png/training");
FileSplit fsTrain = new FileSplit(trainData, NativeImageLoader.ALLOWED_FORMATS, rnd);
ImageRecordReader recorderReader = new ImageRecordReader(height, width);
recorderReader.initialize(fsTrain);
DataSetIterator dataIt = new RecordReaderDataSetIterator(recorderReader, batchSize);

List<INDArray> featuresTrain = new ArrayList<>();
while (dataIt.hasNext()) {
    DataSet ds = dataIt.next();
    featuresTrain.add(ds.getFeatureMatrix());
}

System.out.println("************ training **************");
int nEpochs = 30;
for (int epoch = 0; epoch < nEpochs; epoch++) {
    for (INDArray data : featuresTrain) {
        net.fit(data, data); // autoencoder: the input is also the training target
    }
    System.out.println("Epoch " + epoch + " complete");
}
It threw this exception while training:
Exception in thread "main" org.deeplearning4j.exception.DL4JInvalidInputException: Input that is not a matrix; expected matrix (rank 2), got rank 4 array with shape [128, 1, 28, 28]
at org.deeplearning4j.nn.layers.BaseLayer.preOutput(BaseLayer.java:363)
at org.deeplearning4j.nn.layers.BaseLayer.activate(BaseLayer.java:384)
at org.deeplearning4j.nn.layers.BaseLayer.activate(BaseLayer.java:405)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.activationFromPrevLayer(MultiLayerNetwork.java:590)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.feedForwardToLayer(MultiLayerNetwork.java:713)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(MultiLayerNetwork.java:1821)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:151)
at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:54)
at org.deeplearning4j.optimize.Solver.optimize(Solver.java:51)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1443)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1408)
at org.deeplearning4j.examples.dataExamples.AnomalyTest.main(AnomalyTest.java:86)
It seems that my input dataset is rank 4 while the network needs rank 2 (a matrix), so the question is: how do I convert the ImageRecordReader output (or something else) to make it run properly?

So first of all, you may want to understand what a tensor is:
http://nd4j.org/tensor
The record reader returns a multi-dimensional image; you need to flatten it in order for it to be used with a 2d neural net, unless you plan on using CNNs for part of your training.
If you take a look at the exception (again, you really should be familiar with ndarrays; they aren't new and are used in every deep learning library), you'll see a shape of:
[128, 1, 28, 28]
That is batch size by channels by rows by columns. You need to add:
.setInputType(InputType.convolutional(28,28,1))
This tells dl4j that it needs to flatten the 4d input to 2d. In this case it indicates rows, columns, channels of 28 x 28 x 1.
If you add this to the bottom of your config it will work.
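For concreteness, here is a sketch of the question's configuration with that single line added near the bottom; everything else is unchanged from the code above:

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(12345)
        .iterations(1)
        .weightInit(WeightInit.XAVIER)
        .updater(Updater.ADAGRAD)
        .activation(Activation.RELU)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .learningRate(0.05)
        .regularization(true).l2(0.0001)
        .list()
        .layer(0, new DenseLayer.Builder().nIn(784).nOut(250).build())
        .layer(1, new DenseLayer.Builder().nIn(250).nOut(10).build())
        .layer(2, new DenseLayer.Builder().nIn(10).nOut(250).build())
        .layer(3, new OutputLayer.Builder().nIn(250).nOut(784)
                .lossFunction(LossFunctions.LossFunction.MSE)
                .build())
        // height, width, channels: tells dl4j to flatten the rank-4
        // [batch, 1, 28, 28] image input into rank-2 [batch, 784]
        .setInputType(InputType.convolutional(28, 28, 1))
        .pretrain(false).backprop(true)
        .build();

Alternatively, you could flatten each batch yourself before calling fit, e.g. with data.reshape(data.size(0), 784) on the INDArray; setInputType just keeps the rest of the code unchanged.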
Of note, if you are trying to do anomaly detection: we also have variational autoencoders you may want to look into as well.

Related

How to feed previous time-stamp prediction as additional input to the next time-stamp?

This question might have been asked before, but I got confused.
I am trying to apply one of the RNN types, e.g. LSTM, for time-series forecasting. I have inputs, y (stock returns). For each timestamp, I'd like to get the predictions. Q1: Am I correct in choosing a seq2seq approach?
I also want to use the predictions from the previous timestamp (initializing the initial values with some constant) as an additional input (still alongside my existing inputs), in the form of squared residuals, i.e. using
eps_{t-1} = (y_{t-1} - ŷ_{t-1})^2 as an additional input at t (as well as the previous inputs).
So, how can I do this in tensorflow or in pytorch?
I tried to depict what I want in the attached graph.
P.S. Sorry if the question is poorly formulated.
Let's say your input is of dimension (32, 10, 1): batch_size 32, time steps of length 10, and dimension 1. The same goes for your target (stock returns). This code makes use of the tf.scan function, which is useful when implementing custom recurrent networks (it iterates over the timesteps). It remains to use the residual of t-1 at t somewhere, as you would like to.
P.S.: it is a very basic implementation of an LSTM from scratch, without any biases or output activation.
import tensorflow as tf
import numpy as np

tf.reset_default_graph()

BS = 32           # batch size
TS = 10           # time steps
inputs_dim = 1
target_dim = 1

inputs = tf.placeholder(shape=[BS, TS, inputs_dim], dtype=tf.float32)
stock_returns = tf.placeholder(shape=[BS, TS, target_dim], dtype=tf.float32)

state_size = 16

# initial hidden state
init_state = tf.placeholder(shape=[2, BS, state_size],
                            dtype=tf.float32, name='initial_state')
# initializer
xav_init = tf.contrib.layers.xavier_initializer
# params
W = tf.get_variable('W', shape=[4, state_size, state_size],
                    initializer=xav_init())
U = tf.get_variable('U', shape=[4, inputs_dim, state_size],
                    initializer=xav_init())
W_out = tf.get_variable('W_out', shape=[state_size, target_dim],
                        initializer=xav_init())

# the function to feed tf.scan with
def step(prev, inputs_):
    # unpack all inputs and previous outputs
    st_1, ct_1 = prev[0][0], prev[0][1]
    x = inputs_[0]
    target = inputs_[1]
    # get previous squared residual
    eps = prev[1]
    """
    here do whatever you want with eps_{t-1},
    like x += eps if x is of the same dimension,
    or include it somewhere in your graph
    """
    # lstm gates (add bias if needed)
    #
    # input gate
    i = tf.sigmoid(tf.matmul(x, U[0]) + tf.matmul(st_1, W[0]))
    # forget gate
    f = tf.sigmoid(tf.matmul(x, U[1]) + tf.matmul(st_1, W[1]))
    # output gate
    o = tf.sigmoid(tf.matmul(x, U[2]) + tf.matmul(st_1, W[2]))
    # gate weights
    g = tf.tanh(tf.matmul(x, U[3]) + tf.matmul(st_1, W[3]))
    ct = ct_1 * f + g * i
    st = tf.tanh(ct) * o
    """
    make the prediction, compute the residual at t,
    and pass it on to t+1.
    Normally we would compute predictions outside the scan function,
    but as we need the residual here, we just keep the prediction
    and return it as an extra output of the scan function.
    """
    prediction_t = tf.matmul(st, W_out)  # + bias
    eps = (target - prediction_t) ** 2
    return [tf.stack((st, ct), axis=0), eps, prediction_t]

states, eps, preds = tf.scan(step,
                             [tf.transpose(inputs, [1, 0, 2]),
                              tf.transpose(stock_returns, [1, 0, 2])],
                             initializer=[init_state,
                                          tf.zeros((BS, 1), dtype=tf.float32),
                                          tf.zeros((BS, 1), dtype=tf.float32)])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(preds, feed_dict={inputs: np.random.rand(BS, TS, inputs_dim),
                                     stock_returns: np.random.rand(BS, TS, target_dim),
                                     init_state: np.zeros((2, BS, state_size))})
    # note: tf.transpose on the numpy result wraps it back into a Tensor,
    # which is why the print below shows a Tensor rather than an array
    out = tf.transpose(out, [1, 0, 2])
    print(out)
And the output:
Tensor("transpose_2:0", shape=(32, 10, 1), dtype=float32)
Base code from here

Training Loss Isn't Decreasing Across Epochs

I am trying to construct a siamese neural network that takes in two facial expressions and outputs the probability that the two images are similar. I have 5 people, with 10 expressions per person, so 50 total images, but with a siamese setup I can generate 2500 pairs (with repetition). I have already run dlib's facial landmark detection on each of the 50 images, so each of the two inputs to the siamese net is a flattened 136-element array. The siamese structure is below:
# imports assumed by this snippet (standard Keras)
from keras.models import Model, Sequential
from keras.layers import Input, Dense, Lambda
from keras.optimizers import Adam
from keras import backend as K

input_shape = (136,)
left_input = Input(input_shape, name='left')
right_input = Input(input_shape, name='right')

# shared encoder applied to both inputs
convnet = Sequential()
convnet.add(Dense(50, activation="relu"))
encoded_l = convnet(left_input)
encoded_r = convnet(right_input)

# L1 distance between the two encodings
L1_layer = Lambda(lambda tensors: K.abs(tensors[0] - tensors[1]))
L1_distance = L1_layer([encoded_l, encoded_r])
prediction = Dense(1, activation='relu')(L1_distance)

siamese_net = Model(inputs=[left_input, right_input], outputs=prediction)
optimizer = Adam()
siamese_net.compile(loss="binary_crossentropy", optimizer=optimizer)
I have an array called x_train, which holds 80% of all possible pair labels and whose elements are lists of lists. data is a 5x10x68x2 matrix: 5 people, 10 expressions per person, 68 facial landmarks, and 2 positions (x, y) per landmark (68 x 2 = 136, matching the input shape).
x_train = [ [ [Person, Expression] , [Person, Expression] ], ...]
data = np.load('data.npy')
My train loop goes as follows:
# imports assumed by this snippet
from collections import defaultdict
from random import shuffle
import numpy as np

def train(data, labels, network, epochs, batch_size):
    track_loss = defaultdict(list)
    for i in range(0, epochs):
        iterations = len(labels) // batch_size
        remain = len(labels) % batch_size
        shuffle(labels)
        print('Epoch%s-----' % (i + 1))
        for j in range(0, iterations):
            batch = [j * batch_size, j * batch_size + batch_size]
            if j == iterations - 1:
                batch[1] += remain
            mini_batch = np.zeros(shape=(batch[1] - batch[0], 2, 136))
            for k in range(batch[0], batch[1]):
                prepx = data[labels[k][0][0], labels[k][0][1], :, :]
                prepy = data[labels[k][1][0], labels[k][1][1], :, :]
                mini_batch[k - batch[0]][0] = prepx.flatten()
                mini_batch[k - batch[0]][1] = prepy.flatten()
            # 1 if the two expressions are the same, 0 otherwise
            targets = np.array([1 if labels[k][0][1] == labels[k][1][1] else 0
                                for k in range(batch[0], batch[1])])
            new_batch = mini_batch.reshape(batch[1] - batch[0], 2, 136, 1)
            new_targets = targets.reshape(batch[1] - batch[0], 1)
            # print(mini_batch.shape, targets.shape)
            loss = siamese_net.train_on_batch(
                {
                    'left': mini_batch[:, 0, :],
                    'right': mini_batch[:, 1, :]
                }, targets)
            track_loss['Epoch%s' % (i)].append(loss)
    return network, track_loss

siamese_net, track_loss = train(data, x_train, siamese_net, 20, 30)
The value of each element in the target array is either 0 or 1, depending on whether the two expressions fed into the net are different or the same.
Although I have seen that the Omniglot example used far more training and test images, my neural net doesn't show any decrease in loss.
EDIT:
Here is the new loss with the fixed target tensor:
Epoch1-----Loss: 1.979214
Epoch2-----Loss: 1.631347
Epoch3-----Loss: 1.628090
Epoch4-----Loss: 1.634603
Epoch5-----Loss: 1.621578
Epoch6-----Loss: 1.631347
Epoch7-----Loss: 1.631347
Epoch8-----Loss: 1.631347
Epoch9-----Loss: 1.621578
Epoch10-----Loss: 1.634603
Epoch11-----Loss: 1.634603
Epoch12-----Loss: 1.621578
Epoch13-----Loss: 1.628090
Epoch14-----Loss: 1.624834
Epoch15-----Loss: 1.631347
Epoch16-----Loss: 1.634603
Epoch17-----Loss: 1.628090
Epoch18-----Loss: 1.631347
Epoch19-----Loss: 1.624834
Epoch20-----Loss: 1.624834
I want to know how to improve my architecture, training procedure, or even data prep in order to improve the performance of the neural net. I assumed that using dlib's facial landmark detection would reduce the complexity the neural net has to handle, but I am starting to doubt that hypothesis.

Custom loss function for U-net in keras using class weights: `class_weight` not supported for 3+ dimensional targets

Here's the code I'm working with (pulled from Kaggle mostly):
inputs = Input((IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS))
...
outputs = Conv2D(4, (1, 1), activation='sigmoid') (c9)
model = Model(inputs=[inputs], outputs=[outputs])
model.compile(optimizer='adam', loss='dice', metrics=[mean_iou])
results = model.fit(X_train, Y_train, validation_split=0.1, batch_size=8, epochs=30, class_weight=class_weights)
I have 4 classes that are very imbalanced: class A is 70%, class B 15%, class C 10%, and class D 5%. However, I care most about class D. So I did the following type of calculation: D_weight = A/D = 70/5 = 14, and so on for the weights of classes B and C. (If there are better methods to select these weights, feel free to suggest them.)
In the last line, I'm trying to set class_weights accordingly: class_weights = {0: 1.0, 1: 6, 2: 7, 3: 14}.
However, when I do this, I get the following error:
class_weight not supported for 3+ dimensional targets.
Is it possible to add a dense layer after the last layer, just as a dummy layer, so I can pass the class_weights, and then only use the output of the last conv2d layer to do the prediction?
If this is not possible, how would I modify the loss function? (I'm aware of this post; however, just passing the weights into the loss function won't cut it, because the loss function is called separately for each class.) Currently, I'm using the following loss function:
def dice_coef(y_true, y_pred):
    smooth = 1.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def bce_dice_loss(y_true, y_pred):
    return 0.5 * binary_crossentropy(y_true, y_pred) - dice_coef(y_true, y_pred)
But I don't see any way to input class weights there. If someone wants the full working code, see this post; just remember to change the final conv2d layer's number of classes to 4 instead of 1.
You can always apply the weights yourself.
The originalLossFunc below is the loss function you import from keras.losses.
The weightsList is your list with the weights ordered by class.
def weightedLoss(originalLossFunc, weightsList):

    def lossFunc(true, pred):

        axis = -1  # if channels last
        # axis = 1  # if channels first

        # argmax returns the index of the element with the greatest value;
        # done on the class axis, it returns the class index
        classSelectors = K.argmax(true, axis=axis)
        # if your loss is sparse, use only true as classSelectors

        # considering weights are ordered by class, for each class this is
        # true(1) where the class index is equal to the weight index
        classSelectors = [K.equal(i, classSelectors) for i in range(len(weightsList))]

        # casting boolean to float for calculations;
        # each tensor in the list contains 1 where the ground truth class
        # equals its index (if you sum all these, you get a tensor full of ones)
        classSelectors = [K.cast(x, K.floatx()) for x in classSelectors]

        # for each of the selections above, multiply by the respective weight
        weights = [sel * w for sel, w in zip(classSelectors, weightsList)]

        # sum all the selections;
        # the result is a tensor with the respective weight for each element in predictions
        weightMultiplier = weights[0]
        for i in range(1, len(weights)):
            weightMultiplier = weightMultiplier + weights[i]

        # make sure your originalLossFunc only collapses the class axis;
        # you need the other axes intact to multiply by the weights tensor
        loss = originalLossFunc(true, pred)
        loss = loss * weightMultiplier

        return loss
    return lossFunc
For using this in compile:
model.compile(loss=weightedLoss(keras.losses.categorical_crossentropy, weights),
              optimizer=..., ...)
Changing the class balance directly on the input data
You can change the balance of the input samples too.
For instance, if you have 5 samples from class 1 and 10 samples from class 2, pass the samples from class 1 twice in the input arrays.
Using the sample_weight argument
Instead of working "by class", you can also work "by sample".
Create an array of weights for each sample in your input array: len(x_train) == len(weights).
Then fit, passing this array to the sample_weight argument.
(If it's fit_generator, the generator will have to return the weights along with the train/true pairs: return/yield inputs, targets, weights)

Encog Backpropagation Error not changing

The total error for the network did not change over 100,000 iterations.
The input is 22 values and the output is a single value. The input array is [195][22] and the output array is [195][1].
BasicNetwork network = new BasicNetwork();
network.addLayer(new BasicLayer(null, true, 22));
network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 10));
network.addLayer(new BasicLayer(new ActivationSigmoid(), false, 1));
network.getStructure().finalizeStructure();
network.reset();

MLDataSet training_data = new BasicMLDataSet(input, target_output);
final Backpropagation train = new Backpropagation(network, training_data);

int epoch = 1;
do {
    train.iteration();
    System.out.println("Epoch #" + epoch + " Error:" + train.getError());
    epoch++;
} while (train.getError() > 0.01);
train.finishTraining();
What is wrong with this code?
Depending on the data you are trying to classify, your network may be too small to transform the search space into a linearly separable problem, so try adding more neurons or layers; this will probably take longer to train. (Unless the problem is already linearly separable, in which case a NN may be an inefficient way to solve it anyway.)
Also, you don't have a training strategy: if the NN falls into a local minimum on the error surface, it will be stuck there. See the Encog user guide, https://s3.amazonaws.com/heatonresearch-books/free/Encog3Java-User.pdf; pg. 166 has a list of training strategies. For example:
final int strategyCycles = 50;
final double strategyError = 0.25;
train.addStrategy(new ResetStrategy(strategyError,strategyCycles));
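For concreteness, here is a sketch of where that strategy slots into the question's training code (same Encog 3 API as above; the 0.25 error threshold and 50-cycle count are illustrative values, not tuned):

final Backpropagation train = new Backpropagation(network, training_data);

// if the error is still above strategyError after strategyCycles
// iterations, reset the weights and start over, so the search can
// escape a bad starting point or local minimum
final int strategyCycles = 50;
final double strategyError = 0.25;
train.addStrategy(new ResetStrategy(strategyError, strategyCycles));

int epoch = 1;
do {
    train.iteration();
    System.out.println("Epoch #" + epoch + " Error:" + train.getError());
    epoch++;
} while (train.getError() > 0.01);
train.finishTraining();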

MultiLayerNetwork to predict simple function

I'm trying to develop some intuition for machine learning. I looked over the examples from https://github.com/deeplearning4j/dl4j-0.4-examples and wanted to develop my own example. Basically I just took a simple function, a * a + b * b + c * c - a * b * c + a + b + c, generated 10000 outputs for random a, b, c, and tried to train my network on 90% of the inputs. The thing is, no matter what I do, my network never gets to predicting the rest of the examples.
Here is my code:
public class BasicFunctionNN {
    private static Logger log = LoggerFactory.getLogger(BasicFunctionNN.class);

    public static DataSetIterator generateFunctionDataSet() {
        Collection<DataSet> list = new ArrayList<>();
        for (int i = 0; i < 100000; i++) {
            double a = Math.random();
            double b = Math.random();
            double c = Math.random();
            double output = a * a + b * b + c * c - a * b * c + a + b + c;
            INDArray in = Nd4j.create(new double[]{a, b, c});
            INDArray out = Nd4j.create(new double[]{output});
            list.add(new DataSet(in, out));
        }
        return new ListDataSetIterator(list, list.size());
    }

    public static void main(String[] args) throws Exception {
        DataSetIterator iterator = generateFunctionDataSet();
        Nd4j.MAX_SLICES_TO_PRINT = 10;
        Nd4j.MAX_ELEMENTS_PER_SLICE = 10;

        final int numInputs = 3;
        int outputNum = 1;
        int iterations = 100;

        log.info("Build model....");
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .iterations(iterations).weightInit(WeightInit.XAVIER).updater(Updater.SGD).dropOut(0.5)
                .learningRate(.8).regularization(true)
                .l1(1e-1).l2(2e-4)
                .optimizationAlgo(OptimizationAlgorithm.LINE_GRADIENT_DESCENT)
                .list(3)
                .layer(0, new DenseLayer.Builder().nIn(numInputs).nOut(8)
                        .activation("identity")
                        .build())
                .layer(1, new DenseLayer.Builder().nIn(8).nOut(8)
                        .activation("identity")
                        .build())
                .layer(2, new OutputLayer.Builder(LossFunctions.LossFunction.RMSE_XENT)
                        .activation("identity")
                        .weightInit(WeightInit.XAVIER)
                        .nIn(8).nOut(outputNum).build())
                .backprop(true).pretrain(false)
                .build();

        // run the model
        MultiLayerNetwork model = new MultiLayerNetwork(conf);
        model.init();
        model.setListeners(Collections.singletonList((IterationListener) new ScoreIterationListener(iterations)));

        // get the dataset using the record reader; the dataset iterator handles vectorization
        DataSet next = iterator.next();
        SplitTestAndTrain testAndTrain = next.splitTestAndTrain(0.9);
        System.out.println(testAndTrain.getTrain());
        model.fit(testAndTrain.getTrain());

        // evaluate the model
        Evaluation eval = new Evaluation(10);
        DataSet test = testAndTrain.getTest();
        INDArray output = model.output(test.getFeatureMatrix());
        eval.eval(test.getLabels(), output);
        log.info(">>>>>>>>>>>>>>");
        log.info(eval.stats());
    }
}
I also played with the learning rate, and it often happened that the score didn't improve:
10:48:51.404 [main] DEBUG o.d.o.solvers.BackTrackLineSearch - Exited line search after maxIterations termination condition; score did not improve (bestScore=0.8522868127536543, scoreAtStart=0.8522868127536543). Resetting parameters
As an activation function I also tried relu.
One obvious problem is that you are trying to model a nonlinear function with a linear model. Your neural network has no activation functions, so it can effectively only express functions of the form W1a + W2b + W3c + W4. It does not matter how many hidden units you create: as long as no non-linear activation function is used, your network degenerates into a simple linear model.
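To see why, compose two layers with identity activations; the result is still affine:

W2(W1x + b1) + b2 = (W2W1)x + (W2b1 + b2)

so any stack of purely linear layers collapses into a single linear layer, no matter how wide or deep.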
Update:
There are also many "small weird things", including but not limited to the following (a config sketch applying these fixes follows the list):
- you are using a huge learning rate (0.8)
- you are using lots of regularization for a problem where you need none; using both l1 and l2 regularizers for regression is not a common approach, especially in neural networks
- rectifier units might not be the best ones to express the square operation, or the multiplication, that you are looking for; rectifiers are very good for classification, especially with deeper architectures, but not for shallow regression. Try sigmoid-like (tanh, sigmoid) activations instead.
- I am not entirely sure what "iteration" means in this implementation, but usually it is the number of samples/minibatches used for training, so using just 100 might be orders of magnitude too small for gradient descent learning
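Putting those points together, here is a minimal sketch of a config along the lines suggested above: tanh hidden activations, a much smaller learning rate, more iterations, and no l1/dropout. The exact values are illustrative rather than tuned, and plain MSE is assumed here as the usual regression loss in place of the original RMSE_XENT:

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .iterations(1000)                 // far more optimization steps than 100
        .weightInit(WeightInit.XAVIER)
        .updater(Updater.SGD)
        .learningRate(0.01)               // much smaller than 0.8
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .list(3)
        .layer(0, new DenseLayer.Builder().nIn(3).nOut(8)
                .activation("tanh")       // non-linearity so squares and products can be modeled
                .build())
        .layer(1, new DenseLayer.Builder().nIn(8).nOut(8)
                .activation("tanh")
                .build())
        .layer(2, new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                .activation("identity")   // linear output for regression
                .nIn(8).nOut(1).build())
        .backprop(true).pretrain(false)
        .build();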
