I'm doing image registration, essentially figuring out where a 60x60 image sits on a larger 74x74 image. The images are different modalities - one is visual, the other IR - so simple matching (OpenCV matchTemplate) or other techniques (like mutual information) don't work. So I'm trying a fully convolutional Siamese network. (Two references at end - but neither with source.)
I'd like the Siamese part of the network to be identical in terms of structure/weights so that the features identified will be comparable. My challenge is the 'fully convolutional' part. From the latest paper: "Benefitting from the fully convolutional structure, we are able to feed the left and right branches with different-sized input image patches."
Question - how does a fully convolutional network do its thing? And how can I fix my code?
What I want/What I get
Code:
def convUnit(model, nFilters, strides=1):
    model.add(Conv2D(nFilters, padding="same", kernel_size=(3, 3), strides=strides))
    model.add(BatchNormalization(epsilon=0.0001, scale=False, center=False))
    model.add(Lambda(K.relu))

def getL2():  # From 'hardnet', w/ minor mods
    model = Sequential()
    model.add(Conv2D(32, padding="same", kernel_size=(3, 3), input_shape=(dimT, dimT, 1)))
    model.add(BatchNormalization(epsilon=0.0001, scale=False, center=False))
    model.add(Lambda(K.relu))
    convUnit(model, 32)
    convUnit(model, 64, strides=2)
    convUnit(model, 64)
    convUnit(model, 128, strides=2)
    convUnit(model, 128, strides=2)
    model.add(Conv2D(128, kernel_size=(8, 8)))
    model.add(BatchNormalization(epsilon=0.0001, scale=False, center=False))
    return model  # output of size 1,1,128
def createNewModel():
    model = getL2()  # One shared model for both images
    left_unk = Input(shape=(None, None, 1))
    right_unk = Input(shape=(None, None, 1))
    encode_L = model(left_unk)
    encode_R = model(right_unk)
    # Combine - options include 1x1 Conv (per paper), dot product,
    # some matmul, ...
    m2 = Dot(3)([encode_L, encode_R])  # not sure of axes here - tried combos
    siamese_net = Model([left_unk, right_unk], m2)
    return siamese_net
Results:
When I run a simple prediction, the dot product fails with...
In[0].dim(1) and In[1].dim(1) must be the same: [1,1,1,128] vs [1,3,3,128] on node dot_11/MatMul
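For context, here is a shape-level sketch of one way I imagine the two different-sized encodings could be combined - a SiamFC-style cross-correlation, which is my own assumption rather than anything taken from the papers. The 60x60 template encodes to (1, 1, 1, 128) and the 74x74 search image to (1, 3, 3, 128); using the template descriptor as a 1x1 convolution kernel over the search feature map gives a 3x3 similarity map whose peak would indicate the alignment offset. In Keras this would presumably sit in a Lambda layer instead of the Dot.

import tensorflow as tf

def cross_correlate(encode_L, encode_R):
    # encode_L: (1, 1, 1, 128) descriptor of the template patch (assumes batch size 1)
    # encode_R: (1, hR, wR, 128) feature map of the larger search image
    # Reorder the descriptor into a conv kernel of shape (1, 1, 128, 1)
    kernel = tf.transpose(encode_L, [1, 2, 3, 0])
    # Slide the descriptor over the search feature map -> (1, hR, wR, 1) score map
    return tf.nn.conv2d(encode_R, kernel, strides=[1, 1, 1, 1], padding='VALID')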
Thanks!
References:
- Zhang et al., "Registration of Multi-Modal Remote Sensing Image Based on Deep Fully Convolutional Neural Network"
I'm currently working on a Chess AI.
The idea behind this project is to create a neural network that learns how to evaluate a board state and then traverse the next moves using Monte Carlo tree search to find the "best" move to play (evaluated by the NN).
Code on GitHub
TL;DR
The NN gets stuck predicting the average evaluation of the dataset and is thereby not learning to predict the evaluation of the board state.
Implementation
Dataset
The dataset is a collection of chess games. The games are fetched from the official lichess database.
Only games which have an evaluation score (which the NN is supposed to learn) are included.
This reduces the size of the dataset to about 11% of the original.
Data representation
Each move is a datapoint to train the network on.
The input for the NN is 12 arrays of size 8x8 (so-called bitboards), one for each of the 6x2 combinations of piece type and color.
The move evaluation is normalized to the range [-1, 1] using a scaled tanh function.
Since many evaluations are very close to 0 or to -1/1, a percentage of these is dropped as well, to reduce how concentrated the dataset is at those values.
Without dropping some of the moves with evaluation close to 0 or -1/1 the dataset would look like this:
With dropping some, the dataset looks like this and is a lot less focused at one point:
The output of the NN is a single scalar value between -1 and 1, representing the evaluation of the board state. -1 meaning the board is heavily favored for the black player, 1 meaning the board is heavily favored for the white player.
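For illustration, the 12 bitboard planes described above could be built roughly like this (a simplified sketch using the python-chess library; the function and variable names are only illustrative, not my exact pipeline):

import numpy as np
import chess

def board_to_planes(board: chess.Board) -> np.ndarray:
    # 12 planes of 8x8: 6 piece types for White, then 6 for Black
    planes = np.zeros((12, 8, 8), dtype=np.float32)
    for i, piece_type in enumerate(chess.PIECE_TYPES):          # pawn .. king
        for color_offset, color in enumerate([chess.WHITE, chess.BLACK]):
            for square in board.pieces(piece_type, color):
                rank, file = divmod(square, 8)
                planes[i + 6 * color_offset, rank, file] = 1.0
    return planes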
def create_training_data(dataset: DataFrame) -> Tuple[np.ndarray, np.ndarray]:
    def drop(indices, fract):
        drop_index = np.random.choice(
            indices,
            size=int(len(indices) * fract),
            replace=False)
        dataset.drop(drop_index, inplace=True)

    drop(dataset[abs(dataset[12] / 10.) > 30].index, fract=0.80)
    drop(dataset[abs(dataset[12] / 10.) < 0.1].index, fract=0.90)
    drop(dataset[abs(dataset[12] / 10.) < 0.15].index, fract=0.10)

    # the first 12 entries are the bitboards for the pieces
    y = dataset[12].values
    X = dataset.drop(12, axis=1)

    # move into range of -1 to 1
    y = y.astype(np.float32)
    y = np.tanh(y / 10.)

    return X, y
The neural network
The neural network is implemented using Keras.
A CNN is used to extract features from the board, which are then passed to a dense network that reduces them to a single evaluation. This is based on the NN AlphaGo Zero used in its implementation.
The CNN is implemented as follows:
model = Sequential()
model.add(Conv2D(256, (3, 3), activation='relu', padding='same', input_shape=(12, 8, 8, 1)))

for _ in range(10):
    model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization())

model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(BatchNormalization())
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(units=64, activation='relu'))
# model.add(Rescaling(scale=1 / 10., offset=0)) required? Data gets scaled in create_training_data; does the network learn that / does doing it explicitly help?
model.add(Dense(units=1, activation='tanh'))

model.compile(
    loss='mean_squared_error',
    optimizer=Adam(learning_rate=0.01),
    # metrics=['accuracy', 'mse']  # do these influence training at all?
)
Training
The training is done using Keras.
Multiple sets of 50k-500k moves are used to train the network.
The network is trained for 20 epochs on each move set with a batch size of 64, and 10% of the moves are used for validation.
Afterwards the learning rate is adjusted to 0.001 / (index + 1).
for i, chunk in enumerate(pd.read_csv("../dataset/nm_games.csv", header=None, chunksize=100000)):
    X, y = create_training_data(chunk)
    model.fit(
        X,
        y,
        epochs=20,
        batch_size=64,
        validation_split=0.1
    )
    model.optimizer.learning_rate = 0.001 / (i + 1)
Issues
The NN currently does not learn anything. It converges within a few epochs to the average evaluation of the dataset, and its predictions do not depend on the board state.
Example after 20 epochs:
Dataset Evaluation        NN Evaluation    Difference
-0.10164772719144821      0.03077016       0.13241789
 0.6967725157737732       0.03180310       0.66496944
-0.3644430935382843       0.03119821       0.39564130
 0.5291759967803955       0.03258476       0.49659124
-0.25989893078804016      0.03316733       0.29306626
The NN evaluation is stuck at roughly 0.03, which is the approximate average evaluation of the dataset, and it does not improve from there.
What I tried
Increased and decreased NN size
Added up to 20 extra Conv2D layers, since Google did that in their implementation as well
Removed all 10 extra Conv2D layers, since I read that many NNs are too complex for their dataset
Trained for days at a time
Since the NN is stuck at 0.03 and doesn't move from there, this was wasted effort.
Dense NN instead of CNN
Did not eliminate the point where the NN gets stuck, but it trains faster (i.e. gets stuck faster :) )
model = Sequential()
model.add(Dense(2048, input_shape=(12 * 8 * 8,), activation='relu'))
model.add(Dense(2048, activation='relu'))
model.add(Dense(2048, activation='relu'))
model.add(Dense(1, activation='tanh'))

model.compile(
    loss='mean_squared_error',
    optimizer=Adam(learning_rate=0.001),
    # metrics=['accuracy', 'mse']
)
Sigmoid activation instead of tanh
Moves the evaluation from the range -1 to 1 to the range 0 to 1, but otherwise did not change anything about getting stuck.
Epochs, batch size and chunk size increased and decreased
All of these changes did not significantly change the NN evaluation.
Learning rate adaptation
Larger learning rates (0.1) made the NN unstable, converging to either -1, 1 or 0 on each training run.
Smaller learning rates (0.0001) made the NN converge more slowly, but it still got stuck at 0.03.
Code on GitHub
Question
What to do? Is there something I'm missing or is there an error?
My two suggestions:
Use the full dataset and score each position based on whether that player won the game or not (see the sketch after these suggestions). I don't know this dataset, and there might be something off with the evaluations provided by others (or are they verified?). Even if you are sure about their validity, I would test this, as it can provide more information on what the problem might be.
Check your data representation. You probably already did this a couple of times, but I can tell you from experience that it is easy to introduce a bug here and overlook it. Adding a test might help you in the long run. Some of my problems:
Indication of the current player's colour: not sure whether you have a player-colour plane or whether you flip the pieces to the current player's perspective?
Incorrect translation from 1D to 3D or vice versa (this should not prevent you from training, but fixing it saves you a lot of time if you want to port to a different device).
I trained a Go game engine and do not know what representation is used for chess; it took me some time to figure out a good representation for checkers.
Not a solution, but I found that cyclic learning rates worked great for my Go engine; it might be something to look at once the rest works.
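A rough sketch of the first suggestion (assuming python-chess and a PGN source; the identifiers are only illustrative, and labels are taken from White's perspective to match the white-favoured = 1 convention in the question):

import chess.pgn

def positions_with_outcome_labels(pgn_path):
    # Map game results to a scalar label in [-1, 1]
    labels = {"1-0": 1.0, "0-1": -1.0, "1/2-1/2": 0.0}
    with open(pgn_path) as pgn:
        while True:
            game = chess.pgn.read_game(pgn)
            if game is None:
                break
            result = game.headers.get("Result")
            if result not in labels:
                continue
            board = game.board()
            for move in game.mainline_moves():
                board.push(move)
                # every position of the game gets the same outcome label
                yield board.copy(), labels[result]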
I am adapting this VAE implementation https://github.com/keras-team/keras/blob/master/examples/variational_autoencoder.py which I found here: https://blog.keras.io/building-autoencoders-in-keras.html
This implementation does not use convolutional layers, so everything happens in 1D, so to speak. My goal is to add 3D convolutional layers to this model.
However, I run into a shape mismatch in the loss function when running the batches (which are of 128 samples):
def vae_loss(self, x, x_decoded_mean):
    xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
    # xent_loss.shape >> [128, 40, 20, 40, 1]
    kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    # kl_loss.shape >> [128]
    return K.mean(xent_loss + kl_loss)  # >> error: shape mismatch
Almost the same question has already been answered here: Keras - Variational Autoencoder Incompatible shape for a model with 1D convolutional layers, but I can't really understand how to extrapolate the answer to my case, which has a more complex input shape.
I have tried this solution:
xent_loss = original_dim * metrics.binary_crossentropy(K.flatten(x), K.flatten(x_decoded_mean))
But I don't know whether it is a valid solution from a mathematical point of view, although the model is now running.
Your approach is right, but it's highly dependent on the K.binary_crossentropy implementation. The TensorFlow and Theano versions should work for you (as far as I know). To make it cleaner and not implementation-dependent, I suggest the following approach:
xent_loss_vec = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
xent_loss = K.mean(xent_loss_vec, axis=[1, 2, 3, 4])
# xent_loss.shape = (128,)
Now you are taking the mean of the losses over all voxels, and thanks to that every valid implementation of binary_crossentropy should work fine for you.
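Putting it together, the full loss would then look roughly like this (a sketch based on the shapes from the question, assuming the decoder output has shape (batch, 40, 20, 40, 1)):

def vae_loss(self, x, x_decoded_mean):
    # per-voxel cross-entropy, shape (128, 40, 20, 40, 1)
    xent_loss_vec = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
    # average over the spatial/channel axes -> shape (128,)
    xent_loss = K.mean(xent_loss_vec, axis=[1, 2, 3, 4])
    # KL term, shape (128,)
    kl_loss = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    # both terms are now per-sample, so they can be added and averaged into a scalar
    return K.mean(xent_loss + kl_loss)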
It's commonplace for various neural network architectures in NLP and vision-language problems to tie the weights of an initial word embedding layer to those of the output softmax. Usually this produces a boost to sentence generation quality. (see example here)
In Keras it's typical to implement word embeddings using the Embedding class; however, there seems to be no easy way to tie the weights of this layer to the output softmax. Would anyone happen to know how this could be implemented?
Be aware that Press and Wolf don't propose to freeze the weights to some pretrained values, but to tie them. That means ensuring that the input and output weights are always the same during training (i.e. synchronized).
In a typical NLP model (e.g. language modelling/translation), you have an input dimension (vocabulary) of size V and a hidden representation size H. Then, you start with an Embedding layer, which is a V x H matrix. The output layer is (probably) something like Dense(V, activation='softmax'), whose weights form an H2 x V matrix. When tying the weights, we want those matrices to be the same (up to transposition), which requires H == H2.
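To make the shape argument concrete, here is a tiny numpy check (V and H are just example numbers):

import numpy as np

V, H = 10000, 512                                   # illustrative sizes only
W_emb = np.random.randn(V, H).astype(np.float32)    # Embedding weights: V x H
W_out = W_emb.T                                     # tied output projection: H x V
hidden = np.random.randn(1, H).astype(np.float32)   # one hidden state
logits = hidden @ W_out                             # (1, H) x (H, V) -> (1, V)
print(logits.shape)                                 # (1, 10000): one score per vocabulary word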
For doing this in Keras, I think the way to go is via shared layers:
In your model, you need to instantiate a shared embedding layer (of dimension V x H) and apply it to both your input and output. But for the output you need to transpose it, to get the desired dimensions (H x V). So we declare a TiedEmbeddingsTransposed layer, which transposes the embedding matrix from a given layer (and optionally applies an activation function):
class TiedEmbeddingsTransposed(Layer):
    """Layer for tying embeddings in an output layer.

    A regular embedding layer has the shape V x H (V: size of the vocabulary, H: size of the projected space).
    In this layer, we'll go H x V, with the same weights as the regular embedding.
    In addition, it may have an activation.

    # References
        - [Using the Output Embedding to Improve Language Models](https://arxiv.org/abs/1608.05859)
    """
    def __init__(self, tied_to=None,
                 activation=None,
                 **kwargs):
        super(TiedEmbeddingsTransposed, self).__init__(**kwargs)
        self.tied_to = tied_to
        self.activation = activations.get(activation)

    def build(self, input_shape):
        self.transposed_weights = K.transpose(self.tied_to.weights[0])
        self.built = True

    def compute_mask(self, inputs, mask=None):
        return mask

    def compute_output_shape(self, input_shape):
        return input_shape[0], K.int_shape(self.tied_to.weights[0])[0]

    def call(self, inputs, mask=None):
        output = K.dot(inputs, self.transposed_weights)
        if self.activation is not None:
            output = self.activation(output)
        return output

    def get_config(self):
        config = {'activation': activations.serialize(self.activation)}
        base_config = super(TiedEmbeddingsTransposed, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
The usage of this layer is:
# Declare the shared embedding layer
shared_embedding_layer = Embedding(V, H)
# Obtain word embeddings
word_embedding = shared_embedding_layer(input)
# Do stuff with your model
# Compute output (e.g. a vocabulary-size probability vector) with the shared layer:
output = TimeDistributed(TiedEmbeddingsTransposed(tied_to=shared_embedding_layer, activation='softmax'))(intermediate_rep)
I have tested this in NMT-Keras and it trains properly. But when I try to load a trained model, I get an error related to the way Keras loads models: it doesn't load the weights from the tied_to layer. I've found several questions regarding this (1, 2, 3), but I haven't managed to solve the issue. If anyone has ideas on the next steps to take, I'd be very glad to hear them :)
As you may read here, you should simply set the trainable flag to False. E.g.
aux_output = Embedding(..., trainable=False)(input)
....
output = Dense(nb_of_classes, .. ,activation='softmax', trainable=False)
I have a question about Keras, which I'm rather new to. I'm using a convolutional neural net that feeds its results into a standard perceptron layer, which generates my output. This CNN is fed with a series of images. This is so far quite normal.
Now I like to pass a short non-image input vector directly into the last perceptron layer without sending it through all the CNN layers. How can this be done in Keras?
My code looks like this:
# last CNN layer before perceptron layer
model.add(Convolution2D(200, 2, 2, border_mode='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Dropout(0.25))
# perceptron layer
model.add(Flatten())
# here I'd like to add an additional vector directly to the input coming from the CNN
model.add(Dense(1500, W_regularizer=l2(1e-3)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
Any answers are greatly appreciated, thanks!
You didn't show me which kind of model you use, but I assume that you initialized your model as Sequential. In a Sequential model you can only stack one layer after another, so adding a "short-cut" connection is not possible.
For this reason the authors of Keras added the option of building "graph" models, where you can build a graph (DAG) of your computations. It's more complicated than designing a stack of layers, but still quite easy.
Check the documentation site for more details.
Provided your Keras's backend is Theano, you can do the following:
import theano
import numpy as np
# I've joined the activation and dense layers, assuming you are interested in post-activation values
d = Dense(1500, W_regularizer=l2(1e-3), activation='relu')
model.add(d)
model.add(Dropout(0.5))
model.add(Dense(1))
c = theano.function([d.get_input(train=False)], d.get_output(train=False))
layer_input_data = np.random.random((1,20000)).astype('float32') # refer to d.input_shape to get proper dimensions of layer's input, in my case it was (None, 20000)
o = c(layer_input_data)
The answer here works. It is higher level and also works for the TensorFlow backend:
input_1 = Input(input_shape)
input_2 = Input(input_shape)

merged = merge([input_1, input_2], mode="concat")  # could also be "sum", "dot", etc.

hidden = Dense(hidden_dims)(merged)
classify = Dense(output_dims, activation="softmax")(hidden)

model = Model(input=[input_1, input_2], output=classify)
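For what it's worth, with newer Keras versions the same idea looks like this using the functional API's Concatenate layer (layer sizes and names below are only illustrative, not taken from the question):

from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout, Concatenate
from keras.models import Model

image_input = Input(shape=(64, 64, 1))   # assumed image size
aux_input = Input(shape=(10,))           # the short non-image vector

x = Conv2D(200, (2, 2), padding='same', activation='relu')(image_input)
x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2))(x)
x = Dropout(0.25)(x)
x = Flatten()(x)

x = Concatenate()([x, aux_input])        # inject the extra vector right before the dense layer
x = Dense(1500, activation='relu')(x)
x = Dropout(0.5)(x)
output = Dense(1)(x)

model = Model(inputs=[image_input, aux_input], outputs=output)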
I am trying to use an RBFNN for point-cloud-to-surface reconstruction, but I can't understand what my feature vectors in the RBFNN would be.
Can anyone please help me understand this?
The goal is to get to this:
From inputs like this:
An RBF network essentially involves fitting data with a linear combination of functions that obey a set of core properties -- chief among these is radial symmetry. The parameters of each of these functions are learned by incremental adjustment based on errors generated through repeated presentation of inputs.
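Concretely (a standard formulation rather than anything specific to your data), the reconstructed value at a point x is a weighted sum of radially symmetric basis functions centred at points c_i, and the weights are what get learned:

f(x) = \sum_{i=1}^{N} w_i \, \varphi\!\left(\lVert x - c_i \rVert\right),
\qquad \text{e.g. } \varphi(r) = \exp\!\left(-\frac{r^2}{2\sigma^2}\right)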
If I understand (it's been a very long time since I used one of these networks), your question pertains to preprocessing of the data in the point cloud. I believe that each of the points in your point cloud should serve as one input. If I understand properly, the features are your three dimensions, and as such each point can already be considered a "feature vector."
You have other choices that remain, namely the number of radial basis neurons in your hidden layer, and the radial basis functions to use (a Gaussian is a popular first choice). The training of the network and the surface reconstruction can be done in a number of ways but I believe this is beyond the scope of the question.
I don't know if it will help, but here's a simple python implementation of an RBF network performing function approximation, with one-dimensional inputs:
import numpy as np
import matplotlib.pyplot as plt

def fit_me(x):
    return (x-2) * (2*x+1) / (1+x**2)

def rbf(x, mu, sigma=1.5):
    return np.exp( -(x-mu)**2 / (2*sigma**2) )

# Core parameters including number of training
# and testing points, minimum and maximum x values
# for training and testing points, and the number
# of rbf (hidden) nodes to use
num_points = 100   # number of inputs (each 1D)
num_rbfs = 20      # number of centers (must be an integer for np.linspace)
x_min = -5
x_max = 10

# Training data, evenly spaced points
x_train = np.linspace(x_min, x_max, num_points)
y_train = fit_me(x_train)

# Testing data, more evenly spaced points
x_test = np.linspace(x_min, x_max, num_points*3)
y_test = fit_me(x_test)

# Centers of each of the rbf nodes
centers = np.linspace(-5, 10, num_rbfs)

# Everything is in place to train the network
# and attempt to approximate the function 'fit_me'.
# Start by creating a matrix G in which each row
# corresponds to an x value within the domain and each
# column i contains the values of rbf_i(x).
center_cols, x_rows = np.meshgrid(centers, x_train)
G = rbf(center_cols, x_rows)

plt.plot(G)
plt.title('Radial Basis Functions')
plt.show()

# Simple training in this case: use pseudoinverse to get weights
weights = np.dot(np.linalg.pinv(G), y_train)

# To test, create meshgrid for test points
center_cols, x_rows = np.meshgrid(centers, x_test)
G_test = rbf(center_cols, x_rows)

# apply weights to G_test
y_predict = np.dot(G_test, weights)

plt.plot(y_predict)
plt.title('Predicted function')
plt.show()

error = y_predict - y_test
plt.plot(error)
plt.title('Function approximation error')
plt.show()
First, you can explore the way in which inputs are provided to the network and how the RBF nodes are used. This should extend to 2D inputs in a straightforward way, though training may get a bit more involved.
To do proper surface reconstruction you'll likely need a representation of the surface that is altogether different than the representation of the function that's learned here. Not sure how to take this last step.