I'm building a U-net for a two-class image segmentation problem using Keras. I'm loosely following the example given here: https://github.com/zhixuhao/unet . Following that example, the last few layers of the decoder (including the output layer) look like:
up9 = UpSampling2D(size = (2,2))(conv8)
merge9 = concatenate([conv1,up9], axis = 3)
conv9 = Conv2D(16, 3, activation = 'relu', padding = 'same')(merge9)
conv9 = Conv2D(16, 3, activation = 'relu', padding = 'same')(conv9)
conv9 = Conv2D(2, 3, activation = 'relu', padding = 'same')(conv9)
conv10 = Conv2D(1, 1, activation = 'sigmoid',padding='same')(conv9)
The key problem that I'm finding is that when I remove this line:
conv9 = Conv2D(2, 3, activation = 'relu', padding = 'same')(conv9)
then the U-net fails to train at all (using binary crossentropy), and I can't figure out why this layer is necessary given that I have seen almost identical U-nets elsewhere that skip this layer entirely.
Can anyone help shed some light here on why this layer is necessary? If I were to make a WAG, I'd say it has something to do with the fact that we want to go down to 2 filters to represent each class? Does it have something to do with the expected output shape (128x128x1) in this case?
I am trying to use a CNN+LSTM model for reliable stock price forecasting.
For every day, I want the model to predict the values of the next 40 days.
I have successfully connected the Conv2D layers to the LSTM as its input.
In short, I use the sequence [t-120 : t] to predict [t+1 : t+40].
The problem is that the model outputs very similar (almost constant) values throughout the 60-day test period.
(They are not exactly the same, but the 40-day trends are almost identical.)
I expect a 40-day forecast for each day, with the window shifted each day.
(see image below for better understanding)
Here is my question:
Why am I getting similar output on my test set? (not shifted window)
Is there a problem with my activation function (maybe in the Dense layer)?
def create_model_cnn_lstm(params, numOfFeat, input_shape):
    input_layers = []
    channel_list = []
    for i in range(0, numOfFeat):
        layer = Input(shape=input_shape)
        input_layers.append(layer)
        layer = TimeDistributed(Conv2D(32,(3,3), strides=1,kernel_regularizer=0.01, padding='same', activation="relu", use_bias=True,kernel_initializer='glorot_uniform'))(layer)
        layer = TimeDistributed(Conv2D(64,(5,5), strides=1,kernel_regularizer=0.01, padding='same', activation="relu", use_bias=True,kernel_initializer='glorot_uniform'))(layer)
        layer = TimeDistributed(MaxPool2D(pool_size=2))(layer)
        layer = TimeDistributed(Dropout(0.3))(layer)
        layer = TimeDistributed(Flatten())(layer)
        layer = TimeDistributed(Dense(10))(layer)
        channel_list.append(layer)
    layer = Concatenate(axis = -1)(channel_list)
    layer = LSTM(units = 512, activation = 'tanh', return_sequences = False, dropout = 0.2, recurrent_dropout=0.1, kernel_regularizer=0.01)(layer)
    layer = Dense(512, activation = 'tanh')(layer)
    layer = Dropout(0.5)(layer)
    layer = Dense(pred_len)(layer)
    model = Model(input_layers, layer)
    optimizer = optimizers.Adam(learning_rate=params["lr"], beta_1=0.9, beta_2=0.999, amsgrad=False)
    model.compile(loss='mse', optimizer=optimizer, metrics=['mse'])
    return model
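For reference, the shifted-window setup described in the question can be sketched with NumPy alone. This is only an illustration under assumptions: `make_windows`, `in_len` and `out_len` are hypothetical names, the data is a stand-in ramp rather than real prices, and the question's [t-120 : t] and [t+1 : t+40] ranges are mapped to Python slices so that the input is the 120 values before position t and the target is the 40 values starting at t.

```python
import numpy as np

def make_windows(series, in_len=120, out_len=40):
    """Slice a 1-D series into (input, target) pairs, shifted by one step each."""
    series = np.asarray(series)
    inputs, targets = [], []
    # t is the index of the first value to be predicted
    for t in range(in_len, len(series) - out_len + 1):
        inputs.append(series[t - in_len:t])    # the 120 values before t
        targets.append(series[t:t + out_len])  # the 40 values starting at t
    return np.array(inputs), np.array(targets)

prices = np.arange(200, dtype=np.float32)  # stand-in data, not real prices
X, Y = make_windows(prices)
print(X.shape, Y.shape)
```

If consecutive targets built this way differ (each one is the previous one shifted by a step) but the model's predictions do not, the windowing is probably not the cause and the model itself may be collapsing to a near-constant output.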
I have a dataset containing 1000 examples where each example has 5 features (a,b,c,d,e). I want to feed 7 examples to an LSTM so it predicts the feature (a) of the 8th day.
Reading PyTorch's documentation of nn.LSTM(), I came up with the following:
input_size = 5
hidden_size = 10
num_layers = 1
output_size = 1
lstm = nn.LSTM(input_size, hidden_size, num_layers)
fc = nn.Linear(hidden_size, output_size)
out, hidden = lstm(X) # Where X's shape is ([7,1,5])
output = fc(out[-1])
output # output's shape is ([7,1])
According to the docs:
The input of the nn.LSTM is "input of shape (seq_len, batch, input_size)" with "input_size – The number of expected features in the input x",
And the output is: "output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the LSTM, for each t."
In this case, I thought seq_len would be the sequence of 7 examples, batch is 1 and input_size is 5. So the LSTM would consume each example containing 5 features, feeding the hidden state back in at every iteration.
What am I missing?
When I extend your code to a full example (I also added some comments that may help), I get the following:
import torch
import torch.nn as nn
input_size = 5
hidden_size = 10
num_layers = 1
output_size = 1
lstm = nn.LSTM(input_size, hidden_size, num_layers)
fc = nn.Linear(hidden_size, output_size)
X = [
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
]
X = torch.tensor(X, dtype=torch.float32)
print(X.shape) # (seq_len, batch_size, input_size) = (7, 1, 5)
out, hidden = lstm(X) # Where X's shape is ([7,1,5])
print(out.shape) # (seq_len, batch_size, hidden_size) = (7, 1, 10)
out = out[-1] # Get output of last step
print(out.shape) # (batch, hidden_size) = (1, 10)
out = fc(out) # Push through linear layer
print(out.shape) # (batch_size, output_size) = (1, 1)
This makes sense to me, given your batch_size = 1 and output_size = 1 (I assume you're doing regression). I don't know where your output.shape = (7, 1) comes from.
Are you sure that your X has the correct dimensions? Did you perhaps create nn.LSTM with batch_first=True? There are a lot of little things that can sneak in.
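One way a (7, 1) result could arise is indexing the wrong axis of the LSTM output. This is only a guess, since the question does not show such a line. The sketch below uses plain NumPy arrays as stand-ins for the LSTM output and the linear layer's weights (no real LSTM is run), just to compare the shapes produced by `out[-1]` and `out[:, -1]`; the latter is the usual idiom when batch_first=True.

```python
import numpy as np

seq_len, batch_size, hidden_size, output_size = 7, 1, 10, 1

# Stand-in for the LSTM output in the default layout (seq_len, batch, hidden_size)
out = np.zeros((seq_len, batch_size, hidden_size), dtype=np.float32)
# Stand-in for the linear layer's weights: hidden_size -> output_size
W = np.zeros((hidden_size, output_size), dtype=np.float32)

last_step = out[-1]      # last time step: shape (1, 10)
wrong_axis = out[:, -1]  # last batch element of every step: shape (7, 10)

print((last_step @ W).shape)   # (1, 1)
print((wrong_axis @ W).shape)  # (7, 1)
```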
I am trying to build a network through the keras functional API feeding two lists containing the number of units of the LSTM layers and of the FC (Dense) layers. I want to analyse 20 consecutive segments (batches) which contain fs time steps each and 2 values (2 features per time step). This is my code:
Rec = [4,4,4]
FC = [8,4,2,1]
def keras_LSTM(Rec,FC,fs, n_witness, lr=0.04, optimizer='Adam'):
    model_LSTM = Input(batch_shape=(20,fs,n_witness))
    return_state_bool=True
    for i in range(shape(Rec)[0]):
        nRec = Rec[i]
        if i == shape(Rec)[0]-1:
            return_state_bool=False
        model_LSTM = LSTM(nRec, return_sequences=True,return_state=return_state_bool,
                          stateful=True, input_shape=(None,n_witness),
                          name='LSTM'+str(i))(model_LSTM)
    for j in range(shape(FC)[0]):
        nFC = FC[j]
        model_LSTM = Dense(nFC)(model_LSTM)
        model_LSTM = LeakyReLU(alpha=0.01)(model_LSTM)
    nFC_final = 1
    model_LSTM = Dense(nFC_final)(model_LSTM)
    predictions = LeakyReLU(alpha=0.01)(model_LSTM)
    full_model_LSTM = Model(inputs=model_LSTM, outputs=predictions)
    model_LSTM.compile(optimizer=keras.optimizers.Adam(lr=lr, beta_1=0.9, beta_2=0.999,
                       epsilon=1e-8, decay=0.066667, amsgrad=False), loss='mean_squared_error')
    return full_model_LSTM
model_new = keras_LSTM(Rec, FC, fs=fs, n_witness=n_wit)
model_new.summary()
When compiling I get the following error:
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_1:0", shape=(20, 2048, 2), dtype=float32) at layer "input_1". The following previous layers were accessed without issue: []
I don't quite understand this error, but I suspect it has something to do with the inputs?
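I solved the issue by modifying line 4 of the code as follows:
x = model_LSTM = Input(batch_shape=(20,fs,n_witness))
along with line 21, as follows:
full_model_LSTM = Model(inputs=x, outputs=predictions)
The variable model_LSTM is reassigned by every layer call, so by the time Model(...) is built it no longer refers to the Input tensor. Keeping the input in a separate variable x gives Model a tensor that is actually connected to the start of the graph.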
I have two inputs, qi_pos and qi_neg, with the same shape. They should both be processed by the MLP layers, producing two results that act as scores. Here is my code:
self.mlp1_pos = nn_layers.full_connect_(qi_pos, 256, activation='relu', use_bn = None, keep_prob=self.keep_prob, name = 'deep_mlp_1')
self.mlp2_pos = nn_layers.full_connect_(self.mlp1_pos, 128, activation='relu', use_bn = True, keep_prob=self.keep_prob, name = 'deep_mlp_2')
self.pos_pair_sim = nn_layers.full_connect_(self.mlp2_pos, 1, activation=None, use_bn = True, keep_prob=self.keep_prob, name = 'deep_mlp_3')
tf.get_variable_scope().reuse_variables()
self.mlp1_neg = nn_layers.full_connect_(qi_neg, 256, activation='relu', use_bn = None, keep_prob=self.keep_prob, name = 'deep_mlp_1')
self.mlp2_neg = nn_layers.full_connect_(self.mlp1_neg, 128, activation='relu', use_bn = True, keep_prob=self.keep_prob, name = 'deep_mlp_2')
self.neg_pair_sim = nn_layers.full_connect_(self.mlp2_neg, 1, activation=None, use_bn = True, keep_prob=self.keep_prob, name = 'deep_mlp_3')
I use BN layer to normalize the nodes in hidden layers:
def full_connect_(inputs, num_units, activation=None, use_bn = None, keep_prob = 1.0, name='full_connect_'):
    with tf.variable_scope(name):
        shape = [inputs.get_shape()[-1], num_units]
        weight = weight_variable(shape)
        bias = bias_variable(shape[-1])
        outputs_ = tf.matmul(inputs, weight) + bias
        if use_bn:
            outputs_ = tf.contrib.layers.batch_norm(outputs_, center=True, scale=True, is_training=True,decay=0.9,epsilon=1e-5, scope='bn')
        if activation=="relu":
            outputs = tf.nn.relu(outputs_)
        elif activation == "tanh":
            outputs = tf.tanh(outputs_)
        elif activation == "sigmoid":
            outputs = tf.nn.sigmoid(outputs_)
        else:
            outputs = outputs_
        return outputs
with tf.name_scope('predictions'):
    self.sim_diff = self.pos_pair_sim - self.neg_pair_sim # shape = (batch_size, 1)
    self.preds = tf.sigmoid(self.sim_diff) # shape = (batch_size, 1)
    self.infers = self.pos_pair_sim
Below is the loss definition. It seems all right.
with tf.name_scope('predictions'):
    sim_diff = pos_pair_sim - neg_pair_sim
    predictions = tf.sigmoid(sim_diff)
    self.infers = pos_pair_sim
## loss and optim
with tf.name_scope('loss'):
    self.loss = nn_layers.cross_entropy_loss_with_reg(self.labels, self.preds)
    tf.summary.scalar('loss', self.loss)
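To make the prediction step concrete, here is a minimal NumPy sketch of scoring a pair by the sigmoid of the score difference. It rests on assumptions: every label is taken to be 1 (the positive item should rank above the negative one), the loss is a plain binary cross-entropy, and the regularization term is omitted, because the internals of `cross_entropy_loss_with_reg` are not shown here. `pairwise_loss` is a hypothetical helper name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pairwise_loss(pos_scores, neg_scores, eps=1e-7):
    """Mean cross-entropy of sigmoid(pos - neg) against an all-ones label."""
    preds = sigmoid(pos_scores - neg_scores)  # shape (batch_size, 1)
    preds = np.clip(preds, eps, 1.0 - eps)    # avoid log(0)
    return float(np.mean(-np.log(preds)))

pos = np.array([[2.0], [0.5]])  # stand-in for pos_pair_sim
neg = np.array([[1.0], [1.5]])  # stand-in for neg_pair_sim
loss = pairwise_loss(pos, neg)
print(loss)
```

The loss shrinks as the positive score exceeds the negative one, and equals log(2) when the two scores are identical.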
I am not sure whether I have used the BN layers in the right way. What I mean is that the BN parameters are derived from the hidden units of two separate branches, which take the qi_pos and qi_neg tensors as inputs. Could anyone help check this?
Your code looks fine to me; there is no problem with applying BN in different branches of the network. I'd like to add a few notes, though:
The BN hyperparameters are fairly standard, so I usually don't set decay, epsilon and renorm_decay manually. That doesn't mean you mustn't change them; it is simply unnecessary in most cases.
You're applying BN before the activation function; however, there is evidence that it works better when applied after the activation. See, for example, this discussion. Once again, this isn't a bug, just one more architecture to consider.
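The two orderings can be compared in a small NumPy sketch. This is an illustration only: `batch_norm` here is a simplified training-mode normalization without the learned scale and shift or the moving averages that tf.contrib.layers.batch_norm maintains, and the input is random stand-in data.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch axis (no learned scale/shift)
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
pre_activation = rng.normal(size=(32, 4))  # stand-in for matmul(inputs, weight) + bias

bn_then_relu = relu(batch_norm(pre_activation))  # ordering used in the question
relu_then_bn = batch_norm(relu(pre_activation))  # alternative ordering

print(bn_then_relu.min(), relu_then_bn.mean(axis=0))
```

With BN first, the next layer receives non-negative values; with BN last, it receives zero-mean values per feature. Which one trains better is an empirical question for the given network.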
I am trying to build a very simple OCR model to start my tests before moving on to bigger models. The problem is that I can't figure out what my output data should look like for training.
code:
def simple_model():
    output = 28
    if K.image_data_format() == 'channels_first':
        input_shape = (1, input_height, input_width)
    else:
        input_shape = (input_height, input_width, 1)
    conv_to_rnn_dims = (input_width // (2), (input_height // (2)) * conv_blades)
    model = Sequential()
    model.add(Conv2D(conv_blades, (3, 3), input_shape=input_shape, padding='same'))
    model.add(MaxPooling2D(pool_size=(2,2), name='max2'))
    model.add(Reshape(target_shape=conv_to_rnn_dims, name='reshape'))
    model.add(GRU(64, return_sequences=True, kernel_initializer='he_normal', name='gru1'))
    model.add(TimeDistributed(Dense(output, kernel_initializer='he_normal', name='dense2')))
    model.add(Activation('softmax', name='softmax'))
    model.compile(loss='mse',
                  optimizer='adamax',
                  metrics=["accuracy"])
    return model
img = load_img('exit.png', grayscale=True, target_size=[input_height, input_width])
x = img_to_array(img)
x = x.reshape((1,) + x.shape)
y = np.array(['exit'])
model = simple_model()
model.fit(x, y, batch_size=1,
          epochs=10,
          validation_data=(x, y),
          verbose=1)
print(model.predict(x))
Image Example:
(source: exitfest.org)
When I run this code, I get the following error:
ValueError: Error when checking target: expected softmax to have 3 dimensions, but got array with shape (1, 1)
Note 1: I know I can't train my model with only one image and one label. I have many more images like this one, but first I need to get this simple model running before improving it.
Note 2: This is the first time I have worked with image-to-sequence output, so there may be other problems; feel free to change the code if you spot that kind of mistake.
Well, as I haven't received any answer, I will link to the answer I posted in another question
Here I explain how to use the keras OCR example and answer some other questions.
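As for the error itself: the softmax layer outputs a 3-dimensional tensor (batch, time steps, 28), so the target must have the same shape rather than being an array of strings. A minimal sketch of one possible encoding follows. It assumes a 28-symbol alphabet (a-z, space, and a final blank/padding class, none of which is specified in the question), a hypothetical count of 16 time steps (which would correspond to input_width // 2 for a 32-pixel-wide image), and a hypothetical helper `encode_label`. It only makes the shapes match; it does not align characters with image columns, which is what the CTC loss in the Keras OCR example takes care of.

```python
import string
import numpy as np

ALPHABET = string.ascii_lowercase + " "  # 27 symbols (assumed alphabet)
NUM_CLASSES = len(ALPHABET) + 1          # 28; the last index is a blank/padding class

def encode_label(text, time_steps):
    """One-hot encode text per time step, padding with the blank class."""
    if len(text) > time_steps:
        raise ValueError("label is longer than the number of time steps")
    target = np.zeros((time_steps, NUM_CLASSES), dtype=np.float32)
    for t in range(time_steps):
        idx = ALPHABET.index(text[t]) if t < len(text) else NUM_CLASSES - 1
        target[t, idx] = 1.0
    return target

# A batch with a single label, shaped (batch, time_steps, classes)
y = np.stack([encode_label("exit", time_steps=16)])
print(y.shape)  # (1, 16, 28)
```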