I'am designing keras model for classification based on article data.
I have data with 4 dimension as follows
[batch, article_num, word_num, word embedding size]
and i want to feed each (word_num, word embedding) data into keras Bidirectional layer
in order to get result with 3 dimension as follows.
[batch, article_num, bidirectional layer output size]
when i tried to feed 4 dimension data for testing like this
inp = Input(shape=(article_num, word_num, ))
# dims = [batch, article_num, word_num]
x = Reshape((article_num * word_num, ), input_shape = (article_num, word_num))(inp)
# dims = [batch, article_num * word_num]
x = Embedding(word_num, word_embedding_size, input_length = article_num * word_num)(x)
# dims = [batch, article_num * word_num, word_embedding_size]
x = Reshape((article_num , word_num, word_embedding_size),
input_shape = (article_num * word_num, word_embedding_size))(x)
# dims = [batch, article_num, word_num, word_embedding_size]
x = Bidirectional(CuDNNLSTM(50, return_sequences = True),
input_shape=(article_num , word_num, word_embedding_size))(x)
and i got the error
ValueError: Input 0 is incompatible with layer bidirectional_12: expected ndim=3, found ndim=4
how can i achieve this?
If you don't want it to touch the article_num dimension, you can try using a TimeDistributed wrapper. But I'm not certain that it will be compatible with bidirectional and other stuff.
inp = Input(shape=(article_num, word_num))
x = TimeDistributed(Embedding(word_num, word_embedding_size)(x))
#option 1
#x1 shape : (batch, article_num, word_num, 50)
x1 = TimeDistributed(Bidirectional(CuDNNLSTM(50, return_sequences = True)))(x)
#option 2
#x2 shape : (batch, article_num, 50)
x2 = TimeDistributed(Bidirectional(CuDNNLSTM(50)))(x)
Hints:
Don't use input_shape everywhere, you only need it at the Input tensor.
You probably don't need any of the reshapes if you also use a TimeDistributed in the embedding.
If you don't want word_num in the final dimension, use return_sequences=False.
Related
I have a dataset containing 1000 examples where each example has 5 features (a,b,c,d,e). I want to feed 7 examples to an LSTM so it predicts the feature (a) of the 8th day.
Reading Pytorchs documentation of nn.LSTM() I came up with the following:
input_size = 5
hidden_size = 10
num_layers = 1
output_size = 1
lstm = nn.LSTM(input_size, hidden_size, num_layers)
fc = nn.Linear(hidden_size, output_size)
out, hidden = lstm(X) # Where X's shape is ([7,1,5])
output = fc(out[-1])
output # output's shape is ([7,1])
According to the docs:
The input of the nn.LSTM is "input of shape (seq_len, batch, input_size)" with "input_size – The number of expected features in the input x",
And the output is: "output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the LSTM, for each t."
In this case, I thought seq_len would be the sequence of 7 examples, batchis 1 and input_size is 5. So the lstm would consume each example containing 5 features refeeding the hidden layer every iteration.
What am I missing?
When I extend your code to a full example -- I also added some comments to may help -- I get the following:
import torch
import torch.nn as nn
input_size = 5
hidden_size = 10
num_layers = 1
output_size = 1
lstm = nn.LSTM(input_size, hidden_size, num_layers)
fc = nn.Linear(hidden_size, output_size)
X = [
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
[[1,2,3,4,5]],
]
X = torch.tensor(X, dtype=torch.float32)
print(X.shape) # (seq_len, batch_size, input_size) = (7, 1, 5)
out, hidden = lstm(X) # Where X's shape is ([7,1,5])
print(out.shape) # (seq_len, batch_size, hidden_size) = (7, 1, 10)
out = out[-1] # Get output of last step
print(out.shape) # (batch, hidden_size) = (1, 10)
out = fc(out) # Push through linear layer
print(out.shape) # (batch_size, output_size) = (1, 1)
This makes sense to me, given your batch_size = 1 and output_size = 1 (I assume, you're doing regression). I don't know where your output.shape = (7, 1) come from.
Are you sure that your X has the correct dimensions? Did you create nn.LSTM maybe with batch_first=True? There are lot of little things that can sneak in.
Here is code I wrote to perform a single convolution and output the shape.
Using formula from http://cs231n.github.io/convolutional-networks/ to calculate output size :
You can convince yourself that the correct formula for calculating how
many neurons “fit” is given by (W−F+2P)/S+1
The formula for computing the output size has been implemented below as
def output_size(w , f , stride , padding) :
return (((w - f) + (2 * padding)) / stride) + 1
The issue is output_size computes a size of 2690.5 which differs to the result of the convolution which is 1350 :
%reset -f
import torch
import torch.nn.functional as F
import numpy as np
from PIL import Image
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from pylab import plt
plt.style.use('seaborn')
%matplotlib inline
width = 60
height = 30
kernel_size_param = 5
stride_param = 2
padding_param = 2
img = Image.new('RGB', (width, height), color = 'red')
in_channels = 3
out_channels = 3
class ConvNet(nn.Module):
def __init__(self):
super(ConvNet, self).__init__()
self.layer1 = nn.Sequential(
nn.Conv2d(in_channels,
out_channels,
kernel_size=kernel_size_param,
stride=stride_param,
padding=padding_param))
def forward(self, x):
out = self.layer1(x)
return out
# w : input volume size
# f : receptive field size of the Conv Layer neurons
# output_size computes spatial size of output volume - spatial dimensions are (width, height)
def output_size(w , f , stride , padding) :
return (((w - f) + (2 * padding)) / stride) + 1
w = width * height * in_channels
f = kernel_size_param * kernel_size_param
print('output size :' , output_size(w , f , stride_param , padding_param))
model = ConvNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=.001)
img_a = np.array(img)
img_pt = torch.tensor(img_a).float()
result = model(img_pt.view(3, width , height).unsqueeze_(0))
an = result.view(30 , 15 , out_channels).data.numpy()
# print(result.shape)
# print(an.shape)
# print(np.amin(an.flatten('F')))
print(30 * 15 * out_channels)
Have I implemented output_size correctly ? How to amend this model so the result of Conv2d has same shape as result of output_size ?
The problem is that your input image is not a square, so you should apply the formula on the width and the heigth of the input image.
And also you should not use the nb_channels in the formula because we are explicitly defining how many channels we want in the output.
Then you use your f=kernel_size and not f=kernel_size*kernel_size as described in the formula.
w = width
h = height
f = kernel_size_param
output_w = int(output_size(w , f , stride_param , padding_param))
output_h = int(output_size(h , f , stride_param , padding_param))
print("Output_size", [out_channels, output_w, output_h]) #--> [1, 3, 30 ,15]
And then output size :
print("Output size", result.shape) #--> [1, 3, 30 ,15]
Formula source : http://cs231n.github.io/convolutional-networks/
I'm trying to get a basic LSTM working in TensorFlow. I'm receiving the following error:
TypeError: 'Tensor' object is not iterable.
The offending line is:
rnn_outputs, final_state = tf.nn.dynamic_rnn(cell, x, sequence_length=seqlen,
initial_state=init_state,)`
I'm using version 1.0.1 on windows 7. My inputs and label have the following shapes
x_shape = (50, 40, 18), y_shape = (50, 40)
Where:
batch size = 50
sequence length = 40
input vector length at each step = 18
I'm building my graph as follows
def build_graph(learn_rate, seq_len, state_size=32, batch_size=5):
# use a fixed sequence length
seqlen = tf.constant(seq_len, shape=[batch_size],dtype=tf.int32)
# Placeholders
x = tf.placeholder(tf.float32, [batch_size, None, 18])
y = tf.placeholder(tf.float32, [batch_size, None])
keep_prob = tf.constant(1.0)
# RNN
cell = tf.contrib.rnn.LSTMCell(state_size)
init_state = tf.get_variable('init_state', [1, state_size],
initializer=tf.constant_initializer(0.0))
init_state = tf.tile(init_state, [batch_size, 1])
rnn_outputs, final_state = tf.nn.dynamic_rnn(cell, x, sequence_length=seqlen,
initial_state=init_state,)
# Add dropout, as the model otherwise quickly overfits
rnn_outputs = tf.nn.dropout(rnn_outputs, keep_prob)
# Prediction layer
with tf.variable_scope('prediction'):
W = tf.get_variable('W', [state_size, num_classes])
b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))
preds = tf.tanh(tf.matmul(rnn_outputs, W) + b)
# MSE
loss = tf.square(tf.subtract(y, preds))
# loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits, y))
train_step = tf.train.AdamOptimizer(learn_rate).minimize(loss)
Can anyone tell me what I am missing?
Sequence length should be iterable e.g. a list or tensor, not a scalar. In your case specifically, you need to replace sequence length = 40 with a list of the lengths of each input. For instance, if your first sequence has 10 steps, the second 13 and the third 18, you would pass in [10, 13, 18]. This lets TensorFlow's dynamic RNN know how many steps to unroll for (I believe it uses a while loop internally).
I implemented a genrative adversarial network in Keras. My training data size is about 16,000, where each image is of 32*32 size. All of my training images are the resized versions of the imageds from the imagenet dataset with regard to the object detection task. I fed the image matrix directly into the network without doing the center crop. I used the AdamOptimizer with the learning rate being 1e-4, and beta1 being 0.5 and I also set the dropout rate to be 0.1. I first trained the discrimator on 3000 real images and 3000 fake images and it achieved a 93% accuracy. Then, I trained for 500 epochs with the batch size being 32. However, my model seemed to converge in only a few epochs(<10), and the images it generated were ugly.
Plot of the Loss Function
Random Samples Generated by the Generator
I was wondering whether my training dataset is too small(compared to those in the paper of DCGAN, which are more than 300,000) or my model configuration is not correct. What's more, should I train the SGD on D for k iterations (where k is small, perhaps 1) and then training with SGD on G for one iteration as suggested by Ian Goodfellow in the original paper?(I have just tried to train them one at a time)
Below is the configuration of the generator.
g_input = Input(shape=[100])
H = Dense(1024*4*4, init='glorot_normal')(g_input)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = Reshape( [4, 4,1024] )(H)
H = UpSampling2D(size=( 2, 2))(H)
H = Convolution2D(512, 3, 3, border_mode='same', init='glorot_uniform')(H)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = UpSampling2D(size=( 2, 2))(H)
H = Convolution2D(256, 3, 3, border_mode='same', init='glorot_uniform')(H)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = UpSampling2D(size=( 2, 2))(H)
H = Convolution2D(3, 3, 3, border_mode='same', init='glorot_uniform')(H)
g_V = Activation('tanh')(H)
generator = Model(g_input,g_V)
generator.compile(loss='binary_crossentropy', optimizer=opt)
generator.summary()
Below is the configuration of the discriminator:
d_input = Input(shape=shp)
H = Convolution2D(64, 5, 5, subsample=(2, 2), border_mode = 'same', init='glorot_normal')(d_input)
H = LeakyReLU(0.2)(H)
#H = Dropout(dropout_rate)(H)
H = Convolution2D(128, 5, 5, subsample=(2, 2), border_mode = 'same', init='glorot_normal')(H)
H = BatchNormalization(mode=2)(H)
H = LeakyReLU(0.2)(H)
#H = Dropout(dropout_rate)(H)
H = Flatten()(H)
H = Dense(256, init='glorot_normal')(H)
H = LeakyReLU(0.2)(H)
d_V = Dense(2,activation='softmax')(H)
discriminator = Model(d_input,d_V)
discriminator.compile(loss='categorical_crossentropy', optimizer=dopt)
discriminator.summary()
Below is the configuration of GAN as a whole:
gan_input = Input(shape=[100])
H = generator(gan_input)
gan_V = discriminator(H)
GAN = Model(gan_input, gan_V)
GAN.compile(loss='categorical_crossentropy', optimizer=opt)
GAN.summary()
I think problem is with loss function
Try
loss='categorical_crossentropy',
I suspect that your generator is trainable while you training the gan. You can verify by using generator.layers[-1].get_weights() to see if the parameters changed during training process of gan.
You should freeze discriminator before you assemble it to gan:
generator.trainnable = False
gan_input = Input(shape=[100])
H = generator(gan_input)
gan_V = discriminator(H)
GAN = Model(gan_input, gan_V)
GAN.compile(loss='categorical_crossentropy', optimizer=opt)
GAN.summary()
see this discussion:
https://github.com/fchollet/keras/issues/4674
I was trying to train a very simple model on TensorFlow. Model takes a single float as input and returns the probability of input being greater than 0. I used 1 hidden layer with 10 hidden units. Full code is shown below:
import tensorflow as tf
import random
# Graph construction
x = tf.placeholder(tf.float32, shape = [None,1])
y_ = tf.placeholder(tf.float32, shape = [None,1])
W = tf.Variable(tf.random_uniform([1,10],0.,0.1))
b = tf.Variable(tf.random_uniform([10],0.,0.1))
layer1 = tf.nn.sigmoid( tf.add(tf.matmul(x,W), b) )
W1 = tf.Variable(tf.random_uniform([10,1],0.,0.1))
b1 = tf.Variable(tf.random_uniform([1],0.,0.1))
y = tf.nn.sigmoid( tf.add( tf.matmul(layer1,W1),b1) )
loss = tf.square(y - y_)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
# Training
with tf.Session() as sess:
sess.run(tf.initialize_all_variables())
N = 1000
while N != 0:
batch = ([],[])
u = random.uniform(-10.0,+10.0)
if u >= 0.:
batch[0].append([u])
batch[1].append([1.0])
if u < 0.:
batch[0].append([u])
batch[1].append([0.0])
sess.run(train_step, feed_dict = {x : batch[0] , y_ : batch[1]} )
N -= 1
while(True):
u = raw_input("Give an x\n")
print sess.run(y, feed_dict = {x : [[u]]})
The problem is, I am getting terribly unrelated results. Model does not learn anything and returns irrelevant probabilities. I tried to adjust learning rate and change variable initialization, but I did not get anything useful. Do you have any suggestions?
You are computing only one probability what you want is to have two classes:
greater/equal than zero.
less than zero.
So the output of the network will be a tensor of shape two that will contain the probabilities of each class. I renamed y_ in your example to labels:
labels = tf.placeholder(tf.float32, shape = [None,2])
Next we compute the cross entropy between the result of the network and the expected classification. The classes for positive numbers would be [1.0, 0] and for negative numbers would be [0.0, 1.0].
The loss function becomes:
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits, labels)
loss = tf.reduce_mean(cross_entropy)
I renamed the y to logits as that is a more descriptive name.
Training this network for 10000 steps gives:
Give an x
3.0
[[ 0.96353203 0.03686807]]
Give an x
200
[[ 0.97816485 0.02264325]]
Give an x
-20
[[ 0.12095013 0.87537241]]
Full code:
import tensorflow as tf
import random
# Graph construction
x = tf.placeholder(tf.float32, shape = [None,1])
labels = tf.placeholder(tf.float32, shape = [None,2])
W = tf.Variable(tf.random_uniform([1,10],0.,0.1))
b = tf.Variable(tf.random_uniform([10],0.,0.1))
layer1 = tf.nn.sigmoid( tf.add(tf.matmul(x,W), b) )
W1 = tf.Variable(tf.random_uniform([10, 2],0.,0.1))
b1 = tf.Variable(tf.random_uniform([1],0.,0.1))
logits = tf.nn.sigmoid( tf.add( tf.matmul(layer1,W1),b1) )
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits, labels)
loss = tf.reduce_mean(cross_entropy)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
# Training
with tf.Session() as sess:
sess.run(tf.initialize_all_variables())
N = 1000
while N != 0:
batch = ([],[])
u = random.uniform(-10.0,+10.0)
if u >= 0.:
batch[0].append([u])
batch[1].append([1.0, 0.0])
if u < 0.:
batch[0].append([u])
batch[1].append([0.0, 1.0])
sess.run(train_step, feed_dict = {x : batch[0] , labels : batch[1]} )
N -= 1
while(True):
u = raw_input("Give an x\n")
print sess.run(logits, feed_dict = {x : [[u]]})