Why does Keras not generalize my data? - machine-learning

Ive been trying to implement a basic multilayered LSTM regression network to find correlations between cryptocurrency prices.
After running into unusable training results, i've decided to play around with some sandbox code, to make sure i've got the idea right before trying again on my full dataset.
The problem is I can't get Keras to generalize my data.
ts = 3
in_dim = 1
data = [i*100 for i in range(10)]
# tried this, didn't accomplish anything
# data = [(d - np.mean(data))/np.std(data) for d in data]
x = data[:len(data) - 4]
y = data[3:len(data) - 1]
assert(len(x) == len(y))
x = [[_x] for _x in x]
y = [[_y] for _y in y]
x = [x[idx:idx + ts] for idx in range(0, len(x), ts)]
y = [y[idx:idx + ts] for idx in range(0, len(y), ts)]
x = np.asarray(x)
y = np.asarray(y)
x looks like this:
[[[ 0]
[100]
[200]]
[[300]
[400]
[500]]]
and y:
[[[300]
[400]
[500]]
[[600]
[700]
[800]]]
and this works well when I predict using a very similar dataset, but doesn't generalize when I try a similar sequence with scaled values
model = Sequential()
model.add(BatchNormalization(
axis = 1,
input_shape = (ts, in_dim)))
model.add(LSTM(
100,
input_shape = (ts, in_dim),
return_sequences = True))
model.add(TimeDistributed(Dense(in_dim)))
model.add(Activation('linear'))
model.compile(loss = 'mse', optimizer = 'rmsprop')
model.fit(x, y, epochs = 2000, verbose = 0)
p = np.asarray([[[10],[20],[30]]])
prediction = model.predict(p)
print(prediction)
prints
[[[ 165.78544617]
[ 209.34489441]
[ 216.02174377]]]
I want
[[[ 40.0000]
[ 50.0000]
[ 60.0000]]]
how can I format this so that when i plug in a sequence with values that are of a completely different scale, the network will still output its predicted value? I've tried normalizing my training data, but the results are still entirely unusable.
What have I done wrong here?

How about transform your input data before sending into your LSTM, use something like sklearn.preprocessing.StandardScaler? after prediction you can call scaler.inverse_transform(prediction)

Related

Linear Regression in Tensor Flow - Error when modifying getting started code

I am very new to TensorFlow and I am in parallel learning traditional machine learning techniques. Previously, I was able to successfully implement linear regression modelling in matlab and in Python using scikit.
When I tried to reproduce it using Tensorflow with the same dataset, I am getting invalid outputs. Could someone advise me on where I am making the mistake or what I am missing!
Infact, I am using the code from tensor flow introductory tutorial and I just changed the x_train and y_train to a different data set.
# Loading the ML coursera course ex1 (Wk 2) data to try it out
'''
path = r'C:\Users\Prasanth\Dropbox\Python Folder\ML in Python\data\ex1data1.txt'
fh = open(path,'r')
l1 = []
l2 = []
for line in fh:
temp = (line.strip().split(','))
l1.append(float(temp[0]))
l2.append(float(temp[1]))
'''
l1 = [6.1101, 5.5277, 8.5186, 7.0032, 5.8598, 8.3829, 7.4764, 8.5781, 6.4862, 5.0546, 5.7107, 14.164, 5.734, 8.4084, 5.6407, 5.3794, 6.3654, 5.1301, 6.4296, 7.0708, 6.1891, 20.27, 5.4901, 6.3261, 5.5649, 18.945, 12.828, 10.957, 13.176, 22.203, 5.2524, 6.5894, 9.2482, 5.8918, 8.2111, 7.9334, 8.0959, 5.6063, 12.836, 6.3534, 5.4069, 6.8825, 11.708, 5.7737, 7.8247, 7.0931, 5.0702, 5.8014, 11.7, 5.5416, 7.5402, 5.3077, 7.4239, 7.6031, 6.3328, 6.3589, 6.2742, 5.6397, 9.3102, 9.4536, 8.8254, 5.1793, 21.279, 14.908, 18.959, 7.2182, 8.2951, 10.236, 5.4994, 20.341, 10.136, 7.3345, 6.0062, 7.2259, 5.0269, 6.5479, 7.5386, 5.0365, 10.274, 5.1077, 5.7292, 5.1884, 6.3557, 9.7687, 6.5159, 8.5172, 9.1802, 6.002, 5.5204, 5.0594, 5.7077, 7.6366, 5.8707, 5.3054, 8.2934, 13.394, 5.4369]
l2 = [17.592, 9.1302, 13.662, 11.854, 6.8233, 11.886, 4.3483, 12.0, 6.5987, 3.8166, 3.2522, 15.505, 3.1551, 7.2258, 0.71618, 3.5129, 5.3048, 0.56077, 3.6518, 5.3893, 3.1386, 21.767, 4.263, 5.1875, 3.0825, 22.638, 13.501, 7.0467, 14.692, 24.147, -1.22, 5.9966, 12.134, 1.8495, 6.5426, 4.5623, 4.1164, 3.3928, 10.117, 5.4974, 0.55657, 3.9115, 5.3854, 2.4406, 6.7318, 1.0463, 5.1337, 1.844, 8.0043, 1.0179, 6.7504, 1.8396, 4.2885, 4.9981, 1.4233, -1.4211, 2.4756, 4.6042, 3.9624, 5.4141, 5.1694, -0.74279, 17.929, 12.054, 17.054, 4.8852, 5.7442, 7.7754, 1.0173, 20.992, 6.6799, 4.0259, 1.2784, 3.3411, -2.6807, 0.29678, 3.8845, 5.7014, 6.7526, 2.0576, 0.47953, 0.20421, 0.67861, 7.5435, 5.3436, 4.2415, 6.7981, 0.92695, 0.152, 2.8214, 1.8451, 4.2959, 7.2029, 1.9869, 0.14454, 9.0551, 0.61705]
print ('List length and data type', len(l1), type(l1))
#------------------#
import tensorflow as tf
# Model parameters
W = tf.Variable([0], dtype=tf.float64)
b = tf.Variable([0], dtype=tf.float64)
# Model input and output
x = tf.placeholder(tf.float64)
linear_model = W * x + b
y = tf.placeholder(tf.float64)
# loss or cost function
loss = tf.reduce_sum(tf.square(linear_model - y)) # sum of the squares
# optimizer (gradient descent) with learning rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
# training data (labelled input & output swt)
# Using coursera data instead of sample data
#x_train = [1.0, 2, 3, 4]
#y_train = [0, -1, -2, -3]
x_train = l1
y_train = l2
# training loop (1000 iterations)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init) # reset values to wrong
for i in range(1000):
sess.run(train, {x: x_train, y: y_train})
# evaluate training accuracy
curr_W, curr_b, curr_loss = sess.run([W, b, loss], {x: x_train, y: y_train})
print("W: %s b: %s loss: %s"%(curr_W, curr_b, curr_loss))
Output
List length and data type: 97 <class 'list'>
W: [ nan] b: [ nan] loss: nan
One major problem with your estimator is the loss function. Since you use tf.reduce_sum, the loss grows with the number of samples, which you have to compensate by using a smaller learning rate. A better solution would be to use mean square error loss
loss = tf.reduce_mean(tf.square(linear_model - y))

Use neural network to learn a square wave function

Out of curiosity, I am trying to build a simple fully connected NN using tensorflow to learn a square wave function such as the following one:
Therefore the input is a 1D array of x value (as the horizontal axis), and the output is a binary scalar value. I used tf.nn.sparse_softmax_cross_entropy_with_logits as loss function, and tf.nn.relu as activation. There are 3 hidden layers (100*100*100) and a single input node and output node. The input data are generated to match the above wave shape and therefore the data size is not a problem.
However, the trained model seems to fail completed, predicting for the negative class always.
So I am trying to figure out why this happened. Whether the NN configuration is suboptimal, or it is due to some mathematical flaw in NN beneath the surface (though I think NN should be able to imitate any function).
Thanks.
As per suggestions in the comment section, here is the full code. One thing I noticed saying wrong earlier is, there were actually 2 output nodes (due to 2 output classes):
"""
See if neural net can find piecewise linear correlation in the data
"""
import time
import os
import tensorflow as tf
import numpy as np
def generate_placeholder(batch_size):
x_placeholder = tf.placeholder(tf.float32, shape=(batch_size, 1))
y_placeholder = tf.placeholder(tf.float32, shape=(batch_size))
return x_placeholder, y_placeholder
def feed_placeholder(x, y, x_placeholder, y_placeholder, batch_size, loop):
x_selected = [[None]] * batch_size
y_selected = [None] * batch_size
for i in range(batch_size):
x_selected[i][0] = x[min(loop*batch_size, loop*batch_size % len(x)) + i, 0]
y_selected[i] = y[min(loop*batch_size, loop*batch_size % len(y)) + i]
feed_dict = {x_placeholder: x_selected,
y_placeholder: y_selected}
return feed_dict
def inference(input_x, H1_units, H2_units, H3_units):
with tf.name_scope('H1'):
weights = tf.Variable(tf.truncated_normal([1, H1_units], stddev=1.0/2), name='weights')
biases = tf.Variable(tf.zeros([H1_units]), name='biases')
a1 = tf.nn.relu(tf.matmul(input_x, weights) + biases)
with tf.name_scope('H2'):
weights = tf.Variable(tf.truncated_normal([H1_units, H2_units], stddev=1.0/H1_units), name='weights')
biases = tf.Variable(tf.zeros([H2_units]), name='biases')
a2 = tf.nn.relu(tf.matmul(a1, weights) + biases)
with tf.name_scope('H3'):
weights = tf.Variable(tf.truncated_normal([H2_units, H3_units], stddev=1.0/H2_units), name='weights')
biases = tf.Variable(tf.zeros([H3_units]), name='biases')
a3 = tf.nn.relu(tf.matmul(a2, weights) + biases)
with tf.name_scope('softmax_linear'):
weights = tf.Variable(tf.truncated_normal([H3_units, 2], stddev=1.0/np.sqrt(H3_units)), name='weights')
biases = tf.Variable(tf.zeros([2]), name='biases')
logits = tf.matmul(a3, weights) + biases
return logits
def loss(logits, labels):
labels = tf.to_int32(labels)
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits, name='xentropy')
return tf.reduce_mean(cross_entropy, name='xentropy_mean')
def inspect_y(labels):
return tf.reduce_sum(tf.cast(labels, tf.int32))
def training(loss, learning_rate):
tf.summary.scalar('lost', loss)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
global_step = tf.Variable(0, name='global_step', trainable=False)
train_op = optimizer.minimize(loss, global_step=global_step)
return train_op
def evaluation(logits, labels):
labels = tf.to_int32(labels)
correct = tf.nn.in_top_k(logits, labels, 1)
return tf.reduce_sum(tf.cast(correct, tf.int32))
def run_training(x, y, batch_size):
with tf.Graph().as_default():
x_placeholder, y_placeholder = generate_placeholder(batch_size)
logits = inference(x_placeholder, 100, 100, 100)
Loss = loss(logits, y_placeholder)
y_sum = inspect_y(y_placeholder)
train_op = training(Loss, 0.01)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
max_steps = 10000
for step in range(max_steps):
start_time = time.time()
feed_dict = feed_placeholder(x, y, x_placeholder, y_placeholder, batch_size, step)
_, loss_val = sess.run([train_op, Loss], feed_dict = feed_dict)
duration = time.time() - start_time
if step % 100 == 0:
print('Step {}: loss = {:.2f} {:.3f}sec'.format(step, loss_val, duration))
x_test = np.array(range(1000)) * 0.001
x_test = np.reshape(x_test, (1000, 1))
_ = sess.run(logits, feed_dict={x_placeholder: x_test})
print(min(_[:, 0]), max(_[:, 0]), min(_[:, 1]), max(_[:, 1]))
print(_)
if __name__ == '__main__':
population = 10000
input_x = np.random.rand(population)
input_y = np.copy(input_x)
for bin in range(10):
print(bin, bin/10, 0.5 - 0.5*(-1)**bin)
input_y[input_x >= bin/10] = 0.5 - 0.5*(-1)**bin
batch_size = 1000
input_x = np.reshape(input_x, (population, 1))
run_training(input_x, input_y, batch_size)
Sample output shows that the model always prefer the first class over the second, as shown by min(_[:, 0]) > max(_[:, 1]), i.e. the minimum logit output for the first class is higher than the maximum logit output for the second class, for a sample size of population.
My mistake. The problem occurred in the line:
for i in range(batch_size):
x_selected[i][0] = x[min(loop*batch_size, loop*batch_size % len(x)) + i, 0]
y_selected[i] = y[min(loop*batch_size, loop*batch_size % len(y)) + i]
Python is mutating the whole list of x_selected to the same value. Now this code issue is resolved. The fix is:
x_selected = np.zeros((batch_size, 1))
y_selected = np.zeros((batch_size,))
for i in range(batch_size):
x_selected[i, 0] = x[(loop*batch_size + i) % x.shape[0], 0]
y_selected[i] = y[(loop*batch_size + i) % y.shape[0]]
After this fix, the model is showing more variation. It currently outputs class 0 for x <= 0.5 and class 1 for x > 0.5. But this is still far from ideal.
So after changing the network configuration to 100 nodes * 4 layers, after 1 million training steps (batch size = 100, sample size = 10 million), the model is performing very well showing only errors at the edges when y flips.
Therefore this question is closed.
You essentially try to learn a periodic function and the function is highly non-linear and non-smooth. So it is NOT simple as it looks like. In short, a better representation of the input feature helps.
Suppose your have a period T = 2, f(x) = f(x+2).
For a reduced problem when input/output are integers, your function is then f(x) = 1 if x is odd else -1. In this case, your problem would be reduced to this discussion in which we train a Neural Network to distinguish between odd and even numbers.
I guess the second bullet in that post should help (even for the general case when inputs are float numbers).
Try representing the numbers in binary using a fixed length precision.
In our reduced problem above, it's easy to see that the output is determined iff the least-significant bit is known.
decimal binary -> output
1: 0 0 1 -> 1
2: 0 1 0 -> -1
3: 0 1 1 -> 1
...
I created the model and the structure for the problem of recognizing odd/even numbers in here.
If you abstract the fact that:
decimal binary -> output
1: 0 0 1 -> 1
2: 0 1 0 -> -1
3: 0 1 1 -> 1
Is almost equivalent to:
decimal binary -> output
1: 0 0 1 -> 1
2: 0 1 0 -> 0
3: 0 1 1 -> 1
You may update the code to fit your need.

GAN Converges in Just a Few Epochs

I implemented a genrative adversarial network in Keras. My training data size is about 16,000, where each image is of 32*32 size. All of my training images are the resized versions of the imageds from the imagenet dataset with regard to the object detection task. I fed the image matrix directly into the network without doing the center crop. I used the AdamOptimizer with the learning rate being 1e-4, and beta1 being 0.5 and I also set the dropout rate to be 0.1. I first trained the discrimator on 3000 real images and 3000 fake images and it achieved a 93% accuracy. Then, I trained for 500 epochs with the batch size being 32. However, my model seemed to converge in only a few epochs(<10), and the images it generated were ugly.
Plot of the Loss Function
Random Samples Generated by the Generator
I was wondering whether my training dataset is too small(compared to those in the paper of DCGAN, which are more than 300,000) or my model configuration is not correct. What's more, should I train the SGD on D for k iterations (where k is small, perhaps 1) and then training with SGD on G for one iteration as suggested by Ian Goodfellow in the original paper?(I have just tried to train them one at a time)
Below is the configuration of the generator.
g_input = Input(shape=[100])
H = Dense(1024*4*4, init='glorot_normal')(g_input)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = Reshape( [4, 4,1024] )(H)
H = UpSampling2D(size=( 2, 2))(H)
H = Convolution2D(512, 3, 3, border_mode='same', init='glorot_uniform')(H)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = UpSampling2D(size=( 2, 2))(H)
H = Convolution2D(256, 3, 3, border_mode='same', init='glorot_uniform')(H)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = UpSampling2D(size=( 2, 2))(H)
H = Convolution2D(3, 3, 3, border_mode='same', init='glorot_uniform')(H)
g_V = Activation('tanh')(H)
generator = Model(g_input,g_V)
generator.compile(loss='binary_crossentropy', optimizer=opt)
generator.summary()
Below is the configuration of the discriminator:
d_input = Input(shape=shp)
H = Convolution2D(64, 5, 5, subsample=(2, 2), border_mode = 'same', init='glorot_normal')(d_input)
H = LeakyReLU(0.2)(H)
#H = Dropout(dropout_rate)(H)
H = Convolution2D(128, 5, 5, subsample=(2, 2), border_mode = 'same', init='glorot_normal')(H)
H = BatchNormalization(mode=2)(H)
H = LeakyReLU(0.2)(H)
#H = Dropout(dropout_rate)(H)
H = Flatten()(H)
H = Dense(256, init='glorot_normal')(H)
H = LeakyReLU(0.2)(H)
d_V = Dense(2,activation='softmax')(H)
discriminator = Model(d_input,d_V)
discriminator.compile(loss='categorical_crossentropy', optimizer=dopt)
discriminator.summary()
Below is the configuration of GAN as a whole:
gan_input = Input(shape=[100])
H = generator(gan_input)
gan_V = discriminator(H)
GAN = Model(gan_input, gan_V)
GAN.compile(loss='categorical_crossentropy', optimizer=opt)
GAN.summary()
I think problem is with loss function
Try
loss='categorical_crossentropy',
I suspect that your generator is trainable while you training the gan. You can verify by using generator.layers[-1].get_weights() to see if the parameters changed during training process of gan.
You should freeze discriminator before you assemble it to gan:
generator.trainnable = False
gan_input = Input(shape=[100])
H = generator(gan_input)
gan_V = discriminator(H)
GAN = Model(gan_input, gan_V)
GAN.compile(loss='categorical_crossentropy', optimizer=opt)
GAN.summary()
see this discussion:
https://github.com/fchollet/keras/issues/4674

What are the problems that causes neural networks stagnate in learning?

I was trying to see how accurate a neural network can approximate simple functions, like a scalar-valued polynomial in several variables. So I had these ideas:
Fix a polynomial of several variables, say, f(x_1,..,x_n).
Generate 50000 vectors of length n using numpy.random which will serve as training data.
Evaluate the f(x) at these points, the value will be used as label.
Make test data and label in the same way
Write a neural network and see how accuracy it can approximate f(x) on test set.
Here is my sample neural network implemented in tensorflow
import tensorflow as tf
import numpy as np
input_vector_length = int(10)
output_vector_length = int(1)
train_data_size = int(50000)
test_data_size = int(10000)
train_input_domain = [-10, 10] #Each component in an input vector is between -10 and 10
test_input_domain = [-10, 10]
iterations = 20000
batch_size = 200
regularizer = 0.01
sess = tf.Session()
x = tf.placeholder(tf.float32, shape=[None, input_vector_length], name="x")
y = tf.placeholder(tf.float32, shape =[None, output_vector_length], name="y")
function = tf.reduce_sum(x, 1) + 0.25*tf.pow(tf.reduce_sum(x,1), 2) + 0.025*tf.pow(tf.reduce_sum(x,1), 3)
#make train data input
train_input = (train_input_domain[1]-train_input_domain[0])*np.random.rand(train_data_size, input_vector_length) + train_input_domain[0]
#make train data label
train_label = sess.run(function, feed_dict = {x : train_input})
train_label = train_label.reshape(train_data_size, output_vector_length)
#make test data input
test_input = (test_input_domain[1]-test_input_domain[0])*np.random.rand(test_data_size, input_vector_length) + test_input_domain[0]
#make test data label
test_label = sess.run(function, feed_dict = {x : test_input})
test_label = test_label.reshape(test_data_size, output_vector_length)
def weight_variables(shape, name):
initial = 10*tf.truncated_normal(shape, stddev=0.1)
return tf.Variable(initial)
def bias_variables(shape, name):
initial = 10*tf.truncated_normal(shape, stddev=0.1)
return tf.Variable(initial)
def take_this_batch(data, batch_index=[]):
A = []
for i in range(len(batch_index)):
A.append(data[i])
return A
W_0 = weight_variables(shape=[input_vector_length, 10], name="W_0")
B_0 = bias_variables(shape=[10], name="W_0")
y_1 = tf.sigmoid(tf.matmul(x, W_0) + B_0)
W_1 = weight_variables(shape=[10, 20], name="W_1")
B_1 = bias_variables(shape=[20], name="B_1")
y_2 = tf.sigmoid(tf.matmul(y_1, W_1) + B_1)
W_2 = weight_variables(shape=[20,40], name="W_2")
B_2 = bias_variables(shape=[40], name="B_2")
y_3 = tf.sigmoid(tf.matmul(y_2, W_2) + B_2)
keep_prob = tf.placeholder(tf.float32, name="keep_prob")
y_drop = tf.nn.dropout(y_3, keep_prob)
W_output = weight_variables(shape=[40, output_vector_length], name="W_output")
B_output = bias_variables(shape=[output_vector_length], name="B_output")
y_output = tf.matmul(y_drop, W_output) + B_output
weight_sum = tf.reduce_sum(tf.square(W_0)) + tf.reduce_sum(tf.square(W_1)) + tf.reduce_sum(tf.square(W_2)) + tf.reduce_sum(tf.square(W_3))
cost = tf.reduce_mean(tf.square(y - y_output)) + regularizer*(weight_sum)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
error = cost
sess.run(tf.initialize_all_variables())
with sess.as_default():
for step in range(iterations):
batch_index = np.random.randint(low=0, high=train_data_size, size=batch_size)
batch_input = take_this_batch(train_input, batch_index)
batch_label = take_this_batch(train_label, batch_index)
train_step.run(feed_dict = {x : batch_input, y:batch_label, keep_prob:0.5})
if step % 1000 == 0:
current_error = error.eval(feed_dict = {x:batch_input, y:batch_label, keep_prob:1.0})
print("step %d, Current error is %f" % (step,current_error))
print(error.eval(feed_dict={x:test_input, y:test_label, keep_prob:1.0}))
Simply speaking, the performance of this neural network is horrifying! My neural network has three hidden layers of size 10, 20 and 40. The input layer is of size 10, and the output layer has size 1. I used a simple L^2 cost function, and I regularized it with the square of weights and regularizer 0.01.
During training stage, I noticed that the error seems to get stuck and refuses to go down. I am wondering what could go wrong? Thanks a lot for reading this long question. Any suggestion is appreciated.
Since you are using sigmoid as the activation function in the hidden layers, the value at these neurons is reduced to the range of (0,1). Hence, it is a good idea to normalize the input data for this network.

A Simple Network on TensorFlow

I was trying to train a very simple model on TensorFlow. Model takes a single float as input and returns the probability of input being greater than 0. I used 1 hidden layer with 10 hidden units. Full code is shown below:
import tensorflow as tf
import random
# Graph construction
x = tf.placeholder(tf.float32, shape = [None,1])
y_ = tf.placeholder(tf.float32, shape = [None,1])
W = tf.Variable(tf.random_uniform([1,10],0.,0.1))
b = tf.Variable(tf.random_uniform([10],0.,0.1))
layer1 = tf.nn.sigmoid( tf.add(tf.matmul(x,W), b) )
W1 = tf.Variable(tf.random_uniform([10,1],0.,0.1))
b1 = tf.Variable(tf.random_uniform([1],0.,0.1))
y = tf.nn.sigmoid( tf.add( tf.matmul(layer1,W1),b1) )
loss = tf.square(y - y_)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
# Training
with tf.Session() as sess:
sess.run(tf.initialize_all_variables())
N = 1000
while N != 0:
batch = ([],[])
u = random.uniform(-10.0,+10.0)
if u >= 0.:
batch[0].append([u])
batch[1].append([1.0])
if u < 0.:
batch[0].append([u])
batch[1].append([0.0])
sess.run(train_step, feed_dict = {x : batch[0] , y_ : batch[1]} )
N -= 1
while(True):
u = raw_input("Give an x\n")
print sess.run(y, feed_dict = {x : [[u]]})
The problem is, I am getting terribly unrelated results. Model does not learn anything and returns irrelevant probabilities. I tried to adjust learning rate and change variable initialization, but I did not get anything useful. Do you have any suggestions?
You are computing only one probability what you want is to have two classes:
greater/equal than zero.
less than zero.
So the output of the network will be a tensor of shape two that will contain the probabilities of each class. I renamed y_ in your example to labels:
labels = tf.placeholder(tf.float32, shape = [None,2])
Next we compute the cross entropy between the result of the network and the expected classification. The classes for positive numbers would be [1.0, 0] and for negative numbers would be [0.0, 1.0].
The loss function becomes:
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits, labels)
loss = tf.reduce_mean(cross_entropy)
I renamed the y to logits as that is a more descriptive name.
Training this network for 10000 steps gives:
Give an x
3.0
[[ 0.96353203 0.03686807]]
Give an x
200
[[ 0.97816485 0.02264325]]
Give an x
-20
[[ 0.12095013 0.87537241]]
Full code:
import tensorflow as tf
import random
# Graph construction
x = tf.placeholder(tf.float32, shape = [None,1])
labels = tf.placeholder(tf.float32, shape = [None,2])
W = tf.Variable(tf.random_uniform([1,10],0.,0.1))
b = tf.Variable(tf.random_uniform([10],0.,0.1))
layer1 = tf.nn.sigmoid( tf.add(tf.matmul(x,W), b) )
W1 = tf.Variable(tf.random_uniform([10, 2],0.,0.1))
b1 = tf.Variable(tf.random_uniform([1],0.,0.1))
logits = tf.nn.sigmoid( tf.add( tf.matmul(layer1,W1),b1) )
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits, labels)
loss = tf.reduce_mean(cross_entropy)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
# Training
with tf.Session() as sess:
sess.run(tf.initialize_all_variables())
N = 1000
while N != 0:
batch = ([],[])
u = random.uniform(-10.0,+10.0)
if u >= 0.:
batch[0].append([u])
batch[1].append([1.0, 0.0])
if u < 0.:
batch[0].append([u])
batch[1].append([0.0, 1.0])
sess.run(train_step, feed_dict = {x : batch[0] , labels : batch[1]} )
N -= 1
while(True):
u = raw_input("Give an x\n")
print sess.run(logits, feed_dict = {x : [[u]]})

Resources