TensorFlow's perceptron gives unexplainable output - machine-learning

I'm new to TF. I took the perceptron code from this MNIST tutorial (it's not necessary to follow the link): https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/multilayer_perceptron.py
I wanted to reduce that perceptron to a single hidden layer with a linear activation function, i.e. the simplest form: output = w2(w1*x + b1) + b2. But this is what I get:
Data:
X_train: array([[ 10.],
[ 10.],
[ 11.],
[ 6.],
[ 8.],
[ 9.],
[ 22.],
[ 14.],
[ 6.],
[ 8.],
[ 11.],
[ 9.],
[ 13.],
[ 7.],
[ 13.],
[ 7.],
[ 13.],
[ 11.]])
y_train: array([[ 44.5825],
[ 53.99 ],
[ 52.4475],
[ 37.6 ],
[ 38.6125],
[ 39.5875],
[ 43.07 ],
[ 74.8575],
[ 34.185 ],
[ 38.61 ],
[ 34.8175],
[ 36.61 ],
[ 34.0675],
[ 37.67 ],
[ 49.725 ],
[ 79.4775],
[ 50.41 ],
[ 51.26 ]])
X_test: array([[ 6.],
[ 14.],
[ 14.],
[ 12.],
[ 13.],
[ 13.]])
y_test: array([[ 55.75 ],
[ 33.035 ],
[ 38.3275],
[ 39.2825],
[ 50.7325],
[ 45.2575]])
Parameters:
learning_rate = 1
training_epochs = 1
display_step = 1 #maintaining variable
x = tf.placeholder("float", [None, 1])
y = tf.placeholder("float", [None, 1])
Perceptron model:
def multilayer_perceptron(x, weights, biases, output_0):
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    out_layer = tf.add(tf.matmul(layer_1, weights['out']), biases['out'])
    output_o = out_layer #This variable is just needed to print result in session
    return out_layer
output_0 = tf.Variable(tf.random_normal([1, n_classes]))
weights = {
'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))}
biases = {
'b1': tf.Variable(tf.random_normal([n_hidden_1])),
'out': tf.Variable(tf.random_normal([n_classes]))}
Let's build the graph:
prediction = multilayer_perceptron(x, weights, biases, output)
cost = tf.reduce_mean(tf.square(prediction-y)) #MSE
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost) #Gives the smallest cost
init = tf.initialize_all_variables()
Finally, let's run the session:
with tf.Session() as Sess:
    Sess.run(init)
    for epoch in range(training_epochs):
        avg_cost = 0.
        number_of_bathces = len(X_train)/batch_size
        _, c = Sess.run([optimizer, cost], feed_dict = {x: X_train, y: y_train})
        avg_cost += c/len(X_train)
        print(Sess.run(output_0))
        if epoch % display_step ==0:
            print("Epoch:", '%02d' % (epoch+1), "cost =", "{:.9f}".format(avg_cost))
    print("Optimization finished")
    correct_prediction = tf.equal(tf.arg_max(prediction,1), tf.arg_max(y,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({x:X_test, y:y_test}))
And now, we get the output:
[[ 0.77995574]]
Epoch: 01 cost = 262.544189453
Optimization finished
Accuracy: 1.0
The most confusing thing is the output (the first number)! It should be somewhere in the range [30; 50]. Please explain what I did wrong.

Your code is notably messy, so I've removed a lot of redundant pieces:
from __future__ import print_function
import numpy as np
import tensorflow as tf
X_train = np.array([[ 10.], [ 10.], [ 11.], [ 6.], [ 8.], [ 9.], [ 22.], [ 14.], [ 6.], [ 8.], [ 11.], [ 9.], [ 13.], [ 7.], [ 13.], [ 7.], [ 13.], [ 11.]])
y_train = np.array([[ 44.5825], [ 53.99 ], [ 52.4475], [ 37.6 ], [ 38.6125], [ 39.5875], [ 43.07 ], [ 74.8575], [ 34.185 ], [ 38.61 ], [ 34.8175], [ 36.61 ], [ 34.0675], [ 37.67 ], [ 49.725 ], [ 79.4775], [ 50.41 ], [ 51.26 ]])
X_test = np.array([[ 6.], [ 14.], [ 14.], [ 12.], [ 13.], [ 13.]])
y_test = np.array([[ 55.75 ], [ 33.035 ], [ 38.3275], [ 39.2825], [ 50.7325], [ 45.2575]])
learning_rate = 0.05
training_epochs = 10
n_classes = 1
n_hidden_1 = 5
n_hidden_2 = 5
n_input = 1
x = tf.placeholder(tf.float32, [None, 1])
y = tf.placeholder(tf.float32, [None, 1])
def multilayer_perceptron(x, weights, biases):
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    out_layer = tf.add(tf.matmul(layer_1, weights['out']), biases['out'])
    return out_layer
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'out': tf.Variable(tf.random_normal([n_classes]))}
prediction = multilayer_perceptron(x, weights, biases)
cost = tf.reduce_mean(tf.square(prediction - y)) #MSE
optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost) #Gives the smallest cost
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        _, c = sess.run([optimizer, cost], feed_dict = {x: X_train, y: y_train})
        print("Epoch:", '%02d' % (epoch+1), "cost =", "{:.9f}".format(c))
    print("Optimization finished")
    print(sess.run(prediction, feed_dict = {x: X_test, y: y_test} ))
It seems to work now. I've got the following results:
Epoch: 01 cost = 1323.519653320
Epoch: 02 cost = 926.386840820
Epoch: 03 cost = 628.072326660
Epoch: 04 cost = 431.689270020
Epoch: 05 cost = 343.259063721
Epoch: 06 cost = 355.978668213
Epoch: 07 cost = 430.280548096
Epoch: 08 cost = 501.149414062
Epoch: 09 cost = 527.575683594
Epoch: 10 cost = 507.708007812
Optimization finished
[[ 30.79703712]
[ 69.70319366]
[ 69.70319366]
[ 59.97665405]
[ 64.83992004]
[ 64.83992004]]
Results may vary due to random initialization of weights.
A couple of tips:
Use a smaller learning rate.
Train over several epochs to see the dynamics.
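Two details of the original code may explain the puzzling numbers. First, output_0 is a separate, randomly initialised variable: inside the function the result is bound to a local name (output_o), so printing output_0 shows its random initial value rather than a prediction. Second, taking the argmax over a single output column always yields index 0, so the "accuracy" is 1.0 whatever the model predicts. The following is a plain-Python sketch of both effects, not TensorFlow code; the model function and its coefficients are made up for illustration, and the numbers are taken from the predictions and y_test above.

```python
def model(x, output_0):
    out_layer = 2.0 * x + 1.0   # made-up linear model, for illustration only
    output_o = out_layer        # binds a new local name; the caller's output_0 is untouched
    return out_layer

output_0 = 0.78                 # stands in for the randomly initialised variable
prediction = model(20.0, output_0)
print(output_0, prediction)     # output_0 is unchanged by the call

def argmax(row):
    # Index of the largest entry in a row.
    return max(range(len(row)), key=row.__getitem__)

predictions = [[30.797], [69.703], [69.703]]
targets = [[55.75], [33.035], [38.3275]]

# With one column per row, argmax is always 0, so every "match" is True.
matches = [argmax(p) == argmax(t) for p, t in zip(predictions, targets)]
accuracy = sum(matches) / len(matches)

# Mean squared error is a more informative measure for regression.
mse = sum((p[0] - t[0]) ** 2 for p, t in zip(predictions, targets)) / len(targets)
print(accuracy, mse)
```

For a regression problem like this one, reporting the cost (MSE) on the test set is likely more meaningful than an argmax-based accuracy.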

Related

Multiple dimensionality reduction techniques with pipeline and GridSearchCV

We all know the common approach: define a pipeline with a dimensionality reduction technique followed by a model for training and testing, then apply GridSearchCV for hyperparameter tuning.
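from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
grid = GridSearchCV(
    Pipeline([
        ('reduce_dim', PCA()),
        ('classify', RandomForestClassifier(n_jobs = -1))
    ]),
    param_grid=[
        {
            # range() only accepts integers, so the fractions are listed explicitly
            'reduce_dim__n_components': [0.7, 0.8],
            'classify__n_estimators': range(10,50,5),
            'classify__max_features': ['auto', 0.2],
            'classify__min_samples_leaf': [40,50,60],
            'classify__criterion': ['gini', 'entropy']
        }
    ],
    cv=5, scoring='f1')
grid.fit(X,y)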
I understand the code above.
While going through the documentation today, I found a piece of code that looks a bit strange:
pipe = Pipeline([
# the reduce_dim stage is populated by the param_grid
('reduce_dim', 'passthrough'), # How does this work??
('classify', LinearSVC(dual=False, max_iter=10000))
])
N_FEATURES_OPTIONS = [2, 4, 8]
C_OPTIONS = [1, 10, 100, 1000]
param_grid = [
{
'reduce_dim': [PCA(iterated_power=7), NMF()],
'reduce_dim__n_components': N_FEATURES_OPTIONS, ### No PCA is used..??
'classify__C': C_OPTIONS
},
{
'reduce_dim': [SelectKBest(chi2)],
'reduce_dim__k': N_FEATURES_OPTIONS,
'classify__C': C_OPTIONS
},
]
reducer_labels = ['PCA', 'NMF', 'KBest(chi2)']
grid = GridSearchCV(pipe, n_jobs=1, param_grid=param_grid)
X, y = load_digits(return_X_y=True)
grid.fit(X, y)
First of all, when defining the pipeline, it uses the string 'passthrough' instead of an object:
('reduce_dim', 'passthrough'),
Then, when defining the different dimensionality reduction techniques for the grid search, it uses a different strategy. How does [PCA(iterated_power=7), NMF()] work?
'reduce_dim': [PCA(iterated_power=7), NMF()],
'reduce_dim__n_components': N_FEATURES_OPTIONS, # here
Could someone please explain this code to me?
Solved - in one line, the order is ['PCA', 'NMF', 'KBest(chi2)']
Courtesy of - seralouk (see answer below)
It is equivalent as far as I know.
In the documentation you have this:
pipe = Pipeline([
# the reduce_dim stage is populated by the param_grid
('reduce_dim', 'passthrough'),
('classify', LinearSVC(dual=False, max_iter=10000))
])
N_FEATURES_OPTIONS = [2, 4, 8]
C_OPTIONS = [1, 10, 100, 1000]
param_grid = [
{
'reduce_dim': [PCA(iterated_power=7), NMF()],
'reduce_dim__n_components': N_FEATURES_OPTIONS,
'classify__C': C_OPTIONS
},
{
'reduce_dim': [SelectKBest(chi2)],
'reduce_dim__k': N_FEATURES_OPTIONS,
'classify__C': C_OPTIONS
},
]
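Initially the pipeline contains the placeholder step ('reduce_dim', 'passthrough'); the grid search then replaces that step with each estimator listed under 'reduce_dim': [PCA(iterated_power=7), NMF()].
So PCA is defined in the param_grid rather than in the pipeline itself.
Alternatively, you could put PCA directly in the pipeline (note that this variant no longer searches over NMF):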
pipe = Pipeline([
# the reduce_dim stage is populated by the param_grid
('reduce_dim', PCA(iterated_power=7)),
('classify', LinearSVC(dual=False, max_iter=10000))
])
N_FEATURES_OPTIONS = [2, 4, 8]
C_OPTIONS = [1, 10, 100, 1000]
param_grid = [
{
'reduce_dim__n_components': N_FEATURES_OPTIONS,
'classify__C': C_OPTIONS
},
{
'reduce_dim': [SelectKBest(chi2)],
'reduce_dim__k': N_FEATURES_OPTIONS,
'classify__C': C_OPTIONS
},
]
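To see why a list of dictionaries works as a param_grid, it may help to expand it by hand. The following is a rough, library-free sketch: expand_param_grid is a hypothetical stand-in for the expansion that scikit-learn performs internally (its ParameterGrid class), and plain strings replace the estimator objects. Each dictionary is expanded into the Cartesian product of its value lists, and the resulting candidates are concatenated.

```python
from itertools import product

def expand_param_grid(param_grid):
    """Expand a list of {name: [values]} dicts into individual candidate settings."""
    candidates = []
    for grid in param_grid:
        keys = sorted(grid)
        for values in product(*(grid[k] for k in keys)):
            candidates.append(dict(zip(keys, values)))
    return candidates

N_FEATURES_OPTIONS = [2, 4, 8]
C_OPTIONS = [1, 10, 100, 1000]
param_grid = [
    {
        'reduce_dim': ['PCA', 'NMF'],          # strings stand in for PCA(...), NMF()
        'reduce_dim__n_components': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS,
    },
    {
        'reduce_dim': ['KBest(chi2)'],         # stands in for SelectKBest(chi2)
        'reduce_dim__k': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS,
    },
]

candidates = expand_param_grid(param_grid)
print(len(candidates))
print(candidates[0])
```

For each candidate, the value under 'reduce_dim' replaces the whole pipeline step, and a key such as 'reduce_dim__n_components' is then applied to whichever estimator now sits in that step. That is why the pipeline can start with 'passthrough' as a placeholder.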

Memory error while loading very large data from h5 file on a cluster

I am running into a MemoryError when I attempt to load a very large dataset from an hdf5 file. I have attached a short example below.
import dask
import dask.array as da
import h5py
from dask.distributed import Client
client = Client('tcp://10.11.69.71:44393')
handle = h5py.File('h5_file.h5', 'r')
a = da.from_array(handle['_data'], chunks='auto') # matrix size: (4500, 6291456)
st1 = da.random.random((a.shape[1], 128)) # matrix size: (6291456, 128)
st = client.run(start)
res = da.matmul(a, st1)
res.compute()
This results in the following error:
distributed.worker - WARNING - Compute Failed
Function: execute_task
args: ((subgraph_callable, (<function concatenate_axes at 0x2b85d304a0d0>, [array([[ 42., 50., 5., ..., 168., 203., 214.],
[129., 159., 0., ..., 187., 153., 136.],
[ 0., 0., 0., ..., 228., 209., 204.],
...,
[ 18., 28., 13., ..., 255., 227., 218.],
[ 79., 86., 61., ..., 53., 64., 55.],
[ 42., 76., 106., ..., 101., 35., 20.]], dtype=float32), array([[ 50., 60., 33., ..., 169., 204., 215.],
[ 24., 111., 0., ..., 185., 151., 133.],
[ 0., 0., 0., ..., 226., 207., 202.],
...,
[ 17., 23., 14., ..., 255., 228., 219.],
[111., 120., 101., ..., 53., 64., 55.],
[ 85., 98., 90., ..., 100., 37., 22.]], dtype=float32), array([[ 65., 61., 35., ..., 170., 205., 215.],
[215., 237., 214., ..., 184., 149., 131.],
[ 49., 42., 21., ..., 223., 205., 200.],
...,
[ 16., 20., 11., ..., 255., 229., 220.],
[ 85., 85., 69., ..., 53., 64., 54.],
[ 6
kwargs: {}
Exception: MemoryError()
Am I loading the data incorrectly? I have tried to use result as well to no avail.
PS I am using dask-mpi to create my client
Note that by calling .compute you are asking for the output of your computation to be returned to you as a single, in-memory, numpy array.
If your output is very large then you might instead want to save it to a file, using a function like to_hdf5.
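A quick back-of-the-envelope estimate can show which arrays are actually large. The sketch below is plain arithmetic, not dask code. It assumes the shapes given in the question's comments, float32 for the stored data (the dtype shown in the traceback), and float64 for the array produced by da.random.random; the nbytes helper is hypothetical.

```python
def nbytes(shape, itemsize):
    """Bytes needed for a dense array of the given shape and element size."""
    n = itemsize
    for dim in shape:
        n *= dim
    return n

GIB = 2 ** 30

a_shape = (4500, 6291456)          # assumed shape of handle['_data']
st1_shape = (a_shape[1], 128)      # random matrix built from a.shape[1]
res_shape = (a_shape[0], 128)      # shape of matmul(a, st1)

a_bytes = nbytes(a_shape, 4)       # float32 input
st1_bytes = nbytes(st1_shape, 8)   # float64 random matrix (assumed default dtype)
res_bytes = nbytes(res_shape, 8)   # float64 result

for name, size in [("a", a_bytes), ("st1", st1_bytes), ("res", res_bytes)]:
    print("%-4s %10.4f GiB" % (name, size / GIB))
```

Under these assumptions the product itself is only a few megabytes, while the input is on the order of 100 GiB, so the chunk sizes chosen for a and the memory available to each worker are probably also worth inspecting.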

Decrease loss in keras training using lstm

I have an input like this:
x_train = [
[0,0,0,1,-1,-1,1,0,1,0,...,0,1,-1],
[-1,0,0,-1,-1,0,1,1,1,...,-1,-1,0],
...
[1,0,0,1,1,0,-1,-1,-1,...,-1,-1,0]
]
where 1 means an increase in a metric, -1 means a decrease, and 0 means no change. Each array has 83 items for 83 fields, and the output (label) for each array is a categorical array that shows the effect of these metrics on a single metric:
[[ 0. 0. 1.]
[ 1. 0. 0.],
[ 0. 0. 1.],
...
[ 0. 0. 1.],
[ 1. 0. 0.]]
I used Keras with an LSTM in the following code:
def train(x, y, x_test, y_test):
    x_train = np.array(x)
    y_train = np.array(y)
    y_train = to_categorical(y_train, 3)
    model = Sequential()
    model.add(Embedding(x_train.shape[0], output_dim=256))
    model.add(LSTM(128))
    model.add(Dropout(0.5))
    model.add(Dense(3, activation='softmax'))
    opt = optimizers.SGD(lr=0.001)
    model.compile(loss='categorical_crossentropy',
                  optimizer=opt,
                  metrics=['accuracy'])
    model.fit(x_train, y_train, batch_size=128, nb_epoch=100)
    y_test = to_categorical(y_test, 3)
    score = model.evaluate(x_test, y_test, batch_size=128)
    prediction = model.predict(x_test, batch_size=128)
    print score
    print prediction
but the loss after 100 epochs is:
1618/1618 [==============================] - 0s - loss: 0.7328 - acc: 0.5556
How can I decrease this loss?
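One thing that may be worth checking (an observation, not a confirmed cause of the high loss): an Embedding layer looks up rows by non-negative integer index, and its first argument is the number of distinct input values rather than the number of training samples. The inputs here contain -1, which is not a valid index in that sense. The following is a minimal sketch, assuming the inputs only ever contain -1, 0 and 1, that shifts them into the range 0..2; the helper name is hypothetical.

```python
def encode_changes(rows):
    """Map the change codes -1, 0, 1 to the embedding indices 0, 1, 2."""
    encoded = []
    for row in rows:
        for value in row:
            if value not in (-1, 0, 1):
                raise ValueError("unexpected change code: %r" % (value,))
        encoded.append([value + 1 for value in row])
    return encoded

x_raw = [
    [0, 0, 0, 1, -1, -1, 1],
    [-1, 0, 0, -1, -1, 0, 1],
]
x_encoded = encode_changes(x_raw)
print(x_encoded)
```

With this encoding the vocabulary has three entries, so the layer could be declared as Embedding(3, output_dim=256) instead of using x_train.shape[0].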

Is there a matrix_element_inv in Maxima?

In Maxima, we have matrix_element_add, matrix_element_mult and matrix_element_transpose.
Is there a matrix_element_inv, and if not, how could I make one?
If you want to invert a matrix, first remember that not every matrix can be inverted, so make sure that yours is invertible.
In Maxima, the operator for matrix (noncommutative) multiplication is the dot, as in A . A
The matrix square A . A is therefore written A^^2, not A^2.
The ordinary operators are applied to each element of the matrix, so if you want to invert all the elements:
(%i1) A: matrix ([17, 3], [-8, 11]);
[ 17 3 ]
(%o1) [ ]
[ - 8 11 ]
(%i9) A^-1;
[ 1 1 ]
[ -- - ]
[ 17 3 ]
(%o9) [ ]
[ 1 1 ]
[ - - -- ]
[ 8 11 ]
Then, to get the inverse of the matrix, use ^^:
(%i2) B: A^^-1;
[ 11 3 ]
[ --- - --- ]
[ 211 211 ]
(%o2) [ ]
[ 8 17 ]
[ --- --- ]
[ 211 211 ]
(%i4) B.A;
[ 1 0 ]
(%o4) [ ]
[ 0 1 ]
(%i5) A.B;
[ 1 0 ]
(%o5) [ ]
[ 0 1 ]
Make sure that your matrix is invertible:
(%i6) Bad: matrix ([2, 3], [4, 6]);
[ 2 3 ]
(%o6) [ ]
[ 4 6 ]
(%i7) Bad^^-1;
expt: undefined: 0 to a negative exponent.
-- an error. To debug this try: debugmode(true);
(%i8) newdet(Bad);
(%o8)/R/ 0
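The difference between the element-wise power (A^-1) and the matrix inverse (A^^-1) can also be checked outside Maxima. The small Python sketch below uses exact fractions to reproduce the values above for the 2x2 case; all helper names are hypothetical and the code handles 2x2 matrices only.

```python
from fractions import Fraction

def elementwise_reciprocal(m):
    """Reciprocal of every entry, like A^-1 in Maxima."""
    return [[Fraction(1, x) for x in row] for row in m]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inverse2(m):
    """Matrix inverse of a 2x2 matrix, like A^^-1 in Maxima."""
    d = det2(m)
    if d == 0:
        raise ZeroDivisionError("matrix is singular")
    (a, b), (c, e) = m
    return [[Fraction(e, d), Fraction(-b, d)],
            [Fraction(-c, d), Fraction(a, d)]]

def matmul2(p, q):
    """Product of two 2x2 matrices, like p . q in Maxima."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[17, 3], [-8, 11]]
Bad = [[2, 3], [4, 6]]

recip = elementwise_reciprocal(A)   # entries 1/17, 1/3, -1/8, 1/11
B = inverse2(A)                     # entries 11/211, -3/211, 8/211, 17/211
print(recip)
print(B)
print(matmul2(B, A), matmul2(A, B)) # both are the identity matrix
print(det2(Bad))                    # 0, so Bad has no inverse
```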
Now read this section of the manual carefully:
http://maxima.sourceforge.net/docs/manual/maxima_23.html
especially the part about
matrix_element_add
Those are the only such operators, so there is no matrix_element_inv.
You can, however, write your own using lambda functions. For example, to get the transpose with every element inverted:
(%i10) matrix_element_transpose: lambda ([x], x^-1)$
(%i11) transpose(A);
[ 1 1 ]
[ -- - - ]
[ 17 8 ]
(%o11) [ ]
[ 1 1 ]
[ - -- ]
[ 3 11 ]
Hope this helps.

How to repeat unknown dimension in TensorFlow

For example (I can do this with Theano without a problem):
std_var = T.repeat(T.exp(log_var)[None, :], Mean.shape[0], axis=0)
In TF, Mean has shape (?, num), but log_var has shape (num,).
I don't know how to do the same in TensorFlow...
You can use tf.shape to get the shape of a placeholder at evaluation time, then simply tile the tensor. For instance, for:
import tensorflow as tf
num = 3
p1 = tf.placeholder(tf.float32, (None, num))
p2 = tf.placeholder(tf.float32, (num,))
sess = tf.Session()  # session used by the sess.run calls below
the operation:
op = tf.tile(tf.reshape(p2, [1, -1]), (tf.shape(p1)[0], 1))
sess.run(op, feed_dict={p1:[[1,2,3],
[4,5,6]],
p2: [1,2,1]})
will give:
array([[ 1., 2., 1.],
[ 1., 2., 1.]], dtype=float32)
However, in most cases you actually do not need to do that since you can rely on the broadcasting behavior of TF operations. For instance:
op = tf.add(p1, p2)
sess.run(op, feed_dict={p1:[[1,2,3],
[4,5,6]],
p2: [1,2,1]})
gives:
array([[ 2., 4., 4.],
[ 5., 7., 7.]], dtype=float32)

Resources