I am working on a project that requires identifying facial features in an image of a person's face. I formulated this as a regression problem, want to start with a simple conv network, and defined the network below.
I noticed that the predicted output was always the same, and after some more debugging I see that the weights and gradients of the score layer do not change over iterations. I am using a fixed learning rate of ~5e-2 to generate the example below. The training loss seems to decrease as iterations progress, but I am unable to understand why. I also logged other layers ('conv1', 'conv2', 'fc1') and see the same behavior: they remain constant over iterations. Since the loss seems to decrease, something must be changing, and my guess is that the way I am logging below may not be correct.
Could you please give me some pointers to check? Please let me know if you need more information.
Modified lenet:
# Modified LeNet. Added relu1, relu2 and dropout.
# Loss function is a Euclidean distance.
import caffe
from caffe import layers as L, params as P

def lenet(hdf5_list, batch_size=64, dropout_ratio=0.5, train=True):
    # our version of LeNet: a series of linear and simple nonlinear transformations
    n = caffe.NetSpec()
    n.data, n.label = L.HDF5Data(batch_size=batch_size, source=hdf5_list, ntop=2)
    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20,
                            weight_filler=dict(type='xavier'),
                            bias_filler=dict(type='constant', value=0.1))
    n.relu1 = L.ReLU(n.conv1, in_place=False, relu_param=dict(negative_slope=0.1))
    n.pool1 = L.Pooling(n.relu1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50,
                            weight_filler=dict(type='xavier'),
                            bias_filler=dict(type='constant', value=0.1))
    n.relu2 = L.ReLU(n.conv2, in_place=False, relu_param=dict(negative_slope=0.1))
    n.pool2 = L.Pooling(n.relu2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    if train:
        n.drop3 = fc1_input = L.Dropout(n.pool2, in_place=True,
                                        dropout_param=dict(dropout_ratio=dropout_ratio))
    else:
        fc1_input = n.pool2
    n.fc1 = L.InnerProduct(fc1_input, num_output=500,
                           weight_filler=dict(type='xavier'),
                           bias_filler=dict(type='constant', value=0.1))
    n.relu3 = L.ReLU(n.fc1, in_place=True, relu_param=dict(negative_slope=0.1))
    n.score = L.InnerProduct(n.relu3, num_output=30, weight_filler=dict(type='xavier'))
    n.loss = L.EuclideanLoss(n.score, n.label)
    return n.to_proto()
solver loop:
# custom solver loop
for it in range(niter):
    solver.step(1)
    train_loss[it] = solver.net.blobs['loss'].data
    score_weights.append(solver.net.params['score'][0].data)
    score_biases.append(solver.net.params['score'][1].data)
    score_weights_diff.append(solver.net.params['score'][0].diff)
    score_biases_diff.append(solver.net.params['score'][1].diff)

    if (it % val_interval) == 0 or (it == niter - 1):
        val_error_this = 0
        for test_it in range(niter_val_error):
            solver.test_nets[0].forward()
            val_error_this += euclidean_loss(solver.test_nets[0].blobs['score'].data,
                                             solver.test_nets[0].blobs['label'].data) / niter_val_error
        val_error[it // val_interval] = val_error_this
printing the scores:
print score_weights_diff[0].shape
for i in range(10):
    score_weights_i = score_weights_diff[i]
    print score_weights_i[0:30:10, 0]

print score_biases_diff[0].shape
for i in range(5):
    score_biases_i = score_biases_diff[i]
    print score_biases_i[0:30:6]
output:
(30, 500)
[ -3.71852257e-05 7.34565838e-05 2.61445384e-04]
[ -3.71852257e-05 7.34565838e-05 2.61445384e-04]
[ -3.71852257e-05 7.34565838e-05 2.61445384e-04]
[ -3.71852257e-05 7.34565838e-05 2.61445384e-04]
[ -3.71852257e-05 7.34565838e-05 2.61445384e-04]
[ -3.71852257e-05 7.34565838e-05 2.61445384e-04]
[ -3.71852257e-05 7.34565838e-05 2.61445384e-04]
[ -3.71852257e-05 7.34565838e-05 2.61445384e-04]
[ -3.71852257e-05 7.34565838e-05 2.61445384e-04]
[ -3.71852257e-05 7.34565838e-05 2.61445384e-04]
131
(30,)
[ 3.22921231e-04 5.66378840e-05 -5.15143370e-07 -1.51118627e-04
2.30352176e-04]
[ 3.22921231e-04 5.66378840e-05 -5.15143370e-07 -1.51118627e-04
2.30352176e-04]
[ 3.22921231e-04 5.66378840e-05 -5.15143370e-07 -1.51118627e-04
2.30352176e-04]
[ 3.22921231e-04 5.66378840e-05 -5.15143370e-07 -1.51118627e-04
2.30352176e-04]
[ 3.22921231e-04 5.66378840e-05 -5.15143370e-07 -1.51118627e-04
2.30352176e-04]
It's a bit difficult to see from your code, but it is possible that score_weights_diff, score_biases_diff and the other lists are storing references to solver.net.params['score'][0].diff. In that case all entries in the list are actually the same object and change together at each iteration.
Try saving a copy instead:
score_weights_diff.append(solver.net.params['score'][0].diff[...].copy())
Try printing the weights/biases after each iteration and see if they change.
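For example, a minimal sketch of the logging loop with explicit copies (reusing solver, niter and the 'score' layer name from your code above):

# Snapshot independent copies of the 'score' parameters at every iteration,
# instead of storing references to blobs that Caffe overwrites in place.
score_weights, score_biases = [], []
score_weights_diff, score_biases_diff = [], []
for it in range(niter):
    solver.step(1)
    params = solver.net.params['score']
    score_weights.append(params[0].data.copy())   # copy, not a live view
    score_biases.append(params[1].data.copy())
    score_weights_diff.append(params[0].diff.copy())
    score_biases_diff.append(params[1].diff.copy())

With copies stored, consecutive list entries should now differ from iteration to iteration.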
Related
I'm very new to NetLogo, and am attempting to incorporate bias into the Preferential Attachment model by making the probability of attachment depend on a node's color as well as degree; nodes will have a % bias (determined by a slider) to choose to link with a node of the same color as themselves.
So far, I've made the nodes heterogeneous, either blue or red, with blue being ~70% of the population:
to make-node [old-node]
  create-turtles 1
  [
    set color red
    if random 100 < 70
      [ set color blue ]
    if old-node != nobody
      [ create-link-with old-node [ set color green ]
        ;; position the new node near its partner
        move-to old-node
        fd 8
      ]
  ]
end
I understand the main preferential attachment code, which has the new node select a partner proportional to its link count:
to-report find-partner
  report [one-of both-ends] of one-of links
end
Where I encounter issues is in the following procedure. Sometimes it works as expected, but sometimes I get the following error message: OF expected input to be an agent or agentset but got NOBODY instead - regarding the statement "[one-of both-ends] of one-of links" under my 'while' loop.
to make-node [old-node]
  create-turtles 1
  [
    set color red
    set characteristic 0
    if random 100 < 70
      [ set color blue
        set characteristic 1 ]
    if (old-node != nobody) and (random 100 < bias-percent) and (any? turtles with [color = red]) and (any? turtles with [color = blue])
      [
        while [characteristic != [characteristic] of old-node]
        [
          set old-node [one-of both-ends] of one-of links
        ]
      ]
    if old-node != nobody
      [ create-link-with old-node [ set color green ]
        ;; position the new node near its partner
        move-to old-node
        fd 8
      ]
  ]
end
Any help is appreciated!!!
I am trying to simulate what happens to vultures if they randomly stumble upon a carcass that has been poisoned by poachers. The poisoned carcass needs to be random. I also need to plot the deaths: do I need to set up a dead/poisoned state in order to plot the deaths, or do I need to code a "to die" section? I'm not sure. TIA
to go
  ; repeat 10800 [
  ask vultures
  [
    if state = "searching" [ search-carcasses ]
    if state = "following" [ follow-leaders-find-carcasses ]
    if state = "searching"
      [ if random-float 1 < ( 1 / 360 )
        [ ifelse random 2 = 0
          [ rt 45 ]
          [ lt 45 ] ] ]
    if state != "feeding"
      [ fd 0.009 ]
    if state = "leader" [ set time-descending time-descending + 1 ]
    if mycarcass != 0
      [ if distance mycarcass <= 0.009
        [ set state "feeding"
          ask mycarcass
            [ set occupied? "yes" ] ] ]
    if state = "feeding"
      [ ask mycarcass
        [ if poisoned? = "yes"
          [ set state "poisoned" ] ] ]
    if state = "poisoned" [ die ]
  ]
  tick
  ; ]
end
I am working on an image processing project in Python in which I am required to change the coordinate system.
I thought it was analogous to a matrix transformation and tried that, but it is not working. I have taken the coordinates of the red dots.
Simply subtract 256 and divide by 512. The connection is that values of 256 get mapped to 0. Therefore, 0 gets mapped to -256, 256 gets mapped to 0 and 512 gets mapped to 256. However, you further need the values to be in the range [-0.5, 0.5]. Dividing everything by 512 finishes this off.
Therefore the relationship is:
out = (in - 256) / 512 = (in / 512) - 0.5
Try some values from your example input above to convince yourself that this is the correct relationship.
If you want to form this as a matrix multiplication, this can be interpreted as an affine transform with scale and translation, but no rotation:
    [ 1/512    0     -0.5 ]
K = [   0    1/512   -0.5 ]
    [   0      0       1  ]
Take note that you will need to use homogeneous coordinates to achieve the desired result.
For example:
(x, y) = (384, 256)
[X]   [ 1/512    0     -0.5 ] [384]
[Y] = [   0    1/512   -0.5 ] [256]
[1]   [   0      0       1  ] [  1]

[X]   [ 384/512 - 0.5 ]   [ 0.25 ]
[Y] = [ 256/512 - 0.5 ] = [  0   ]
[1]   [       1       ]   [  1   ]
Simply remove the last coordinate to get the final answer of (0.25, 0).
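If you want to apply this programmatically, here is a minimal NumPy sketch of the same mapping (the matrix K and the example point are the ones above; the function name to_centered_coords is just for illustration):

import numpy as np

# Affine transform: scale by 1/512 and translate by -0.5, in homogeneous coordinates.
K = np.array([[1.0 / 512, 0.0,       -0.5],
              [0.0,       1.0 / 512, -0.5],
              [0.0,       0.0,        1.0]])

def to_centered_coords(x, y):
    # Map pixel coordinates in [0, 512] to the range [-0.5, 0.5].
    X, Y, _ = K.dot(np.array([x, y, 1.0]))
    return X, Y

print(to_centered_coords(384, 256))   # -> (0.25, 0.0)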
When performing 2D convolutions in TensorFlow using the conv_2d layer, does it expect the pixels to be lined up as
[
[img[i].red, img[i].green, img[i].blue],
[img[i+1].red, etc.],
]
Or
[
[img[i].red, img[i+1].red, etc.],
[img[i].green, img[i+1].green, etc.],
]
or some other way?
2D convolutions expect a 4-d tensor as input with the following shape:
[batch_size, image_height, image_width, channel_size]
In the case of RGB images, the channels are the three colors. Therefore the pixels should be lined up as:
[
[
[img[i,j].red, img[i,j].green, img[i,j].blue],
[img[i, j+1].red, img[i, j+1].green, img[i, j+1].blue],
etc
],
[
[img[i+1,j].red, img[i+1,j].green, img[i+1,j].blue],
[img[i+1, j+1].red, img[i+1, j+1].green, img[i+1, j+1].blue],
etc
],
etc
]
(with img[y_coordinate, x_coordinate] and img[i,j] = img[i*image_width + j])
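As a quick illustration, here is a small NumPy sketch showing that an image stored with interleaved channels already has this [height, width, channel] layout and only needs a leading batch dimension (the sizes are made up):

import numpy as np

height, width, channels = 28, 28, 3

# img[y, x] = (red, green, blue): channels are interleaved per pixel,
# matching the [image_height, image_width, channel_size] layout.
img = np.zeros((height, width, channels), dtype=np.float32)

# Add a leading batch dimension to get [batch_size, height, width, channels].
batch = img[np.newaxis, ...]
print(batch.shape)   # (1, 28, 28, 3)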
I wrote code in TensorFlow for linear classification. I generated fake data based on a rule: "If the difference is greater than x (some constant), the output should be [1,0]; else the output should be [0,1]." Here is my code:
import argparse
import sys

import tensorflow as tf

import data_supplier

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def main(_):
    # Create the model
    x = tf.placeholder(tf.float32, [None, 2])
    W1 = weight_variable([2, 2])
    b1 = bias_variable([2])
    y2 = tf.nn.softmax(tf.matmul(x, W1) + b1)

    # Define loss and optimizer
    y_ = tf.placeholder(tf.float32, [None, 2])
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y2))
    train_step = tf.train.GradientDescentOptimizer(0.001).minimize(cross_entropy)

    sess = tf.InteractiveSession()
    tf.global_variables_initializer().run()

    # Train
    for _ in range(10000):
        batch_xs, batch_ys = data_supplier.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

    # Test trained model
    test_batch_x, test_batch_y = data_supplier.test_data()
    correct_prediction = tf.equal(tf.argmax(y2, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print(sess.run(accuracy, feed_dict={x: test_batch_x, y_: test_batch_y}))
    print(x.eval(feed_dict={x: test_batch_x, y_: test_batch_y}))
    print(y2.eval(feed_dict={x: test_batch_x, y_: test_batch_y}))
    print(y_.eval(feed_dict={x: test_batch_x, y_: test_batch_y}))
    print(W1.eval())
    print(b1.eval())
    print(cross_entropy.eval(feed_dict={x: test_batch_x, y_: test_batch_y}))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_dir', type=str, default='/tmp/tensorflow/mnist/input_data',
                        help='Directory for storing input data')
    FLAGS, unparsed = parser.parse_known_args()
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
And here's the data_supplier code:
import numpy as np
import pandas as pd

TOTAL_DATA_SIZE = 50000
TRAIN_DATA_SIZE = 40000
VALIDATION_DATA_SIZE = 5000
TEST_DATA_SIZE = 5000

COLUMNS = ["a", "b", "output", "outputbar"]
FEATURES = ["a", "b"]
LABELS = ["output", "outputbar"]

training_set = pd.read_csv("train.csv", skipinitialspace=True, skiprows=1, names=COLUMNS)
training_set_features = training_set.as_matrix(columns=FEATURES)
training_set_labels = training_set.as_matrix(columns=LABELS)

test_set = pd.read_csv("test.csv", skipinitialspace=True, skiprows=1, names=COLUMNS)
test_set_features = test_set.as_matrix(columns=FEATURES)
test_set_labels = test_set.as_matrix(columns=LABELS)

def next_batch(BATCH_SIZE):
    k = np.random.randint(0, TRAIN_DATA_SIZE - BATCH_SIZE)
    return training_set_features[k:k+BATCH_SIZE], training_set_labels[k:k+BATCH_SIZE]

def test_data():
    return test_set_features, test_set_labels
And here's the output:
accuracy: 0.6852
Input: [[ 0.51166666 0.79333335]
[ 0.85833335 0.21833333]
[ 0.80333334 0.48333332]
...,
[ 0.28333333 0.96499997]
[ 0.97666669 0.84833336]
[ 0.57666665 0.21333334]]
Predictions: [[ 0.80804855 0.19195142]
[ 0.78380686 0.21619321]
[ 0.80210352 0.19789645]
...,
[ 0.80708122 0.19291875]
[ 0.83949649 0.16050354]
[ 0.76328743 0.23671262]]
Actual output: [[ 0. 1.]
[ 1. 0.]
[ 1. 0.]
...,
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]]
Weights: [[ 0.3034386 -0.10369452]
[ 0.29422989 -0.21103808]]
Bias: [ 0.5141086 -0.3141087]
Cross Entropy: 0.624272
The accuracy is currently meaningless, as every prediction is classified as [1,0]. What is the mistake I'm making?
You should use softmax_cross_entropy_with_logits() the way it is designed to be used: https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits.
The documentation warns that the function expects unscaled logits, i.e. you should not apply a softmax to y2 yourself: y2 = tf.matmul(x, W1) + b1. For the test set you then have to apply the softmax explicitly: y_out_test = tf.nn.softmax(y2), or something like that.
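A minimal sketch of that change, reusing the names x, W1, b1 and y_ from your code (logits and y_out are just illustrative names):

# Keep the raw, unscaled scores as logits ...
logits = tf.matmul(x, W1) + b1

# ... let the loss apply the softmax internally ...
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(cross_entropy)

# ... and apply an explicit softmax only where you need probabilities,
# e.g. when inspecting predictions on the test set.
y_out = tf.nn.softmax(logits)
correct_prediction = tf.equal(tf.argmax(y_out, 1), tf.argmax(y_, 1))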
Maybe this will already solve your problem.
If not: when only a single class is predicted, it is often a hint that the data set is imbalanced, i.e. that one class occurs much more frequently than the other. You should check whether this is the case for you. If so, you might find some advice on how to deal with it, for example here: http://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/. I did not check this site in detail, so I cannot tell you whether it is particularly helpful, but you might want to consult some web pages if your data set is heavily imbalanced.
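For example, a quick way to check the balance (a sketch assuming the one-hot training_set_labels array from your data_supplier snippet):

# Each row is one-hot, so summing down the rows counts the examples per class.
class_counts = training_set_labels.sum(axis=0)
print("class [1, 0]:", int(class_counts[0]))
print("class [0, 1]:", int(class_counts[1]))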