I am recreating the LightGBM binary log loss function using first and second-order derivatives calculated from https://www.derivative-calculator.net.
But my plots differ from the plots produced by LightGBM's actual definition.
Why do the plots differ? Am I calculating the derivatives incorrectly?
As we know,
loss = -y_true * log(y_pred) - (1 - y_true) * log(1 - y_pred), where y_pred = sigmoid(logits)
Here is what the calculator finds for
-y log(1/(1+e^-x)) - (1-y) log(1-1/(1+e^-x))
first derivative:
-((y-1)e^x + y) / (e^x + 1)
and second derivative:
e^x / (e^x + 1)^2
When I plot the above using this code:
import numpy as np
import pandas as pd

def custom_odds_loss(y_true, y_pred):
    y = y_true
    # ======================
    # Inverse sigmoid
    # ======================
    epsilon_ = 1e-7
    y_pred = np.clip(y_pred, epsilon_, 1 - epsilon_)
    y_pred = np.log(y_pred/(1-y_pred))
    # ======================
    grad = -((y-1)*np.exp(y_pred)+y)/(np.exp(y_pred)+1)
    hess = np.exp(y_pred)/(np.exp(y_pred)+1)**2
    return grad, hess

# Penalty chart for True 1s all the time
y_true_k = np.ones((1000, 1))
y_pred_k = np.expand_dims(np.linspace(0, 1, 1000), axis=1)

grad, hess = custom_odds_loss(y_true_k, y_pred_k)

data_ = {
    'Payoff#grad': grad.flatten(),
}
pd.DataFrame(data_).plot(title='Target=1(G)|Penalty(y-axis) vs Probability/1000. (x-axis)');

data_ = {
    'Payoff#hess': hess.flatten(),
}
pd.DataFrame(data_).plot(title='Target=1(H)|Penalty(y-axis) vs Probability/1000. (x-axis)');
Now, the actual plot using LightGBM's definition:
def custom_odds_loss(y_true, y_pred):
    # ======================
    # Inverse sigmoid
    # ======================
    epsilon_ = 1e-7
    y_pred = np.clip(y_pred, epsilon_, 1 - epsilon_)
    y_pred = np.log(y_pred/(1-y_pred))
    # ======================
    grad = y_pred - y_true
    hess = y_pred * (1. - y_pred)
    return grad, hess

# Penalty chart for True 1s all the time
y_true_k = np.ones((1000, 1))
y_pred_k = np.expand_dims(np.linspace(0, 1, 1000), axis=1)

grad, hess = custom_odds_loss(y_true_k, y_pred_k)

data_ = {
    'Payoff#grad': grad.flatten(),
}
pd.DataFrame(data_).plot(title='Target=1(G)|Penalty(y-axis) vs Probability/1000. (x-axis)');

data_ = {
    'Payoff#hess': hess.flatten(),
}
pd.DataFrame(data_).plot(title='Target=1(H)|Penalty(y-axis) vs Probability/1000. (x-axis)');
In the second function, you don't need to invert the sigmoid.
You see, the derivatives you found can be simplified as follows (writing y_pred = sigmoid(x) = 1/(1+e^-x) for the predicted probability):
-((y-1)e^x + y) / (e^x + 1) = e^x/(e^x + 1) - y = y_pred - y_true
e^x / (e^x + 1)^2 = (e^x/(e^x + 1)) * (1/(e^x + 1)) = y_pred * (1 - y_pred)
This simplification means you don't have to invert anything; the gradient and second derivative can be computed directly from the probabilities:
def custom_odds_loss(y_true, y_pred):
    grad = y_pred - y_true
    hess = y_pred * (1. - y_pred)
    return grad, hess
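As a quick sanity check (a minimal NumPy sketch of my own; full_formula and simplified are just illustrative names), the two formulations give the same gradient and hessian when fed probabilities:

import numpy as np

def full_formula(y_true, y_prob):
    # calculator-style derivatives: invert the sigmoid, then evaluate in terms of the logit x
    x = np.log(y_prob / (1 - y_prob))
    grad = -((y_true - 1) * np.exp(x) + y_true) / (np.exp(x) + 1)
    hess = np.exp(x) / (np.exp(x) + 1) ** 2
    return grad, hess

def simplified(y_true, y_prob):
    # simplified form: work directly on the predicted probabilities
    return y_prob - y_true, y_prob * (1 - y_prob)

y_true = np.ones(1000)
y_prob = np.linspace(1e-7, 1 - 1e-7, 1000)

g1, h1 = full_formula(y_true, y_prob)
g2, h2 = simplified(y_true, y_prob)
print(np.allclose(g1, g2), np.allclose(h1, h2))  # expected: True True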
I am very new to TensorFlow and am learning traditional machine learning techniques in parallel. Previously, I was able to successfully implement linear regression modelling in MATLAB and in Python using scikit-learn.
When I tried to reproduce it using TensorFlow with the same dataset, I am getting invalid outputs. Could someone advise me on where I am making a mistake or what I am missing?
In fact, I am using the code from the TensorFlow introductory tutorial; I just changed x_train and y_train to a different dataset.
# Loading the ML coursera course ex1 (Wk 2) data to try it out
'''
path = r'C:\Users\Prasanth\Dropbox\Python Folder\ML in Python\data\ex1data1.txt'
fh = open(path,'r')
l1 = []
l2 = []
for line in fh:
    temp = (line.strip().split(','))
    l1.append(float(temp[0]))
    l2.append(float(temp[1]))
'''
l1 = [6.1101, 5.5277, 8.5186, 7.0032, 5.8598, 8.3829, 7.4764, 8.5781, 6.4862, 5.0546, 5.7107, 14.164, 5.734, 8.4084, 5.6407, 5.3794, 6.3654, 5.1301, 6.4296, 7.0708, 6.1891, 20.27, 5.4901, 6.3261, 5.5649, 18.945, 12.828, 10.957, 13.176, 22.203, 5.2524, 6.5894, 9.2482, 5.8918, 8.2111, 7.9334, 8.0959, 5.6063, 12.836, 6.3534, 5.4069, 6.8825, 11.708, 5.7737, 7.8247, 7.0931, 5.0702, 5.8014, 11.7, 5.5416, 7.5402, 5.3077, 7.4239, 7.6031, 6.3328, 6.3589, 6.2742, 5.6397, 9.3102, 9.4536, 8.8254, 5.1793, 21.279, 14.908, 18.959, 7.2182, 8.2951, 10.236, 5.4994, 20.341, 10.136, 7.3345, 6.0062, 7.2259, 5.0269, 6.5479, 7.5386, 5.0365, 10.274, 5.1077, 5.7292, 5.1884, 6.3557, 9.7687, 6.5159, 8.5172, 9.1802, 6.002, 5.5204, 5.0594, 5.7077, 7.6366, 5.8707, 5.3054, 8.2934, 13.394, 5.4369]
l2 = [17.592, 9.1302, 13.662, 11.854, 6.8233, 11.886, 4.3483, 12.0, 6.5987, 3.8166, 3.2522, 15.505, 3.1551, 7.2258, 0.71618, 3.5129, 5.3048, 0.56077, 3.6518, 5.3893, 3.1386, 21.767, 4.263, 5.1875, 3.0825, 22.638, 13.501, 7.0467, 14.692, 24.147, -1.22, 5.9966, 12.134, 1.8495, 6.5426, 4.5623, 4.1164, 3.3928, 10.117, 5.4974, 0.55657, 3.9115, 5.3854, 2.4406, 6.7318, 1.0463, 5.1337, 1.844, 8.0043, 1.0179, 6.7504, 1.8396, 4.2885, 4.9981, 1.4233, -1.4211, 2.4756, 4.6042, 3.9624, 5.4141, 5.1694, -0.74279, 17.929, 12.054, 17.054, 4.8852, 5.7442, 7.7754, 1.0173, 20.992, 6.6799, 4.0259, 1.2784, 3.3411, -2.6807, 0.29678, 3.8845, 5.7014, 6.7526, 2.0576, 0.47953, 0.20421, 0.67861, 7.5435, 5.3436, 4.2415, 6.7981, 0.92695, 0.152, 2.8214, 1.8451, 4.2959, 7.2029, 1.9869, 0.14454, 9.0551, 0.61705]
print ('List length and data type', len(l1), type(l1))
#------------------#
import tensorflow as tf
# Model parameters
W = tf.Variable([0], dtype=tf.float64)
b = tf.Variable([0], dtype=tf.float64)
# Model input and output
x = tf.placeholder(tf.float64)
linear_model = W * x + b
y = tf.placeholder(tf.float64)
# loss or cost function
loss = tf.reduce_sum(tf.square(linear_model - y)) # sum of the squares
# optimizer (gradient descent) with learning rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
# training data (labelled input & output)
# Using coursera data instead of sample data
#x_train = [1.0, 2, 3, 4]
#y_train = [0, -1, -2, -3]
x_train = l1
y_train = l2
# training loop (1000 iterations)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init) # reset values to wrong
for i in range(1000):
    sess.run(train, {x: x_train, y: y_train})
# evaluate training accuracy
curr_W, curr_b, curr_loss = sess.run([W, b, loss], {x: x_train, y: y_train})
print("W: %s b: %s loss: %s"%(curr_W, curr_b, curr_loss))
Output
List length and data type: 97 <class 'list'>
W: [ nan] b: [ nan] loss: nan
One major problem with your estimator is the loss function. Since you use tf.reduce_sum, the loss grows with the number of samples, which you have to compensate for by using a smaller learning rate. A better solution is to use the mean squared error loss:
loss = tf.reduce_mean(tf.square(linear_model - y))
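For context, a minimal sketch of how that change slots into the script from the question (TF 1.x API; W, b, x, y, linear_model, x_train and y_train as defined above):

# mean squared error: averaging keeps the gradient scale independent
# of the number of samples, unlike tf.reduce_sum
loss = tf.reduce_mean(tf.square(linear_model - y))

optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for i in range(1000):
        sess.run(train, {x: x_train, y: y_train})
    curr_W, curr_b, curr_loss = sess.run([W, b, loss], {x: x_train, y: y_train})
    print("W: %s b: %s loss: %s" % (curr_W, curr_b, curr_loss))

With this data you may still want a smaller learning rate or more iterations for a tight fit, but the loss should no longer blow up to NaN.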
I am following a blog post on transfer learning.
## First I compute and save the bottleneck features, then build a new model and train it with the bottleneck features:
input_layer = Input(shape=base_model.output_shape[1:])
x = GlobalAveragePooling2D()(input_layer)
x = Dense(512, activation='relu',name='fc_new_1')(x)
x = Dropout(0.2)(x)
x = Dense(512, activation='relu',name='fc_new_2')(x)
x = Dense(num_classes, activation='softmax',name='logit_new')(x)
Add_layers = Model(inputs=input_layer, outputs=x,name='Add_layers')
## Then I put this new model on top of the pretrained model:
base_model = ResNet50(include_top=False, weights='imagenet',
                      input_shape=(img_shape[0], img_shape[1], 3))
x = base_model.output
predictions = Add_layers(x)
model = Model(inputs=base_model.input, outputs=predictions)
model.compile(optimizer=sgd, loss='categorical_crossentropy',
              metrics=['accuracy'])
## Then I evaluate the model:
score = model.evaluate_generator(train_generator,
                                 nb_train_samples // batch_size_finetuning)
print('The evaluation of the entire model before fine tuning : ')
print(score)

score = model.evaluate_generator(validation_generator,
                                 nb_validation_samples // batch_size_evaluation)
print('The evaluation of the entire model before fine tuning : ')
print(score)
And I get training loss and accuracy: [0.015362062912073827, 1.0]
and validation loss and accuracy: [0.89740632474422455, 0.75]
## Just one line below it, I train the model:
model.fit_generator(train_generator,
                    steps_per_epoch=nb_train_samples // batch_size_finetuning,
                    epochs=finetuning_epoch,
                    validation_data=validation_generator,
                    validation_steps=nb_validation_samples // batch_size_evaluation,
                    callbacks=[checkpointer_finetuning, history_finetuning,
                               TB_finetuning, lrate_finetuning,
                               Eartly_Stopping_finetuning]);
Then the output is:
31/31 [==============================] - 35s - loss: 3.4004 - acc: 0.3297 - val_loss: 0.9591 - val_acc: 0.7083
The weird thing is that this problem only happens when I use ResNet50 or InceptionV3, but not with VGG16. I am pretty sure that changing the pretrained model is the only difference. I understand that dropout can cause a gap between training and evaluation, but it should not be this large, and VGG16 shows no obvious problem at all.
Another weird thing: if I set every layer to .trainable = False and compile, the validation accuracy still decreases dramatically. I even checked the weights of every layer; with .trainable = False the weights do not change, and with .trainable = True they do.
Any help is appreciated!!! THANKS!!!
I use TensorFlow and Python to predict stock prices in a demo. But when I add dropout to the code, the generated figure does not seem to be correct. Please advise what is wrong.
with tf.variable_scope(scope_name):
    cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=n_inputs)
    lstm_dropout = tf.nn.rnn_cell.DropoutWrapper(cell, input_keep_prob=1.0, output_keep_prob=1.0)
    cell = tf.nn.rnn_cell.MultiRNNCell([lstm_dropout] * num_layers)
    output, state = tf.nn.rnn(cell, input, dtype=tf.float32)
You should only apply dropout during training, not during inference. You can do this by passing the keep probability through a placeholder and setting it to one at inference time.
Applied to your example:
input_keep_prob = tf.placeholder(tf.float32)
output_keep_prob = tf.placeholder(tf.float32)

with tf.variable_scope(scope_name):
    cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=n_inputs)
    lstm_dropout = tf.nn.rnn_cell.DropoutWrapper(cell, input_keep_prob=input_keep_prob,
                                                 output_keep_prob=output_keep_prob)
    cell = tf.nn.rnn_cell.MultiRNNCell([lstm_dropout] * num_layers)
    output, state = tf.nn.rnn(cell, input, dtype=tf.float32)

# set up your loss and training optimizer
# y_pred = .....
# loss = .....
# train_op = .....

with tf.Session() as sess:
    # set dropout when training
    sess.run(train_op, feed_dict={input_keep_prob: 0.7, output_keep_prob: 0.7})
    # retrieve the prediction without dropout at inference time
    y = sess.run(y_pred, feed_dict={input_keep_prob: 1.0, output_keep_prob: 1.0})
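A related option (my own suggestion, not part of the answer above) is tf.placeholder_with_default, so the keep probabilities default to 1.0 and only need to be fed during training:

# defaults of 1.0 mean "no dropout" unless overridden via feed_dict
input_keep_prob = tf.placeholder_with_default(1.0, shape=(), name='input_keep_prob')
output_keep_prob = tf.placeholder_with_default(1.0, shape=(), name='output_keep_prob')

# training step: override the defaults to enable dropout
# sess.run(train_op, feed_dict={input_keep_prob: 0.7, output_keep_prob: 0.7})

# inference: nothing to feed, the defaults (1.0) disable dropout
# y = sess.run(y_pred)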
I'm trying to define a pinball loss function for implementing 'quantile regression' in a neural network with Keras (with TensorFlow as backend).
The definition is here: pinball loss
It's hard to use the usual K.mean() etc. functions directly, since they deal with the whole batch of y_pred, y_true, whereas I have to consider each component of y_pred, y_true individually. Here's my original code:
def pinball_1(y_true, y_pred):
    loss = 0.1
    with tf.Session() as sess:
        y_true = sess.run(y_true)
        y_pred = sess.run(y_pred)
    y_pin = np.zeros((len(y_true), 1))
    y_pin = tf.placeholder(tf.float32, [None, 1])
    for i in range((len(y_true))):
        if y_true[i] >= y_pred[i]:
            y_pin[i] = loss * (y_true[i] - y_pred[i])
        else:
            y_pin[i] = (1 - loss) * (y_pred[i] - y_true[i])
    pinball = tf.reduce_mean(y_pin, axis=-1)
    return K.mean(pinball, axis=-1)

sgd = SGD(lr=0.1, clipvalue=0.5)
model.compile(loss=pinball_1, optimizer=sgd)
model.fit(Train_X, Train_Y, nb_epoch=10, batch_size=20, verbose=2)
I attempted to convert y_pred and y_true to a vectorized data structure so that I can index them and deal with individual components, but it seems the problem comes from my lack of knowledge of how to treat y_pred and y_true element-wise.
I tried to dig into the lines the error points to, but I almost got lost.
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'dense_16_target' with dtype float
[[Node: dense_16_target = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
How can I fix it? Thanks!
I’ve figured this out by myself with Keras backend:
def pinball(y_true, y_pred):
    global i
    tao = (i + 1) / 10
    pin = K.mean(K.maximum(y_true - y_pred, 0) * tao +
                 K.maximum(y_pred - y_true, 0) * (1 - tao))
    return pin
This is a more efficient version:
def pinball_loss(y_true, y_pred, tau):
    err = y_true - y_pred
    return K.mean(K.maximum(tau * err, (tau - 1) * err), axis=-1)
Using an additional parameter and the functools.partial function is IMHO the cleanest way of setting different values for tau:
model.compile(loss=functools.partial(pinball_loss, tau=0.1), optimizer=sgd)
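For example, a short sketch of training one model per quantile this way (build_model is a hypothetical helper returning a fresh, uncompiled Keras model; SGD, Train_X and Train_Y are as in the question):

import functools
from keras.optimizers import SGD

quantile_models = {}
for tau in (0.1, 0.5, 0.9):
    m = build_model()  # hypothetical: returns a fresh, uncompiled Keras model
    # bind the quantile level; Keras still calls the loss as loss(y_true, y_pred)
    m.compile(loss=functools.partial(pinball_loss, tau=tau),
              optimizer=SGD(lr=0.1, clipvalue=0.5))
    quantile_models[tau] = m

Each model can then be fit as usual. Depending on your Keras version, you may need to wrap the partial in a plain named function instead, since some versions expect the loss to have a __name__ attribute.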