Zero initialiser for biases using get_variable in tensorflow

The code I'm modifying uses tf.get_variable for weight variables and tf.Variable for bias initialisation. After some searching, it seems that get_variable should always be favoured due to its portability with regard to sharing. So I tried to change the bias variable to get_variable, but I can't seem to get it to work.
Original: tf.Variable(tf.zeros([128]), trainable=True, name="b1")
My attempt: tf.get_variable(name="b1", shape=[128], initializer=tf.zeros_initializer(shape=[128]))
I get an error saying that the shape should not be specified for constants. But removing the shape then throws an error for no arguments.
I'm very new to tf so I'm probably misunderstanding something fundamental here. Thanks for the help in advance :)

The following should work. tf.zeros_initializer() takes no shape argument; the shape is given once, to get_variable:
tf.get_variable(name="b1", shape=[128], initializer=tf.zeros_initializer())
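The division of labour behind this fix can be sketched in plain Python: the initializer is built without a shape, and the variable-creating function forwards its own shape to it. Both functions below are toy stand-ins written for illustration (hypothetical, not TensorFlow's implementation), so treat this as a sketch of the calling pattern only.

```python
# Toy stand-ins (hypothetical, not TensorFlow code) showing who supplies the shape.

def zeros_initializer():
    """Return a callable that builds a flat zero-filled list once given a shape."""
    def init(shape):
        size = 1
        for dim in shape:
            size *= dim
        return [0.0] * size
    return init


def get_variable(name, shape, initializer):
    """Create a named value; the shape is forwarded to the initializer here."""
    return {"name": name, "shape": list(shape), "value": initializer(shape)}


# Mirrors the call in the answer: the shape appears once, on get_variable.
b1 = get_variable(name="b1", shape=[128], initializer=zeros_initializer())
```

Passing a shape to the initializer as well, as in the original attempt, would specify the same information twice, which is why the call is rejected.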


Modifying loss function faster rcnn detectron

For my thesis I am trying to modify the loss function of Faster R-CNN with regard to recognizing table structures.
Currently I am using Facebook's Detectron. It seems to be working great, but I am now actively trying to modify the loss function. Debugging my code, I notice this is where the loss functions are added:
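def add_fast_rcnn_losses(model):
    """Add losses for RoI classification and bounding box regression."""
    # Classification loss: softmax cross-entropy over the RoI class scores.
    cls_prob, loss_cls = model.net.SoftmaxWithLoss(
        ['cls_score', 'labels_int32'], ['cls_prob', 'loss_cls'],
        scale=model.GetLossScale()
    )
    # Box regression loss: smooth L1 on the predicted box deltas.
    loss_bbox = model.net.SmoothL1Loss(
        [
            'bbox_pred', 'bbox_targets', 'bbox_inside_weights',
            'bbox_outside_weights'
        ],
        'loss_bbox',
        scale=model.GetLossScale()
    )
    loss_gradients = blob_utils.get_loss_gradients(model, [loss_cls, loss_bbox])
    model.Accuracy(['cls_prob', 'labels_int32'], 'accuracy_cls')
    model.AddLosses(['loss_cls', 'loss_bbox'])
    return loss_gradients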
The debugger can't find any declaration or implementation of these loss operators, such as SoftmaxWithLoss. Detectron uses Caffe2, and when I look in the net_builder (which initialises the network) I see that it makes "binds" (I don't know the proper word) to caffe2, which is itself a Python library with a compiled library behind it.
Am I looking in the wrong place to make a minor adjustment to this loss function, or will I really have to open the Caffe2 source, adjust the loss, and recompile the library?
You should implement the loss function yourself. Modifying the library source code and recompiling it isn't a very good idea :)
You can create a Python function that takes the ground truth (GT) and the predicted data and returns the loss value.
You can also create a duplicate of the smooth L1 or cross-entropy loss that is currently used, and then, once you have made sure that your copy gives the same values, modify it. Or you can implement, for example, an L2 loss for the boxes and use it instead.
You can find more information about custom losses in the Caffe documentation.
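As a starting point for the duplicate-then-modify approach, here is a minimal plain-Python sketch of a smooth L1 loss and an L2 alternative. The function names and the `beta` parameter are assumptions made for illustration. The sketch leaves out the inside/outside weights and the loss scaling that Detectron applies, and it does not show how to register the function as a Caffe2 operator.

```python
def smooth_l1_loss(pred, target, beta=1.0):
    """Sum of smooth L1 terms: quadratic below beta, linear above it."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta
        else:
            total += diff - 0.5 * beta
    return total


def l2_loss(pred, target):
    """Sum of 0.5 * squared differences, a possible replacement for boxes."""
    return sum(0.5 * (p - t) ** 2 for p, t in zip(pred, target))


# Small hand-checkable inputs: one difference below beta, one above it.
pred = [0.5, 2.0]
target = [0.0, 0.0]
reference = smooth_l1_loss(pred, target)
alternative = l2_loss(pred, target)
```

Comparing such a reference against the built-in loss on a few fixed inputs is one way to confirm the copy matches before changing it.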

OpenCV Background Model Component Extraction

I am working with the BackgroundSubtractorMOG2 class in OpenCV (Python), and am trying to extract the individual components of the background model. As I understand it, each pixel will be modeled by the mixture of a varying number of gaussian distributions, each defined by a mean and variance. So, how can I determine what all of these components (means and variances) are after feeding the background subtractor a given number of frames?
The documentation does not seem to discuss doing this.
This information must be contained somewhere in the background subtractor object. Does anyone know how to get to it?
Edit: A little more searching has led me to believe that the cv2.Algorithm class is required to read the parameters from the BackgroundSubtractorMOG2 object. I think the two questions posed here:
Reading algorithm parameters from file in OpenCV
are similar to what I am asking, but I am unable to interpret the answers. I thought the solution would be something along the lines of:
Parameters ='name_of_backgroundsubtractorMOG2_object')
but this returns an error of: 'Required argument 'fn' (pos 1) not found'
Edit 2: Unfortunately I think this question has been answered here:
Save opencv BackgroundSubtractorMOG to file?
Short answer: It cannot be done! Sad!

encogmodel selectmethod configuration

Could someone point me to examples of how to configure the EncogModel with selectMethod? It is an overloaded method: the first overload takes just the dataset and the method. The second one, however, allows the following:
I am unable to get this working; the following error appears: "Layer can't have zero neurons, Unknown architecture element:". Any help is appreciated. Thank you.
Also, could I get some insight on how to dump the weights in this approach? When the model is built by constructing the network directly (BasicNetwork), it is possible to dump the weights via the network.flat approach. In this EncogModel-driven approach, how do we dump the weights, gradients, etc.? Thank you.
There are three examples for EncogModel, you can find them here:
If that does not help, let me know more specifically what you are trying to do, or provide some code that is not working, and I will update this to a more specific answer.
The weights can be accessed with BasicNetwork.dumpWeights(), BasicNetwork.dumpWeightsVerbose(), or more directly with BasicNetwork.getWeight().

Why do we need to explicitly update the moving_mean and moving_variance in TensorFlow's Batch normalization in tf.contrib.layers.batch_norm?

TL;DR: How can I use Batch Normalization with tf.contrib.layers.batch_norm without having to explicitly tell the session whether or not to update the moving statistics (moving_mean and moving_variance)?
A few months ago I provided an answer to "How could I use Batch Normalization in TensorFlow?" and noticed a few weird details that I wanted to address. First, the implementation I provided seems repetitive with respect to the is_training variable. Recall my suggested code:
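import tensorflow as tf
from tensorflow.contrib.layers.python.layers import batch_norm as batch_norm

def batch_norm_layer(x, train_phase, scope_bn):
    # Training branch: normalises with the statistics of the current batch.
    bn_train = batch_norm(x, decay=0.999, center=True, scale=True,
                          is_training=True,
                          reuse=None,  # is this right?
                          scope=scope_bn)
    # Inference branch: reuses the same variables and the moving statistics.
    bn_inference = batch_norm(x, decay=0.999, center=True, scale=True,
                              is_training=False,
                              reuse=True,  # is this right?
                              scope=scope_bn)
    z = tf.cond(train_phase, lambda: bn_train, lambda: bn_inference)
    return z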
In it I have a train_phase variable that just holds a TensorFlow boolean, tf.placeholder(tf.bool, name='phase_train'). As you can see, it is used to decide whether the batch norm layer should be in inference mode or not. However, the variable seemed a little redundant, since it seems I specify the same thing twice: once in train_phase and again in is_training. Is that really necessary?
I thought about it a bit and it seems I might be able to remove the hard-coded is_training=True/False with the (pseudo)code:
import tensorflow as tf
from tensorflow.contrib.layers.python.layers import batch_norm as batch_norm

def batch_norm_layer(x, train_phase, scope_bn):
    # Single layer whose mode is driven directly by the placeholder.
    bn = batch_norm(x, decay=0.999, center=True, scale=True,
                    is_training=train_phase,
                    reuse=None,  # is this right?
                    scope=scope_bn)
    z = tf.cond(train_phase, lambda: bn, lambda: bn)
    return z
which seems to make the train_phase variable completely redundant/silly. This highlights my most important point: are the train_phase variable and tf.cond(train_phase, lambda: bn_train, lambda: bn_inference) even necessary? That brings up my biggest complaint about the code (though I think this code might not even run, because when the graph is defined the placeholder train_phase might not have a value yet, but you get the idea).
Honestly, I find having to explicitly define train_phase very dangerous, because it seems very unnecessary for users to have to handle the inference/training mode of Batch Norm this explicitly. A "normal" user of Batch Norm should always update moving_mean and moving_variance with the training data, and should never update them with test statistics. Since the user is required to do: sess.run(..., feed_dict={x: batch_xs, y_: batch_ys, phase_train: True})
it can cause really bad bugs for users that shouldn't even exist in the first place (at least in my opinion). Furthermore, it seems weird to have to explicitly say what phase_train is, because whenever one trains, one uses an optimizer, so it should be incredibly clear when that code is called that it should be true. Maybe this is a terrible idea, but it feels like the optimizer or the session should be setting it to true automatically rather than relying on the user to do it right.
I understand that sometimes users are allowed more flexibility so they can be more creative, but I can't really see how this could be a good feature (even for a researcher). Maybe I am just using the library incorrectly or being paranoid, but should the user really be forced to be so explicit when using batch norm? Is there some way around this?
As a side point, having phase_train be part of the model also makes the code uglier and more confusing than feels necessary, because it seems unavoidable to have a line of code where the session call depends on whether the batch norm flag is on or not. The code I am trying to avoid writing is the logic:
if batch_norm:
    # during training
    sess.run(..., feed_dict={x: batch_xs, y_: batch_ys, phase_train: True})
else:
    # with no batch norm
    sess.run(..., feed_dict={x: batch_xs, y_: batch_ys})
It just feels totally unnecessary. It feels like during training the model should know whether it should be updating the variables or not.
As a quick (really ugly) solution to the last problem with the if condition in the session, one can always define phase_train as part of the model (or at least as part of the graph) and set it to true or false when appropriate. When one doesn't actually use the batch norm layer, the phase_train placeholder is not used by the model even if the session feeds it a value, so it doesn't matter what one sets it to. Obviously, this makes the code really confusing (since one is defining a variable one doesn't even need), but I can't seem to find a way to hide the phase_train variable. For the moment this is what I am going for, because it seems really ugly to have to split (or duplicate) my code between lines that have: sess.run(..., feed_dict={..., phase_train: False})
and the ones that don't have it at all: sess.run(..., feed_dict={...})
Ideally I want the second solution and have batch norm work regardless if I use the silly phase_train variable.
I don't really have a complete answer to your question, but I have a few observations:
The standard practice seems to be to build slightly different graphs for training and for inference, each built with or without is_training enabled.
The batch_norm layer is designed so that you can use an arg_scope to set is_training=True for all layers in your model. For example, take a look at how the Inceptionv3 model is defined. This at least makes it much more convenient to set is_training once in your Python code that builds a model and to have it apply everywhere.
Tensorflow's underlying infrastructure doesn't distinguish between training and inference time; it's just running graphs of operators. tf.Session doesn't really know anything about Neural Networks, training, or inference, so it isn't the right place for this kind of logic.
One could imagine that an Optimizer should rewrite the graph to enable is_training for those operators that support it. I don't have a strong opinion about this; you might try filing a Tensorflow Github issue making that feature request to see what others think about it. It might seem a bit too "magical".
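To make concrete why the two modes need different graphs, here is a minimal plain-Python sketch of what a batch norm layer computes in each mode. `ToyBatchNorm` is a hypothetical class written for illustration, not the code of tf.contrib.layers.batch_norm. It handles a single scalar feature and omits the learned scale and offset.

```python
class ToyBatchNorm:
    """Scalar batch norm with moving statistics (illustrative only)."""

    def __init__(self, decay=0.999, eps=1e-3):
        self.decay = decay
        self.eps = eps
        self.moving_mean = 0.0
        self.moving_var = 1.0

    def __call__(self, batch, is_training):
        if is_training:
            # Training: normalise with the batch statistics ...
            mean = sum(batch) / len(batch)
            var = sum((v - mean) ** 2 for v in batch) / len(batch)
            # ... and fold them into the moving averages.
            self.moving_mean = self.decay * self.moving_mean + (1 - self.decay) * mean
            self.moving_var = self.decay * self.moving_var + (1 - self.decay) * var
        else:
            # Inference: use the stored statistics and leave them untouched.
            mean, var = self.moving_mean, self.moving_var
        return [(v - mean) / (var + self.eps) ** 0.5 for v in batch]


bn = ToyBatchNorm(decay=0.9)
train_out = bn([1.0, 3.0], is_training=True)     # updates the moving statistics
mean_after_training = bn.moving_mean
test_out = bn([10.0, 20.0], is_training=False)   # must not change them
```

The branch on `is_training` changes both the statistics used and whether state is written, which is why a flag or a separate graph is needed somewhere.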
Hope that helps!

Bootstrapping hazard rates with non-linear interpolators

I have been using QuantLib 1.6.2 to bootstrap the hazard rates from a CDS curve. My code is similar to the example "CDS.cpp" that comes with the QuantLib distribution, i.e.,
boost::shared_ptr<PiecewiseDefaultCurve<HazardRate, BackwardFlat> >
    hazardRateStructure(new PiecewiseDefaultCurve<HazardRate, BackwardFlat>
        (todaysDate, instruments, Actual365Fixed()));
I tried to experiment with different non-linear interpolation methods (instead of BackwardFlat listed above) such as:
but I am getting the error "no appropriate default constructor available". What is the proper way of passing one of these interpolators to the PiecewiseDefaultCurve class?
Thank you,
[Note: in case someone stumbles on this question, I'm copying here the answer I gave you on the QuantLib mailing list.]
The classes you're listing are the actual interpolation classes, but the curve is expecting a corresponding factory class (for instance, BackwardFlat in the example is the factory for the BackwardFlatInterpolation class). In the case of cubic interpolations, you'll have to use the Cubic class. By default, it builds Kruger interpolations (I'm not aware of the reason for the choice) so if you write:
PiecewiseDefaultCurve<HazardRate, Cubic>(todaysDate, instruments, Actual365Fixed())
you'll get a curve using the KrugerCubic class. To get the other interpolations, you can pass a Cubic instance with the corresponding parameters (you can look them up in the constructors of the interpolation classes); for instance,
PiecewiseDefaultCurve<HazardRate, Cubic>(todaysDate, instruments, Actual365Fixed(),
1e-12, Cubic(CubicInterpolation::Spline, false))
will give you a curve using the CubicNaturalSpline class, and
PiecewiseDefaultCurve<HazardRate, Cubic>(todaysDate, instruments, Actual365Fixed(),
1e-12, Cubic(CubicInterpolation::Parabolic, true))
will use the MonotonicParabolic class.
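The distinction between an interpolation class and its factory can be sketched in plain Python. All names below are toy stand-ins written for illustration (hypothetical, not QuantLib's C++ classes): the interpolation object needs data to be constructed, while the factory can be created up front and is asked for an interpolation once the curve has its nodes.

```python
class BackwardFlatInterpolation:
    """Needs node data at construction, so it cannot be default-constructed."""

    def __init__(self, xs, ys):
        self.xs, self.ys = list(xs), list(ys)

    def __call__(self, x):
        # Backward-flat: use the value of the first node at or after x.
        for xi, yi in zip(self.xs, self.ys):
            if x <= xi:
                return yi
        return self.ys[-1]


class BackwardFlat:
    """Factory: constructible with no data, builds interpolations on demand."""

    def interpolate(self, xs, ys):
        return BackwardFlatInterpolation(xs, ys)


class ToyCurve:
    """Holds a factory and only builds the interpolation once nodes are known."""

    def __init__(self, factory):
        self.factory = factory

    def build(self, xs, ys):
        return self.factory.interpolate(xs, ys)


curve = ToyCurve(BackwardFlat())
hazard = curve.build([0.0, 1.0, 2.0], [0.01, 0.02, 0.03])
```

Handing `ToyCurve` the interpolation class itself would fail for the same reason as in the question: the curve has no data yet with which to construct it.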
