Does the Tensorflow random forest module (tensor_forest) allow to specify a true weighted least squares objective function to be minimized during the training of the random forest model?
From what I gather, random forest model training in tensor_forest occurs within a specific estimator (TensorForestEstimator), which does not seem to allow to specify a custom loss function (weighted least squares is the one I am interested).
How can I achieve this?
Related
Keras model.fit supports per-sample weights. What is the range of acceptable values for these weights? Must they sum to 1 across all training samples? Or does keras accept any weight values and then perform some sort of normalization? The keras source includes, e.g. training_utils.standardize_weights but that does not appear to be doing statistical standardization.
After looking at the source here, I've found that you should be able to pass any acceptable numerical values (within overflow bounds) for both sample weights and class weights. They do not need to sum to 1 across all training samples and each weight may be greater than one. The only sort of normalization that appears to be happening is taking the max of 2D class weight inputs.
If both class weights and samples weights are provided, it provides the product of the two.
I think the unspoken component here is that the activation function should be dealing with normalization.
I get different results for model.predict_proba(X)[:,0] compared to model.decision_function(X)for a regular Grad Boost Decision Tree classifier in SKLearn so I know that that is not the same.
I want the scores of the model. To plot ROC curves etc. How can I get the decision function for XGBoost classifier using the SKLearn wrapper? And why is predict_proba different from scores?
In general, I would not expect the sklearn.GradientBoostingClassifier and xgboost.XGBClassifier to agree, as those use very different implementations. But there are also conceptual difference between the quantities that you have tried to compare:
And why is predict_proba different from scores?
Probabilities (output of model.predict_proba(X)) are obtained from the scores (output of model.decision_function(X)) applying the loss/objective function, see here for the call to the loss function and here for the actual transformation.
I want the scores of the model. To plot ROC curves etc. How can I get the decision function for XGBoost classifier using the SKLearn wrapper?
For the ROC curve you will want to use xgbmodel.predict_proba(X)[:,1], i.e. the second column that correspond to the class 1.
I built a deep learning model which accept image of size 250*250*3 and output 62500(250*250) binary vector which contains 0s in pixels that represent the background and 1s in pixels which represents ROI.
My model is based on DenseNet121 but when i use softmax as an activation function in last layer and categorical cross entropy loss function , the loss is nan.
What is the best loss and activation function that i can use it in my model?
What is the difference between binary cross entropy and categorical cross entropy loss function?
Thanks in advance.
What is the best loss and activation function that i can use it in my model?
Use binary_crossentropy because every output is independent, not mutually exclusive and can take values 0 or 1, use sigmoid in the last layer.
Check this interesting question/answer
What is the difference between binary cross entropy and categorical cross entropy loss function?
Here is a good set of answers to that question.
Edit 1: My bad, use binary_crossentropy.
After a quick look at the code (again) I can see that keras uses:
for binary_crossentropy -> tf.nn.sigmoid_cross_entropy_with_logits
(From tf docs): Measures the probability error in discrete classification tasks in which each class is independent and not mutually exclusive. For instance, one could perform multilabel classification where a picture can contain both an elephant and a dog at the same time.
for categorical_crossentropy -> tf.nn.softmax_cross_entropy_with_logits
(From tf docs): Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For example, each CIFAR-10 image is labeled with one and only one label: an image can be a dog or a truck, but not both.
Can anyone please explain in simple words and possibly with some examples what is a loss function in the field of machine learning/neural networks?
This came out while I was following a Tensorflow tutorial:
https://www.tensorflow.org/get_started/get_started
It describes how far off the result your network produced is from the expected result - it indicates the magnitude of error your model made on its prediciton.
You can then take that error and 'backpropagate' it through your model, adjusting its weights and making it get closer to the truth the next time around.
The loss function is how you're penalizing your output.
The following example is for a supervised setting i.e. when you know the correct result should be. Although loss functions can be applied even in unsupervised settings.
Suppose you have a model that always predicts 1. Just the scalar value 1.
You can have many loss functions applied to this model. L2 is the euclidean distance.
If I pass in some value say 2 and I want my model to learn the x**2 function then the result should be 4 (because 2*2 = 4). If we apply the L2 loss then its computed as ||4 - 1||^2 = 9.
We can also make up our own loss function. We can say the loss function is always 10. So no matter what our model outputs the loss will be constant.
Why do we care about loss functions? Well they determine how poorly the model did and in the context of backpropagation and neural networks. They also determine the gradients from the final layer to be propagated so the model can learn.
As other comments have suggested I think you should start with basic material. Here's a good link to start off with http://neuralnetworksanddeeplearning.com/
Worth to note we can speak of different kind of loss functions:
Regression loss functions and classification loss functions.
Regression loss function describes the difference between the values that a model is predicting and the actual values of the labels.
So the loss function has a meaning on a labeled data when we compare the prediction to the label at a single point of time.
This loss function is often called the error function or the error formula.
Typical error functions we use for regression models are L1 and L2, Huber loss, Quantile loss, log cosh loss.
Note: L1 loss is also know as Mean Absolute Error. L2 Loss is also know as Mean Square Error or Quadratic loss.
Loss functions for classification represent the price paid for inaccuracy of predictions in classification problems (problems of identifying which category a particular observation belongs to).
Name a few: log loss, focal loss, exponential loss, hinge loss, relative entropy loss and other.
Note: While more commonly used in regression, the square loss function can be re-written and utilized for classification.
Imagine we have a classification problem on a dataset where the examples are only positive (equivalently negative). For instance, on a problem where the the winning class is specified by position (e.g. think of a tennis dataset problem where the first player is always the winner). How can we create negative examples in order to train a supervised learning algorithm on this dataset? One idea could be to generate negative examples, by exchanging the positions of the features that are tied to each of the classes. Do you think this will give an unbiased dataset? Could we create negative duplicates of our original dataset and train a supervised learning algorithm on this double dataset?