How to write a custom evaluation metric in Python for xgboost?

I would like to add the kappa evaluation metric to use in xgboost in Python. I am having trouble understanding how to connect a Python function with xgboost.
According to the xgboost documentation: "User can add multiple evaluation metrics, for python user, remember to pass the metrics in as list of parameters pairs instead of map, so that latter ‘eval_metric’ won’t override previous one"
This has been raised on xgboost's GitHub page for R, but not for Python.
For example, if the kappa function is:
def kappa(preds, y):
    # perform kappa calculation
    return score
How do I go about implementing it with xgboost?
Specifying 'kappa' as a string in the eval_metric parameter results in XGBoostError: unknown evaluation metric type: kappa.
Likewise, specifying the kappa function object results in XGBoostError: unknown evaluation metric type: <function kappa at 0x7fbef4b03488>.
How can a custom evaluation metric be used with xgboost in Python?

Change your method to:
def kappa(preds, y):
    # perform kappa calculation
    return 'kappa', score
And use it with feval argument:
bst = xgb.train(params, dtrain, num_rounds, watchlist, feval=kappa, maximize=True)
When writing custom evaluation metrics, remember to set the maximize argument: True tells xgboost that a larger metric value means the model is getting better, which is the case for kappa.
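For a concrete end-to-end illustration, here is a minimal sketch of a kappa feval on a toy binary-classification problem. The synthetic data, the 0.5 probability threshold, and the use of sklearn's cohen_kappa_score are assumptions for illustration. Note that feval receives the raw predictions and the evaluation DMatrix, so the true labels come from the DMatrix itself; newer xgboost releases (1.6+) rename this hook custom_metric.
import numpy as np
import xgboost as xgb
from sklearn.metrics import cohen_kappa_score

# toy binary-classification data (a stand-in for your own dataset)
rng = np.random.RandomState(0)
X = rng.randn(500, 5)
y = (X[:, 0] + 0.5 * rng.randn(500) > 0).astype(int)

dtrain = xgb.DMatrix(X[:400], label=y[:400])
dvalid = xgb.DMatrix(X[400:], label=y[400:])

def kappa(preds, dmat):
    # feval gets the predictions and the evaluation DMatrix;
    # pull the labels off the DMatrix and return a (name, value) pair
    labels = dmat.get_label()
    preds = (preds > 0.5).astype(int)  # threshold probabilities at 0.5
    return 'kappa', cohen_kappa_score(labels, preds)

params = {'objective': 'binary:logistic', 'max_depth': 3}
watchlist = [(dtrain, 'train'), (dvalid, 'valid')]
bst = xgb.train(params, dtrain, num_boost_round=20,
                evals=watchlist, feval=kappa, maximize=True)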

Related

How to define the derivative of a custom activation function in Keras

I have a custom activation function and its derivative. Although I can use the custom activation function, I don't know how to tell Keras what its derivative is.
It seems to find one by itself, but I have a parameter that has to be shared between the function and its derivative, so how can I do that?
I know there is a relatively easy way to do this in TensorFlow, but I have no idea how to implement it in Keras. Here is how you do it in TensorFlow.
Edit: based on the answer I got, maybe I wasn't clear enough. What I want is to implement a custom derivative for my activation function so that it uses my derivative during backpropagation. I already know how to implement a custom activation function.
Take a look at the source code where the activation functions of Keras are defined:
keras/activations.py
For example:
def relu(x, alpha=0., max_value=None):
    """Rectified Linear Unit.

    # Arguments
        x: Input tensor.
        alpha: Slope of the negative part. Defaults to zero.
        max_value: Maximum value for the output.

    # Returns
        The (leaky) rectified linear unit activation: `x` if `x > 0`,
        `alpha * x` if `x < 0`. If `max_value` is defined, the result
        is truncated to this value.
    """
    return K.relu(x, alpha=alpha, max_value=max_value)
Note also how Keras layers call the activation function: self.activation = activations.get(activation), where activation can be a string or a callable.
Thus, similarly, you can define your own activation function, for example:
def my_activ(x, p1, p2):
    ...
    return ...
Suppose you want to use this activation in a Dense layer. The layer expects a callable of a single argument, so bind the extra parameters first, for example with a lambda:
x = Dense(128, activation=lambda t: my_activ(t, p1, p2))(input)
If you mean you want to implement your own derivative:
If your activation function is written with TensorFlow/Keras operations that are differentiable (e.g. K.dot(), tf.matmul(), tf.concat(), etc.), then the derivatives are obtained by automatic differentiation (https://en.wikipedia.org/wiki/Automatic_differentiation). In that case you don't need to write your own derivative.
If you still want to write the derivatives yourself, check this document: https://www.tensorflow.org/extend/adding_an_op, where you need to register your gradients using tf.RegisterGradient.
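If the goal is specifically a backward pass that shares a parameter with the forward pass, one option is tf.custom_gradient (available in TensorFlow 1.7+), an alternative to the tf.RegisterGradient route above. Here is a minimal sketch; the leaky-ReLU-style function and the shared ALPHA constant are assumptions for illustration:
import tensorflow as tf

ALPHA = 0.1  # parameter shared between the forward pass and its derivative

@tf.custom_gradient
def my_activation(x):
    # forward pass: a leaky-ReLU-like function using the shared ALPHA
    y = tf.where(x > 0.0, x, ALPHA * x)

    def grad(dy):
        # hand-written derivative that reuses the same ALPHA;
        # returns the gradient with respect to x
        return dy * tf.where(x > 0.0, tf.ones_like(x), ALPHA * tf.ones_like(x))

    return y, grad
Since the decorated function is an ordinary callable, it can be passed straight to a layer, e.g. Dense(128, activation=my_activation).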

Different costs for underestimation and overestimation

I have a regression problem, but the cost function is different: The cost for an underestimate is higher than an overestimate. For example, if predicted value < true value, the cost will be 3*(true-predicted)^2; if predicted value > true value, the cost will be 1*(true-predicted)^2.
I'm thinking of using classical regression models such as linear regression, random forest etc. What modifications should I make to adjust for this cost function?
As far as I know, ML APIs such as scikit-learn do not provide the functionality to directly modify the cost function. If I have to use these APIs, what can I do?
Any recommended reading?
You can use TensorFlow (or Theano) for custom cost functions. A common linear regression implementation is here.
To see how to implement a custom cost function, looking at a Huber loss implemented in TensorFlow might help. In the linked code, instead of
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
you'll have:
error = y_known - y_pred
condition = tf.less(error, 0)
overestimation_loss = 1 * tf.square(error)
underestimation_loss = 3 * tf.square(error)
cost = tf.reduce_mean(tf.where(condition, overestimation_loss, underestimation_loss))
Here, when condition is true, error is less than zero, meaning y_known is smaller than y_pred: an overestimate. tf.where therefore picks overestimation_loss; otherwise it picks underestimation_loss.
The trick is that you compute both losses and use tf.where with condition to choose, element-wise, which one applies.
Update:
If you want to use another library and it has a Huber loss implemented, you can look at that implementation for ideas, because the Huber loss is a conditional loss function similar to yours.
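To make this concrete end to end, here is a minimal self-contained sketch of a linear regression trained with the asymmetric loss above. It uses TensorFlow 2.x eager mode rather than the TF 1.x graph style of the linked code; the toy data, the 3x/1x weights, and the learning rate are assumptions for illustration:
import numpy as np
import tensorflow as tf

# toy regression data (a stand-in for your own dataset)
rng = np.random.RandomState(0)
X = rng.randn(200, 3).astype(np.float32)
w_true = np.array([1.0, -2.0, 0.5], dtype=np.float32)
y = X @ w_true + 0.1 * rng.randn(200).astype(np.float32)

w = tf.Variable(tf.zeros([3]))
b = tf.Variable(0.0)

def asymmetric_loss(y_known, y_pred):
    error = y_known - y_pred
    # error < 0 means y_pred > y_known: an overestimate (weight 1);
    # otherwise an underestimate (weight 3)
    overestimation_loss = 1.0 * tf.square(error)
    underestimation_loss = 3.0 * tf.square(error)
    return tf.reduce_mean(tf.where(error < 0, overestimation_loss,
                                   underestimation_loss))

optimizer = tf.keras.optimizers.SGD(learning_rate=0.05)
for step in range(500):
    with tf.GradientTape() as tape:
        y_pred = tf.linalg.matvec(X, w) + b
        loss = asymmetric_loss(y, y_pred)
    grads = tape.gradient(loss, [w, b])
    optimizer.apply_gradients(zip(grads, [w, b]))

print(w.numpy(), b.numpy())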
You can use an asymmetric cost function to make your model overestimate or underestimate. You can replace the cost function in this implementation with:
def acost(a):
    return tf.pow(pred - Y, 2) * tf.pow(tf.sign(pred - Y) + a, 2)
For more detail, see this link.

Spark K-fold Cross Validation

I’m having some trouble understanding Spark’s cross validation. Any example I have seen uses it for parameter tuning, but I assumed that it would just do regular K-fold cross validation as well?
What I want to do is to perform k-fold cross validation, where k=5. I want to get the accuracy for each result and then get the average accuracy.
In scikit-learn this is how it would be done; scores gives you the result for each fold, and you can then use scores.mean():
scores = cross_val_score(classifier, x, y, cv=5, scoring='accuracy')
This is how I am doing it in Spark; the paramGrid is empty, as I don't want to tune any parameters.
val paramGrid = new ParamGridBuilder().build()
val evaluator = new MulticlassClassificationEvaluator()
evaluator.setLabelCol("label")
evaluator.setPredictionCol("prediction")
evaluator.setMetricName("precision")
val crossval = new CrossValidator()
crossval.setEstimator(classifier)
crossval.setEvaluator(evaluator)
crossval.setEstimatorParamMaps(paramGrid)
crossval.setNumFolds(5)
val modelCV = crossval.fit(df4)
val chk = modelCV.avgMetrics
Is this doing the same thing as the scikit learn implementation? Why do the examples use training/testing data when doing cross validation?
How to cross validate RandomForest model?
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/ModelSelectionViaCrossValidationExample.scala
What you're doing looks OK.
Basically, yes, it works the same as sklearn's grid-search CV:
For each entry of estimatorParamMaps (a set of params), the algorithm is evaluated with CV, so avgMetrics contains the metric averaged across all folds, one value per param set.
With an empty ParamGridBuilder (no parameter search), this amounts to "regular" cross validation, and avgMetrics holds a single cross-validated accuracy.
As for why most examples split off a separate test set before doing cross validation, even though each CV iteration already uses K-1 training folds and 1 test fold: the test folds inside the CV are consumed by the parameter grid search, which makes them part of model selection. An additional held-out dataset, the so-called "test dataset", is therefore needed to evaluate the final model. Read more here.
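For comparison with the scikit-learn one-liner above, here is a hypothetical PySpark version of the same empty-grid, 5-fold setup. The LogisticRegression estimator, the "accuracy" metric name (Spark 2.x; older versions used "precision"), and the df variable with "features"/"label" columns are assumptions for illustration:
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# assumes `df` is a DataFrame with "features" and "label" columns
lr = LogisticRegression()
grid = ParamGridBuilder().build()  # empty grid: plain k-fold CV, no search
evaluator = MulticlassClassificationEvaluator(metricName="accuracy")

cv = CrossValidator(estimator=lr,
                    estimatorParamMaps=grid,
                    evaluator=evaluator,
                    numFolds=5)
model = cv.fit(df)

# avgMetrics has one entry per param map; with an empty grid that is
# a single value: the metric averaged over the 5 folds
print(model.avgMetrics[0])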

Restricted Boltzmann Machines for continuous inputs

There is an implementation for RBMs. The original RBM implementation is for discrete data such as images, and my data is real-valued; does the code work for real data too? I read somewhere that there is a Gaussian version of the typical RBM that works for that; is it also implemented in that module?
In short, an RBM is simply a Markov random field on a bipartite graph, so you can use any probability distribution to describe the relationship between nodes.
In terms of code, you don't really need to change much. The chosen probability distribution comes into play in the contrastive divergence algorithm; you should only have to change the way the samples are drawn. The parts of the code that need to be changed are copied below.
def sample_h_given_v(self, v0_sample):
    ''' This function infers state of hidden units given visible units '''
    # compute the activation of the hidden units given a sample of
    # the visibles
    pre_sigmoid_h1, h1_mean = self.propup(v0_sample)
    # get a sample of the hiddens given their activation
    # Note that theano_rng.binomial returns a symbolic sample of dtype
    # int64 by default. If we want to keep our computations in floatX
    # for the GPU we need to specify to return the dtype floatX
    h1_sample = self.theano_rng.binomial(size=h1_mean.shape,
                                         n=1, p=h1_mean,
                                         dtype=theano.config.floatX)
    return [pre_sigmoid_h1, h1_mean, h1_sample]

def propdown(self, hid):
    '''This function propagates the hidden units activation downwards to
    the visible units

    Note that we return also the pre_sigmoid_activation of the
    layer. As it will turn out later, due to how Theano deals with
    optimizations, this symbolic variable will be needed to write
    down a more stable computational graph (see details in the
    reconstruction cost function)
    '''
    pre_sigmoid_activation = T.dot(hid, self.W.T) + self.vbias
    return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]

def sample_v_given_h(self, h0_sample):
    ''' This function infers state of visible units given hidden units '''
    # compute the activation of the visible given the hidden sample
    pre_sigmoid_v1, v1_mean = self.propdown(h0_sample)
    # get a sample of the visible given their activation
    # Note that theano_rng.binomial returns a symbolic sample of dtype
    # int64 by default. If we want to keep our computations in floatX
    # for the GPU we need to specify to return the dtype floatX
    v1_sample = self.theano_rng.binomial(size=v1_mean.shape,
                                         n=1, p=v1_mean,
                                         dtype=theano.config.floatX)
    return [pre_sigmoid_v1, v1_mean, v1_sample]
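As a sketch of the Gaussian-visible variant the question asks about, the visible sampling step could be replaced with a normal draw centred on the linear activation (no sigmoid). The method name, the unit variance, and the surrounding Theano RBM class are hypothetical; this is an illustration, not part of the tutorial code:
import theano
import theano.tensor as T

# hypothetical drop-in replacement for sample_v_given_h inside the same
# RBM class, for real-valued (Gaussian) visible units
def sample_v_given_h_gaussian(self, h0_sample):
    # for Gaussian visibles the mean is the linear activation itself,
    # with no sigmoid applied
    v1_mean = T.dot(h0_sample, self.W.T) + self.vbias
    # draw the sample from a normal distribution around that mean
    v1_sample = self.theano_rng.normal(size=v1_mean.shape,
                                       avg=v1_mean, std=1.0,
                                       dtype=theano.config.floatX)
    return [v1_mean, v1_sample]
Gaussian-visible RBMs usually assume the inputs are standardized to zero mean and unit variance, which is why std=1.0 is a common choice here.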

Weka Classification

I was trying to build a classification model on a data set which has 32 attributes, the last column being the target class. I reduced the attributes from 32 down to 6, which I felt would be more useful for my classification model.
I tried J48 and some incremental classification algorithms.
I expected output consisting of a confusion matrix, counts of correctly and incorrectly classified instances, and a kappa value.
But my result did not give any information on correctly and incorrectly classified instances; nor did it report a confusion matrix or a kappa value. All I received is this:
=== Summary ===
Correlation coefficient 0.9482
Mean absolute error 0.2106
Root mean squared error 0.5673
Relative absolute error 13.4077 %
Root relative squared error 31.9157 %
Total Number of Instances 1461
Can anyone tell me why I did not get the confusion matrix, kappa, and the correct/incorrect instance counts?
Unfortunately you didn't post your code or which version of Weka you are using. Note that the summary you pasted shows regression statistics (a correlation coefficient rather than accuracy), which suggests your class attribute is numeric; Weka only produces a confusion matrix, kappa, and correct/incorrect counts for a nominal class.
In any case, to calculate the confusion matrix, kappa, etc., you can use the methods of the Evaluation class: http://weka.sourceforge.net/doc.dev/weka/classifiers/Evaluation.html
For example, after you train your model:
classifier.buildClassifier(train);  // train is an Instances object
Evaluation eval = new Evaluation(train);
// evaluate your model with 10-fold cross validation
eval.crossValidateModel(classifier, train, 10, new Random(1));
System.out.println(classifier);
// print different stats with
System.out.println(eval.toSummaryString());
System.out.println(eval.toMatrixString());
System.out.println(eval.toClassDetailsString());
