I'm trying to evaluate my recommendation system using the following code:
RecommenderEvaluator rmsEvaluator = new RMSRecommenderEvaluator();
double score = rmsEvaluator.evaluate(recommenderBuilder, null, model, 0.95, 0.05);
System.out.println("RMS Evaluator Score: " + score);
Sometimes the score is NaN.
Why does Mahout's RMSRecommenderEvaluator.evaluate method return NaN?
Maybe your dataset is too small to give meaningful results. With trainingPercentage = 0.95, only about 5% of each user's preferences are held back for testing, and with evaluationPercentage = 0.05 only 5% of the users are evaluated at all. On a small dataset that can easily leave the evaluator with no test preferences to score, and averaging zero errors gives NaN.
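As a side note, the NaN itself is just an empty average. A minimal illustration in plain numpy (not Mahout code): the root-mean-square of zero errors is undefined.

import numpy as np

# If the evaluation split leaves no held-out preferences, the evaluator
# averages an empty set of squared errors, and that average is NaN:
errors = np.array([])                  # no test preferences were scored
print(np.sqrt(np.mean(errors ** 2)))  # -> nan (with a RuntimeWarning)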
I am trying to interpret my model using the SHAP KernelExplainer. The dataset is of shape (176683, 42). The explainer (xgb_explainer) is created successfully, but when I use it to generate shap_values, it throws a MemoryError.
import shap

xgb_explainer = shap.KernelExplainer(trained_model.steps[-1][-1].predict, X_for_shap.values)
shap_val = xgb_explainer.shap_values(X_for_shap.loc[0], nsamples=1)
First I used the default nsamples = 2 * X_for_shap.shape[1] + 2048, and it returned:
MemoryError: Unable to allocate array with shape (2132, 7420686) and data type float64
When I set nsamples = 1, it runs indefinitely. Please help me understand what I am doing wrong here.
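For what it's worth, the shape in the error message points at the background data: 2132 is the default nsamples (2 * 42 + 2048), and 7,420,686 is 42 features × 176,683 background rows, so KernelExplainer is trying to build an array covering every row of X_for_shap. A minimal sketch of the usual mitigation, summarizing the background set first (the shap.kmeans call and the cluster count are illustrative choices, not from the question):

import shap

# Summarize the 176,683-row background set into a few weighted
# representative points; KernelExplainer then evaluates the model
# against these instead of against every row.
background = shap.kmeans(X_for_shap.values, 50)  # 50 clusters is an arbitrary choice

xgb_explainer = shap.KernelExplainer(trained_model.steps[-1][-1].predict, background)

# Explain a single row; the default nsamples is usually fine once
# the background set is small.
shap_val = xgb_explainer.shap_values(X_for_shap.loc[0])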
One thing that I don't understand about KernelExplainer is why we need to impute the missing features with some strategy (mean, median, k-means, etc.). Why not just ignore them, fit a linear learner, and compare it with the model without observing that feature, i.e. P(y | S ∪ {i}) − P(y | S)? What added value does the SHAP approach provide by keeping all the features while treating some of them as unknown?
I am studying regression with the Machine Learning in Action book and came across the source below:
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def stocGradAscent0(dataMatrix, classLabels):
    m, n = np.shape(dataMatrix)
    alpha = 0.01                     # step size (learning rate)
    weights = np.ones(n)             # initialize to all ones
    for i in range(m):               # one pass over the m samples
        h = sigmoid(sum(dataMatrix[i] * weights))  # predicted probability
        error = classLabels[i] - h                 # prediction error
        weights = weights + alpha * error * dataMatrix[i]  # gradient step
    return weights
You may be able to guess what the code means, but I didn't understand it. I read the book several times and searched related material like Wikipedia and Google, but I couldn't work out where the exponential function comes from, or how it produces weights that minimize the differences. Why do we get proper weights by applying the exponential function to the sum of X * weights? It seems like a kind of OLS. Anyway, we then get a result like the one below:
Thanks!
It's just the basics of logistic regression. In the for loop it computes the prediction and its error:
Z = β₀ + β₁X, where β₁ and X are matrices
hΘ(x) = sigmoid(Z)
i.e. hΘ(x) = 1/(1 + e^−(β₀ + β₁X))
then it updates the weights. Normally it's better to give it a high number of iterations in the for loop, like 1000; m alone would be small, I guess.
I want to explain more, but I can't explain it better than this dude here.
Happy learning!!
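To make the point about iterations concrete, here is a minimal sketch of the same update run as full-batch gradient ascent over many passes (the function name, the numIter default, and the vectorized form are illustrative, not from the book):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradAscent(dataMatrix, classLabels, numIter=1000, alpha=0.01):
    X = np.asarray(dataMatrix, dtype=float)
    y = np.asarray(classLabels, dtype=float)
    m, n = X.shape
    weights = np.ones(n)
    for _ in range(numIter):              # many passes, as suggested above
        h = sigmoid(X @ weights)          # predictions for all m samples
        error = y - h                     # per-sample errors
        weights += alpha * (X.T @ error) / m  # average gradient step
    return weights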
I have two questions on deeplearning4j that are somewhat related.
When I execute INDArray predicted = model.output(features, false); to generate a prediction, I get the label predicted by the model; it is either 0 or 1. I tried to search for a way to get a probability (a value between 0 and 1) instead of strictly 0 or 1. This is useful when you need to set a threshold for what your model should consider a 0 and what it should consider a 1. For example, you may want your model to output '1' for any prediction higher than or equal to 0.9, and '0' otherwise.
My second question is that I am not sure why the output is represented as a two-dimensional array (shown after the code below) even though there are only two possibilities, so it would be better to represent it with one value - especially if we want it as a probability (question #1) which is one value.
PS: in case relevant to the question, in the Schema the output column is defined using ".addColumnInteger". Below are snippets of the code used.
Part of the code:
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(seed)
        .iterations(1)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .learningRate(learningRate)
        .updater(org.deeplearning4j.nn.conf.Updater.NESTEROVS).momentum(0.9)
        .list()
        .layer(0, new DenseLayer.Builder()
                .nIn(numInputs)
                .nOut(numHiddenNodes)
                .weightInit(WeightInit.XAVIER)
                .activation("relu")
                .build())
        .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .weightInit(WeightInit.XAVIER)
                .activation("softmax")
                .nIn(numHiddenNodes)
                .nOut(numOutputs)
                .build())
        .pretrain(false).backprop(true).build();
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
model.setListeners(new ScoreIterationListener(10));
for (int n = 0; n < nEpochs; n++) {
    model.fit(trainIter);
}
Evaluation eval = new Evaluation(numOutputs);
while (testIter.hasNext()) {
    DataSet t = testIter.next();
    INDArray features = t.getFeatureMatrix();
    System.out.println("Input features: " + features);
    INDArray labels = t.getLabels();
    INDArray predicted = model.output(features, false);
    System.out.println("Predicted output: " + predicted);
    System.out.println("Desired output: " + labels);
    eval.eval(labels, predicted);
    System.out.println();
}
System.out.println(eval.stats());
Output from running the code above:
Input features: [0.10, 0.34, 1.00, 0.00, 1.00]
Predicted output: [1.00, 0.00]
Desired output: [1.00, 0.00]
What I want the output to look like (i.e. a one-value probability):
Input features: [0.10, 0.34, 1.00, 0.00, 1.00]
Predicted output: 0.14
Desired output: 0.0
I will answer your questions inline but I just want to note:
I would suggest taking a look at our docs and examples:
https://github.com/deeplearning4j/dl4j-examples
http://deeplearning4j.org/quickstart
A hard 100% 0-or-1 output is just a badly tuned neural net; that's not at all how things work. A softmax returns probabilities by default, so your net simply needs tuning. Also look at updating DL4J: I'm not sure what version you're on, but we haven't used strings for activations in at least a year. You seem to have skipped a lot of steps when starting with us. I'll reiterate: at least take a look at the links above for a starting point rather than using year-old code.
What you're seeing is standard deep learning 101, so the advice I'm about to give can be found all over the internet and applies to any deep learning software. A two-label softmax sums each row to 1. If you want one label, use a sigmoid with one output and a different loss function. We use softmax because it works for any number of outputs: all you have to do is change the number of outputs, rather than having to change the loss function and activation function on top of that.
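To illustrate that in plain numpy (not DL4J code; the logits are made-up numbers): a two-class softmax row always sums to 1, so the probability of class 1 is just the second column, and the 0.9 threshold from the question can be applied to it directly:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[2.3, 0.4],   # raw model outputs for three samples
                   [-1.0, 1.5],
                   [0.2, 0.1]])
probs = softmax(logits)          # each row sums to 1
p_class1 = probs[:, 1]           # a single probability per sample
labels = (p_class1 >= 0.9).astype(int)  # threshold it yourself
print(probs)
print(p_class1)
print(labels)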
I am trying to classify a dataset of reviews into two classes, say class A and class B. I am using LightGBM to classify.
I have changed the parameters for the classifier many times but I can't get a huge difference in the results.
I think the problem is with the pre-processing step. I defined the function shown below to take care of pre-processing. I used stemming and removed stopwords, but I don't know what I am missing. I have tried both LancasterStemmer and PorterStemmer.
import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

stops = set(stopwords.words("english"))

def cleanData(text, lowercase=False, remove_stops=False, stemming=False, lemm=False):
    txt = str(text)
    txt = re.sub(r'[^A-Za-z0-9\s]', r'', txt)  # strip punctuation
    txt = re.sub(r'\n', r' ', txt)             # flatten newlines
    if lowercase:
        txt = " ".join([w.lower() for w in txt.split()])
    if remove_stops:
        txt = " ".join([w for w in txt.split() if w not in stops])
    if stemming:
        st = PorterStemmer()
        txt = " ".join([st.stem(w) for w in txt.split()])
    if lemm:
        wordnet_lemmatizer = WordNetLemmatizer()
        txt = " ".join([wordnet_lemmatizer.lemmatize(w) for w in txt.split()])
    return txt
Are there any more pre-processing steps I should do to get better accuracy?
URL for the dataset : Dataset
EDIT :
The parameters I used are as follows.
params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': 'binary_logloss',
    'learning_rate': 0.01,
    'max_depth': 22,
    'num_leaves': 78,
    'feature_fraction': 0.1,
    'bagging_fraction': 0.4,
    'bagging_freq': 1,
}
I have altered max_depth and num_leaves along with other parameters, but the accuracy is stuck at a certain level.
There are a few things to consider. First of all, your training set is not balanced: the class distribution is roughly 70%/30%, and you need to account for that in training. Also, what types of features are you using? Using the right set of features could improve your performance.
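As a concrete example of handling the imbalance, LightGBM has built-in options for it. A minimal sketch reusing the question's params (only the last entries are new, and the 70/30 weight is just the class ratio mentioned above):

params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': 'binary_logloss',
    'learning_rate': 0.01,
    'max_depth': 22,
    'num_leaves': 78,
    'feature_fraction': 0.1,
    'bagging_fraction': 0.4,
    'bagging_freq': 1,
    'is_unbalance': True,  # let LightGBM reweight the minority class
    # alternatively (not together with is_unbalance):
    # 'scale_pos_weight': 70.0 / 30.0,
}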
I'm confused by the results; probably I'm not getting the concept of cross-validation and GridSearchCV right. I followed the logic behind this post:
https://randomforests.wordpress.com/2014/02/02/basics-of-k-fold-cross-validation-and-gridsearchcv-in-scikit-learn/
import numpy as np
import pandas as pd
from sklearn.cross_validation import KFold
from sklearn.grid_search import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

argd = CommandLineParser(argv)  # CommandLineParser is a user-defined helper
folder, fname = argd['dir'], argd['fname']
df = pd.read_csv('../../' + folder + '/Results/' + fname, sep=";")
explanatory_variable_columns = set(df.columns.values)
response_variable_column = df['A']
explanatory_variable_columns.remove('A')
y = np.array([1 if e else 0 for e in response_variable_column])
X = df[list(explanatory_variable_columns)].as_matrix()
kf_total = KFold(len(X), n_folds=5, indices=True, shuffle=True, random_state=4)
dt = DecisionTreeClassifier(criterion='entropy')
min_samples_split_range = [x for x in range(1, 20)]
dtgs = GridSearchCV(estimator=dt, param_grid=dict(min_samples_split=min_samples_split_range), n_jobs=1)
scores = [dtgs.fit(X[train], y[train]).score(X[test], y[test]) for train, test in kf_total]
# SAME AS DOING: cross_validation.cross_val_score(dtgs, X, y, cv=kf_total, n_jobs=1)
print(scores)
print(np.mean(scores))
print(dtgs.best_score_)
RESULTS OBTAINED:
# score [0.81818181818181823, 0.78181818181818186, 0.7592592592592593, 0.7592592592592593, 0.72222222222222221]
# mean score 0.768
# .best_score_ 0.683486238532
ADDITIONAL NOTE:
I ran it using another combination of the explanatory variables (using only some of them) and I got the opposite problem: now .best_score_ is higher than all the values in the cross-validation array.
# score [0.74545454545454548, 0.70909090909090911, 0.79629629629629628, 0.7407407407407407, 0.64814814814814814]
# mean score 0.728
# .best_score_ 0.802752293578
The code is conflating several things.
dtgs.fit(X[train], y[train]) runs an internal 3-fold cross-validation for every parameter combination in param_grid, producing a grid of 19 results (one per min_samples_split value), which you can inspect by calling dtgs.grid_scores_.
The line [dtgs.fit(X[train], y[train]).score(X[test], y[test]) for train, test in kf_total] therefore fits the grid search five times and scores each fit on the corresponding held-out fold; the result is the array of outer 5-fold validation scores.
And when you call dtgs.best_score_, you get the best score from the internal 3-fold hyperparameter validation of the last of those five fits, so there is no reason for it to match the outer scores, in either direction.
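A minimal sketch of the cleaner pattern that the question's own comment hints at: score the whole grid search inside the outer folds with cross_val_score, so every reported number comes from the same nested procedure (same old sklearn.cross_validation API as the question):

from sklearn.cross_validation import cross_val_score

# Each outer fold refits dtgs (including its internal 3-fold search)
# on the training part and scores the selected model on the held-out part.
nested_scores = cross_val_score(dtgs, X, y, cv=kf_total, n_jobs=1)
print(nested_scores)           # the five outer-fold scores
print(np.mean(nested_scores))  # the single number to report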