No recommendation made on small dataset, despite best Pearson correlation similarity - Mahout

I am facing a small problem while running a recommender engine in Mahout.
The data set I am working on is given below:
1,101,5.0
1,102,4.0
1,103,4.0
1,107,5.0
1,108,3.0
2,101,3.0
2,102,4.0
2,104,4.0
2,105,4.0
3,101,5.0
3,102,4.0
When I calculate the Pearson similarity between users 1 and 3, I get a value of 0.99999998, approximately 1.0, which is the best possible similarity.
So, according to the recommendation rule, the recommendation for User_ID 3 should be Item_ID 107.
But my output gives no recommendation.
Below is my code:
public static void main( String[] args ) throws Exception
{
    ///////////////////////Data Model//////////////////////////////////////
    DataModel model = new FileDataModel(new File("data/dataset_2.csv"));
    System.out.println(model.getMaxPreference());
    ///////////////////Similarity between Users////////////////////////////
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    System.out.println("Pearson distance "+similarity.userSimilarity(3, 1));
    //////////////////The Neighbors who satisfy the threshold level//////////
    UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.1, similarity, model);
    ///////////////////Recommender recommending the best/////////////////////////
    UserBasedRecommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
    List <RecommendedItem> recommendations = recommender.recommend(3, 1);
    for (RecommendedItem recommendation : recommendations) {
        System.out.println(recommendation);
    }
}
}
I would appreciate it if anybody could point out the mistake, if any, or tell me whether my understanding of Mahout's Pearson correlation is wrong.

PearsonCorrelationSimilarity does not work well with small data sets in which users have few items in common. You can change the similarity measure or the neighbourhood size. With a larger data set you should get better results.
In addition, you can increase the number of recommendations requested (the howMany argument of recommend).
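To see how little data the similarity rests on here, the following is a minimal Python sketch (not Mahout code) that computes a plain Pearson correlation over co-rated items for the ratings in the question. It assumes the textbook formula, which may differ in detail from Mahout's implementation; the helper names `pearson` and `neighbours` are made up for illustration.

```python
from math import sqrt

# Ratings copied from the question: user -> {item: rating}.
ratings = {
    1: {101: 5.0, 102: 4.0, 103: 4.0, 107: 5.0, 108: 3.0},
    2: {101: 3.0, 102: 4.0, 104: 4.0, 105: 4.0},
    3: {101: 5.0, 102: 4.0},
}

def pearson(a, b):
    """Pearson correlation over the items both users rated (NaN if undefined)."""
    common = sorted(set(ratings[a]) & set(ratings[b]))
    if len(common) < 2:
        return float("nan")
    xs = [ratings[a][i] for i in common]
    ys = [ratings[b][i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)

def neighbours(user, threshold):
    """Other users whose similarity to `user` is at least `threshold`."""
    return [u for u in ratings
            if u != user and pearson(user, u) >= threshold]

for other in (1, 2):
    print(other, pearson(3, other))
print(neighbours(3, 0.1))
```

User 3 shares only two items (101 and 102) with each of the other users, so each correlation is computed from just two points: it comes out as a perfect positive correlation with user 1 and a perfect negative one with user 2. Only user 1 clears the 0.1 threshold, so any estimate for user 3 would rest on a single neighbour, which fits the advice above to use more data or a different similarity or neighbourhood.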

Related

regression with stochastic gradient descent algorithm

I am studying regression with the book Machine Learning in Action, and I saw source code like the following:
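import numpy as np

def sigmoid(x):
    # helper not shown in the question: the logistic function, maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def stocGradAscent0(dataMatrix, classLabels):
    m, n = np.shape(dataMatrix)
    alpha = 0.01
    weights = np.ones(n)   # initialize to all ones
    for i in range(m):
        # predicted probability for sample i
        h = sigmoid(sum(dataMatrix[i]*weights))
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatrix[i]
    return weights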
You may guess what the code means, but I didn't understand it. I read the book several times and searched related material on Wikipedia and Google, but I still don't see where the exponential function comes from when finding the weights that minimize the error. And why do we get proper weights by applying the exponential function to the sum of X*weights? It looks like a kind of OLS. Anyway, we then get the result like below:
Thanks!
It's just the basics of logistic regression. In the for loop it computes the hypothesis and the error for each sample:
Z = β₀ + β₁X ; where β₁ and X are matrices
hΘ(x) = sigmoid(Z)
i.e. hΘ(x) = 1/(1 + e^(-(β₀ + β₁X)))
and then it updates the weights. Normally it's better to run many more iterations, e.g. 1000 passes; the single pass over the m samples made by this for loop is probably too few, I guess.
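As a rough illustration of running more iterations, here is a minimal pure-Python sketch of the same update rule repeated over many passes. It is not the book's code: the function name `sgd_logistic`, the shuffling of samples in each pass, and the toy data are assumptions made for this example. The constant 1.0 in the first column plays the role of β₀.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_logistic(rows, labels, alpha=0.01, epochs=1000, seed=0):
    """Stochastic gradient ascent on the logistic log-likelihood."""
    rng = random.Random(seed)
    weights = [1.0] * len(rows[0])          # same start as the book: all ones
    order = list(range(len(rows)))
    for _ in range(epochs):                 # many passes instead of a single one
        rng.shuffle(order)                  # assumption: visit samples in random order
        for i in order:
            z = sum(w * x for w, x in zip(weights, rows[i]))
            error = labels[i] - sigmoid(z)  # same error term as in the question
            weights = [w + alpha * error * x for w, x in zip(weights, rows[i])]
    return weights

# Toy data (made up): column 0 is the constant bias input, column 1 the feature.
rows = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
labels = [0, 0, 1, 1]

w = sgd_logistic(rows, labels)
preds = [1 if sigmoid(sum(wi * xi for wi, xi in zip(w, r))) >= 0.5 else 0
         for r in rows]
print(w, preds)
```

The exponential enters only through the sigmoid, which squashes the linear score Z into a value between 0 and 1 that can be compared with the 0/1 class label.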
I want to explain more, but I can't explain it better than this dude here
Happy learning!!

DL4J Prediction Formatting

I have two questions on deeplearning4j that are somewhat related.
When I execute “INDArray predicted = model.output(features,false);” to generate a prediction, I get the label predicted by the model; it is either 0 or 1. I tried to search for a way to have a probability (value between 0 and 1) instead of strictly 0 or 1. This is useful when you need to set a threshold for what your model should consider as a 0 and what it should consider as a 1. For example, you may want your model to output '1' for any prediction that is higher than or equal to 0.9 and output '0' otherwise.
My second question is that I am not sure why the output is represented as a two-dimensional array (shown after the code below) even though there are only two possibilities, so it would be better to represent it with one value - especially if we want it as a probability (question #1) which is one value.
PS: in case relevant to the question, in the Schema the output column is defined using ".addColumnInteger". Below are snippets of the code used.
Part of the code:
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(seed)
        .iterations(1)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .learningRate(learningRate)
        .updater(org.deeplearning4j.nn.conf.Updater.NESTEROVS).momentum(0.9)
        .list()
        .layer(0, new DenseLayer.Builder()
                .nIn(numInputs)
                .nOut(numHiddenNodes)
                .weightInit(WeightInit.XAVIER)
                .activation("relu")
                .build())
        .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .weightInit(WeightInit.XAVIER)
                .activation("softmax")
                .nIn(numHiddenNodes)
                .nOut(numOutputs)
                .build())
        .pretrain(false).backprop(true).build();
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
model.setListeners(new ScoreIterationListener(10));
for (int n=0; n<nEpochs; n++) {
    model.fit(trainIter);
}
Evaluation eval = new Evaluation(numOutputs);
while (testIter.hasNext()){
    DataSet t = testIter.next();
    INDArray features = t.getFeatureMatrix();
    System.out.println("Input features: " + features);
    INDArray labels = t.getLabels();
    INDArray predicted = model.output(features,false);
    System.out.println("Predicted output: "+ predicted);
    System.out.println("Desired output: "+ labels);
    eval.eval(labels, predicted);
    System.out.println();
}
System.out.println(eval.stats());
Output from running the code above:
Input features: [0.10, 0.34, 1.00, 0.00, 1.00]
Predicted output: [1.00, 0.00]
Desired output: [1.00, 0.00]
What I want the output to look like (i.e. a one-value probability):
Input features: [0.10, 0.34, 1.00, 0.00, 1.00]
Predicted output: 0.14
Desired output: 0.0
I will answer your questions inline but I just want to note:
I would suggest taking a look at our docs and examples:
https://github.com/deeplearning4j/dl4j-examples
http://deeplearning4j.org/quickstart
An output of exactly 0 or 1 every time is just a sign of a badly tuned neural net; that's not how things normally work. A softmax returns probabilities by default. Look at updating dl4j too: I'm not sure what version you're on, but we haven't used strings for activations for at least a year now. You seem to have skipped a lot of steps when starting with us, so again, please at least take a look at the links above for a starting point rather than using year-old code.
What you're seeing there is just standard deep learning 101, so the advice I'm about to give can be found on the internet and applies to any deep learning software. A two-label softmax makes each row sum to 1. If you want a single output, use a sigmoid with 1 output and a different loss function. We use softmax because it works for any number of outputs: all you have to do is change the number of outputs, rather than also having to change the loss function and the activation function.
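The relationship between the two-output softmax and a single probability can be sketched in a few lines of plain Python (not DL4J code). The logit values and the helper `label_from_probability` are made up for illustration, and treating the second column as "class 1" is an assumption about how the labels were encoded.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def label_from_probability(p_class1, threshold=0.9):
    """Apply a custom decision threshold to the class-1 probability."""
    return 1 if p_class1 >= threshold else 0

logits = [0.3, 1.5]          # hypothetical pre-softmax scores for classes 0 and 1
probs = softmax(logits)      # two probabilities that sum to 1
p1 = probs[1]                # the single value asked for in the question

print(probs, p1, label_from_probability(p1))
```

With two classes, the second softmax column equals a sigmoid of the difference of the two scores, so reading one column of the two-column output already gives the single probability that can then be thresholded.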

RBM for collaborative filtering

My RBM algorithm for collaborative filtering will not converge...
This is my understanding of how an RBM for collaborative filtering works:
Initialize w, b, c at random in [0,1]
For each user
    clamp data -> visible (softmax)
    Hidden = sigmoid(b+W*V)
    Run Gibbs on Hidden -> Hidden_gibbs
    Positive = Hidden*Visible
    Hidden -> reconstruct -> reconstruct_visible
    Run Gibbs on reconstruct_visible -> reconstruct_visible_gibbs
    negative = Hidden_gibbs*reconstruct_visible_gibbs
End for
Update
    w = w + (positive-negative)/Number_User
    b = b + (visible - reconstruct_visible_gibbs)/Number_User
    c = c + (Hidden - Hidden_gibbs)/Number_User
I have read lots of papers and lectures, and I have no idea what is wrong.
This is not an easy problem! Your description of the learning procedure looks fine, but there's a lot of room for mistakes between the description and the actual code. Also, for CF, a "vanilla" RBM won't work.
How did you implement the visible "softmax" units?
Did you train your RBM with a "single-user" dataset, as recommended in the original work [1]?
There are 2 more details, about the weight updates and the prediction procedure, that are slightly different from a vanilla RBM.
[1] Salakhutdinov http://www.machinelearning.org/proceedings/icml2007/papers/407.pdf
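For comparison with the pseudocode in the question, here is a minimal sketch of one contrastive-divergence (CD-1) step for a plain binary RBM in pure Python. It is deliberately the vanilla variant, not the softmax-visible model from [1], so it only helps check the basic bookkeeping. The naming is an assumption: `b` is the visible bias, `c` the hidden bias, and `lr` a learning rate that the question's update does not have.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample(probs, rng):
    # Draw one Bernoulli sample per unit.
    return [1.0 if rng.random() < p else 0.0 for p in probs]

def hidden_probs(v, W, c):
    # W[i][j] connects visible unit i to hidden unit j; c is the hidden bias.
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]

def visible_probs(h, W, b):
    # b is the visible bias.
    return [sigmoid(b[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b))]

def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step for a single training vector v0; updates W, b, c in place."""
    ph0 = hidden_probs(v0, W, c)                # positive phase
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, b), rng)   # reconstruction
    ph1 = hidden_probs(v1, W, c)                # negative phase
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])            # visible bias uses visible units
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])          # hidden bias uses hidden units

rng = random.Random(0)
W = [[0.01 * (i - j) for j in range(2)] for i in range(3)]  # small made-up weights
b = [0.0, 0.0, 0.0]
c = [0.0, 0.0]
cd1_update([1.0, 0.0, 1.0], W, b, c, lr=0.1, rng=rng)
print(W, b, c)
```

One thing such a sketch makes easy to check is that each bias is used and updated on the same side: the bias added when computing the hidden units is the one updated with the hidden statistics, and the bias added when reconstructing is the one updated with the visible statistics.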

Finding the probability with which an instance is classified in Weka

I am using Weka for classification with the LibSVM classifier, and I would like some help with the outputs I get from the evaluation model.
In the example below, my test.arff file contains 1000 instances, and I want to know the probability with which each instance is classified as yes/no (it's a simple two-class problem).
For example, if instance 1 is classified as 'yes', I want to know with what probability it was classified so.
Below is the code snippet that I have currently:
// Read and load the Training ARFF file
ArffLoader trainArffLoader = new ArffLoader();
trainArffLoader.setFile(new File("train_clusters.arff"));
Instances train = trainArffLoader.getDataSet();
train.setClassIndex(train.numAttributes() - 1);
System.out.println("Loaded Train File");
// Read and load the Test ARFF file
ArffLoader testArffLoader = new ArffLoader();
testArffLoader.setFile(new File("test_clusters.arff"));
Instances test = testArffLoader.getDataSet();
test.setClassIndex(test.numAttributes() - 1);
System.out.println("Loaded Test File");
LibSVM libsvm = new LibSVM();
libsvm.buildClassifier(train);
// Evaluation
Evaluation evaluation = new Evaluation(train);
evaluation.evaluateModel(libsvm, test);
System.out.println(evaluation.toSummaryString("\nPrinting the Results\n=====================\n", true));
System.out.println(evaluation.toClassDetailsString());
You should use the libsvm.distributionForInstance method. It returns a probability estimate for each class index (2 of them in your case).
For example, to print all estimates for each instance in the test set, use something like this:
for (Instance instance : test) {
    double[] distribution = libsvm.distributionForInstance(instance);
    // one estimate per class index
    for (int classIndex = 0; classIndex < distribution.length; classIndex++) {
        System.out.print(distribution[classIndex] + " ");
    }
    System.out.println();
}
Note that these are not true probabilities but estimates made by Platt's method (see the question).
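To make the remark about Platt's method concrete, here is a minimal Python sketch of the sigmoid that maps an SVM decision value to a probability estimate. The parameters `A` and `B` below are hypothetical; in practice they are fitted to held-out decision values, which this sketch does not do.

```python
import math

def platt_probability(decision_value, a, b):
    """Platt's sigmoid: P(y = 1 | f) = 1 / (1 + exp(a * f + b))."""
    return 1.0 / (1.0 + math.exp(a * decision_value + b))

A, B = -2.0, 0.0   # hypothetical fitted parameters (a is typically negative)

for f in (-1.5, 0.0, 1.5):
    p_yes = platt_probability(f, A, B)
    # a two-class distribution in the spirit of distributionForInstance: [P(no), P(yes)]
    print(f, [1.0 - p_yes, p_yes])
```

A decision value of 0 (a point on the separating hyperplane) maps to 0.5 when `B` is 0, and values further from the hyperplane map closer to 0 or 1.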

How to do multi class classification using Support Vector Machines (SVM)

Every book and example shows only binary classification (two classes), where a new vector can belong to either one of the classes.
My problem is that I have 4 classes (c1, c2, c3, c4), and I have training data for all 4 classes.
For a new vector the output should be like
c1 80% (the winner)
c2 10%
c3 6%
c4 4%
How do I do this? I'm planning to use libsvm (because it is the most popular), but I don't know much about it. If any of you have used it before, please tell me the specific commands I'm supposed to use.
LibSVM uses the one-against-one approach for multi-class learning problems. From the FAQ:
Q: What method does libsvm use for multi-class SVM ? Why don't you use the "1-against-the rest" method ?
It is one-against-one. We chose it after doing the following comparison: C.-W. Hsu and C.-J. Lin. A comparison of methods for multi-class support vector machines, IEEE Transactions on Neural Networks, 13(2002), 415-425.
"1-against-the rest" is a good method whose performance is comparable to "1-against-1." We do the latter simply because its training time is shorter.
Commonly used methods are One vs. Rest and One vs. One.
In the first method you get n classifiers, and the predicted class is the one whose classifier gives the highest score.
In the second method the predicted class is obtained by a majority vote over all pairwise classifiers.
AFAIR, libsvm supports both strategies of multiclass classification.
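The two decision rules can be sketched in a few lines of Python, independent of any SVM library. The scores and the `toy_pair_winner` function below are made up and stand in for trained binary classifiers.

```python
from collections import Counter
from itertools import combinations

def one_vs_rest(scores):
    """scores maps each class to the score of its 'class vs rest' classifier."""
    return max(scores, key=scores.get)

def one_vs_one(pair_winner, classes):
    """pair_winner(a, b) returns whichever of a, b the pairwise classifier picks."""
    votes = Counter(pair_winner(a, b) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]

classes = ["c1", "c2", "c3", "c4"]
strength = {"c1": 0.8, "c2": 0.1, "c3": 0.06, "c4": 0.04}  # made-up scores

def toy_pair_winner(a, b):
    # Stand-in for a trained binary classifier on classes a and b.
    return a if strength[a] >= strength[b] else b

print(one_vs_rest(strength))
print(one_vs_one(toy_pair_winner, classes))
```

With 4 classes, one-vs-rest needs 4 binary classifiers while one-vs-one needs 4*3/2 = 6, each trained on only the data of its two classes.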
You can always reduce a multi-class classification problem to a binary problem by recursively choosing random partitions of the set of classes. This is not necessarily any less effective or efficient than learning everything at once, since the sub-problems require fewer examples because each partitioning problem is smaller. (It may take at most a constant factor more time, e.g. twice as long.) It may also lead to more accurate learning.
I'm not necessarily recommending this, but it is one answer to your question, and is a general technique that can be applied to any binary learning algorithm.
Use the SVM Multiclass library. Find it at the SVM page by Thorsten Joachims
It does not have a specific switch (command) for multi-class prediction. It automatically handles multi-class prediction if your training dataset contains more than two classes.
There is nothing special compared with binary prediction. See the following example of 3-class prediction based on SVM.
install.packages("e1071")
library("e1071")
data(iris)
attach(iris)
## classification mode
# default with factor response:
model <- svm(Species ~ ., data = iris)
# alternatively the traditional interface:
x <- subset(iris, select = -Species)
y <- Species
model <- svm(x, y)
print(model)
summary(model)
# test with train data
pred <- predict(model, x)
# (same as:)
pred <- fitted(model)
# Check accuracy:
table(pred, y)
# compute decision values and probabilities:
pred <- predict(model, x, decision.values = TRUE)
attr(pred, "decision.values")[1:4,]
# visualize (classes by color, SV by crosses):
plot(cmdscale(dist(iris[,-5])),
col = as.integer(iris[,5]),
pch = c("o","+")[1:150 %in% model$index + 1])
data=load('E:\dataset\scene_categories\all_dataset.mat');
meas = data.all_dataset;
species = data.dataset_label;
[g gn] = grp2idx(species); %# nominal class to numeric
%# split training/testing sets
[trainIdx testIdx] = crossvalind('HoldOut', species, 1/10);
%# 1-vs-1 pairwise models
num_labels = length(gn);
clear gn;
num_classifiers = num_labels*(num_labels-1)/2;
pairwise = zeros(num_classifiers ,2);
row_end = 0;
for i=1:num_labels - 1
    row_start = row_end + 1;
    row_end = row_start + num_labels - i -1;
    pairwise(row_start : row_end, 1) = i;
    count = 0;
    for j = i+1 : num_labels
        pairwise( row_start + count , 2) = j;
        count = count + 1;
    end
end
clear row_start row_end count i j num_labels num_classifiers;
svmModel = cell(size(pairwise,1),1); %# store binary-classifers
predTest = zeros(sum(testIdx),numel(svmModel)); %# store binary predictions
%# classify using one-against-one approach, SVM with RBF kernel
for k=1:numel(svmModel)
    %# get only training instances belonging to this pair
    idx = trainIdx & any( bsxfun(@eq, g, pairwise(k,:)) , 2 );
    %# train
    svmModel{k} = svmtrain(meas(idx,:), g(idx), ...
        'Autoscale',true, 'Showplot',false, 'Method','QP', ...
        'BoxConstraint',2e-1, 'Kernel_Function','rbf', 'RBF_Sigma',1);
    %# test
    predTest(:,k) = svmclassify(svmModel{k}, meas(testIdx,:));
end
pred = mode(predTest,2); %# voting: classify as the class receiving most votes
%# performance
cmat = confusionmat(g(testIdx),pred);
acc = 100*sum(diag(cmat))./sum(cmat(:));
fprintf('SVM (1-against-1):\naccuracy = %.2f%%\n', acc);
fprintf('Confusion Matrix:\n'), disp(cmat)
For multi-class classification using SVM, there is an approach that is NOT one-vs-one and NOT one-vs-rest.
Instead, learn a two-class classifier over feature vectors (x, y), where x is the data and y is the correct label associated with the data.
The margin during training is the gap between the value for the correct class and the value for the nearest other class.
At inference, choose the y that has the maximum value for (x, y):
y = arg_max(y') W.(x,y') [W is the weight vector and (x,y') is the feature vector]
Please visit the link:
https://nlp.stanford.edu/IR-book/html/htmledition/multiclass-svms-1.html#:~:text=It%20is%20also%20a%20simple,the%20label%20of%20structural%20SVMs%20.
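A minimal Python sketch of this idea follows. The block-structured feature map (copy x into the slot of class y, zeros elsewhere) is one common choice and an assumption here, as are the weights and the input; nothing is trained in this example.

```python
def joint_features(x, label, num_classes):
    """Phi(x, y): copy x into the block for class `label`, zeros elsewhere."""
    phi = [0.0] * (len(x) * num_classes)
    start = label * len(x)
    phi[start:start + len(x)] = x
    return phi

def score(w, x, label, num_classes):
    """W . Phi(x, y) for one candidate label."""
    phi = joint_features(x, label, num_classes)
    return sum(wi * fi for wi, fi in zip(w, phi))

def predict(w, x, num_classes):
    """Inference: the label y' with the largest W . Phi(x, y')."""
    return max(range(num_classes), key=lambda y: score(w, x, y, num_classes))

def margin(w, x, true_label, num_classes):
    """Gap between the correct class's value and the nearest other class's value."""
    others = [score(w, x, y, num_classes)
              for y in range(num_classes) if y != true_label]
    return score(w, x, true_label, num_classes) - max(others)

# Hypothetical weight vector for 3 classes and 2 features (one block per class).
w = [1.0, 0.0,   0.0, 1.0,   -1.0, -1.0]
x = [2.0, 0.5]

print(predict(w, x, 3), margin(w, x, 0, 3))
```

Training would adjust W so that this margin is large and positive for every training pair; a negative margin means the example is currently misclassified.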
