MLP not training XOR correctly - machine-learning

I'm new to neural networks. I've been trying to implement a two layer network to learn the XOR function using the backpropagation algorithm. The hidden layer has 2 units and the output layer is having 1 unit. All units use the sigmoid activation function.
I'm initializing the weights between -1 to +1 and have a +1 bias.
The problem is that the network learns the function very small number of times when re-initialized from scratch using some other random values of weights. It's learning the other boolean functions (AND, OR) in very small number of iterations, almost every time for almost all randomly assigned weights. The problem is with the XOR function - it doesn't converge to optimal values for some values of random weights.
I'm using stochastic gradient descent, backpropagation for learning.
My code : http://ideone.com/IYPW2N
Hidden layer's compute output function code:
public double computeOutput(double[] input){
this.input = input;
output = bias+w[0]*input[0] + w[1]*input[1];
output = 1/(1+Math.pow(Math.E, -output));
return output;
}
Compute Error Function code:
public double computeError(double w, double outputUnitError){
error = (output)*(1-output)*(outputUnitError*w);
return error;
}
Fixing errors for hidden units :
public void fixError(){
for(int i=0;i<input.length;++i) w[i] += n*error*input[i];
}
Output unit's compute output function:
public void computeOutput(double[] input) {
this.input = input;
output = bias+input[0]*w[0]+input[1]*w[1];
output = 1/(1+Math.pow(Math.E, -output));
}
Output unit's computeError function:
public void computeError(double t){
this.t = t;
error = (output)*(1-output)*(t-output);
}
Output Unit's fixError function (updating the weights):
public void fixError() {
for(int i=0;i<w.length;++i) w[i] += n*error*input[i];
}
The code stops training as soon as in any iteration, all examples are classified correctly. It stops otherwise when number of iterations exceeds 90k.
The learning rate is set to 0.05. If the output unit's value is greater than 0.5, it's counted as +1, otherwise a 0.
Training Examples:
static Example[] examples = {
new Example(new double[]{0, 0}, 0),
new Example(new double[]{0, 1}, 1),
new Example(new double[]{1, 0}, 1),
new Example(new double[]{1, 1}, 0)
};
Output from code:
Iterations > 90000, stop...
Displaying outputs for all examples...
0.018861254512881773
0.7270271284494716
0.5007550527204925
0.5024353957353963
Training complete. No of iterations = 45076
Displaying outputs for all examples...
0.3944511789979849
0.5033004761575361
0.5008283246200929
0.2865272493546562
Training complete. No of iterations = 39707
Displaying outputs for all examples...
0.39455754434259843
0.5008762488126696
0.5029579167912538
0.28715696580224176
Iterations > 90000, stop...
Displaying outputs for all examples...
0.43116164638530535
0.32096730276984053
0.9758219334403757
0.32228953888593287
I've tried several values for the learning rate and increased the number of hidden layer units, but still it's not learning XOR.
Please correct me where I'm getting it wrong or if there's a bug in the implementation.
I checked out other threads but did not get any satisfactory solution to my problem.

You are supposed to learn bias too, while your code assumes that bias is constant (you should have weight connected to bias). Without bias, you will not be able to learn XOR.

Related

Tensorflow LSTM PTB Example - Understanding forward and backward pass

Right now I am going through the tensorflow example on LSTMs where they use the PTB dataset to create an LSTM network capable of predicting the next word. I've spent a lot of time trying to understand the code, and have a good understanding for most of it however there is one function which I don't fully grasp:
def run_epoch(session, model, eval_op=None, verbose=False):
"""Runs the model on the given data."""
costs = 0.0
iters = 0
state = session.run(model.initial_state)
fetches = {
"cost": model.cost,
"final_state": model.final_state,
}
if eval_op is not None:
fetches["eval_op"] = eval_op
for step in range(model.input.epoch_size):
feed_dict = {}
for i, (c, h) in enumerate(model.initial_state):
feed_dict[c] = state[i].c
feed_dict[h] = state[i].h
vals = session.run(fetches, feed_dict)
cost = vals["cost"]
state = vals["final_state"]
costs += cost
iters += model.input.num_steps
return np.exp(costs / iters)
My confusion is this: each time through the outerloop I believe we have processed batch_size * num_steps numbers of words, done the forward propagation and done the backward propagation. But, how in the next iteration, for example, do we know to start with the 36th word of each batch if num_steps = 35? I suspect it is some change in an attribute of the class model on each iteration but I cannot figure that out. Thanks for your help.

Real FFT output

I have implemented fft into at32ucb series ucontroller using kiss fft library and currently struggling with the output of the fft.
My intention is to analyse sound coming from piezo speaker.
Currently, the frequency of the sounder is 420Hz which I successfully got from the fft output (cross checked with an oscilloscope). However, the output frequency is just half of expected if I put function generator waveform into the system.
I suspect its the frequency bin calculation formula which I got wrong; currently using, fft_peak_magnitude_index*sampling frequency / fft_size.
My input is real and doing real fft. (output samples = N/2)
And also doing iir filtering and windowing before fft.
Any suggestion would be a great help!
// IIR filter calculation, n = 256 fft points
for (ctr=0; ctr<n; ctr++)
{
// filter calculation
y[ctr] = num_coef[0]*x[ctr];
y[ctr] += (num_coef[1]*x[ctr-1]) - (den_coef[1]*y[ctr-1]);
y[ctr] += (num_coef[2]*x[ctr-2]) - (den_coef[2]*y[ctr-2]);
y1[ctr] = y[ctr] - 510; //eliminate dc offset
// hamming window
hamming[ctr] = (0.54-((0.46) * cos(2*M_PI*ctr/n)));
window[ctr] = hamming[ctr]*y1[ctr];
fft_input[ctr].r = window[ctr];
fft_input[ctr].i = 0;
fft_output[ctr].r = 0;
fft_output[ctr].i = 0;
}
kiss_fftr_cfg fftConfig = kiss_fftr_alloc(n,0,NULL,NULL);
kiss_fftr(fftConfig, (kiss_fft_scalar * )fft_input, fft_output);
peak = 0;
freq_bin = 0;
for (ctr=0; ctr<n1; ctr++)
{
fft_mag[ctr] = 10*(sqrt((fft_output[ctr].r * fft_output[ctr].r) + (fft_output[ctr].i * fft_output[ctr].i)))/(0.5*n);
if(fft_mag[ctr] > peak)
{
peak = fft_mag[ctr];
freq_bin = ctr;
}
frequency = (freq_bin*(10989/n)); // 10989 is the sampling freq
//************************************
//Usart write
char filtResult[10];
//sprintf(filtResult, "%04d %04d %04d\n", (int)peak, (int)freq_bin, (int)frequency);
sprintf(filtResult, "%04d %04d %04d\n", (int)x[ctr], (int)fft_mag[ctr], (int)frequency);
char c;
char *ptr = &filtResult[0];
do
{
c = *ptr;
ptr++;
usart_bw_write_char(&AVR32_USART2, (int)c);
// sendByte(c);
} while (c != '\n');
}
The main problem is likely to be how you declared fft_input.
Based on your previous question, you are allocating fft_input as an array of kiss_fft_cpx. The function kiss_fftr on the other hand expect an array of scalar. By casting the input array into a kiss_fft_scalar with:
kiss_fftr(fftConfig, (kiss_fft_scalar * )fft_input, fft_output);
KissFFT essentially sees an array of real-valued data which contains zeros every second sample (what you filled in as imaginary parts). This is effectively an upsampled version (although without interpolation) of your original signal, i.e. a signal with effectively twice the sampling rate (which is not accounted for in your freq_bin to frequency conversion). To fix this, I suggest you pack your data into a kiss_fft_scalar array:
kiss_fft_scalar fft_input[n];
...
for (ctr=0; ctr<n; ctr++)
{
...
fft_input[ctr] = window[ctr];
...
}
kiss_fftr_cfg fftConfig = kiss_fftr_alloc(n,0,NULL,NULL);
kiss_fftr(fftConfig, fft_input, fft_output);
Note also that while looking for the peak magnitude, you probably are only interested in the final largest peak, instead of the running maximum. As such, you could limit the loop to only computing the peak (using freq_bin instead of ctr as an array index in the following sprintf statements if needed):
for (ctr=0; ctr<n1; ctr++)
{
fft_mag[ctr] = 10*(sqrt((fft_output[ctr].r * fft_output[ctr].r) + (fft_output[ctr].i * fft_output[ctr].i)))/(0.5*n);
if(fft_mag[ctr] > peak)
{
peak = fft_mag[ctr];
freq_bin = ctr;
}
} // close the loop here before computing "frequency"
Finally, when computing the frequency associated with the bin with the largest magnitude, you need the ensure the computation is done using floating point arithmetic. If as I suspect n is an integer, your formula would be performing the 10989/n factor using integer arithmetic resulting in truncation. This can be simply remedied with:
frequency = (freq_bin*(10989.0/n)); // 10989 is the sampling freq

How to get the decision function from svm_model

Say I have a feature vector [v1,v2,v3],
then I have a decision function a*v1+b*v2+c*v3 =d
how do I get the values (a,b,c,d) using the inforrmation in svm_model?
I saw that these two fields in svm_model
public double[][] sv_coef;// coefficients for SVs in decision functions (sv_coef[k-1][l])
public double[] rho;// constants in decision functions (rho[k*(k-1)/2])
I suspect it could be essential for getting the decision function.
There is also a SVs field in svm_model. Your decision function is wv+b=0, where v = [v1,v2,v3]. Then,
w = SVs' * msv_coef;
b = -.rho;
For multi-class SVM, you may also need another field called Label
if Label(1) == -1
w = -w;
b = -b;
end
Check the FAQ part for more details.

WEKA classification likelihood of the classes

I would like to know if there is a way in WEKA to output a number of 'best-guesses' for a classification.
My scenario is: I classify the data with cross-validation for instance, then on weka's output I get something like: these are the 3 best-guesses for the classification of this instance. What I want is like, even if an instance isn't correctly classified i get an output of the 3 or 5 best-guesses for that instance.
Example:
Classes: A,B,C,D,E
Instances: 1...10
And output would be:
instance 1 is 90% likely to be class A, 75% likely to be class B, 60% like to be class C..
Thanks.
Weka's API has a method called Classifier.distributionForInstance() tha can be used to get the classification prediction distribution. You can then sort the distribution by decreasing probability to get your top-N predictions.
Below is a function that prints out: (1) the test instance's ground truth label; (2) the predicted label from classifyInstance(); and (3) the prediction distribution from distributionForInstance(). I have used this with J48, but it should work with other classifiers.
The inputs parameters are the serialized model file (which you can create during the model training phase and applying the -d option) and the test file in ARFF format.
public void test(String modelFileSerialized, String testFileARFF)
throws Exception
{
// Deserialize the classifier.
Classifier classifier =
(Classifier) weka.core.SerializationHelper.read(
modelFileSerialized);
// Load the test instances.
Instances testInstances = DataSource.read(testFileARFF);
// Mark the last attribute in each instance as the true class.
testInstances.setClassIndex(testInstances.numAttributes()-1);
int numTestInstances = testInstances.numInstances();
System.out.printf("There are %d test instances\n", numTestInstances);
// Loop over each test instance.
for (int i = 0; i < numTestInstances; i++)
{
// Get the true class label from the instance's own classIndex.
String trueClassLabel =
testInstances.instance(i).toString(testInstances.classIndex());
// Make the prediction here.
double predictionIndex =
classifier.classifyInstance(testInstances.instance(i));
// Get the predicted class label from the predictionIndex.
String predictedClassLabel =
testInstances.classAttribute().value((int) predictionIndex);
// Get the prediction probability distribution.
double[] predictionDistribution =
classifier.distributionForInstance(testInstances.instance(i));
// Print out the true label, predicted label, and the distribution.
System.out.printf("%5d: true=%-10s, predicted=%-10s, distribution=",
i, trueClassLabel, predictedClassLabel);
// Loop over all the prediction labels in the distribution.
for (int predictionDistributionIndex = 0;
predictionDistributionIndex < predictionDistribution.length;
predictionDistributionIndex++)
{
// Get this distribution index's class label.
String predictionDistributionIndexAsClassLabel =
testInstances.classAttribute().value(
predictionDistributionIndex);
// Get the probability.
double predictionProbability =
predictionDistribution[predictionDistributionIndex];
System.out.printf("[%10s : %6.3f]",
predictionDistributionIndexAsClassLabel,
predictionProbability );
}
o.printf("\n");
}
}
I don't know if you can do it natively, but you can just get the probabilities for each class, sorted them and take the first three.
The function you want is distributionForInstance(Instance instance) which returns a double[] giving the probability for each class.
Not in general. The information you want is not available with all classifiers -- in most cases (for example for decision trees), the decision is clear (albeit potentially incorrect) without a confidence value. Your task requires classifiers that can handle uncertainty (such as the naive Bayes classifier).
Technically the easiest thing to do is probably to train the model and then classify an individual instance, for which Weka should give you the desired output. In general you can of course also do it for sets of instances, but I don't think that Weka provides this out of the box. You would probably have to customise the code or use it through an API (for example in R).
when you calculate a probability for the instance, how exactly do you do this?
I have posted my PART rules and data for the new instance here but as far as calculation manually I am not so sure how to do this! Thanks
EDIT: now calculated:
private float[] getProbDist(String split){
// takes in something such as (52/2) meaning 52 instances correctly classified and 2 incorrectly classified.
if(prob_dis.length > 2)
return null;
if(prob_dis.length == 1){
String temp = prob_dis[0];
prob_dis = new String[2];
prob_dis[0] = "1";
prob_dis[1] = temp;
}
float p1 = new Float(prob_dis[0]);
float p2 = new Float(prob_dis[1]);
// assumes two tags
float[] tag_prob = new float[2];
tag_prob[1] = 1 - tag_prob[1];
tag_prob[0] = (float)p2/p1;
// returns double[] as being the probabilities
return tag_prob;
}

Log likelihood to implement Naive Bayes for Text Classification

I am implementing Naive Bayes algorithm for text classification. I have ~1000 documents for training and 400 documents for testing. I think I've implemented training part correctly, but I am confused in testing part. Here is what I've done briefly:
In my training function:
vocabularySize= GetUniqueTermsInCollection();//get all unique terms in the entire collection
spamModelArray[vocabularySize];
nonspamModelArray[vocabularySize];
for each training_file{
class = GetClassLabel(); // 0 for spam or 1 = non-spam
document = GetDocumentID();
counterTotalTrainingDocs ++;
if(class == 0){
counterTotalSpamTrainingDocs++;
}
for each term in document{
freq = GetTermFrequency; // how many times this term appears in this document?
id = GetTermID; // unique id of the term
if(class = 0){ //SPAM
spamModelArray[id]+= freq;
totalNumberofSpamWords++; // total number of terms marked as spam in the training docs
}else{ // NON-SPAM
nonspamModelArray[id]+= freq;
totalNumberofNonSpamWords++; // total number of terms marked as non-spam in the training docs
}
}//for
for i in vocabularySize{
spamModelArray[i] = spamModelArray[i]/totalNumberofSpamWords;
nonspamModelArray[i] = nonspamModelArray[i]/totalNumberofNonSpamWords;
}//for
priorProb = counterTotalSpamTrainingDocs/counterTotalTrainingDocs;// calculate prior probability of the spam documents
}
I think I understood and implemented training part correctly, but I am not sure I could implemented testing part properly. In here, I am trying to go through each test document and I calculate logP(spam|d) and logP(non-spam|d) for each document. Then I compare these two quantities in order to determine the class (spam/non-spam).
In my testing function:
vocabularySize= GetUniqueTermsInCollection;//get all unique terms in the entire collection
for each testing_file:
document = getDocumentID;
logProbabilityofSpam = 0;
logProbabilityofNonSpam = 0;
for each term in document{
freq = GetTermFrequency; // how many times this term appears in this document?
id = GetTermID; // unique id of the term
// logP(w1w2.. wn) = C(wj)∗logP(wj)
logProbabilityofSpam+= freq*log(spamModelArray[id]);
logProbabilityofNonSpam+= freq*log(nonspamModelArray[id]);
}//for
// Now I am calculating the probability of being spam for this document
if (logProbabilityofNonSpam + log(1-priorProb) > logProbabilityofSpam +log(priorProb)) { // argmax[logP(i|ck) + logP(ck)]
newclass = 1; //not spam
}else{
newclass = 0; // spam
}
}//for
My problem is; I want to return the probability of each class instead of exact 1's and 0's (spam/non-spam). I want to see e.g. newclass = 0.8684212 so I can apply threshold later on. But I am confused here. How can I calculate the probability for each document? Can I use logProbabilities to calculate it?
The probability of the data described by a set of features {F1, F2, ..., Fn} belonging in class C, according to the naïve Bayes probability model, is
P(C|F) = P(C) * (P(F1|C) * P(F2|C) * ... * P(Fn|C)) / P(F1, ..., Fn)
You have all the terms (in logarithmic form), except for the 1 / P( F1, ..., Fn) term since that's not used in the naïve Bayes classifier that you're implementing. (Strictly, the MAP classifier.)
You'd have to collect frequencies of the features as well, and from them calculate
P(F1, ..., Fn) = P(F1) * ... * P(Fn)

Resources