How to fix a normalization problem or overfitting - machine-learning

I tried to normalize the columns of my data set with the code below, but the values in the daddr column did not end up in the [0, 1] range. (screenshot not available)
The loss values also look wrong. (screenshot not available)
This is the code I used. (screenshot not available)
What am I missing to fix the loss problem? How can I apply min-max normalization to every column of the data set? Is the problem overfitting, or something else?

Normalizing the data is not always necessary; it depends on the model you use. It matters most when the model uses sigmoid or tanh activations, which saturate for large inputs. Do you really need to normalize the data? Try training without normalization first.
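If you do decide to scale, here is a minimal sketch of per-column min-max normalization with NumPy. It assumes every column is already numeric; if a column such as daddr holds non-numeric values (an assumption, since the screenshots are not available), it would have to be encoded as numbers first. The function name min_max_scale and the toy data are made up for illustration.

```python
import numpy as np

def min_max_scale(X):
    """Scale every column of a 2-D numeric array to the [0, 1] range."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = col_max - col_min
    # A constant column has span 0; divide by 1 instead so it maps to 0.
    span[span == 0] = 1.0
    return (X - col_min) / span

# Hypothetical toy data: three numeric columns on very different scales.
data = np.array([[1.0, 200.0, 5.0],
                 [2.0, 400.0, 5.0],
                 [3.0, 1000.0, 5.0]])
scaled = min_max_scale(data)
print(scaled)
```

In a real pipeline you would normally compute the column minima and maxima on the training split only and reuse them for the validation and test splits.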

Related

Is reverse prediction possible in machine learning?

Is it possible to predict input data by providing output label on a trained model?
The question doesn't really make sense.
We use the terms input and output to describe the data w.r.t. the problem we are trying to solve.
For example, given a latitude on the earth's surface, it's possible to create a model that tries to predict the average temperature in August. Here the input is the latitude and the output is the average temperature. So it is of course possible to reverse this and frame the problem as training a model to predict the latitude of a place given its average temperature in August. BUT in doing so you have, by definition, swapped the input and output. Latitude is now the output and temperature the input.
Input 5+2 gives Output 7
Input 6+1 gives Output 7
Input 4+3 gives Output 7.
As the example above shows, given two numbers and an operation it is easy to get a unique result. Starting from the number 7, however, can you uniquely determine which two numbers produced it?
You may run into this sort of problem. However, if the function is one-to-one and onto (bijective), you might be able to reverse it. Of course, you will then have to swap the input and output labels.
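To make the point concrete, here is a small sketch. The helper names preimages, f and f_inv are invented for this example. It lists every pair of small non-negative integers that sums to 7, and contrasts that with a bijective function that can be inverted exactly.

```python
from itertools import product

def preimages(target, values):
    """Return every ordered pair (a, b) drawn from values with a + b == target."""
    return [(a, b) for a, b in product(values, repeat=2) if a + b == target]

# Addition is many-to-one: several inputs map to the same output 7.
pairs = preimages(7, range(8))
print(pairs)

# A bijective function, by contrast, has a well-defined inverse.
def f(x):
    return 2 * x + 1

def f_inv(y):
    return (y - 1) / 2

print(f_inv(f(3)))
```

A model trained on (output, input) pairs for the addition case can at best learn one of the many valid answers, or some average of them.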

Get label prediction from Cifar-10 model

I'm currently working on the Cifar-10 tutorial of tensorflow. I'd like to change the evaluation such that I can see for each image what the prediction of my model was, and whether it was true/false. I struggle with the first part: if I print the predictions (sess.run([top_k_op])) I get true/false values which I assume are whether the prediction was correct or not. However, if I try to print the actual prediction (I tried so far to print the logits, and print the top_k_op tensor), I get some numbers or values, but nothing that looks like the labels. What do I have to change about my code to actually see the labels that my model predicted?
You want to evaluate the logits first. These are the unnormalized class scores coming out of your network (applying softmax turns them into a probability distribution over your classes). The index of the largest value gives you the most likely class.
You can use tf.argmax to get that index and then use it to look up the label and print it:
print(labels[index])
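Here is a rough sketch of that lookup using NumPy instead of a TensorFlow session, so it can be run on its own. The logits array is invented and stands in for whatever sess.run returns for your logits tensor; the helper name predicted_labels is also hypothetical. The class names are listed in the usual CIFAR-10 index order.

```python
import numpy as np

# CIFAR-10 class names in the dataset's standard index order.
CIFAR10_LABELS = ["airplane", "automobile", "bird", "cat", "deer",
                  "dog", "frog", "horse", "ship", "truck"]

def predicted_labels(logits):
    """Map a (batch, 10) array of logits to class names via argmax."""
    indices = np.argmax(logits, axis=1)
    return [CIFAR10_LABELS[i] for i in indices]

# Hypothetical logits for two images (made-up numbers).
fake_logits = np.array([[0.1, 2.3, -1.0, 0.0, 0.5, 0.2, 0.0, 0.1, 0.3, 0.4],
                        [0.0, 0.1, 0.2, 3.1, 0.0, 0.4, 0.0, 0.2, 0.1, 0.0]])
names = predicted_labels(fake_logits)
print(names)
```

Comparing these indices with the true label indices then tells you, per image, whether the prediction was right.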
You can figure out an answer by looking here
In svhn.py, at line 116 the predicted label is printed: print (step, int(test_labels[0]))
I made the output clearer by using:
classification = sess.run(top_k_predict_op)
print(step, int(test_labels[0]))
print("network predicted:", classification[0], "for real label:", test_labels)
Make sure you are predicting on 24x24 images if you trained your model with the original version of the TensorFlow CIFAR-10 model.

What is the meaning of the value returned by the predict function in OpenCV?

I use the predict function in OpenCV to classify my gestures.
svm.load("train.xml");
float ret = svm.predict(mat);//mat is my feature vector
I defined 5 labels (1.0, 2.0, 3.0, 4.0, 5.0), but the values of ret are actually (0.521220207, -0.247173533, -0.127723947, ...).
So I am confused. According to the official OpenCV documentation, the function should return a class label (classification) in my case.
Update: I still don't know why this result appeared. But after I chose new features to train the model, the value returned by predict is one of the labels I defined during training (e.g. 1, 2, 3, and so on).
During the training of an SVM you assign a label to each class of training data.
When you classify a sample the returned result will match up with one of these labels telling you which class the sample is predicted to fall into.
There's some more documentation here which might help:
http://docs.opencv.org/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html
With Support Vector Machines (SVM) you have a training function and a prediction function. The training function fits the model to your data and saves the result to an XML file (this makes prediction easier when you have a large amount of training data or need to run prediction in another project).
Example: with 20 images per class you have 20*5 = 100 training images in your case; each image is associated with the label of its class, and all of this information is stored in train.xml.
The prediction function tells you which label to assign to your test image based on your training data (all the work you did in the training process). Your prediction results may be good or bad; I think it all depends on your training data.
If you want, try calculating the error rate of your classifier to see how well it performs.
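The error rate is simply the fraction of test samples whose predicted label differs from the true one. A minimal sketch, with invented labels and a hypothetical helper named error_rate (the predictions would come from calling predict on each held-out sample):

```python
def error_rate(true_labels, predicted_labels):
    """Fraction of samples whose predicted label differs from the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("need at least one sample")
    wrong = sum(1 for t, p in zip(true_labels, predicted_labels) if t != p)
    return wrong / len(true_labels)

# Hypothetical held-out test set with the five gesture labels.
truth = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0]
preds = [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 2.0, 5.0]
rate = error_rate(truth, preds)
print(rate)
```

Measure this on samples that were not used for training; otherwise the number will look better than it really is.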

Missing data & single imputation

I have a nearly complete ozone data set that contains a few missing values. I would like to use SPSS to impute them with single imputation.
Before I start imputing, I would like to randomly simulate missing-data patterns with 5%, 10%, 15%, 25% and 40% of the data missing, in order to evaluate the accuracy of the imputation methods.
Can someone please show me how to generate the random missing-data patterns in SPSS?
In addition, can someone please tell me how to obtain performance indicators such as the mean absolute error, the coefficient of determination and the root mean square error, so that I can identify the best method for estimating missing values?
Unfortunately, my current SPSS does not support missing data analysis, so I can only give some general advice.
First, for your missing data pattern: go to Data -> Select Cases -> Random Sample, delete the desired number of cases, and then run the imputation.
The values you mentioned should be provided by SPSS if you use its imputation module. There is a manual:
ftp://public.dhe.ibm.com/software/analytics/spss/documentation/statistics/20.0/de/client/Manuals/IBM_SPSS_Missing_Values.pdf
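If you end up computing the indicators yourself outside SPSS, the formulas are short. The sketch below is in Python and uses made-up numbers; the function name imputation_metrics is hypothetical, and it assumes the coefficient of determination is defined as 1 - SS_res / SS_tot (some imputation papers use the squared correlation instead). Compare only the values you deliberately deleted against what the imputation filled in.

```python
import math

def imputation_metrics(actual, imputed):
    """Return (MAE, RMSE, R^2) comparing imputed values to the true ones."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, imputed)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return mae, rmse, r2

# Hypothetical true ozone values and the values an imputation produced.
actual = [30.0, 42.0, 51.0, 38.0]
imputed = [32.0, 40.0, 50.0, 39.0]
mae, rmse, r2 = imputation_metrics(actual, imputed)
print(mae, rmse, r2)
```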
To answer your first question: assume your study variable is y and you want to simulate missingness in y. This is example code that computes an extra variable y_miss according to your missing data pattern.
do if uniform(1) < .05.
comp y_miss = $SYSMIS.
else.
comp y_miss = y.
end if.
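For comparison, the same idea can be sketched outside SPSS. The Python snippet below is only an illustration with invented data and a hypothetical helper named simulate_missing. It marks each value as missing independently with the given probability, once for each of the rates in the question. Because each value is drawn independently, the achieved fraction of missing values is only approximately equal to the requested rate.

```python
import random

def simulate_missing(values, rate, seed=0):
    """Replace each value with None independently with probability rate."""
    rng = random.Random(seed)
    return [None if rng.random() < rate else v for v in values]

# Hypothetical complete variable y with 1000 observations.
y = [float(i) for i in range(1000)]

for rate in (0.05, 0.10, 0.15, 0.25, 0.40):
    y_miss = simulate_missing(y, rate)
    achieved = sum(v is None for v in y_miss) / len(y_miss)
    print(rate, achieved)
```

If you need exactly 5% (or 10%, and so on) missing, pick that many indices at random instead of drawing each value independently.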

Error while using opencv_traincascade

I am trying to create my own Haar cascade classifier for hand gesture recognition. After generating the sample images (positive and negative) and the .vec file, when I try to run the opencv_traincascade executable, I get the following error:
"Train dataset for temp stage can not be filled. Branch training terminated."
Can anyone help me with this?
Thanks in advance.
Try the opencv_haartraining tool.
You might want to try aligning the positive and negative samples to the same size - see the linked reference.
Code-wise, it looks like the error is only thrown while fetching the negative and positive images, so you might also want to make sure you are passing the correct number of positive and negative training samples to the classifier executable.
http://nmarkou.blogspot.com/2012/01/using-train-cascades.html
Check the number of negative samples against the number of entries you have for them in the .txt file.
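A quick way to do that check is to count the non-empty lines in the negatives list and compare the result with the number you pass to the trainer (the -numNeg option of opencv_traincascade). The sketch below is only an illustration: the file name bg.txt, the image paths and the helper names are made up, and the demo writes a temporary file so that it can run on its own.

```python
import os
import tempfile
from pathlib import Path

def count_listed_negatives(list_path):
    """Count non-empty lines in a negatives list file (one image path per line)."""
    lines = Path(list_path).read_text().splitlines()
    return sum(1 for line in lines if line.strip())

def check_negative_count(list_path, requested):
    """Return (enough, listed): whether the file lists at least `requested` images."""
    listed = count_listed_negatives(list_path)
    return listed >= requested, listed

# Demo with a hypothetical list file containing three entries and one blank line.
with tempfile.TemporaryDirectory() as tmp:
    bg = os.path.join(tmp, "bg.txt")
    Path(bg).write_text("neg/img1.jpg\nneg/img2.jpg\n\nneg/img3.jpg\n")
    ok, listed = check_negative_count(bg, requested=5)
    print(ok, listed)
```

Here the file lists fewer images than requested, which is the kind of mismatch worth ruling out before looking elsewhere.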
