I would like to know what SPSS does when it computes the UICI and LICI (upper and lower individual confidence interval). I am asking because when we compute the same prediction interval "by hand" for a given individual, using the output tables from a simple linear regression, we get a slightly different interval (up to a 0.005 difference).
I couldn't find online how to get the code used for this command, so that I could look more closely at what SPSS does when we check the boxes for mean and individual prediction intervals.
Thanks for your help,
The SPSS Algorithms manual accessible from the Help menu will give you the formulas. Note that a confidence interval is not the same as a prediction interval.
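As a point of comparison, here is a minimal Python sketch of the standard textbook intervals for simple linear regression, which is what a "by hand" computation usually uses; the data, `x0`, and `alpha` are placeholders, and you would want to check the Algorithms manual to confirm these are the exact formulas SPSS applies:

```python
import numpy as np
from scipy import stats

# Placeholder data standing in for the regression variables (assumption).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])
x0 = 3.5        # the individual value we want an interval for
alpha = 0.05    # 95% intervals

n = len(x)
sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))            # residual standard error
t = stats.t.ppf(1 - alpha / 2, df=n - 2)

y0 = b0 + b1 * x0
se_mean = s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / sxx)       # for the mean response
se_pred = s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)   # for a new individual

print("mean (confidence) interval:", y0 - t * se_mean, y0 + t * se_mean)        # LMCI / UMCI analogue
print("individual (prediction) interval:", y0 - t * se_pred, y0 + t * se_pred)  # LICI / UICI analogue
```

A discrepancy on the order of 0.005 is often just rounding in the displayed output tables; carrying full precision for the coefficients and the residual standard error usually closes the gap.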
Can we predict the growth percentage in sales of an item, given the change in discount (a positive or negative number) from the previous year as a predictor variable? There seems to be no correlation between these. How can this problem be solved using machine learning?
You are on the wrong track with this question.
Correlation belongs to the statistics side of things. Look at Pearson's correlation coefficient or Spearman's correlation coefficient to measure the association between the discount changes and the sales growth.
In machine learning we seldom compare two percentages directly; instead we compare the actual sales/discount values. A simple approach here is linear regression (most ML is applied to many input features, whereas your case is one-x one-y data: a single input column to a single output). You can find related material online and solve it in Excel or with Python code, as sketched below.
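As a minimal illustration of those two steps, here is a hedged Python sketch: check the correlation first, then fit a one-feature linear regression. All of the numbers are made up, and the variable names are mine, not the original poster's.

```python
import numpy as np
from scipy import stats

# Hypothetical data: change in discount vs. sales growth (both made up for illustration).
discount_change = np.array([-5.0, -2.0, 0.0, 1.0, 3.0, 4.0, 6.0])
sales_growth    = np.array([-1.2,  0.5, 0.3, 1.0, 1.8, 2.1, 3.0])

# Check linear / monotonic association first.
r, p_r = stats.pearsonr(discount_change, sales_growth)
rho, p_rho = stats.spearmanr(discount_change, sales_growth)
print(f"Pearson r = {r:.2f} (p = {p_r:.3f}), Spearman rho = {rho:.2f} (p = {p_rho:.3f})")

# If there is a usable association, a one-feature linear regression is the simplest model.
slope, intercept, r_value, p_value, std_err = stats.linregress(discount_change, sales_growth)
print(f"growth ~ {intercept:.2f} + {slope:.2f} * discount_change")
```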
Suppose I have a model that classifies images into one of n categories. I know how to calculate the accuracy and sensitivity based just on the output label. However, I want to be more specific. How could I also incorporate the confidence percentage that is produced with each output?
You could use bootstrapping to obtain a confidence interval for your model's accuracy on the dataset; a full demonstration is here. If you want it for an individual sample, you can define another list, like the stat list, and store the predicted probabilities for that individual there instead.
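For concreteness, here is a minimal sketch of the percentile bootstrap over a held-out test set; the function name and the label arrays are placeholders of my own, not part of the linked demonstration:

```python
import numpy as np
from sklearn.metrics import accuracy_score

def bootstrap_accuracy_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for test-set accuracy."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                      # resample indices with replacement
        scores.append(accuracy_score(y_true[idx], y_pred[idx]))
    return tuple(np.quantile(scores, [alpha / 2, 1 - alpha / 2]))

# Placeholder labels, just to show the call.
y_true = np.array([0, 1, 2, 1, 0, 2, 1, 0, 2, 1])
y_pred = np.array([0, 1, 1, 1, 0, 2, 1, 0, 2, 0])
print(bootstrap_accuracy_ci(y_true, y_pred))             # (lower, upper) at 95%
```

For a single sample, the same loop would resample from that sample's stored predicted probabilities instead of from the label pairs.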
I have 6 variables with values somewhere between 0 and 2, and a function that takes these variables as input. It predicts the result of a football match by looking at the past matches of both teams.
The output of the function obviously changes depending on the variables. Each variable determines how heavily a certain part of the algorithm is weighted (e.g. how much less a game from 6 months ago should count compared to a game from a week ago).
My goal now is to find the ideal ratios between the different variables, and thus between the different parts of the algorithm, so that the algorithm predicts the most matches correctly. Is there any way of achieving that?
I thought of doing something like this with machine learning, something similar to linear/polynomial regression.
To determine how close a tip is, I thought of giving:
2 points for when the tendency was right (predicted that Team A would win and Team A did win)
4 points for when the goal difference is right (prediction: Team A wins 2:1, actual result: 1:0)
5 points for when the result is predicted correctly (predicted result: 2:1 and actual result: 2:1)
This would give a loss function of: maximal points for the game (which is 5) minus the points for the predicted result, as sketched below.
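A minimal Python sketch of that scoring scheme as I read it; the function names and the win/draw/loss check are my own additions, and the 2/4/5 values are the ones listed above:

```python
def points_for_prediction(pred, actual):
    """Score a predicted score line (home, away) against the actual result."""
    ph, pa = pred
    ah, aa = actual
    if (ph, pa) == (ah, aa):
        return 5                                    # exact result
    if (ph - pa) == (ah - aa):
        return 4                                    # correct goal difference
    if (ph > pa) == (ah > aa) and (ph == pa) == (ah == aa):
        return 2                                    # correct tendency (win/draw/loss)
    return 0

def loss(pred, actual):
    return 5 - points_for_prediction(pred, actual)  # 0 when the prediction is perfect

print(loss((2, 1), (1, 0)))  # goal difference right -> 5 - 4 = 1
```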
If I am able to minimize that, then hopefully, after looking at some training sets (past seasons), it will theoretically score the most points when given a new season, together with the variables computed beforehand, as input.
Now I'm trying to find out by how much, and in which direction, I have to change each of my variables so that the loss becomes smaller each time a new training set is seen.
You probably have to look at how big the loss is, but I don't know how to work out which variable to change and in which direction. Is that possible, and if so, how do I do it?
Currently I'm using JavaScript.
I am assuming that you are trying to use gradient descent to train your regression model.
Loss functions used with gradient descent have to be differentiable, so simply awarding or subtracting points for certain properties of the prediction will not work.
A loss function that may be suitable for this task is the mean squared error, which is simply computed by averaging the squared differences between predicted and expected values. Your expected values would then just be the scores of both teams in the game.
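A minimal sketch of that loss, treating each prediction and each actual result as a (home, away) pair of scores; the example values are placeholders:

```python
import numpy as np

def mse(predicted, expected):
    """Mean squared error over predicted vs. actual (home, away) scores."""
    predicted = np.asarray(predicted, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return np.mean((predicted - expected) ** 2)

# Hypothetical predictions for three matches, shape (n_matches, 2): [home, away].
pred   = [[2, 1], [0, 0], [1, 3]]
actual = [[1, 0], [0, 1], [1, 2]]
print(mse(pred, actual))
```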
You would then have to compute the gradient of the loss of your prediction with respect to the weights that the prediction function uses to compute its outputs. This can be done using backpropagation (details of which are way too broad for this answer, there are many tutorials available on the web).
The intuition behind the gradient of a function is that it points in the direction of the steepest ascent of that function. If you update your parameters in that direction, the output of the function grows. Since this output is the loss of your prediction function and you want it to be smaller, you take a small step in the opposite direction of the gradient.
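To make the whole loop concrete, here is a small self-contained sketch that fits a plain linear model with mean squared error and gradient descent. The data are synthetic and the learning rate and epoch count are arbitrary choices, but the update rule is the same idea you would apply to your own prediction function (in JavaScript the loop would look essentially the same):

```python
import numpy as np

# Minimal sketch: a linear model score = X @ w, trained with MSE and gradient descent.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                 # 6 input variables per match (synthetic)
true_w = np.array([0.5, -1.0, 2.0, 0.0, 1.5, -0.5])
y = X @ true_w + rng.normal(scale=0.1, size=200)

w = np.zeros(6)
lr = 0.05                                     # step size along the negative gradient
for epoch in range(500):
    pred = X @ w
    grad = 2.0 / len(y) * X.T @ (pred - y)    # d(MSE)/dw for a linear model
    w -= lr * grad                            # step against the gradient to reduce the loss

print(np.round(w, 2))                         # should end up close to true_w
```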
I have time series data of size 100000×5: 100000 samples and five variables. I have labeled each of the 100000 samples as either 0 or 1, i.e. binary classification.
I want to train it using an LSTM because of the time series nature of the data. I have seen examples of LSTMs for time series prediction; is it suitable to use one in my case?
I'm not sure about your needs.
LSTMs are best suited to sequence models, like the time series you mention, but your description doesn't quite look like a time series problem.
Anyway, you can use an LSTM on time series not for prediction but for classification, as in this article.
In my experience, for binary classification with only 5 features you could find better methods; an LSTM will consume more memory than other methods and could give worse results.
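If you do want to try it, here is a minimal Keras sketch of an LSTM binary classifier. It assumes the 100000×5 series has been cut into fixed-length windows with one 0/1 label per window; the window length, layer sizes, and placeholder arrays are my own choices, not anything from the question:

```python
import numpy as np
import tensorflow as tf

timesteps, n_features = 50, 5
X = np.random.rand(1000, timesteps, n_features).astype("float32")  # placeholder windows
y = np.random.randint(0, 2, size=(1000,))                          # placeholder 0/1 labels

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(timesteps, n_features)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # single output for binary classification
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=3, batch_size=64, validation_split=0.2)
```

If every time step keeps its own label instead, the usual variant is LSTM(..., return_sequences=True) followed by a per-step Dense layer.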
First of all, you can see it from a different perspective: instead of having 100,000 labeled samples of 5 variables, you can treat it as 100,000 unlabeled samples of 6 variables, where the 6th variable is the label.
You can then train your LSTM as a multivariate predictor of that 6th variable, i.e. the sample label, and compare its output with the ground truth during testing to evaluate its performance.
I'm writing a naive Bayes classifier for a class project and I just got it working... sort of. While I do get an error-free output, the winning output label had an output probability of 3.89*10^-85.
Wow.
I have a couple of ideas about what I might be doing wrong. Firstly, I am not normalizing the output percentages across the classes, so all of the percentages are effectively zero. While normalizing would give me numbers that look nicer, I don't know if it's the correct thing to do.
My second idea was to reduce the number of features. Our input data is a list of pseudo-images in the form of a very long text file. Currently, our features are just the binary value of every pixel of the image, and with a 28x28 image that's a lot of features. If I instead chopped the image into blocks of size, say, 7x7, how much would that actually improve the output percentages?
tl;dr Here are the general things I'm trying to understand about naive Bayes:
1) Do you need to normalize the output percentages from testing each class?
2) How much of an effect does having too many features have on the results?
Thanks in advance for any help you can give me.
It could be normal. The output of a naive Bayes classifier is not meant to be a real probability; what it is meant to do is produce a score that ranks the competing classes.
The reason why the probability is so low is that many Naive Bayes implementations are the product of the probabilities of all the observed features of the instance that is being classified. If you are classifying text, each feature may have a low conditional probability for each class (example: lower than 0.01). If you multiply 1000s of feature probabilities, you quickly end up with numbers such as you have reported.
Also, the probabilities returned are not the probabilities of each class given the instance, but an estimate of the probability of observing this set of features given the class. Thus, the more features you have, the less likely it is to observe that exact set of features. Bayes' theorem is used to change argmax_c P(class_c|features) to argmax_c P(class_c)*P(features|class_c), and P(features|class_c) is then further simplified with an independence assumption, which turns it into a product of the probabilities of observing each individual feature given the class. These assumptions don't change the argmax (the winning class).
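One practical consequence is that implementations usually work with log-probabilities and normalize with the log-sum-exp trick, which keeps the ranking identical while producing readable numbers. A minimal sketch; the class priors and the per-class log-likelihood sums below are made-up values:

```python
import numpy as np

# Turning per-class log-scores into normalized "probabilities" in log space,
# which avoids the underflow that produces numbers like 3.89e-85.
log_prior = np.log(np.array([0.5, 0.3, 0.2]))           # hypothetical class priors
log_likelihoods = np.array([-190.0, -195.0, -260.0])    # sum of log P(feature|class) per class

log_scores = log_prior + log_likelihoods
log_scores -= log_scores.max()                  # shift before exponentiating (log-sum-exp trick)
posterior = np.exp(log_scores) / np.exp(log_scores).sum()
print(posterior)        # sums to 1; the argmax is unchanged by the normalization
```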
If I were you, I would not worry about the probability output; focus instead on the accuracy of your classifier, and take action to improve the accuracy, not the calculated probabilities.