How do I use a linear regression model on a specific case not in the DB, and get the prediction WITH a confidence interval? (SPSS)

I want to apply the prediction model to a specific case that is not in the DB, without adding a row to the DB.
How can I compute the prediction automatically, i.e. enter the parameter values and get the prediction (the mean and an interval)?
How do I get a confidence interval for the prediction?
I can find no way to do this in the menus.
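For illustration, here is a minimal sketch of the same computation outside of SPSS, using Python's statsmodels (the column names x1, x2 and y are hypothetical placeholders); get_prediction returns the point prediction together with a confidence interval for the mean response and a prediction interval for an individual new case:

```python
# Sketch using Python's statsmodels rather than SPSS; x1, x2, y are
# hypothetical placeholder columns standing in for your own variables.
import pandas as pd
import statsmodels.api as sm

data = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.1, 1.9, 3.2, 4.8, 5.1, 6.3],
    "y":  [3.0, 4.1, 6.2, 8.9, 10.1, 12.4],
})

X = sm.add_constant(data[["x1", "x2"]])  # adds an intercept column "const"
model = sm.OLS(data["y"], X).fit()

# A specific case that is NOT in the data set: build it as a one-row frame
# with the same columns as X, without touching the original data.
new_case = pd.DataFrame([[1.0, 3.5, 3.5]], columns=X.columns)

# summary_frame reports the point prediction ("mean"), a confidence
# interval for the mean response (mean_ci_*), and a wider prediction
# interval for an individual new observation (obs_ci_*).
print(model.get_prediction(new_case).summary_frame(alpha=0.05))
```

Note the distinction the output draws: the mean_ci_* columns bound the average response at those predictor values, while the obs_ci_* columns bound a single new observation and are therefore wider.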

Related

Why can't random forests or decision trees give 100% precision? And how to handle huge noise in the middle?

If a decision tree determines splits based on the largest amount of data belonging to the same class, why can't it keep splitting this particular data until each leaf contains only one element, which would give 100% precision?
Having only one data point/case in every terminal node can cause over-fitting on the training dataset. To avoid this, test the constructed model against both the training and validation datasets using a summary statistic (e.g. RMSE). In a random forest, the 'out-of-bag' sample can be used as a validation set: this is the proportion of the data (roughly 37%) that is not used in the construction of each tree. The RMSE should be relatively similar between the training and validation sets.
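As a minimal sketch of that check (scikit-learn on synthetic data, so the numbers are illustrative only), you can compare the training RMSE against the out-of-bag RMSE:

```python
# Compare training RMSE against out-of-bag RMSE to detect over-fitting.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

forest = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)

train_rmse = np.sqrt(np.mean((forest.predict(X) - y) ** 2))

# oob_prediction_ holds, for each sample, the average prediction of the
# trees that did NOT see that sample during training (~37% per tree).
oob_rmse = np.sqrt(np.mean((forest.oob_prediction_ - y) ** 2))

print(f"train RMSE: {train_rmse:.2f}  OOB RMSE: {oob_rmse:.2f}")
```

A training RMSE far below the OOB RMSE is the over-fitting signature described above.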

How to evaluate ML image classifier with confidence

Suppose I have a model that classifies images into one of n categories. I know how to calculate the accuracy and sensitivity based just on the output label. However, I want to be more specific. How could I also incorporate the confidence percentage that is produced with each output?
You could use bootstrapping to obtain a confidence interval for your model's performance on the dataset. There is a full demonstration here. If you want it for an individual sample, you may define another list, like the stat list, and store the predicted probabilities for that individual there instead.
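A minimal sketch of that bootstrap, assuming you already have arrays of true and predicted labels (the synthetic predictor below is a stand-in, not the linked demonstration): resample the test set with replacement, collect the accuracy of each resample, and take percentiles.

```python
# Percentile bootstrap confidence interval for classification accuracy.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model outputs on a held-out test set (3 classes, ~80% accurate).
y_true = rng.integers(0, 3, size=1000)
y_pred = np.where(rng.random(1000) < 0.8, y_true,
                  rng.integers(0, 3, size=1000))

stats = []
for _ in range(2000):
    idx = rng.integers(0, len(y_true), size=len(y_true))  # resample with replacement
    stats.append(np.mean(y_true[idx] == y_pred[idx]))     # accuracy of this resample

lower, upper = np.percentile(stats, [2.5, 97.5])
print(f"accuracy: {np.mean(y_true == y_pred):.3f}, 95% CI: [{lower:.3f}, {upper:.3f}]")
```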

How is the validation accuracy in Keras determined for every epoch?

This is in relation to another post here.
Is the validation data evaluated on the model that gives the 0.9381 training accuracy, or is the validation data also split across the 500 steps per epoch, with the mean validation accuracy taken across all steps?
Your training accuracy is evaluated after every batch, while the validation accuracy is calculated at the end of the epoch.
If you want to test this, you can create a custom callback (https://keras.io/callbacks/). There is a method on_batch_end you can use for the training accuracy and on_epoch_end for the validation data. If you save the accuracy within the callback and plot it, you will see the evolution.
Below, for example, you can see the evolution of the accuracy of 4 RNN cells after every batch over one epoch. As the result was extremely noisy, I added a sliding average. The star is the validation score at the end of the epoch.
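A minimal sketch of such a callback (assuming the model is compiled with metrics=["accuracy"], so that key appears in logs):

```python
import tensorflow as tf

class AccuracyHistory(tf.keras.callbacks.Callback):
    """Records training accuracy per batch and validation accuracy per epoch."""

    def __init__(self):
        super().__init__()
        self.batch_acc = []      # training accuracy after every batch
        self.epoch_val_acc = []  # validation accuracy at the end of each epoch

    def on_batch_end(self, batch, logs=None):
        self.batch_acc.append(logs["accuracy"])

    def on_epoch_end(self, epoch, logs=None):
        self.epoch_val_acc.append(logs["val_accuracy"])

# Usage (x_train, y_train, x_val, y_val are your own data):
# history = AccuracyHistory()
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=5, callbacks=[history])
```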

Can I use a logistic regression algorithm to predict an ETA for a given task based on historical data?

Can I use a logistic regression algorithm to predict an ETA for a given task based on historical data? I have some tasks that take a variable amount of time depending on a few factors, such as task type, weather, season, time of request, etc.
Today we capture the time taken for all tasks, by task type, in a MySQL store. Now we want to add a feature where, based on these factors and the task type, we predict an ETA for the task and show it to the customer.
We are planning to use Spark with the Logistic Regression and SVM algorithms. We are very new to this domain and need your guidance on validating the approach, plus any additional pointers.
You can achieve this with just a linear regression model, because you're trying to predict a continuous outcome (ETA).
You would train a regression model that predicts ETA from your input features (task type, weather, season, etc.). What this model learns is how long the task would take to complete given a certain set of inputs, and the predicted outcome is what you would then show to customers.
Take a look at this: http://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-least-squares-lasso-and-ridge-regression
Logistic regression/SVM is used for classifying discrete outcomes (i.e. categories/groups).
So another approach might be to stratify the ETA values in your MySQL database into something like short/medium/long time to complete, and then use those three categories as your labels instead of the actual numerical values. Then you can use logistic regression to train a model that classifies into those three categories based on your listed input features. This would work, but you lose some resolution by condensing your ETA data into only three groups; that's a design decision you'd have to make.
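A minimal PySpark sketch of the regression approach (the column names task_type, weather, season and eta_minutes are hypothetical placeholders; in practice the DataFrame would come from your MySQL store, e.g. via the JDBC data source):

```python
# Encode categorical features, assemble a feature vector, and fit a
# linear regression for the continuous ETA target.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("eta-regression").getOrCreate()

# Stand-in for historical task records loaded from MySQL.
df = spark.createDataFrame(
    [("delivery", "rain", "winter", 55.0),
     ("delivery", "clear", "summer", 30.0),
     ("pickup", "clear", "winter", 20.0),
     ("pickup", "rain", "summer", 35.0)],
    ["task_type", "weather", "season", "eta_minutes"],
)

categorical = ["task_type", "weather", "season"]
stages = []
for col in categorical:
    stages.append(StringIndexer(inputCol=col, outputCol=col + "_idx"))
    stages.append(OneHotEncoder(inputCol=col + "_idx", outputCol=col + "_vec"))
stages.append(VectorAssembler(inputCols=[c + "_vec" for c in categorical],
                              outputCol="features"))
stages.append(LinearRegression(featuresCol="features", labelCol="eta_minutes"))

model = Pipeline(stages=stages).fit(df)
model.transform(df).select("eta_minutes", "prediction").show()
```

The same pipeline with the label bucketed into short/medium/long and LogisticRegression swapped in for LinearRegression would give the classification variant described above.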

Using weka to measure the quality of my classifier

I programmed my own classifier in Python and tested it on a text corpus using the F1 measure, but now I want to test it on other data-mining tasks. I have my classifier's output file for a given corpus, and I want to measure its quality using Weka's different measures. How can I pass the output file to Weka and get the quality measures?
I think the correct procedure should be some sort of n-fold validation: divide your data set into training and test sets, develop the model on the training set, and calculate its sum of squared errors, SSE(train).
Then take the model, run the test data through it, and calculate SSE(test) from the predicted and actual response values. That will help you assess the accuracy and bias of your model.
Have a look at Elements of Statistical Learning Using R.
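A minimal sketch of that procedure (scikit-learn on synthetic data):

```python
# Fit on a training split, then compare SSE(train) against SSE(test).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=5, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LinearRegression().fit(X_train, y_train)

sse_train = np.sum((model.predict(X_train) - y_train) ** 2)
sse_test = np.sum((model.predict(X_test) - y_test) ** 2)
print(f"SSE(train)={sse_train:.1f}  SSE(test)={sse_test:.1f}")
```

Since the two splits have different sizes, comparing the mean squared error per split is often fairer than comparing raw SSE totals.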
