Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
I have data with the dimension of (2055, 95). I split it into train data: (1640, 95) and validation data: (415, 95).
I built a KNN classifier but don't know which value of k to choose, so I tried a range of k values to find out which one fits my problem. I got this data:
I know that choosing k = 1 means the model is overfitting. So in my case, is the best k 3?
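To determine the optimal k parameter, I would suggest plotting the silhouette coefficient for different k values and applying the elbow method to determine which one is the most suitable. Note that the snippet below uses KMeans, where k is the number of clusters; for a KNN classifier, k is the number of neighbours and is usually chosen by comparing validation accuracy instead.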
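import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# scaled_features (the scaled data) and kmeans_kwargs (a dict of KMeans
# options) are assumed to be defined earlier.
silhouette_coefficients = []
# The silhouette score needs at least 2 clusters, so start at k = 2.
for k in range(2, 11):
    kmeans = KMeans(n_clusters=k, **kmeans_kwargs)
    kmeans.fit(scaled_features)
    score = silhouette_score(scaled_features, kmeans.labels_)
    silhouette_coefficients.append(score)

# Plot the score against the number of clusters.
plt.style.use("fivethirtyeight")
plt.plot(range(2, 11), silhouette_coefficients)
plt.xticks(range(2, 11))
plt.xlabel("Number of Clusters")
plt.ylabel("Silhouette Coefficient")
plt.show()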
In a plot like the one below, the optimal k would be 3, since the rate of change decreases after x = 3.
You can have a look at https://code-ai.mk/kmeans-elbow-method-tutorial/ for further information on elbow method.
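For the KNN classifier in the question specifically, a common alternative is to score each candidate k on the held-out validation set and keep the best one. The sketch below is a minimal pure-Python illustration, not the asker's pipeline: the toy points (including one deliberately mislabeled training point), the helpers `knn_predict` and `validation_accuracy`, and the candidate values of k are all assumptions made for the example.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def validation_accuracy(train_X, train_y, val_X, val_y, k):
    """Fraction of validation points that are classified correctly for a given k."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(val_X, val_y))
    return hits / len(val_X)

# Hypothetical toy data: two clusters, plus one noisy label near the first cluster.
train_X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.15, 0.15),
           (1.0, 1.0), (0.9, 1.2), (1.1, 0.8)]
train_y = [0, 0, 0, 1, 1, 1, 1]
val_X = [(0.1, 0.1), (1.0, 0.9), (0.3, 0.2), (0.8, 1.0)]
val_y = [0, 1, 0, 1]

# Score each candidate k on the validation set and keep the best one.
scores = {k: validation_accuracy(train_X, train_y, val_X, val_y, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

With k = 1 the noisy point decides the vote on its own, which is the overfitting the question mentions; a larger k outvotes it. On real data the same loop would run over the (1640, 95) training split and the (415, 95) validation split.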
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Determine the regression line for the below data points:
(x1, y1) = (1, 4), (x2, y2) = (2, 3), (x3, y3) = (3, 9)
i.e. the function h(x) = w + hx that minimizes the squared error loss on this data.
This question just boils down to math.
First, we write our error function
The derivative of the error function tells us how the error changes as we change variables. Because there are two variables (the slope m and the intercept b), we take a partial derivative with respect to each. Where both partial derivatives equal zero we have reached a minimum (and because the error is quadratic in m and b, we know there is a single global minimum).
Writing out each term in the sum gives us
Two variables, two equations means we can solve for both!
In your case we have h=m and w=b
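For reference, a sketch of the derivation these steps describe, assuming the line is written as y = mx + b and using the three data points from the question (so that the sums are Σx = 6, Σy = 16, Σx² = 14, Σxy = 37):

```latex
\begin{aligned}
E(m, b) &= \sum_{i=1}^{3} \bigl(y_i - (m x_i + b)\bigr)^2 \\
\frac{\partial E}{\partial m} &= -2 \sum_{i=1}^{3} x_i \bigl(y_i - m x_i - b\bigr) = 0
  \;\Longrightarrow\; 14m + 6b = 37 \\
\frac{\partial E}{\partial b} &= -2 \sum_{i=1}^{3} \bigl(y_i - m x_i - b\bigr) = 0
  \;\Longrightarrow\; 6m + 3b = 16 \\
m &= \tfrac{5}{2}, \qquad b = \tfrac{1}{3}
\end{aligned}
```

In the question's notation this gives h(x) = 1/3 + (5/2)x, i.e. w = 1/3 and h = 5/2.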
As a double check, Desmos is a great tool https://www.desmos.com/calculator
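The same check can be done in a few lines of code. This is a minimal sketch using the closed-form least-squares formulas with exact fractions; the function name `fit_line` is made up for the example.

```python
from fractions import Fraction

def fit_line(points):
    """Closed-form least-squares fit of y = m*x + b; returns (m, b) as exact fractions."""
    n = len(points)
    sx = sum(Fraction(x) for x, _ in points)
    sy = sum(Fraction(y) for _, y in points)
    sxx = sum(Fraction(x) * x for x, _ in points)
    sxy = sum(Fraction(x) * y for x, y in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    b = (sy - m * sx) / n                          # intercept
    return m, b

m, b = fit_line([(1, 4), (2, 3), (3, 9)])
print(m, b)  # 5/2 1/3
```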
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
Here is an open question:
suppose I need to predict a student's exam score given some inputs, e.g. hours spent on prep, previous scores, etc. How should I bound the output between 0 - 100? What are the best practices out there?
Thanks!
Edit:
Since the answers are mostly concerned about bounding model output after we have the predictions, is it possible to train the model beforehand such that this bound is implicitly learned by the model?
You would train an Isotonic Regression model: http://scikit-learn.org/stable/modules/generated/sklearn.isotonic.IsotonicRegression.html
Or you could simply clip the predicted values that are out of bounds.
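Clipping is a one-liner. A minimal sketch, assuming the raw predictions come from some already-trained regressor (the values below are made up):

```python
def clip_score(prediction, low=0.0, high=100.0):
    """Force a raw prediction into the closed interval [low, high]."""
    return max(low, min(high, prediction))

raw_predictions = [-3.2, 57.5, 104.1]  # hypothetical model outputs
clipped = [clip_score(p) for p in raw_predictions]
print(clipped)  # [0.0, 57.5, 100.0]
```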
It is general practice, when training on features with different units, to scale them to the range 0 - 1. For example, say your data was:
[input: [10 hrs studying, 100% on last test], output: [95% on this test] ]
Then you should first scale both input and output by dividing by the greatest numerical value among their elements, or by the greatest possible value:
input = input / max(input)
output = output / 100
[input: [0.1 , 1], output: [0.95] ]
When you are done training and want to predict a test score, simply multiply the output by 100 and you are done.
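One way to have the bound learned implicitly, as the edit asks, is to combine this scaling with an output that cannot leave 0 - 1, for example by passing the model's raw output through a sigmoid before multiplying by 100. The sketch below only shows the prediction side; the linear model, its weights, and the name `predict_score` are assumptions for illustration, not a trained model.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function; the result lies in [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_score(features, weights, bias):
    """Hypothetical linear model whose output is squashed, then rescaled to 0 - 100."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 100.0 * sigmoid(z)

# Inputs scaled to 0 - 1; the weights are made up, not trained.
print(predict_score([0.1, 1.0], weights=[2.0, 3.0], bias=-1.0))
```

Whatever the weights are, the prediction stays within 0 - 100, so no clipping is needed after the fact.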
BTW what you want to do is well documented on stephenwelch's Neural Network Youtube series.
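You can do either Normalisation or Standardisation. Normalisation transforms your values into [0, 1]; Standardisation rescales them to zero mean and unit standard deviation, so they are not confined to a fixed range.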
I am not sure why you need the range to be 0-100, but if it is really so, you can multiply by 100 to get that range post the above transformation.
Normalise: Here each value of your feature column is converted like so:
X_new = (X - X_min) / (X_max - X_min)
where X_min and X_max are min and max values in the feature.
Standardise: Here each value of your feature column is converted like so:
X_new = (X - Mean) / StandardDeviation
where Mean and StandardDeviation are the mean and SD values of your feature.
Check which one gives you better results. If your data has extreme outliers, Standardisation might give better results.
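In sklearn, you can use sklearn.preprocessing.MinMaxScaler or sklearn.preprocessing.StandardScaler to do the conversions. (Note that sklearn.preprocessing.normalize does something different: it rescales each sample to unit norm.)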
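The two formulas can be sketched in plain Python as follows. The helper names and the sample values are made up for the example, the standard deviation used is the population one, and both helpers assume the feature is not constant (otherwise they divide by zero).

```python
import statistics

def min_max_scale(values):
    """Normalise: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardise(values):
    """Standardise: subtract the mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

hours = [2.0, 4.0, 6.0, 10.0]  # hypothetical feature column
print(min_max_scale(hours))  # [0.0, 0.25, 0.5, 1.0]
print(standardise(hours))
```

The standardised values have mean 0 but are not limited to [0, 1]; here the largest one is above 1.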
HTH
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
Assume we have these inputs and output data:
1,1 -> 1
1,1 -> 1
1,1 -> 1
1,1 -> 0
1,0 -> 0
0,1 -> 1
0,0 -> 0
Is there any type of classifier that we can train with above data and when we give (1,1) as input, 75% of the time it gives out 1 and 25% of the time gives 0? (and 100% for the rest of the cases since they do not have alternatives).
I am only aware of Boltzmann machine (a stochastic neural network). How about classifiers other than Nnet?
In fact, any classifier that can output class probabilities (including Naive Bayes, NN, SVM) can work this way. In most cases you simply select the class which maximizes the conditional probability
P(c|x)
In your case, simply select class according to probability distribution
c ~ P(c|x)
So, for example, you train an SVM with probabilistic outputs and get, for a given input x_1,
P(1|x_1) = 0.75; P(0|x_1) = 0.25
and simply return 1 with 75% chance and 0 with 25% chance.
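A minimal sketch of that sampling step, using the rows from the question. As an assumption for the example, P(c|x) is estimated here by simple counting rather than by a trained classifier, so it only works for inputs that appear in the training rows; the function names are made up.

```python
import random
from collections import Counter, defaultdict

# Training rows from the question: (input, output)
data = [((1, 1), 1), ((1, 1), 1), ((1, 1), 1), ((1, 1), 0),
        ((1, 0), 0), ((0, 1), 1), ((0, 0), 0)]

# Empirical estimate of P(c | x) by counting; a real classifier would supply these.
counts = defaultdict(Counter)
for x, c in data:
    counts[x][c] += 1

def class_probabilities(x):
    """Return {class: P(class | x)} for an input seen in the training rows."""
    total = sum(counts[x].values())
    return {c: n / total for c, n in counts[x].items()}

def sample_class(x, rng=random):
    """Draw a class at random according to P(c | x) instead of taking the argmax."""
    probs = class_probabilities(x)
    classes = list(probs)
    return rng.choices(classes, weights=[probs[c] for c in classes])[0]

print(class_probabilities((1, 1)))  # {1: 0.75, 0: 0.25}
print(sample_class((1, 1)))
```

Over many calls, `sample_class((1, 1))` returns 1 about 75% of the time, while the other three inputs always return their single observed class.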
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
I am working on a past exam paper. I am given a data set as follows:
Hair {brown, red} = {B,R},
Height {tall, short} = {T,S} and
Country {UK, Italy} = {U,I}
(B,T,U) (B,T,U) (B,T,I)
(R,T,U) (R,T,U) (B,T,I)
(R,T,U) (R,T,U) (B,T,I)
(R,S,U) (R,S,U) (R,S,I)
Question: Estimate the probabilities P(B,T|U), P(B|U), P(T|U), P(U) and P(I)
As the question says "estimate", I am guessing that I don't need to calculate any exact values. Is it just a case of counting how many times (B,T,U) occurs over the whole data set, e.g. (2/12) = 16%?
Then would the probability of P(U) be 0?
I don't think so. Out of your 12 records, 8 are from the country UK. So P(U) should be 8/12 = 2/3 ≈ 0.67.
Bayes' theorem is P(A|B) = P(B|A)P(A)/P(B), which you may need to estimate some of those probabilities. The conditional ones can also be estimated directly by counting within the 8 UK records, e.g. P(T|U) = 6/8.
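The counting can be sketched as follows. The records are transcribed from the question, assuming the entry written as "{R,T,U)" is (R,T,U); the helper `prob` is made up for the example.

```python
from fractions import Fraction

# (hair, height, country) records from the question
records = [
    ("B", "T", "U"), ("B", "T", "U"), ("B", "T", "I"),
    ("R", "T", "U"), ("R", "T", "U"), ("B", "T", "I"),
    ("R", "T", "U"), ("R", "T", "U"), ("B", "T", "I"),
    ("R", "S", "U"), ("R", "S", "U"), ("R", "S", "I"),
]

def prob(event, given=lambda r: True):
    """Relative frequency of `event` among the records that satisfy `given`."""
    pool = [r for r in records if given(r)]
    return Fraction(sum(1 for r in pool if event(r)), len(pool))

def is_uk(r):
    return r[2] == "U"

print(prob(is_uk))                                               # P(U)     = 2/3
print(prob(lambda r: r[2] == "I"))                               # P(I)     = 1/3
print(prob(lambda r: r[0] == "B" and r[1] == "T", given=is_uk))  # P(B,T|U) = 1/4
print(prob(lambda r: r[0] == "B", given=is_uk))                  # P(B|U)   = 1/4
print(prob(lambda r: r[1] == "T", given=is_uk))                  # P(T|U)   = 3/4
```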
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
Can anybody please help me interpret the following result generated in Weka for classification using naive Bayes?
Please explain clearly what is
Normal Distribution
Mean
StandardDev
WeightSum
Precision.
Please help me. I am new to Weka.
Naive Bayes Classifier
Class Normal: Prior probability = 0.5
1374195_at: Normal Distribution. Mean = 218.06 StandardDev = 6.0572 WeightSum = 3 Precision = 36.34333334
1373315_at: Normal Distribution. Mean = 1142.58 StandardDev = 21.1589 WeightSum = 3 Precision = 126.95333339999999
Normal distribution is the classic gaussian distribution. Mean and Standard deviation are properties of a normal/gaussian distribution. Look to basic statistics texts about this.
Weight Sum: this value is calculated for numeric attributes. It is the sum of the instance weights in the class, which with the default weight of 1 per instance equals the number of instances in that class. For the iris dataset there are 3 classes (50, 50, 50) and this value is 50 for all of them. For the weather dataset it is 9 and 5. In your output, WeightSum = 3 means the class Normal has three training instances.
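Precision: as a classification metric, precision is TP / (TP + FP), the percentage of positive predictions that are correct. The Precision printed next to each attribute in this output is something else: it is the numeric precision Weka assumes for that attribute's values, estimated from the spacing between the distinct values seen in the training data.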
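To see how the Mean and StandardDev are used, here is a minimal sketch of the normal density that a Gaussian naive Bayes model evaluates for an attribute value within one class. It uses the figures reported for 1374195_at in class Normal; the test values of x are made up, and Weka's own estimator may differ in detail (for example in how it accounts for the precision).

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Parameters printed for attribute 1374195_at in class Normal
mean, sd = 218.06, 6.0572

# Hypothetical attribute values: the further from the mean, the lower the density
for x in (218.06, 224.0, 240.0):
    print(x, normal_pdf(x, mean, sd))
```

The classifier multiplies such per-attribute likelihoods with the class prior (0.5 here) and picks the class with the largest product.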
More resources :
Classifier Evaluation