How to interpret the Naive Bayes result in Weka? [closed] - machine-learning

Can anybody help me interpret the following result, generated in Weka for classification using Naive Bayes?
Please explain clearly what each of the following means:
Normal Distribution
Mean
StandardDev
WeightSum
Precision.
I am new to Weka.
Naive Bayes Classifier
Class Normal: Prior probability = 0.5
1374195_at: Normal Distribution. Mean = 218.06 StandardDev = 6.0572 WeightSum = 3 Precision = 36.34333334
1373315_at: Normal Distribution. Mean = 1142.58 StandardDev = 21.1589 WeightSum = 3 Precision = 126.95333339999999

The normal distribution is the classic Gaussian distribution; the mean and standard deviation are its two parameters. Any basic statistics text covers them.
WeightSum: the sum of the weights of the training instances belonging to that class (with unweighted data this is simply the number of instances in the class). For the iris dataset, which has 3 classes of 50 instances each, the value is 50 for every class; for the weather dataset it is 9 and 5. It tells you how much training data each per-class estimate is based on.
Precision: in this output it is not the classification metric TP / (TP + FP). It is the precision of the numeric attribute values, roughly the average gap between distinct observed values, which the estimator uses as the granularity for rounding values when it computes the normal densities.
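To make those numbers concrete, here is a minimal sketch (plain Python, not Weka code; the query value 220.0 is hypothetical) of how a Gaussian Naive Bayes model turns the mean and standard deviation above into a class-conditional likelihood:
import math

def gaussian_pdf(x, mean, std):
    # Density of the normal distribution fitted per attribute and per class
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

# Attribute 1374195_at in class "Normal", using the values from the output above
likelihood = gaussian_pdf(220.0, mean=218.06, std=6.0572)
score = 0.5 * likelihood   # prior probability * likelihood, before normalisation over classes
print(likelihood, score)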
More resources:
Classifier Evaluation

Related

How can I handle high-cardinality/sparse features using a neural network? [closed]

I am looking for examples of encoding high-cardinality or sparse features with neural networks, but I cannot find any. I have also searched for examples of embedding numerical (not categorical) variables, without success. Can you point me to a GitHub link or similar resource on these topics?
Since you are working with neural networks, I am assuming TensorFlow with the Keras API is being used. If so, here is a reference snippet; the main utility used is tf.feature_column:
import tensorflow as tf
from tensorflow.keras import layers

feature_columns = []

# Numeric columns are passed through as plain numeric features
for col in list(df_train_numerical.columns):
    feature_columns.append(tf.feature_column.numeric_column(col))

# High-cardinality categorical columns: hash into 8000 buckets, then learn an 8-dimensional embedding
for col in list(df_train_categorical.columns):
    hashed = tf.feature_column.categorical_column_with_hash_bucket(col, hash_bucket_size=8000)
    feature_columns.append(tf.feature_column.embedding_column(hashed, dimension=8))

feature_layer = layers.DenseFeatures(feature_columns)
Following that, feature_layer becomes the first layer of the neural network:
model = tf.keras.models.Sequential()
model.add(feature_layer)
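For completeness, a minimal sketch (not part of the original answer) of how the rest of such a model might look, assuming a binary target:
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit expects a dict mapping column names to arrays, e.g. dict(df_train), plus the labels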
reference git code

Basic question about heavy-tailed distribution [closed]

I have a basic question about heavy-tailed distributions.
Suppose there are 50,000 cities in Spain and the population of each is denoted by p(1), p(2), …, p(n). Based on the mean of the distribution 𝜇 and the standard deviation 𝜎, how can we tell if the distribution is heavy-tailed or not? What procedure should we consider?
If you have all 50,000 observations, you can calculate the central moments about the mean.
In particular, the fourth central moment divided by the variance squared is the kurtosis. This number tells you whether the distribution is leptokurtic (heavy-tailed) or platykurtic (light-tailed): if it is greater than three, your distribution has heavier tails than a normal distribution.
So if you are working in Python and all 50K observations are stored in x:
from scipy import stats

# Calculate kurtosis: fourth central moment divided by the squared variance
k = stats.moment(x, 4) / x.var()**2

# Evaluate
if k > 3:
    print('Distribution has heavy tails')
else:
    print('Distribution does not have heavy tails')
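Equivalently, assuming x is a NumPy array of the 50,000 populations, SciPy can compute the same Pearson kurtosis directly:
from scipy import stats

k = stats.kurtosis(x, fisher=False)   # Pearson definition: equals 3.0 for a normal distribution
print('Heavy tails' if k > 3 else 'No heavy tails')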

Should I choose k = 3 in this case for the KNN classifier? [closed]

I have data with dimensions (2055, 95). I split it into training data (1640, 95) and validation data (415, 95).
I built a KNN classifier but don't know which k parameter to choose, so I set k over a range and checked which k fits my problem best. These are the results I got:
I know that choosing k = 1 means the model is overfitting. So in my case, is the best k 3?
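A minimal sketch of such a k sweep (assuming scikit-learn and the train/validation split described above; the variable names are hypothetical):
from sklearn.neighbors import KNeighborsClassifier

scores = {}
for k in range(1, 21):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    scores[k] = clf.score(X_val, y_val)   # validation accuracy for each k
best_k = max(scores, key=scores.get)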
To determine the optimal k parameter, I would suggest plotting the silhouette coefficient for different k values and applying the elbow method to determine which one is most suitable.
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt

# kmeans_kwargs holds any extra KMeans settings, e.g. {"n_init": 10, "random_state": 42}
silhouette_coefficients = []
for k in range(2, 11):
    kmeans = KMeans(n_clusters=k, **kmeans_kwargs)
    kmeans.fit(scaled_features)
    score = silhouette_score(scaled_features, kmeans.labels_)
    silhouette_coefficients.append(score)

plt.style.use("fivethirtyeight")
plt.plot(range(2, 11), silhouette_coefficients)
plt.xticks(range(2, 11))
plt.xlabel("Number of Clusters")
plt.ylabel("Silhouette Coefficient")
plt.show()
For a case such as the one shown below, the optimal value would be 3, since the rate of change decreases after x = 3.
You can have a look at https://code-ai.mk/kmeans-elbow-method-tutorial/ for further information on the elbow method.

Bound output values [closed]

Here is an open question:
suppose I need to predict a student's exam score given some inputs, e.g. hours spent on prep, previous scores, etc. How should I bound the output between 0 and 100? What are the best practices out there?
Thanks!
Edit:
Since the answers are mostly concerned with bounding the model output after we have the predictions, is it possible to train the model beforehand such that this bound is implicitly learned by the model?
You would train an Isotonic Regression model: http://scikit-learn.org/stable/modules/generated/sklearn.isotonic.IsotonicRegression.html
Or you could simply clip the predicted values that are out of bounds.
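Here is a minimal sketch of both suggestions (NumPy and scikit-learn; preds, x_train, y_train and x_val are hypothetical names):
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Option 1: clip raw predictions from any model into the valid score range
bounded = np.clip(preds, 0, 100)

# Option 2: isotonic regression can enforce the bounds at training time
iso = IsotonicRegression(y_min=0, y_max=100)
iso.fit(x_train, y_train)            # x_train must be a single 1-D feature
bounded_preds = iso.predict(x_val)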
It is general practice, when training on data with mixed scales, to scale it to the range 0 to 1. For example, say your data was:
[input: [10 hrs studying, 100% on last test], output: [95% on this test] ]
then you should first scale both input and output by dividing each by the greatest value among its elements, or by the greatest possible value:
input = input/input.max
output = output/100
[input: [0.1 , 1], output: [0.95] ]
When you are done training and want to predict a test score, simply multiply the output by 100 and you are done.
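A minimal sketch of that scale-then-rescale idea, using the single example above (model is a placeholder for whatever regressor you train):
import numpy as np

# The example from above: [10 hrs studying, 100% on last test] -> 95% on this test
x = np.array([10.0, 100.0])
y = 95.0

x_scaled = x / x.max()   # [0.1, 1.0]
y_scaled = y / 100.0     # 0.95

# ... train on the scaled values, then map predictions back to the 0-100 range:
# predicted_score = model.predict(x_new) * 100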
BTW, what you want to do is well documented in stephenwelch's Neural Network YouTube series.
You can do either Normalisation or Standardisation. Normalisation transforms your values into [0, 1]; Standardisation centres them around zero with unit variance.
I am not sure why you need the range to be 0-100, but if you do, you can multiply the normalised values by 100 after the transformation.
Normalise: Here each value of your feature column is converted like so:
X_new = (X - X_min) / (X_max - X_min)
where X_min and X_max are min and max values in the feature.
Standardise: Here each value of your feature column is converted like so:
X_new = (X - Mean) / StandardDeviation
where Mean and StandardDeviation are the mean and SD values of your feature.
Check which one gives you better results. If your data has extreme outliers, Standardisation might give better results.
In sklearn, you can use sklearn.preprocessing.MinMaxScaler or sklearn.preprocessing.StandardScaler to do these conversions.
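A minimal sketch of both transformations in sklearn (X is a hypothetical feature matrix):
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[10.0, 100.0], [4.0, 70.0], [7.0, 85.0]])   # hypothetical feature matrix

X_norm = MinMaxScaler().fit_transform(X)    # (X - X_min) / (X_max - X_min), per column
X_std = StandardScaler().fit_transform(X)   # (X - Mean) / StandardDeviation, per column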
HTH

Estimating probabilities using Bayes rule? [closed]

I am working on a past exam paper. I am given a data set as follows:
Hair {brown, red} = {B,R},
Height {tall, short} = {T,S} and
Country {UK, Italy} = {U,I}
(B,T,U) (B,T,U) (B,T,I)
(R,T,U) (R,T,U) (B,T,I)
(R,T,U) (R,T,U) (B,T,I)
(R,S,U) (R,S,U) (R,S,I)
Question: Estimate the probabilities P(B,T|U), P(B|U), P(T|U), P(U) and P(I)
As the question says estimate, I am guessing that I don't need to calculate exact values. Is it just a case of adding up how many times (B,T,U) occurs over the whole data set, e.g. 2/12 ≈ 16%?
Then would the probability of P(U) be 0?
I don't think so. Out of your 12 records, 8 are from the country UK, so P(U) should be 8/12 = 2/3 ≈ 0.66.
Bayes' theorem is P(A|B) = P(B|A)P(A)/P(B), which you will need in order to estimate some of those probabilities.
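As a check, a minimal sketch (plain Python, data copied from the question) of the counting-based estimates; note that the conditional probabilities are counted over the U records only, not over all 12:
data = [('B','T','U'), ('B','T','U'), ('B','T','I'),
        ('R','T','U'), ('R','T','U'), ('B','T','I'),
        ('R','T','U'), ('R','T','U'), ('B','T','I'),
        ('R','S','U'), ('R','S','U'), ('R','S','I')]

u = [r for r in data if r[2] == 'U']
p_u = len(u) / len(data)                                    # 8/12
p_i = 1 - p_u                                               # 4/12
p_b_given_u = sum(r[0] == 'B' for r in u) / len(u)          # 2/8
p_t_given_u = sum(r[1] == 'T' for r in u) / len(u)          # 6/8
p_bt_given_u = sum(r[:2] == ('B','T') for r in u) / len(u)  # 2/8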
