Estimating probabilities using Bayes rule? [closed] - machine-learning

I am working on a past exam paper. I am given a data set as follows:
Hair {brown, red} = {B,R},
Height {tall, short} = {T,S} and
Country {UK, Italy} = {U,I}
(B,T,U) (B,T,U) (B,T,I)
(R,T,U) (R,T,U) (B,T,I)
(R,T,U) (R,T,U) (B,T,I)
(R,S,U) (R,S,U) (R,S,I)
Question: Estimate the probabilities P(B,T|U), P(B|U), P(T|U), P(U) and P(I)
Since the question says "estimate", I am guessing I don't need to derive anything analytically. Is it just a case of counting how many times (B,T,U) occurs over the whole data set, e.g. 2/12 ≈ 16%?
Then would P(U) be 0?

I don't think so. Out of your 12 records, 8 are from the UK, so P(U) should be 8/12 = 2/3 ≈ 0.67.
Bayes' theorem is P(A|B) = P(B|A)P(A)/P(B), which you will need in order to estimate some of those probabilities.
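For the counting itself, a minimal sketch (the data set hard-coded from the question; note that the conditional probabilities are counted within the U records only, not over all 12 rows):
data = [
    ('B','T','U'), ('B','T','U'), ('B','T','I'),
    ('R','T','U'), ('R','T','U'), ('B','T','I'),
    ('R','T','U'), ('R','T','U'), ('B','T','I'),
    ('R','S','U'), ('R','S','U'), ('R','S','I'),
]
n = len(data)                                          # 12 records
n_U = sum(1 for h, t, c in data if c == 'U')           # 8 UK records
print('P(U) =', n_U / n)                               # 8/12 ~ 0.67
print('P(I) =', (n - n_U) / n)                         # 4/12 ~ 0.33
print('P(B,T|U) =', sum(1 for h, t, c in data if (h, t) == ('B', 'T') and c == 'U') / n_U)  # 2/8
print('P(B|U) =', sum(1 for h, t, c in data if h == 'B' and c == 'U') / n_U)                # 2/8
print('P(T|U) =', sum(1 for h, t, c in data if t == 'T' and c == 'U') / n_U)                # 6/8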

Related

Taking a sample of the image dataset [closed]

For example, I want to develop a deep learning model for image classification and I have thousands of images. Since training the model on the whole dataset takes a long time, I would like to take a sample (10%) of the original dataset for initial training. How do I do this?
If the dataset is contained in a folder, I would try the following:
import os
import numpy as np
images = os.listdir('Path to your dataset') # list of all the images
n_test_images = int(len(images) * 0.1) # 10% of the total images
subset_images = np.random.choice(images, size=n_test_images, replace=False)
I used replace=False to avoid picking the same element twice.
After selecting the 10% of images, I load them.
I am not sure this is the most efficient way, but it could be a good starting point.
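Loading the sampled files afterwards could look like this (a minimal sketch, assuming Pillow is installed and every file in the folder is a readable image):
import os
import numpy as np
from PIL import Image

dataset_dir = 'Path to your dataset'                   # placeholder folder, as above
images = os.listdir(dataset_dir)                       # list of file names
subset_images = np.random.choice(images, size=int(len(images) * 0.1), replace=False)
loaded = [Image.open(os.path.join(dataset_dir, name)) for name in subset_images]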

Should I choose k = 3 in this case of KNN classifier? [closed]

I have data with shape (2055, 95). I split it into training data (1640, 95) and validation data (415, 95).
I built a KNN classifier but didn't know which k parameter to choose, so I tried k over a range of values to find out which k fits my problem. I got this data:
I know that choosing k = 1 means the model is overfitting. So in my case, is the best k 3?
To determine the optimal k parameter, I would suggest plotting the silhouette coefficient for different k values and applying the elbow method to determine which one is the most suitable.
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt

# scaled_features is your (standardized) feature matrix; kmeans_kwargs holds any
# fixed KMeans arguments (e.g. init="k-means++", n_init=10, random_state=42)
silhouette_coefficients = []
for k in range(2, 11):
    kmeans = KMeans(n_clusters=k, **kmeans_kwargs)
    kmeans.fit(scaled_features)
    score = silhouette_score(scaled_features, kmeans.labels_)
    silhouette_coefficients.append(score)

plt.style.use("fivethirtyeight")
plt.plot(range(2, 11), silhouette_coefficients)
plt.xticks(range(2, 11))
plt.xlabel("Number of Clusters")
plt.ylabel("Silhouette Coefficient")
plt.show()
For a case like the one shown, where the curve flattens, the optimal value would be 3, since the rate of change decreases after x = 3.
You can have a look at https://code-ai.mk/kmeans-elbow-method-tutorial/ for further information on the elbow method.
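Note that the silhouette/elbow analysis above is a clustering diagnostic and ignores your labels. Since you already have a labeled train/validation split, a more direct check is to score a KNeighborsClassifier for each k; a minimal sketch, assuming X_train, y_train, X_val, y_val are your (1640, 95) and (415, 95) splits:
from sklearn.neighbors import KNeighborsClassifier

scores = {}
for k in range(1, 16):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    scores[k] = knn.score(X_val, y_val)      # validation accuracy for this k
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])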

Pytorch Transform library explanation [closed]

transforms.Normalize([0.5]*3, [0.5]*3)
Can someone help me understand what this is and how it works?
You have the documentation for the Normalize transform here. It says: "Normalize a tensor image with mean and standard deviation. Given mean: (mean[1],...,mean[n]) and std: (std[1],..,std[n]) for n channels, this transform will normalize each channel of the input torch.Tensor, i.e., output[channel] = (input[channel] - mean[channel]) / std[channel]".
So in your case, you are constructing a Normalize transform with mean=std=[0.5,0.5,0.5]. This means that you are expecting an input with 3 channels, and for each channel you want to normalize with the function
x -> (x-0.5)/0.5 = 2x-1
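As a quick check (a minimal sketch; the random tensor simply stands in for a 3-channel image):
import torch
from torchvision import transforms

normalize = transforms.Normalize([0.5] * 3, [0.5] * 3)
preprocess = transforms.Compose([
    transforms.ToTensor(),   # PIL image -> float tensor with values in [0, 1]
    normalize,               # each channel: (x - 0.5) / 0.5, i.e. [0, 1] -> [-1, 1]
])

x = torch.rand(3, 8, 8)      # fake 3-channel image already in [0, 1]
y = normalize(x)
print(x.min().item(), x.max().item())   # within [0, 1]
print(y.min().item(), y.max().item())   # within [-1, 1]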

I didn't quite get the "there exists a j" part. Can anyone help me understand it better? [closed]

I didn't quite get the "there exists a j" part. Can anyone help me understand it better?
For each dimension, the most extreme 10% are categorized as boundary. The collection of all points which lie in the most extreme 10% of any dimension is classified as the boundary set.
for a 1D line: fraction of points in boundary f = 0.100
for a 2D square: f = 0.1 + 2*(0.05 - 2*0.05**2) = 0.190. To see why, draw a square with cutting lines at the 0.05 and 0.95 fractions in each of the 2 dimensions: the boundary set is the four edge strips, and their overlapping corner squares must not be double-counted.
for a 3D cube: f = 0.1 + (I'm too lazy to write the rest down) = 0.271
for a 50D hypercube (definitely not going to write the direct calculation): f = 0.995.
Now luckily there is an indirect way of calculating these fractions which requires significantly less effort. I'll leave that bit of homework for you to do.
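If you want to sanity-check the numbers above without the algebra, a quick Monte Carlo sketch (assuming "boundary" means any coordinate in the most extreme 5% on either side) reproduces them:
import numpy as np

def boundary_fraction(d, n=200_000, edge=0.05, seed=0):
    # A point is "boundary" if ANY of its d coordinates falls in the most
    # extreme 5% on either side (below 0.05 or above 0.95).
    pts = np.random.default_rng(seed).random((n, d))
    return ((pts < edge) | (pts > 1 - edge)).any(axis=1).mean()

for d in (1, 2, 3, 50):
    print(d, round(boundary_fraction(d), 3))   # roughly 0.100, 0.190, 0.271, 0.995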

How to interpret the naive bayes result in weka? [closed]

Can anybody please help me interpret the following result, generated in Weka for classification using naive Bayes?
Please explain clearly what the following are:
Normal Distribution
Mean
StandardDev
WeightSum
Precision.
Please help me; I am new to Weka.
Naive Bayes Classifier
Class Normal: Prior probability = 0.5
1374195_at: Normal Distribution. Mean = 218.06 StandardDev = 6.0572 WeightSum = 3 Precision = 36.34333334
1373315_at: Normal Distribution. Mean = 1142.58 StandardDev = 21.1589 WeightSum = 3 Precision = 126.95333339999999
Normal Distribution is the classic Gaussian distribution; Mean and StandardDev are its two parameters. See any basic statistics text for details.
WeightSum is computed per numeric attribute and per class, and equals the (weighted) number of training instances of that class, i.e. it follows the class distribution. For the iris dataset there are 3 classes of 50 instances each, so the value is 50 for all of them; for the weather dataset it is 9 and 5. In your output it is 3, meaning three training instances of the Normal class.
Precision: note that in this output it is not the classification precision TP / (TP + FP). For a numeric attribute it is the precision (resolution) with which the attribute's values are treated, roughly the average gap between distinct observed values, which is why it can be a large number such as 36.34 rather than a rate between 0 and 1.
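As a rough illustration (not Weka's exact code), a numeric attribute contributes the Gaussian density built from its per-class Mean and StandardDev, multiplied into the class prior; the attribute value x below is hypothetical:
import math

def normal_density(x, mean, std):
    # Gaussian pdf with the given mean and standard deviation
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

prior_normal = 0.5                        # "Class Normal: Prior probability = 0.5"
x = 220.0                                 # hypothetical observed value of 1374195_at
score = prior_normal * normal_density(x, mean=218.06, std=6.0572)
print(score)                              # unnormalized score for class "Normal"; compare against the other class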
More resources :
Classifier Evaluation
