Predicting class label for single image in image classification - image-processing

I am designing a static hand gesture recognition system using deep neural networks.
I started from this implementation on Kaggle: https://www.kaggle.com/ranjeetjain3/deep-learning-using-sign-langugage/notebook#Sign-Language.
The reported accuracy looks very high, but when I run predictions on my own images I get wrong results. As a newbie, I suspect I am misinterpreting something and would appreciate help.
Below is my code with prediction:
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np

# Read the image
infer_image = mpimg.imread('D:\\Mayuresh\\DL using SL MNIST\\input\\infer\\7.png')
plt.imshow(infer_image)

# Resizing before the prediction
infer_image = np.resize(infer_image, (28, 28, 1))
infer_image_arr = np.array(infer_image)
infer_image_arr = infer_image_arr.reshape(1, 28, 28, 1)

# Prediction
y_pred = model.predict_classes(infer_image_arr)
print(y_pred)
# Dictionary for translation
my_dict2 = {
0: 'a',
1: 'b',
2: 'c',
3: 'd',
4: 'e',
5: 'f',
6: 'g',
7: 'h',
8: 'i',
9: 'k',
10: 'l',
11: 'm',
12: 'n',
13: 'o',
14: 'p',
15: 'q',
16: 'r',
17: 's',
18: 't',
19: 'u',
20: 'v',
21: 'w',
22: 'x',
23: 'y'
}
my_dict2[int(y_pred)]
Can someone suggest the changes needed, or a snippet to predict the hand gesture for a single image?

I assume you didn't train anything yourself and are using the network weights provided on the Kaggle page.
The reported accuracy looks very high, but when I run predictions on my own images I get wrong results.
It seems the network you are using has over-fitted to the Sign Language MNIST dataset, so when you give it a different kind of image it gives bad results.
What you should do is create a gesture dataset that covers many cases, especially the cases you want to detect. Then train your network on this newly created dataset, using the present weights as the initial weights for your training, so that the network learns the different gesture situations. The key to increasing accuracy in your project is training your network on gesture images that resemble your inference input.
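A minimal sketch of that fine-tuning step, assuming the Kaggle notebook's Keras model was saved to disk first; the file name, directory layout, and image size below are hypothetical:
from keras.models import load_model
from keras.preprocessing.image import ImageDataGenerator

# Load the previously trained network; its weights become the starting point.
model = load_model('sign_language_model.h5')   # hypothetical file name

# Stream your own gesture images from disk, one sub-folder per class.
datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)
train_gen = datagen.flow_from_directory('my_gestures/',        # hypothetical directory
                                        target_size=(28, 28),
                                        color_mode='grayscale',
                                        class_mode='categorical',
                                        subset='training')
val_gen = datagen.flow_from_directory('my_gestures/',
                                      target_size=(28, 28),
                                      color_mode='grayscale',
                                      class_mode='categorical',
                                      subset='validation')

# Continue training from the existing weights on the new gesture data
# (on older Keras versions use fit_generator instead of fit).
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_gen, validation_data=val_gen, epochs=10)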

I believe you need a codebase where you can train your model on a dataset that you generate yourself, based on things like your background, lighting conditions, etc. That would be better than using a pre-trained model trained on data from a different distribution.
I would recommend a workflow where you start a video feed and training images are automatically captured for each gesture; you can also select the number of classes you want, which can improve your performance. Alternatively, you can use the original Emojinator code, which can detect 13 hand gestures.
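If you go the record-your-own-data route, the capture loop is straightforward with OpenCV; a rough sketch follows (the gesture label, key bindings, and output directory are assumptions, not taken from either project):
import os
import cv2

label = 'thumbs_up'                       # hypothetical gesture name
out_dir = os.path.join('my_gestures', label)
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(0)                 # default webcam
count = 0
while count < 200:                        # samples to collect for this gesture
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow('capture', frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord('s'):                   # press 's' to save the current frame
        cv2.imwrite(os.path.join(out_dir, '%s_%d.png' % (label, count)), frame)
        count += 1
    elif key == ord('q'):                 # press 'q' to stop early
        break

cap.release()
cv2.destroyAllWindows()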

Related

Is it fair enough to make model evaluation based on just "train_test_split"?

I'm absolutely confused about model evaluation, interpreting its results, and using cross_val_score. I don't understand why evaluation on a test set is usually considered a final and solid result, while if we just choose another split, we'll get a different value that could be far worse (or far better) than the previous one. Below, I'll illustrate what I'm talking about with an example, and after that I'll ask some more precise questions.
I used a dataset from Jason Brownlee: https://github.com/jbrownlee/Datasets/blob/master/pima-indians-diabetes.data.csv
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv'
df = pd.read_csv(url, names=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'Target'], header=None)
X, y = df.drop('Target', axis=1), df['Target']
Here is our target distribution:
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.3,
                                                  random_state=777, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25,
                                                  random_state=777, stratify=y_rest)
Checking sample sizes:
Train size: 52.3 %
Val size: 17.6 %
Test size: 30.1 %
Tuning the hyper-parameters:
base_model = LogisticRegression(random_state=777, max_iter=2000)
params = {
    'C': np.arange(0, 5, 0.1)
}
grdSrch = GridSearchCV(base_model, params, scoring='average_precision', cv=5)
grdSrch.fit(X_val, y_val)
print(f'Best params: {grdSrch.best_params_}')
Best params: {'C': 2.5}
Training model with the best parameter and getting an average_precision_score value:
model = LogisticRegression(C=grdSrch.best_params_['C'], random_state=777, max_iter=2000)
model.fit(X_train, y_train)
y_pred_scores = model.predict_proba(X_test)[:, 1]
print(f'Avg. precision: {average_precision_score(y_test, y_pred_scores)}')
Avg. precision: 0.7067839537770597
Now, I want to be sure that result is not unfair because of some unexpectedly good train/test splitting. And I use cross_val_score for that purpose:
res_ = cross_val_score(LogisticRegression(C=grdSrch.best_params_['C'], random_state=777, max_iter=2000),
                       X,
                       y,
                       scoring='average_precision',
                       cv=StratifiedKFold(n_splits=15, shuffle=True))
print(res_)
print()
print(f'Mean score: {np.round(res_.mean(), 4)}')
Then I get:
[0.7779402 0.69200873 0.63972188 0.82368544 0.6044146 0.70668374
0.85022563 0.79848536 0.60740097 0.68802039 0.92567494 0.84554528
0.61855088 0.78731357 0.79852637]
Mean score: 0.7443
And what do we see here? We get pretty high variance among those results, plus a higher overall mean value. At this point I'm totally lost. My questions are:
Can we use cross_val_score on the whole dataset to assess the fairness (?) of our final evaluation result?
If we can, why do we even use train_test_split with just one score, when cross_val_score gives us a clearer picture of the actual scores?
If we cannot, then for what reason?
It seems like we don't actually have any "final" result for any metric, because we can always get a pool of various scores depending on the train/test split. So how can we make real business decisions in such circumstances?
It depends on the dataset, but cross-validation is usually the preferred method because it gives your model the opportunity to train on multiple train-test splits. This gives you a better indication of how well your model will perform on unseen data. With 10-fold cross-validation, for example, the dataset is split into 10 groups and the model is trained and tested 10 separate times, so each group gets a chance to be the test set; with a single train-test split, that happens only once.
The train-test split method is good to use when you have a very large dataset or you are building an initial model at the start of a data science project. Keep in mind that because cross-validation uses multiple train-test splits, it takes more computational power and time to run than the holdout method.
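To make the point concrete, here is a small self-contained sketch on the same Pima dataset, reusing the question's URL and tuned C=2.5: instead of reporting one holdout number, report the spread across folds.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Same Pima data and tuned hyper-parameter as in the question.
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv'
df = pd.read_csv(url, names=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'Target'], header=None)
X, y = df.drop('Target', axis=1), df['Target']

model = LogisticRegression(C=2.5, random_state=777, max_iter=2000)
scores = cross_val_score(model, X, y,
                         scoring='average_precision',
                         cv=StratifiedKFold(n_splits=15, shuffle=True, random_state=777))

# Mean plus/minus standard deviation across folds is a more honest summary
# of expected performance than any single train/test score.
print('mean = %.3f, std = %.3f, min = %.3f, max = %.3f'
      % (scores.mean(), scores.std(), scores.min(), scores.max()))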

Accuracy score in K-nearest Neighbour Classifier not matching with GridSearchCV

I'm learning Machine Learning and I'm facing a mismatch I can't explain.
I have a grid to compute the best model, according to the accuracy returned by GridSearchCV.
import sklearn.metrics
import sklearn.model_selection
import sklearn.neighbors

model = sklearn.neighbors.KNeighborsClassifier()
n_neighbors=[3, 4, 5, 6, 7, 8, 9]
weights=['uniform','distance']
algorithm=['auto','ball_tree','kd_tree','brute']
leaf_size=[20,30,40,50]
p=[1]
param_grid = dict(n_neighbors=n_neighbors, weights=weights, algorithm=algorithm, leaf_size=leaf_size, p=p)
grid = sklearn.model_selection.GridSearchCV(estimator=model, param_grid=param_grid, cv = 5, n_jobs=1)
SGDgrid = grid.fit(data1, targetd_simp['VALUES'])
print("SGD Classifier: ")
print("Best: ")
print(SGDgrid.best_score_)
value=SGDgrid.best_score_
print("params:")
print(SGDgrid.best_params_)
print("Best estimator:")
print(SGDgrid.best_estimator_)
y_pred_train=SGDgrid.best_estimator_.predict(data1)
print(sklearn.metrics.confusion_matrix(targetd_simp['VALUES'],y_pred_train))
print(sklearn.metrics.accuracy_score(targetd_simp['VALUES'],y_pred_train))
The results I get are the following:
SGD Classifier:
Best:
0.38694539229180525
params:
{'algorithm': 'auto', 'leaf_size': 20, 'n_neighbors': 8, 'p': 1, 'weights': 'distance'}
Best estimator:
KNeighborsClassifier(leaf_size=20, n_neighbors=8, p=1, weights='distance')
[[4962    0    0]
 [   0 4802    0]
 [   0    0 4853]]
1.0
Probably this model is highly overfitted. I still have to check that, but it's not the point of the question here.
So, basically, if I understand correctly, GridSearchCV is finding a best accuracy score of 0.3869 (quite poor) for one of the chunks in the cross-validation, but the final confusion matrix is perfect, as is the accuracy computed from it. That doesn't make much sense to me... How can a model that is, in theory, this bad perform so well?
I also added scoring='accuracy' in GridSearchCV to be sure that the returned value is actually accuracy, and it returns exactly the same value.
What am I missing here?
The behavior you are describing is rather normal and to be expected. You should know that GridSearchCV has a parameter refit which is set to True by default. It triggers the following:
Refit an estimator using the best found parameters on the whole dataset.
This means that the estimator returned by best_estimator_ has been refit on your whole dataset (data1 in your case). It is therefore data that the estimator has already seen during training and, expectedly, performs especially well on it. You can easily reproduce this with the following example:
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
X, y = make_classification(random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
search = GridSearchCV(KNeighborsClassifier(), param_grid={'n_neighbors': [3, 4, 5]})
search.fit(X_train, y_train)
print(search.best_score_)
>>> 0.8533333333333333
print(accuracy_score(y_train, search.predict(X_train)))
>>> 0.9066666666666666
While this is not as impressive as in your case, it is still a clear result. During cross-validation, the model is validated against one fold that was not used for training the model, and thus, against data the model has not seen before. In the second case, however, the model already saw all data during training and it is to be expected that the model will perform better on them.
To get a better feeling of the true model performance, you should use a holdout set with data the model has not seen before:
print(accuracy_score(y_test, search.predict(X_test)))
>>> 0.76
As you can see, the model performs considerably worse on this data and shows us that the former metrics were all a bit too optimistic. The model did in fact not generalize that well.
In conclusion, your result is not surprising and has an easy explanation. The high discrepancy in scores is impressive but still follows the same logic and is actually just a clear indicator of overfitting.

Weird behavior while training an SVM classifier

I am searching for the best value of C (Cost parameter) for training my SVM classifier. Here is my code:
clear all; close all; clc
% Load training features and labels
[y, x] = libsvmread('training_data.train'); %the training dataset is named training_data.train
cost=[2^-7,2^-5,2^-3,2^-1,2^1,2^3,2^5,2^7,2^9,2^11,2^13,2^15];
accuracy=zeros(1,length(cost)); %This array will store the accuracy values corresponding to each element in the cost array
for i = 1:length(cost)
    opt = sprintf('-c %i -v 3', cost(i));
    accuracy(i) = svmtrain(y, x, opt);
end
accuracy
I am using the LIBSVM library. When I run this program, the accuracy array is populated with pretty weird values:
Here is the output:
Columns 1 through 8:
67.335 93.696 91.404 92.550 93.696 93.553 93.553 93.553
Columns 9 through 12:
93.553 93.553 93.553 93.553
This means that I get the highest cross-validation accuracy at 2^-5. Should I get the highest accuracy at the highest value of C? (As far as I understand, it is a penalty factor for misclassification.) Is this behavior expected?
(I am building a classifier for breast cancer identification using the UCI ML database).
Should I get the highest accuracy at the highest value of C? (As far as I understand, it is a penalty factor for misclassification.)
No, there is no guarantee: the SVM objective is not accuracy-based, it uses a surrogate loss that only roughly behaves like accuracy, so you can expect many seemingly random fluctuations. In general you should expect high accuracy for high C, but not necessarily at the highest value.
Is this behavior expected? (I am building a classifier for breast cancer identification using the UCI ML database.)
Yes, it is a possible outcome.
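To illustrate, here is a scikit-learn sketch of the same kind of sweep (Python rather than the question's MATLAB/LIBSVM setup, and on sklearn's bundled breast cancer data, so the exact numbers will differ); the point is only that cross-validated accuracy need not increase monotonically with C:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Exponential grid over the cost parameter C, as in the question.
for exponent in range(-7, 16, 2):
    clf = make_pipeline(StandardScaler(), SVC(C=2.0 ** exponent, kernel='rbf'))
    acc = cross_val_score(clf, X, y, cv=3).mean()
    print('C = 2^%d: CV accuracy = %.3f' % (exponent, acc))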

Keras LSTM Time Series

I have a problem and at this point I'm completely lost as to how to solve it. I'm using Keras with an LSTM layer to project a time series. I'm trying to use the previous 10 data points to predict the 11th.
Here's the code:
import numpy as np

from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM

def _load_data(data):
    """
    data should be pd.DataFrame()
    """
    n_prev = 10
    docX, docY = [], []
    for i in range(len(data) - n_prev):
        docX.append(data.iloc[i:i+n_prev].as_matrix())
        docY.append(data.iloc[i+n_prev].as_matrix())
    if not docX:
        pass
    else:
        alsX = np.array(docX)
        alsY = np.array(docY)
        return alsX, alsY

X, y = _load_data(df_test)
X_train = X[:25]
X_test = X[25:]
y_train = y[:25]
y_test = y[25:]

in_out_neurons = 2
hidden_neurons = 300

model = Sequential()
model.add(LSTM(in_out_neurons, hidden_neurons, return_sequences=False))
model.add(Dense(hidden_neurons, in_out_neurons))
model.add(Activation("linear"))
model.compile(loss="mean_squared_error", optimizer="rmsprop")
model.fit(X_train, y_train, nb_epoch=10, validation_split=0.05)
predicted = model.predict(X_test)
So I'm taking the input data (a two-column dataframe), creating X, which is an n by 10 by 2 array, and y, which is an n by 2 array that is one step ahead of the last row in each slice of X (i.e. labeling each window with the point directly after it).
predicted is returning
[[ 7.56940445, 5.61719704],
[ 7.57328357, 5.62709032],
[ 7.56728049, 5.61216415],
[ 7.55060187, 5.60573629],
[ 7.56717342, 5.61548522],
[ 7.55866942, 5.59696181],
[ 7.57325984, 5.63150951]]
but I should be getting
[[ 73, 48],
[ 74, 42],
[ 91, 51],
[102, 64],
[109, 63],
[ 93, 65],
[ 92, 58]]
The original data set only has 42 rows, so I'm wondering if there just isn't enough there to work with? Or am I missing a key step in the modeling process maybe? I've seen some examples using Embedding layers etc, is that something I should be looking at?
Thanks in advance for any help!
Hey Ryan!
I know it's late, but I just came across your question; I hope it's not too late for you, or that you can still find some useful knowledge here.
First of all, Stack Overflow may not be the best place for this kind of question: it's a conceptual question, which is not this site's purpose, and your code runs, so it's not even a matter of general programming. Have a look at the stats Stack Exchange.
Second, from what I see there is no conceptual error. You're using everything necessary, that is:
an LSTM with proper dimensions
return_sequences=False just before your Dense layer
a linear activation for your output
MSE as the cost/loss/objective function
Third, I find it extremely unlikely, however, that your network learns anything from so little data. You have fewer data points than parameters here! For the great majority of supervised learning algorithms, the first thing you need is not a good model, it's good data. You cannot learn from so few examples, especially not with a model as complex as an LSTM network.
Fourth, it seems like your target data is made of relatively high values. A first pre-processing step here would be to standardize the data: center it around zero, that is, translate your data by its mean, and rescale it by its standard deviation. This really helps learning!
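A minimal numpy sketch of that standardization step; the dummy arrays stand in for the question's y_train / y_test, and the statistics must come from the training portion only:
import numpy as np

# Dummy stand-ins for the question's (n, 2) target arrays.
y_train = np.random.uniform(40, 120, size=(25, 2))
y_test = np.random.uniform(40, 120, size=(7, 2))

# Fit mean and std on the training data only, then reuse them everywhere.
mean, std = y_train.mean(axis=0), y_train.std(axis=0)
y_train_scaled = (y_train - mean) / std
y_test_scaled = (y_test - mean) / std

# After the model predicts in the scaled space, invert the transform
# to get back to the original units.
predicted_scaled = y_test_scaled               # placeholder for model.predict(X_test)
predicted = predicted_scaled * std + mean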
Fifth, in general here are a few things you should look into to improve learning and reduce overfitting (a couple of them are sketched after this list):
Dropout
Batch Normalization
Other optimizers (such as Adam)
Gradient clipping
Random hyper parameter search
(This is not exhaustive, if you're reading this and think something should be added, comment it so it's useful for future readers!)
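A rough sketch of a few of those items (Dropout, the Adam optimizer, and gradient clipping) using the current Keras API, which differs from the question's older imports; the layer sizes here are arbitrary:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout
from keras.optimizers import Adam

model = Sequential()
model.add(LSTM(64, input_shape=(10, 2)))       # 10 time steps, 2 features, as in the question
model.add(Dropout(0.2))                        # randomly drop units to reduce overfitting
model.add(Dense(2, activation='linear'))

# Adam with gradient clipping (clipnorm) instead of plain RMSprop.
model.compile(loss='mean_squared_error',
              optimizer=Adam(learning_rate=0.001, clipnorm=1.0))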
Last but NOT least, I suggest you look at this tutorial on GitHub, especially the recurrent tutorial for time series with Keras.
PS: Daniel Hnyk updated his post ;)

Predicting classes with a lot of data skewed towards one class

I have a question on how to deal with some interesting data.
I currently have some data (the counts are real, but the situation is fake) where we predict the number of t-shirts that people will purchase online today. We know quite a bit about everyone for our feature attributes, and these change from day to day. We also know how many t-shirts everyone purchased on previous days.
What I want is an algorithm that is able to churn out a continuous variable that is a ranking or "score" of the number of t-shirts each person is going to purchase today. My end goal is that if I can get this score attached to each person, I can sort them according to the score and use them in a specific UI. Currently I've been using random forest regression with scikit-learn, where my target classes are yesterday's count of t-shirt purchases by each person. This has worked out pretty well, except that my data is mildly difficult in that a lot of people purchase 0 t-shirts. This is a problem because my random forest gives me a lot of predictions of 0, and I cannot sort those effectively. I get why this is happening, but I'm not sure of the best way to get around it.
What I want is a non-zero score (even if it’s a very small number close to 0) that tells me more about the features and the predicted class. I feel that some of my features must be able to tell me something and give me a better prediction than 0.
I think the inherent problem is using a random forest regressor as the algorithm. Each tree gets a vote; however, there are so many zeros that there are many forests where all trees vote for 0. I would like another algorithm to try, but I don't know which one would work best. Currently I'm training on the whole dataset and using the out-of-bag estimate that scikit-learn provides.
Here are the counts (using Python's Counter([target classes])) of the data classes. This is set up as: {predicted_class_value: counts_of_that_value_in_the_target_class_list}
{0: 3560426, 1: 121256, 2: 10582, 3: 1029, 4: 412, 5: 88, 6: 66, 7: 35, 8: 21, 9: 17, 10: 17, 11: 10, 12: 2, 13: 2, 15: 2, 21: 2, 17: 1, 18: 1, 52: 1, 25: 1}
I have tried some things to manipulate the training data, but I’m really guessing at things to do.
One thing I tried was scaling the number of zeros in the training set down to a linearly scaled amount based on the other data. So instead of passing the algorithm 3.5 million 0-class rows, I scaled it down to 250,000. So my training set looked like: {0: 250000, 1: 121256, 2: 10582, 3: 1029, … }. This had a drastic effect on the number of 0's coming back from the algorithm: I've gone from the algorithm guessing 99% of the data as 0 to only about 50%. However, I don't know if this is a valid thing to do or whether it even makes sense.
Other things I've tried include increasing the forest size (which doesn't have too much of an effect), telling the random forest to only use sqrt(n_features) features for each tree (which has had a pretty good effect), and using the out-of-bag estimate (which also seems to give good results).
To summarize, I have a set of data where there is a disproportionate amount of data toward one class. I would like to have some way to produce a continuous value that is a “score” for each value in the predicted dataset so I may sort them.
Thank you for your help!
This is an imbalanced class problem. One thing you could do is over/under-sampling. Undersampling means that you randomly delete instances from the majority class; oversampling means that you sample, with replacement, instances from the minority class. You could also use a combination of both. Another thing you could try is SMOTE [1], an oversampling algorithm that, instead of just re-sampling existing instances from the minority class, creates synthetic instances, which helps avoid overfitting and in theory generalizes better.
[1] Chawla, Nitesh V., et al. "SMOTE: Synthetic Minority Over-sampling Technique." Journal of Artificial Intelligence Research 16 (2002): 321-357.
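A minimal sketch of plain random over/under-sampling with scikit-learn's resample utility (SMOTE itself lives in the separate imbalanced-learn package and is not shown); the column names and counts below are made up to mimic the zero-inflated target:
import numpy as np
import pandas as pd
from sklearn.utils import resample

# Dummy stand-in for the real data: two feature columns plus a heavily
# zero-inflated 'purchases' target (all names here are hypothetical).
rng = np.random.default_rng(0)
df = pd.DataFrame({'feat_a': rng.normal(size=100000),
                   'feat_b': rng.normal(size=100000),
                   'purchases': rng.choice([0, 1, 2], size=100000, p=[0.97, 0.025, 0.005])})

majority = df[df['purchases'] == 0]
minority = df[df['purchases'] > 0]

# Undersample the majority class (as the question already tried) ...
majority_down = resample(majority, replace=False, n_samples=10000, random_state=42)
# ... and/or oversample the minority classes with replacement.
minority_up = resample(minority, replace=True, n_samples=10000, random_state=42)

# Shuffle the recombined, more balanced training set.
balanced = pd.concat([majority_down, minority_up]).sample(frac=1, random_state=42)
X_bal, y_bal = balanced.drop(columns='purchases'), balanced['purchases']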
