Here are the main parts of my code.
I used a custom soft-attention layer together with Inception-ResNet-v2.
datagen=ImageDataGenerator(preprocessing_function=tf.keras.applications.inception_resnet_v2.preprocess_input, rescale=1./255.) # I did rescale
attention_layer,map2 = SoftAttention(aggregate=True,m=16,concat_with_x=False,ch=int(conv.shape[-1]),name='soft_attention')(conv)
attention_layer=(MaxPooling2D(pool_size=(2, 2),padding="same")(attention_layer))
conv=(MaxPooling2D(pool_size=(2, 2),padding="same")(conv))
conv = concatenate([conv,attention_layer])
conv = Activation('relu')(conv)
conv = Dropout(0.4)(conv)
output = Flatten()(conv)
output = Dense(7, activation='softmax')(output)
model = Model(inputs=irv2.input, outputs=output)
opt1=tf.keras.optimizers.Adam(lr=0.01)
model.compile(optimizer=opt1,
loss='categorical_crossentropy',
metrics=['accuracy'])
class_weights = {
0: 1.0, # akiec
1: 1.0, # bcc
2: 1.0, # bkl
3: 1.5, # df
4: 1.0, # mel
5: 1.0, # nv
6: 1.0, # vasc
}
Earlystop = EarlyStopping(monitor='val_loss', mode='min',patience=70, min_delta=0.001)
history = model.fit(train_batches,
steps_per_epoch=(len(train_df)/10),
epochs=100,
verbose=1,
validation_data=test_batches,validation_steps=len(test_df)/batch_size,callbacks=[checkpoint,Earlystop],class_weight=class_weights)
predictions = []
for i in range(len(arr)):
    p = model.predict(arr[i].reshape(1,299,299,3))
    print(p)
    predictions.append(np.argmax(p))
but I got:
[[0. 0. 0. 0. 0. 0. 1.]]
[[0. 0. 0. 0. 0. 0. 1.]]
[[0. 0. 0. 0. 0. 0. 1.]]
[[0. 0. 0. 0. 0. 0. 1.]]
[[0. 0. 0. 0. 0. 0. 1.]]
[[0. 0. 0. 0. 0. 0. 1.]]
[[0. 0. 0. 0. 0. 0. 1.]]
[[0. 0. 0. 0. 0. 0. 1.]]
[[0. 0. 0. 0. 0. 0. 1.]]
[[0. 0. 0. 0. 0. 0. 1.]]
[[0. 0. 0. 0. 0. 0. 1.]]
[[1.0111562e-35 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
0.0000000e+00 1.0000000e+00]]
and it is like this every time.
I built an ML model for handwritten digit recognition, and I'm trying to use accuracy_score to find out what percentage of the predictions are correct, to see whether the model is accurate enough.
This is the model:
model = Sequential(
[
tf.keras.Input(shape=(64,)),
Dense(25, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.01), name = "L1"),
Dense(15, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.01), name = "L2"),
Dense(10, activation='linear', name = "L3"),
], name = "my_model"
)
#Compiling the model
model.compile(
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
optimizer=tf.keras.optimizers.Adam(0.01),
)
#Fitting the model
model.fit(
x_train, y_train,
epochs=1000
)
Here is some of the data:
(1797, 64)
X_train.shape (1078, 64) y_train.shape (1078,)
X_cv.shape (359, 64) y_cv.shape (359,)
X_test.shape (360, 64) y_test.shape (360,)
[[ 0. 0. 5. ... 0. 0. 0.]
[ 0. 0. 0. ... 10. 0. 0.]
[ 0. 0. 0. ... 16. 9. 0.]
...
[ 0. 0. 0. ... 7. 0. 0.]
[ 0. 2. 15. ... 5. 0. 0.]
[ 0. 0. 1. ... 3. 0. 0.]]
Every time I run the code and use accuracy_score, I get the error message:
ValueError: Classification metrics can't handle a mix of multiclass and multiclass-multioutput targets
Does anyone know how I can fix this?
Thanks in advance.
I tried a way to fix, but I'm not sure if it's correct.
I used this code:
predictions = model.predict(x_test)
print(accuracy_score(y_test, np.argmax(predictions, axis=1)))
I get a number like '0.90', but I'm not sure if it's correct.
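One way to see why the argmax step is needed: the last layer emits 10 raw scores (logits) per sample, so model.predict returns a 2-D array, while y_test is a 1-D array of labels. The sketch below uses made-up logits for 3 classes (an assumption for illustration, not output of the model above) and plain NumPy. For 1-D label arrays, accuracy_score returns the same fraction as the mean of the element-wise comparison.

```python
import numpy as np

# Hypothetical logits for 4 samples and 3 classes (illustrative values only).
logits = np.array([
    [2.0, -1.0, 0.5],
    [0.1, 0.3, 3.2],
    [-0.5, 1.7, 0.0],
    [1.1, 0.9, -2.0],
])
y_true = np.array([0, 2, 1, 1])  # 1-D integer labels, shaped like y_test

# argmax along axis=1 picks the highest-scoring class for each row.
y_pred = np.argmax(logits, axis=1)

# Fraction of predictions that match the true labels.
accuracy = np.mean(y_pred == y_true)
print(y_pred, accuracy)
```

Because softmax is monotonic, taking the argmax of the logits gives the same class as taking the argmax of the softmax probabilities, so no extra activation is needed for this check.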
I am trying to build a model that given an item, predicts which store it belongs to.
I have a data-set of ~250 records which are supposed to be items in different online stores.
Each record is composed of:
Category,Sub Category,Price,Store Identifier(The y variable)
I have tried several values for the number of neighbors and also the Manhattan distance, but unfortunately I cannot get the accuracy above ~0.55.
Random Forest produces accuracy ~0.7.
My intuition says that a model should be able to predict this problem. What am I missing?
This is the data:
https://pastebin.com/nUsSbkp4
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.cross_validation import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values
labelencoder_X_0 = LabelEncoder()
X[:, 0] = labelencoder_X_0.fit_transform(X[:, 0])
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
onehotencoder_0 = OneHotEncoder(categorical_features = [0])
X = onehotencoder_0.fit_transform(X).toarray()
onehotencoder_1 = OneHotEncoder(categorical_features = [1])
X = onehotencoder_1.fit_transform(X).toarray()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# classifier = RandomForestClassifier(n_estimators=25, criterion='entropy', random_state = 0)
classifier = KNeighborsClassifier(n_neighbors=3, metric='minkowski', p=2)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
accuracy = classifier.score(X_test, y_test)
print(accuracy)
KNN can potentially produce good predictions with categorical predictors. I have had success with it before. But there are a few things to note:
the numeric variables have to be on the same scale, e.g. by using min-max scaling
one could try specifically designed distance metrics such as the Gower distance
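The two points above can be sketched in plain Python/NumPy. The helper names min_max_scale and gower_distance and the sample records are hypothetical, made up for illustration; the distance follows the usual Gower definition (0/1 mismatch for categorical features, range-normalised absolute difference for numeric ones, averaged over features).

```python
import numpy as np

def min_max_scale(col):
    """Rescale a 1-D numeric array to the range [0, 1]."""
    col = np.asarray(col, dtype=float)
    span = col.max() - col.min()
    # A constant column carries no information; map it to zeros.
    return np.zeros_like(col) if span == 0 else (col - col.min()) / span

def gower_distance(a, b, is_categorical, ranges):
    """Gower-style distance between two records (hypothetical helper).

    Categorical features contribute 0 if equal and 1 otherwise;
    numeric features contribute |a - b| / range. The result is the mean.
    """
    parts = []
    for x, y, cat, rng in zip(a, b, is_categorical, ranges):
        if cat:
            parts.append(0.0 if x == y else 1.0)
        else:
            parts.append(abs(x - y) / rng if rng else 0.0)
    return sum(parts) / len(parts)

prices = [10.0, 20.0, 30.0]
scaled = min_max_scale(prices)

# Two made-up records: (category, sub category, price).
r1 = ("toys", "lego", 10.0)
r2 = ("toys", "dolls", 20.0)
d = gower_distance(r1, r2,
                   is_categorical=[True, True, False],
                   ranges=[None, None, 20.0])  # price range = 30 - 10
print(scaled, d)
```

Here the two records agree on the category, differ on the sub category, and are half the price range apart, so the distance is (0 + 1 + 0.5) / 3 = 0.5.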
Besides that, though, you actually have a bug in the one-hot encoding:
After calling the first one-hot encoder you have an array of shape (275, 21):
onehotencoder_0 = OneHotEncoder(categorical_features = [0])
X = onehotencoder_0.fit_transform(X).toarray()
print(X.shape)
print(X[:5,:])
Out:
(275, 21)
[[ 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 52. 33.99]
[ 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 52. 33.97]
[ 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 36. 27.97]
[ 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 37. 13.97]
[ 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 20. 9.97]]
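Then you call the one-hot encoder on the second column. After the first encoding, that column is no longer the original sub category: it is one of the new indicator columns and holds only two values (zero and one). Hence the result is: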
onehotencoder_1 = OneHotEncoder(categorical_features = [1])
X = onehotencoder_1.fit_transform(X).toarray()
print(X.shape)
print(X[:5,:])
Out:
(275, 22)
[[ 1. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 52. 33.99]
[ 1. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 52. 33.97]
[ 1. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 36. 27.97]
[ 1. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 37. 13.97]
[ 1. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 20. 9.97]]
So you could either fix that directly or, for instance, use pipelines to avoid the problem and add scaling of the numeric variables, like this:
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split  # replaces sklearn.cross_validation
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline, FeatureUnion, make_pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

class Columns(BaseEstimator, TransformerMixin):
    """Select the named columns from a DataFrame."""
    def __init__(self, names=None):
        self.names = names

    def fit(self, X, y=None, **fit_params):
        return self

    def transform(self, X):
        return X.loc[:, self.names]
dataset = pd.read_csv('data.csv', header=None)
dataset.columns = ["cat1", "cat2", "num1", "target"]
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, 3]
labelencoder_X_0 = LabelEncoder()
X.iloc[:, 0] = labelencoder_X_0.fit_transform(X.iloc[:, 0])
labelencoder_X_1 = LabelEncoder()
X.iloc[:, 1] = labelencoder_X_1.fit_transform(X.iloc[:, 1])
numeric = ["num1"]
categorical = ["cat1", "cat2"]
pipe = Pipeline([
    ("features", FeatureUnion([
        ('numeric', make_pipeline(Columns(names=numeric), StandardScaler())),
        ('categorical', make_pipeline(Columns(names=categorical), OneHotEncoder(sparse=False)))
    ])),
])
X = pipe.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# classifier = RandomForestClassifier(n_estimators=25, criterion='entropy', random_state = 0)
classifier = KNeighborsClassifier(n_neighbors=3, metric='minkowski', p=2)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
accuracy = classifier.score(X_test, y_test)
print(accuracy)
Out:
0.7101449275362319
As you can see, this brings the accuracy at least into the ballpark of the Random Forest!
So what you could try next is the Gower distance. There is an ongoing discussion about adding it to sklearn here, so you may want to check out the code posted there in the IPython Notebook and try it.
I'm working on LSTMs.
The output is categorical.
It's of the format [[t11,t12,t13],[t21,t22,t23]]
I was able to do it for a 1D array, but I'm finding it difficult to do it for a 2D array.
from keras.utils import to_categorical
print(to_categorical([[9,10,11],[10,11,12]]))
output
[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
There were two different inputs, each having 3 time steps, but in the output it's all combined.
I need it to be:
[[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]],
[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]]
If shapes are weird, try to make it 1D, use the function, and reshape it back:
originalShape = myData.shape
totalFeatures = myData.max() + 1
categorical = myData.reshape((-1,))
categorical = to_categorical(categorical)
categorical = categorical.reshape(originalShape + (totalFeatures,))
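The flatten, encode, reshape idea can be checked without Keras. The sketch below swaps in np.eye(num_classes)[labels] as a stand-in for to_categorical (an assumption made so the example runs with NumPy alone) and uses the label array from the question.

```python
import numpy as np

labels = np.array([[9, 10, 11], [10, 11, 12]])   # shape (2, 3)
num_classes = int(labels.max()) + 1               # 13 classes: 0..12

flat = labels.reshape(-1)                         # shape (6,)
one_hot_flat = np.eye(num_classes)[flat]          # shape (6, 13), one row per label

# Restore the leading dimensions and keep the class axis last.
one_hot = one_hot_flat.reshape(labels.shape + (num_classes,))
print(one_hot.shape)
```

Taking argmax over the last axis of one_hot gives back the original label array, which is a quick way to confirm that the reshape did not scramble the time steps.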
I realized I can achieve what I want by reshaping:
print(a.reshape(2,3,13))
[[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]]
[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]]
You get the error when reshaping because the highest class index is 12, so there are 13 classes (0, 1, ..., 12). To avoid such errors in the future, you can let NumPy infer that dimension by calling one_hot.reshape(sparse.shape + (-1,)), where one_hot is the one-hot encoded array produced by to_categorical() and sparse is the original one. Note that sparse.shape is a tuple, so the extra dimension has to be appended as a tuple as well.
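A short sketch of both cases, again using np.eye as an assumed stand-in for the to_categorical output: reshaping to 12 classes fails, while a trailing -1 lets NumPy work out the class dimension.

```python
import numpy as np

sparse = np.array([[9, 10, 11], [10, 11, 12]])
one_hot_2d = np.eye(13)[sparse.reshape(-1)]   # shape (6, 13)

error = None
try:
    # 6 * 13 = 78 values cannot fill a (2, 3, 12) array.
    one_hot_2d.reshape(2, 3, 12)
except ValueError as exc:
    error = str(exc)

# -1 tells NumPy to infer the size of the last axis.
restored = one_hot_2d.reshape(sparse.shape + (-1,))
print(error)
print(restored.shape)
```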
I wanted to modify the following keras mean squared error loss (MSE) such that the loss is only computed sparsely.
def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)
My output y is a 3-channel image, where the 3rd channel is non-zero only at those pixels where the loss is to be computed. Any idea how I can modify the above to compute a sparse loss?
This is not the exact loss you are looking for, but I hope it will give you a hint to write your function (see also here for a Github discussion):
def masked_mse(mask_value):
    def f(y_true, y_pred):
        mask_true = K.cast(K.not_equal(y_true, mask_value), K.floatx())
        masked_squared_error = K.square(mask_true * (y_true - y_pred))
        masked_mse = (K.sum(masked_squared_error, axis=-1) /
                      K.sum(mask_true, axis=-1))
        return masked_mse
    f.__name__ = 'Masked MSE (mask_value={})'.format(mask_value)
    return f
The function computes the MSE loss over all the values of the predicted output, except for those elements whose corresponding value in the true output is equal to a masking value (e.g. -1).
Two notes:
when computing the mean, the denominator must be the count of non-masked values and not the dimension of the array; that's why I'm not using K.mean(masked_squared_error, axis=1) and I'm instead averaging manually.
the masking value must be a valid number (i.e. np.nan or np.inf will not do the job), which means that you'll have to adapt your data so that it does not contain the mask_value.
In this example, the target output is always [1, 1, 1, 1], but some prediction values are progressively masked.
y_pred = K.constant([[ 1, 1, 1, 1],
[ 1, 1, 1, 3],
[ 1, 1, 1, 3],
[ 1, 1, 1, 3],
[ 1, 1, 1, 3],
[ 1, 1, 1, 3]])
y_true = K.constant([[ 1, 1, 1, 1],
[ 1, 1, 1, 1],
[-1, 1, 1, 1],
[-1,-1, 1, 1],
[-1,-1,-1, 1],
[-1,-1,-1,-1]])
true = K.eval(y_true)
pred = K.eval(y_pred)
loss = K.eval(masked_mse(-1)(y_true, y_pred))
for i in range(true.shape[0]):
    print(true[i], pred[i], loss[i], sep='\t')
The expected output is:
[ 1. 1. 1. 1.] [ 1. 1. 1. 1.] 0.0
[ 1. 1. 1. 1.] [ 1. 1. 1. 3.] 1.0
[-1. 1. 1. 1.] [ 1. 1. 1. 3.] 1.33333
[-1. -1. 1. 1.] [ 1. 1. 1. 3.] 2.0
[-1. -1. -1. 1.] [ 1. 1. 1. 3.] 4.0
[-1. -1. -1. -1.] [ 1. 1. 1. 3.] nan
To prevent nan from showing up, follow the instructions here. The following assumes you want the masked value (background) to be equal to zero:
# Copied almost character-by-character (only change is default mask_value=0)
# from https://github.com/keras-team/keras/issues/7065#issuecomment-394401137
def masked_mse(mask_value=0):
    """
    Made default mask_value=0; not sure this is necessary/helpful
    """
    def f(y_true, y_pred):
        mask_true = K.cast(K.not_equal(y_true, mask_value), K.floatx())
        masked_squared_error = K.square(mask_true * (y_true - y_pred))
        # in case mask_true is 0 everywhere, the error would be nan, therefore divide by at least 1
        # this doesn't change anything as where sum(mask_true)==0, sum(masked_squared_error)==0 as well
        masked_mse = K.sum(masked_squared_error, axis=-1) / K.maximum(K.sum(mask_true, axis=-1), 1)
        return masked_mse
    f.__name__ = str('Masked MSE (mask_value={})'.format(mask_value))
    return f
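The effect of the guarded denominator can be checked with a plain NumPy re-implementation. This is only a sketch of the same arithmetic, not the Keras loss itself; the name masked_mse_np is made up, and mask_value=-1 is used (rather than the default 0 above) so that the earlier example data can be reused.

```python
import numpy as np

def masked_mse_np(y_true, y_pred, mask_value=-1.0):
    """NumPy sketch of the masked MSE with a denominator of at least 1."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # 1.0 where the target is valid, 0.0 where it equals the mask value.
    mask = (y_true != mask_value).astype(float)
    squared_error = np.square(mask * (y_true - y_pred))
    # Divide by the count of unmasked entries, but never by less than 1.
    return squared_error.sum(axis=-1) / np.maximum(mask.sum(axis=-1), 1.0)

y_pred = [[1, 1, 1, 1], [1, 1, 1, 3], [1, 1, 1, 3],
          [1, 1, 1, 3], [1, 1, 1, 3], [1, 1, 1, 3]]
y_true = [[1, 1, 1, 1], [1, 1, 1, 1], [-1, 1, 1, 1],
          [-1, -1, 1, 1], [-1, -1, -1, 1], [-1, -1, -1, -1]]

loss = masked_mse_np(y_true, y_pred)
print(loss)
```

The first five rows match the values listed earlier (0, 1, 1.33333, 2, 4); the fully masked last row now yields 0 instead of nan.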
I was reading about the TfidfVectorizer implementation in scikit-learn, and I don't understand the output of the method. For example:
new_docs = ['He watches basketball and baseball', 'Julie likes to play basketball', 'Jane loves to play baseball']
new_term_freq_matrix = tfidf_vectorizer.transform(new_docs)
print tfidf_vectorizer.vocabulary_
print new_term_freq_matrix.todense()
output:
{u'me': 8, u'basketball': 1, u'julie': 4, u'baseball': 0, u'likes': 5, u'loves': 7, u'jane': 3, u'linda': 6, u'more': 9, u'than': 10, u'he': 2}
[[ 0.57735027 0.57735027 0.57735027 0. 0. 0. 0.
0. 0. 0. 0. ]
[ 0. 0.68091856 0. 0. 0.51785612 0.51785612
0. 0. 0. 0. 0. ]
[ 0.62276601 0. 0. 0.62276601 0. 0. 0.
0.4736296 0. 0. 0. ]]
What is this (e.g. u'me': 8)?
{u'me': 8, u'basketball': 1, u'julie': 4, u'baseball': 0, u'likes': 5, u'loves': 7, u'jane': 3, u'linda': 6, u'more': 9, u'than': 10, u'he': 2}
Is this a matrix or just a vector? I can't understand what this output is telling me:
[[ 0.57735027 0.57735027 0.57735027 0. 0. 0. 0.
0. 0. 0. 0. ]
[ 0. 0.68091856 0. 0. 0.51785612 0.51785612
0. 0. 0. 0. 0. ]
[ 0.62276601 0. 0. 0.62276601 0. 0. 0.
0.4736296 0. 0. 0. ]]
Could anybody explain these outputs in more detail?
Thanks!
TfidfVectorizer transforms text into feature vectors that can be used as input to an estimator.
vocabulary_ is a dictionary that maps each token (word) to a feature index in the matrix; each unique token gets its own feature index.
What is?(e.g.: u'me': 8 )
It tells you that the token 'me' is represented as feature number 8 in the output matrix.
is this a matrix or just a vector?
Each sentence is a vector; the three sentences you entered form a matrix with 3 row vectors.
In each vector, the numbers (weights) are the tf-idf scores of the features.
For example:
'julie': 4 --> tells you that every sentence in which 'Julie' appears has a non-zero tf-idf weight at index 4. As you can see in the 2nd vector:
[ 0. 0.68091856 0. 0. 0.51785612 0.51785612
0. 0. 0. 0. 0. ]
The 5th element (index 4) is 0.51785612, the tf-idf score for 'Julie'.
For more info about Tf-Idf scoring read here: http://en.wikipedia.org/wiki/Tf%E2%80%93idf
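To read a row of the matrix, it can help to invert vocabulary_ so that each column index maps back to its token. The sketch below copies the vocabulary and the second row from the output in the question (without the Python 2 u'' prefixes) and uses only plain Python; the variable names are illustrative.

```python
# Vocabulary and second matrix row, copied from the output above.
vocabulary = {'me': 8, 'basketball': 1, 'julie': 4, 'baseball': 0, 'likes': 5,
              'loves': 7, 'jane': 3, 'linda': 6, 'more': 9, 'than': 10, 'he': 2}
row = [0., 0.68091856, 0., 0., 0.51785612, 0.51785612, 0., 0., 0., 0., 0.]

# Invert the mapping: column index -> token.
index_to_token = {index: token for token, index in vocabulary.items()}

# Keep only the tokens that carry a non-zero weight in this document.
weights = {index_to_token[i]: w for i, w in enumerate(row) if w != 0}
print(weights)
```

For 'Julie likes to play basketball' this leaves basketball, julie and likes; 'to' and 'play' are not in the fitted vocabulary, so they have no column at all.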
tf-idf builds its own vocabulary from the entire set of documents it was fitted on. That is what the first line of the output shows (sorted here by index for readability):
{u'baseball': 0, u'basketball': 1, u'he': 2, u'jane': 3, u'julie': 4, u'likes': 5, u'linda': 6, u'loves': 7, u'me': 8, u'more': 9, u'than': 10}
When a document is transformed, it gets a tf-idf vector. For the document:
He watches basketball and baseball
and its output,
[ 0.57735027 0.57735027 0.57735027 0. 0. 0. 0.
0. 0. 0. 0. ]
is equivalent to,
[baseball basketball he jane julie likes linda loves me more than]
Since our document contains only these words from the vocabulary (baseball, basketball, he), the output vector has non-zero tf-idf values for just these three words, each at its position in the sorted vocabulary.
tf-idf is used for document classification and for ranking in search engines. tf: term frequency (how many times a vocabulary word occurs in the document); idf: inverse document frequency (how rare the word is across all documents, so rarer words get a higher weight).
The method addresses the fact that not all words should be weighted equally; the weights indicate which words are most distinctive for a document and therefore characterize it best.
new_docs = ['basketball baseball', 'basketball baseball', 'basketball baseball']
new_term_freq_matrix = vectorizer.fit_transform(new_docs)
print (vectorizer.vocabulary_)
print ((new_term_freq_matrix.todense()))
{'basketball': 1, 'baseball': 0}
[[ 0.70710678 0.70710678]
[ 0.70710678 0.70710678]
[ 0.70710678 0.70710678]]
new_docs = ['basketball baseball', 'basketball basketball', 'basketball basketball']
new_term_freq_matrix = vectorizer.fit_transform(new_docs)
print (vectorizer.vocabulary_)
print ((new_term_freq_matrix.todense()))
{'basketball': 1, 'baseball': 0}
[[ 0.861037 0.50854232]
[ 0. 1. ]
[ 0. 1. ]]
new_docs = ['basketball basketball baseball', 'basketball basketball', 'basketball basketball']
new_term_freq_matrix = vectorizer.fit_transform(new_docs)
print (vectorizer.vocabulary_)
print ((new_term_freq_matrix.todense()))
{'basketball': 1, 'baseball': 0}
[[ 0.64612892 0.76322829]
[ 0. 1. ]
[ 0. 1. ]]
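The numbers in the second example can be reproduced by hand. The sketch below assumes scikit-learn's default settings, i.e. the smoothed idf ln((1 + n) / (1 + df)) + 1 followed by L2 normalisation of each row, and uses simple whitespace splitting in place of the library's tokenizer (enough for these two-word documents).

```python
import numpy as np

docs = ['basketball baseball', 'basketball basketball', 'basketball basketball']
vocab = sorted({tok for doc in docs for tok in doc.split()})  # ['baseball', 'basketball']

# Term counts: one row per document, one column per vocabulary entry.
tf = np.array([[doc.split().count(tok) for tok in vocab] for doc in docs],
              dtype=float)

n_docs = len(docs)
df = (tf > 0).sum(axis=0)                    # number of documents containing each term
idf = np.log((1 + n_docs) / (1 + df)) + 1    # smoothed idf (assumed default)

tfidf = tf * idf
tfidf /= np.linalg.norm(tfidf, axis=1, keepdims=True)  # L2-normalise each row
print(np.round(tfidf, 6))
```

'baseball' occurs in only one of the three documents, so its idf (about 1.69) is higher than that of 'basketball' (exactly 1), which is why it gets the larger weight in the first row even though both words occur once there.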