how to split input in a Keras model - machine-learning

I'm trying to do something that's pretty simple in keras with no success. I have an input X with size (?, 1452, 1). All I want to do is split this input to a vector of 1450 and a vector of 2 and deal with them seperately in the network. I tried:
X1 = X[:, 1450:1452, :]
X2 = X[:, 0:1450, :]
and then do whatever I want. It compiles fine until it gets to the line where I create the model. I get an error saying that my Tensor object doesn't have any attribute called _keras_history even though it does. So i'm guessing keras converts X1 and X2 to a regular tensor. so, I tried using Lambda layers in the following way:
X1 = Lambda(lambda x: X[:, 0:1450, :], output_shape=(1450, 1))(X)
after using this, the network compiled and trained completely fine, and the only problem was saving the model to json/yaml. It gave me an error in the copy.deepcopy part saying:
TypeError: can't pickle _thread.lock objects. after researching online
I found out there's a problem in saving Lambda layers to json for some reason.
Question: Do you know of any way to split the input in a normal way that would allow me to save the model later? Thanks!

Related

How do I use numpy vectors as input for Trax?

I want to be able to use random numpy vectors as input for the Trax machine learning library. I find it helpful to be able to define my own simple inputs and outputs to see how the models are working. Am I doing this the right way, or am I missing something obvious and using the library totally wrong?
Colab notebook with working example
Make X matrix (Nx3) for inputs.
Make Y vector (Nx1) equal to the dot product between X and (1,0,0).
So, this makes y_i simply indicate which side of the yz plane the vector is on. The model would be learning this boundary.
def generate_xy(n):
X = np.random.normal(loc=0, scale=1, size=(n,3))
Y = np.dot(X, [1,0,0]) > 0
Y = Y.astype(int).reshape(-1,1)
return X, Y
X, Y = generate_xy(10000)
I reviewed the codebase at github and it looks like the input is supposed to be:
labeled_data: Iterator of batches of labeled data tuples. Each tuple
has
1+ data (input value) tensors followed by 1 label (target value)
tensor. All tensors are NumPy ndarrays or their JAX counterparts.
Create a very simple model:
model = tl.Serial(
tl.Dense(1),
tl.Sigmoid()
)
Define a train task: this is the part where I'm wondering if I'm doing something very wrong :)
train_task = ts.TrainTask(
labeled_data= zip(X,Y),
loss_layer=tl.CategoryCrossEntropy(),
optimizer=trax.optimizers.Adam(0.01),
n_steps_per_checkpoint=None
)
Define training loop and train model
training_loop = trax.supervised.training.Loop(model, train_task)
training_loop.run(9999) #one epoch
I'm not sure if this whole example is just contrived and way outside of what the library is intended to be used for, or if I'm just struggling to figure out the best way to handle inputs. Just looking for some guidance on best practices here. Thanks so much!

Reshaping numpy array as an input to CNN

I have seen multiple posts on reshaping numpy arrays as inputs to CNN's however, I haven't been able to successfully reshape my array as an input to my CNN!
I have a CNN that merges with another model further downstream. The input shape of the CNN is (4,4,1) -- it is bigger but i have purposefully made it smaller to establish he pipeline and get it running before i put in the proper size.
the format will be the same however, its a 1 channel n x n np.array. I am getting errors when reshaping which I will mention after the code. The input dimensions are put in to the model as follows:
cnn_branch_input = tf.keras.layers.Input(shape=(4,4,1))
cnn_branch_two = tf.keras.layers.Conv2D(etc....)(cnn_branch_input)
the np array (which is originally a pandas dataframe) characteristics and reshaping are as follows:
np.array(array).shape
(4,4)
input = np.array(array).reshape(-1,1,4,4)
input.shape
(1,1,4,4)
the input to my merged model is as follows:
model.fit([cnn_input,gnn_input, gnn_node_feat], y,
#sample_weight=train_mask,
#validation_data=validation_data,
batch_size=4,
shuffle=False)
this causes an error which makes sense to me:
ValueError: Data cardinality is ambiguous:
x sizes: 1, 4, 4 -- Please provide data which shares the same first dimension.
So now when reshaping to intentionally have a 4x4 plus 1 channel shape as follows:
input = np.array(array).reshape(-1,4,4,1)
input.shape
(1,4,4,1)
Two things, the array reshapes to 4, 1x1 arrays, so it seems the structure of the original array is lost, and I get the same error!!
Notice that in both reshape methods, the shape is either (1,4,4,1) or (1,1,4,4).. the -1 entry simply becomes a 1, making the CNN think the first element is shape 1. I thought the -1 would allow me to successfully add the sample dimension as 'any number of samples'.
Simply entering the original (4,4) array, I receive the error that the CNN received a 2 dim array while a 4 dimension array is required.
Im really confused as to how to correctly reshape this array! I would appreciate any help!

Confusion matrix subset of classes not working properly

I have searched for an answer to this question on the internet including suggestion when writing the title but still to no avail so hopefully someone can help!
I am trying to construct a confusion matrix using sci-kit learn. This comes after a keras model.
This is bizarre because i am having the following problem: For the training and test set of the original data... I can construct the confusion matrix as follows (please note this is a multi-label problem and so data has to be subset for the different labels.
The following works fine:
cm = confusion_matrix(y_train[:,0:6].argmax(axis=1), trainpred[:,0:6].argmax(axis=1))
and the 6:18 etc... until all classes have been subset. The confusion matrix that forms as a result reflects the true outcome of the keras model..
The problem arises when i deploy the model on completely unseen data.
I deploy the model by calling model.predict() and get results as above. However, now I cannot subset confusion matrices in the same way.
The code cm=confusion_matrix etc...causes the output of the CM to be the wrong dimensions, even when specifying 0:6 etc..
I therefore used the code from above used but with the labels argument modification:
age[0,1,2,3,4]
organ[5,6,7,8]
cm = confusion_matrix(y_train[:,0:6].argmax(axis=1), trainpred[:,0:6].argmax(axis=1), labels=age)
The FIRST label (1:5) works perfectly... However, the next labels do not! I dont get the right values in the confusion matrices and the matching is also incorrect for those that are in there.
To put this in to context: there are over 400 samples in the unseen test data.
model.predict shows very high classification and correct scores for most labels..
calling CM=ytest[:,4:8]etc, does indeed produce a 4x4 matrix, however there are like 5 values in there not 400, and those values that are in there are not correctly matching.
Also.. with the labels age being 012345, subsetting the ytest to 0:6 causes the correct confusion matrix to form (i am unsure as to why the 6 has to be included in the subset... nevertheless i have tried different combinations with the same issue!
I have searched high and low for this answer so would really appreciate some assistance as it is incredibly frustrating. any more code/information i can provide i will be happy to!!
Many thanks!
This is happening because you are trying to subset the generated confusion matrix, but you actually have to generate a new confusion matrix manually with the specified class labels. If you classes A, B, C you will get a 3X3 matrix. If you want to create matrix focusing only on class A, the other classes will become the false class, but the false positive and false negative will change and hence you cannot just sample the initial matrix.
This is how you show actually do it
import matplotlib.pytplot as plt
import seaborn as sns
def generate_matrix(y_true, predict, class_name):
TP, FP, FN, TN = 0, 0, 0, 0
for i in range(len(y_true)):
if y_true[i] == class_name:
if y_true[i] == predict[i]:
TP += 1
else:
FN += 1
else:
if y_true[i] == predict[i]:
TN += 1
else:
FP += 1
return np.array([[TP, FP],
[FN, TN]])
# Plot new matrix
matrix = generate_matrix(actual_labels,
predicted_labels,
class_name = 'A')
This will generate a confusion matrix for class A.

One-hot-encoded labels___multi-hot-encoded output_Keras

I have a 1D-image with 1x2048 pixels as input and 32 classes for which I have defined a layer of 32 filters with the same size of the image(1x2048) which are L1-regularized.
My image examples are one-hot encodded. However, my goal is to get a multi-hot encoded output when I sum some of these images together and feed it to the trained model.
The training goes well and it can classify each class seperately, but if I sum two image and feed it to the model it only outputs a one-hot encoded vector( although I expect a two-hot encoded vector). If I look at the kernels after training, they make sense as most of the weights are zero except the ones which define my class.
I don't understand why I get a one-hot vector output rather than multi-hot vector.
The reason I don't already sum the images and use them for training the model is that the possible making the possible combination of the images exceed my memory power.
An image of the network I have in mind
input_shape=(1,2048,1)
model = Sequential()
model.add(Conv2D(32, kernel_size=(1, 2048), strides=(1, 1),
activation='sigmoid',
input_shape=input_shape,
kernel_regularizer=keras.regularizers.l1(0.01),
kernel_constraint=keras.constraints.non_neg() ))
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=optimizer,metrics=['accuracy'])
You are using the wrong loss function
categorical_crossentropy will always return you exactly one 1-value in your vector, no matter the input. It tries to classify every instance into one (and only one) available class.
What you desire, though, is (potentially) mutliple ones in your output. Therefore, you should use binary_crossentropy instead. Also see this post.
On a side note, I would heavily advice you to really consider this twice, since - if you don't really have the case with multiple classes that often, it will maybe result in a lot of false positives. I.e., cases where you get more than one class predicted.
On another note, you might want to consider using Conv1D since your signal is 1-dimensional only.
#Azerila
The thing you are looking for is Mixup augmentation. It is implemented as follows:
def mixup(entry1,entry2):
image1,label1 = entry1
image2,label2 = entry2
alpha = [0.2]
dist = tfd.Beta(alpha, alpha)
l = dist.sample(1)[0][0]
img = l*image1+(1-l)*image2
lab = l*label1+(1-l)*label2
return img, lab

Ensemble of different kinds of regressors using scikit-learn (or any other python framework)

I am trying to solve the regression task. I found out that 3 models are working nicely for different subsets of data: LassoLARS, SVR and Gradient Tree Boosting. I noticed that when I make predictions using all these 3 models and then make a table of 'true output' and outputs of my 3 models I see that each time at least one of the models is really close to the true output, though 2 others could be relatively far away.
When I compute minimal possible error (if I take prediction from 'best' predictor for each test example) I get a error which is much smaller than error of any model alone. So I thought about trying to combine predictions from these 3 diffent models into some kind of ensemble. Question is, how to do this properly? All my 3 models are build and tuned using scikit-learn, does it provide some kind of a method which could be used to pack models into ensemble? The problem here is that I don't want to just average predictions from all three models, I want to do this with weighting, where weighting should be determined based on properties of specific example.
Even if scikit-learn not provides such functionality, it would be nice if someone knows how to property address this task - of figuring out the weighting of each model for each example in data. I think that it might be done by a separate regressor built on top of all these 3 models, which will try output optimal weights for each of 3 models, but I am not sure if this is the best way of doing this.
This is a known interesting (and often painful!) problem with hierarchical predictions. A problem with training a number of predictors over the train data, then training a higher predictor over them, again using the train data - has to do with the bias-variance decomposition.
Suppose you have two predictors, one essentially an overfitting version of the other, then the former will appear over the train set to be better than latter. The combining predictor will favor the former for no true reason, just because it cannot distinguish overfitting from true high-quality prediction.
The known way of dealing with this is to prepare, for each row in the train data, for each of the predictors, a prediction for the row, based on a model not fit for this row. For the overfitting version, e.g., this won't produce a good result for the row, on average. The combining predictor will then be able to better assess a fair model for combining the lower-level predictors.
Shahar Azulay & I wrote a transformer stage for dealing with this:
class Stacker(object):
"""
A transformer applying fitting a predictor `pred` to data in a way
that will allow a higher-up predictor to build a model utilizing both this
and other predictors correctly.
The fit_transform(self, x, y) of this class will create a column matrix, whose
each row contains the prediction of `pred` fitted on other rows than this one.
This allows a higher-level predictor to correctly fit a model on this, and other
column matrices obtained from other lower-level predictors.
The fit(self, x, y) and transform(self, x_) methods, will fit `pred` on all
of `x`, and transform the output of `x_` (which is either `x` or not) using the fitted
`pred`.
Arguments:
pred: A lower-level predictor to stack.
cv_fn: Function taking `x`, and returning a cross-validation object. In `fit_transform`
th train and test indices of the object will be iterated over. For each iteration, `pred` will
be fitted to the `x` and `y` with rows corresponding to the
train indices, and the test indices of the output will be obtained
by predicting on the corresponding indices of `x`.
"""
def __init__(self, pred, cv_fn=lambda x: sklearn.cross_validation.LeaveOneOut(x.shape[0])):
self._pred, self._cv_fn = pred, cv_fn
def fit_transform(self, x, y):
x_trans = self._train_transform(x, y)
self.fit(x, y)
return x_trans
def fit(self, x, y):
"""
Same signature as any sklearn transformer.
"""
self._pred.fit(x, y)
return self
def transform(self, x):
"""
Same signature as any sklearn transformer.
"""
return self._test_transform(x)
def _train_transform(self, x, y):
x_trans = np.nan * np.ones((x.shape[0], 1))
all_te = set()
for tr, te in self._cv_fn(x):
all_te = all_te | set(te)
x_trans[te, 0] = self._pred.fit(x[tr, :], y[tr]).predict(x[te, :])
if all_te != set(range(x.shape[0])):
warnings.warn('Not all indices covered by Stacker', sklearn.exceptions.FitFailedWarning)
return x_trans
def _test_transform(self, x):
return self._pred.predict(x)
Here is an example of the improvement for the setting described in #MaximHaytovich's answer.
First, some setup:
from sklearn import linear_model
from sklearn import cross_validation
from sklearn import ensemble
from sklearn import metrics
y = np.random.randn(100)
x0 = (y + 0.1 * np.random.randn(100)).reshape((100, 1))
x1 = (y + 0.1 * np.random.randn(100)).reshape((100, 1))
x = np.zeros((100, 2))
Note that x0 and x1 are just noisy versions of y. We'll use the first 80 rows for train, and the last 20 for test.
These are the two predictors: a higher-variance gradient booster, and a linear predictor:
g = ensemble.GradientBoostingRegressor()
l = linear_model.LinearRegression()
Here is the methodology suggested in the answer:
g.fit(x0[: 80, :], y[: 80])
l.fit(x1[: 80, :], y[: 80])
x[:, 0] = g.predict(x0)
x[:, 1] = l.predict(x1)
>>> metrics.r2_score(
y[80: ],
linear_model.LinearRegression().fit(x[: 80, :], y[: 80]).predict(x[80: , :]))
0.940017788444
Now, using stacking:
x[: 80, 0] = Stacker(g).fit_transform(x0[: 80, :], y[: 80])[:, 0]
x[: 80, 1] = Stacker(l).fit_transform(x1[: 80, :], y[: 80])[:, 0]
u = linear_model.LinearRegression().fit(x[: 80, :], y[: 80])
x[80: , 0] = Stacker(g).fit(x0[: 80, :], y[: 80]).transform(x0[80:, :])
x[80: , 1] = Stacker(l).fit(x1[: 80, :], y[: 80]).transform(x1[80:, :])
>>> metrics.r2_score(
y[80: ],
u.predict(x[80:, :]))
0.992196564279
The stacking prediction does better. It realizes that the gradient booster is not that great.
Ok, after spending some time on googling 'stacking' (as mentioned by #andreas earlier) I found out how I could do the weighting in python even with scikit-learn. Consider the below:
I train a set of my regression models (as mentioned SVR, LassoLars and GradientBoostingRegressor). Then I run all of them on training data (same data which was used for training of each of these 3 regressors). I get predictions for examples with each of my algorithms and save these 3 results into pandas dataframe with columns 'predictedSVR', 'predictedLASSO' and 'predictedGBR'. And I add the final column into this datafrane which I call 'predicted' which is a real prediction value.
Then I just train a linear regression on this new dataframe:
#df - dataframe with results of 3 regressors and true output
from sklearn linear_model
stacker= linear_model.LinearRegression()
stacker.fit(df[['predictedSVR', 'predictedLASSO', 'predictedGBR']], df['predicted'])
So when I want to make a prediction for new example I just run each of my 3 regressors separately and then I do:
stacker.predict()
on outputs of my 3 regressors. And get a result.
The problem here is that I am finding optimal weights for regressors 'on average, the weights will be same for each example on which I will try to make prediction.
What you describe is called "stacking" which is not implemented in scikit-learn yet, but I think contributions would be welcome. An ensemble that just averages will be in pretty soon: https://github.com/scikit-learn/scikit-learn/pull/4161
Late response, but I wanted to add one practical point for this sort of stacked regression approach (which I use this frequently in my work).
You may want to choose an algorithm for the stacker which allows positive=True (for example, ElasticNet). I have found that, when you have one relatively stronger model, the unconstrained LinearRegression() model will often fit a larger positive coefficient to the stronger and a negative coefficient to the weaker model.
Unless you actually believe that your weaker model has negative predictive power, this is not a helpful outcome. Very similar to having high multi-colinearity between features of a regular regression model. Causes all sorts of edge effects.
This comment applies most significantly to noisy data situations. If you're aiming to get RSQ of 0.9-0.95-0.99, you'd probably want to throw out the model which was getting a negative weighting.

Resources