I am using the k-means clustering technique from a video, but I do not understand why we use the .fit method in k-means clustering.
kmeans = KMeans(n_clusters=5, random_state=0)
kmeans.fit(X)  # why do we use this fit method here?
kmeans is your defined model.

To train the model, we call kmeans.fit().

The argument in kmeans.fit(argument) is the data set that needs to be clustered.

After calling fit(), the model is ready, and we can read off the cluster label assigned to each sample with

data_labels = kmeans.labels_
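Putting it together, a minimal sketch with toy data (the random X here just stands in for your own data set):

from sklearn.cluster import KMeans
import numpy as np

X = np.random.rand(100, 2)             # toy data standing in for your data set

kmeans = KMeans(n_clusters=5, random_state=0)
kmeans.fit(X)                          # runs the k-means optimization on X

data_labels = kmeans.labels_           # cluster index assigned to each sample
centers = kmeans.cluster_centers_      # the 5 learned centroids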
Because the sklearn people early on decided that everything should have fit(X, y) and predict(X) functions. And it likely is not going to change, because of backwards compatibility...
It does not make a whole lot of sense for clustering, which does not use y (it defaults to None and is ignored). And there is no real use case where you would want to drop-in replace a classifier with a clustering algorithm, either.
Nevertheless, you'll at some point need to run the algorithm. It is an anti-pattern to do this in a constructor (so KMeans(n_clusters=5, data=X) is a no-no), so you will have to invoke some method. You may as well call it fit then, which fits at least for optimization based methods such as k-means.
You could, however, simply use the function k_means(X, n_clusters=5) instead of using the class. Then it would be a single line (see the source code of fit for an example).
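For illustration, a one-call sketch using that functional interface (the toy X is an assumption):

import numpy as np
from sklearn.cluster import k_means

X = np.random.rand(100, 2)   # toy data standing in for your data set

centroids, labels, inertia = k_means(X, n_clusters=5, random_state=0)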
I'm a newbie when it comes to ML and neural nets, and have been studying mainly online through Coursera videos and a bit of Kaggle/GitHub. All the examples or cases where I've seen neural networks being applied have one thing in common: they use a single type of activation function in all the nodes of a given layer.
From what I understand, each node uses non-linear activation functions to learn about a particular pattern in the data. If it were so, why not use multiple types of activation functions?
I did find one link, which basically says that it's easier to manage a network if we use just one activation function per layer. Any other benefits?
The purpose of an activation function is to introduce non-linearity into a neural network. See this answer for more insight on why our deep neural networks would not actually be deep without non-linearity.
Activation functions do their job by controlling the outputs of the neurons. Sometimes they provide a simple threshold, as ReLU does, which can be coded as follows:

def relu(x):
    if x > 0:
        return x
    else:
        return 0
At other times they behave in more complicated ways, such as tanh(x) or sigmoid(x). See this answer for more on the different sorts of activations.
I would also like to add that I agree with @Joe: an activation function does not learn a particular pattern, it affects the way that a neural network learns multiple patterns. Each activation function has its own kind of effect on the output.
Thus, one benefit of not using multiple activation functions in a single layer is the predictability of their effect. We know what ReLU or sigmoid does to the output of a convolutional filter, for example. But do we know the effect of their cascaded use? And in which order, by the way: does ReLU come first, or is it better to apply sigmoid first? Does it matter?
If we want to benefit from a combination of activation functions, all of these questions (and maybe many more) need to be answered with scientific evidence. Tedious experiments and evaluations would have to be done to get meaningful results. Only then would we know what it means to use them together, and after that maybe a new type of activation function would arise and get a name of its own.
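That said, if you wanted to experiment with mixing activations, one hedged sketch (assuming TensorFlow's Keras API; the layer sizes are arbitrary) is to split a layer into parallel branches with different activations and concatenate them:

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(32,))
relu_branch = layers.Dense(16, activation="relu")(inputs)   # half the units use ReLU
tanh_branch = layers.Dense(16, activation="tanh")(inputs)   # the other half use tanh
hidden = layers.Concatenate()([relu_branch, tanh_branch])
outputs = layers.Dense(1, activation="sigmoid")(hidden)
model = keras.Model(inputs, outputs)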
I have a binary classification problem I'm trying to tackle in Keras. To start, I was following the usual MNIST example, using softmax as the activation function in my output layer.
However, in my problem the two classes are highly unbalanced (one appears ~10 times more often than the other). And what's even more critical, they are non-symmetric in how costly it is to mistake one for the other.
Mistaking an A for a B is way less severe than mistaking a B for an A. Just like a caveman trying to classify animals into pets and predators: mistaking a pet for a predator is no big deal, but the other way round will be lethal.
So my question is: how would I model something like this with Keras?
thanks a lot
A non-exhaustive list of things you could do:
Generate a balanced data set using data augmentations. If the data are images, you can add image augmentations in a custom data generator that will output balanced amounts of data from each class per batch and save the results to a new data set. If the data are tabular, you can use a library like imbalanced-learn to perform over/under sampling.
As @Daniel said, you can use class_weights during training (in the fit method) so that mistakes on the important class are penalized more; a short sketch follows this list. See this tutorial: Classification on imbalanced data. The same idea can be implemented with a custom loss function with/without class_weights during training.
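A minimal sketch of the class_weight idea (the model, the data arrays, and the weight of 10 are assumptions chosen to mirror the ~10:1 imbalance):

class_weight = {0: 1.0, 1: 10.0}     # class 1 = the rare/critical class

model.fit(
    X_train, y_train,
    epochs=10,
    batch_size=32,
    class_weight=class_weight,       # Keras scales each sample's loss by its class weight
    validation_data=(X_val, y_val),
)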
I have a problem which involves optimization of actions over time:
Let's assume I have a set of input variables X where each X_i_t has a value at each point in time t = 0, ..., T. For each point in time, I would like to choose an action a_t from a set of actions A, such that a utility function U(a_0, ..., a_T) is maximized.

Note: the utility function does not have a closed-form solution and its value depends on the entire sequence of actions a_0, ..., a_T.
How would I implement something like this? I am perfectly happy with a keyword I can use to look up relevant literature. I do not need a full solution. - Though if somebody can point me to a Python sklearn function which does this, I would definitely not say no...
My first intuition was "logistic regression" but there is no way to assign "correct labels" to an action a_t at time t, since the utility depends on the actions taken earlier and later in the time series.
If you plan to use neural networks with TensorFlow or PyTorch, it will be easy. As long as you can express the function U within the framework and the utility function is reasonably close to being continuous, you can back-propagate the utility to the network. You just ask the optimizer to maximize the utility and that's it.
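A hedged TensorFlow sketch of that idea (the utility here is a made-up differentiable placeholder; in a real problem it would depend on X and the whole action sequence):

import tensorflow as tf

T, num_actions = 10, 4
logits = tf.Variable(tf.random.normal([T, num_actions]))   # one score per action per time step

def utility(action_probs):
    # Placeholder utility over the whole sequence; replace with your real U.
    return tf.reduce_sum(action_probs[:, 0]) - tf.reduce_sum(action_probs[1:, 1] * action_probs[:-1, 2])

opt = tf.keras.optimizers.Adam(0.1)
for step in range(200):
    with tf.GradientTape() as tape:
        probs = tf.nn.softmax(logits, axis=-1)   # soft relaxation of the discrete action choice
        loss = -utility(probs)                   # maximize utility = minimize its negative
    grads = tape.gradient(loss, [logits])
    opt.apply_gradients(zip(grads, [logits]))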
If the utility function is discrete, it gets tricky, but there are several tricks you might try. One of them is the REINFORCE algorithm (Monte-Carlo policy gradient). Another trick that is getting quite popular is the Gumbel softmax, which allows sampling of discrete actions and propagating the error to the network.
If you plan to use different classifiers (like decision forests or whatever), you might try something based on imitation learning, like the SEARN algorithm.
How is train_on_batch() different from fit()? What are the cases when we should use train_on_batch()?
For this question, there is a simple answer from the primary author:
With fit_generator, you can use a generator for the validation data as well. In general, I would recommend using fit_generator, but using train_on_batch works fine too. These methods only exist for the sake of convenience in different use cases, there is no "correct" method.
train_on_batch allows you to expressly update weights based on a collection of samples you provide, without regard to any fixed batch size. You would use this in cases when that is what you want: to train on an explicit collection of samples. You could use that approach to maintain your own iteration over multiple batches of a traditional training set but allowing fit or fit_generator to iterate batches for you is likely simpler.
One case when it might be nice to use train_on_batch is for updating a pre-trained model on a single new batch of samples. Suppose you've already trained and deployed a model, and sometime later you've received a new set of training samples previously never used. You could use train_on_batch to directly update the existing model only on those samples. Other methods can do this too, but it is rather explicit to use train_on_batch for this case.
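A hedged sketch of that scenario (the saved-model path and the new sample files are hypothetical):

from tensorflow import keras
import numpy as np

model = keras.models.load_model("deployed_model.h5")   # hypothetical path to the deployed model

new_x = np.load("new_samples_x.npy")                    # hypothetical newly received samples
new_y = np.load("new_samples_y.npy")

loss = model.train_on_batch(new_x, new_y)               # one gradient update on just this batch
model.save("deployed_model.h5")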
Apart from special cases like this (either where you have some pedagogical reason to maintain your own cursor across different training batches, or else for some type of semi-online training update on a special batch), it is probably better to just always use fit (for data that fits in memory) or fit_generator (for streaming batches of data as a generator).
train_on_batch() gives you greater control of the state of the LSTM, for example, when using a stateful LSTM and controlling calls to model.reset_states() is needed. You may have multi-series data and need to reset the state after each series, which you can do with train_on_batch(), but if you used .fit() then the network would be trained on all the series of data without resetting the state. There's no right or wrong, it depends on what data you're using, and how you want the network to behave.
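A hedged sketch of that pattern (model is assumed to be a compiled stateful LSTM and series_list a list of (x, y) array pairs, one per independent series; batch_size must match the model's batch_input_shape):

num_epochs, batch_size = 10, 32

for epoch in range(num_epochs):
    for x_series, y_series in series_list:
        for start in range(0, len(x_series), batch_size):
            model.train_on_batch(x_series[start:start + batch_size],
                                 y_series[start:start + batch_size])
        model.reset_states()   # clear the LSTM state before the next independent series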
train_on_batch can also give a performance benefit over fit and fit_generator if you're using large datasets and don't have easily serializable data (like high-rank numpy arrays) to write to TFRecords.
In this case you can save the arrays as numpy files and load smaller subsets of them (traina.npy, trainb.npy, etc.) into memory when the whole set won't fit. You can then use tf.data.Dataset.from_tensor_slices on a subset, call train_on_batch on its batches, load the next subset, and so on; by the end you have trained on your entire set and can control exactly how much of your dataset trains your model, and in what order. You can then define your own epochs, batch sizes, etc. with simple loops and functions that pull from your dataset, as in the sketch below.
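A hedged sketch of that loop (the file names, naming scheme, and model are placeholders for your own setup):

import numpy as np
import tensorflow as tf

batch_size, num_epochs = 256, 5

for epoch in range(num_epochs):
    for chunk in ["traina.npy", "trainb.npy"]:           # subsets saved earlier
        x = np.load("x_" + chunk)                        # hypothetical naming scheme
        y = np.load("y_" + chunk)
        ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size)
        for x_batch, y_batch in ds:
            model.train_on_batch(x_batch, y_batch)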
Indeed @nbro's answer helps. Just to add a few more scenarios: let's say you are training a seq-to-seq model or a large network with one or more encoders. We can create custom training loops using train_on_batch and use part of our data to validate on the encoder directly without using callbacks. Writing callbacks for a complex validation process can be difficult. There are several cases where we want to train on a batch.
From Keras - Model training APIs:
fit: Trains the model for a fixed number of epochs (iterations on a dataset).
train_on_batch: Runs a single gradient update on a single batch of data.
We can use it in a GAN, where we update the discriminator and the generator on one batch of the training data at a time. I saw Jason Brownlee use train_on_batch in one of his tutorials (How to Develop a 1D Generative Adversarial Network From Scratch in Keras).
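A hedged sketch of that GAN-style loop (generator, discriminator, the combined gan model, and real_data are assumed to exist and be compiled; latent_dim and batch_size are arbitrary):

import numpy as np

latent_dim, batch_size = 100, 64

for step in range(1000):
    # Train the discriminator on one batch of real and one batch of fake samples.
    idx = np.random.randint(0, len(real_data), batch_size)
    fake = generator.predict(np.random.normal(size=(batch_size, latent_dim)))
    x = np.concatenate([real_data[idx], fake])
    y = np.concatenate([np.ones(batch_size), np.zeros(batch_size)])
    d_loss = discriminator.train_on_batch(x, y)

    # Train the generator through the combined model with "real" labels.
    noise = np.random.normal(size=(batch_size, latent_dim))
    g_loss = gan.train_on_batch(noise, np.ones(batch_size))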
Tip for a quick search: press Control+F and type the term you want to find (train_on_batch, for example) into the search box.
I'm trying to make a regression model with TensorFlow while using the sklearn implementation so it plays nicely with all the other models I've made. However I cannot seem to find a way to train the model with a custom score function (cost function or objective function).
Is this simply impossible with skflow?
Thanks loads!
Many of the examples use learn.models.logistic_regression, which is basically a built-in high-level model that returns predictions and losses. For example, models.logistic_regression uses ops.losses_ops.softmax_classifier, which means you can look into how ops.losses_ops.softmax_classifier is implemented and implement your own loss function, perhaps using TensorFlow's low-level APIs.
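As a hedged illustration (not the skflow API itself), a custom loss built from low-level TensorFlow ops might look like this; the weighted squared error and the per-class weights are arbitrary assumptions:

import tensorflow as tf

def my_custom_loss(logits, targets):
    # Weighted squared error as a stand-in for the built-in softmax loss.
    weights = tf.constant([1.0, 10.0])    # assumed per-class weights
    per_example = tf.reduce_sum(weights * tf.square(targets - logits), axis=-1)
    return tf.reduce_mean(per_example)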