Why is PyTorch's MultiheadAttention considered an activation function? - machine-learning

While scrolling through the activation functions available in the PyTorch package (here), I found that nn.MultiheadAttention is listed among them. Can you please explain why it is considered an activation function? Maybe I am misunderstanding something, but multi-head attention has its own learnable weights, so it seems better suited to the layers section than to the activation functions. Please correct me and share any insight I am missing.
Thank you!
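The premise of the question (that the module carries its own weights while a plain activation does not) can be checked by counting learnable parameters. This is a minimal sketch, assuming PyTorch is installed; the helper `count_params` and the sizes `embed_dim=8`, `num_heads=2` are arbitrary choices for illustration and do not come from the question.

```python
import torch.nn as nn


def count_params(module: nn.Module) -> int:
    """Return the total number of learnable scalars in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# A plain element-wise activation and the attention module, side by side.
relu = nn.ReLU()
mha = nn.MultiheadAttention(embed_dim=8, num_heads=2)  # sizes are illustrative

relu_params = count_params(relu)
mha_params = count_params(mha)

print("ReLU learnable parameters:", relu_params)
print("MultiheadAttention learnable parameters:", mha_params)
```

A nonzero count for nn.MultiheadAttention is consistent with the observation that it behaves like a layer with weights, whichever section of the documentation it is listed under.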

Related

Is there a way to read/print the activations of the hidden layers of a Neural Network in Stable Baselines?

A central requirement for the project I am working on is being able to read the activations of the neurons in the hidden layers of the PPO2 models that I trained using the Stable Baselines library.
Here is a closely related question. I would like to print them as demonstrated here.
The closest I came to this is by doing this:
print(model.get_parameters())
This only prints the weights and biases but not the activations at prediction. I tried to edit the files of the Stable Baselines library but to no avail.
I have also tried
print(model.policy)
and this returns <class 'stable_baselines.common.policies.MlpPolicy'> as this only refers to the type of policy I am using. If there is no way to do this effectively, would it be easy to migrate my simple environment and train with another library? Would appreciate any help/suggestions I can get.
I think the easiest way, which does not require you to override PyTorch's methods, is to attach a forward hook. I think I found code for exactly your problem here. You will still need to define a custom policy that contains the hook attachment code, but that should not be too hard in your case.
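The forward-hook idea can be sketched as follows. This assumes a plain PyTorch module; the small network `net`, the dictionary `activations`, and the helper `save_activation` are hypothetical stand-ins for illustration and are not Stable Baselines code, so a policy built in another framework would need that framework's equivalent mechanism.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a policy network (not a Stable Baselines policy).
net = nn.Sequential(
    nn.Linear(4, 8),
    nn.Tanh(),
    nn.Linear(8, 2),
)

# Captured activations, keyed by a name chosen for each hooked layer.
activations = {}


def save_activation(name):
    """Build a hook that stores the output of the hooked layer under `name`."""
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook


# Hook the Tanh layer so the hidden activations are recorded on every forward pass.
handle = net[1].register_forward_hook(save_activation("hidden_tanh"))

obs = torch.zeros(1, 4)  # dummy observation with a batch size of 1
with torch.no_grad():
    net(obs)

handle.remove()  # detach the hook once it is no longer needed

print(activations["hidden_tanh"].shape)
print(activations["hidden_tanh"])
```

The hook runs at prediction time, so it records the activations themselves rather than the weights and biases.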

Binary Classification Task on Very Similar Patterns

I'm trying to do a binary classification task on a set of sentences that are very similar to each other. I'm not sure how to handle such strong similarity between samples. Here are some of my questions:
(1). Which classification technique will be more suitable in this case?
(2). Will feature selection help in this case?
(3). Could sequence classification algorithms based on recurrent neural networks (LSTM) be a potential approach to follow?
I'd be glad to see any hint or help regarding this problem, thank you!
(only a potential answer to 3)
Assuming you only have to decide whether each sentence belongs to a certain category, you wouldn't want to use RNNs unless you actually want the model to produce something new from the input (sequence-to-sequence).
That said, classification is possible if you end the network with a sequence flattener and a fully connected layer.

How to debug when implementing a machine learning model?

For example, suppose I implement an SVM and it doesn't work well. The problem might be that I made a wrong choice of alpha when implementing the SMO algorithm, or that I got the KKT conditions wrong. But how can I know what the problem is?
Thanks a lot.
In general, cross-validation is used to check that your model performs correctly.
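As a rough illustration of the cross-validation suggestion, here is a standard-library sketch of k-fold evaluation. The helpers `k_fold_indices` and `cross_validate`, the toy threshold classifier, and the data are all made up for this example; in the SVM case, `fit` and `predict` would wrap the asker's own implementation.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) index lists for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def cross_validate(xs, ys, fit, predict, k=5):
    """Train on k-1 folds, score accuracy on the held-out fold, repeat k times."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct = sum(predict(model, xs[i]) == ys[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Toy 1-D classifier (an assumption for the demo): threshold halfway between class means.
def fit_threshold(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (mean(pos) + mean(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.0, 1.0, 0.25, 0.75]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
scores = cross_validate(xs, ys, fit_threshold, predict_threshold, k=5)
print("per-fold accuracy:", scores)
print("mean accuracy:", mean(scores))
```

Per-fold scores show whether the model performs poorly on held-out data; they do not by themselves say which part of the implementation is at fault.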

How to choose the right normalization method for the right dataset?

There are several normalization methods to choose from: L1/L2 norm, z-score, min-max. Can anyone give some insight into how to choose the proper normalization method for a dataset?
I didn't pay much attention to normalization before, but I just got a small project whose performance has been heavily affected not by the parameters or the choice of ML algorithm but by the way I normalized the data. That was something of a surprise to me, but it may be a common problem in practice. So, could anyone provide some good advice? Thanks a lot!
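For reference, the methods named in the question can be written out in a few lines. This is a standard-library sketch with made-up data; it assumes a non-constant feature and a non-zero vector (otherwise the divisions fail), and the function names are chosen for this example only. Min-max and z-score are typically applied per feature (column), while L1/L2 normalization is typically applied per sample (row).

```python
import math
from statistics import mean, pstdev


def min_max(values):
    """Rescale a feature linearly so its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Shift a feature to zero mean and scale it to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def l1_normalize(vector):
    """Scale a sample vector so the absolute values of its entries sum to 1."""
    norm = sum(abs(v) for v in vector)
    return [v / norm for v in vector]


def l2_normalize(vector):
    """Scale a sample vector so its Euclidean length is 1."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


feature = [1.0, 2.0, 3.0, 4.0, 100.0]  # one column containing an outlier
sample = [3.0, 4.0]                    # one row of two features

print("min-max:", min_max(feature))
print("z-score:", z_score(feature))
print("L1:", l1_normalize(sample))
print("L2:", l2_normalize(sample))
```

Printing the min-max result for `feature` shows how a single outlier squeezes the remaining values into a narrow band near 0, which is one way the choice of method can affect a model.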

Is it possible to see the current iteration number in OpenCV's cvKmeans2?

I'm trying to cluster a really large dataset (3030764x162) into 4000 clusters using the cvKmeans2 function in OpenCV 2.1.
I would like to see which iteration the K-means algorithm is currently in (similar to what is displayed in Matlab), but I don't see any documentation that points to how I can do this.
It's kind of frustrating seeing a blank screen and not knowing when the code is going to terminate!
Thank you.
Unfortunately, the answer is no, you cannot. There are no debugging or informative statements anywhere in the kmeans function as provided by OpenCV. However, you may edit the method and add such statements as you see fit.
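To illustrate what such an added progress statement looks like, here is a small pure-Python k-means loop with a per-iteration callback. This is not OpenCV's implementation and is far too slow for a dataset of the size in the question; the function `kmeans`, its `on_iteration` parameter, and the sample points are assumptions made for this sketch.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, on_iteration=None, seed=0):
    """Plain Lloyd's algorithm that reports progress through `on_iteration`."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct points as initial centers
    labels = None
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if labels is None:
            changed = len(points)
        else:
            changed = sum(1 for a, b in zip(labels, new_labels) if a != b)
        labels = new_labels

        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))

        # Progress report: the kind of statement the answer suggests adding.
        if on_iteration is not None:
            on_iteration(iteration, changed)

        if changed == 0:  # assignments are stable, so the loop has converged
            break
    return centers, labels, iteration


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
log = []


def report(iteration, changed):
    log.append((iteration, changed))
    print(f"iteration {iteration}: {changed} points changed cluster")


centers, labels, n_iter = kmeans(points, k=2, on_iteration=report)
```

Reporting how many points changed cluster, in addition to the iteration number, gives a rough sense of how close the run is to terminating.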
@Sau,
You may need some other way of doing it, though my answer is not relevant to OpenCV.
I have not tried this in OpenCV, but I once did k-means clustering on an extremely large data set using Mahout, and it was a better option than OpenCV because it works in a distributed mode. It is very lengthy, but you might still be interested: k-means clustering using Mahout.
Check it out.
