Is RMSProp compatible with online (stochastic) learning? - machine-learning

Quick question:
Is the RMSProp optimizer compatible with online (stochastic, update the weights after every sample) learning? Everything I can find describes RMSProp being used with mini-batch or full-batch updates, but none of it explicitly states that online stochastic learning would be out of the question.

Very short answer: it is. You can use it with SGD. Example: http://www.erogol.com/comparison-sgd-vs-momentum-vs-rmsprop-vs-momentumrmsprop/
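For intuition, here is a minimal NumPy sketch (the function and data are made up for illustration, not taken from the linked post) of RMSProp applied with a batch size of one, i.e. the running average of squared gradients is updated after every single sample:

import numpy as np

def rmsprop_online(w, grad_fn, samples, lr=0.02, decay=0.9, eps=1e-8):
    # Online RMSProp: the weights and the running average of squared
    # gradients are updated after every single (input, target) pair.
    cache = np.zeros_like(w)
    for x, t in samples:
        g = grad_fn(w, x, t)                          # gradient from one sample
        cache = decay * cache + (1 - decay) * g**2    # leaky average of g^2
        w = w - lr * g / (np.sqrt(cache) + eps)
    return w

# Toy usage: online linear regression on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=1000)
grad = lambda w, x, t: (x @ w - t) * x                # gradient of 0.5*(x.w - t)^2
print(rmsprop_online(np.zeros(3), grad, zip(X, y)))   # roughly recovers [1, -2, 0.5]

The only difference from the mini-batch case is that each gradient comes from a single example, so the running average is noisier.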

Related

When is a task considered few-shot learning?

When reading about few-shot learning, I can never seem to find an exact definition. When the concept is explained, it is often done by saying something along the lines of 'using few data samples'.
Is there a precise definition of few-shot learning, or of when a task is considered few-shot learning? When the term 'N-way-K-shot learning' is used, are there any restrictions on the values N and K can take?
The idea behind few-shot learning is to train a classifier using only a small number of labelled samples. More precisely, given N classes and k available samples, the aim is to train the classifier using only m samples per class, where m << k. Few-shot learning is a popular option when the number of available labelled samples is limited.
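As a concrete illustration of the N-way-K-shot setup, here is a small sketch (the dataset and names are made up) of how a single episode is typically sampled: pick N classes, take K labelled support samples from each, and hold out a few query samples per class for evaluation:

import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=5, n_query=15):
    # Sample one N-way-K-shot episode from (sample, label) pairs.
    by_class = defaultdict(list)
    for x, label in dataset:
        by_class[label].append(x)

    classes = random.sample(list(by_class), n_way)    # the N "ways"
    support, query = [], []
    for label in classes:
        picks = random.sample(by_class[label], k_shot + n_query)
        support += [(x, label) for x in picks[:k_shot]]   # K shots per class
        query += [(x, label) for x in picks[k_shot:]]     # held-out queries
    return support, query

# Toy usage: 10 classes with 30 samples each
toy_data = [(f"sample_{c}_{i}", c) for c in range(10) for i in range(30)]
support, query = sample_episode(toy_data, n_way=5, k_shot=5, n_query=15)
print(len(support), len(query))   # 25 support samples, 75 query samples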
However, when it comes to implementing few-shot learning, one popular method is to use contrastive learning to fine-tune an existing model to learn the similarities between samples belonging to the same class.
SetFit is a few-shot learning framework that uses contrastive learning to fine-tune sentence transformers for text classification. I suggest you read SetFit's paper (available on arXiv here: https://arxiv.org/abs/2209.11055). I believe it has the technical details you need to answer your question. Moreover, SetFit's implementation is available on GitHub (here: https://github.com/huggingface/setfit).
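For reference, here is a minimal usage sketch roughly following SetFit's quickstart (the texts and labels below are made up, and the exact API has changed between setfit releases, with newer versions replacing SetFitTrainer by a Trainer/TrainingArguments pair, so treat this as a sketch rather than canonical usage):

from datasets import Dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, SetFitTrainer

# A tiny 2-way-4-shot training set (texts and labels are invented)
train_dataset = Dataset.from_dict({
    "text": ["great movie", "loved it", "terrible film", "waste of time",
             "wonderful acting", "so boring", "a masterpiece", "awful plot"],
    "label": [1, 1, 0, 0, 1, 0, 1, 0],
})

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_dataset,
    loss_class=CosineSimilarityLoss,   # contrastive fine-tuning objective
    num_iterations=20,                 # number of contrastive pairs per sample
    num_epochs=1,
)
trainer.train()
print(model.predict(["I really enjoyed this", "never again"]))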
I hope this helps.

Could i use Adaboost to solve the linear regression problem?

Two months ago I learned about AdaBoost and was surprised by its strength, so I have a question: can it be used to model the relationship between tea polyphenols and spectra? Many papers use linear regression to predict tea polyphenol content from spectral data. So can I use AdaBoost to solve this problem?
(I hope I made that clear.)
#think_maths already gave you a working practical solution, so let me give you a bit of intuition.
If you look at the algorithm, it's pretty simple. The job of AdaBoost is to give proper weights to the observations and to the classifiers/regressors so that the predictions for unusual observations become better. In the algorithm, the function G(x) is any machine learning model of your choice; it could be linear regression as well.
You could read this paper if you want to learn more about it -
AdaBoost.RT: A boosting algorithm for regression problems.
also this thread -
Can AdaBoost be used for regression?
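To make the intuition above concrete, here is a minimal scikit-learn sketch with synthetic data standing in for the spectra (your real X would be the spectral measurements and y the measured polyphenol content):

import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for "spectra -> tea polyphenol content"
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))                  # 300 samples, 50 spectral bands
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# G(x) defaults to a small decision tree; another base model can be passed
# via the `estimator` argument (`base_estimator` in older scikit-learn).
model = AdaBoostRegressor(n_estimators=100, learning_rate=0.5, random_state=0)
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))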

Why is RMSProp considered "leaky"?

decay_rate = 0.99 # decay factor for RMSProp leaky sum of grad^2
I'm perplexed by the wording of comments like the above where they talk about a "leaky" sum of squares for the RMSProp optimizer. So far I've been able to uncover that this particular line is copy-pasta'd from Andrej Karpathy's Deep Reinforcement Learning: Pong from Pixels, and that RMSProp is an unpublished optimizer proposed by Hinton in one of his Coursera Classes. Looking at the math for RMSProp from link 2, it's hard to figure out how any of this is "leaky."
Would anyone happen to know why RMSProp is described this way?
RMSProp keeps an exponentially decaying average of squared gradients. The (however unfortunate) wording "leaky" refers to how much of the previous estimate "leaks" into the current one, since
E[g^2]_t := 0.99 E[g^2]_{t-1} + 0.01 g^2_t
where the first term is the previous estimate "leaking" through and the second term is the contribution of the new data.

learning rate decay in LSTM

I am currently reproducing the code for char-RNN described in http://karpathy.github.io/2015/05/21/rnn-effectiveness/. There is code already implemented in TensorFlow, and the code I am referring to is at https://github.com/sherjilozair/char-rnn-tensorflow/blob/master/train.py. I have a question about the learning rate decay. In the code the optimizer is defined as an AdamOptimizer. When I went through the code, I saw the following lines:
for e in range(args.num_epochs):
    sess.run(tf.assign(model.lr, args.learning_rate * (args.decay_rate ** e)))
which adjusts the learning rate by a decay constant.
My question is: isn't the Adam optimizer supposed to adapt the learning rate for us? Why do we still apply a decay rate to the learning rate here?
I think you mean RMSProp and not Adam; both of the code bases you linked use RMSProp. RMSProp only scales gradients so that their norms are neither too large nor too small. So it is still important to decay the learning rate when we need to slow down training after several epochs.
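To see why the two mechanisms are complementary, here is a small NumPy sketch (not the linked TensorFlow code) in which RMSProp's cache sets the per-parameter scaling while the overall step size still comes from an externally decayed learning rate, using the same schedule as the train.py snippet above:

import numpy as np

def rmsprop_step(w, g, cache, lr, decay=0.9, eps=1e-8):
    # One RMSProp update; `lr` is supplied from outside and can be decayed.
    cache = decay * cache + (1 - decay) * g**2     # per-parameter gradient scale
    w = w - lr * g / (np.sqrt(cache) + eps)        # scaled step, lr sets its size
    return w, cache

base_lr, decay_rate = 0.002, 0.97
w, cache = np.zeros(3), np.zeros(3)
for epoch in range(5):
    lr = base_lr * decay_rate ** epoch             # external learning rate schedule
    g = np.random.normal(size=3)                   # placeholder gradient
    w, cache = rmsprop_step(w, g, cache, lr)
    print(epoch, lr)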

Does Andrew Ng's ANN from Coursera use SGD or batch learning?

What type of learning is Andrew Ng using in his neural network exercise on Coursera? Is it stochastic gradient descent or batch learning?
I'm a little confused right now...
If you are talking about the session-based course (which I have passed previously), https://www.coursera.org/learn/machine-learning, then it uses a batch-learning approach in exercise 4 (which covers the ANN). If you carefully study the cost function, you will see that it is calculated using all of the available examples, not just one randomly chosen example.
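As an illustration of the difference (plain NumPy, not the course's Octave code), batch learning computes the gradient of the cost over all m examples before every update, whereas stochastic gradient descent updates on one randomly chosen example at a time:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                    # m = 100 examples, 3 features
y = X @ np.array([2.0, -1.0, 0.5])
theta = np.zeros(3)

# Batch gradient descent: every step uses all m examples (as in exercise 4)
for _ in range(100):
    grad = X.T @ (X @ theta - y) / len(X)
    theta -= 0.1 * grad

# Stochastic gradient descent: each step uses one randomly chosen example
theta_sgd = np.zeros(3)
for _ in range(1000):
    i = rng.integers(len(X))
    grad_i = (X[i] @ theta_sgd - y[i]) * X[i]
    theta_sgd -= 0.01 * grad_i

print(theta, theta_sgd)   # both approach [2, -1, 0.5]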
