What are the differences between adapter tuning and prefix tuning? [closed] - machine-learning

Closed. This question is not about programming or software development. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 2 months ago.
Improve this question
I am trying to understand the concept of adapter-tuning, prompt-tuning, and prefix-tuning in the context of few-shot learning.
It appears to me that I can apply prompt tuning to a black box language model.
I read for prompt tuning the entire pre-trained language model is frozen. If that's the case prompt tuning could be applied for an OpenAI model like gpt-3 and Codex.
How could I do prompt tuning with OpenAI Codex? I don't find any way so far.
How these techniques are different than in-context example that could be given by few-shot learning.
Can anyone please guide me in the correct direction?

In my understanding all three concepts mentioned are based on a pre-trained model so in general should work with the GPT model that is molded within OpenAI Codex.
Adapter-tuning involves adding small, task-specific "adapter" modules to the pre-trained model, which can be trained on a few examples to improve performance on the specific task. This is especially interesting in case you want to do task adaptation in my opinion.
The idea is to horizontally extend the model by additional layers. You are touching theta.
Prompt-tuning involves providing the model with a few examples of the desired output, along with a prompt indicating the task that the model should perform. You can also read up on this looking for cues or priors. Intuitively this can be understood in guiding the model explicitly.
The idea is to add prior knowledge through the input. You are touching x.
Prefix-tuning involves providing the model with a few examples of text inputs, along with a prefix that indicates the task that the model should perform. In my understanding this is basically prompt tuning but focusses on the specifics of natural language processing.
The idea is to add prior knowledge through the input. You are touching x.
In their paper on OpenAI Codex they explain how they did fine-tune and adapt their GPT model to the GitHub Data they use for copilot. Read it here.
And this is an open source project which tries to replicate OpenAI Codex - gets pretty close to what you are trying to do, if I understood your comment correctly.

These are alternatives to fine-tuning model. They are essentially solutions that reside between few-shot learning and complete fine-tuning of models.
The other answer in this SO post is completely wrong. Fine-tuning has nothing to do with neither prompt tuning nor prefix tuning. These two are completely different techniques than fine-tuning.
Correct reference to prompt tuning and prefix tuning are given below:
Prompt Tuning: For prompt tuning k learnable parameter i.e. continuous token embeddings is appended to the input. But the entire pre-trained language model is frozen.
Prefix Tuning: For k positions prepended to the input, concatenate additional learnable weights for keys and values at every attention layer. Different to prompt tuning (only learnable input vectors).
Papers that introduced these techniques are given below:
Prompt Tuning: https://aclanthology.org/2021.emnlp-main.243/
Prefix-Tuning: https://arxiv.org/abs/2101.00190

Related

What orders of hyperparameter tuning [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 1 year ago.
Improve this question
I have using Neural Network for a classification problem and I am now at the point to tune all the hyperparameters.
For now, I saw many different hyperparameters that I have to tune :
Learning rate
batch-size
number of iterations (epoch)
For now, my tuning is quite "manual" and I am not sure I am not doing everything in a proper way. Is there a special order to tune the parameters? E.g learning rate first, then batch size, then ... I am not sure that all these parameters are independent. Which ones are clearly independent and which ones are clearly not independent? Should we then tune them together? Is there any paper or article which talks about properly tuning all the parameters in a special order?
There is even more than that! E.g. the number of layers, the number of neurons per layer, which optimizer to chose, etc...
So the real work in training a neural network is actually finding the best-suited parameters.
I would say there is no clear guideline because training a machine learning algorithm, in general, is always task-specific.
You see, there are many hyperparameters to tune, and you won't have time to try out every combination of each. For many hyperparameters, you will build somewhat of intuition on what a good choice would be, but for now, a great starting point is always using what has been proven by others to work. So if you find a paper on the same or similar task you could try to use the same or similar parameters as them too.
Just to share with you some small experiences I've made:
I rarely vary the learning rate. I mostly choose the Adam optimizer and stick with it.
The batch size I try to choose as big as possible without running out of memory
number of iterations you could just set to e.g. 1000. You can always look at the current loss and decide for yourself if you can stop when the net e.g. isn't learning anymore.
Keep in mind these are in no way rules or strict guidelines. Just some ideas until you've got a better intuition yourself. The more papers you've read and more nets you've trained you will understand what to chose when better.
Hope this serves a good starting point at least.

Fine-tuning GPT-2/3 on new data [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 1 year ago.
Improve this question
I'm trying to wrap my head around training OpenAI's language models on new data sets. Is there anyone here with experience in that regard?
My idea is to feed either GPT-2 or 3 (I do not have API access to 3 though) with a textbook, train it on it and be able to "discuss" the content of the book with the language model afterwards. I don't think I'd have to change any of the hyperparameters, I just need more data in the model.
Is it possible??
Thanks a lot for any (also conceptual) help!
Presently GPT-3 has no way to be finetuned as we can do with GPT-2, or GPT-Neo / Neo-X. This is because the model is kept on their server and requests has to be made via API. A Hackernews post says that finetuning GPT-3 is planned or in process of construction.
Having said that, OpenAI's GPT-3 provide Answer API which you could provide with context documents (up to 200 files/1GB). The API could then be used as a way for discussion with it.
EDIT:
Open AI has recently introduced Fine Tuning beta.
https://beta.openai.com/docs/guides/fine-tuning
Thus it will be best answer to the question to follow through description on that link.
You can definitely retrain GPT-2. Are you only looking to train it for language generation purposes or do you have a specific downstream task you would like to adapt the GPT-2?
Both these tasks are possible and not too difficult. If you want to train the model for language generation i.e have it generate text on a particular topic, you can train the model exactly as it was trained during the pre-training phase. This means training it on a next-token prediction task with a cross-entropy loss function. As long as you have a dataset, and decent compute power, this is not too hard to implement.
When you say, 'discuss' the content of the book, it seems to me that you are looking for a dialogue model/chatbot. Chatbots are trained in a different way and if you are indeed looking for a dialogue model, you can look at DialoGPT and other models. They can be trained to become task-oriented dialog agents.

Using Reinforcement Learning for Classfication Problems [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
Improve this question
Can I use reinforcement learning on classification? Such as human activity recognition? And how?
There are two types of feedback. One is evaluative that is used in reinforcement learning method and second is instructive that is used in supervised learning mostly used for classification problems.
When supervised learning is used, the weights of the neural network are adjusted based on the information of the correct labels provided in the training dataset. So, on selecting a wrong class, the loss increases and weights are adjusted, so that for the input of that kind, this wrong class is not chosen again.
However, in reinforcement learning, the system explores all the possible actions, class labels for various inputs in this case and by evaluating the reward it decides what is right and what is wrong. It may be the case too that until it gets the correct class label it may be giving wrong class name as it is the best possible output it has found till now. So, it doesn't make use of the specific knowledge we have about the class labels, hence slows the convergence rate significantly as compared to supervised learning.
You can use reinforcement learning for classification problems but it won't be giving you any added benefit and instead slow down your convergence rate.
Short answer: Yes.
Detailed answer: yes but it's an overkill. Reinforcement learning is useful when you don't have labeled dataset to learn the correct policy, so you need to develop correct strategy based on the rewards. This also allows to backpropagate through non-differentiable blocks (which I suppose is not your case). The biggest drawback of reinforcement learning methods is that thay are typically took a VERY large amount of time to converge. So, if you possess labels, it would be a LOT more faster and easier to use regular supervised learning.
You may be able to develop an RL model that chooses which classifier to use. The gt labels being used to train the classifiers and the change in performance of those classifiers being the reward for the RL model. As others have said, it would probably take a very long time to converge, if it ever does. This idea may also require many tricks and tweaks to make it work. I would recommend searching for research papers on this topic.

Fuzzy logic vs AI vs Machine learning vs Deep learning [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
How do these four subjects differ from one another? From what I understand, they learn from numerous input data and output an estimated output. My understanding is very lacking thus me questioning these. It made no sense to me about the examples given by people such as the spam email, apple orange cat dog identification, neural network examples.
Is there a better representation of these four subjects in a more simpler example with coding to show the concept? I really appreciate that a lot.
Links to example you think that is very simple with code is more than welcome. I need something relate-able to get the code writing concept better.
Many thanks!
Fuzzy logic is a form of many-valued logic in which the truth values of variables may be any real number between 0 and 1. By contrast, in Boolean logic, the truth values of variables may only be the integer values 0 or 1. Fuzzy logic has been employed to handle the concept of partial truth, where the truth value may range between completely true and completely false.1 Furthermore, when linguistic variables are used, these degrees may be managed by specific (membership) functions.
The field of AI research defines itself as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of success at some goal. Colloquially, the term "artificial intelligence" is applied when a machine mimics "cognitive" functions that humans associate with other human minds, such as "learning" and "problem solving" (known as Machine Learning).
Machine Learning by Tom Mitchell:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
Deep learning is machine learning with deep neural networks.
Hence: AI is a superset of Machine Learning. Machine Learning is a superset of Deep learning. AI includes fuzzy logic:
Resources
Fuzzy logic lecture notes (German)
Computerphile: Fuzzy logic
IEEE: Fuzzy logic
Tom Mitchel: Machine Learning
Michael Nielsen: Deep Learning

Neural network: should the algorithm be rewritten for every case? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I have 2 sequences of numbers and I'd want to continue it using neural algorithms (there is some logic in them, but I don't know what, and there are no external factors affecting the selection). There are some relationship is in each of the two sequences separately, as well as between them.
So, I'm new to machine learning, but I've got such an idea: is there any already written-and-well-working applications (libraries) that implement exact algorithms for me not to learn them all before using. Simply like "most-frequently-used-neural-algorithms-kit".
I'm thinking of analysing some music sheets and two sequences: "notes" and "durations".
OK, according to the comments I think I got what you want.
Generally, no, you don't need to rewrite the standard algorithm of ANN. But be aware that ANN is not an algorithm, but a cluster of algorithms (including BackPropagation-ANN, Hopfield-ANN, Boltzmann Machine etc). Among them I recommend BP-ANN which is simple and suitable for your project. You might want to input a sequences of the known notes and duration, and then expect an output of the next note and duration.
To use BP-ANN, you don't need to rewrite them. Due to its a widely-used algorithm, there are many toolkits and open source implementations of it:
Google "back propagation neural network implementation", you will find it easily. There are also a few opensource projects on Github(in both C language and Matlab): https://github.com/search?q=back+propagation&type=Everything&repo=&langOverride=&start_value=1
For further reading if you also want to deeply understand the details of its implementation, read this: http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1279&context=ecetr&sei-redir=1
If you're interested in neural networks there are plenty of libraries available.
ANNIE is one such open source example, the MATLAB Neural Network toolbox is a
commercial example. These are libraries which you tell the architecture of the
neural network, you can train, test, verify, etc. The important part in all
these machine learning methods is how you represent your data, and those were
the comments you were getting (for example Predictor's). Sometimes you get
excellent results with one representation and very bad results with others.
There are also libraries to train SVMs (a specialized algorithm to train neural
networks) with quadratic regularization, LIBSVM is one great example.
There is also plenty of work on predicting time series with neural networks (if
that is what you want to do with music, I am not sure what exactly you want).
If the input is a series of (note, duration) pairs, then I suspect you'd get much farther by summarizing the historical note-to-note transitions or by something similar in an effort to capture the syntax of the music (Markov analysis, etc.), than you would by stuffing this into a neural network. It may help, too, to try representing the series as note differentials, measuring how many notes up or down the scale the new note is, rather than the actual value of the note itself.

Resources