Fine-tuning GPT-2/3 on new data [closed] - machine-learning

I'm trying to wrap my head around training OpenAI's language models on new data sets. Is there anyone here with experience in that regard?
My idea is to feed either GPT-2 or GPT-3 (though I do not have API access to GPT-3) a textbook, train it on that text, and then be able to "discuss" the content of the book with the language model afterwards. I don't think I'd have to change any of the hyperparameters; I just need to get more data into the model.
Is it possible?
Thanks a lot for any (also conceptual) help!

Presently GPT-3 cannot be fine-tuned the way GPT-2 or GPT-Neo / NeoX can, because the model is kept on OpenAI's servers and requests have to be made via the API. A Hacker News post says that fine-tuning for GPT-3 is planned or under construction.
Having said that, OpenAI's GPT-3 provides an Answers API to which you can supply context documents (up to 200 files / 1 GB). That API could then be used as a way to have a discussion with the model.
EDIT:
OpenAI has recently introduced a fine-tuning beta:
https://beta.openai.com/docs/guides/fine-tuning
so the best answer to this question is now to follow the guide at that link.
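A rough sketch of the flow described in that guide, assuming the pre-1.0 `openai` Python package; the file name, prompt/completion format, and base model here are illustrative, and the guide should be treated as authoritative if names have changed:

```python
# Sketch of the fine-tuning beta flow; names follow the linked guide at the
# time of writing and may have changed since.
import openai

openai.api_key = "sk-..."  # your API key

# 1. Upload a JSONL file of {"prompt": ..., "completion": ...} pairs
#    (e.g. question/answer pairs derived from the textbook).
training_file = openai.File.create(
    file=open("book_qa_pairs.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Start a fine-tuning job on one of the base models.
job = openai.FineTune.create(
    training_file=training_file["id"],
    model="curie",
)

# 3. The fine-tuned model name is only populated once the job finishes;
#    poll with openai.FineTune.retrieve(job["id"]) until it is done, then:
finished = openai.FineTune.retrieve(job["id"])
response = openai.Completion.create(
    model=finished["fine_tuned_model"],
    prompt="What does chapter 3 say about X?",
    max_tokens=100,
)
print(response["choices"][0]["text"])
```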

You can definitely retrain GPT-2. Are you only looking to train it for language-generation purposes, or do you have a specific downstream task you would like to adapt GPT-2 to?
Both of these are possible and not too difficult. If you want to train the model for language generation, i.e. have it generate text on a particular topic, you can train it exactly as it was trained during the pre-training phase: a next-token prediction task with a cross-entropy loss function. As long as you have a dataset and decent compute power, this is not too hard to implement.
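A minimal sketch of that pre-training-style fine-tuning, assuming the Hugging Face `transformers` and `datasets` packages and a plain-text file of the book (the file name and hyperparameters are placeholders):

```python
# Fine-tune GPT-2 on a plain-text book with next-token prediction
# (causal language modelling, cross-entropy loss).
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling, GPT2LMHeadModel, GPT2TokenizerFast,
    Trainer, TrainingArguments,
)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Load the book as a line-based text dataset and tokenize it.
dataset = load_dataset("text", data_files={"train": "book.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False -> causal (next-token) language modelling.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="gpt2-book",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    save_steps=500,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
```

After training, `model.generate` can be used to sample text in the style and vocabulary of the book.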
When you say "discuss" the content of the book, it sounds like you are looking for a dialogue model/chatbot. Chatbots are trained in a different way, and if you are indeed looking for a dialogue model, you can look at DialoGPT and similar models; they can be trained to become task-oriented dialogue agents.

Related

What are the differences between adapter tuning and prefix tuning? [closed]

I am trying to understand the concept of adapter-tuning, prompt-tuning, and prefix-tuning in the context of few-shot learning.
It appears to me that I can apply prompt tuning to a black box language model.
I read that for prompt tuning the entire pre-trained language model is frozen. If that's the case, prompt tuning could be applied to an OpenAI model like GPT-3 or Codex.
How could I do prompt tuning with OpenAI Codex? I haven't found a way so far.
How are these techniques different from the in-context examples provided in few-shot learning?
Can anyone please guide me in the correct direction?
In my understanding, all three concepts mentioned are based on a pre-trained model, so in general they should work with the GPT model underlying OpenAI Codex.
Adapter tuning involves adding small, task-specific "adapter" modules to the pre-trained model, which can be trained on a few examples to improve performance on the specific task. In my opinion this is especially interesting if you want to do task adaptation.
The idea is to extend the model horizontally with additional layers. You are touching the model parameters theta.
Prompt tuning involves providing the model with a few examples of the desired output, along with a prompt indicating the task the model should perform. You can also read up on this under the terms cues or priors. Intuitively, it can be understood as explicitly guiding the model.
The idea is to add prior knowledge through the input. You are touching the input x.
Prefix tuning involves providing the model with a few examples of text inputs, along with a prefix that indicates the task the model should perform. In my understanding this is basically prompt tuning, but focused on the specifics of natural language processing.
The idea is to add prior knowledge through the input. You are touching the input x.
In their paper on OpenAI Codex they explain how they fine-tuned and adapted their GPT model to the GitHub data used for Copilot. Read it here.
And this is an open-source project which tries to replicate OpenAI Codex; it gets pretty close to what you are trying to do, if I understood your comment correctly.
These are alternatives to fine-tuning a model. They are essentially solutions that sit between few-shot learning and full fine-tuning.
The other answer in this SO post is completely wrong. Fine-tuning has nothing to do with either prompt tuning or prefix tuning; these are completely different techniques from fine-tuning.
Correct descriptions of prompt tuning and prefix tuning are given below:
Prompt tuning: k learnable parameters, i.e. continuous token embeddings, are appended to the input, while the entire pre-trained language model is kept frozen.
Prefix tuning: for k positions prepended to the input, additional learnable weights for keys and values are concatenated at every attention layer, unlike prompt tuning, which only learns input vectors.
The papers that introduced these techniques are given below:
Prompt Tuning: https://aclanthology.org/2021.emnlp-main.243/
Prefix-Tuning: https://arxiv.org/abs/2101.00190
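To make the prompt-tuning definition above concrete, here is a minimal sketch of the idea: k learnable "soft prompt" embeddings are prepended to the input embeddings while the backbone stays frozen. It assumes PyTorch and a GPT-2 backbone from Hugging Face `transformers` as a stand-in, since Codex itself is only reachable through the API:

```python
# Prompt tuning sketch: only `soft_prompt` is trained; the LM is frozen.
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad = False                    # freeze the whole pre-trained LM

k = 20                                         # number of soft-prompt tokens
embed_dim = model.config.n_embd
soft_prompt = nn.Parameter(torch.randn(k, embed_dim) * 0.02)   # the only trainable weights

def forward_with_prompt(input_ids, labels):
    token_embeds = model.transformer.wte(input_ids)            # (batch, seq, dim)
    batch_size = input_ids.size(0)
    prompt = soft_prompt.unsqueeze(0).expand(batch_size, -1, -1)
    inputs_embeds = torch.cat([prompt, token_embeds], dim=1)
    # Ignore the loss on the prompt positions (-100 is ignored by the loss).
    prompt_labels = torch.full((batch_size, k), -100, dtype=labels.dtype)
    return model(inputs_embeds=inputs_embeds,
                 labels=torch.cat([prompt_labels, labels], dim=1))

optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)
batch = tokenizer(["translate to French: cheese"], return_tensors="pt")
out = forward_with_prompt(batch["input_ids"], batch["input_ids"].clone())
out.loss.backward()
optimizer.step()
```

Because the OpenAI-hosted models only expose text in and text out through the API, this kind of gradient-based soft-prompt training is not possible against Codex; there you are limited to in-context (discrete) prompting.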

In what order should hyperparameters be tuned? [closed]

I am using a neural network for a classification problem and I am now at the point of tuning all the hyperparameters.
For now, I have seen many different hyperparameters that I have to tune:
Learning rate
Batch size
Number of iterations (epochs)
For now, my tuning is quite "manual" and I am not sure I am doing everything in a proper way. Is there a particular order in which to tune the parameters, e.g. learning rate first, then batch size, then ...? I am not sure that all these parameters are independent. Which ones are clearly independent and which ones are clearly not? Should the dependent ones be tuned together? Is there any paper or article which talks about properly tuning all the parameters in a particular order?
There is even more than that! E.g. the number of layers, the number of neurons per layer, which optimizer to choose, etc.
So the real work in training a neural network is actually finding the best-suited parameters.
I would say there is no clear guideline, because training a machine learning algorithm is, in general, always task-specific.
You see, there are many hyperparameters to tune, and you won't have time to try out every combination of them. For many hyperparameters you will build up some intuition about what a good choice would be, but for now a great starting point is always to use what has been shown by others to work. So if you find a paper on the same or a similar task, you can try to use the same or similar parameters too.
Just to share with you some small experiences I've made:
I rarely vary the learning rate. I mostly choose the Adam optimizer and stick with it.
I try to choose the batch size as big as possible without running out of memory.
The number of iterations you can just set to e.g. 1000; you can always look at the current loss and decide for yourself to stop once the net isn't learning anymore.
Keep in mind these are in no way rules or strict guidelines, just some ideas until you've built up a better intuition yourself. The more papers you read and the more nets you train, the better you will understand what to choose when.
Hope this serves as a good starting point at least. If you want to go beyond purely manual tuning, a simple random search that samples several hyperparameters jointly is sketched below.
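A small random-search sketch that tunes learning rate and batch size together rather than in a fixed order; it assumes Keras, and the model and data are placeholders to be replaced with your own classification problem:

```python
# Random search over learning rate and batch size for a small Keras classifier.
import random
import numpy as np
from tensorflow import keras

def build_model(learning_rate):
    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Dummy data standing in for your classification problem.
X = np.random.rand(1000, 20)
y = (X.sum(axis=1) > 10).astype(int)

best = None
for _ in range(10):                                   # 10 random trials
    lr = 10 ** random.uniform(-4, -2)                 # sample the learning rate on a log scale
    batch_size = random.choice([16, 32, 64, 128])
    model = build_model(lr)
    history = model.fit(X, y, epochs=20, batch_size=batch_size,
                        validation_split=0.2, verbose=0)
    val_acc = max(history.history["val_accuracy"])
    if best is None or val_acc > best[0]:
        best = (val_acc, lr, batch_size)

print("best val_accuracy %.3f with lr=%.5f, batch_size=%d" % best)
```

Because each trial samples both values at once, you never have to decide which one to tune "first"; interacting hyperparameters are simply explored together.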

Use machine learning to analyze sports bets [closed]

Let's say I have a database with over 1 million bets (all kinds of sports) made by a couple thousand users over a period of 2 years (and still growing).
These data are just lying around doing nothing, so I wondered whether it would be possible to use something like https://www.tensorflow.org/, do a bit of tinkering, and have it analyze all the bets in the database and learn some patterns from them: what's good and what's not.
The point is that we don't have the resources to employ dozens of people for who knows how long to write some complicated software from the ground up. So I was thinking we could use some module from TensorFlow and go from there.
I would then feed the network the open bets currently in the system (bets on matches that are about to be played) and it would pick what I should bet on, for example: there is a 90% chance this bet will win, because 10 very successful players made this bet and they have a very high success rate when betting on this particular sport.
We have lots of experienced users and they make lots of money from betting. So the system could be trained on the data we have and would then know, for example, that if user A bets on this league/team, it is very likely he will win.
The question is, where do we go from here? Can anybody point us in the right direction? Or is this just too difficult to do for 2 people in a few months? Can we use some pre-built solution, like TensorFlow?
Without having a look at the data it is impossible to suggest which direction you should take your next steps, but in any case your first step should be to explore your data thoroughly, create a model on a small subset of the data, and test your hypothesis.
Overall you can try to:
Use Python or R to load and clean the data.
Take a random subset of the data (some 10,000 rows) and create a simple model using an SVM or a random forest; this looks like a Win/Lose classification problem (see the sketch after this list).
Test your results and verify your hypothesis on some held-out data.
Explore your data to see if you can generate better features.
Design a small neural network first and only then think about a deep neural network using TensorFlow or Keras etc.
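A minimal sketch of that "small subset + simple classifier" step, assuming the bets have been exported to CSV files with a binary `won` column; the file and feature column names here are hypothetical placeholders:

```python
# Baseline Win/Lose classifier on a random subset of historical bets.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

bets = pd.read_csv("bets.csv").sample(10_000, random_state=0)   # random subset

features = ["user_win_rate", "odds", "stake", "sport_id"]        # hypothetical features
X = bets[features]
y = bets["won"]                                                  # 1 = bet won, 0 = bet lost

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# Estimated win probabilities for the currently open bets:
open_bets = pd.read_csv("open_bets.csv")
print(clf.predict_proba(open_bets[features])[:, 1])
```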
Have a look at this: https://hackernoon.com/how-to-create-your-own-machine-learning-predictive-system-in-the-nba-using-python-7189d964a371
Yes, this is possible but can be more difficult than it appears.
Consider Microsoft's Cortana, which (while only picking whether a game will be won outright, not against the spread) is only approximately 63% accurate; that is quite good, but not the 90% you mention in your question (ref. 1).
The size of your database should be great for ANN models. It would be a very interesting project for sure!
To your question "where do we go from here": my answer is to simply explore the data in RStudio or using a cloud service such as Microsoft's Azure ML Studio (ref. 2) or Amazon's Machine Learning services (ref. 3).
Good luck!
Ref. 1: http://www.businessinsider.com/nfl-picks-microsoft-cortana-elo-week-5-2017-10
Ref. 2: https://studio.azureml.net/
Ref. 3: https://aws.amazon.com/amazon-ai/

What type of neural network would work best for credit scoring? [closed]

Let me just start by saying I only took the undergrad AI class at school so I know just enough to be dangerous.
Here's the problem I'm looking to solve: accurate credit scoring is a key part of the success of my business. Currently we rely on a team of actuaries and statistical analysis to suss out patterns in the few dozen variables we track about each individual that indicate whether they may be a low or high credit risk. As I understand it, this is exactly the type of job that neural nets are great at: finding high-order relationships across many inputs that a human would likely never spot, and then rendering a decision or output that is on average more accurate than what a trained human could produce. In short, I want to be able to input your name, address, marital status, what car you drive, where you work, hair color, favorite food, etc., and get a credit score back.
My question is what type or architecture of neural network would be best for this particular problem. I've done a bit of research, and it seems I'm generating questions faster than I'm finding answers at this point. The best I've been able to come up with is some kind of generative deep neural network with multiple hidden layers, where each layer is able to abstract one level beyond the previous one. I'm assuming it's going to be feed-forward just because that seems to be the default. We have historical data on all previous customers, including the information we used to make the initial score as well as data on what type of credit risk they actually turned out to be. This would seem to lend itself to unsupervised learning. Where I'm lost is the number of layers, how the layers differ from each other, the size of each layer, the connectedness of the perceptrons, and so on. The more I dig, the more I get into research papers that are over my head, so I just need some smart person to point me in the right direction.
Does anyone have any ideas? Again, I don't need a thorough explanation, just a general area I should focus on.
This is supervised learning, since you have actual data that can be labelled. It's also feedforward, since you're not predicting a time series but assigning scores. Further, you should probably just prepare your data (assigning credit scores manually or with some rough heuristic) and start experimenting with some tools before you invest time into implementing state-of-the-art architectures. A multi-layer perceptron (MLP) with one hidden layer is a sufficient starting point for such a problem; a minimal sketch is given below. From there on, you can train the network to generalize the credit-assignment heuristic you began with.
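A minimal sketch of that starting point, assuming scikit-learn and a CSV of historical customers; the file and column names are hypothetical placeholders, and identifier columns such as name and address should be dropped rather than fed in:

```python
# One-hidden-layer MLP baseline for credit risk classification.
import pandas as pd
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

customers = pd.read_csv("customers.csv")                       # historical customers
X = pd.get_dummies(customers.drop(columns=["credit_risk"]))    # one-hot encode categoricals
y = customers["credit_risk"]                                   # known outcome, e.g. "low" / "high"

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),  # one hidden layer
)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```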
You should know that most "new" architectures you probably read about while researching deal with much more difficult problems than credit scoring (speech/image/character recognition and detection). There is a collection of papers on the credit scoring / risk classification scenario, so I'd recommend shifting your focus from architectures to actual case studies (see e.g. this paper). Just pick a recent paper with MLPs and apply their parameters. Start simple and improve the system incrementally (as @roganjosh stated).

Publicly Available Spam Filter Training Set [closed]

I'm new to machine learning, and for my first project I'd like to write a naive Bayes spam filter. I was wondering if there are any publicly available training sets of labeled spam/not spam emails, preferably in plain text and not a dump of a relational database (unless they pretty-print those?).
I know such a publicly available database exists for other kinds of text classification, specifically news article text. I just haven't been able to find the same sort of thing for emails.
Here is what I was looking for: http://untroubled.org/spam/
This archive has around a gigabyte of compressed spam messages accumulated from 1998 to 2011. Now I just need to get non-spam email, so I'll query my own Gmail for that using the getmail program and the tutorial at mattcutts.com.
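Once both the spam archive and your own ham are on disk, a minimal naive Bayes pipeline looks roughly like this; it assumes scikit-learn and two directories of plain-text messages, and the directory names are placeholders:

```python
# Naive Bayes spam filter: bag-of-words counts + multinomial NB.
from pathlib import Path
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

texts, labels = [], []
for label, folder in [(1, "spam"), (0, "ham")]:
    for path in Path(folder).glob("*.txt"):
        texts.append(path.read_text(errors="ignore"))
        labels.append(label)

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=0)

vectorizer = CountVectorizer(stop_words="english")
clf = MultinomialNB()
clf.fit(vectorizer.fit_transform(X_train), y_train)

pred = clf.predict(vectorizer.transform(X_test))
print("accuracy:", accuracy_score(y_test, pred))
```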
Sure, there's Spambase, which is, as far as I'm aware, the most widely cited spam data set in the machine learning literature.
I have used this data set many times; each time I am impressed by how much effort has been put into the formatting and documentation of this data set.
A few characteristics of the Spambase set:
4601 data points, all complete
each comprised of 58 features (attributes)
each data point is labelled 'spam' or 'no spam'
approx. 40% are labeled spam
of the features, all are continuous (vs. discrete)
a representative feature: average continuous sequence of capital letters
Spambase is archived in the UCI Machine Learning Repository; in addition, it's also available on the website for the excellent ML/statistical-computation treatise The Elements of Statistical Learning by Hastie et al.
SpamAssassin has a public corpus of both spam and non-spam messages, although it hasn't been updated in a few years. Read the readme.html file to learn what's there.
You might consider taking a look at the TREC spam/ham corpus (which I think is the collection of emails from Enron that was made public during the court case). TREC generally runs a bunch of competitive text-processing tasks, so it might give you some references for comparison.
The downside is that the messages are stored in raw mbox format, though there are parsers available in many languages (Apache Tika is a good example).
The webpage isn't TREC, but this seems to be a good overview of the task with links to the data: http://plg.uwaterloo.ca/~gvcormac/spam/
A more modern spam training set can be found on Kaggle. Moreover, you can test the accuracy of your classifier on their website by uploading your results.
I also have an answer: here you can find a daily refreshed Bayesian database for initial training, and also a daily created archive containing captured spam. You will find instructions on how to use it on the site.
