My question is theoretical rather than technical, so I am not posting my code here; the code is available on the sklearn website itself.
I constructed a classifier and am cross-validating it with 5 folds using sklearn. In each fold, my code calculates various accuracy parameters such as sensitivity, specificity, and F1-score. After the 5-fold cross-validation completes, I average all the accuracy parameters across the folds.
Finally, my script creates a ROC curve along with the AUC score and histograms for the other accuracy parameters, and generates an HTML report file.
Cross-validation amounts to internal testing, but my confusion starts when I want to use an external test data set.
My question is: how should I predict on an external data set? Which of the methods below is correct?
After cross-validation, save a model that has the averaged parameters from each and every fold, use this model to predict the external test set, and calculate the assessment report. If this is the correct way, how can I do it? Can you show me example code that saves a model after n-fold cross-validation?
Build the model using the entire data set, save the model, predict the external test set, and calculate the assessment report. If this is the correct way, thank you, I already know the code.
If there is any other method that I missed, please share it.
Thank you.
The correct approach is:
Build the model using the entire data set, save the model to predict the external test set, and calculate the assessment report.
The reason is that we use cross-validation only to measure the performance of the model and its hyperparameters. We do that by using every fold as the test fold exactly once, which means every data point gets a fair chance of being a test point.
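A minimal sketch of that workflow, assuming scikit-learn and joblib, a placeholder LogisticRegression classifier, and synthetic data standing in for your internal and external sets: cross-validation is used only to estimate performance, and the model that gets saved is the one refit on all of the internal data.

```python
# Minimal sketch; the classifier and the synthetic data below are placeholders.
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Placeholder data: pretend the first part is your internal set, the rest is external.
X, y = make_classification(n_samples=1000, random_state=0)
X_internal, X_external, y_internal, y_external = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000)

# 1. 5-fold cross-validation on the internal data, purely to estimate performance.
scores = cross_val_score(clf, X_internal, y_internal, cv=5, scoring="roc_auc")
print("CV AUC: %.3f +/- %.3f" % (scores.mean(), scores.std()))

# 2. Refit on ALL internal data and save that single model
#    (there is no meaningful "averaged" model across folds to save).
clf.fit(X_internal, y_internal)
joblib.dump(clf, "final_model.joblib")

# 3. Load the saved model later and evaluate once on the external test set.
final_model = joblib.load("final_model.joblib")
print("external AUC:", roc_auc_score(y_external,
                                     final_model.predict_proba(X_external)[:, 1]))
```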
Can we have separate data sets for training and testing?
I'm working on a project to pick out effective test cases.
As part of this, I analyse the bug database, come up with triggers that have yielded bugs, and arrive at a model. So this bug database forms my training set.
The test cases that I have written are my test data, and I have to supply this test data to the model so it can say whether a test case is effective or not.
So in this case, instead of splitting one dataset into training and test data, I have to have two different data sets: training data (from the bug database) and test data (the manually written test cases).
Is this something doable using machine learning? Kindly let me know.
Yes, the training dataset and testing dataset can be separate files. In real-world cases, the testing data is generally a separate, unseen dataset.
The main principle to follow is that when training the model, a dataset must be kept separate (a hold-out set) for testing. This data can be provided in separate files or databases, or even generated using splits. This is done to avoid data leakage (where testing data is somehow used to train the model).
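As a hedged sketch of the separate-files setup (the file names, the pandas/scikit-learn stack, and the "label" column are assumptions, not part of the question):

```python
# Minimal sketch: train on one file, evaluate on a completely separate test file.
# "bug_database_features.csv", "test_case_features.csv" and the "label" column are placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

train_df = pd.read_csv("bug_database_features.csv")   # training data (e.g. from the bug database)
test_df = pd.read_csv("test_case_features.csv")       # separate, unseen test data

X_train, y_train = train_df.drop(columns="label"), train_df["label"]
X_test, y_test = test_df.drop(columns="label"), test_df["label"]

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```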
I'm trying to wrap my head around training OpenAI's language models on new data sets. Is there anyone here with experience in that regard?
My idea is to feed either GPT-2 or GPT-3 (I do not have API access to GPT-3, though) a textbook, train it on it, and be able to "discuss" the content of the book with the language model afterwards. I don't think I'd have to change any of the hyperparameters; I just need more data in the model.
Is it possible??
Thanks a lot for any (also conceptual) help!
Presently, GPT-3 cannot be fine-tuned the way GPT-2 or GPT-Neo / NeoX can, because the model is kept on OpenAI's servers and requests have to be made via the API. A Hacker News post says that fine-tuning for GPT-3 is planned or under construction.
Having said that, OpenAI's GPT-3 provides an Answers API to which you can supply context documents (up to 200 files / 1 GB). The API can then be used as a way to hold a discussion with the model.
EDIT:
OpenAI has recently introduced a fine-tuning beta:
https://beta.openai.com/docs/guides/fine-tuning
The best answer to the question is therefore to follow the documentation at that link.
You can definitely retrain GPT-2. Are you only looking to train it for language generation purposes, or do you have a specific downstream task you would like to adapt GPT-2 to?
Both of these tasks are possible and not too difficult. If you want to train the model for language generation, i.e. have it generate text on a particular topic, you can train the model exactly as it was trained during the pre-training phase. This means training it on a next-token prediction task with a cross-entropy loss function. As long as you have a dataset and decent compute power, this is not too hard to implement, as in the sketch below.
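A minimal sketch of that pre-training-style objective, assuming the Hugging Face transformers and datasets libraries (not mentioned above) and a placeholder plain-text file book.txt:

```python
# Minimal sketch: fine-tune GPT-2 on a plain-text file with the causal LM objective.
# The libraries and "book.txt" are assumptions, not part of the original answer.
from datasets import load_dataset
from transformers import (DataCollatorForLanguageModeling, GPT2LMHeadModel,
                          GPT2TokenizerFast, Trainer, TrainingArguments)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Load the raw text and tokenize it into chunks of at most 512 tokens.
raw = load_dataset("text", data_files={"train": "book.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw["train"].map(tokenize, batched=True, remove_columns=["text"])
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 0)   # drop empty lines

# mlm=False gives the causal (next-token prediction) objective with cross-entropy loss.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(output_dir="gpt2-book",
                         num_train_epochs=3,
                         per_device_train_batch_size=2)

trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized, data_collator=collator)
trainer.train()
trainer.save_model("gpt2-book")
```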
When you say "discuss" the content of the book, it seems to me that you are looking for a dialogue model/chatbot. Chatbots are trained in a different way, and if you are indeed looking for a dialogue model, you can look at DialoGPT and similar models. They can be trained to become task-oriented dialogue agents.
I am working on modeling a dataset for object detection. I am relatively new to deep learning and am having a hard time extending the idea of cross-validation to this context. Training time is usually huge for deep networks, so k-fold CV is not a reasonable approach; 1-fold cross-validation (a single fixed split) probably makes more sense, and I have seen people use this in practice. I am trying to justify this choice by thinking about the purpose of cross-validation: hyper-parameter tuning, and quantifying when the model starts to over-fit. My questions are the following:
What about the random sampling error with a 1-fold CV? My thought is that with k-fold CV this error is averaged out when k > 1. Also, with k = 1, hyper-parameter tuning doesn't seem reasonable to me: the values we end up with can be coupled to the (random) sample we happened to call the validation set. So what's the point of a 1-fold CV?
The data I am working with is already scarce: around ~4k images, 2 categories (object + background), and bounding boxes for each image. It's common wisdom that deep networks learn better with more data, so why would I want to shrink my training set by keeping aside a validation set in this context? I don't see any clear advantage; on the contrary, it seems like using the entire dataset for training could lead to a better object detection model. If that is true, how would one know when to stop, i.e. I could keep training without any feedback on whether the model has started overfitting?
How are production models deployed? I have never thought much about this while taking courses; the approach there was always a clear train/validation/test split. In real settings, how do you leverage the entire data to create a production model? (Probably connected to #2, i.e. practical aspects like how long to train, etc.)
Public computer vision datasets for object detection are usually large enough that this isn't an issue. How much of an issue it is in your scenario is shown by the gap in performance between the validation and test sets. Cross-validation with k = 1 essentially means having a fixed validation set.
You want to keep the validation set in order to tune the parameters of your model. Increasing the number of weights will surely increase performance on the training set, but you want to check how this behaves on unseen data, e.g. the validation set. That said, many people tune parameters according to performance on the validation set and then do one more training run where they combine the training and validation data before finally testing on the test set.
I think this is already answered in 2. You can extend this by training on all three sets, but whatever performance you achieve on them will not be representative. The number of epochs/iterations you want to train for should therefore be decided before merging the data.
You have to decide what you want to optimize for. Most papers optimize for performance on the test set, which is why it should never be used for training or for validating parameter choices. In practice you might often prefer a "better" model obtained by including the validation and test data in training, but you will never know how much "better" this model is until you find another test set. You are also risking that something "strange" happens when you include the test data; you are essentially training with your eyes closed.
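A minimal sketch of the tune-then-retrain workflow described above, assuming scikit-learn, synthetic placeholder data, and a GradientBoostingClassifier standing in for the detector: tune one setting against the single fixed validation set, retrain on train + validation with that setting, and touch the test set exactly once.

```python
# Minimal sketch; the data and the classifier are placeholders for your detection setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder data standing in for your full labelled dataset.
X, y = make_classification(n_samples=2000, random_state=0)

# 60 / 20 / 20 split into train, validation and test.
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval,
                                                  test_size=0.25, random_state=0)

# Tune one setting (here: number of estimators) against the single fixed validation set.
best_n, best_acc = None, -1.0
for n in (50, 100, 200):
    model = GradientBoostingClassifier(n_estimators=n, random_state=0).fit(X_train, y_train)
    acc = accuracy_score(y_val, model.predict(X_val))
    if acc > best_acc:
        best_n, best_acc = n, acc

# Retrain on train + validation with the chosen setting, then evaluate on the test set once.
final = GradientBoostingClassifier(n_estimators=best_n, random_state=0).fit(X_trainval, y_trainval)
print("chosen n_estimators:", best_n,
      "test accuracy:", accuracy_score(y_test, final.predict(X_test)))
```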
I have an ML model that takes X seconds to detect an object in an image it was trained on. Does that mean it took at least X, or X + Y, seconds per image during training? Can you provide a detailed insight?
For instance, assume the training speed of an SSD512 model is 30 images per second on a given hardware platform. Does this imply that I will be able to achieve an inference speed of at least (if not more than) 30 images per second?
The question is not confined to neural network models; a generic insight is appreciated. In my case I am dealing with cascade classifiers: I load a trained cascade.xml model to detect an object, and I want to know the relation between the time taken per image during training and the time taken to detect an object after loading the trained model.
Since it is not stated, I assume here that you mean a neural network ML model.
The training process could be seen as two steps: running the network to detect the object and updating the weights to minimize the loss function.
Running the network: during training, before the backpropagation part, you essentially run the network forward as if you were detecting the object with the current network weights; this takes the X time you stated. It should take the same as when the network is used after training, for example on the test dataset (to keep things simple I am ignoring the mini-batch learning usually used, which might change things).
Updating the weights: this part of training is done by completing the backpropagation algorithm, which tells you how changing the weights will affect your detection performance (i.e. lower the loss function for the current image); then, usually, a stochastic gradient descent step is performed to update the weights. This is the Y you stated, which in fact can be bigger than X.
These two parts are done for every image (more commonly, for every mini-batch) in the training process.
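A minimal sketch of that comparison, assuming PyTorch, a toy CNN, and a random image (none of which appear in the question): the inference time corresponds to X, and the full training step (forward pass, backpropagation, weight update) to X + Y.

```python
# Minimal sketch: time a forward-only pass ("X") vs. a full training step ("X + Y").
# The model, image and label are placeholders.
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

image = torch.randn(1, 3, 224, 224)   # one random "image"
label = torch.tensor([1])

# Inference: forward pass only.
model.eval()
with torch.no_grad():
    start = time.perf_counter()
    _ = model(image)
    inference_time = time.perf_counter() - start

# Training step: forward pass + backpropagation + weight update.
model.train()
start = time.perf_counter()
loss = criterion(model(image), label)
optimizer.zero_grad()
loss.backward()
optimizer.step()
train_step_time = time.perf_counter() - start

print(f"inference (X): {inference_time:.4f}s, training step (X + Y): {train_step_time:.4f}s")
```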
UPDATE: You said in your response that you are looking for an answer for a generic algorithm. It is an interesting question! Looking at the training task, you always need to learn some kind of weights W that are the outcome of the training process and are the essence of what was learned. The update needs to make the learned function better, which basically sounds harder than simply running the function. I don't know of any algorithm (certainly not a commonly used one) that takes less time per image to train than to run, but it might be theoretically possible.
I want to find the definition of the "flexibility" of a method in machine learning, for methods such as Lasso, SVM, and least squares.
The book contains a representation of the trade-off between flexibility and interpretability, and I also think flexibility should be a concrete numerical quantity.
Because of my reputation, I cannot upload the pictures. If you want the details, see An Introduction to Statistical Learning, pages 25 and 31.
Thank you.
You can think of the "flexibility" of a model as the model's "curviness" when graphing the model equation. A linear regression is said to be inflexible. On the other hand, if you have 9 training sets that are each very different and the decision boundary has to bend to fit them, the model will be deemed flexible, simply because it can't be a straight line.
Of course, there's an essential assumption that these models are adequate representations of the training data (a linear representation doesn't work well for highly spread out data, and a jagged multinomial representation doesn't work well with straight lines).
As a result, a flexible model will:
Generalize well across the different training sets
Come at the cost of higher variance, which is why flexible models are generally associated with low bias
Perform better as complexity and/or the number of data points increases (up to a point, after which it stops improving)
There is no rigorous definition of a method's flexibility. The aforementioned book says:
We can try to address this problem by choosing flexible models that can fit many different possible functional forms for f.
In that sense, least squares is less flexible since it is a linear model. Kernel SVM, on the contrary, doesn't have such a limitation and can model fancy non-linear functions.
Flexibility isn't measured in numbers; the picture in the book shows relational data only, not actual points on a 2D plane.
Flexibility describes the ability to increase the degrees of freedom available to the model to "fit" to the training data.
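As a hedged illustration of the last two answers (assuming scikit-learn and synthetic data, which are not part of the question): an inflexible linear least-squares fit versus a flexible RBF-kernel SVM on a non-linear target.

```python
# Minimal sketch: compare an inflexible linear fit with a flexible RBF-kernel SVM
# on data generated from a non-linear function (all data here is synthetic).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)   # non-linear ground truth plus noise

linear = LinearRegression().fit(X, y)          # inflexible: a straight line
svm = SVR(kernel="rbf", C=10).fit(X, y)        # flexible: can bend to the data

print("linear train MSE: ", mean_squared_error(y, linear.predict(X)))
print("RBF-SVM train MSE:", mean_squared_error(y, svm.predict(X)))
# The more flexible model achieves a much lower training error here,
# but flexibility also raises variance, so it can overfit on other data sets.
```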