Understanding hypothesis testing and further actions - machine learning

I have a few basic doubts regarding hypothesis testing.
I know hypothesis testing is a statistical procedure for checking whether what holds for a sample of data also holds for the entire population, for example whether a random sample's mean is the same as the population mean. Here, we try to accept or reject the null hypothesis using tests such as the Z-test, t-test, ANOVA, or Chi-square test.
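For concreteness, a minimal sketch of one such test, a one-sample t-test with scipy.stats (the numbers below are made up purely for illustration):

    # Hypothetical example: does the sample mean differ from an assumed population mean of 50?
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    sample = rng.normal(loc=52, scale=5, size=40)   # made-up sample data

    t_stat, p_value = stats.ttest_1samp(sample, popmean=50)
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

    alpha = 0.05
    if p_value < alpha:
        print("Reject the null hypothesis: the sample mean differs from 50.")
    else:
        print("Fail to reject the null hypothesis.")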
What do we do after accepting or rejecting the null hypothesis?
Do we exclude or include that sample in further processing if we are building a machine learning model?
What is the significance of accepting the null hypothesis?
What is the significance of accepting the alternate hypothesis?
Or are there other insights we can draw from these tests?
I would like to understand this from the perspective of building machine learning models.
Kindly share your thoughts.

Related

In what order should hyperparameters be tuned? [closed]

I am using a neural network for a classification problem and I am now at the point of tuning all the hyperparameters.
For now, I have come across many different hyperparameters that I have to tune:
Learning rate
batch-size
number of iterations (epoch)
For now, my tuning is quite "manual" and I am not sure I am doing everything properly. Is there a particular order in which to tune the parameters, e.g. learning rate first, then batch size, then ...? I am not sure that all these parameters are independent. Which ones are clearly independent and which ones are clearly not? Should we then tune the dependent ones together? Is there any paper or article that discusses tuning all the parameters in a particular order?
There is even more than that! E.g. the number of layers, the number of neurons per layer, which optimizer to choose, etc.
So the real work in training a neural network is actually finding the best-suited parameters.
I would say there is no clear guideline because training a machine learning algorithm, in general, is always task-specific.
You see, there are many hyperparameters to tune, and you won't have time to try out every combination of them. For many hyperparameters you will build up some intuition about what a good choice would be, but for now a great starting point is always to use what has been proven by others to work. So if you find a paper on the same or a similar task, you could try to use the same or similar parameters as well.
Just to share with you some small experiences I've made:
I rarely vary the learning rate. I mostly choose the Adam optimizer and stick with it.
The batch size I try to choose as big as possible without running out of memory.
The number of iterations you could just set to e.g. 1000. You can always look at the current loss and decide for yourself to stop when the net, for example, isn't learning anymore.
Keep in mind these are in no way rules or strict guidelines, just some ideas until you've built a better intuition yourself. The more papers you read and the more nets you train, the better you will understand what to choose when.
Hope this serves as a good starting point at least.
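To make the "try a few combinations" idea concrete, here is a minimal random-search sketch, assuming a small Keras classifier and a synthetic in-memory dataset; the architecture, search space and trial count are illustrative choices, not recommendations:

    # Random search over learning rate, batch size and epochs (illustrative only).
    import random
    import tensorflow as tf
    from sklearn.datasets import make_classification

    X_train, y_train = make_classification(n_samples=1000, n_features=20, random_state=0)

    def build_model(learning_rate):
        # Tiny placeholder network; swap in your real architecture.
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(2, activation="softmax"),
        ])
        model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"],
        )
        return model

    search_space = {
        "learning_rate": [1e-2, 1e-3, 1e-4],
        "batch_size": [32, 64, 128],
        "epochs": [10, 30, 50],
    }

    best = None
    for _ in range(10):  # 10 random trials instead of the full grid
        params = {k: random.choice(v) for k, v in search_space.items()}
        history = build_model(params["learning_rate"]).fit(
            X_train, y_train,
            batch_size=params["batch_size"],
            epochs=params["epochs"],
            validation_split=0.2,
            verbose=0,
        )
        val_acc = max(history.history["val_accuracy"])
        if best is None or val_acc > best[0]:
            best = (val_acc, params)

    print("Best validation accuracy:", best[0], "with", best[1])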

Can the training dataset and testing dataset be separate instead of split? [closed]

Can we have separate datasets for training and testing?
I'm working on a project to pick effective test cases.
As part of this, I analyse the bug database, come up with triggers which have yielded bugs, and arrive at a model. So this bug database forms my training set.
The test cases that I have written are my test data, and I have to supply this test data to the model to say whether a test case is effective or not.
So in this case, instead of splitting one dataset into training and test data, I have two different datasets: training data (from the bug database) and test data (test cases written manually).
Is this something doable using machine learning? Kindly let me know.
Yes, the training dataset and testing dataset can be separate files. In real-world cases, the testing data is generally a separate, unseen dataset.
The main principle to follow is that when training the model, a dataset must be kept separate (a hold-out set) for testing. This data can be provided separately in different files or databases, or even generated using splits. This is done to avoid data leakage (where testing data is somehow used to train the model).
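As a minimal sketch of this setup with scikit-learn, assuming two separate CSV files and a label column named is_effective (the file and column names are placeholders for your actual data):

    # Train on one file, score a completely separate file of new cases.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    train_df = pd.read_csv("bug_database.csv")   # hypothetical training file
    test_df = pd.read_csv("test_cases.csv")      # hypothetical file of cases to score

    feature_cols = [c for c in train_df.columns if c != "is_effective"]  # assumed label column
    X_train, y_train = train_df[feature_cols], train_df["is_effective"]
    X_test = test_df[feature_cols]

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)          # fit only on the training file
    predictions = model.predict(X_test)  # score the separately supplied test cases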

Fine-tuning GPT-2/3 on new data [closed]

I'm trying to wrap my head around training OpenAI's language models on new data sets. Is there anyone here with experience in that regard?
My idea is to feed either GPT-2 or GPT-3 (I do not have API access to GPT-3, though) a textbook, train it on it, and be able to "discuss" the content of the book with the language model afterwards. I don't think I'd have to change any of the hyperparameters; I just need more data in the model.
Is it possible?
Thanks a lot for any (also conceptual) help!
Presently GPT-3 cannot be fine-tuned the way GPT-2 or GPT-Neo / NeoX can, because the model is kept on OpenAI's servers and requests have to be made via the API. A Hacker News post says that fine-tuning of GPT-3 is planned or under construction.
Having said that, OpenAI's GPT-3 provides an Answers API to which you can supply context documents (up to 200 files / 1 GB). That API could then be used as a way of discussing the material.
EDIT:
OpenAI has recently introduced a fine-tuning beta:
https://beta.openai.com/docs/guides/fine-tuning
So the best answer to this question is to follow the description at that link.
You can definitely retrain GPT-2. Are you only looking to train it for language-generation purposes, or do you have a specific downstream task you would like to adapt GPT-2 to?
Both of these are possible and not too difficult. If you want to train the model for language generation, i.e. have it generate text on a particular topic, you can train it exactly as it was trained during the pre-training phase. This means training it on a next-token prediction task with a cross-entropy loss function. As long as you have a dataset and decent compute power, this is not too hard to implement.
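As one possible way to set this up (not prescribed by this answer), here is a minimal causal-language-modeling sketch using the Hugging Face transformers and datasets libraries; "book.txt" and the training arguments are placeholders:

    # Fine-tune GPT-2 with plain next-token prediction on a text file.
    from datasets import load_dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    dataset = load_dataset("text", data_files={"train": "book.txt"})  # placeholder file
    tokenized = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True,
        remove_columns=["text"],
    )

    # mlm=False -> the next-token prediction / cross-entropy objective described above
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    args = TrainingArguments(output_dir="gpt2-book", num_train_epochs=3,
                             per_device_train_batch_size=2)
    trainer = Trainer(model=model, args=args,
                      train_dataset=tokenized["train"], data_collator=collator)
    trainer.train()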
When you say 'discuss' the content of the book, it seems to me that you are looking for a dialogue model/chatbot. Chatbots are trained in a different way, and if you are indeed looking for a dialogue model, you can look at DialoGPT and other models; they can be trained to become task-oriented dialogue agents.

Cross validation in the context of deep learning | Object Detection [closed]

I am working on modeling an object-detection dataset. I am relatively new to deep learning, and I am having a hard time extending the idea of cross-validation to this context. Usually, training time is huge with deep networks and k-fold CV is not a reasonable approach, so a single hold-out split ("1-fold CV") probably makes more sense; I have seen people use this in practice. I am trying to reason about this choice and about the purpose of cross-validation: hyper-parameter tuning, and quantifying when the model starts to over-fit. My questions are the following:
What about the random sampling error with a 1-fold CV? My thoughts: with k-fold CV this error is averaged out when k > 1. Also, with k = 1, hyper-parameter tuning doesn't seem reasonable to me: the values we end up finding can be coupled with the (random) sample we called the validation set. So, what's the point of a 1-fold CV?
The data I am working with is already scarce: around ~4k images, 2 categories (object + background), and bounding boxes for each image. It's common wisdom that deep networks learn better with more data, so why would I want to reduce my training set by keeping aside a validation set in this context? I don't see any clear advantage. On the contrary, it seems like using the entire dataset to train could lead to a better object-detection model. If that is true, how would one know when to stop, i.e. I could keep training without any feedback on whether the model has started overfitting?
How are production models deployed? I guess I have never thought about this much while taking courses; the approach was always that you have a train, validation, and test set. In real settings, how do you leverage the entire dataset to create a production model? (This is probably connected to #2, i.e. dealing with practical aspects like how long to train, etc.)
Public computer-vision datasets in the domain of object detection are usually large enough that this isn't an issue. How much of an issue it is in your scenario is indicated by the gap in performance between the validation and test sets. Cross-validation with n = 1 essentially means having a fixed validation set.
You want to keep the validation set in order to tune the parameters of your model. Increasing the number of weights will surely increase the performance on the training set, but you want to check how this behaves on unseen data, e.g. the validation set. That said, many people will tune parameters according to the performance on the validation set and then do one more training run where they combine the training and validation data before finally testing on the test set.
I think this is already answered in 2. You can extend this by training on all three sets, but whatever performance you achieve on them will not be representative. The number of epochs/iterations you want to train for should therefore be decided before merging the data.
You have to decide what it is you want to optimize for. Most papers optimize for performance on the test set, which is why it should never be used for training or for validating parameter choices. In reality you might often favour a "better" model by including validation and test data in the training, but you will never know how much "better" this model is until you find another test set. You are also risking that something "strange" will happen when including the test data; you are essentially training with closed eyes.
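A minimal sketch of the pattern described above, assuming a small Keras classifier and synthetic data (a real object-detection pipeline is much heavier, but the split/retrain logic is the same): pick the epoch count on a fixed validation split, then retrain on train+validation for that many epochs and evaluate once on the untouched test set.

    import numpy as np
    import tensorflow as tf
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in data; replace with your own features/labels.
    X = np.random.rand(1000, 32).astype("float32")
    y = np.random.randint(0, 2, 1000)
    X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.2, random_state=0)

    def make_model():
        m = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(1, activation="sigmoid"),
        ])
        m.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
        return m

    # 1. Use the fixed validation set to find a good stopping point.
    history = make_model().fit(X_train, y_train, validation_data=(X_val, y_val),
                               epochs=100, verbose=0)
    best_epoch = int(np.argmin(history.history["val_loss"])) + 1

    # 2. Retrain on train + validation for that fixed number of epochs.
    final_model = make_model()
    final_model.fit(X_trainval, y_trainval, epochs=best_epoch, verbose=0)

    # 3. Evaluate exactly once on the untouched test set.
    print(final_model.evaluate(X_test, y_test, verbose=0))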

Fuzzy logic vs AI vs Machine learning vs Deep learning [closed]

How do these four subjects differ from one another? From what I understand, they learn from numerous input data and output an estimated output. My understanding is very lacking, hence these questions. The examples people usually give, such as spam email, apple/orange and cat/dog identification, or neural networks, made no sense to me.
Is there a simpler representation of these four subjects, with code, to show the concepts? I would really appreciate that.
Links to examples you think are very simple, with code, are more than welcome. I need something relatable to grasp the coding side of these concepts better.
Many thanks!
Fuzzy logic is a form of many-valued logic in which the truth values of variables may be any real number between 0 and 1. By contrast, in Boolean logic, the truth values of variables may only be the integer values 0 or 1. Fuzzy logic has been employed to handle the concept of partial truth, where the truth value may range between completely true and completely false. Furthermore, when linguistic variables are used, these degrees may be managed by specific (membership) functions.
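As a tiny illustration (my own, not part of the definition above), a triangular membership function mapping temperature to a degree of truth for the linguistic variable "warm":

    # Degree to which a temperature counts as "warm", between 0 and 1.
    def warm_membership(temp_c, low=15.0, peak=22.0, high=30.0):
        if temp_c <= low or temp_c >= high:
            return 0.0
        if temp_c <= peak:
            return (temp_c - low) / (peak - low)
        return (high - temp_c) / (high - peak)

    for t in (10, 18, 22, 27, 32):
        print(t, "C ->", round(warm_membership(t), 2))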
The field of AI research defines itself as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of success at some goal. Colloquially, the term "artificial intelligence" is applied when a machine mimics "cognitive" functions that humans associate with other human minds, such as "learning" and "problem solving" (known as Machine Learning).
Machine Learning by Tom Mitchell:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
Deep learning is machine learning with deep neural networks.
Hence: AI is a superset of machine learning, machine learning is a superset of deep learning, and AI also includes fuzzy logic.
Resources
Fuzzy logic lecture notes (German)
Computerphile: Fuzzy logic
IEEE: Fuzzy logic
Tom Mitchell: Machine Learning
Michael Nielsen: Deep Learning
