Determine parameters E, B and learning rate value - machine-learning

Can anyone please tell me how to choose or fix these parameters: E (num_epochs), B (batch_size) and the local learning rate, to train a model on an image dataset?
How can we determine these parameters?
Thank you for your help

The batch size is simply the number of images processed through the model in a single step. You can choose any value for the batch size, as long as it is less than the total number of images in the dataset.
Commonly used batch sizes are {8, 16, 24, 32, 64}; the choice is up to you.
For the number of epochs, I recommend training up to the point where the model starts to overfit. A rough starting value is
(total number of images) / (batch size), but it is not fixed: you have to find it manually and stop where your model starts to overfit, i.e. where there is a large gap between validation accuracy and training accuracy.
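Below is a minimal Keras sketch of how these three values typically enter the training call; the model, the placeholder data, and the exact numbers (batch size 32, learning rate 1e-3, up to 100 epochs) are illustrative assumptions, not recommendations for your dataset. The idea is to set a generous upper bound on E and let early stopping cut training off once validation accuracy stops improving, i.e. before overfitting sets in.

import numpy as np
import tensorflow as tf

# Placeholder data standing in for an image dataset (shapes are made up for illustration).
x_train = np.random.rand(500, 64, 64, 3).astype("float32")
y_train = np.random.randint(0, 10, size=500)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # a common starting learning rate
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Let early stopping decide the effective number of epochs E.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy", patience=5, restore_best_weights=True
)
model.fit(
    x_train, y_train,
    validation_split=0.2,
    batch_size=32,    # B: e.g. 8/16/32/64, always less than the number of images
    epochs=100,       # E: an upper bound; early stopping usually ends training sooner
    callbacks=[early_stop],
)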

Related

WER for wav2vec2-base model remains as 1 throughout the whole training process

I am trying to run the wav2vec2 speech recognition model as shared in https://huggingface.co/docs/transformers/tasks/asr
This is the loss and WER during the training process: the validation loss decreases significantly, whereas the WER remains at 1.
I tried printing out the predicted and label values, and this is what I got for the last 3 outputs, which results in the WER of 1.
This is the set of parameters of the model.
What may actually be going wrong here? Please help. Thanks!
I have tried tuning the hyperparameters, hoping to reduce the WER.
Thank you for providing some useful information for troubleshooting.
Your loss is reducing, which shows that the model is training; however, your learning rate of 0.01 is very high. Consider changing it to something like 1e-5, as shown in the example on Hugging Face.
The other thing I noticed was that all your input text is in UPPER CASE LIKE THIS. Depending on the training data used for the original model, it may not be expecting upper case text. Try lower-casing your text to see if that yields a lower WER.
Your save_steps and eval_steps are also set far too aggressively. These parameters control how often the model saves a checkpoint and runs evaluation; with a value of 1 for both, the model evaluates after every single update step, which is slow and gives very noisy metrics to compare against. Increase these parameters and try again.
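As a hedged sketch of the settings discussed above, the relevant Hugging Face TrainingArguments might look as follows; the output directory and the exact step counts are illustrative assumptions, not values from the original post.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-asr",          # hypothetical output directory
    learning_rate=1e-5,                 # much lower than 0.01
    per_device_train_batch_size=8,
    num_train_epochs=30,
    evaluation_strategy="steps",        # named eval_strategy in newer transformers versions
    eval_steps=500,                     # evaluate every 500 update steps instead of every step
    save_steps=500,                     # save a checkpoint at the same cadence
    logging_steps=100,
)

# If the base model expects lower-cased text, normalise the transcripts before
# tokenisation (dataset/column names here are hypothetical):
# dataset = dataset.map(lambda batch: {"text": batch["text"].lower()})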

How to increase minimum iterations in BigQuery ML

I've tried out the ML functions and only 2 iterations are made. I've started reading up on how to set more iterations, but only the maximum number of iterations is configurable.
Is there a way to set a minimum number of iterations?
Btw, is there an augmentation feature that lets you generate training data?
Also, what numbers should we try for l1_reg and l2_reg to improve an accuracy of 56%?
To increase the number of iterations:
1- You need to set the number of iterations using max_iterations (the default is 20, so you may not need to change this for now).
2- Set min_rel_progress to a number that is smaller than the loss improvement between two consecutive iterations. You can set it to 0.0001, for example.
Without seeing your data and use case, it is hard for me to say what l1_reg and l2_reg should be or, in general, why you are getting low accuracy. My general guess is that you do not have good training data or good features.
Another option is to set early_stop to false, so that BQML will run max_iterations iterations (default is 20).
The reason the training stopped is probably that the model is not converging and the training/evaluation loss is increasing from one iteration to the next.
JiaXun Wu's answer will allow the training to continue even if the model is not converging.
You can also check whether you have filled in null values yourself. I haven't found documentation on how BQML handles null values, but for my models, training failed to converge with the default null-value handling. A combined sketch of the options mentioned above follows.
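This sketch assumes the google-cloud-bigquery client library, default credentials, and a placeholder dataset/table whose label column is named label; the option values are starting points to experiment with, not tuned recommendations.

from google.cloud import bigquery

client = bigquery.Client()

create_model_sql = """
CREATE OR REPLACE MODEL `mydataset.my_model`   -- placeholder dataset and model names
OPTIONS(
  model_type = 'logistic_reg',
  input_label_cols = ['label'],  -- assumed label column name
  max_iterations = 50,           -- upper bound on the number of iterations
  min_rel_progress = 0.0001,     -- keep iterating while the loss improves by at least this fraction
  early_stop = FALSE,            -- run all max_iterations even if the loss stops improving
  l1_reg = 0.1,                  -- starting points only; tune on a validation split
  l2_reg = 0.1
) AS
SELECT * FROM `mydataset.training_data`
"""

client.query(create_model_sql).result()  # blocks until the training job finishes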

What's a "good" value for the loss function of a DL model like yolo?

I collected ~1500 labelled samples and trained with YOLOv3, getting a training loss of ~10 and a validation loss of ~16. Obviously we can use real test data to evaluate the model performance, but I am wondering if there is a way to tell whether this training loss of 10 is a "good" one, or whether it indicates that I need to use more training data to see if I can push it down to 5 or even less.
Ultimately my question is: for a well-known model with a pre-defined loss function, is there a "good" standard value for the training loss?
Thanks.
You need to train your weights until the average loss comes down to something like 0.0XXXXX. That is the minimal requirement for detecting objects with a matching anchor IoU.
Update (28th Nov, 2018):
While training an object detection model, the loss can vary quite a bit, especially with a large dataset. What you really need to compute is the mean Average Precision (mAP), which gives a proper accuracy criterion for the trained model:
./darknet detector map .data .cfg .weights
If your mAP is close to 1.0 (i.e. 100%), the model is performing well.
Follow this link to learn more about mAP:
https://medium.com/@jonathan_hui/map-mean-average-precision-for-object-detection-45c121a31173
Your validation loss is a good indicator of whether the training loss can come down further. I don't have any one-shot solution; you will have to tweak the hyper-parameters, check on the validation set, and iterate. You can also get a good sense of how training has progressed by looking at the loss curve: was it still decreasing when you stopped training, or had it already flattened out? That tells you what to change next. Good luck.
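If it helps, here is a minimal sketch for inspecting that loss curve; the log file and its column names are hypothetical, so adapt them to however your framework records metrics.

import matplotlib.pyplot as plt
import pandas as pd

log = pd.read_csv("training_log.csv")   # hypothetical log with columns: epoch, train_loss, val_loss

plt.plot(log["epoch"], log["train_loss"], label="training loss")
plt.plot(log["epoch"], log["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()   # a flat or rising validation curve suggests stopping, regularising, or adding data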

Using Random Forest for time series dataset

For a time series dataset, I would like to do some analysis and create a prediction model. Usually, we would split the data (by random sampling throughout the entire dataset) into a training set and a testing set, use the training set with the randomForest function, and keep the testing part to check the behaviour of the model.
However, I have been told that it is not possible to split time series data by random sampling.
I would appreciate it if someone could explain how to split time series data into training and testing sets, or whether there is any alternative way to do random forest on a time series.
Regards
We live in a world where "future-to-past-causality" only occurs in cool scifi movies. Thus, when modeling time series we like to avoid explaining past events with future events. Also, we like to verify that our models, strictly trained on past events, can explain future events.
To model a time series T with RF, rolling is used. For day t, the value T[t] is the target, and the values T[t-k] for k = {1, 2, ..., h}, where h is the past horizon, are used to form the features. For a nonstationary time series, T is converted to e.g. the relative change Trel[t] = (T[t+1] - T[t]) / T[t].
To evaluate performance, I advise checking the out-of-bag cross-validation measure of RF. Be aware that there are some pitfalls that can render this measure over-optimistic:
Unknown future-to-past contamination: the rolling is somehow faulty and the model uses future events to explain the same future within the training set.
Non-independent sampling: if the time interval you want to forecast ahead is shorter than the time interval the relative change is computed over, your samples are not independent.
Possibly other mistakes I don't know of yet.
In the end, everyone can make the above mistakes in some latent way. To check that this is not happening, you need to validate your model with backtesting, where each day is forecast by a model strictly trained on past events only.
When OOB-CV and backtesting wildly disagree, this may be a hint of a bug in the code.
To backtest, roll over T[t-traindays ... t-1], fit the model on this training window, and forecast T[t]. Then increase t by one (t++) and repeat.
To speed this up, you may train your model only once, or retrain only at every n-th increment of t. (A rough sketch of this rolling/backtesting scheme is shown below.)
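Here is a rough Python sketch of that rolling/backtesting scheme using scikit-learn; the synthetic series, the horizon h, the window sizes, and the retraining interval are all placeholder assumptions for illustration.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
T = pd.Series(rng.standard_normal(500).cumsum())   # placeholder time series

h = 7   # past horizon: use the last h values as features
X = pd.concat({f"lag_{k}": T.shift(k) for k in range(1, h + 1)}, axis=1).dropna()
y = T.loc[X.index]

train_days = 300     # size of the initial training window
retrain_every = 25   # retrain only at every n-th increment of t to save time
preds, actual = [], []
model = None
for t in range(train_days, len(X)):
    if model is None or (t - train_days) % retrain_every == 0:
        model = RandomForestRegressor(n_estimators=200, random_state=0)
        model.fit(X.iloc[:t], y.iloc[:t])          # strictly past data only
    preds.append(model.predict(X.iloc[[t]])[0])
    actual.append(y.iloc[t])

print("backtest MAE:", np.mean(np.abs(np.array(preds) - np.array(actual))))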
Reading the sales file:
library(dplyr)
Sales <- read.csv("Sales.csv")
Finding the length of the training set:
train_len <- round(nrow(Sales) * 0.8)
test_len <- nrow(Sales)
Splitting the data into training and testing sets. Here I have used an 80-20 split; you can change that. Make sure your data is sorted in ascending time order.
Training set:
training <- slice(Sales, 1:train_len)
Testing set:
testing <- slice(Sales, (train_len + 1):test_len)

training time and overfitting with gamma and C in libsvm

I am now using libsvm for a support vector machine classifier with a Gaussian kernel. On its website, it provides a Python script, grid.py, to select the best C and gamma.
I just wonder how training time and overfitting/underfitting change with gamma and C?
Is it correct that:
as C changes from 0 to +infinity, does the trained model go from underfitting to overfitting, and does the training time increase?
as gamma changes from almost 0 to +infinity, does the trained model go from underfitting to overfitting, and does the training time increase?
In grid.py, the default search order is from small to big for C, BUT from big to small for gamma. Is this so that the training time goes from small to big and the trained model from underfitting to overfitting, so that we can perhaps save time in selecting the values of C and gamma?
Thanks and regards!
Good question for which I don't have a sure answer, because I myself would like to know. But in response to the question:
So we can perhaps save time in selecting the values of C and gamma?
... I find that, with libsvm, there is definitely a "right" value for C and gamma that is highly problem dependent. So regardless of the order in which gamma is searched, many candidate values for gamma must be tested. Ultimately, I don't know any shortcut around this time-consuming (depending upon your problem) but necessary parameter search.
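For what it's worth, the same kind of grid search can be sketched with scikit-learn's SVC (RBF kernel) and cross-validation; the exponent ranges below are meant to mirror grid.py's documented defaults, and the iris dataset is only a stand-in for your problem.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # stand-in dataset

param_grid = {
    "C": [2.0 ** k for k in range(-5, 16, 2)],      # small C tends to underfit, large C to overfit
    "gamma": [2.0 ** k for k in range(-15, 4, 2)],  # small gamma tends to underfit, large gamma to overfit
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)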
