Model Training: naming scheme for saved models with different hyperparameters - machine-learning

For a classification task, I've been experimenting with different neural network architectures and training methods: varying the number of hidden layers, activation functions, batch size, epochs, loss function, etc.
Sometimes I want to compare the performance of different models. After training/validation, each model is saved in S3 with a name that encodes its parameters and hyperparameters (e.g. '100_1000_0.5_100_..._1.model').
This naming scheme is messy, and I'd like a better way to save/load models based on their (hyper)parameters. One potential alternative is to create nested folders for the different arguments (e.g. 100 -> 1000 -> 0.5 -> 100 -> ... -> 1.model), but this convention is still brittle: if I change the number or order of hyperparameters, I have to reorganize the entire folder structure in S3.
Any suggestions on how to store networks with 10-15 different parameters? (Also, is it even worth storing multiple copies of the same model, trained with different hyperparameters?)

bene's suggestion was helpful: save each model in a flat directory, with an additional reference file to store the hyperparameters used for each training instance.
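A minimal sketch of that idea, assuming a local models/ directory standing in for the S3 bucket; the helper name save_model_with_metadata and the example hyperparameter names are hypothetical. Each model gets a short content hash as its filename, and a JSON sidecar records the exact configuration, so adding or reordering hyperparameters never forces a reorganization:

```python
import hashlib
import json
from pathlib import Path

def save_model_with_metadata(model_bytes, hparams, root="models"):
    # Hypothetical helper: derive a stable short key from the hyperparameters
    # so the same configuration always maps to the same filename.
    key = hashlib.sha1(
        json.dumps(hparams, sort_keys=True).encode()).hexdigest()[:12]
    path = Path(root)
    path.mkdir(exist_ok=True)
    (path / f"{key}.model").write_bytes(model_bytes)                  # the model
    (path / f"{key}.json").write_text(json.dumps(hparams, indent=2))  # sidecar
    return key

# Example usage with made-up hyperparameters; with S3 you would upload both
# files to a single flat prefix instead of writing locally.
key = save_model_with_metadata(
    b"...serialized weights...",
    {"hidden_layers": 2, "units": 1000, "dropout": 0.5, "epochs": 100})
```

To find a model later, scan the JSON sidecars (or keep a single manifest file) instead of parsing filenames.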

Related

Machine Learning model generalisation

I'm new to Machine Learning, and I'd like to ask a question about model generalization. In my case, I'm going to produce some mechanical parts, and I'm interested in controlling the input parameters to obtain certain properties in the final part.
In particular, I'm interested in 8 parameters (say, P1, P2, ..., P8). To minimize the number of pieces I have to produce while maximizing the combinations of parameters explored, I've divided the problem into 2 sets. For the first set of pieces, I'll vary the first 4 parameters (P1 ... P4) while holding the others constant. For the second set, I'll do the opposite (P5 ... P8 variable, P1 ... P4 constant).
So I'd like to know whether it's possible to build a single model that takes all eight parameters as inputs to predict the properties of the final part. I ask because, since I'm not varying all 8 variables at once, I thought I might have to build one model per set of parameters, and then the predictions of the two models couldn't be related to one another.
Thanks in advance.
In most cases, two separate models will achieve better accuracy than one big model. The reason is that each local model only looks at 4 features and can identify patterns among them to make predictions.
But this approach will almost certainly fail to scale. Right now you only have two sets of data, but what if that grows to 20 sets? It will not be possible for you to create and maintain 20 ML models in production.
What works best for your case will need some experimentation. Take a random sample from the data and train models both ways: one big model and two local models. Evaluate their performance not just on accuracy, but also on F1 score, AUC-PR and the ROC curve to find out what works best for you. If you do not see a major performance drop, then one big model for the entire dataset is the better option. If you know your data will always be divided into these two sets and you don't care about scalability, then go with the two local models.
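As a rough illustration of that experiment, here is a sketch with synthetic data (all names and values are placeholders). It uses a regression scorer since the question is about predicting part properties; swap in accuracy/F1/AUC for a categorical target:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: 8 process parameters, two experiment sets.
rng = np.random.default_rng(0)
X = rng.random((400, 8))            # columns P1..P8
y = rng.random(400)                 # measured property of the part
set_id = np.repeat([0, 1], 200)     # 0: P1-P4 varied, 1: P5-P8 varied

# One big model trained on all samples and all 8 parameters.
big_score = cross_val_score(RandomForestRegressor(random_state=0),
                            X, y, cv=5).mean()

# Two local models, each trained only on its own experiment set.
local_scores = [
    cross_val_score(RandomForestRegressor(random_state=0),
                    X[set_id == s], y[set_id == s], cv=5).mean()
    for s in (0, 1)]

print(f"global R^2: {big_score:.3f}  local R^2: {np.mean(local_scores):.3f}")
```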

What's an approach to ML problem with multiple data sets?

What's your approach to solving a machine learning problem with multiple data sets that have different parameters, columns and lengths/widths? Only one of them has a dependent variable; the rest of the files contain supporting data.
Your query is quite generic, and some of the concerns are beside the point: column count and dataset length are not obstacles to building an ML model. Given that only one of the datasets has a dependent variable, you will need to merge the datasets on keys that are common across them. The process typically followed before modelling is:
Step 0: Identify the dependent variable and decide whether to do regression or classification (assuming you are predicting a variable's value).
Step 1: Clean up the provided data by handling duplicates and spelling mistakes.
Step 2: Scan the categorical variables and resolve any discrepancies.
Step 3: Merge the datasets into a single dataset containing all the independent variables and the dependent variable to be predicted.
Step 4: Do exploratory data analysis to understand how the dependent variable behaves with respect to the independent variables.
Step 5: Create a model and refine it based on VIF (Variance Inflation Factor) and p-values.
Step 6: Iterate, reducing the variables until you get a model with only significant variables and a stable R^2 value. Finalize the model.
Step 7: Apply the trained model to the test dataset and compare the predicted values against the actual values in the test set.
Following these steps at a high level will help you build models.
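A minimal pandas sketch of the merge step (Step 3) and the VIF check (Steps 5-6); the filenames, the key column id, and the target column y are all hypothetical:

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical files: only main.csv carries the dependent variable 'y';
# the supporting files are merged in on a shared key column 'id'.
df = pd.read_csv("main.csv")
for path in ["support_a.csv", "support_b.csv"]:
    df = df.merge(pd.read_csv(path), on="id", how="left")

df = df.drop_duplicates()  # basic clean-up before modelling

# VIF for each numeric independent variable: iteratively drop the variable
# with the highest VIF and recompute until all values are below ~5.
X = df.drop(columns=["id", "y"]).select_dtypes("number").dropna()
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns)
print(vif.sort_values(ascending=False))
```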

MobileNet Pre-Trained Model - Classification

I am currently working with a pre-trained MobileNet model that classifies images from a set of 1000 categories. For my iOS application, I only need it to recognize/classify one type of object in the scene. How can I train the model so that it classifies only the one object I need, but does so extremely well?
I am new to machine learning and unfamiliar with transfer learning techniques. Would this type of training reduce the model size and make it more efficient at recognizing the one object I need? If so, what resources can teach me how to keep training this pre-trained model for my objective?
Briefly, you want to turn your 1000-way classifier into a binary classifier.
The answer below assumes you have access to the original data, and that you know how to train the original model (that is, you have access to the training script). Here goes:
Assuming you're only interested in a single category C, you want to first map all instances (x, C) of the data to (x, 1) and all other instances (x, not_C) to (x, 0), then train a model on the resulting data (or, continue training the pre-trained model, if the training script also accepts a starting point for the model).
The model would then lose the ability to discern between non-C classes, and hopefully become better at discriminating C vs non-C instances.
Note: A less hacky approach would be to actually restrict the model to output only 0 or 1 and change the objective to a binary softmax. However, that would require some manipulation of the model's architecture, which the relabeling approach above avoids.
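For illustration, a sketch of the binary relabeling idea in Keras; since Keras makes the head replacement the note mentions only a couple of lines, this sketch implements that variant on top of the pre-trained MobileNet. The class index C and the synthetic data are placeholders:

```python
import numpy as np
import tensorflow as tf

C = 283  # hypothetical: the class index of the object you care about

# Synthetic stand-ins for your labelled data with original 1000-way labels;
# in practice, also preprocess inputs with
# tf.keras.applications.mobilenet.preprocess_input.
x = np.random.rand(16, 224, 224, 3).astype("float32")
y = np.random.randint(0, 1000, size=16)
y_binary = (y == C).astype("float32")   # (x, C) -> 1, everything else -> 0

base = tf.keras.applications.MobileNet(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(224, 224, 3))
base.trainable = False  # reuse the pre-trained features, train only the head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary C vs not-C head
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(x, y_binary, epochs=1, batch_size=8)
```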

Tensorflow queue runner - is it possible to queue a specific subset?

In TensorFlow, I plan to build a model and compare it to other baseline models with respect to different subsets of the training data, i.e. I would like to train my model and the baseline models on the same subsets of training data.
In the naive way queue runners and TFReaders are implemented (e.g. in im2txt), this requires duplicating the data for each selection of subsets, which in my case would require very large amounts of disk space.
It would be best if there were a way to tell the queue to fetch only samples from a specified subset of ids, or to ignore samples that are not part of a given subset of ids.
If I understand correctly, ignoring samples is not trivial, because it requires stitching samples from different reads into a single batch.
Does anybody know a way to do that? Or can you suggest an alternative approach that does not require pre-loading all the training data into RAM?
Thanks!
You could encode your condition as the keep_input parameter of tf.train.maybe_batch.
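A minimal TF 1.x sketch of that idea (the queue-runner API was removed in TF 2); the reader pipeline is faked here with slice_input_producer, and the values in subset_ids are placeholders:

```python
import tensorflow as tf  # assumes TensorFlow 1.x

# Stand-in for a real reader pipeline (e.g. TFRecordReader): yields one
# (sample_id, image, label) triple at a time.
sample_id, image, label = tf.train.slice_input_producer(
    [tf.range(10, dtype=tf.int64),
     tf.random_uniform([10, 4]),
     tf.range(10)],
    shuffle=True)

subset_ids = tf.constant([1, 3, 5, 7], dtype=tf.int64)  # ids to keep
keep = tf.reduce_any(tf.equal(sample_id, subset_ids))

# maybe_batch silently drops samples whose keep_input is False and stitches
# the remaining ones into full batches, so no on-disk duplication is needed.
images, labels = tf.train.maybe_batch(
    [image, label], keep_input=keep, batch_size=2,
    num_threads=2, capacity=100)

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    print(sess.run(labels))   # only labels from the chosen subset appear
    coord.request_stop()
    coord.join(threads)
```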

Combining training, validation and test datasets

Is it possible to train a model on the training and validation data sets combined? Basically, end up merging both of them to create a new model, and then use that model to classify all of the data in the test dataset.
This is what is usually done, assuming you know how to transfer hyperparameters: you normally fit the model on the train data and select hyperparameters based on the score on the validation set. When you combine train + valid you get a significantly bigger dataset, so the "optimal" hyperparameters might be completely different from the ones you selected before. So in general, yes, this is exactly what is usually done, but it can be trickier than you expect (especially if your method is highly stochastic, non-deterministic, etc.).
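A small sketch of that workflow with scikit-learn and synthetic data (the model and hyperparameter grid are placeholders): select hyperparameters on the validation split, then refit on train + valid before touching the test set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins for your own train/validation splits.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((800, 10)), rng.integers(0, 2, 800)
X_valid, y_valid = rng.random((200, 10)), rng.integers(0, 2, 200)

# 1. Select hyperparameters using the validation set.
best_C, best_score = None, -np.inf
for C in [0.01, 0.1, 1.0, 10.0]:
    score = LogisticRegression(C=C, max_iter=1000).fit(
        X_train, y_train).score(X_valid, y_valid)
    if score > best_score:
        best_C, best_score = C, score

# 2. Refit on train + valid with the selected hyperparameters; this is the
#    model you evaluate once on the held-out test set.
final_model = LogisticRegression(C=best_C, max_iter=1000).fit(
    np.vstack([X_train, X_valid]), np.concatenate([y_train, y_valid]))
```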
