How to compare two deep learning models' performance? - machine-learning

I'm new to deep learning, so if the question doesn't make sense, please correct me.
In traditional machine learning I know how to compare models and choose one of them as the best using the metrics I choose.
However, in deep learning each model is built from different layers, so how can I control variables to determine fairly which model is the best? Or do people usually not compare models this way?
For example, I have sequential data and can use either a CNN or an LSTM model. Should I compare a model with only one CNN layer against a model with only one LSTM layer, and only afterwards add more layers or tune the model?
Or can someone just tell me the process for comparing and choosing the best deep learning model with the chosen metrics?

For sequential data, as you mention:
The number of layers by itself is not the basis for comparing two models. Beyond a certain number of layers, accuracy on held-out data will typically start to decrease because of overfitting.
Comparing one CNN layer with one LSTM layer is not the right approach.
Instead, compare the models on metrics such as accuracy, precision, recall, or F1-score, depending on your application objective.
For example, if you are working with language translation data, an LSTM would likely be the better choice, because it mitigates the vanishing-gradient problem that affects plain recurrent networks.
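As a rough illustration of the comparison described above, here is a minimal sketch that scores two models on the same held-out labels. The prediction lists `cnn_pred` and `lstm_pred` are made-up values, not results from real models, and the task is assumed to be binary classification.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical held-out labels and predictions (illustrative only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
cnn_pred = [1, 0, 1, 0, 0, 1, 1, 0]
lstm_pred = [1, 0, 1, 1, 0, 0, 0, 0]

cnn = binary_metrics(y_true, cnn_pred)
lstm = binary_metrics(y_true, lstm_pred)

for name, scores in (("CNN", cnn), ("LSTM", lstm)):
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

The important part is that both models are evaluated on exactly the same held-out examples with the metric that matches the application, regardless of how many layers each one has.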

Related

Machine learning with my car dataset

I’m very new to machine learning.
I have a dataset of data produced by an F1 racing game: users play the game and the game gives me this data.
With machine learning, I have to work with this data, and when a user plays a game (I know there are 10 users) I have to recognize who is playing.
The data consists of datagram packets sent at a frequency of one every 1/10 second. Each packet contains the following: Time, laptime, lapdistance, totaldistance, speed, car position, traction control, last lap time, fuel, gear,..
I’ve thought of using k-means in a supervised way.
Which algorithm would be better?
The task must be a multiclass classification. The very first step in any machine learning activity is to define a score metric (https://machinelearningmastery.com/classification-accuracy-is-not-enough-more-performance-measures-you-can-use/). That allows you to compare models with one another and decide which is better. Then build a baseline model with random forest and/or logistic regression, as suggested in another answer; they perform well out of the box. Then experiment with the features to understand which of them are more informative. And don't forget about visualizations: they give many hints for data wrangling, etc.
This is a somewhat broad question, so I'll try my best.
k-means is an unsupervised algorithm, meaning it finds the classes itself; it is best used when you know there are multiple classes but don't know exactly what they are. Using it with labeled data just means computing the distance from a new vector v to each vector in the dataset and picking the label of the closest one (or of the closest few, by majority vote). That is nearest-neighbour lookup rather than clustering, and it involves no real model training.
In this case, since you do have the labels, a supervised approach will yield much better results.
I suggest trying random forest and logistic regression first; they are among the most basic and common algorithms and give pretty good results.
If you haven't achieved the desired accuracy, you can use deep learning and build a neural network with an input layer as large as the number of values in a packet and an output layer with as many units as there are classes; in between you can use one or more hidden layers with various numbers of nodes. But this is an advanced approach, and you had better pick up some experience in machine learning before pursuing it.
Note: the data is a time series, and every driver has their own way of driving a car, so the data should be treated as chunks of consecutive points rather than as isolated points. With that you can apply pattern-matching techniques. There are also several neural networks built exactly for this kind of data (like RNNs), but this is far more advanced and much more difficult to implement.
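One way to treat the packets as chunks of points is to summarise fixed-size windows into feature rows that a random forest or logistic regression could consume. The sketch below is only an illustration: the field names `speed` and `gear`, the window size, and the chosen summary statistics are assumptions, not part of the original dataset description.

```python
from statistics import mean, pstdev

def window_features(packets, size):
    """Split packets into consecutive non-overlapping windows and
    summarise each window as one feature row."""
    rows = []
    for start in range(0, len(packets) - size + 1, size):
        window = packets[start:start + size]
        speeds = [p["speed"] for p in window]
        gears = [p["gear"] for p in window]
        rows.append({
            "speed_mean": mean(speeds),
            "speed_std": pstdev(speeds),
            # Number of times the gear differs from the previous packet.
            "gear_changes": sum(1 for a, b in zip(gears, gears[1:]) if a != b),
        })
    return rows

# Synthetic packets standing in for one second of telemetry (10 packets).
packets = [{"speed": 100 + i, "gear": 3 + i // 4} for i in range(10)]
rows = window_features(packets, size=5)
print(rows)
```

Each row would then be labeled with the driver who produced that window, which turns the time series into an ordinary multiclass classification table.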

how to add more outputs to neural networks?

Of course, no one building a neural network for image recognition and classification can provide outputs for every possible image class. So if I build a neural network that takes an array as input and outputs "bird" or "not a bird", can I add more outputs for more image classes after I finish training the first network, or will that make what it has learned vanish?
That is, I fix the number of inputs and start with 1 output, then add 1 more, and 1 more. Is that feasible or not?
Retrain
If you can spare the resources, re-training your network (or, to be more specific, training something from scratch) would be a good option. But read the following approaches, with which you might achieve something better (or at least less costly).
Transfer-learning
But if you are using one of the huge, popular NNs that take weeks to train (on very costly hardware), there might be a way that draws on the idea of transfer learning.
There are at least two different approaches:
Using the pretrained NN as feature-extractor
Here you remove the final dense layers and use the trained NN only to extract features from your images. Then you can build an arbitrary new classifier on your new dataset, which maps OLD-NN-OUTPUT = FEATURES-INPUT -> classes (a new softmax NN, an SVM/kernel SVM, or anything else). This should be quite robust, assuming that your pretrained NN is of high quality and your new class is not too different from the learned ones.
In general this approach might be favorable if your new class + dataset is small and similar to the original one.
If the new data is not that similar, one might use the features from an earlier layer, which are more generic.
Continuing training
Here you would continue training the weights of your original NN, probably keeping the first layers (maybe even all but the final dense ones). As above, the general idea is that we assume a good NN to be very general in the first layers (= extracting features) and more specific in the last ones.
This approach should be favorable if you have a large amount of data for your new class. Depending on the similarity, you might either continue to train all weights or, if the data is quite similar, fix the weights of some layers (the first ones).
There might be technical issues in implementing this approach (such as different input image sizes). So it needs some work if some constraints of the original NN are broken. It's also important to tune the training hyper-parameters (the learning rate should probably be lower!).
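To make the "add one more output" idea concrete, here is a minimal sketch of extending only the final layer while leaving the existing output untouched. It assumes the earlier layers are frozen and already produce a feature vector; the weight values, the helper names `add_output_unit` and `logits`, and the small random initialisation are all illustrative assumptions, not a specific framework's API.

```python
import random

def add_output_unit(weights, biases, rng, scale=0.01):
    """Return copies of the final-layer parameters with one extra output row.
    Existing rows are copied unchanged; the new row starts near zero."""
    n_features = len(weights[0])
    new_row = [rng.gauss(0.0, scale) for _ in range(n_features)]
    return [row[:] for row in weights] + [new_row], list(biases) + [0.0]

def logits(weights, biases, features):
    """Final linear layer: one score per output unit."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]

rng = random.Random(0)
old_w = [[0.5, -0.2, 0.1]]   # a single trained output, e.g. "bird"
old_b = [0.1]
features = [1.0, 2.0, 3.0]   # stand-in for features from the frozen layers

new_w, new_b = add_output_unit(old_w, old_b, rng)
old_scores = logits(old_w, old_b, features)
new_scores = logits(new_w, new_b, features)
print(old_scores, new_scores)
```

The score of the original output is identical before and after the extension, so nothing already learned is lost at that moment. The new row still has to be trained on data for the new class, and if a softmax is applied across outputs the probabilities will shift once the new unit learns.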

Deep learning Training dataset with Caffe

I am a deep-learning newbie working on creating a vehicle classifier for images using Caffe, and I have a 3-part question:
1. Are there any best practices in organizing classes for training a CNN, i.e. number of classes and number of samples for each class? For example, would I be better off this way:
(a) Vehicles - Car-Sedans/Car-Hatchback/Car-SUV/Truck-18-wheeler/.... (note this could mean several thousand classes), or
(b) have a higher level model that classifies between car/truck/2-wheeler and so on... and if car type then query the Car Model to get the car type (sedan/hatchback etc)
2. How many training images per class is a typical best practice? I know there are several other variables that affect the accuracy of the CNN, but what rough number is good to shoot for in each class? Should it be a function of the number of classes in the model? For example, if I have many classes in my model, should I provide more samples per class?
3. How do we ensure we are not overfitting to a class? Is there a way to measure heterogeneity in training samples for a class?
Thanks in advance.
Well, the first choice that you mentioned corresponds to a very challenging task in the computer vision community: fine-grained image classification, where you want to classify the subordinate categories of a base class, say Car. To get more info on this, you may see this paper.
According to the literature on image classification, high-level classes such as car/truck are much simpler for CNNs to learn, since more discriminative features may exist. I suggest following the second approach, that is, classifying all types of cars vs. trucks and so on.
The number of training samples needed grows mainly with the number of parameters; that is, if you want to train a shallow model, far fewer samples are required. It also depends on whether you decide to fine-tune a pre-trained model or train a network from scratch. When sufficient samples are not available, you have to fine-tune a model on your task.
Wrestling with over-fitting has always been a problem in machine learning, and even CNNs are not free of it. The literature offers some practical suggestions for reducing over-fitting, such as dropout layers and data-augmentation procedures.
It may not be part of your questions, but it seems that you should follow the fine-tuning procedure, that is, initializing the network with the pre-trained weights of a model from another task (say ILSVRC 201X) and adapting the weights to your new task. This procedure is known as transfer learning (and sometimes domain adaptation) in the community.
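The two-level arrangement from option (b) of the question can be sketched as a simple dispatch: a coarse model picks the vehicle category, then a category-specific model refines it. The "models" below are toy functions over a made-up feature dict, standing in for trained CNNs; every name and threshold here is hypothetical.

```python
def classify_vehicle(image, coarse_model, fine_models):
    """Run the coarse model, then the matching fine model if one exists."""
    coarse_label = coarse_model(image)
    fine_model = fine_models.get(coarse_label)
    fine_label = fine_model(image) if fine_model is not None else None
    return coarse_label, fine_label

# Stand-in "models": plain functions, purely illustrative.
def toy_coarse_model(image):
    return "car" if image["wheels"] == 4 else "truck"

def toy_car_model(image):
    return "hatchback" if image["length_m"] < 4.2 else "sedan"

fine_models = {"car": toy_car_model}  # no fine model for trucks yet

result_car = classify_vehicle(
    {"wheels": 4, "length_m": 4.8}, toy_coarse_model, fine_models)
result_truck = classify_vehicle(
    {"wheels": 18, "length_m": 16.0}, toy_coarse_model, fine_models)
print(result_car, result_truck)
```

A design like this keeps each classifier's label set small, at the cost that a mistake by the coarse model cannot be corrected by the fine one.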

Model selection with dropout training neural network

I've been studying neural networks for a bit and recently learned about the dropout training algorithm. There are excellent papers out there to understand how it works, including the ones from the authors.
So I built a neural network with dropout training (it was fairly easy), but I'm a bit confused about how to perform model selection. From what I understand, it looks like dropout is a method to be used when training the final model obtained through model selection.
As for the test part, papers always talk about using the complete network with halved weights, but they do not mention how to use it in the training/validation part (at least the ones I read).
I was thinking about using the network without dropout for the model selection part. Say that shows me that the net performs well with N neurons. Then, for the final training (the one I use to train the network for the test part), I use 2N neurons with dropout probability p=0.5. That ensures that exactly N neurons are active on average, so the network is used at the right capacity most of the time.
Is this a correct approach?
By the way, I'm aware of the fact that dropout might not be the best choice with small datasets. The project I'm working on has academic purposes, so it's not really needed that I use the best model for the data, as long as I stick with machine learning good practices.
First of all, model selection and the training of a particular model are completely different issues. For model selection, you would usually need a data set that is completely independent of both the training set used to build the model and the test set used to estimate its performance. So if you're doing, for example, cross-validation, you would need nested cross-validation: an inner cross-validation to train the candidate models and select among them, and an outer cross-validation to estimate the performance of the selected model.
To see why, consider the following thought experiment (shamelessly stolen from this paper). You have a model that makes a completely random prediction. It has a number of parameters that you can set, but they have no effect. If you try different parameter settings long enough, you'll eventually get a model that performs better than all the others, simply because you're sampling from a random distribution. If you're using the same data for all of these models, this is the model you will choose. If you have a separate test set, it will quickly tell you that there is no real effect, because the parameter setting that achieved good results during the model-building phase is no better on the separate set.
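The thought experiment can be simulated in a few lines. This is a hedged sketch with made-up sizes (`n_settings`, 100 validation and 100 test labels) and a fixed seed; the "model" really does guess at random, so its settings cannot matter.

```python
import random

def random_model_accuracy(labels, rng):
    """Accuracy of a 'model' that guesses each binary label at random."""
    hits = sum(1 for y in labels if rng.randint(0, 1) == y)
    return hits / len(labels)

rng = random.Random(0)
n_settings = 50  # number of meaningless parameter settings tried
val_labels = [rng.randint(0, 1) for _ in range(100)]
test_labels = [rng.randint(0, 1) for _ in range(100)]

# Evaluate every setting on the SAME validation data and keep the best.
val_scores = [random_model_accuracy(val_labels, rng) for _ in range(n_settings)]
best_setting = max(range(n_settings), key=val_scores.__getitem__)
best_val = val_scores[best_setting]
mean_val = sum(val_scores) / n_settings

# The chosen setting is then scored once on separate data.
test_score = random_model_accuracy(test_labels, rng)

print(f"best validation accuracy: {best_val:.2f}")
print(f"mean validation accuracy: {mean_val:.2f}")
print(f"test accuracy of chosen setting: {test_score:.2f}")
```

The best validation score is, by construction, at least as high as the average and will typically sit noticeably above chance, while the score on the separate test labels is just one more draw around 0.5. That gap is the selection bias the answer warns about.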
Now, back to neural networks with dropout. You didn't refer to any particular paper; I'm assuming that you mean Srivastava et al., "Dropout: A Simple Way to Prevent Neural Networks from Overfitting". I'm not an expert on the subject, but the method seems to me similar to what's used in random forests or bagging: mitigating the flaws an individual learner may exhibit by applying it repeatedly in slightly different contexts. If I understood the method correctly, what you essentially end up with is an average over several possible models, very similar to random forests.
This is a way to make an individual model better, but not for model selection. The dropout is a way of adjusting the learned weights for a single neural network model.
To do model selection on this, you would need to train and test neural networks with different parameters and then evaluate those on completely different sets of data, as described in the paper I've referenced above.

Is cross-validation used to find the best model/architecture OR the best parameters of a model/architecture?

In my view, cross-validation is used to compare models by using as much data as possible. For example it can be used to compare a perceptron neural network and a decision tree for the same problem. Or it can be used to study the number of neurons of a neural network for a particular problem. Here it's about comparing models/architectures.
Nevertheless, in my view, cross-validation doesn't seem suitable to find the best weights of a neural network because at each round of the cross-validation, the weights are reinitialized.
Can you confirm my point of view, namely that cross-validation is only used to compare models/architectures and is not suitable for finding the best parameters of these models/architectures?
Thank you.
You have the right idea, yes.
Typically you use cross validation to estimate the accuracy on unseen data. This estimate helps you to select the suitable model type/parameters etc.
Once you have decided on the model configuration, you can train the model on the entire dataset. (Just always keep in mind that the training error on the entire dataset is not a good estimate of the error on unseen data.)
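The workflow described here (cross-validate each configuration, pick one, then fit on everything) can be sketched as follows. To keep the example self-contained, a tiny 1-D k-nearest-neighbours classifier stands in for the neural network, and the data, fold count, and candidate values of `k` are made up for illustration.

```python
def k_fold_indices(n, n_folds):
    """Yield (train_idx, val_idx) pairs for contiguous folds over range(n)."""
    sizes = [n // n_folds + (1 if i < n % n_folds else 0) for i in range(n_folds)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size

def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)

def cv_accuracy(xs, ys, k, n_folds):
    """Cross-validated accuracy: the model is rebuilt from scratch per fold."""
    correct = 0
    for train, val in k_fold_indices(len(xs), n_folds):
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        correct += sum(1 for i in val if knn_predict(tx, ty, xs[i], k) == ys[i])
    return correct / len(xs)

# Toy data: low values are class 0, high values class 1, one noisy label (0.22).
xs = [0.10, 0.90, 0.20, 0.80, 0.30, 0.70, 0.15, 0.85, 0.25, 0.75, 0.22, 0.78]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1]

scores = {k: cv_accuracy(xs, ys, k, n_folds=4) for k in (1, 3)}
best_k = max(scores, key=scores.get)

# Final model: use the chosen configuration with the entire dataset.
final_prediction = knn_predict(xs, ys, 0.12, best_k)
print(scores, best_k, final_prediction)
```

Note that the per-fold models are thrown away: cross-validation only yields the scores used to choose a configuration, which matches the point in the question that fitted weights are not what it selects.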
