I was using a pretrained model in TensorFlow, like tf.keras.applications.resnet50.ResNet50().
Is it possible to change the model architecture, e.g. from the original ResNet to its variant ResNet-D?
If that is possible, does the model still benefit from the pretrained ImageNet weights?
I'm new to machine learning; I just wonder if it can be done.
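There is no built-in ResNet-D in tf.keras, so one hedged way to try this (a sketch only, assuming tf.keras 2.x and the HDF5 weight format) is to define the variant yourself and copy over whichever pretrained weights still fit; build_resnet50_d below is a hypothetical function you would have to write:

```python
# Hedged sketch, assuming tf.keras 2.x; by_name/skip_mismatch apply to HDF5 weights.
# build_resnet50_d() is hypothetical: your own model definition with the
# ResNet-D modifications, reusing the original layer names where possible.
import tensorflow as tf

pretrained = tf.keras.applications.resnet50.ResNet50(weights="imagenet")
pretrained.save_weights("resnet50_imagenet.h5")

modified = build_resnet50_d()  # hypothetical ResNet-D variant
# Layers whose names and weight shapes are unchanged inherit the ImageNet
# weights; layers you altered (e.g. the new downsampling paths) stay randomly
# initialized, so the model still benefits partially from pretraining.
modified.load_weights("resnet50_imagenet.h5", by_name=True, skip_mismatch=True)
```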
I am trying to understand in which cases I can benefit from pre-trained weights. Sometimes pre-trained weights work (can be fine-tuned) for other domains, but I cannot understand how to make a decision about this.
By the way, I have annotated medical data, and I will use it either to fine-tune a pre-trained network or to train from scratch, based on the suggestions.
Thanks!
I am a newbie in ML, and I was trying to solve a multi-class classification problem. I used XGBoost to reduce the log loss, and I also tried a dense neural network, which also seems to work well. Is there a way I can stack these two models so that I can further reduce the log loss?
You can do it with Apple's coremltools.
Take your XGBoost model and convert it to an MLModel using the converter.
Create a pipeline model combining that model with any neural network.
I'm sure there are other tools with pipeline support, but you will need to convert the XGBoost model to some other format.
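A hedged sketch of the conversion step only, assuming coremltools' built-in XGBoost converter; xgb_model is taken to be the trained model from your experiments, and exact converter arguments may differ between coremltools versions:

```python
# Hedged sketch: convert the gradient-boosted trees to a Core ML model.
# xgb_model is assumed to be your already-trained xgboost model.
import coremltools as ct

coreml_xgb = ct.converters.xgboost.convert(xgb_model)
coreml_xgb.save("xgboost_part.mlmodel")

# The saved model can then be combined with a converted neural network
# using coremltools.models.pipeline to form a single pipeline model.
```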
I'm trying to implement U-Net with PyTorch, using pre-trained networks in the encoder path.
The original U-Net paper trained a network from scratch. Are there any resources or principles on where the skip connections should be placed when a pre-trained backbone is used instead?
I have already found some examples (e.g. this repo), but without any justification for the feature selection.
In the original U-Net paper, the features right before the max-pool layers are used for the skip connections.
The logic is exactly the same with pre-trained backbones: at each spatial resolution, the deepest feature layer is selected. Thanks to qubvel on GitHub for pointing this out in an issue.
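For example, a hedged PyTorch sketch with a torchvision ResNet-34 backbone (assuming torchvision >= 0.13; the stem/layer1..layer4 grouping is specific to torchvision's ResNet, and this is just one way to pick the skips):

```python
# Hedged sketch: expose the deepest feature map at each spatial resolution
# of a pretrained ResNet-34, to be used as U-Net skip connections.
import torch
import torchvision

class ResNetEncoder(torch.nn.Module):
    def __init__(self, pretrained=True):
        super().__init__()
        net = torchvision.models.resnet34(
            weights="IMAGENET1K_V1" if pretrained else None)
        self.stem = torch.nn.Sequential(net.conv1, net.bn1, net.relu)  # 1/2 res
        self.pool = net.maxpool                                        # -> 1/4 res
        self.layer1 = net.layer1   # 1/4
        self.layer2 = net.layer2   # 1/8
        self.layer3 = net.layer3   # 1/16
        self.layer4 = net.layer4   # 1/32 (bottleneck)

    def forward(self, x):
        s1 = self.stem(x)                 # skip at 1/2 resolution
        s2 = self.layer1(self.pool(s1))   # skip at 1/4
        s3 = self.layer2(s2)              # skip at 1/8
        s4 = self.layer3(s3)              # skip at 1/16
        bottleneck = self.layer4(s4)      # deepest features at 1/32
        return [s1, s2, s3, s4], bottleneck
```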
I'm learning transfer learning with some pre-trained models (VGG16, VGG19, …), and I wonder why I need to load pre-trained weights to train on my own dataset.
I can understand it if the classes in my dataset are included in the dataset the pre-trained model was trained on. For example, the VGG models were trained on the 1000 classes of the ImageNet dataset, and my model is to classify cats vs. dogs, which are also in ImageNet. But in my case the classes in my dataset are not in that dataset. So how can the pre-trained weights help?
You don't have to use a pretrained network in order to train a model for your task. However, in practice, using a pretrained network and retraining it for your task/dataset is usually faster, and you often end up with better models yielding higher accuracy. This is especially the case if you do not have a lot of training data.
Why faster?
It turns out that, relatively independently of the dataset and target classes, the first couple of layers converge to similar results. This is because low-level layers usually act as edge, corner and other simple structure detectors. Check out this example that visualizes the structures that filters of different layers "react" to. Since the lower layers are already trained, adapting the higher-level layers to your use case is much faster.
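As an illustration, a minimal Keras sketch of keeping the pretrained lower layers fixed and training only a new head (the binary cat-vs-dog head, layer sizes and hyperparameters are placeholders, not from the answer above):

```python
# Hedged sketch, assuming TF2/Keras: freeze the pretrained convolutional base
# and train only a small new classifier head on your own dataset.
import tensorflow as tf

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False  # keep the pretrained low-level filters fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # new head for the new classes
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)
```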
Why more accurate?
This question is harder to answer. IMHO it is due to the fact that the pretrained models you use as a basis for transfer learning were trained on massive datasets. This means that the knowledge acquired there flows into your retrained network and helps you find a better local minimum of your loss function.
If you are in the comfortable situation of having a lot of training data, you should probably train a model from scratch, as the pretrained model might "point you in the wrong direction".
In this master's thesis you can find a number of tasks (small datasets, medium datasets, small semantic gap, large semantic gap) where three methods are compared: fine-tuning, feature extraction + SVM, and training from scratch. Fine-tuning a model pretrained on ImageNet is almost always a better choice.
I need a simple model that would be fast to train and suitable for time series prediction, used mainly to generate new features. Should I use an LSTM, an SVM, or maybe something else?
Which model suits your data depends on the data itself, but the mathematically simplest model is a vanilla RNN.
There is a nice article for your reference:
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
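For illustration, a minimal vanilla-RNN sketch in Keras (the window length, layer width and univariate setup are assumptions, not from the answer):

```python
# Hedged sketch, assuming TF2/Keras and a univariate series split into
# fixed-length windows of 30 past time steps.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(30, 1)),  # 30 past time steps, 1 feature
    tf.keras.layers.SimpleRNN(32),         # vanilla RNN cell
    tf.keras.layers.Dense(1),              # predict the next value
])
model.compile(optimizer="adam", loss="mse")
# model.fit(X_windows, y_next, epochs=10)  # X_windows: (n, 30, 1), y_next: (n,)
```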