How to get the outputs of hidden layers of pre-trained StyleGAN? - stylegan

I'm currently studying StyleGAN and want to add some new loss functions, but the main problem I've run into is how to get the outputs of the hidden layers given a pre-trained StyleGAN model. I want to extract feature maps from the middle layers of both the Discriminator and the Generator. I've looked through the code released by NVlabs but didn't find a clue. Any suggestions? Thanks in advance.
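One generic way to get at those activations (a minimal sketch, assuming a PyTorch port of StyleGAN such as NVlabs' stylegan2-ada-pytorch; the official release is TensorFlow 1.x, and the module-name filter and forward signature below are illustrative assumptions) is to register forward hooks on the layers you care about:

import torch

# Assume G is a pre-trained StyleGAN generator already loaded as a
# torch.nn.Module (e.g., from a PyTorch port of the NVlabs code).
# The same recipe works for the discriminator D.
activations = {}

def save_activation(name):
    # Returns a hook that stores the layer's output under `name`.
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Inspect the available layer names first with:
#   for name, _ in G.named_modules(): print(name)
# then attach hooks to whichever intermediate layers you want.
hooks = []
for name, module in G.named_modules():
    if "synthesis" in name and name.endswith("conv1"):  # illustrative filter
        hooks.append(module.register_forward_hook(save_activation(name)))

z = torch.randn(1, 512)   # 512-dim latent, as in StyleGAN
img = G(z, None)          # forward pass fills `activations` as a side effect

for name, feat in activations.items():
    print(name, feat.shape)  # the hidden-layer feature maps

for h in hooks:
    h.remove()               # detach the hooks when done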

Related

What is a multi-headed model? And what exactly is a 'head' in a model?

What is a multi-headed model in deep learning?
The only explanation I have found so far is this: every model can be thought of as a backbone plus a head, and if you pre-train the backbone and attach a randomly initialized head, you can fine-tune the model and it works well.
Can someone please provide a more detailed explanation?
The explanation you found is accurate. Depending on what you want to predict from your data, you need an adequate backbone network and a certain number of prediction heads.
For a basic classification network, for example, you can view ResNet, AlexNet, VGGNet, Inception, ... as the backbone and the fully connected layer as the sole prediction head.
A good example of a problem where you need multiple heads is localization, where you not only want to classify what is in the image but also want to localize the object (find the coordinates of the bounding box around it).
In the general architecture, the backbone network ("convolution and pooling") is responsible for extracting a feature map from the image that contains higher-level, summarized information. Each head takes this feature map as input to predict its desired outcome.
The loss that you optimize during training is usually a weighted sum of the individual losses of the prediction heads.
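As a concrete illustration, here is a minimal PyTorch sketch of a two-headed network (the architecture, loss choices, and the 0.5 weight are illustrative assumptions, not taken from any particular paper):

import torch
import torch.nn as nn

class TwoHeadedNet(nn.Module):
    # A shared backbone with a classification head and a box-regression head.
    def __init__(self, num_classes):
        super().__init__()
        # Backbone: extracts the shared feature map (stand-in for ResNet etc.).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.cls_head = nn.Linear(64, num_classes)  # head 1: what is it?
        self.box_head = nn.Linear(64, 4)            # head 2: where is it?

    def forward(self, x):
        features = self.backbone(x)
        return self.cls_head(features), self.box_head(features)

model = TwoHeadedNet(num_classes=10)
x = torch.randn(8, 3, 64, 64)            # dummy batch of images
cls_logits, boxes = model(x)

cls_target = torch.randint(0, 10, (8,))  # dummy class labels
box_target = torch.rand(8, 4)            # dummy box coordinates

# The training loss is a weighted sum of the per-head losses.
loss = nn.CrossEntropyLoss()(cls_logits, cls_target) \
     + 0.5 * nn.MSELoss()(boxes, box_target)
loss.backward()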
The head is the top of a network. For instance, at the bottom (where data comes in) you take the convolution layers of some model, say ResNet. If you call ConvLearner.pretrained in Fast.ai, ConvnetBuilder will build a network with a head appropriate to your data: if you are working on a classification problem, it will create a head with a cross-entropy loss, and if you are working on a regression problem, it will create a head suited to that.
But you could build a model that has multiple heads. The model could take inputs from the base network (the ResNet conv layers) and feed the activations to one model, say head1, and the same activations to head2. Or you could have some number of shared layers built on top of ResNet, with only those layers feeding head1 and head2.
You could even have different layers feed different heads! There are some nuances to this (for instance, with regard to the fastai lib, ConvnetBuilder will add an AdaptivePooling layer on top of the base network if you don't specify the custom_head argument, and won't if you do), but this is the general picture.
https://forums.fast.ai/t/terminology-question-head-of-neural-network/14819/2
https://youtu.be/h5Tz7gZT9Fo?t=3613 (1:00:13)

Fine-tuning and transfer learning, using YOLO as an example

I have a general question regarding fine-tuning and transfer learning, which came up when I tried to figure out how best to get YOLO to detect my custom object (hands).
I apologize for the long text, which possibly contains lots of false information. I would be glad if someone had the patience to read it and help me clear up my confusion.
After lots of googling, I learned that many people regard fine-tuning as a sub-class of transfer learning, while others believe that they are two different approaches to training a model. At the same time, people differentiate between re-training only the last classifier layer of a model on a custom dataset and also re-training other layers of the model (and possibly adding an entirely new classifier instead of retraining?). Both approaches use pre-trained models.
My final confusion lies here: I followed these instructions: https://github.com/thtrieu/darkflow to train tiny YOLO via darkflow, using the command:
# Initialize yolo-new from yolo-tiny, then train the net on 100% GPU:
flow --model cfg/yolo-new.cfg --load bin/tiny-yolo.weights --train --gpu 1.0
But what happens here? I suppose I only retrain the classifier, because the instructions say to change the number of classes in the last layer of the configuration file. But then again, it is also required to change the number of filters in the second-to-last layer, a convolutional layer.
Lastly, the instructions provide an example of an alternative training:
# Completely initialize yolo-new and train it with ADAM optimizer
flow --model cfg/yolo-new.cfg --train --trainer adam
I don't understand at all how this relates to the different ways of doing transfer learning.
If you are using AlexeyAB's darknet repo (not darkflow), he suggests doing fine-tuning instead of transfer learning by setting this param in the cfg file: stopbackward=1 .
Then run ./darknet partial yourConfigFile.cfg yourWeightsFile.weights outPutName.LastLayer# LastLayer#, for example:
./darknet partial cfg/yolov3.cfg yolov3.weights yolov3.conv.81 81
This will create yolov3.conv.81 and freeze the lower layers; you can then train using the weights file yolov3.conv.81 instead of the original darknet53.conv.74.
References: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection , https://groups.google.com/forum/#!topic/darknet/mKkQrjuLPDU
I have not worked on YOLO, but looking at your problems I think I can help. Fine-tuning, re-training, and post-tuning are all somewhat ambiguous terms that are often used interchangeably. It's all about how much you want to change the pre-trained weights.
Since you are loading the weights in the first case with --load, the pre-trained weights are being loaded there: you might be adjusting them slightly with a low learning rate, or perhaps not changing them at all. In the second case, however, you are not loading any weights, so you are probably training from scratch. So when you make small (fine) changes, call it fine-tuning; post-tuning would be tuning again after the initial training, maybe not as fine as fine-tuning; and retraining would be training the whole network, or part of it, again.
There are also separate ways to optionally freeze some layers, as sketched below.
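For instance, here is a generic PyTorch sketch of the freeze-the-backbone variant (using torchvision's ResNet-18 as a stand-in; it is not darkflow-specific, and the two-class head for hand detection is an illustrative assumption):

import torch
import torchvision

# Load a pre-trained backbone (torchvision >= 0.13 syntax;
# older versions use pretrained=True instead of weights=...).
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")

# Freeze everything: no gradients flow into the pre-trained weights,
# roughly what stopbackward=1 achieves in darknet.
for param in model.parameters():
    param.requires_grad = False

# Replace the head with a fresh layer for the new task
# (e.g., two classes: hand / no hand).
model.fc = torch.nn.Linear(model.fc.in_features, 2)

# Only the new head's parameters go to the optimizer, typically
# with a small learning rate when fine-tuning.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)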

How to define the objects that the selective search algorithm needs to detect

Currently, I am trying to generate "bounding box" annotations for object detection with a deep neural network (R-CNN). The problem is that I would need to do it by hand for more than 500 images, so I want to generate the annotations automatically. For that I found the selective search algorithm, but what I don't understand is how I can tell my algorithm which label corresponds to each generated bounding box.
Thanks,
Look at the paper "Selective Search for Object Recognition". The authors describe a way to generate labels: you have to construct a classifier and train it to get the labels. Hope this helps.
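If it helps, here is a rough Python sketch of that pipeline using OpenCV's selective-search implementation (requires opencv-contrib-python; classify_crop is a hypothetical placeholder for the classifier you would train):

import cv2

img = cv2.imread("image.jpg")

# Selective search proposes candidate boxes; it does not label them.
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img)
ss.switchToSelectiveSearchFast()
rects = ss.process()  # candidate boxes as (x, y, w, h)

def classify_crop(crop):
    # Hypothetical placeholder: a classifier trained on your own
    # labeled crops goes here; it should return a class name, or
    # "background" for proposals that contain no object.
    return "background"

annotations = []
for (x, y, w, h) in rects[:200]:  # keep only the top proposals
    label = classify_crop(img[y:y + h, x:x + w])
    if label != "background":
        annotations.append((label, x, y, w, h))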

Can inception model be used for object counting in an image?

I have already gone through the image classification part of the Inception model, but I need to count the objects in an image.
Considering the flowers dataset, one image can have multiple instances of a flower, so how can I get that count?
What you describe is known to the research community as instance-level segmentation.
In the last year alone there has been a significant spike in papers addressing this problem.
Here are some of the papers:
https://arxiv.org/pdf/1412.7144v4.pdf
https://arxiv.org/pdf/1511.08498v3.pdf
https://arxiv.org/pdf/1607.03222v2.pdf
https://arxiv.org/pdf/1607.04889v2.pdf
https://arxiv.org/pdf/1511.08250v3.pdf
https://arxiv.org/pdf/1611.07709v1.pdf
https://arxiv.org/pdf/1603.07485v2.pdf
https://arxiv.org/pdf/1611.08303v1.pdf
https://arxiv.org/pdf/1611.08991v2.pdf
https://arxiv.org/pdf/1611.06661v2.pdf
https://arxiv.org/pdf/1612.03129v1.pdf
https://arxiv.org/pdf/1605.09410v4.pdf
As you can see in these papers, a simple object-classification network won't solve the problem.
If you search GitHub you will find a few repositories with basic frameworks that you can build on top of:
https://github.com/daijifeng001/MNC (caffe)
https://github.com/bernard24/RIS/blob/master/RIS_infer.ipynb (torch)
https://github.com/jr0th/segmentation (keras, tensorflow)
indraforyou answered the question of how to solve the problem you are having. I want to add something for the Inception model specifically. In https://arxiv.org/pdf/1312.6229.pdf they propose a regressor network trained on the output of a model trained on the ImageNet dataset, such as the Inception model. This regressor model is then used to propose object boundaries for you to use for counting. The advantage of this approach is that you do not have to annotate any training examples; you can just use the ImageNet dataset for training.
If you do not want to train anything, I would propose a heuristic for finding object boundaries. The literature on image segmentation (https://en.wikipedia.org/wiki/Image_segmentation) should help you find a suitable one, though I do think using a heuristic will decrease your accuracy.
Last but not least, this is an open problem in computer vision research. You should not expect to get 100% accuracy, or even 95% accuracy, on counting. Many very smart people have tried this and reported mixed results. Still, some very cool things can be accomplished.
Any classification model such as the Inception model will identify a single object, like a flower in your case. However, when multiple objects are present, classification alone won't work (put simply, the model gets confused).
Thus:
You have to segment the main image into child images with one object per image and run classification on each segment. This is termed image segmentation in image processing; a rough sketch follows below.
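As a rough illustration of that idea (a crude OpenCV heuristic with an Otsu threshold and an arbitrary minimum-area filter, so expect the accuracy caveats mentioned above):

import cv2

img = cv2.imread("flowers.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Separate foreground from background with Otsu thresholding.
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Each external contour is treated as one candidate object region.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

count = 0
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w * h < 100:  # skip tiny noise regions (arbitrary threshold)
        continue
    # A classifier (e.g., Inception) would run on img[y:y+h, x:x+w]
    # here to confirm the region really contains a flower.
    count += 1

print("object count:", count)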

Training a Text Detection System

I'm currently developing a text detection system for a given image using logistic regression, and I need training data like the image below:
The first column shows positive examples (y=1) of text, whereas the second column shows images without text (y=0).
I'm wondering where I can get a labeled dataset of this kind?
Thanks in advance.
A good place to start for these sorts of things is the UC Irvine Machine Learning Repository:
http://archive.ics.uci.edu/ml/
But maybe also consider heading over to Cross Validated for machine-learning-related questions:
https://stats.stackexchange.com/
You can get a similar dataset here.
Hope it helps.
