I am doing image segmentation with a U-Net architecture. My task is to segment cars, and I am using the Carvana image dataset provided by Kaggle. On both the train and test sets I get very high accuracy. I applied data augmentation (zoom, rotate), but when I feed in an image like this:
or this:
I am getting terrible output, e.g.:
Both images are 224x224, the same size as the train and test sets. And here are my train/test inputs:
What might be the problem?
Related
In the case of a deep CNN task, I understand that image pre-processing techniques such as Gaussian filtering and cropping can sometimes help with deep CNN modelling. I wonder whether it is also acceptable to apply them to the test data. I've always thought that the test data should never be touched, so that model performance can be evaluated accurately.
As a matter of fact, you do need to apply the same filters you used on the training data to your test data as well!
The rule that you should not touch your test data is about not using it during training, so that generalization is learned only from the training set, and evaluating on the test set gives a realistic picture of your model's performance and quality.
Any filtering, such as Gaussian smoothing, applied to the training data before feeding it into model training should be applied to the test data as well.
For cropping, it really depends on how and what you crop. If your photos always have frames around them, and in the training set you crop to remove those frames, I highly suggest doing the same for the test set.
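As a small sketch of this point, assuming the images are NumPy arrays and SciPy is available (x_train_raw, x_test_raw and the sigma value are just illustrative):

import numpy as np
from scipy.ndimage import gaussian_filter

def preprocess(images, sigma=1.0):
    # Apply the same Gaussian smoothing to every image, regardless of
    # whether it comes from the train or the test split.
    # For RGB images you may prefer sigma=(s, s, 0) so channels are not mixed.
    return np.stack([gaussian_filter(img, sigma=sigma) for img in images])

x_train = preprocess(x_train_raw)   # used to fit the model
x_test = preprocess(x_test_raw)     # identical preprocessing at evaluation time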
Yes. The preprocessing is then part of your model and should therefore be part of testing.
I'm using AlexNet to train on my own dataset.
The example code in caffe comes with
bvlc_reference_caffenet.caffemodel
solver.prototxt
train_val.prototxt
deploy.prototxt
When I train with the following command:
./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt
I'd like to start with the weights given in bvlc_reference_caffenet.caffemodel.
My questions are
How do I do that?
Is it a good idea to start from those weights? Would this converge faster? Would this be bad if my data are vastly different from the ImageNet dataset?
1.
In order to use existing .caffemodel weights for fine-tuning, you need to use the --weights command-line argument:
./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt --weights=models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel
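If you prefer to drive this from Python, here is a minimal pycaffe sketch of the same fine-tuning setup (the paths simply reuse the files listed above):

import caffe

caffe.set_mode_gpu()  # or caffe.set_mode_cpu()

# Build the solver from solver.prototxt, then copy the pre-trained weights
# into every layer whose name matches the reference net.
solver = caffe.SGDSolver('models/bvlc_reference_caffenet/solver.prototxt')
solver.net.copy_from('models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')

solver.solve()  # run the training loop defined by the solver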
2.
In most cases, fine-tuning a net is a recommended practice, even when the input images are quite different from "ImageNet" photos.
However, you should note that when training for the original weights you are about to use, some (very reasonable) assumptions were made. You should decide whether these assumptions are still true for your task.
For instance, most nets were trained with simple data augmentation using an image and its horizontal flip. However, if your task is to distinguish between images that are flipped, you will find it very difficult to fine-tune.
Let's suppose I would like to classify motorbikes by model.
There are a couple of hundred motorbike models I'm interested in.
I do have tens, sometimes hundreds, of pictures of each motorbike model.
Can you please point me to a practical example that demonstrates how to train a model on your own data and then use it to classify images? It needs to be a deep learning model, not simple logistic regression.
I'm not sure about it, but it seems like I can't use a pre-trained neural net, because it has been trained on a wide range of objects like cats, humans, cars, etc. It may not be very good at distinguishing the motorbike nuances I'm interested in.
I found a couple of such examples (TensorFlow has one), but sadly all of them used a pre-trained model. None of them had an example of how to train it on your own dataset.
In cases like yours you either use transfer learning or fine-tuning. If you have more than a thousand images of motorbikes I would use fine-tuning, and if you have fewer, transfer learning.
Fine-tuning means using a pre-trained model with a different classifier part; the new classifier part, typically the last 1-2 layers of the trained model, is then trained on your dataset.
Transfer learning means using a pre-trained model and letting it output features for an input image. You then use a new classifier based on those features, for example an SVM or a logistic regression.
An example of this can be seen here: https://github.com/cpra/dlvc2016/blob/master/lectures/lecture10.pdf, slide 33.
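To make the transfer-learning route described above concrete, here is a minimal sketch assuming Keras and scikit-learn, with hypothetical arrays of motorbike images and labels already loaded:

import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from sklearn.svm import LinearSVC

# Pre-trained backbone used purely as a fixed feature extractor.
backbone = ResNet50(weights='imagenet', include_top=False, pooling='avg')

def extract_features(images):
    # images: float array of shape (n, 224, 224, 3)
    return backbone.predict(preprocess_input(images.copy()))

X_train = extract_features(train_images)   # hypothetical arrays
X_test = extract_features(test_images)

clf = LinearSVC()                 # the "new classifier" on top of the features
clf.fit(X_train, train_labels)
print(clf.score(X_test, test_labels))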
The paper Quick, Draw! Doodle Recognition, from a Kaggle challenge, may be similar enough to what you are doing. The code is on GitHub. You may need some data augmentation if you only have a few hundred images per category.
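If augmentation is needed, a small sketch with Keras' ImageDataGenerator (the directory layout and parameter values are just illustrative assumptions):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random rotations, zooms and shifts to stretch a few hundred images per class.
datagen = ImageDataGenerator(
    rotation_range=20,
    zoom_range=0.2,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    rescale=1.0 / 255,
)

train_iter = datagen.flow_from_directory(
    'data/motorbikes/train',   # hypothetical path: one sub-folder per class
    target_size=(224, 224),
    batch_size=32,
)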
What you want is pretty easy. Follow the Darknet YOLO implementation.
Instructions: https://pjreddie.com/darknet/yolo/
Code: https://github.com/pjreddie/darknet
Training YOLO on COCO
You can train YOLO from scratch if you want to play with different training regimes, hyper-parameters, or datasets. Here's how to get it working on the COCO dataset.
Get The COCO Data
To train YOLO you will need all of the COCO data and labels. The script scripts/get_coco_dataset.sh will do this for you. Figure out where you want to put the COCO data and download it, for example:
cp scripts/get_coco_dataset.sh data
cd data
bash get_coco_dataset.sh
Add your own data and make sure it is in the same format as the testing samples.
Now you should have all the data and the labels generated for Darknet.
Then call the training script with the pre-trained weights.
Keep in mind that training only on your motorcycle images may not give a good estimate; the results can come out biased, as I have read somewhere before.
The rest is all inside the links. Good luck!
I have a set of images of a particular object. I want to find whether some of them have anomalies, using a machine learning algorithm. For example, if I have many photos of glasses, I want to find whether one of them is broken or has something anomalous. Something like this:
GOOD!!
BAD!!
(Obviously I will use the same kind of glasses...)
The problem is that I don't know every negative situation, so, for training, I have only positive images.
In other words, I want an algorithm that recognizes whether an image has something different from the dataset. Do you have any suggestions?
In particular, is there a way to use a convolutional neural network?
What you are looking for is usually called anomaly, outlier, or novelty detection. You have lots of examples of what your data should look like, and you want to know when something doesn't look like your data.
A good approach for this problem, since you are using images, is to get a feature-vector representation using a CNN pre-trained on ImageNet, and then run an anomaly detector on that feature set. The isolation forest should be one of the easier ones to get working.
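A minimal sketch of that pipeline, assuming you already have a 2-D NumPy array of CNN features for your images (features and new_features are placeholder names):

from sklearn.ensemble import IsolationForest

# features: shape (n_images, n_features), e.g. pooled activations
# from a pre-trained CNN; assumed to exist already.
detector = IsolationForest(contamination=0.01, random_state=0)
detector.fit(features)

# predict() returns +1 for inliers and -1 for outliers/anomalies.
labels = detector.predict(new_features)
scores = detector.decision_function(new_features)  # lower = more anomalous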
This is a typical classification problem; I do not understand why you need a CNN for this.
My suggestion would be to build/train a classification model comprising only GOOD images of glasses. Here you would possibly have all kinds of intact glasses with a regular shape. If the model encounters anything other than GOOD images, it will classify those as BAD. These so-called BAD images may include cracked or broken glasses with an irregular shape.
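Since only GOOD images are available at training time, this suggestion is essentially a one-class setup. One way to realize it (not necessarily what the answer above had in mind) is scikit-learn's OneClassSVM on top of image features; good_features and query_features are hypothetical arrays:

from sklearn.svm import OneClassSVM

# good_features: CNN or hand-crafted features of intact glasses only.
one_class = OneClassSVM(kernel='rbf', nu=0.05, gamma='scale')
one_class.fit(good_features)

# +1 = looks like the GOOD training data, -1 = flagged as BAD/anomalous.
predictions = one_class.predict(query_features)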
Another option that might work is to use an autoencoder.
Autoencoders are unsupervised neural networks with a bottleneck architecture that try to reconstruct their own input.
We could train a deep convolutional autoencoder on examples of good glasses so that it becomes specialized in reconstructing that type of image. You don't need to train the autoencoder with bad glasses.
I would therefore expect the trained autoencoder to produce a low error for good glasses and a high error for bad glasses. The error can be measured as the MSE between the reconstructed and original values (pixels).
From the trained autoencoder you can plot the MSEs for good vs. bad glasses to help you define the right threshold. You can also try statistical thresholds such as mean + 2*std, median + 2*MAD, etc.
Autoencoder details:
http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders/
Deep autoencoder for images:
https://cds.cern.ch/record/2209085/files/Outlier%20detection%20using%20autoencoders.%20Olga%20Lyudchick%20(NMS).pdf
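As a rough sketch of the autoencoder approach described above, assuming Keras, grayscale 128x128 crops of the glasses and pixel values scaled to [0, 1] (all layer sizes and filter counts are illustrative):

import numpy as np
from tensorflow.keras import layers, models

# Small convolutional autoencoder with a bottleneck in the middle.
inp = layers.Input(shape=(128, 128, 1))
x = layers.Conv2D(16, 3, activation='relu', padding='same')(inp)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(8, 3, activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D(2)(x)          # 32x32x8 bottleneck

x = layers.Conv2D(8, 3, activation='relu', padding='same')(encoded)
x = layers.UpSampling2D(2)(x)
x = layers.Conv2D(16, 3, activation='relu', padding='same')(x)
x = layers.UpSampling2D(2)(x)
out = layers.Conv2D(1, 3, activation='sigmoid', padding='same')(x)

autoencoder = models.Model(inp, out)
autoencoder.compile(optimizer='adam', loss='mse')

# Train only on GOOD glasses (x_good is a hypothetical array in [0, 1]).
autoencoder.fit(x_good, x_good, epochs=50, batch_size=32)

# Per-image reconstruction MSE; pick a threshold such as mean + 2*std.
recon = autoencoder.predict(x_good)
mse = np.mean((recon - x_good) ** 2, axis=(1, 2, 3))
threshold = mse.mean() + 2 * mse.std()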
I am working on a project about the feedforward pathway of the ventral stream, and I have 6 images to be recognized at the inferotemporal layer.
Could someone please give me example images showing the difference between training images and test images? What should I add to the folder that contains my training images? Should I add another folder containing a list of test images? If yes, what should these test images be?
Must the training images contain the images to be analysed or recognized, and the test images contain the images in memory? In other words, if we have for example 16 training faces and one or two test faces, should we work out which face in the training set corresponds to the face in the test set? Is that right?
Note: I don't need code; I am only interested in a brief explanation of the difference between test and training images.
Any help will be much appreciated.
The only difference between training and test images is that the test images are not used for selecting your model's parameters. Each model has some kind of parameters, or variables, which it fits to the data; this is called the training process. The training/test separation ensures that your model (algorithm) can actually do something more than just memorize images, so you test it on test images that have not been used during the training phase.
This has already been discussed in detail on SO: what is the difference between train, validation and test set in neural networks?
In HMAX, you use all the data at the input image layer and apply Gabor filters, max-pooling and radial basis kernel functions to all of it. Only at the C2 layer do you start to train on a subset of the images (mostly with a linear-kernel SVM). That subset is the training data, and the rest are the test data. In a word, the training images are first used to build the SVM, and the test images are then assigned to classes using the majority-voting method.
But this is in fact equivalent to putting the training images at the image layer first, running them through all the layers, and then putting the test images at the image layer to restart for recognition. Since both training and test images need scaling, and all the operations at the layers prior to C2 are the same, you can just mix them together at the beginning.
Although you use all the training and test images at the image layer, you still need to shuffle the data and pick some of them as the training set and the others as the test set.
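For the shuffling and splitting step itself, a small sketch with scikit-learn's train_test_split (array names are placeholders):

from sklearn.model_selection import train_test_split

# images: array of all images, labels: their class labels.
# shuffle=True mixes the data before taking 80% for training and 20% for
# testing; the test part is never used to fit the model.
train_imgs, test_imgs, train_lbls, test_lbls = train_test_split(
    images, labels, test_size=0.2, shuffle=True, random_state=42
)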