I'm trying to train a DeepLabV3 model with the mobilenet_v3_small_seg architecture. I trained the model, but the predictions I get are a completely blank mask with no class predictions. The steps I followed for training are:
Cloned the official repository in Google Colab.
I prepared a dataset with only one class (segmenting lips in a face), following the Pascal VOC 2012 dataset format. I created RGB masks with a green foreground (0, 255, 0), white boundaries around it (255, 255, 255), and a black background (0, 0, 0), as shown below.
I then converted the RGB masks into single-channel 8-bit PNGs with background: 0, foreground: 1, and boundaries: 255 with the help of this script, as shown below.
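The conversion essentially maps the three mask colors onto the label values 0, 1, and 255; a minimal sketch of such a conversion (file names here are placeholders) is:

```python
import numpy as np
from PIL import Image

def rgb_mask_to_label(in_path, out_path):
    """Map an RGB mask to a single-channel 8-bit label PNG:
    black background -> 0, green foreground -> 1, white boundary -> 255."""
    rgb = np.array(Image.open(in_path).convert("RGB"))
    label = np.zeros(rgb.shape[:2], dtype=np.uint8)        # background = 0
    label[np.all(rgb == (0, 255, 0), axis=-1)] = 1         # foreground (lips) = 1
    label[np.all(rgb == (255, 255, 255), axis=-1)] = 255   # boundary = 255 (ignored)
    Image.fromarray(label).save(out_path)

rgb_mask_to_label("mask_rgb.png", "mask_label.png")        # placeholder file names
```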
I then successfully converted the dataset into TFRecords by modifying this script.
Then I added my dataset description in data_generator.py with ignore_label=255 and num_classes=2.
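The entry follows DeepLab's DatasetDescriptor pattern in data_generator.py; roughly, what I added looks like the sketch below (the split sizes shown are placeholders):

```python
import collections

# Same namedtuple that DeepLab's data_generator.py uses to describe its datasets.
DatasetDescriptor = collections.namedtuple(
    'DatasetDescriptor', ['splits_to_sizes', 'num_classes', 'ignore_label'])

_PQR_INFORMATION = DatasetDescriptor(
    splits_to_sizes={'train': 1600, 'val': 400},  # placeholder split sizes
    num_classes=2,      # background + lips
    ignore_label=255,   # boundary pixels, excluded from the loss
)

# Registered alongside the built-in datasets, e.g.:
# _DATASETS_INFORMATION = {..., 'pqr': _PQR_INFORMATION}
```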
Finally I started training with the following command:
!python train.py \
--logtostderr \
--training_number_of_steps=10000 \
--train_split="val" \
--model_variant="mobilenet_v3_small_seg" \
--decoder_output_stride=16 \
--train_crop_size="256,256" \
--train_batch_size=16 \
--dataset="pqr" \
--save_interval_secs=600 \
--save_summaries_secs=300 \
--save_summaries_images=True \
--log_steps=200 \
--train_logdir=${PATH_TO_TRAIN_DIR} \
--dataset_dir=${PATH_TO_DATASET}
When training was complete, I tested the model with several different images. The output of the model is a (256, 256) array with all values equal to 0; I don't get a single 1 or anything else.
I'm new to machine learning, and I want to know:
What's wrong with my process? I watched many tutorials but couldn't find the answer.
Is there anything wrong with my dataset? The dataset contains a total of 2000 images.
I couldn't find pretrained weights for mobilenet_v3_small. If anybody knows where to get them, kindly share so I can do transfer learning.
I set the number of classes to 2 (background and foreground). Is that right?
I'd like to retrain a YOLOv7-tiny Object Detector on a custom dataset with 4 classes.
According to the README of the official Git repo, in order to transfer-learn one should pass the pre-trained weights as an argument:
python train.py --workers 8 --device 0 --batch-size 32 --data data/custom.yaml --img 640 640 --cfg cfg/training/yolov7-custom.yaml --weights 'yolov7_training.pt' --name yolov7-custom --hyp data/hyp.scratch.custom.yaml
As for the normal YOLOv7 model, there are the weight files yolov7.pt and yolov7_training.pt present in the repo (see here). What's the difference between these two, and which should be used on which occasion?
Also there's a yolov7-tiny.pt weight file, but no respective yolov7-tiny_training.pt file. Is there a particular reason for that?
Also, I learned that for transfer learning it's helpful to "freeze" the base model's weights (make them untrainable) first, then train the new model on the new dataset so that only the new weights get adjusted. After that you can "unfreeze" the frozen weights to fine-tune the entire model. The train.py script has a --freeze argument to freeze backbone layers.
Is this approach recommended for retraining YOLOv7? If so, should you freeze all 50 backbone layers of YOLOv7 (and would that command be --freeze 50 or something different)? And how would you then unfreeze these layers and resume training of said model?
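Conceptually, "freezing" just means marking the base model's parameters as non-trainable so the optimizer skips them, and "unfreezing" flips them back; below is a generic PyTorch sketch of that idea using a hypothetical stand-in model (this illustrates the concept only, not YOLOv7's actual --freeze implementation):

```python
import torch.nn as nn

# Hypothetical stand-in model: a small "backbone" followed by a "head".
backbone = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
head = nn.Conv2d(32, 4, 1)   # e.g. 4 output channels for 4 classes
model = nn.Sequential(backbone, head)

# Freeze the backbone: its weights no longer receive gradient updates.
for param in backbone.parameters():
    param.requires_grad = False

# ... train only the head on the new dataset ...

# Later, unfreeze everything to fine-tune the whole model (usually at a lower learning rate).
for param in model.parameters():
    param.requires_grad = True
```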
I'm trying to train my own Haar Cascade classifier for apples according to this article. I collected 1000 positive images and 500 negative images from the internet. Each image has a different size, and I cropped the images to create the "info.txt" file. While creating samples like this,
createsamples.exe -info positive/info.txt -vec vector/applevector.vec -num 1000 -w 24 -h 24
there are the parameters -w and -h. What do they mean? Should I resize all my positive and negative images? I tried to train my classifier with the default parameters (-w 24 and -h 24), but the accuracy of my classifier is very weak. Could that be related to these parameters? Thank you for the advice.
UPDATE
Here are some examples of my positive images. I collected them from the internet.
I am trying to do a multiclass classification problem with Vowpal Wabbit.
I have a train file that looks like this:
1 |feature_space
2 |feature_space
3 |feature_space
As output, I want to get the probabilities of a test item belonging to each class, like this:
1: 0.13 2:0.57 3:0.30
Think of sklearn classifiers' predict_proba method, for example.
I've tried the following:
1) vw --oaa 3 train.file -f model.file --loss_function logistic --link logistic
vw -p predict.file -t test.file -i model.file --raw_predictions pred.txt
but the pred.txt file is empty (it is created but contains no records), and predict.file contains only the final class, no probabilities.
2) vw --csoaa 3 train.file -f model.file --link logistic
I modified the input files accordingly to fit the cost-sensitive format. csoaa doesn't accept --loss_function logistic, failing with the error message: "You are using a label not -1 or 1 with a loss function expecting that!"
If used with the default squared loss function and a similar output command, I get a pred.txt with raw predictions for each class per item, for example:
2.33 1.67 0.55
I believe these are the resulting squared distances.
Is there a way to get VW to output class probabilities, or to somehow convert these distances into probabilities?
There was a bug in VW version 7.9.0, fixed in 7.10.0, that resulted in an empty raw predictions file.
Since November 2015, the easiest way to obtain probabilities is to use --oaa=N --loss_function=logistic --probabilities -p probs.txt. (Or, if you need label-dependent features: --csoaa_ldf=mc --loss_function=logistic --probabilities -p probs.txt.)
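If you are stuck on an older version and only have the -r raw per-class scores from an --oaa model trained with --loss_function logistic, you can also turn them into probabilities yourself by squashing each score through a sigmoid and renormalizing. Below is a small Python sketch of that conversion; the 'label:score' line format is an assumption about how the raw predictions file looks:

```python
import math

def raw_scores_to_probs(line):
    """Turn one line of raw per-class scores, e.g. '1:-0.8 2:1.3 3:0.1',
    into normalized class probabilities."""
    scores = {}
    for token in line.split():
        label, value = token.split(":")
        scores[label] = float(value)
    # Squash each raw logistic score through a sigmoid, then renormalize to sum to 1.
    squashed = {k: 1.0 / (1.0 + math.exp(-v)) for k, v in scores.items()}
    total = sum(squashed.values())
    return {k: v / total for k, v in squashed.items()}

print(raw_scores_to_probs("1:-0.8 2:1.3 3:0.1"))
# -> roughly {'1': 0.19, '2': 0.48, '3': 0.32}
```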
I use ImageMagick in my application for comparing images, using the compare command with the -subimage-search option.
But there is very little documentation about how -subimage-search works.
Can anyone provide me with more information on how it works? For example:
Does it compare using a color model, or does it use image segmentation to achieve its task?
What I know right now is that it searches for the second image in the first.
But how is this done? Please explain.
Warning: Conducting a subimage-search is slow -- extremely slow even.
Theory
This slowness is due to how subimage searching is designed to work: it carries out a compare of the small image at every possible position within the larger image (against the area it currently covers at that location).
The basic command to use -subimage-search is this:
compare -subimage-search largeimage.ext subimage.ext resultimage.ext
As a result of this command you should get not one, but two images:
resultimage-0.ext : this image should display the (best) matching location.
resultimage-1.ext : this should be a "heatmap" of potential top-left corner locations.
The second image (map of locations) displays how well the sub-image matches at the respective position: the brighter the pixel, the better the match.
The "map" image has smaller dimensions, because it contains only locations or each potential top-left corner of the sub-image while fitting completely into the larger one. Its dimensions are:
width = width_of_largeimage - width_of_subimage + 1
height = height_of_largeimage - height_of_subimage + 1
The searching itself is conducted on the basis of differences of color vectors. Therefore it should result in fairly accurate color comparisons.
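To make the brute-force idea concrete, here is a rough NumPy sketch of a per-position RMSE search that produces exactly such a heatmap (an illustration of the principle, not ImageMagick's actual code):

```python
import numpy as np

def subimage_search(large, small):
    """Brute-force search: compute the RMSE of `small` against every possible
    top-left position inside `large`. Both arrays are HxWx3 floats."""
    H, W = large.shape[:2]
    h, w = small.shape[:2]
    # Heatmap dimensions match the formula above:
    # (width - sub_width + 1) by (height - sub_height + 1) possible corners.
    heat = np.empty((H - h + 1, W - w + 1))
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            window = large[y:y + h, x:x + w]
            heat[y, x] = np.sqrt(np.mean((window - small) ** 2))
    y_best, x_best = np.unravel_index(np.argmin(heat), heat.shape)
    return (int(x_best), int(y_best)), heat  # best-match offset plus the full heatmap

# Toy example: embed a 2x2 patch at offset x=3, y=1 inside a random "large" image.
rng = np.random.default_rng(0)
large = rng.random((8, 10, 3))
small = large[1:3, 3:5].copy()
print(subimage_search(large, small)[0])   # -> (3, 1)
```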
In order to improve the efficiency and speed of the search, you could follow this strategy:
First, compare a very, very small sub-image of the sub-image with the larger image. This should find different possible locations faster.
Then use the results from step 1 to conduct a difference compare at each previously discovered potential location for more accurate matches.
Practical Example
Let's create two different images first:
convert rose: subimage.jpg
convert rose: -mattecolor blue -frame 20x5 largeimage.png
The first image, subimage.jpg (on the left), being a JPEG, will have some lossiness in its color encoding, so the sub-image cannot possibly produce an exact match.
The main difference of the second image, largeimage.png (on the right), is the blue frame around the main part:
Now time the compare-command:
time compare -subimage-search largeimage.png subimage.jpg resultimage.png
# 40,5
real 0m17.092s
user 0m17.015s
sys 0m0.027s
Here are the results:
resultimage-0.png (displaying best matching location) on the left;
resultimage-1.png (displaying the "heatmap" of potential matches) on the right.
Conclusion: Incorrect result? Bug?
Looking at the resulting images, and knowing how the two images were constructed, it seems to me that the result is not correct:
The command should have returned # 20,5 instead of # 40,5.
The resultimage-0.png should have the red area moved to the left by 20 pixels.
The heatmap, resultimage-1.png seems to indicate the best matching location as the darkest pixel; maybe I was wrong about my above "the brighter the pixel the better the match" statement, and it should be "the darker the pixel..."?.
I'll submit a bug report to the ImageMagick developers and see what they have to say about it....
Update
As suggested by @dlemstra, an ImageMagick developer, I tested by adding a -metric operation to the subimage search. This operation returns a numerical value indicating the closeness of the match. There are various metrics available, which can be listed with
convert -list metric
This returns the following list on my notebook (running ImageMagick v6.9.0-0 Q16 x86_64):
AE Fuzz MAE MEPP MSE NCC PAE PHASH PSNR RMSE
The meanings of these abbreviations are:
AE : absolute error count, number of different pixels (-fuzz effected)
Fuzz : mean color distance
MAE : mean absolute error (normalized), average channel error distance
MEPP : mean error per pixel (normalized mean error, normalized peak error)
MSE : mean error squared, average of the channel error squared
NCC : normalized cross correlation
PAE : peak absolute (normalized peak absolute)
PHASH : perceptual hash
PSNR : peak signal to noise ratio
RMSE : root mean squared (normalized root mean squared)
An interesting (and relatively recent) metric is PHASH ('perceptual hash'). It is the only one that does not require identical dimensions when comparing images directly (without the -subimage-search option). It is normally the best metric to narrow down similar-looking images (or at least to reliably exclude image pairs that look very different) without really "looking at them", on the command line and programmatically.
I ran the subimage search with all these metrics, using a loop like this:
for m in $(convert -list metric); do
echo "METRIC $m";
compare -metric "$m" \
-subimage-search \
largeimage.png \
subimage.jpg \
resultimage---metric-${m}.png;
echo;
done
This was the command output:
METRIC AE
compare: images too dissimilar `largeimage.png' @ error/compare.c/CompareImageCommand/976.
METRIC Fuzz
1769.16 (0.0269957) @ 20,5
METRIC MAE
1271.96 (0.0194089) @ 20,5
METRIC MEPP
compare: images too dissimilar `largeimage.png' @ error/compare.c/CompareImageCommand/976.
METRIC MSE
47.7599 (0.000728769) @ 20,5
METRIC NCC
0.132653 @ 40,5
METRIC PAE
12850 (0.196078) @ 20,5
METRIC PHASH
compare: images too dissimilar `largeimage.png' @ error/compare.c/CompareImageCommand/976.
METRIC PSNR
compare: images too dissimilar `largeimage.png' @ error/compare.c/CompareImageCommand/976.
METRIC RMSE
1769.16 (0.0269957) @ 20,5
So the following metric settings did not work at all with -subimage-search, as also indicated by the "images too dissimilar" message:
PSNR, PHASH, MEPP, AE
(I'm actually a bit surprised that the failed metrics include the PHASH one here. This may require further investigations...)
The following resultimages looked largely correct:
resultimage---metric-RMSE.png
resultimage---metric-FUZZ.png
resultimage---metric-MAE.png
resultimage---metric-MSE.png
resultimage---metric-PAE.png
The following result image looks just as incorrect as my first run above, where no -metric result was asked for:
resultimage---metric-NCC.png (also returning the same incorrect coordinates, 40,5)
Here are the two resulting images for -metric RMSE (which Dirk Lemstra had suggested using):
I found this tutorial on creating your own haar-classifier cascades.
This raised the question with me: what are the advantages, if any, of running HaarTraining, and creating your own classifier (as opposed to using the cascades provided by OpenCv)?
Haar or LBP cascade classifiers are a common technique for detecting rigid objects. So here are two major points in favor of training your own cascade:
Cascades shipped with OpenCV do not cover all possible object types. You can use one of the OpenCV cascades if you are going to build a face-detection application, but there are no ready-to-use cascades if you need to detect, for example, dogs.
The cascades from OpenCV are good, but they are not the best possible. It is a challenging task, but it is possible to train a cascade that has a higher detection rate and produces fewer false positives and false negatives.
And one major remark: the haartraining application used in your tutorial is now considered deprecated by the OpenCV team. opencv_traincascade is the newer version, and it has 2 important features: it supports LBP features and it supports multi-threading (TBB). A typical difference looks like this:
haartraining + singlecore > 3 weeks for one classifier.
traincascades + multicore < 30 minutes for one classifier.
But worst of all, I don't know of any good tutorials explaining the usage of opencv_traincascade. See this thread for details.
I can give you a Linux example. The code and techniques were pulled from a variety of sources. It follows this example, but uses a Python version of mergevec, so you don't have to compile the mergevec.cpp file.
Assuming that you have two folders with cropped and ready positive and negative images (.png files in this example), you create two text files listing all the image names:
find positive_images -iname "*.png" > positives.txt
find negative_images -iname "*.png" > negatives.txt
Then use the createsamples.pl script provided by Naotoshi Seo (in the OpenCV/bin folder), which takes the two text files and an output folder, and creates the .vec files:
perl createsamples.pl positives.txt negatives.txt 'output' 1500 "opencv_createsamples -bgcolor 0 -bgthresh 0 -maxzangle 0.5 -w 50 -h 50"
Follow that with a Python script created by Blake Wulfe called mergevec.py, which will create an output.vec file by combining all the .vec files in the subfolder:
python mergevec.py -v samples -o output.vec
Assuming that is all done, using opencv_traincascade as follows should help:
opencv_traincascade -data classifier -vec output.vec -bg negatives.txt \
-numStages 10 -minHitRate 0.999 -maxFalseAlarmRate 0.5 -numPos 200 \
-numNeg 400 -w 50 -h 50 -mode ALL
If all that goes well, use your newly created cascade (classifier/cascade.xml) with something like facedetect.py from opencv samples:
opencv-3.0.0-rc1/samples/python2/facedetect.py --cascade classifier/cascade.xml test_movie.mp4
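If you prefer a minimal standalone script over the bundled sample, a rough OpenCV-Python sketch of using the trained cascade could look like this (the input image name is a placeholder):

```python
import cv2

# Load the cascade produced by opencv_traincascade.
cascade = cv2.CascadeClassifier("classifier/cascade.xml")

img = cv2.imread("test_image.png")            # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# detectMultiScale returns a list of (x, y, w, h) rectangles.
objects = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in objects:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("detections.png", img)
```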