I'm working on a project for which I'd like to create a dataset of drawn faces (similar in concept to the CUFS dataset). Hand-drawing the faces aside, how would I go from "I have uploaded these image files to my computer and have ensured that they all have identical dimensions" to having a ready-to-use dataset? (I'd like to train/test LeNet with this dataset.) I've never created my own dataset before so am pretty unsure as to how to start.
Thanks!
You should convert the images into LevelDB or LMDB. You can follow the example of convert_mnist_data.cpp.
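If you prefer to stay in Python rather than build the C++ tool, a minimal sketch of that conversion with pycaffe and the lmdb package could look like the following; the folder layout and the placeholder label are assumptions, not part of your setup:

```python
import glob
import caffe
import lmdb
import numpy as np
from PIL import Image

# Assumed layout: ./faces/*.png, all grayscale and already the same size.
paths = sorted(glob.glob('./faces/*.png'))

env = lmdb.open('faces_train_lmdb', map_size=1 << 30)  # ~1 GB map size
with env.begin(write=True) as txn:
    for i, path in enumerate(paths):
        img = np.array(Image.open(path).convert('L'))   # H x W, uint8
        img = img[np.newaxis, :, :]                      # channels x H x W
        datum = caffe.io.array_to_datum(img, label=0)    # label 0 is a placeholder
        txn.put('{:08d}'.format(i).encode('ascii'),
                datum.SerializeToString())
```

A Caffe Data layer can then read faces_train_lmdb directly; for LeNet you would point the data source in lenet_train_test.prototxt at this database.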
I am trying to create a personal project using 4-5 guns from PUBG Mobile along with their different skins. I want to build an image classifier that distinguishes all of these guns. Can you please help me with how I should start and proceed? For example: how to create the dataset and how to take the images, which data augmentation to apply (scaling, shifting, rotating, etc.), which model to use (AlexNet? VGG?), key points to keep in mind, and which Python libraries to use.
I am new to computer vision, but I am trying to code an Android/iOS app that does the following:
Get the live camera preview and try to detect one flat image (a logo or painting) in it, in real time. Draw a rectangle around the logo if it is found; if there is no match, don't draw the rectangle.
I found the TensorFlow Object Detection API to be a good starting point, and support was just announced for importing TensorFlow models into Core ML.
I followed a lot of tutorials to train my own object detector. The training data is the key. I found a pretty good library to generate augmented images and have created hundreds of variations of my source image (rotation, skew, etc.).
But it failed! This dataset is probably good for image classification (with my image in full screen) but not in context (the room).
I think transfer learning is the key. In my case, I used the ssd_mobilenet_v1_coco model as a base. I tried to fake the context of my augmented images with the Random Erasing data augmentation technique, without success.
What are my available solutions? Am I tackling the problem the right way? I need to make the model training as fast as possible.
Should I use some indoor/outdoor image datasets and place my image randomly on top of them? How important is perspective?
Thank you!
I have created hundreds of variations of my source image (rotation, skew, etc.). But it failed!
So does that mean your model did not converge, or that the final performance was bad? If your model did not converge, then add more data. Hundreds of samples is very few, so use more images, generate more samples, and make your samples as dispersed as possible.
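As an illustration of making the samples more dispersed, here is a hedged sketch of simple augmentations (random rotation, flip, brightness shift) with OpenCV; the ranges are arbitrary assumptions, and a dedicated augmentation library will offer much more:

```python
import random
import cv2
import numpy as np

def augment(img, n=20):
    """Yield n randomly perturbed copies of a BGR uint8 image."""
    h, w = img.shape[:2]
    for _ in range(n):
        out = img.copy()
        # random rotation around the image centre
        angle = random.uniform(-30, 30)
        M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        out = cv2.warpAffine(out, M, (w, h))
        # random horizontal flip
        if random.random() < 0.5:
            out = cv2.flip(out, 1)
        # random brightness shift
        shift = random.randint(-40, 40)
        out = np.clip(out.astype(np.int16) + shift, 0, 255).astype(np.uint8)
        yield out
```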
I think transfer learning is the key. In my case, I used the ssd_mobilenet_v1_coco model as a base. I tried to fake the context of my augmented images with the Random Erasing data augmentation technique, without success.
You mean fine-tuning. Did you reduce the labels to 2 (your image and background) and then fine-tune? If you didn't, that is likely why it failed. You should at least show us your model definition.
What are my available solutions? Am I tackling the problem the right way? I need to make the model training as fast as possible.
To make training converge faster, train on multiple GPUs. If you don't have the hardware, rent a GPU cluster on Azure; believe me, it is not that expensive.
Hope that helps!
I am trying to understand OpenFace.
I have installed the Docker container, run the demos, and read the docs.
What I am missing is how to start using it correctly.
Let me explain my goals:
I have an app on a Raspberry Pi with a webcam. When I start the app, it takes a picture of the person in front of it.
It should then send this picture to my OpenFace app and check whether the face is known. Known in this context means that I have already added pictures of this person to OpenFace before.
My questions are:
Do I need to train OpenFace beforehand, or could I just put the images of the people in a directory or something and compare the webcam picture on the fly with these directories?
Do I compare with images or with a generated .pkl?
Do I need to train OpenFace for each new person?
It feels like I am missing something big that would make the required workflow clearer to me.
Just for the record: with the help of the link I mentioned, I was able to figure it out.
Do I need to train OpenFace beforehand, or could I just put the images of the people in a directory or something and compare the webcam picture on the fly with these directories?
Yes, training is required before you can compare any images.
Do I compare with images or with a generated .pkl?
Images are compared against the generated classifier .pkl.
The .pkl file gets generated when training OpenFace:
./demos/classifier.py train ./generated-embeddings/
This will generate a new file called ./generated-embeddings/classifier.pkl. This file has the SVM model you'll use to recognize new faces.
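For reference, the demo stores a pickled (label encoder, classifier) pair in that file, so recognising a new face from a 128-dimensional OpenFace embedding looks roughly like the sketch below. Treat the exact pickle layout as an assumption, since it may differ between OpenFace versions, and the zero vector is only a placeholder for a real embedding:

```python
import pickle
import numpy as np

# Load the model produced by ./demos/classifier.py train
with open('./generated-embeddings/classifier.pkl', 'rb') as f:
    le, clf = pickle.load(f)   # label encoder + trained SVM (assumed layout)

# 'rep' should be a 128-dimensional OpenFace embedding of the new face
rep = np.zeros(128)            # placeholder; use the real embedding here
probs = clf.predict_proba(rep.reshape(1, -1))[0]
best = int(np.argmax(probs))
print('Predicted {} with confidence {:.2f}'.format(
    le.inverse_transform([best])[0], probs[best]))
```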
Do I need to train OpenFace for each new person?
For this question I don't have an answer yet, but only because I have not looked deeper into this topic.
So I am working on a project for school, and we are trying to teach a neural network to distinguish buildings from non-buildings. The problem I am having right now is representing the data in a form that is "readable" by the classifier.
The training data is a set of pictures plus a .wkt file with the coordinates of the buildings in each picture. So far we have been able to rescale the polygons, but we got stuck there.
Can you give any hints or ideas of how to bring this all to an appropriate form?
Edit: I do not need the code written for me; a link to an article or a book on a similar subject is more what I am looking for.
You did not mention which framework you are using, but I will give an answer for Caffe.
Your problem is very close to detecting objects within an image. You have full images with object (building in your case) bounding boxes.
The easiest way of doing this is through a Python data layer that reads an image and a file with the stored coordinates for that image and feeds them into your network. A tutorial on how to use it can be found here: https://github.com/NVIDIA/DIGITS/tree/master/examples/python-layer
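A skeleton of such a layer might look roughly like this; the class name, blob shapes, and the zero placeholders are assumptions, and the DIGITS tutorial above covers the real mechanics of loading your images and .wkt coordinates:

```python
import caffe
import numpy as np

class BuildingDataLayer(caffe.Layer):
    """Feeds (image, bounding-box) pairs into the network."""

    def setup(self, bottom, top):
        # e.g. parse self.param_str here for the image list and .wkt paths
        self.batch_size = 1

    def reshape(self, bottom, top):
        # top[0]: image batch, top[1]: one box per image as [x1, y1, x2, y2]
        top[0].reshape(self.batch_size, 3, 224, 224)
        top[1].reshape(self.batch_size, 4)

    def forward(self, bottom, top):
        # load the next image and its rescaled polygon -> bounding box here
        top[0].data[...] = np.zeros((self.batch_size, 3, 224, 224))
        top[1].data[...] = np.zeros((self.batch_size, 4))

    def backward(self, top, propagate_down, bottom):
        pass  # data layers have no gradient to propagate
```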
To accelerate the process, you may want to store the image/coordinate pairs in a custom LMDB database.
Finally, a good working example with a complete Caffe implementation can be found in the Faster R-CNN library here: https://github.com/rbgirshick/caffe-fast-rcnn/
Check roi_pooling_layer.cpp in their custom Caffe branch and roi_data_layer to see how the data is fed into the network.
I am doing a project on writer identification. I want to extract HOG features from line images of Arabic handwriting and then use a Gaussian mixture model for classification.
The link to the database containing the line Images is : http://khatt.ideas2serve.net/
So my questions are as follows:
There are three folders, namely Test, Train and Validate. From which folder do I need to extract the features, and for what purpose should each folder be used?
Do we need to extract the features from individual images and merge them, or is there a method to extract the features from all the images together?
Test, Train and Validate
Read this stats SE question: What is the difference between test set and validation set?
This is basic machine learning, so you should probably go back and review your course literature, since it seems like you're missing some pretty important machine learning concepts.
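In short: fit on Train, tune hyperparameters on Validate, and touch Test only once for the final number. A minimal sketch of that loop, with random placeholder features standing in for the HOG vectors you would actually extract from the three KHATT folders:

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder data: in practice these are the HOG features and writer labels
# extracted from the Train / Validate / Test folders.
rng = np.random.RandomState(0)
X_train, y_train = rng.rand(100, 64), rng.randint(0, 2, 100)
X_val,   y_val   = rng.rand(30, 64),  rng.randint(0, 2, 30)
X_test,  y_test  = rng.rand(30, 64),  rng.randint(0, 2, 30)

best_score, best_C = -1.0, None
for C in [0.1, 1, 10]:                        # hyperparameter search
    clf = SVC(C=C).fit(X_train, y_train)      # fit only on the training set
    score = clf.score(X_val, y_val)           # tuned on the validation set
    if score > best_score:
        best_score, best_C = score, C

final = SVC(C=best_C).fit(X_train, y_train)
print('test accuracy:', final.score(X_test, y_test))  # test set used only once
```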
Do we need to extract the features from individual images and merge them, or is there a method to extract the features from all the images together?
It seems, again, like you're missing basic concepts here. The histogram of oriented gradients subdivides the image into cells and computes a histogram of gradient orientations in each cell. See this SO question for examples of how this looks.
The traditional way of using HOG is: for each image in your training set, extract the HOG features, use these to train an SVM, validate the training with the validation set, and then evaluate the trained SVM on the test set.
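A compact sketch of that pipeline in Python, using skimage for HOG and scikit-learn for the SVM; the folder layout, image size, and HOG parameters are assumptions, and the same idea carries over to MATLAB:

```python
import glob
import numpy as np
from skimage.io import imread
from skimage.transform import resize
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(paths, size=(64, 256)):
    """Resize each line image to a fixed size and extract its HOG vector."""
    feats = []
    for p in paths:
        img = resize(imread(p, as_gray=True), size)
        feats.append(hog(img, orientations=9,
                         pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)))
    return np.array(feats)

# Assumed layout: Train/<writer_id>/*.png, with the writer id as the label.
train_paths = sorted(glob.glob('Train/*/*.png'))
train_labels = [p.split('/')[-2] for p in train_paths]

clf = LinearSVC().fit(hog_features(train_paths), train_labels)
# ...then score on Validate/ to tune, and evaluate once on Test/
```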
You need to extract the HOG features from each image separately. Furthermore, you have to resize all images to the same size, otherwise your HOG vectors will have different lengths.
You can use the extractHOGFeatures function in MATLAB. See this example.