Deep Convolutional Networks [closed] - image-processing

I would like to do some object detection, where I have two restrictions.
First, at the moment I don't have a large number of images for training (around 550 images so far).
Second, most likely I will not be able to see the whole object; only some part of the object I am trying to detect will be visible.
My question: is it a good idea to try Deep Convolutional Networks
via Bayesian Optimization and Structured Prediction for this kind of situation?
I have this paper as a reference:
Deep Convolutional Networks via Bayesian Optimization and Structured Prediction.

You need to offer us more details. The answer to "what CNN should I use?" and "do I have enough images for that?" depends on several factors:
1- How many objects are in those 550 images? Each object is a class: if you have 550 images of 2 different objects, that might be enough, but if you have 550 objects, that's only 1 image per object, which is definitely not enough.
2- What is the size of your images? Does it vary among them? Do the 550 images contain parts of the object or the whole object?
After answering these questions you can select your CNN architecture and your data augmentation strategy.
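As a minimal sketch of what an augmentation strategy for a small dataset could look like (assuming a TensorFlow 2.x / Keras workflow; all the ranges below are placeholder values, not tuned recommendations):

```python
import tensorflow as tf
from tensorflow.keras import layers

# A small augmentation pipeline for a limited dataset (~550 images).
# Random flips, rotations, zooms and contrast changes multiply the
# effective variety of the data without collecting new images.
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),   # up to +/-10% of a full turn
    layers.RandomZoom(0.2),
    layers.RandomContrast(0.2),
])

# Applied on the fly during training, e.g.:
# train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))
```

Since the question mentions partial views of the object, random cropping would also be worth adding, so the network sees incomplete objects during training as well.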
Structured receptive fields have shown better results than normal CNNs on small datasets. Here's a paper on them: https://arxiv.org/abs/1605.02971

How to choose which model to fit to data? [closed]

My question: given a particular dataset and a binary classification task, is there a way to choose the type of model that is likely to work best? For example, consider the Titanic dataset on Kaggle: https://www.kaggle.com/c/titanic. Just by analyzing graphs and plots, are there any general rules of thumb for picking Random Forests vs. KNNs vs. neural nets, or do I just need to test them all and pick the best-performing one?
Note: I'm not talking about image data, since CNNs are obviously best for those.
No, you need to test different models to see how they perform.
The top algorithms, based on papers and Kaggle results, seem to be boosting algorithms (XGBoost, LightGBM, AdaBoost), a stack of all of those together, or just Random Forests in general. But there are instances where Logistic Regression can outperform them.
So just try them all. Unless the dataset is very large (say, more than 100k rows), you're not going to lose that much time, and you might learn something valuable about your data.
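A minimal sketch of "just try them all", assuming scikit-learn (the synthetic dataset below is only a stand-in for your own feature matrix and labels):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data; replace with your own preprocessed X, y.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Candidate models for a binary classification task.
models = {
    "logreg": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "rf": RandomForestClassifier(n_estimators=300, random_state=0),
    "gboost": GradientBoostingClassifier(random_state=0),
}

# 5-fold cross-validation gives a fairer comparison than a single split.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {scores.mean():.3f} (+/- {scores.std():.3f})")
```

XGBoost and LightGBM plug into the same loop via their scikit-learn wrappers (`xgboost.XGBClassifier`, `lightgbm.LGBMClassifier`) if you have them installed.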

What are the good practices to building your own custom facial recognition? [closed]

I am working on building a custom facial recognition system for our office.
I am planning to use Google FaceNet.
My question is not about the model itself: you can find or create your own version of a FaceNet model in Keras or PyTorch, there's no issue with that. It's about creating the dataset. What are the best practices for capturing photos of a person when I don't have any prior photo of that person? All I have is a camera and the person. Should I create variance by changing the lighting conditions, orientation, or face size?
A properly trained FaceNet model should already be somewhat invariant to lighting conditions, pose, and other features that should not be part of identifying a face. At least that is what is claimed in a draft of the FaceNet paper. If you only intend to compare feature vectors generated by the network, and intend to recognize a small group of people, your own dataset likely does not have to be particularly large.
Personally, I have done something quite similar to what you are trying to achieve for a group of around 100 people. The dataset consisted of 1 image per person, and I used a 1-NN classifier on the generated feature vectors. While I do not remember the exact results, it worked quite well. The pretrained network's architecture was different from FaceNet's, but the overall idea was the same.
The only way to truly answer your question, though, is to experiment and see how well things work out in practice.
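For illustration, a minimal sketch of the 1-NN comparison step over embeddings. The `embed()` function mentioned in the comments is hypothetical, standing in for whatever pretrained network you use; the threshold value is a placeholder you would tune on your own data:

```python
import numpy as np

# gallery: one embedding per known person, e.g. {"alice": vec, ...},
# where each vec was produced by a hypothetical embed(image) call on
# a pretrained FaceNet-style network and is a 1-D numpy array.

def recognize(query_vec, gallery, threshold=0.8):
    """1-NN over cosine distance; returns the best match or None."""
    best_name, best_dist = None, float("inf")
    for name, ref_vec in gallery.items():
        # cosine distance = 1 - cosine similarity
        sim = np.dot(query_vec, ref_vec) / (
            np.linalg.norm(query_vec) * np.linalg.norm(ref_vec))
        dist = 1.0 - sim
        if dist < best_dist:
            best_name, best_dist = name, dist
    # Reject matches that are too far away (unknown person).
    return best_name if best_dist < threshold else None
```

The rejection threshold matters in practice: without it, every face, including strangers, gets assigned to whoever happens to be nearest in the gallery.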

Google vision api vs build your own [closed]

I have quite a challenging use case for image recognition. I want to detect the composition of mixed recycling, e.g. crushed cans, paper, and bottles, and detect any anomalies such as glass, bags, shoes, etc.
Trying images with the Google Vision API, the results are mainly "trash", "recycling", "plastic", etc., likely because the API hasn't been trained on mixed and broken material like this.
For something like this, would I have to go for something like TensorFlow and build a neural network from my own images? I guess I wouldn't need to use Google for this, as TensorFlow is open source?
Thanks.
Generally, whenever you apply machine learning to a new, real-world use case, it is a good idea to get your hands on a representative dataset; in your case, that would be images of these trash materials.
Then you can pick an appropriate detection model (VGG, Inception, ResNet) and modify the final classification layer to output as many category labels as you require (maybe 'normal' and 'anomaly' in your case, so 2 classes).
Then you load the pre-trained weights for this network, because the learned features generalize (google 'transfer learning'), initialize your modified classification layer randomly, and train only the last layer, or maybe the last two or three layers (depending on what works best, how much data you have, and generalization).
So, in short (see the sketch after this list):
1. Pick a pretrained model.
2. Modify it for your problem.
3. Finetune the weights on your own dataset.
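A minimal transfer-learning sketch of these three steps in Keras, assuming 2 classes and 224x224 RGB input (the optimizer and learning rate are placeholder choices):

```python
import tensorflow as tf

# 1. Pick a pretrained model: ResNet50 trained on ImageNet,
#    dropping its original 1000-class head.
base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained features

# 2. Modify it for this problem: a new, randomly initialized
#    classification head with 2 outputs ('normal' vs 'anomaly').
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),
])

# 3. Finetune: train only the new head first; optionally unfreeze the
#    last few base layers afterwards with a lower learning rate.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```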

how to handle text and image input together in neural network algorithm [closed]

I am working on a neural network machine learning algorithm. I wanted to know which input data formats are applicable as NN features. Is it possible to handle text and images together as input to a CNN, or with any other machine learning algorithm? And how would I make sense of the output?
Yes, it is possible to handle text and image data together.
The feature vectors created from each text data point or image data point can be combined and used in parallel as one big feature vector.
After vectorization of text data, there is not much difference between the pixel vectors and the text vectors.
Specifically, in the case of a CNN, the final model can be a combined neural network with a convolutional branch on one side and a vectorized-words branch on the other.
(The architecture diagram illustrating this setup comes from Christopher Bonnett's article.)
For more details, please refer to that article. It explains how e-commerce products can be classified into various category hierarchies using both image and text data.
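A hedged sketch of such a two-branch network in Keras (all shapes, the vocabulary size, and the class count are made-up placeholders; the text branch here uses a simple embedding plus pooling rather than anything taken from the article):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Image branch: a small convolutional stack ending in a feature vector.
img_in = layers.Input(shape=(64, 64, 3), name="image")
x = layers.Conv2D(32, 3, activation="relu")(img_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Text branch: token IDs -> embeddings -> average pooling.
txt_in = layers.Input(shape=(100,), name="text")  # 100 token IDs
t = layers.Embedding(input_dim=10_000, output_dim=64)(txt_in)
t = layers.GlobalAveragePooling1D()(t)

# Concatenate both feature vectors into one big feature vector,
# then classify.
merged = layers.Concatenate()([x, t])
out = layers.Dense(5, activation="softmax")(merged)  # 5 example classes

model = tf.keras.Model(inputs=[img_in, txt_in], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

At inference time you feed both inputs together, e.g. `model.predict({"image": images, "text": token_ids})`, and the softmax output is interpreted exactly as in a single-input classifier.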

Tensorflow Count Objects in Image [closed]

New to machine learning, so I'm looking for some direction on how to get started. The end goal is to train a model to count the number of objects in an image using TensorFlow. My initial focus will be training the model to count one specific type of object, so let's say coins: I will only train the model to count coins, and I'm not worried about creating a generic counter for all different types of objects. I've only done Google's example of image classification of flowers, and I understand the basics of that. So I'm looking for clues on how to get started. Is this an image classification problem where I can use the same logic as the flowers example?
Probably the best-performing solution for the coin problem would be to use regression. Annotate, say, 5k images with the number of objects in the scene and train your model on them; the model then directly outputs the count (hopefully the correct one).
Another way is to classify whether an image patch shows a coin, using a sliding-window approach like this one: https://arxiv.org/pdf/1312.6229.pdf, classifying for each window whether it shows a coin and then counting the found regions. This is easier to annotate and learn, and more extensible, but you have the problem of choosing good windows and combining the results of those windows in a consistent way.
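A minimal sketch of the regression variant in Keras (assuming you have images paired with integer counts; the architecture and sizes are placeholders, not a recommendation):

```python
import tensorflow as tf
from tensorflow.keras import layers

# A CNN that regresses the object count directly: the final layer is a
# single linear unit and the loss is mean squared error, so the network
# learns to output a number rather than a class label.
model = tf.keras.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.GlobalAveragePooling2D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1),  # predicted count; round at inference time
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
# model.fit(images, counts, epochs=20)  # counts: array of integers
```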
