Google Vision API vs. building your own [closed] - machine-learning

I have quite a challenging use case for image recognition. I want to detect the composition of mixed recycling, e.g. crushed cans, paper, and bottles, and detect any anomalies such as glass, bags, shoes, etc.
Trying images with the Google Vision API, the results are mainly "trash", "recycling", "plastic", etc., likely because the API hasn't been trained on mixed and broken material like this?
For something like this, would I have to go for something like TensorFlow and build a neural network from my own images? I guess I wouldn't need to use Google for this, as TensorFlow is open source?
Thanks.

So generally, whenever you apply machine learning to a new, real-world use case, it is a good idea to get your hands on a representative dataset; in your case, that would be images of these trash materials.
Then you can pick an appropriate image-classification model (VGG, Inception, ResNet) and modify the final classification layer to output as many category labels as you require (maybe 'normal' and 'anomaly' in your case, so 2 classes).
Then you load the pre-trained weights for this network, because the learned features generalize (google 'Transfer Learning'), initialize your modified classification layer randomly, and train only the last layer, or maybe the last two or three layers, depending on what works best, how much data you have, and how well the model generalizes.
So, in short:
1. Pick a pretrained model.
2. Modify it for your problem.
3. Fine-tune the weights on your own dataset (a rough sketch follows below).
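A rough Keras sketch of these three steps, assuming a two-class 'normal' vs. 'anomaly' setup; the backbone choice, image size, and layer sizes are illustrative, not prescribed:

from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

# 1. Pick a pretrained backbone (ImageNet weights), dropping its original classifier.
base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained features to begin with

# 2. Modify it: add a new, randomly initialised head for your own labels.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(2, activation="softmax"),  # e.g. 'normal' vs. 'anomaly'
])

# 3. Fine-tune on your own images of mixed recycling.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)
# Optionally unfreeze the last few layers of `base` afterwards and keep
# training with a lower learning rate.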

Related

When should I train my own models and when should I use pretrained models? [closed]

Is it recommended to train my own models for things like sentiment analysis, despite only having a very small dataset (5,000 reviews), or is it best to use pretrained models, which were trained on much larger datasets but aren't "specialized" on my data?
Also, how could I train my model on my data and then later use it on that same data? I was thinking of an iterative approach where the training data would be a randomly selected subset of my total data for each learning epoch.
I would go like this:
Try the pre-trained model and see how it goes
If the results are not satisfactory, you can fine-tune it (see this tutorial). Basically, you are using your own examples to change the weights of the pre-trained model. This should improve the results, but it depends on what your data looks like and how many examples you can provide. The more you have, the better it should be (I would try to use 10-20k at least).
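For the first step, trying an off-the-shelf model takes only a few lines; the checkpoint the pipeline picks here is just the Hugging Face default, not a specific recommendation:

from transformers import pipeline

# Off-the-shelf sentiment model; no training needed for a first sanity check.
classifier = pipeline("sentiment-analysis")
print(classifier("The product arrived late, but the quality is great."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]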
Also, how could I train my model on my data and then later use it on it too?
Be careful to distinguish between pre-train and fine-tuning.
For pre-training you need a huge amount of text (billions of characters); it is very resource-demanding, and typically you don't want to do that unless you have a very good reason (for example, a model for your target language does not exist).
Fine-tuning requires far fewer examples (some tens of thousands), typically takes less than a day on a single GPU, and allows you to exploit a pre-trained model created by someone else.
From what you write, I would go with fine-tuning.
Of course you can save the model for later, as you can see in the tutorial I linked above:
model.save_pretrained("my_imdb_model")
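Putting the fine-tuning and saving steps together, a minimal sketch with the Hugging Face Trainer might look like this; the checkpoint name and the imdb dataset are only stand-ins for your own model and your 5,000 reviews:

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"  # example checkpoint, not a recommendation
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenise the review texts; "imdb" stands in for your own labelled reviews.
dataset = load_dataset("imdb")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True)

# Fine-tune for a few epochs; this is the part that changes the pre-trained weights.
args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["test"])
trainer.train()

# Save for later use, exactly as above.
model.save_pretrained("my_imdb_model")
tokenizer.save_pretrained("my_imdb_model")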

How do I collect a huge face dataset programmatically for facial recognition? [closed]

So, I am trying to work on this facial recognition system using FaceNet. The difficulty with the project is the data, because I need at least 100K classes of labeled face images. Eventually I want to store the encodings in a database for real-time face detection. There are datasets like 'Labeled Faces in the Wild' which offer a huge number of face images but are inconsistent in the quality of the images and the number of images per class. I also looked into what FaceNet was trained on and found that it was trained on the '1 million celebrity face dataset'. I assume I can't use it because it was used to train the FaceNet model that I am trying to use for my project. So, my question is: how do I programmatically collect a face dataset? Thank you
I recommend you look at the VGGFace2 dataset. It stores 3.3M face images of 9K+ identities.
Another good one is FaceScrub. It stores 100K face images of 530 identities.
FaceNet was trained on neither VGGFace2 nor FaceScrub. Nowadays, many studies train models on those datasets and test the models on the Labeled Faces in the Wild (LFW) dataset. LFW stores 13K face images of 5,749 identities.
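If you just want something to experiment with before committing to the larger downloads, scikit-learn can fetch an LFW subset directly (a small sketch; the parameter values are only illustrative):

from sklearn.datasets import fetch_lfw_people

# Keep only identities with at least 20 images, at half resolution, in colour.
lfw = fetch_lfw_people(min_faces_per_person=20, resize=0.5, color=True)
print(lfw.images.shape)        # (n_samples, height, width, 3)
print(len(lfw.target_names))   # number of identities kept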

What are good practices for building your own custom facial recognition? [closed]

I am working on building a custom facial recognition for our office.
I am planning to use Google FaceNet,
Now, my question is this: you can find or create your own version of the FaceNet model in Keras or PyTorch, there's no issue with that, but regarding creating the dataset, I want to know the best practices for capturing photos of a person when I don't have any prior photo of that person; all I have is a camera and the person. Should I create variance by changing the lighting conditions, orientation, or face size?
A properly trained FaceNet model should already be somewhat invariant to lighting conditions, pose, and other features that should not be part of identifying a face. At least, that is what is claimed in a draft of the FaceNet paper. If you only intend to compare feature vectors generated by the network, and intend to recognize a small group of people, your own dataset likely does not have to be particularly large.
Personally, I have done something quite similar to what you are trying to achieve for a group of around ~100 people. The dataset consisted of 1 image per person, and I used a 1-nearest-neighbour (1-NN) classifier on the generated feature vectors. While I do not remember the exact results, it did work quite well. The pretrained network's architecture was different from FaceNet's, but the overall idea was the same.
The only way to truly answer your question though would be to experiment and see how well things work out in practice.
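As an illustration of the comparison step, here is a minimal nearest-neighbour sketch over embeddings; `embed` stands in for whatever FaceNet-style model you end up using, and the distance threshold is something you would have to tune yourself:

import numpy as np

def identify(query_embedding, gallery_embeddings, gallery_names, threshold=0.8):
    # Euclidean distance from the query to every enrolled embedding.
    distances = np.linalg.norm(gallery_embeddings - query_embedding, axis=1)
    best = int(np.argmin(distances))
    if distances[best] > threshold:
        return None  # nothing close enough: treat as an unknown person
    return gallery_names[best]

# Enrolment with one image per person, as described above:
# gallery_embeddings = np.stack([embed(img) for img in enrolment_images])
# gallery_names = ["alice", "bob", ...]
# print(identify(embed(new_photo), gallery_embeddings, gallery_names))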

How to handle text and image input together in a neural network algorithm [closed]

I am working on a neural network machine learning algorithm. I wanted to know which input data formats are applicable as NN features. Is it possible to handle text and images together as input to a CNN, or with any other machine learning algorithm? And how would I make sense of the output?
Yes, it is possible to handle text and image data together.
The feature vectors created using each text data-point or image data-point can be combined together and used in parallel as a new big feature vector.
After vectorization of text data, there is not much difference between the pixel vectors and the text vectors.
Specifically in the case of a CNN, the final model can be a combined neural network that has a convolutional branch on one side and a branch for the vectorized words on the other.
(Image: diagram of such a two-branch architecture, from Christopher Bonnett's article.)
For more details, please refer to that article. It explains how e-commerce products can be classified into various category hierarchies using both image and text data.
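As a concrete (if simplified) illustration of such a two-branch network in Keras; the input sizes, vocabulary size, and number of categories here are made up, not taken from the article:

from tensorflow.keras import layers, Model

# Image branch: a small CNN over 64x64 RGB images.
image_in = layers.Input(shape=(64, 64, 3), name="image")
x = layers.Conv2D(32, 3, activation="relu")(image_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Text branch: embed token ids and average them into one vector.
text_in = layers.Input(shape=(100,), name="tokens")  # 100 token ids per sample
t = layers.Embedding(input_dim=20000, output_dim=64)(text_in)
t = layers.GlobalAveragePooling1D()(t)

# Merge the two feature vectors and classify.
merged = layers.Concatenate()([x, t])
merged = layers.Dense(128, activation="relu")(merged)
out = layers.Dense(10, activation="softmax", name="category")(merged)

model = Model(inputs=[image_in, text_in], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")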

TensorFlow: Count Objects in Image [closed]

New to machine learning, so I'm looking for some direction on how to get started. The end goal is to be able to train a model to count the number of objects in an image using TensorFlow. My initial focus will be to train the model to count one specific type of object. So let's say I take coins: I will only train the model to count coins. I'm not worried about creating a generic counter for all different types of objects. I've only done Google's example of image classification of flowers, and I understand the basics of that. So I'm looking for clues on how to get started. Is this an image classification problem where I can use the same logic as the flowers, etc.?
Probably the best-performing solution for the coin problem would be to use regression. Annotate 5k images with the number of objects in the scene and train your model on them. Then your model just outputs the correct number (hopefully).
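A bare-bones sketch of that regression idea in Keras (input size and architecture are illustrative; you would train it on your annotated images and their counts):

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(128, 128, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.GlobalAveragePooling2D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="relu"),  # non-negative count prediction
])

# Mean squared error against the annotated counts (e.g. your 5k labelled images).
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
# model.fit(train_images, train_counts, validation_split=0.1, epochs=20)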
Another way is to use a sliding-window approach like this one: https://arxiv.org/pdf/1312.6229.pdf, classifying for each window whether it shows a coin, and then counting the detected regions. This is easier to annotate and learn, and more easily extensible, but you have the problems of choosing good windows and combining the per-window results in a consistent way.
