Making a model for Nauto hand sign detection - machine-learning

I wanted to make a naruto hand sign detection model to later use it in a AR game and I tried implementing it in fastai using resnet50 after getting crowdsourcing the data( mostly me). But, I got a model predicting only one category for everything.
This is the kaggle link:
https://www.kaggle.com/vikranthkanumuru/naruto-hand-sign-detection-usin-fastai-diff-method
Not sure if this is a problem, but earlier I had around 28 images per group so I made a video of myself doing the various signs and used opencv to save frame by frame. I later removed the ones that did not confine to any group and this increased the size of the dataset from 220mb to 2GB. Was this proper or is it the reason the model is bad?
This is the link to the dataset
https://www.kaggle.com/vikranthkanumuru/naruto-hand-sign-dataset
I am not sure how to proceed further and would appreciate any help. Thank you very much.
Edit: If anyone is interested in the completed thing, here's the link: https://www.linkedin.com/feed/update/urn:li:activity:6640529067936440320/

I have fixed a bit of your code. Have a look. I did not run it for the whole stretch though but it should show you what you need.
The test set only contained those images but take a look at the validation.
Also switch to vgg16. resnet50 is not necessary here.
https://www.kaggle.com/subhaditya/naruto-hand-sign-detection-using-fastai?scriptVersionId=29471636

Related

Recognize specific images, not the objects in the images

I need to recognize specific images using the iPhone camera. My goal is to have a set of 20 images, that when a print or other display of one of them is present in front of the camera, the app recognizes that image.
I thought about using classifiers (CoreML), but I don't think it would give the intended result. For example, if I had a model that recognizes fruits, and then I showed it two different pictures of a banana, It would recognize them both as bananas, which is not what I want. I want my app to recognize specific images, regardless of its content.
The behavior I want is exactly what ARToolKit does (https://www.artoolkit.org/documentation/doku.php?id=3_Marker_Training:marker_nft_training), but I do not wish to use this library.
So my question is: Are the any other libraries, or other ways, for me to recognize specific images from the camera on iOS (preferably in Swift).
Since you are using images specific to your use case there isn't going to be an existing model that you can use. You'd have to create a model, train it, and then import it into CoreML. It's hard to provide specific advice since I know nothing about your images.
As far as libraries are concerned checkout this list and Swift-AI.
Swift-AI has a neural network that you might be able to train if you had enough images.
Most likely you will have to create the model in another language, such as Python and then import it into your Xcode project.
Take a look at this question.
This blog post goes into some detail about how to train your own model for CoreML.
Keras is probably your best bet to build your model. Take a look at this tutorial.
There are other problems too though like you only have 20 images. This is certainly not enough to train an accurate model. Also the user can present modified versions of these images. You'd have to generate realistic sample of each possible image and then use that entire set to train the model. I'd say you need a minimum of 20 images of each image (400 total).
You'll want to pre-process the image and extract features that you can compare to the known features of your images. This is how facial recognition works. Here is a guide for facial recognition that might be able to help you with feature extraction.
Simply put without a model that is based on your images you can't do much.
Answering my own question.
I ended up following this awesome tutorial that uses OpenCV to recognize specific images, and teaches how to make a wrapper so this code can be accessed by Swift.

Auto Text Recognition (OCR) from Image

I want to recognize nutrient information from package labels Sample Nutrient label . This is one package image, different brands may style/layout their labels differently. But I know some things for sure, layout would be somewhat tabular with certain key words in heading like 'Nutrient' as well as the content of the table will have certain common words, like Energy/Fat etc. I want to extract these values in text form and save it into my db.
The sample image is part of a bigger problem, finding the contour/box that might contain this section 'Nutrient Label'.
As I understand their are 3 broad steps.
Scan the input image (product front/back/side image) to look for the best contour that could be my target contour containing these nutrient information
Go to this contour and perform OCR (possibly retain the layout information and not output everything in 1 line)
scan the text and look for needed info.
I am a beginner in Image Recognition. it would be a great help,
If i could get a feedback on my approach. for instance should I look for text in Image or gather similar images and train a model and then do classification? similar to performing face recognition.
if someone has already solved this problem, it would be great to get some pointers (their is no fun reinventing the wheel).
If its a research problem, then relevant code/libraries/pointers/similar SO questions that I could refer to.
It would be highly appreciable if the answers are not general (like perform feature extraction, I would no clue what is feature extraction, instead a sample code pointer would be awesome.)
I thank you for your time and help.
thanks
Chahat
It would be required to collect at least 200-300 images for sufficient training.
2/3. I did solve the problem, but it was done using not a free solution, so I'm not supposed to give a directions here.

Haar training - where to obtain eyeglasses images?

I want to train a new haar-cascade for glasses as I'm not satisfied with the results I'm getting from the cascade that is included in OpenCV.
My main problem is that I'm not sure where to get eyeglasses images. I can manually search and download, but that's not practical for the amount of images I really need. I'm specifically looking for images of people wearing eyeglasses.
As this forum contain many experienced computer vision experts, I hope someone here can guide as to how to obtain images for training.
I'll also be happy to hear other approaches for detecting eyeglasses (on people).
Thanks in advance,
Gil
If you simply want images, it looks like #herhuyongtao pointed you to a good place. Then you can follow opencv's tutorial on training.
Another option is to see what others have trained:
There's a trained data set found here that might be of use, which states simply that it is "better". I'm assuming that it's supposed to be better than opencv.
I didn't immediately see any other places for trained or labeled data.

Sketch-based Image Retrieval with OpenCV or LIRe

I'm currently reading for BSc Creative Computing with the University of London and I'm in my last year of my studies. The only remaining module I have left in order to complete the degree is the Project.
I'm very interested in the area of content-based image retrieval and my project idea is based on that concept. In a nutshell, my idea is to help novice artists in drawing sketches in perspective with the use of 3D models as references. I intend to achieve this by rendering the side/top/front views of each 3D model in a collection, pre-process these images and index them. While drawing, the user gets a series of models (that have been pre-processed) that best match his/her sketch, which can be used as guidelines to further enhance the sketch. Since this approach relies on 3D models, it is also possible for the user to rotate the sketch in 3D space and continue drawing based on that perspective. Such approach could help comic artists or concept designers in quickly sketching their ideas.
While carrying out my research I came across LIRe and I must say I was really impressed. I've downloaded the LIRe demo v0.9 and I played around with the included sample. I've also developed a small application which automatically downloades, indexes and searches for similar images in order to better understand the inner workings of the engine. Both approaches returned very good results even with a limited set of images (~300).
Next experiment was to test the output response when a sketch rather than an actual image is provided as input. As mentioned earlier, the system should be able to provide a set of matching models based on the user's sketch. This can be achieved by matching the sketch with the rendered images (which are of course then linked to the 3D model). I've tried this approach by comparing several sketches to a small set of images and the results were quite good - see http://claytoncurmi.net/wordpress/?p=17. However when I tried with a different set of images, results weren't as good as the previous scenario. I used the Bag of Visual Words (using SURF) technique provided by LIRe to create and search through the index.
I'm also trying out some sample code that comes with OpenCV (I've never used this library and I'm still finding my way).
So, my questions are;
1..Has anyone tried implementing a sketch-based image retrieval system? If so, how did you go about it?
2..Can LIRe/OpenCV be used for sketch-based image retrieval? If so, how this can be done?
PS. I've read several papers about this subject, however I didn't find any documentation about the actual implementation of such system.
Any help and/or feedback is greatly appreciated.
Regards,
Clayton

GUI version of OpenCV for feature-detection (SIFT etc.) prototyping before actual project development?

I had an idea for which I need to be able to recognize certain objects or models from a rendered three dimensional digital movie.
After limited research, I know now that what I need is called feature detection in the field of Computer Vision.
So, what I want to do is:
create a few screenshots of a certain character in the movie (eg. front/back/leftSide/rightSide)
play the movie
while playing the movie, continuously create new screenshots of the movie
for each screenshot, perform feature detection (SIFT?, with openCV?) to see if any of our character appearances are there (they must still be recognized if the character is further away and thus appears smaller, or if the character is eg. lying down).
give a notice whenever the character is found
This would be possible with OpenCV, right?
The "issue" is that I would have to learn c++ or python to develop this application. This is not a problem if my movie and screenshots are applicable for what I want to do.
So, I would like to first test my screenshots of the movie. Is there a GUI version of OpenCV that I can input my test data and then execute it's feature detection algorithms manually as a means of prototyping?
Any feedback is appreciated. Thanks.
There is no GUI of OpenCV able to do what you want. You will be able to use OpenCV for some aspects of your problem, but there is no ready-made solution waiting there for you.
While it's definitely possible to solve your problem, the learning curve for this problem is quite long. If you're a professional, then an alternative to learning about it yourself would be to hire an expert to do it for you. It would cost money, but save you time.
EDIT
As far as template matching goes, you wouldn't normally use it to solve such a problem because the thing you're looking for is changing appearance and shape. There aren't really any "dynamic parameters to set". The closest thing you could try is have a massive template collection that would try to cover the expected forms that your target may take. But it would hardly be an elegant solution. Plus it wouldn't scale.
Next, to your point about face recognition. This is kind of related, but most facial recognition applications deal with a controlled environment: lighting, distance, pose, angle, etc. Outside of that controlled environment face detection effectiveness drops significantly. If you're detecting objects in a movie, then your environment isn't really controlled.
You may want to first try a simpler problem of accurately detecting where the characters are, without determining who they are (video surveillance, essentially). While it may sound simple, you'll find that it's actually non-trivial for arbitrary scenes. The result of solving that problem may be useful in identifying the characters.
There is Find-Object by Mathieu Labbé. It was very helpful for me to start getting an understanding of the descriptors since you can change them while your video is running to see what happens.
This is probably too late, but might help someone else looking for a solution.
Well, using OpenCV you would of taking a frame of a video file and do any computations on it.
You can do several different methods of detecting a character on that image, but it's not so easy to have it as flexible so you can even get that person if it's lying on the floor for example, if you only entered reference images of that character standing.
Basically you could try extracting all important features from your set of reference pictures and have a (in your case supervised) learning algorithm that gets a good feature-vector of that character for classification.
You then need to write your code that plays the video and which takes a video frame let's say each 500ms (or other as you desire), gets a segmentation of the object you thing would be that character and compare it with the reference values you get from your learning algorithm. If there's a match, your code can yell "Yehaaawww!" or do other things...
But all this depends on how flexible you want this to be. You could also try a template match or cross-correlation which basically shifts the reference image(s) over the frame and checks how equal both parts are. But this unfortunately is very sensitive for rotation, deformations or other noise... so you wouldn't get that person if its i.e. laying down. And I doubt you can get all those calculations done in realtime...
Basically: Yes OpenCV is good to use for your image processing/computer vision tasks. But it offers a lot of methods and ways and you'd need to find a way that works for your images... it's not a trivial task though...
Hope that helps...
Have you tried looking at some of the work of the Oxford visual geometry group?
Their Video Google system describes to a large extent what you want, instance detection.
Their work into Naming People in TV shows is also pretty relevant. A face detection and facial feature pipeline is included that can be run from Matlab. Are you familiar with Matlab?
Have you tried computer vision frameworks like Cassandra? There you can exactly do that just by some mouse clicks.

Resources