How to do image augmentation and update the XML file

It's my first time posting here, so I apologize in advance if I'm doing anything wrong.
I'm currently creating the dataset for my NN, and after doing some hand-labeling I've realized I would like a bigger dataset. However, hand-labeling thousands of images is not very time-efficient. I can't seem to find a way to do image augmentation that automatically uses and updates the XML annotation files. Is there a way to do this?
Thank you again.
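If the annotations are Pascal VOC-style XML bounding boxes, one way to keep them in sync is to transform the boxes together with the image and then write the new coordinates back into the XML. Here is a minimal sketch, assuming the albumentations and opencv-python packages and the Pascal VOC layout; augment_with_xml and the file paths are hypothetical names:

    # Minimal sketch: augment an image and keep its Pascal VOC XML in sync.
    # Assumes albumentations + opencv-python; augment_with_xml is hypothetical.
    import xml.etree.ElementTree as ET
    import cv2
    import albumentations as A

    transform = A.Compose(
        [A.HorizontalFlip(p=0.5), A.RandomBrightnessContrast(p=0.5)],
        bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
    )

    def augment_with_xml(image_path, xml_path, out_image_path, out_xml_path):
        image = cv2.imread(image_path)
        tree = ET.parse(xml_path)
        root = tree.getroot()

        # Collect (xmin, ymin, xmax, ymax) boxes and their class labels.
        bboxes, labels = [], []
        for obj in root.findall("object"):
            box = obj.find("bndbox")
            bboxes.append([float(box.find(t).text)
                           for t in ("xmin", "ymin", "xmax", "ymax")])
            labels.append(obj.find("name").text)

        out = transform(image=image, bboxes=bboxes, labels=labels)

        # Save the augmented image and write the new box coordinates back.
        # Note: the flip/contrast transforms above never drop a box, so the
        # objects and transformed boxes still line up one-to-one.
        cv2.imwrite(out_image_path, out["image"])
        for obj, bbox in zip(root.findall("object"), out["bboxes"]):
            box = obj.find("bndbox")
            for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), bbox):
                box.find(tag).text = str(int(round(value)))
        tree.write(out_xml_path)

Transforms that can drop boxes (crops, for example) would need the surviving boxes matched back to their objects before writing the XML.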

Related

In ML, using an RNN for an NLP project, is data redundancy necessary?

Is it necessary to repeat similar template data, where the meaning and context are the same but the smaller details vary? If I remove these redundancies the dataset is very small (hundreds of samples), but if such data are included it easily crosses thousands. Which is the right approach?
SAMPLE DATA
This is actually not a question suited for Stack Overflow, but I'll answer anyway:
You have to think about how the emails (or whatever your data is) will look in real-life usage: do you want to detect any kind of spam, or just spam similar to what your sample data shows? If the former, your dataset is simply not suited for this problem, since there are not enough varied data samples. When you think about it, every one of the sentences is essentially the same, because the company name isn't really valuable information and will probably not be learned as a feature by your RNN. So the information is almost the same. And since every input sample runs through the network multiple times (once per epoch), it doesn't really help to have almost the same sample multiple times.
So you shouldn't have one kind of almost-identical data sample dominating your dataset.
But as I said: if you primarily want to filter out "Dear customer, we wish you a ..." you can try it with this dataset, though you wouldn't really need an RNN to detect that. If you want to detect all kinds of spam, you should look for a new dataset, since ~100 unique samples are not enough. I hope that was helpful!
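To make the "almost-identical samples" point concrete, here is a small sketch (the helper names and sample emails are hypothetical, not from the thread) that masks the variable parts of templated text and deduplicates on what remains, so each template is counted once:

    # Hypothetical sketch: collapse near-duplicate template samples by
    # masking the variable tokens (company names, etc.) and deduplicating.
    import re

    def normalize(text, entities):
        # Replace every known variable token with a placeholder.
        for entity in entities:
            text = re.sub(re.escape(entity), "<ENT>", text, flags=re.IGNORECASE)
        return re.sub(r"\s+", " ", text).strip().lower()

    def deduplicate(samples, entities):
        seen, unique = set(), []
        for sample in samples:
            key = normalize(sample, entities)
            if key not in seen:
                seen.add(key)
                unique.append(sample)
        return unique

    emails = [
        "Dear customer, we wish you a happy new year from ACME Corp.",
        "Dear customer, we wish you a happy new year from Globex Inc.",
    ]
    print(deduplicate(emails, ["ACME Corp", "Globex Inc"]))  # keeps only one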

OpenCV training taking a long time

Hi, I would like to ask if anybody has tried the OpenCV.Eigenface.Train function on Labeled Faces in the Wild and whether it took a long time for them as well?
Currently, training has lasted almost 24 hours.
You probably need to limit how many faces you take from the dataset for training, because there are probably a lot of images in the files you described.
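A minimal sketch of that idea, assuming the opencv-contrib-python package (cv2.face lives in the contrib modules) and the usual LFW layout of one folder per person; load_subset is a hypothetical helper:

    # Cap how many LFW images per person go into Eigenfaces training.
    import os
    import cv2
    import numpy as np

    def load_subset(lfw_dir, max_per_person=5, size=(100, 100)):
        images, labels = [], []
        for label, person in enumerate(sorted(os.listdir(lfw_dir))):
            person_dir = os.path.join(lfw_dir, person)
            for fname in sorted(os.listdir(person_dir))[:max_per_person]:
                img = cv2.imread(os.path.join(person_dir, fname),
                                 cv2.IMREAD_GRAYSCALE)
                images.append(cv2.resize(img, size))
                labels.append(label)
        return images, np.array(labels)

    images, labels = load_subset("lfw", max_per_person=5)
    recognizer = cv2.face.EigenFaceRecognizer_create()
    recognizer.train(images, labels)  # far fewer samples, much faster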

Large dataset processing for TensorFlow Federated

What is an efficient way to prepare ImageNet (or other big datasets) for TensorFlow Federated simulations, particularly when applying a custom map function to a tf.data.Dataset object? I looked into the tutorials and docs but did not find anything helpful for this use case. This tutorial (https://www.tensorflow.org/federated/tutorials/custom_federated_algorithms_2) shows MNIST processing, but that dataset is relatively small.
Could you please clarify what exactly you mean by "efficient" in this context? I presume you've tried something and it wasn't working as expected. Could you describe how you went about setting it up and what problems you ran into? Thanks!
One thing to note is that the runtime included in the first release will only work with datasets that fit in memory. Perhaps this is the limitation you are running into.
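For the custom-map part of the question, here is a sketch of a lazily evaluated tf.data pipeline that could back one simulated client; the TFRecord feature spec and file pattern are assumptions, not from the thread:

    # One tf.data.Dataset per simulated client; the map runs lazily, so the
    # full dataset never has to fit in memory at once.
    import tensorflow as tf

    # Hypothetical feature spec; adjust to however your records were written.
    FEATURES = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }

    def parse_and_preprocess(record):
        example = tf.io.parse_single_example(record, FEATURES)
        image = tf.io.decode_jpeg(example["image"], channels=3)
        image = tf.image.resize(image, (224, 224)) / 255.0
        return image, example["label"]

    def make_client_dataset(file_pattern):
        files = tf.data.Dataset.list_files(file_pattern)
        ds = tf.data.TFRecordDataset(files)
        return (ds.map(parse_and_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
                  .batch(32)
                  .prefetch(tf.data.AUTOTUNE))

    # A TFF simulation would then get one such dataset per client, e.g.:
    # federated_data = [make_client_dataset(f"imagenet/client_{i}-*.tfrecord")
    #                   for i in range(num_clients)]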

Haar training - where to obtain eyeglasses images?

I want to train a new Haar cascade for glasses, as I'm not satisfied with the results I'm getting from the cascade that is included in OpenCV.
My main problem is that I'm not sure where to get eyeglasses images. I can manually search and download, but that's not practical for the number of images I really need. I'm specifically looking for images of people wearing eyeglasses.
As this forum contains many experienced computer vision experts, I hope someone here can guide me on how to obtain images for training.
I'll also be happy to hear about other approaches for detecting eyeglasses (on people).
Thanks in advance,
Gil
If you simply want images, it looks like @herhuyongtao pointed you to a good place. Then you can follow OpenCV's tutorial on training.
Another option is to see what others have trained:
There's a trained data set found here that might be of use, which states simply that it is "better". I'm assuming that means better than OpenCV's.
I didn't immediately see any other places for trained or labeled data.
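Whichever cascade you end up with (the one linked above or your own), trying it out is short in Python. A sketch, assuming opencv-python and using the haarcascade_eye_tree_eyeglasses.xml file that ships with OpenCV as a stand-in for the downloaded cascade; person.jpg is a hypothetical test image:

    # Run a pre-trained cascade before investing in training a new one.
    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_eye_tree_eyeglasses.xml")

    image = cv2.imread("person.jpg")  # hypothetical test image
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    detections = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    # Draw a box around each detection and save the result.
    for (x, y, w, h) in detections:
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite("detected.jpg", image)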

CRF++ or CRFSuite

I'm starting to work with CRF++ and CRFsuite (both use a very similar file format). I want to do things related to images (segmentation, activity recognition, etc.). My main problem is how to build the training file. Has anybody worked with CRFs and images? Can anybody explain it to me or share a sample file to learn from?
Thanks in advance.
CRFsuite is faster than CRF++ and it can deal with huge training data. I tried both of them. They work perfectly on a reasonable amount of data, but when my dataset grew to more than 100,000 sentences, CRF++ could not handle it and suddenly stopped working.
Look at the following link:
CRFsuite - CRF Benchmark test
There is a comparison of several CRF packages on a number of criteria.
I used CRF++ before and it worked very well.
But my field is natural language processing, and I use CRF++ for named entity recognition and POS tagging. CRF++ is easy to install on Linux but has some minor issues when compiling on Windows.
You can just follow its documentation for the training data format: each row represents a data sample and each column represents a feature type, as in the example below.
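A short illustration of that row/column layout, along the lines of the CoNLL-style chunking example in the CRF++ documentation (columns here are token, POS-tag feature, and chunk label; columns are whitespace-separated and a blank line ends a sequence):

    He        PRP  B-NP
    reckons   VBZ  B-VP
    the       DT   B-NP
    current   JJ   I-NP
    account   NN   I-NP

For an image task you would presumably make each row a site such as a pixel or superpixel, with its features as the middle columns and its class as the last, though neither tool is primarily designed around images.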
Or, you can also consider Mallet which has a CRF component.
Probably you should start with the DGM library (https://github.com/Project-10/DGM), which is the best choice for those who have never worked with CRFs before. It includes a number of ready-to-go demo projects that will classify/segment your images out of the box. It is also well documented.
I have just come across this one for Windows:
http://crfsharp.codeplex.com/
Maybe you also want to try the CRF component in the Mallet package.
