Training a Text Detection System - machine-learning

I'm currently developping a text detection system in a given image using logistic regression, and I need training data like the image below:
The first column show a positive example (y=1) of text wheras the second column show images without text (y=0).
I'm wondering where I can get a labeled dataset of this kind??
Thanks in advance.

A good place to start for these sorts of things is the UC Irvine Machine Learning Repository:
http://archive.ics.uci.edu/ml/
But maybe also consider heading over to Cross-Validated as well, for machine learning-related questions:
https://stats.stackexchange.com/

You can get a similar dataset here.
Hope it helps.

Related

new features in dataset

I'm now in the middle of the semester and trying to understand the background of the algorithms and features.
I would like to understand some theory.
If I have a dataset with N samples.
each sample has 5 features for example.
I have done 3 kinds of classifications algorithms for example : SVM, decision tree and kMeans.
In all 3, I got nice results
In a mystery way, a new feature added to the dataset. The value of the features for every sample selected randomly.
I restarted the algorithms on the dataset ( with the new feature)
Are the classification results gonna change from the first results without the new feature? If yes, why are they gonna change and by how much ?
In addition, if I do not have the dataset how can I know how to recognize that new feature?
The results of your classification algorithm are going to either change or stay the same depending on how much information the model gains from the feature. If the feature for instance is random noise then it will have little to no effect on your model, other than slowing it down. If it contains useful information it might be able to increase parameters such as recall and precision. Hope this might help.

How to get the outputs of hidden layers of pre-trained StyleGAN?

I currently study the styleGAN, and want to adjust some new loss functions, but the main problem I met is how to get the outputs of the hidden layers given a pre-trained styleGAN model. I want to get some feature maps in the middle layers of Discriminator and Generator as well. I've looked through the code released by Nvilab but didn't find a clue. Any suggestions? Thanks in advance.

Text Image Combined Classification Model

I am working on a problem statement where I have to match (text, image) pair. Given a furniture description and furniture image, I have to say they are same or not. This is a binary classification problem but I have to combine both text and image data.
One possible solution I am trying as follows
In the above diagram, I am combining the feature from the pre-trained text and image model and training the linear layer end to end.
Is there any other way to handle this type of problem. any leads are most welcome. thanks a lot in advance for your help.
I got a few recent works for the problem of "Image-Text Matching" which is slightly different from my problem statement but I can adjust the code for my project.
Transformer Reasoning Network for Image-Text Matching and Retrieval enter link description here
Stacked Cross Attention for Image-Text Matching enter link description here

Which deep learning model to use for capturing minor features in a image?

I have class which has slightly different features from the other class:
ex - This image has buckle in it (consider it as a class) https://6c819239693cc4960b69-cc9b957bf963b53239339d3141093094.ssl.cf3.rackcdn.com/1000006329245-822018-Black-Black-1000006329245-822018_01-345.jpg
But This image is quite similar to it but has no buckle :
https://sc01.alicdn.com/kf/HTB1ASpYSVXXXXbdXpXXq6xXFXXXR/latest-modern-classic-chappal-slippers-for-men.jpg
I am little confused about which model to use in these kind of cases which actually learns pixel to pixel values.
Any thoughts will be appreciable.
thanks !!
I have already tried Inception,Resnet etc models.
With a less volume train data (300-400 around each class) can we reach a good recall/precision/F1 score.
You might want to look into transfer learning due to the small dataset, what you can do is use a transferred ResNet model to work as a feature extractor and try a YOLO(You only look once) algorithm on it, look through each window(Look Sliding window implementation using ConvNets) to obtain a belt buckle and based on that you can classify the image.
Based on my understanding of your dataset, to do the above approach though you will need to re-annotate your dataset as per the requirements of YOLO algorithm.
To look at an example of the above approach, visit https://mc.ai/implementing-yolo-using-resnet-as-feature-extractor/
Edit If you have XML annotated Dataset and need to convert it to csv to follow the above example use https://github.com/datitran/raccoon_dataset
Happy modelling.

SVM Classification - minimum number of input sets for each class

I'm trying to build an app to detect images which are advertisements from the webpages. Once I detect those I`ll not be allowing those to be displayed on the client side.
From the help that I got on this Stackoverflow question, I thought SVM is the best approach to my aim.
So, I have coded SVM and an SMO myself. The dataset which I have got from UCI data repository has 3280 instances ( Link to Dataset ) where around 400 of them are from class representing Advertisement images and rest of them representing non-advertisement images.
Right now I'm taking the first 2800 input sets and training the SVM. But after looking at the accuracy rate I realised that most of those 2800 input sets are from non-advertisement image class. So I`m getting very good accuracy for that class.
So what can I do here? About how many input set shall I give to SVM to train and how many of them for each class?
Thanks. Cheers. ( Basically made a new question because the context was different from my previous question. Optimization of Neural Network input data )
Thanks for the reply.
I want to check whether I`m deriving the C values for ad and non-ad class correctly or not.
Please give me feedback on this.
Or you u can see the doc version here.
You can see graph of y1 eqaul to y2 here
and y1 not equal to y2 here
There are two ways of going about this. One would be to balance the training data so it includes an equal number of advertisement and non-advertisement images. This could be done by either oversampling the 400 advertisement images or undersampling the thousands of non-advertisement images. Since training time can increase dramatically with the number of data points used, you should probably first try undersampling the non-advertisement images and create a training set with the 400 ad images and 400 randomly selected non-advertisements.
The other solution would be to use a weighted SVM so that margin errors for the ad images are weighted more heavily than those for non-ads, for the package libSVM this is done with the -wi flag. From your description of the data, you could try weighing the ad images about 7 times more heavily than the non-ads.
The required size of your training set depends on the sparseness of the feature space. As far as I can see, you are not discussing what image features you have chose to use. Before you can train, you need to to convert each image into a vector of numbers (features) that describe the image, hopefully capturing the aspects that you care about.
Oh, and unless you are reimplementing SVM for sport, I'd recomment just using libsvm,

Resources