Which classifier would be better for scanned images? - image-processing

I am currently working with 500 scanned images. I have done feature extraction on them and I'm using the obtained offset values to classify the images. I tried a k-nearest neighbor classifier and want to know whether I am proceeding in the right way. My main objective is to classify the images.
Any help would be appreciated. Thank you.

There are many ways to classify images. Most of the time I do it in 3 steps:
Preprocessing (denoising, deskewing, etc.)
Extracting features (colors, shapes, textures, SIFT, ...)
Doing classification (kNN, SVM, Neural Network, ...)
Depending on the content of your images (natural images, biological images, document images, ...), you have to choose the features, and then, depending on the features, you have to choose your classifier.
Be aware that supervised classification needs labeled examples in order to learn.
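The three steps above can be sketched in Python. This is a minimal sketch under assumptions: images are already loaded as 2-D grayscale NumPy arrays with integer labels, a 3x3 mean filter stands in for real denoising, and an intensity histogram plus mean gradient magnitude stand in for real features such as SIFT or textures. The helpers `preprocess`, `extract_features` and `build_matrix` and the toy data are hypothetical; only `KNeighborsClassifier`, `StandardScaler` and `make_pipeline` come from scikit-learn. Features are standardized first because kNN uses Euclidean distance by default, so a feature with a large numeric range would otherwise dominate.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def preprocess(img):
    """Step 1 (stand-in for denoising): 3x3 mean filter with edge padding."""
    padded = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    # Average the 9 shifted copies of the image.
    return sum(padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0


def extract_features(img, bins=16):
    """Step 2 (stand-in for real features): intensity histogram + mean gradient magnitude."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 255.0), density=True)
    gy, gx = np.gradient(img)
    return np.append(hist, np.hypot(gx, gy).mean())


def build_matrix(images):
    """Stack one feature vector per image into an (n_images, n_features) matrix."""
    return np.array([extract_features(preprocess(img)) for img in images])


# Hypothetical toy data: even indices = dark noisy pages, odd indices = bright noisy pages.
rng = np.random.default_rng(0)
images = [np.clip(rng.normal(80 + 100 * (i % 2), 20, size=(32, 32)), 0, 255) for i in range(60)]
labels = np.array([i % 2 for i in range(60)])

X = build_matrix(images)
# Step 3: kNN classification on standardized features.
clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
clf.fit(X[:40], labels[:40])
accuracy = clf.score(X[40:], labels[40:])
print(f"held-out accuracy: {accuracy:.2f}")
```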

kNN is one possible solution.
What's the goal of your classification project? For example: do you need very high accuracy or recall? Do you need a supervised or an unsupervised solution?
Once you have set your goal, you can do the related feature engineering and choose the right algorithm.
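Once the goal is fixed, one way to check whether kNN is a reasonable choice is to cross-validate it on the extracted feature vectors and read off the metric that matches the goal. A sketch with scikit-learn, assuming the features sit in a matrix `X` with labels `y`; `make_classification` only generates a synthetic stand-in for 500 feature vectors, not real image data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 500 images x 20 extracted features, 3 classes.
X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for k in (1, 3, 5, 9, 15):
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_validate(model, X, y, cv=cv,
                            scoring={"accuracy": "accuracy", "recall": "recall_macro"})
    # Mean over the 5 folds of each metric.
    results[k] = (scores["test_accuracy"].mean(), scores["test_recall"].mean())
    print(f"k={k:2d}  accuracy={results[k][0]:.3f}  macro recall={results[k][1]:.3f}")

# Select k by the metric that matches the goal (here: macro-averaged recall).
best_k = max(results, key=lambda k: results[k][1])
print("best k by macro recall:", best_k)
```

Picking `k` by macro recall rather than accuracy is just an example; if accuracy is the goal, select on the accuracy column instead.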

Related

Is there any alternative to convolutional neural networks to classify Images?

Deep learning is famous for classifying images into different categories. However, I am interested in using some other machine learning model that can classify images. There are about 2000 images in PNG format. Does anybody know a machine learning model, other than deep learning models, that can be applied in Python to classify images?
You can take a look at SVMs (scikit-learn). I would advise extracting features from the images first, with SIFT or SURF for example.
EDIT: SIFT and SURF use the principle of convolution, but there are plenty of other feature descriptors.
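A minimal SVM sketch with scikit-learn. SIFT or SURF descriptors would need OpenCV plus an extra step to turn a variable number of keypoint descriptors into one fixed-length vector per image, so to keep the sketch self-contained it uses the raw pixel intensities of scikit-learn's bundled 8x8 digits dataset as features. The values of `C` and `gamma` are assumptions, not tuned settings.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

digits = load_digits()  # 1797 grayscale 8x8 images, 10 classes
# Flatten each image into a 64-dimensional feature vector.
X = digits.images.reshape(len(digits.images), -1)
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# RBF-kernel SVM; C and gamma would normally be tuned with cross-validation.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale"))
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

For the ~2000 PNG files, each image would first need to be loaded and resized to a common size, for example with Pillow's `Image.open(path).convert("L").resize((64, 64))`, converted to a NumPy array, and turned into a feature vector of the same length before being stacked into `X`.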

Inception V3 Image Classification

How can I understand which features the Google Inception V3 model uses to classify a set of images, i.e. which features or pixels of the images are most significant for classifying them?
For instance, if the classifier were to distinguish between a Cheetah and a Leopard, it would probably do so by judging based on their spots. How can I determine what aspects of my images the classifier values most?
Your question is not easily answerable. Neural nets in general are composed of hierarchical features: in the initial layers the network may learn to detect edges and blobs, and in the deeper layers it learns more abstract features. So in an n-class classification problem, where n might be large, it is notoriously difficult to interpret what exactly the network learns and uses to classify images. Having said that, work has obviously been done on this; I will refer you to https://distill.pub/2017/feature-visualization/, which should help you a bit.
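One model-agnostic way to see which image regions a trained classifier relies on is occlusion sensitivity: slide a patch of constant value over the image and record how much the predicted probability of the target class drops. A minimal NumPy sketch, assuming `predict_proba` is any function mapping a batch of shape (N, H, W, C) to class probabilities of shape (N, n_classes); for Inception V3 this would be a wrapper around the model's prediction call including its input preprocessing, which is not shown. `toy_predict_proba` is a hypothetical stand-in that only looks at the top-left corner, so the map should light up there.

```python
import numpy as np


def occlusion_map(predict_proba, image, target_class, patch=8, stride=8, fill=0.0):
    """Return a (rows, cols) map of probability drops when each patch is occluded."""
    h, w = image.shape[:2]
    base = predict_proba(image[np.newaxis])[0, target_class]
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    occluded = []
    for r in range(rows):
        for c in range(cols):
            img = image.copy()
            img[r * stride:r * stride + patch, c * stride:c * stride + patch] = fill
            occluded.append(img)
    probs = predict_proba(np.stack(occluded))[:, target_class]
    # Large positive values mark regions the prediction depends on most.
    return (base - probs).reshape(rows, cols)


def toy_predict_proba(batch):
    """Hypothetical 2-class 'model': class 1 score grows with top-left 8x8 brightness."""
    corner = batch[:, :8, :8].mean(axis=(1, 2, 3))
    p1 = 1.0 / (1.0 + np.exp(-(corner - 0.5) * 10))
    return np.stack([1.0 - p1, p1], axis=1)


image = np.ones((32, 32, 3))
heatmap = occlusion_map(toy_predict_proba, image, target_class=1)
print(np.round(heatmap, 2))
```

For a 299x299 Inception V3 input with `patch=8, stride=8`, this builds about 1,370 occluded copies in a single batch, so in practice the copies would be predicted in smaller chunks, or a larger patch and stride would be used for a coarser but cheaper map.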

Image Classification with Support Vector Machine

I have used Support Vector Machines for classification with the scikit-learn library several times before, but only with text and numeric data in ".csv" format. Now I want to use a Support Vector Machine for image classification. How can I convert images into something like a ".csv" format so that I can classify them?
I would appreciate any help. Thank you.
Sure. In general, one would define a so-called feature vector: a vector containing numeric representations of certain, usually hand-crafted, features. For image classification this depends heavily on what you want to classify. Usually, the features in image classification systems are extracted by image processing algorithms such as HOG and SIFT.
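Converting images into a CSV-like table can be sketched as: compute one fixed-length feature vector per image, then write one row per image with the label followed by the features. This sketch uses a simplified HOG-like descriptor written in NumPy (per-cell histograms of gradient orientation weighted by gradient magnitude, without HOG's block normalization); it is an illustration, not OpenCV's or scikit-image's HOG implementation. The function names and the striped test images are hypothetical.

```python
import csv
import io

import numpy as np


def orientation_histograms(img, grid=4, bins=9):
    """Simplified HOG-like descriptor: per-cell histograms of gradient orientation."""
    gy, gx = np.gradient(img.astype(float))
    magnitude = np.hypot(gx, gy)
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    h, w = img.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            rs = slice(i * h // grid, (i + 1) * h // grid)
            cs = slice(j * w // grid, (j + 1) * w // grid)
            hist, _ = np.histogram(orientation[rs, cs], bins=bins, range=(0.0, 180.0),
                                   weights=magnitude[rs, cs])
            feats.append(hist)
    vec = np.concatenate(feats)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec  # L2-normalize the whole vector


def write_feature_csv(images, labels, fileobj):
    """Write one row per image: the label followed by its feature vector."""
    writer = csv.writer(fileobj)
    for img, label in zip(images, labels):
        writer.writerow([label, *orientation_histograms(img)])


# Hypothetical example images: vertical vs horizontal stripes.
x = np.arange(32)
vertical = np.tile(np.sin(x / 2.0), (32, 1))
horizontal = vertical.T
buffer = io.StringIO()  # stands in for open("features.csv", "w", newline="")
write_feature_csv([vertical, horizontal], ["vertical", "horizontal"], buffer)
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(len(rows), "rows,", len(rows[0]) - 1, "features each")  # 2 rows, 144 features each
```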
But honestly, I wouldn't use SVMs for image classification tasks, because it's usually a lot of work to define and combine features to get a good classifier. Try Convolutional Neural Networks instead. They learn the necessary features by themselves. If you spend months on feature engineering for a good SVM classifier, a CNN could easily outperform your work after the first training.
There are two ways to implement SVM for image classification.
Extract hand-crafted features such as SIFT, HOG or similar for each image and store them in a .csv file. Then apply an SVM to them.
Use deep learning: extract the features from the layer before the softmax classifier, store them in a .csv file and apply an SVM to them.
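Both routes end the same way: a table with one row per image that an SVM can be trained on. A sketch of that last step with NumPy and scikit-learn, assuming a hypothetical `features.csv` with a numeric label in the first column and the features after it; to keep it self-contained, the file is simulated in memory with random, synthetic features.

```python
import io

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical features.csv content: numeric label in column 0, features after it.
# In practice the rows would come from SIFT/HOG descriptors or from the
# pre-softmax layer of a CNN.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
features = rng.normal(size=(100, 32)) + labels[:, None] * 1.5
csv_file = io.StringIO()
np.savetxt(csv_file, np.column_stack([labels, features]), delimiter=",")
csv_file.seek(0)  # stands in for open("features.csv")

data = np.loadtxt(csv_file, delimiter=",")
y = data[:, 0].astype(int)
X = data[:, 1:]

svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
scores = cross_val_score(svm, X, y, cv=5)
print(f"5-fold accuracy: {scores.mean():.3f}")
```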

OpenCV: Training a soft cascade classifier

I've built an algorithm for pedestrian detection using OpenCV tools. To perform classification I use a boosted classifier trained with the CvBoost class.
The problem with this implementation is that I need to feed my classifier the whole set of features I used for training. This makes the algorithm extremely slow: each image takes around 20 seconds to be fully analysed.
I need a different detection structure, and OpenCV has a Soft Cascade class that seems to be exactly what I need. Its basic principle is that there is no need to examine all the features of a test sample, since a detector can reject most negative samples using a small number of features. The problem is that I have no idea how to train one given a fully labeled set of negative and positive examples.
I can't find any information about this online, so I am looking for any tips on how to use this soft cascade for classification.
Best regards
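No OpenCV API is shown here. As an illustration of the soft-cascade principle described in the question, this NumPy sketch trains a boosted classifier (discrete AdaBoost with decision stumps), then sets a per-stage rejection threshold (a "rejection trace") equal to the lowest cumulative score reached by any training positive at that stage. At test time the weak learners are evaluated one at a time, and a sample is rejected as soon as its running score falls below the trace, so clear negatives can be rejected after only a few features. This calibration rule is one simple choice among several, and the data and function names are hypothetical.

```python
import numpy as np


def stump_predict(X, j, thr, sign):
    """Weak learner: +1 where sign * (x_j - thr) >= 0, else -1."""
    return np.where(sign * (X[:, j] - thr) >= 0, 1.0, -1.0)


def train_adaboost_stumps(X, y, n_rounds=10):
    """Discrete AdaBoost with decision stumps; y must be in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))
    stumps = []
    for _ in range(n_rounds):
        best = None
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for sign in (1.0, -1.0):
                    err = w[stump_predict(X, j, thr, sign) != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero / log(0)
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * stump_predict(X, j, thr, sign))
        w /= w.sum()
        stumps.append((alpha, j, thr, sign))
    return stumps


def rejection_trace(stumps, X_pos, slack=1e-9):
    """Per-stage threshold = lowest cumulative score of any training positive."""
    contrib = np.column_stack([a * stump_predict(X_pos, j, t, s) for a, j, t, s in stumps])
    return np.cumsum(contrib, axis=1).min(axis=0) - slack


def soft_cascade_classify(stumps, trace, x):
    """Evaluate weak learners one by one; stop once the score drops below the trace."""
    score = 0.0
    for t, (a, j, thr, s) in enumerate(stumps):
        score += a * (1.0 if s * (x[j] - thr) >= 0 else -1.0)
        if score < trace[t]:
            return -1, t + 1  # rejected after t + 1 weak learners
    return (1 if score >= 0 else -1), len(stumps)


# Hypothetical toy data: positives around +2, negatives around -2, 5 features.
rng = np.random.default_rng(0)
X_pos = rng.normal(2.0, 1.0, size=(100, 5))
X_neg = rng.normal(-2.0, 1.0, size=(100, 5))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(100), -np.ones(100)])

stumps = train_adaboost_stumps(X, y, n_rounds=10)
trace = rejection_trace(stumps, X_pos)
used_neg = [soft_cascade_classify(stumps, trace, x)[1] for x in X_neg]
print("mean weak learners evaluated per negative:", np.mean(used_neg))
```

Since each stump looks at a single feature, the number of weak learners evaluated also bounds how many features had to be computed for that sample. Raising the thresholds above the positive minimum rejects negatives earlier, at the cost of losing some positives.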

Adaboost feature selection

I am trying to train an AdaBoost classifier using the OpenCV library, for visual pedestrian detection.
I've come across the notion that AdaBoost allows the selection of the most relevant features, meaning that if I harvest 50,000 features from images and use them to train a classifier, at the end of the training process I would be able to select, for example, the best 2000 out of those 50,000.
This would then allow me to extract only those 2000 at detection time, for the sake of speed.
Is this true, or is it a misconception?
If so, can this be done using the OpenCV library?
Best regards
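Yes, this is true. That's exactly what boosting is all about: when the weak learners are decision stumps, each round picks the single feature that best separates the reweighted training samples, so the final classifier depends only on the features its weak learners selected.
Please check the OpenCV documentation on training a cascade of boosted classifiers.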
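As an illustration of the same idea outside OpenCV, here is a sketch with scikit-learn's `AdaBoostClassifier`, whose default weak learner is a depth-1 decision tree, so each boosting round uses at most one feature. The data is synthetic and hypothetical: 500 candidate features of which only the first 5 are informative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Hypothetical data: 500 candidate features, only the first 5 informative
# (shuffle=False keeps the informative columns first).
X, y = make_classification(n_samples=1000, n_features=500, n_informative=5,
                           n_redundant=0, n_repeated=0, n_clusters_per_class=1,
                           shuffle=False, random_state=0)

# Default weak learner = decision stump, so each round uses at most one feature.
booster = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
selected = np.flatnonzero(booster.feature_importances_)
print(f"{len(selected)} of {X.shape[1]} features used:", selected)

# At detection time only these columns need to be computed: zeroing every
# other feature leaves the predictions unchanged.
X_masked = np.zeros_like(X)
X_masked[:, selected] = X[:, selected]
same = np.array_equal(booster.predict(X_masked), booster.predict(X))
print("predictions unchanged with only the selected features:", same)
```

The number of distinct features used is at most the number of boosting rounds (a feature can be picked more than once), so keeping "the best 2000" means training with at least 2000 stumps and then computing only the features they chose.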
