I am new to Weka, and from the examples on how to use it, I have only seen text problems. Can I use images in Weka with the machine learning classifiers?
You can do pixel classification directly using the Trainable Weka Segmentation plugin (formerly the Advanced Weka Segmentation plugin) for Fiji/ImageJ.
The plugin is designed for segmentation via interactive learning. This means the user is expected to select a set of features (edge detectors, texture filters, etc.), choose the number of classes (2 by default) and interactively draw samples of every class with the ROI tools. After the classifier is trained on those samples, all pixels of the image are classified and the segmentation result is displayed as an overlay on the original image. The idea is to repeat this process (drawing + training) until the segmentation is satisfactory.
The plugin also provides tools to save/load the samples in ARFF format and to save/load the classifier in .model format, so it's completely compatible with the latest version of WEKA.
If what you want to do is image classification, you might be able to reuse some of the plugin's methods as well.
You can use an open source image processing application such as ImageJ or Fiji to extract features from your images and then use those features in Weka.
Fiji has a plugin called Advanced Weka Segmentation, which should be very useful for applying Weka classifiers to images.
Weka's machine learning classifiers work with numerical and categorical features. Before using Weka with images, you need to extract features from your images.
Depending on your needs, simple features like the mean or maximum pixel value may be enough. Otherwise you may need more elaborate feature extraction algorithms.
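As a rough sketch of the simplest case, the snippet below turns each image into a few statistics and writes them in ARFF format, which Weka can load. The choice of statistics, the function names and the two synthetic images are only assumptions for illustration; real images would be loaded with an imaging library of your choice.

```python
import numpy as np

def image_stats(image):
    """Return simple per-image statistics as a feature vector."""
    pixels = np.asarray(image, dtype=float)
    return [pixels.mean(), pixels.max(), pixels.min(), pixels.std()]

def to_arff(images, labels, relation="image_stats"):
    """Build an ARFF document with one row of statistics per image."""
    classes = sorted(set(labels))
    lines = [f"@RELATION {relation}", ""]
    for name in ("mean", "max", "min", "std"):
        lines.append(f"@ATTRIBUTE {name} NUMERIC")
    lines.append("@ATTRIBUTE class {" + ",".join(classes) + "}")
    lines += ["", "@DATA"]
    for image, label in zip(images, labels):
        values = ",".join(f"{v:.4f}" for v in image_stats(image))
        lines.append(f"{values},{label}")
    return "\n".join(lines) + "\n"

# Hypothetical data: two tiny synthetic grayscale images.
dark = np.zeros((4, 4))
bright = np.full((4, 4), 200.0)
arff_text = to_arff([dark, bright], ["dark", "bright"])
print(arff_text)
```

Saving `arff_text` to a file with the .arff extension should give something that opens in the Weka Explorer like any other tabular data set.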
Below are some of the feature extraction algorithms listed on Wikipedia:
Low-level
Edge detection
Corner detection
Blob detection
Ridge detection
Scale-invariant feature transform
I suggest reading an optical character recognition (OCR) survey to understand how these features are used. OCR is a fairly simple example to start with, and standard data sets and algorithms exist for it, so it is very instructive to learn about.
I want to classify image documents (like passports, driving licences, etc.) using machine learning.
Does anybody have a link or document where I can get an idea of how to do this task?
What I am thinking of is first converting the document to text format and then extracting the information from the text file. But this I can only do with one file at a time.
I want to know how I can do this for millions of documents.
You don't need to convert the documents to text; you can do this with the images directly.
To do image classification, you can build a basic CNN with the Keras library.
https://towardsdatascience.com/building-a-convolutional-neural-network-cnn-in-keras-329fbbadc5f5
This basic CNN will be enough for you to train an image classifier. But if you want state-of-the-art accuracy, I recommend taking a pretrained ResNet50 and training it as your image classifier. Besides accuracy, there is another major advantage of using a pretrained network: you'll need less data to train a robust image classifier.
https://engmrk.com/kerasapplication-pre-trained-model/?utm_campaign=News&utm_medium=Community&utm_source=DataCamp.com
The only thing you'll need to change is the number of output classes, from 1000 to the number of classes you want.
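A minimal sketch of that change with `tf.keras` might look like the following. The number of classes (4), the input size, the frozen backbone and the optimizer are assumptions made for illustration, not something the linked articles prescribe.

```python
import tensorflow as tf

def build_classifier(num_classes, weights="imagenet"):
    """ResNet50 backbone with a new softmax head for `num_classes` classes."""
    backbone = tf.keras.applications.ResNet50(
        weights=weights,          # "imagenet" downloads pretrained weights
        include_top=False,        # drop the original 1000-class layer
        input_shape=(224, 224, 3),
        pooling="avg",            # global average pooling -> one vector per image
    )
    backbone.trainable = False    # assumption: train only the new head at first
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(backbone.output)
    model = tf.keras.Model(inputs=backbone.input, outputs=outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# weights=None skips the download here; keep the default "imagenet" for real training.
model = build_classifier(num_classes=4, weights=None)
print(model.output_shape)
```

Input images would need to be resized to 224x224 and passed through `tf.keras.applications.resnet50.preprocess_input` before calling `model.fit`.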
I have worked with Support Vector Machines for classification with the scikit-learn library several times before, but I have only dealt with data containing text and numbers in ".csv" format. Now I want to use a Support Vector Machine for image classification. Can you help me convert images to something like the ".csv" format so that I can classify them?
I would appreciate any help. Thank you.
Sure, in general, one would define a so-called Feature Vector. It's a vector which contains numeric representations of certain, usually hand-crafted features. In the case of image classification this heavily depends on what you want to classify. Usually, the features in image classification systems are extracted by image processing algorithms such as HOG and SIFT.
But honestly, I wouldn't use SVMs for an image classification task, because it's usually a lot of work to define and combine features to get a good classifier. Try Convolutional Neural Networks instead. They learn the necessary features by themselves. Even if you spend months on feature engineering for a good SVM classifier, a CNN could easily outperform your work after the first training run.
There are two ways to use an SVM for image classification:
Extract hand-crafted features such as SIFT, HOG or similar for each image and store them in a CSV file. Then apply the SVM to them.
Use deep learning: extract the features from the layer just before the softmax classifier, store those features in a .csv file and apply the SVM to them.
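As a small sketch of the first route, the snippet below writes one row per image to CSV and then trains an SVM on it with scikit-learn. To keep it self-contained it uses scikit-learn's bundled digits images and plain flattened pixels as the feature vector; those are stand-ins chosen for illustration, and HOG or SIFT descriptors would go in the same columns.

```python
import io
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()                       # small grayscale images, 8x8 each
# Simplest possible feature vector: the flattened pixel intensities.
features = digits.images.reshape(len(digits.images), -1)
table = np.column_stack([features, digits.target])

# Write one row per image (64 feature columns + 1 label column) as CSV.
buffer = io.StringIO()                       # stands in for a file on disk
np.savetxt(buffer, table, delimiter=",", fmt="%g")

# Read the CSV back and train an SVM exactly as with any tabular data.
buffer.seek(0)
data = np.loadtxt(buffer, delimiter=",")
X, y = data[:, :-1], data[:, -1].astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
classifier = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
accuracy = classifier.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

The important point is only the layout: every image becomes one fixed-length row of numbers followed by its label.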
Is it possible to feed image features, say SIFT features, to a convolutional neural network model in TensorFlow? I am trying a TensorFlow implementation of this project, in which a grayscale image is coloured. Will image features be a better choice than feeding the images as they are to the model?
PS. I am a novice in machine learning and am not familiar with creating neural network models.
You can feed a TensorFlow neural net almost anything.
If you have extra features for each pixel, then instead of using one channel (intensity) you would use multiple channels.
If you have extra features that describe the whole image, you can make a separate input and merge the features at some upper layer.
As for which performs better, you should try both approaches.
The general intuition is that extra features help if you don't have many samples, and that their effect diminishes when you have many samples and the network can learn the features by itself.
One more point: if you are a novice, I strongly recommend using a higher-level framework like keras.io (which is a layer over TensorFlow) instead of plain TensorFlow.
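To make the two options concrete, here is a minimal Keras sketch that uses both at once: a multi-channel image input for per-pixel features, and a second input for whole-image features merged near the top. All shapes, layer sizes and the two-class softmax output are assumptions for illustration; a colourisation model would end in a different output layer.

```python
import numpy as np
import tensorflow as tf

layers = tf.keras.layers

# Per-pixel features: stacked as extra channels next to the intensity channel.
image_in = layers.Input(shape=(32, 32, 3), name="pixels")   # intensity + 2 feature maps
x = layers.Conv2D(16, 3, activation="relu", padding="same")(image_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
x = layers.GlobalAveragePooling2D()(x)

# Whole-image features: a separate input, merged at an upper layer.
global_in = layers.Input(shape=(8,), name="global_features")
merged = layers.Concatenate()([x, global_in])
merged = layers.Dense(32, activation="relu")(merged)
output = layers.Dense(2, activation="softmax")(merged)

model = tf.keras.Model(inputs=[image_in, global_in], outputs=output)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Random placeholder batch, only to show the expected input shapes.
rng = np.random.default_rng(0)
batch = [rng.random((4, 32, 32, 3)).astype("float32"),
         rng.random((4, 8)).astype("float32")]
predictions = model.predict(batch, verbose=0)
print(predictions.shape)
```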
Does it make any sense to perform feature extraction on images using, e.g., OpenCV, then use Caffe for classification of those features?
I am asking this as opposed to the traditional way of passing the images directly to Caffe, and letting Caffe do the extraction and classification procedures.
Yes, it does make sense, but it may not be the first thing you want to try:
If you have already extracted hand-crafted features that are suitable for your domain, there is a good chance you'll get satisfactory results by using an easier-to-use machine learning tool (e.g. libsvm).
Caffe can be used in many different ways with your features. If they are low-level features (e.g. Histogram of Gradients), then several convolutional layers may be able to extract the appropriate mid-level features for your problem. You may also use Caffe as an alternative non-linear classifier (instead of an SVM). You have the freedom to try (too) many things, but my advice is to first try a machine learning method with a smaller meta-parameter space, especially if you're new to neural nets and Caffe.
Caffe is a tool for training and evaluating deep neural networks. It is quite a versatile tool allowing for both deep convolutional nets as well as other architectures.
Of course it can be used to process pre-computed image features.
I am doing a project on writer identification. I want to extract HOG features from line images of Arabic handwriting, and then use a Gaussian Mixture Model for classification.
The link to the database containing the line Images is : http://khatt.ideas2serve.net/
So my questions are as follows:
There are three folders, namely Test, Train and Validate. From which folder do I need to extract the features, and for what purpose should each of the folders be used?
Do we need to extract the features from individual images and merge them, or is there a method to extract the features of all the images together?
Test, Train and Validate
Read this stats SE question: What is the difference between test set and validation set?
This is basic machine learning, so you should probably go back and review your course literature, since it seems like you're missing some pretty important machine learning concepts.
Do we need to extract the features from individual images and merge them or is there any method to extract features of all the images together.
It seems, again, like you're missing basic concepts here. Histogram of oriented gradients (HoG) subdivides the image into cells and computes a histogram of gradient orientations in each cell. See this SO question for examples of how this looks.
The traditional way of using HoG is: for each image in your training set, you extract the HoG descriptor, use these descriptors to train an SVM, validate the training with the validation set, then actually use the trained SVM on the test set.
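In scikit-learn terms, that train / validate / test workflow could be sketched roughly as below. The bundled digits data stands in for already-extracted HoG vectors, and the split is made in code only because no pre-split folders are available here; with Test, Train and Validate folders you would load each one instead. The candidate values of `C` are arbitrary.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)          # stand-in for extracted HoG vectors
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0, stratify=y_rest)

# Train one SVM per candidate C on the training set; compare on validation.
best_C, best_val = None, -1.0
for C in (0.1, 1.0, 10.0):
    score = SVC(C=C, gamma="scale").fit(X_train, y_train).score(X_val, y_val)
    if score > best_val:
        best_C, best_val = C, score

# The test set is touched once, after all choices are fixed.
final = SVC(C=best_C, gamma="scale").fit(X_train, y_train)
test_accuracy = final.score(X_test, y_test)
print(best_C, round(best_val, 3), round(test_accuracy, 3))
```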
You need to extract the HOG features from each image separately. Furthermore, you have to resize all images to be of the same size, otherwise all your HOG vectors will be of different length.
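To illustrate why the resize matters, here is a NumPy-only sketch of a simplified HOG-like descriptor (per-cell orientation histograms, without the block normalisation of real HOG). The function names, cell size, target size and random "line images" are all made up for the example; for actual work a library implementation is the better choice.

```python
import numpy as np

def resize_nearest(image, shape):
    """Resize a 2-D array to `shape` by nearest-neighbour sampling."""
    rows = np.arange(shape[0]) * image.shape[0] // shape[0]
    cols = np.arange(shape[1]) * image.shape[1] // shape[1]
    return image[np.ix_(rows, cols)]

def orientation_histograms(image, cell=8, bins=9):
    """Per-cell histograms of unsigned gradient orientation, concatenated."""
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0      # unsigned orientation
    bin_index = np.minimum((angle / (180.0 / bins)).astype(int), bins - 1)
    n_rows, n_cols = image.shape[0] // cell, image.shape[1] // cell
    features = np.zeros((n_rows, n_cols, bins))
    for r in range(n_rows):
        for c in range(n_cols):
            window = (slice(r * cell, (r + 1) * cell),
                      slice(c * cell, (c + 1) * cell))
            # Magnitude-weighted vote of every pixel in the cell.
            features[r, c] = np.bincount(bin_index[window].ravel(),
                                         weights=magnitude[window].ravel(),
                                         minlength=bins)
    return features.ravel()

rng = np.random.default_rng(0)
line_images = [rng.random((40, 300)), rng.random((55, 420))]  # hypothetical sizes

raw_lengths = [orientation_histograms(img).size for img in line_images]
fixed = [orientation_histograms(resize_nearest(img, (64, 256)))
         for img in line_images]
print(raw_lengths, [f.size for f in fixed])
```

Without the resize the two vectors have different lengths and cannot be stacked into one feature matrix; after resizing every image yields a vector of the same length.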
You can use the extractHOGFeatures function in MATLAB. See this example.