SVMlight train data formatting - machine-learning

I am trying to classify the reuters text using svm light but my train data does not follow the format
<'line> .=. <'target> <'feature>:<'value> <'feature>:<'value> ... <'feature>:<'value> # <'info>
it is of the form
<'line> .=. <'feature>:<'value> <'feature>:<'value> ... <'feature>:<'value> # <'info>
The target label is in a separate file.
I know there's an option in SVM light that lets you specify a separate target label file but i cannot find it on the SVM light website because a get an arror message:
Reading examples into memory...Line must start with label or 0!!!
whenever i load my training data using
svm_learn example1/train.dat example1/model
any help ?

Performing a rigorous research i realized that there is no syntax in SVM light that allows users to specify an external class label file for training data. The class labels must be part of the training data and it should follow the "target feature:value" format of SVM light

Related

Yolo training yolo with own dataset

I want to build a database with Yolo and this is my first time working with deep learning
how can I build a database for Yolo and train it?
How do I get the weights of the classifications?
Is it too difficult for someone new to Deep Learning?
Yes you can do it with ease!! and welcome to the Deep learning Community. You are welcome.
First download the darknet folder from Link
Go inside the folder and type make in command prompt
git clone https://github.com/pjreddie/darknet
cd darknet
make
Define these files -
data/custom.names
data/images
data/train.txt
data/test.txt
Now its time to label the images using LabelImg and save it in YOLO format which will generate corresponding label .txt files for the images dataset.
Labels of our objects should be saved in data/custom.names.
Using the script you can split the dataset into train and test-
import glob, os
dataset_path = '/media/subham/Data1/deep_learning/usecase/yolov3/images'
# Percentage of images to be used for the test set
percentage_test = 20
# Create and/or truncate train.txt and test.txt
file_train = open('train.txt', 'w')
file_test = open('test.txt', 'w')
# Populate train.txt and test.txt
counter = 1
index_test = round(100 / percentage_test)
for pathAndFilename in glob.iglob(os.path.join(dataset_path, "*.jpg")):
title, ext = os.path.splitext(os.path.basename(pathAndFilename))
if counter == index_test+1:
counter = 1
file_test.write(dataset_path + "/" + title + '.jpg' + "\n")
else:
file_train.write(dataset_path + "/" + title + '.jpg' + "\n")
counter = counter + 1
For train our object detector we can use the existing pre trained weights that are already trained on huge data sets. From here we can download the pre trained weights to the root directory.
Create a yolo-custom.data file in the custom_data directory which should contain information regarding the train and test data sets
classes=2
train=custom_data/train.txt
valid=custom_data/test.txt
names=custom_data/custom.names
backup=backup/
Now we have to make changes in our yolov3.cfg for training our model. For two classes. Based on the required performance we can select the YOLOv3 configuration file. For this example we will be using yolov3.cfg. We can duplicate the file from cfg/yolov3.cfg to custom_data/cfg/yolov3-custom.cfg
The maximum number of iterations for which our network should be trained is set with the param max_batches=4000. Also update steps=3200,3600 which is 80%, 90% of max_batches.
We will need to update the classes and filters params of [yolo] and [convolutional] layers that are just before the [yolo] layers.
In this example since we have a single class (tesla) we will update the classes param in the [yolo] layers to 1 at line numbers: 610, 696, 783
Similarly we will need to update the filters param based on the classes count filters=(classes + 5) * 3. For two classes we should set filters=21 at line numbers: 603, 689, 776
All the configuration changes are made to custom_data/cfg/yolov3-custom.cfg
Now, we have defined all the necessary items for training the YOLOv3 model. To train-
./darknet detector train custom_data/detector.data custom_data/cfg/yolov3-custom.cfg darknet53.conv.74
Also you can mark bounded boxes of objects in images for training Yolo right in your web browser, just open url. This tool is deployed to GitHub Pages.
Use this popular forked darknet repository https://github.com/AlexeyAB/darknet. The author describes many steps that will help you to build and use your own Yolo detector model.
How to build your own custom dataset and train it? Follow this step . He suggests to use Yolo Mark labeling tool to build your dataset, but you can also try another tool as described in here and here.
How to get the weights? The weights will be stored in darknet/backup/ directory after every 1000 iterations (you can adjust this value later). The link above explains everything about how to make and use the weights file.
I don't think it will be so difficult if you already know math, statistic and programming. Learning the basic neural network like perceptron, MLP then move to modern Machine Learning is a good start. Then you might want to expand your knowledge to Computer Vision related or NLP related area
Depending on what kind of OS you have. You can either hit up https://github.com/AlexeyAB/darknet [especially for Windows] or stick to https://github.com/pjreddie/darknet.
Steps to do so:
1) Setup darknet as detailed in the posts.
2) I used LabelIMG to label my images. make sure that the format
you save the images is in YOLO. If you save using the PascalVOC format or others you can write scripts to change it to the format that darknet expects.[YOLO]. Also, make sure that you do not change your labels file. If you want to add new labels, at it at the end of the file, not in between. YOLO format is quite different, so your previously labelled images may get messed up if you make changes in between the classes.
3)The weights will be generated as you train your model in a specific folder in darknet.[If you need more details I am happy to help answer that]. You can download the .74 file in YOLO and start training. The input to train needs a built darknet.exe a cfg file a .74 file and your training data location/access.
The setup is draconian, the process itself is not.
To build your own dataset, you should use LabelImg. It's a free and very easy software which will produce for you all the files you need to build a dataset. In fact, because you are working with yolo, you need a txt file for each of your image which will contain important information like bbox coordinates, label name. All these txt files are automatically produced by LabelImg so all you have to do is open the directory which contains all your images with LabelImg, and start the labelisation. Then, well you will have all your txt files, you will also need to create some other files in order to start training (see https://blog.francium.tech/custom-object-training-and-detection-with-yolov3-darknet-and-opencv-41542f2ff44e).

what's meaning of function predict's returned value in OpenCV?

I use function predict in opencv to classify my gestures.
svm.load("train.xml");
float ret = svm.predict(mat);//mat is my feature vector
I defined 5 labels (1.0,2.0,3.0,4.0,5.0), but in fact the value of ret are (0.521220207,-0.247173533,-0.127723947······)
So I am confused about it. As Opencv official document, the function returns a class label (classification) in my case.
update: I don't still know why to appear this result. But I choose new features to train models and the return value of predict function is what I defined during train phase (e.g. 1 or 2 or 3 or etc).
During the training of an SVM you assign a label to each class of training data.
When you classify a sample the returned result will match up with one of these labels telling you which class the sample is predicted to fall into.
There's some more documentation here which might help:
http://docs.opencv.org/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html
With Support Vector Machines (SVM) you have a training function and a prediction one. The training function is to train your data and save those informations on an xml file (it facilitates the prediction process in case you use a huge number of training data and you must do the prediction function in another project).
Example : 20 images per class in your case : 20*5=100 training images,each image is associated with a label of its appropriate class and all these informations are stocked in train.xml)
For the prediction function , it tells you what's label to assign to your test image according to your training DATA (the hole work you did in training process). Your prediction results might be good and might be bad , it's all about your training data I think.
If you want try to calculate the error rate for your classifier to see how much it can give good results or bad ones.

Where can I find the label map between trained model like googleNet's output to there real class label?

everyone, I am new to caffe. Currently, I try to use the trained GoogleNet which was downloaded from model zoo to classify some images. However, the network's output seem to be a vector rather than real label(like dog, cat).
Where can I find the label-map between trained model like googleNet's output to their real class label?
Thanks.
If you got caffe from git you should find in data/ilsvrc12 folder a shell script get_ilsvrc_aux.sh.
This script should download several files used for ilsvrc (sub set of imagenet used for the large scale image recognition challenge) training.
The most interesting file (for you) that will be downloaded is synset_words.txt, this file has 1000 lines, one line per class identified by the net.
The format of the line is
nXXXXXXXX description of class

WEKA 3.7.10 not compatible format, class index differ

I use weka for text classification, I have a train set and untagged test set, the goal is to classify test set.
In WEKA 3.6.6 everything goes well, I can select Supplied test set and train the model and get result.
On the same files, WEKA 3.7.10 says that
Train and test set are not compatible. Would you like to automatically wrap the classifier in "inputMappedClassifier" before porceeding?
And when I press No it outputs the following error message
Problem evaluating classfier: Train and test are not compatible Class index differ
: 2!= 0
I understand that the key is Class index differ: 2!= 0.
However what does it mean? Why it works in WEKA 3.6.6 and not compatible in WEKA 3.7.10?
How can I make the test set compatible to train set?
When you import the supplied test set, are you selecting the same class attribute as the one that you use in the train set? If you don't change this field, weka selects the last attribute as being the class automatically.

Predicting text data labels in test data set with Weka?

I am using the Weka gui to train a SVM classifier (using libSVM) on a dataset. The data in the .arff file is
#relation Expandtext
#attribute message string
#attribute Class {positive, negative, objective}
#data
I turn it into a bag of words with String-to-Word Vector, run SVM and get a decent classification rate. Now I have my test data I want to predict their labels which I do not know. Again it's header information is the same but for every class it is labeled with a question mark (?) ie
'Musical awareness: Great Big Beautiful Tomorrow has an ending\u002c Now is the time does not', ?
Again I pre-processed it, string-to-word-vector, class is in the same position as the training data.
I go to the "classify" menu, load up my trained SVM model, select "supplied test data", load in the test data and right click on the model saying "Re-evaluate model on current test set" but it gives me the error that test and train are not compatible. I am not sure why.
Am I going about this the wrong way to label the test data? What am I doing wrong?
For almost any machine learning algorithm, the training data and the test data need to have the same format. That means, both must have the same features, i.e. attributes in weka, in the same format, including the class.
The problem is probably that you pre-process the training set and the test set independently, and the StrintToWordVectorFilter will create different features for each set. Hence, the model, trained on the training set, is incompatible to the test set.
What you rather want to do is initialize the filter on the training set and then apply it on both training and test set.
The question Weka: ReplaceMissingValues for a test file deals with this issue, but I'll repeat the relevant part here:
Instances train = ... // from somewhere
Instances test = ... // from somewhere
Filter filter = new StringToWordVector(); // could be any filter
filter.setInputFormat(train); // initializing the filter once with training set
Instances newTrain = Filter.useFilter(train, filter); // configures the Filter based on train instances and returns filtered instances
Instances newTest = Filter.useFilter(test, filter); // create new test set
Now, you can train the SVM and apply the resulting model on the test data.
If training and testing have to be in separate runs or programs, it should be possible to serialize the initialized filter together with the model. When you load (deserialize) the model, you can also load the filter and apply it on the test data. They should be compatibel now.

Resources