Training YOLO with your own dataset - opencv

I want to build a dataset for YOLO and this is my first time working with deep learning.
How can I build a dataset for YOLO and train on it?
How do I get the weights of the classifications?
Is it too difficult for someone new to deep learning?

Yes, you can do it with ease, and welcome to the deep learning community!
First, clone the darknet repository, go inside the folder, and run make:
git clone https://github.com/pjreddie/darknet
cd darknet
make
Define these files -
data/custom.names
data/images
data/train.txt
data/test.txt
Now it's time to label the images using LabelImg and save the annotations in YOLO format, which generates a corresponding label .txt file for each image in the dataset.
The labels (class names) of our objects should be listed in data/custom.names, one per line.
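For example, for a two-class detector, data/custom.names would look like this (the class names below are hypothetical placeholders):
car
person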
Using the following script you can split the dataset into train and test sets:
import glob, os

dataset_path = '/media/subham/Data1/deep_learning/usecase/yolov3/images'

# Percentage of images to be used for the test set
percentage_test = 20

# Create and/or truncate train.txt and test.txt
file_train = open('train.txt', 'w')
file_test = open('test.txt', 'w')

# Populate train.txt and test.txt: every index_test-th image goes to the test set
counter = 1
index_test = round(100 / percentage_test)
for pathAndFilename in glob.iglob(os.path.join(dataset_path, "*.jpg")):
    title, ext = os.path.splitext(os.path.basename(pathAndFilename))
    if counter == index_test:
        counter = 1
        file_test.write(dataset_path + "/" + title + '.jpg' + "\n")
    else:
        file_train.write(dataset_path + "/" + title + '.jpg' + "\n")
        counter = counter + 1

file_train.close()
file_test.close()
To train our object detector we can use existing pre-trained weights that were trained on huge datasets. Download the pre-trained darknet53.conv.74 weights to the root directory.
Create a detector.data file in the custom_data directory (this is the file referenced in the training command below) containing information about the train and test datasets:
classes=2
train=custom_data/train.txt
valid=custom_data/test.txt
names=custom_data/custom.names
backup=backup/
Now we have to make changes in our yolov3.cfg for training the model with two classes. Based on the required performance we can select a YOLOv3 configuration file; for this example we will use yolov3.cfg. Duplicate the file from cfg/yolov3.cfg to custom_data/cfg/yolov3-custom.cfg.
The maximum number of iterations for which our network should be trained is set with the param max_batches=4000. Also update steps=3200,3600, which is 80% and 90% of max_batches.
We will need to update the classes and filters params of the [yolo] layers and of the [convolutional] layers that come just before the [yolo] layers.
In this example, since we have two classes, we will update the classes param in the [yolo] layers to 2 at line numbers 610, 696, and 783.
Similarly, we will need to update the filters param based on the class count: filters=(classes + 5) * 3. For two classes we should set filters=21 at line numbers 603, 689, and 776.
All the configuration changes are made to custom_data/cfg/yolov3-custom.cfg
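As a quick sanity check for the filters formula, here is a minimal sketch (the helper name is hypothetical):

# filters = (classes + 5) * 3: each of the 3 anchors per scale predicts
# 4 box coordinates + 1 objectness score + one score per class.
def yolo_filters(classes):
    return (classes + 5) * 3

print(yolo_filters(2))  # 21, matching filters=21 above
print(yolo_filters(1))  # 18 for a single-class detector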
Now, we have defined all the necessary items for training the YOLOv3 model. To train-
./darknet detector train custom_data/detector.data custom_data/cfg/yolov3-custom.cfg darknet53.conv.74

You can also mark bounding boxes of objects in images for training YOLO right in your web browser; just open the tool's URL. This tool is deployed to GitHub Pages.

Use this popular forked darknet repository: https://github.com/AlexeyAB/darknet. The author describes many steps that will help you build and use your own YOLO detector model.
How to build your own custom dataset and train on it? Follow the steps described there. The author suggests using the Yolo Mark labeling tool to build your dataset, but you can also try other tools as described here and here.
How to get the weights? The weights will be stored in the darknet/backup/ directory after every 1000 iterations (you can adjust this value later). The link above explains everything about how to make and use the weights file.
I don't think it will be so difficult if you already know math, statistics, and programming. Learning about basic neural networks like the perceptron and the MLP, and then moving on to modern machine learning, is a good start. Then you might want to expand your knowledge to computer vision or NLP related areas.

Depending on what kind of OS you have, you can either hit up https://github.com/AlexeyAB/darknet [especially for Windows] or stick to https://github.com/pjreddie/darknet.
Steps to do so:
1) Set up darknet as detailed in the posts.
2) I used LabelImg to label my images. Make sure that the format you save the annotations in is YOLO. If you save using the PascalVOC format or others, you can write scripts to convert it to the format that darknet expects [YOLO]. Also, make sure that you do not change your labels file. If you want to add new labels, add them at the end of the file, not in between. The YOLO format stores class indices rather than names, so your previously labelled images may get messed up if you insert classes in between existing ones.
3) The weights will be generated in a specific folder in darknet as you train your model [if you need more details I am happy to help answer that]. You can download the .74 file from the YOLO site and start training. To train you need a built darknet executable, a cfg file, a .74 pre-trained weights file, and the location of/access to your training data.
The setup is draconian, the process itself is not.

To build your own dataset, you should use LabelImg. It's a free and very easy piece of software which will produce all the files you need to build a dataset. In fact, because you are working with YOLO, you need a .txt file for each of your images containing important information like bbox coordinates and the class label. All these .txt files are produced automatically by LabelImg, so all you have to do is open the directory which contains all your images with LabelImg and start labeling. Then you will have all your .txt files; you will also need to create some other files in order to start training (see https://blog.francium.tech/custom-object-training-and-detection-with-yolov3-darknet-and-opencv-41542f2ff44e).
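For reference, each YOLO label .txt file contains one line per object in the image, with coordinates normalized to [0, 1] relative to the image size (the numbers below are made up for illustration):

<class_id> <x_center> <y_center> <width> <height>
0 0.512 0.433 0.280 0.190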

Related

Image Classification with single class dataset using Transfer Learning

I only have around 1000 images of computers. I need to train a model that can identify whether an image is a computer or not a computer. I do not have a dataset for not-computer, as it could be anything.
I guess the best method for this would be to apply transfer learning. I am trying to train on a pre-trained VGG19 model. But I am still unsure how to train a model with just computer images, without any non-computer images.
I am new to ML overall, so sorry if the question is not to the point.
No way, I'm sorry. You'll need a lot (at least another 1000 images) of non-computer images. You can take them from everywhere; the more they "vary", the better your model can extract the features that characterize a computer.
Imagine being a baby that is trained to always say "yes" in front of something; the next time you see something, you'll say "yes" no matter what is in front of you...
The same goes for machine learning models: you need positive examples and negative examples, or your model will reach 100% accuracy by always predicting "yes".
If you want to see it mathematically/geometrically, you can view each sample (in your case an image) as a point in feature space: imagine drawing an axis for each attribute you have (x, y, z and so on); an image will be a point in that space.
For simplicity, let's consider a 2-dimensional space, which means each image is described by 2 attributes (not realistic for images, where the features are usually far more numerous, but for simplicity imagine feature_1 = number of colors, feature_2 = number of angles). In this example we can simply draw one point in a Cartesian graph for each image.
The objective of a classifier is to draw a line which best separates the red dots from the blue dots, that is, to separate positive examples from negative examples.
If you give the model only positive samples (which is what you were going to do), you'll have infinitely many models with 100% accuracy! Because you can put the line wherever you want; the only requirement is not to "cut" your dataset.
Given that I suppose you are a beginner, I'll just tell you what to do, not how, because that would take years ;)
1) Collect data - as I told you, negative examples too, at least another 1000 samples
2) Split the data into train/test - a good split could be 2/3 of the samples in the training set and 1/3 in the test set. [REMEMBER] Keep the final class distribution consistent, i.e. if you have 50%-50% of classes "computer"-"non-computer", you should keep that percentage in both the train set and the test set (see the sketch after this list)
3) Train a model - have a look at this link for a guided example; it uses the MNIST dataset, which is a famous image classification dataset, but you should use your own data
4) Test the model on the test set and look at its performance
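As a minimal sketch of step 2, assuming scikit-learn and file lists you have collected yourself (all names below are hypothetical):

from sklearn.model_selection import train_test_split

# Hypothetical file lists; replace with your own collected data.
image_paths = ["computer_%d.jpg" % i for i in range(1000)] + \
              ["other_%d.jpg" % i for i in range(1000)]
labels = [1] * 1000 + [0] * 1000  # 1 = computer, 0 = non-computer

# stratify keeps the 50%-50% class distribution in both splits;
# test_size=1/3 gives the suggested 2/3 train, 1/3 test split.
train_paths, test_paths, y_train, y_test = train_test_split(
    image_paths, labels, test_size=1/3, stratify=labels, random_state=42)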
While it is not impossible to take data belonging to only one class and then use methods to classify whether other data belong to that same class or not, you usually do not end up with very good accuracy that way.
One way to do this is to use something called "autoencoders". The point here is that you use the same image as input and as target, and you make sure that the network is forced to compress the image in some way, so that it only stores what is important to recreate images of computers. Ideally this should lead to a model which is good at recreating images of computers and bad at everything else, meaning you can check how high the reconstruction loss is on the output; if it is higher than some threshold you've decided on, you deem the input to be something else. Again, you're probably not going to get anything close to 90% accuracy doing this, but it is an approach to your problem.
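A minimal sketch of that idea in Keras (layer sizes and names are hypothetical, and images are assumed scaled to [0, 1]):

import numpy as np
from keras.models import Sequential
from keras.layers import Flatten, Dense, Reshape

autoencoder = Sequential([
    Flatten(input_shape=(64, 64, 3)),
    Dense(128, activation='relu'),             # bottleneck forces compression
    Dense(64 * 64 * 3, activation='sigmoid'),
    Reshape((64, 64, 3)),
])
autoencoder.compile(optimizer='adam', loss='mse')
# autoencoder.fit(computer_images, computer_images, epochs=20)

def reconstruction_error(image):
    recon = autoencoder.predict(image[np.newaxis])[0]
    return float(np.mean((image - recon) ** 2))

# If reconstruction_error(img) exceeds your chosen threshold,
# deem the input to be "not a computer".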
A better approach is to go hunting for models which have been pre-trained on some dataset that had computers as part of it. Take that same dataset, set all computers to one class (plus your own images; make sure they adhere to the dataset format) and a selection of the other images to the other class. Make sure not to make the classes too unbalanced, otherwise your model will suffer for it. Extend the pre-trained model with a couple of layers (fully connected should probably do fine) and make the pre-trained part of the model not trainable, so you don't mess up the good weights there when you're practically telling it to ignore everything which is not a computer.
This is probably your best bet, but it is going to require a bit more effort on your side in terms of finding all of the parts you need to make it happen, and understanding how to integrate that code into yours.
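A minimal sketch of that recipe with Keras and VGG19 (the question's model; the head sizes and variable names are hypothetical):

from keras.applications import vgg19
from keras.models import Model
from keras.layers import Flatten, Dense

base = vgg19.VGG19(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False  # keep the pre-trained weights intact

x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
out = Dense(1, activation='sigmoid')(x)  # computer vs. not-computer

model = Model(inputs=base.input, outputs=out)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(train_images, train_labels, validation_split=0.25, epochs=10)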
You can either use transfer learning with a model pretrained on the ImageNet dataset. As mentioned in another answer, there are a bunch of classes inside ImageNet close to computers and electronic devices (such as monitors, CD players, laptops, speakers, etc.), so you can fine-tune the model on your dataset and train it to predict computers (train on around 750 images and test on the remaining 250).
Or you can manually collect images of objects other than computers, preferably a lot of electronic devices (because they are close to computers) and a bunch of other household things (there is a home objects dataset by Caltech). You should collect about 1000 such images to have class balance. You can then train your own custom model once you have this dataset.
No problem!
Step one: install a deep-learning toolkit of your choice. They all come with nice tutorials these days.
Step two: grab a pre-trained ImageNet model. That model already has a few computer classes built into it ("desktop_computer", "laptop", "notebook", and another class for hand-held computers, "hand-held_computer").
Step three: use the model to predict. For this, you'll need to have your images at the correct size.
More steps: further fine-tune the model... a bit more advanced, but it will give you some gains.
Something to think about: what is your goal? Accuracy? False positives/negatives, etc.? It's always good to have a goal of what you need to accomplish from the start.
EDIT: probably the easiest way to get started (if you don't have libraries, a GPU, etc.) is to go to Google Colab (https://colab.research.google.com/notebooks/welcome.ipynb), make a notebook in your browser, and run the following code.
# some code taken and modified from https://www.learnopencv.com/keras-tutorial-using-pre-trained-imagenet-models/
import keras
import numpy as np
from keras.applications import vgg16
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.applications.imagenet_utils import decode_predictions
import matplotlib.pyplot as plt
from PIL import Image
import requests
from io import BytesIO
%matplotlib inline

vgg_model = vgg16.VGG16(weights='imagenet')

def predict_image(image_url, model):
    response = requests.get(image_url)
    original = Image.open(BytesIO(response.content))
    newsize = (224, 224)
    original = original.resize(newsize)

    # convert the PIL image to a numpy array
    # In PIL - image is in (width, height, channel)
    # In Numpy - image is in (height, width, channel)
    numpy_image = img_to_array(original)

    # Convert the image / images into batch format
    # expand_dims will add an extra dimension to the data at a particular axis
    # We want the input matrix to the network to be of the form (batchsize, height, width, channels)
    # Thus we add the extra dimension at axis 0.
    image_batch = np.expand_dims(numpy_image, axis=0)
    plt.imshow(np.uint8(image_batch[0]))
    plt.show()

    # prepare the image for the VGG model
    processed_image = vgg16.preprocess_input(image_batch.copy())

    # get the predicted probabilities for each class
    predictions = model.predict(processed_image)

    # convert the probabilities to class labels
    # We will get the top 5 predictions, which is the default
    label = decode_predictions(predictions)
    print(label[0][0:2])  # just display top 2

urls = ['https://4.imimg.com/data4/CO/YS/MY-29352968/samsung-desktop-computer-500x500.jpg', 'https://cdn.britannica.com/77/170477-050-1C747EE3/Laptop-computer.jpg']
for u in urls:
    predict_image(u, vgg_model)
This should be a good starting point. Oh, and if the top predicted label is not in the computer/laptop/etc. set, then it's NOT a computer!
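As a sketch, that check could look like this inside predict_image (the class set is taken from the ImageNet labels listed above; decode_predictions returns (synset_id, name, score) tuples):

    computer_classes = {'desktop_computer', 'laptop', 'notebook', 'hand-held_computer'}
    top_label = label[0][0][1]  # name of the top-1 prediction
    print('computer' if top_label in computer_classes else 'NOT a computer')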

Create LMDB for image dataset with k-hot labels

I want to create a classifier for an image dataset where each image belongs to multiple classes out of all the classes, so the target values are k-hot vectors. Currently I create a text file which contains the path of the image file, a space, and a k-hot vector on each line, but when I try to run the scripts to create the LMDB files, they raise errors saying the files cannot be opened or found. When I try the same process with the same data but just a single number as the class label, everything goes well. So I think the scripts cannot parse the .txt file correctly when the labels are vectors.
Any suggestion...
Thank you
Caffe "Data" layers and convert_imageset script were written with a very specific use case in mind: image classification. Therefore the basic element stored in (and fetched from) LMDB by caffe is Datum that has a room for a single integer label.
You can see a more lengthy discussion on this subject here
It does not mean Caffe cannot facilitate different types of inputs/tasks.
You can use "HDF5Data" layer instead. When it comes to hdf5 inputs caffe has almost no restrictions on the input shape and size.
See, e.g., this answer and this one for more details on how to actually make it work.
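A minimal sketch of writing such an HDF5 file in Python with h5py (all shapes and file names here are hypothetical):

import h5py
import numpy as np

num_images, num_classes = 100, 10
images = np.random.rand(num_images, 3, 32, 32).astype(np.float32)  # N x C x H x W, stand-in data
labels = np.zeros((num_images, num_classes), dtype=np.float32)
labels[:, [1, 4]] = 1.0  # example k-hot target vectors

with h5py.File('train.h5', 'w') as f:
    f.create_dataset('data', data=images)
    f.create_dataset('label', data=labels)

# the "HDF5Data" layer reads a text file listing the .h5 files to use:
with open('train_h5_list.txt', 'w') as f:
    f.write('train.h5\n')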

How to split documents into training set and test set?

I am trying to build a classification model. I have 1000 text documents in a local folder. I want to divide them into a training set and a test set with a split ratio of 70:30 (70 -> training and 30 -> test). What is the best approach to do so? I am using Python.
I wanted a programmatic approach to splitting the training set and test set: first, read the files in the local directory; second, build a list of those files and shuffle them; third, split them into a training set and test set.
I tried a few ways using built-in Python keywords and functions, only to fail. Finally I got the idea of how to approach it. Also, cross-validation is a good option to consider for building general classification models.
Not sure exactly what you're after, so I'll try to be comprehensive. There will be a few steps:
Get a list of the files
Randomize the files
Split files into training and testing sets
Do the thing
1. Get a list of the files
Let's assume that your files all have the extension .data and they're all in the folder /ml/data/. What we want to do is get a list of all of these files. This is done simply with the os module. I'm assuming you have no subdirectories; this would change if there were.
import os

def get_file_list_from_dir(datadir):
    all_files = os.listdir(os.path.abspath(datadir))
    data_files = list(filter(lambda file: file.endswith('.data'), all_files))
    return data_files
So if we were to call get_file_list_from_dir('/ml/data'), we would get back a list of all the .data files in that directory (equivalent in the shell to the glob /ml/data/*.data).
2. Randomize the files
We don't want the sampling to be predictable, as that is considered a poor way to train an ML classifier.
from random import shuffle

def randomize_files(file_list):
    shuffle(file_list)
Note that random.shuffle performs an in-place shuffling, so it modifies the existing list. (Of course this function is rather silly since you could just call shuffle instead of randomize_files; you can write this into another function to make it make more sense.)
3. Split files into training and testing sets
I'll assume a 70:30 ratio instead of any specific number of documents. So:
from math import floor

def get_training_and_testing_sets(file_list):
    split = 0.7
    split_index = floor(len(file_list) * split)
    training = file_list[:split_index]
    testing = file_list[split_index:]
    return training, testing
4. Do the thing
This is the step where you open each file and do your training and testing. I'll leave this to you!
Cross-Validation
Out of curiosity, have you considered using cross-validation? This is a method of splitting your data so that you use every document for training and testing. You can customize how many documents are used for training in each "fold". I could go more into depth on this if you like, but I won't if you don't want to do it.
Edit: Alright, since you requested I will explain this a little bit more.
So we have a 1000-document set of data. The idea of cross-validation is that you can use all of it for both training and testing — just not at once. We split the dataset into what we call "folds". The number of folds determines the size of the training and testing sets at any given point in time.
Let's say we want a 10-fold cross-validation system. This means that the training and testing algorithms will run ten times. The first fold will train on documents 1-100 and test on 101-1000. The second fold will train on 101-200 and test on 1-100 and 201-1000.
If we did, say, a 40-fold CV system, the first fold would train on documents 1-25 and test on 26-1000; the second fold would train on 26-50 and test on 1-25 and 51-1000; and so on.
To implement such a system, we would still need to do steps (1) and (2) from above, but step (3) would be different. Instead of splitting into just two sets (one for training, one for testing), we could turn the function into a generator — a function which we can iterate through like a list.
def cross_validate(data_files, folds):
    if len(data_files) % folds != 0:
        raise ValueError(
            "invalid number of folds ({}) for the number of "
            "documents ({})".format(folds, len(data_files))
        )
    fold_size = len(data_files) // folds
    for split_index in range(0, len(data_files), fold_size):
        training = data_files[split_index:split_index + fold_size]
        testing = data_files[:split_index] + data_files[split_index + fold_size:]
        yield training, testing
That yield keyword at the end is what makes this a generator. To use it, you would use it like so:
def ml_function(datadir, num_folds):
    data_files = get_file_list_from_dir(datadir)
    randomize_files(data_files)
    for train_set, test_set in cross_validate(data_files, num_folds):
        do_ml_training(train_set)
        do_ml_testing(test_set)
Again, it's up to you to implement the actual functionality of your ML system.
As a disclaimer, I'm no expert by any means, haha. But let me know if you have any questions about anything I've written here!
That's quite simple if you use numpy. First load the documents and make them a numpy array, and then:
import numpy as np

docs = np.array([
    'one', 'two', 'three', 'four', 'five',
    'six', 'seven', 'eight', 'nine', 'ten',
])

idx = np.hstack((np.ones(7), np.zeros(3)))  # generate indices (7 train, 3 test = 70:30)
np.random.shuffle(idx)                      # shuffle so the train/test assignment is random

train = docs[idx == 1]
test = docs[idx == 0]
print(train)
print(test)
the result:
['one' 'two' 'three' 'six' 'eight' 'nine' 'ten']
['four' 'five' 'seven']
Just make a list of the filenames using os.listdir(). Use random.shuffle() to shuffle the list, and then training_files = filenames[:700] and testing_files = filenames[700:].
You can use the train_test_split method provided by sklearn. See the documentation here:
http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
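For example, a minimal sketch splitting 1000 filenames 70:30 (the filenames here are made up):

from sklearn.model_selection import train_test_split

filenames = ['doc{}.txt'.format(i) for i in range(1000)]
train_files, test_files = train_test_split(filenames, test_size=0.3, random_state=42)
print(len(train_files), len(test_files))  # 700 300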

How to save feature values of all batch data from pretrained torch networks?

I'm using the fb torch library from GitHub: fb torch resnet
It's my first time using torch and lua, so I'm encountering some problems.
My goal is to save the feature vector of a specific layer (the last average pooling of resnet) into one file, together with the class of the input image. All input images are from the cifar-10 db.
The file format that I want to get is as below:
image1.txt := class index of image and feature vector of image 1 of cifar-10
image2.txt := class index of image and feature vector of image 2 of cifar-10
// and so on through all images of cifar-10
I have seen some sample code in that github repo: extract-features.lua
Because it's my first time with lua, I find it hard to understand this code and to modify it the way I want. And I don't want my data saved in the t7 file format.
How can I access only one specific layer of the network in torch via lua? (the last average pooling)
How can I access the values of that layer and the classification result index?
How can I read each of the images from the cifar-10 db file (t7 batch)?
Sorry for so many questions, but I am finding torch hard to use because of the small amount of community threads and posts about it. Please understand.
How can i access only one specific layer from network in torch via lua? (last average pooling)
To access a layer you just have to load the model and get the layer by its integer position. If you do print(model) you will be able to see at which position the last average pooling is.
model = torch.load(path_to_model):cuda()
avg_pooling_layer = model:get(position_of_the_avg_pooling_layer)
How can i access values of the layer and classification result index?
I am not sure exactly what you mean by this. If you want to see the output or the weights of a specific layer (following the code above), you need to get those elements from the layer table. Again, to see which elements are available, use print(avg_pooling_layer).
weights = avg_pooling_layer.weight -- get the weights of the layer
output = avg_pooling_layer.output -- get the output of the layer
How can read all each images from cifar-10 db file(t7 batch)?
To read the images from a t7 file, use the torch function torch.load (used above to load the model).
cifar_10 = torch.load("path_to_cifar-10.t7")
Once loaded, you will have the training and test sets in subtables or functions. Again, print the table and see which values are the ones you need to get.
Hope this helps!

Where can I find the label map between a trained model like googleNet's output and the real class labels?

Everyone, I am new to caffe. Currently, I am trying to use a trained GoogleNet downloaded from the model zoo to classify some images. However, the network's output seems to be a vector rather than a real label (like dog or cat).
Where can I find the label map between a trained model like GoogleNet's output and the real class labels?
Thanks.
If you got caffe from git, you should find in the data/ilsvrc12 folder a shell script, get_ilsvrc_aux.sh.
This script downloads several files used for ilsvrc (the subset of imagenet used for the large scale image recognition challenge) training.
The most interesting file (for you) that will be downloaded is synset_words.txt; this file has 1000 lines, one line per class identified by the net.
The format of each line is
nXXXXXXXX description of class
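A minimal sketch of mapping the net's output vector to a label using that file (the path and the stand-in output vector are assumptions):

import numpy as np

# each line is "nXXXXXXXX description of class"; keep only the description
with open('data/ilsvrc12/synset_words.txt') as f:
    labels = [line.strip().split(' ', 1)[1] for line in f]

output = np.random.rand(1000)  # stand-in for the net's 1000-dim output vector
print(labels[int(np.argmax(output))])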
