I used the TensorFlow Object Detection API on a custom object. The model works perfectly, but now I want to know the coordinates of the boxes. Is there a way to get the coordinates of the boxes for every detected object?
Have a look at the tutorial IPython notebook on GitHub. This is the code it uses to run the detection:
with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        # Define input and output tensors for detection_graph
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        # Each box represents a part of the image where a particular object was detected.
        detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        # Each score represents the level of confidence for each of the objects.
        # Score is shown on the result image, together with the class label.
        detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
        detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
        num_detections = detection_graph.get_tensor_by_name('num_detections:0')
        for image_path in TEST_IMAGE_PATHS:
            image = Image.open(image_path)
            # The array-based representation of the image will be used later in order to prepare the
            # result image with boxes and labels on it.
            image_np = load_image_into_numpy_array(image)
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            # Actual detection.
            (boxes, scores, classes, num) = sess.run(
                [detection_boxes, detection_scores, detection_classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
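The coordinates are stored in the boxes variable returned by sess.run. Each box is given as [ymin, xmin, ymax, xmax], normalized to the range 0 to 1 relative to the image height and width.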
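To turn those values into pixel coordinates you multiply by the image size. Here is a minimal sketch, assuming arrays shaped like the output of sess.run (a leading batch dimension of 1) and boxes normalized in [ymin, xmin, ymax, xmax] order. The helper name boxes_to_pixels and the 0.5 score threshold are made up for illustration and are not part of the API:

```python
import numpy as np

def boxes_to_pixels(boxes, scores, image_height, image_width, min_score=0.5):
    """Convert normalized [ymin, xmin, ymax, xmax] boxes to pixel coordinates.

    boxes:  array of shape (1, N, 4), values in [0, 1]
    scores: array of shape (1, N)
    Returns a list of (left, top, right, bottom) integer tuples.
    """
    boxes = np.squeeze(boxes, axis=0)
    scores = np.squeeze(scores, axis=0)
    pixel_boxes = []
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        if score < min_score:
            continue  # skip low-confidence detections
        pixel_boxes.append((
            int(round(xmin * image_width)),
            int(round(ymin * image_height)),
            int(round(xmax * image_width)),
            int(round(ymax * image_height)),
        ))
    return pixel_boxes

# Dummy values standing in for the output of sess.run
boxes = np.array([[[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]]])
scores = np.array([[0.9, 0.3]])
print(boxes_to_pixels(boxes, scores, image_height=200, image_width=400))
```

With a real image you would pass image_np.shape[0] and image_np.shape[1] as the height and width.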
I'm new to machine learning and I'm working on a dataset with 14k pictures of sea, forest, glaciers, streets, buildings and mountains (6 classes). I trained my model on it and reached a validation accuracy of 91%, but for some reason it is biased: when I try to predict new images with my inference code, the only classes chosen are glaciers and sea. Here is the GitHub repository with the model creation code and the inference code.
train_datagen = ImageDataGenerator(
    rotation_range=20,  # Rotate the augmented image by up to 20 degrees
    zoom_range=0.3,  # Zoom in or out by up to 30%
    horizontal_flip=True,  # Allow for horizontal flips of augmented images
    vertical_flip=True,  # Allow for vertical flips of augmented images
    brightness_range=[0.6, 1.2],  # Lighter and darker images
    fill_mode='nearest',
    preprocessing_function=preprocess_input)
img_data_iterator = train_datagen.flow_from_directory(
    # Where to take the data from, the classes are the sub folder names
    '../Q2B/archive/seg_train/seg_train/',
    class_mode="categorical",  # labels are returned as 2D one-hot encoded arrays
    shuffle=True,  # shuffle the data, default is true but just to point it out
    batch_size=32,
    target_size=(150, 150))  # size every image is resized to
validation_generator = ImageDataGenerator(
    preprocessing_function=preprocess_input).flow_from_directory(
    '../Q2B/archive/seg_test/seg_test/',
    class_mode="categorical",
    shuffle=True,
    batch_size=32,
    target_size=(150, 150),)
My guess is that it is related to the way I pre-processed the data.
Can you post more of your code?
Change class_mode to 'categorical' for both the train and test generators.
Change the final dense layer from 1 unit to 2 so that it returns scores/probabilities for both classes. When you then use argmax, it returns the index of the top score, which indicates the predicted class.
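To illustrate the argmax step, here is a small sketch with made-up scores. The class_indices dictionary mimics the attribute of that name on the iterator returned by flow_from_directory, but its contents and the scores array are invented for this example:

```python
import numpy as np

# Hypothetical mapping, shaped like generator.class_indices
class_indices = {"buildings": 0, "forest": 1, "glacier": 2,
                 "mountain": 3, "sea": 4, "street": 5}
index_to_class = {index: name for name, index in class_indices.items()}

# Made-up softmax outputs for two images (one row per image)
scores = np.array([
    [0.05, 0.70, 0.05, 0.10, 0.05, 0.05],
    [0.10, 0.05, 0.15, 0.05, 0.60, 0.05],
])

# Index of the highest score in each row, then mapped back to a class name
predicted_indices = np.argmax(scores, axis=1)
predicted_names = [index_to_class[int(i)] for i in predicted_indices]
print(predicted_names)
```

Checking that the index-to-name mapping used at inference time matches the one from training is a cheap way to rule out a label mix-up.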
I am loading a YOLO model with OpenCV in Python using cv2.dnn_DetectionModel(cfg, weights) and then calling net.detect(img). I think I could get a per-image speed-up by using batches, but I don't see any support for a batch size other than one.
Is it possible to set the batch size?
net.detect does not support batch size > 1.
However, it's possible to do inference with batch size > 1 on darknet models with some extra work. Here is some partial code:
net = cv2.dnn.readNetFromDarknet(cfg, weights)
net.setInputsNames(["input"])
net.setInputShape("input", (batch_size, 3, h, w))
blob = cv2.dnn.blobFromImages(image_list)  # one 4D blob holding all images in the batch
net.setInput(blob)
results = net.forward(net.getUnconnectedOutLayersNames())
Now loop over all the images and, for each layer output in results, extract the boxes and per-class confidences for that image. Once you have collected this information from every layer, pass it through cv2.dnn.NMSBoxes. This part is non-trivial, but doable.
One idea is to combine the images manually into a single array, pass that to the net, and split the result afterwards:
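The per-image extraction step could look roughly like the sketch below. It assumes that each row of a layer output is [cx, cy, w, h, objectness, class scores...] in units normalized to the image size, which is the usual layout for Darknet YOLO models in OpenCV but should be checked for your model. The helper name decode_layer_output and the fake output array are made up for illustration:

```python
import numpy as np

def decode_layer_output(layer_out, img_w, img_h, conf_threshold=0.5):
    """Decode one output layer for one image.

    layer_out: array of shape (num_rows, 5 + num_classes); each row is assumed
               to be [cx, cy, w, h, objectness, class scores...], normalized.
    Returns (boxes, confidences, class_ids), boxes as [left, top, width, height].
    """
    boxes, confidences, class_ids = [], [], []
    for row in layer_out:
        class_scores = row[5:]
        class_id = int(np.argmax(class_scores))
        confidence = float(class_scores[class_id])
        if confidence < conf_threshold:
            continue  # drop weak detections before NMS
        cx, cy = row[0] * img_w, row[1] * img_h
        w, h = row[2] * img_w, row[3] * img_h
        boxes.append([int(cx - w / 2), int(cy - h / 2), int(w), int(h)])
        confidences.append(confidence)
        class_ids.append(class_id)
    return boxes, confidences, class_ids

# Fake layer output: batch of 2 images, 3 rows each, 2 classes
results_layer = np.zeros((2, 3, 7))
results_layer[0, 1] = [0.5, 0.5, 0.25, 0.5, 0.9, 0.125, 0.75]
for image_index in range(results_layer.shape[0]):
    print(image_index,
          decode_layer_output(results_layer[image_index], img_w=100, img_h=50))
```

After concatenating the lists from every output layer for one image, they are in the form that cv2.dnn.NMSBoxes(boxes, confidences, score_threshold, nms_threshold) takes.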
h1, w1 = im1.shape[:2]
h2, w2 = im2.shape[:2]
#create empty matrix
vis = np.zeros((max(h1, h2), w1+w2,3), np.uint8)
#combine 2 images
vis[:h1, :w1,:3] = im1
vis[:h2, w1:w1+w2,:3] = im2
After inference, you can separate them again:
result1=pred[:h1,:w1,:]
result2=pred[:h2, w1:w1+w2,:]
I have the following CNN model code:
classifier = Sequential()
classifier.add(Convolution2D(32,3,3, input_shape =
(256,256,3),activation = "relu"))
classifier.add(MaxPooling2D(pool_size = (2,2)))
Now I need to find out what values the 32 filters were initialized with. Is there any code that prints the values of the filters?
The default Keras Conv2D initialization is kernel_initializer='glorot_uniform' (or init='glorot_uniform' in older versions of Keras).
You can look at what this initializer does in the Keras initializers documentation.
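As a rough illustration of what glorot_uniform does: it draws the weights from a uniform distribution on [-limit, limit], with limit = sqrt(6 / (fan_in + fan_out)). The NumPy sketch below applies that formula to a 3x3 kernel with 3 input channels and 32 filters. It only mimics the formula and is not Keras's own code:

```python
import numpy as np

kernel_shape = (3, 3, 3, 32)  # (height, width, input channels, filters)

receptive_field = kernel_shape[0] * kernel_shape[1]
fan_in = receptive_field * kernel_shape[2]    # 3 * 3 * 3  = 27
fan_out = receptive_field * kernel_shape[3]   # 3 * 3 * 32 = 288
limit = np.sqrt(6.0 / (fan_in + fan_out))

# Draw every weight independently from U(-limit, limit)
rng = np.random.default_rng(seed=0)
kernel = rng.uniform(-limit, limit, size=kernel_shape)

print("limit:", float(limit))
print("shape:", kernel.shape, "all within limit:", bool(np.abs(kernel).max() <= limit))
```

So the initial filter values are small random numbers whose range depends only on the kernel size and the number of input and output channels.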
Finally, here is one way to access the weights of your first layer:
classifier = Sequential()
classifier.add(Convolution2D(32,3,3, input_shape =
(256,256,3),activation = "relu"))
classifier.add(MaxPooling2D(pool_size = (2,2)))
first_layer = classifier.layers[0]
print(first_layer.get_weights())  # a list of NumPy arrays: [kernel, bias]
Get the corresponding layer from the model:
layer = classifier.layers[0] # 0th layer is the convolution in your architecture
There are two variables for each convolution layer (the filter kernels and the bias). Get the one you need:
filters = layer.weights[0] # kernel is the 0th index
Now filters contains the values you are looking for, as a tensor. To get the values of the tensor, use the get_value() function of the Keras backend:
import keras.backend as K
print(K.get_value(filters))
This will print an array of shape (3, 3, 3, 32) which translates to 32 filters of kernel size 3x3 for 3 channels.
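To look at a single filter rather than the whole array, index the last axis. A short sketch, using a random array as a stand-in for the real kernel values (the variable names are made up for illustration):

```python
import numpy as np

# Stand-in for K.get_value(filters): (height, width, input channels, filters)
kernel_values = np.random.default_rng(0).normal(size=(3, 3, 3, 32))

first_filter = kernel_values[..., 0]               # all weights of filter 0
first_filter_channel0 = kernel_values[:, :, 0, 0]  # its 3x3 slice for input channel 0

print(first_filter.shape, first_filter_channel0.shape)
```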
The Keras ImageDataGenerator class provides the two flow methods flow(X, y) and flow_from_directory(directory) (https://keras.io/preprocessing/image/).
Why is the parameter
target_size: tuple of integers, default: (256, 256). The dimensions to which all images found will be resized
only provided by flow_from_directory(directory)? And what is the most concise way to add resizing of images to the preprocessing pipeline when using flow(X, y)?
flow_from_directory(directory) generates augmented images from a directory containing an arbitrary collection of images, so it needs the target_size parameter to bring all images to the same shape.
flow(X, y), on the other hand, augments images that are already stored in X, which is just a NumPy array and can easily be preprocessed/resized before it is passed to flow. That is why it has no target_size parameter. As for resizing, I prefer scipy.misc.imresize over PIL.Image resize or cv2.resize, as it can operate on NumPy image data.
import numpy as np
import scipy.misc  # note: imresize was removed in SciPy 1.3, so this needs an older SciPy
new_shape = (28, 28, 3)
X_train_new = np.empty(shape=(X_train.shape[0],) + new_shape)
for idx in range(X_train.shape[0]):
    X_train_new[idx] = scipy.misc.imresize(X_train[idx], new_shape)
For a large training dataset, applying transformations such as resizing to the entire training data at once consumes a lot of memory. As Keras does in ImageDataGenerator, it's better to do it batch by batch. As far as I know, there are 2 ways to achieve this without operating on the whole dataset:
You can use a Lambda layer and feed the original training data to it. Its output is the resized image you need.
Here is sample code, assuming TensorFlow is the Keras backend:
import tensorflow as tf
original_dim = (32, 32, 3)
target_size = (64, 64)
inputs = tf.keras.layers.Input(shape=original_dim)
x = tf.keras.layers.Lambda(lambda image: tf.image.resize(image, target_size))(inputs)
As @Retardust mentioned, you may also be able to customize your own ImageDataGenerator as well as the preprocessing_function.
For anyone else who wants to do this: the .flow method of ImageDataGenerator does not have a target_size parameter, and we cannot resize an image with the preprocessing_function parameter, because the documentation states: "The function will run after the image is resized and augmented. The function should take one argument: one image (Numpy tensor with rank 3), and should output a Numpy tensor with the same shape."
So in order to use .flow, you have to pass in images that are already resized, or otherwise use a custom generator that resizes them on the fly.
Here's a skeleton of a custom generator in Keras (it can also be written as a plain Python generator or in any other way):
class Custom_Generator(keras.utils.Sequence):
    def __init__(self, ..., datapath, batch_size, ...):
        ...
    def __len__(self):
        # number of batches per epoch, something like ceil(len(train_labels) / batch_size)
        ...
    def load_and_preprocess_function(self, label_names, ...):
        # do something...
        # load data for the batch using label names with whatever library
        ...
    def __getitem__(self, idx):
        # idx is the batch index, so slice out the idx-th block of batch_size labels
        batch_y = train_labels[idx * batch_size:(idx + 1) * batch_size]
        batch_x = self.load_and_preprocess_function()
        return (batch_x, batch_y)
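A runnable version of the same idea, as a sketch only: the class below follows the keras.utils.Sequence protocol (__len__ and __getitem__) without subclassing it, so that it runs with NumPy alone, and it uses a simple nearest-neighbour resize as a stand-in for skimage or cv2. The names ResizingGenerator and resize_nearest and the fake data are made up for illustration:

```python
import math
import numpy as np

def resize_nearest(image, target_size):
    """Nearest-neighbour resize of an (H, W, C) array to (target_h, target_w, C)."""
    target_h, target_w = target_size
    rows = np.arange(target_h) * image.shape[0] // target_h
    cols = np.arange(target_w) * image.shape[1] // target_w
    return image[rows[:, None], cols]

class ResizingGenerator:
    """Yields (batch_x, batch_y) by batch index and resizes images on the fly."""

    def __init__(self, images, labels, batch_size, target_size):
        self.images = images
        self.labels = labels
        self.batch_size = batch_size
        self.target_size = target_size

    def __len__(self):
        # Number of batches per epoch, not number of samples
        return math.ceil(len(self.images) / self.batch_size)

    def __getitem__(self, idx):
        start = idx * self.batch_size
        stop = start + self.batch_size
        batch_x = np.stack([resize_nearest(img, self.target_size)
                            for img in self.images[start:stop]])
        batch_y = np.asarray(self.labels[start:stop])
        return batch_x, batch_y

# Five fake 32x32 RGB images, resized to 64x64 in batches of 2
images = [np.zeros((32, 32, 3), dtype=np.uint8) for _ in range(5)]
labels = [0, 1, 0, 1, 0]
generator = ResizingGenerator(images, labels, batch_size=2, target_size=(64, 64))
batch_x, batch_y = generator[2]
print(len(generator), batch_x.shape, batch_y.shape)
```

Only one batch of resized images is held in memory at a time. To pass such a generator to model.fit, it would subclass keras.utils.Sequence as in the skeleton.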
X_data_resized = numpy.asarray([skimage.transform.resize(image, new_shape) for image in X_data])
Use this instead, because the scipy.misc.imresize code above is now deprecated.
There is also the newer method flow_from_dataframe(), which accepts a Pandas dataframe with file paths and y data as columns, and which also lets you specify the target size. This helps when your image data is not organized by directory.
I am trying to use an RBFNN (radial basis function neural network) for point cloud to surface reconstruction, but I can't work out what my feature vectors would be in an RBFNN.
Can anyone please help me understand this?
The goal is to get a reconstructed surface [image] from point cloud inputs [image].
An RBF network essentially involves fitting data with a linear combination of functions that obey a set of core properties, chief among them radial symmetry. The parameters of each of these functions are learned by incremental adjustment based on errors generated through repeated presentation of inputs.
If I understand (it's been a very long time since I used one of these networks), your question pertains to preprocessing of the data in the point cloud. I believe that each of the points in your point cloud should serve as one input. If I understand properly, the features are your three dimensions, and as such each point can already be considered a "feature vector."
You have other choices that remain, namely the number of radial basis neurons in your hidden layer, and the radial basis functions to use (a Gaussian is a popular first choice). The training of the network and the surface reconstruction can be done in a number of ways but I believe this is beyond the scope of the question.
I don't know if it will help, but here's a simple python implementation of an RBF network performing function approximation, with one-dimensional inputs:
import numpy as np
import matplotlib.pyplot as plt
def fit_me(x):
    return (x-2) * (2*x+1) / (1+x**2)
def rbf(x, mu, sigma=1.5):
    return np.exp(-(x-mu)**2 / (2*sigma**2))
# Core parameters including number of training
# and testing points, minimum and maximum x values
# for training and testing points, and the number
# of rbf (hidden) nodes to use
num_points = 100 # number of inputs (each 1D)
num_rbfs = 20 # number of centers (must be an int for np.linspace)
x_min = -5
x_max = 10
# Training data, evenly spaced points
x_train = np.linspace(x_min, x_max, num_points)
y_train = fit_me(x_train)
# Testing data, more evenly spaced points
x_test = np.linspace(x_min, x_max, num_points*3)
y_test = fit_me(x_test)
# Centers of each of the rbf nodes
centers = np.linspace(-5, 10, num_rbfs)
# Everything is in place to train the network
# and attempt to approximate the function 'fit_me'.
# Start by creating a matrix G in which each row
# corresponds to an x value within the domain and each
# column i contains the values of rbf_i(x).
center_cols, x_rows = np.meshgrid(centers, x_train)
G = rbf(center_cols, x_rows)
plt.plot(G)
plt.title('Radial Basis Functions')
plt.show()
# Simple training in this case: use pseudoinverse to get weights
weights = np.dot(np.linalg.pinv(G), y_train)
# To test, create meshgrid for test points
center_cols, x_rows = np.meshgrid(centers, x_test)
G_test = rbf(center_cols, x_rows)
# apply weights to G_test
y_predict = np.dot(G_test, weights)
plt.plot(y_predict)
plt.title('Predicted function')
plt.show()
error = y_predict - y_test
plt.plot(error)
plt.title('Function approximation error')
plt.show()
First, you can explore the way in which inputs are provided to the network and how the RBF nodes are used. This should extend to 2D inputs in a straightforward way, though training may get a bit more involved.
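As a sketch of that extension to 2D inputs, the code below fits a height value z over (x, y) points with Gaussian RBFs placed on a regular grid, using the same pseudoinverse training as the 1D example. The target_surface function, the grid of centers and sigma=1.0 are arbitrary choices made for illustration. This fits a height field only, not a general surface from a 3D point cloud:

```python
import numpy as np

def gaussian_rbf_matrix(points, centers, sigma=1.0):
    """Matrix G with G[i, j] = exp(-||points[i] - centers[j]||^2 / (2 sigma^2))."""
    diff = points[:, None, :] - centers[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2 * sigma ** 2))

def target_surface(xy):
    # Hypothetical height function standing in for real data
    return np.sin(xy[:, 0]) * np.cos(xy[:, 1])

# Training inputs: each row is one 2D feature vector (x, y)
rng = np.random.default_rng(0)
xy_train = rng.uniform(-3, 3, size=(400, 2))
z_train = target_surface(xy_train)

# Centers on a regular 8 x 8 grid over the same domain
grid = np.linspace(-3, 3, 8)
centers = np.array([(cx, cy) for cx in grid for cy in grid])

# Same training step as in the 1D case: least squares via the pseudoinverse
G = gaussian_rbf_matrix(xy_train, centers)
weights = np.linalg.pinv(G) @ z_train

# Evaluate on fresh points from the same domain
xy_test = rng.uniform(-3, 3, size=(200, 2))
z_predict = gaussian_rbf_matrix(xy_test, centers) @ weights
max_error = np.max(np.abs(z_predict - target_surface(xy_test)))
print("max abs error:", max_error)
```

The only structural change from the 1D code is that the distance between an input and a center is now a Euclidean norm over two coordinates. The same function works unchanged for 3D points.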
To do proper surface reconstruction you'll likely need a representation of the surface that is altogether different than the representation of the function that's learned here. Not sure how to take this last step.