How to remove detection score (percentage)? - object-detection-api

I am trying to detect custom objects using the faster_rcnn_inception_v2 model with the TensorFlow Object Detection API.
While testing, the model labels each detection with the object name and a score, for example *Person: 99%*.
How do I remove the score?
This is my visualization call:
vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,
    np.squeeze(boxes),
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),
    category_index,
    use_normalized_coordinates=True,
    line_thickness=8)
I have changed scores to None:
vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,
    np.squeeze(boxes),
    np.squeeze(classes).astype(np.int32),
    None,
    category_index,
    use_normalized_coordinates=True,
    line_thickness=8)
After that change, I got this result:

To answer your original question: set the skip_scores input argument of visualize_boxes_and_labels_on_image_array to True (and skip_labels as well, if you also want to hide the class names).
You are getting redundant boxes because the visualization function can no longer apply a score threshold when you pass None as scores.
If you look at the definition of visualize_boxes_and_labels_on_image_array, you'll notice a min_score_thresh input argument that defaults to 0.5. Detected boxes with scores below 0.5 are not visualized by default; if you don't pass scores to this function, all boxes are visualized.
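For example, assuming your copy of visualization_utils.py exposes the skip_scores and skip_labels keyword arguments (recent versions of the Object Detection API do), the call could look like the sketch below; the scores are still passed so that min_score_thresh keeps filtering out low-confidence boxes:
vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,
    np.squeeze(boxes),
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),        # keep the scores so min_score_thresh still filters boxes
    category_index,
    use_normalized_coordinates=True,
    line_thickness=8,
    skip_scores=True,          # hide the percentage
    skip_labels=False)         # set to True to hide the class name as well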

I assume you're using the code from the official Object Detection Demo notebook, or some variant of it? If so, this is the part of the code that's responsible for rendering the bounding boxes:
vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,
    output_dict['detection_boxes'],
    output_dict['detection_classes'],
    output_dict['detection_scores'],
    category_index,
    instance_masks=output_dict.get('detection_masks'),
    use_normalized_coordinates=True,
    line_thickness=8)
To remove the detection scores from the rendered bounding boxes, you just need to pass None for the scores argument instead of output_dict['detection_scores']:
vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,
    output_dict['detection_boxes'],
    output_dict['detection_classes'],
    None,  # scores=None: replace here
    category_index,
    instance_masks=output_dict.get('detection_masks'),
    use_normalized_coordinates=True,
    line_thickness=8)
You can look at the source code of this function in tensorflow/models/research/object_detection/utils/visualization_utils.py. This is what it says in one of the comments:
scores: a numpy array of shape [N] or None. If scores=None, then this function assumes that the boxes to be plotted are groundtruth boxes and plot all boxes as black with no classes or scores.

Related

How to apply a low-pass filter to a sound recording in Python?

I have to reduce white noise in a sound recording. To do that I used the Fourier transform, but I don't know how to use the FFT function's return values, which are in the frequency domain. How can I use the FFT data to reduce the noise?
Here is my code
from scipy.io import wavfile
import matplotlib.pyplot as plt
import numpy as np
import simpleaudio as sa
from numpy.fft import fft, fftfreq, ifft

# reading wav file
fs, data = wavfile.read("a.wav")
n = len(data)
freqs = fftfreq(n)
mask = freqs > 0
# calculating raw fft values
fft_vals = fft(data)
# calculating theoretical fft values
fft_theo = 2 * np.abs(fft_vals / n)
# plotting
plt.plot(freqs[mask], fft_theo[mask])
plt.show()
It is better for such questions to build a synthetic example, so you don't have to post a big datafile and people can still follow your question (MCVE).
It is also important to plot intermediate results since we are talking about operations on complex numbers, so we often have to take re, im parts, or absolutes and angles respectively.
The Fourier transform of a real function is complex but is symmetric for positive vs negative frequencies. One can also look at that from an information theoretical viewpoint: you wouldn't want N independent real numbers in time to result in 2N independent real numbers describing the spectrum.
While you normally plot the absolute or absolute squared (voltage vs. power) of the spectrum, you can leave it complex when you apply the filter. After back-conversion to time via the IFFT, to plot it, you'll have to convert it to a real number again, in this case by taking the absolute.
If you design the filter kernel in the time domain (FFT of a Gaussian will be a Gaussian), the IFFT of the product of the FFT of the filter and the spectrum will have only very small imaginary parts and you can then take the real part (which makes more sense from a physics viewpoint, you started with real part, end with real part).
import numpy as np
import matplotlib.pyplot as p
%matplotlib inline
T=3 # secs
d=0.04 # secs
n=int(T/d)
print(n)
t=np.arange(0,T,d)
fr=1 # Hz
y1= np.sin(2*np.pi*fr*t) +1 # dc offset helps with backconversion, try setting it to zero
y2= 1/5*np.sin(2*np.pi*7*fr*t+0.5)
y=y1+y2
f=np.fft.fftshift(np.fft.fft(y))
freq=np.fft.fftshift(np.fft.fftfreq(n,d))
filter=np.exp(- freq**2/6) # simple Gaussian filter in the frequency domain
filtered_spectrum=f*filter # apply the filter to the spectrum
filtered_data = np.fft.ifft(filtered_spectrum) # then backtransform to time domain
p.figure(figsize=(24,16))
p.subplot(321)
p.plot(t,y1,'.-',color='red', lw=0.5, ms=1, label='signal')
p.plot(t,y2,'.-',color='blue', lw=0.5,ms=1, label='noise')
p.plot(t,y,'.-',color='green', lw=4, ms=4, alpha=0.3, label='noisy signal')
p.xlabel('time (sec)')
p.ylabel('amplitude (Volt)')
p.legend()
p.subplot(322)
p.plot(freq,np.abs(f)/n, label='raw spectrum')
p.plot(freq,filter,label='filter')
p.xlabel(' freq (Hz)')
p.ylabel('amplitude (Volt)');
p.legend()
p.subplot(323)
p.plot(t, np.absolute(filtered_data),'.-',color='green', lw=4, ms=4, alpha=0.3, label='cleaned signal')
p.legend()
p.subplot(324)
p.plot(freq,np.abs(filtered_spectrum), label = 'filtered spectrum')
p.legend()
p.subplot(326)
p.plot(freq,np.log( np.abs(filtered_spectrum)), label = 'filtered spectrum')
p.legend()
p.title(' in the log plot the noise is still visible');
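Applied back to the original question, a minimal sketch could look like the following; the cutoff frequency and the file name "a.wav" are assumptions, and the recording is assumed to be mono:
import numpy as np
from scipy.io import wavfile

fs, data = wavfile.read("a.wav")                  # sample rate in Hz and samples
spectrum = np.fft.rfft(data)                      # real FFT: only non-negative frequencies
freqs = np.fft.rfftfreq(len(data), d=1.0 / fs)    # frequency axis in Hz
cutoff = 4000.0                                   # assumed cutoff; tune it for your recording
mask = freqs < cutoff                             # brick-wall low-pass (a Gaussian is smoother)
filtered = np.fft.irfft(spectrum * mask, n=len(data))
wavfile.write("a_filtered.wav", fs, filtered.astype(data.dtype))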

Keras model accuracy not improving

I'm trying to train a neural network to predict the ratings of players in FIFA 18 by EA Sports (ratings range from 64 to 99). I'm using their players database (https://easports.com/fifa/ultimate-team/api/fut/item?page=1) and I've processed the data into training_x, testing_x, training_y, testing_y. Each training sample is a numpy array containing 7 values: the first 6 are the player's stats (shooting, passing, dribbling, etc.) and the last value is the player's position (which I mapped to a number between 1 and 8). Each target value is a single integer between 64 and 99, representing the rating of that player.
I've tried many different hyperparameters: changing the activation functions to tanh and relu, adding a batch normalization layer after the first dense layer (I thought it might help since one of my features is very small while the others are between 50 and 99), tuning the SGD optimizer (learning rate, momentum, even swapping it for Adam), trying different loss functions, adding/removing dropout layers, and trying different regularizers for the model's weights.
model = Sequential()
model.add(Dense(64, input_shape=(7,),
                kernel_regularizer=regularizers.l2(0.01)))
# batch normalization?
model.add(Activation('sigmoid'))
model.add(Dense(64, kernel_regularizer=regularizers.l2(0.01),
                activation='sigmoid'))
model.add(Dropout(0.3))
model.add(Dense(32, kernel_regularizer=regularizers.l2(0.01),
                activation='sigmoid'))
model.add(Dense(1, activation='linear'))
sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_absolute_error', metrics=['accuracy'],
              optimizer=sgd)
model.fit(training_x, training_y, epochs=50, batch_size=128, shuffle=True)
When I train the model, the loss is always nan and the accuracy is always 0, even though I've tried adjusting a lot of different parameters. However, if I remove the last feature from my data, the position of the players, and update the input shape of the first dense layer, the model actually "trains" and ends up with around 6% accuracy no matter what parameters I change. In that case, I've found that the model only predicts 79 to be the player's rating. What am I doing inherently wrong?
You can try the following steps:
Use the mean squared error loss function.
Use Adam, which will help you converge faster with a low learning rate like 0.0001 or 0.001. Otherwise, try the RMSprop optimizer.
Use the default regularizers, that is, none at all.
Since this is a regression task, use an activation function like ReLU in all layers except the output layer (including the input layer). Use a linear activation in the output layer.
As mentioned in the comments by @pooyan, normalize the features. See here. You can also try standardizing the features; use whichever works best. A rough sketch applying these suggestions follows below.
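A minimal sketch putting these suggestions together might look like this, assuming training_x and training_y are as described in the question; the normalization step, layer sizes, and learning rate are illustrative, not prescriptive:
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers

# normalize each feature column so small and large features end up on a comparable scale
mean, std = training_x.mean(axis=0), training_x.std(axis=0)
training_x = (training_x - mean) / std

model = Sequential()
model.add(Dense(64, input_shape=(7,), activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='linear'))             # linear output for regression
model.compile(loss='mean_squared_error',
              optimizer=optimizers.Adam(lr=0.001),
              metrics=['mean_absolute_error'])        # 'accuracy' is not meaningful here
model.fit(training_x, training_y, epochs=50, batch_size=128, shuffle=True)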

Why does my NN not classify these tic-tac-toe patterns correctly?

I'm trying to teach an AI to recognize tic-tac-toe patterns with a winning line.
Unfortunately, it's not learning to recognize them correctly. I think my way of representing/encoding the game into vectors is wrong.
I chose a way that is easy for a human (me, in particular!) to understand:
training_data = np.array([[0,0,0,
0,0,0,
0,0,0],
[0,0,1,
0,1,0,
0,0,1],
[0,0,1,
0,1,0,
1,0,0],
[0,1,0,
0,1,0,
0,1,0]], "float32")
target_data = np.array([[0],[0],[1],[1]], "float32")
This uses an array of length 9 to represent a 3 x 3 board. The first three items represent the first row, the next three the second row, and so on. The line breaks should make it obvious. The target data then maps the first two game states to "no wins" and the last two game states to "wins".
Then I wanted to create some validation data that is slightly different to see if it generalizes.
validation_data = np.array([[0,0,0,
0,0,0,
0,0,0],
[1,0,0,
0,1,0,
1,0,0],
[1,0,0,
0,1,0,
0,0,1],
[0,0,1,
0,0,1,
0,0,1]], "float32")
Obviously, again the last two game states should be "wins" whereas the first two should not.
I tried to play with the number of neurons and learning rate, but no matter what I try, my output looks pretty off, e.g.
[[ 0.01207292]
[ 0.98913926]
[ 0.00925775]
[ 0.00577191]]
I tend to think it's the way I represent the game state that may be wrong, but actually I have no idea :D
Can anyone help me out here?
This is the entire code that I use
import numpy as np
from keras.models import Sequential
from keras.layers.core import Activation, Dense
from keras.optimizers import SGD
training_data = np.array([[0,0,0,
0,0,0,
0,0,0],
[0,0,1,
0,1,0,
0,0,1],
[0,0,1,
0,1,0,
1,0,0],
[0,1,0,
0,1,0,
0,1,0]], "float32")
target_data = np.array([[0],[0],[1],[1]], "float32")
validation_data = np.array([[0,0,0,
0,0,0,
0,0,0],
[1,0,0,
0,1,0,
1,0,0],
[1,0,0,
0,1,0,
0,0,1],
[0,0,1,
0,0,1,
0,0,1]], "float32")
model = Sequential()
model.add(Dense(2, input_dim=9, activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)
history = model.fit(training_data, target_data, nb_epoch=10000, batch_size=4, verbose=0)
print(model.predict(validation_data))
UPDATE
I tried to follow the advice and used more training data with no success so far.
My training set looks like this now
training_data = np.array([[0,0,0,
0,0,0,
0,0,0],
[0,0,1,
0,0,0,
1,0,0],
[0,0,1,
0,1,0,
0,0,1],
[1,0,1,
0,1,0,
0,0,0],
[0,0,0,
0,1,0,
1,0,1],
[1,0,0,
0,0,0,
0,0,0],
[0,0,0,
0,0,0,
1,0,0],
[0,0,0,
0,1,0,
0,0,1],
[1,0,1,
0,0,0,
0,0,0],
[0,0,0,
0,0,0,
0,0,1],
[1,1,0,
0,0,0,
0,0,0],
[0,0,0,
1,0,0,
1,0,0],
[0,0,0,
1,1,0,
0,0,0],
[0,0,0,
0,0,1,
0,0,1],
[0,0,0,
0,0,0,
0,1,1],
[1,0,0,
1,0,0,
1,0,0],
[1,1,1,
0,0,0,
0,0,0],
[0,0,0,
0,0,0,
1,1,1],
[0,0,1,
0,1,0,
1,0,0],
[0,1,0,
0,1,0,
0,1,0]], "float32")
target_data = np.array([[0],[0],[0],[0],[0],[0],[0],[0],[0],[0],[0],[0],[0],[0],[0],[1],[1],[1],[1],[1]], "float32")
Considering that I only count patterns of 1 as wins there are only 8 different win states for the way I represent the data. I made the NN see 5 of them so that I still have 3 to test against to see if the generalization works. I'm now feeding it 15 states that it should not consider a win.
However, the outcome for my validation seems to actually get worse.
[[ 1.06987642e-07]
[ 4.72647212e-02]
[ 1.97011139e-03]
[ 2.93282426e-07]]
Things I tried:
Changing from sigmoid to softmax
Adding more neurons
Adding more layers
A mix of all of the above
I see your problem immediately: your training set is far too small. Your problem space consists of the 512 corners of a 9-dimensional hypercube. Your training colours two of the corners green and two others red. You now somehow expect the trained model to correctly intuit the proper colourings for the remaining 508 corners.
No general-purpose machine-learning algorithm will intuit the pattern of "does this board position contain any of the eight approved sequences of three evenly-spaced '1' values?" from only two positive and two negative examples. For one thing, note that your training data has no row wins, does not exclude evenly-spaced points that aren't a win, and ... well, many other patterns in the space.
I expect that you'll need at least two dozen well-chosen examples on each side of the classification to get any appreciable performance from your model. Think in terms of test cases: bits 1-2-3 make a win, but 3-4-5 does not; 3-5-7 make a win, but 1-3-5 and 2-4-6 do not.
Does this move you toward a solution?
One thing you might try is to generate random vectors and then classify them with a subroutine; feed these as training data. Do more for testing and validation data.
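A minimal sketch of that idea (the helper names are mine, not from the answer): generate random 0/1 boards and label them with a small win-check subroutine.
import numpy as np

WIN_PATTERNS = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
                (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
                (0, 4, 8), (2, 4, 6)]              # diagonals

def has_win(board):
    # 1 if any of the eight three-in-a-row patterns is completely filled with 1s
    return int(any(all(board[i] == 1 for i in line) for line in WIN_PATTERNS))

def make_dataset(n_samples, seed=0):
    rng = np.random.default_rng(seed)
    boards = rng.integers(0, 2, size=(n_samples, 9)).astype("float32")
    labels = np.array([[has_win(b)] for b in boards], "float32")
    return boards, labels

training_data, target_data = make_dataset(1000)
validation_data, validation_targets = make_dataset(200, seed=1)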
What Prune said makes a lot of sense. Given that your problem space is 138 terminal board positions (and that's excluding rotations and reflections! - see wiki) it is very unlikely that the learning algorithm can sufficiently adjust the weights and biases, just by training on a 4-entry data set. I had a similar experience in one of my "learning experiments", where, even though the net was trained on the complete data set, because the set was very small, I ended up having to train it over multiple epochs until it was able to output decent predictions.
I think what's important to remember here is that what training a FF neural net ultimately does is to fine-tune weights and biases so that the loss function is minimised as much as possible. The lower the loss, the closer the predictions get to the expected outputs and the better the neural net gets. This means the more training data the merrier :)
I found this complete training set for tic-tac-toe, though it's not in the format that you set out with, but who knows, perhaps it will be useful for you. I would be curious to know what the minimal subset of that training set would be for the net to start making reliable predictions :P
This is an interesting problem. I think you're really wanting your system to recognize "lines", but as others have said, with so little training data it's hard for the system to generalize.
A different and counterintuitive approach might be to start with a larger board, say, 10x10, not 3x3, and generate random lines in that space and try to make the model learn them. You might explore convolutional networks in that case. This would be a lot like the handwritten digit recognition problem, and I expect it would succeed easily. Once your system is good at recognizing lines, maybe you can creatively adapt it somehow and scale it down to recognize the tiny lines in the 3x3 case.
(That said, I think you can learn this particular 3x3 problem just by giving your network ALL the data. It might be too small for generalization, so I wouldn't even try in this case. After all, in training a net to learn the binary XOR function, we just feed it all 4 examples -- the complete space. You can't train it reliably from just 3 examples.)
I think there are problems here beyond a small data set, and these lie in your representation of the game state. In tic-tac-toe, there are three possible states for each space on the board at any given time: [X], [O], or empty []. Furthermore, there are conditions on the game which limit possible board configurations, i.e. there can be no more than n+1 [X] squares, given n [O] squares. I suggest going back and thinking about how to represent the three-state nature of the game squares.
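One possible way to act on that suggestion (a sketch, not part of the answer) is to one-hot encode each square, turning a 9-square board into a 27-long vector:
import numpy as np

def encode_board(board):
    # board: list of 9 values, 0 = empty, 1 = X, 2 = O
    onehot = np.zeros((9, 3), "float32")
    onehot[np.arange(9), board] = 1.0
    return onehot.reshape(-1)          # shape (27,), three values per square

print(encode_board([0, 1, 2, 0, 1, 0, 0, 1, 0]))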
After playing around with this for a while I think I learned enough to add a valuable answer as well.
1. Grid size
Increasing the size of the grid will make it much easier to come up with more samples for the training while still leaving enough room for validation data that the NN won't see during training. I'm not saying it can't be done for a 3 x 3 grid, but increasing the size of the grid will definitely help. I ended up increasing the size to 6 x 6 and looking for straight lines with a minimum length of four connected points.
2. Data representation
Representing the data as a one-dimensional vector isn't optimal.
Think about it. When we want to represent the following line in our grid...
[0,1,0,0,0,0,
0,1,0,0,0,0,
0,1,0,0,0,0,
0,1,0,0,0,0,
0,0,0,0,0,0,
0,0,0,0,0,0]
...how should our NN know that what we mean isn't actually this pattern in a grid of size 3 x 12?
[0,1,0,0,0,0,0,1,0,0,0,0,
0,1,0,0,0,0,0,1,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0]
We can provide much more context to our NN if we represent the data in a way that the NN knows we are talking about a grid of size 6 x 6.
[[0,1,0,0,0,0],
[0,1,0,0,0,0],
[0,1,0,0,0,0],
[0,1,0,0,0,0],
[0,0,0,0,0,0],
[0,0,0,0,0,0]]
The good news is that we can do exactly that using a Convolution2D layer in keras.
3. Target data representation
It's not only helpful to rethink the representation of our training data, we can also tweak the representation of our target data. Initially I wanted to go with a binary question: does this grid contain a straight line or not? 1 or 0.
Turns out we can do much better by using the same shape for our target data that we use for our input data and redefine our question as: Does this pixel belong to a straight line or not? So, considering we have an input sample that looks like this:
[[0,1,1,0,0,1],
[0,1,0,1,0,0],
[0,1,0,0,1,0],
[0,1,0,0,0,1],
[0,0,0,1,0,0],
[1,0,1,0,0,0]]
Our target output would look like this.
[[0,1,1,0,0,0],
[0,1,0,1,0,0],
[0,1,0,0,1,0],
[0,1,0,0,0,1],
[0,0,0,0,0,0],
[0,0,0,0,0,0]]
That way we're giving the NN much more context about what we are actually looking for. Think about it: if you had to make sense of these samples, I'm sure this target data representation would also give your brain a much better hint than a target representation that is just 0 or 1.
Now the question is: how can we model our NN to have a target shape that is the same shape as our input data? Usually each convolutional layer slices the grid into smaller grids to look for certain features, which effectively changes the shape of the data passed to the next layer.
However, we can set border_mode='same' for our convolutional layers, which essentially pads the grids with a border of zeros so that the original shape is preserved.
4. Measure
Measuring the performance of our model is the key to make the right adjustments. In particular, we want to see how accurate the predictions of our NN are for the training data and the validation data. Having these numbers gives us the right hints.
For instance, if the accuracy of the predictions on our training data goes up while the accuracy of the predictions on our validation data is stale or even declines, that means our NN is overfitting: it basically memorizes the training data but doesn't actually generalize what it has learned to data it hasn't seen before (e.g. our validation data).
There are three things we want to do:
A.) we want to set validation_data = (val_input_data, val_target_data) when we call model.fit(...) so that keras can inform us about the accuracy for our validation data after each epoch.
B.) we want to set verbose=2 when we call model.fit(...) so that keras actually prints out the progress after each epoch.
C.) we want to set metrics=['binary_accuracy'] when we call model.compile(...) to actually include the right metric in these progress logs that keras gives us after each epoch.
5. Data generation
Last but not least, as most of the other answers suggest: the more data, the better. I ended up writing a data generator that produces the training data and target data samples for me. My validation data is hand-picked and I made sure that the generator does not generate training data that is identical to my validation data. I ended up training with 1000 samples. A simplified sketch of such a generator is shown below.
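The generator I used isn't reproduced here, but a simplified sketch of the idea could look like this; the parameters and helper names are mine, noise cells that accidentally form a line are ignored for simplicity, and the output shape (n, 1, 6, 6) assumes the channels-first format used by the model below:
import numpy as np

def make_sample(rng, grid=6, min_len=4, with_line=True, noise=0.1):
    # returns (input, target): a grid with random noise and, optionally, one straight line;
    # the target marks only the cells that belong to the line
    x = np.zeros((grid, grid), "float32")
    y = np.zeros((grid, grid), "float32")
    if with_line:
        length = rng.integers(min_len, grid + 1)
        directions = [(0, 1), (1, 0), (1, 1), (1, -1)]   # horizontal, vertical, two diagonals
        dr, dc = directions[rng.integers(len(directions))]
        r0 = rng.integers(0, grid - (length - 1) * abs(dr))
        c0 = rng.integers(0, grid - (length - 1) * dc) if dc >= 0 else rng.integers(length - 1, grid)
        for k in range(length):
            x[r0 + k * dr, c0 + k * dc] = y[r0 + k * dr, c0 + k * dc] = 1.0
    x[rng.random((grid, grid)) < noise] = 1.0             # sprinkle random noise points
    return x, y

def make_dataset(n, seed=0):
    rng = np.random.default_rng(seed)
    samples = [make_sample(rng, with_line=(i % 2 == 0)) for i in range(n)]
    xs = np.array([s[0] for s in samples])[:, None, :, :]  # shape (n, 1, 6, 6)
    ys = np.array([s[1] for s in samples])[:, None, :, :]
    return xs, ys

train_x, train_y = make_dataset(1000)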
The final model
This is the model I ended up using. It uses Dropout and a feature size of 64. That said, you can play with these numbers and you'll notice that lots of models work pretty well.
model = Sequential()
model.add(Convolution2D(64, 3, 3, input_shape=(1, 6, 6), activation='relu', border_mode='same'))
model.add(Dropout(0.25))
model.add(Convolution2D(64, 3, 3, activation='relu', border_mode='same'))
model.add(Dropout(0.25))
model.add(Convolution2D(64, 3, 3, activation='relu', border_mode='same'))
model.add(Dropout(0.25))
model.add(Convolution2D(1, 1, 1, activation='sigmoid', border_mode='same'))
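Tying it back to point 4, the model can then be compiled and trained roughly like this; it's only a sketch where train_x/train_y come from the generator sketched above, val_input_data/val_target_data are the hand-picked validation samples, and the loss choice and epoch count are my assumptions rather than the exact settings I quoted:
model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=['binary_accuracy'])                    # point 4C
model.fit(train_x, train_y,
          nb_epoch=100, batch_size=32,
          validation_data=(val_input_data, val_target_data),  # point 4A
          verbose=2)                                          # point 4B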

Train a classifier to detect only eyelash/nose features with dlib and OpenCV?

I want to know how I can train a cascade classifier to detect only eyelash or nose feature points with dlib and OpenCV (http://opencv.org/).
To be more clear, I just want to extract some particular feature points to a text file.
I tried extracting features, but to no avail; it gives all 68 points.
For the dlib Python API, the starting point should be this sample: http://dlib.net/face_landmark_detection.py.html
As you can see, it has face detection and shape prediction:
dets = detector(img, 1)
...
shape = predictor(img, d)
The shape object contains the face shape as a list of feature point coordinates, called parts. Each part is one point; for example, shape.part(30) is the tip of the nose. You can see their numbers on the sample pictures from this blog.
As I understand it, you simply need to save these points to a file, which can be done like this:
with open("sample_file.txt", "w") as f:
    for i in range(30, 32):
        f.write("{};{}\n".format(i, shape.part(i)))
where 30 and 31 are the part numbers that you want to write to the file (range(30, 32) covers parts 30 and 31).
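A more complete, self-contained sketch could look like the following; the input image name is hypothetical, dlib.load_rgb_image is available in recent dlib releases (older samples use scikit-image's io.imread instead), and parts 27-35 are commonly cited as the nose region in the 68-point model:
import dlib

detector = dlib.get_frontal_face_detector()
# the standard 68-landmark model that ships with dlib's examples
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = dlib.load_rgb_image("face.jpg")            # hypothetical input image
with open("sample_file.txt", "w") as f:
    for d in detector(img, 1):
        shape = predictor(img, d)
        for i in range(27, 36):                   # nose landmarks in the 68-point model
            p = shape.part(i)
            f.write("{};{};{}\n".format(i, p.x, p.y))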

OpenCV + HOG + SVM: help needed with a single SVM feature vector

I am trying to implement a people detection system based on SVM and HOG using OpenCV 2.3, but I got stuck.
I came this far:
I can compute HOG values from an image database and then calculate the support vectors with LIBSVM, so I get e.g. 1419 support vectors with 3780 values each.
OpenCV only accepts a single feature vector in the method hog.setSVMDetector(). Therefore I have to calculate one feature vector from the 1419 support vectors that LIBSVM has computed.
I found one hint on how to calculate this single feature vector: link
"The detecting feature vector at component i (where i is in the range e.g. 0-3779) is built out of the sum of the support vectors at i times the alpha value of that support vector, e.g. det[i] = sum_j (sv_j[i] * alpha[j]), where j is the index of the support vector and i is the index of the component of the support vector."
According to this, my routine works this way:
I take the first element of my first support vector, multiply it by its alpha value, and add to it the first element of the second support vector multiplied by its alpha value, and so on.
But after summing over all 1419 support vectors I get quite high values:
16.0657, -0.351117, 2.73681, 17.5677, -8.10134,
11.0206, -13.4837, -2.84614, 16.796, 15.0564,
8.19778, -0.7101, 5.25691, -9.53694, 23.9357,
If you compare them to the default vector in the OpenCV sample peopledetect.cpp (and hog.cpp in the OpenCV source):
0.05359386f, -0.14721455f, -0.05532170f, 0.05077307f,
0.11547081f, -0.04268804f, 0.04635834f, -0.05468199f, 0.08232084f,
0.10424068f, -0.02294518f, 0.01108519f, 0.01378693f, 0.11193510f,
0.01268418f, 0.08528346f, -0.06309239f, 0.13054633f, 0.08100729f,
-0.05209739f, -0.04315529f, 0.09341384f, 0.11035026f, -0.07596218f,
-0.05517511f, -0.04465296f, 0.02947334f, 0.04555536f,
you see that the default vector values lie between -1 and +1, whereas my values far exceed that range.
I think my single-feature-vector routine needs some adjustment. Any ideas?
Regards,
Christoph
The aggregated vector's values do look high.
I used the loadSVMfromModelFile() located in http://lnx.mangaitalia.net/trainer/main.cpp
I had to remove svinstr.sync(); from the code since it caused parts of the lines to be lost and produced wrong results.
I don't know much about the rest of the file, I only used this function.
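For reference, the aggregation formula quoted in the question can be written in a few lines of NumPy. This is a sketch only; the variable names are mine, and the sign conventions between LIBSVM and OpenCV differ, so you may need to negate the result or the bias depending on how the model file was produced:
import numpy as np

def build_hog_detector(support_vectors, alphas, rho):
    # support_vectors: shape (n_sv, 3780), alphas: shape (n_sv,), rho: LIBSVM bias term
    det = alphas @ support_vectors       # det[i] = sum_j alpha[j] * sv_j[i]
    return np.append(det, -rho)          # OpenCV accepts the bias appended as the last element

# hog.setSVMDetector(build_hog_detector(support_vectors, alphas, rho))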
