How to manipulate frames per second in face recognition? - opencv

I am working on face recognition, which also involves object detection. I am using YOLOv5 for object detection and FaceNet for face recognition. I am getting a very low frame rate (~0.400 fps), which makes the task laggy. How do I limit the fps for the first N frames for a few preliminary tasks, and then take only 1 frame per second for the recognition task instead of 30?
I tried using cap.set(cv2.CAP_PROP_FPS, 5), but I get an error saying 'Can't grab a frame.'
import os
import pickle
import time
import cv2
import tensorflow as tf
import facenet
import detect_face

with tf.Graph().as_default():
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.6)
    sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options, log_device_placement=False))
    with sess.as_default():
        pnet, rnet, onet = detect_face.create_mtcnn(sess, './models/')
        minsize = 20  # minimum size of face
        threshold = [0.6, 0.7, 0.7]  # thresholds for the three steps
        factor = 0.709  # scale factor
        margin = 44
        frame_interval = 3
        batch_size = 1000
        image_size = 182
        input_image_size = 160
        print('Loading feature extraction model')
        modeldir = './models/'
        facenet.load_model(modeldir)
        images_placeholder = tf.get_default_graph().get_tensor_by_name("input:0")
        embeddings = tf.get_default_graph().get_tensor_by_name("embeddings:0")
        phase_train_placeholder = tf.get_default_graph().get_tensor_by_name("phase_train:0")
        embedding_size = embeddings.get_shape()[1]
        classifier_filename = './myclassifier/my_classifier.pkl'
        classifier_filename_exp = os.path.expanduser(classifier_filename)
        with open(classifier_filename_exp, 'rb') as infile:
            (model, class_names) = pickle.load(infile)
            print('load classifier file-> %s' % type(class_names))
        HumanNames = class_names
        video_capture = cv2.VideoCapture(0)
        c = 0
        print('Start!')
        prevTime = 0
        FPSLimit = 10
        StartTime = time.time()
        while True:
            ret, frame = video_capture.read()
            # frame = cv2.resize(frame, (0,0), fx=0.5, fy=0.5) #resize frame (optional)
            curTime = time.time()  # calc fps
            timeF = frame_interval
            #if int(curTime - StartTime) > FPSLimit:
            if (c % timeF == 0):
                # DETECTION TASK
                if nrof_faces > 0:
                    # OBJECT DETECTION TASKS
                    # RECOGNITION TASK

I tried to add this as a comment, but it got too long. This isn't exactly an answer to your question, but it might help for an approach. I would start with identifying the delay by measuring the milliseconds for these tasks: DETECTION TASK, OBJECT DETECTION TASKS, RECOGNITION TASK in your code above.
You should be able to do detection in real-time, or at least 5-10 FPS or better, which may suit your needs. If recognition is your bottleneck, I would do that on another thread. That works because you don't need to detect the same face over and over. If you have 30 FPS and the same face in the frame for 5 seconds, then only perform recognition on that face once, not 5x30 times.
Use multi-object tracking to follow objects (faces) across frames without having to perform face recognition on each one. A simple tracker is easy to implement and runs fast. So keep track of objects across frames, then submit each track for recognition only once, and do that on another thread.
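To make the two suggestions above concrete, here is a minimal sketch in plain Python. `StageTimer` and `recognize_new_tracks` are hypothetical helpers, not part of OpenCV or FaceNet, and the sketch assumes some tracker already hands you a stable integer ID per face. It only shows how to time each stage in milliseconds and how to skip recognition for tracks that were already identified.

```python
import time


class StageTimer:
    """Accumulate wall-clock time per named stage (hypothetical helper)."""

    def __init__(self):
        self.totals = {}
        self.counts = {}

    def measure(self, name, func, *args, **kwargs):
        # Run func, record how long it took, and pass its result through.
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        self.totals[name] = self.totals.get(name, 0.0) + elapsed
        self.counts[name] = self.counts.get(name, 0) + 1
        return result

    def mean_ms(self, name):
        # Average duration of one call to the stage, in milliseconds.
        return 1000.0 * self.totals[name] / self.counts[name]


def recognize_new_tracks(track_ids, recognize, cache):
    """Call recognize(track_id) only for track IDs not yet in cache."""
    for track_id in track_ids:
        if track_id not in cache:
            cache[track_id] = recognize(track_id)
    return {track_id: cache[track_id] for track_id in track_ids}


timer = StageTimer()
timer.measure("detection", sum, range(1000))  # stand-in for the real stage
print("detection: %.3f ms per call" % timer.mean_ms("detection"))
```

In the capture loop you would wrap each of DETECTION TASK, OBJECT DETECTION TASKS and RECOGNITION TASK in `timer.measure(...)` and compare the averages before deciding what to move to another thread.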

Related

Why does RMSE increase with horizon when using the timeslice method in caret's trainControl function?

I'm using the timeslice method in caret's trainControl function to perform cross-validation on a time series model. I've noticed that RMSE increases with the horizon argument.
I realise this might happen for several reasons, e.g., if explanatory variables are being forecast and/or there's autocorrelation in the data such that the model can better predict nearer vs. farther ahead observations. However, I'm seeing the same behaviour even when neither is the case (see trivial reproducible example below).
Can anyone explain why RMSEs are increasing with horizon?
# Make data
X = data.frame(matrix(rnorm(1000 * 3), ncol = 3))
X$y = rowSums(X) + rnorm(nrow(X))
# Iterate over different forecast horizons and record RMSEs
library(caret)
forecast_horizons = c(1, 3, 10, 50, 100)
rmses = numeric(length(forecast_horizons))
for (i in 1:length(forecast_horizons)) {
  ctrl = trainControl(method = 'timeslice', initialWindow = 500, horizon = forecast_horizons[i], fixedWindow = T)
  rmses[i] = train(y ~ ., data = X, method = 'lm', trControl = ctrl)$results$RMSE
}
print(rmses) #0.7859786 0.9132649 0.9720110 0.9837384 0.9849005
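One possible contributor, offered as a sketch rather than a confirmed diagnosis: if the RMSE is computed separately within each time slice and then averaged across slices (an assumption about caret's resampling that is worth checking), the average grows with the slice length even for pure noise. With a horizon of 1, each slice's RMSE is just the absolute error, whose mean for standard normal noise is sqrt(2/pi), about 0.80. With longer slices the per-slice RMSE approaches the noise standard deviation of 1. The simulation below uses only i.i.d. N(0, 1) errors and no model at all; `mean_slice_rmse` is a hypothetical helper.

```python
import math
import random


def mean_slice_rmse(horizon, n_slices, rng):
    """Average of per-slice RMSE for i.i.d. N(0, 1) errors."""
    total = 0.0
    for _ in range(n_slices):
        squared = [rng.gauss(0.0, 1.0) ** 2 for _ in range(horizon)]
        total += math.sqrt(sum(squared) / horizon)
    return total / n_slices


rng = random.Random(0)
horizons = [1, 3, 10, 100]
results = [mean_slice_rmse(h, 5000, rng) for h in horizons]
for h, r in zip(horizons, results):
    print(h, round(r, 3))
```

If this is what is happening, pooling the squared errors over all slices before taking the square root should remove the dependence on horizon.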

Grabbing images from webcam using opencv at specific FPS

Is there an OpenCV function that allows me to grab still images at a specific frame rate? For example, I could tell this function to grab 5 images at 10 fps, and it would take 5 images exactly 0.1 seconds apart.
If not, what is a good way to achieve this? My current attempt is to grab images constantly and only save one when 0.1 seconds have passed since the previous frame, but this does not give an accurate 10 fps.
afterNextFrame = False
while x < 20:
    now = time.monotonic()
    if now >= nextFrame:
        afterNextFrame = True
    if afterNextFrame == True:
        cameraCap.grab()
        print("\nNow: ", now, "\n")
        _, frame = cameraCap.retrieve()
        # save frame here
        nextFrame += 0.1  # wait 0.1 second for 10 fps
        afterNextFrame = False
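One common way to reduce timing drift, sketched here under assumptions, is to compute every deadline from a single start time (start + i * period) and sleep until that deadline instead of polling. `capture_at_fixed_rate` is a hypothetical helper, not an OpenCV function; `grab` stands for whatever callable returns a frame, for example a small wrapper around `cameraCap.read()`. The real frame rate is still limited by the camera and driver, so this only controls when the grab is requested.

```python
import time


def capture_at_fixed_rate(grab, n_frames, fps, clock=time.monotonic, sleep=time.sleep):
    """Call grab() n_frames times, aiming for 1/fps seconds between calls.

    Deadlines are measured from one start time, so a late frame does not
    push back every frame after it.
    """
    period = 1.0 / fps
    start = clock()
    frames = []
    for i in range(n_frames):
        deadline = start + i * period
        remaining = deadline - clock()
        if remaining > 0:
            sleep(remaining)
        # Store the time offset at which the grab was requested.
        frames.append((clock() - start, grab()))
    return frames


# Demo with a dummy grab function; pass a real camera read in practice.
for offset, _ in capture_at_fixed_rate(lambda: None, 3, 100.0):
    print("requested at %.4f s" % offset)
```

The `clock` and `sleep` parameters exist only so the scheduling logic can be tested without a camera or real waiting.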

TensorFlow resize nearest neighbor approach doesn't optimize weights

I'm a beginner in TensorFlow and I'm working on a model that colorizes greyscale images. For the last part of the model, the paper says:
Once the features are fused, they are processed by a set of
convolutions and upsampling layers, the latter which consist of simply
upsampling the input by using the nearest neighbour technique so that
the output is twice as wide and twice as tall.
When I tried to implement this in TensorFlow, I used tf.image.resize_nearest_neighbor for upsampling, but I found that the cost did not change across epochs, except in the 2nd epoch. Without it, the cost is optimized and changes.
This is the relevant part of the code:
def Model(Input_images):
    #some code till the following last part
    Color_weights = {'W_conv1':tf.Variable(tf.random_normal([3,3,256,128])),'W_conv2':tf.Variable(tf.random_normal([3,3,128,64])),
                     'W_conv3':tf.Variable(tf.random_normal([3,3,64,64])),
                     'W_conv4':tf.Variable(tf.random_normal([3,3,64,32])),'W_conv5':tf.Variable(tf.random_normal([3,3,32,2]))}
    Color_biases = {'b_conv1':tf.Variable(tf.random_normal([128])),'b_conv2':tf.Variable(tf.random_normal([64])),'b_conv3':tf.Variable(tf.random_normal([64])),
                    'b_conv4':tf.Variable(tf.random_normal([32])),'b_conv5':tf.Variable(tf.random_normal([2]))}
    Color_layer1 = tf.nn.relu(Conv2d(Fuse, Color_weights['W_conv1'], 1) + Color_biases['b_conv1'])
    Color_layer1_up = tf.image.resize_nearest_neighbor(Color_layer1,[56,56])
    Color_layer2 = tf.nn.relu(Conv2d(Color_layer1_up, Color_weights['W_conv2'], 1) + Color_biases['b_conv2'])
    Color_layer3 = tf.nn.relu(Conv2d(Color_layer2, Color_weights['W_conv3'], 1) + Color_biases['b_conv3'])
    Color_layer3_up = tf.image.resize_nearest_neighbor(Color_layer3,[112,112])
    Color_layer4 = tf.nn.relu(Conv2d(Color_layer3, Color_weights['W_conv4'], 1) + Color_biases['b_conv4'])
    return Color_layer4
The Training Code
Prediction = Model(Input_images)
Colorization_MSE = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(Prediction,tf.Variable(tf.random_normal([2,112,112,32]))))
Optmizer = tf.train.AdadeltaOptimizer(learning_rate= 0.05).minimize(Colorization_MSE)
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
for epoch in range(EpochsNum):
    epoch_loss = 0
    Batch_indx = 1
    for i in range(int(ExamplesNum / Batch_size)):#Over batches
        print("Batch Num ",i + 1)
        ReadNextBatch()
        a, c = sess.run([Optmizer,Colorization_MSE],feed_dict={Input_images:Batch_GreyImages})
        epoch_loss += c
    print("epoch: ",epoch + 1, ",Los: ",epoch_loss)
So what is wrong with my logic? Or, if the problem is in tf.image.resize_nearest_neighbor, what should I do, or what is its replacement?
OK, I solved it. I noticed that tf.random_normal was the problem; when I replaced it with tf.truncated_normal, it works well.
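For reference, the upsampling step the paper describes (output twice as wide and twice as tall, nearest neighbour) amounts to repeating every pixel in both directions. The sketch below is plain Python on nested lists, purely to illustrate the operation; `upsample_nearest_2x` is a hypothetical helper and not a TensorFlow function.

```python
def upsample_nearest_2x(image):
    """Double the height and width of a 2D list by repeating each value."""
    output = []
    for row in image:
        # Repeat every value twice along the width...
        wide_row = [value for value in row for _ in range(2)]
        # ...and emit the widened row twice along the height.
        output.append(wide_row)
        output.append(list(wide_row))
    return output


small = [[1, 2],
         [3, 4]]
for row in upsample_nearest_2x(small):
    print(row)
```

The operation only copies values and has no trainable parameters of its own; gradients reach the weights through the convolutions around it.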

Generating a spectrogram for a sequence of 2D movie frames

I have some data that consists of a sequence of video frames which represent changes in luminance over time relative to a moving baseline. In these videos there are two kinds of 'event' that can occur - 'localised' events, which consist of luminance changes in small groups of clustered pixels, and contaminating 'diffuse' events, which affect most of the pixels in the frame:
I'd like to be able to isolate local changes in luminance from diffuse events. I'm planning on doing this by subtracting an appropriately low-pass filtered version of each frame. In order to design an optimal filter, I'd like to know which spatial frequencies of my frames are modulated during diffuse and local events, i.e. I'd like to generate a spectrogram of my movie over time.
I can find lots of information about generating spectrograms for 1D data (e.g. audio), but I haven't come across much on generating spectrograms for 2D data. What I've tried so far is to generate a 2D power spectrum from the Fourier transform of the frame, then perform a polar transformation about the DC component and then average across angles to get a 1D power spectrum:
I then apply this to every frame in my movie, and generate a raster plot of spectral power over time:
Does this seem like a sensible approach to take? Is there a more 'standard' approach to doing spectral analysis on 2D data?
Here's my code:
import numpy as np
# from pyfftw.interfaces.scipy_fftpack import fft2, fftshift, fftfreq
from scipy.fftpack import fft2, fftshift, fftfreq
from matplotlib import pyplot as pp
from matplotlib.colors import LogNorm
from scipy.signal import windows
from scipy.ndimage import map_coordinates

def compute_2d_psd(img, doplot=True, winfun=windows.hamming, winfunargs={}):
    nr, nc = img.shape
    win = make2DWindow((nr, nc), winfun, **winfunargs)
    f2 = fftshift(fft2(img*win))
    psd = np.abs(f2*f2)
    pol_psd = polar_transform(psd, centre=(nr//2, nc//2))
    mpow = np.nanmean(pol_psd, 0)
    stdpow = np.nanstd(pol_psd, 0)
    freq_r = fftshift(fftfreq(nr))
    freq_c = fftshift(fftfreq(nc))
    pos_freq = np.linspace(0, np.hypot(freq_r[-1], freq_c[-1]),
                           pol_psd.shape[1])
    if doplot:
        fig,ax = pp.subplots(2,2)
        im0 = ax[0,0].imshow(img*win, cmap=pp.cm.gray)
        ax[0,0].set_axis_off()
        ax[0,0].set_title('Windowed image')
        lnorm = LogNorm(vmin=psd.min(), vmax=psd.max())
        # set_facecolor replaces set_axis_bgcolor, removed from newer matplotlib
        ax[0,1].set_facecolor('k')
        im1 = ax[0,1].imshow(psd, extent=(freq_c[0], freq_c[-1],
                                          freq_r[0], freq_r[-1]), aspect='auto',
                             cmap=pp.cm.hot, norm=lnorm)
        # cb1 = pp.colorbar(im1, ax=ax[0,1], use_gridspec=True)
        # cb1.set_label('Power (A.U.)')
        ax[0,1].set_title('2D power spectrum')
        ax[1,0].set_facecolor('k')
        im2 = ax[1,0].imshow(pol_psd, cmap=pp.cm.hot, norm=lnorm,
                             extent=(pos_freq[0],pos_freq[-1],0,360),
                             aspect='auto')
        ax[1,0].set_ylabel('Angle (deg)')
        ax[1,0].set_xlabel('Frequency (cycles/px)')
        # cb2 = pp.colorbar(im2, ax=(ax[0,1],ax[1,1]), use_gridspec=True)
        # cb2.set_label('Power (A.U.)')
        ax[1,0].set_title('Polar-transformed power spectrum')
        # ax.hold(True) is not needed: axes keep existing artists by default
        # ax[1,1].fill_between(pos_freq, mpow - stdpow, mpow + stdpow,
        #                      color='r', alpha=0.3)
        ax[1,1].axvline(0, c='k', ls='--', alpha=0.3)
        ax[1,1].plot(pos_freq, mpow, lw=3, c='r')
        ax[1,1].set_xlabel('Frequency (cycles/px)')
        ax[1,1].set_ylabel('Power (A.U.)')
        ax[1,1].set_yscale('log')
        ax[1,1].set_xlim(-0.05, None)
        ax[1,1].set_title('1D power spectrum')
        fig.tight_layout()
    return mpow, stdpow, pos_freq

def make2DWindow(shape,winfunc,*args,**kwargs):
    assert callable(winfunc)
    r,c = shape
    rvec = winfunc(r,*args,**kwargs)
    cvec = winfunc(c,*args,**kwargs)
    return np.outer(rvec,cvec)

def polar_transform(image, centre=(0,0), n_angles=None, n_radii=None):
    """
    Polar transformation of an image about the specified centre coordinate
    """
    shape = image.shape
    if n_angles is None:
        n_angles = shape[0]
    if n_radii is None:
        n_radii = shape[1]
    theta = -np.linspace(0, 2*np.pi, n_angles, endpoint=False).reshape(-1,1)
    d = np.hypot(shape[0]-centre[0], shape[1]-centre[1])
    radius = np.linspace(0, d, n_radii).reshape(1,-1)
    x = radius * np.sin(theta) + centre[0]
    y = radius * np.cos(theta) + centre[1]
    # nb: map_coordinates can give crazy negative values using higher order
    # interpolation, which introduce nans when you take the log later on
    output = map_coordinates(image, [x, y], order=1, cval=np.nan,
                             prefilter=True)
    return output
I believe that the approach you describe is in general the best way to do this analysis.
However, I did spot an error in your code:
np.abs(f2*f2)
is not the PSD of complex array f2, you need to multiply f2 by it's complex conjugate instead of itself (|f2^2| is not the same as |f2|^2).
Instead you should do something like
(f2*np.conjugate(f2)).astype(float)
Or, more cleanly:
np.abs(f2)**2.
The oscillations in the 2D power-spectrum are a tell-tale sign of this kind of error (I've done this before myself!)
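As a cross-check on the polar-transform route, the angular average can also be sketched without interpolation by binning every pixel of the shifted spectrum by its radial frequency. This is a minimal sketch under assumptions, not the asker's method: `radial_psd` is a hypothetical helper, it applies no window, it uses `np.abs(f2)**2` for the power, and it only keeps frequencies below 0.5 cycles/px.

```python
import numpy as np


def radial_psd(img):
    """Radially averaged power spectrum of a 2D array (hypothetical helper)."""
    img = np.asarray(img, dtype=float)
    nr, nc = img.shape
    f2 = np.fft.fftshift(np.fft.fft2(img))
    psd = np.abs(f2) ** 2
    # Radial frequency (cycles/px) of every pixel in the shifted spectrum.
    freq_r = np.fft.fftshift(np.fft.fftfreq(nr))[:, None]
    freq_c = np.fft.fftshift(np.fft.fftfreq(nc))[None, :]
    radius = np.hypot(freq_r, freq_c).ravel()
    # Equal-width radial bins from 0 up to 0.5 cycles/px.
    n_bins = min(nr, nc) // 2
    edges = np.linspace(0.0, 0.5, n_bins + 1)
    index = np.digitize(radius, edges) - 1
    valid = (index >= 0) & (index < n_bins)
    sums = np.bincount(index[valid], weights=psd.ravel()[valid], minlength=n_bins)
    counts = np.bincount(index[valid], minlength=n_bins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    # Empty bins (if any) are reported as zero power.
    return centres, sums / np.maximum(counts, 1)


# Demo: a horizontal grating with 8 cycles across 64 pixels.
x = np.arange(64)
img = np.tile(np.cos(2 * np.pi * 8 * x / 64), (64, 1))
freqs, power = radial_psd(img)
print("peak bin centre (cycles/px):", freqs[np.argmax(power)])
```

Applying this to each frame and stacking the results as rows gives the same kind of frequency-versus-time raster described in the question.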

How to calculate MAPE for Training/Test set in application of Neural Network in MATLAB efficiently?

I've been using MATLAB on a time series dataset (an electricity dataset) as part of my course. It consists of 40,000+ samples. After building the neural network, I wanted to test its accuracy. I'm mostly interested in the MAPE (mean absolute percentage error) and RMS (root mean square) errors. To calculate them, I've used the following lines of code.
% zeros(N_TRAIN, 1) is a column vector; zeros(N_TRAIN) would be N_TRAIN-by-N_TRAIN
mape_res = zeros(N_TRAIN, 1);
mse_res = zeros(N_TRAIN, 1);
for i_train = 1:N_TRAIN
    Inp = inputs_consumption(i_train );
    Actual_Output = targets_consumption( i_train + 1 );
    Observed_Output = sim( ann, Inp );
    mape_res(i_train) = abs(Observed_Output - Actual_Output)/Actual_Output;
    mse_res(i_train) = Observed_Output - Actual_Output;
end
mape = sum(mape_res)/N_TRAIN;
mse = sum(power(mse_res,2))/N_TRAIN;
sprintf( 'The MSE on training is %g', mse )
sprintf( 'The MAPE on training is %g', mape )
The problem with the above code is that, for a large dataset (40K samples), it takes almost 15 minutes to iterate through the loop, which is a long wait for the error rates. Is there a more efficient way to calculate them?
You could always do a rolling average that gets updated each iteration, as follows:
mape_res = abs(Observed_Output - Actual_Output) / Actual_Output;
mse_res = Observed_Output - Actual_Output;
alpha = 1 / i_train;
mape = mape * (1 - alpha) + mape_res * alpha;
mse = mse * (1 - alpha) + power(mse_res,2) * alpha;
Then you could either display the resulting values each iteration, use them for stopping criteria if the desired error rate is reached, or both. This also has the added benefit of not requiring the initialization and population of the mape_res and mse_res vectors unless they happen to be needed elsewhere...
Edit: Do make sure to initialize the mape and mse values to zero prior to entering the for loop :)
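The same running-average update, written as a small Python sketch to show that it reproduces the batch means without storing per-sample vectors. `running_errors` is a hypothetical helper; unlike the MATLAB lines above, it divides by the absolute value of the actual output, which is an assumption about how MAPE should treat negative targets.

```python
def running_errors(actual, predicted):
    """Return (MAPE, MSE) using an incremental mean, one sample at a time."""
    mape = 0.0
    mse = 0.0
    for i, (a, p) in enumerate(zip(actual, predicted), start=1):
        alpha = 1.0 / i
        # new_mean = old_mean * (1 - alpha) + sample * alpha, rearranged
        mape += alpha * (abs(p - a) / abs(a) - mape)
        mse += alpha * ((p - a) ** 2 - mse)
    return mape, mse


actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
mape, mse = running_errors(actual, predicted)
print("MAPE:", mape, "MSE:", mse)
```

Note that this changes how the errors are accumulated, not how many times the network is evaluated, so most of the run time may still come from the per-sample prediction call.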
