cuda out of memory problem in gpu in google colab

cuda out of memory problem in gpu in google colab - embedding

I am trying to run code to get stacked embedding from flair and bert and I am getting the following error. one of the suggestion was to reduce the batch size, but how to pass the data in batches? here is the code and error.
from tqdm import tqdm ## tracks progress of loop ##
import torch
from flair.data import Sentence
from flair.embeddings import TransformerDocumentEmbeddings
from flair.embeddings import DocumentPoolEmbeddings
bert_embeddings = TransformerDocumentEmbeddings('bert-base-uncased')
### initialize the document embeddings, mode = mean ###
document_embeddings = DocumentPoolEmbeddings([
flair_forward,
flair_backward,
bert_embeddings
])
# Storing Size of embedding #
z = sentence.embedding.size()[0]
print(z)
### Vectorising text ###
# creating a tensor for storing sentence embeddings
sen = torch.zeros(0,z)
print(sen)
# iterating Sentences #
for tweet in tqdm(txt):
sentence = Sentence(tweet)
document_embeddings.embed(sentence)# *****this line is giving error*****
# Adding Document embeddings to list #
if(torch.cuda.is_available()):
sen = sen.cuda()
sen = torch.cat((sen, sentence.embedding.view(-1,z)),0)
and this the error I am getting.
RuntimeError Traceback (most recent call last)
<ipython-input-24-1eee00445350> in <module>()
24 for tweet in tqdm(txt):
25 sentence = Sentence(tweet)
---> 26 document_embeddings.embed(sentence)
27 # Adding Document embeddings to list #
28 if(torch.cuda.is_available()):
7 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py in forward(self, input, hx)
580 if batch_sizes is None:
581 result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
--> 582 self.dropout, self.training, self.bidirectional, self.batch_first)
583 else:
584 result = _VF.lstm(input, batch_sizes, hx, self._flat_weights, self.bias,
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.43 GiB total capacity; 6.54 GiB already allocated; 10.94 MiB free; 6.70 GiB reserved in total by PyTorch)

embeddings = FlairEmbeddings('news-forward', chars_per_chunk=128)
for example :
embedding_types = [
WordEmbeddings('glove'),
FlairEmbeddings('news-forward',chars_per_chunk=128),
FlairEmbeddings('news-backward'),
]
Edit your code accordingly new function added by flair to avoid this to see if it works. Google collab is not suitable for large transformers models for tasks such as NER, from my own experience.
see github documentation for more examples for your specific task!

Related

Reasons why swifter/dask/ray only use one core for an apply task?

I have this function that I would like to apply to a large dataframe in parallel:
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize
from rdkit import RDLogger
RDLogger.DisableLog('rdApp.*')
def standardize_smiles(smiles):
if smiles is None:
return None
try:
mol = Chem.MolFromSmiles(smiles)
# removeHs, disconnect metal atoms, normalize the molecule, reionize the molecule
clean_mol = rdMolStandardize.Cleanup(mol)
# if many fragments, get the "parent" (the actual mol we are interested in)
parent_clean_mol = rdMolStandardize.FragmentParent(clean_mol)
# try to neutralize molecule
uncharger = rdMolStandardize.Uncharger() # annoying, but necessary as no convenience method exists
uncharged_parent_clean_mol = uncharger.uncharge(parent_clean_mol)
# note that no attempt is made at reionization at this step
# nor at ionization at some pH (rdkit has no pKa caculator)
# the main aim to to represent all molecules from different sources
# in a (single) standard way, for use in ML, catalogue, etc.
te = rdMolStandardize.TautomerEnumerator() # idem
taut_uncharged_parent_clean_mol = te.Canonicalize(uncharged_parent_clean_mol)
return Chem.MolToSmiles(taut_uncharged_parent_clean_mol)
#except:
# return False
standardize_smiles('CCC')
'CCC'
However, neither Dask, nor Swifter, nor Ray can do the job. All frameworks use a single CPU for some reason.
Native Pandas
import pandas as pd
N = 1000
smilest_test = pd.DataFrame({'smiles': ['CCC']*N})
smilest_test
CPU times: user 3.58 s, sys: 0 ns, total: 3.58 s
Wall time: 3.58 s
Swifter 1.3.4
smiles_test['standardized_siles'] = smiles_test.smiles.swifter.allow_dask_on_strings(True).apply(standardize_smiles)
CPU times: user 892 ms, sys: 31.4 ms, total: 923 ms
Wall time: 5.14 s
While this WORKS with the dummy data, it does not with the real data, which looks like this:
The strings are a bit more complicated than the ones in the dummy data.
it seems first swifter needs some time to prepare the parallel execution and only uses one core, but then uses more cores. However, for the real data, it only uses 3 out of 8 cores.
I have the same issue with other frameworks such as dask, ray, modin, swifter.
Is there something that I miss here? Is there a problem when the dataframe contains stings? Why does the parallel execution take so much time even on a single computer (with multiple cores)? Or is there an issue with the RDKit library that I am using that makes it difficult to parallelize the above function?

Ran into "TypeError: '<' not supported between instances of 'Tensor' and 'list'" when going through dataset

I am replicating ResNet (source: https://arxiv.org/abs/1512.03385).
I ran into the error "TypeError: '<' not supported between instances of 'Tensor' and 'list'" when trying to go through several different dataset in different sections of my code.
I tried different fixes but none worked: (i) I deleted enumerate cause I worried that using this may cause the problem (ii) I tried to go through dataloader rather than dataset but it didn't work
1st time: When I tried to view images:
for images, _ in train_loader:
print('images.shape:', images.shape)
plt.figure(figsize=(16,8))
plt.axis('off')
plt.imshow(torchvision.utils.make_grid(images, nrow=16).permute((1, 2, 0)))
break
2nd/3rd time: when I tried to validate/test the resnet:
with torch.no_grad():
for j, inputs, labels in enumerate(test_loader, start=0):
outputs = resnet_models[i](inputs)
_, prediction = torch.max(outputs, dim=1)
You may notice that I didn't run into this error when training the resnet, and the code is quite similar:
for batch, data in enumerate(train_dataloader, start=0):
inputs, labels = data
inputs, labels = inputs.to(device), labels.to(device)
Error message (taking the first error as an example. The rest is pretty much the same)
TypeError Traceback (most recent call last)
Input In [38], in <cell line: 8>()
6 print("Images AFTER NORMALIZATION")
7 print("--------------------------")
----> 8 for images, _ in training_data:
9 sort=False
10 print('images.shape:', images.shape)
File ~/miniconda3/envs/resnet/lib/python3.9/site->packages/torch/utils/data/dataset.py:471, in Subset.getitem(self, idx)
469 if isinstance(idx, list):
470 return self.dataset[[self.indices[i] for i in idx]]
--> 471 return self.dataset[self.indices[idx]]
File ~/miniconda3/envs/resnet/lib/python3.9/site->packages/torchvision/datasets/cifar.py:118, in CIFAR10.getitem(self, index)
115 img = Image.fromarray(img)
117 if self.transform is not None:
--> 118 img = self.transform(img)
120 if self.target_transform is not None:
121 target = self.target_transform(target)
File ~/miniconda3/envs/resnet/lib/python3.9/site->packages/torchvision/transforms/transforms.py:95, in Compose.call(self, img)
93 def call(self, img):
94 for t in self.transforms:
---> 95 img = t(img)
96 return img
File ~/miniconda3/envs/resnet/lib/python3.9/site->packages/torch/nn/modules/module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don't have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks >or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
File ~/miniconda3/envs/resnet/lib/python3.9/site->packages/torchvision/transforms/transforms.py:707, in RandomHorizontalFlip.forward(self, >img)
699 def forward(self, img):
700 """
701 Args:
702 img (PIL Image or Tensor): Image to be flipped.
(...)
705 PIL Image or Tensor: Randomly flipped image.
706 """
--> 707 if torch.rand(1) < self.p:
708 return F.hflip(img)
709 return img
TypeError: '<' not supported between instances of 'Tensor' and 'list'

I was having the same error message, probably under different circumstances, but I just found my own bug and figured I would share it anyway for various readers. I was using a torchvision transformation in my dataset, which the dataloader was loading from. The transformation was
torchvision.transforms.RandomHorizontalFlip([0.5]),
and the error is that the input to this transformation should not be a list but should be
torchvision.transforms.RandomHorizontalFlip(0.5),
So if there is anything I can recommend, it's just that maybe there is some list argument being passed through that shouldn't be in some transformation or otherwise.

Can I get extra information to a custom scorer function in sklearn?

I am performing a classification task which is essentially doing algorithm configuration, i.e. trying to pick a configuration (or 'mode') which is likely to make the problem-solving algorithm finish in the quickest time.
I am learning to classify the "best" configuration based on features of problem instances. I see that scikit-learn enables you to create your own scoring function to use in tuning the models. However the score_func only takes the true label and the predicted label as input.
Is it possible to identify which row in the dataset a prediction came from (when passing to this custom scorer)? That way I could figure out the performance hit of a predicted ("wrong") config and score the model accordingly. Basically sometimes a "wrong" selection can still be very good and close to the best, but a naive classification has no way of knowing this when the classification labels are purely based on the best config.
Here's a contrived example to illustrate what I'm trying to do
import random as rnd
import pandas as pd
rnd.seed('hello')
probs = [f'instance_{i}' for i in range(6)]
confs = ('analytic', 'bruteforce', 'hybrid')
times = [(p,c,60*rnd.random()) for p in probs for c in confs]
df_alltimes = pd.DataFrame(times, columns=('problem', 'config', 'time'))
print(df_alltimes)
bestrows = df_alltimes.groupby(['problem'])['time'].idxmin()
dataset = df_alltimes.loc[bestrows,['config']].\
rename(columns={'config':'best_config'})
feats = [[rnd.random() for p in range(len(probs))] for f in range(5) ]
for i in range(len(feats)):
dataset[f'feature_{i}'] = feats[i]
print(dataset)
df_alltimes:
problem config time
0 instance_0 analytic 15.307044
1 instance_0 bruteforce 36.742846
2 instance_0 hybrid 35.053416
3 instance_1 analytic 57.781358
4 instance_1 bruteforce 31.723275
5 instance_1 hybrid 8.080238
6 instance_2 analytic 4.211297
7 instance_2 bruteforce 24.034830
8 instance_2 hybrid 39.073023
9 instance_3 analytic 36.325485
10 instance_3 bruteforce 14.717841
11 instance_3 hybrid 57.103908
12 instance_4 analytic 7.358539
13 instance_4 bruteforce 10.805536
14 instance_4 hybrid 2.605044
15 instance_5 analytic 0.489870
16 instance_5 bruteforce 42.888858
17 instance_5 hybrid 58.634073
dataset:
best_config feature_0 feature_1 feature_2 feature_3 feature_4
0 analytic 0.645388 0.641626 0.975619 0.680713 0.209235
5 hybrid 0.993443 0.221038 0.893763 0.408532 0.254791
6 analytic 0.263872 0.142887 0.264538 0.166985 0.800054
10 bruteforce 0.155023 0.601300 0.258767 0.614732 0.850529
14 hybrid 0.766183 0.993692 0.597047 0.401482 0.275133
15 analytic 0.386327 0.065699 0.349115 0.370136 0.357329
I am using sklearn with the dataset where the X would be the feature columns and the y would be the best_config column. In this example, the "bad" choices for instance_0 are both almost equally bad, but for instance_1, the two wrong choices are not equally bad. So I'd like my custom scorer to be able to reflect this somehow. Is that possible?

In the end I did find a way to get the information I was after in the original question. If you're passing a pandas.Series as your target labels, the index attribute is available, so you can look up whatever you want in the full dataset.
In the solution below, the first part is pretty much the same as the original minimal working example - i.e. generating a fake dataset.
In the second part, a custom scorer function is defined, which is then passed to the cross-validating hyperparameter tuner, RandomizedSearchCV. Please bear in mind the data is garbage, so the "results" are meaningless; this is just a demo of how to refer back to a fuller set of results so that you can evaluate the quality of predictions made during hyperparameter tuning based on more specialised information rather than just "match / fail" when doing a classification.
import numpy as np
import pandas as pd
import random as rnd
INSTANCES = 200
FEATURES = 5
HP_ITER = 10
SEED = 1984
# invent timings for some problems run with different configurations
rnd.seed(SEED)
probs = [f'p_{i:03d}' for i in range(INSTANCES)]
confs = ('analytic', 'bruteforce', 'hybrid')
times = [(p,c,60*rnd.random()) for p in probs for c in confs]
df_times = pd.DataFrame(times, columns=('problem', 'config', 'time'))
# pick out the fastest config for each problem
bestrows = df_times.groupby(['problem'])['time'].idxmin()
dataset = df_times.loc[bestrows,['config','problem']]\
.rename(columns={'config':'target'})\
.reset_index(drop=True)
# invent some features for each problem
feats = [[rnd.random() for _ in probs] for f in range(FEATURES) ]
for i in range(len(feats)):
dataset[f'feature_{i}'] = feats[i]
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import train_test_split
# split our data into training and test sets
df_trn = dataset.sample(frac=0.8, replace=False, random_state=SEED)
df_tst = dataset.loc[~dataset.index.isin(df_trn.index)]
def _vb_loss(xvals, yvals, validation=False):
"""A custom scorer for cross-validation which uses distance to Virtual Best"""
# use the .index attribute to access the relevant rows in the
# timing data frame
source = df_tst if validation else df_trn
data = source.loc[xvals.index].reindex(columns=['problem','target'])
data['truevals'] = xvals
data['predvals'] = yvals
# what's the best time available for each problem?
data = data.merge(
df_times, left_on=['problem','truevals'], right_on=['problem', 'config']
).rename(columns={'time' : 'best_time'}).drop(columns=['config'])
# what's the time for our predicted choices?
data = data.merge(
df_times, left_on=['problem','predvals'], right_on=['problem','config']
).rename(columns={'time' : 'pred_time'}).drop(columns=['config'])
# how far away were the predictions in total?
residual_seconds = np.sum( data['pred_time'] - data['best_time'] )
return residual_seconds
def fitAndPredict(use_custom_scorer=False):
"""Fit a model and make some predictions """
our_scorer = make_scorer(_vb_loss, greater_is_better=False)
hyperparameters = {'criterion' : ['gini', 'entropy'],
'n_estimators' : list(range(50,250)),
'max_depth' : list(range(2,32))
}
model = RandomizedSearchCV(
RandomForestClassifier(random_state=SEED),
hyperparameters,
n_iter = HP_ITER,
scoring = our_scorer if use_custom_scorer else None,
verbose = 1,
random_state = SEED,
)
model.fit(
df_trn.drop(columns=['target','problem']),
df_trn['target']
)
preds = model.predict(df_tst.drop(columns=['target','problem']))
return _vb_loss(df_tst['target'], preds, validation=True)
print("Timings for all configs:", df_times, "", sep="\n")
print("Labelled dataset:", dataset, "", sep="\n")
print("Test loss with default CV scorer :", fitAndPredict(False))
print("Test loss with custom CV scorer :", fitAndPredict(True))
Here's the output:
** Timings for all configs **
problem config time
0 p_000 analytic 21.811701
1 p_000 bruteforce 29.652341
2 p_000 hybrid 20.376605
3 p_001 analytic 12.989269
4 p_001 bruteforce 51.759137
.. ... ... ...
595 p_198 bruteforce 10.874092
596 p_198 hybrid 14.723661
597 p_199 analytic 24.984775
598 p_199 bruteforce 4.899111
599 p_199 hybrid 36.188729
[600 rows x 3 columns]
** Labelled dataset **
target problem feature_0 feature_1 feature_2 feature_3 feature_4
0 hybrid p_000 0.864952 0.487293 0.946654 0.863503 0.310866
1 analytic p_001 0.514093 0.007643 0.948784 0.582419 0.258159
2 bruteforce p_002 0.319059 0.872320 0.321495 0.807644 0.158471
3 analytic p_003 0.421063 0.955742 0.114808 0.980013 0.900057
4 hybrid p_004 0.325935 0.125824 0.697967 0.037196 0.923626
.. ... ... ... ... ... ... ...
195 hybrid p_195 0.179126 0.578338 0.391535 0.632501 0.442677
196 bruteforce p_196 0.827637 0.641567 0.710201 0.833341 0.215357
197 hybrid p_197 0.116661 0.480170 0.253893 0.623913 0.465419
198 bruteforce p_198 0.670555 0.037084 0.954332 0.408546 0.935973
199 bruteforce p_199 0.371541 0.463060 0.549176 0.581093 0.391114
[200 rows x 7 columns]
Fitting 5 folds for each of 10 candidates, totalling 50 fits
[Parallel(n_jobs=None)]: Done 50 out of 50 | elapsed: 8.8s finished
Test loss with default CV scorer : 542.5191014477357
Fitting 5 folds for each of 10 candidates, totalling 50 fits
[Parallel(n_jobs=None)]: Done 50 out of 50 | elapsed: 9.1s finished
Test loss with custom CV scorer : 522.3236277796698

CVXPY trying to formulate big problem: ValueError: negative dimensions are not allowed. Is RAM usage the problem here?

I am facing a problem trying to run a fairly large optimization problem in my opinion. You can see the code below. The variable b size is 500 x 96. What I am trying to do is to match a sum of timeseries profiles (351236 x 15 min timesteps) with a bigger profile by minimizing their difference. With the same formulation and a much smaller problem (672 timesteps and a b variable of the size 10 x 5) the problem is solved in under 2 seconds without a problem. But when I am running it for the full scale problem I get the error you see below.
I am running this on Jupyter Lab and python 3.7.4. The python installation is done with conda.
I would expect the problem to solve as with the much smaller problem. But when I run this one, RAM usage explodes up to 100 GB (about 99% of the available RAM on the server). After a while the RAM usage goes down and then a periodical swinging begins (RAM goes up and down from 50% to 100% every few minutes). From the error and after a lot of googling my suspicion is that the problem is too big for the memory and that at some point data is getting broken down to smaller pieces. I do not think it reaches to the point, where the solver does its work. I tried to optimize the code by vectorizing everything (current version) and trying not to have loops etc. in the formulation. But this did not change anything. Do you guys have any clue if this is a bug or a limitation? Or do you maybe have an idea on how to solve this?
X_opt = cp.Constant(np.asarray(X.iloc[:,:500])) # the array size is (35136,500)
K_opt = cp.Constant(np.asarray(K.YearlyDemand)) # the vector size is 96
b = cp.Variable((500,96),boolean = True, value = np.zeros((500,96)))
Y_opt = cp.Constant(np.asarray(y)) # the vector size is 35136
constraints = []
constraints.append( cp.sum(b, axis = 0) == 1 ) # the sum of the elements of every column of b must be equal to 1
constraints.append( cp.sum(b, axis = 1) <= 1 ) # the sum of the elements of every row of b must be smaller or equal to 1
objective = cp.Minimize(cp.sum(cp.abs(Y_opt-cp.sum((cp.diag(K_opt)*((X_opt#b).T)).T, axis = 1))))
prob = cp.Problem(objective, constraints)
prob.solve(solver = cp.GLPK_MI, verbose = True)
ValueError Traceback (most recent call last)
in
D:\Anaconda3\envs\py37DuAL\lib\site-packages\cvxpy\problems\problem.py in solve(self, *args, **kwargs)
287 else:
288 solve_func = Problem._solve
--> 289 return solve_func(self, *args, **kwargs)
290
291 #classmethod
D:\Anaconda3\envs\py37DuAL\lib\site-packages\cvxpy\problems\problem.py in _solve(self, solver, warm_start, verbose, parallel, gp, qcp, **kwargs)
567 self._construct_chains(solver=solver, gp=gp)
568 data, solving_inverse_data = self._solving_chain.apply(
--> 569 self._intermediate_problem)
570 solution = self._solving_chain.solve_via_data(
571 self, data, warm_start, verbose, kwargs)
D:\Anaconda3\envs\py37DuAL\lib\site-packages\cvxpy\reductions\chain.py in apply(self, problem)
63 inverse_data = []
64 for r in self.reductions:
---> 65 problem, inv = r.apply(problem)
66 inverse_data.append(inv)
67 return problem, inverse_data
D:\Anaconda3\envs\py37DuAL\lib\site-packages\cvxpy\reductions\matrix_stuffing.py in apply(self, problem)
98 # Batch expressions together, then split apart.
99 expr_list = [arg for c in cons for arg in c.args]
--> 100 Afull, bfull = extractor.affine(expr_list)
101 if 0 not in Afull.shape and 0 not in bfull.shape:
102 Afull = cvxtypes.constant()(Afull)
D:\Anaconda3\envs\py37DuAL\lib\site-packages\cvxpy\utilities\coeff_extractor.py in affine(self, expr)
76 size = sum([e.size for e in expr_list])
77 op_list = [e.canonical_form[0] for e in expr_list]
---> 78 V, I, J, b = canonInterface.get_problem_matrix(op_list, self.id_map)
79 A = sp.csr_matrix((V, (I, J)), shape=(size, self.N))
80 return A, b.flatten()
D:\Anaconda3\envs\py37DuAL\lib\site-packages\cvxpy\cvxcore\python\canonInterface.py in get_problem_matrix(linOps, id_to_col, constr_offsets)
65
66 # Unpacking
---> 67 V = problemData.getV(len(problemData.V))
68 I = problemData.getI(len(problemData.I))
69 J = problemData.getJ(len(problemData.J))
D:\Anaconda3\envs\py37DuAL\lib\site-packages\cvxpy\cvxcore\python\cvxcore.py in getV(self, values)
320
321 def getV(self, values):
--> 322 return _cvxcore.ProblemData_getV(self, values)
323
324 def getI(self, values):
ValueError: negative dimensions are not allowed

This problem is solved here:
https://github.com/cvxgrp/cvxpy/issues/826#issuecomment-648618636
Note that the general problem is that large problems create underlying matrixies too large for the numpy.int32 which CVXPY uses. You can modify the code in CVXPY fairly easily to continue using the SCS solver.
You will have to modify the file canonInterface.py here:
D:\Anaconda3\envs\py37DuAL\lib\site-packages\cvxpy\cvxcore\python\
If you have trouble finding the second file to modify, just modify the first one, and use the traceback to find the second file.

How to Batch Multiple Videoframes before run Tensorflow Inference Session

I made a project that basically uses googles object detection api with tensorflow.
All i am doing is inference with a pre-trained model: Which means realtime object detection where the Input is the Videostream of a webcam or something similar using OpenCV.
Right now i got pretty decent performance results, but i want to further increase the FPS.
Because what i experience is that Tensorflow uses my whole Memory while Inference but the GPU Usage is not maxed out at all (around 40% with a NVIDIA GTX 1050 Laptop, and 6% on a NVIDIA Jetson Tx2).
So my idea was to increase the GPU Usage by increasing the image batch size which is fed in each session run.
So my question is: How can i Batch multiple Frames of the Input-Videostream together before i feed them to sess.run()?
Have a look at my code object_detetection.py on my github repo: (https://github.com/GustavZ/realtime_object_detection).
I would be very thankful if you come up with some Hints or Code Implementations!
import numpy as np
import os
import six.moves.urllib as urllib
import tarfile
import tensorflow as tf
import cv2
# Protobuf Compilation (once necessary)
os.system('protoc object_detection/protos/*.proto --python_out=.')
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util
from stuff.helper import FPS2, WebcamVideoStream
# INPUT PARAMS
# Must be OpenCV readable
# 0 = Default Camera
video_input = 0
visualize = True
max_frames = 300 #only used if visualize==False
width = 640
height = 480
fps_interval = 3
bbox_thickness = 8
# Model preparation
# What model to download.
MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = 'models/' + MODEL_NAME + '/frozen_inference_graph.pb'
# List of the strings that is used to add correct label for each box.
LABEL_MAP = 'mscoco_label_map.pbtxt'
PATH_TO_LABELS = 'object_detection/data/' + LABEL_MAP
NUM_CLASSES = 90
# Download Model
if not os.path.isfile(PATH_TO_CKPT):
print('Model not found. Downloading it now.')
opener = urllib.request.URLopener()
opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
file_name = os.path.basename(file.name)
if 'frozen_inference_graph.pb' in file_name:
tar_file.extract(file, os.getcwd())
os.remove('../' + MODEL_FILE)
else:
print('Model found. Proceed.')
# Load a (frozen) Tensorflow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
od_graph_def = tf.GraphDef()
with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
serialized_graph = fid.read()
od_graph_def.ParseFromString(serialized_graph)
tf.import_graph_def(od_graph_def, name='')
# Loading label map
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
# Start Video Stream
video_stream = WebcamVideoStream(video_input,width,height).start()
cur_frames = 0
# Detection
with detection_graph.as_default():
with tf.Session(graph=detection_graph) as sess:
# Definite input and output Tensors for detection_graph
image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
# Each box represents a part of the image where a particular object was detected.
detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
# Each score represent how level of confidence for each of the objects.
# Score is shown on the result image, together with the class label.
detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
num_detections = detection_graph.get_tensor_by_name('num_detections:0')
# fps calculation
fps = FPS2(fps_interval).start()
print ("Press 'q' to Exit")
while video_stream.isActive():
image_np = video_stream.read()
# Expand dimensions since the model expects images to have shape: [1, None, None, 3]
image_np_expanded = np.expand_dims(image_np, axis=0)
# Actual detection.
(boxes, scores, classes, num) = sess.run(
[detection_boxes, detection_scores, detection_classes, num_detections],
feed_dict={image_tensor: image_np_expanded})
# Visualization of the results of a detection.
vis_util.visualize_boxes_and_labels_on_image_array(
image_np,
np.squeeze(boxes),
np.squeeze(classes).astype(np.int32),
np.squeeze(scores),
category_index,
use_normalized_coordinates=True,
line_thickness=bbox_thickness)
if visualize:
cv2.imshow('object_detection', image_np)
# Exit Option
if cv2.waitKey(1) & 0xFF == ord('q'):
break
else:
cur_frames += 1
if cur_frames >= max_frames:
break
# fps calculation
fps.update()
# End everything
fps.stop()
video_stream.stop()
cv2.destroyAllWindows()
print('[INFO] elapsed time (total): {:.2f}'.format(fps.elapsed()))
print('[INFO] approx. FPS: {:.2f}'.format(fps.fps()))

Well, I'd just collect batch_size frames and feed them:
batch_size = 5
while video_stream.isActive():
image_np_list = []
for _ in range(batch_size):
image_np_list.append(video_stream.read())
fps.update()
# Expand dimensions since the model expects images to have shape: [1, None, None, 3]
image_np_expanded = np.asarray(image_np_list)
# Actual detection.
(boxes, scores, classes, num) = sess.run(
[detection_boxes, detection_scores, detection_classes, num_detections],
feed_dict={image_tensor: image_np_expanded})
# Visualization of the results of a detection.
for i in range(batch_size):
vis_util.visualize_boxes_and_labels_on_image_array(
image_np_expanded[i],
boxes[i],
classes[i].astype(np.int32),
scores[i],
category_index,
use_normalized_coordinates=True,
line_thickness=bbox_thickness)
if visualize:
cv2.imshow('object_detection', image_np_expanded[i])
# Exit Option
if cv2.waitKey(1) & 0xFF == ord('q'):
break
Of course you'll have to make the relevant changes after that if you're reading the results from the detection, since they will now have batch_size rows.
Be careful though: before tensorflow 1.4 (I think), the object detection API only supports batch size of 1 in image_tensor, so this will not work unless you upgrade your tensorflow.
Also note that your resulting FPS will be an average, but that the frames in a same batch will actually be closer in time than between different batches (since you'll still need to wait for the sess.run() to finish). The average should still be significantly better than your current FPS, although the max time between two consecutive frames should increase.
If you want your frames to all have roughly the same interval between them, I guess you'll need more sophisticated tools like multithreading and queueing: one thread would read the images from the stream and store them in a queue, the other one would take them from the queue and call sess.run() on them asynchronously; it could also tell the 1st thread to hurry up or slow down depending on its own computing capacity. This is trickier to implement.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

cuda out of memory problem in gpu in google colab - embedding

Related

Reasons why swifter/dask/ray only use one core for an apply task?

Ran into "TypeError: '<' not supported between instances of 'Tensor' and 'list'" when going through dataset

Can I get extra information to a custom scorer function in sklearn?

CVXPY trying to formulate big problem: ValueError: negative dimensions are not allowed. Is RAM usage the problem here?

How to Batch Multiple Videoframes before run Tensorflow Inference Session

Categories

Resources