When increasing the size of the data created, why does the Jupyter kernel die before a MemoryError appears? - docker

I ran Jupyter through docker run -p 8888:8888 jupyter/scipy-notebook.
from sklearn.datasets import make_classification
X, y = make_classification(10000,2000)
causes the kernel to die, while X, y = make_classification(100000, 2000) gives:
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
Cell In [1], line 3
1 from sklearn.datasets import make_classification
----> 3 X, y = make_classification(100000,2000)
File /opt/conda/lib/python3.10/site-packages/sklearn/datasets/_samples_generator.py:220, in make_classification(n_samples, n_features, n_informative, n_redundant, n_repeated, n_classes, n_clusters_per_class, weights, flip_y, class_sep, hypercube, shift, scale, shuffle, random_state)
217 n_samples_per_cluster[i % n_clusters] += 1
219 # Initialize X and y
--> 220 X = np.zeros((n_samples, n_features))
221 y = np.zeros(n_samples, dtype=int)
223 # Build the polytope whose vertices become cluster centroids
MemoryError: Unable to allocate 1.49 GiB for an array with shape (100000, 2000) and data type float64
It looks like for the larger (100000, 2000) data size, the allocation failed and errored out early.
Did the smaller (10000, 2000) data size allocate successfully, since no error was raised?
So what is killing the kernel, and why is the initial memory allocation insufficient as a check? It looks like extra memory gets allocated as the code runs.
When running the (10000, 2000) code, docker stats shows
MEM USAGE / LIMIT starting at 152.2MiB / 966.2MiB, then shooting up to 369.2MiB / 966.2MiB, holding for a few seconds, and then the "kernel died" pop-up appears.
369 MiB looks very far from the 966 MiB available; did MEM USAGE suddenly jump from 369 to above 966 in one step?
If the kernel indeed died because MEM USAGE was too high, how do I know when (e.g. at what % of available memory) to start being careful and deleting unused variables?
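For scale, here is a quick back-of-the-envelope estimate of just the X arrays involved (a rough sketch only; make_classification allocates further intermediate arrays on top of X itself, so peak usage inside the container can sit well above this number):

import numpy as np

def float64_array_mib(n_samples, n_features):
    # float64 takes 8 bytes per element
    return n_samples * n_features * 8 / 2**20

print(float64_array_mib(10000, 2000))   # ~152.6 MiB -- small relative to the 966.2 MiB container limit
print(float64_array_mib(100000, 2000))  # ~1525.9 MiB (~1.49 GiB) -- matches the MemoryError above and fails up front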

Related

machine learning+deep learning+speech recognition

I run the code in my editor (VS Code) without any problems, but for the next step, due to RAM and GPU limitations, I moved it to Colab and got an error that seems to be caused by a version mismatch arising from the transfer from my editor to Colab. How can I fix this problem?
The current version of Python running on Google Colab is 3.8.16; I used TensorFlow 2.3.0 and Keras 2.4.3.
The error is related to this part of the code, when model.fit() is used to train the model (I use CTC loss in the model):
model.fit(
    train_dg,
    validation_data=val_dg,
    epochs=args.epochs,
    callbacks=[PlotLossesKeras(),
               early_stopping,
               cp,
               csv_logger,
               lrs]
)
But I got this error:
Epoch 00001: LearningRateScheduler reducing learning rate to 0.001.
Epoch 1/300
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-87-2b4ea6811b43> in <module>
----> 1 model.fit(train_dg,validation_data=val_dg,epochs=args.epochs,callbacks=[PlotLossesKeras(),early_stopping,cp,csv_logger,lrs])
9 frames /usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
57 try:
58 ctx.ensure_initialized()
---> 59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
InvalidArgumentError: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 2 num_classes: 16 labels: 16,0,0,0,0,0,0 labels seen so far: [[node functional_3/CTCloss/CTCLoss (defined at <ipython-input-17-1689d20fc46d>:887) ]] [Op:__inference_train_function_6401]
Function call stack: train_function
---------------------------------------------------------------------------------------
I tried changing the version of Python in Colab, but that doesn't work.
I also changed num_classes in the last layer of my model, and that doesn't work either.
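One reading of that error: TensorFlow's CTC loss reserves the last class index (num_classes - 1) for the blank token, so every real label must be strictly smaller than num_classes - 1, and the traceback shows a label value of 16 with num_classes 16. A minimal sanity-check sketch (the numbers are taken from the traceback above; this is not code from the original post):

num_classes = 16   # units in the model's final softmax layer, per the traceback
max_label = 16     # the offending label value reported by CTCLoss

# CTC treats index num_classes - 1 as the blank symbol, so valid labels
# must lie in the range [0, num_classes - 2].
if max_label >= num_classes - 1:
    needed_units = max_label + 2  # vocabulary indices 0..max_label plus one blank
    print(f"Label {max_label} is out of range; the output layer needs at least {needed_units} units.")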

cuda out of memory problem in gpu in google colab

I am trying to get stacked embeddings from flair and BERT, and I am getting the following error. One of the suggestions was to reduce the batch size, but how do I pass the data in batches? Here are the code and the error.
from tqdm import tqdm  ## tracks progress of loop ##
import torch
from flair.data import Sentence
from flair.embeddings import TransformerDocumentEmbeddings
from flair.embeddings import DocumentPoolEmbeddings

# flair_forward and flair_backward (character-level flair embeddings) are
# defined in an earlier cell that is not shown in this snippet
bert_embeddings = TransformerDocumentEmbeddings('bert-base-uncased')

### initialize the document embeddings, mode = mean ###
document_embeddings = DocumentPoolEmbeddings([
    flair_forward,
    flair_backward,
    bert_embeddings
])

# Storing Size of embedding #
z = sentence.embedding.size()[0]
print(z)

### Vectorising text ###
# creating a tensor for storing sentence embeddings
sen = torch.zeros(0, z)
print(sen)

# iterating Sentences #
for tweet in tqdm(txt):
    sentence = Sentence(tweet)
    document_embeddings.embed(sentence)  # *****this line is giving error*****
    # Adding Document embeddings to list #
    if torch.cuda.is_available():
        sen = sen.cuda()
    sen = torch.cat((sen, sentence.embedding.view(-1, z)), 0)
and this is the error I am getting:
RuntimeError Traceback (most recent call last)
<ipython-input-24-1eee00445350> in <module>()
24 for tweet in tqdm(txt):
25 sentence = Sentence(tweet)
---> 26 document_embeddings.embed(sentence)
27 # Adding Document embeddings to list #
28 if(torch.cuda.is_available()):
7 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py in forward(self, input, hx)
580 if batch_sizes is None:
581 result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
--> 582 self.dropout, self.training, self.bidirectional, self.batch_first)
583 else:
584 result = _VF.lstm(input, batch_sizes, hx, self._flat_weights, self.bias,
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.43 GiB total capacity; 6.54 GiB already allocated; 10.94 MiB free; 6.70 GiB reserved in total by PyTorch)
Flair added a chars_per_chunk parameter to FlairEmbeddings to avoid exactly this; edit your code accordingly and see if it works:
embeddings = FlairEmbeddings('news-forward', chars_per_chunk=128)
For example:
embedding_types = [
    WordEmbeddings('glove'),
    FlairEmbeddings('news-forward', chars_per_chunk=128),
    FlairEmbeddings('news-backward'),
]
From my own experience, Google Colab is not well suited to large transformer models for tasks such as NER.
See the flair documentation on GitHub for more examples for your specific task!
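Applied to the embedding stack from the question, that might look roughly like the following (a sketch only; it assumes flair_forward and flair_backward are FlairEmbeddings instances, which the original snippet does not show):

from flair.embeddings import FlairEmbeddings, TransformerDocumentEmbeddings, DocumentPoolEmbeddings

# smaller chunks trade some speed for a lower peak GPU memory footprint
flair_forward = FlairEmbeddings('news-forward', chars_per_chunk=128)
flair_backward = FlairEmbeddings('news-backward', chars_per_chunk=128)
bert_embeddings = TransformerDocumentEmbeddings('bert-base-uncased')

document_embeddings = DocumentPoolEmbeddings([
    flair_forward,
    flair_backward,
    bert_embeddings,
])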

CVXPY trying to formulate big problem: ValueError: negative dimensions are not allowed. Is RAM usage the problem here?

I am facing a problem trying to run what is, in my opinion, a fairly large optimization problem. You can see the code below. The variable b has size 500 x 96. What I am trying to do is match a sum of time-series profiles (35136 15-minute timesteps) to a bigger profile by minimizing their difference. With the same formulation and a much smaller problem (672 timesteps and a b variable of size 10 x 5), the problem is solved in under 2 seconds without an issue. But when I run it at full scale, I get the error you see below.
I am running this on Jupyter Lab with Python 3.7.4. The Python installation is done with conda.
I would expect the problem to solve as the much smaller one did. But when I run this one, RAM usage explodes up to 100 GB (about 99% of the available RAM on the server). After a while the RAM usage goes down, and then a periodic swinging begins (RAM goes up and down between 50% and 100% every few minutes). From the error, and after a lot of googling, my suspicion is that the problem is too big for memory and that at some point the data is getting broken down into smaller pieces. I do not think it ever reaches the point where the solver does its work. I tried to optimize the code by vectorizing everything (the current version) and avoiding loops in the formulation, but that did not change anything. Do you have any clue whether this is a bug or a limitation? Or do you maybe have an idea of how to solve this?
X_opt = cp.Constant(np.asarray(X.iloc[:, :500]))  # the array size is (35136, 500)
K_opt = cp.Constant(np.asarray(K.YearlyDemand))   # the vector size is 96
b = cp.Variable((500, 96), boolean=True, value=np.zeros((500, 96)))
Y_opt = cp.Constant(np.asarray(y))                # the vector size is 35136

constraints = []
constraints.append(cp.sum(b, axis=0) == 1)  # the sum of the elements of every column of b must be equal to 1
constraints.append(cp.sum(b, axis=1) <= 1)  # the sum of the elements of every row of b must be smaller or equal to 1

objective = cp.Minimize(cp.sum(cp.abs(Y_opt - cp.sum((cp.diag(K_opt) * ((X_opt @ b).T)).T, axis=1))))

prob = cp.Problem(objective, constraints)
prob.solve(solver=cp.GLPK_MI, verbose=True)
ValueError Traceback (most recent call last)
in
D:\Anaconda3\envs\py37DuAL\lib\site-packages\cvxpy\problems\problem.py in solve(self, *args, **kwargs)
287 else:
288 solve_func = Problem._solve
--> 289 return solve_func(self, *args, **kwargs)
290
291 @classmethod
D:\Anaconda3\envs\py37DuAL\lib\site-packages\cvxpy\problems\problem.py in _solve(self, solver, warm_start, verbose, parallel, gp, qcp, **kwargs)
567 self._construct_chains(solver=solver, gp=gp)
568 data, solving_inverse_data = self._solving_chain.apply(
--> 569 self._intermediate_problem)
570 solution = self._solving_chain.solve_via_data(
571 self, data, warm_start, verbose, kwargs)
D:\Anaconda3\envs\py37DuAL\lib\site-packages\cvxpy\reductions\chain.py in apply(self, problem)
63 inverse_data = []
64 for r in self.reductions:
---> 65 problem, inv = r.apply(problem)
66 inverse_data.append(inv)
67 return problem, inverse_data
D:\Anaconda3\envs\py37DuAL\lib\site-packages\cvxpy\reductions\matrix_stuffing.py in apply(self, problem)
98 # Batch expressions together, then split apart.
99 expr_list = [arg for c in cons for arg in c.args]
--> 100 Afull, bfull = extractor.affine(expr_list)
101 if 0 not in Afull.shape and 0 not in bfull.shape:
102 Afull = cvxtypes.constant()(Afull)
D:\Anaconda3\envs\py37DuAL\lib\site-packages\cvxpy\utilities\coeff_extractor.py in affine(self, expr)
76 size = sum([e.size for e in expr_list])
77 op_list = [e.canonical_form[0] for e in expr_list]
---> 78 V, I, J, b = canonInterface.get_problem_matrix(op_list, self.id_map)
79 A = sp.csr_matrix((V, (I, J)), shape=(size, self.N))
80 return A, b.flatten()
D:\Anaconda3\envs\py37DuAL\lib\site-packages\cvxpy\cvxcore\python\canonInterface.py in get_problem_matrix(linOps, id_to_col, constr_offsets)
65
66 # Unpacking
---> 67 V = problemData.getV(len(problemData.V))
68 I = problemData.getI(len(problemData.I))
69 J = problemData.getJ(len(problemData.J))
D:\Anaconda3\envs\py37DuAL\lib\site-packages\cvxpy\cvxcore\python\cvxcore.py in getV(self, values)
320
321 def getV(self, values):
--> 322 return _cvxcore.ProblemData_getV(self, values)
323
324 def getI(self, values):
ValueError: negative dimensions are not allowed
This problem is solved here:
https://github.com/cvxgrp/cvxpy/issues/826#issuecomment-648618636
Note that the general problem is that large problems create underlying matrices too large for the numpy.int32 indexing which CVXPY uses. You can modify the code in CVXPY fairly easily to continue using the SCS solver.
You will have to modify the file canonInterface.py here:
D:\Anaconda3\envs\py37DuAL\lib\site-packages\cvxpy\cvxcore\python\
If you have trouble finding the second file to modify, just modify the first one, and use the traceback to find the second file.
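As a rough illustration of why 32-bit indexing overflows here (a back-of-the-envelope sketch based on the dimensions quoted in the question, not an exact count of what CVXPY builds internally):

import numpy as np

# X_opt @ b has 35136 x 96 entries, and each entry is an affine expression
# over 500 boolean variables, so the canonicalized coefficient matrix holds
# on the order of 35136 * 96 * 500 nonzeros before the abs/sum reductions.
approx_nonzeros = 35136 * 96 * 500
print(f"{approx_nonzeros:,}")         # 1,686,528,000
print(f"{np.iinfo(np.int32).max:,}")  # 2,147,483,647

# Once further canonicalization terms push the total past the int32 maximum,
# the computed size wraps to a negative value, which surfaces as
# "ValueError: negative dimensions are not allowed".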

Allocator runs out of memory even on very low batch sizes

This problem never used to occur, but since today TensorFlow always tries to allocate a huge amount of memory, even when using very small batch sizes.
I followed this tutorial:
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
"Using the bottleneck features of a pre-trained network: 90% accuracy in a minute"
This is my code:
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dropout, Flatten, Dense
from keras import applications

img_width, img_height = 150, 150

top_model_weights_path = 'bottleneck_fc_model.h5'
train_data_dir = 'C:\\ImageData\\Augmented\\Train'
validation_data_dir = 'C:\\ImageData\\Augmented\\Validate'
#train_data_dir = 'C:\\Users\\NSA\\flower_photos\\Train'
#validation_data_dir = 'C:\\Users\\NSA\\flower_photos\\Validate'
nb_train_samples = 25
nb_validation_samples = 5
epochs = 10
my_batch_size = 10

def save_bottleneck_features():
    datagen = ImageDataGenerator(rescale=1./255)

    # build the VGG16 network
    model = applications.VGG16(include_top=False, weights='imagenet')

    generator = datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_width, img_height),
        batch_size=my_batch_size,
        class_mode=None,
        shuffle=False)
    bottleneck_features_train = model.predict_generator(
        generator,
        steps=nb_train_samples // my_batch_size,
        max_queue_size=10,
        workers=1,
        use_multiprocessing=False,
        verbose=1)
    np.save(open('bottleneck_features_train.npy', 'w'),
            bottleneck_features_train)

    generator = datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_width, img_height),
        batch_size=my_batch_size,
        class_mode=None,
        shuffle=False)
    bottleneck_features_validation = model.predict_generator(
        generator, nb_validation_samples // my_batch_size)
    np.save(open('bottleneck_features_validation.npy', 'w'),
            bottleneck_features_validation)

def train_top_model():
    train_data = np.load(open('bottleneck_features_train.npy'))
    train_labels = np.array(
        [0] * (nb_train_samples / 2) + [1] * (nb_train_samples / 2))

    validation_data = np.load(open('bottleneck_features_validation.npy'))
    validation_labels = np.array(
        [0] * (nb_validation_samples / 2) + [1] * (nb_validation_samples / 2))

    model = Sequential()
    model.add(Flatten(input_shape=train_data.shape[1:]))
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(optimizer='rmsprop',
                  loss='binary_crossentropy', metrics=['accuracy'])

    model.fit(train_data, train_labels,
              epochs=epochs,
              batch_size=my_batch_size,
              validation_data=(validation_data, validation_labels))

    model.save_weights(top_model_weights_path)

save_bottleneck_features()
train_top_model()
And this is the error I get:
PS C:\Users\NSA\ownCloud\Documents\Tensorflow\Skripts> cd 'c:\Users\NSA\ownCloud\Documents\Tensorflow\Skripts'; ${env:PYTHONIOENCODING}='UTF-8'; ${env:PYTHONUNBUFFERED}='1'; & 'C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\python.exe' 'C:\Users\NSA\.vscode\extensions\ms-python.python-2018.3.1\pythonFiles\PythonTools\visualstudio_py_launcher.py' 'c:\Users\NSA\ownCloud\Documents\Tensorflow\Skripts' '50490' '34806ad9-833a-4524-8cd6-18ca4aa74f14' 'RedirectOutput,RedirectOutput' 'c:\Users\NSA\ownCloud\Documents\Tensorflow\Skripts\first_try_real_transfer_learning_keras_vgg16.py'
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will
be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Bottleneck Features saven
2018-04-09 16:02:08.772206: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-04-09 16:02:09.345010: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1212] Found device 0 with properties:
name: GeForce 940MX major: 5 minor: 0 memoryClockRate(GHz): 1.189
pciBusID: 0000:02:00.0
totalMemory: 2.00GiB freeMemory: 1.66GiB
2018-04-09 16:02:09.356147: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1312] Adding visible gpu devices: 0
2018-04-09 16:02:10.108947: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1429 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:02:00.0, compute capability: 5.0)
Found 109 images belonging to 2 classes.
2018-04-09 16:02:16.979539: W C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.33GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-04-09 16:02:17.441196: W C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.19GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-04-09 16:02:17.792983: W C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-04-09 16:02:18.122577: W C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.17GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2/2 [==============================] - 4s 2s/step
Traceback (most recent call last):
File "c:\Users\NSA\ownCloud\Documents\Tensorflow\Skripts\first_try_real_transfer_learning_keras_vgg16.py", line 94, in <module>
save_bottleneck_features()
File "c:\Users\NSA\ownCloud\Documents\Tensorflow\Skripts\first_try_real_transfer_learning_keras_vgg16.py", line 56, in save_bottleneck_features
bottleneck_features_train)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 511, in save
pickle_kwargs=pickle_kwargs)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\format.py", line 565, in write_array
version)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\format.py", line 335, in _write_array_header
fp.write(header_prefix)
TypeError: write() argument must be str, not bytes
PS C:\Users\NSA\ownCloud\Documents\Tensorflow\Skripts>
The error occurs specifically when calling model.predict_generator().
At first I thought it was running out of memory because my batch size was too large, but even when I use a batch size of 1 it requires over 2 GiB of memory. I have installed CUDA 9.0, cuDNN 7.0, TensorFlow 1.6.0 and Keras 2.1.5 with the TensorFlow backend. This used to work without issue, but it suddenly started giving me this error. I'm using an NVIDIA GeForce 940MX.
Your problem has nothing to do with memory or TensorFlow. A file opened in text mode is being written bytes.
Instead of opening the file in text mode:
open('bottleneck_features_train.npy', 'w')
open it in binary mode:
open('bottleneck_features_train.npy', 'wb')
This applies to all the calls to open you have.
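A minimal sketch of the corrected calls (note that np.save and np.load also accept a filename directly, which avoids the file-mode issue altogether; the placeholder array below just stands in for the features computed in the question's code):

import numpy as np

bottleneck_features_train = np.zeros((25, 4, 4, 512))  # placeholder for the real features

# save in binary mode ('wb'), load in binary mode ('rb')
np.save(open('bottleneck_features_train.npy', 'wb'), bottleneck_features_train)
train_data = np.load(open('bottleneck_features_train.npy', 'rb'))

# or simply pass the filename and let numpy manage the file object
np.save('bottleneck_features_train.npy', bottleneck_features_train)
train_data = np.load('bottleneck_features_train.npy')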

Out of memory exception for a matrix

I have the "'System.OutOfMemoryException" exception for this simple code (a 10 000 * 10 000 matrix) multiplied by itself:
#time
#r "Microsoft.Office.Interop.Excel"
#r "FSharp.PowerPack.dll"
open System
open System.IO
open Microsoft.FSharp.Math
open System.Collections.Generic
let mutable Matrix1 = Matrix.create 10000 10000 0.
let matrix4 = Matrix1 * Matrix1
I have the following error:
System.OutOfMemoryException: An exception 'System.OutOfMemoryException' has been raised
Microsoft.FSharp.Collections.Array2DModule.ZeroCreate[T](Int32 length1, Int32 length2)
Microsoft.FSharp.Math.DoubleImpl.mulDenseMatrixDS(DenseMatrix`1 a, DenseMatrix`1 b)
Microsoft.FSharp.Math.SpecializedGenericImpl.mulM[a](Matrix`1 a, Matrix`1 b)
<StartupCode$FSI_0004>.$FSI_0004.main#() in C:\Users\XXXXXXX\documents\visual studio 2010\Projects\Library1\Library1\Module1.fs:line 92
Stop due to an error
I therefore have 2 questions:
I have 8 GB of memory on my computer, and according to my calculation a 10 000 x 10 000 matrix should take 381 MB (computed this way: 10 000 * 10 000 = 100 000 000 integers in the matrix => 100 000 000 * 4 bytes (32-bit integers) = 400 000 000 bytes => 400 000 000 / (1024 * 1024) = 381 MB), so I cannot understand why there is an OutOfMemoryException.
More generally (it's not the case here, I think), I have the impression that F# Interactive keeps hold of all the data and therefore overloads the memory. Do you know of a way to free all the data registered by F# Interactive without exiting F#?
In summary, fsi is a 32-bit process; at most it can hold 2 GB of data. Run your test as a 64-bit Windows application; you can increase the size of the matrix, but it still has the .NET 2 GB limit per object.
Let me correct your calculation a little. Matrix1 is a float matrix, so each element occupies 8 bytes in memory. The total size of Matrix1 and matrix4 in memory is at least:
2 * 10000 * 10000 * 8 = 1 600 000 000 bytes ~ 1.6 GB
(ignoring some bookkeeping parts of the matrices)
So it's no surprise that the 32-bit fsi runs out of memory in this case.
Execute the test as a 64-bit Windows process and you can create float matrices of size around 15000 x 15000, but not more than that. Check out this informative article for concrete numbers with different types of matrix elements.
The amount of physical memory on your computer is not the relevant bottleneck; see Eric Lippert's great blog post for more information.
