machine learning+deep learning+speech recognition - machine-learning

I can run the code in my editor (VS Code) without any problems, but for the next step, because of RAM and GPU limitations, I moved it to Colab. There I got an error that seems to be caused by a version mismatch between my local setup and Colab. How can I fix this problem?
The current version of Python running on Google Colab is 3.8.16; I used TensorFlow 2.3.0 and Keras 2.4.3.
The error occurs in this part of the code, when I call model.fit() to train the model
(I use CTC loss in the model):
model.fit(
    train_dg,
    validation_data=val_dg,
    epochs=args.epochs,
    callbacks=[PlotLossesKeras(),
               early_stopping,
               cp,
               csv_logger,
               lrs]
)
But I got this error:
----------------------------------------------------------------------------------------------------
Epoch 00001: LearningRateScheduler reducing learning rate to 0.001.
Epoch 1/300
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-87-2b4ea6811b43> in <module>
----> 1 model.fit(train_dg,validation_data=val_dg,epochs=args.epochs,callbacks=[PlotLossesKeras(),early_stopping,cp,csv_logger,lrs])
9 frames
/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
57 try:
58 ctx.ensure_initialized()
---> 59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
InvalidArgumentError: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 2 num_classes: 16 labels: 16,0,0,0,0,0,0 labels seen so far:
[[node functional_3/CTCloss/CTCLoss (defined at <ipython-input-17-1689d20fc46d>:887) ]] [Op:__inference_train_function_6401]
Function call stack:
train_function
---------------------------------------------------------------------------------------
I tried changing the version of Python in Colab, but it doesn't work.
I also changed num_classes in the last layer of my model, but that doesn't work either.
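Reading the error text literally, the CTC op seems to treat any label index >= num_classes - 1 as a null label and to reject a label row in which a smaller, non-null index comes after one. The following is only a sketch of a pre-check built on that reading; the function name first_bad_position is hypothetical and is not part of TensorFlow or Keras. It could be run over the label rows a data generator produces before calling model.fit().

```python
def first_bad_position(labels, num_classes):
    """Return the position of the first non-null label that follows a null
    label, or None if the row passes this check.

    Assumption (taken from the error text, not from the TensorFlow docs):
    indices >= num_classes - 1 count as null labels.
    """
    null_threshold = num_classes - 1
    seen_null = False
    for pos, label in enumerate(labels):
        if label >= null_threshold:
            seen_null = True      # everything after this should also be null
        elif seen_null:
            return pos            # a real label appeared after a null one
    return None


# The label row reported in the error message above.
bad_row = first_bad_position([16, 0, 0, 0, 0, 0, 0], num_classes=16)
print(bad_row)

# A row with real labels first and null/padding values only at the end.
good_row = first_bad_position([3, 7, 1, 15, 15], num_classes=16)
print(good_row)
```

If the reported row is flagged, two things may be worth checking: whether the label encoding is shifted by one relative to num_classes, and whether 0 is being used as padding after the real labels.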

Related

Ran into "TypeError: '<' not supported between instances of 'Tensor' and 'list'" when going through dataset

I am replicating ResNet (source: https://arxiv.org/abs/1512.03385).
I ran into the error "TypeError: '<' not supported between instances of 'Tensor' and 'list'" when iterating over several different datasets in different sections of my code.
I tried different fixes, but none worked: (i) I removed enumerate, because I worried it might be causing the problem; (ii) I tried iterating over the dataloader rather than the dataset, but that didn't work either.
1st time: When I tried to view images:
for images, _ in train_loader:
    print('images.shape:', images.shape)
    plt.figure(figsize=(16,8))
    plt.axis('off')
    plt.imshow(torchvision.utils.make_grid(images, nrow=16).permute((1, 2, 0)))
    break
2nd/3rd time: when I tried to validate/test the resnet:
with torch.no_grad():
    # enumerate yields (index, batch), so the batch must be unpacked in parentheses
    for j, (inputs, labels) in enumerate(test_loader, start=0):
        outputs = resnet_models[i](inputs)
        _, prediction = torch.max(outputs, dim=1)
You may notice that I didn't run into this error when training the resnet, and the code is quite similar:
for batch, data in enumerate(train_dataloader, start=0):
    inputs, labels = data
    inputs, labels = inputs.to(device), labels.to(device)
Error message (taking the first error as an example; the rest are much the same):
TypeError Traceback (most recent call last)
Input In [38], in <cell line: 8>()
6 print("Images AFTER NORMALIZATION")
7 print("--------------------------")
----> 8 for images, _ in training_data:
9 sort=False
10 print('images.shape:', images.shape)
File ~/miniconda3/envs/resnet/lib/python3.9/site-packages/torch/utils/data/dataset.py:471, in Subset.__getitem__(self, idx)
469 if isinstance(idx, list):
470 return self.dataset[[self.indices[i] for i in idx]]
--> 471 return self.dataset[self.indices[idx]]
File ~/miniconda3/envs/resnet/lib/python3.9/site-packages/torchvision/datasets/cifar.py:118, in CIFAR10.__getitem__(self, index)
115 img = Image.fromarray(img)
117 if self.transform is not None:
--> 118 img = self.transform(img)
120 if self.target_transform is not None:
121 target = self.target_transform(target)
File ~/miniconda3/envs/resnet/lib/python3.9/site-packages/torchvision/transforms/transforms.py:95, in Compose.__call__(self, img)
93 def __call__(self, img):
94 for t in self.transforms:
---> 95 img = t(img)
96 return img
File ~/miniconda3/envs/resnet/lib/python3.9/site-packages/torch/nn/modules/module.py:1110, in Module._call_impl(self, *input, **kwargs)
1106 # If we don't have any hooks, we want to skip the rest of the logic in
1107 # this function, and just call forward.
1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1109 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110 return forward_call(*input, **kwargs)
1111 # Do not call functions when jit is used
1112 full_backward_hooks, non_full_backward_hooks = [], []
File ~/miniconda3/envs/resnet/lib/python3.9/site-packages/torchvision/transforms/transforms.py:707, in RandomHorizontalFlip.forward(self, img)
699 def forward(self, img):
700 """
701 Args:
702 img (PIL Image or Tensor): Image to be flipped.
(...)
705 PIL Image or Tensor: Randomly flipped image.
706 """
--> 707 if torch.rand(1) < self.p:
708 return F.hflip(img)
709 return img
TypeError: '<' not supported between instances of 'Tensor' and 'list'
I was having the same error message, probably under different circumstances, but I just found my own bug and figured I would share it anyway for various readers. I was using a torchvision transformation in my dataset, which the dataloader was loading from. The transformation was
torchvision.transforms.RandomHorizontalFlip([0.5]),
and the error is that the input to this transformation should not be a list but should be
torchvision.transforms.RandomHorizontalFlip(0.5),
So if there is anything I can recommend, it is to check whether a list is being passed to some transformation (or elsewhere) where a plain number is expected.
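That kind of check can be sketched as a small helper that walks a list of transforms and reports any whose p attribute is not a plain number. The traceback above shows that RandomHorizontalFlip reads self.p and that Compose keeps its steps in self.transforms, so with real torchvision objects the list to pass would presumably be my_compose.transforms. The helper name find_non_scalar_probabilities and the FakeFlip class are hypothetical; FakeFlip is a stand-in so the sketch runs without torchvision installed.

```python
from numbers import Real


def find_non_scalar_probabilities(transforms):
    """Return (index, class name, value) for every transform whose `p`
    attribute exists but is not a real number (for example a list)."""
    problems = []
    for index, transform in enumerate(transforms):
        p = getattr(transform, "p", None)
        if p is not None and not isinstance(p, Real):
            problems.append((index, type(transform).__name__, p))
    return problems


class FakeFlip:
    """Hypothetical stand-in for a transform that stores a probability as `p`."""

    def __init__(self, p=0.5):
        self.p = p


# First entry reproduces the bug described above (list instead of float).
pipeline = [FakeFlip([0.5]), FakeFlip(0.5)]
problems = find_non_scalar_probabilities(pipeline)
print(problems)
```

Only transforms that expose a p attribute are inspected, so this is a narrow check for this one mistake rather than a general validator.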

Why does using X[0] in MNIST classifier code give me an error?

I was learning to do classification with the MNIST dataset, and I got an error which I am not able to figure out. I have done a lot of Google searches without success; maybe you are an expert and can help me. Here is the code:
>>> from sklearn.datasets import fetch_openml
>>> mnist = fetch_openml('mnist_784', version=1)
>>> mnist.keys()
output:
dict_keys(['data', 'target', 'frame', 'categories', 'feature_names', 'target_names', 'DESCR', 'details', 'url'])
>>> X, y = mnist["data"], mnist["target"]
>>> X.shape
output: (70000, 784)
>>> y.shape
output: (70000,)
>>> X[0]
output: KeyError Traceback (most recent call last)
c:\users\khush\appdata\local\programs\python\python39\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2897 try:
-> 2898 return self._engine.get_loc(casted_key)
2899 except KeyError as err:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-10-19c40ecbd036> in <module>
----> 1 X[0]
c:\users\khush\appdata\local\programs\python\python39\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2904 if self.columns.nlevels > 1:
2905 return self._getitem_multilevel(key)
-> 2906 indexer = self.columns.get_loc(key)
2907 if is_integer(indexer):
2908 indexer = [indexer]
c:\users\khush\appdata\local\programs\python\python39\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2898 return self._engine.get_loc(casted_key)
2899 except KeyError as err:
-> 2900 raise KeyError(key) from err
2901
2902 if tolerance is not None:
KeyError: 0
Please answer; there may be a silly mistake because I am a beginner in ML. Any hint would be really helpful.
The API of fetch_openml changed between versions. In earlier versions, it returned a numpy.ndarray. Since 0.24.0 (December 2020), the as_frame argument of fetch_openml defaults to 'auto' (instead of False, the earlier default), which gives you a pandas.DataFrame for the MNIST data. You can force the data to be read as a numpy.ndarray by setting as_frame=False. See the fetch_openml reference.
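To see why X[0] fails on a DataFrame without downloading MNIST, here is a small sketch on a made-up frame. The column names and values are invented for illustration; the point is only that square brackets on a DataFrame look up a column label, while .iloc and .to_numpy() give positional row access.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the MNIST frame: 3 rows, 4 "pixel" columns (names are made up).
X = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    columns=["pixel1", "pixel2", "pixel3", "pixel4"],
)

try:
    X[0]  # looks for a COLUMN labelled 0, which does not exist
except KeyError as err:
    print("KeyError:", err)

# Positional access to the first row, converted to a NumPy array.
first_row = X.iloc[0].to_numpy()
print(first_row)

# Converting the whole frame once gives the old ndarray-style indexing back.
as_array = X.to_numpy()
print(as_array[0])
```

Both routes yield the same row; converting the whole frame is convenient when the rest of the code was written for the ndarray return type.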
I was also facing the same problem.
scikit-learn: 0.24.0
matplotlib: 3.3.3
Python: 3.9.1
I used the code below to resolve the issue.
import matplotlib as mpl
import matplotlib.pyplot as plt
# instead of some_digit = X[0]
some_digit = X.to_numpy()[0]
some_digit_image = some_digit.reshape(28,28)
plt.imshow(some_digit_image,cmap="binary")
plt.axis("off")
plt.show()
You don't need to downgrade your scikit-learn library if you use the code below:
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version= 1, as_frame= False)
mnist.keys()
If you load the dataset as a DataFrame, you have two ways to access the images:
Transform the DataFrame into an array
# Transform the dataframe into an array. Check the first value
some_digit = X.to_numpy()[0]
# Reshape it to (28,28). Note: 28 x 28 = 784; if the row does not hold exactly
# 784 values, the reshape fails and you are not able to show the image
some_digit_image = some_digit.reshape(28,28)
plt.imshow(some_digit_image,cmap="binary")
plt.axis("off")
plt.show()
Transform the row
# Transform the row of your choosing into an array
some_digit = X.iloc[0,:].values
# Reshape it to (28,28). Note: 28 x 28 = 784; if the row does not hold exactly
# 784 values, the reshape fails and you are not able to show the image
some_digit_image = some_digit.reshape(28,28)
plt.imshow(some_digit_image,cmap="binary")
plt.axis("off")
plt.show()

TypeError: __call__() takes 2 positional arguments but 3 were given. To train Raccoon prediction model using FastRCNN through Transfer Learning

from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from engine import train_one_epoch, evaluate
import utils
import torchvision.transforms as T
num_epochs = 10
for epoch in range(num_epochs):
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
    lr_scheduler.step()
    evaluate(model, data_loader_test, device=device)
I am using the same code as provided in this link Building Raccoon Model but mine is not working.
This is the error message I am getting
TypeError Traceback (most recent call last)
in ()
2 for epoch in range(num_epochs):
3 # train for one epoch, printing every 10 iterations
----> 4 train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
5 # update the learning rate
6 lr_scheduler.step()
7 frames
in __getitem__(self, idx)
29 target["iscrowd"] = iscrowd
30 if self.transforms is not None:
---> 31 img, target = self.transforms(img, target)
32 return img, target
33
TypeError: __call__() takes 2 positional arguments but 3 were given
The above answer is incorrect; I accidentally upvoted it before noticing. You are using the wrong Compose. Note that the tutorial says:
https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html#putting-everything-together
"In references/detection/, we have a number of helper functions to simplify training and evaluating detection models. Here, we will use references/detection/engine.py, references/detection/utils.py and references/detection/transforms.py. Just copy them to your folder and use them here."
These are helper scripts. They define their own versions of the Compose and flip transforms:
https://github.com/pytorch/vision/blob/6315358dd06e3a2bcbe9c1e8cdaa10898ac2b308/references/detection/transforms.py#L17
I did the same thing before noticing this. Do not use the Compose from torchvision.transforms, or else you will get the error above. Download their module and import it instead.
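To illustrate the difference, here is a minimal sketch of a Compose-style class whose __call__ accepts and returns both the image and the target, which is what a call such as self.transforms(img, target) needs. It is written in the spirit of the helper module mentioned above, not copied from it; the names PairCompose and flip_horizontal are hypothetical, and a nested list stands in for the image so the sketch runs without PyTorch.

```python
class PairCompose:
    """Chain transforms that each take and return (image, target)."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        for transform in self.transforms:
            image, target = transform(image, target)
        return image, target


def flip_horizontal(image, target):
    """Hypothetical transform: mirror each image row and the box x-coordinates."""
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    flipped_boxes = [
        [width - x2, y1, width - x1, y2] for x1, y1, x2, y2 in target["boxes"]
    ]
    return flipped_image, {**target, "boxes": flipped_boxes}


transforms = PairCompose([flip_horizontal])
out_img, out_target = transforms([[1, 2, 3, 4]], {"boxes": [[0, 0, 1, 1]]})
print(out_img, out_target)
```

A Compose whose __call__ only accepts the image receives three positional arguments (self, img, target) in that call, which matches the TypeError in the question.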
I am kind of a newbie at this and I was also having the same problem.
Upon doing more research, I found this where the accepted answer used:
img = self.transforms(img)
instead of:
img, target = self.transforms(img, target)
Removing "target" solved the error for me and should solve it for you as well. Not entirely sure why even the official PyTorch tutorial also has "target" included but it does not work for us.
I had the same issue; there is even an issue raised on the PyTorch discussion forum about the same thing: T.Compose | TypeError: __call__() takes 2 positional arguments but 3 were given
I was able to overcome this issue by copying the files under vision/references/detection from the specific version, v0.3.0, used by the tutorial I am following: building-your-own-object-detector-pytorch-vs-tensorflow-and-how-to-even-get-started
Only to run into another issue, which I have raised here: ValueError: All bounding boxes should have positive height and width. Found invaid box [500.728515625, 533.3333129882812, 231.10546875, 255.2083282470703] for target at index 0. #2740

TypeError("Tensor is unhashable if Tensor equality is enabled. " K.learning_phase(): 0

I am porting a Keras, Tensorflow, and OpenCV script to TF2 and Keras 2 and have run into a problem. I am getting an error on K.learning_phase(): 0.
The error happens in this code section.
def detect_image(self, image):
    if self.model_image_size != (None, None):
        assert self.model_image_size[0]%32 == 0, 'Multiples of 32 required'
        assert self.model_image_size[1]%32 == 0, 'Multiples of 32 required'
        boxed_image = image_preporcess(np.copy(image), tuple(reversed(self.model_image_size)))
        image_data = boxed_image
    out_boxes, out_scores, out_classes = self.sess.run(
        [self.boxes, self.scores, self.classes],
        feed_dict={
            self.yolo_model.input: image_data,
            self.input_image_shape: [image.shape[0], image.shape[1]],
            tf.keras.learning_phase(): 0 })
Here is a gist of the full code:
https://gist.github.com/robisen1/31976de17af9e752c6ba8d1dd0e08906
Traceback (most recent call last):
File "webcam_detect.py", line 188, in <module>
r_image, ObjectsList = yolo.detect_image(frame)
File "webcam_detect.py", line 110, in detect_image
K.learning_phase(): 0
File "C:\Anaconda3\envs\simplecv\lib\site-packages\tensorflow_core\python\framework\ops.py", line 705, in __hash__
raise TypeError("Tensor is unhashable if Tensor equality is enabled. "
TypeError: Tensor is unhashable if Tensor equality is enabled. Instead, use tensor.experimental_ref() as the key.
(simplecv) PS C:\dev\lacv\yolov3\yolov3ct>
I am not sure what is going on. I would appreciate any insights.
You are trying to use TensorFlow 1.x code, which works in graph mode, whereas TensorFlow 2.x works in eager mode. TensorFlow 1.x requires users to manually stitch together an abstract syntax tree (the graph) by making tf.* API calls. It then requires users to manually compile the abstract syntax tree by passing a set of output tensors and input tensors to a session.run() call. TensorFlow 2.0 executes eagerly (like Python normally does), and in 2.0 graphs and sessions should feel like implementation details.
The error is due to the version difference. If you use a session in TF2, you need to use the compatibility version (tf.compat.v1), and the same goes for other operations. Also, in TF2 it is tf.keras.backend.learning_phase.
I would recommend going through the guide "Migrate your TensorFlow 1 code to TensorFlow 2".
For example, the code below throws an error similar to the one you are facing:
import tensorflow as tf
print(tf.__version__)
x = tf.constant(5)
y = tf.constant(10)
z = tf.constant(20)
# This will show same error.
tensor_set = {x, y, z}
tensor_dict = {x: 'five', y: 'ten', z: 'twenty'}
Output -
2.2.0
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-509b2d8d7ab1> in <module>()
6
7 # This will show same error.
----> 8 tensor_set = {x, y, z}
9 tensor_dict = {x: 'five', y: 'ten', z: 'twenty'}
10
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py in __hash__(self)
724 if (Tensor._USE_EQUALITY and executing_eagerly_outside_functions() and
725 (g is None or g.building_function)):
--> 726 raise TypeError("Tensor is unhashable. "
727 "Instead, use tensor.ref() as the key.")
728 else:
TypeError: Tensor is unhashable. Instead, use tensor.ref() as the key.
The code below fixes the issue:
import tensorflow as tf
print(tf.__version__)
x = tf.constant(5)
y = tf.constant(10)
z = tf.constant(20)
#This solves the issue
tensor_set = {x.experimental_ref(), y.experimental_ref(), z.experimental_ref()}
tensor_dict = {x.experimental_ref(): 'five', y.experimental_ref(): 'ten', z.experimental_ref(): 'twenty'}
Output -
2.2.0
WARNING:tensorflow:From <ipython-input-4-05e379e669d9>:12: Tensor.experimental_ref (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use ref() instead.
If you are still facing the error, then kindly share the reproducible code for the error like above. Will be happy to help you.
Hope this answers your question. Happy Learning.
Try disabling eager execution with tf.compat.v1.disable_eager_execution():
from tensorflow.compat.v1 import disable_eager_execution
disable_eager_execution()

Dask error - min() arg is an empty sequence

I'm trying to use Dask to handle a reasonably large dataset but I keep getting
ValueError: min() arg is an empty sequence
when I try to run .describe().compute()
I have confirmed the Describe works in normal Pandas with the same dataset so it must be dask related.
Here is the line I'm using:
inpFile = dd.read_csv(fPath, sep='\t', error_bad_lines= False,quoting=csv.QUOTE_NONE)
and the full error is:
ValueError Traceback (most recent call last)
in ()
----> 1 inpFile.describe().compute()
2 #inpFile2.describe()
/home/badrul/anaconda3/lib/python3.6/site-packages/dask/dataframe/core.py in describe(self, split_every)
1306 num = self._get_numeric_data()
1307
-> 1308 stats = [num.count(split_every=split_every),
1309 num.mean(split_every=split_every),
1310 num.std(split_every=split_every),
/home/badrul/anaconda3/lib/python3.6/site-packages/dask/dataframe/core.py in count(self, axis, split_every)
1191 token=token, split_every=split_every)
1192 if isinstance(self, DataFrame):
-> 1193 result.divisions = (min(self.columns), max(self.columns))
1194 return result
1195
ValueError: min() arg is an empty sequence
The read_csv call doesn't run for very long, though, so I suspect the data is not actually being loaded.
The error then comes when I do: inpFile.describe().compute()

Resources