My kernel keeps dying in Jupyter Notebook when I run the fit function - machine-learning

My kernel keeps dying when I run the fit function.
My TensorFlow version is 2.6.0.
I've reinstalled Jupyter Notebook, upgraded pip, upgraded my TensorFlow library, and added these lines:
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'
and still my kernel keeps dying.
This is the code I tried to run:
# assuming the callbacks come from tf.keras
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

learning_rate_reduction = ReduceLROnPlateau(monitor='val_acc', patience=3, verbose=1, factor=0.5, min_lr=0.00001)
es = EarlyStopping(monitor='val_categorical_accuracy', patience=4)
print('====')
history = model.fit_generator(generator=train_batches, steps_per_epoch=train_batches.n // batch_size, epochs=epochs,
                              validation_data=val_batches, validation_steps=val_batches.n // batch_size, verbose=0,
                              callbacks=[learning_rate_reduction, es])

In my experience you should try one of these:
Check your environment variables:
make sure CUDA_PATH is set
make sure the paths to cuda/bin, cuda/include and cuda/lib/x64 are added to PATH in your system variables
Check whether the model is too complex by training a smaller, simpler model first (see the sketch after this list)
Make sure Anaconda Navigator is up to date
In my experience Python 3.8.5 and TensorFlow 2.7 work together and can be installed from the Environments tab in Anaconda Navigator
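Here is a minimal sketch of that kind of sanity check, assuming a standard TensorFlow 2.x install; the data and model below are just placeholders:
import os
import tensorflow as tf

# Check that the CUDA-related environment variables are visible to Python
print("CUDA_PATH:", os.environ.get("CUDA_PATH"))
print("GPUs visible to TensorFlow:", tf.config.list_physical_devices("GPU"))

# Train a deliberately tiny model on random data; if this also kills the kernel,
# the problem is the environment (CUDA/PATH), not the model's complexity.
x = tf.random.normal((256, 10))
y = tf.random.uniform((256,), maxval=2, dtype=tf.int32)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=1, verbose=1)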

If it breaks even with a simple model, it means something is wrong with the PATH in your system environment.
If you're using VS Code you might have to set all the variables first before using it.
If you're using Anaconda you can install the packages directly from its download section.
I'm using TensorFlow 2.8 and Python 3.10 and it still works.
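As a side note (probably unrelated to the crash): fit_generator is deprecated in TensorFlow 2.x, and model.fit accepts generators directly, so the call in your question can also be written as:
history = model.fit(train_batches,
                    steps_per_epoch=train_batches.n // batch_size,
                    epochs=epochs,
                    validation_data=val_batches,
                    validation_steps=val_batches.n // batch_size,
                    verbose=0,
                    callbacks=[learning_rate_reduction, es])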

Related

Kivy App gives "Singular Matrix" error on phone, but not on computer

I've encountered an issue with my self-written Kivy app that I haven't found documented anywhere online. Any help would be greatly appreciated.
The issue is as follows. My code involves NumPy matrix inversion and works absolutely fine when I run it on my computer. But as soon as I run it either on a simulated iPhone in Xcode or on my personal phone, I get a numpy.linalg.LinAlgError: Singular matrix error, even though the matrix in question is definitely not singular.
EDIT:
On Computer:
Numpy version: 1.19.1
Output of numpy.show_config():
blas_mkl_info:
    NOT AVAILABLE
blis_info:
    NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
    NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
On simulated phone:
Numpy version: 1.16.4
Output of numpy.show_config():
lapack_mkl_info:
    NOT AVAILABLE
openblas_lapack_info:
    NOT AVAILABLE
openblas_clapack_info:
    NOT AVAILABLE
atlas_3_10_threads_info:
    NOT AVAILABLE
atlas_3_10_info:
    NOT AVAILABLE
atlas_threads_info:
    NOT AVAILABLE
atlas_info:
    NOT AVAILABLE
accelerate_info:
    extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
lapack_opt_info:
    extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
The cause of your issue is almost certainly the -Wl,Accelerate in your linker args. Accelerate ships with a very old and somewhat broken version of LAPACK, which is why, as of https://github.com/numpy/numpy/pull/15759 (unreleased 1.20), it is no longer supported at all.
If you can rebuild Kivy's numpy with ATLAS=None BLAS=None LAPACK=None set in your environment variables, you'll end up without this Accelerate dependency.
You may have to dig around starting at https://github.com/kivy/python-for-android/blob/develop/pythonforandroid/recipes/numpy/__init__.py to work out how to pass this into Kivy's build.
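If you want to confirm on-device that the backend (rather than your data) is to blame before rebuilding, a minimal diagnostic sketch like the following can help; it assumes nothing beyond numpy being importable on the device:
import numpy as np

# Print which BLAS/LAPACK backend this numpy build is using
np.show_config()

# Invert a matrix that is well conditioned by construction; if even this raises
# LinAlgError("Singular matrix"), the LAPACK backend (Accelerate) is at fault,
# not the matrices produced by the app.
a = np.eye(4) + 0.01 * np.arange(16).reshape(4, 4)
print("condition number:", np.linalg.cond(a))
print(np.linalg.inv(a))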

PyDrake ComputePointPairPenetration() kills kernel

Calling ComputePointPairPenetration() on a QueryObject in Drake's Python bindings, in a Jupyter Notebook environment, reliably kills the kernel. I'm not sure what's causing it and couldn't figure out how to get any error message.
In case it's relevant, I'm running pydrake locally on a Mac.
Here is the relevant code:
builder = DiagramBuilder()
plant, scene_graph = AddMultibodyPlantSceneGraph(builder, time_step=0.00001)
file_name = FindResource("models/robot.urdf")
model = Parser(plant).AddModelFromFile(file_name)
file_name = FindResource("models/object.urdf")
object_model = Parser(plant).AddModelFromFile(file_name)
plant.Finalize()
diagram = builder.Build()
# Run simulation...
# Get geometry info from scene graph
context = scene_graph.AllocateContext()
q_obj = scene_graph.get_query_output_port().Eval(context)
q_obj.ComputePointPairPenetration()
Edit:
@Sherm's comment fixed my problem :) Thank you so much!
For reference:
diagram_context = diagram.CreateDefaultContext()
scene_graph_context = scene_graph.GetMyContextFromRoot(diagram_context)
q_obj = scene_graph.get_query_output_port().Eval(scene_graph_context)
q_obj.ComputePointPairPenetration()
You created a local Context for scene_graph. Instead you want the full diagram context so that the ports are connected up properly (e.g. scene_graph has an input port that receives poses from MultibodyPlant). So the above should work if you ask the Diagram to create a Context, then ask for the SceneGraph subcontext for the calls you have above, rather than creating a standalone SceneGraph context.
This lets you extract the fully-connected subcontext:
scene_graph_context = scene_graph.GetMyContextFromRoot(diagram_context)
FTR Here's a similar formulation in a Drake Python unittest:
TestPlant.test_scene_graph_queries
Note that this takes an alternate route (using diagram.GetMutableSubsystemContext instead of scene_graph.GetMyContextFromRoot), namely because it's doing scalar-type conversion as well.
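For completeness, a minimal sketch of that alternate route, assuming the same diagram and scene_graph objects as above:
# Ask the diagram for its root context, then pull out the SceneGraph subcontext
diagram_context = diagram.CreateDefaultContext()
scene_graph_context = diagram.GetMutableSubsystemContext(scene_graph, diagram_context)
q_obj = scene_graph.get_query_output_port().Eval(scene_graph_context)
q_obj.ComputePointPairPenetration()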
If you're curious about scalar-type conversion (esp. if you're going to be doing optimization, e.g. needing AutoDiffXd), please see:
Drake C++ API: System Scalar Conversion Overview
Drake Python API: pydrake.systems.scalar_conversion
Additionally, here are examples of scalar-converting both MultibodyPlant and SceneGraph for testing InverseKinematics constraint classes:
inverse_kinematics.py: TestConstraints

openCV imshow in WSL using Xming

I am working on some video processing tasks and have been using opencv-python 4.2.0 as my go-to library. At first there was a problem with displaying video frames using the imshow function - I would only see a small black window, but I thought there was something wrong with my logic. I tried reproducing the problem in its simplest form - loading and displaying a static image:
import cv2
frame = cv2.imread("path/to/some/image.png")
print(frame.shape)
cv2.imshow('test', frame)
The output:
>>> (600, 600, 3)
I have not had similar problems in this development environment before. I am developing under WSL (Ubuntu 16.04) and use Xming to display the program's window under Win10.
The image in the window is only updated when waitKey() is executed, so you have to call it:
import cv2

frame = cv2.imread("path/to/some/image.png")
print(frame.shape)
cv2.imshow('test', frame)
cv2.waitKey(1)  # gives the GUI event loop a chance to draw/refresh the window
At least this resolves the problem on Linux Mint 19.3 (based on Ubuntu 18.04).
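Since the original task is video processing, here is a minimal sketch of how the same waitKey() call fits into a frame-display loop (the video path is just a placeholder):
import cv2

cap = cv2.VideoCapture("path/to/some/video.mp4")
while True:
    ok, frame = cap.read()
    if not ok:  # end of stream or read error
        break
    cv2.imshow("video", frame)
    # waitKey both refreshes the window and polls the keyboard; ~30 ms is roughly 30 fps
    if cv2.waitKey(30) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()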

Image processing in TensorFlow distributed session

I am testing out distributed TensorFlow (https://www.tensorflow.org/deploy/distributed) with my local machine (Windows) and an Ubuntu VM.
I followed this link, Distributed tensorflow replicated training example: grpc_tensorflow_server - No such file or directory, and set up the TensorFlow server as below.
import tensorflow as tf
parameter_servers = ["10.0.3.15:2222"]
workers = ["10.0.3.15:2222","10.0.3.15:2223"]
cluster = tf.train.ClusterSpec({"local": parameter_servers, "worker": workers})
server = tf.train.Server(cluster, job_name="local", task_index=0)
server.join()
Here "10.0.3.15" is my Ubuntu VM's local IP address.
On the Windows host machine I am doing some simple image preprocessing using OpenCV and extending the graph session to the VM. I used the following code for that:
import tensorflow as tf
from OpenCVTest import *

with tf.Session("grpc://10.0.3.15:2222") as sess:
    ### OpenCV calling section ###
    img = cv2.imread('Data/ball.jpg')
    grey_img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    flat_img_array = img.flatten()
    x = tf.placeholder(tf.float32, shape=(flat_img_array[0], flat_img_array[1]))
    y = tf.multiply(x, x)
    sess.run(y)
I can see that my session is running on my Ubuntu machine. Please see the screenshot below.
Test_result
[Note: in the image you can see that the Windows console is calling the session and the Ubuntu terminal is listening on that same session.]
But the strange thing I have observed is that the OpenCV preprocessing operation (grey_img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)) is using the local OpenCV package. I was under the assumption that when I run a session on another server, all the operations should run on that server. In my case, as I am running the session on the Ubuntu VM, every operation defined under tf.Session("grpc://10.0.3.15:2222") should also run on that Ubuntu VM using the VM's local packages, but that's not happening.
Is my understanding of how sess.run(y) is distributed correct? When we run a session in a distributed manner, does it only offload the graph computation to the other machine through gRPC?
I would summarize my ask like this: "I am planning to do heavy preprocessing before feeding values to the tensors, and I want to do it in a distributed way. What would be the better approach to follow? My initial understanding was that I could do it with distributed TensorFlow, but with this test I think I may not be able to."
Any thoughts would be of real help.
Thank you.
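For what it's worth, only operations that are part of the TensorFlow graph are executed by the remote worker; plain Python calls such as cv2.cvtColor always run in the local process. A minimal sketch of keeping the preprocessing inside the graph (assuming the image file is also available on the VM's filesystem) could look like this:
import tensorflow as tf

with tf.Session("grpc://10.0.3.15:2222") as sess:
    # These are graph ops, so the worker executes them (and reads the file on the VM)
    file_contents = tf.read_file('Data/ball.jpg')
    img = tf.image.decode_jpeg(file_contents, channels=3)
    grey = tf.image.rgb_to_grayscale(img)
    y = tf.multiply(tf.cast(grey, tf.float32), tf.cast(grey, tf.float32))
    result = sess.run(y)  # only the graph computation travels over gRPC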

load_balanced_view not working properly

I have some code that used to work with ipyparallel using a load_balanced_view. I used to be able to do:
from ipyparallel import Client

rcAll = Client()
lbvAll = rcAll.load_balanced_view()

for anInpt in allInpt:
    lbvAll.apply(doAll, anInpt)

lbvAll.wait()
lbvAll.get_result()
and then
lbvAll.results.values()
would be a list of the results.
However, lbvAll.apply() no longer works for me.
I can do
result = lbvAll.map_sync(doAll, allInpt)
and result is returned as a list of results.
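If you still want per-task submission rather than map_sync, a minimal sketch of the asynchronous pattern (assuming doAll and allInpt are defined as before) is to keep the AsyncResult handles returned by apply_async and call get() on them, instead of reading lbvAll.results afterwards:
from ipyparallel import Client

rcAll = Client()
lbvAll = rcAll.load_balanced_view()

# Submit one task per input and keep the AsyncResult handles explicitly
async_results = [lbvAll.apply_async(doAll, anInpt) for anInpt in allInpt]

# get() blocks until each task has finished and returns its value
results = [ar.get() for ar in async_results]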
Using 2-4 cores/engines, there is not much improvement with 4 cores over 2.
My feeling is that ipyparallel has changed, but I am not sure, and I do not seem to be using it correctly. Thanks in advance for any help.
I am using Python 3.4.5, IPython 5.1.0 and ipyparallel 5.2.0 on Linux.
