Running pre-trained neural net model crashes whole system - machine-learning

I am fairly new to machine learning and have been trying to run a GAN from the source code here: https://github.com/tkarras/progressive_growing_of_gans
I have all the dependencies installed as far as I can tell, and I receive no errors when I run their import script. However, when I reach the line I have marked below to generate images from the imported generator, my system shuts off abruptly.
I get no error logs or system events other than the kernel power loss. I have tested some of the CUDA utility examples for bandwidth and device testing and those run with no problem, which leads me to believe it's not a hardware issue.
import pickle
import numpy as np
import tensorflow as tf
import PIL.Image

# Initialize TensorFlow session.
tf.InteractiveSession()

# Import official CelebA-HQ networks.
with open('karras2018iclr-celebahq-1024x1024.pkl', 'rb') as file:
    G, D, Gs = pickle.load(file)

# Generate latent vectors.
latents = np.random.RandomState(1000).randn(1000, *Gs.input_shapes[0][1:])  # 1000 random latents
latents = latents[[477, 56, 83, 887, 583, 391, 86, 340, 341, 415]]  # hand-picked top-10

# Generate dummy labels (not used by the official networks).
labels = np.zeros([latents.shape[0]] + Gs.input_shapes[1][1:])

# Run the generator to produce a set of images.
# !!!!!!!!!!! SYSTEM CRASHES ON THIS INSTRUCTION !!!!!!!!!!!
images = Gs.run(latents, labels)

# Convert images to PIL-compatible format.
images = np.clip(np.rint((images + 1.0) / 2.0 * 255.0), 0.0, 255.0).astype(np.uint8)  # [-1,1] => [0,255]
images = images.transpose(0, 2, 3, 1)  # NCHW => NHWC

# Save images as PNG.
for idx in range(images.shape[0]):
    PIL.Image.fromarray(images[idx], 'RGB').save('img%d.png' % idx)
However, I have had the same power-loss issue when running a different ML implementation that used Caffe, so at the moment I am at a loss as to what the core issue might be. Any ideas of what else I could test would be greatly appreciated.
System specs
- Windows 7
- 2x Intel Xeon CPU X5680 3.33 GHz
- 2x Nvidia Quadro M6000 GPUs
- 24 GB memory
- 1250 W power supply
- Miniconda 3 with Python 3.6.4
- CUDA version 9.0
- cuDNN version 7
- tensorflow_gpu version 1.7
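One diagnostic not mentioned in the post (a sketch only, assuming TensorFlow 1.x as listed above): restrict the run to a single GPU and cap its memory before creating the session. If the shutdown only occurs when both Quadros load up at once, that points toward power delivery rather than the software stack.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # expose only the first GPU to TensorFlow

import tensorflow as tf

# Cap GPU memory and grow allocations lazily instead of grabbing it all up front.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5, allow_growth=True)
config = tf.ConfigProto(gpu_options=gpu_options)
sess = tf.InteractiveSession(config=config)
# ...then run the import/generation script as before inside this session.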

Related

How to convert a Neural Network model to a standalone function source code

I have to run source code in an environment where I do not have access to third-party libraries, and I would like to be able to use a Neural Network model for predictions in that environment. I cannot run compiled code; it has to be source code.
I'd like to train my Neural Network using a popular library like Keras, PyTorch, TensorFlow, etc., and convert that model into a source-code function that can run in an environment that doesn't have access to that library. So the generated code can't be a wrapper around calls to the library; it needs to contain everything required to run the Neural Network without external dependencies. And it has to be source code, not compiled code.
While researching this I realised that most libraries have APIs to save the model in various serialized formats, but no way to generate code that can run on its own.
How would I go about doing that?
Just to be extremely clear, here is an example of the kind of code I would like to generate from a neural network model:
function predict(input) {
    // Here is all the code that contains the topology and weights of the network,
    // along with the code needed to load it up into memory and exercise the network,
    // using no external libraries.
    return output;
}
So what I'm looking for is a library, or a technique that would allow me to do:
var neuralNetworkSourceCode = myNeuralNetworkModel.toSourceCode();
// neuralNetworkSourceCode is a string containing the source code described in the previous example
It could save the source code to a file instead of just producing a string; that makes no difference to me. At this point I also don't care about the language it produces source code in, but ideally it would be one of these: C, C++, C#, Java, Python, JavaScript, Go or Rust.
Is there a library that does this? If not, how should I go about implementing this functionality?
Something similar was asked a while back: Convert Keras model to C++ or Convert Keras model to C, and those threads have some advice that may still be relevant. Neither mention keras2c (paper, code), which provides a k2c(model, ...) function after setting the library up.
Calling k2c on the model created from the Simple MNIST convnet example like this produces .csv files with the weights and some code to set up the predictions:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from keras2c import k2c

(x_train, y_train), (_, _) = keras.datasets.mnist.load_data()
x_train = np.expand_dims(x_train.astype("float32") / 255, -1)
y_train = keras.utils.to_categorical(y_train, 10)

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=128, epochs=15, validation_split=0.1)

k2c(model, "MNIST", malloc=True, num_tests=10, verbose=True)
Compile the contents of include/ with make, then:
gcc -std=c99 -I./include/ -o predict_mnist MNIST.c MNIST_test_suite.c -L./include/ -l:libkeras2c.a -lm
And running the executable shows a quick benchmark:
$ ./predict_mnist
Average time over 10 tests: 6.059000e-04 s
Max absolute error for 10 tests: 3.576279e-07
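For intuition on what such generated code boils down to, here is a hand-rolled sketch, not part of keras2c and only workable for a small Dense-only model (unlike the convnet above): dump the weights as literals and emit a pure-Python predict function with no imports at all. The helper name is hypothetical.
# Hypothetical helper: turn a trained Keras Sequential of Dense layers into
# standalone Python source with the weights inlined as literals.
def dense_model_to_source(model):
    lines = ["def predict(x):"]
    for i, layer in enumerate(model.layers):
        w, b = (arr.tolist() for arr in layer.get_weights())
        act = layer.get_config()["activation"]
        lines.append(f"    W{i}, b{i} = {w!r}, {b!r}")
        lines.append(f"    x = [sum(x[k] * W{i}[k][j] for k in range(len(W{i}))) + b{i}[j]"
                     f" for j in range(len(b{i}))]")
        if act == "relu":
            lines.append("    x = [v if v > 0 else 0.0 for v in x]")
    lines.append("    return x")
    return "\n".join(lines)

# neural_network_source_code = dense_model_to_source(model)  # analogous to toSourceCode()
A real exporter like keras2c does the same thing in principle, but handles convolutions, pooling and the rest, and writes the weights to separate files instead of inlining them.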

Detectron2 - Same Code & Data // Different platforms // highly divergent results

I use different hardware to benchmark multiple possibilities. The code runs in a Jupyter notebook.
When I evaluate the different losses I get highly divergent results.
I also checked the full .cfg with cfg.dump() - it is completely consistent across environments.
Detectron2 Parameters:
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/retinanet_R_101_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("dataset_train",)
cfg.DATASETS.TEST = ("dataset_test",)
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/retinanet_R_101_FPN_3x.yaml") # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025 # 0.00125 pick a good LR
cfg.SOLVER.MAX_ITER = 1200 # 300 iterations seems good enough for this toy dataset; you will need to train longer for a practical dataset
cfg.SOLVER.STEPS = [] # do not decay learning rate
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 512 # faster, and good enough for this toy dataset (default: 512)
#cfg.MODEL.ROI_HEADS.NUM_CLASSES = 25 # only has one class (ballon). (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
cfg.MODEL.RETINANET.NUM_CLASSES = 3
# NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.
cfg.OUTPUT_DIR = "/content/drive/MyDrive/Colab_Notebooks/testrun/output"
cfg.TEST.EVAL_PERIOD = 25
cfg.SEED=5
1. Environment: Azure
Microsoft Azure - Machine Learning
STANDARD_NC6
Torch: 1.9.0+cu111
Results:
Training Log: Log Azure
2. Environment: Colab
GoogleColab free
Torch: 1.9.0+cu111
Results:
Training Log: Log Colab
EDIT:
3. Environment: Ubuntu
Ubuntu 22.04
RTX 3080
Torch: 1.9.0+cu111
Results:
Training Log: https://pastebin.com/PwXMz4hY
New dataset
The issue is not reproducible with a larger dataset.
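One thing worth ruling out that the post does not mention (a sketch only, not a confirmed fix): even with cfg.SEED set, PyTorch can still select non-deterministic cuDNN kernels, and different GPUs round floating-point results differently, so some divergence across platforms is expected. Pinning every RNG and forcing deterministic cuDNN narrows the comparison down to genuine hardware differences:
import random
import numpy as np
import torch

def seed_everything(seed=5):
    # Seed every RNG the training loop can touch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for reproducibility in cuDNN.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

seed_everything(5)  # call before building the trainer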

What's the right way of using OpenCV with OpenVINO?

Disclaimer: I have never used OpenCV or OpenVINO, or for that fact anything even close to ML before. However, I've been slamming my head studying neural networks (reading material online) because I have to work with Intel's OpenVINO on an edge device.
Here's what the official documentation says about using OpenCV with OpenVINO (using OpenVINO's inference engine with OpenCV):
-> Optimize the pretrained model with OpenVINO's model optimizer (creating the IR file pair)
-> Use these IR files with OpenCV's dnn.readNet() // this is where the inference engine gets set?
https://docs.openvinotoolkit.org/latest/_docs_install_guides_installing_openvino_raspbian.html
I tried digging more and found a third-party reference, where a different approach is taken:
-> Intermediate files (bin/xml) are not created; instead the Caffe model file is used directly
-> The inference engine is set explicitly with the following line:
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
https://www.learnopencv.com/using-openvino-with-opencv/
Now I know that to utilize OpenCV we have to use its inference engine with pretrained models. I want to know which of the two approaches is the correct (or preferred) one, and whether I'm missing out on something.
You can get started using OpenVINO from: https://docs.openvinotoolkit.org/latest/_docs_install_guides_installing_openvino_windows.html
You will require a set of prerequisites to run the samples. OpenCV is the computer vision package, which can be used for image processing.
OpenVINO inference requires you to convert any of your trained models (.caffemodel, .pb, etc.) to the Intermediate Representation (.xml, .bin) pair of files.
For a better understanding and sample demos of OpenVINO, watch the videos on or subscribe to the OpenVINO YouTube channel: https://www.youtube.com/channel/UCkN8KINLvP1rMkL4trkNgTg
If the topology that you are using is supported by OpenVINO, the best way is to use the OpenCV build that comes with OpenVINO. For that you need to:
1. Initialize the OpenVINO environment by running setupvars.bat in your OpenVINO path (C:\Program Files (x86)\IntelSWTools\openvino\bin).
2. Generate the IR files (xml & bin) for your model using the model optimizer.
3. Run using the inference engine samples in the path /inference_engine_samples_build/.
If the topology is not supported, then you can go for the other procedure that you mentioned.
The most common issues I ran into:
setupvars.bat must be run within the same terminal, or use os.environ["varname"] = varvalue
OpenCV needs to be built with support for the inference engines (ie DLDT). There are pre-built binaries here: https://github.com/opencv/opencv/wiki/Intel%27s-Deep-Learning-Inference-Engine-backend
Target inference engine: net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
Target NCS2: net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)
The OpenCV pre-built binary located in the OpenVino directory already has IE support and is also an option.
Note that the Neural Compute Stick 2, AKA NCS2 (OpenVINO IE/VPU/MYRIAD), requires FP16 model format (float16). Also try to keep your image in this format to avoid conversion penalties. You can input images in any of these formats, though: FP32, FP16, U8.
I found this guide helpful: https://learnopencv.com/using-openvino-with-opencv/
Here's an example targeting the NCS2, from https://medium.com/sclable/intel-openvino-with-opencv-f5ad03363a38:
import cv2

# Load the model.
net = cv2.dnn.readNet(ARCH_FPATH, MODEL_FPATH)

# Specify target device.
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)  # NCS 2

# Read an image.
print("Processing input image...")
img = cv2.imread(IMG_FPATH)
if img is None:
    raise Exception(f'Image not found here: {IMG_FPATH}')

# Prepare input blob and perform inference.
blob = cv2.dnn.blobFromImage(img, size=(672, 384), ddepth=cv2.CV_8U)
net.setInput(blob)
out = net.forward()

# Draw detected faces.
for detect in out.reshape(-1, 7):
    conf = float(detect[2])
    xmin = int(detect[3] * img.shape[1])
    ymin = int(detect[4] * img.shape[0])
    xmax = int(detect[5] * img.shape[1])
    ymax = int(detect[6] * img.shape[0])
    if conf > CONF_THRESH:
        cv2.rectangle(img, (xmin, ymin), (xmax, ymax), color=(0, 255, 0))
There are more samples here (jupyter notebook/python): https://github.com/sclable/openvino_opencv

Does sklearn clustering output differ due to machine?

I am using the sklearn AffinityPropagation clustering algorithm. The output of the clustering algorithm on my 4-core machine is different from what is generated on a typical server machine. Can someone suggest a method so that I can get similar output on both systems?
I am using similar feature vectors on both machines.
Output on my machine is cluster0: [1,2,3], cluster1: [4,5,6], but on the server it is cluster0: [1,2], cluster1: [3,4], cluster2: [5].
from keras.applications.xception import Xception
from keras.preprocessing import image
from keras.applications.xception import preprocess_input
from keras.models import Model
from sklearn.cluster import AffinityPropagation
import cv2
import glob
base_model = Xception(weights = model_path)
base_model=Model(inputs=base_model.input,outputs=base_model.get_layer('avg_pool').output)
files = glob.glob("*.jpg")
image_vector = []
for f in files:
    image = cv2.imread(f)
    temp_vector = base_model.predict(image)
    image_vector.append(temp_vector)
import numpy as np
image_vector = np.asarray(image_vector)
clustering = AffinityPropagation()
clustering.fit(image_vector)
Packages:
scikit-learn 0.20.3
sklearn 0.0
tensorflow 1.12.0
keras 2.2.4
opencv-python
Machine 1: 4 cores, 8 GB RAM
Machine 2: 7 cores, 16 GB RAM
Results on different machines can be different when running algorithms that are not deterministic.
I suggest that you fix the random seed of numpy and the random seed of Python if you want to be able to reproduce results across machines for such algorithms.
The Python random seed can be fixed with random.seed(42) (or any other integer).
The numpy random seed can be fixed with np.random.seed(12345) (or any other integer).
sklearn and Keras use numpy's random number generator, so the second option by itself could solve your issue.
This answer assumes that all libraries versions are the same on both systems.
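Putting the two suggestions together, a minimal sketch of where the seeds would go in the question's script (the seed values are arbitrary):
import random
import numpy as np
from sklearn.cluster import AffinityPropagation

# Fix both RNGs before any Keras or sklearn code runs.
random.seed(42)
np.random.seed(12345)

# ...build base_model and compute image_vector as in the question...
clustering = AffinityPropagation()
clustering.fit(image_vector)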

CUDA_ERROR_OUT_OF_MEMORY: How to activate multiple GPUs from Keras in Tensorflow

I am running a large model on TensorFlow using Keras, and toward the end of training the Jupyter notebook kernel stops and in the command line I have the following error:
2017-08-07 12:18:57.819952: E tensorflow/stream_executor/cuda/cuda_driver.cc:955] failed to alloc 34359738368 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
This I guess is simple enough - I am running out of memory. I have 4 NVIDIA 1080 Ti GPUs. I know that TF uses only one unless specified otherwise. Therefore, I have 2 questions:
Is there a good working example of how to utilise all GPUs in Keras?
In Keras, it seems it is possible to set gpu_options.allow_growth=True, but I cannot see exactly how to do this (I understand this is being a help-vampire, but I am completely new to DL on GPUs).
see CUDA_ERROR_OUT_OF_MEMORY in tensorflow
See this Official Keras Blog
Try this:
import keras.backend as K
config = K.tf.ConfigProto()
config.gpu_options.allow_growth = True
session = K.tf.Session(config=config)
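For the first question (using all GPUs), later Keras 2.x releases ship keras.utils.multi_gpu_model, which replicates a model across several GPUs and splits each batch between them. A sketch, assuming Keras >= 2.0.9 and an already-defined model plus training arrays x_train/y_train (placeholder names):
from keras.utils import multi_gpu_model

# Replicate the model on 4 GPUs; each batch is split into 4 sub-batches,
# run in parallel, and the results are merged back on the CPU.
parallel_model = multi_gpu_model(model, gpus=4)
parallel_model.compile(loss='categorical_crossentropy', optimizer='adam')
parallel_model.fit(x_train, y_train, epochs=10, batch_size=256)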
