I am trying to cross-validate my model using scikit-learn's cross_val_score.
I tried using multiple cores by setting n_jobs=-1 but it resulted in
OSError: [Errno 28] No space left on device
The code given below results in an error:
cross_val_score(mod1, train_feats1, target, cv=5, scoring=make_scorer(accuracy_score), n_jobs=-1)
whereas:
cross_val_score(mod1, train_feats1, target, cv=5, scoring=make_scorer(accuracy_score), n_jobs=1)
works perfectly fine.
Is there something I'm doing wrong?
As far as I can tell, Kaggle allows up to 4 CPUs for parallel computation.
Here's the link: https://www.kaggle.com/product-feedback/39790
How can I parallelize my cross-validation process, using all four CPUs?
I overcame this problem by setting the JOBLIB_TEMP_FOLDER environment variable, using the following command in the Python notebook:
%env JOBLIB_TEMP_FOLDER=/tmp
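If you are running plain Python rather than a notebook, setting the variable through os.environ before the parallel call should have the same effect; a self-contained sketch (the classifier and synthetic data are only stand-ins for mod1, train_feats1 and target from the question):

import os
os.environ['JOBLIB_TEMP_FOLDER'] = '/tmp'  # must be set before the joblib workers start

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
mod1 = RandomForestClassifier(n_estimators=100, random_state=0)

scores = cross_val_score(mod1, X, y, cv=5,
                         scoring=make_scorer(accuracy_score), n_jobs=-1)
print(scores.mean())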
Hope that helps!
I have created a new tflite model based on MobilenetV2. It works well without quantization using the CPU on iOS. I should say that the TensorFlow team did a great job, many thanks.
Unfortunately there is a problem with latency. I use an iPhone 5s to test my model, and I have the following results for the CPU:
500ms for MobilenetV2 with 224*224 input image.
250-300ms for MobilenetV2 with 160*160 input image.
I used the following pod 'TensorFlowLite', '~> 1.13.1'
It's not enough, so I have read the TF documentation related to optimization (post-training quantization). I suppose I need to use Float16 or UInt8 quantization and the GPU Delegate (see https://www.tensorflow.org/lite/performance/post_training_quantization).
I used TensorFlow v2.1.0 to train and quantize my models.
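The conversion step should look roughly like this (a sketch based on the linked post-training quantization guide; the SavedModel path and output file name are placeholders, not my exact script):

import tensorflow as tf

# Post-training float16 quantization of the weights (TF 2.x converter API).
converter = tf.lite.TFLiteConverter.from_saved_model('path/to/saved_model')  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()

with open('mobilenet_v2_fp16.tflite', 'wb') as f:
    f.write(tflite_model)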
Float16 quantization of weights (I used a MobilenetV2 model after Float16 quantization)
https://github.com/tensorflow/examples/tree/master/lite/examples/image_segmentation/ios
pod 'TensorFlowLiteSwift', '0.0.1-nightly'
No errors, but the model doesn't work
pod 'TensorFlowLiteSwift', '2.1.0'
2020-05-01 21:36:13.578369+0300 TFL Segmentation[6367:330410] Initialized TensorFlow Lite runtime.
2020-05-01 21:36:20.877393+0300 TFL Segmentation[6367:330397] Execution of the command buffer was aborted due to an error during execution. Caused GPU Hang Error (IOAF code 3)
Full integer quantization of weights and activations
pod 'TensorFlowLiteGpuExperimental'
Code sample: https://github.com/makeml-app/MakeML-Nails/tree/master/Segmentation%20Nails
I used a MobilenetV2 model after uint8 quantization.
// Configure the GPU delegate to trade precision for speed and wait actively.
GpuDelegateOptions options;
options.allow_precision_loss = true;
options.wait_type = GpuDelegateOptions::WaitType::kActive;
// delegate = NewGpuDelegate(nullptr);  // default options
delegate = NewGpuDelegate(&options);
// Hand the graph to the GPU delegate; fall back to the CPU if this fails.
if (interpreter->ModifyGraphWithDelegate(delegate) != kTfLiteOk) {
    // handle the error, e.g. log it and keep running on the CPU
}
Segmentation Live[6411:331887] [DYMTLInitPlatform] platform initialization successful
Loaded model 1
resolved reporter
Didn't find op for builtin opcode 'PAD' version '2'
Is it possible to use a quantized MobilenetV2 model on iOS somehow? Hopefully I made some mistake :) and it's possible.
Best regards,
Dmitriy
This is a link to the GitHub issue with answers: https://github.com/tensorflow/tensorflow/issues/39101
Sorry for the outdated documentation - the GPU delegate should be included in TensorFlowLiteSwift 2.1.0. However, it looks like you're using the C API, so depending on TensorFlowLiteC would be sufficient.
MobileNetV2 does work with the TFLite runtime on iOS, and if I recall correctly it doesn't have a PAD op. Can you attach your model file? With the information provided it's a bit hard to see what's causing the error. As a sanity check, you can get quant/non-quant versions of MobileNetV2 from here: https://www.tensorflow.org/lite/guide/hosted_models
For the int8 quantized model - AFAIK the GPU delegate only works for FP32 and (possibly) FP16 inputs.
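As another quick sanity check before deploying to iOS, you can load the converted model with the Python TFLite interpreter and inspect its input/output details (a sketch; the file name is a placeholder):

import tensorflow as tf

# If the model loads and allocates here, the .tflite file itself is sound;
# the input details also show whether the input dtype is float32 or uint8.
interpreter = tf.lite.Interpreter(model_path='mobilenet_v2_quant.tflite')  # placeholder file
interpreter.allocate_tensors()
print(interpreter.get_input_details())
print(interpreter.get_output_details())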
I'm an experienced developer, new to Machine Learning. I'm experimenting with Keras/TensorFlow, starting with the mnist_mlp.py example. I installed Keras and TensorFlow using pip on a Mac.
In order to understand the inner workings better, instead of running the file ('python mnist_mlp.py'), I'm cutting and pasting the file contents into a Python (2.7.12) interactive window.
Everything runs fine and I get the 98.4% test accuracy as noted in the comments of that file.
What I want to do next is to feed it novel input and use model.predict() to see how it performs. I create 28x28 images in GIMP and bring them into my Python session (being careful to convert from 4-channel, 8-bit RGBA images to a linear single-channel floating-point array).
When I feed this into the model, I get what look like strange results to me. Some images are correctly categorized while others are wildly off.
They look like perfectly reasonable numbers to me, and they match the MNIST set examples pretty closely. When I extract the array back out and look at it, it looks OK, so it doesn't seem to be a flipping or flopping issue. When I feed MNIST images in the same way, they appear to work correctly.
I'm not sure what's going on here. Is it a case of overfitting? Why is the validation data set the same as the test set?
Test images and python code with instructions can be found here:
https://s3.amazonaws.com/stackoverflow-47799896/StackOverflow_47799896.zip
Thanks.
EDIT: I tried the same test with the convnet example (mnist_cnn.py) and got slightly better results but still similar errors. If anyone wants to try that, they can use the same functions in the readme.py file but make these changes:
import numpy as np

x = np.ndarray((1, 28, 28, 1), dtype='float32')

def l(s):
    # Read one channel value per pixel of a raw 28x28 4-channel dump into x.
    with open(s, 'rb') as fd:
        _ = fd.read(1)                     # skip a leading byte
        for i in xrange(28):
            for j in xrange(28):
                v = ord(fd.read(1))        # first channel of the pixel
                x[0][i][j][0] = v / 255.0  # scale to [0, 1]
                _ = fd.read(3)             # skip the remaining 3 channels
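For comparison, an equivalent conversion that goes through Pillow instead of raw bytes might look like this (a sketch; it assumes the GIMP exports are 28x28 RGBA image files, and load_digit is just a hypothetical helper name):

import numpy as np
from PIL import Image

def load_digit(path):
    img = Image.open(path).convert('L')             # collapse RGBA to a single greyscale channel
    arr = np.asarray(img, dtype='float32') / 255.0  # scale to [0, 1]
    return arr.reshape(1, 28, 28, 1)                # batch of one, mnist_cnn-style shape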
EDIT 2: Interestingly, if I replace the first 19 items in the training data set (out of 60,000) with my images in the MLP case, I get at or near perfect prediction of all my images after training. Does this suggest overfitting?
I am running a large model on tensorflow using Keras and toward the end of the training the jupyter notebook kernel stops and in the command line I have the following error:
2017-08-07 12:18:57.819952: E tensorflow/stream_executor/cuda/cuda_driver.cc:955] failed to alloc 34359738368 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
This I guess is simple enough - I am running out of memory. I have 4 NVIDIA 1080ti GPUs. I know that TF uses only one unless specified. Therefore, I have 2 questions:
Is there a good working example of how to utilise all GPUs in Keras?
In Keras, it seems it is possible to set gpu_options.allow_growth=True, but I cannot see exactly how to do this (I understand this is being a help-vampire, but I am completely new to DL on GPUs).
see CUDA_ERROR_OUT_OF_MEMORY in tensorflow
See this Official Keras Blog
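For the first question, recent Keras releases (2.0.9+) ship keras.utils.multi_gpu_model, which replicates a model across GPUs and splits each batch between them; a minimal sketch (the tiny model here is only a placeholder):

from keras.models import Sequential
from keras.layers import Dense
from keras.utils import multi_gpu_model

model = Sequential([Dense(10, activation='softmax', input_shape=(100,))])  # placeholder model
parallel_model = multi_gpu_model(model, gpus=4)  # one replica per 1080 Ti
parallel_model.compile(loss='categorical_crossentropy', optimizer='adam')
# parallel_model.fit(x_train, y_train, batch_size=256, epochs=10)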
Try this:
import keras.backend as K

# Let TensorFlow grow GPU memory on demand instead of grabbing it all up front.
config = K.tf.ConfigProto()
config.gpu_options.allow_growth = True
session = K.tf.Session(config=config)
K.set_session(session)  # make this the session Keras uses
I'm trying to learn various ways to monitor tensorflow weight tensor.
I know we can watch these variable tensors through Session.run(), tf.Print(), tf.py_func(), and tools like TensorBoard, tdb, and tfdbg
(https://wookayin.github.io/tensorflow-talk-debugging/#1)
But is it impossible to use an IDE (like PyCharm) for this?
I tried it myself, but couldn't find a place to set a breakpoint.
Please tell me if you have succeeded in debugging tensors with an IDE. Thank you!
I debug TF daily in PyCharm. I set breakpoints as usual in the left column of the editor next to the line numbers. You can also hover over the tensors (see the yellow tooltip) to see their shape and summarized contents.
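A minimal pattern for breakpoint-based inspection is to pull the tensor value out with Session.run() and break on the resulting NumPy array; a sketch using the TF 1.x API the question refers to:

import tensorflow as tf

w = tf.Variable(tf.random_normal([3, 3]), name='w')  # a weight tensor to watch
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    w_val = sess.run(w)  # set a PyCharm breakpoint here; w_val is a plain NumPy array
    print(w_val.shape, w_val.mean())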
Does anyone know of a function for plotting the measures obtained in Caffe? I would like to plot train loss, test loss, test accuracy, the training moving average, etc. in one plot. Is there any function available online, other than Caffe's built-in one?
Edited:
First, I ran the parse_log.py file (the following command):
$python /path/to/caffe/tools/extra/parse_log.py /logfile_path/logfile.log /output_dir
Two files are created based on the log file (logfile.log.train and logfile.log.test). After that, I ran the plot_training_log.py file. It has options like:
0: Test accuracy vs. Iters
1: Test accuracy vs. Seconds
2: Test loss vs. Iters
3: Test loss vs. Seconds
4: Train learning rate vs. Iters
5: Train learning rate vs. Seconds
6: Train loss vs. Iters
7: Train loss vs. Seconds
Whenever I choose option 3, it shows the following graph:
and by choosing option 0:
However, whenever I want to plot the train-loss figure, it gives an error:
$python /path/to/caffe/tools/extra/plot_training_log.py.example 6 /output_dir/train_loss_cnn1.png ./logfile.log
Traceback (most recent call last):
File "/home/ss/caffe-master/tools/extra/plot_training_log.py.example", line 191, in <module>
plot_chart(chart_type, path_to_png, path_to_logs)
File "/home/ss/caffe-master/tools/extra/plot_training_log.py.example", line 117, in plot_chart
data = load_data(data_file, x, y)
File "/home/ss/caffe-master/tools/extra/plot_training_log.py.example", line 88, in load_data
data[1].append(float(fields[field_idx1].strip()))
ValueError: invalid literal for float(): 0.522037s/50
My question can be divided into three parts:
Are the plots correct? Is the network behaving well?
Where does this error stem from? I have the following columns in logfile.log.train (#Iters | Seconds | TrainingLoss | LearningRate).
How can I show all chart types in one plot? I tried to combine them with commas, like 0,2,3,6; however, it shows an error.
Many thanks in advance.
Take a look at parse_log.py found in $CAFFE_ROOT/tools/extra.
This Python utility helps with parsing and distilling information from a Caffe training log.
Start training your model by executing the command below:
/home/ubuntu/caffe/build/tools/caffe train --solver /home/ubuntu/yourpath/solver.prototxt 2>&1 | tee /home/ubuntu/yourpath/model_train.log
The training logs will be stored under yourpath/model_train.log.
I haven't looked at Caffe's built-in plot scripts, but I use the script from here. This only plots your train/test loss, but you can add a moving average calculation.
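If you would rather plot directly from the files that parse_log.py writes, a small pandas/matplotlib script can combine several measures in one figure; a sketch, assuming the column layout quoted in the question, with the separator and column indices adjusted to whatever your parse_log.py version emits:

import pandas as pd
import matplotlib.pyplot as plt

# Output of parse_log.py; adjust sep / column indices to your version's format.
train = pd.read_csv('logfile.log.train', delim_whitespace=True)
test = pd.read_csv('logfile.log.test', delim_whitespace=True)

fig, ax1 = plt.subplots()
ax1.plot(train.iloc[:, 0], train.iloc[:, 2], label='train loss')  # TrainingLoss vs Iters
ax1.plot(train.iloc[:, 0], train.iloc[:, 2].rolling(50).mean(), label='train loss (moving average)')
ax1.plot(test.iloc[:, 0], test.iloc[:, 2], label='test loss (check column index)')
ax1.set_xlabel('iterations')
ax1.set_ylabel('loss')
ax1.legend(loc='upper right')

ax2 = ax1.twinx()  # second y-axis so accuracy can share the same plot
ax2.plot(test.iloc[:, 0], test.iloc[:, 3], 'g', label='test accuracy (check column index)')
ax2.set_ylabel('accuracy')
ax2.legend(loc='lower right')

plt.savefig('train_test_combined.png')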
Also consider installing DIGITS, which provides a real-time plot of all that kind of information.