I have enabled the GPU option in accelerator tab but when I try to train my model GPU is not processing my data instead only CPU is processing.
You need to specifically use the packages like https://rapids.ai/ to move data from CPU to GPU. I am assuming you using something like pandas to do data operations.
Seems like a problem in your training code where you might not be properly sending model and input to device and thus ending up only using your CPU. Check out these Kaggle examples on how to use:
TPUs: https://www.kaggle.com/docs/tpu#tpu9
GPUs: https://www.kaggle.com/rj1993/kernel-with-gpu-example
Related
I'm learning SDR by trying to decode different 433Mhz keyfob signals.
My initial flow for capture and pre-processing looks like this:
What I get on the sink is:
I guess the bits are visible fine and I could proceed to decode it somehow. But I am worried about the big difference in the amplitude: the very beginning of the burst has much higher amplitude in comparison to the rest of the packet. This picture is very consistent (I was not able to get bursts with more balanced amplitudes). If I was speaking about music recording I would look for a compression method. But I don't know what the equivalent in the SDR world is.
I'm not sure if this will be a problem when I'll try to quadrature demod, binary slice and/or clock recover.
Is this a known problem and what is the approach to eliminate it within GNU Radio Companion?
I have created a new tflite model based on MobilenetV2. It works well without quantization using CPU on iOS. I should say that TensorFlow team did a great job, many thanks.
Unfortunately there is a problem with latency. I use iPhone5s to test my model, so I have the following results for CPU:
500ms for MobilenetV2 with 224*224 input image.
250-300ms for MobilenetV2 with 160*160 input image.
I used the following pod 'TensorFlowLite', '~> 1.13.1'
It's not enough, so I have read TF documentation related to optimization (post trainig quantization). I suppose I need to use Float16 or UInt8 quantization and GPU Delegate (see https://www.tensorflow.org/lite/performance/post_training_quantization).
I used Tensorflow v2.1.0 to train and quantize my models.
Float16 quantization of weights (I used MobilenetV2 model after Float16 quantization)
https://github.com/tensorflow/examples/tree/master/lite/examples/image_segmentation/ios
pod 'TensorFlowLiteSwift', '0.0.1-nightly'
No errors, but model doesn’t work
pod 'TensorFlowLiteSwift', '2.1.0'
2020-05-01 21:36:13.578369+0300 TFL Segmentation[6367:330410] Initialized TensorFlow Lite runtime.
2020-05-01 21:36:20.877393+0300 TFL Segmentation[6367:330397] Execution of the command buffer was aborted due to an error during execution. Caused GPU Hang Error (IOAF code 3)
Full integer quantization of weights and activations
pod ‘TensorFlowLiteGpuExperimental’
Code sample: https://github.com/makeml-app/MakeML-Nails/tree/master/Segmentation%20Nails
I used a MobilenetV2 model after uint8 quantization.
GpuDelegateOptions options;
options.allow_precision_loss = true;
options.wait_type = GpuDelegateOptions::WaitType::kActive;
//delegate = NewGpuDelegate(nullptr);
delegate = NewGpuDelegate(&options);
if (interpreter->ModifyGraphWithDelegate(delegate) != kTfLiteOk)
Segmentation Live[6411:331887] [DYMTLInitPlatform] platform initialization successful
Loaded model 1resolved reporterDidn't find op for builtin opcode 'PAD' version '2'
Is it possible to use MObilenetV2 quantized model on IOS somehow? Hopefully I did some mistake :) and it's possible.
Best regards,
Dmitriy
This is a link to GITHUB issue with answers: https://github.com/tensorflow/tensorflow/issues/39101
sorry for outdated documentation - the GPU delegate should be included in the TensorFlowLiteSwift 2.1.0. However, looks like you're using C API, so depending on TensorFlowLiteC would be sufficient.
MobileNetV2 do work with TFLite runtime in iOS, and if I recall correctly it doesn't have PAD op. Can you attach your model file? With the information provided it's a bit hard to see what's causing the error. As a sanity check, you can get quant/non-quant version of MobileNetV2 from here: https://www.tensorflow.org/lite/guide/hosted_models
For int8 quantized model - afaik GPU delegate only works for FP32 and (possibly) FP16 inputs.
I'm training a LinearSVC model and I want to get the training error of it. Is it possible to get it w/o evaluating it manually?
Thanks
sklearn is using liblinear for this task.
You can take a quick glance into the sources here:
self.coef_, self.intercept_, self.n_iter_ = _fit_liblinear(
X, y, self.C, self.fit_intercept, self.intercept_scaling,
self.class_weight, self.penalty, self.dual, self.verbose,
self.max_iter, self.tol, self.random_state, self.multi_class,
self.loss, sample_weight=sample_weight)
which shows that only coefficients, intercepts and number of iterations are processed by sklearn's python-API. Whatever else is available in liblinear's output is not grabbed. You can't directly read out the training-error without changing the internal code.
There might be a possible hack turning on verbose-mode, redirect the output and parse additional info available there. But this assumes the info you look for is available there and it's also hacky and i won't recommend it.
Just use the score-method. It won't be too costly compared to fitting.
Since two operations Conv2DBackpropFilter and Conv2DBackpropInput count most of the time for lots of applications(AlexNet/VGG/GAN/Inception, etc.), I am analyzing the complexity of these two operations (back-propagation) in TensorFlow and I found out that there are three implementation versions (custom, fast and slot) for Conv2DBackpropFilter (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/conv_grad_filter_ops.cc ) and Conv2DBackpropInput (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/conv_grad_input_ops.cc). While I profile, all computations are passed to "custom" version instead of "fast" or "slow" which directly calls Eigen function SpatialConvolutionBackwardInput to do that.
The issue is:
Conv2DBackpropFilter uses Eigen:“TensorMap.contract" to do the tensor contraction and Conv2DBackpropInput uses Eigen:"MatrixMap.transpose" to do the matrix transposition in the Compute() function. Beside these two functions, I didn't see any convolutional operations which are needed for back-propagation theoretically. Beside convolutions, what else would be run inside these two operations for back-propagation? Does anyone know how to analyze the computation complexity of "back propagation" operation in TensorFlow?
I am looking for any advise/suggestion. Thank you!
In addition to the transposition and contraction, the gradient op for the filter and the gradient op for the input must transform their input using Im2Col and Col2Im respectively. Approximately speaking, these transformations enable the convolution operation to be implemented using tensor contraction. For more information, see the CS231n page on Convolutional Networks (specifically, the paragraphs titled "Implementation as Matrix Multiplication" and "Backpropagation").
mrry, I got it. It means that Conv2D, Conv2DBackpropFilter and Conv2DBackpropInput use the same way by using "GEMM" to work for convolution by Im2Col/Col2Im. An other issue is that while I do the profile of GAN in TensorFlow, the execution time of Conv2DBackpropInput and Conv2DBackpropFilter are around 4-6 times slower than Conv2D with the same input size. Why?
Is there a straightforward way to find the GPU memory consumed by, say, an inception-resnet-v2 model that is initialized in tensorflow? This includes the inference and the backprop memories required.
You can explicitly calculate the memory needed to store parameters, but I am afraid it would be difficult to compute the size of all buffers needed for training. Probably, a more clever way would be to make TF do it for you. Set the gpu_options.allow_growth config option to True and see how much does it consume. Another option is to try smaller values for gpu_options.per_process_gpu_memory_fraction until it fails with out of memory.
Since using gpu.options.allow_growth and gpu_options.per_process_gpu_memory_fraction for model size estimation is currently a trial-and-error and tedious solution, I suggest using tf.RunMetadata() in combination with tensorboard.
Example:
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
summary, _ = sess.run(train_step, feed_dict, options=run_options, run_metadata=run_metadata)
train_writer.add_run_metadata(run_metadata, 'step%d' % i)
Run your model and tensorboard, navigate to the desired part of your graph and read the node statistics.
Source: https://www.tensorflow.org/get_started/graph_viz