Espresso ANERuntimeEngine Program Inference overflow - ios

I have two CoreML models. One works fine, and the other generates this error message:
[espresso] [Espresso::ANERuntimeEngine::__forward_segment 0] evaluate[RealTime]WithModel returned 0; code=5 err=Error Domain=com.apple.appleneuralengine Code=5 "processRequest:qos:qIndex:error:: 0x3: Program Inference overflow" UserInfo={NSLocalizedDescription=processRequest:qos:qIndex:error:: 0x3: Program Inference overflow}
[espresso] [Espresso::overflow_error] /var/containers/Bundle/Application/E0DE5E08-D2C6-48AF-91B2-B42BA7877E7E/xxx demoapp.app/mpii-hg128.mlmodelc/model.espresso.net:0
Both models are very similar, (Conv2D models). There are generated with the same scripts and versions of PyTorch, ONNX, and onnx-coreml. The model that works has 1036 layers, and the model that generates the error has 599 layers. They both use standard layers - Conv2D, BatchNorm, ReLU, MaxPool, and Upsample (no custom layers and no Functional or Numpy stuff). They both use relatively the same number of features per layer. They follow essentially the same structure, except the erroring model skips a maxpool layer at the start (hence the higher output resolution).
They both take a 256x256 color image as input, and output 16 channels at (working) 64x64 and (erroring) 128x128 pixels.
The app does not crash, but gives garbage results for the erroring model.
Both models train, evaluate, etc. fine in their native formats (PyTorch).
I have no idea what a Code=5 "processRequest:qos:qIndex:error:: 0x3: Program Inference overflow" error is, and google searches are not yielding anything productive, as I gather "Espresso" and "ANERuntimeEngine" are both private Apple Libraries.
What is this error message telling me? How can I fix it?
Can I avoid this error message by not running the model on the bionic chip but on the CPU/GPU?
Any help is appreciated, thanks.

That's a LOT of layers!
Espresso is the C++ library that runs the Core ML models. ANERuntimeEngine is used with the Apple Neural Engine chip.
By passing in an MLModelConfiguration with computeUnits set to .cpuAndGPU when you load the Core ML model, you can tell Core ML to not use the Neural Engine.

Related

Inference on openvino model returns only scores

My task is to perform inference for face detection using Intel Movidius and Raspberry Pi. The error is that the model only returns "Scores" -> (1, 3000, 2) and not "Boxes".
Steps:
On my local machine, I trained several models(mb1-ssd, mb1-ssd-lite, vgg16-ssd) from the repository https://github.com/qfgaohao/pytorch-ssd and converted them to onnx. Then, using open vino model optimizer from openvinotoolkit = 2020.1, I obtained the '.bin', '.xml' files for each model.
Then, using the obtained files, I performed the infference on the Rasberry Pi and hit the mentioned error.
Note: The inference works using pretrained face detection models from model zoo, the only difference I found looking at the .xml files and my .xml files is that the last layer, "Detection output" is missing. However, when I visualize the .xml file using netron, the conversion seems to be correct.
Link to repo: https://github.com/cocacola0/bsc_thesis
OpenVINO™ 2020.3 release is the last OpenVINO™ version that supports Intel® Movidius™ Neural Compute Stick powered by the Intel® Movidius™ Myriad™ 2.
Use ssd_mobilenet_v2_coco and ssdlite_mobilenet_v2, alternative models that are available in Open Model Zoo. Both models are working well with your code.

Catboost's Incremental training with "init_model" fails when not all initial labels are present in new data

catboost python version: 1.0.6
I am training a CatboostClassifier on 10 different output classes, which works fine. Then I'm incrementally training a new classifier using the earlier trained init_model and training on a new training dataset. The catch is that this dataset has only 2 of the original 10 unique labels. Catboost warns me already with: Found only 2 unique classes in the data, but have defined 10 classes. Probably something is wrong with data.
but starts to train fine anyway. Only at the end (I assume when the model gets merged with the original one?) I get the following error message:
Exception has occurred: CatBoostError
CatBoostError: catboost/libs/model/model.cpp:1716: Approx dimensions don't match: 10 != 2
Is it expected behavior that incremental training is not possible on only a subset of the original classes? If yes, then maybe a clearer error message should be given. It would be even better though if the code could handle this case, but maybe I'm overseeing some things that do not allow such functionality.
The similar issue has been posted on github : https://github.com/catboost/catboost/issues/1953

Issue with Custom Object Detector model from GCP Vision

My full tech stack is:
GCP ML Vision.
Exported Model to tflite format (from the same GCP console).
XCode for iOS development
iPhone 11 pro
I am trying to use a Custom Object detector using MLKit and one model trained in GCP AutoML Vision.
I created the model and exported as tflite file, but when trying to do objectDetector processImage:visionImage, I always get the error:
Error Domain=com.google.visionkit.pipeline.error Code=3 "Pipeline failed to fully start:
CalculatorGraph::Run() failed in Run:
Calculator::Open() for node "BoxClassifierCalculator" failed: #vk Unexpected number of dimensions for output index 0:
got 3D, expected either 2D (BxN with B=1) or 4D (BxHxWxN with B=1, W=1, H=1)."
UserInfo={com.google.visionkit.status=<MLKITvk_VNKStatusWrapper: 0x280841270>, NSLocalizedDescription=Pipeline failed to fully start:
CalculatorGraph::Run() failed in Run:
Calculator::Open() for node "BoxClassifierCalculator" failed: #vk Unexpected number of dimensions for output index 0:
got 3D, expected either 2D (BxN with B=1) or 4D (BxHxWxN with B=1, W=1, H=1).}.
I have downloaded the mlkit examples from https://github.com/googlesamples/mlkitand there is something similar in the vision project, (to detect birds) when I try to replace my own tflite file, it breaks in the exact same way as in my own project.
I presume the tflite is created in a very different way as MLVision does.
Any insight? (Sorry if this is so obvious, but I'm pretty new to TensorFlow and MLVision)
Thanks in advance
The issue is exactly as what the error message says: got 3D, expected either 2D (BxN with B=1) or 4D (BxHxWxN with B=1, W=1, H=1). That means your model is not compatible with ML Kit, as its tensor has incorrect dimension. The model compatibility requirements are specified here.

Use .tflite with iOS and GPU

I have created a new tflite model based on MobilenetV2. It works well without quantization using CPU on iOS. I should say that TensorFlow team did a great job, many thanks.
Unfortunately there is a problem with latency. I use iPhone5s to test my model, so I have the following results for CPU:
500ms for MobilenetV2 with 224*224 input image.
250-300ms for MobilenetV2 with 160*160 input image.
I used the following pod 'TensorFlowLite', '~> 1.13.1'
It's not enough, so I have read TF documentation related to optimization (post trainig quantization). I suppose I need to use Float16 or UInt8 quantization and GPU Delegate (see https://www.tensorflow.org/lite/performance/post_training_quantization).
I used Tensorflow v2.1.0 to train and quantize my models.
Float16 quantization of weights (I used MobilenetV2 model after Float16 quantization)
https://github.com/tensorflow/examples/tree/master/lite/examples/image_segmentation/ios
pod 'TensorFlowLiteSwift', '0.0.1-nightly'
No errors, but model doesn’t work
pod 'TensorFlowLiteSwift', '2.1.0'
2020-05-01 21:36:13.578369+0300 TFL Segmentation[6367:330410] Initialized TensorFlow Lite runtime.
2020-05-01 21:36:20.877393+0300 TFL Segmentation[6367:330397] Execution of the command buffer was aborted due to an error during execution. Caused GPU Hang Error (IOAF code 3)
Full integer quantization of weights and activations
pod ‘TensorFlowLiteGpuExperimental’
Code sample: https://github.com/makeml-app/MakeML-Nails/tree/master/Segmentation%20Nails
I used a MobilenetV2 model after uint8 quantization.
GpuDelegateOptions options;
options.allow_precision_loss = true;
options.wait_type = GpuDelegateOptions::WaitType::kActive;
//delegate = NewGpuDelegate(nullptr);
delegate = NewGpuDelegate(&options);
if (interpreter->ModifyGraphWithDelegate(delegate) != kTfLiteOk)
Segmentation Live[6411:331887] [DYMTLInitPlatform] platform initialization successful
Loaded model 1resolved reporterDidn't find op for builtin opcode 'PAD' version '2'
Is it possible to use MObilenetV2 quantized model on IOS somehow? Hopefully I did some mistake :) and it's possible.
Best regards,
Dmitriy
This is a link to GITHUB issue with answers: https://github.com/tensorflow/tensorflow/issues/39101
sorry for outdated documentation - the GPU delegate should be included in the TensorFlowLiteSwift 2.1.0. However, looks like you're using C API, so depending on TensorFlowLiteC would be sufficient.
MobileNetV2 do work with TFLite runtime in iOS, and if I recall correctly it doesn't have PAD op. Can you attach your model file? With the information provided it's a bit hard to see what's causing the error. As a sanity check, you can get quant/non-quant version of MobileNetV2 from here: https://www.tensorflow.org/lite/guide/hosted_models
For int8 quantized model - afaik GPU delegate only works for FP32 and (possibly) FP16 inputs.

The implementation in source code of Backpropagation in TensorFlow (Conv2DBackpropFilter and Conv2DBackpropInput)

Since two operations Conv2DBackpropFilter and Conv2DBackpropInput count most of the time for lots of applications(AlexNet/VGG/GAN/Inception, etc.), I am analyzing the complexity of these two operations (back-propagation) in TensorFlow and I found out that there are three implementation versions (custom, fast and slot) for Conv2DBackpropFilter (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/conv_grad_filter_ops.cc ) and Conv2DBackpropInput (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/conv_grad_input_ops.cc). While I profile, all computations are passed to "custom" version instead of "fast" or "slow" which directly calls Eigen function SpatialConvolutionBackwardInput to do that.
The issue is:
Conv2DBackpropFilter uses Eigen:“TensorMap.contract" to do the tensor contraction and Conv2DBackpropInput uses Eigen:"MatrixMap.transpose" to do the matrix transposition in the Compute() function. Beside these two functions, I didn't see any convolutional operations which are needed for back-propagation theoretically. Beside convolutions, what else would be run inside these two operations for back-propagation? Does anyone know how to analyze the computation complexity of "back propagation" operation in TensorFlow?
I am looking for any advise/suggestion. Thank you!
In addition to the transposition and contraction, the gradient op for the filter and the gradient op for the input must transform their input using Im2Col and Col2Im respectively. Approximately speaking, these transformations enable the convolution operation to be implemented using tensor contraction. For more information, see the CS231n page on Convolutional Networks (specifically, the paragraphs titled "Implementation as Matrix Multiplication" and "Backpropagation").
mrry, I got it. It means that Conv2D, Conv2DBackpropFilter and Conv2DBackpropInput use the same way by using "GEMM" to work for convolution by Im2Col/Col2Im. An other issue is that while I do the profile of GAN in TensorFlow, the execution time of Conv2DBackpropInput and Conv2DBackpropFilter are around 4-6 times slower than Conv2D with the same input size. Why?

Resources