How to use DNN OpenCL on an AMD GPU? - opencv

On Windows 10, I want to use the GPU as the DNN backend to save CPU power. It works on an Intel GPU, but there is a problem on an AMD GPU.
After I call setPreferableTarget(DNN_TARGET_OPENCL), inference becomes very slow (much slower than with DNN_TARGET_CPU).
I checked with Task Manager and saw that all of the computation actually runs on the CPU.
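For context, the target selection in my code looks roughly like this (a minimal sketch; model.onnx is a placeholder for the actual network):

#include <opencv2/dnn.hpp>

int main() {
    cv::dnn::Net net = cv::dnn::readNet("model.onnx");     // placeholder model file
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV); // OpenCL runs through the default OpenCV backend
    net.setPreferableTarget(cv::dnn::DNN_TARGET_OPENCL);   // or DNN_TARGET_OPENCL_FP16
    return 0;
}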
Here is some of the log:
[ INFO:0] global D:\work\opencv\opencv\modules\core\src\ocl.cpp (891)
cv::ocl::haveOpenCL Initialize OpenCL runtime...
OpenCV(ocl4dnn): consider to specify kernel configuration cache
directory via OPENCV_OCL4DNN_CONFIG_PATH parameter.
[ INFO:0] global D:\work\opencv\opencv\modules\core\src\ocl.cpp (433)
cv::ocl::OpenCLBinaryCacheConfigurator::OpenCLBinaryCacheConfigurator
Successfully initialized OpenCL cache directory:
C:\Users\wangq\AppData\Local\Temp\opencv\4.3\opencl_cache\
[ INFO:0] global D:\work\opencv\opencv\modules\core\src\ocl.cpp (457)
cv::ocl::OpenCLBinaryCacheConfigurator::prepareCacheDirectoryForContext
Preparing OpenCL cache configuration for context:
Advanced_Micro_Devices__Inc_--Baffin--2906_10
OpenCL program build log: dnn/dummy
Status -66: CL_INVALID_COMPILER_OPTIONS
-cl-no-subgroup-ifp -D AMD_DEVICE
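For what it's worth, whether OpenCL is actually active can be checked from code with the standard cv::ocl API (a minimal sketch):

#include <opencv2/core/ocl.hpp>
#include <iostream>

int main() {
    std::cout << "haveOpenCL: " << cv::ocl::haveOpenCL() << std::endl; // runtime present?
    std::cout << "useOpenCL:  " << cv::ocl::useOpenCL() << std::endl;  // actually enabled?
    if (cv::ocl::haveOpenCL()) {
        cv::ocl::Context ctx = cv::ocl::Context::getDefault();
        std::cout << "device: " << ctx.device(0).name() << std::endl;  // which GPU was picked
    }
    return 0;
}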
Any help would be appreciated.

Related

Quick3D fails with software render backend

When I run my Qt project inside a Docker container with the environment variable QT_QUICK_BACKEND=software (possibly related to https://bugreports.qt.io/browse/QTBUG-102634), the application runs but does not display any QtQuick 3D object. If I remove the QT_QUICK_BACKEND environment variable, the application crashes with an OpenGL segmentation fault. When the application runs outside the Docker container, i.e. locally, it works.
I'm using Qt 6.2.3 and C++, my OS is Ubuntu, and the Docker container is run with the following options: "xhost +local:",
"--net host",
"-e DISPLAY",
"-e QT_QUICK_BACKEND=software",
"-v /tmp/.X11-unix:/tmp/.X11-unix"
This is the OpenGL segmentation fault output:
qt.scenegraph.general: Using QRhi with backend OpenGL
Graphics API debug/validation layers: 0
QRhi profiling and debug markers: 0
Shader/pipeline cache collection: 0
qt.scenegraph.general: threaded render loop
qt.scenegraph.general: Using sg animation driver
qt.scenegraph.general: Animation Driver: using vsync: 16.67 ms
I20230212 20:19:57.865648 23643 workerthread.cpp:202] runLoop Time: 0
I20230212 20:19:57.865960 23643 workerthread.cpp:80] run: Loop running
qt.scenegraph.general: Using sg animation driver
qt.scenegraph.general: Animation Driver: using vsync: 16.67 ms
qt.rhi.general: Created OpenGL context QSurfaceFormat(version 4.1, options QFlags<QSurfaceFormat::FormatOption>(DeprecatedFunctions), depthBufferSize 24, redBufferSize 8, greenBufferSize 8, blueBufferSize 8, alphaBufferSize 0, stencilBufferSize 8, samples -1, swapBehavior QSurfaceFormat::DoubleBuffer, swapInterval 1, colorSpace QColorSpace(), profile QSurfaceFormat::CompatibilityProfile)
qt.rhi.general: OpenGL VENDOR: VMware, Inc. RENDERER: SVGA3D; build: RELEASE; LLVM; VERSION: 4.1 (Compatibility Profile) Mesa 21.2.6
qt.scenegraph.general: MSAA sample count for the swapchain is 1. Alpha channel requested = no.
qt.scenegraph.general: rhi texture atlas dimensions: 2048x2048
Segmentation fault (core dumped)
I expect my Qt project to correctly display the QtQuick 3D object (e.g. the View3D QML type) inside the Docker container.

Xen Linux + bare-metal Cortex-A53

I am trying to run Linux (buildroot) plus a bare-metal app on Xen.
Xen boots fine and buildroot is OK on the first CPU.
My problem is the bare-metal application.
The bare-metal app is generated by Vitis (my A53 is in the FPGA).
It is the simplest example application: just a hello world and exit.
I am able to boot directly into this app using JTAG, but when I try to run the bare-metal app under Xen, the app just hangs and doesn't print anything.
My configuration:
File: xen.cfg
name = "test"
kernel = "/opt/baremetal_app.bin"
memory = 8
vcpus = 1
cpus = [1]
iomem = [ "0xff010,1" ] # UART1
Xen command line:
console=dtuart dtuart=serial0 dom0_mem=1G bootscrub=0 maxcpus=1 timer_slop=0
dom0 command line:
console=hvc0 earlycon=xen earlyprintk=xen maxcpus=1 clk_ignore_unused root=/dev/mmcblk0p2
Vitis is configured to include hypervisor_guest in the binary.
I already tried changing psu_ddr_0_MEM_0 to 0x40000000, len 100000.
When I run the command "xl create xen.cfg", the system prints "Parsing config from xen.cfg" and returns. I see my app "test" in "xl list".
UART1 doesn't print anything, and neither does "xl console test".
In "xl top", my app "test" seems to use 100% of the CPU.
Any ideas?

NVIDIA Jetson Nano with Realsense 435i via Isaac - Camera not found

I posted about this over on the Isaac forums, but I'm listing it here for visibility as well. I am trying to get the Isaac RealSense examples working on a Jetson Nano with my 435i (firmware downgraded to 5.11.15 per the Isaac documentation), but I've been unable to do so far. I've got a Nano flashed with JetPack 4.3 and have installed all dependencies on both the desktop and the Nano. realsense-viewer works fine, so I know the camera is functioning properly and is detected by the Nano. However, when I run ./apps/samples/realsense_camera/realsense_camera it throws an error:
ERROR engine/alice/components/Codelet.cpp#229: Component 'camera/realsense' of type 'isaac::RealsenseCamera' reported FAILURE:
No device connected, please connect a RealSense device
ERROR engine/alice/backend/event_manager.cpp#42: Stopping node 'camera' because it reached status 'FAILURE'
I've attached the log of this output as well. I get the same error running locally on my desktop, but that's running through WSL, so I was willing to write that off. Any suggestions would be greatly appreciated!
2020-06-15 17:18:20.620 INFO engine/alice/tools/websight.cpp#166: Loading websight...
2020-06-15 17:18:20.621 WARN engine/alice/backend/application_json_loader.cpp#174: This application does not have an explicit scheduler configuration. One will be autogenerated to the best of the system's abilities if possible.
2020-06-15 17:18:20.622 INFO engine/alice/backend/redis_backend.cpp#40: Successfully connected to Redis server.
2020-06-15 17:18:20.623 WARN engine/alice/backend/backend.cpp#201: This application does not have an execution group configuration. One will be autogenerated to the best of the systems abilities if possible.
2020-06-15 17:18:20.623 WARN engine/gems/scheduler/scheduler.cpp#337: No default execution groups specified. Attempting to create scheduler configuration for 4 remaining cores. This may be non optimal for the system and application.
2020-06-15 17:18:20.623 INFO engine/gems/scheduler/scheduler.cpp#290: Scheduler execution groups are:
2020-06-15 17:18:20.623 INFO engine/gems/scheduler/scheduler.cpp#299: __BlockerGroup__: Cores = [3], Workers = No
2020-06-15 17:18:20.623 INFO engine/gems/scheduler/scheduler.cpp#299: __WorkerGroup__: Cores = [0, 1, 2], Workers = Yes
2020-06-15 17:18:20.660 INFO engine/alice/backend/modules.cpp#226: Loaded module 'packages/realsense/librealsense_module.so': Now has 45 components total
2020-06-15 17:18:20.679 INFO engine/alice/backend/modules.cpp#226: Loaded module 'packages/rgbd_processing/librgbd_processing_module.so': Now has 51 components total
2020-06-15 17:18:20.696 INFO engine/alice/backend/modules.cpp#226: Loaded module 'packages/sight/libsight_module.so': Now has 54 components total
2020-06-15 17:18:20.720 INFO engine/alice/backend/modules.cpp#226: Loaded module 'packages/viewers/libviewers_module.so': Now has 83 components total
2020-06-15 17:18:20.720 DEBUG engine/alice/application.cpp#348: Loaded 83 components: isaac::RealsenseCamera, isaac::alice::BufferAllocatorReport, isaac::alice::ChannelMonitor, isaac::alice::CheckJetsonPerformanceModel, isaac::alice::CheckOperatingSystem, isaac::alice::Config, isaac::alice::ConfigBridge, isaac::alice::ConfigLoader, isaac::alice::Failsafe, isaac::alice::FailsafeHeartbeat, isaac::alice::InteractiveMarkersBridge, isaac::alice::JsonToProto, isaac::alice::LifecycleReport, isaac::alice::MessageLedger, isaac::alice::MessagePassingReport, isaac::alice::NodeStatistics, isaac::alice::Pose, isaac::alice::Pose2Comparer, isaac::alice::PoseFromFile, isaac::alice::PoseInitializer, isaac::alice::PoseMessageInjector, isaac::alice::PoseToFile, isaac::alice::PoseToMessage, isaac::alice::PoseTree, isaac::alice::PoseTreeJsonBridge, isaac::alice::PoseTreeRelink, isaac::alice::ProtoToJson, isaac::alice::PyCodelet, isaac::alice::Random, isaac::alice::Recorder, isaac::alice::RecorderBridge, isaac::alice::Replay, isaac::alice::ReplayBridge, isaac::alice::Scheduling, isaac::alice::Sight, isaac::alice::SightChannelStatus, isaac::alice::Subgraph, isaac::alice::Subprocess, isaac::alice::TcpPublisher, isaac::alice::TcpSubscriber, isaac::alice::Throttle, isaac::alice::TimeOffset, isaac::alice::TimeSynchronizer, isaac::alice::UdpPublisher, isaac::alice::UdpSubscriber, isaac::map::Map, isaac::map::ObstacleAtlas, isaac::map::OccupancyGridMapLayer, isaac::map::PolygonMapLayer, isaac::map::WaypointMapLayer, isaac::navigation::DistanceMap, isaac::navigation::NavigationMap, isaac::navigation::RangeScanModelClassic, isaac::navigation::RangeScanModelFlatloc, isaac::rgbd_processing::DepthEdges, isaac::rgbd_processing::DepthImageFlattening, isaac::rgbd_processing::DepthImageToPointCloud, isaac::rgbd_processing::DepthNormals, isaac::rgbd_processing::DepthPoints, isaac::rgbd_processing::FreespaceFromDepth, isaac::sight::AliceSight, isaac::sight::SightWidget, isaac::sight::WebsightServer, isaac::viewers::BinaryMapViewer, isaac::viewers::ColorCameraViewer, isaac::viewers::DepthCameraViewer, isaac::viewers::Detections3Viewer, isaac::viewers::DetectionsViewer, isaac::viewers::FiducialsViewer, isaac::viewers::FlatscanViewer, isaac::viewers::GoalViewer, isaac::viewers::ImageKeypointViewer, isaac::viewers::LidarViewer, isaac::viewers::MosaicViewer, isaac::viewers::ObjectViewer, isaac::viewers::OccupancyMapViewer, isaac::viewers::PointCloudViewer, isaac::viewers::PoseTrailViewer, isaac::viewers::SegmentationCameraViewer, isaac::viewers::SegmentationViewer, isaac::viewers::SkeletonViewer, isaac::viewers::TensorViewer, isaac::viewers::TrajectoryListViewer,
2020-06-15 17:18:20.723 WARN engine/alice/application.cpp#164: The function Application::findComponentByName is deprecated. Please use `getNodeComponentOrNull` instead. Note that the new method requires a node name instead of a component name. (argument: 'websight/isaac.sight.AliceSight')
2020-06-15 17:18:20.723 INFO engine/alice/application.cpp#255: Starting application 'realsense_camera' (instance UUID: 'e24992d0-af66-11ea-8bcf-c957460c567e') ...
2020-06-15 17:18:20.723 DEBUG engine/gems/scheduler/execution_groups.cpp#476: Launching 0 pre-start job(s)
2020-06-15 17:18:20.723 DEBUG engine/gems/scheduler/execution_groups.cpp#485: Replaying 0 pre-start event(s)
2020-06-15 17:18:20.723 DEBUG engine/gems/scheduler/execution_groups.cpp#476: Launching 0 pre-start job(s)
2020-06-15 17:18:20.723 DEBUG engine/gems/scheduler/execution_groups.cpp#485: Replaying 0 pre-start event(s)
2020-06-15 17:18:20.723 INFO engine/alice/backend/asio_backend.cpp#33: Starting ASIO service
2020-06-15 17:18:20.727 INFO packages/sight/WebsightServer.cpp#216: Sight webserver is loaded
2020-06-15 17:18:20.727 INFO packages/sight/WebsightServer.cpp#217: Please open Chrome Browser and navigate to http://<ip address>:3000
2020-06-15 17:18:20.727 WARN engine/alice/backend/codelet_canister.cpp#225: Codelet 'websight/isaac.sight.AliceSight' was not added to scheduler because no tick method is specified.
2020-06-15 17:18:20.728 WARN engine/alice/components/Codelet.cpp#53: Function deprecated. Set tick_period to the desired tick paramater
2020-06-15 17:18:20.728 WARN engine/alice/backend/codelet_canister.cpp#225: Codelet '_check_operating_system/isaac.alice.CheckOperatingSystem' was not added to scheduler because no tick method is specified.
2020-06-15 17:18:20.728 WARN engine/alice/components/Codelet.cpp#53: Function deprecated. Set tick_period to the desired tick paramater
2020-06-15 17:18:20.730 WARN engine/alice/components/Codelet.cpp#53: Function deprecated. Set tick_period to the desired tick paramater
2020-06-15 17:18:20.741 ERROR engine/alice/components/Codelet.cpp#229: Component 'camera/realsense' of type 'isaac::RealsenseCamera' reported FAILURE:
No device connected, please connect a RealSense device
2020-06-15 17:18:20.741 ERROR engine/alice/backend/event_manager.cpp#42: Stopping node 'camera' because it reached status 'FAILURE'
2020-06-15 17:18:20.743 WARN engine/alice/backend/codelet_canister.cpp#225: Codelet 'camera/realsense' was not added to scheduler because no tick method is specified.
2020-06-15 17:18:21.278 INFO packages/sight/WebsightServer.cpp#113: Server connected / 1
2020-06-15 17:18:30.723 INFO engine/alice/backend/allocator_backend.cpp#57: Optimized memory CPU allocator.
2020-06-15 17:18:30.724 INFO engine/alice/backend/allocator_backend.cpp#66: Optimized memory CUDA allocator.
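One extra sanity check outside Isaac (a suggestion, assuming the librealsense tools that ship alongside realsense-viewer are installed):

rs-enumerate-devices | head -20   # lists connected RealSense devices, serials and firmware versions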

Caffe-powered and GPU-enabled Microsoft Azure VM

I'm trying to build a VM for model training in Azure. I found this Data Science Virtual Machine for Linux (Ubuntu) VM, which seems to be a suitable candidate.
Unfortunately, when I spun up the VM and installed the Caffe prerequisites, I wasn't able to run the tests. I'm getting the following error on make runtest (make all and make test completed without errors):
NVIDIA: no NVIDIA devices found
Cuda number of devices: 0
Setting to use device 0
Current device id: 0
Current device name:
Note: Randomizing tests' orders with a seed of 97204 .
[==========] Running 2041 tests from 267 test cases.
[----------] Global test environment set-up.
[----------] 11 tests from AdaDeltaSolverTest/3, where TypeParam = caffe::GPUDevice<double>
[ RUN ] AdaDeltaSolverTest/3.TestAdaDeltaLeastSquaresUpdateWithHalfMomentum
NVIDIA: no NVIDIA devices found
E0715 02:24:32.097311 59355 common.cpp:114] Cannot create Cublas handle. Cublas won't be available.
NVIDIA: no NVIDIA devices found
E0715 02:24:32.103780 59355 common.cpp:121] Cannot create Curand generator. Curand won't be available.
F0715 02:24:32.103914 59355 test_gradient_based_solver.cpp:80] Check failed: error == cudaSuccess (30 vs. 0) unknown error
*** Check failure stack trace: ***
# 0x7f77a463f5cd google::LogMessage::Fail()
# 0x7f77a4641433 google::LogMessage::SendToLog()
# 0x7f77a463f15b google::LogMessage::Flush()
# 0x7f77a4641e1e google::LogMessageFatal::~LogMessageFatal()
# 0x7115e3 caffe::GradientBasedSolverTest<>::TestLeastSquaresUpdate()
# 0x7122af caffe::AdaDeltaSolverTest_TestAdaDeltaLeastSquaresUpdateWithHalfMomentum_Test<>::TestBody()
# 0x8e6023 testing::internal::HandleExceptionsInMethodIfSupported<>()
# 0x8df63a testing::Test::Run()
# 0x8df788 testing::TestInfo::Run()
# 0x8df865 testing::TestCase::Run()
# 0x8e0b3f testing::internal::UnitTestImpl::RunAllTests()
# 0x8e0e63 testing::UnitTest::Run()
# 0x466ecd main
# 0x7f77a111c830 __libc_start_main
# 0x46e589 _start
# (nil) (unknown)
Makefile:532: recipe for target 'runtest' failed
make: *** [runtest] Aborted (core dumped)
Is it possible to spin up a virtual machine in Azure suitable for GPU-enabled machine learning with Caffe?
All the details about the VM are here
The Data Science Virtual Machine (DSVM) for Ubuntu already has Caffe installed in /opt/caffe. To use it on a GPU, create a VM with a K80 GPU by choosing one of the NC sizes. (Be sure to choose HDD as the storage type, or the NC sizes will not appear.) Caffe will then be available out of the box.
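For illustration, creating such a VM with the Azure CLI might look roughly like this (a sketch; resource names are placeholders and the DSVM image URN should be verified with az vm image list):

az vm create \
  --resource-group my-rg \
  --name my-dsvm \
  --image microsoft-dsvm:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest \
  --size Standard_NC6 \
  --storage-sku Standard_LRS
# Standard_LRS = HDD, which is required for the NC sizes to be offered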
Also note that PyCaffe is available. At a terminal:
source activate root
Python will then have PyCaffe available.
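A quick way to verify (a sketch; assumes the DSVM's bundled Caffe and a GPU-backed NC size):

source activate root
python -c "import caffe; caffe.set_mode_gpu(); print('PyCaffe OK')"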

Grunt tasks are slow in yeoman application

I have an Angular project built with Yeoman, talking to a Rails API backend.
Everything works fine, except that the Grunt tasks are very slow.
When I run grunt server --verbose:
Execution Time (2014-01-15 13:37:55 UTC)
loading tasks 14.3s ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 26%
server 1ms 0%
preprocess:multifile 11ms 0%
clean:server 13ms 0%
concurrent:server 34.3s ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 63%
autoprefixer 1ms 0%
autoprefixer:dist 369ms ▇ 1%
connect:livereload 17ms 0%
watch 5.8s ▇▇▇▇▇▇▇▇▇ 11%
Total 54.8s
Some of my Gruntfile:
'use strict';
module.exports = function (grunt) {
require('time-grunt')(grunt);
require('load-grunt-tasks')(grunt);
grunt.initConfig({
...
});
grunt.loadNpmTasks('grunt-preprocess');
grunt.registerTask('server', function (target) {
if (target === 'dist') {
return grunt.task.run(['build', 'connect:dist:keepalive']);
}
grunt.task.run([
'preprocess:multifile',
'clean:server',
'concurrent:server',
'autoprefixer',
'connect:livereload',
'watch'
]);
});
grunt.registerTask('test', [
'clean:server',
'concurrent:test',
'autoprefixer',
'connect:test'
//'karma'
]);
grunt.registerTask('build', [
'preprocess:multifile',
'clean:dist',
'useminPrepare',
'concurrent:dist',
'autoprefixer',
'concat',
'copy:dist',
'cdnify',
'ngmin',
'cssmin',
'uglify',
'rev',
'usemin'
]);
grunt.registerTask('default', [
'jshint',
'test',
'build'
]);
};
Size of project:
vagrant#vm ~code/myapp/app/scripts
$> find -name "*.js" | xargs cat | wc -l
10209
I am running on OS X 10.8 with an i7 processor, 16GB RAM, SSD... Is it normal that it takes so long? What makes the Grunt tasks (and especially "loading tasks") so slow?
Note: I am ssh'd into a Vagrant machine and running the grunt commands from there. If I run the grunt command on my native system, it's much faster (loading tasks takes 1.6s instead of 14.3s).
So the shared filesystem might be an issue. But why...
I had exactly the same problem with Vagrant and Yeoman's angular-generator. After running grunt serve it took almost 30 seconds to compile Sass, restart the server, etc.
I was already using NFS but it was still slow. Then I tried jit-grunt, a just-in-time Grunt task loader. I replaced load-grunt-tasks with jit-grunt and everything is now a lot faster.
Here's a good article about JIT-Grunt:
https://medium.com/written-in-code/ced193c2900b
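The change itself is one line in the Gruntfile (a sketch; the useminPrepare mapping is jit-grunt's documented example for tasks whose name differs from the plugin name):

// Gruntfile.js
module.exports = function (grunt) {
  require('time-grunt')(grunt);
  // was: require('load-grunt-tasks')(grunt);
  require('jit-grunt')(grunt, {
    useminPrepare: 'grunt-usemin' // explicit mapping where the task name can't be inferred
  });
  grunt.initConfig({
    // ...
  });
};

jit-grunt only loads a plugin the first time one of its tasks actually runs, which is why the "loading tasks" phase shrinks so much.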
I am using Grunt inside a Vagrant VirtualBox VM (Ubuntu 12.04). My native files are on my host machine (OS X). Because the Grunt tasks are IO-intensive and they run through file sharing, this makes them quite slow.
This can be improved by switching the Vagrant synced folder to NFS (http://docs.vagrantup.com/v2/synced-folders/nfs.html) instead of the default Vagrant file sharing, as shown below. It will be a bit faster, but not much.
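In the Vagrantfile, that is a one-line change (a sketch; the paths are the Vagrant defaults):

# Vagrantfile
config.vm.synced_folder ".", "/vagrant", type: "nfs"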
For comparison, on my machine:
For running the "loading tasks" subtask:
natively: 1.2s
with NFS: 4s
Vagrant file sharing: 16s
If only a specific task is taking a lot of time, this specific task might be the problem. To troubleshoot, use time-grunt: https://npmjs.org/package/time-grunt.
I've had issues as well, and have found
nospawn: true
to be the fastest option. I went from ~20s to ~1s to concat, minify, and uglify JS.
I had the same problem with Yeoman's ngbp generator and Vagrant. Even with NFS, a simple change to a template took about 30s to show up in the browser.
Using jit-grunt reduced that to 10s. After setting spawn: false, even though there was no reduction on the first load, changes took less than 1s (0.086s) to propagate to the browser! (Yes!)
The changes I've made to Gruntfile.js:
I commented out all the grunt.loadNpmTasks calls except grunt.loadNpmTasks('grunt-contrib-watch') (that's because of a task rename ngbp does later on);
I added require('jit-grunt')(grunt); after grunt.loadNpmTasks('grunt-contrib-watch');
I added spawn: false to delta: { options: {livereload: true, spawn: false} ...} (see the sketch below).
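Roughly, the resulting fragment looks like this (a sketch reconstructing the three steps above; the surrounding ngbp Gruntfile is assumed):

// Gruntfile.js (ngbp); sketch only
grunt.loadNpmTasks('grunt-contrib-watch'); // kept because ngbp renames 'watch' to 'delta' later
require('jit-grunt')(grunt);               // replaces the commented-out loadNpmTasks calls

grunt.initConfig({
  delta: {
    options: {
      livereload: true,
      spawn: false // run watch tasks in-process instead of respawning per change
    }
    // ...
  }
});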
