I have been trying to use a specific pretrained machine learning model for captioning pictures. I have been using https://github.com/unnonouno/densecap .
It comes with a Dockerfile that sets up a complete CUDA/Torch/cuDNN environment.
Predictions on a new picture are made by running the run_model.lua script. It works when run on the CPU by passing -gpu -1, but not when the argument is removed and the model runs on the GPU. In that case I get the following error:
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-8398/cutorch/lib/THC/THCGeneral.c line=70 error=35 : CUDA driver version is insufficient for CUDA runtime version
/root/torch/install/bin/luajit:
/root/torch/install/share/lua/5.1/trepl/init.lua:389: loop or previous error loading module 'cutorch'
stack traceback:
[C]: in function 'error'
/root/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
./densecap/utils.lua:26: in function 'setup_gpus'
run_model.lua:145: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670
I have tried different things, such as reinstalling cuDNN by running luarocks install cudnn or downgrading from cuDNN 5 to cuDNN 4, without any success.
The issue appears to be with your CUDA driver:
CUDA driver version is insufficient for CUDA runtime version
Take a look at similar discussions here.
No need to change your cuDNN version. You just need to fix the compatibility between your CUDA driver and the CUDA runtime/toolkit, typically by updating the host's NVIDIA driver so it is at least as new as the CUDA runtime inside the container.
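Once the driver and runtime are compatible (and the container is started with GPU access, e.g. via nvidia-docker, which Torch images of this era usually assume), a small smoke test such as the sketch below can confirm that Torch actually sees the GPU. It only uses standard torch/cutorch calls:
require 'torch'
-- pcall so a failure prints the error instead of aborting immediately
local ok, cutorch = pcall(require, 'cutorch')
if not ok then
  print('cutorch failed to load: ' .. tostring(cutorch))
else
  print('CUDA devices visible: ' .. cutorch.getDeviceCount())
  -- creating a CUDA tensor raises the same driver error if the mismatch persists
  print(torch.Tensor(3):fill(1):cuda():sum())
end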
Related
Using transfer learning, I trained SSD MobileNetV2 (ssd_mobilenet_v2_coco.config) model in TensorFlow (tensorflow-gpu==1.15.0). After freezing the graph (.pb) using TensorFlow API Python script (export_inference_graph.py), I created a text graph (.pbtxt) using the Python script provided in OpenCV wiki (tf_text_graph_ssd.py).
I used the Python code snippet from the wiki to test inference, but I am getting the following error:
cv2.error: OpenCV(4.2.0) C:\projects\opencv-python\opencv\modules\dnn\src\dnn.cpp:562: error: (-2:Unspecified error) Can't create layer "FeatureExtractor/MobilenetV2/expanded_conv_2/add" of type "AddV2" in function 'cv::dnn::dnn4_v20191202::LayerData::getLayerInstance'
I am using Windows 10, Python 3.6.8, and OpenCV 4.2.0.32. I have tried downgrading OpenCV, but earlier versions give different errors.
However, on Ubuntu 18.04.4 with OpenCV installed from source, I do not get any errors. Does anybody know whether this is an incompatible layer in the binary wheels of OpenCV for Windows? Should I wait until the next release?
I need to send an HTTP request in LuaJIT 2.0.5. I've tried three HTTP client libraries so far and none of them worked. I couldn't install the latest one due to an error during installation.
My version of Lua is 5.3.
Is there an HTTP client library for LuaJIT 2.0.5 that works for sure, one that I will be able both to install and to use?
LuaSocket -- doesn't work after installation:
luasocket 3.0rc1-2 is now installed in /usr (license: MIT)
And
$ luajit
LuaJIT 2.0.5 -- Copyright (C) 2005-2017 Mike Pall. http://luajit.org/
JIT: ON CMOV SSE2 SSE3 SSE4.1 fold cse dce fwd dse narrow loop abc sink fuse
> a1 = require "socket"
error loading module 'socket.core' from file '/usr/lib/lua/5.3/socket/core.so':
/usr/lib/lua/5.3/socket/core.so: undefined symbol: luaL_setfuncs
stack traceback:
[C]: at 0x5617fff23fb0
[C]: in function 'require'
/usr/share/lua/5.3/socket.lua:12: in main chunk
[C]: in function 'require'
stdin:1: in main chunk
[C]: at 0x5617ffed4c00
LuaSocket works on 5.1, 5.2 and 5.3, both Windows and Linux
LuaHTTP works on 5.1, 5.2 and 5.3, but only supports Linux
Luvit specifically uses LuaJIT, but you get a separate binary
A quick Google search also found this, but I have no idea if it works.
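For reference, once a LuaSocket build that matches the interpreter you actually run is installed, a minimal request looks roughly like this (the URL is just a placeholder):
local http = require("socket.http")
local body, code, headers, status = http.request("http://example.com/")
print(code, status)
print(body)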
The problem you're having with LuaSocket is an odd one.
You're not using Lua 5.3; you're using LuaJIT, which is for the most part a reimplementation of Lua 5.1, yet it is looking for the socket package in a Lua 5.3 directory.
This is most likely because you have LUA_CPATH set to the Lua 5.3 include path. That variable should not be set on systems with more than one Lua version installed, because all Lua versions read it, and in newer versions it even shadows the version-specific LUA_CPATH_X_Y variables.
To fix this:
1. Find out the exact value of the environment variable LUA_CPATH.
2. Ideally, this variable should be unset.
3. If 2. is not the case (which it won't be), find out where the variable is set. Common suspects are project configuration files, your .bashrc, and your .profile (located in your home directory).
4. Fix that variable. Lua 5.3 also uses the variable LUA_CPATH_5_3, so you can just rename it to that if you don't want to break things.
A quick note: I did not mention LUA_PATH and LUA_PATH_5_3, because your problem is with a C module; but most likely, you have the same problem with those variables too, so you should follow the same steps for them as well.
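To see at a glance what the interpreter actually picks up, you can print the relevant variables and the resulting search paths from inside LuaJIT itself; a quick sketch:
-- run this with the same luajit binary that fails to load socket.core
print("LUA_CPATH     = " .. tostring(os.getenv("LUA_CPATH")))
print("LUA_CPATH_5_3 = " .. tostring(os.getenv("LUA_CPATH_5_3")))
print("package.cpath = " .. package.cpath)
print("package.path  = " .. package.path)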
I'm new to Lua and have recently been learning deep learning with Torch.
I installed Torch by just following the instructions at http://torch.ch/docs/getting-started.html#_ and added some packages using luarocks install. Then I wrote a test file:
require 'torch'
require 'nn'
--[[do something]]
When running it with lua test.lua (Ubuntu 14.04), it fails as follows:
error loading module 'libpaths' from file
'/home/user1/torch/install/lib/lua/5.1/libpaths.so':
/home/user1/torch/install/lib/lua/5.1/libpaths.so: undefined symbol:
luaL_register
It seems something is wrong with the path settings or similar. However, when I run the test with the th command, it works fine.
I searched and examined these answers: Error loading module (Lua) and
Torch7 Lua, error loading module 'libpaths' (Linux),
but they did not fully answer my question.
So I wonder where exactly the error comes from and how to fix it, even though I can still use Torch through th.
Added:
I think the reason may be that the API luaL_register is not supported in version 5.2, which is what I am using, while th invokes a Lua shell for version 5.1. So does this mean I can only use th to run my files?
You are likely using your system Lua (probably version 5.2), but Torch requires the LuaJIT it ships with. Run your script as luajit test.lua (the binary is probably at /home/user1/torch/install/bin/luajit).
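If you want to double-check which interpreter a script ends up running under, a two-line probe at the top of test.lua is enough; jit is a table that only exists under LuaJIT:
print(_VERSION)                            -- "Lua 5.1" under LuaJIT, "Lua 5.2" under a system Lua 5.2
print(jit and jit.version or "not LuaJIT") -- e.g. "LuaJIT 2.1.0-alpha"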
I am using ZeroBrane Studio as my IDE for deep learning code. I have realized that the models I save when programming in the IDE (using Lua 5.1 as the interpreter) do not load correctly from Torch7. The same happens when training from Torch (./th code.lua) and then trying to load the model inside the IDE. I get something like:
/opt/zbstudio/bin/linux/x64/lua: /home/dg/torch/install/share/lua/5.1/torch/File.lua:294: unknown object
Does anybody know how to check the Lua version that Torch is using? Any idea how to work around this?
Thanks!
Update: it seems that I am indeed using the same Lua version (5.1) in both Torch and ZeroBrane. I still get different behaviour (one succeeds and the other crashes) when going through torch.load().
To check the version of Lua that anything is running, you would usually print _VERSION. It's a global variable that stores the version of Lua (unless you overwrite it, of course).
print(_VERSION)
If this isn't available for some reason, the project might state its Lua version on its website.
Most command line tools on Linux understand the -v command line switch (for "version"). So do Lua and LuaJIT.
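For example (each command prints its version banner):
$ lua -v
$ luajit -v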
To figure out which interpreter is running a particular script, you can scan the arg table for the smallest (usually negative) index:
local exe, i = arg[ 0 ], -1
while arg[ i ] do
  exe, i = arg[ i ], i-1
end
print( exe )
Or (on Linux) you can look into the /proc file system while your script is running:
ls -l /proc/4425/exe
(substitute 4425 with real process ID).
Judging from the error message, the interpreter used in ZeroBrane Studio seems to be /opt/zbstudio/bin/linux/x64/lua in your case.
@siffiejoe: thanks for posing the question about versions; it gave me the right direction to explore.
/opt/zbstudio/bin/linux/x64/lua version is LuaJIT 2.0.2
"lua" command alone points to /usr/bin/lua, and it is Lua 5.1.5
~/torch/install/share/lua/5.1 seemed to contain Lua 5.1
~/torch/install/bin/luajit is 2.1.0-alpha
So after realizing that th in the terminal was using LuaJIT 2.1.0, all I had to do was create a user.lua in ZeroBrane and add the line path.lua = "~/torch/install/bin/luajit". Now ZB is using the same LuaJIT interpreter as th.
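For anyone repeating this: ZeroBrane's user settings file typically lives at ~/.zbstudio/user.lua (or user.lua in the ZeroBrane install directory), and an absolute path may be safer than ~. A minimal sketch, using the Torch path from this setup:
-- ~/.zbstudio/user.lua  (ZeroBrane Studio user settings)
-- point the interpreter used by the IDE at Torch's bundled LuaJIT
path.lua = "/home/dg/torch/install/bin/luajit"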
Thanks all for your suggestions.
Out of curiosity, I was playing with overflowing the stack with this code:
fn main() {
    let my_array: [i32; 3000000000] = [3; 3000000000];
    println!("{}", my_array[0]);
}
And to my surprise, I ended up with three different outcomes:
1) This is what I expected:
thread '<main>' has overflowed its stack
Illegal instruction (core dumped)
2) Surprisingly vague:
Illegal instruction (core dumped)
3) Totally puzzling:
208333333
For the stochastic behaviour to show up I had to restart the shell; otherwise the results were deterministic (I would get the same error message over and over).
I compiled with just:
rustc my_file.rs
and executed with:
./my_file
My rustc version:
rustc 1.0.0 (a59de37e9 2015-05-13) (built 2015-05-14)
My Ubuntu version:
Distributor ID: Ubuntu
Description: Ubuntu 14.04 LTS
Release: 14.04
Codename: trusty
Also, the array I am trying to create is about 12 GB (3,000,000,000 × 4 bytes), and I am on a tiny laptop that does not have that much RAM.
Any ideas what could be going on here?
Edit:
I was playing with the size of the array (which I think might be the reason for the different errors, but why?), and got one more:
4) Makes perfect sense.
error: the type `[i32; 300000000000000]` is too big for the current architecture
and my system architecture is x86_64.
It seems that the randomness above is specific to my machine.
I checked the same code on another machine with the same rustc version, Ubuntu version, and architecture, and the results are much more predictable:
If the size of the array is 536870871 or greater (without getting to case 4), I get:
Illegal instruction (core dumped)
If the size of the array is 536870870 or smaller (without being small enough to actually work), I get:
thread '<main>' has overflowed its stack
Illegal instruction (core dumped)
Not a single time did I get case 3), where garbage was returned.