Missing nvcc compiler - theano - environment-variables

I use ubuntu 14.04 and cuda 7.5. I get cuda version information using $ nvcc --version :
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
$PATH and $LD_LIBRARY_PATH are below :
$ echo $PATH
/usr/local/cuda-7.5/bin:/usr/local/cuda-7.5/bin/:/opt/ros/indigo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
$ echo $LD_LIBRARY_PATH
/usr/local/cuda-7.5/lib64
I install theano. I use it with cpu but not gpu. This guide says that
Testing Theano with GPU¶
To see if your GPU is being used, cut and paste the following program into a file and run it.
from theano import function, config, shared, sandbox import
> theano.tensor as T import numpy import time
>
> vlen = 10 * 30 * 768 # 10 x #cores x # threads per core iters = 1000
>
> rng = numpy.random.RandomState(22) x =
> shared(numpy.asarray(rng.rand(vlen), config.floatX)) f = function([],
> T.exp(x)) print(f.maker.fgraph.toposort()) t0 = time.time() for i in
> range(iters):
> r = f() t1 = time.time() print("Looping %d times took %f seconds" % (iters, t1 - t0)) print("Result is %s" % (r,)) if
> numpy.any([isinstance(x.op, T.Elemwise) for x in
> f.maker.fgraph.toposort()]):
> print('Used the cpu') else:
> print('Used the gpu') The program just computes the exp() of a bunch of random numbers. Note that we use the shared function to make
> sure that the input x is stored on the graphics device.
If I run this program (in check1.py) with device=cpu, my computer
takes a little over 3 seconds, whereas on the GPU it takes just over
0.64 seconds. The GPU will not always produce the exact same floating-point numbers as the CPU. As a benchmark, a loop that calls
numpy.exp(x.get_value()) takes about 46 seconds.
$ THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python
check1.py [Elemwise{exp,no_inplace}()]
Looping 1000 times took 3.06635117531 seconds Result is [ 1.23178029
1.61879337 1.52278066 ..., 2.20771813 2.29967761
1.62323284] Used the cpu
$ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python
check1.py Using gpu device 0: GeForce GTX 580
[GpuElemwise{exp,no_inplace}(),
HostFromGpu(GpuElemwise{exp,no_inplace}.0)] Looping 1000 times took
0.638810873032 seconds Result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761
1.62323296] Used the gpu Note that GPU operations in Theano require for now floatX to be float32 (see also below).
I run gpu version command without sudo, it throws permission denied error :
/theano/gof/cmodule.py", line 741, in refresh
files = os.listdir(root)
OSError: [Errno 13] Permission denied: '/home/user/.theano/compiledir_Linux-3.16--generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/tmp077r7U'
If I use it with sudo, the compiler cannot find nvcc path.
ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc installation and try again.
How can I fix this error?

Try running
chown -R user /home/user/.theano
chmod -R 775 /home/user/.theano
this will change the permissions of the folder that your python script can't access. The first one will make the folder belong to your user and the second one will change the permissions to be readable, writable and executable by the user.

Regarding this error only:
You can check where your NVCC is installed , default path is '/usr/local/cuda/bin', if you could see it there then do as below:
$ export PATH="/usr/local/cuda/bin:$PATH"
$ source .bashrc
This worked for me and now I can use NVCC and it is no longer missing.

Related

Pulling ROS2 packages in via bazel WORKSPACE mech in drake-ros_*

Question related to Drake-ros's #142 - I am unable to pull in the rviz_2d_overlay_plugins in drake_ros_viz via the bazel WORKSPACE mech. I have tried both focal and jammy rolling/humble distributions. I see the following error -
ERROR: no such package '#ros2//': Failed to setup #ros2 repository: './run.bash /home/arrowhead/.cache/bazel/_bazel_arrowhead/90cc0dfb7c9efd65bbcc1ee9dd934f8e/external/bazel_ros2_rules/ros2/scrape_distribution.py -i geometry_msgs -i rclcpp -i rclpy -i tf2_ros -i tf2_ros_py -i visualization_msgs -i rviz_2d_overlay_plugins -o distro_metadata.json' exited with 1
--- captured stderr ---
Traceback (most recent call last):
File "/home/arrowhead/.cache/bazel/_bazel_arrowhead/90cc0dfb7c9efd65bbcc1ee9dd934f8e/external/bazel_ros2_rules/ros2/scrape_distribution.py", line 54, in <module>
main()
File "/home/arrowhead/.cache/bazel/_bazel_arrowhead/90cc0dfb7c9efd65bbcc1ee9dd934f8e/external/bazel_ros2_rules/ros2/scrape_distribution.py", line 46, in main
distro = scrape_distribution(
File "/home/arrowhead/.cache/bazel/_bazel_arrowhead/90cc0dfb7c9efd65bbcc1ee9dd934f8e/external/ros2/resources/ros2bzl/scraping/__init__.py", line 90, in scrape_distribution
packages, dependency_graph = build_dependency_graph(
File "/home/arrowhead/.cache/bazel/_bazel_arrowhead/90cc0dfb7c9efd65bbcc1ee9dd934f8e/external/ros2/resources/ros2bzl/scraping/__init__.py", line 50, in build_dependency_graph
raise RuntimeError(msg)
RuntimeError: Cannont find package 'rviz_2d_overlay_plugins'
I see the package on ROS Index here and my relevant WORKSPACE Bazel bit is given below.
ros2_archive(
name = "ros2",
include_packages = [
"geometry_msgs",
"rclcpp",
"rclpy",
"tf2_ros",
"tf2_ros_py",
"visualization_msgs",
"rviz_2d_overlay_plugins"
],
sha256_url = "https://build.ros2.org/view/Hci/job/Hci__nightly-cyclonedds_ubuntu_jammy_amd64/lastSuccessfulBuild/artifact/ros2-humble-linux-jammy-amd64-ci-CHECKSUM", # noqa
strip_prefix = "ros2-linux",
url = "https://build.ros2.org/view/Hci/job/Hci__nightly-cyclonedds_ubuntu_jammy_amd64/lastSuccessfulBuild/artifact/ros2-humble-linux-jammy-amd64-ci.tar.bz2", # noqa
)
Primary Suggestion: You may want to consider installing ROS Humble directly onto your Jammy OS, and then use ros2_local_repository rather than ros2_archive:
https://github.com/RobotLocomotion/drake-ros/blob/06dc6aa7d0237f4edb74130bcce8e9e78689d50c/bazel_ros2_rules/ros2/defs.bzl#L221-L239
Then you should be able to use a locally installed ROS 2 version, and install the package you want, either to /opt/ros or in a workspace that you're chaining.
FTR, in Anzu at TRI, we use ros2_archive presently for the following reasons (some of which are my opinions):
We wanted to use ROS 2 Humble on Focal, possibly mixing multiple different versions along different checkouts.
I want things to update only when I (we) precisely want. (For high confidence reproducibility, also to denoise apt upgrade)
For reproduciblity, I want us to maintain the ability to have decent reproducibility without aggressive containerization.
This means you should consolidate your development solely onto Jammy.
If you are already containerizing, then that should hopefully not be too large of a jump.
Alternative: If you really want to maintain archives, is to build your own extension archive including the packages you want. Here's an example kinda-hack that I had shimmed into Anzu:
https://github.com/EricCousineau-TRI/repro/tree/4079be209bfbc282d4580b5b2979faa148e1f394/ros/drake_ros_example_layered_archive

parallel: Error: Command line too long (68914 >= 65524) at input 0

Given a file with long lines, parallel fails to pass these lines as an argument to any command:
$> cat johny_long_lines.txt | parallel echo {}
parallel: Error: Command line too long (68906 >= 65524) at input 0: 2236439425|\x308286873082856fa003020102020c221ff03...
This gets more confusing when I see that the line is 68900 characters long:
$> cat johny_long_lines.txt | head -n 1 | wc -m
68900
while the max line length allowed by parallel is way more longer that my input:
$> parallel --max-line-length-allowed
131049
Also: if you think that it's a problem of execve, this might interest you:
$> getconf ARG_MAX
2097152
Any idea what I'm doing here wrong?
UPDATE
I figured out that the problem persists for versions 20161222 and 20220522 but not for 20210822 (delivered with Ubuntu 22.04 LTS). Further inspection reveals that this line causes the problem:
# Usable len = maxlen - 3000 for wrapping, div 2 for hexing
int(($Global::minimal_command_line_length - 3000)/2);
Which I can confirm using --show-limits:
$> parallel --show-limits
[...]
Maximal size of command: 131063
Maximal usable size of command: 64031
This annoying feature does not exist in version 20210822 and I my file goes through as expected.
Can this be disabled?
I got the message
parallel --show-limits
Maximal size of command: 131049
Maximal used size of command: 83906
and had some trouble finding the source for the 83906
but then found it in
~/.parallel/tmp/sshlogin/$(hostname)/linelen
and had no clue how it was set to this small value.
But the Version of parallel was quite old:
GNU parallel 20180922

What is the correct address for loading spiffsimg file

I have used spiffsimg to create a single file containing multiple lua files:
# ./spiffsimg -f lua.img -c 262144 -r lua.script
f 4227 init.lua
f 413 cfg.lua
f 2233 setupWifi.lua
f 7498 configServer.lua
f 558 cfgForm.htm
f 4255 setupConfig.lua
f 14192 main.lua
#
I then use esptool.py to flash the NodeMCU firmware and the file containing the lua files to the esp8266 (NodeMCU dev kit):
c:\esptool-master>c:\Python27\python esptool.py -p COM7 write_flash -fs 32m -fm dio 0x00000 nodemcu-dev-9-modules-2016-07-18-12-06-36-integer.bin 0x78000 lua.img
esptool.py v1.0.2-dev
Connecting...
Running Cesanta flasher stub...
Flash params set to 0x0240
Writing 446464 # 0x0... 446464 (100 %)
Wrote 446464 bytes at 0x0 in 38.9 seconds (91.9 kbit/s)...
Writing 262144 # 0x78000... 262144 (100 %)
Wrote 262144 bytes at 0x78000 in 22.8 seconds (91.9 kbit/s)...
Leaving...
I then run ESPLorer to check the status and get:
PORT OPEN 115200
Communication with MCU..Got answer! AutoDetect firmware...
Can't autodetect firmware, because proper answer not received.
NodeMCU custom build by frightanic.com
branch: dev
commit: b21b3e08aad633ccfd5fd29066400a06bb699ae2
SSL: true
modules: file,gpio,http,net,node,rtctime,tmr,uart,wifi
build built on: 2016-07-18 12:05
powered by Lua 5.1.4 on SDK 1.5.4(baaeaebb)
lua: cannot open init.lua
>
----------------------------
No files found.
----------------------------
>
Total : 3455015 bytes
Used : 0 bytes
Remain: 3455015 bytes
The NodeMCU firmware flashed correctly, but the lua files can't be located.
I have tried flashing to other locations (0x84000, 0x7c000), but I am just guessing at these locations based on reading threads on github.
I used the NodeMCU file.fscfg() routine to get the flash address and size. If I only flash the NodeMCU firmware I get the following:
print (file.fscfg())
524288 3653632
534288 is 0x80000, so I tried flashing only the spiffsimg file (lua.img) to 0x8000, then ran the same print statement and got:
print (file.fscfg())
786432 3391488
The flash address incremented by the exact number of bytes in the lua.img - which I don't understand, why would the flash address change? Is the first number returned by file.fscfg not the starting flash address, but the ending flash address?
What is the correct address for flashing an image file, contain lua files, that was created by spiffsimg?
The version of spiffsimg found here will provide the correct address for flashing an image file that contains lua files.
Do not use this version of spiffsimg as it is out of date.
To install the spiffsimg utility, you need to download and install the entire nodemcu-firmware package (into a linux environment, use make to install - note: make on my debian linux box generated an error, but i was able to go to the ../tools/spiffsimg subdirectory and run make on the Makefile found in that directory to create the utility).
The spiffsimg instructions found here are quite clear, with one exception: the file name you specify, with the -f parameter, needs to include the characters %x. The %x will be replaced with the address that the image file should be flashed to.
For example, the command
spiffsimage -f %x-luaFiles.img -S 4MB -U 465783 -r lua.script
will create a file, in the local directory, with a name like: 80000-luaFiles.img. Which means you should install that image file at address 0x80000 on the ESP8266.
I've never done that myself but I'm reasonably confident the correct answer can be extracted from the docs.
-f specifies the filename for the disk image. '%x' will be replaced
by the calculated offset of the file system.
And a bit further down
The disk image file is placed into the bin directory and it is named
0x<offset>-<size>.bin where the offset is the location where it
should be flashed, and the size is the size of the flash part.
However, there's a slight mismatch between the two statements. We may have a bug in the docs. If "'%x' will be replaced..." then I'd expected the final name won't contain 0x anymore.
Furthermore, it is possible to define a fixed SPIFFS position when you build the firmware.
#define SPIFFS_FIXED_LOCATION 0x100000
This specifies that the SPIFFS filesystem starts at 1Mb from the start of the flash. Unless
otherwise specified, it will run to the end of the flash (excluding
the 16k of space reserved by the SDK).

Is this a bug of spark stream or memory leak?

I submit my code to a spark stand alone cluster. Submit command is like below:
nohup ./bin/spark-submit \
--master spark://ES01:7077 \
--executor-memory 4G \
--num-executors 1 \
--total-executor-cores 1 \
--conf "spark.storage.memoryFraction=0.2" \
./myCode.py 1>a.log 2>b.log &
I specify the executor use 4G memory in above command. But use the top command to monitor the executor process, I notice the memory usage keeps growing. Now the top Command output is below:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12578 root 20 0 20.223g 5.790g 23856 S 61.5 37.3 20:49.36 java
My total memory is 16G so 37.3% is already bigger than the 4GB I specified. And it is still growing.
Use the ps command , you can know it is the executor process.
[root#ES01 ~]# ps -awx | grep spark | grep java
10409 ? Sl 1:43 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms4G -Xmx4G -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip ES01 --port 7077 --webui-port 8080
10603 ? Sl 6:16 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms4G -Xmx4G -XX:MaxPermSize=256m org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://ES01:7077
12420 ? Sl 10:16 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.SparkSubmit --master spark://ES01:7077 --conf spark.storage.memoryFraction=0.2 --executor-memory 4G --num-executors 1 --total-executor-cores 1 /opt/flowSpark/sparkStream/ForAsk01.py
12578 ? Sl 21:03 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms4096M -Xmx4096M -Dspark.driver.port=52931 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler#10.79.148.184:52931 --executor-id 0 --hostname 10.79.148.184 --cores 1 --app-id app-20160511080701-0013 --worker-url spark://Worker#10.79.148.184:52660
Below are the code. It is very simple so I do not think there is memory leak
if __name__ == "__main__":
dataDirectory = '/stream/raw'
sc = SparkContext(appName="Netflow")
ssc = StreamingContext(sc, 20)
# Read CSV File
lines = ssc.textFileStream(dataDirectory)
lines.foreachRDD(process)
ssc.start()
ssc.awaitTermination()
The code for process function is below. Please note that I am using HiveContext not SqlContext here. Because SqlContext do not support window function
def getSqlContextInstance(sparkContext):
if ('sqlContextSingletonInstance' not in globals()):
globals()['sqlContextSingletonInstance'] = HiveContext(sparkContext)
return globals()['sqlContextSingletonInstance']
def process(time, rdd):
if rdd.isEmpty():
return sc.emptyRDD()
sqlContext = getSqlContextInstance(rdd.context)
# Convert CSV File to Dataframe
parts = rdd.map(lambda l: l.split(","))
rowRdd = parts.map(lambda p: Row(router=p[0], interface=int(p[1]), flow_direction=p[9], bits=int(p[11])))
dataframe = sqlContext.createDataFrame(rowRdd)
# Get the top 2 interface of each router
dataframe = dataframe.groupBy(['router','interface']).agg(func.sum('bits').alias('bits'))
windowSpec = Window.partitionBy(dataframe['router']).orderBy(dataframe['bits'].desc())
rank = func.dense_rank().over(windowSpec)
ret = dataframe.select(dataframe['router'],dataframe['interface'],dataframe['bits'], rank.alias('rank')).filter("rank<=2")
ret.show()
dataframe.show()
Actually I found below code will cause the problem:
# Get the top 2 interface of each router
dataframe = dataframe.groupBy(['router','interface']).agg(func.sum('bits').alias('bits'))
windowSpec = Window.partitionBy(dataframe['router']).orderBy(dataframe['bits'].desc())
rank = func.dense_rank().over(windowSpec)
ret = dataframe.select(dataframe['router'],dataframe['interface'],dataframe['bits'], rank.alias('rank')).filter("rank<=2")
ret.show()
Because If I remove these 5 line. The code can run all night without showing memory increase. But adding them will cause the memory usage of executor grow to a very high number.
Basically the above code is just some window + grouby in SparkSQL. So is this a bug?
Disclaimer: this answer isn't based on debugging, but more on observations and the documentation Apache Spark provides
I don't believe that this is a bug to begin with!
Looking at your configurations, we can see that you are focusing mostly on the executor tuning, which isn't wrong, but you are forgetting the driver part of the equation.
Looking at the spark cluster overview from Apache Spark documentaion
As you can see, each worker has an executor, however, in your case, the worker node is the same as the driver node! Which frankly is the case when you run locally or on a standalone cluster in a single node.
Further, the driver takes 1G of memory by default unless tuned using spark.driver.memory flag. Furthermore, you should not forget about the heap usage from the JVM itself, and the Web UI that's been taken care of by the driver too AFAIK!
When you delete the lines of code you mentioned, your code is left without actions as map function is just a transformation, hence, there will be no execution, and therefore, you don't see memory increase at all!
Same applies on groupBy as it is just a transformation that will not be executed unless an action is being called which in your case is agg and show further down the stream!
That said, try to minimize your driver memory and the overall number of cores in spark which is defined by spark.cores.max if you want to control the number of cores on this process, then cascade down to the executors. Moreover, I would add spark.python.profile.dump to your list of configuration so you can see a profile for your spark job execution, which can help you more with understanding the case, and to tune your cluster more to your needs.
As I can see in your 5 lines, maybe the groupBy is the issue , would you try with reduceBy, and see how it performs.
See here and here.

portaudio pa_devs report 0 device

My system is ubuntu 15.10. I am very sure my audio works,
arecord -l
**** List of CAPTURE Hardware Devices ****
card 1: PCH [HDA Intel PCH], device 0: ALC887-VD Analog [ALC887-VD Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 1: PCH [HDA Intel PCH], device 2: ALC887-VD Alt Analog [ALC887-VD Alt Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
but pa_devs, which is a official provided execuatble file in portaudio, reports 0 device as below,
PortAudio version number = 1899
PortAudio version text = 'PortAudio V19-devel (built Jan 30 2016 19:22:45)'
Number of devices = 0
And I can get devices number with pyAudio
import pyaudio
pa = pyaudio.PyAudio()
print(pa.get_default_input_device_info())
print(pa.get_device_count())
--- output ---
{'defaultHighInputLatency': 0.034829931972789115, 'maxInputChannels': 32, 'defaultLowOutputLatency': 0.008707482993197279, 'defaultLowInputLatency': 0.008707482993197279, 'defaultSampleRate': 44100.0, 'hostApi': 0, 'structVersion': 2, 'maxOutputChannels': 32, 'defaultHighOutputLatency': 0.034829931972789115, 'name': 'default', 'index': 6}
7
Should I install something or re-built portaudio with some special settings? Thanks!
I ran into this exact same problem. It was because portaudio was built with only support for OSS. You need to build it with ALSA support. Note that even if you specify --with-alsa to the ./configure script it still "succeeds" even if it can't find ALSA - you have to manually check the configuration summary for a line like this:
ALSA ........................ no
(Don't you love autotools?)
Anyway do this:
sudo apt-get install libasound2-dev
./configure
And so on. I unfortunately couldn't find a way to get pa_devs to list what backends it supports, so you just have to guess this is the problem and try it. Worked for me anyway!

Resources