I have a Dockerfile that builds and installs PyTorch from source.
Here is the snippet from the Dockerfile that performs the installation:
RUN cd /tmp/ \
&& git clone https://github.com/pytorch/pytorch.git \
&& cd pytorch \
&& git submodule sync && git submodule update --init --recursive \
&& sudo TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0 7.5 8.0" python3 setup.py install
I don't fully understand what is happening here and would appreciate some input from the community:
Why does PyTorch need a different installation procedure for different CUDA versions?
What is the role of TORCH_CUDA_ARCH_LIST in this context?
If my machine has multiple CUDA setups, does that mean I will have multiple PyTorch versions (specific to each CUDA setup) installed in my Docker container?
If my machine has none of the mentioned CUDA setups ("6.0 6.1 7.0 7.5 8.0"), will the PyTorch installation fail?
From the NVIDIA compiler documentation at https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#gpu-feature-list :
nvcc tag                  GPU architecture
sm_50, sm_52 and sm_53    Maxwell support
sm_60, sm_61, and sm_62   Pascal support
sm_70 and sm_72           Volta support
sm_75                     Turing support
sm_80, sm_86 and sm_87    Ampere support
sm_89                     Ada support
sm_90, sm_90a             Hopper support
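From the above you can see that sm_50 corresponds to 5.0, and so on, so each entry in TORCH_CUDA_ARCH_LIST ("6.0 6.1 7.0 7.5 8.0") names a GPU architecture to compile for, not a CUDA toolkit version.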
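To make the mapping concrete, here is a minimal Python sketch that turns a TORCH_CUDA_ARCH_LIST-style string into the matching nvcc tags from the table. The helper name arch_list_to_sm_tags is made up for illustration, and the handling of separators and of a "+PTX" suffix is an assumption about how the list may be written, not a copy of PyTorch's own parsing.

```python
import re
from typing import List


def arch_list_to_sm_tags(arch_list: str) -> List[str]:
    """Convert e.g. "6.0 6.1" into ["sm_60", "sm_61"]."""
    tags = []
    # Assumption: entries are separated by spaces or semicolons.
    for entry in re.split(r"[;\s]+", arch_list.strip()):
        if not entry:
            continue
        # Assumption: an optional "+PTX" suffix may follow the number; drop it.
        capability = entry.split("+", 1)[0]
        major, minor = capability.split(".")
        tags.append(f"sm_{major}{minor}")
    return tags


if __name__ == "__main__":
    # prints ['sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80']
    print(arch_list_to_sm_tags("6.0 6.1 7.0 7.5 8.0"))
```

Read this way, the list in the Dockerfile asks for code targeting Pascal (6.0, 6.1), Volta (7.0), Turing (7.5) and Ampere (8.0) GPUs.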
Is Python 3.9 supported?
I got this error with Python 3.9:
File "/home/drake/drake/drake-build/install/lib/python3.6/site-packages/pydrake/common/__init__.py", line 8, in <module>
    from ._module_py import *
ModuleNotFoundError: No module named 'pydrake.common._module_py'
There is no "python3.9" folder in .../install/lib.
I am running Ubuntu 18.04, and I am building Drake from source using the latest GitHub commit on master.
EDIT: Can someone explain how exactly Drake sets up pydrake?
It seems to detect the default Python installation automatically. I tried a fresh installation where the default Python was 3.8, and I also installed:
apt install -y python3.10
Then I followed Drake python setup instructions.
git clone https://github.com/RobotLocomotion/drake.git
mkdir drake-build
cd drake-build
cmake ../drake
make -j
pydrake only became available for Python 3.8. How can I make it available for Python 3.10?
The current version of Pydrake (1.11.0) is officially supported on Ubuntu 20.04 with Python 3.8 and Ubuntu 22.04 with Python 3.10 when building from source. However, we recommend that most users use a binary release, and don't try to rebuild Drake from scratch themselves.
There are precompiled wheels at https://pypi.org/project/drake/ (that is, pip install drake); helpful installation details are at https://drake.mit.edu/pip.html. On Ubuntu, the wheels support Python 3.8, 3.9, 3.10, and 3.11.
For example:
python3 -m venv env
env/bin/pip install --upgrade pip
env/bin/pip install drake
source env/bin/activate
See https://drake.mit.edu/installation.html for full instructions and supported versions.
The last version of Pydrake to support Ubuntu 18.04 was v1.1.0 (released in March of 2022). If you need a newer version of Pydrake, you'll need to use a newer version of Ubuntu.
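Before creating the virtual environment, it can help to check whether the interpreter you plan to use is one of the Python versions listed above for the Ubuntu wheels. The following is only a sketch: the version set is copied from this answer (it describes Drake 1.11.0 and will change over time), and the function name wheel_supports is hypothetical.

```python
import sys

# Python versions this answer lists for the Ubuntu wheels (Drake 1.11.0).
# Assumption: update this set by hand when the supported versions change.
WHEEL_PYTHONS = {(3, 8), (3, 9), (3, 10), (3, 11)}


def wheel_supports(major: int, minor: int) -> bool:
    """Return True if (major, minor) is in the list quoted in the answer."""
    return (major, minor) in WHEEL_PYTHONS


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if wheel_supports(major, minor):
        print(f"Python {major}.{minor} is in the supported list")
    else:
        print(f"Python {major}.{minor} is not in the supported list")
```

Run it with the same interpreter you would pass to python3 -m venv, since that is the interpreter pip will install the wheel for.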
When I run my app, it exits with code 150
Output: Debug
You may only use the Microsoft .NET Core Debugger (vsdbg) with
Visual Studio Code, Visual Studio or Visual Studio for Mac software
to help you develop and test your applications.
-------------------------------------------------------------------
It was not possible to find any compatible framework version
The specified framework 'Microsoft.NETCore.App', version '3.0.0-preview3-27503-5' was not found.
- Check application dependencies and target a framework version installed at:
/usr/share/dotnet/
- The .NET Core framework and SDK can be installed from:
https://aka.ms/dotnet-download
- The following versions are installed:
3.0.0-preview-27324-5 at [/usr/share/dotnet/shared/Microsoft.NETCore.App]
- Installing .NET Core prerequisites might help resolve this problem:
https://go.microsoft.com/fwlink/?linkid=2063370
The target process exited without raising a CoreCLR started event. Ensure that the target process is configured to use .NET Core. This may be expected if the target process did not run on .NET Core.
The program '[31] dotnet' has exited with code 150 (0x96).
The program 'dotnet' has exited with code 150 (0x96).
If you have problems running the application in a Docker container, make sure that the Dockerfile specifies the correct image for your .NET Core SDK.
In my case, the Dockerfile for .NET Core SDK 3.0.100-preview3-010431
had been created incorrectly.
You can view Dockerfile samples for various SDKs at hub.docker.com
https://hub.docker.com/_/microsoft-dotnet-core-sdk/?tab=description
Example:
FROM buildpack-deps:stretch-scm
# Install .NET CLI dependencies
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
libc6 \
libgcc1 \
libgssapi-krb5-2 \
libicu57 \
libssl1.1 \
libstdc++6 \
zlib1g \
&& rm -rf /var/lib/apt/lists/*
# Install .NET Core SDK
ENV DOTNET_SDK_VERSION 3.0.100-preview3-010431
...
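The mismatch itself is visible in the debugger output: the app asks for one framework version while the container has another. As a rough illustration, the Python sketch below pulls both values out of that text so they can be compared. The regular expressions are written against the message format shown in this question only, and the function name parse_framework_error is made up.

```python
import re
from typing import List, Optional, Tuple


def parse_framework_error(log: str) -> Tuple[Optional[str], List[str]]:
    """Return (required_version, installed_versions) found in the log text."""
    required = re.search(
        r"The specified framework '[^']+', version '([^']+)' was not found", log
    )
    # Installed versions appear as "<version> at [<path>]" on their own line.
    installed = re.findall(r"^\s*(\S+) at \[", log, flags=re.MULTILINE)
    return (required.group(1) if required else None, installed)


# Lines copied from the output in the question.
SAMPLE = """\
The specified framework 'Microsoft.NETCore.App', version '3.0.0-preview3-27503-5' was not found.
- The following versions are installed:
3.0.0-preview-27324-5 at [/usr/share/dotnet/shared/Microsoft.NETCore.App]
"""

if __name__ == "__main__":
    required, installed = parse_framework_error(SAMPLE)
    print("required: ", required)
    print("installed:", installed)
    if required not in installed:
        print("The image does not contain the framework version the app targets.")
```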
I'm trying to install OpenCV 3.2.0 and the NVIDIA CUDA toolkit 8.0 on Ubuntu 16.04, but I can't configure them to work together. I get the following error when I try to build a project that uses both:
CMake Error at /usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:148 (message):
Could NOT find CUDA: Found unsuitable version "8.0", but required is exact
version "7.5" (found /usr/local/cuda)
Call Stack (most recent call first):
/usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:386 (_FPHSA_FAILURE_MESSAGE)
/usr/share/cmake-3.5/Modules/FindCUDA.cmake:949 (find_package_handle_standard_args)
/usr/local/share/OpenCV/OpenCVConfig.cmake:86 (find_package)
/usr/local/share/OpenCV/OpenCVConfig.cmake:105 (find_host_package)
CMakeLists.txt:10 (find_package)
-- Configuring incomplete, errors occurred!
I have tried installing CUDA toolkit 7.5, but I believe it's not compatible with Ubuntu 16.04. I'm really clueless now; I hope someone can help with this.
Thanks
I solved this issue by managing to install toolkit 7.5. Here is how I did it:
Update the NVIDIA driver for your operating system
Download CUDA toolkit 7.5 and extract it to a folder
$ mkdir ~/Downloads/NVIDIA_TOOLKIT
$ cd ~/Downloads
$ ./cuda_7.5.18_linux.run -extract=$HOME/Downloads/NVIDIA_TOOLKIT
Go to the virtual console by pressing Ctrl + Alt + F1 and turn off the lightdm service
$ sudo service lightdm stop
cd to downloads and install the extracted toolkit and samples
$ cd ~/Downloads/NVIDIA_TOOLKIT
$ sudo ./cuda-linux64-rel-6.0.37-18176142.run
$ sudo ./cuda-samples-linux-6.0.37-18176142.run
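Set the environment variables in your .bashrc file by adding these lines (prepend, so the existing values are kept):
export PATH=/usr/local/cuda-7.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH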
Turn back on the lightdm service
$ sudo service lightdm start
Reboot and you should be able to use the nvcc compiler
To build OpenCV against this toolkit, you will have to downgrade your gcc/g++ compiler to 4.9, since higher versions are not yet supported.
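One thing to watch with the environment-variable step: assigning PATH outright replaces whatever was there before, so it is usually safer to put the CUDA directory in front of the existing value. Here is a minimal Python sketch of that prepend logic; the helper name prepend_path is made up for illustration, and the ":" separator assumes a Linux shell.

```python
import os

SEP = ":"  # Assumption: Linux-style search path separator.


def prepend_path(new_dir: str, current: str) -> str:
    """Return `current` with `new_dir` placed first, without duplicating it."""
    parts = [p for p in current.split(SEP) if p and p != new_dir]
    return SEP.join([new_dir] + parts)


if __name__ == "__main__":
    cuda_bin = "/usr/local/cuda-7.5/bin"
    cuda_lib = "/usr/local/cuda-7.5/lib64"
    # Show the values that would go into .bashrc for the current environment.
    print("PATH=" + prepend_path(cuda_bin, os.environ.get("PATH", "")))
    print("LD_LIBRARY_PATH=" + prepend_path(cuda_lib, os.environ.get("LD_LIBRARY_PATH", "")))
```

Because the helper filters out an existing copy of the directory, re-running it does not keep growing the variable.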
I have an NVIDIA Jetson TX1 board and want to install Caffe on it. Based on the Caffe prerequisites, I installed the CUDA toolkit from https://developer.nvidia.com/cuda-downloads. Later I found that this board has its own installation procedure. It needs 10 GB of space, which I do not have, since some of it was used by the Caffe prerequisite installations.
Now I need to remove this CUDA toolkit completely.
I have not found a reliable way to do this so far. Can you please help me?
I am using Ubuntu 14.04 on an NVIDIA Jetson TX1.
If you installed CUDA 7.5 using the .run installer:
From the manual:
4.6. Uninstallation
To uninstall the CUDA Toolkit, run the uninstallation script provided in the bin directory of the toolkit. By default, it is located in /usr/local/cuda-7.5/bin:
$ sudo /usr/local/cuda-7.5/bin/uninstall_cuda_7.5.pl
To uninstall the NVIDIA Driver, run nvidia-uninstall:
$ sudo /usr/bin/nvidia-uninstall
If you installed CUDA 7.5 using the .deb package:
$ sudo apt-get purge cuda-7.5
(I think the package name is cuda-7.5; if that does not work, try cuda-7-5 or just cuda.)
Try:
sudo apt-get --purge -y remove 'cuda*'
sudo apt-get --purge -y remove 'nvidia*'
sudo reboot
It removes any installed cuda and nvidia packages and then you can install any specific version that you like from:
https://developer.nvidia.com/cuda-toolkit-archive.
To add to mhaghighat's answer, you can do this:
sudo apt purge -y '*cuda*'
sudo apt purge -y '*cudnn*'
reboot
Since you only asked about removing CUDA, I assume you don't need to reinstall the NVIDIA driver, so there is no need to remove that. Beware: purge is a powerful command, so use it with caution.
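Because wildcard patterns can match more than you expect, it is worth previewing what they would catch before purging. The Python sketch below shows the idea on a made-up package list, under the assumption that the patterns behave like shell-style globs; the package names and the function preview_matches are hypothetical. For the authoritative answer on a real system, apt-get's own --simulate option shows what would be removed without removing anything.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def preview_matches(packages: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return the sorted package names matched by at least one glob pattern."""
    pattern_list = list(patterns)
    return sorted(
        name
        for name in packages
        if any(fnmatchcase(name, pattern) for pattern in pattern_list)
    )


if __name__ == "__main__":
    # Hypothetical package names, for illustration only.
    installed = ["cuda-toolkit-7-5", "libcudnn5", "nvidia-352", "python3"]
    print(preview_matches(installed, ["*cuda*", "*cudnn*"]))
    print(preview_matches(installed, ["cuda*", "nvidia*"]))
```

With this sample list, the first pair of patterns leaves the driver package alone, while the second pair also selects it, which mirrors the difference between the two answers.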
I would like to install LuaJIT on my Red Hat system in order to get OSRM working. I have tried to do so by following the instructions here,
and in particular I was following this part:
cd /tmp
wget http://luajit.org/download/LuaJIT-2.0.2.tar.gz
tar -zxvf LuaJIT-2.0.2.tar.gz
cd LuaJIT-2.0.2
make install PREFIX=/opt/osrm_infrastructure/LuaJIT-2.0.2
However, I get the following error:
==== Building LuaJIT 2.0.2 ====
make -C src
lj_arch.h:324:2: error: #error "No support for PowerPC 64 bit mode"
#error "No support for PowerPC 64 bit mode"
^
I am on Red Hat 7 on the ppc64 architecture.
Is there a work around that might be available?
Try using LuaJIT 2.1 from this fork, which has support for PowerPC 64-bit Little Endian (ppc64le). It was not tested with OSRM, but it works with some Lua software.
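Since the fork is described as supporting ppc64le specifically, it may be worth confirming which PowerPC variant the machine reports before building. The following is a small Python sketch using the standard platform module; the helper name is_ppc64le is made up, and the assumption is that a little-endian system reports "ppc64le" while a big-endian one reports "ppc64".

```python
import platform


def is_ppc64le(machine: str) -> bool:
    """Return True if the reported machine string is PowerPC 64-bit little endian."""
    return machine.strip().lower() == "ppc64le"


if __name__ == "__main__":
    machine = platform.machine()
    if is_ppc64le(machine):
        print(f"{machine}: matches the architecture the fork is said to support")
    else:
        print(f"{machine}: not ppc64le, so the fork's ppc64le support may not apply")
```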