Right after creating and starting up a data science virtual machine and connecting through SSH, I tried to run nvidia-smi to check whether the built-in NVIDIA driver and CUDA were working properly. The returned message read
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA
driver. Make sure that the latest NVIDIA driver is installed and
running.
These were supposed to be part of the VM, yet when I tried to run the program I created, my local computer's default CPU was used instead of the VM's GPU. The ultimate goal of my project is to run an object detection model and speed it up from my lousy 11 sec/image, so I figured I would use a VM and take advantage of its computing power. Yet it seems like this may not be the best option, so if anyone has some advice there, I would appreciate it.
The issue you are seeing is because you are using a D-series VM. Only the N-series VMs have GPUs, so in order to utilize a GPU you need to select one of the sizes listed here:
https://learn.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu
For this size family, the vCPU (core) quota in your subscription is initially set to 0 in each region. You will need to request a vCPU quota increase for this family in an available region.
I'm on a system with multiple NVIDIA GPUs. One or more of them may - or may not - be used to drive a physical monitor. In my compute work, I want to avoid using that one (or more).
How can I, programmatically, check which GPUs are used for display?
If there's no robust way of doing that, I'll settle for getting those GPUs which are used by an Xorg process (which is what nvidia-smi gives me on the command line).
If you want to use the same approach programmatically, you can check the NVML API functions nvmlDeviceGetDisplayActive and nvmlDeviceGetDisplayMode.
Specifically,
nvmlReturn_t nvmlDeviceGetDisplayMode ( nvmlDevice_t device, nvmlEnableState_t* display ) can be used to detect if a physical display is connected to a device.
nvmlReturn_t nvmlDeviceGetDisplayActive ( nvmlDevice_t device, nvmlEnableState_t* isActive ) can be used to check whether an X server is attached to a device; note that an X server can be running without an attached physical display.
Link to documentation
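For reference, here is a minimal C sketch of that approach (assuming the NVML headers and the nvidia-ml library are installed; compile with something like gcc detect_display.c -lnvidia-ml):

#include <stdio.h>
#include <nvml.h>

/* For every GPU, print whether a physical display is connected
   (display mode) and whether a display/X server is attached
   (display active). */
int main(void)
{
    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "Failed to initialize NVML\n");
        return 1;
    }

    unsigned int count = 0;
    nvmlDeviceGetCount(&count);

    for (unsigned int i = 0; i < count; ++i) {
        nvmlDevice_t dev;
        nvmlEnableState_t mode, active;

        if (nvmlDeviceGetHandleByIndex(i, &dev) != NVML_SUCCESS)
            continue;

        nvmlDeviceGetDisplayMode(dev, &mode);     /* physical display connected? */
        nvmlDeviceGetDisplayActive(dev, &active); /* X server / display attached? */

        printf("GPU %u: display_mode=%s display_active=%s\n", i,
               mode == NVML_FEATURE_ENABLED ? "Enabled" : "Disabled",
               active == NVML_FEATURE_ENABLED ? "Enabled" : "Disabled");
    }

    nvmlShutdown();
    return 0;
}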
Try the following in a terminal:
nvidia-smi --format=csv --query-gpu=index,display_mode,display_active
For more information, check the nvidia-smi documentation and nvidia-smi --help-query-gpu.
When I let my application output the available memory and number of cores on a Google Cloud Run instance using Linux commands like "free -h", "lscpu" and "top", I always get the information that there are 2 GB of available memory and 2 cores, although I specified other capacities in my deployment. No matter whether I set 1 GB, 2 GB or 4 GB of memory and 1, 2 or 4 CPUs, the mentioned Linux tools always show the same capacity.
Am I misunderstanding these tools or the Google Cloud Run concept, or is there something not working like it should?
Cloud Run services run containers on a non-standard runtime environment (named Borg internally at Google). It's possible that the low-level info values are not relevant.
In addition, Cloud Run services run in a sandbox (gVisor), and system calls can also be filtered there.
What did you want to check with these tests?
I performed tests to validate the multi-CPU capacity of Cloud Run and wrote an article about that. The multi-CPU capacity is real! Have a look at it.
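If you want to inspect the configured limits from inside the container yourself, the cgroup interface is usually more meaningful than free or lscpu, which read sandbox-level values from /proc. A minimal C sketch, assuming the cgroup v1 controller files are exposed inside the sandbox (which may not hold under gVisor, so treat the output as a hint only):

#include <stdio.h>

/* Print the limits the container runtime applied, if the cgroup v1
   files are visible. Under gVisor these files may be absent or may
   report different values. */
static void print_file(const char *label, const char *path)
{
    char buf[128];
    FILE *f = fopen(path, "r");
    if (!f) {
        printf("%s: <not available>\n", label);
        return;
    }
    if (fgets(buf, sizeof buf, f))
        printf("%s: %s", label, buf);
    fclose(f);
}

int main(void)
{
    print_file("memory limit (bytes)", "/sys/fs/cgroup/memory/memory.limit_in_bytes");
    print_file("cpu quota (us)",       "/sys/fs/cgroup/cpu/cpu.cfs_quota_us");
    print_file("cpu period (us)",      "/sys/fs/cgroup/cpu/cpu.cfs_period_us");
    return 0;
}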
I'm getting started with FreeRTOS. I went through the documentation provided on FreeRTOS.org and practised with some of the demo projects. My question is how to install FreeRTOS without using the Win32 port (since it is only an emulator that doesn't provide real-time behaviour). Is it possible to install FreeRTOS as a standalone OS, or is it necessary to use the Linux kernel or Windows?
FreeRTOS is a real-time operating system kernel. It's not a fully blown OS, it's just the kernel. You don't "install" FreeRTOS like you would Windows or an Ubuntu distro on an x86 PC. You build a project and use FreeRTOS to schedule tasks, manage memory resources, etc. In general, you target a different microcontroller/processor than the one you're developing on.
If you want to use only your laptop, then you'll need to simulate a "target" processor (that's what the Win32 port is for). You won't be able to achieve "real time" results (Windows will get in the way), but you can get pretty close.
The first thing I'd do is get an eval kit for whatever microcontroller you want to actually use/target/develop on.
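To make the "you build a project around the kernel" point concrete, the core of almost every FreeRTOS application looks roughly like the sketch below; the port layer, FreeRTOSConfig.h and the hardware init are what change per target board (the GPIO work is left as comments because it is board specific):

#include "FreeRTOS.h"
#include "task.h"

/* A trivial task that wakes up every 500 ms. */
static void vBlinkTask(void *pvParameters)
{
    (void)pvParameters;
    for (;;) {
        /* board-specific GPIO toggle would go here */
        vTaskDelay(pdMS_TO_TICKS(500));
    }
}

int main(void)
{
    /* board-specific clock/GPIO setup would go here */

    xTaskCreate(vBlinkTask,               /* task function        */
                "Blink",                  /* name for debugging   */
                configMINIMAL_STACK_SIZE, /* stack depth in words */
                NULL,                     /* task parameter       */
                tskIDLE_PRIORITY + 1,     /* priority             */
                NULL);                    /* task handle (unused) */

    vTaskStartScheduler();                /* never returns if all is well */

    for (;;);                             /* only reached if the scheduler could not start */
}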
My application benefits greatly from advanced CPU features that gcc can use when it is run with -march=native. Docker can smooth over differences in OS, but how does it handle different CPUs? To build an application that can run on any CPU I would have to build for generic amd64, losing out on a lot of performance. Is there a good way to distribute Docker images when the application needs to be compiled separately for each CPU?
Docker doesn't handle the CPU at all. It is just a composition of kernel namespacing, filesystem layering (e.g. UnionFS) and process quotas.
When you run something in a Docker container it is just an executable running on your OS, without virtualisation. It has access only to a selected set of kernel objects (e.g. devices) and it is chrooted to a filesystem hierarchy resulting from overlaying various filesystems (including the one in the Docker image).
Hence, Docker doesn't handle the CPU at all; it is completely orthogonal to your problem.
As Peter commented, there are essentially two ways to CPU-dispatch:
You load the right dynamic library (but every function call into the library uses a pointer).
You build multiple versions of the same statically-linked binary and run the right one.
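A rough C sketch of the first approach, assuming you ship one shared object per build (the library names libkernels_generic.so / libkernels_avx2.so and the exported symbol process are made up for illustration; link with -ldl):

#include <stdio.h>
#include <dlfcn.h>

/* Pick the most capable library the running CPU supports, then call
   the hot code through a pointer obtained from it. */
int main(void)
{
    const char *lib = "libkernels_generic.so";   /* hypothetical names */
    if (__builtin_cpu_supports("avx2"))
        lib = "libkernels_avx2.so";

    void *handle = dlopen(lib, RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    /* "process" is a made-up entry point exported by both builds */
    void (*process)(void) = (void (*)(void))dlsym(handle, "process");
    if (process)
        process();

    dlclose(handle);
    return 0;
}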
The main issue is that sometimes ISA extensions are orthogonal, and this makes the number of combinations (i.e. the number of libraries/binaries) grow exponentially.
So, considering that you are dealing with Docker's user base, you can simplify the approach a bit (if combinations are a problem):
Either make some ISA extensions required (if their absence would degrade the performance too much); for the optional extensions you can use one of the approaches above (a fail-fast startup check is sketched after this list).
Or create only a few baseline containers, e.g. one for generic amd64, one for amd64-avx, one for amd64-avx2-aesni-tsx and similar. The idea is to create only a few images that cover, respectively, all, most and a few of your users.
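For the "required extensions" route, the image's entry binary can refuse to start with a readable message instead of dying with SIGILL later. A small sketch, assuming GCC/Clang and an image built with AVX2 and AES-NI as hard requirements:

#include <stdio.h>
#include <stdlib.h>

/* Fail fast if the host CPU lacks the extensions this image requires. */
static void require_cpu_features(void)
{
    __builtin_cpu_init();
    if (!__builtin_cpu_supports("avx2") || !__builtin_cpu_supports("aes")) {
        fprintf(stderr, "This image requires AVX2 and AES-NI; "
                        "use the generic amd64 image instead.\n");
        exit(1);
    }
}

int main(void)
{
    require_cpu_features();
    /* ... rest of the application ... */
    return 0;
}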
EDIT
As BeeOnRope pointed out in the comments, Docker has a version running on Windows. It uses Hyper-V to run a Linux VM with the Linux version of Docker.
As Hyper-V is a native VMM, apart from an extra layer, the same considerations apply.
Similarly, there is a macOS version too. This time it uses a hypervisor framework based on xhyve.
I came across an NVIDIA Optimus implementation for Linux called the Bumblebee project: https://github.com/Bumblebee-Project
I installed Bumblebee on my laptop, which has an NVIDIA graphics card. The issue is that applications which need the discrete GPU have to be run through a special command, "optirun". Only when this is done is the discrete GPU powered on; otherwise it is kept powered off to conserve power.
Is there a way to identify whether an application needs the discrete GPU to run or could run on the normal on-chip graphics processor? Can this be done in Linux?
I don't think so. I also have a laptop with an Optimus card; even on Windows it has a list of applications you want to run with the NVIDIA chip or the Intel one.
I believe that when you install the driver it comes with such a list.
In theory you could profile each application that uses the video card for how much GPU/memory it uses. If that is above some limit, you tag the app to run on the NVIDIA chip; if it is running on the NVIDIA chip but using only a small amount, you tag it to use the Intel chip.
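If you wanted to experiment with that profiling idea, NVML is a possible starting point: while the application runs under optirun (the discrete GPU has to be powered on for NVML to see it), a small monitor can poll the GPU's utilization and memory use. A rough, untested C sketch; the 1-second poll and 30-sample window are arbitrary, and it needs to be linked with -lnvidia-ml:

#include <stdio.h>
#include <unistd.h>
#include <nvml.h>

/* Sample GPU 0 once per second and report utilization / memory use. */
int main(void)
{
    nvmlDevice_t dev;

    if (nvmlInit() != NVML_SUCCESS)
        return 1;
    if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS)
        return 1;

    for (int i = 0; i < 30; ++i) {
        nvmlUtilization_t util;
        nvmlMemory_t mem;

        nvmlDeviceGetUtilizationRates(dev, &util);
        nvmlDeviceGetMemoryInfo(dev, &mem);

        printf("gpu=%u%% mem_used=%llu MiB\n",
               util.gpu, (unsigned long long)(mem.used >> 20));
        sleep(1);
    }

    nvmlShutdown();
    return 0;
}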