Similar to this question, I am interested in detecting the exact GPU inside a Mac equipped with Apple silicon.
I am interested in knowing the exact GPU core count.
sysctl -a | grep gpu
or
sysctl -a | grep core
does not seem to provide anything useful.
You can use ioreg like this:
ioreg -l | grep gpu-core-count
You can also look up the object whose class is named something like AGXAcceleratorG13X and inspect all of its properties; gpu-core-count will be there too.
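If you'd rather not grep the whole registry, you can restrict the query to that class. A minimal sketch, assuming the M1-family class name AGXAcceleratorG13X (the name differs between chip generations):
ioreg -r -c AGXAcceleratorG13X | grep gpu-core-count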
To detect the GPU core count on Apple Silicon, you can use this command:
system_profiler SPDisplaysDataType
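The core count appears in that report; to pull out just that line, something like the following should work (the exact field label is my assumption and may vary between macOS versions):
system_profiler SPDisplaysDataType | grep "Total Number of Cores"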
I need to create a zpool configuration based on the requirements below:
HDD disks: create a mirror or RAIDZ
SSD disks: create a cache pool
But how do I identify the disk type? Is there any logic to identify it based on read/write speed? If so, how?
Note: my server runs FreeBSD. Please don't just tell me to post on the FreeBSD forum; I tried that and it didn't solve my issue. If there are no ready-made commands, at least tell me the logic. How can I check read/write speed?
You can check the full output of the lspci command:
sudo lspci -vvv | grep prog-if
The prog-if field indicates the interface (NVM Express, IDE, etc.)
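Since the server runs FreeBSD, a sketch using the base-system tools may be more direct; ada0 is just an example device name, substitute your own disks (for SCSI/da devices the commands differ):
# SSDs report a non-rotating medium, HDDs report their RPM
camcontrol identify ada0 | grep -i rpm
# naive built-in seek/transfer benchmark, if you really want to compare speeds
diskinfo -t ada0
The reported rotation rate is a more reliable way to tell HDD from SSD than any measured throughput.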
Why do I have to specify both ipu4 and ipu4_ex to use an IPU device in Docker, as in the command below?
docker run --device=/dev/ipu4:/dev/ipu4 --device=/dev/ipu4_ex:/dev/ipu4_ex -ti graphcore/tools gc-inventory
The suggested way to launch Docker images that require access to Graphcore IPUs is the gc-docker command line tool, which you can read more about here. This tool is available in the Poplar SDK and wraps the system-installed docker command line so that you don't need to worry about passing in devices manually as you've shown above.
For interested users: you can see what gc-docker calls under the hood by using the --echo arg, and that is where you will see something similar to what you've posted:
docker run --device=/dev/ipu0:/dev/ipu0 --device=/dev/ipu0_ex:/dev/ipu0_ex --device=/dev/ipu0_mailbox:/dev/ipu0_mailbox --device=/dev/ipu0_mem:/dev/ipu0_mem -ti graphcore/tools gc-inventory
This is what the corresponding gc-docker call would look like:
gc-docker --device-id 0 -- -ti graphcore/tools gc-inventory
As you can see, each IPU device has 4 associated user space PCIe devices. This is because each Graphcore IPU device has 4 distinct memory regions (which you can see if you use lspci -v to list Graphcore PCI devices). Each memory region corresponds to a different functional part of the device (you can read more about why devices may want to have multiple distinct memory regions in this Stack Exchange post). These memory regions are the IPU config space, IPU exchange space, ICU mailbox, and the host exchange memory device.
The Graphcore PCIe driver bridges the IPU PCIe device memory regions to the 4 user space character devices that you see in the docker command. This mapping of memory regions into user space is required for applications to access them. If any of these devices aren't accessible from a docker container utilising IPUs then you'll run into issues, which is why it's much easier to use the gc-docker tool rather than remembering all the user space device names!
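As a usage sketch (the exact flag placement is my assumption, so check gc-docker --help in your SDK version), combining the --echo option mentioned above with the call shown earlier prints the underlying docker command without running it:
gc-docker --echo --device-id 0 -- -ti graphcore/tools gc-inventory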
On our project, we ran into an issue where the Docker agent crashed with "No space left on the device".
On one of the nodes of the K8S cluster, we executed this command:
# ps -eLf | grep './DotNetApp' | awk '{print $10}' | wc -l
13882
This means that my .NET processes have 13882 threads in total. With this leak, the node runs into the limit on the maximum number of threads.
To check the limit, you can execute:
root@ip-172-20-104-47:~# cat /proc/sys/kernel/pid_max
32768
"Threads" is what we counted, but pid_max governs the pool of IDs; on Linux every thread draws its ID from that same pool, so pods can easily reach this limit and crash Docker on the node.
We use CentOS for the K8S worker. We tried Ubuntu and got the same result.
Do you have any ideas why we see such a thread leak on Linux nodes under .NET Core 2.2?
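For reference, a quick sketch for comparing a node's live thread count against both relevant kernel limits (threads-max was not checked above, but it also caps the total number of threads):
# one line per thread (LWP), so counting matching lines gives the thread count
ps -eLf | grep -c '[.]/DotNetApp'
# pool of PIDs/TIDs that every new thread draws from
cat /proc/sys/kernel/pid_max
# system-wide cap on the total number of threads
cat /proc/sys/kernel/threads-max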
The issue was quite interesting. We had created a health check for Redis: if Redis is not available, we simply shut down the pod. However, the implementation of this health check created a separate connection multiplexer for each /health call without disposing of the old one, so after some time the limit of available threads was reached.
So, be careful with the implementation of health checks.
I have a server (Ubuntu 16.04) with 4 GPUs. My team shares it, and our current approach is to containerize all of our work with Docker and to restrict containers to GPUs using something like $ NV_GPU=0 nvidia-docker run -ti nvidia/cuda nvidia-smi. This works well when we're all very clear about who's using which GPU, but our team has grown and I'd like a more robust way of monitoring GPU use and of prohibiting access to GPUs while they're in use. nvidia-smi is one channel of information through "GPU-Util", but a GPU can show 0% GPU-Util at one moment while it is still reserved by someone working in a container.
Do you have any recommendations for:
Tracking when a user runs $ NV_GPU='gpu_id' nvidia-docker run
Raising an error when another user runs $ NV_GPU='same_gpu_id' nvidia-docker run
Keeping an updated log that's something along the lines of {'gpu0':'user_name or free', . . ., 'gpu3':'user_name or free'}, where for every gpu it identifies the user who ran the active docker container utilizing that gpu, or it states that it is 'free'. Actually, stating the user and the container that is linked to the gpu would be preferable.
Updating the log when the user closes the container that is utilizing the gpu
I may be thinking about this the wrong way too, so open to other ideas. Thanks!
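One rough sketch for the GPU-to-user/container log described above (not a complete solution; it assumes nvidia-docker passes GPUs to containers as plain --device mounts, which is how version 1 of the wrapper works):
# map each running container to the /dev/nvidiaN devices it holds
for c in $(docker ps -q); do
  name=$(docker inspect --format '{{.Name}}' "$c")
  gpus=$(docker inspect --format '{{range .HostConfig.Devices}}{{.PathOnHost}} {{end}}' "$c" | tr ' ' '\n' | grep -E '^/dev/nvidia[0-9]+$' | tr '\n' ' ')
  echo "$name: ${gpus:-no GPUs}"
done
You could cross-check the result against nvidia-smi --query-compute-apps=gpu_uuid,pid,process_name --format=csv to see which of the reserved GPUs are actually busy at that moment.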
My command is:
qsub -t 1:30:1 -q test.q -l r_core=5 -l r_mem=30 run.sh
It launches 30 instances, each on one server, but they tend to consume more than the specified 30GB of RAM.
What are the reasons for this?
The only real-time resource enforcement you get is A) checking of min/max requests at submission, and B) walltime, and even with walltime you may not get reliable enforcement, depending on the node. For solid resource enforcement, you should impose default resource restrictions, and then upgrade to a version that supports cgroups and enable that.
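If this is a Grid Engine-style scheduler (an assumption based on the test.q queue name), you can check whether r_mem is a hard, enforced limit or just a consumable used for scheduling; a sketch:
# how are the r_mem / r_core complexes defined (consumable vs. forced)?
qconf -sc | grep -E 'r_mem|r_core'
# does the queue itself impose hard memory limits?
qconf -sq test.q | grep -E 'h_vmem|h_rss'
A consumable complex only influences scheduling decisions; to have over-limit jobs actually killed you need a hard limit such as h_vmem, or cgroup enforcement in newer releases.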