On the Google Cloud Platform (GCP), I have the following specs:
Machine type: n1-standard-8 (8 vCPUs, 30 GB memory)
CPU platform: Intel Haswell
I am using a Jupyter notebook to fit an SVM to a large amount of NLP data. This process is very slow, and according to GCP I am only utilizing around 0.12% of my CPUs.
How do I increase CPU utilization?
As DazWilkin mentioned, you're actually using 12% -- the 0.12 reading is a fraction, and 1 vCPU out of 8 is 1/8 ≈ 0.125, so this corresponds to one vCPU. This is because -- IIRC -- Jupyter runs Python, and Python's Global Interpreter Lock effectively keeps a single process on one core, so your fit is stuck using one vCPU. You could reduce the number of cores (the OS will use multiple cores, of course) to save yourself some money, but you'll need to evaluate alternatives if you want to use more cores.
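If you want to keep the SVM but use more of the machine, one possible workaround (a sketch, assuming you are fitting scikit-learn's SVC, which the question doesn't show) is to train several smaller SVMs in parallel with BaggingClassifier, whose n_jobs argument spreads the fits across cores via joblib:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.svm import SVC

    # Placeholder data standing in for your vectorized NLP features (assumption).
    X, y = make_classification(n_samples=20_000, n_features=100, random_state=0)

    # Each of the 8 sub-SVMs fits on 1/8 of the data; n_jobs=-1 lets joblib run
    # those fits on all available cores instead of a single vCPU.
    clf = BaggingClassifier(SVC(kernel="rbf"), n_estimators=8, max_samples=0.125, n_jobs=-1)
    clf.fit(X, y)

This trades some accuracy for wall-clock time; LinearSVC or SGDClassifier are other common choices for large NLP datasets.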
I am new to the RAPIDS AI world and decided to try cuML and cuDF for the first time.
I am running Ubuntu 18.04 on WSL 2. My main OS is Windows 11. I have 64 GB of RAM and a laptop RTX 3060 GPU with 6 GB of VRAM.
At the time of writing this post, I am running a TSNE fit over a cuDF DataFrame composed of approximately 26 thousand values stored in 7 columns (all the values are numerical or binary, since the categorical ones have been one-hot encoded).
While classifiers like LogisticRegression or SVM were really fast, TSNE seems to be taking a while to produce results (it's been more than an hour now, and it is still running even though the DataFrame is not that big). The Task Manager tells me that 100% of the GPU is being used for the calculations, yet running "nvidia-smi" in Windows PowerShell shows that only 1.94 GB out of a total of 6 GB are currently in use. This seems odd to me since I have read papers describing RAPIDS AI's TSNE algorithm as being 20x faster than the standard scikit-learn one.
I wonder if there is a way to increase the percentage of dedicated GPU memory to perform faster computations, or if it is just an issue related to WSL 2 (perhaps it caps GPU usage at about 2 GB).
Any suggestions or thoughts?
Many thanks
The Task Manager tells me that 100% of the GPU is being used for the calculations
I'm not sure the Windows Task Manager can tell you the GPU throughput that is actually being achieved for the computations.
"nvidia-smi" on the windows powershell, the command returns that only 1.94 GB out of a total of 6 GB are currently in use
Memory utilisation is a different metric from GPU throughput. Any GPU application will only use as much memory as it requests, and there is no correlation between higher memory usage and higher throughput, unless the application specifically offers a way to achieve higher throughput by using more memory (for example, a different algorithm for the same computation may use more memory).
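If the goal is only to have RAPIDS claim more of the card's memory up front, you can configure an RMM memory pool; this is a sketch with an assumed pool size, and, per the point above, reserving more memory will not by itself make TSNE faster:

    import rmm

    # Pre-allocate a 4 GB pool on the GPU (the size is a guess for a 6 GB card).
    # Call this before creating any cuDF/cuML objects; it reserves memory for
    # RAPIDS but does not change how much work the GPU performs.
    rmm.reinitialize(pool_allocator=True, initial_pool_size=4 * 1024**3)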
TSNE seems to be taking a while to produce results (it's been more than an hour now, and it is still running even though the DataFrame is not that big).
This definitely seems odd, and not the expected behavior for a small dataset. What version of cuML are you using, and what is your method argument for the fit task? Could you also open an issue at www.github.com/rapidsai/cuml/issues with a way to access your dataset so the issue can be reproduced?
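For reference, a minimal sketch of the details that help with reproduction (the cuML version and the TSNE method argument); the random cuDF DataFrame here is only a stand-in for your data:

    import cudf
    import cupy as cp
    import cuml
    from cuml.manifold import TSNE

    print(cuml.__version__)  # TSNE performance differs a lot between cuML releases

    # Stand-in for the real data: ~26k rows, 7 numeric columns (assumption).
    gdf = cudf.DataFrame({f"f{i}": cp.random.rand(26_000) for i in range(7)})

    # 'barnes_hut' and 'fft' are the fast approximate methods; 'exact' is far slower.
    tsne = TSNE(n_components=2, method="barnes_hut", verbose=True)
    embedding = tsne.fit_transform(gdf)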
I have a pretty big model I'm trying to run (30 GB of RAM minimum), but every time I start a new instance, I can adjust the CPU RAM but not the GPU's. Is there a way on Google's AI notebook service to increase the RAM for a GPU?
Thanks for the help.
In short: you can't. You might consider switching to Colab Pro, which offers better GPUs, for example:
With Colab Pro you get priority access to our fastest GPUs. For example, you may get access to T4 and P100 GPUs at times when non-subscribers get K80s. You also get priority access to TPUs. There are still usage limits in Colab Pro, though, and the types of GPUs and TPUs available in Colab Pro may vary over time.
In the free version of Colab there is very limited access to faster GPUs, and usage limits are much lower than they are in Colab Pro.
That being said, don't count on getting a best-in-class GPU all to yourself for ~10 USD / month. If you need a high-memory dedicated GPU, you will likely have to resort to a dedicated service. You should easily find services with 24 GB cards for less than 1 USD / hour.
Yes, you can create a customized AI Notebook and edit its hardware after it has been created. If you still are not able to change these settings, check that you are not hitting the quota limit for GPUs.
I am using Rancher. I have deployed a cluster with 1 master & 3 worker nodes.
All machines are VPSes with 2 vCPUs, 8 GB RAM and an 80 GB SSD.
After the cluster was set up, the CPU reserved figure on the Rancher dashboard was 15%. After metrics were enabled, I could see the CPU used figure too, and by then CPU reserved had risen to 44% while CPU used was 16%. I find those figures too high. Is it normal for a Kubernetes cluster to consume this much CPU by itself?
Drilling down into the metrics, I find that the networking solution that Rancher uses - Canal - consumes almost 10% of CPU resources. Is this normal?
Rancher v2.3.0
User Interface v2.3.15
Helm v2.10.0-rancher12
Machine v0.15.0-rancher12-1
This "issue" is known for some time now and it affects smaller clusters. Kuberenetes is very CPU hungry relative to small clusters and this is currently by design. I have found multiple threads reporting this for different kind of setups. Here is an example.
So the short answer is: yes, Kubernetes setup consumes these amounts of CPU when used with relative small clusters.
I hope it helps.
I'm trying to get the tesserocr Python library to run on 4 cores. According to the Tesseract docs, I understand it supports up to 4 cores. I have a tesserocr Python 3.x job running inside AWS Batch (a Docker container based on the amazonlinux:latest image) on a c4.2xlarge instance, which has 8 vCPUs, all of which were allocated to the Batch job at submission time.
The benchmarks show CPU at 30% max, i.e. about 2.5 vCPUs, i.e. about 1.25 physical cores (every 2 vCPUs are roughly 1 physical core).
I've also tried the OMP_NUM_THREADS=4 and OMP_THREAD_LIMIT=4 environment variables (based on some forum posts online), but neither value had any effect on performance whatsoever.
How do I get tesserocr to scale up to all 4 cores (8 vCPUs)?
Python threads are not 1:1 with vCPUs/cores.
If you look at the specifications for a c4.2xlarge instance, it says you can run up to 2 threads per core, and that machine has 8 vCPUs.
To potentially use all 8 vCPUs you could try setting OMP_NUM_THREADS=16 and OMP_THREAD_LIMIT=16. However, how the underlying tesserocr library is implemented will have a huge impact on how well it scales within the context of the machine.
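If the OpenMP variables still don't help, a common alternative (a sketch, assuming the job processes many images rather than a single huge one) is to keep each Tesseract instance single-threaded and parallelize across images with a process pool:

    import os
    # Keep each Tesseract worker single-threaded; the parallelism comes from processes.
    os.environ["OMP_THREAD_LIMIT"] = "1"

    from multiprocessing import Pool

    import tesserocr
    from PIL import Image

    def ocr_one(path):
        # image_to_text spins up a fresh API per call; reusing one PyTessBaseAPI
        # per worker would be faster, but this keeps the sketch short.
        return path, tesserocr.image_to_text(Image.open(path))

    if __name__ == "__main__":
        paths = ["page1.png", "page2.png"]   # hypothetical input files
        with Pool(processes=8) as pool:      # one worker per vCPU
            for path, text in pool.imap_unordered(ocr_one, paths):
                print(path, len(text))

With one process per vCPU, utilisation is then bounded by how many images are in flight rather than by Tesseract's internal threading.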
I have two systems with different Xeon processors and different amounts of RAM. Both run Ubuntu 16 and have the same Docker version. My applications are dockerized.
I ran the same Docker image on both systems, and the amounts of memory it consumed were 610 MB and 814 MB respectively.
I'm trying to figure out why this difference occurs. Does having a faster CPU reduce memory usage, and if so, why would it take less memory?