Why does Google Cloud Run instance show wrong available memory and number of cores - google-cloud-run

When I let my application output the available memory and number of cores on a Google Cloud Run instance using linux commands like "free -h", "lscpu" and "top" I always get the information that there are 2 GB of available memory and 2 cores, although I specified other capacities in my deployment. No matter, I set 1 GB, 2 GB and 4 GB of memory and 1, 2 or 4 CPUs the mentioned linux tools always show the same capacity.
Am I misunderstanding these tools or the Google Cloud Run concept, or is there something not working like it should?

The Cloud Run services run container on a non standard runtime environmen (named BORG internally at Google Cloud). It's possible that the low level info values are not relevants.
In addition, Cloud Run services run in a sandbox (gVisor) and system calls can be also filtered like that.
What did you look at with these test?
I performed tests to validate the multi-cpus capacity of Cloud Run and wrote an article about that. The multi cpu capacity is real!! Have a look on it.

Related

Spark in standalone mode on a single computer : is it worth splitting it in masters and workers through docker containers (or another way)?

I currently own only one computer, and I won't have another.
I run Spark on its CPU cores : master=local[5], using it directly : I set spark-core and spark-sql for dependencies, do quite no other configuration, and my programs start immediately. It's confortable, of course.
But should I attempt to create an architecture with a master and some workers by the mean of Docker containers or minikube (Kubernetes) on my computer ?
Will solution #2 - with all the settings it requires - reward me with better performances, because Spark is truly designed to work that way, even on a single computer,
or will I loose some time, because the mode I'm currently running it, without network usage, without need of data locality will always give me better performances, and solution #1 will always be the best on a single computer ?
My hypothesis is that #1 is fine. But I have no true measurement for that. No source of comparison. Who have experienced the two manners of doing things on a sigle computer ?
It really depends on your goals - if you always will run your Spark code on the single node with local master, then just use it. But if you intend to run your resulting code in the distributed mode on multiple machines, then emulating cluster with Docker could be useful, as you'll get your code running in truly distributed manner, and you'll able to find problems that not always are found when you run your code with the local master.
Instead of direct Docker usage (that could be tricky to setup, although it's still possible), maybe you can consider to use Spark on Kubernetes, for example, via minikube - there is a plenty of articles found by Google on this topic.
Having done testing on this with executor size, the cutover from when it makes sense to use more multiple executors is # CPUs > 32. AWS EMR spark runtime defaults to at least 4 CPUs per executor and Databricks always uses fat executors which means > 32CPUS on the 8xl instances. Your greatest limitation tends to be the JVMs garbage collection which caps the size of the heap. Local mode has a couple performance advantages compared to cluster mode.
full stage code gen has to be run on both the drive and every single executor. For short queries this can add several 100MS per stage.
driver <-> executor communication has latency.
shared memory between driver and executors. This reduces the chance of OOM and reduces the amount of spilling to disk.
People end up choosing to go with multiple executors/instances not because it would be faster than a single instance but because it is the only way to scale up in terms of data volume and parallization. (also for failure recovery)
If you're feeling ambitious there's a performance testing tool called TPC-DS that runs a set of dataprocessing queries against a standardized dataset
https://github.com/databricks/spark-sql-perf
https://github.com/maropu/spark-tpcds-datagen
Also if you're feeling adventurous the spark code has a script to fire up a mini cluster on minikube if you want a quick and easy way to test this.

Configuration Dask Distributed

I'm setting up an environment for our data scientists to work on. Currently we have a single node running Jupyterhub with Anaconda and Dask installed. (2 sockets with 6 cores and 2 threads per core with 140 gb ram). When users create a LocalCluster, currently the default settings are to take all the available cores and memory (as far as I can tell). This is okay when done explicitly, but I want the standard LocalCluster to use less than this. Because almost everything we do is
Now when looking into the config I see no configuration dealing with n_workers, n_threads_per_worker, n_cores etc. For memory, in dask.config.get('distributed.worker') I see two memory related options (memory and memory-limit) both specifying the behaviour listed here: https://distributed.dask.org/en/latest/worker.html.
I've also looked at the jupyterlab dask extension, which lets me do all this. However, I can't force people to use jupyterlab.
TL;DR I want to be able set the following standard configuration when creating a cluster:
n_workers
processes = False (I think?)
threads_per_worker
memory_limit either per worker, or for the cluster. I know this can only be a soft limit.
Any suggestions for configuration is also very welcome.
As of 2019-09-20 this isn't implemented. I recommend raising an feature request at https://github.com/dask/distributed/issues/new , or even a pull request.

RNG on Google Kubernetes Engine

I need to be able to produce random longs using Java SecureRandom in an application running on Google Kubernetes Engine. The rate may vary based on day and time, perhaps from as low as 1 per minute to as high as 20 per second.
To assist RNG, we would install either haveged or rng-tools in the container.
Will GKE be capable of supporting this scenario with high-quality random distribution of longs without blocking? Which of haveged or rng-tools is more capable for this scenario?
I asked a related question here but didn't get a satisfactory answer:
Allocating datastore id using PRNG

influxDB query speed

My influxdb measurement have 24 Field Keys and 5 tag keys.
I try to do 'select last(cpu) from mymeasurement', and found result :
When there is no client throwing data into it, it'll take around 2 seconds to got the result
But when I run 95 client throwing data (per 5 seconds) into it, the query will take more than 10 seconds before it show the result. is it normal ?
Note :
My system is a Centos7 VM in xenserver with 4 vcore CPU and 8 GB ram, the top command show 30% cpu while that clients throw datas.
Some ideas:
Check your vCPU configuration on other VMs running on the same host. Other VMs you might have that don't need the extra vCPUs should only be configured with one vCPU, for a latency boost.
If your DB server requires 4 vCPUs and your host already has very little CPU% used during queries, you might want to check the storage and memory configurations of the VM in case your server is slow due to swap partition use, especially if your swap partition is located on a Virtual Disk over the network via iSCSI or NFS.
It might also be a memory allocation issue within the VM and server application. If you have XenTools installed on the VM, try on a system without the XenTools installed to rule out latency issues related to the XenTools driver.

Excessive memory use pyspark

I have setup a JupyterHub and configured a pyspark kernel for it. When I open a pyspark notebook (under username Jeroen), two processes are added, a Python process and a Java process. The Java process is assigned 12g of virtual memory (see image). When running a test script on a range of 1B number it grows to 22g. Is that something to worry about when we work on this server with multiple users? And if it is, how can I prevent Java from allocating so much memory?
You don't need to worry about virtual memory usage, reserved memory is much more important here (the RES column).
You can control size of JVM heap usage using --driver-memory option passed to spark (if you use pyspark kernel on jupyterhub you can find it in environment under PYSPARK_SUBMIT_ARGS key). This is not exactly the memory limit for your application (there are other memory regions on JVM), but it is very close.
So, when you have multiple users setup, you should learn them to set appropriate driver memory (the minimum they need for processing) and shutdown notebooks after they finish work.

Resources