GCP Cloud Storage Transfer Service - Agents unable to connect to pool - docker

I wanted to use the GCP Cloud Storage Transfer Service to sync data from on-premises storage. For this I had to install agents and connect them to an agent pool. I was following the steps here, where they provide docker commands to install a transfer agent. Instead of docker, I used podman:
podman run -ti --name gcloud-config docker.io/google/cloud-sdk gcloud auth application-default login
podman run -d --rm \
--env https_proxy=<some_proxy> \
--volumes-from gcloud-config \
-v <some_dir>:<some_dir> \
gcr.io/cloud-ingest/tsop-agent:latest \
--project-id=<some_project_id> \
--hostname=$(hostname) \
--agent-pool=source_agent_pool
The agents do start, but they aren't able to connect to the pool.
This is the output of the agent container (viewed using podman attach containerID):
0B/s txSum: 0B taskResps[copy:0 delete:0 list:0] ctrlMsgAge:10m50s (??) |
and agent.INFO logs:
Build target: //cloud/transfer/online/onprem/workers/agent:agent
Build id: <some_id>
I1222 06:47:51.288924 3 log_spam.go:51] Command line arguments:
I1222 06:47:51.288926 3 log_spam.go:53] argv[0]: './agent'
I1222 06:47:51.288928 3 log_spam.go:53] argv[1]: '--project-id=<project_id>'
I1222 06:47:51.288930 3 log_spam.go:53] argv[2]: '--hostname=<hostname>'
I1222 06:47:51.288931 3 log_spam.go:53] argv[3]: '--agent-pool=source_agent_pool'
I1222 06:47:51.288933 3 log_spam.go:53] argv[4]: '--container-id=49be0b94bced'
I1222 06:47:51.289408 3 prodlayer.go:217] layer successfully set to NO_LAYER with source DEFAULT
I1222 06:47:53.148699 3 handler.go:45] TaskletHandler initialized to delete at most 1000 objects in parallel:
I1222 06:47:53.148725 3 handler.go:48] TaskletHandler initialized with delete-files: 1024
I1222 06:47:53.148827 3 copy.go:145] TaskletHandler initialized with copy-files: &{0xc00073d2c0 10800000000000}
I1222 06:47:53.148860 3 handler.go:61] TaskletHandler initialized to process at most 256 list outputs in parallel:
I1222 06:48:51.291680 3 cpuutilization.go:86] Last minute's CPU utilization: 0
I1222 06:49:51.291017 3 cpuutilization.go:86] Last minute's CPU utilization: 0
I1222 06:50:51.290721 3 cpuutilization.go:86] Last minute's CPU utilization: 0
I1222 06:51:51.291057 3 cpuutilization.go:86] Last minute's CPU utilization: 0
I1222 06:52:51.290677 3 cpuutilization.go:86] Last minute's CPU utilization: 0
I1222 06:53:51.290445 3 cpuutilization.go:86] Last minute's CPU utilization: 0
I also went through all the troubleshooting steps here, but couldn't find anything. Is it something to do with using podman instead of docker?
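One way to narrow this down (a hedged suggestion based on common podman pitfalls, not a confirmed fix): check that the agent container can actually read the credentials created in the first step, since SELinux labeling under podman can silently block access to volumes that docker would share without complaint:

podman run --rm --volumes-from gcloud-config \
    docker.io/google/cloud-sdk \
    gcloud auth application-default print-access-token

If that fails, relabeling the bind mount with the :z suffix (e.g. -v <some_dir>:<some_dir>:z) is worth trying. Independently, you can confirm the pool itself exists and is active from any machine with a recent gcloud (this assumes the transfer command group is available in your gcloud version):

gcloud transfer agent-pools describe source_agent_pool --project=<some_project_id>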

Related

Make docker build --memory-swap=20g use the available swap space?

I have run free -h and see that I have 29G of swap space.
              total        used        free      shared  buff/cache   available
Mem:            15G        6.9G        8.8G         17M        223M        8.9G
Swap:           29G        2.0M         29G
I have also set swappiness to 100.
$ sudo sysctl vm.swappiness=100
vm.swappiness = 100
$ cat /proc/sys/vm/swappiness
100
However, docker build --memory-swap=20g does not appear to use the swap space. This is the output of htop throughout the docker build.
1 [|||||||||||||||| 18.7%]
2 [||||||| 7.3%]
3 [|||||||||||||||||||||| 26.5%]
4 [||||||||||||||| 18.0%]
Mem[||||||||||||||||||||||||||||||||||| 6.47G/15.9G]
Swp[| 2.00M/29.6G]
This is the docker build command:
docker build --build-arg NODE_OPTIONS="--max-old-space-size=325" \
--memory=600m --memory-swap=20g \
--cpu-period=100000 --cpu-quota=50000 \
--no-cache --tag farm_app_image:latest --file Dockerfile .
The docker build appears to be running out of RAM, because the build's internal process (NodeJS) runs out of heap space and crashes. Also, immediately before the crash, the container's memory is maxed out:
shaun@DESKTOP-5T629JB:/mnt/c/Users/bigfo$ docker ps -q | xargs docker stats --no-stream
CONTAINER ID   NAME               CPU %    MEM USAGE / LIMIT   MEM %    NET I/O           BLOCK I/O       PIDS
66bdf8efb492   charming_maxwell   51.72%   562.2MiB / 600MiB   93.70%   46.8MB / 1.53MB   277MB / 230MB   94
Why is it running out of RAM without using the swap space? How can we make it use the available swap space?
Maybe you should try running it with the --privileged flag:
docker run -ti --privileged yourimage
But make sure that you know what you are doing.
You should also read docker-tips-privilaged-flag
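A check worth running first (a hedged suggestion; this is a common cause on WSL and stock Ubuntu kernels): if the kernel's cgroup swap accounting is disabled, docker silently ignores --memory-swap, which would produce exactly the behavior described. You can verify whether the combined memory+swap limit is actually applied (paths assume cgroup v1):

docker run --rm -m 600m --memory-swap 20g busybox \
    cat /sys/fs/cgroup/memory/memory.limit_in_bytes \
        /sys/fs/cgroup/memory/memory.memsw.limit_in_bytes

If the memsw file does not exist, swap accounting is off, and enabling it typically requires booting the kernel with cgroup_enable=memory swapaccount=1.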

kubernetes not able to pull image from spark master host

I have a 3-node [host a, host b, host c] Kubernetes cluster (version 1.12.2). I am trying to run the spark-pi example jar as mentioned in the Kubernetes documentation.
Host a is my Kubernetes master. >> kubectl get nodes lists all three nodes.
I have built the Spark docker image using what's provided in the spark 2.3.0 binary folder.
>> sudo ./bin/docker-image-tool.sh -r docker.io/spark/spark -t spark230 build
I got the message that the image was built successfully.
>> docker images
REPOSITORY                           TAG        IMAGE ID       CREATED             SIZE
docker.io/spark/spark                spark230   6a2b645d7efe   About an hour ago   346 MB
docker.io/weaveworks/weave-npc       2.5.0      d499500e93d3   7 days ago          49.5 MB
docker.io/weaveworks/weave-kube      2.5.0      a5103f96993a   7 days ago          148 MB
docker.io/openjdk                    8-alpine   97bc1352afde   2 weeks ago         103 MB
k8s.gcr.io/kube-proxy                v1.12.2    15e9da1ca195   2 weeks ago         96.5 MB
k8s.gcr.io/kube-apiserver            v1.12.2    51a9c329b7c5   2 weeks ago         194 MB
k8s.gcr.io/kube-controller-manager   v1.12.2    15548c720a70   2 weeks ago         164 MB
k8s.gcr.io/kube-scheduler            v1.12.2    d6d57c76136c   2 weeks ago         58.3 MB
k8s.gcr.io/etcd                      3.2.24     3cab8e1b9802   7 weeks ago         220 MB
k8s.gcr.io/coredns                   1.2.2      367cdc8433a4   2 months ago        39.2 MB
k8s.gcr.io/pause                     3.1        da86e6ba6ca1   10 months ago       742 kB
./bin/spark-submit \
  --master k8s://https://<api-server>:<api-server-port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.container.image=spark/spark:spark230 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
When I submit the above command, it sometimes gives proper output. Other times it throws the error below:
> code = Unknown desc = repository docker.io/spark/spark not found:
does not exist or no pull access, reason=ErrImagePull
When I debugged further, it looks like the error is thrown whenever the pod lands on host b or host c.
When the node is host a, it runs fine. It looks like the other nodes are unable to locate the image.
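A quick way to confirm that is to list the image on the other nodes directly (host names are placeholders):

ssh <host-b> "docker images docker.io/spark/spark"
ssh <host-c> "docker images docker.io/spark/spark"

If the repository is missing there, that explains the ErrImagePull.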
Questions:
Should I install Spark on all the nodes and build the docker image on all of them?
Is it possible to pass the image reference from a single node [host a] to the other nodes? i.e., how can I make the other nodes use the same image from host a?
Yes, you need to build the Spark image on all the nodes. You can write a wrapper script to sync the Spark directory and invoke rebuild-image.sh on all the nodes, as below:
for h in host-b host-c; do
  rsync -av /opt/spark ${h}:/opt
  ssh ${h} /opt/spark/rebuild-image.sh
done
You can always save the docker image as a tar file, then copy that tar file to the other host and load the image there.
To save the docker image as a tar file:
sudo docker save -o <path for generated tar file> <image name>
Now copy your tar file to the other host using scp or some other copy tool, and load the docker image using:
sudo docker load -i <path to image tar file>
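For example, with the image from this question (the host name is a placeholder):

sudo docker save -o /tmp/spark-spark230.tar docker.io/spark/spark:spark230
scp /tmp/spark-spark230.tar <host-b>:/tmp/
ssh <host-b> sudo docker load -i /tmp/spark-spark230.tar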
Hope this helps

How to check the number of cores used by docker container?

I have been working with Docker for a while now, I have installed docker and launched a container using
docker run -it --cpuset-cpus=0 ubuntu
When I log into the container and run
grep processor /proc/cpuinfo | wc -l
it shows 3, which is the number of cores on my host machine.
Any idea how to restrict the resources available to the container, and how to verify the restriction?
This issue has already been raised in #20770. The file /sys/fs/cgroup/cpuset/cpuset.cpus reflects the correct value:
--cpuset-cpus is taking effect; it just isn't reflected in /proc/cpuinfo.
docker inspect <container_name>
will give the details of the container you launched; check for "CpusetCpus" in there and you will find the details.
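For example, to pull out just that field with a Go template (the path follows docker's inspect JSON):

docker inspect --format '{{.HostConfig.CpusetCpus}}' <container_name>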
Containers aren't complete virtual machines. Some kernel resources will still appear as they do on the host.
In this case, --cpuset-cpus=0 modifies the resources the container's cgroup has access to, which is visible in /sys/fs/cgroup/cpuset/cpuset.cpus, not in what the container sees in /proc/cpuinfo.
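You can read that file from inside a constrained container directly (the path assumes cgroup v1; under cgroup v2 it is /sys/fs/cgroup/cpuset.cpus.effective):

docker run --rm --cpuset-cpus=0 ubuntu cat /sys/fs/cgroup/cpuset/cpuset.cpus
# prints: 0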
One way to verify is to run a stress tool in a container.
Using 1 CPU, the load will be pinned at 1 core (1 / 3 cores in use, shown as 100% or 33% depending on what tool you use):
docker run --cpuset-cpus=0 deployable/stress -c 3
This will use 2 cores (2 / 3 cores, 200%/66%):
docker run --cpuset-cpus=0,2 deployable/stress -c 3
This will use all 3 (3 / 3 cores, 300%/100%):
docker run deployable/stress -c 3
Memory limits are another area that doesn't appear in the kernel stats:
$ docker run -m 64M busybox free -m
                     total   used   free  shared  buffers  cached
Mem:                  3443   2500    943     173      261    1858
-/+ buffers/cache:             379   3063
Swap:                 1023      0   1023
yamaneks' answer includes the GitHub issue.
The value should be given in double quotes: --cpuset-cpus="0" means the container makes use of cpu0 only.

Mixing cpu-shares and cpuset-cpus in Docker

I would like to run two containers with the following resource allocation:
Container "C1": reserved cpu1, shared cpu2 with 20 cpu-shares
Container "C2": reserved cpu3, shared cpu2 with 80 cpu-shares
If I run the two containers in this way:
docker run -d --name='C1' --cpu-shares=20 --cpuset-cpus="1,2" progrium/stress --cpu 2
docker run -d --name='C2' --cpu-shares=80 --cpuset-cpus="2,3" progrium/stress --cpu 2
I find that C1 takes 100% of cpu1 as expected, but only 50% of cpu2 (instead of 20%), while C2 takes 100% of cpu3 as expected and 50% of cpu2 (instead of 80%).
It looks like the --cpu-shares option is ignored.
Is there a way to obtain the behavior I'm looking for?
docker run mentions that parameter as:
--cpu-shares=0 CPU shares (relative weight)
And contrib/completion/zsh/_docker#L452 includes:
"($help)--cpu-shares=[CPU shares (relative weight)]:CPU shares:(0 10 100 200 500 800 1000)"
So those values are not %-based.
The OP mentions that --cpu-shares=20/80 works with the following cpuset constraints:
docker run -ti --cpuset-cpus="0,1" C1 # instead of 1,2
docker run -ti --cpuset-cpus="3,4" C2 # instead of 2,3
(those values are validated/checked only since docker 1.9.1 with PR 16159)
Note: there is also the CPU quota constraint:
The --cpu-quota flag limits the container’s CPU usage. The default 0 value allows the container to take 100% of a CPU resource (1 CPU).
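As an illustration (a sketch reusing the stress image from the question), a 50000/100000 quota caps a container at half of one CPU in total, no matter which cores it runs on:

docker run -d --name=C3 --cpu-period=100000 --cpu-quota=50000 progrium/stress --cpu 2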

How to set Apache Spark Executor memory

How can I increase the memory available to Apache Spark executor nodes?
I have a 2 GB file that is suitable for loading into Apache Spark. For the moment I am running Spark on one machine, so the driver and executor are on the same machine. The machine has 8 GB of memory.
When I try to count the lines of the file, after setting the file to be cached in memory, I get these errors:
2014-10-25 22:25:12 WARN CacheManager:71 - Not enough space to cache partition rdd_1_1 in memory! Free memory is 278099801 bytes.
I looked at the documentation here and set spark.executor.memory to 4g in $SPARK_HOME/conf/spark-defaults.conf.
The UI shows this variable is set in the Spark Environment. You can find a screenshot here.
However, when I go to the Executor tab, the memory limit for my single executor is still set to 265.4 MB. I also still get the same error.
I tried various things mentioned here, but I still get the error and don't have a clear idea where I should change the setting.
I am running my code interactively from the spark-shell.
Since you are running Spark in local mode, setting spark.executor.memory won't have any effect, as you have noticed. The reason is that the worker "lives" within the driver JVM process that is started when you start spark-shell, and the default memory used for that is 512M. You can increase it by setting spark.driver.memory to something higher, for example 5g, by either:
setting it in the properties file (default is $SPARK_HOME/conf/spark-defaults.conf):
spark.driver.memory 5g
or by supplying the configuration setting at runtime:
$ ./bin/spark-shell --driver-memory 5g
Note that this cannot be achieved by setting it in the application, because by then it is already too late: the process has already started with some fixed amount of memory.
The reason for 265.4 MB is that Spark dedicates spark.storage.memoryFraction * spark.storage.safetyFraction to storage memory, and by default they are 0.6 and 0.9:
512 MB * 0.6 * 0.9 ~ 265.4 MB
(The result is a little below 512 * 0.54 = 276.5 MB because the fractions are applied to the heap size the JVM actually reports, which is slightly less than the configured -Xmx.)
So be aware that not the whole amount of driver memory will be available for RDD storage.
When you start running this on a cluster, the spark.executor.memory setting will take over when calculating the amount to dedicate to Spark's memory cache.
Also note that for local mode you have to set the amount of driver memory before starting the JVM:
bin/spark-submit --driver-memory 2g --class your.class.here app.jar
This will start the JVM with 2G instead of the default 512M.
Details here:
For local mode you only have one executor, and this executor is your driver, so you need to set the driver's memory instead. That said, in local mode, by the time you run spark-submit, a JVM has already been launched with the default memory settings, so setting "spark.driver.memory" in your conf won't actually do anything for you. Instead, you need to run spark-submit as follows
The answer submitted by Grega helped me solve my issue. I am running Spark locally from a Python script inside a Docker container. Initially I was getting a Java out-of-memory error when processing some data in Spark. However, I was able to assign more memory by adding the following lines to my script:
conf = SparkConf()
conf.set("spark.driver.memory", "4g")
Here is a full example of the python script which I use to start Spark:
import os
import sys
import glob

spark_home = '<DIRECTORY WHERE SPARK FILES EXIST>/spark-2.0.0-bin-hadoop2.7/'
driver_home = '<DIRECTORY WHERE DRIVERS EXIST>'

if 'SPARK_HOME' not in os.environ:
    os.environ['SPARK_HOME'] = spark_home
SPARK_HOME = os.environ['SPARK_HOME']

sys.path.insert(0, os.path.join(SPARK_HOME, "python"))
for lib in glob.glob(os.path.join(SPARK_HOME, "python", "lib", "*.zip")):
    sys.path.insert(0, lib)

from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext

conf = SparkConf()
conf.set("spark.executor.memory", "4g")
conf.set("spark.driver.memory", "4g")
conf.set("spark.cores.max", "2")
conf.set("spark.driver.extraClassPath",
         driver_home + '/jdbc/postgresql-9.4-1201-jdbc41.jar:'
         + driver_home + '/jdbc/clickhouse-jdbc-0.1.52.jar:'
         + driver_home + '/mongo/mongo-spark-connector_2.11-2.2.3.jar:'
         + driver_home + '/mongo/mongo-java-driver-3.8.0.jar')

sc = SparkContext.getOrCreate(conf)
spark = SQLContext(sc)
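To confirm the settings took effect (a quick check; SparkContext.getConf() returns the active configuration):

print(sc.getConf().get("spark.driver.memory"))    # expect: 4g
print(sc.getConf().get("spark.executor.memory"))  # expect: 4g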
Apparently, the question never says to run in local mode rather than on YARN. Somehow I couldn't get the spark-defaults.conf change to work. Instead I tried this and it worked for me:
bin/spark-shell --master yarn --num-executors 6 --driver-memory 5g --executor-memory 7g
(I couldn't bump executor-memory to 8g; there is some restriction from the YARN configuration.)
You need to increase the driver memory. On Mac (i.e., when running on the local master), the default driver memory is 1024M, and by default 380MB of that is allotted to the executor.
Upon increasing it [--driver-memory 2G], the executor memory increased to ~950MB.
As far as I know, it isn't possible to change spark.executor.memory at run time. If you are running a standalone version with pyspark and graphframes, you can launch the pyspark REPL by executing the following command:
pyspark --driver-memory 2g --executor-memory 6g --packages graphframes:graphframes:0.7.0-spark2.4-s_2.11
Be sure to change the Spark version in the package coordinate (spark2.4 above) to match the released version of Spark you are running.
Create a file called spark-env.sh in the spark/conf directory and add this line:
SPARK_EXECUTOR_MEMORY=2000m   # memory size to allocate for the executor
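A minimal way to create it (the template ships with Spark):

cp $SPARK_HOME/conf/spark-env.sh.template $SPARK_HOME/conf/spark-env.sh
echo 'SPARK_EXECUTOR_MEMORY=2000m' >> $SPARK_HOME/conf/spark-env.sh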
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 2G \
  --num-executors 5 \
  /path/to/examples.jar \
  1000
(--deploy-mode can be client for client mode.)
You can build the command using the following example:
spark-submit \
  --jars /usr/share/java/postgresql-jdbc.jar \
  --class com.examples.WordCount3 \
  --num-executors 3 \
  --driver-memory 10g \
  --executor-memory 10g \
  --executor-cores 1 \
  --master local \
  --deploy-mode client \
  --name wordcount3 \
  --conf "spark.app.id=wordcount" \
  /home/vaquarkhan/spark-scala-maven-project-0.0.1-SNAPSHOT.jar
Note that the options must come before the application jar; anything after the jar is passed as arguments to the application itself.
Spark executor memory is required for running your Spark tasks based on the instructions given by your driver program. Basically, how much it requires depends on your submitted job.
Executor memory includes the memory required for executing the tasks plus overhead memory, which should not be greater than the JVM size or the YARN maximum container size.
Add the following parameters in spark-defaults.conf:
spark.executor.cores=1
spark.executor.memory=2g
If you are using any cluster management tool like Cloudera Manager or Ambari, refresh the cluster configuration so the latest configs reach all nodes in the cluster.
Alternatively, we can pass the executor cores and memory values as arguments while running the spark-submit command, along with the class and application path.
Example:
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 2G \
  --num-executors 5 \
  /path/to/examples.jar \
  1000
(--deploy-mode can be client for client mode.)
You mentioned that you are running your code interactively in spark-shell. If no proper value is set for driver memory or executor memory, Spark assigns a default to each, taken from its properties file (where the default values are mentioned).
Be aware that there is one driver (the master node) and there are worker nodes (where executors are created and run), so a Spark program needs both kinds of memory. To set the driver memory when starting spark-shell:
spark-shell --driver-memory "your value"
And to set the executor memory:
spark-shell --executor-memory "your value"
Then you should be good to go with the desired amount of memory for your spark-shell to use.
In Windows or Linux, you can use this command:
spark-shell --driver-memory 2G
For configuring cores and memory for executors:
spark-shell --help
--master MASTER_URL spark://host:port, mesos://host:port, yarn,
--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G).
--total-executor-cores NUM Total cores for all executors.
--executor-cores NUM Number of cores used by each executor. (Default: 1 in YARN and K8S modes, or all available cores on the worker in standalone mode).
Choose one of the following commands if your system has 6 cores and 6GB RAM:
creates 6 executors, each with 1 core and 1GB RAM
spark-shell --master spark://sparkmaster:7077 --executor-cores 1 --executor-memory 1g
creates 3 executors, each with 1 core and 2GB RAM. The max memory (6GB) is used, while 3 cores stay idle.
spark-shell --master spark://sparkmaster:7077 --executor-cores 1 --executor-memory 2g
creates 2 executors, each with 3 cores and 3GB RAM, using all RAM and cores
spark-shell --master spark://sparkmaster:7077 --executor-cores 3 --executor-memory 3g
creates 2 executors, each with 3 cores and only 1GB RAM
spark-shell --master spark://sparkmaster:7077 --executor-cores 3 --executor-memory 1g
if we want to use only one executor with 1 core and 1GB RAM
spark-shell --master spark://sparkmaster:7077 --total-executor-cores 1 --executor-cores 1 --executor-memory 1g
if we want to use only two executors, each with 1 core and 1GB RAM
spark-shell --master spark://sparkmaster:7077 --total-executor-cores 2 --executor-cores 1 --executor-memory 1g
if we want to use only two executors, each with 2 cores and 2GB RAM (total 4 cores and 4GB RAM)
spark-shell --master spark://sparkmaster:7077 --total-executor-cores 4 --executor-cores 2 --executor-memory 2g
If we apply --total-executor-cores 2 instead, then only one executor will be created.
spark-shell --master spark://sparkmaster:7077 --total-executor-cores 2 --executor-cores 2 --executor-memory 2g
Total executor cores (3) is not divisible by cores per executor (2), so the leftover core will not be allocated; one executor with 2 cores is created.
spark-shell --master spark://sparkmaster:7077 --total-executor-cores 3 --executor-cores 2 --executor-memory 2g
So --total-executor-cores / --executor-cores = the number of executors that will be created.
