sstableloader does not exit after successful data loading - datastax-enterprise

I'm trying to bulk-load my data into DSE, but sstableloader doesn't exit after a successful run. According to the output, the progress for each node is already 100%, and the progress total also shows 100%.
Environment: CentOS 6.x x86_64; DSE 4.0.1
Topology: 1 Cassandra node, 5 Solr nodes (DC auto-assigned by DSE); RF 2
System ulimit (hard, soft) in each DSE node: 65536
sstableloader heap size (-Xmx): 10240M (10G)
SSTables size: 158 GB (from an 80 GB CSV, 241M rows)
I tried taking down all the nodes, hoping that sstableloader would somehow exit when one or more nodes go down, but it didn't. I had to kill the process manually, either with the 'kill' command or with Ctrl+C (SIGINT) in the terminal.
Prior to experiencing this issue, I had one successful run where sstableloader exited. I can't reproduce that anymore because sstableloader refuses to exit in all of my subsequent attempts, regardless of SSTable size.

Secondary indexes need to be built as well, but that shouldn't take hours.
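For reference, a typical sstableloader invocation looks like the sketch below; the hosts and path are placeholders, not the asker's actual values.

# Stream SSTables to the cluster via one or more live nodes; the last two
# path components must be <keyspace>/<table> so the loader knows the target.
sstableloader -d 10.0.0.1,10.0.0.2 /data/sstables/mykeyspace/mytable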

Related

docker container hanging on run - how to debug

I am trying to run Screaming Frog in Docker. For this, I used this GitHub project as a starting point:
https://github.com/iihnordic/screamingfrog-docker
After building, I ran the container with the following command:
docker run -v /<my-path>/screamingfrog-crawls:/home/crawls screamingfrog --crawl https://<my-domain> --headless --save-crawl --output-folder /home/crawls
It worked the first time, but in subsequent attempts the process hangs 8 out of 10 times with no error, always at a different stage.
I assumed the most likely reason is memory, but despite significantly increasing the Docker memory and also increasing the Screaming Frog memory to 16 GB, the same issue persists.
How can I go about debugging my container when no errors are thrown and the container simply hangs indefinitely?
As suggested by @Ralle, I checked docker stats, and while memory usage is staying well below 10%, the CPU is always at 100%.
Try docker stats; it returns something like the following, so at least you can see the behaviour of memory and CPU.
CONTAINER ID   NAME         CPU %   MEM USAGE / LIMIT     MEM %   NET I/O           BLOCK I/O         PIDS
9949a4ee1238   nest-api-1   0.87%   290MiB / 3.725GiB     7.60%   2.14MB / 37.2kB   156kB / 2.06MB    33
96fe43dba2b0   postgres     0.00%   29MiB / 3.725GiB      0.76%   7.46kB / 6.03kB   1.17MB / 67.8MB   7
ff570659e917   redis        0.30%   3.004MiB / 3.725GiB   0.08%   2.99kB / 0B       614kB / 4.1kB     5
Also, docker top shows you the PIDs.
I don't know your application, but also check whether the issue could be related to the volumes themselves each time the container restarts.
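Beyond docker stats, a few standard Docker commands can help narrow down where a hung container is stuck; the container name below is a placeholder for whatever docker ps reports.

# Show the processes (and their PIDs) running inside the container
docker top screamingfrog

# Stream the container's stdout/stderr to see where output stops
docker logs -f screamingfrog

# Open a shell inside the running container to inspect it interactively
docker exec -it screamingfrog sh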

Docker service showing no such image when trying to upgrade service

First of all, sorry if my English is bad.
We have a service that we were able to upgrade until 26 September 2022, via Portainer or via the terminal on Docker. The image is on the GitLab registry.
We did not make any changes, but we are not able to upgrade it anymore!
How can we debug why this message is appearing?
No such image: registry.gitlab.com/xxxx/xxx/api:1.1.18@sha256:xxxx
Some additional information:
- We are using docker login before trying to do the service update.
- We can do docker pull registry.gitlab.com/etc/etc (the version).
- The problem only occurs when we try to upgrade it as a service.
Is there some kind of debugging for the service update that can provide additional information, for example whether a firewall is blocking the request?
docker service update nameofservice
nameofservice
overall progress: 0 out of 1 tasks
1/1: preparing [=================================> ]
until it returns the error 'no such image'!
I am pretty sure the image exists.
If you are experiencing the same problem, check whether you have more nodes, physical machines, or VMs connected to your Docker node (docker node ls).
If that is your case, run docker pull gitlabaddressetcetc on the other nodes and check whether everything is fine, as in the sketch below.
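A minimal way to run that check, assuming a standard swarm setup; the service name is taken from the question, and the image reference keeps the question's placeholders.

# On a manager node, list all nodes in the swarm
docker node ls

# Inspect the failing task; --no-trunc shows the full error message
docker service ps nameofservice --no-trunc

# On each worker node, verify the image can actually be pulled
docker login registry.gitlab.com
docker pull registry.gitlab.com/xxxx/xxx/api:1.1.18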
I found the message 'No space left on device', so I ran 'df -h', but plenty of space was available on the VM. Anyway, I decided to run 'docker system prune -f' to see what would happen:
Running 'docker system prune -f' seems to have solved my problem, and everything is fine now.
After that, I just needed to change the version in Portainer to an invalid one before trying again.
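For reference, a minimal sequence for checking and reclaiming Docker disk space; these are standard Docker commands, not specific to this setup.

# Check host filesystem usage
df -h

# Show how much space Docker itself is using (images, containers, volumes, cache)
docker system df

# Remove stopped containers, dangling images, unused networks and build cache
docker system prune -f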

docker-compose exits due to large folder var/cache/prod of Symfony project

My Symfony 5 app is running inside a Docker container. When I want to deploy an update, docker-compose shows this error:
ERROR: for app UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
I've tried running export COMPOSE_HTTP_TIMEOUT=200 before docker-compose, but the problem remains!
The only solution is to enter the container and manually empty the var/cache/prod folder before running docker-compose, but that's not a clean way!
Note that the size of var/cache/prod is growing enormously and very quickly: almost 2 GB in less than 3 hours!
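One way to avoid entering the container by hand is to clear the cache from the host; a sketch assuming the compose service is named app, as in the error message above.

# Clear the Symfony prod cache inside the running container
docker-compose exec app php bin/console cache:clear --env=prod

# Or simply empty the directory directly
docker-compose exec app sh -c 'rm -rf var/cache/prod/*'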

Dockerized .NET Core "No space left on device" error

On our project, we faced an issue where the Docker agent crashed with "No space left on device".
On one of the nodes of the K8s cluster, we executed this command:
# ps -eLf | grep './DotNetApp' | awk '{print $10}' | wc -l
13882
It means that all our .NET processes together have 13,882 threads. On the node, this leak runs into the limit on the maximum number of thread IDs.
To check the limit, you can execute:
root@ip-172-20-104-47:~# cat /proc/sys/kernel/pid_max
32768
"Threads" is the amount, but pid_max is about the pool of the ids. And pods can easily reach this limit and crash docker on the node.
We use CentOS for the K8S worker. We tried Ubuntu and got the same result.
Do you have any idea why we have such a thread leak on Linux nodes under .NET Core 2.2?
The issue was quite interesting. We had created a health check for Redis: if Redis is not available, we simply shut down the pod. However, the implementation of this health check created a separate connection multiplexer for each /health call without disposing of the old one, so after some time the limit of available threads was reached.
So, be careful with the implementation of health checks.
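For monitoring, a simple way to watch whether a process is leaking threads over time; the pid is a placeholder and the 10-second interval is arbitrary.

# Count the threads of one process (NLWP = number of lightweight processes)
ps -o nlwp= -p <pid>

# Or re-run the per-node total from the question every 10 seconds
watch -n 10 "ps -eLf | grep './DotNetApp' | grep -v grep | wc -l"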

Kubernetes Garbage Collection fails - FreeDiskSpaceFailed & ImageGCFailed

Apparently the GC of my Kubernetes cluster is failing to delete any images, and the server's disk is filling up.
Can you please guide me on where to find the ImageGC logs with the error from trying to delete the images, or to a reason why this is happening?
3m 5d 1591 ip-xxx.internal Node Warning FreeDiskSpaceFailed {kubelet ip-xxx.internal} failed to garbage collect required amount of images. Wanted to free 6312950988, but freed 0
3m 5d 1591 ip-xxx.internal Node Warning ImageGCFailed {kubelet ip-xxx.internal} failed to garbage collect required amount of images. Wanted to free 6312950988, but freed 0
Thanks!
There may not be much in the way of logs (see this issue) but there may be Kubernetes event data. Look for events of type ImageGCFailed.
Alternatively, you could check the cAdvisor Prometheus metrics to see if they expose any information about container garbage collection.
Docs on the GC feature in general: https://kubernetes.io/docs/concepts/cluster-administration/kubelet-garbage-collection/
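A quick way to pull those events with standard kubectl; the node name keeps the question's placeholder.

# List events with reason ImageGCFailed across all namespaces
kubectl get events --all-namespaces --field-selector reason=ImageGCFailed

# Describe the affected node; its Events section includes GC warnings
kubectl describe node ip-xxx.internal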
Most likely your host filesystem is full; check the /var filesystem usage.
You can use docker-gc to clean up old images.
https://github.com/spotify/docker-gc
Run it like this:
docker run --rm --privileged -v /var/run/docker.sock:/var/run/docker.sock -v /etc:/etc:ro spotify/docker-gc
