Cannot run nodetool commands and cqlsh to Scylla in Docker - docker

I am new to Scylla and I am following the instructions to try it in a container as per this page: https://hub.docker.com/r/scylladb/scylla/.
The following command ran fine.
docker run --name some-scylla --hostname some-scylla -d scylladb/scylla
I see the container is running.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e6c4e19ff1bd scylladb/scylla "/docker-entrypoint.…" 14 seconds ago Up 13 seconds 22/tcp, 7000-7001/tcp, 9042/tcp, 9160/tcp, 9180/tcp, 10000/tcp some-scylla
However, I'm unable to use nodetool or cqlsh. I get the following output.
$ docker exec -it some-scylla nodetool status
Using /etc/scylla/scylla.yaml as the config file
nodetool: Unable to connect to Scylla API server: java.net.ConnectException: Connection refused (Connection refused)
See 'nodetool help' or 'nodetool help <command>'.
and
$ docker exec -it some-scylla cqlsh
Connection error: ('Unable to connect to any servers', {'172.17.0.2': error(111, "Tried connecting to [('172.17.0.2', 9042)]. Last error: Connection refused")})
Any ideas?
Update
Looking at docker logs some-scylla I see some errors in the logs, the last one is as follows.
2021-10-03 07:51:04,771 INFO spawned: 'scylla' with pid 167
Scylla version 4.4.4-0.20210801.69daa9fd0 with build-id eb11cddd30e88ef39c32c847e70181b5cf786355 starting ...
command used: "/usr/bin/scylla --log-to-syslog 0 --log-to-stdout 1 --default-log-level info --network-stack posix --developer-mode=1 --overprovisioned --listen-address 172.17.0.2 --rpc-address 172.17.0.2 --seed-provider-parameters seeds=172.17.0.2 --blocked-reactor-notify-ms 999999999"
parsed command line options: [log-to-syslog: 0, log-to-stdout: 1, default-log-level: info, network-stack: posix, developer-mode: 1, overprovisioned, listen-address: 172.17.0.2, rpc-address: 172.17.0.2, seed-provider-parameters: seeds=172.17.0.2, blocked-reactor-notify-ms: 999999999]
ERROR 2021-10-03 07:51:05,203 [shard 6] seastar - Could not setup Async I/O: Resource temporarily unavailable. The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application
2021-10-03 07:51:05,316 INFO exited: scylla (exit status 1; not expected)
2021-10-03 07:51:06,318 INFO gave up: scylla entered FATAL state, too many start retries too quickly
Update 2
The reason for the error was described on the docker hub page linked above. I had to start container specifying the number of CPUs with --smp 1 as follows.
docker run --name some-scylla --hostname some-scylla -d scylladb/scylla --smp 1
According to the above page:
This command will start a Scylla single-node cluster in developer mode
(see --developer-mode 1) limited by a single CPU core (see --smp).
Production grade configuration requires tuning a few kernel parameters
such that limiting number of available cores (with --smp 1) is the
simplest way to go.
Multiple cores requires setting a proper value to the
/proc/sys/fs/aio-max-nr. On many non production systems it will be
equal to 65K. ...

As you have found out, in order to be able to use additional CPU cores you'll need to increase fs.aio-max-nr kernel parameter.
You may run as root:
# sysctl -w fs.aio-max-nr=65535
Which should be enough for most systems. Should you still have any error preventing it to use all of your CPU cores, increase its value further.
Do notice that the above configuration is not persistent. Edit /etc/sysctl.conf in order to make it persistent across reboots.

Related

SSL(curl) connection error in ElasticSearch setup

Have setup a 3-node Elasticsearch cluster using docker-compose. Followed below steps:
On one of the master nodes, es11, gets below error, however same curl command works fine on other 2 nodes i.e. es12, es13:
Error:
curl -X GET 'https://localhost:9316'
curl: (35) Encountered end of file
Below error in logs:
"stacktrace": ["org.elasticsearch.transport.RemoteTransportException: [es13][SOMEIP:9316][internal:cluster/coordination/join]",
"Caused by: org.elasticsearch.transport.ConnectTransportException: [es11][SOMEIP:9316] handshake failed. unexpected remote node {es13}{SOMEVALUE}{SOMEVALUE
"at org.elasticsearch.transport.TransportService.lambda$connectionValidator$6(TransportService.java:468) ~[elasticsearch-7.17.6.jar:7.17.6]",
"at org.elasticsearch.action.ActionListener$MappedActionListener.onResponse(ActionListener.java:95) ~[elasticsearch-7.17.6.jar:7.17.6]",
"at org.elasticsearch.transport.TransportService.lambda$handshake$9(TransportService.java:577) ~[elasticsearch-7.17.6.jar:7.17.6]",
https://localhost:9316 on browser gives site can't be reached error as well.It seems SSL certificate as created in step 4 below is having some issues in es11.
Any leads please? OR If I repeat step 4, do i need to copy the certs again to es12 & es13?
Below elasticsearch.yml
cluster.name: "docker-cluster"
network.host: 0.0.0.0
Ports as defined in all 3 nodes docker-compose.yml
environment:
- node.name=es11
- transport.port=9316
ports:
- 9216:9200
- 9316:9316
Initialize a docker swarm. On ES11 run docker swarm init. Follow the instructions to join 12 and 13 to the swarm.
Create an overlay network docker network create -d overlay --attachable elastic
If necessary, bring down the current cluster and remove all the associated volumes by running docker-compose down -v
Create SSL certificates for ES with docker-compose -f create-certs.yml run --rm create_certs
Copy the certs for es12 and 13 to the respective servers
Use this busybox to create the overlay network on 12 and 13 sudo docker run -itd --name containerX --net [network name] busybox
Configure certs on 12 and 13 with docker-compose -f config-certs.yml run --rm config_certs
Start the cluster with docker-compose up -d on each server
Set the passwords for the built-in ES accounts by logging into the cluster docker exec -it es11 sh then running bin/elasticsearch-setup-passwords interactive --url localhost:9316
(as per your https://discuss.elastic.co thread)
you cannot talk HTTP to the transport protocol port, which you have defined in transport.port. you need to talk to port 9200 in the container, which you have mapped to 9216 outside the container
the transport port runs a binary protocol that is not HTTP accessible

Running a Chainlink Node - Can't connect to database

Using docker-desktop on macOS.
I'm trying to run a node following the instructions on this page.
The database name is node, which is the same as the username: node. The user has access to the database and can log in using psql client.
Connection strings I've tried in the .env file:
postgresql://node#localhost/node
postgresql://node:password#localhost/node
postgresql://node:password#localhost:5432/node
postgresql://node:password#127.0.0.1:5432/node
postgresql://node:password#127.0.0.1/node
When I run the start command: cd ~/.chainlink-kovan && docker run -p 6688:6688 -v ~/.chainlink-kovan:/chainlink -it --env-file=.env smartcontract/chainlink local n , using docker-desktop on macOS, I get the following stack trace:
2020-09-15T14:24:41Z [INFO] Starting Chainlink Node 0.8.15 at commit a904730bd62c7174b80a2c4ccf885de3e78e3971 cmd/local_client.go:50
2020-09-15T14:24:41Z [INFO] SGX enclave *NOT* loaded cmd/enclave.go:11
2020-09-15T14:24:41Z [INFO] This version of chainlink was not built with support for SGX tasks cmd/enclave.go:12
2020-09-15T14:24:41Z [INFO] Locking postgres for exclusive access with 500ms timeout orm/orm.go:69
2020-09-15T14:24:41Z [ERROR] unable to lock ORM: dial tcp 127.0.0.1:5432: connect: connection refused logger/default.go:139 stacktrace=github.com/smartcontractkit/chainlink/core/logger.Error
/chainlink/core/logger/default.go:117
...
Does anyone know how I can resolve this?
The problem probably caused by the fact that your chainlink database has been locked with Exclusive Lock and before stopping node that locks never removed.
What you do in this situation (as what works for me) is use PgAdmin Ui or similar way to find all Locks then find the Exclusive Lock that is held on the chainlink database and note down its Process id or ids (if multiple exclusive locks there are on chainlink DB)
Log in to your pg client and run SELECT pg_terminate_backend(<pid>) or SELECT pg_cancel_backend(<pid>); Enter PID of those locks here without quotes and meanwhile keep refreshing on pg admin URL to see if those processes stopped If stopped then rerun your chainlink node.
The problem is with docker networking.
Add --network host to the docker run command so that it is:
cd ~/.chainlink-kovan && docker run -p 6688:6688 -v ~/.chainlink-kovan:/chainlink -it --env-file=.env smartcontract/chainlink --network host local n
This fixes the issue.

Issue accessing vespa outside docker container

Installed Docker on Mac and trying to run Vespa on Docker following steps specified in following link
https://docs.vespa.ai/documentation/vespa-quick-start.html
I did n't had any issues till step 4. I see vespa container running after step 2 and step 3 returned 200 OK response.
But Step 5 failed to return 200 OK response. Below is the command I ran on my terminal
curl -s --head http://localhost:8080/ApplicationStatus
I keep getting
curl: (52) Empty reply from server whenever I run without -s option.
So I tried to see listening ports inside my vespa container and don't see anything for 8080 but can see for 19071(used in step 3)
➜ ~ docker exec vespa bash -c 'netstat -vatn| grep 8080'
➜ ~ docker exec vespa bash -c 'netstat -vatn| grep 19071'
tcp 0 0 0.0.0.0:19071 0.0.0.0:* LISTEN
Below doc has info related to vespa ports
https://docs.vespa.ai/documentation/reference/files-processes-and-ports.html
I'm assuming port 8080 should be active after docker run(step 2 of quick start link) and can be accessed outside container as port mapping is done.
But I don't see 8080 port active inside container in first place.
A'm I missing something. Do I need to perform any additional step than mentioned in quick start? FYI I installed Jenkins inside my docker and was able to access outside container via port mapping. But not sure why it's not working with vespa.I have been trying from quiet sometime but no progress. Please advice me if I'm missing something here.
You have too low memory for your docker container, "Minimum 6GB memory dedicated to Docker (the default is 2GB on Macs).". See https://docs.vespa.ai/documentation/vespa-quick-start.html
The deadlock detector warnings and failure to get configuration from configuration server (which is likely oom killed) indicates that you are too low on memory.
My guess is that your jdisc container had not finished initialize or did not initialize properly? Did you try to check the log?
docker exec vespa bash -c '/opt/vespa/bin/vespa-logfmt /opt/vespa/logs/vespa/vespa.log'
This should tell you if there was something wrong. When it is ready to receive requests you would see something like this:
[2018-12-10 06:30:37.854] INFO : container Container.org.eclipse.jetty.server.AbstractConnector Started SearchServer#79afa369{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
[2018-12-10 06:30:37.857] INFO : container Container.org.eclipse.jetty.server.Server Started #10280ms
[2018-12-10 06:30:37.857] INFO : container Container.com.yahoo.container.jdisc.ConfiguredApplication Switching to the latest deployed set of configurations and components. Application switch number: 0
[2018-12-10 06:30:37.859] INFO : container Container.com.yahoo.container.jdisc.ConfiguredApplication Initializing new set of configurations and components. Application switch number: 1

Cannot conect to Docker container running in VSTS

I have a test which starts a Docker container, performs the verification (which is talking to the Apache httpd in the Docker container), and then stops the Docker container.
When I run this test locally, this test runs just fine. But when it runs on hosted VSTS, thus a hosted build agent, it cannot connect to the Apache httpd in the Docker container.
This is the .vsts-ci.yml file:
queue: Hosted Linux Preview
steps:
- script: |
./test.sh
This is the test.sh shell script to reproduce the problem:
#!/bin/bash
set -e
set -o pipefail
function tearDown {
docker stop test-apache
docker rm test-apache
}
trap tearDown EXIT
docker run -d --name test-apache -p 8083:80 httpd
sleep 10
curl -D - http://localhost:8083/
When I run this test locally, the output that I get is:
$ ./test.sh
469d50447ebc01775d94e8bed65b8310f4d9c7689ad41b2da8111fd57f27cb38
HTTP/1.1 200 OK
Date: Tue, 04 Sep 2018 12:00:17 GMT
Server: Apache/2.4.34 (Unix)
Last-Modified: Mon, 11 Jun 2007 18:53:14 GMT
ETag: "2d-432a5e4a73a80"
Accept-Ranges: bytes
Content-Length: 45
Content-Type: text/html
<html><body><h1>It works!</h1></body></html>
test-apache
test-apache
This output is exactly as I expect.
But when I run this test on VSTS, the output that I get is (irrelevant parts replaced with …).
2018-09-04T12:01:23.7909911Z ##[section]Starting: CmdLine
2018-09-04T12:01:23.8044456Z ==============================================================================
2018-09-04T12:01:23.8061703Z Task : Command Line
2018-09-04T12:01:23.8077837Z Description : Run a command line script using cmd.exe on Windows and bash on macOS and Linux.
2018-09-04T12:01:23.8095370Z Version : 2.136.0
2018-09-04T12:01:23.8111699Z Author : Microsoft Corporation
2018-09-04T12:01:23.8128664Z Help : [More Information](https://go.microsoft.com/fwlink/?LinkID=613735)
2018-09-04T12:01:23.8146694Z ==============================================================================
2018-09-04T12:01:26.3345330Z Generating script.
2018-09-04T12:01:26.3392080Z Script contents:
2018-09-04T12:01:26.3409635Z ./test.sh
2018-09-04T12:01:26.3574923Z [command]/bin/bash --noprofile --norc /home/vsts/work/_temp/02476800-8a7e-4e22-8715-c3f706e3679f.sh
2018-09-04T12:01:27.7054918Z Unable to find image 'httpd:latest' locally
2018-09-04T12:01:30.5555851Z latest: Pulling from library/httpd
2018-09-04T12:01:31.4312351Z d660b1f15b9b: Pulling fs layer
[…]
2018-09-04T12:01:49.1468474Z e86a7f31d4e7506d34e3b854c2a55646eaa4dcc731edc711af2cc934c44da2f9
2018-09-04T12:02:00.2563446Z % Total % Received % Xferd Average Speed Time Time Time Current
2018-09-04T12:02:00.2583211Z Dload Upload Total Spent Left Speed
2018-09-04T12:02:00.2595905Z
2018-09-04T12:02:00.2613320Z 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 8083: Connection refused
2018-09-04T12:02:00.7027822Z test-apache
2018-09-04T12:02:00.7642313Z test-apache
2018-09-04T12:02:00.7826541Z ##[error]Bash exited with code '7'.
2018-09-04T12:02:00.7989841Z ##[section]Finishing: CmdLine
The key thing is this:
curl: (7) Failed to connect to localhost port 8083: Connection refused
10 seconds should be enough for apache to start.
Why can curl not communicate with Apache on its port 8083?
P.S.:
I know that a hard-coded port like this is rubbish and that I should use an ephemeral port instead. I wanted to get it running first wirth a hard-coded port, because that's simpler than using an ephemeral port, and then switch to an ephemeral port as soon as the hard-coded port works. And in case the hard-coded port doesn't work because the port is unavailable, the error should look different, in that case, docker run should fail because the port can't be allocated.
Update:
Just to be sure, I've rerun the test with sleep 100 instead of sleep 10. The results are unchanged, curl cannot connect to localhost port 8083.
Update 2:
When extending the script to execute docker logs, docker logs shows that Apache is running as expected.
When extending the script to execute docker ps, it shows the following output:
2018-09-05T00:02:24.1310783Z CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2018-09-05T00:02:24.1336263Z 3f59aa014216 httpd "httpd-foreground" About a minute ago Up About a minute 0.0.0.0:8083->80/tcp test-apache
2018-09-05T00:02:24.1357782Z 850bda64f847 microsoft/vsts-agent:ubuntu-16.04-docker-17.12.0-ce-standard "/home/vsts/agents/2…" 2 minutes ago Up 2 minutes musing_booth
The problem is that the VSTS build agent runs in a Docker container. When the Docker container for Apache is started, it runs on the same level as the VSTS build agent Docker container, not nested inside the VSTS build agent Docker container.
There are two possible solutions:
Replacing localhost with the ip address of the docker host, keeping the port number 8083
Replacing localhost with the ip address of the docker container, changing the host port number 8083 to the container port number 80.
Access via the Docker Host
In this case, the solution is to replace localhost with the ip address of the docker host. The following shell snippet can do that:
host=localhost
if grep '^1:name=systemd:/docker/' /proc/1/cgroup
then
apt-get update
apt-get install net-tools
host=$(route -n | grep '^0.0.0.0' | sed -e 's/^0.0.0.0\s*//' -e 's/ .*//')
fi
curl -D - http://$host:8083/
The if grep '^1:name=systemd:/docker/' /proc/1/cgroup inspects whether the script is running inside a Docker container. If so, it installs net-tools to get access to the route command, and then parses the default gw from the route command to get the ip address of the host. Note that this only works if the container's network default gw actually is the host.
Direct Access to the Docker Container
After launching the docker container, its ip addresses can be obtained with the following command:
docker container inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' <container-id>
Replace <container-id> with your container id or name.
So, in this case, it would be (assuming that the first ip address is okay):
ips=($(docker container inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' nuance-apache))
host=${ips[0]}
curl http://$host/

docker - driver "devicemapper" failed to remove root filesystem after process in container killed

I am using Docker version 17.06.0-ce on Redhat with devicemapper storage. I am launching a container running a long-running service. The master process inside the container sometimes dies for whatever reason. I get the following error message.
/bin/bash: line 1: 40 Killed python -u scripts/server.py start go
I would like the container to exit and to be restarted by docker. However docker never exits. If I do it manually I get the following error:
Error response from daemon: driver "devicemapper" failed to remove root filesystem.
After googling, I tried a bunch of things:
docker rm -f <container>
rm -f <pth to mount>
umount <pth to mount>
All result in device is busy. The only remedy right now is to reboot the host system which is obviously not a long-term solution.
Any ideas?
I had the same problem and the solution was a real surprise.
So here is the error om docker rm:
$ docker rm 08d51aad0e74
Error response from daemon: driver "devicemapper" failed to remove root filesystem for 08d51aad0e74060f54bba36268386fe991eff74570e7ee29b7c4d74047d809aa: remove /var/lib/docker/devicemapper/mnt/670cdbd30a3627ae4801044d32a423284b540c5057002dd010186c69b6cc7eea: device or resource busy
Then I did the following (basically go through all processes and look for docker in mountinfo):
$ grep docker /proc/*/mountinfo | grep 958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac
/proc/20416/mountinfo:629 574 253:15 / /var/lib/docker/devicemapper/mnt/958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac rw,relatime shared:288 - xfs /dev/mapper/docker-253:5-786536-958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac rw,nouuid,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota
This got be the PID of the offending process keeping it busy - 20416 (the item after /proc/)
So I did a ps -p and to my surprise find:
[devops#dp01app5030 SeGrid]$ ps -p 20416
PID TTY TIME CMD
20416 ? 00:00:19 ntpd
A true WTF moment. So I pair problem solved with Google and found this:
Then found this https://github.com/docker/for-linux/issues/124
Turns out I had to restart ntp daemon and that fixed the issue!!!

Resources