Cassandra client connection issue within Docker from an application container

batchWorker_1 | [DEBUG] 2017-10-30 12:42:10.035 [cluster1-nio-worker-0] Connection - Connection[/172.17.0.3:9042-1, inFlight=0, closed=false] Error connecting to /172.17.0.3:9042 (connection timed out: /172.17.0.3:9042)
batchWorker_1 | [DEBUG] 2017-10-30 12:42:10.037 [cluster1-nio-worker-0] STATES - Defuncting Connection[/172.17.0.3:9042-1, inFlight=0, closed=false] because: [/172.17.0.3:9042] Cannot connect
batchWorker_1 | [DEBUG] 2017-10-30 12:42:10.038 [cluster1-nio-worker-0] STATES - [/172.17.0.3:9042] preventing new connections for the next 1000 ms
batchWorker_1 | [DEBUG] 2017-10-30 12:42:10.038 [cluster1-nio-worker-0] STATES - [/172.17.0.3:9042] Connection[/172.17.0.3:9042-1, inFlight=0, closed=false] failed, remaining = 0
batchWorker_1 | [DEBUG] 2017-10-30 12:42:10.039 [cluster1-nio-worker-0] Connection - Connection[/172.17.0.3:9042-1, inFlight=0, closed=true] closing connection
batchWorker_1 | [DEBUG] 2017-10-30 12:42:10.042 [main] ControlConnection - [Control connection] error on /172.17.0.3:9042 connection, no more host to try
batchWorker_1 | com.datastax.driver.core.exceptions.TransportException: [/172.17.0.3:9042] Cannot connect
batchWorker_1 | at com.datastax.driver.core.Connection$1.operationComplete(Connection.java:165) ~[batch_worker_server.jar:0.01]
batchWorker_1 | at com.datastax.driver.core.Connection$1.operationComplete(Connection.java:148) ~[batch_worker_server.jar:0.01]
...
I am running my application and Cassandra in separate containers and trying to establish a connection from the application container to the Cassandra container.
I tried with docker-compose and it throws the same error. It resolves the right container IP (as you can see) but fails to connect.
I also tried starting the Cassandra container separately and hardcoding its IP in my application container; it still fails.
The Cassandra container itself works fine: if I run the same app outside Docker, it connects.
The issue is that the application container cannot connect to the Cassandra container's IP. Not sure why.
I also enabled start_rpc and exposed all Cassandra-related ports. Still no luck.

Issue
[Control connection] error on /IP:9042 connection, no more host to try
Solution
A silly but important first check: the app may simply be trying to connect before Cassandra is up.
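A minimal startup guard, as a sketch (the host name "cassandra" and port 9042 match the compose file below; nc must be available in the app image, and the jar name is taken from the logs above):
until nc -z cassandra 9042; do sleep 1; done    # block until Cassandra accepts TCP connections
exec java -jar batch_worker_server.jar          # only then start the application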
RPC_ADDRESS Default IP Issue
Containers have their own network, so every container gets its own IP. If rpc_address is set to a container-specific address (the default is localhost), clients in other containers cannot connect and you will see this error.
In the Cassandra configuration, change rpc_address:
rpc_address (Default: localhost) The listen address for client connections (Thrift RPC service and native transport).
Valid values:
- unset: Resolves the address using the hostname configuration of the node. If left unset, the hostname resolves to the IP address of this node using /etc/hostname, /etc/hosts, or DNS.
- 0.0.0.0: Listens on all configured interfaces. You must set broadcast_rpc_address to a value other than 0.0.0.0.
- IP address
- hostname
RPC_ADDRESS=0.0.0.0
Here is the docker-compose.yml file:
version: '2'
services:
  cassandra:
    container_name: cassandra
    image: cassandra:3.9
    volumes:
      - /path/of/host/for/cassandra/:/var/lib/cassandra/
    ports:
      - 7000:7000
      - 7001:7001
      - 7199:7199
      - 9042:9042
      - 9160:9160
    environment:
      - CASSANDRA_CLUSTER_NAME='cassandra-cluster'
      - CASSANDRA_NUM_TOKENS=256
      - CASSANDRA_RPC_ADDRESS=0.0.0.0
    restart: always
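To apply and verify, a sketch (assuming the official image, whose entrypoint writes CASSANDRA_RPC_ADDRESS into /etc/cassandra/cassandra.yaml):
docker-compose up -d
docker exec cassandra grep '^rpc_address' /etc/cassandra/cassandra.yaml   # should print rpc_address: 0.0.0.0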

You can create a Docker network as described in the documentation and connect Cassandra and your application to the same network.
You also need to check which interfaces Cassandra is listening on: a single interface, or all of them?
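A minimal sketch of that approach (the network name and the application image name are illustrative):
docker network create cassandra-net                                   # user-defined bridge network with built-in DNS
docker run -d --name cassandra --network cassandra-net cassandra:3.9
docker run -d --name batch-worker --network cassandra-net my-app:latest
On a user-defined network the driver can then reach Cassandra by the DNS name "cassandra" on port 9042, with no hardcoded IP.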

Related

kafka-server-start.sh failed with advertised.listeners set [duplicate]

This question already has an answer here:
Connect to Kafka on host from Docker (ksqlDB) (1 answer)
Closed last month.
I am trying to start my Kafka server on localhost, with the following two settings in config/server.properties:
listeners=PLAINTEXT://127.0.0.1:9092
# Listener name, hostname and port the broker will advertise to clients.
# If not set, it uses the value for "listeners".
advertised.listeners=PLAINTEXT://host.docker.internal:9092
I set advertised.listeners because I want to run a kafka-ui Docker container, which will connect to that listener. But as soon as I set advertised.listeners, bin/kafka-server-start.sh config/server.properties fails with the following error:
[2023-01-12 00:06:33,814] WARN [Controller id=0, targetBrokerId=0] Error connecting to node host.docker.internal:9092 (id: 0 rack: null) (org.apache.kafka.clients.NetworkClient)
java.net.UnknownHostException: host.docker.internal
at java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:952)
at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1658)
at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1524)
at org.apache.kafka.clients.DefaultHostResolver.resolve(DefaultHostResolver.java:27)
at org.apache.kafka.clients.ClientUtils.resolve(ClientUtils.java:110)
at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.currentAddress(ClusterConnectionStates.java:510)
at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.access$200(ClusterConnectionStates.java:467)
at org.apache.kafka.clients.ClusterConnectionStates.currentAddress(ClusterConnectionStates.java:173)
at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:990)
at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:301)
at org.apache.kafka.clients.NetworkClientUtils.awaitReady(NetworkClientUtils.java:64)
at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:292)
at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:246)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
If I comment out advertised.listeners, it works.
Can someone help me?
Maybe the Kafka broker is unable to resolve the hostname "host.docker.internal". Try an IP address instead of the hostname (advertised.listeners=PLAINTEXT://xx.x.x.x:9092),
or add the hostname to /etc/hosts.
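A sketch of the /etc/hosts option (the loopback mapping is an assumption; use whatever address the broker should actually advertise):
echo "127.0.0.1 host.docker.internal" | sudo tee -a /etc/hosts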
Where is the Kafka broker running? Inside a Docker container?

weblogic in docker trying to use the public port in the container when 7101:7001 port mapping is used

I am starting a WebLogic 12.2.1.4 admin server in Docker from my docker-compose.yml file.
I use a different port mapping, not the default 7001.
My Docker port mapping is this: 7101:7001
Everything works fine except this: I constantly get the following exception when I click on the Deployment menu in the web console:
<Feb 12, 2021 5:11:21,002 PM UTC> <Notice> <JMX> <BEA-149535> <JMX Resiliency Activity Server=All Servers : Resolving connection list DomainRuntimeServiceMBean>
javax.ws.rs.ProcessingException: java.net.ConnectException: Tried all: '1' addresses, but could not connect over HTTP to server: 'localhost', port: '7101'
failed reasons:
[0] address:'localhost/127.0.0.1',port:'7101' : java.net.ConnectException: Connection refused
The WL admin server tries to use the public Docker port 7101 inside the container, but WL is actually listening on the default port 7001 there. Port 7101 is only used from the host machine, and of course WL is not listening on port 7101 inside the container.
My workaround is the following (a one-line version of step 1 follows the list):
1. Check the IP address of the admin-server container with docker inspect <container-name>
2. Open the WL console using the container's private IP address, e.g. http://172.19.0.2:7001/console
In this case the exception does not appear.
But if I open the WL console from http://localhost:7101/console, which is the port mapped to the host machine by Docker, then the exception appears.
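For reference, step 1 can be done in one command (a sketch using docker inspect's standard Go-template format flag):
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' <container-name>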
Maybe this is a WL user interface issue? I am not sure.
Any idea why this is happening?

Docker swarm network debugging

I have created a service in Docker using docker swarm and stack, but somehow my application running in one service replica is not able to connect to the other replicas.
Here is more detail. I have three nodes: one manager and two workers. On the same node as the manager I run a registry, which is the source for the image I deploy to the workers using docker stack.
Here is what my docker stack configuration file looks like:
version: "3.2"
services:
compute:
image: openmpi-docker.registry:443/openmpi
ports:
- "2222:22"
deploy:
replicas: 4
restart_policy:
condition: none
placement:
constraints: [node.role != manager]
networks:
- openmpi-network
networks:
openmpi-network:
driver: overlay
attachable: true
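For reference, the stack would be deployed with something like this (the stack name "openmpi" is inferred from the network name "openmpi_openmpi-network" mentioned below; the file name is an assumption):
docker stack deploy -c docker-compose.yml openmpi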
As you can guess, the image contains an OpenMPI distribution. It starts sshd and opens port 22.
I can connect to one of the services using the following command:
ssh -p 2222 -i ./user-ssh/docker_id <worker node>
Then I try to run a simple test. I figure out the IP addresses of the other replicas and try to connect to them over ssh:
for i in $(seq 3 6) ; do ssh 10.0.1.$i hostname ; done
This works as expected: I see 4 different hostnames.
So my next test involves MPI:
mpirun -H 10.0.1.3,10.0.1.4,10.0.1.5,10.0.1.6 hostname
This command is supposed to do exactly the same thing, except that it uses mpirun instead of ssh. But somehow it fails.
So I want to see where exactly mpirun hangs using strace, but that is impossible without the SYS_PTRACE capability, and there is seemingly no way to set that capability if you use swarm. So I try to create another container, not in swarm, which shares the same overlay network:
docker run --network openmpi_openmpi-network --rm -it openmpi-docker.registry:443/openmpi bash
Here openmpi_openmpi-network is the name of the network created by "docker stack". Originally I planned to run it with "--privileged", but it turned out I do not need that.
Then I switch from root to a normal user and run the very same MPI command with the very same IP addresses, and the command works. So I do not run into the same problem there.
In other cases the solution to a similar problem is just to turn off the firewall, but I am pretty sure I do not have one inside the containers: I built the image myself from the debian:9 image, and the only server I installed is sshd. And that still would not explain the difference between starting the container in swarm and starting it with docker run.
This makes me wonder. First, how can I debug this kind of problem? Second, what is so different, networking-wise, between starting the service in swarm and starting it manually?
I would be grateful if you could help me answer these two questions.
Update
I tried to run mpirun with "--mca oob_base_verbose 100" and got the following result. The logs are relatively long, but the key difference is in the behaviour of the node that initiates the communication.
In a working log, at some point mpirun on the initiator node prints the following:
...
[3365816c2c1e:00033] [[39982,0],2] oob:tcp:init adding 10.0.1.4 to our list of V4 connections
[3365816c2c1e:00033] [[39982,0],2] TCP STARTUP
[3365816c2c1e:00033] [[39982,0],2] attempting to bind to IPv4 port 0
[3365816c2c1e:00033] [[39982,0],2] assigned IPv4 port 57161
[3365816c2c1e:00033] mca:oob:select: Adding component to end
[3365816c2c1e:00033] mca:oob:select: checking available component usock
[3365816c2c1e:00033] mca:oob:select: Querying component [usock]
[3365816c2c1e:00033] oob:usock: component_available called
[3365816c2c1e:00033] [[39982,0],2] USOCK STARTUP
[3365816c2c1e:00033] SUNPATH: /tmp/openmpi-sessions-1000#3365816c2c1e_0/39982/0/usock
[3365816c2c1e:00033] mca:oob:select: Inserting component
[3365816c2c1e:00033] mca:oob:select: Found 2 active transports
<Log output from other nodes>
[3365816c2c1e:00033] [[39982,0],2]: set_addr to uri 2620260352.0;usock;tcp://10.0.1.7,172.18.0.5:48695
[3365816c2c1e:00033] [[39982,0],2]:set_addr checking if peer [[39982,0],0] is reachable via component usock
[3365816c2c1e:00033] [[39982,0],2]: peer [[39982,0],0] is NOT reachable via component usock
[3365816c2c1e:00033] [[39982,0],2]:set_addr checking if peer [[39982,0],0] is reachable via component tcp
[3365816c2c1e:00033] [[39982,0],2] oob:tcp: ignoring address usock
[3365816c2c1e:00033] [[39982,0],2] oob:tcp: working peer [[39982,0],0] address tcp://10.0.1.7,172.18.0.5:48695
[3365816c2c1e:00033] [[39982,0],2] PASSING ADDR 10.0.1.7 TO MODULE
[3365816c2c1e:00033] [[39982,0],2]:tcp set addr for peer [[39982,0],0]
[3365816c2c1e:00033] [[39982,0],2] PASSING ADDR 172.18.0.5 TO MODULE
[3365816c2c1e:00033] [[39982,0],2]:tcp set addr for peer [[39982,0],0]
...
But in the non-working case the initiator node stops at the "Found 2 active transports" message and never prints anything else.
Update 2
I also figured out that hostname works if I add "--mca oob_tcp_if_include 10.0.1.0/24", so that the full command is:
mpirun -H 10.0.1.3,10.0.1.4,10.0.1.5,10.0.1.6 --mca oob_tcp_if_include 10.0.1.0/24 hostname
Still, I do not really understand why, or how to avoid having to specify the subnet.
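For reference, the same restriction can be set via environment variables instead of repeating the --mca flag on every invocation (standard Open MPI MCA convention; restricting the btl data plane as well is an assumption that may be needed once actual MPI traffic flows, not just hostname):
export OMPI_MCA_oob_tcp_if_include=10.0.1.0/24   # out-of-band (control) traffic
export OMPI_MCA_btl_tcp_if_include=10.0.1.0/24   # point-to-point (data) traffic
mpirun -H 10.0.1.3,10.0.1.4,10.0.1.5,10.0.1.6 hostname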

Why can't I access the Docker-based ZooKeeper port?

On OS X I started the Kafka Docker image successfully, but it seems that I can't access it on localhost.
➜ ~ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1f931da3d661 wurstmeister/zookeeper:3.4.6 "/bin/sh -c '/usr/..." About an hour ago Up About an hour 22/tcp, 2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp docker_zookeeper_1
8bc36bcf8fdf wurstmeister/kafka:0.10.1.1 "start-kafka.sh" About an hour ago Up About an hour 0.0.0.0:9092->9092/tcp docker_kafka_1
➜ ~ telnet 0.0.0.0:2181
0.0.0.0:2181: nodename nor servname provided, or not known
➜ ~ telnet 0.0.0.0 2181
Trying 0.0.0.0...
telnet: connect to address 0.0.0.0: Connection refused
telnet: Unable to connect to remote host
➜ ~ telnet 192.168.43.193 2181
Trying 192.168.43.193...
telnet: connect to address 192.168.43.193: Connection refused
telnet: Unable to connect to remote host
➜ ~ telnet 127.0.0.1 2181
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
telnet: Unable to connect to remote host
My compose file is here: kafka.yml. I use this command to bring it up:
docker-compose -f src/main/docker/kafka.yml up -d
When I run
./mvnw
the console shows:
2017-09-15 17:05:46.433 WARN 15871 --- [localhost:2181)] org.apache.zookeeper.ClientCnxn : Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
How can I access port 2181?
EDIT
docker logs 8bc36bcf8fdf
[2017-09-15 08:14:13,386] FATAL Fatal error during KafkaServerStartable startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
java.lang.RuntimeException: A broker is already registered on the path /brokers/ids/1001. This probably indicates that you either have configured a brokerid that is already in use, or else you have shutdown this broker and restarted it faster than the zookeeper timeout so it appears to be re-registering.
at kafka.utils.ZkUtils.registerBrokerInZk(ZkUtils.scala:393)
at kafka.utils.ZkUtils.registerBrokerInZk(ZkUtils.scala:379)
at kafka.server.KafkaHealthcheck.register(KafkaHealthcheck.scala:70)
at kafka.server.KafkaHealthcheck.startup(KafkaHealthcheck.scala:51)
at kafka.server.KafkaServer.startup(KafkaServer.scala:270)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:39)
at kafka.Kafka$.main(Kafka.scala:67)
at kafka.Kafka.main(Kafka.scala)
[2017-09-15 08:14:13,393] INFO [Kafka Server 1001], shutting down (kafka.server.KafkaServer)
docker logs 1f931da3d661
2017-09-14 08:53:05,878 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x15e7ea74c8e0000, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2017-09-14 08:53:05,887 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1007] - Closed socket connection for client /172.18.0.2:54222 which had sessionid 0x15e7ea74c8e0000
Have you tried using host networking, as in this example? https://docs.confluent.io/current/cp-docker-images/docs/quickstart.html#zookeeper
That looks like it would simplify and solve this. I'd also recommend checking out those images instead of the custom ones you appear to be using: they are run in production by many people, so they are known to work well.
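A sketch of the host-networking approach (the image and environment variable follow the Confluent quickstart; note that on OS X "host" means the Docker VM, not the Mac itself):
docker run -d --net host -e ZOOKEEPER_CLIENT_PORT=2181 confluentinc/cp-zookeeper:latest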

Unable to gossip from one region to another on Amazon with ec2multiregionsnitch

I'm trying to launch a multi-region cluster using the Ec2MultiRegionSnitch.
The nodes in one DC can communicate, but when adding nodes from another DC they fail with the following error:
ERROR [main] 2016-05-09 10:57:01,88 CassandraDaemon.java:581 Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
I have installed DSE on Ubuntu 14.04 and have 4 nodes running in a cluster in Frankfurt (2 on subnet a and 2 on subnet b).
The problem arises when I try to add more nodes from Ireland.
I have opened the following ports in the security group:
80
8984
7199
61620
7000 - 7001
61620 - 61621
8983
7077
443
4040
8888
22
7080 - 7081
7080
9160
9042
Then I made the following settings in the cassandra.yaml file:
listen_address: local ip
rpc_address: local ip
seeds: "public ip seed 1, public ip seed 2"
endpoint_snitch: Ec2MultiRegionSnitch
broadcast_address: public ip
What more do I need to set up for them to communicate?
I ended up going with Cassandra community version 3.2 and the GossipingPropertyFileSnitch instead, and then it worked.
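For reference, GossipingPropertyFileSnitch takes each node's location from cassandra-rackdc.properties rather than from EC2 metadata. A sketch (the DC and rack names are illustrative; each node declares its own):
endpoint_snitch: GossipingPropertyFileSnitch   # in cassandra.yaml
# in cassandra-rackdc.properties on a Frankfurt node:
dc=eu-central
rack=rack1
With this snitch, cross-region nodes still need broadcast_address set to the public IP, as in the settings above.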
