Single-node ejabberd on Kubernetes - ejabberdctl status shows node down - Docker

I'm trying to deploy the ejabberd Docker image in Kubernetes, with the following folders mounted from a persistent volume:
/home/ejabberd/logs
/home/ejabberd/conf
/home/ejabberd/database
I populated the conf and database directories with our configuration files and the database folder from the Docker image using an init container (sketched below). After setting the permissions, we were able to start the ejabberd service, and the logs say that the listeners (on ports 5222, 5269, 5280) are ready.
However, when I check the XMPP server status inside the container using "ejabberdctl status", the output says "node down".
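For context, a minimal sketch of that init-container step; /pv-data stands in for wherever the persistent volume is mounted inside the init container, and 9000:9000 is the default UID/GID of the official ejabberd image (verify both for your setup):
mkdir -p /pv-data/conf /pv-data/database /pv-data/logs
cp -a /home/ejabberd/conf/.     /pv-data/conf/      # seed config from the image
cp -a /home/ejabberd/database/. /pv-data/database/  # seed the Mnesia spool from the image
chown -R 9000:9000 /pv-data                         # ejabberd user in the runtime image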
===========ejabberd.log===================================================
2020-12-16 09:18:58.477630+00:00 [info] <0.3406.0>@mod_mqtt:init_topic_cache/2:611 Building MQTT cache for mydomain this may take a while
2020-12-16 09:18:59.087380+00:00 [info] <0.483.0>@ejabberd_mnesia:create/2:267 Creating Mnesia ram table 'bytestream'
2020-12-16 09:19:01.193203+00:00 [info] <0.126.0>@ejabberd_cluster_mnesia:wait_for_sync/1:123 Waiting for Mnesia synchronization to complete
2020-12-16 09:19:02.401537+00:00 [info] <0.126.0>@ejabberd_app:start/2:62 ejabberd 20.4.0 is started in the node 'ejabberd@mydomain' in 49.77s
2020-12-16 09:19:02.403414+00:00 [info] <0.601.0>@ejabberd_listener:init/4:159 Start accepting TCP connections at [::]:5222 for ejabberd_c2s
2020-12-16 09:19:02.403479+00:00 [info] <0.602.0>@ejabberd_listener:init/4:159 Start accepting TCP connections at [::]:5269 for ejabberd_s2s_in
2020-12-16 09:19:02.403956+00:00 [info] <0.603.0>@ejabberd_listener:init/4:159 Start accepting TLS connections at [::]:5443 for ejabberd_http
2020-12-16 09:19:02.403999+00:00 [info] <0.604.0>@ejabberd_listener:init/4:159 Start accepting TCP connections at [::]:5280 for ejabberd_http
2020-12-16 09:19:02.404098+00:00 [info] <0.605.0>@ejabberd_listener:init/4:159 Start accepting TCP connections at [::]:1883 for mod_mqtt
2020-12-16 09:19:02.404345+00:00 [info] <0.3418.0>@ejabberd_listener:init/4:159 Start accepting TCP connections at 10.42.8.15:7777 for mod_proxy65_stream
========================================ejabberdctl status===========================
~ $ ./bin/ejabberdctl status
Failed RPC connection to the node 'ejabberd@mydomain': nodedown
Commands to start an ejabberd node:
start - Start an ejabberd node in server mode
debug - Attach an interactive Erlang shell to a running ejabberd node
iexdebug - Attach an interactive Elixir shell to a running ejabberd node
live - Start an ejabberd node in live (interactive) mode
iexlive - Start an ejabberd node in live (interactive) mode, within an Elixir shell
foreground - Start an ejabberd node in server mode (attached)
Optional parameters when starting an ejabberd node:
--config-dir dir Config ejabberd: /home/ejabberd/conf
--config file Config ejabberd: /home/ejabberd/conf/ejabberd.yml
--ctl-config file Config ejabberdctl: /home/ejabberd/conf/ejabberdctl.cfg
--logs dir Directory for logs: /home/ejabberd/logs
--spool dir Database spool dir: /home/ejabberd/database/ejabberd@mydomain
--node nodename ejabberd node name: ejabberd@mydomain
If anyone has tried ejabberd on Kubernetes, please share your thoughts on this issue.
Thanks in advance
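A few commands that can help narrow down why ejabberdctl reports nodedown while the listeners are clearly up; the paths below are the defaults of the official ejabberd image, so verify them in your own image:
epmd -names                                                               # is the running VM registered with the local EPMD, and under which name?
grep -E 'ERLANG_NODE|ERLANG_COOKIE' /home/ejabberd/conf/ejabberdctl.cfg   # node name / cookie that ejabberdctl will use
ls -l /home/ejabberd/.erlang.cookie                                       # cookie file must match the one used by the running node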

Related

Port error when setting up Dev mode of Hyperledger Fabric

I'm setting up the development environment following the instructions on Hyperledger Fabric's official website:
https://hyperledger-fabric.readthedocs.io/en/latest/peer-chaincode-devmode.html
I have started the orderer successfully using:
ORDERER_GENERAL_GENESISPROFILE=SampleDevModeSolo orderer
This command didn't work at first, but it worked after I cd'd into fabric/sampleconfig.
2020-12-21 11:23:15.084 CST [orderer.common.server] Main -> INFO 009 Starting orderer: Version: 2.3.0 Commit SHA: dc2e59b3c Go version: go1.15.6 OS/Arch: darwin/amd64
2020-12-21 11:23:15.084 CST [orderer.common.server] Main -> INFO 00a Beginning to serve requests
but when I start the peer using:
export PATH=$(pwd)/build/bin:$PATH
export FABRIC_CFG_PATH=$(pwd)/sampleconfig
export FABRIC_LOGGING_SPEC=chaincode=debug
export CORE_PEER_CHAINCODELISTENADDRESS=0.0.0.0:7052
peer node start --peer-chaincodedev=true
An error is spotted:
2020-12-21 11:25:13.047 CST [nodeCmd] serve -> INFO 001 Starting peer: Version: 2.3.0 Commit SHA: dc2e59b3c Go version: go1.15.6 OS/Arch: darwin/amd64 Chaincode: Base Docker Label: org.hyperledger.fabric Docker Namespace: hyperledger
2020-12-21 11:25:13.048 CST [peer] getLocalAddress -> INFO 002 Auto-detected peer address: 10.200.83.208:7051
2020-12-21 11:25:13.048 CST [peer] getLocalAddress -> INFO 003 Host is 0.0.0.0 , falling back to auto-detected address: 10.200.83.208:7051 Error: failed to initialize operations subsystem: listen tcp 127.0.0.1:9443: bind: address already in use
this is the error:
Error: failed to initialize operations subsystem: listen tcp 127.0.0.1:9443: bind: address already in use
I checked this issue, and it seems it happens because the peer node is using the same port 9443 as the orderer node for the same service. How can I get the two nodes running separately? Docker itself seems to be running fine.
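For what it's worth, a quick way to see which process already holds port 9443 on the host (standard tooling; pick whichever is available on your OS):
lsof -nP -iTCP:9443 -sTCP:LISTEN
# or
netstat -an | grep 9443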
If you look at your error, it's easy to follow:
Error: failed to initialize operations subsystem: listen tcp 127.0.0.1:9443: bind: address already in use
It says that port 9443 is already in use.
It seems that you are not running the orderer and peer as separate containers on the Docker-based virtual network, but directly on the host PC.
As a result, the two servers conflict by both requesting port 9443 on your PC.
Referring to the configuration below from fabric-2.3/sampleconfig, you can see that port 9443 is assigned to both servers. Assigning a different port to one of them solves this.
fabric-2.3/sampleconfig/orderer.yaml (configuration of the orderer):
# orderer.yaml
...
Admin:
  # host and port for the admin server
  ListenAddress: 127.0.0.1:9443
...
fabric-2.3/sampleconfig/core.yaml (configuration of the peer):
# core.yaml
...
operations:
  # host and port for the operations server
  # listenAddress: 127.0.0.1:9443
  listenAddress: 127.0.0.1:10443
...
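Instead of editing core.yaml, the same override can usually be passed through the environment, since the peer maps CORE_-prefixed variables onto core.yaml keys (worth double-checking against your Fabric version):
export CORE_OPERATIONS_LISTENADDRESS=127.0.0.1:10443
peer node start --peer-chaincodedev=true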
This is not a direct answer to the port mapping / collision issue, but we've had great success using the new Kubernetes Test Network as a development platform, running on a local system with a virtual Kubernetes cluster in KIND (Kubernetes in Docker).
In this mode, applications can be developed using the Gateway client (exposed via a port forward or ingress), and smart contracts running as a service can be launched either in the cluster or on the local host OS as a container, a binary, or under a debugger.
The documentation for the development setup is still sparse, but we'd love to hear feedback on the overall approach, as it offers a far better experience for working with a test network in a development context. In general, the "port juggling" that Compose requires is no longer relevant when working on a local Kubernetes cluster: you can run services on the host network and instruct peers/orderers/etc. to connect to the remote process running on the host OS.
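For reference, bringing up a local KIND cluster for this kind of setup is a one-liner; the service name in the port-forward below is only a placeholder for whatever your test network actually exposes:
kind create cluster --name fabric-dev
kubectl port-forward svc/org1-peer-gateway 7051:7051   # placeholder service name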

Couchdb - data in docker volume not accessible (copy from other cdb instance)

I'm trying to copy the whole data folder from one CouchDB instance (single node) into a Docker volume, and I'm having issues accessing those databases, getting the error(s) below:
[notice] 2018-12-02T09:46:18.664647Z nonode@nohost <0.313.0> -------- chttpd_auth_cache changes listener died database_does_not_exist at mem3_shards:load_shards_from_db/6(line:395) <= mem3_shards:load_shards_from_disk/1(line:370) <= mem3_shards:load_shards_from_disk/2(line:399) <= mem3_shards:for_docid/3(line:86) <= fabric_doc_open:go/3(line:38) <= chttpd_auth_cache:ensure_auth_ddoc_exists/2(line:187) <= chttpd_auth_cache:listen_for_changes/1(line:134)
[error] 2018-12-02T09:46:18.664764Z nonode@nohost emulator -------- Error in process <0.12460.0> with exit value: {database_does_not_exist,[{mem3_shards,load_shards_from_db,"_users",[{file,"src/mem3_shards.erl"},{line,395}]},{mem3_shards,load_shards_from_disk,1,[{file,"src/mem3_shards.erl"},{line,370}]},{mem3_shards,load_shards_from_disk,2,[{file,"src/mem3_shards.erl"},{line,399}]},{mem3_shards,for_docid,3,[{file,"src/mem3_shards.erl"},{line,86}]},{fabric_doc_open,go,3,[{file,"src/fabric_doc_open.erl"},{line,38}]},{chttpd_auth_cache,ensure_auth_ddoc_exists,2,[{file,"src/chttpd_auth_cache.erl"},{line,187}]},{chttpd_auth_cache,listen_for_changes,1,[{file,"src/chttpd_auth_cache.erl"},{line,134}]}]}
[error] 2018-12-02T09:51:20.301152Z nonode@nohost <0.17260.0> 45e2192077 req_err(2686395495) internal_server_error : No DB shards could be opened. [<<"fabric_util:get_shard/4 L185">>,<<"fabric:get_security/2 L146">>,<<"chttpd_auth_request:db_authorization_check/1 L98">>,<<"chttpd_auth_request:authorize_request/1 L19">>,<<"chttpd:handle_req_after_auth/2 L315">>,<<"chttpd:process_request/1 L300">>,<<"chttpd:handle_request_int/1 L240">>,<<"mochiweb_http:headers/6 L124">>]
[notice] 2018-12-02T09:51:20.301604Z nonode@nohost <0.17260.0> 45e2192077 127.0.0.1:5984 172.17.0.1 admin GET /mydatabase 500 ok 4
[error] 2018-12-02T09:51:20.301455Z nonode@nohost <0.17258.0> fe8bc3ea8a req_err(2686395495) internal_server_error : No DB shards could be opened. [<<"fabric_util:get_shard/4 L185">>,<<"fabric:get_security/2 L146">>,<<"chttpd_auth_request:db_authorization_check/1 L98">>,<<"chttpd_auth_request:authorize_request/1 L19">>,<<"chttpd:handle_req_after_auth/2 L315">>,<<"chttpd:process_request/1 L300">>,<<"chttpd:handle_request_int/1 L240">>,<<"mochiweb_http:headers/6 L124">>]
[notice] 2018-12-02T09:51:20.303896Z nonode@nohost <0.17258.0> fe8bc3ea8a 127.0.0.1:5984 172.17.0.1 admin GET /mydatabase-v2 500 ok 6
I'm able to access this data in a non-Docker CouchDB instance, so it has to be something within Docker that prevents me from properly accessing the data. I'm running CouchDB v2.2.0 in all instances, and the SELinux context and ACLs are correct. The only difference I see between the Docker and non-Docker instances is the host definition (e.g. Docker has nonode@nohost while the local instance has couchdb@127.0.0.1).
Any idea what might be wrong?
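For anyone reproducing this, the relevant pieces are the data mount and the node name; a rough sketch (the host path and credentials are placeholders, the container data path is the official image default):
docker run -d --name couchdb -p 5984:5984 -v /srv/couchdb/data:/opt/couchdb/data couchdb:2.2.0
curl http://admin:password@127.0.0.1:5984/_membership   # compare the node name with the non-Docker instance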

Thingsboard installation using docker on Ubuntu

I'm facing issues when installing ThingsBoard using docker-compose on Ubuntu.
The images are correctly pulled and the containers seem to be up, but the logs show:
Logs for thingsboard/application:1.2.2:
thingsboard-db-schema container is still in progress. waiting until it completed...
thingsboard-db-schema container is still in progress. waiting until it completed...
thingsboard-db-schema container is still in progress. waiting until it completed...
thingsboard-db-schema container is still in progress. waiting until it completed...
thingsboard-db-schema container is still in progress. waiting until it completed...
thingsboard-db-schema container is still in progress. waiting until it completed...
Logs for thingsboard/thingsboard-db-schema:1.2.2:
Wait for Cassandra...
Failed to resolve "db".
WARNING: No targets were specified, so 0 hosts scanned.
Wait for Cassandra...
Failed to resolve "db".
WARNING: No targets were specified, so 0 hosts scanned.
Wait for Cassandra...
It seems that the first container is waiting for Cassandra to be up, which is not happening.
Any suggestions?
Thanks in advance
Please check the output of the DB container using the command 'docker-compose logs -f db' and verify that Cassandra is ready to accept clients on port 9042:
db_1 | INFO 11:02:07 Waiting for gossip to settle before accepting client requests...
db_1 | INFO 11:02:15 No gossip backlog; proceeding
db_1 | INFO 11:02:15 Netty using native Epoll event loop
db_1 | INFO 11:02:15 Using Netty Version: [netty-buffer=netty-buffer-4.0.39.Final.38bdf86, netty-codec=netty-codec-4.0.39.Final.38bdf86, netty-codec-haproxy=netty-codec-haproxy-4.0.39.Final.38bdf86, netty-codec-http=netty-codec-http-4.0.39.Final.38bdf86, netty-codec-socks=netty-codec-socks-4.0.39.Final.38bdf86, netty-common=netty-common-4.0.39.Final.38bdf86, netty-handler=netty-handler-4.0.39.Final.38bdf86, netty-tcnative=netty-tcnative-1.1.33.Fork19.fe4816e, netty-transport=netty-transport-4.0.39.Final.38bdf86, netty-transport-native-epoll=netty-transport-native-epoll-4.0.39.Final.38bdf86, netty-transport-rxtx=netty-transport-rxtx-4.0.39.Final.38bdf86, netty-transport-sctp=netty-transport-sctp-4.0.39.Final.38bdf86, netty-transport-udt=netty-transport-udt-4.0.39.Final.38bdf86]
db_1 | INFO 11:02:15 Starting listening for CQL clients on /0.0.0.0:9042 (unencrypted)...
The output should look like the logs above.
Additionally, verify that no errors happened during the Cassandra start-up.
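A couple of quick ways to confirm Cassandra is actually ready inside the db container (standard Cassandra tooling shipped with the image):
docker-compose exec db nodetool status
docker-compose exec db cqlsh -e 'DESCRIBE KEYSPACES'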

dashDB local MPP deployment issue - cannot connect to database

I am facing a major problem deploying a dashDB Local cluster. After a successful deployment, the following error appears whenever I try to create a table or launch a query. Furthermore, the web server is not working properly, unlike in the previous SMP deployment.
Cannot connect to database "BLUDB" on node "20" because the difference
between the system time on the catalog node and the virtual timestamp
on this node is greater than the max_time_diff database manager
configuration parameter.. SQLCODE=-1472, SQLSTATE=08004,
DRIVER=4.18.60
I followed the official deployment guide, so the following points were double-checked (quick re-checks for these are sketched after this list):
each physical machine's and Docker container's /etc/hosts file contains all IPs, fully qualified and simple hostnames
an NFS share is preconfigured and mounted at /mnt/clusterfs on every single server
none of the servers reported an error during the "docker logs --follow dashDB" phase
the nodes config file is located in the /mnt/clusterfs directory
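Quick re-checks for the points above (hostnames taken from the high-availability output further down; adjust to your own node names):
getent hosts dashdb01 dashdb02 dashdb03   # name resolution on each host and inside each container
mount | grep /mnt/clusterfs               # NFS share is mounted
ls -l /mnt/clusterfs                      # nodes config file is present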
After starting dashDB with the following command:
docker exec -it dashDB start
It looks as it should (see below), but the error can be found in /opt/ibm/dsserver/logs/dsserver.0.log.
#####################################################################
--- dashDB stack service status summary ---
#####################################################################
Redirecting to /bin/systemctl status slapd.service
SUMMARY
LDAP running         : SUCCESS
dashDB tables online : SUCCESS
Web console          : SUCCESS
dashDB connectivity  : SUCCESS
dashDB running       : SUCCESS
#####################################################################
--- dashDB high availability status ---
#####################################################################
Configuring dashDB high availability ...
Stopping the system
Stopping datanode dashdb02
Stopping datanode dashdb01
Stopping headnode dashdb03
Running sm on head node dashdb03 ..
Running sm on data node dashdb02 ..
Running sm on data node dashdb01 ..
Attempting to activate previously failed nodes, if any ...
SM is RUNNING on headnode dashdb03 (ACTIVE)
SM is RUNNING on datanode dashdb02 (ACTIVE)
SM is RUNNING on datanode dashdb01 (ACTIVE)
Overall status : RUNNING
After several redeployments nothing has changed. Please help me figure out what I am doing wrong.
Many thanks, Daniel
Always make sure the NTP service is started on every single cluster node before starting the Docker containers. Otherwise it will have no effect.
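A minimal check to run on each node before starting the containers (assumes a systemd-based host; substitute chronyd if that is what your distro runs):
systemctl status ntpd     # or: systemctl status chronyd
timedatectl status        # check that the clock reports as NTP-synchronized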

No nodes connecting to host in docker swarm

I just followed this tutorial step by step for setting up a docker swarm in EC2 -- https://docs.docker.com/swarm/install-manual/
I created 4 Amazon Servers using the Amazon Linux AMI.
manager + consul
manager
node1
node2
I followed the instructions to start the swarm and everything seems to go OK as far as creating the Docker containers.
Server 1
Running docker ps shows the expected containers running (output omitted).
The Consul logs show this:
2016/07/05 20:18:47 [INFO] serf: EventMemberJoin: 729a440e5d0d 172.17.0.2
2016/07/05 20:18:47 [INFO] serf: EventMemberJoin: 729a440e5d0d.dc1 172.17.0.2
2016/07/05 20:18:48 [INFO] raft: Node at 172.17.0.2:8300 [Follower] entering Follower state
2016/07/05 20:18:48 [INFO] consul: adding server 729a440e5d0d (Addr: 172.17.0.2:8300) (DC: dc1)
2016/07/05 20:18:48 [INFO] consul: adding server 729a440e5d0d.dc1 (Addr: 172.17.0.2:8300) (DC: dc1)
2016/07/05 20:18:48 [ERR] agent: failed to sync remote state: No cluster leader
2016/07/05 20:18:49 [WARN] raft: Heartbeat timeout reached, starting election
2016/07/05 20:18:49 [INFO] raft: Node at 172.17.0.2:8300 [Candidate] entering Candidate state
2016/07/05 20:18:49 [INFO] raft: Election won. Tally: 1
2016/07/05 20:18:49 [INFO] raft: Node at 172.17.0.2:8300 [Leader] entering Leader state
2016/07/05 20:18:49 [INFO] consul: cluster leadership acquired
2016/07/05 20:18:49 [INFO] consul: New leader elected: 729a440e5d0d
2016/07/05 20:18:49 [INFO] raft: Disabling EnableSingleNode (bootstrap)
2016/07/05 20:18:49 [INFO] consul: member '729a440e5d0d' joined, marking health alive
2016/07/05 20:18:50 [INFO] agent: Synced service 'consul'
I registered each node using the following command, with the appropriate IPs:
docker run -d swarm join --advertise=x-x-x-x:2375 consul://x-x-x-x:8500
Each of those created a Docker container.
Node1
Running docker ps shows the swarm join container running (output omitted).
Its logs suggest there's a problem:
time="2016-07-05T21:33:50Z" level=info msg="Registering on the discovery service every 1m0s..." addr="172.31.17.35:2375" discovery="consul://172.31.3.233:8500"
time="2016-07-05T21:36:20Z" level=error msg="cannot set or renew session for ttl, unable to operate on sessions"
time="2016-07-05T21:37:20Z" level=info msg="Registering on the discovery service every 1m0s..." addr="172.31.17.35:2375" discovery="consul://172.31.3.233:8500"
time="2016-07-05T21:39:50Z" level=error msg="cannot set or renew session for ttl, unable to operate on sessions"
time="2016-07-05T21:40:50Z" level=info msg="Registering on the discovery service every 1m0s..." addr="172.31.17.35:2375" discovery="consul://172.31.3.233:8500"
...
And when I get to the last step of retrieving host information on my Consul machine,
docker -H :4000 info
I see no nodes. Finally, when I try the step of running an app, I get the obvious error:
[ec2-user@ip-172-31-3-233 ~]$ docker -H :4000 run hello-world
docker: Error response from daemon: No healthy node available in the cluster.
See 'docker run --help'.
[ec2-user@ip-172-31-3-233 ~]$
Thanks for any insight on this. I'm still pretty confused by much of the swarm model and not sure where to go from here to diagnose.
It looks like Consul is either not binding to a public IP address, or is not accessible on the public IP due to security group or VPC settings. You are setting the discovery URL to consul://172.31.3.233:8500 on the Docker nodes, so I would suggest trying to connect to that address from an external IP, either in your browser or via curl like this:
% curl http://172.31.3.233:8500/ui/dist/
(HTML response)
If you cannot connect (connection refused or timeout), add a TCP port 8500 ingress rule to your AWS VMs and try again (see the AWS CLI example below).
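If that curl times out, a hedged AWS CLI example for adding the ingress rule (the security-group ID and CIDR are placeholders for your own values):
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 8500 --cidr 172.31.0.0/16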
After investigating your issue, I see that you forgot to open port 2375 for the Docker Engine on all four nodes.
Before starting the Swarm manager or Swarm nodes, you have to open a TCP port for the Docker Engine so that Swarm can talk to the Docker Engine via that port.
With Docker on Ubuntu 14.04, you can open the port by changing the file /etc/default/docker and adding -H tcp://0.0.0.0:2375 to DOCKER_OPTS. For example:
DOCKER_OPTS="-H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock"
After that, restart the Docker Engine:
service docker restart
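A quick way to confirm the engine is now listening on TCP port 2375 (run on the node itself, or replace localhost with the node's IP from another machine):
curl http://localhost:2375/version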
If you are using CentOS, the solution is the same; you can read my blog article https://sonnguyen.ws/install-docker-docker-swarm-centos7/
One other thing: I think you should install and run Consul on all nodes (4 servers), so your Swarm can work with Consul on each node.
