dashDB local MPP deployment issue - cannot connect to database - docker

I am facing a huge problem at deploying a dashDB local cluster. After a successful deployment the following error comes in case of trying to create a single table or launch a query. Furthermore webserver is not working properly like in previous SMP deployment.
Cannot connect to database "BLUDB" on node "20" because the difference
between the system time on the catalog node and the virtual timestamp
on this node is greater than the max_time_diff database manager
configuration parameter.. SQLCODE=-1472, SQLSTATE=08004,
DRIVER=4.18.60
I followed official deployment guide, so followings were doublechecked:
each physical machines' and docker containers' /etc/hosts file contains all ips, fully qualified and simple hostnames
there is a NFS preconfigured and mounted to /mnt/clusterfs on every single server
none of the servers signed an error at phase "docker logs --follow dashDB" command
nodes config file is located in /mnt/clusterfs directory
After starting dashDB with following command:
docker exec -it dashDB start
It looks as it should be (see below), but the error can be found at /opt/ibm/dsserver/logs/dsserver.0.log.
#
--- dashDB stack service status summary ---
##################################################################### Redirecting to /bin/systemctl status slapd.service
SUMMARY
LDAPrunning: SUCCESS
dashDBtablesOnline: SUCCESS
WebConsole : SUCCESS
dashDBconnectivity : SUCCESS
dashDBrunning : SUCCESS
#
--- dashDB high availability status ---
#
Configuring dashDB high availability ... Stopping the system Stopping
datanode dashdb02 Stopping datanode dashdb01 Stopping headnode
dashdb03 Running sm on head node dashdb03 .. Running sm on data node
dashdb02 .. Running sm on data node dashdb01 .. Attempting to activate
previously failed nodes, if any ... SM is RUNNING on headnode dashdb03
(ACTIVE) SM is RUNNING on datanode dashdb02 (ACTIVE) SM is RUNNING on
datanode dashdb01 (ACTIVE) Overall status : RUNNING
After several redeployment nothing has changed. Please help me in what I am doing wrong.
Many Thanks, Daniel

Always make sure to NTP service is started on every single cluster node before starting a docker container. Otherwise it will take no effect on it.

Related

Running a Chainlink Node - Can't connect to database

Using docker-desktop on macOS.
I'm trying to run a node following the instructions on this page.
The database name is node, which is the same as the username: node. The user has access to the database and can log in using psql client.
Connection strings I've tried in the .env file:
postgresql://node#localhost/node
postgresql://node:password#localhost/node
postgresql://node:password#localhost:5432/node
postgresql://node:password#127.0.0.1:5432/node
postgresql://node:password#127.0.0.1/node
When I run the start command: cd ~/.chainlink-kovan && docker run -p 6688:6688 -v ~/.chainlink-kovan:/chainlink -it --env-file=.env smartcontract/chainlink local n , using docker-desktop on macOS, I get the following stack trace:
2020-09-15T14:24:41Z [INFO] Starting Chainlink Node 0.8.15 at commit a904730bd62c7174b80a2c4ccf885de3e78e3971 cmd/local_client.go:50
2020-09-15T14:24:41Z [INFO] SGX enclave *NOT* loaded cmd/enclave.go:11
2020-09-15T14:24:41Z [INFO] This version of chainlink was not built with support for SGX tasks cmd/enclave.go:12
2020-09-15T14:24:41Z [INFO] Locking postgres for exclusive access with 500ms timeout orm/orm.go:69
2020-09-15T14:24:41Z [ERROR] unable to lock ORM: dial tcp 127.0.0.1:5432: connect: connection refused logger/default.go:139 stacktrace=github.com/smartcontractkit/chainlink/core/logger.Error
/chainlink/core/logger/default.go:117
...
Does anyone know how I can resolve this?
The problem probably caused by the fact that your chainlink database has been locked with Exclusive Lock and before stopping node that locks never removed.
What you do in this situation (as what works for me) is use PgAdmin Ui or similar way to find all Locks then find the Exclusive Lock that is held on the chainlink database and note down its Process id or ids (if multiple exclusive locks there are on chainlink DB)
Log in to your pg client and run SELECT pg_terminate_backend(<pid>) or SELECT pg_cancel_backend(<pid>); Enter PID of those locks here without quotes and meanwhile keep refreshing on pg admin URL to see if those processes stopped If stopped then rerun your chainlink node.
The problem is with docker networking.
Add --network host to the docker run command so that it is:
cd ~/.chainlink-kovan && docker run -p 6688:6688 -v ~/.chainlink-kovan:/chainlink -it --env-file=.env smartcontract/chainlink --network host local n
This fixes the issue.

Can I run k8s master INSIDE a docker container? Getting errors about k8s looking for host's kernel details

In a docker container I want to run k8s.
When I run kubeadm join ... or kubeadm init commands I see sometimes errors like
\"modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could
not open moddep file
'/lib/modules/3.10.0-1062.1.2.el7.x86_64/modules.dep.bin'.
nmodprobe:
FATAL: Module configs not found in directory
/lib/modules/3.10.0-1062.1.2.el7.x86_64",
err: exit status 1
because (I think) my container does not have the expected kernel header files.
I realise that the container reports its kernel based on the host that is running the container; and looking at k8s code I see
// getKernelConfigReader search kernel config file in a predefined list. Once the kernel config
// file is found it will read the configurations into a byte buffer and return. If the kernel
// config file is not found, it will try to load kernel config module and retry again.
func (k *KernelValidator) getKernelConfigReader() (io.Reader, error) {
possibePaths := []string{
"/proc/config.gz",
"/boot/config-" + k.kernelRelease,
"/usr/src/linux-" + k.kernelRelease + "/.config",
"/usr/src/linux/.config",
}
so I am bit confused what is simplest way to run k8s inside a container such that it consistently past this getting the kernel info.
I note that running docker run -it solita/centos-systemd:7 /bin/bash on a macOS host I see :
# uname -r
4.9.184-linuxkit
# ls -l /proc/config.gz
-r--r--r-- 1 root root 23834 Nov 20 16:40 /proc/config.gz
but running exact same on a Ubuntu VM I see :
# uname -r
4.4.0-142-generic
# ls -l /proc/config.gz
ls: cannot access /proc/config.gz
[Weirdly I don't see this FATAL: Module configs not found in directory error every time, but I guess that is a separate question!]
UPDATE 22/November/2019. I see now that k8s DOES run okay in a container. Real problem was weird/misleading logs. I have added an answer to clarify.
I do not believe that is possible given the nature of containers.
You should instead test your app in a docker container then deploy that image to k8s either in the cloud or locally using minikube.
Another solution is to run it under kind which uses docker driver instead of VirtualBox
https://kind.sigs.k8s.io/docs/user/quick-start/
It seems the FATAL error part was a bit misleading.
It was badly formatted by my test environment (all on one line.
When k8s was failing I saw the FATAL and assumed (incorrectly) that was root cause.
When I format the logs nicely I see ...
kubeadm join 172.17.0.2:6443 --token 21e8ab.1e1666a25fd37338 --discovery-token-unsafe-skip-ca-verification --experimental-control-plane --ignore-preflight-errors=all --node-name 172.17.0.3
[preflight] Running pre-flight checks
[WARNING FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
[preflight] The system verification failed. Printing the output from the verification:
KERNEL_VERSION: 4.4.0-142-generic
DOCKER_VERSION: 18.09.3
OS: Linux
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.3. Latest validated version: 18.06
[WARNING SystemVerification]: failed to parse kernel config: unable to load kernel module: "configs", output: "modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.0-142-generic/modules.dep.bin'\nmodprobe: FATAL: Module configs not found in directory /lib/modules/4.4.0-142-generic\n", err: exit status 1
[discovery] Trying to connect to API Server "172.17.0.2:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://172.17.0.2:6443"
[discovery] Failed to request cluster info, will try again: [the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps cluster-info)]
There are other errors later, which I originally though were a side-effect of the nasty looking FATAL error e.g. .... "[util/etcd] Attempt timed out"]} but I now think root cause is Etcd part times out sometimes.
Adding this answer in case someone else puzzled like I was.

Trying to join worker node to master master status ready worker status not ready

I am following all the steps from this link : https://github.com/justmeandopensource/kubernetes
after running the join command in the worker node it's getting added to master, but the status of the worker node is getting changed to ready.
From the logs I got the following :
Container runtime network not ready: NetworkReady=false
reason:NetworkPluginNotReady message:dock
Unable to update cni config: No networks found in /etc/cni/net.d
kubelet.go:2266 -- node "XXXXXXXXX" not found. (xxxxx is the masters
host/node name)
To establish CNI I am using flannel and also tried with weave and many other
CNI networks but the results are the same
points to ponder:
---> worker node kubelet status is healthy
---> trying to run kubeadm init command in the worker node,its showing the status of kubelet might be unhealthy. (Not able to make worker node master by running the kubeadm init command but kubeadm join command is working.After joining kubectl get nodes is showing the worker node but status is notready)
Thank you for the help
I cannot reproduce your issue. I followed exactly the instructions on github`s site you shared, and did not face similar error.
The only extra steps I needed to do, to suppress errors, detected by pre-flight checks of kubeadm init:
[ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1
[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...
was to set appropriate flag by running:
echo '1' > /proc/sys/net/ipv4/ip_forward
State of my cluster nodes:
NAME STATUS ROLES AGE VERSION
centos-master Ready master 18h v1.13.1
centos-worker Ready <none> 18h v1.13.1
I verified cluster condition by deploying&exposing sample application and everything seems to be working fine:
kubectl create deployment hello-node --image=gcr.io/hello-minikube-zero-install/hello-node
kubectl expose deployment hello-node --port=8080
I`m getting valid response from hello-world node.js app:
curl 10.100.113.255:8080
Hello World!#
What IP address you have put to your /etc/hosts files ?

Issue accessing vespa outside docker container

Installed Docker on Mac and trying to run Vespa on Docker following steps specified in following link
https://docs.vespa.ai/documentation/vespa-quick-start.html
I did n't had any issues till step 4. I see vespa container running after step 2 and step 3 returned 200 OK response.
But Step 5 failed to return 200 OK response. Below is the command I ran on my terminal
curl -s --head http://localhost:8080/ApplicationStatus
I keep getting
curl: (52) Empty reply from server whenever I run without -s option.
So I tried to see listening ports inside my vespa container and don't see anything for 8080 but can see for 19071(used in step 3)
➜ ~ docker exec vespa bash -c 'netstat -vatn| grep 8080'
➜ ~ docker exec vespa bash -c 'netstat -vatn| grep 19071'
tcp 0 0 0.0.0.0:19071 0.0.0.0:* LISTEN
Below doc has info related to vespa ports
https://docs.vespa.ai/documentation/reference/files-processes-and-ports.html
I'm assuming port 8080 should be active after docker run(step 2 of quick start link) and can be accessed outside container as port mapping is done.
But I don't see 8080 port active inside container in first place.
A'm I missing something. Do I need to perform any additional step than mentioned in quick start? FYI I installed Jenkins inside my docker and was able to access outside container via port mapping. But not sure why it's not working with vespa.I have been trying from quiet sometime but no progress. Please advice me if I'm missing something here.
You have too low memory for your docker container, "Minimum 6GB memory dedicated to Docker (the default is 2GB on Macs).". See https://docs.vespa.ai/documentation/vespa-quick-start.html
The deadlock detector warnings and failure to get configuration from configuration server (which is likely oom killed) indicates that you are too low on memory.
My guess is that your jdisc container had not finished initialize or did not initialize properly? Did you try to check the log?
docker exec vespa bash -c '/opt/vespa/bin/vespa-logfmt /opt/vespa/logs/vespa/vespa.log'
This should tell you if there was something wrong. When it is ready to receive requests you would see something like this:
[2018-12-10 06:30:37.854] INFO : container Container.org.eclipse.jetty.server.AbstractConnector Started SearchServer#79afa369{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
[2018-12-10 06:30:37.857] INFO : container Container.org.eclipse.jetty.server.Server Started #10280ms
[2018-12-10 06:30:37.857] INFO : container Container.com.yahoo.container.jdisc.ConfiguredApplication Switching to the latest deployed set of configurations and components. Application switch number: 0
[2018-12-10 06:30:37.859] INFO : container Container.com.yahoo.container.jdisc.ConfiguredApplication Initializing new set of configurations and components. Application switch number: 1

Docker on RHEL 6 Cgroup mounting failing

I'm trying to get my head around something that's been working on a Centos+Vagrant, but not on our providers RHEL (Red Hat Enterprise Linux Server release 6.5 (Santiago)). A sudo service docker restart hands this:
Stopping docker: [ OK ]
Starting cgconfig service: Error: cannot mount cpuset to /cgroup/cpuset: Device or resource busy
/sbin/cgconfigparser; error loading /etc/cgconfig.conf: Cgroup mounting failed
Failed to parse /etc/cgconfig.conf [FAILED]
Starting docker: [ OK ]
The service starts okey enough, but images cannot run. A mounting failed error is shown when I try. And the startup-log also gives a warning or two. Regarding the kernelwarning, centos gives the same and has no problems as Epel should resolve this:
WARNING: You are running linux kernel version 2.6.32-431.17.1.el6.x86_64, which might be unstable running docker. Please upgrade your kernel to 3.8.0.
2014/08/07 08:58:29 docker daemon: 1.1.2 d84a070; execdriver: native; graphdriver:
[1233d0af] +job serveapi(unix:///var/run/docker.sock)
[1233d0af] +job initserver()
[1233d0af.initserver()] Creating server
2014/08/07 08:58:29 Listening for HTTP on unix (/var/run/docker.sock)
[1233d0af] +job init_networkdriver()
[1233d0af] -job init_networkdriver() = OK (0)
2014/08/07 08:58:29 WARNING: mountpoint not found
Anyone had any success overcoming this problem or should I throw in the towel and wait for the provider to update to RHEL 7?
I have the same issue.
(1) check cgconfig status
# /etc/init.d/cgconfig status
if it stopped, restart it
# /etc/init.d/cgconfig restart
check cgconfig is running
(2) check cgconfig is on
# chkconfig --list cgconfig
cgconfig 0:off 1:off 2:off 3:off 4:off 5:off 6:off
if cgconfig is off, turn it on
(3) if still does not work, may be some cgroups modules is missing. In the kernel .config file, make menuconfig, add those modules into kernel and recompile and reboot
after that, it should be OK
I ended up asking the same question at Google Groups and in the end finding a solution with some help. What worked for me was this:
umount cgroup
sudo service cgconfig start
The project of making Docker work was put on halt all the same. Later a problem of network connection for the containers. This took to much time to solve and had to give up.
So I spent the whole day trying to rig docker to work on my vps. I was running into this same error. Basically what it came down to was the fact that OpenVZ didn't support docker containers up until a couple months ago. Specifically this RHEL update:
https://openvz.org/Download/kernel/rhel6/042stab105.14
Assuming this is your problem, or some variation of it, the burden of solving it is on your host. They will need to follow these steps:
https://openvz.org/Docker_inside_CT
In my case
/etc/rc.d/rc.cgconfig start
was generating
Starting cgconfig service: Error: cannot mount cpu,cpuacct,memory to
/cgroup/cpu_and_mem: Device or resource busy /usr/sbin/cgconfigparser;
error loading /etc/cgconfig.conf: Cgroup mounting failed Failed to
parse /etc/cgconfig.conf
i had to use:
/etc/rc.d/rc.cgconfig restart
and it automagicly umouted and mounted groups
Stopping cgconfig service: Starting cgconfig service:
it seems like the cgconfig service not running,so check it!
# /etc/init.d/cgconfig status
# mkdir -p /cgroup/cpuacct /cgroup/memory /cgroup/devices /cgroup/freezer net_cls /cgroup/blkio
# cat /etc/cgconfig.conf |tail|grep "="|awk '{print "mount -t cgroup -o",$1,$1,$NF}'>cgroup_mount.sh
# sh ./cgroup_mount.sh
# /etc/init.d/cgconfig restart
# /etc/init.d/docker restart
This situation occurs when the kernel is booted with cgroup_disable=memory and /etc/cgconfig.conf contains memory = /cgroup/memory;
This causes only /cgroup/cpuset to be mounted instead of the full set.
Solution: either remove cgroup_disable=memory from your kernel boot options or comment out memory = /cgroup/memory; from cgconfig.conf.
The cgconfig service startup uses mount and umount which requires an extra privilege bump from docker.
See the --privileged=true flag here for more info.
I was able to overcome this issue by starting my container with:
docker run -it --privileged=true my-image.
Tested in Centos6, Centos6.5.

Resources