DashDB Local Docker Deployment - docker

I was able to deploy dashDB Local (SMP) on my Mac (using Kitematic) 3-4 months ago, but recently I have not been able to successfully deploy either SMP or MPP, whether on macOS (Kitematic) or on Linux (on AWS, using individual instances running Docker - not Swarm).
Linux flavor (Default Amazon Linux AMI)
[ec2-user@ip-10-0-0-171 ~]$ cat /etc/*-release
NAME="Amazon Linux AMI"
VERSION="2016.03"
ID="amzn"
ID_LIKE="rhel fedora"
VERSION_ID="2016.03"
PRETTY_NAME="Amazon Linux AMI 2016.03"
ANSI_COLOR="0;33"
CPE_NAME="cpe:/o:amazon:linux:2016.03:ga"
HOME_URL="http://aws.amazon.com/amazon-linux-ami/"
Amazon Linux AMI release 2016.03
Linux Kernel
[ec2-user@ip-10-0-0-171 ~]$ sudo uname -r
4.4.11-23.53.amzn1.x86_64
Docker Version
[ec2-user@ip-10-0-0-171 ~]$ docker --version
Docker version 1.11.2, build b9f10c9/1.11.2
hostname
[ec2-user@ip-10-0-0-171 ~]$ hostname
ip-10-0-0-171
dnsdomainname
[ec2-user@ip-10-0-0-171 ~]$ dnsdomainname
ec2.internal
In every variant I try, I always end up with something like the message below after running:
docker run -d -it --privileged=true --net=host --name=dashDB -v /mnt/clusterfs:/mnt/bludata0 -v /mnt/clusterfs:/mnt/blumeta0 ibmdashdb/preview:latest
(for SMP) or docker exec -it dashDB start (after the run command, for MPP). I tried using getlogs, but couldn't find anything interesting. Any ideas? For SMP I am using a directory created on a single host; for MPP I am using AWS EFS for a shared NFS mount.
[ec2-user@ip-10-0-0-171 ~]$ docker logs --follow dashDB
/mnt/bludata0/nodes cannot be found. We will continue with a single-node deployment.
Checking if dashDB initialize has been done previously ...
dashDB stack is NOT initialized yet.
#####################################################################
Running dashDB prerequisite checks on node: ip-10-0-0-171
#####################################################################
#####################################################################
Prerequisite check -- Minimum Memory requirement
#####################################################################
* Memory check: PASS
#####################################################################
Prerequisite check -- Minimum data volume free-space requirement
#####################################################################
* Free space in data volume check: PASS
#####################################################################
Prerequisite check -- Minimum number of CPU/CPU core requirement
#####################################################################
* CPU check: PASS
#####################################################################
Prerequisite check -- Data Volume device DIO requirement
#####################################################################
* DIO check: PASS
#####################################################################
Prerequisite check -- Data Volume device I/O stats
#####################################################################
Testing WRITE I/O performance of the data volume device
32768+0 records in
32768+0 records out
134217728 bytes (134 MB) copied, 33.7435 s, 4.0 MB/s
real 0m33.746s
user 0m0.008s
sys 0m12.040s
Testing READ I/O performance of the data volume device
32768+0 records in
32768+0 records out
134217728 bytes (134 MB) copied, 10.8286 s, 12.4 MB/s
real 0m10.831s
user 0m0.116s
sys 0m0.344s
######################################################################
*************************************************
Prerequisite check summary for Node: ip-10-0-0-171
*************************************************
* Memory check: PASS
* Free space in data volume check: PASS
* CPU check: PASS
* DIO check: PASS
*********************************************
I/O perf test summary for Node: ip-10-0-0-171
*********************************************
* Read throughput: 12.4 MB/s
* Write throughput: 4.0 MB/s
######################################################################
Creating dashDB directories and dashDB instance
Starting few of the key services ...
Generating /etc/rndc.key: [ OK ]
Starting named: [ OK ]
Starting saslauthd: [ OK ]
Starting sendmail: [ OK ]
Starting sm-client: [ OK ]
Setting dsserver Config
Setting openldap
Starting slapd: [ OK ]
Starting sssd: [ OK ]
Starting system logger: [ OK ]
Starting nscd: [ OK ]
Update dsserver with ldap info
dashDB set configuration
Setting database configuration
database SSL configuration
-bludb_ssl_keystore_password
-bludb_ssl_certificate_label
UPDATED: /opt/ibm/dsserver/Config/dswebserver.properties
set dashDB Encryption
Setting up keystore
dashDB failed to stop on ip-10-0-0-171 because database services didn't stop.
Retry the operation. If the same failure occurs, contact IBM Service.
If a command prompt is not visible on your screen, you need to detach from the container by typing Ctrl-C.
Stop/Start
[ec2-user@ip-10-0-0-171 ~]$ docker exec -it dashDB stop
Attempt to shutdown services on node ip-10-0-0-171 ...
dsserver_home: /opt/ibm/dsserver
port: -1
https.port: 8443
status.port: 11082
SERVER STATUS: INACTIVE
httpd: no process killed
Instance is already in stopped state due to which database consistency can't be checked
###############################################################################
Successfully stopped dashDB
###############################################################################
[ec2-user@ip-10-0-0-171 ~]$ docker stop dashDB
dashDB
[ec2-user@ip-10-0-0-171 ~]$ docker start dashDB
dashDB
[ec2-user@ip-10-0-0-171 ~]$ docker logs --follow dashDB
Follow the logs again
[ec2-user@ip-10-0-0-171 ~]$ docker logs --follow dashDB
....SAME INFO FROM BEFORE...
/mnt/bludata0/nodes cannot be found. We will continue with a single-node deployment.
Checking if dashDB initialize has been done previously ...
dashDB stack is NOT initialized yet.
#####################################################################
Running dashDB prerequisite checks on node: ip-10-0-0-171
#####################################################################
#####################################################################
Prerequisite check -- Minimum Memory requirement
#####################################################################
* Memory check: PASS
#####################################################################
Prerequisite check -- Minimum data volume free-space requirement
#####################################################################
* Free space in data volume check: PASS
#####################################################################
Prerequisite check -- Minimum number of CPU/CPU core requirement
#####################################################################
* CPU check: PASS
#####################################################################
Prerequisite check -- Data Volume device DIO requirement
#####################################################################
* DIO check: PASS
#####################################################################
Prerequisite check -- Data Volume device I/O stats
#####################################################################
Testing WRITE I/O performance of the data volume device
32768+0 records in
32768+0 records out
134217728 bytes (134 MB) copied, 34.5297 s, 3.9 MB/s
real 0m34.532s
user 0m0.020s
sys 0m11.988s
Testing READ I/O performance of the data volume device
32768+0 records in
32768+0 records out
134217728 bytes (134 MB) copied, 10.8309 s, 12.4 MB/s
real 0m10.833s
user 0m0.000s
sys 0m0.432s
######################################################################
*************************************************
Prerequisite check summary for Node: ip-10-0-0-171
*************************************************
* Memory check: PASS
* Free space in data volume check: PASS
* CPU check: PASS
* DIO check: PASS
*********************************************
I/O perf test summary for Node: ip-10-0-0-171
*********************************************
* Read throughput: 12.4 MB/s
* Write throughput: 3.9 MB/s
######################################################################
Creating dashDB directories and dashDB instance
mv: cannot stat `/tmp/bashrc_db2inst1': No such file or directory
mv: cannot stat `/tmp/bash_profile_db2inst1': No such file or directory
Starting few of the key services ...
Starting named: [ OK ]
Starting saslauthd: [ OK ]
Starting sendmail: [ OK ]
Setting dsserver Config
mv: cannot stat `/tmp/dswebserver.properties': No such file or directory
Setting openldap
/bin/sh: /tmp/ldap-directories.sh: No such file or directory
cp: cannot stat `/tmp/cn=config.ldif': No such file or directory
mv: cannot stat `/tmp/olcDatabase0bdb.ldif': No such file or directory
cp: cannot stat `/tmp/slapd-sha2.so': No such file or directory
mv: cannot stat `/tmp/cn=module0.ldif': No such file or directory
ln: creating hard link `/var/run/slapd.pid': File exists [ OK ]
Starting sssd: [ OK ]
Starting system logger: [ OK ]
Starting nscd: [ OK ]
Update dsserver with ldap info
dashDB set configuration
Setting database configuration
database SSL configuration
-bludb_ssl_keystore_password
-bludb_ssl_certificate_label
UPDATED: /opt/ibm/dsserver/Config/dswebserver.properties
set dashDB Encryption
dashDB failed to stop on ip-10-0-0-171 because database services didn't stop.
Retry the operation. If the same failure occurs, contact IBM Service.
If a command prompt is not visible on your screen, you need to detach from the container by typing Ctrl-C.

Thank you for testing dashDB Local.
MPP is only supported on Linux.
SMP on Mac is only supported using Kitematic with Docker Toolbox v1.11.1b and using the 'v1.0.0-kitematic' tag image, not 'latest'.
To help you further I'd like to focus on a single environment; for simplicity, let's start with SMP on Linux, and we can discuss MPP later.
Check the minimum requirements for an SMP installation:
Processor: 2.0 GHz core
Memory: 8 GB RAM
Storage: 20 GB
Which Linux flavor do you use? Check with:
cat /etc/*-release
Make sure you have at least a Linux kernel 3.10. You can check with:
$ uname -r
3.10.0-229.el7.x86_64
Then let's find out what version of docker is installed with:
$ docker --version
Docker version 1.12.1, build 23cf638
Additionally you need to configure a hostname and domain name. You can verify that you have these with:
$ hostname
and
$ dnsdomainname
Also ensure all the required ports are open; the list is long, so check our documentation.
Is this system virtual or physical?
Can you show the entire output of the following, as well as all the checks above:
$ docker logs --follow dashDB
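To make it easier to gather everything in one pass, here is a minimal sketch of a check script using only standard commands (nothing dashDB-specific is assumed):
#!/bin/sh
# quick prerequisite survey for dashDB Local SMP
cat /etc/*-release      # Linux flavor
uname -r                # kernel, should be 3.10 or newer
docker --version        # installed Docker version
hostname                # configured hostname
dnsdomainname           # configured domain name
free -g                 # total memory, should be at least 8 GB
nproc                   # number of CPU cores
df -h /mnt/clusterfs    # free space on the data volume, should be at least 20 GB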
Try the following steps, which may help resolve this issue if everything else is correct. Once you see the error:
$ docker exec -it dashDB stop
$ docker stop dashDB
$ docker start dashDB
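If the container still ends up failing after the restart, collecting the diagnostic logs would help. Assuming the getlogs helper you already tried uses the standard invocation, that would be:
$ docker exec -it dashDB getlogs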

Related

Cannot run nodetool commands and cqlsh to Scylla in Docker

I am new to Scylla and I am following the instructions to try it in a container as per this page: https://hub.docker.com/r/scylladb/scylla/.
The following command ran fine.
docker run --name some-scylla --hostname some-scylla -d scylladb/scylla
I see the container is running.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e6c4e19ff1bd scylladb/scylla "/docker-entrypoint.…" 14 seconds ago Up 13 seconds 22/tcp, 7000-7001/tcp, 9042/tcp, 9160/tcp, 9180/tcp, 10000/tcp some-scylla
However, I'm unable to use nodetool or cqlsh. I get the following output.
$ docker exec -it some-scylla nodetool status
Using /etc/scylla/scylla.yaml as the config file
nodetool: Unable to connect to Scylla API server: java.net.ConnectException: Connection refused (Connection refused)
See 'nodetool help' or 'nodetool help <command>'.
and
$ docker exec -it some-scylla cqlsh
Connection error: ('Unable to connect to any servers', {'172.17.0.2': error(111, "Tried connecting to [('172.17.0.2', 9042)]. Last error: Connection refused")})
Any ideas?
Update
Looking at docker logs some-scylla, I see some errors in the logs; the last one is as follows.
2021-10-03 07:51:04,771 INFO spawned: 'scylla' with pid 167
Scylla version 4.4.4-0.20210801.69daa9fd0 with build-id eb11cddd30e88ef39c32c847e70181b5cf786355 starting ...
command used: "/usr/bin/scylla --log-to-syslog 0 --log-to-stdout 1 --default-log-level info --network-stack posix --developer-mode=1 --overprovisioned --listen-address 172.17.0.2 --rpc-address 172.17.0.2 --seed-provider-parameters seeds=172.17.0.2 --blocked-reactor-notify-ms 999999999"
parsed command line options: [log-to-syslog: 0, log-to-stdout: 1, default-log-level: info, network-stack: posix, developer-mode: 1, overprovisioned, listen-address: 172.17.0.2, rpc-address: 172.17.0.2, seed-provider-parameters: seeds=172.17.0.2, blocked-reactor-notify-ms: 999999999]
ERROR 2021-10-03 07:51:05,203 [shard 6] seastar - Could not setup Async I/O: Resource temporarily unavailable. The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application
2021-10-03 07:51:05,316 INFO exited: scylla (exit status 1; not expected)
2021-10-03 07:51:06,318 INFO gave up: scylla entered FATAL state, too many start retries too quickly
Update 2
The reason for the error was described on the Docker Hub page linked above. I had to start the container specifying the number of CPUs with --smp 1 as follows.
docker run --name some-scylla --hostname some-scylla -d scylladb/scylla --smp 1
According to the above page:
This command will start a Scylla single-node cluster in developer mode
(see --developer-mode 1) limited by a single CPU core (see --smp).
Production grade configuration requires tuning a few kernel parameters
such that limiting number of available cores (with --smp 1) is the
simplest way to go.
Multiple cores requires setting a proper value to the
/proc/sys/fs/aio-max-nr. On many non production systems it will be
equal to 65K. ...
As you have found out, in order to use additional CPU cores you'll need to increase the fs.aio-max-nr kernel parameter.
You may run as root:
# sysctl -w fs.aio-max-nr=65535
That should be enough for most systems. If you still get an error preventing Scylla from using all of your CPU cores, increase the value further.
Note that the above configuration is not persistent; edit /etc/sysctl.conf to make it persist across reboots.
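For example, a minimal sketch of making it persistent (using the same value as above):
# append the setting and reload
echo "fs.aio-max-nr = 65535" >> /etc/sysctl.conf
sysctl -p
cat /proc/sys/fs/aio-max-nr    # verify the new value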

Can I run k8s master INSIDE a docker container? Getting errors about k8s looking for host's kernel details

In a docker container I want to run k8s.
When I run kubeadm join ... or kubeadm init commands I see sometimes errors like
\"modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could
not open moddep file
'/lib/modules/3.10.0-1062.1.2.el7.x86_64/modules.dep.bin'.
nmodprobe:
FATAL: Module configs not found in directory
/lib/modules/3.10.0-1062.1.2.el7.x86_64",
err: exit status 1
because (I think) my container does not have the expected kernel header files.
I realise that the container reports its kernel based on the host that is running the container, and looking at the k8s code I see:
// getKernelConfigReader search kernel config file in a predefined list. Once the kernel config
// file is found it will read the configurations into a byte buffer and return. If the kernel
// config file is not found, it will try to load kernel config module and retry again.
func (k *KernelValidator) getKernelConfigReader() (io.Reader, error) {
    possibePaths := []string{
        "/proc/config.gz",
        "/boot/config-" + k.kernelRelease,
        "/usr/src/linux-" + k.kernelRelease + "/.config",
        "/usr/src/linux/.config",
    }
so I am a bit confused about the simplest way to run k8s inside a container such that it consistently gets past this kernel-info check.
I note that when running docker run -it solita/centos-systemd:7 /bin/bash on a macOS host I see:
# uname -r
4.9.184-linuxkit
# ls -l /proc/config.gz
-r--r--r-- 1 root root 23834 Nov 20 16:40 /proc/config.gz
but running the exact same on an Ubuntu VM I see:
# uname -r
4.4.0-142-generic
# ls -l /proc/config.gz
ls: cannot access /proc/config.gz
[Weirdly I don't see this FATAL: Module configs not found in directory error every time, but I guess that is a separate question!]
UPDATE 22/November/2019. I see now that k8s DOES run okay in a container. Real problem was weird/misleading logs. I have added an answer to clarify.
I do not believe that is possible given the nature of containers.
You should instead test your app in a docker container then deploy that image to k8s either in the cloud or locally using minikube.
Another solution is to run it under kind, which uses the Docker driver instead of VirtualBox:
https://kind.sigs.k8s.io/docs/user/quick-start/
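For reference, a minimal kind session looks roughly like this (assuming kind and kubectl are already installed):
# create a single-node cluster whose "node" is a Docker container
kind create cluster
# point kubectl at it and verify
kubectl cluster-info --context kind-kind
# tear it down when done
kind delete cluster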
It seems the FATAL error part was a bit misleading.
It was badly formatted by my test environment (all on one line).
When k8s was failing I saw the FATAL and assumed (incorrectly) that it was the root cause.
When I format the logs nicely I see ...
kubeadm join 172.17.0.2:6443 --token 21e8ab.1e1666a25fd37338 --discovery-token-unsafe-skip-ca-verification --experimental-control-plane --ignore-preflight-errors=all --node-name 172.17.0.3
[preflight] Running pre-flight checks
[WARNING FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
[preflight] The system verification failed. Printing the output from the verification:
KERNEL_VERSION: 4.4.0-142-generic
DOCKER_VERSION: 18.09.3
OS: Linux
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.3. Latest validated version: 18.06
[WARNING SystemVerification]: failed to parse kernel config: unable to load kernel module: "configs", output: "modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.0-142-generic/modules.dep.bin'\nmodprobe: FATAL: Module configs not found in directory /lib/modules/4.4.0-142-generic\n", err: exit status 1
[discovery] Trying to connect to API Server "172.17.0.2:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://172.17.0.2:6443"
[discovery] Failed to request cluster info, will try again: [the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps cluster-info)]
There are other errors later, which I originally thought were a side-effect of the nasty-looking FATAL error, e.g. "[util/etcd] Attempt timed out", but I now think the root cause is that the etcd part times out sometimes.
Adding this answer in case someone else puzzled like I was.

ceph mount failed with (95) Operation not supported

I installed Ceph on servers "A" and "B" and I would like to mount it from server "C" or "D".
But I hit the error below.
ceph-fuse[4628]: ceph mount failed with (95) Operation not supported
My server configuration is as follow.
A Server: ubunt16.04(ceph-server) 10.1.1.54
B Server: ubuntu16.04(ceph-server) 10.1.1.138
C Server: AmazonLinux (client)
D Server: ubuntu16.04(client)
and ceph.conf
[global]
fsid = 44f299ac-ff11-41c8-ab96-225d62cb3226
mon_initial_members = node01, node02
mon_host = 10.1.1.54,10.1.1.138
auth cluster required = none
auth service required = none
auth client required = none
auth supported = none
osd pool default size = 2
public network = 10.1.1.0/24
Ceph is also installed correctly.
ceph health
HEALTH_OK
ceph -s
cluster 44f299ac-ff11-41c8-ab96-225d62cb3226
health HEALTH_OK
monmap e1: 2 mons at {node01=10.1.1.54:6789/0,node02=10.1.1.138:6789/0}
election epoch 12, quorum 0,1 node01,node02
osdmap e41: 2 osds: 2 up, 2 in
flags sortbitwise,require_jewel_osds
pgmap v100: 64 pgs, 1 pools, 306 bytes data, 4 objects
69692 kB used, 30629 MB / 30697 MB avail
64 active+clean
An error occurred when using the ceph-fuse command.
sudo ceph-fuse -m 10.1.1.138:6789 /mnt/mycephfs/ --debug-auth=10 --debug-ms=10
ceph-fuse[4628]: starting ceph client
2017-11-02 08:57:22.905630 7f8cfdd60f00 -1 init, newargv = 0x55779de6af60 newargc=11
ceph-fuse[4628]: ceph mount failed with (95) Operation not supported
ceph-fuse[4626]: mount failed: (95) Operation not supported
I got an error saying "ceph mount failed with (95) Operation not supported"
I added an option "--auth-client-required=none"
sudo ceph-fuse -m 10.1.1.138:6789 /mnt/mycephfs/ --debug-auth=10 --debug-ms=10 --auth-client-required=none
ceph-fuse[4649]: starting ceph client
2017-11-02 09:03:47.501363 7f1239858f00 -1 init, newargv = 0x5597621eaf60 newargc=11
The behavior changed; there is no response here.
I get the error below if the ceph-fuse command is not used.
sudo mount -t ceph 10.1.1.138:6789:/ /mnt/mycephfs
can't read superblock
Somehow, it seems client authentication is still required even with "auth supported = none".
In that case, how can I pass authentication from server "C" or "D"?
Please let me know, If there is possible cause other than authentication.
I think you need more steps, such as formatting the file system, so you should check your installation steps again for your purposes. Ceph has multiple components for each service, such as object storage, block storage, file system, and API, and each service requires its own configuration steps.
This installation guide is helpful for your case:
https://github.com/infn-bari-school/cloud-storage-tutorials/wiki/Ceph-cluster-installation-(jewel-on-CentOS)
If you want to build the Ceph file system for testing, you can build a small CephFS with the following installation steps.
I'll skip the details of the steps and CLI usage; you can get more information from the official documents.
Environment informations
Ceph version: Jewel, 10.2.9
OS: CentOS 7.4
Prerequisites before installation of the Ceph file system.
This configuration requires 4 nodes:
ceph-admin node: deploy monitor, metadata server
ceph-osd0: osd service
ceph-osd1: osd service
ceph-osd2: osd service
Enabling NTP - all nodes
The OS user for deploying Ceph components requires privilege escalation settings (e.g. sudoers)
SSH public key configuration (directions: ceph-admin -> OSD nodes)
Installation of ceph-deploy tool on ceph-admin Admin node.
# yum install -y ceph-deploy
Deploying the required Ceph components for the Ceph file system
Create the cluster on the ceph-admin node using a normal OS user (for deploying Ceph components)
$ mkdir ./cluster
$ cd ./cluster
$ ceph-deploy new ceph-admin
Modify the ceph.conf in the cluster directory.
$ vim ceph.conf
[global]
..snip...
mon_initial_members = ceph-admin
mon_host = $MONITORSERVER_IP_OR_HOSTNAME
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
# the number of replicas for objects in the pool, default value is 3
osd pool default size = 3
public network = $YOUR_SERVICE_NETWORK_CIDR
Install the monitor and OSD services on the related nodes.
$ ceph-deploy install --release jewel ceph-admin ceph-osd0 ceph-osd1 ceph-osd2
initiate monitor service
$ ceph-deploy mon create-initial
Create the OSD devices
ceph-deploy osd create ceph-osd{0..2}:vdb
Adding the metadata server component for the Ceph file system service (this service is required only by the Ceph file system)
ceph-deploy mds create ceph-admin
check the status
ceph mds stat
create the pools for cephFS
ceph osd pool create cephfs_data_pool 64
ceph osd pool create cephfs_meta_pool 64
Create the ceph file systems
ceph fs new cephfs cephfs_meta_pool cephfs_data_pool
Mount the Ceph file system
The ceph-fuse package is required on the node used for mounting.
Mount as CephFS
ceph-fuse -m MONITOR_SERVER_IP_OR_HOSTNAME:PORT_NUMBER <LOCAL_MOUNTPOINT>
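For example, with the monitor address used in the question this would look like the following (a sketch; it assumes the mount point already exists and ceph.conf/keyring are in place on the client):
sudo mkdir -p /mnt/mycephfs
sudo ceph-fuse -m 10.1.1.138:6789 /mnt/mycephfs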
End...
I solved this problem by fixing three settings.
1.
I returned the auth settings in ceph.conf to the following:
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
2.
public network was wrong.
public network = 10.1.1.0/24
↓
public network = 10.0.0.0/8
my client ip address was 10.1.0.238...
It was a stupid mistake.
3.
I changed the secret option to the secretfile option and everything was fine.
In this case, it failed:
sudo mount -t ceph 10.1.1.138:6789:/ /mnt/mycephfs -o name=client.admin,secret=`sudo ceph-authtool -p /etc/ceph/ceph.client.admin.keyring`
output:
mount error 1 = Operation not permitted
but in this case, it succeeded:
sudo mount -vvvv -t ceph 10.1.1.138:6789:/ /mnt/mycephfs -o name=admin,secretfile=admin.secret
output:
parsing options: rw,name=admin,secretfile=admin.secret
mount: error writing /etc/mtab: Invalid argument
※ The "Invalid argument" error seems to be ignorable.
Apparently, both are the same key.
sudo ceph-authtool -p /etc/ceph/ceph.client.admin.keyring
AQBd9f9ZSL46MBAAqwepJDC5tuIL/uYp0MXjCA==
cat admin.secret
AQBd9f9ZSL46MBAAqwepJDC5tuIL/uYp0MXjCA==
I don't know the reason, but I could mount using the secretfile option.
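For anyone reproducing this, a sketch of how the secretfile can be generated from the admin keyring (paths assume the defaults shown above):
# extract the key into a file readable only by root
sudo ceph-authtool -p /etc/ceph/ceph.client.admin.keyring | sudo tee admin.secret
sudo chmod 600 admin.secret
# then mount with the secretfile option
sudo mount -t ceph 10.1.1.138:6789:/ /mnt/mycephfs -o name=admin,secretfile=admin.secret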

dashDB local MPP deployment issue - cannot connect to database

I am facing a huge problem deploying a dashDB Local cluster. After a successful deployment, the following error appears whenever I try to create a single table or launch a query. Furthermore, the web server is not working properly, unlike in my previous SMP deployment.
Cannot connect to database "BLUDB" on node "20" because the difference
between the system time on the catalog node and the virtual timestamp
on this node is greater than the max_time_diff database manager
configuration parameter.. SQLCODE=-1472, SQLSTATE=08004,
DRIVER=4.18.60
I followed the official deployment guide, so the following were double-checked:
each physical machine's and docker container's /etc/hosts file contains all IPs, fully qualified and simple hostnames
there is an NFS share preconfigured and mounted to /mnt/clusterfs on every single server
none of the servers reported an error in the "docker logs --follow dashDB" output
the nodes config file is located in the /mnt/clusterfs directory
After starting dashDB with following command:
docker exec -it dashDB start
Everything looks as it should (see below), but the error can be found in /opt/ibm/dsserver/logs/dsserver.0.log.
#
--- dashDB stack service status summary ---
##################################################################### Redirecting to /bin/systemctl status slapd.service
SUMMARY
LDAPrunning: SUCCESS
dashDBtablesOnline: SUCCESS
WebConsole : SUCCESS
dashDBconnectivity : SUCCESS
dashDBrunning : SUCCESS
#
--- dashDB high availability status ---
#
Configuring dashDB high availability ...
Stopping the system
Stopping datanode dashdb02
Stopping datanode dashdb01
Stopping headnode dashdb03
Running sm on head node dashdb03 ..
Running sm on data node dashdb02 ..
Running sm on data node dashdb01 ..
Attempting to activate previously failed nodes, if any ...
SM is RUNNING on headnode dashdb03 (ACTIVE)
SM is RUNNING on datanode dashdb02 (ACTIVE)
SM is RUNNING on datanode dashdb01 (ACTIVE)
Overall status : RUNNING
After several redeployments nothing has changed. Please help me figure out what I am doing wrong.
Many Thanks, Daniel
Always make sure the NTP service is started on every single cluster node before starting the docker container. Otherwise it will have no effect.
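For example, on a systemd-based node that would be roughly the following (chronyd is the usual alternative if ntpd is not installed):
# enable and start NTP before docker exec -it dashDB start
sudo systemctl enable --now ntpd     # or: sudo systemctl enable --now chronyd
# verify the clock is synchronizing
ntpq -p                              # or: chronyc sources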

Docker on RHEL 6 Cgroup mounting failing

I'm trying to get my head around something that's been working on CentOS+Vagrant, but not on our provider's RHEL (Red Hat Enterprise Linux Server release 6.5 (Santiago)). A sudo service docker restart gives this:
Stopping docker: [ OK ]
Starting cgconfig service: Error: cannot mount cpuset to /cgroup/cpuset: Device or resource busy
/sbin/cgconfigparser; error loading /etc/cgconfig.conf: Cgroup mounting failed
Failed to parse /etc/cgconfig.conf [FAILED]
Starting docker: [ OK ]
The service starts okay enough, but images cannot run; a mounting-failed error is shown when I try. The startup log also gives a warning or two. Regarding the kernel warning, CentOS gives the same and has no problems, as EPEL should resolve this:
WARNING: You are running linux kernel version 2.6.32-431.17.1.el6.x86_64, which might be unstable running docker. Please upgrade your kernel to 3.8.0.
2014/08/07 08:58:29 docker daemon: 1.1.2 d84a070; execdriver: native; graphdriver:
[1233d0af] +job serveapi(unix:///var/run/docker.sock)
[1233d0af] +job initserver()
[1233d0af.initserver()] Creating server
2014/08/07 08:58:29 Listening for HTTP on unix (/var/run/docker.sock)
[1233d0af] +job init_networkdriver()
[1233d0af] -job init_networkdriver() = OK (0)
2014/08/07 08:58:29 WARNING: mountpoint not found
Anyone had any success overcoming this problem or should I throw in the towel and wait for the provider to update to RHEL 7?
I have the same issue.
(1) check cgconfig status
# /etc/init.d/cgconfig status
if it stopped, restart it
# /etc/init.d/cgconfig restart
check cgconfig is running
(2) check cgconfig is on
# chkconfig --list cgconfig
cgconfig 0:off 1:off 2:off 3:off 4:off 5:off 6:off
if cgconfig is off, turn it on
(3) If it still does not work, maybe some cgroup modules are missing. In the kernel .config file (make menuconfig), add those modules to the kernel, recompile, and reboot.
After that, it should be OK.
I ended up asking the same question at Google Groups and in the end finding a solution with some help. What worked for me was this:
umount cgroup
sudo service cgconfig start
The project of making Docker work was put on hold all the same. Later there was a problem with network connectivity for the containers; it took too much time to solve and I had to give up.
So I spent the whole day trying to rig docker to work on my vps. I was running into this same error. Basically what it came down to was the fact that OpenVZ didn't support docker containers up until a couple months ago. Specifically this RHEL update:
https://openvz.org/Download/kernel/rhel6/042stab105.14
Assuming this is your problem, or some variation of it, the burden of solving it is on your host. They will need to follow these steps:
https://openvz.org/Docker_inside_CT
In my case
/etc/rc.d/rc.cgconfig start
was generating
Starting cgconfig service: Error: cannot mount cpu,cpuacct,memory to /cgroup/cpu_and_mem: Device or resource busy
/usr/sbin/cgconfigparser; error loading /etc/cgconfig.conf: Cgroup mounting failed
Failed to parse /etc/cgconfig.conf
I had to use:
/etc/rc.d/rc.cgconfig restart
and it automagically unmounted and remounted the cgroups:
Stopping cgconfig service: Starting cgconfig service:
It seems like the cgconfig service is not running, so check it!
# /etc/init.d/cgconfig status
# mkdir -p /cgroup/cpuacct /cgroup/memory /cgroup/devices /cgroup/freezer /cgroup/net_cls /cgroup/blkio
# cat /etc/cgconfig.conf |tail|grep "="|awk '{print "mount -t cgroup -o",$1,$1,$NF}'>cgroup_mount.sh
# sh ./cgroup_mount.sh
# /etc/init.d/cgconfig restart
# /etc/init.d/docker restart
This situation occurs when the kernel is booted with cgroup_disable=memory and /etc/cgconfig.conf contains memory = /cgroup/memory;
This causes only /cgroup/cpuset to be mounted instead of the full set.
Solution: either remove cgroup_disable=memory from your kernel boot options or comment out memory = /cgroup/memory; from cgconfig.conf.
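For example, the cgconfig.conf edit would look roughly like this (a sketch of the default RHEL 6 layout; your file may differ):
mount {
    cpuset  = /cgroup/cpuset;
    cpu     = /cgroup/cpu;
    cpuacct = /cgroup/cpuacct;
#   memory  = /cgroup/memory;   # disabled because the kernel boots with cgroup_disable=memory
    devices = /cgroup/devices;
    freezer = /cgroup/freezer;
    net_cls = /cgroup/net_cls;
    blkio   = /cgroup/blkio;
}
followed by service cgconfig restart.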
The cgconfig service startup uses mount and umount which requires an extra privilege bump from docker.
See the --privileged=true flag here for more info.
I was able to overcome this issue by starting my container with:
docker run -it --privileged=true my-image.
Tested in Centos6, Centos6.5.
