Issue while indexing data in solr in docker cluster setup

Friends,
I am working on a POC in which I have to set up a clustered Solr environment and evaluate its high availability (HA) vis-à-vis our existing tool. I am using Docker, with the commands below to set it up:
docker network create --subnet 192.168.22.0/24 --ip-range=192.168.22.128/25 netzksolr
#the IP address for the container
ZK1_IP=192.168.22.10
ZK2_IP=192.168.22.11
ZK3_IP=192.168.22.12
# the Docker image
ZK_IMAGE=jplock/zookeeper
docker pull jplock/zookeeper && docker create --ip=$ZK1_IP --net netzksolr --name zk1 --hostname=zk1 --add-host zk2:$ZK2_IP --add-host zk3:$ZK3_IP -it $ZK_IMAGE
docker pull jplock/zookeeper && docker create --ip=$ZK2_IP --net netzksolr --name zk2 --hostname=zk2 --add-host zk1:$ZK1_IP --add-host zk3:$ZK3_IP -it $ZK_IMAGE
docker pull jplock/zookeeper && docker create --ip=$ZK3_IP --net netzksolr --name zk3 --hostname=zk3 --add-host zk1:$ZK1_IP --add-host zk2:$ZK2_IP -it $ZK_IMAGE
docker cp zk1:/opt/zookeeper/conf/zoo.cfg .
cat >>zoo.cfg <<EOM
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
EOM
docker cp zoo.cfg zk1:/opt/zookeeper/conf/zoo.cfg
docker cp zoo.cfg zk2:/opt/zookeeper/conf/zoo.cfg
docker cp zoo.cfg zk3:/opt/zookeeper/conf/zoo.cfg
echo "1">myid
docker cp myid zk1:/tmp/zookeeper/myid
echo "2">myid
docker cp myid zk2:/tmp/zookeeper/myid
echo "3">myid
docker cp myid zk3:/tmp/zookeeper/myid
docker start zk1;sleep 10
docker start zk2;sleep 10
docker start zk3;sleep 10
docker ps
ZKSOLR1_IP=192.168.22.20
ZKSOLR2_IP=192.168.22.21
ZKSOLR3_IP=192.168.22.22
SOLR_IMAGE=solr
HOST_OPTIONS="--add-host zk1:$ZK1_IP --add-host zk2:$ZK2_IP --add-host zk3:$ZK3_IP "
###setup solr
docker pull $SOLR_IMAGE && docker create --ip=$ZKSOLR1_IP --net netzksolr --name zksolr1 --hostname=zksolr1 -it $HOST_OPTIONS $SOLR_IMAGE
docker pull $SOLR_IMAGE && docker create --ip=$ZKSOLR2_IP --net netzksolr --name zksolr2 --hostname=zksolr2 -it $HOST_OPTIONS $SOLR_IMAGE
docker pull $SOLR_IMAGE && docker create --ip=$ZKSOLR3_IP --net netzksolr --name zksolr3 --hostname=zksolr3 -it $HOST_OPTIONS $SOLR_IMAGE
#create solr.sh file
for h in zksolr1 zksolr2 zksolr3; do
docker cp zksolr1:/opt/solr/bin/solr.in.sh .
sed -i -e 's/#ZK_HOST=""/ZK_HOST="zk1:2181,zk2:2181,zk3:2181"/' solr.in.sh
sed -i -e 's/#*SOLR_HOST=.*/SOLR_HOST="'$h'"/' solr.in.sh
mv solr.in.sh solr.in.sh-$h
done
docker cp solr.in.sh-zksolr1 zksolr1:/opt/solr/bin/solr.in.sh
docker cp solr.in.sh-zksolr2 zksolr2:/opt/solr/bin/solr.in.sh
docker cp solr.in.sh-zksolr3 zksolr3:/opt/solr/bin/solr.in.sh
###Start docker
docker start zksolr1
docker start zksolr2
docker start zksolr3
docker ps
###Create data directory
docker exec -i zksolr1 /bin/bash -c 'mkdir ./Data'
###Create core
docker exec -i zksolr1 /opt/solr/bin/solr create_collection -c HATest -p 8983 -replicationFactor 2 -shards 2
###Send data to solr
docker cp data_file.txt zksolr1:/opt/solr/Data/
##send schema to zoo keeper
docker exec -i zk1 bin/zkCli.sh -cmd set /configs/HATest/managed-schema "`cat <mylocal location>/managed-schema`"
##validate schema
docker exec -i zk1 bin/zkCli.sh -cmd get /configs/HATest/managed-schema
HOSTPORT="http://zksolr1:8983/solr/HATest"
DATAFILE="/opt/solr/Data/data_file.txt" #This is a file with 80,450 records; columns delimited by ",", and rows by \n
SCHEMA=<contains comma separated names of variables. There are 146 variables in it>
date;docker exec -i zksolr1 /bin/bash -c "curl '$HOSTPORT/update/csv?separator=%7C&fieldnames=$SCHEMA&encapsulator=%05&trim=true' -H 'Content-type:application/csv; charset=utf-8' --data-binary @$DATAFILE";date
docker exec -i zksolr1 /bin/bash -c "curl '$HOSTPORT/update/csv?commit=true'"
The index creation process never seems to complete.
GURC02RCC74G8WN:Data amada2$ date;docker exec -i zksolr1 /bin/bash -c "curl '$HOSTPORT/update/csv?separator=%7C&fieldnames=$SCHEMA&encapsulator=%05&trim=true' -H 'Content-type:application/csv; charset=utf-8' --data-binary @$DATAFILE";date
Fri Dec 30 23:01:44 IST 2016
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  3 91.4M    0     0    3 3632k      0    659 40:24:28  1:34:00 38:50:28     0^C
Sat Dec 31 00:35:46 IST 2016
Looking at the logs of zksolr1, I saw an endless stream of errors like these:
2016-12-31 12:59:05.581 ERROR (qtp110456297-16) [c:HATest s:shard1 r:core_node1 x:HATest_shard1_replica2] o.a.s.s.ManagedIndexSchema Bad version when trying to persist schema using 0 due to: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /configs/HATest/managed-schema
2016-12-31 12:59:05.581 INFO (qtp110456297-16) [c:HATest s:shard1 r:core_node1 x:HATest_shard1_replica2] o.a.s.s.ManagedIndexSchema Failed to persist managed schema at /configs/HATest/managed-schema - version mismatch
2016-12-31 12:59:05.634 ERROR (qtp110456297-16) [c:HATest s:shard1 r:core_node1 x:HATest_shard1_replica2] o.a.s.s.ManagedIndexSchema Bad version when trying to persist schema using 0 due to: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /configs/HATest/managed-schema
2016-12-31 12:59:05.634 INFO (qtp110456297-16) [c:HATest s:shard1 r:core_node1 x:HATest_shard1_replica2] o.a.s.s.ManagedIndexSchema Failed to persist managed schema at /configs/HATest/managed-schema - version mismatch
I looked up this error and found the issue linked below, according to which it is just a warning and indexing should still work. In my case, however, indexing is completely stuck.
https://issues.apache.org/jira/browse/SOLR-8791
At first I thought it might be an issue with my office machine, so I tried again on my other laptop and hit the same issue. I am new to this and would appreciate any help.
Thanks,
Aman
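The three ZooKeeper containers in the question must agree on the server list, and each myid has to match its server.N line. A minimal sketch that derives both the zoo.cfg lines and the ZK_HOST string from one host list, so the numbering cannot drift. The function names zk_server_lines and zk_connect_string are hypothetical helpers, not part of the original setup, and the ports are simply the ones used in the question:

```shell
# Hypothetical helpers (not from the original post); ports follow the question.

# Print one "server.N=host:2888:3888" line per host, numbered from 1.
zk_server_lines() (
  n=1
  for host in "$@"; do
    printf 'server.%s=%s:2888:3888\n' "$n" "$host"
    n=$((n + 1))
  done
)

# Print the comma-separated client connect string (client port 2181).
zk_connect_string() (
  out=""
  for host in "$@"; do
    out="${out:+$out,}$host:2181"
  done
  printf '%s\n' "$out"
)

zk_server_lines zk1 zk2 zk3
zk_connect_string zk1 zk2 zk3
```

The first call could be appended to zoo.cfg and the second used as the ZK_HOST value in solr.in.sh. The N in each server.N line is the value that container's myid file should hold.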

Related

What is preventing file persistence when using a Docker volume?

I have a Docker image named pfa-image (contains a fairly basic Express-based website), a running mongoDB container named pfa-mongo, and a docker volume named image-volume. When I run the following sequence of commands:
host$ docker run -d --name pfa-container -v image-volume:/images \
--link pfa-mongo:mongodb -p 5000:5000 pfa-image
host$ docker exec -it pfa-container /bin/bash
container:/pfa-site# cd images
container:/pfa-site/images# touch test.txt
container:/pfa-site/images# exit
host$ docker rm -f pfa-container
host$ docker run -d --name pfa-container -v image-volume:/images \
--link pfa-mongo:mongodb -p 5000:5000 pfa-image
host$ docker exec -it pfa-container /bin/bash
container:/pfa-site# cd images
container:/pfa-site/images# ls
...test.txt is missing. What am I overlooking here? I am quite new to docker and somewhat new to Linux.
Thank you!
I have tried using bind mounts and volumes, to the same result.
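The shell prompts in the question show the file being created in /pfa-site/images (the relative cd images from /pfa-site), while the volume is mounted at the absolute path /images, so these are two different directories. A sketch of the same run command with the mount target changed to the directory the session wrote to. This assumes /pfa-site/images is where the application keeps its files, which the question does not confirm, and run_pfa_container is a hypothetical wrapper name:

```shell
# Hypothetical wrapper; assumes the app writes to /pfa-site/images.
run_pfa_container() {
  docker run -d --name pfa-container \
    -v image-volume:/pfa-site/images \
    --link pfa-mongo:mongodb -p 5000:5000 pfa-image
}
```

With this mount, a file touched in /pfa-site/images is stored in image-volume and should still be there after docker rm -f pfa-container and a fresh run.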

How to launch the Solr techproducts example on Docker?

I am running solr in docker and I tried the commands from the comment.
docker run --name test -d -p 8983:8983 -t solr
docker exec -it --user=solr test bin/solr create -c techproducts -d sample_techproducts_configs
After the last command, I received the following error message:
Unrecognized argument: example/exampledocs/*.xml .
If this was intended to be a data file, it does not exist relative to /opt/solr
Is this the correct location for the techproducts.xml data?

I looked at the official solr image on hub.docker.com and I found this,
docker run -d -P -v $PWD/myconfig:/myconfig solr solr-create -c mycore -d /myconfig
I guess you need to pass in the core configuration from your host with a bind mount, as in the example. In this case it is not myconfig but "sample_techproducts_configs".

I tried the following, it worked for me.
docker run --name my_solr -d -p 8983:8983 -t solr
docker exec -it --user=solr my_solr bin/solr create_core -c techproducts
docker exec -it --user=solr my_solr bin/post -c techproducts example/exampledocs/

execute a command within docker swarm service

Initialize swarm mode:
root@ip-172-31-44-207:/home/ubuntu# docker swarm init --advertise-addr 172.31.44.207
Swarm initialized: current node (4mj61oxcc8ulbwd7zedxnz6ce) is now a manager.
To add a worker to this swarm, run the following command:
Join the second node:
docker swarm join \
--token SWMTKN-1-4xvddif3wf8tpzcg23tem3zlncth8460srbm7qtyx5qk3ton55-6g05kuek1jhs170d8fub83vs5 \
172.31.44.207:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
# start 2 services
docker service create continuumio/miniconda3
docker service create --name redis redis:3.0.6
root@ip-172-31-44-207:/home/ubuntu# docker service ls
ID NAME REPLICAS IMAGE COMMAND
2yc1xjmita67 miniconda3 0/1 continuumio/miniconda3
c3ptcf2q9zv2 redis 1/1 redis:3.0.6
As shown above, redis has its replica, while miniconda3 does not seem to be replicated.
I usually log in to the miniconda container to run these commands:
/opt/conda/bin/conda install jupyter -y --quiet && mkdir /opt/notebooks && /opt/conda/bin/jupyter notebook --notebook-dir=/opt/notebooks --ip='*' --port=8888 --no-browser
The problem is that the docker exec -it XXX bash command does not work in swarm mode.

You can execute commands by filtering container name without needing to pass the entire swarm container hash, just by the service name. Like this:
docker exec $(docker ps -q -f name=servicename) ls
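The name filter in the command above matches any container whose name contains the given text, so it can return zero or several IDs. A minimal sketch of the same idea with a guard that refuses to guess. The helper name svc_exec is hypothetical, and like the one-liner it only sees containers on the node where it runs:

```shell
# Hypothetical helper: run a command in the single local container whose
# name contains the given service name; fail when 0 or several match.
svc_exec() (
  svc_name=$1
  shift
  ids=$(docker ps -q -f "name=$svc_name")
  # Count non-empty lines; "|| true" keeps a zero count from aborting under set -e.
  count=$(printf '%s\n' "$ids" | grep -c . || true)
  if [ "$count" -ne 1 ]; then
    echo "expected exactly 1 container matching '$svc_name', found $count" >&2
    exit 1
  fi
  docker exec "$ids" "$@"
)
```

For example, svc_exec servicename ls would list files in the matching container. Options such as -it are not handled in this sketch.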

There is a one-liner for accessing the corresponding instance of the service on localhost:
docker exec -ti stack_myservice.1.$(docker service ps -f 'name=stack_myservice.1' stack_myservice -q --no-trunc | head -n1) /bin/bash
It was tested in PowerShell, but bash should behave the same. The one-liner accesses the first instance; to reach another one, replace '1' with the instance number in both places.
A more complex example for the distributed case:
#! /bin/bash
set -e
exec_task=$1
exec_instance=$2

# Print the index of the first occurrence of $2 in $1, or -1 if absent.
strindex() {
  x="${1%%$2*}"
  [[ "$x" = "$1" ]] && echo -1 || echo "${#x}"
}

# Read the `docker service ps` table from stdin and print the container
# name and the node, located via the column offsets of the header line.
parse_node() {
  read title
  id_start=0
  name_start=`strindex "$title" NAME`
  image_start=`strindex "$title" IMAGE`
  node_start=`strindex "$title" NODE`
  dstate_start=`strindex "$title" DESIRED`
  id_length=$name_start
  name_length=`expr $image_start - $name_start`
  node_length=`expr $dstate_start - $node_start`
  read line
  id=${line:$id_start:$id_length}
  name=${line:$name_start:$name_length}
  name=$(echo $name)  # trim surrounding whitespace
  node=${line:$node_start:$node_length}
  echo $name.$id
  echo $node
}

if true; then
  read fn
  docker_fullname=$fn
  read nn
  docker_node=$nn
fi < <( docker service ps -f name=$exec_task.$exec_instance --no-trunc -f desired-state=running $exec_task | parse_node )

echo "Executing in $docker_node $docker_fullname"
# Point the docker client at the node running the task, then exec there.
eval `docker-machine env $docker_node`
docker exec -ti $docker_fullname /bin/bash
The script can then be used as:
swarm_bash stack_task 1
It simply runs bash on the required node.

EDIT 2017-10-06:
Nowadays you can create the overlay network with the --attachable flag to enable any container to join the network. This is a great feature, as it allows a lot of flexibility.
E.g.
$ docker network create --attachable --driver overlay my-network
$ docker service create --network my-network --name web --publish 80:80 nginx
$ docker run --network=my-network -ti alpine sh
(in alpine container) $ wget -qO- web
<!DOCTYPE html>
<html>
<head>
....
You are right, you cannot run docker exec on a swarm mode service. But you can still find out which node is running the container and then run exec directly on the container. E.g.
docker service ps miniconda3 # find out, which node is running the container
eval `docker-machine env <node name here>`
docker ps # find out the container id of miniconda
docker exec -it <container id here> sh
In your case, you first have to find out why the service cannot bring the miniconda container up. Running docker service ps miniconda3 may show some helpful error messages.
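The node lookup in the first step can be scripted instead of read off the table. A minimal sketch, assuming it runs on a manager node and that the installed docker CLI supports --format on service ps. The helper name service_node is hypothetical:

```shell
# Hypothetical helper: print the node running the first task of a service
# whose desired state is "running". Must be run on a swarm manager.
service_node() {
  docker service ps "$1" \
    -f desired-state=running \
    --format '{{.Node}}' | head -n 1
}
```

Its output could replace the <node name here> placeholder, for example eval `docker-machine env "$(service_node miniconda3)"`.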

Using the Docker API
Right now, Docker does not provide an API like docker service exec or docker stack exec for this. However, two existing issues already deal with this functionality:
github.com - moby/moby - Docker service exec
github.com - docker/swarmkit - Support for executing into a task
(Regarding the first issue, it was not immediately clear to me that it deals with exactly this kind of functionality. But Exec for Swarm was closed and marked as a duplicate of the Docker service exec issue.)
Using Docker daemon over HTTP
As mentioned by BMitch on run docker exec from swarm manager, you could also configure the Docker daemon to use HTTP and then connect to every node without needing ssh. But you should protect this using TLS authentication, which is already integrated into Docker. Afterwards you would be able to execute docker exec like this:
docker --tlsverify --tlscacert=ca.pem --tlscert=cert.pem --tlskey=key.pem \
-H=$HOST:2376 exec $containerId $cmd
Using skopos-plugin-swarm-exec
There is a GitHub project which claims to solve the problem and provide the desired functionality by binding the Docker daemon:
docker run -v /var/run/docker.sock:/var/run/docker.sock \
datagridsys/skopos-plugin-swarm-exec \
task-exec <taskID> <command> [<arguments>...]
As far as I can see, this works by creating another container on the same node as the container in which docker exec should be executed. On that node, this container mounts the Docker daemon socket so that it can run docker exec there locally.
For more information have a look at: skopos-plugin-swarm-exec
Using docker swarm helpers
There is also another project called docker swarm helpers which seems to be more or less a wrapper around ssh and docker exec.
Reference:
https://github.com/docker/swarmkit/issues/1895#issuecomment-302147604
https://github.com/docker/swarmkit/issues/1895#issuecomment-358925313

You can jump in a Swarm node and list the docker containers running using:
docker container ls
That will give you the container name in a format similar to: containername.1.q5k89uctyx27zmntkcfooh68f
You can then use the regular exec option to run commands on it:
docker container exec -it containername.1.q5k89uctyx27zmntkcfooh68f bash
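The container name shown above follows the pattern <service>.<slot>.<task id>. A small sketch that splits such a name into its parts using only POSIX parameter expansion. The function name split_task_name is hypothetical, and the sketch assumes a replicated service whose name has exactly this three-part shape:

```shell
# Hypothetical helper: split a swarm task container name of the form
# <service>.<slot>.<task id> into its three parts.
split_task_name() (
  name=$1
  task_id=${name##*.}   # text after the last dot
  rest=${name%.*}       # everything before the last dot
  slot=${rest##*.}      # text after the last remaining dot
  service=${rest%.*}    # what is left is the service name
  printf 'service=%s slot=%s task=%s\n' "$service" "$slot" "$task_id"
)

split_task_name containername.1.q5k89uctyx27zmntkcfooh68f
# prints: service=containername slot=1 task=q5k89uctyx27zmntkcfooh68f
```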

I created a small script for our Docker swarm cluster.
The script takes 3 parameters: the first is the service you want to connect to; the second is the task you want to run, which can be /bin/bash or any other process; the third is optional and fills the -c option for bash or sh.
-n is optional, to force it to connect to a node.
It retrieves the node that runs the service and runs the command there.
#! /bin/bash
set -e
task=${1}
service=$2
bash=$3
serviceID=$(sudo docker service ps -f name=$service -f desired-state=running $service -q --no-trunc |head -n1)
node=$(sudo docker service ps -f name=$service -f desired-state=running $service --format="{{.Node}}"| head -n1 )
sudo docker -H $node exec -it $service".1."$serviceID $bash -c "$task"
Note: this requires the Docker nodes to accept TCP connections, by exposing Docker on port 2375 on the worker nodes.
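Port 2375 is the unencrypted, unauthenticated Docker endpoint, so exposing it gives anyone who can reach the port full control of the daemon. A hedged alternative sketch that reaches the node over SSH instead, which the docker CLI supports from version 18.09 via -H ssh://. It assumes SSH access to the node and permission to use docker there. The helper name exec_on_node is hypothetical:

```shell
# Hypothetical variant of the final call in the script: talk to the node's
# daemon over SSH instead of an open TCP port.
exec_on_node() (
  node=$1
  container=$2
  shift 2
  docker -H "ssh://$node" exec -it "$container" "$@"
)
```

For example, exec_on_node "$node" "$service.1.$serviceID" bash -c "$task" would stand in for the last line of the script.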

For those who have multiple replicas and just want to run a command within any of them, here is another shortcut:
docker exec -it $(docker ps -q -f name=SERVICE_NAME | head -1) bash

I wrote a script to exec a command in Docker swarm by service name. For example, it can be used in cron. You can also use bash pipelines, and all parameters are passed to the docker exec command. It only works on the node where the service task is running, though. I hope it helps someone.
#!/bin/bash
# swarm-exec.sh
set -e
# Find the first argument that is not an option: it is the service name.
for ((i=1;i<=$#;i++)); do
  val=${!i}
  if [ "${val:0:1}" != "-" ]; then
    service_id=$(docker ps -q -f "name=$val");
    if [[ $service_id == "" ]]; then
      echo "Container $val not found!";
      exit 1;
    fi
    # Arguments before the name are docker exec options; the rest is the command.
    docker exec "${@:1:$i-1}" $service_id "${@:$i+1:$#}";
    exit 0;
  fi
done
echo "Usage: $0 [OPTIONS] SERVICE_NAME COMMAND [ARG...]";
exit 1;
Example of using:
./swarm-exec.sh app_postgres pg_dump -Z 9 -F p -U postgres app > /backups/app.sql.gz
echo ls | ./swarm-exec.sh -i app /bin/bash
./swarm-exec.sh -it some_app /bin/bash

The simplest command I found to docker exec into a swarm node (with a swarm manager at $SWARM_MANAGER_HOST) running the service $SERVICE_NAME (for example mystack_myservice) is the following:
SERVICE_JSON=$(ssh $SWARM_MANAGER_HOST "docker service ps $SERVICE_NAME --no-trunc --format '{{ json . }}' -f desired-state=running")
ssh -t $(echo $SERVICE_JSON | jq -r '.Node') "docker exec -it $(echo $SERVICE_JSON | jq -r '.Name').$(echo $SERVICE_JSON | jq -r '.ID') bash"
This assumes that you have ssh access to $SWARM_MANAGER_HOST as well as to the swarm node currently running the service task.
It also assumes that you have jq installed (apt install jq). If you can't or don't want to install it and you have python installed, you can create the following alias (based on this answer):
alias jq="python3 -c 'import sys, json; print(json.load(sys.stdin)[sys.argv[2].partition(\".\")[-1]])'"

See addendum 2...
Example of a oneliner for entering the database my_db on node master:
DB_NODE_ID=master && docker exec -it $(docker ps -q -f name=$DB_NODE_ID) mysql my_db
In case you want to configure, say max_connections:
DB_NODE_ID=master && $(docker exec -it $(docker ps -q -f name=$DB_NODE_ID) mysql -e "SET GLOBAL max_connections = 1000") && docker exec -it $(docker ps -q -f name=$DB_NODE_ID) mysql my_db
This approach lets you enter any database node (e.g. the slaves) just by setting the DB_NODE_ID variable accordingly.
Example for slave s2:
DB_NODE_ID=s2 && docker exec -it $(docker ps -q -f name=$DB_NODE_ID) mysql my_db
or
DB_NODE_ID=s2 && $(docker exec -it $(docker ps -q -f name=$DB_NODE_ID) mysql -e "SET GLOBAL max_connections = 1000") && docker exec -it $(docker ps -q -f name=$DB_NODE_ID) mysql my_db
Put this into your KiTTY or PuTTY configuration for master / s2 under Data/Command and you are set.
As an addendum:
The old, non swarm mode version reads simply
docker exec -it master mysql my_db
resp.
DB_ID=master && $(docker exec -it $DB_ID mysql -e "SET GLOBAL max_connections = 1000") && docker exec -it $DB_ID mysql tmp
Addendum 2:
As it turned out in practice, the expression docker ps -q -f name=$DB_NODE_ID may return wrong values under certain conditions.
The following approach works correctly:
docker ps -a | grep "_$DB_NODE_ID." | awk '{print $1}'
You may substitute the examples above accordingly.
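In the grep pattern from Addendum 2 the trailing dot is a regular-expression wildcard, so "_master." would also match a name such as mystack_masterX. A sketch of a literal match, assuming container names of the form <stack>_<node>.<slot>.<task id>. The helper name container_ids_for is hypothetical:

```shell
# Hypothetical helper: print the IDs of containers whose name contains the
# literal text "_<node>." with no regex interpretation of the dot.
container_ids_for() {
  docker ps -a --format '{{.ID}} {{.Names}}' |
    awk -v needle="_$1." 'index($2, needle) > 0 { print $1 }'
}
```

It can be used in place of the pipeline, for example docker exec -it "$(container_ids_for master)" mysql my_db.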
Addendum 3:
Well, these expressions look awful and are certainly painful to type, so you may want to ease your work. On Linux, everybody knows how to do this. On Windows, you may want to use AHK.
This is the AHK term I use:
:*:ii::DB_NODE_ID=$(docker ps -a | grep "_." | awk '{{}print $1{}}') && docker exec -it $id ash{Left 49}
So when I type ii -- which is as simple as it can get -- I get the desired term with the cursor in place and just have to fill in the container name.

I edited the script Brian van Rooijen added above. Because my reputation is too low, I cannot add it
#! /bin/bash
set -e
service=${1}
shift
task="$*"
echo $task
serviceID=$(docker service ps -f name=$service -f desired-state=running $service -q --no-trunc |head -n1)
node=$(docker service ps -f name=$service -f desired-state=running $service --format="{{.Node}}"| head -n1 )
serviceName=$(docker service ps -f name=$service -f desired-state=running $service --format="{{.Name}}"| head -n1 )
docker -H $node exec -it $serviceName"."$serviceID $task
I had the issue that the container did not exist with the hard-coded .1. in the execution.

Take a look at my solution: https://github.com/binbrayer/swarmServiceExec.
This approach is based on Docker Machines. I also created the prototype of the script to call containers asynchronously and as a result simultaneously.

Mariadb failure to daemonise with docker

I'm trying to use this image https://hub.docker.com/_/mariadb/ (any version).
I'm using the following to launch the container:
cd maria
docker build -t maria-image .
docker run --name maria maria-image -d -e MYSQL_ALLOW_EMPTY_PASSWORD=1
cd ..
I'm preparing a custom build in case I need to do any future modifications so that lives in maria/Dockerfile with the following:
FROM mariadb:5.5
MAINTAINER ...
EXPOSE 3306
If I do docker ps -a I get status "Exited (2) 5 seconds ago".

Your args appear to be in the wrong order, maria-image should be after all other docker run args:
docker run --name maria -d -e MYSQL_ALLOW_EMPTY_PASSWORD=1 maria-image
The version you ran passed the -d and -e as the command for docker to run. Note that you'll want to first run docker rm -v maria to free the container name for reuse.

How to run a database independent of a Rails app?

How can I separate a database and rails app into two different containers? The tutorial on Docker shows how to create the two with the docker-compose set-up, however I'm more curious on how to set this up manually so that I can play around with SOA on Docker.

Create an instance of your db container
db stop/rm/pull/run:
#!/bin/bash
# First three lines are for teardown/rebuild
docker stop myapp-postgres
docker rm myapp-postgres
docker pull postgres
docker run --name myapp-postgres -t -i -d postgres
app stop/rm/pull/run:
#!/bin/bash
docker stop myapp
docker rm myapp
docker pull dockerhubname/myapp
docker run -d -t -i --link myapp-postgres:postgres -p 80:80 --name myapp dockerhubname/myapp
#spit out some useful info
docker ps
MYAPP_MACHINE=$(docker ps | grep myapp | awk '{print $1}')
echo $MYAPP_MACHINE
docker exec -ti $MYAPP_MACHINE ps -aux
