Set up a sharded solr collection using solrcloud


I would like to set up a 6-shard Solr collection on 3 Windows machines.
I tried bin\solr -e cloud and set up 2 machines, 6 shards and 1 replica. After stopping and starting 2 cores on one machine (each on a different hard disk), I get 6 shards, 3 for each core.
When I start another core on another machine, nothing happens; the 3rd one doesn't do anything.
When I start another core on the same machine using the same config in another directory, nothing happens either: the core starts but has no collections, and the 2 cores started first still have 3 shards each.
For example, I start the 3rd one with:
bin\solr start -c -p 7576 -z localhost:9983 -s server/solr/collection/node3/solr
Or start on another machine:
bin\solr start -c -p 7576 -z zookeeper:9983 -s server/solr/collection/node3/solr
Is there any documentation that doesn't rely on the "convenient" bin\solr script? I have spent the entire day trying to reverse engineer it to figure out how to set up ZooKeeper and Solr so that the nth Solr core is added as a shard until 6 shards are reached.

I think I found the answer: bin\solr -e cloud starts up the cores and assigns data to them.
After running the standard bin\solr -e cloud with 2 cores and a collection with 6 shards and 1 replica, I stop everything with bin\solr stop -all
Then I copy solr-5.2.1\example\cloud\node1 to solr-5.2.1\example\cloud\node3, delete the files in solr-5.2.1\example\cloud\node3\logs, and let node3 own gettingstarted_shard6_replica1 (keep it in solr-5.2.1\example\cloud\node3\solr and remove it from solr-5.2.1\example\cloud\node1\solr).
Start up 3 cores:
bin\solr start -c -p 8983 -s example\cloud\node1\solr
bin\solr start -cloud -p 7574 -z localhost:9983 -s example\cloud\node2\solr
bin\solr start -cloud -p 7575 -z localhost:9983 -s example\cloud\node3\solr
And now I can see the 3rd solr instance has gettingstarted_shard6_replica1
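An alternative to moving core directories by hand is the Solr Collections API, which is not what the answer above uses. The sketch below only builds the CREATE request URL. It assumes all Solr nodes are already running in cloud mode against the same ZooKeeper before the collection is created, since shards are placed on the nodes that are live at creation time. The collection name "sharded" and the base URL are hypothetical, and the maxShardsPerNode parameter applies to the older Solr releases discussed here.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards,
                          replication_factor=1, max_shards_per_node=None):
    """Build a Collections API CREATE URL (the request itself is not sent)."""
    params = {
        "action": "CREATE",
        "name": name,                      # hypothetical collection name
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "wt": "json",
    }
    if max_shards_per_node is not None:
        # Allows more than one shard of the collection on a single node.
        params["maxShardsPerNode"] = max_shards_per_node
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


if __name__ == "__main__":
    # 6 shards over 3 nodes means up to 2 shards per node.
    print(create_collection_url("http://localhost:8983/solr", "sharded", 6,
                                max_shards_per_node=2))
```

The printed URL can be opened in a browser or fetched with any HTTP client once every node shows up as live in the Solr admin UI.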

Related

Create a single container instead of 3 different containers

I was setting up a docker-compose file, but it creates 3 different containers. I want to combine those 3 containers into a single container/image instead of setting up multiple containers on the deployment system.
My current containers are as follows:
My main container, which contains my code and is built from a Dockerfile
The other 2 are Redis and Postgres containers, which I want to combine into 1.
Is there any way to do so?

First of all, running redis, postgres and your "main container" in one container is NOT best practice.
Typically you should have 3 separate containers (single app per container) communicating over the network. Sometimes we want to run two or more lightweight services inside the same container but redis and postgres aren't such services.
I recommend reading: best practices for building containers.
However, it's possible to have multiple services in the same docker container using the supervisord process management system.
I will run both redis and postgres services in one docker container (similar to your case) to illustrate how it works. It's for demonstration purposes only.
This is the directory structure; we only need a Dockerfile and supervisor.conf (the supervisord config file):
$ tree example_container/
example_container/
├── Dockerfile
└── supervisor.conf
First, I created a supervisord configuration file with redis and postgres services defined:
$ cat example_container/supervisor.conf
[supervisord]
nodaemon=true
[program:redis]
command=redis-server # command to run redis service
autorestart=true
stderr_logfile=/dev/stdout
stderr_logfile_maxbytes = 0
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes = 0
[program:postgres]
command=/usr/lib/postgresql/12/bin/postgres -D /var/lib/postgresql/12/main/ -c config_file=/etc/postgresql/12/main/postgresql.conf # command to run postgres service
autostart=true
autorestart=true
stderr_logfile=/dev/stdout
stderr_logfile_maxbytes = 0
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes = 0
user=postgres
environment=HOME="/var/lib/postgresql",USER="postgres"
Next I created a simple Dockerfile:
$ cat example_container/Dockerfile
FROM ubuntu:20.04
ARG DEBIAN_FRONTEND=noninteractive
# Installing redis and postgres
RUN apt-get update && apt-get install -y supervisor redis-server postgresql-12
# Copying supervisor configuration file to container
COPY supervisor.conf /etc/supervisor.conf
# Initializing redis and postgres services using supervisord
CMD ["supervisord","-c","/etc/supervisor.conf"]
And then I built the docker image:
$ docker build -t example_container:v1 .
Finally I ran and tested docker container using the image above:
$ docker run --name multi_services -dit example_container:v1
472c7b2eac7441360126f8fcd0cc80e0e63ac3039f8195715a3a400f6288a236
$ docker exec -it multi_services bash
root@472c7b2eac74:/# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.7 0.1 27828 23372 pts/0 Ss+ 10:04 0:00 /usr/bin/python3 /usr/bin/supervisord -c /etc/supervisor.conf
postgres 8 0.1 0.1 212968 28972 pts/0 S 10:04 0:00 /usr/lib/postgresql/12/bin/postgres -D /var/lib/postgresql/12/main/ -c config_file=/etc/postgresql/12/main/postgresql.conf
root 9 0.1 0.0 47224 6216 pts/0 Sl 10:04 0:00 redis-server *:6379
...
root@472c7b2eac74:/# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:6379 0.0.0.0:* LISTEN 9/redis-server *:6
tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN 8/postgres
tcp6 0 0 :::6379 :::* LISTEN 9/redis-server *:6
As you can see, it is possible to have multiple services in a single container, but this approach is NOT recommended and should be used ONLY for testing.

Regarding Kubernetes, you can group your containers in a single pod, as a deployment unit.
A Pod is the smallest deployable unit of computing that you can create and manage in Kubernetes.
It is a group of one or more containers, with shared storage and network resources, and a specification for how to run the containers.
A Pod's contents are always co-located and co-scheduled, and run in a shared context.
That would be more helpful than trying to merge containers together in one container.

dumpcap stops logging after 1184 files?

Recently I came across a Linux application design. The intent of the application is to log Ethernet frames via dumpcap <> on Linux. It is implemented as follows:
1. Create a new process using fork()
2. Call dumpcap <> in execl() as shown below
a. execl("/bin/sh", "/bin/sh", "-c", dumpcap<>, NULL);
b. sudo dumpcap -i "eth0" -B 1 -b filesize:5 -w "/mnt/Test_1561890567.pcapng" -t -q
3. Send a SIGTERM to kill the process
The problem we are facing now is that whenever we run the command from the process, dumpcap stops logging after 1184 or 1185 files. The process and thread are still alive; we can see the command in top.
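For readers unfamiliar with the launch-and-terminate pattern described above, here is a minimal sketch of it in Python. It does not explain why logging stops. It differs from the described design in one labeled way: the child is started in its own session so that SIGTERM can be sent to the whole process group, because a command started through /bin/sh -c may run as a child of the shell rather than replace it. The function names are hypothetical, and the command in the usage section is a harmless placeholder, not dumpcap.

```python
import os
import signal
import subprocess


def start_in_own_group(shell_command):
    """Run shell_command via /bin/sh -c in a new session (own process group)."""
    return subprocess.Popen(["/bin/sh", "-c", shell_command],
                            start_new_session=True)


def stop_group(proc, timeout=5.0):
    """Send SIGTERM to the child's whole process group and wait for it."""
    os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
    # A negative return code means the process was ended by that signal.
    return proc.wait(timeout=timeout)


if __name__ == "__main__":
    child = start_in_own_group("sleep 30")   # placeholder command
    print("exit status:", stop_group(child))
```

Signalling the group rather than a single pid is an assumption about what the design needs; whether it is appropriate when sudo sits between the shell and the capture process would have to be checked on the target system.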

Does `strace -f` work differently when run inside a docker container?

Assume the following:
I have a program myprogram inside a docker container
I'm running the docker container with
docker run --privileged=true my-label/my-container
Inside the container - the program is being run with:
strace -f -e trace=desc ./myprogram
What I see is that the strace (despite having the -f on) doesn't follow all the child processes.
I see the following output from strace
[pid 10] 07:36:46.668931 write(2, "..\n"..., 454 <unfinished ...>
<stdout of ..>
<stdout other output - but I don't see the write commands - so probably from a child process>
[pid 10] 07:36:46.669684 write(2, "My final output\n", 24 <unfinished ...>
<stdout of My final output>
What I want to see is the other write calls.
I should be seeing them, because I'm using -f.
What I think is happening is that running inside docker changes process handling and security.
My question is: Does strace -f work differently when run inside a docker container?
Note that this application starts and stops in 2 seconds - so the tracing tool has to follow the application lifecycle - like strace does. Connecting to a server background process won't work.

It turns out strace truncates string output: by default it prints only the first 32 characters of each string, so you have to tell it explicitly that you want more. You do this with -s 800.
strace -s 800 -ff ./myprogram
You can also get all the write commands by asking strace explicitly with -e write.
strace -s 800 -ff -e write ./myprogram

solr 6.3.0 not starting Ubuntu 14.04

I am trying to run Solr on my machine. I have installed everything it needs.
For example, the Java and Ruby versions match what the tutorials ask for.
This is how I am doing it.
solr_wrapper -d solr/config/ --collection_name hydra-development --version 6.3.0
This throws the following error.
`exec': Failed to execute solr start: (RuntimeError)
Port 8983 is already being used by another process (pid: 1814)
Please choose a different port using the -p option.

The error message clearly indicates that some other process is using port 8983.
You need to find out which process it is and kill it.
First run
$ lsof -i :8983
This will list the applications using port 8983. Let's say the pid of the process is 1814.
Then run
$ sudo kill 1814
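If lsof is not available, or you would rather start Solr on another port with the -p option mentioned in the error message, the check can be scripted. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and unlike lsof it does not report which pid owns the port.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def first_free_port(candidates, host="127.0.0.1"):
    """Return the first candidate port nothing is listening on, or None."""
    for port in candidates:
        if not port_in_use(port, host):
            return port
    return None


if __name__ == "__main__":
    print("8983 in use:", port_in_use(8983))
    print("first free candidate:", first_free_port([8983, 8984, 8985]))
```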
If you run into Error CREATEing SolrCore, it is most likely a permission issue caused by installing as root.
First clean up the broken core:
bin/solr delete -c mycore
and recreate the core as the solr user:
sudo -u solr /opt/solr/bin/solr create_core -c mycore

Cannot find main class SolrCLI when running bin/solr -e cloud

I want to compile solr from the main trunk and run it.
I did the following:
git clone https://github.com/apache/lucene-solr.git
cd lucene-solr/solr
ant dist
bin/solr -e cloud
This creates the relevant solr nodes but fails to create a collection with the following error:
$ bin/solr -e cloud
Welcome to the SolrCloud example!
This interactive session will help you launch a SolrCloud cluster on your local workstation.
To begin, how many Solr nodes would you like to run in your local cluster? (specify 1-4 nodes) [2]
Ok, let's start up 2 Solr nodes for your example SolrCloud cluster.
Please enter the port for node1 [8983]
8983
Please enter the port for node2 [7574]
7574
Starting up SolrCloud node1 on port 8983 using command:
solr start -cloud -s example/cloud/node1/solr -p 8983
Waiting to see Solr listening on port 8983 [|]
Started Solr server on port 8983 (pid=94888). Happy searching!
Starting node2 on port 7574 using command:
solr start -cloud -s example/cloud/node2/solr -p 7574 -z localhost:9983
Waiting to see Solr listening on port 7574 [|]
Started Solr server on port 7574 (pid=94979). Happy searching!
Now let's create a new collection for indexing documents in your 2-node cluster.
Please provide a name for your new collection: [gettingstarted]
gettingstarted
How many shards would you like to split gettingstarted into? [2]
2
How many replicas per shard would you like to create? [2]
2
Please choose a configuration for the gettingstarted collection, available options are:
basic_configs, data_driven_schema_configs, or sample_techproducts_configs [data_driven_schema_configs]
Error: Could not find or load main class org.apache.solr.util.SolrCLI
I am sure this used to work before.
But I am not able to figure out what's wrong.
Any help would be appreciated.

You need to run ant server to fix the classpath issue
(or ant example for older versions).
