datastax solr index no documents - datastax-enterprise

I have a DataStax cluster set up and am trying to run through the demo located here: https://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/srch/srchTutStrt.html. The cluster is healthy, with 2 datacenters (2 Cassandra nodes in one DC and 1 Solr node in the other). The issue I'm seeing is that the Solr admin reports no documents, even though there are records in Cassandra. Any ideas on what to look at to see why nothing is indexing?
Note: I've tried issuing a reindex from the Solr core admin page and it runs without error, but there are still no documents.
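If it helps, two things are worth checking from the command line. The keyspace and table names below are placeholders for whatever the tutorial created on your cluster; adjust for your schema. A common cause of an empty index in a multi-DC setup is that the keyspace simply does not replicate to the Solr datacenter, so the Solr node holds no rows at all.

```shell
# Check/repair replication first: with DseSimpleSnitch the DCs are typically
# named "Cassandra" and "Solr" (verify yours with `nodetool status`).
cqlsh -e "ALTER KEYSPACE mykeyspace WITH replication =
  {'class': 'NetworkTopologyStrategy', 'Cassandra': 2, 'Solr': 1};"

# Force a full rebuild of the core from the Solr node; deleteAll=true wipes
# the existing index before reindexing (DSE 4.x dsetool).
dsetool reload_core mykeyspace.mytable reindex=true deleteAll=true

# Watch the document count grow by querying the core directly over HTTP:
curl "http://localhost:8983/solr/mykeyspace.mytable/select?q=*:*&rows=0&wt=json"
```

If the replication change was the problem, run a `nodetool repair` (or rewrite the data) so the Solr node actually receives the rows before reindexing.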

Related

Couchdb 3.1.0 cluster - database failed to load after restarting one node

Here is the situation: on a CouchDB cluster made of two nodes, each node is a CouchDB Docker instance on a server (ip1 and ip2). I had to reboot one server and restart Docker; after that, both my CouchDB instances display, for each database: "This database failed to load."
I can connect with Futon and see the full list of databases, but that's all. On "Verify Couchdb Installation" in Futon I get several errors (only 'Create database' is a green check).
The Docker logs for the container give me this error:
"internal_server_error : No DB shards could be opened"
I tried to recover the database locally by copying the .couch and shards/ files to a local instance of CouchDB, but the same problem occurs.
How can I retrieve the data?
PS: I checked the connectivity between my two nodes with erl; no problem there. It looks like Docker messed up some CouchDB config file on restart.
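For diagnosing this kind of failure, a couple of read-only checks are worth running on each node (admin credentials and the database name below are placeholders). One known pitfall with Docker restarts is that the Erlang node name (e.g. couchdb@hostname) can change, and the shard map then references a node that no longer exists, which matches the "No DB shards could be opened" symptom.

```shell
# Confirm both nodes still see each other as cluster members, and that the
# node names match what was used when the cluster was created:
curl -s http://admin:password@127.0.0.1:5984/_membership

# For a broken database, inspect which nodes the shard map expects to hold
# each range; if it names a node that is no longer in _membership under the
# same name, those shards cannot be opened:
curl -s http://admin:password@127.0.0.1:5984/mydb/_shards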
metadata and cloning a node
The individual databases have metadata indicating which nodes store their shards; it is built at creation time from the cluster options, so copying database files alone does not actually move or mirror a database onto a new node. (If the metadata is set correctly, the shards are copied by CouchDB itself, so copying the files manually only speeds up the process.)
replica count
A 2-node cluster usually does not make sense. As with filesystem RAID, you can stripe for maximum performance at half the reliability, or you can create a mirror; but unless individual node state has perfect consistency detection, you cannot automatically decide which of two nodes is incorrect, whereas deciding which of 3 nodes is incorrect is easy to do automatically. Consequently, most clusters are 3 or more nodes, and each shard has 3 replicas spread over any 3 nodes.
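The arithmetic behind that: with majority-based decisions, a cluster of n nodes only tolerates floor((n-1)/2) failures, which is a quick way to see why 2 nodes buy you no fault tolerance over 1:

```shell
# Failures a majority quorum of n nodes can survive: floor((n-1)/2)
tolerable_failures() { echo $(( ($1 - 1) / 2 )); }

tolerable_failures 2   # 0 -- a 2-node cluster cannot safely lose either node
tolerable_failures 3   # 1 -- a 3-node cluster survives one failure
tolerable_failures 5   # 2
```
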
Alright, just in case someone makes the same mistake:
When you have a 2-node cluster, couchdb#ip1 and couchdb#ip2, and created the cluster from couchdb#ip1:
1) If node couchdb#ip2 stops, the cluster setup is broken (couchdb#ip1 will no longer work); on restart it appears the node does not connect correctly, and the databases are listed but are not available.
2) On the other hand, stopping and starting couchdb#ip1 does not cause any problem.
The solution in case 1 is to recreate the cluster with 2 fresh CouchDB instances (couchdb#ip1 and couchdb#ip2), then copy the databases onto one CouchDB instance, and all the databases will be back!
Can anyone explain in detail why this happened? It also means that this cluster configuration is absolutely not reliable (if couchdb#ip2 is down, nothing works); I guess it would not be the same with a 3-node cluster?

Using Gremlin-Client container and Gremlin-server with neo4j

Sorry to bother you. I am trying to set up an ecosystem for a graph database that is going to be used by an application.
I am going to use the gremlin-client container:
Gremlin-Console docker
I am also going to use the Gremlin Server container:
Gremlin-Server container
And finally i want to use the neo4j container as the storage layer:
Neo4j container
I have read all the Dockerfiles and was able to connect the console to the server. But now I need to connect the Gremlin Server container to the Neo4j container. I have found several links on the web, but I was not able to complete this task. I get a server-failure error when I try to connect to Neo4j through Gremlin Server by running the gremlin-server.sh file.
I have downloaded the repository in order to change the Dockerfile to fit my needs. Does anyone have experience with, and know the correct procedure for, connecting the Neo4j container to the Gremlin Server container and making queries through the Gremlin Console container?
Please any help would be really appreciated.
Thanks in advance,
Juan Ignacio
Since you want to use Neo4j Server you're basically asking how to connect Gremlin Server to a Neo4j Server which was asked in this question. You must either:
Configure the Neo4j graph in Gremlin Server to use HA mode as described here
Configure the Neo4j graph in Gremlin Server to use the Bolt implementation found here
Once you have Gremlin Server connected to Neo4j Server you can then connect Gremlin Console to Gremlin Server through "remoting" discussed here.
In your comments below, you alluded to the fact that you really just want to use Gremlin Console with Neo4j. I brought up the options above because you referenced use of Docker containers and specifically Neo4j Server. Note that you can get going very quickly with Neo4j in embedded mode directly in the Gremlin Console which is discussed in detail here. In that case there is no need for Docker, Neo4j Server, etc.
If you must use Neo4j Server/Docker for some reason and connect to it from the Gremlin Console, then you will still go with one of the two options discussed above, either (1) HA Mode or (2) neo4j-gremlin-bolt but you would simply create those Graph instances in Gremlin Console. For HA mode, that would mean the Gremlin Console would effectively become a node in the Neo4j cluster and for neo4j-gremlin-bolt your Graph instance would just connect over the Bolt protocol.
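As a rough sketch of option (1), the HA wiring in Gremlin Server is just a properties file referenced from gremlin-server.yaml. The directory path, server id, and host below are assumptions for illustration; they must match your Neo4j cluster's settings.

```properties
# conf/neo4j-ha.properties -- Neo4j graph in HA mode (TinkerPop neo4j-gremlin)
gremlin.graph=org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph
gremlin.neo4j.directory=/tmp/neo4j.ha
gremlin.neo4j.conf.dbms.mode=HA
gremlin.neo4j.conf.ha.server_id=3
gremlin.neo4j.conf.ha.initial_hosts=neo4j-container:5001

# gremlin-server.yaml (fragment) -- point the hosted graph at that file:
#   graphs: {
#     graph: conf/neo4j-ha.properties }
```

With that in place, the Gremlin Console reaches the graph through the usual remoting setup (`:remote connect tinkerpop.server conf/remote.yaml`), where remote.yaml points at the Gremlin Server container's host and port.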

neo4j browser not running queries spinning wheel

I'm a newbie with Neo4j and am attempting to set up a local Neo4j Causal Cluster where all nodes are on the same machine (CentOS 6.6 x64). I can log in to each node through the Firefox browser using localhost:7474, localhost:7475, and localhost:7476. However, when I try to create a node/database, it just shows a spinning wheel and never stops.
I tried using :sysinfo and it shows all nodes with correct leader and followers and port numbers.
I can use the cypher-shell and it allows me to create nodes and run match queries etc.
I've disabled the firewall, stopped iptables, and disabled proxy settings, but it still won't let me create a node through the browser.
Any help would be greatly appreciated.
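One thing worth ruling out, since all three cluster members share one machine: the browser runs queries over Bolt, not HTTP, so each instance needs its own Bolt port and the browser must be pointed at the matching one. The port numbers below are illustrative.

```properties
# neo4j.conf for the instance serving HTTP on 7475 (second member):
dbms.connector.bolt.enabled=true
dbms.connector.bolt.listen_address=:7688   # 7687 is taken by the first member
dbms.connector.http.listen_address=:7475
```

In the browser, try `:server disconnect` and reconnect with the matching `bolt://localhost:7688`; if the browser silently tries the default 7687 while the instance listens elsewhere, queries can hang exactly like this while cypher-shell (which you pointed at the right port) keeps working.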

Cassandra and lucene: User management in docker

I am using a Docker image to run cassandra with the lucene plugin: https://hub.docker.com/r/cpelka/cassandra-lucene/
I am running this image on the Google container engine.
Everything works fine, except the user management. When I log in with the cassandra/cassandra user, it seems to have no rights. I cannot list users, and I cannot change passwords.
I can access and edit tables fine though, it just does not seem to be a superuser.
Something I read is that I have to enable password authentication. I added the setting to my cassandra.yaml, but I cannot restart my Cassandra service. Hell, if I run service cassandra stop, it takes a while and stops, and then I can still connect to my DB remotely. I think the Docker image runs the database in ways that I do not understand with my one day of Cassandra experience. Any help is appreciated.
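For reference, enabling password auth is a small cassandra.yaml change, but inside a container you restart the container rather than the service: Cassandra typically runs as the container's main process, so `service cassandra stop` does not fully control it. The container name below is an assumption; on Google Container Engine you would restart the pod instead.

```shell
# cassandra.yaml: replace the defaults that ignore credentials entirely:
#   authenticator: PasswordAuthenticator
#   authorizer: CassandraAuthorizer

# Restart the container so the settings take effect:
docker restart cassandra-lucene

# Then the default superuser should have full rights:
cqlsh -u cassandra -p cassandra \
  -e "LIST USERS; ALTER USER cassandra WITH PASSWORD 'newpassword';"
```

Until the authenticator is switched on, logging in as cassandra/cassandra does not make you a superuser in any meaningful sense, which matches the behavior you are seeing.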
Thanks and good day,
Dries

DataStax Enterprise 4.8.4, 2-node cluster on AWS EC2 formation snitch guidance

I have just started working with DSE 4.8.4 in AWS EC2. I launched 2 "m3.xlarge" instances in the us-west-1a availability zone. Of course, both nodes are in the same region and the same availability zone. This is a fresh installation and has no user-defined keyspaces, no data, etc.
On both instances, DSE 4.8.4 was installed per the DataStax documentation. The 'dse' service starts on both nodes individually with the default endpoint_snitch "com.datastax.bdp.snitch.DseSimpleSnitch", and I have used private IP addresses everywhere in the cassandra.yaml file on both nodes.
Now, when I changed the 2nd node's .yaml seeds property to point to the 1st node's private IP address, the dse service on the 2nd node no longer starts, with an error indicating "Unable to find gossip ..." and a hint to fix the snitch settings.
I looked around, and it seems the snitch should be Ec2Snitch.
Q1) Do both nodes need to have the same snitch?
Q2) Will the cassandra-rackdc.properties file need any changes due to us-west-1a?
Q3) Should I follow the steps described in http://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeSingleDS.html ?
My objective is to build this 2-node cluster manually (I did not use OpsCenter) with suitable changes to the relevant config files.
I would really appreciate it if someone could point me in the right direction.
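For what it's worth, a minimal cassandra.yaml fragment for this setup might look like the following (the IPs are placeholders). The snitch must match cluster-wide, and with Ec2Snitch the region becomes the datacenter name ("us-west") and the availability zone becomes the rack ("1a"), so cassandra-rackdc.properties only needs touching if you want an optional dc_suffix.

```yaml
# cassandra.yaml -- identical structure on both nodes:
cluster_name: 'MyCluster'        # must be the same on both nodes
endpoint_snitch: Ec2Snitch
listen_address: 10.0.0.11        # this node's own private IP
rpc_address: 10.0.0.11
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.10"       # first node's private IP, on BOTH nodes
```

Start the seed node first, then the second node; `nodetool status` should then show both nodes in the same datacenter and rack.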
