Keyspace doesn't appear in OpsCenter Repair Service UI - datastax-enterprise

I recently updated to DataStax OpsCenter 6.1.0 and enabled the repair service against my cluster running DSE 5.0.5. The UI shows repair task status for tables in the OpsCenter keyspace, but none for the keyspace that stores all my application data. My keyspace is configured with NetworkTopologyStrategy and two replicas in each of two data centers.
How can I determine why OpsCenter is not repairing my keyspace (I don't see anything relevant in the logs)? Is there something specific I need to change in the configuration?

After updating to OpsCenter 6.1.3, the "Subrange" area of the repair service appears to be making significant progress for the first time. It's almost at 10%, whereas previously it never got past ~2%. I am optimistic that the upgrade has fixed the issue.
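Two checks that can help narrow this kind of thing down are confirming how the keyspace's replication looks to the cluster and whether the repair service configuration excludes it. A sketch, assuming cqlsh access and a package install of opscenterd; the keyspace name is hypothetical, and the [repair_service] section and its option names should be verified against the OpsCenter 6.1 Repair Service reference:

# Confirm the keyspace replication settings as the cluster sees them (Cassandra 3.0 / DSE 5.0)
cqlsh -e "SELECT keyspace_name, replication FROM system_schema.keyspaces WHERE keyspace_name = 'my_app_keyspace';"

# On the opscenterd host, look for repair service overrides (e.g. ignored keyspaces)
# in the per-cluster configuration file; the path below is the package-install default
grep -A 10 "repair_service" /etc/opscenter/clusters/<cluster_name>.conf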

Related

ksqlDB High Availability - how to achieve it

I'm working with a ksqlDB server deployed in Kubernetes. Since it crashed some time ago for a reason I haven't identified, I want to implement High Availability as described in https://docs.ksqldb.io/en/latest/operate-and-deploy/high-availability/
We deploy the server with Docker, so the properties we put inside the config file are:
KSQL_KSQL_STREAMS_NUM_STANDBY_REPLICAS: "2"
KSQL_KSQL_QUERY_PULL_ENABLE_STANDBY_READS: "true"
KSQL_KSQL_HEARTBEAT_ENABLE: "true"
KSQL_KSQL_LAG_REPORTING_ENABLE: "true"
When doing so and restarting the server, I can see that only the first two properties are properly set; I cannot see the last two (for example with SHOW PROPERTIES from the ksqlDB CLI).
Do you have an idea about why I can’t see them?
Do I have to manually deploy a second ksqldb server with the same ksql.service.id?
If this is the case, what is the correct way to do it? Are there particular properties to be set?
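For reference, a sketch of what a two-server HA deployment could look like in docker-compose terms, assuming the Confluent ksqldb-server image and hypothetical hostnames and ports; the key point is that both servers share the same KSQL_KSQL_SERVICE_ID and each advertises its own listener:

# fragment of a docker-compose file (hypothetical names; adapt to your Kubernetes manifests)
services:
  ksqldb-server-1:
    image: confluentinc/ksqldb-server:latest
    environment:
      KSQL_BOOTSTRAP_SERVERS: "kafka:9092"
      KSQL_LISTENERS: "http://0.0.0.0:8088"
      KSQL_KSQL_ADVERTISED_LISTENER: "http://ksqldb-server-1:8088"
      # identical service id on both servers puts them in the same ksqlDB cluster
      KSQL_KSQL_SERVICE_ID: "my_ksql_cluster_"
      # one standby is enough when there are two servers
      KSQL_KSQL_STREAMS_NUM_STANDBY_REPLICAS: "1"
      KSQL_KSQL_QUERY_PULL_ENABLE_STANDBY_READS: "true"
      KSQL_KSQL_HEARTBEAT_ENABLE: "true"
      KSQL_KSQL_LAG_REPORTING_ENABLE: "true"
  ksqldb-server-2:
    image: confluentinc/ksqldb-server:latest
    environment:
      KSQL_BOOTSTRAP_SERVERS: "kafka:9092"
      KSQL_LISTENERS: "http://0.0.0.0:8088"
      KSQL_KSQL_ADVERTISED_LISTENER: "http://ksqldb-server-2:8088"
      KSQL_KSQL_SERVICE_ID: "my_ksql_cluster_"
      KSQL_KSQL_STREAMS_NUM_STANDBY_REPLICAS: "1"
      KSQL_KSQL_QUERY_PULL_ENABLE_STANDBY_READS: "true"
      KSQL_KSQL_HEARTBEAT_ENABLE: "true"
      KSQL_KSQL_LAG_REPORTING_ENABLE: "true"

The Confluent images translate environment variables prefixed with KSQL_ into server properties (dots become underscores), which is why KSQL_KSQL_SERVICE_ID ends up as ksql.service.id.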

Is it okay to run the latest version of OpsCenter (DSE 6.0) on the same instance as an app server...?

Is it okay to run the latest version of OpsCenter (DSE 6.0) on the same instance as an app server with 8 GB of memory and 2 vCPUs for a dev server, or is OpsCenter better off on its own instance?
We've seen that it requires a good amount of resources.
For a development environment it could be OK, although it could lead to swapping or slowness if you have quite a lot of data about the cluster.
A more general recommendation is to separate the database tables used by OpsCenter from your production tables - OpsCenter writes and reads quite a lot of data, and this can affect performance. In production it is always better to keep OpsCenter's data in a separate cluster to isolate that load. Per the DataStax license, instances in a cluster used for OpsCenter's data don't require a separate license.
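As a sketch of that separate-storage setup, the per-cluster configuration file on the opscenterd host can point OpsCenter's own data at another cluster. The section and option names below follow the OpsCenter 6.x approach for storing collection data in a separate cluster, but should be verified against the documentation for your version; the addresses are hypothetical:

# /etc/opscenter/clusters/<monitored_cluster>.conf (package-install path)
[storage_cassandra]
# seed nodes of the cluster that should hold the OpsCenter keyspace
seed_hosts = 10.0.0.10, 10.0.0.11
# native protocol port of that storage cluster
api_port = 9042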

google cloud sql redmine mysql not responding

I've been trying to set up Redmine on Google Compute Engine with a MySQL 5.5 database hosted on Google Cloud SQL (D1, 512 MB of RAM, always-on, Europe, package-billed).
Unfortunately, Redmine stops responding to requests after a few minutes (it really stops: I set the timeout to one hour and nothing happens). Using New Relic I found out that it's database-related - ActiveRecord seems to have some problems with the database.
To find out whether the problems are really related to the Cloud SQL database, I set up a new database on my own server, and it has been working fine since then. So there definitely is an issue between the Cloud SQL database and Redmine/Ruby.
Does anyone have an idea what I can try to solve the problem?
Best,
Jan
GCE idle connections are closed automatically after 10 minutes, as explained in [1]. As you are connecting to Cloud SQL from a GCE instance, this is most likely the cause of your issue.
Additionally, take into account that Cloud SQL instances can go down and come back at any time due to maintenance, so connections must be managed accordingly. Checking the Cloud SQL instance operation list would confirm this. Hope this helps.
[1] https://cloud.google.com/sql/docs/gce-access
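One way to cope with idle connections being dropped underneath Rails is to let the mysql2 adapter re-establish them. A database.yml sketch with a hypothetical host and credentials (Redmine reads this file for its database connection):

# config/database.yml
production:
  adapter: mysql2
  host: <cloud-sql-instance-ip>
  database: redmine
  username: redmine
  password: "secret"
  encoding: utf8
  # re-open connections that were silently closed while idle
  reconnect: true
  pool: 5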

DSE OpsCenter showing wrong node status

I've already come across several occurrences where one or two of our DSE Search nodes are shown with "Down - Unresponsive" status in OpsCenter even though the node is up (i.e. I can access the Solr admin UI). Sometimes nodetool status also shows that the node is down, but more often it's only OpsCenter. I found out that the fix is to restart the datastax-agent service. What could be causing this?
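A sketch of that workaround, assuming a package install of the agent (service name and log path are the package defaults):

# restart the agent on the node that OpsCenter reports as down
sudo service datastax-agent restart

# then check the agent log for connection errors around the time the status flipped
less /var/log/datastax-agent/agent.log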
I'd also like to follow up on my other questions:
New Solr node in "Active - Joining" state for several days
Fault tolerance and topology transparency of multi-node DSE Cluster

Neo4j HA replication issue on 1.9.M01

I'm using Neo4j 1.9.M01 in a Spring MVC application that exposes some domain-specific REST services (read, update). The web application is deployed three times into the same web container (Tomcat 6), and each "node" has its own embedded Neo4j HA instance that is part of the same cluster.
The three Neo4j configs:
#node 1
ha.server_id=1
ha.server=localhost:6361
ha.cluster_server=localhost:5001
ha.initial_hosts=localhost:5001,localhost:5002,localhost:5003
#node 2
ha.server_id=2
ha.server=localhost:6362
ha.cluster_server=localhost:5002
ha.initial_hosts=localhost:5001,localhost:5002,localhost:5003
#node 3
ha.server_id=3
ha.server=localhost:6363
ha.cluster_server=localhost:5003
ha.initial_hosts=localhost:5001,localhost:5002,localhost:5003
Problem: when performing an update on one of the nodes, the change is replicated to only ONE other node, and the third node stays in the old state, breaking the consistency of the cluster.
I'm using the milestone because running anything outside of the web container is not allowed, so I cannot rely on the old ZooKeeper-based coordination in pre-1.9 versions.
Am I missing some configuration here, or could it be an issue with the new coordination mechanism introduced in 1.9?
This behaviour (replication to only ONE other instance) is the same default as in 1.8. It is controlled by:
ha.tx_push_factor=1
which is the default.
Slaves get updates from the master in a couple of ways:
By configuring a higher push factor, for example:
ha.tx_push_factor=2
(set it on every instance, because the value in use is the one on the current master).
By configuring a pull interval at which slaves fetch updates from the master, for example:
ha.pull_interval=1s
By manually pulling updates using the Java API
By issuing a write transaction from the slave
See further at http://docs.neo4j.org/chunked/milestone/ha-configuration.html
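Putting those two settings together, a per-instance config sketch built from the values above (apply the same additions on the other two instances):

#node 1 with push and pull configured
ha.server_id=1
ha.server=localhost:6361
ha.cluster_server=localhost:5001
ha.initial_hosts=localhost:5001,localhost:5002,localhost:5003
# master pushes committed transactions to both slaves
ha.tx_push_factor=2
# slaves also poll the master for updates every second
ha.pull_interval=1s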
A first guess would be to set
ha.discovery.enabled = false
see http://docs.neo4j.org/chunked/milestone/ha-configuration.html#_different_methods_for_participating_in_a_cluster for an explanation.
For a full analysis, could you please provide data/graph.db/messages.log from all three cluster members?
Side note: it should be possible to use 1.8 for your requirements as well. You could also spawn ZooKeeper directly from Tomcat: just mimic what bin/neo4j-coordinator does and run the class org.apache.zookeeper.server.quorum.QuorumPeerMain in a separate thread upon startup of the web application.
