What is the recommended replication strategy for OpsCenter keyspace? - datastax-enterprise

I'm using OpsCenter to monitor and configure my Cassandra cluster (it's actually a DSE cluster), and I have a keyspace that spans multiple datacenters. The OpsCenter keyspace, which is created and maintained by OpsCenter, uses SimpleStrategy as its default replication strategy, which prevents me from turning on the Repair Service (as mentioned in OpsCenter's documentation).
Since the Isolating OpsCenter Performance Data blog post says that using a dedicated datacenter requires us to manually monitor and scale the OpsCenter nodes, I was wondering: what are the recommended replication strategy and factor for the OpsCenter keyspace, so that storing OpsCenter data has limited performance impact on my production nodes while requiring minimal tuning when I scale my production datacenters?
Suppose my production nodes use NetworkTopologyStrategy with two datacenters, 'Cassandra' and 'Solr' (in a DSE setting), where the 'Cassandra' datacenter supports OLTP and the 'Solr' datacenter is dedicated to search. Is it a valid solution to set the replication of the OpsCenter keyspace to { 'class' : 'NetworkTopologyStrategy', 'Cassandra': 1 }?
Thanks,
Ziju

NetworkTopologyStrategy is the recommended strategy for cases like this, so your proposed solution is valid. An RF of 1 is debatable, but since the OpsCenter keyspace doesn't usually store anything critical, it should work okay, although I'd bump it to 2.
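For reference, a minimal sketch of the change in CQL, assuming the default keyspace name "OpsCenter" and your 'Cassandra' datacenter (adjust the names and factor to your topology):
ALTER KEYSPACE "OpsCenter"
  WITH replication = { 'class' : 'NetworkTopologyStrategy', 'Cassandra' : 2 };
After changing the replication settings, you would typically run a repair on that keyspace (e.g. nodetool repair) so the existing data is streamed to its new replicas.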

Related

Difference between using DSE Advanced Replication vs Manual Multi-DC Setup for Disaster Recovery?

I want to set up a second cluster, primarily for disaster recovery. I came across DSE Advanced Replication, but I'm unsure what the difference in functionality is between DSE Advanced Replication and manually setting up a multi-DC cluster.
DSE Advanced Replication seems easier to set up and does not interfere with the replication factor on writes (correct me if I'm wrong). Can DSE Advanced Replication be used to set up a second cluster for disaster recovery?
Advanced Replication is usually used to set up things like hub-and-spoke replication (copying data from smaller clusters into a bigger one), unidirectional replication, replication where the connection between clusters is not permanent, etc.
In a multi-DC setup, your DCs form a single cluster. With Advanced Replication, you are copying data between different clusters.
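To illustrate the difference: in a multi-DC setup, replication is declared per keyspace across the DCs of that one cluster, roughly like this (keyspace and DC names are placeholders):
CREATE KEYSPACE app_data
  WITH replication = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 3 };
Every write is then replicated to both DCs of the same cluster, whereas Advanced Replication copies data between two independent clusters, each with its own keyspaces and replication settings.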

Kubernetes Deployments across the Datacenters

Is it possible to fail over the traffic from a MySQL k8s deployment running in one datacenter to a deployment running in another datacenter, along with its storage?
If yes, do we need to spread the same k8s cluster across multiple datacenters, or do we have to run separate k8s clusters in each datacenter?
How will k8s ship or manage the storage volume across the datacenters? Do we need a special type of cloud storage for that purpose?
Note: I just quoted MySQL as an example of an application that needs to store some data; it can be anything stateful that needs to carry over its data volumes. It is not HA in the MySQL-HA sense; it is just about automatically starting to serve the application as-is from somewhere else, along with its data - any application that stores data to a volume.
How can we achieve HA for our stateful application across datacenters using k8s?
Thanks
You don't need to use Kubernetes to achieve HA.
I would recommend using MySQL Replication (i.e. a master/slave configuration) to achieve HA. There is more info in the docs on how to set replication up.
In one data center you would have the master, and in your other data center you would have the slave. You can even have multiple slaves in multiple data centers.
If problems arise on the master, you can automatically fail over to a slave using the mysqlfailover utility. This way you have your data in two data centers, kept in sync.
I'm not sure if this exactly fits your use case, but it is one option for enabling HA on your MySQL database.
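As a rough sketch of pointing the remote slave at the master (host, credentials, and binlog coordinates below are placeholders; see the MySQL replication docs for the full procedure):
-- run on the slave in the second data center
CHANGE MASTER TO
  MASTER_HOST='master.dc1.example.com',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=4;
START SLAVE;
The mysqlfailover utility mentioned above then monitors the master and promotes a slave when it becomes unreachable.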

Postgres instances on Kaa cluster nodes don't sync with each other

I have set up a Kaa cluster with two nodes.
The Postgres instance on the second node does not sync with the first one when I add a schema or an SDK. Do I need to manually set up replication between the Postgres instances,
or does Kaa handle this by itself? If so, why is my second node not in sync with the first?
admin-dao.properties
jdbc_url=jdbc:postgresql://192.168.1.21:5432,192.168.1.22:5432/kaa
sql-dao.properties
jdbc_host_port=192.168.1.21:5432,192.168.1.22:5432
Thanks
Rizwan
Yes, replication has to be set up for the databases in the cluster to sync. Kaa does not handle this synchronization itself, as per its documentation in the Architecture Overview:
http://kaaproject.github.io/kaa/docs/v0.10.0/Architecture-overview/
SQL database
SQL database instance is used to store tenants, applications, endpoint groups and other metadata that does not grow as the number of endpoints increases.
High availability of a Kaa cluster is achieved by deploying the SQL database in HA mode. Kaa officially supports MariaDB and PostgreSQL as the embedded SQL databases at the moment.
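As a rough sketch, classic streaming replication between the two nodes from the question would look something like this (values are illustrative; exact parameters and file names depend on your PostgreSQL version):
# postgresql.conf on the primary (192.168.1.21)
wal_level = hot_standby          # 'replica' on newer PostgreSQL versions
max_wal_senders = 3
# pg_hba.conf on the primary: allow the standby to connect for replication
host  replication  repl_user  192.168.1.22/32  md5
# postgresql.conf on the standby (192.168.1.22)
hot_standby = on
# recovery.conf on the standby (pre-PostgreSQL 12)
standby_mode = 'on'
primary_conninfo = 'host=192.168.1.21 port=5432 user=repl_user password=secret'
The standby is normally seeded from the primary with pg_basebackup before being started. Note that this gives you a read-only standby; automatic failover of writes needs additional tooling.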

Bosun HA and scalability

I have a small Bosun setup collecting metrics from numerous services, and we are planning to scale these services in the cloud.
This will mean more data coming into Bosun, which affects its load, efficiency, and ability to scale.
I am afraid of losing data due to network overhead and in case of failures.
I am looking for any performance benchmark reports for Bosun, or any input on benchmarking/testing Bosun for scale and HA.
Also, any input on good practices to follow when scaling Bosun would be helpful.
My current thinking is to run numerous Bosun binaries as a cluster, backed by a distributed OpenTSDB setup.
I am also wondering whether it is worthwhile to run some Bosun instances as plain 'collectors' of scollector data (with the bosun -n command) and others just to calculate the alerts.
The problem with this approach is that the same alerts might be triggered from multiple Bosun instances (those running without the -n option). Is there a better way to de-duplicate the alerts?
The current best practices are:
Use https://godoc.org/bosun.org/cmd/tsdbrelay to forward metrics to opentsdb. This gets the bosun binary out of the "critical path". It should also forward the metrics to bosun for indexing, and can duplicate the metric stream to multiple data centers for DR/Backups.
Make sure your hadoop/opentsdb cluster has at least 5 nodes. You can't do live maintenance on a 3 node cluster, and hadoop usually runs on a dozen or more nodes. We use Cloudera Manager to manage the hadoop cluster, and others have recommended Apache Ambari.
Use a load balancer like HAProxy to split the /api/put write traffic across multiple instances of tsdbrelay in an active/passive mode. We run one instance on each node (with tsdbrelay forwarding to the local opentsdb instance) and direct all write traffic at a primary write node (with multiple secondary/backup nodes).
Split the /api/query traffic across the remaining nodes, pointed directly at opentsdb (no need to go through the relay), in an active/active mode (aka round robin or hash-based routing). This improves query performance by balancing queries across the non-write nodes.
We only run a single bosun instance in each datacenter, with the DR site using the read only flag (any failover would be manual). It really isn't designed for HA yet, but in the future may allow two nodes to share a redis instance and allow active/active or active/passive HA.
By using tsdbrelay to duplicate the metric streams you don't have to deal with opentsdb/hbase replication and instead can setup multiple isolated monitoring systems in each datacenter and duplicate the metrics to whichever sites are appropriate. We have a primary and a DR site, and choose to duplicate all metrics to both data centers. I actually use the DR site daily for Grafana queries since it is closer to where I live.
You can find more details about production setups at http://bosun.org/resources including copies of all of the haproxy/tsdbrelay/etc configuration files we use at Stack Overflow.
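A minimal sketch of the HAProxy split described above (addresses, names, and ports are placeholders):
# haproxy.cfg fragment
frontend opentsdb_in
    mode http
    bind *:4242
    acl is_put path_beg /api/put
    use_backend tsdbrelay_write if is_put
    default_backend opentsdb_read
backend tsdbrelay_write
    mode http
    # active/passive: writes go to the primary relay; the 'backup' node takes over on failure
    server relay1 10.0.0.11:4242 check
    server relay2 10.0.0.12:4242 check backup
backend opentsdb_read
    mode http
    # active/active: round-robin /api/query across the non-write opentsdb nodes
    balance roundrobin
    server tsdb1 10.0.0.13:4242 check
    server tsdb2 10.0.0.14:4242 check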

Scaling with a cluster - best strategy

I am thinking about the best strategy to scale with a cluster of servers. I know there are no hard and fast rules, but I am curious what people think about these scenarios:
A cluster of combined app/db servers that are round-robin (with failover) balanced using dnsmadeeasy, with the DBs kept in sync using replication. This has the advantage that capacity can be augmented easily by adding another server to the cluster, and it is naturally failsafe.
A cluster of app servers, again round-robin load balanced (with failover) using dnsmadeeasy, all reporting to a big DB server in the back. It is easy to add app servers, but the single DB server creates a single point of failure. A hot standby with replication could possibly be added.
A cluster of app servers (as above) using two databases, one handling reads only and one handling writes only.
Also, if you have additional ideas, please make suggestions. The data is mostly denormalized and non-relational, and the DBs are 50/50 read/write.
Take two physical machines and make them Xen servers:
A. Xen base alpha
B. Xen base beta
In each one, create three virtual machines:
a "web" server for statics (css, jpg, js...) plus a load-balancing proxy for dynamic requests (apache + mod_proxy_balancer, nginx + fair)
an "app" server (mongrel, thin, passenger) for dynamic requests
a "db" server (MySQL, PostgreSQL...)
Then your distribution of functions can look like this:
A1 owns your public IP and hands requests to A2 and B2
B1 pings A1 and takes over if the ping fails
A2 and B2 take dynamic requests, querying A3 for data
A3 is your dedicated data server
B3 backs up A3 second by second and offers read-only access for making copies, backups, etc.
B3 pings A3 and becomes master if A3 becomes unreachable
Hope this helps you in some way, or at least gives you some ideas.
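As a rough illustration of the proxy piece on A1 (host names and ports are made up; requires mod_proxy, mod_proxy_http and mod_proxy_balancer):
# balance dynamic requests across the app VMs A2 and B2, serve statics locally
<Proxy balancer://appcluster>
    BalancerMember http://a2.internal:8000
    BalancerMember http://b2.internal:8000
</Proxy>
ProxyPass        /app/ balancer://appcluster/
ProxyPassReverse /app/ balancer://appcluster/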
It really depends on your application.
I've spent a bit of time with various techniques for my company, and what we've settled on (for now) is to run a reverse proxy/load balancer in front of a cluster of web servers that all point to a single master DB. Ideally, we'd like a solution where the DB is set up in a master/slave configuration and we can promote the slave to master if there are any issues.
So option 2, but with a slave DB. Also, for high availability, two reverse proxies in DNS round robin would be good. I recommend using a load balancer that has a "fair" algorithm instead of simple round robin; you will get better throughput.
There are even solutions to load balance your DB, but those can get somewhat complicated, and I would avoid them until you need it.
Rightscale has some good documentation about this sort of thing available here: http://wiki.rightscale.com/
They provide these types of services for cloud hosting solutions.
Particularly useful, I think, are these two entries, with pictures that give you a nice visual representation.
The "simple" setup:
http://wiki.rightscale.com/1._Tutorials/02-AWS/02-Website_Edition/2._Deployment_Setup
The "advanced" setup:
http://wiki.rightscale.com/1._Tutorials/02-AWS/02-Website_Edition/How_do_I_set_up_Autoscaling%3f
I'm only going to comment on the database side:
With a normal RDBMS, a 50/50 read/write load will make replication "expensive" in terms of overhead. For almost all cases, having a simple failover solution is less costly than implementing a replicating active/active DB setup, both in terms of administration/maintenance and in licensing cost (if applicable).
Since your data is "mostly denormalized and non relational", you could take a look at HBase, an OSS implementation of Google Bigtable, a column-based key/value database system. HBase in turn is built on top of Hadoop, which is an OSS implementation of Google GFS.
Which solution to go with depends on your expected capacity growth; Hadoop is meant to scale to potentially thousands of nodes, but it should run on a lot fewer as well.
I've managed active/active replicated DBs, single-write/many-read DBs, and simple failover clusters. Going beyond a simple failover cluster opens up a whole new dimension of potential issues you'll never see in a failover setup.
If you are going with a traditional SQL RDBMS, I would suggest a relatively "big iron" server with lots of memory and making it a failover cluster. If your write ratio shrinks, you could go with a failover write cluster and a farm of read-only servers.
The answer lies in the details: is your application CPU-bound or I/O-bound? Will you require terabytes of storage or only a few GB?
