Difference between using DSE Advanced Replication vs Manual Multi-DC Setup for Disaster Recovery? - datastax-enterprise

I want to set up up a second cluster primarily for disaster recovery. I came across DSE Advanced Replication but I'm unsure what's the difference in functionality between DSE Advanced Replication vs manually setting up a Multi-DC Setup.
DSE Advanced Replication seems easier to setup and does not interfere with the replication-factor on writes (CMIIW). Can DSE Advanced Replication be used in setting up a second cluster for disaster recovery?

Advanced replication is usually used to setup things like, spoke-hub replication - when you copy data from smaller clusters into bigger cluster, unidirectional replication, when the connection between clusters is not permanent, etc.
In multi-DC setup, your DCs are forming the single cluster. In Advanced replication you're copying data between different clusters.

Related

deploying ElasticSearch production using multiple clusters and nodes under docker

I am new to launching ES for the production environment. I want to create production-ready ElasticSearch clusters having master nodes and data and backup nodes and etc. I read tutorials on the internet regarding this matter including the official document but I cannot get my head around the topic in the official document it's running multiple clusters under one machine what if that machine goes down for some reason? where are the master nodes playing in that scenario? where are the backup nodes? to protect against data loss?
I want to know if there are any straightforward solutions that I can use for deploying the ES on multiple machines serving the same purpose (for one project with specific data types) that can be easily distributed and fault-tolerant?
Running multiple containers on a single host makes sense if you have a lot of resources on a given host that you want to partition up and use efficiently. then you can have multiple hosts with multiple Elasticsearch containers forming a cluster
If you do that, look at using allocation awareness to make sure shards are adequately balanced so that the loss of a single host will mean you maintain your data

Can I have some keyspaces replicated to some nodes?

I am trying to build multiple API for which I want to store the data with Cassandra. I am designing it as if I would have multiple hosts but, the hosts I envisioned would be of two types: trusted and non-trusted.
Because of that I have certain data which I don't want to end up replicated on a group of the hosts but the rest of the data to be replicated everywhere.
I considered simply making a node for public data and one for protected data but that would require the trusted hosts to run two nodes and it would also complicate the way the API interacts with the data.
I am building it in a docker container also, I expect that there will be frequent node creation/destruction both trusted and not trusted.
I want to know if it is possible to use keyspaces in order to achieve my required replication strategy.
You could have two Datacenters one having your public data and the other the private data. You can configure keyspace replication to only replicate that data to one (or both) DCs. See the docs on replication for NetworkTopologyStrategy
However there are security concerns here since all the nodes need to be able to reach one another via the gossip protocol and also your client applications might need to contact both DCs for different reads and writes.
I would suggest you look into configuring security perhaps SSL for starters and then perhaps internal authentication. Note Kerberos is also supported but this might be too complex for what you need at least now.
You may also consider taking a look at the firewall docs to see what ports are used between nodes and from clients so you know which ones to lock down.
Finally as the above poster mentions, the destruction / creation of nodes too often is not good practice. Cassandra is designed to be able to grow / shrink your cluster while running, but it can be a costly operation as it involves not only streaming data from / to the node being removed / added but also other nodes shuffling around token ranges to rebalance.
You can run nodes in docker containers, however note you need to take care not to do things like several containers all accessing the same physical resources. Cassandra is quite sensitive to io latency for example, several containers sharing the same physical disk might render performance problems.
In short: no you can't.
All nodes in a cassandra cluster from a complete ring where your data will be distributed with your selected partitioner.
You can have multiple keyspaces and authentication and authorziation within cassandra and split your trusted and untrusted data into different keyspaces. Or you an go with two clusters for splitting your data.
From my experience you also should not try to create and destroy cassandra nodes as your usual daily business. Adding and removing nodes is costly and needs to be monitored as your cluster needs to maintain repliaction and so on. So it might be good to split cassandra clusters from your api nodes.

Bosun HA and scalability

I have a minor bosun setup, and its collecting metrics from numerous services, and we are planning to scale these services on the cloud.
This will mean more data coming into bosun and hence, the load/efficiency/scale of bosun is affected.
I am afraid of losing data, due to network overhead, and in case of failures.
I am looking for any performance benchmark reports for bosun, or any inputs on benchmarking/testing bosun for scale and HA.
Also, any inputs on good practices to be followed to scale bosun will be helpful.
My current thinking is to run numerous bosun binaries as a cluster, backed by a distributed opentsdb setup.
Also, I am thinking is it worthwhile to run some bosun executors as plain 'collectors' of scollector data (with bosun -n command), and some to just calculate the alerts.
The problem with this approach is it that same alerts might be triggered from multiple bosun instances (running without option -n). Is there a better way to de-duplicate the alerts?
The current best practices are:
Use https://godoc.org/bosun.org/cmd/tsdbrelay to forward metrics to opentsdb. This gets the bosun binary out of the "critical path". It should also forward the metrics to bosun for indexing, and can duplicate the metric stream to multiple data centers for DR/Backups.
Make sure your hadoop/opentsdb cluster has at least 5 nodes. You can't do live maintenance on a 3 node cluster, and hadoop usually runs on a dozen or more nodes. We use Cloudera Manager to manage the hadoop cluster, and others have recommended Apache Ambari.
Use a load balancer like HAProxy to split the /api/put write traffic across multiple instances of tsdbrelay in an active/passive mode. We run one instance on each node (with tsdbrelay forwarding to the local opentsdb instance) and direct all write traffic at a primary write node (with multiple secondary/backup nodes).
Split the /api/query traffic across the remaining nodes pointed directly at opentsdb (no need to go thru the relay) in an active/active mode (aka round robin or hash based routing). This improves query performance by balancing them across the non-write nodes.
We only run a single bosun instance in each datacenter, with the DR site using the read only flag (any failover would be manual). It really isn't designed for HA yet, but in the future may allow two nodes to share a redis instance and allow active/active or active/passive HA.
By using tsdbrelay to duplicate the metric streams you don't have to deal with opentsdb/hbase replication and instead can setup multiple isolated monitoring systems in each datacenter and duplicate the metrics to whichever sites are appropriate. We have a primary and a DR site, and choose to duplicate all metrics to both data centers. I actually use the DR site daily for Grafana queries since it is closer to where I live.
You can find more details about production setups at http://bosun.org/resources including copies of all of the haproxy/tsdbrelay/etc configuration files we use at Stack Overflow.

What is the recommended replication strategy for OpsCenter keyspace?

I'm using OpsCenter to monitor and configure my Cassandra cluster (It's actually a DSE cluster) and I have a keyspace that spans multiple datacenters. The OpsCenter keyspace, which is created and maintained by OpsCenter, use SimpleStrategy as the default replication strategy, which prevents me from turning on its repair service (mentioned in OpsCenter's document).
As the Isolating OpsCenter Performance Data blog says that using a dedicated datacenter requires us to manually monitor and scale the OpsCenter nodes, I was wondering what is the recommended replication strategy and factor for the OpsCenter keyspace so that storing OpsCenter data has limited performance impact on my production nodes while requires minimal tuning when I scale my production datacenters?
Suppose my production nodes use NetworkTopologyStrategy with two datacenters 'Cassandra' and 'Solr' (in a DSE setting) where 'Cassandra' datacenter supports OLTP and 'Solr' datacenter is dedicated for searching. Is it a valid solution to set the replication of OpsCenter keyspace to { 'class' : 'NetworkTopologyStrategy', 'Cassandra': 1}?
Thanks,
Ziju
NetworkTopologyStrategy is the recommended one for cases like this, so your proposed solution is valid. RF of 1 is debatable, but since OpsCenter keyspace doesn’t usually store anything very important, it should work okay, although I’d bump that to 2.

Scaling with a cluster- best strategy

I am thinking about the best strategy to scale with a cluster of servers. I know there is no hard and fast rules, but I am curious what people think about these scenarios:
cluster of combination app/db servers that are round robin (with failover) balanced using dnsmadeeasy. the db's are synced using replication. Has the advantage that capacity can be augmented easily by adding another server to the cluster, and it is naturally failsafe.
cluster of app servers, again round robin load balanced (with failover) using dnsmadeeasy, all reporting to a big DB server in the back. easy to add app servers, but the single db server creates a single failure point. Could possible add a hot standby with replication.
cluster of app servers (as above) using two databases, one handling reads only, and one handling writes only.
Also, if you have additional ideas, please make suggestions. The data is mostly denormalized and non relational, and the DBs are 50/50 read-write.
Take 2 physical machines and make them Xen servers
A. Xen Base alpha
B. Xen Base beta
In each one do three virtual machines:
"web" server for statics(css,jpg,js...) + load balanced proxy for dynamic request (apache+mod-proxy-balancer,nginx+fair)
"app" server (mongrel,thin,passenger) for dynamic requests
"db" server (mySQL, PostgreSQL...)
Then your distribution of functions can be like this:
A1 owns your public ip and handle requests to A2 and B2
B1 pings A1 and takes over if ping fails
A2 and B2 take dynamic request querying A3 for data
A3 is your dedicated data server
B3 backups A3 second to second and offer readonly access to make copies, backups etc.
B3 pings A3 and become master if A3 becomes unreachable
Hope this can help you some way, or at least give you some ideas.
It really depends on your application.
I've spent a bit of time with various techniques for my company and what we've settled on (for now) is to run a reverse proxy/loadbalancer in front of a cluster of web servers that all point to a single master DB. Ideally, we'd like a solution where the DB is setup in a master/slave config and we can promote the slave to master if there are any issues.
So option 2, but with a slave DB. Also for high availability, two reverse proxies that are DNS round robin would be good. I recommend using a load balancer that has a "fair" algorithm instead of simple round robin; you will get better throughput.
There are even solutions to load balance your DB but those can get somewhat complicated and I would avoid them until you need it.
Rightscale has some good documentation about this sort of stuff available here: http://wiki.rightscale.com/
They provide these types of services for the cloud hosting solutions.
Particularly useful I think are these two entries with the pictures to give you a nice visual representation.
The "simple" setup:
http://wiki.rightscale.com/1._Tutorials/02-AWS/02-Website_Edition/2._Deployment_Setup
The "advanced" setup:
http://wiki.rightscale.com/1._Tutorials/02-AWS/02-Website_Edition/How_do_I_set_up_Autoscaling%3f
I'm only going to comment on the database side:
With a normal RDBMS a 50/50 read/write load for the DB will make replication "expensive" in terms of overhead. For almost all cases having a simple failover solution is less costly than implementing a replicating active/active DB setup. Both in terms of administration/maintenance and licensing cost (if applicable).
Since your data is "mostly denormalized and non relational" you could take a look at HBase which is an OSS implementation of Google Bigtable, a column based key/value database system. HBase again is built on top of Hadoop which is an OSS implementation of Google GFS.
Which solution to go with depends on your expected capacity growth where Hadoop is meant to scale to potentially 1000s of nodes, but should run on a lot less as well.
I've managed active/active replicated DBs, single-write/many-read DBs and simple failover clusters. Going beyond a simple failover cluster opens up a new dimension of potential issues you'll never see in a failover setup.
If you are going for a traditional SQL RDBMS I would suggest a relatively "big iron" server with lots of memory and make it a failover cluster. If your write ratio shrinks you could go with a failover write cluster and a farm of read-only servers.
The answer lies in the details. Is your application CPU or I/O bound? Will you require terabytes of storage or only a few GB?

Resources