We have neo4j configured in casual cluster on Kubernetnes based setup. All components are deployed on individual machine with size: t2.xlarge on aws. And we use pod affinity to schedule a deployment. While working with application under stress, we observed that there is considerable system load only one machine. For example see this:
First neo4j machine for core:
and second machine for core in same cluster:
We have bolt+router protocol configured in backend. I am not sure what causing this much resource utilization on one machine whereas others work in minimum.
I checked memory consumption on pod level as well. Neo4j-1 takes 9gb of memory whereas others are taking around 4gb. So my questions is, is this expected behavior?
I am posting my findings as answer. Not sure if its correct though. I checked status of each instance in neo4j using monitoring api's:
http://localhost:7474/db/manage/server/core/writable
This will give leader as True and False for followers. This could be the reason leader takes high resources while others are not taking much. Ref link:https://neo4j.com/docs/operations-manual/current/monitoring/causal-cluster/
Related
There are a lot of articles online about running an Elasticsearch multi-node cluster using docker-compose, including the official documentation for Elasticsearch 8.0. However, I cannot find a reason why you would set up multiple nodes on the same docker host. Is this the recommended setup for a production environment? Or is it an example of theory in practice?
You shouldn't consider this a production environment. The guides are examples, often for lab environments, and testing scenarios with the application. I would not consider them production ready, and compose is often not considered a production grade tool since everything it does is to a single docker node, where in production you typically want multiple nodes spread across multiple availability zones.
Since one ES node heap memory should never get more than half the available memory (and less than ~30.5GB), one reason it makes sense to have several nodes on a given host is when you have hosts with ample memory (say 128GB+). In that case you could run 2 ES nodes (with 64GB of memory each, 30.5GB heap and the rest for Lucene) on the same host by correctly constraining each Docker container.
Note that the above is not related to Docker, you can always configure several nodes per host, whether Docker or not.
Regarding production and given the fact that 2+ nodes would run on the same host, if you lose that host, you lose two nodes, which is not good. However, depending on how many hosts you have, it might be a lesser problem, if and only if, each host is in a different availability zone and you have the appropriate cluster/shard allocation awareness settings configured, which would ensure that your data is redundantly copied in 2+ availability zones. In this case, losing a host (2 nodes) would still keep your cluster running, although in degraded mode.
It's worth noting that Elastic Cloud Enterprise (which powers Elastic Cloud) is designed to run several nodes per hosts depending on the sizing of the nodes and the available hardware. You can find more info on hardware pre-requisites as well as how medium and large scale deployments make use of one or more large 256GB hosts per availability zones.
I have a two-node PostgreSQL cluster running on VMs where each VM runs both the pgpool service and a Postgres server.
due to insufficient memory configuration the Postgres server crashed, so I've bumped the VM memory and the changed Postgres memory config in the postgresql.conf file. since that memory changes the slave pgpool node detaches every night at a specific time, though when looking at node_exporter metrics regarding CPU, load, processes disk usage or memory didn't show any spikes or sudden changes.
the slave node detaching happened before but not day after day. I've stumbled upon this thread and read this part of the documentation about the failover but Since the Postgres server didn't crash and existing connections to the slave node were working (it kept serving existing connections but didn't take new ones) so network issues seemed irrelevant, especially after consulting with our OPS team on whether they noticed any abnormal network or DNS activity that could explain that. Unfortunately, they didn't notice any interesting findings.
I have pg_exporter, postgres_exporter and node_exporter on each node to monitor the server and VM behavior, what should I be looking for to debug this? what should I ask of our OPS team to check specifically? our pgpool log file only states the failure to access the other node but no exact reason, as the aforementioned docs say:
Pgpool-II does not distinguish each case and just decides that the
particular PostgreSQL node is not available if health check fails.
could it still be a network\DNS issue? and if so. how would I confirm this?
thnx for reading and taking your time to assist me in this conundrum
that was interesting
If summing the gist of it,
it was part of the OPS team infrastructure backups
Now the entire process went like that:
setting the ambiance:
we run on-prem on top of VMWare vCenter cluster backing up on the infra side with VMWare VM snapshot and Veeamm VM backup where the vmdk files\ESXi datastores reside on a NetApp storage based on NFS.
when checking node exporter metrics in Node Exporter Full Dashboard I saw network traffic drop in the order of up to 2 packets per second for about 5 to 15 minutes consistently through the last few months, increasing dramatically in phenomenon length in the last month (around the same time late at night).
Rough illustration:
After checking again with our OPS team they indicated it could be the host configurations\Veeam Backups.
It turns out that because the storage for the VMs (including the one that runs the Veeam backup) is attached via network and not directly on the ESXi hosts, the final snapshot saved\consolidated at that late-night time -
node detaches every night at a specific time
With the way NFS works with disk locking (limiting IOPs to existing data) along with the high IOPs requirements from the Veeam backup cause the server to hang\freeze and sometimes on rare occasions even a VM restart. here's the quote from the Veeam issue doc:
The snapshot removal process significantly lowers the total IOPS that can be delivered by the VM because of additional locks on the VMFS storage due to the increase in metadata updates
the snapshot removal process will easily push that into the 80%+ mark and likely much higher. Most storage arrays will see a significant latency penalty once IOP's get into the 80%+ mark which will of course be detrimental to application performance.
This issue occurs when the target virtual machine and the backup appliance [proxy] reside on two different hosts, and the NFSv3 protocol is used to mount NFS datastores. A limitation in the NFSv3 locking method causes a lock timeout, which pauses the virtual machine being backed up [during snapshot removal].
Obviously, that would interfere at the very least with Postgres functionality especially configured as a cluster with replication that requires a near-constant connection between the Postgres servers. A similar thread on SO using a different DB server
a solution is suggested including solving the issue presented in the last quote in this link, though for the time being, we removed the usage of Veeam backup for sensitive VMs until the solution can be verified locally (will update in the future if tried and fixed issue)
additional incidents documentation: similar issue case, suggested solution info from Veeam, third party site solution (around the same problem as a temp fix as I see it), Reddit thread acknowledging the issue and suggesting options
I currently own only one computer, and I won't have another.
I run Spark on its CPU cores : master=local[5], using it directly : I set spark-core and spark-sql for dependencies, do quite no other configuration, and my programs start immediately. It's confortable, of course.
But should I attempt to create an architecture with a master and some workers by the mean of Docker containers or minikube (Kubernetes) on my computer ?
Will solution #2 - with all the settings it requires - reward me with better performances, because Spark is truly designed to work that way, even on a single computer,
or will I loose some time, because the mode I'm currently running it, without network usage, without need of data locality will always give me better performances, and solution #1 will always be the best on a single computer ?
My hypothesis is that #1 is fine. But I have no true measurement for that. No source of comparison. Who have experienced the two manners of doing things on a sigle computer ?
It really depends on your goals - if you always will run your Spark code on the single node with local master, then just use it. But if you intend to run your resulting code in the distributed mode on multiple machines, then emulating cluster with Docker could be useful, as you'll get your code running in truly distributed manner, and you'll able to find problems that not always are found when you run your code with the local master.
Instead of direct Docker usage (that could be tricky to setup, although it's still possible), maybe you can consider to use Spark on Kubernetes, for example, via minikube - there is a plenty of articles found by Google on this topic.
Having done testing on this with executor size, the cutover from when it makes sense to use more multiple executors is # CPUs > 32. AWS EMR spark runtime defaults to at least 4 CPUs per executor and Databricks always uses fat executors which means > 32CPUS on the 8xl instances. Your greatest limitation tends to be the JVMs garbage collection which caps the size of the heap. Local mode has a couple performance advantages compared to cluster mode.
full stage code gen has to be run on both the drive and every single executor. For short queries this can add several 100MS per stage.
driver <-> executor communication has latency.
shared memory between driver and executors. This reduces the chance of OOM and reduces the amount of spilling to disk.
People end up choosing to go with multiple executors/instances not because it would be faster than a single instance but because it is the only way to scale up in terms of data volume and parallization. (also for failure recovery)
If you're feeling ambitious there's a performance testing tool called TPC-DS that runs a set of dataprocessing queries against a standardized dataset
https://github.com/databricks/spark-sql-perf
https://github.com/maropu/spark-tpcds-datagen
Also if you're feeling adventurous the spark code has a script to fire up a mini cluster on minikube if you want a quick and easy way to test this.
I'm setting up an environment for our data scientists to work on. Currently we have a single node running Jupyterhub with Anaconda and Dask installed. (2 sockets with 6 cores and 2 threads per core with 140 gb ram). When users create a LocalCluster, currently the default settings are to take all the available cores and memory (as far as I can tell). This is okay when done explicitly, but I want the standard LocalCluster to use less than this. Because almost everything we do is
Now when looking into the config I see no configuration dealing with n_workers, n_threads_per_worker, n_cores etc. For memory, in dask.config.get('distributed.worker') I see two memory related options (memory and memory-limit) both specifying the behaviour listed here: https://distributed.dask.org/en/latest/worker.html.
I've also looked at the jupyterlab dask extension, which lets me do all this. However, I can't force people to use jupyterlab.
TL;DR I want to be able set the following standard configuration when creating a cluster:
n_workers
processes = False (I think?)
threads_per_worker
memory_limit either per worker, or for the cluster. I know this can only be a soft limit.
Any suggestions for configuration is also very welcome.
As of 2019-09-20 this isn't implemented. I recommend raising an feature request at https://github.com/dask/distributed/issues/new , or even a pull request.
I'm trying to understand the benefits of Docker better and I am not really understanding how it would work in production.
Let's say I have a web frontend, a rest api backend and a db. That makes 3 containers.
Let's say that I want 3 of the front end, 5 of the backend and 7 of the db. (Minor question: Does it ever make sense to have less dbs than backend servers?)
Now, given the above scenario, if I package them all on the same host then I gain the benefit of efficiently using the resources of the host, but then I am DOA when that machine fails or has a network partition.
If I separate them into 1 full application (ie 1 FE, 1 BE & 1 DB) per host, and put extra containers on their own host, I get some advantages of using resources efficiently, but it seems to me that I still lose significantly when I have a network partition since it will take down multiple services.
Hence I'm almost leaning to the conclusion that I should be putting in 1 container per host, but then that means I am using my resources pretty inefficiently and then what are the benefits of containers in production? I mean, an OS might be an extra couple gigs per machine in storage size, but most cloud providers give you a minimum of 10 gigs storage. And let's face it, a rest api backend or a web front end is not gonna even come close to the 10 gigs...even including the OS.
So, after all that, I'm trying to figure out if I'm missing the point of containers? Are the benefits of keeping all containers of an application on 1 host, mostly tied to testing and development benefits?
I know there are benefits from moving containers amongst different providers/machines easily, but for the most part, I don't see that as a huge gain personally since that was doable with images...
Are there any other benefits for containers in production that I am missing? Or are the main benefits for testing and development? (Am I thinking about containers in production wrong)?
Note: The question is very broad and could fill an entire book but I'll shed some light.
Benefits of containers
The exciting part about containers is not about their use on a single host, but their use across hosts connected on a large cluster. Do not look at your machines as independent docker hosts, but as a pool of resource to host your containers.
Containers alone are not ground-breaking (ie. Docker's CTO stating at the last DockerCon that "nobody cares about containers"), but coupled to state of the art schedulers and container orchestration frameworks, they become a very powerful abstraction to handle production-grade software.
As to the argument that it also applies to Virtual Machines, yes it does, but containers have some technical advantage (See: How is Docker different from a normal virtual machine) over VMs that makes them convenient to use.
On a Single host
On a single host, the benefits you can get from containers are (amongst many others):
Use as a development environment mimicking the behavior on a real production cluster.
Reproducible builds independent of the host (convenient for sharing)
Testing new software without bloating your machine with packages you won't use daily.
Extending from a single host to a pool of machines (cluster)
When time comes to manage a production cluster, there are two approaches:
Create a couple of docker hosts and run/connect containers together "manually" through scripts or using solutions like docker-compose. Monitoring the lifetime of your services/containers is at your charge, and you should be prepared to handle service downtime.
Let a container orchestrator deal with everything and monitor the lifetime of your services to better cope with failures.
There are plenty of container orchestrators: Kubernetes, Swarm, Mesos, Nomad, Cloud Foundry, and probably many others. They power many large-scale companies and infrastructures, like Ebay, so they sure found a benefit in using these.
Pick the right replication strategy
A container is better used as a disposable resource meaning you can stop and restart the DB independently and it shouldn't impact the backend (other than throwing an error because the DB is down). As such you should be able to handle any kind of network partition as long as your services are properly replicated across several hosts.
You need to pick a proper replication strategy, to make sure your service stays up and running. You can for example replicate your DB across Cloud provider Availability Zones so that when an entire zone goes down, your data remains available.
Using Kubernetes for example, you can put each of your containers (1 FE, 1 BE & 1 DB) in a pod. Kubernetes will deal with replicating this pod on many hosts and monitor that these pods are always up and running, if not a new pod will be created to cope with the failure.
If you want to mitigate the effect of network partitions, specify node affinities, hinting the scheduler to place containers on the same subset of machines and replicate on an appropriate number of hosts.
How many containers per host?
It really depends on the number of machines you use and the resources they have.
The rule is that you shouldn't bloat a host with too many containers if you don't specify any resource constraint (in terms of CPU or Memory). Otherwise, you risk compromising the host and exhaust its resources, which in turn will impact all the other services on the machine. A good replication strategy is not only important at a single service level, but also to ensure good health for the pool of services that are sharing a host.
Resource constraint should be dealt with depending on the type of your workload: a DB will probably use more resources than your Front-end container so you should size accordingly.
As an example, using Swarm, you can explicitely specify the number of CPUs or Memory you need for a given service (See docker service documentation). Although there are many possibilities and you can also give an upper bound/lower bound in terms of CPU or Memory usage. Depending on the values chosen, the scheduler will pin the service to the right machine with available resources.
Kubernetes works pretty much the same way and you can specify limits for your pods (See documentation).
Mesos has more fine grained resource management policies with frameworks (for specific workloads like Hadoop, Spark, and many more) and with over-commiting capabilities. Mesos is especially convenient for Big Data kind of workloads.
How should services be split?
It really depends on the orchestration solution:
In Docker Swarm, you would create a service for each component (FE, BE, DB) and set the desired replication number for each service.
In Kubernetes, you can either create a pod encompassing the entire application (FE, BE, DB and the volume attached to the DB) or create separate pods for the FE, BE, DB+volume.
Generally: use one service per type of container. Regarding groups of containers, evaluate if it is more convenient to scale the entire group of container (as an atomic unit, ie. a pod) than to manage them separately.
Sum up
Containers are better used with an orchestration framework/platform. There are plenty of available solutions to deal with container scheduling and resource management. Pick one that might fit your use case, and learn how to use it. Always pick an appropriate replication strategy, keeping in mind possible failure modes. Specify resource constraints for your containers/services when possible to avoid resource exhaustion which could potentially lead to bringing a host down.
This depends on the type of application you run in your containers. From the top of my head I can think of a couple different ways to look at this:
is your application diskspace heavy?
do you need the application fail save on multiple machines?
can you run multiple different instance of different applications on the same host without decreasing performance of them?
do you use software like kubernetes or swarm to handle your machines?
I think most of the question are interesting to answer even without containers. Containers might free you of thinking about single hosts, but you still have to decide and measure the load of your host machines yourself.
Minor question: Does it ever make sense to have less dbs than backend servers?
Yes.
Consider cases where you hit normal(without many joins) SQL select statements to get data from the database but your Business Logic demands too much computation. In those cases you might consider keeping your Back-End Service count high and Database Service count low.
It all depends on the use case which is getting solved.
The number of containers per host depends on the design ratio of the host and the workload ratio of the containers. Both ratios are
Throughput/Capacity ratios. In the old days, this was called E/B for execution/bandwidth. Execution was cpu and banwidth was I/o. Solutions were said to be cpu or I/o bound.
Today memories are very large the critical factor is usually cpu/nest
capacity. We describe workloads as cpu intense or nest intense. A useful proxy for nest capacity is the size of highest level cache. A useful design ratio estimator is (clock x cores)/cache. Fir the same core count the machine with a lower design ratio will hold more containers. In part this is because the machine with more cache will scale better and see less saturation at higher utilization. By