Default timeout and keep-alive settings for Memorystore

We're planning a migration from our own hosted redis instances to Google Cloud Memorystore. But there seems to be one thing I cannot find any answers to in the documentation:
default memorystore timeout settings
default tcp keep-alive settings
We have some connection leaks in one part of the app (due to preemptible nodes and OOM crashes in Kubernetes) and need to make sure these are set to some sane value.
They are not under "Modifiable": https://cloud.google.com/memorystore/docs/reference/redis-configs#modifiable_configuration_parameters
But also not listed under "Unmodifiable": https://cloud.google.com/memorystore/docs/reference/redis-configs#unmodifiable_configuration_parameters
How can I figure out these settings (and others)? (CONFIG is a blocked command)

Because Cloud Memorystore is a fully managed service, you can change only a very limited set of parameters. As stated in the relevant section of the documentation:
Most parameters are preconfigured for Cloud Memorystore for Redis instances, and you cannot change them. Other parameters you configure when you set up your Cloud Memorystore for Redis instance. For more information, see Redis Configuration Parameters.
As of now, and as you mention, those are the only modifiable parameters, and you can't modify the timeout or tcp-keepalive parameters.
The documentation refers to the stock redis.conf file, in which the default values are:
timeout 0
tcp-keepalive 300
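Since these server-side values can't be changed, one client-side mitigation for leaked connections is to enable TCP keepalive and timeouts on the client itself. A minimal redis-py sketch, where the host/IP and the specific timeout values are placeholders rather than anything Memorystore-specific:

```python
import redis

# Client-side mitigation: since the server-side `timeout`/`tcp-keepalive`
# values on Memorystore cannot be changed, set keepalive and timeouts on
# the client socket instead. Host and values below are placeholders.
client = redis.Redis(
    host="10.0.0.3",           # Memorystore instance IP (placeholder)
    port=6379,
    socket_timeout=5,          # fail fast on dead connections (seconds)
    socket_connect_timeout=2,  # connection establishment timeout (seconds)
    socket_keepalive=True,     # enable TCP keepalive on the client socket
    health_check_interval=30,  # ping idle connections before reusing them
)
client.ping()
```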

Related

Configuration Dask Distributed

I'm setting up an environment for our data scientists to work on. Currently we have a single node running JupyterHub with Anaconda and Dask installed (2 sockets with 6 cores and 2 threads per core, and 140 GB of RAM). When users create a LocalCluster, the current default is to take all the available cores and memory (as far as I can tell). This is okay when done explicitly, but I want the standard LocalCluster to use less than this. Because almost everything we do is
Now when looking into the config I see no configuration dealing with n_workers, n_threads_per_worker, n_cores etc. For memory, in dask.config.get('distributed.worker') I see two memory related options (memory and memory-limit) both specifying the behaviour listed here: https://distributed.dask.org/en/latest/worker.html.
I've also looked at the jupyterlab dask extension, which lets me do all this. However, I can't force people to use jupyterlab.
TL;DR I want to be able to set the following standard configuration when creating a cluster:
n_workers
processes = False (I think?)
threads_per_worker
memory_limit either per worker, or for the cluster. I know this can only be a soft limit.
Any suggestions for configuration are also very welcome.
As of 2019-09-20 this isn't implemented. I recommend raising a feature request at https://github.com/dask/distributed/issues/new , or even a pull request.
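In the meantime, one workaround (a team convention rather than a Dask feature) is to give users a small helper module that applies the defaults you want when creating a LocalCluster. A minimal sketch, where the module name, helper name, and the specific default values are made up:

```python
# cluster_defaults.py - hypothetical helper the team imports instead of
# calling LocalCluster directly; the default values are illustrative only.
from dask.distributed import Client, LocalCluster


def make_cluster(n_workers=4, threads_per_worker=2, memory_limit="8GB"):
    """Create a LocalCluster with conservative, team-wide defaults."""
    cluster = LocalCluster(
        n_workers=n_workers,
        threads_per_worker=threads_per_worker,
        processes=True,              # set False for a threads-only cluster
        memory_limit=memory_limit,   # soft limit, applied per worker
    )
    return Client(cluster)
```

Users would then call `client = make_cluster()` in their notebooks instead of constructing a LocalCluster by hand.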

AWS ELB HealthCheck Improvements

All,
We recently had an issue where the ELB HealthCheck missed a certain use case, which caused an application impact.
Can anyone suggest a fault-tolerant approach to handle this?
We have a Node.js app running on port 80.
We have 3 instances in the Target Group, which is registered with the ELB.
The ELB HealthCheck was configured to hit the root path on port 80 and report success if it received an HTTP 200.
Recently one of the nodes had the application mount 100% full while the root mount still had free space.
Although the HealthCheck kept succeeding as far as the ELB was concerned, the server didn't respond to any other requests and was effectively unhealthy. This meant that some requests succeeded while others (those routed to the disk-filled server) failed.
We did receive notifications about the disk filling up from other monitoring systems, but due to the overwhelming volume of email and limited resources they were missed.
Is there a way to improve the HealthCheck strategy so that scenarios like this are reported to the Auto Scaling Group or ELB, and the affected nodes are removed and replaced automatically?
Rather than just checking that the index.htm page returns a 200 response, you can configure Elastic Load Balancing to point to a custom Health Check page (eg healthcheck.php).
You could run some code on that page to test the general health of the application (database connectivity, disk space, free memory). If everything checks out OK, return a 200 response. If something is wrong, return a 500 response. This will cause the Load Balancer to treat the instance as Unhealthy and it will stop serving traffic to the instance.
If Auto Scaling is configured to use the ELB Health Check, then Auto Scaling will terminate the unhealthy instance and automatically replace it with a new instance.
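For illustration, a health check endpoint along these lines could look like the Python/Flask sketch below (the question's app is Node.js and the answer mentions PHP; the route name, mount path, threshold, and port are assumptions):

```python
import shutil
from flask import Flask

app = Flask(__name__)

APP_MOUNT = "/var/app"               # application mount to watch (assumed path)
MIN_FREE_BYTES = 512 * 1024 * 1024   # require at least 512 MB free (assumed)


@app.route("/healthcheck")
def healthcheck():
    # Fail the ELB health check if the application mount is nearly full,
    # even though the web server itself can still answer on "/".
    usage = shutil.disk_usage(APP_MOUNT)
    if usage.free < MIN_FREE_BYTES:
        return "application disk almost full", 500
    # Additional checks (database connectivity, free memory, ...) go here.
    return "OK", 200


if __name__ == "__main__":
    # Point the target group's health check at this path/port (assumed 8080).
    app.run(host="0.0.0.0", port=8080)
```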

Can a bastion host be launched by auto-scaling-group for failure recovery?

Can I launch a bastion host through an Auto Scaling Group, with "MinSize": 1 and "DesiredCapacity": 1?
I understand that normally an ASG is used along with ELB or SQS and CloudWatch for load-balancing or scaling purposes, and my purpose here is different -- I want to keep my bastion machine up and running, and once it goes down, I want to bring it back as soon as possible. (I don't need my bastion host to be "HA", but I'd like it to recover automatically, say within 3 minutes.)
Is there such a use case for an Auto Scaling Group?
Yes, using an Auto Scaling Group in this fashion will ensure that the host is replaced automatically if it fails EC2 health checks.
However, this is no longer the most up-to-date way to solve your problem. EC2 has supported Auto Recovery since last year. Recovery can be configured to perform a variety of actions on an instance that fails EC2 health checks. The advantage it has over Auto Scaling is that things like Elastic IPs can be migrated over to the new instance. The docs contain all the information you'll need to set this up.
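As an illustration of the Auto Recovery route, the recover action is just a CloudWatch alarm on the EC2 system status check. A minimal boto3 sketch, where the region, instance ID, alarm name, and thresholds are placeholders:

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # placeholder region

cloudwatch.put_metric_alarm(
    AlarmName="bastion-auto-recover",
    Namespace="AWS/EC2",
    MetricName="StatusCheckFailed_System",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    Statistic="Minimum",
    Period=60,
    EvaluationPeriods=2,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    # The "recover" action moves the instance to healthy hardware while keeping
    # its instance ID, private IP, and any attached Elastic IP.
    AlarmActions=["arn:aws:automate:us-east-1:ec2:recover"],
)
```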
Yes, that's a valid use case.
Auto Scaling Groups force you to set up automatically creatable instances: you define a launch configuration that specifies things like the instance type and the image you want to launch, and the number of instances in the group.
When you set the desired instances to '1', the Auto Scaling Group (ASG) will start enforcing that one instance is running.
Problem: the instance gets assigned a different IP each time it boots, so you won't know where to reach it.
There are two ways around this:
- use an ELB so you can always reach it at the ELB's address. When only running one instance, this is kind of overkill
- have the instance assign itself an Elastic IP when it boots. I don't think Amazon supports this out of the box yet, but you can find scripts that do this for you on the web (a rough sketch follows below)
Note that this setup won't prevent failure, but once an instance fails it's a matter of terminating it, and a new one will be back up in 5 minutes or so.
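The "assign an Elastic IP at boot" approach from the list above can be a few lines of boto3 run from the instance's user data. A rough sketch, where the region and allocation ID are placeholders and the instance is assumed to have an IAM role allowing ec2:AssociateAddress:

```python
import urllib.request

import boto3

# Discover this instance's ID from the EC2 instance metadata service.
instance_id = urllib.request.urlopen(
    "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
).read().decode()

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

# Attach a pre-allocated Elastic IP to whichever instance the ASG just launched.
ec2.associate_address(
    InstanceId=instance_id,
    AllocationId="eipalloc-0123456789abcdef0",  # placeholder allocation ID
    AllowReassociation=True,
)
```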
Refer to the following link from Amazon on the architecture and best practices for a bastion host: http://docs.aws.amazon.com/quickstart/latest/linux-bastion/architecture.html

microservices & service discovery with random ports

My question is related to microservices & service discovery of a service which is spread between several hosts.
The setup is as follows:
2 docker hosts (host A & host B)
a Consul server (service discovery)
Let’s say that I have 2 services:
service A
service B
Service B is deployed 10 times (with random ports): 5 times on host A and 5 times on host B.
When service A communicates with service B, for example, it sends a request to serviceB.example.com (hard coded).
In order to get an IP and a port, service A should query the Consul server for an SRV record.
It will get 10 ip:port pairs, for which the client should apply some load-balancing logic.
Is there a simpler way to handle this without me developing a client resolver (+LB) library for that matter?
Is there anything like that already implemented somewhere?
Am I doing it all wrong?
There are a few options:
Load balance on the client as you suggest, for which you'll either need to find a ready-built service-discovery library that works with SRV records and handles load balancing and circuit breaking, or build your own. Another answer suggested Netflix's Ribbon, which I have not used and which will only be interesting if you are on the JVM. Note that if you are building your own, you might find it simpler to just use Consul's HTTP API for discovering services than DNS SRV records. That way you can "watch" for changes too, rather than caching the list and letting it get stale (a minimal sketch of the HTTP API approach follows these two options).
If you don't want to reinvent that particular wheel, another popular and simple option is to use an HAProxy instance as the load balancer. You can integrate it with Consul via consul-template, which will automatically watch for new/failed instances of your services and update the LB config. HAProxy then provides robust load balancing and health checking with a lot of options (HTTP/TCP, different balancing algorithms, etc.). One possible setup is to have a local HAProxy instance on each Docker host and a fixed port assigned statically to each logical service (you can store it in Consul KV), so you connect to localhost:1234 for service A, for example, and localhost:2345 for service B. A local instance means you don't pay for an extra round trip to a load-balancer instance and then to the actual service instance, but this might not be an issue for you.
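As mentioned in the first option, Consul's HTTP API can be simpler to consume than SRV records. A minimal Python sketch of naive client-side discovery with random selection (the agent address and service name are illustrative, and a real client would also cache or watch rather than query on every call):

```python
import random

import requests

CONSUL = "http://127.0.0.1:8500"  # local Consul agent (assumed address)


def resolve(service_name):
    """Return a random healthy (host, port) pair for a service via Consul's HTTP API."""
    # /v1/health/service/<name>?passing=true returns only instances whose
    # health checks are currently passing.
    resp = requests.get(
        f"{CONSUL}/v1/health/service/{service_name}",
        params={"passing": "true"},
        timeout=2,
    )
    resp.raise_for_status()
    instances = resp.json()
    if not instances:
        raise RuntimeError(f"no healthy instances of {service_name}")
    chosen = random.choice(instances)
    service = chosen["Service"]
    # Service.Address may be empty, in which case fall back to the node address.
    host = service["Address"] or chosen["Node"]["Address"]
    return host, service["Port"]


host, port = resolve("serviceB")
```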
I suggest you check out Kontena. It solves this kind of problem out of the box. Every service gets an internal DNS name that you can use for communication between services. Kontena also has a built-in load balancer that is very easy to use, which makes it simple to create and scale microservices.
There are also lots of built-in features that help with developing containerized applications, like a private image registry, VPN access to running services, secrets management, stateful services, etc.
Kontena is an open-source project and the code is available on GitHub.
If you are looking for a minimal setup, you can wrap the values you receive from Consul with Ribbon, Netflix's client-side load balancer.
You will find it as a module for Spring Cloud.
I didn't find an up-to-date standalone example, only this link to chrisgray's dropwizard-consul implementation, which uses it in a Dropwizard context. But it might serve as a starting point for you.

Best practice for rate limiting users of a REST API?

I am putting together a REST API and as I'm unsure how it will scale or what the demand for it will be, I'd like to be able to rate limit uses of it as well as to be able to temporarily refuse requests when the box is over capacity or if there is some kind of slashdotted scenario.
I'd also like to be able to gracefully bring the service down temporarily (while giving clients results that indicate the main service is offline for a bit) when/if I need to scale the service by adding more capacity.
Are there any best practices for this kind of thing? The implementation is Rails with MySQL.
This is all done with an outer web server which listens to the world (I recommend nginx or lighttpd).
Regarding rate limits, nginx is able to limit, e.g. 50 requests/minute per IP; everything over that gets a 503 page, which you can customize.
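For reference, the per-IP limit described above maps to nginx's limit_req module. A minimal config sketch, where the zone name, zone size, burst value, and upstream address are arbitrary choices for illustration:

```nginx
events {}

http {
    # Track clients by IP address; allow 50 requests per minute per IP.
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=50r/m;

    upstream rails_app {
        server 127.0.0.1:3000;   # placeholder address of the Rails app server
    }

    server {
        listen 80;

        location / {
            # Requests beyond the rate (plus a small burst) are rejected with
            # a 503, whose error page can be customized.
            limit_req zone=api_limit burst=10 nodelay;
            limit_req_status 503;
            proxy_pass http://rails_app;
        }
    }
}
```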
Regarding expected temporary downtime, in the Rails world this is done via a special maintenance.html page. There is some kind of automation that creates or symlinks that file when the Rails app servers go down. I'd recommend relying not on file presence, but on the actual availability of the app server.
But really, you are able to start/stop services without losing any connections at all. I.e. you can run a separate instance of the app server on a different UNIX socket/IP port and have the balancer (nginx/lighty/haproxy) use that new instance too. Then you shut down the old instance and all clients are served by the new one only; no connections are lost. Of course this scenario is not always possible; it depends on the type of change you introduced in the new version.
haproxy is a balancer-only solution. It can balance requests to the app servers in your farm extremely efficiently.
For a fairly big service you end up with something like:
api.domain resolving to round-robin N balancers
each balancer proxying requests to M web servers for static content and P app servers for dynamic content. Oh well, your REST API doesn't have static files, does it?
For a fairly small service (under 2K rps) all the balancing is done inside one or two web servers.
Good answers already - if you don't want to implement the limiter yourself, there are also solutions like 3scale (http://www.3scale.net) which does rate limiting, analytics etc. for APIs. It works using a plugin (see here for the ruby api plugin) which hooks into the 3scale architecture. You can also use it via varnish and have varnish act as a rate limiting proxy.
I'd recommend implementing the rate limits outside of your application, since otherwise high traffic will still have the effect of killing your app. One good solution is to implement it as part of your Apache proxy, with something like mod_evasive.
