Switching from GKE to Cloud Run - best practices for healthcheck, liveness, readiness and general monitoring

I'm looking at switching some microservices from GKE to Cloud Run, but I can't find anything about how health checks, liveness, readiness, and general monitoring are handled there, compared with how they are handled on GKE.
I assume health checks and liveness probes are only possible while a container instance is deployed on Cloud Run, and won't be available once the service is scaled down to 0. At that point my monitoring would report the service as down.
So my question is: what are the best practices for handling these on Cloud Run?

I'll attempt to answer your points one at a time.
Readiness and liveness probes: Cloud Run deploys a load balancer in front of your service and handles TLS termination for you. Cloud Run expects your container to be available (i.e., listening on 0.0.0.0 on the defined port) within 4 minutes of being deployed; otherwise, it considers the container down and tries to restart it.
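As a minimal sketch of that listening requirement (the base image and server below are illustrative, not anything from the question):

```
FROM python:3-slim
# Cloud Run injects the port to serve on via $PORT; default to 8080 for local runs.
ENV PORT=8080
# Shell-form CMD so $PORT is expanded; bind to 0.0.0.0, not 127.0.0.1.
CMD exec python3 -m http.server "$PORT" --bind 0.0.0.0
```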
Cloud Run has built-in monitoring and alerting; if you are using Cloud Ops, you don't have to do anything.
You can ensure your service will not trigger a false positive by setting min instances to 1, guaranteeing there is always at least one instance of your service available.
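For example, with the gcloud CLI (the service name and region are placeholders):

```
# Keep at least one warm instance so uptime checks never see the service scaled to zero.
gcloud run services update SERVICE --region=us-central1 --min-instances=1
```

Note that a minimum instance count means you pay for the idle instance, so weigh the cost against avoiding false "service down" alerts.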
Let me know if there is anything you want me to clarify.

Related

Are services with their own clustering mechanisms suitable for swarm

I just started learning Swarm recently, and I have some questions about its usage scenarios.
If I have a simple webserver that responds to some RESTful HTTP requests, Swarm seems to be a good choice, because if I need to scale my webserver horizontally, I just need to use docker service scale and Swarm will do the load balancing for me.
But what about services that have their own clustering mechanisms (Redis, Elasticsearch)? I cannot simply expand their capacity through docker service scale.
For instance, if I have a Redis service and run docker service scale redis=2, two separate Redis instances are created. This is obviously not what I need.
Are these services a fit for swarm mode? If so, how do I configure them in swarm mode, and how do I scale them?
Stateful services (e.g. Redis, RabbitMQ, etc.) do fit swarm mode.
It is your responsibility, though, to configure the cluster manually, either with a pre-deploy/post-deploy script or in the image's entrypoint.
Such reconfiguration should also run after each replica restart, regardless of the reason: a single replica failing and restarting, all service replicas restarting, or new replicas being added during scaling.
The content of such a script varies between clustering solutions, so refer to the relevant documentation of each one. It may be as simple as writing the replicas' virtual IPs into a configuration file, or it may involve complex, ordered steps.
General use cases common to all solutions are: configuring the cluster inside the service replicas for the first time, reconnecting a single replica back to the cluster after a failure, and restarting all replicas after a failure or a deliberate restart.
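As a rough, hypothetical sketch of the entrypoint approach: Swarm provides a tasks.<service-name> DNS entry that resolves to the IPs of all running replicas, which a wrapper script can use for peer discovery before starting the real server (the binary name and the --join flag below are placeholders, not real Redis or Elasticsearch options):

```
#!/bin/sh
# Hypothetical entrypoint: discover peer replicas through Swarm's
# tasks.<service> DNS entry, then pass the list to the actual server binary.
PEERS=$(getent ahosts tasks.myservice | awk '{ print $1 }' | sort -u | paste -sd, -)
exec /usr/local/bin/myserver --join "$PEERS"
```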
Some GitHub projects try to automate the process, for example mariadb-cluster.

Is a customized load-balancer for Docker containers possible?

I am deploying my application, which contains three services, using Docker Swarm. From what I've read so far, Docker Swarm has a load balancer to distribute containers over worker nodes automatically based on some internal factors. That is cool! But what I really need is a load balancer that distributes containers using a set of parameters provided by me. Is that possible, or is it too ambitious?
The set of parameters I mentioned is obtained by running a script or code that calculates CPU usage, bandwidth, etc. Its results would then be passed to the load balancer to drive the distribution decisions.
Thanks everyone for reading my post.
Yes, this is perfectly possible, but you will end up setting up the whole system on your own. For example, you can use Nginx as the proxy, fetch metrics from Docker Swarm or Prometheus, and pass that information to a script that regenerates the Nginx upstream block and reloads Nginx.
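A hypothetical sketch of that loop (container names, backend port, and the weighting formula are all assumptions):

```
#!/bin/sh
# Rebuild the nginx upstream block from live container CPU stats, then reload nginx.
UPSTREAM=/etc/nginx/conf.d/backend.conf

{
  echo "upstream backend {"
  for name in app1 app2; do
    cpu=$(docker stats --no-stream --format '{{.CPUPerc}}' "$name" | tr -d '%')
    ip=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$name")
    # Weight each backend inversely to its current CPU usage (floor of 1).
    weight=$(awk -v c="$cpu" 'BEGIN { w = 100 - c; if (w < 1) w = 1; printf "%d", w }')
    echo "  server $ip:8080 weight=$weight;"
  done
  echo "}"
} > "$UPSTREAM"

nginx -s reload
```

Run it from cron or a small watch loop; the same idea works with metrics pulled from Prometheus instead of docker stats.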

Is it possible to switch port binding between docker containers without downtime?

Scenario:
There is a container running image version 1.0 with its port 8080 published on localhost port 80. A new version of the image is available, and those versions need to be switched. No orchestration tool is running (Kubernetes, OpenShift, etc.).
Is it possible to start a container with version 1.1, make sure it runs without a problem, and then switch over to it?
Please keep in mind that I want to keep it simple: no replication, etc.
Just a plain Docker container with its port bound to localhost.
Questions:
1. Is it possible to switch the exposed port between containers without downtime?
2. If not, is there any mechanism built into Docker (free edition) to perform such a switch?
Without downtime, you'd need a second replica of the service up and running, and a proxy in front of that service that listens for user requests and routes them from one replica to the other. Both Swarm Mode and Kubernetes provide this capability with similar tools; the exposed port is indirectly connected to the app via either an application reverse proxy, or some iptables rules and IPVS entries in the kernel.
Out of the box, recent versions of Docker include support for Swarm Mode with nothing additional to install. You can run a simple docker swarm init to start a single-node swarm cluster in less than a second. Then, instead of docker-compose up, you switch to docker stack deploy -c docker-compose.yml $stack_name to manage your projects with almost the same compose file. For swarm mode, you'll want to be on version 3 of the compose file syntax.
For a v3-syntax compose file in swarm mode that has no outage on an update, you'll want healthchecks defined in your image to monitor the application and report back when it's ready to receive requests. Then you'll want a deploy section in the compose file that either runs multiple replicas for HA, or at least configures a single replica with a "start-first" policy to ensure the new task is up before the old one stops. See the compose docs for settings to adjust: https://docs.docker.com/compose/compose-file/#update_config
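A minimal sketch of such a compose file (the image name, port, and health endpoint are placeholders):

```
version: "3.8"
services:
  web:
    image: myapp:1.1
    ports:
      - "80:8080"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
      timeout: 3s
      retries: 3
    deploy:
      replicas: 1
      update_config:
        order: start-first        # bring the new task up before stopping the old one
        failure_action: rollback  # back out automatically if the new task never turns healthy
```

Re-running docker stack deploy -c docker-compose.yml $stack_name with a new image tag then performs the rolling update.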
For an application-based reverse proxy in Docker, I really do like Traefik, though mostly because it lets me run multiple HTTP-based container services behind a single opened port. It maps requests to the right container based on hostname, path, or HTTP headers, while at the same time offering features for migrating between versions by weighting which backend to use, so you can do more than simple round-robin load balancing during an upgrade.
There is no mechanism native to Docker that would allow you to replace one container with another with no interruption. On the other hand, the duration of the interruption can probably be measured in milliseconds; whether or not this is really an issue for you depends entirely on your application.
You can get the behavior you want by introducing a dynamic reverse proxy such as Traefik into your configuration. The proxy binds to host ports and handles requests from remote systems, then distributes those requests to one or more backend containers.
You can create and remove backend containers as you please, and as long as at least one is running your application will be available. For your specific use case, this means that you can start the new version of your application first, then retire the old one, all without any interruption in service.
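A minimal sketch of that setup using Traefik's Docker provider (the service names and router rule are illustrative):

```
# docker-compose.yml
version: "3"
services:
  proxy:
    image: traefik:v2.10
    command:
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.web.address=:80
    ports:
      - "80:80"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
  app:
    image: myapp:1.0
    labels:
      - traefik.enable=true
      - traefik.http.routers.app.rule=PathPrefix(`/`)
      - traefik.http.services.app.loadbalancer.server.port=8080
```

To upgrade, start a second container from myapp:1.1 carrying the same labels; Traefik adds it to the same backend automatically, and once it's serving traffic you can stop the 1.0 container with no gap in service.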

Unable to run a health check on a docker image deployed to Pivotal Cloud Foundry

I'm unable to run a health check other than process on a Docker image deployed to Pivotal Cloud Foundry.
I can deploy fine with health-check-type=process, but that isn't terribly useful. Once the container is up and running, I can access the health check HTTP endpoint at /nuxeo/runningstatus, but PCF doesn't seem to be able to check that endpoint, presumably because I'm deploying a pre-built Docker container rather than an app via source or a jar.
I've modified the timeout to be far longer than it needs to be, so that isn't the problem. Is there some other way of monitoring Docker containers deployed to PCF?
The problem was that the Docker container exposed two ports: one on which the health check endpoint was accessible, and another that could be used for debugging. PCF always chose to run the health check against the debug port.
There is no way to specify, for PCF, which port the health check should run against. It chooses among the exposed ports and, for a reason I don't know, always chose the one intended for debugging.
I tried reordering the ports in the Dockerfile, but that had no effect. Ultimately I just removed the debug port from being exposed in the Dockerfile, and things worked as expected.
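In Dockerfile terms, the fix amounts to something like this (the port numbers are illustrative):

```
# Before: both ports exposed; PCF kept picking the debug port for its health check.
# EXPOSE 8080 8787

# After: expose only the application port that serves /nuxeo/runningstatus.
EXPOSE 8080
```

With a single exposed port, an HTTP health check can then be enabled, e.g. cf set-health-check APP http --endpoint /nuxeo/runningstatus (exact flag support depends on your cf CLI version).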

How can I schedule tasks to CPUs?

I have some tasks defined as Docker containers, each one will consume one full CPU whilst it runs.
I'd like to run them as cost-efficiently as possible, so that VMs are only active while the tasks are running.
What's the best way to do this on Google Cloud Platform?
It seems Kubernetes and other cluster managers assume you will be running some long-lived service and have spare capacity in your cluster to schedule the containers.
Is there any program or system that I can use to define my tasks, and then start/stop VMs on a schedule to run those tasks?
I think the best way is to just use the GCP/Docker APIs and script it myself?
You're right: all the major cloud container services provide you with a cluster for running containers - GCP Container Engine, EC2 Container Service, and Azure Container Service.
In all those cases, the compute is charged by the VMs in the cluster, so you'll pay while the VMs are running. If you have an occasional workload, you'll need to script creating or starting the VMs before running your containers, and stopping or deleting them when you're done.
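A rough sketch of that scripting on GCP, using one container-optimized VM per task (names, zone, and machine type are placeholders; create-with-container requires a reasonably recent gcloud):

```
# Start a VM that boots straight into the task container.
gcloud compute instances create-with-container task-worker-1 \
  --zone=us-central1-a \
  --machine-type=n1-standard-1 \
  --container-image=gcr.io/my-project/my-task:latest \
  --container-restart-policy=never

# ...poll for task completion here (application-specific)...

# Stop paying for the VM as soon as the task is done.
gcloud compute instances delete task-worker-1 --zone=us-central1-a --quiet
```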
An exception is Joyent's cloud, which lets you run Docker containers and charges per container - that could fit your scenario.
Disclaimer - I don't work for Google, Amazon, Microsoft or Joyent. Or Samsung.
