How can I ensure traffic is served only by my new Cloud Run service deployment?

I am seeing in my Stackdriver logs that I am running an old revision of a Cloud Run service, while the console shows only one newer revision as live.
In this case I have a Cloud Run container built in one GCP project but deployed in a second project, using the fully specified image name. (My attempts at Terraform were sidetracked into an auth catch-22, so until I have restored determination, I am deploying manually.) I don't know whether this two-project wrinkle is relevant to my question; otherwise all the auth works.
In brief: why might I be seeing old revisions receiving traffic minutes after the new deployment is made? Even more than 30 minutes later, traffic is still reaching the old revision.

There are a few things to take into account here:
Try explicitly telling Cloud Run to migrate all traffic to the latest revision. You can do that by adding the following to your .yaml file:
metadata:
  name: SERVICE
spec:
  ...
  traffic:
  - latestRevision: true
    percent: 100
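If you manage the service from a YAML file like this, you can apply it in one step. A minimal sketch, assuming the file is named service.yaml (the file name and region are placeholders):
# Replace the live service definition with the contents of the file:
gcloud run services replace service.yaml --region REGION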
Try always adding the :latest tag when building a new image, so instead of having, say, gcr.io/project/newimage you would have gcr.io/project/newimage:latest. This way you ensure the latest image is used, not one under a previously (automatically) assigned tag.
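For example, a tag-and-deploy flow along these lines (the image, project, and service names here are placeholders, not taken from the question):
# Build and push with an explicit :latest tag:
docker build -t gcr.io/PROJECT/newimage:latest .
docker push gcr.io/PROJECT/newimage:latest
# Deploy referencing the same tag:
gcloud run deploy SERVICE --image gcr.io/PROJECT/newimage:latest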
If neither fixes your issue, then please provide the logs, as they might contain something useful that indicates the root cause. (Also let us know if you are using any caching configuration.)

You can tell Cloud Run to route all traffic to the latest revision with gcloud run services update-traffic [SERVICE NAME] --to-latest. That will route all traffic to the latest deployment, and keep the traffic allocation updated as you deploy new revisions.
You might not want to use this if you need to validate the service after deployment and before "opening the floodgates", or if you're doing canary deployments.
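As a sketch (the service and revision names are placeholders):
# Route 100% of traffic to the latest revision, now and on future deploys:
gcloud run services update-traffic SERVICE --to-latest
# Or split traffic explicitly for a canary-style rollout:
gcloud run services update-traffic SERVICE \
    --to-revisions=SERVICE-00002-abc=10,SERVICE-00001-def=90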

Related

gitlab on kubernetes/docker: pipeline failing: Error cleaning up configmap: resource name may not be empty

We run gitlab-ee-12.10.12.0 under Docker and use Kubernetes to manage the gitlab-runner.
All of a sudden, a couple of days ago, all my pipelines in all my projects stopped working. NOTHING CHANGED except that I pushed some code, yet ALL projects (even those with no repo changes) are failing. I've looked at every certificate I can find anywhere in the system and they're all good, so it wasn't a cert expiry. Disk space is at 45%, so it's not that. Nobody logged into the server. Nobody touched any admin screens. One code push triggered the pipeline successfully; the next one didn't. I've looked at everything. I've updated the Docker images for gitlab and gitlab-runner. I've deleted every Kubernetes pod I can find in the namespace and let them get relaunched (my go-to for solving k8s problems :-) ).
Every pipeline run in every project now says this:
Running with gitlab-runner 14.3.2 (e0218c92)
on Kubernetes Runner vXpkH225
Preparing the "kubernetes" executor
00:00
Using Kubernetes namespace: gitlab
Using Kubernetes executor with image lxnsok01.wg.dir.telstra.com:9000/broadworks-build:latest ...
Using attach strategy to execute scripts...
Preparing environment
00:00
ERROR: Error cleaning up configmap: resource name may not be empty
ERROR: Job failed (system failure): prepare environment: setting up build pod: error setting ownerReferences: configmaps "runner-vxpkh225-project-47-concurrent-0-scripts9ds4c" is forbidden: User "system:serviceaccount:gitlab:gitlab" cannot update resource "configmaps" in API group "" in the namespace "gitlab". Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information
That URL talks about bash logout scripts containing bad things. But nothing changed. At least we didn't change anything.
I believe the second error, implying that the user doesn't have permissions, is not correct; it seems to just be saying that the user couldn't do it. The primary error is the previous one, about the configmap cleanup. Again, no serviceaccounts, roles, rolebindings, etc. have changed in any way.
So I'm trying to work out what may CAUSE that error. What does it MEAN? What resource name is empty? Where can I find out?
I've checked the output from "docker container logs " and it says exactly what's in the error above. No more, no less.
The only thing I can think of is that perhaps 14.3.2 of gitlab-runner doesn't like my k8s or the config. Going back and checking, it seems this has changed: previous working pipelines ran on 14.1.
So two questions then: 1) Any ideas how to fix the problem (e.g. update some config, clear some crud, whatever)? And 2) How do I get gitlab to use a runner other than :latest?
Turns out something DID change: gitlab-runner changed, and Kubernetes pulled gitlab/gitlab-runner:latest between runs. It seems gitlab-runner 14.3 has a problem with my Kubernetes. I went back through my pipelines, and the last successful one was using 14.1.
So, after a day of working through it, I edited the relevant k8s deployment to pin the image tag used for gitlab-runner to :v14.1.0, which is the last one that worked for me.
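If the runner is managed by a plain Kubernetes Deployment, pinning the tag can be a one-liner along these lines (the namespace, deployment, and container names here are assumptions):
# Pin the runner image to a known-good version instead of :latest:
kubectl -n gitlab set image deployment/gitlab-runner \
    gitlab-runner=gitlab/gitlab-runner:v14.1.0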
Maybe I'll wait a few weeks and try a later one (now that I know how to easily change that tag) and see if the issue gets fixed. And perhaps go raise an issue on gitlab-runner.

HTTP 503 errors from Cloud Run app in one GCP project but not the other

The issue
I am using the same container (with similar resources) in two projects: production and staging. Both have custom domains set up with Cloudflare DNS and are in the same region. The container build is done in a completely different project, and IAM is used to handle access to these containers. Both projects' services have a concurrency of 80 and a 300-second timeout, for all 5 services.
All was working well three days back, but since yesterday almost all Cloud Run services on staging (thankfully) started throwing 503s randomly and for most requests. Some services had not even been deployed for a week. The same containers are running fine on the production project: no issues.
Ruled-out causes
anything to do with Cloudflare (I tried the URL Cloud Run gives; it has the 503 issue too)
anything with the build or containers (I tried the demo hello-world container in Go; it has the issue too)
resources: I tried giving it 1 GB of RAM and 2 CPUs, but the problem persisted
issues with the deployment (deployed multiple branches; didn't work)
issue in the code (routed traffic to a 2-3 day old revision, but the issue was still there)
issue at the service level (I used the same container to create a completely new service; it also had the issue)
Possible causes
something on Cloud Run or the Cloud Run load balancer
maybe some env vars, but that also doesn't seem to be the issue
Response Codes
I just ran a quick check with vegeta (30 seconds at 10 rps) on the same container on staging and production, for a static file path; below are the responses:
[vegeta response-code results for Staging and Production]
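For reference, the check described would look roughly like this with vegeta (a sketch; the URL and path are placeholders, not the actual service):
# 10 requests/second for 30 seconds against a static file path:
echo "GET https://staging-service-xyz-uc.a.run.app/static/app.css" | \
    vegeta attack -rate=10 -duration=30s | vegeta report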
If anyone has any insights on this it would help greatly.
Based on your explanation, I cannot understand what's going on. You explained what doesn't work, but didn't point out what does work (does your app run locally? are you able to run a hello-world sample application?).
So I'll recommend some debugging tips.
If you're getting an HTTP 5xx status code, first check your application's logs. Is it printing ANY logs? Are there logs for the requests? Was your application built and deployed with a "verbose" logging setting?
Try hitting your *.run.app domain directly (see the curl sketch below). If it's not working, then it's not a domain, DNS, or Cloudflare issue, so try debugging and/or redeploying your app; deploy something that works first. If the *.run.app domain works, then the issue is not in Cloud Run.
Make sure you aren't using Cloudflare in proxy mode (i.e. your DNS should point to Cloud Run, not to Cloudflare), as there's currently a known issue about certificate issuance/renewals for domains behind Cloudflare.
Beyond these, if a redeploy seems to solve your problem, try redeploying. It could very likely be that some configuration recently diverged between the two projects.
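A quick sketch of the direct check suggested above (the hostname is a placeholder; the real one is shown by gcloud run services describe SERVICE):
# Bypass Cloudflare and the custom domain entirely:
curl -v https://SERVICE-xxxxxxxxxx-uc.a.run.app/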
See Cloud Run Troubleshooting
https://cloud.google.com/run/docs/troubleshooting
Do you see 503 errors under high load?
The Cloud Run (fully managed) load balancer strives to distribute incoming requests over the necessary number of container instances. However, if your container instances are using a lot of CPU to process requests, they will not be able to process all of the requests, and some requests will be returned with a 503 error code.
To mitigate this, try lowering the concurrency. Start from concurrency = 1 and gradually increase it to find an acceptable value. Refer to Setting concurrency for more details.
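A sketch of that mitigation with the gcloud CLI (the service name is a placeholder):
# Start from concurrency = 1 and raise it until errors reappear:
gcloud run services update SERVICE --concurrency 1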

What is the benefit of dockerizing an SPA web app?

I dockerize my SPA web app by using nginx as the base image and then copying in my nginx.conf and build files. As Dockerize Vue.js App mentions, I think many SPA dockerizing solutions are similar.
If I don't use Docker, I first build the SPA code and then copy the build files to the nginx root directory (after installing/setting up nginx, I barely change it at all).
So what's the benefit of dockerizing an SPA?
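For concreteness, the kind of image being described is roughly this (a sketch; the nginx version and file paths are assumptions, not taken from the question):
# Serve the pre-built SPA bundle with nginx:
FROM nginx:1.25-alpine
COPY nginx.conf /etc/nginx/nginx.conf
COPY dist/ /usr/share/nginx/html/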
----- update -----
One answer said, "If the app is dockerized, each time you release a new version of your app the Nginx server gets all the new updates available for it." I don't agree with that at all. I don't need the latest version of nginx; after all, I only use nginx's basic features. Some of my team members just use the nginx version bundled with Linux when developing. If my Docker image uses the latest nginx, it actually creates a different environment than the development environment.
I realize my question will probably be closed because it will be seen as opinion-based, but I have googled this and can't find a satisfying answer.
If I don't use Docker, I first build the SPA code and then copy the build files to the nginx root directory (after installing/setting up nginx, I barely change it at all)
This is a security concern... "fire and forget" is what seems to be done here regarding the server.
If the app is dockerized, then each time you release a new version of your app the Nginx server gets all the new updates available for it.
Bear in mind that if your app does not release new versions on a weekly basis, then you need to consider rebuilding the Docker images at least weekly, in order to get the updates and keep everything current with the latest security patches.
So what's the benefit of dockerizing an SPA?
Same environment across development, staging and production. This is called 100% parity across all stages where you run your app, and it holds no matter what type of application you deploy.
If something doesn't work in production, you can pull the Docker image by its digest and run it locally to debug and try to understand where the problem is. If you need to SSH into a production server, it means your automation pipeline has failed, or maybe you are not even using one...
Tools like Webpack compile Javascript applications to static files that can then be served with your choice of HTTP server. Once you’ve built your SPA, the built files are indistinguishable from pages like index.html and other assets like image files: they’re just static files that get served by some HTTP server.
A Docker container encapsulates a single running process. It doesn’t really do a good job at containing these static files per se.
You’ll frequently see “SPA Docker containers” that run a developer-oriented HTTP server. There’s no particular benefit to doing this, though. You can get an equally good developer experience just by developing your application locally, running npm run build or whatever to create a dist directory, and then publishing it the same way you’d publish other assets. An automation pipeline is helpful here, but this isn’t a task Docker makes wildly simpler.
(Also remember when you do this that the built application runs on the user’s browser. That means it can’t see any of the Docker-internal networking machinery: it can’t use the Docker-internal IP addresses and it can’t use the built-in Docker DNS service. Everything it reaches has to be on docker run -p published ports and it has to use a DNS name that reaches the host. The browser literally has no idea Docker is involved in this at all.)
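For example, a published port is the only way the browser can reach the container at all (the image name and ports here are hypothetical):
# Publish container port 80 on host port 8080:
docker run --rm -p 8080:80 my-spa-image
# The browser then uses http://localhost:8080/ and never sees Docker's internal network.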
There are a few benefits.
Firstly, building a Docker image means you are explicitly stating what your application's canonical run-time is: this version of nginx, with that SSL configuration, whatever. Changes to the run-time are in source control, so you can upgrade predictably and reversibly. You say you don't want "the latest version", but what if that latest version patches a critical security vulnerability? Being able to upgrade predictably, on "disposable" containers, means you upgrade when you want to.
Secondly, if the entire development team uses the same Docker image, you avoid the "it works on my machine" response to bugs that different configurations produce; in SPAs, different configurations of nginx can lead to different behaviour. New developers who join the team don't have to install or configure anything and can use any device they want: they can be certain that what runs in Docker is the same for them as for all the other developers.
Thirdly, by having all your environments containerized (not just development, but test and production), you make it easy to move versions through the pipeline and change only the environment-specific values.
Now, for an SPA, these benefits are real but may not outweigh the cost and effort of creating and maintaining Docker images; inevitably, the Docker image becomes a bottleneck and the first thing people blame. I'd only invest in it if you see lots of environment-specific pain (suggesting a consistent run-time environment is necessary), or lots of "it works on my machine" bugs.

How to launch an app via Docker on every Pull Request?

I run Jenkins, and my app is dockerized, i.e. when I run the container it exposes port 3000 and I can point my browser there. On every GitHub PR I would like to deploy that git commit to a running container somewhere and have Jenkins post back to the PR the link where it can be accessed. On any PR update it gets automatically re-deployed, and on PR close/resolve it gets torn down.
I have looked at Kubernetes and a little at Rancher, but what's the easiest way to get this going, assuming I can only deploy to one box?
There is a Jenkins plugin, github-pullrequest, that can solve your problem.
Prerequisites:
You have a Jenkins server that can be accessed from the internet, if you want to trigger your build by a webhook.
You have a GitHub API token to access/administer your git repository; it can be generated by yourself in Settings.
Please follow the configuration guide to set up your Jenkins integration with GitHub.
After configuration:
you can trigger your build on PR events (opened / commit changed / closed), or on a comment matching a specific pattern.
you can get the PR status via the environment variable ${GITHUB_PR_STATE}, so you can start or stop a container on a specific value.
you can publish a comment to the PR to announce the address of your web service after you have started the Docker container.
About exposing the container's port with multiple PRs: you can just run the container with -p 3000, and Docker will automatically publish it on a port from a range on the Docker host; docker port <container> will show the specific port number. So, for example:
container1 with address <host>:32667 for PR1
container2 with address <host>:35989 for PR2
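A sketch of that flow (the image and container names are hypothetical):
# Publish container port 3000 on a random free host port:
docker run -d --name pr1 -p 3000 myapp:pr-1
# Look up which host port was assigned:
docker port pr1 3000
# -> e.g. 0.0.0.0:32667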
I think the simplest solution to this would be to create two different Jenkins jobs: one which deploys, and one which nukes it. These can be triggered by two webhooks in GitHub, one for PR create and one for PR resolve.
As Sylvain GIROD pointed out:
With only one box to run the application, you need to change the port that is exposed. When a GitHub PR happens, you deploy your application (docker run -p newport:containerport). If you are deploying services, you change the target port.
Then you send the link with this port back to the user (email?).
Additionally, you need some key-value store to remember which container was created for which PR, so that on a new PR you can decide whether to destroy the old containers.
I would also suggest giving the services a time to live and regularly cleaning up stale containers/services.

Docker Hub Update Notifications

Are there any good methods/tools to get notifications of updates to containers on Docker Hub? Just to clarify: I don't want to update automatically, I just want to be notified of updates somehow.
I'm currently running a Kubernetes cluster, so if I could just specify a list of containers (as opposed to it using the ones on my system) that would be great.
Have you tried docker-notify? It runs on Node (perhaps imperfect), but within its own container. You'll need a mail server or a webhook for it to trigger against.
I'm surprised this isn't supported by docker.exe or Docker Hub; base image changes (e.g. Alpine) cause a lot of churn, with both hub-dependent (Nginx, Postgres) and locally dependent containers possibly needing a rebuild.
I also found myself needing this, so I helped build image-watch.com, which is a subscription-based service that watches Docker images and sends notifications when they are updated. As a hosted service, it's not free, but we tried to make it as cheap as possible.
