How can I ensure a persistent connection to a specific GCP Cloud Run instance? - docker

I've built an app (with flask, flask-login and dash) on GCP Cloud Run. The app allows users to login, look at some fancy dashboards and leave comments on certain pages. It works great performance-wise: instances spin up quickly for users with minimal lag, the BigQuery interface I built works great and pub/sub messages sent from user interactions do exactly what they're supposed to do.
The only issue I'm having right now is that there's something weird about which instance of a container a user connects to. What will often happen is a user will log in to my app via their browser successfully, and then when navigating to another password-protected page will receive a 401 error (seemingly at random).
My belief is that this happens because the user's navigation request (clicking a link to another password-protected page) spins up, or gets routed to, another Cloud Run instance. Is there any way to force Cloud Run to keep a given user's requests on a specific instance of my container, so that when a user logs in and then navigates, GCP doesn't take the next request and decide to autoscale?
I've experimented with setting the maximum number of requests for the app's frontend container to 1, but it doesn't seem to improve this behavior, which happens sporadically throughout a given user's session.
To clarify, the frontend part of the app is still usable, but it's an annoying user experience to constantly have to log in again.
Any help or guidance is appreciated!

The answer was as simple as turning on session affinity, per @DazWilkin's comment.
What I did:
Went to the Cloud Run dashboard on GCP and selected the service of interest
Clicked "Edit and Deploy New Revision"
Went to the "Connections"
Checked the box next to the "Session affinity" preview feature
Clicked deploy
This ended up completely solving the problem!
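For reference, the same setting can also be applied from the command line; this is a sketch assuming a gcloud release in which the session affinity flag is available, with SERVICE_NAME as a placeholder:

    gcloud run services update SERVICE_NAME --session-affinity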

Related

Is it possible to upload a docker image from the web interface?

Good morning everyone.
I've been having a problem for some time. I have users in my Rails application, and let's suppose that each user has a resource in their panel where they can register certain parameters through a form.
Now comes the part I imagine is not possible: after registering their parameters, the user would have access to a button in the web interface that starts a new docker container, either on the same server (droplet) as the application or on another server (droplet), and of course the image that gets brought up would be started with the parameters the user registered in the resource.
It sounds confusing, but it's the only solution I have for giving users a kind of live monitor: inside this container, requests will be triggered under certain conditions and at certain times, depending on the parameters sent by the users.
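For illustration only (this is not from the original question): the Docker Engine exposes an HTTP API, so a web backend can create and start containers programmatically. Below is a minimal sketch using the dockerode Node client, assuming the Docker socket is reachable from the app server; the image name and labels are hypothetical, and a Rails app could just as well call the same HTTP API directly or shell out to the docker CLI.

    // sketch only: create and start a container with user-supplied parameters
    import Docker from "dockerode";

    const docker = new Docker({ socketPath: "/var/run/docker.sock" });

    export async function startMonitor(userId: string, params: Record<string, string>): Promise<string> {
      const container = await docker.createContainer({
        Image: "monitor-image:latest",                            // hypothetical image
        Env: Object.entries(params).map(([k, v]) => `${k}=${v}`), // the user-registered parameters
        Labels: { owner: userId },                                // lets you find each user's container later
      });
      await container.start();
      return container.id;
    }

The button in the user's panel would then simply call such a function from the controller that handles the form.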

Zero downtime/blue-green deployment of Single Page Application (SPA)

Yesterday together with the team we were discussing the possibility of using zero downtime deployments to support our single page application.
While discussing it we identified one edge case for it.
After a user loads the page in his browser, it cannot be removed from memory until he reloads the page. This means that if a user loads the page and starts working with the website (for example, starts typing a long article, like I am doing now), then he cannot receive an updated version of it until he reloads the page.
We could ignore the fact that the user sees an old application version in his browser, but there are a few points, listed below.
If we introduce a breaking change to the HTTP API that serves the SPA, then the user will not be able to save his article (data loss!) or may receive some other error when performing another backend-related action.
When the user navigates to a new page without reloading the SPA, he can receive a template of the next page, or of some control, that is incompatible with the old outer container. It can lead to broken markup or application logic.
We cannot force the user to re-login, as he may be in the middle of typing his article, and it is just bad UX.
Taking all these points into account, one could propose the following solution:
User 1 loads v1 of the SPA into his browser.
Along with the auth token, version information is sent to the browser (using a JWT, for example).
We want to deploy v2 of our application. We spin up v2 but do not disable v1.
User 2 loads v2 of the SPA into his browser.
User 1 goes to the next page in the SPA. The load balancer checks the version information in his token and routes user 1's traffic to the v1 server (a small sketch of this routing follows below).
User 2 gets routed in the same way to v2.
User 1 logs out of the app and closes the browser.
User 1 logs back in - this time he receives v2.
Once the v1 application has not received any traffic for a long time, it gets disposed of.
With this approach, however, it is possible to have multiple versions alive, more than two (for example, if a user stays online for a whole day or two). It means we will not be able to migrate the database to the new schema until the last such user logs out (imagine how that could work for sites like Facebook). Having multiple versions running is not a problem in itself, however; tools such as Docker and Rancher allow us to do it easily.
Also, in step 7 the user needs to reload the page or close the browser - otherwise he will still be working with v1 and we cannot force him onto the next version.
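A minimal sketch of the routing decision in step 5, assuming the version travels as an appVersion claim inside the JWT; the pool URLs and the claim name are made up for illustration:

    // sketch only: pick a backend pool from an app-version claim in the JWT
    import jwt from "jsonwebtoken";

    const pools: Record<string, string> = {
      v1: "http://blue-backend.internal",  // old version
      v2: "http://green-backend.internal", // new version
    };
    const DEFAULT_POOL = pools.v2; // fresh logins get the newest version

    export function upstreamFor(authHeader?: string): string {
      if (!authHeader) return DEFAULT_POOL;
      const payload = jwt.decode(authHeader.replace(/^Bearer /, ""));
      const version =
        payload && typeof payload === "object"
          ? (payload as { appVersion?: string }).appVersion
          : undefined;
      return (version && pools[version]) || DEFAULT_POOL;
    }

The actual traffic steering (nginx, HAProxy, a Zuul filter, and so on) would then use this decision to select the upstream.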
The question I have is what approach do you use to do zero downtime/blue-green deployment of single page applications?
How do you manage the lifetime of "blue" version of your application when you are switching traffic to "green" version, especially in respect to existing "blue" client applications.
Did you solve these issues, do you know any other solution?
I've been struggling with this problem for quite some time and have tried several approaches; one specifically worked really well:
Use hashed names when bundling the SPA (including images, et al)
Use a static asset bucket (e.g.: AWS S3) and upload all assets to it before the deployment process kicks in
Enforce internal guidelines to minimize breaking API contracts (e.g. fields from an endpoint should only be removed after X releases)
Deploy with usual blue/green strategy
Rationale
Using a bucket with hashed bundles ensures that if a customer gets the old version of the SPA, all of its assets will be available before/during/after any deployment process.
Enforcing internal guidelines not to break API compatibility is sometimes tricky, but it follows the very same principles applied to any public API. Embracing/adapting an API deprecation policy from the big players helps when communicating with the team, because it gives you a concrete example.
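As a concrete illustration of the hashed-names step, assuming a webpack-based build (the answer does not name a specific bundler):

    // sketch of content-hashed bundle names in a webpack configuration
    export default {
      output: {
        filename: "[name].[contenthash].js",
        chunkFilename: "[name].[contenthash].js",
        assetModuleFilename: "assets/[name].[contenthash][ext]",
      },
    };

Because every file name changes with its content, uploading a new build to the bucket never overwrites the assets an older SPA version is still requesting.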
One approach you might consider is gradually reloading the SPA at a moment when it is not burdensome for (or is even unnoticeable to) the end user.
Suggested approach:
Colored versions of the system (the components providing back-end services, the API, and the front-end) "know" their "color" (the runtimes are provided with it). The component that serves the front-end application embeds this color information into the SPA. It is then sent (via a cookie or a custom HTTP header) with every request the SPA makes to the backend.
The component that routes API calls (API gateway, load balancer, nginx, HAProxy, custom Zuul-based router, etc.) is aware of this color information and uses it to direct traffic to the infrastructure of the proper color.
Additionally, there is a public URL (not served by the "colored" infrastructure - for example, an S3 file served via CloudFront or another proxy) with the latest version color. The SPA checks this version at a given interval (every 60 or 120 seconds). If the version does not match the one the SPA was given when it loaded, then on the next major route change the page is reloaded "physically" instead of performing the navigation in the browser only.
You can choose which route changes verify this version in such a way that it is least obtrusive to the user (possibly almost unnoticeable).
If you choose routes that are used every day by all users, then pretty soon all users will migrate to the latest color. Those who keep an unused browser window open for long periods of time (a computer hibernated for two weeks?) can be handled by forcing a reload after a certain period of inactivity.
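A minimal client-side sketch of this check, assuming the current color is published at a plain-text URL and the loaded version is embedded in the page; both names are illustrative:

    // sketch: poll the public version file and do a "physical" reload on the next
    // chosen route change if it no longer matches the version loaded in the browser
    const VERSION_URL = "https://assets.example.com/current-version.txt";
    const loadedVersion = document.documentElement.dataset.appVersion ?? "unknown";
    let latestVersion = loadedVersion;

    setInterval(async () => {
      const res = await fetch(VERSION_URL, { cache: "no-store" });
      latestVersion = (await res.text()).trim();
    }, 60_000); // every 60 or 120 seconds, as suggested above

    // called by the router on the route changes chosen to verify the version
    export function navigate(path: string): void {
      if (latestVersion !== loadedVersion) {
        window.location.assign(path); // full reload picks up the new color
      } else {
        history.pushState({}, "", path); // normal in-browser navigation
      }
    }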
I hope I managed to make myself sound at least a bit cohesive :-)
Regards,
Wojtek
Not sure why you would go for a complete overhaul of your UI, since there is always a learning curve involved. Practically, in the real world it would be a bad idea to switch over to a new UI immediately. You would allow customers to switch over to the new interface over a period of time and then disable the older version after a forewarning. It's not worth the effort of having such a real-time switch. A/B testing could be a way to introduce customers to the new interface and then do an actual rollout.
The technique you're describing is called blue-green deployment: you start with your existing server (blue) and add your updated server (green). All new traffic from that point on is redirected to the green environment. The blue environment is only there for servicing existing HTTP connections and for an optional "roll back" in case the green environment hits major problems. Eventually the "blue" environment can be retired once it has finished servicing all of its requests.
This technique requires that the two systems be somewhat similar. The database schema, for instance, may make it impractical.

dashDB service not responding?

Last night, I was receiving memory issues for some of my queries (inner joins on multiple tables) against my dashDB service on Bluemix. Today, I cannot even access the dashDB service.
When I access my project instance on Bluemix, using my web browser, and choose my dashDB service, I am presented with a grey page and a white spinning wheel. I never get past that.
Is there an issue with dashDB in general? Could it be just my instance of it? Any way to fix it?
Thanks in advance!
Dan
It doesn't appear to be a general problem, as shown at https://developer.ibm.com/bluemix/support/#status
I advise opening a ticket with Bluemix Support.
You can do that using one of the following methods:
Use the Support Widget. It is available from the user avatar in the upper right corner of the main Bluemix UI. After opening the support widget panel, select Get Help > Get In Touch, select the type of assistance you need, and then fill out the support form.
Use the Support Site 'Get Help' form. This form is available on a separate site that is made available for ticket submission when you cannot log into Bluemix and access the Support Widget. Go to http://ibm.biz/bluemixsupport and fill in the support request form.

How to properly handle asynchronous database replication?

I'm considering using Amazon RDS with read replicas to scale our database.
Some of our controllers in our web application are read/write, some of them are read-only. We already have an automated way for identifying which controllers are read-only, so my first approach would have been to open a connection to the master when requesting a read/write controller, else open a connection to a read replica when requesting a read-only controller.
In theory, that sounds good. But then I stumbled upon the replication lag concept, which basically says that a replica can be several seconds behind the master.
Let's imagine the following use case then:
The browser posts to /create-account, which is read/write, thus connecting to the master
The account is created, transaction committed, and the browser gets redirected to /member-area
The browser opens /member-area, which is read-only, thus connecting to a replica. If the replica is even slightly behind the master, the user account might not exist yet on the replica, thus resulting in an error.
How do you realistically use read replicas in your application, to avoid these potential issues?
I worked with an application which used pseudo-vertical partitioning. Since only a handful of the data was time-sensitive, the application usually fetched from the slaves, and from the master only in selected cases.
As an example: when the user updated their password, the application would always ask the master during the authentication prompt. When changing non-time-sensitive data (like user preferences), it would display a success dialog along with a note that it might take a while until everything is updated.
Some other ideas which might or might not work depending on the environment:
After an update, compute a checksum of the entity, store it in the application cache, and when fetching the data always check it against that checksum (a small sketch follows below)
Use browser storage/a cookie for storing the delta, ensuring the user always sees the latest version
Add an "up-to-date" flag and invalidate it synchronously on every slave node before/after an update
Whatever solution you choose, keep in mind it is subject to the CAP theorem.
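A small sketch of the first idea above; the data-access helpers and the cache are hypothetical stand-ins, not a specific library:

    // sketch: after a write, cache a checksum of the entity; when a replica read
    // does not match it, fall back to the primary
    import { createHash } from "crypto";

    const checksumCache = new Map<string, string>(); // stand-in for an application cache

    const checksum = (row: object): string =>
      createHash("sha1").update(JSON.stringify(row)).digest("hex");

    async function readReplica(id: string): Promise<object> {
      return { id, source: "replica" }; // placeholder for a replica query
    }
    async function readPrimary(id: string): Promise<object> {
      return { id, source: "primary" }; // placeholder for a primary query
    }

    export async function getUser(id: string): Promise<object> {
      const expected = checksumCache.get(`user:${id}`); // written after the last update
      const row = await readReplica(id);
      if (!expected || checksum(row) === expected) return row;
      return readPrimary(id); // the replica looks stale; read the source of truth
    }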
This is a hard problem, and there are lots of potential solutions. One potential solution is to look at what Facebook did:
TL;DR - read requests get routed to the read-only copy, but if you do a write, then for the next 20 seconds all your reads go to the writeable master.
The other main problem we had to address was that only our master databases in California could accept write operations. This fact meant we needed to avoid serving pages that did database writes from Virginia because each one would have to cross the country to our master databases in California. Fortunately, our most frequently accessed pages (home page, profiles, photo pages) don't do any writes under normal operation. The problem thus boiled down to, when a user makes a request for a page, how do we decide if it is "safe" to send to Virginia or if it must be routed to California?
This question turned out to have a relatively straightforward answer. One of the first servers a user request to Facebook hits is called a load balancer; this machine's primary responsibility is picking a web server to handle the request but it also serves a number of other purposes: protecting against denial of service attacks and multiplexing user connections to name a few. This load balancer has the capability to run in Layer 7 mode where it can examine the URI a user is requesting and make routing decisions based on that information. This feature meant it was easy to tell the load balancer about our "safe" pages and it could decide whether to send the request to Virginia or California based on the page name and the user's location.
There is another wrinkle to this problem, however. Let's say you go to editprofile.php to change your hometown. This page isn't marked as safe so it gets routed to California and you make the change. Then you go to view your profile and, since it is a safe page, we send you to Virginia. Because of the replication lag we mentioned earlier, however, you might not see the change you just made! This experience is very confusing for a user and also leads to double posting. We got around this concern by setting a cookie in your browser with the current time whenever you write something to our databases. The load balancer also looks for that cookie and, if it notices that you wrote something within 20 seconds, will unconditionally send you to California. Then when 20 seconds have passed and we're certain the data has replicated to Virginia, we'll allow you to go back for safe pages.
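A minimal sketch of the same cookie trick in application code, written here as an Express middleware; the route and the dbTarget convention are illustrative, not Facebook's implementation:

    // sketch: stamp a cookie on every write and, for the next 20 seconds,
    // send that user's reads to the primary database
    import express from "express";

    const app = express();
    const STICKY_WINDOW_MS = 20_000;

    app.use((req, res, next) => {
      const lastWrite = Number(req.headers.cookie?.match(/lastWrite=(\d+)/)?.[1] ?? 0);
      const wroteRecently = Date.now() - lastWrite < STICKY_WINDOW_MS;
      // downstream handlers (or a proxy layer) read this to pick primary vs replica
      res.locals.dbTarget = req.method === "GET" && !wroteRecently ? "replica" : "primary";
      next();
    });

    app.post("/editprofile", (req, res) => {
      // ... perform the write against the primary here ...
      res.cookie("lastWrite", String(Date.now()), { maxAge: STICKY_WINDOW_MS });
      res.sendStatus(204);
    });

    app.listen(3000);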

User Disconnection Detection (i.e. "Online Status") Daemon

Summary: is there a daemon that will do postbacks when a user connects/disconnects via TCP, or is it a good idea to write one?
Details:
There are a number of questions based around this already; but I believe that this is a different "twist" on it. We're writing a Ruby on Rails web application, and we would like to be able to tell if a user is "online" or "offline", where the following definitions apply:
"online" - the user's browser is open and maintaining a TCP connection to one of our servers.
"offline" - the user's browser is no longer connected to one of our servers.
We're thinking that a convenient way of doing this is to run a completely separate "online state" server that each of our users will connect to (exactly once):
when a connection is made to the "online state" server, it will postback to our actual RoR site and let it know "this user just logged on".
when a connection is lost from the "online state" server, it will postback to our actual RoR site and let it know "this user just logged off".
This methodology seems reasonable and keeps things quite modularized (the online state server, for instance, will be quite simple, which is nice). We're able to write this online state server, but have the following questions:
Any specific problems with the above architecture that we haven't taken into account?
Is there a daemon or application out there that does this already? Why reinvent the wheel, if it has already been written?
Is there a push server out there that offers this functionality (i.e. it maintains connections to the users, but will postback or send notifications upstream to the web servers when a user connects or disconnects?)
Is this something you envisage users would install on their systems?
If you are looking for a browser-based system, WebSockets are probably your only option, using something like Socket.IO (http://socket.io/).
The node.js socket server provided as part of this project can be found on github: http://github.com/LearnBoost/Socket.IO-node
Node.js is a great platform designed for exactly this problem domain and there are a number of WebSocket servers for node.
Unless your app is entirely ajax based and uses a single parent page, you would need to create a persistent parent frame containing the socket that wraps your application, as each time the user clicks a link the page unloads and reloads, resulting in disconnection and re-connection from the state server.
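A minimal sketch of such an online-state server using a current Socket.IO release (the LearnBoost project linked above is its ancestor); the postback URL and the auth payload are assumptions, and Node 18+ is assumed for the built-in fetch:

    // sketch: presence server that posts back to the main app on connect/disconnect
    import { Server } from "socket.io";

    const io = new Server(3001, { cors: { origin: "*" } });
    const POSTBACK_URL = "https://example-rails-app.com/internal/presence"; // hypothetical RoR endpoint

    async function postback(userId: string, status: "online" | "offline"): Promise<void> {
      await fetch(POSTBACK_URL, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify({ userId, status }),
      });
    }

    io.on("connection", (socket) => {
      const userId = String(socket.handshake.auth.userId ?? "anonymous");
      void postback(userId, "online");                                 // "this user just logged on"
      socket.on("disconnect", () => void postback(userId, "offline")); // "this user just logged off"
    });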
