How do I remove persistent users from a node disabled by a monitor?

On a BIG-IP LTM system I have HTTP monitors set up for a pool so that system owners can remove a file from a node in the pool to take that node out of rotation. But monitors mark a node as disabled, not offline, so cookie-based persistence will still send existing users to the node that should be down. What's the best way to use monitors to either force a node offline instead of disabling it, or to force users onto a new node despite persistence?

Disabling a pool member still allows active connections and persistent connections to function, and depending on how you have persistence defined, that can end up being a LONG time.
Forcing a member offline still allows active connections to complete their transactions, but moves previously persisted traffic to other nodes.
When doing maintenance I would force the node offline and then give sessions 5 to 10 minutes to complete before taking the node down in the infrastructure. There's no good way to drop a node that still has active connections, and the behavior further depends on what the client is actually doing.
Here's a great response on F5's DevCentral community by one of their MVPs that helps explain the connection behavior:
Pool Member disabled/forced offline behavior (DevCentral)
Let me know if you need more details or if you're seeing different behavior. Also, you can do all of this with the REST API, so you don't need to use the GUI; a quick node offline/online that way is super easy.
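For example, here is a minimal Erlang sketch of forcing a node offline through iControl REST. The hostname, node name, and credentials are placeholders, and the session/state values reflect how the node resource maps to the GUI's Disabled and Forced Offline states, so verify them against your TMOS version:

    %% Sketch: force a node offline over iControl REST (recent OTP,
    %% where httpc supports the patch method). Host, node name, and
    %% credentials below are placeholders.
    %% "session": "user-disabled" alone corresponds to Disabled;
    %% adding "state": "user-down" corresponds to Forced Offline.
    %% Swap to "user-enabled"/"user-up" to bring the node back.
    force_offline() ->
        inets:start(),
        ssl:start(),
        Url  = "https://big-ip.example.com/mgmt/tm/ltm/node/~Common~web1",
        Auth = "Basic " ++ base64:encode_to_string("admin:secret"),
        Body = "{\"session\": \"user-disabled\", \"state\": \"user-down\"}",
        httpc:request(patch,
                      {Url, [{"Authorization", Auth}], "application/json", Body},
                      [{ssl, [{verify, verify_none}]}],
                      []).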

Related

How many Redis connections are used with ActionCable?

I'm looking at moving my app's deployment to Heroku, and I'd like to determine if it can correctly run there on the basic plan before putting in the effort to migrate. The basic plan limits Redis to 20 connections.
I don't fundamentally understand the Rails/Redis connection architecture. Is there a single Redis connection to ActionCable, which then distributes the data, or is there one connection per actual client (i.e. one connection for every browser tab)?
As per the docs,
An individual user will create one consumer-connection pair per browser tab, window, or device they have open.
ActionCable lets you identify a connection using a connection identifier, typically a global object called current_user. With this approach, you can later retrieve all open connections for a given user (and potentially disconnect them all if the user is deleted, is unauthorized, or has too many connections open).
Also, note that ActionCable uses a worker pool to run connection callbacks and channel actions in isolation from your server's main thread.

Are ActionCable Channel instances shared across clients?

I identify connections by public IP address. My understanding until recently (which I'm now doubting) was that in such a case, clients subscribing to a channel would reuse the same Channel instance.
A real world example is as follows:
I'm building an app that requests information from a certain source that I don't own. This is done through an HTTP request and a Job. The external resource changes at periodic intervals.
My DataChannel class, which inherits from ApplicationCable::Channel, manages a cache of the last request so that new clients subscribing from the same IP address won't start a new request, but instead reuse the last one.
Summing up:
If I open two tabs and each one subscribes to a channel, do I get two Channel instances even if the connection identifier is the same?
I'm not looking for a ready-made way to achieve this; just pointing me in the right direction is enough and actually far more valuable.
No, they aren't.
I needed to add some synchronization between all instances owned by a client whose connection is identified by IP address.
For this, I used a Redis distributed lock manager (DLM), so I could acquire and release locks and make sure computations that were meant to run only once actually did.
There might be a better way to do this, but I couldn't imagine one that didn't require patching the ActionCable sources. Any further comments are appreciated, as always.
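For reference, on a single Redis node the acquire/release cycle boils down to SET NX PX plus a delete. Here is a minimal sketch (in Erlang, using the eredis client; the key, token, and timeout are made up, and a full Redlock-style DLM would span several Redis instances and release the lock with a token-checking Lua script):

    %% Minimal single-instance Redis lock, assuming the eredis client.
    %% A real DLM would verify the token before deleting (usually via a
    %% small Lua script) so one client cannot release another's lock.
    with_lock(Key, Fun) ->
        {ok, C} = eredis:start_link(),
        Token = integer_to_list(erlang:unique_integer([positive])),
        case eredis:q(C, ["SET", Key, Token, "NX", "PX", "10000"]) of
            {ok, <<"OK">>} ->
                try Fun() after eredis:q(C, ["DEL", Key]) end;
            {ok, undefined} ->
                {error, locked}
        end.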

Erlang/Elixir on Docker and Hot Code Swap

One of the features of Erlang (and, by extension, Elixir) is that you can do hot code swapping. However, this seems to be at odds with Docker, where you would need to stop your instances and restart new ones from images holding the new code. This essentially seems to be what everyone does.
That being said, I also know that it is possible to use one hidden node to distribute updates to all other nodes over the network. Of course, put just like that it sounds like asking for trouble, but...
My question is the following: has anyone tried, with reasonable success, to set up a Docker-based infrastructure for Erlang/Elixir that allows hot code swapping? If so, what are the do's, don'ts, and caveats?
The story
Imagine a system to handle mobile phone calls or mobile data access (that's what Erlang was created for). There are gateway servers that maintain the user session for the duration of the call or the data access session (I will call it the session going forward). Those servers have an in-memory representation of the session for as long as the session is active (the user is connected).
Now there is another system that calculates how much to charge the user for the call or the data transferred (call it PDF, a Policy Decision Function). The two systems are connected in such a way that the gateway server creates a handful of TCP connections to the PDF, and it drops user sessions if those TCP connections go down. The gateway can handle a few hundred thousand customers at a time. Whenever there is an event that the user needs to be charged for (the next data transfer, another minute of the call), the gateway notifies the PDF, and the PDF subtracts a specific amount of money from the user's account. When the user's account is empty, the PDF notifies the gateway to disconnect the call (you've run out of money, you need to top up).
Your question
Finally, let's talk about your question in this context. We want to upgrade a PDF node, and the node is running on Docker. We create a new Docker instance with the new version of the software, but we can't shut down the old version (there are hundreds of thousands of customers in the middle of their calls; we can't disconnect them). But we need to move the customers somehow from the old PDF to the new version. So we tell the gateway node to create any new connections to the updated node instead of the old PDF. Customers can be chatty, and some of them may have long-running data connections (downloading a Windows 10 ISO), so the whole operation takes 2-3 days to complete. That's how long it can take to upgrade from one version of the software to another in case of a critical bug. And there may be dozens of servers like this one, each handling hundreds of thousands of customers.
But what if we used the Erlang release handler instead? We create the relup file with the new version of the software, test it properly, and deploy it to the PDF nodes. Each node is upgraded in place: the internal state of the application is converted and the node is running the new version of the software. Most importantly, the TCP connection with the gateway server has not been dropped, so customers happily continue their calls or keep downloading the latest Windows ISO while we upgrade the system. All is done in 10 seconds rather than 2-3 days.
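To make that concrete, an in-place upgrade like this is driven by an .appup file. A minimal sketch for a hypothetical pdf_session gen_server (the module name and version numbers are made up):

    %% pdf.appup (sketch): upgrade and downgrade instructions.
    %% {update, Mod, {advanced, Extra}} suspends the running gen_server,
    %% loads the new code, calls Mod:code_change/3 so the in-memory
    %% session state can be converted, and resumes the process, all
    %% without dropping its TCP connections.
    {"1.1.0",
     [{"1.0.0", [{update, pdf_session, {advanced, []}}]}],
     [{"1.0.0", [{update, pdf_session, {advanced, []}}]}]}.

After systools:make_relup/3 has generated the relup and the new release tarball is in place, the upgrade is applied on the running node with release_handler:unpack_release/1, release_handler:install_release/1, and release_handler:make_permanent/1.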
The answer
This is an example of a specific system with specific requirements. Docker and Erlang's release handling are orthogonal technologies; you can use either or both. It all boils down to the following:
Requirements
Cost
Will you have enough resources to test both approaches predictably and enough patience to teach your Ops team so that they can deploy the system using either method? What if the testing facility cost millions of pounds (because of the required hardware) and can use only one of those two methods at a time (because the test cycle takes days)?
The pragmatic approach might be to deploy the nodes initially using Docker and then upgrade them with the Erlang release handler (if you need to use Docker in the first place). Or, if your system doesn't need to be available during the upgrade (unlike the example PDF system, which does), you might just opt for always deploying new versions with Docker and forget about release handling. Or you may as well stick with the release handler and forget about Docker, if you need quick and reliable on-the-fly updates and Docker would only be used for the initial deployment. I hope that helps.

Load Balancing in ASP.NET MVC Web Application. What can/can't be done?

I'm in the middle of developing a web application and have been asked whether it will work with a load balancer. My initial reaction is yes, since no state is tracked between requests anywhere in the system. However, there is some application-specific state loaded on app start (mainly configuration settings from the database).
This data is all read-only. Is it sufficient to rely on the normal cache-dependency mechanisms to manage this and invalidate these objects across all the applications in the cluster, or would I have to move to a shared cache system like AppFabric to ensure reliability/consistency?
With diagnostics enabled, I've got numerous logging calls using EventSource.Write and an out-of-process logger picking them up. I assume in this case I'd need one logger installed on each of the servers in the cluster to pick up the events each one triggers. I'm not too fussed about that, but what is a good way to identify which server in the cluster serviced the request?
If you initialize the data on each server separately and it is read-only, there's no problem. Each application instance will have its own copy.
Yes, you'd need a logger on each instance. To identify the server, you could include the server's IP address in the log entries; that way you can track which server handled the request (provided you have static IPs, but I assume you do).

Is this the right way of building an Erlang network server for multi-client apps?

I'm building a small network server for a multi-player board game using Erlang.
This network server uses a local instance of Mnesia DB to store a session for each connected client app. Inside each client's record (session) stored in this local Mnesia, I store the client's PID and NODE (the node where a client is logged in).
I plan to deploy this network server on at least 2 connected servers (Node A & B).
So, in order to allow a Client A who is logged in on Node A to search (query Mnesia) for a Client B who is logged in on Node B, I replicate the Mnesia session table from Node A to Node B and vice versa.
After Client A queries the PID and NODE of Client B, Client A and Client B can communicate with each other directly.
Is this the right way of establishing connection between two client apps that are logged-in on two different Erlang nodes?
Creating a system where two or more nodes are perfectly in sync is by definition impossible. In practice however, you might get close enough that it works for your particular problem.
You don't say the exact reason behind running on two nodes, so I'm going to assume it is for scalability. With many nodes, your system will also be more available and fault-tolerant if you get it right. However, the problem could be simplified if you know you will only ever run on a single node and just need the other node as a hot standby to take over if the master is unavailable.
To establish a connection between two processes on two different nodes, you need some global addressing (user id 123 maps to pid <123,456,0>). If you also care that only one process runs for user A at a time, you also need a lock, or to allow only unique registrations in the addressing. If you also want to grow, you need a way to add more nodes, either while your system is running or when it is stopped.
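As a baseline, the stdlib global module already provides cluster-wide unique registration, which covers the addressing and the one-process-per-user lock in one step (the {user, UserId} key shape below is made up):

    %% Cluster-wide addressing with the built-in 'global' module:
    %% registration is unique across all connected nodes, so a second
    %% registration attempt for the same user simply returns 'no'.
    register_user(UserId) ->
        global:register_name({user, UserId}, self()).

    send_to_user(UserId, Msg) ->
        case global:whereis_name({user, UserId}) of
            undefined -> {error, not_connected};
            Pid       -> Pid ! Msg, ok
        end.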
Now, there are already some solutions out there that help solve your problem, with different trade-offs:
gproc in global mode allows registering a process under a given key (which gives you addressing and locking). This is distributed to the entire cluster with no single point of failure; however, the leader election (at least when I last looked at it) works only for nodes that were available when the system started. Adding new nodes requires an experimental version of gen_leader or stopping the system. Within your own code, if you know two players are only ever going to talk to each other, you could start them on the same node.
riak_core allows you to build on top of the well-tested and proven architecture used in Riak KV and Riak Search. It maps keys onto buckets in a fashion that allows you to add new nodes and have the keys redistributed. You can plug into this mechanism and move your processes. This approach does not let you decide where to start your processes, so if there is a lot of communication between them, it will go across the network.
Using mnesia with distributed transactions allows you to guarantee that every node has the data before the transaction is committed. This would give you distribution of the addressing and locking, but you would have to build everything else on top of it (like releasing the lock); a sketch of the table setup follows this list. Note: I have never used distributed transactions in production, so I cannot tell you how reliable they are. Also, due to being distributed, expect latency. Note 2: You should check exactly how you would add more nodes and have the tables replicated, for example whether it is possible without stopping mnesia.
ZooKeeper/Doozer/roll your own provides a centralized, highly available database which you may use to store the addressing. In this case you would need to handle unregistering yourself. Adding nodes while the system is running is easy from the addressing point of view, but you need some way for your application to learn about the new nodes and start spawning processes there.
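For the mnesia route mentioned above, a replicated session table of the kind the question describes could be declared like this (node names and the record shape are placeholders):

    -record(session, {user_id, pid}).

    %% Keep a RAM copy of the table on both nodes so either side can
    %% read locally; writes run as distributed transactions. Use
    %% mnesia:sync_transaction/1 if every replica must have committed
    %% before the call returns.
    create_session_table() ->
        mnesia:create_table(session,
            [{ram_copies, ['a@host1', 'b@host2']},
             {attributes, record_info(fields, session)}]).

    register_session(UserId) ->
        mnesia:transaction(fun() ->
            mnesia:write(#session{user_id = UserId, pid = self()})
        end).

    lookup_session(UserId) ->
        mnesia:transaction(fun() -> mnesia:read(session, UserId) end).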
Also, it is not necessary to store the node, as the pid contains enough information to send the messages directly to the correct node.
As a cool trick which you may already be aware of, pids may be serialized (as may all data within the VM) to a binary. Use term_to_binary/1 and binary_to_term/1 to convert between the actual pid inside the VM and a binary which you may store in whatever accepts binary data without mangling it in some stupid way.
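To illustrate, the round trip is one call in each direction; the only caveat is that a stored pid stops being valid once the node that owns it restarts (the function names below are made up):

    %% Serialize a pid, park it anywhere that preserves binaries, and
    %% message it later. Sending to the deserialized pid works across
    %% connected nodes; after the owning node restarts, the old pid
    %% refers to a dead incarnation and sends become silent no-ops.
    store_pid(Pid) ->
        term_to_binary(Pid).

    notify(Bin, Msg) ->
        Pid = binary_to_term(Bin),
        Pid ! Msg,
        ok.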
