We have a Spring Boot web app running as a Docker container deployed in Kubernetes with 3 replicas. When the controller redirects to a different URL within the same controller, we pass an object via flash attributes. When we run 1 pod everything works, but when I scale to 3 pods the object arrives with all of its internal attributes set to null. Has anyone come across this issue? If so, can you prescribe a solution?
Thanks,
SR
Kubernetes can route successive requests from the same session to different pods within a deployment. That is why the data is lost: the flash attributes sit in the memory of one pod, and the other pods do not have that data at all.
To avoid this, you can either maintain the session in an external store like Redis, or use sticky sessions so that all requests for a given session are sent to the same pod.
Some pointers to the solution (sketches below):
Using Redis as an external session store
Sticky sessions - this approach requires an ingress controller that supports them, such as the NGINX ingress controller
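For the Redis option, a minimal sketch of the Spring Boot side, assuming Spring Session is used (add the spring-session-data-redis dependency); the Redis host below is a placeholder for your in-cluster Service, and property names differ between Boot versions (Boot 3.x uses spring.data.redis.*):

  # application.properties - keep the HTTP session (and therefore
  # flash attributes) in Redis instead of pod-local memory
  spring.session.store-type=redis
  spring.redis.host=redis-service   # placeholder: your Redis Service name
  spring.redis.port=6379

For the sticky-session option with the NGINX ingress controller, cookie affinity is enabled with annotations on the Ingress; the resource names below are hypothetical:

  # ingress.yaml - pin each session to one pod via a cookie
  apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: webapp                    # hypothetical
    annotations:
      nginx.ingress.kubernetes.io/affinity: "cookie"
      nginx.ingress.kubernetes.io/session-cookie-name: "route"
  spec:
    rules:
      - host: app.example.com       # placeholder host
        http:
          paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: webapp-svc  # hypothetical Service
                  port:
                    number: 80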
I am working on a web app where a single environment variable is used to specify a certain server (a REST API), like this:
.env:
...
URL_SERVER_API="http://localhost:8080"
...
The application runs inside a container and uses the server API variable for two things related to my problem:
It generates and serves dynamic HTML, in which it prepends URL_SERVER_API to build full API URLs, for example {{URL_SERVER_API}}/someendpoint
It calls the API directly from a (PHP) script using cURL, building the endpoint in the same fashion as in 1
So I end up in a situation where, if I set URL_SERVER_API to localhost:8080, the main application builds valid URLs for the browser to call, because the API app (which also runs in a Docker container) is exposed on that port; but the cURL calls fail, because localhost:8080 does not refer to the API server from inside the container.
I also configured a bridge network and attached both apps to it, and I was able to ping the API from the main app successfully (e.g. ping api_docker). When I then set URL_SERVER_API=api_docker, the cURL calls to the API succeed, but the HTML returned by the main app is built with URLs that are unreachable from the browser, like http://api_docker/someendpoint
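For reference, this is roughly the network setup described above (main_app is a placeholder for my web app's container name; api_docker is the API container):

  docker network create app_net                # user-defined bridge with built-in DNS
  docker network connect app_net main_app
  docker network connect app_net api_docker
  docker exec main_app ping -c 1 api_docker    # container names resolve on this network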
I hope you can see my issue.
I was able to work around the issue by having two variables, URL_SERVER_API and URL_SERVER_API_INTERNAL, using the first for the served HTML and the second for the cURL calls (sketched below). But I think adding new variables to remember is not the best solution, because I am not the one in charge of doing so.
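For what it's worth, the workaround looks like this in .env (the internal URL assumes the API container listens on port 8080 inside the shared network; adjust to your setup):

  URL_SERVER_API="http://localhost:8080"             # rendered into HTML, resolved by the browser
  URL_SERVER_API_INTERNAL="http://api_docker:8080"   # used by the server-side cURL calls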
Thanks for taking the time to read.
I have the following use case that I would appreciate any input on. I have a Docker Swarm running with Traefik pointed at it for ingress and routing. Currently I have it as a single service defined to have 6 replicas of the same image, which works out to two containers on each of three nodes.
The containers basically host a GraphQL server, and my requirement is that, depending on which client a request comes from, it always goes to the same specific container (i.e. replica). As an example, say I have user1 and user2 at client1 and user3 and user4 at client2: if user1 makes a request and it goes to replica1, then if user2 makes a request it MUST go to replica1 as well. So basically, I could take a numeric hash of the client id (client1), mod it by 6, and decide which replica the request goes to, thereby making sure that any subsequent call from any user of that client id goes to the same replica. The client a call comes from is encoded in a JWT token that the user sends with their request.
Any idea how I would go about changing my Docker Swarm to implement this? My best guess is to drop the 6-replica service and instead define each container as a separate service with its own port. Then I could potentially point Traefik at nginx or something similar, which would receive the request, grab the JWT, decode it to find the client id, hash it, and then internally route to the appropriate node:port combination.
But I feel like there must be a simpler, more elegant way of doing this. Maybe Traefik can facilitate this directly somehow, or Docker Swarm has some configuration I don't know about that could be used. Any ideas?
Edit: just to clarify my use case: I am not just looking for the same user to always go to the same container, but for the same type of user to always go to the same container.
For this kind of routing you need to set up Traefik for sticky sessions.
Traefik adds a cookie to responses, and that cookie is used on subsequent requests to route to the same backend.
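As a sketch, in a Swarm deployment this is enabled per service through deploy labels; the service, image, and cookie names below are hypothetical, and the label keys are Traefik v2 options (verify against your Traefik version):

  services:
    graphql:
      image: my-graphql-server        # hypothetical image
      deploy:
        replicas: 6
        labels:
          # enable cookie-based sticky sessions on Traefik's load balancer
          - "traefik.http.services.graphql.loadbalancer.sticky.cookie=true"
          - "traefik.http.services.graphql.loadbalancer.sticky.cookie.name=graphql_route"

Note that a sticky cookie pins each browser session rather than each client id; for the per-client hashing described in the question you would still need something in front that inspects the JWT.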
We have a multiservice architecture consisting of an HAProxy front end (we can change this to another proxy if required), a MongoDB database, and multiple instances of a backend app running under Docker Swarm.
Once an initial request is routed to an instance (container) of the backend app, we would like all future requests from that mobile client to be routed to the same instance. The backend app uses TCP sockets to communicate with a VoIP PBX.
Ideally we would like to control the number of instances of the backend app using the replicas key in the docker-compose file. However, if a container died and was recreated, we would require that mobile clients continue routing to the same container, because each container holds state.
Is this possible with Docker Swarm? We are thinking that each instance of the backend app gets an identifier when created, which is then used to do some sort of path-based routing.
HAProxy has what you need. This article explains it all.
In conclusion, the article offers two solutions:
IP source affinity to server, and application-layer persistence. The latter is stronger/better than the former, but it requires cookies.
Here are some extracts from the article:
IP source affinity to server
An easy way to maintain affinity between a user and a server is to use the user's IP address: this is called source IP affinity.
There are a lot of issues with doing that, and I'm not going to detail them right now (TODO++: another article to write).
The only thing you have to know is that source IP affinity is the last method to use when you want to "stick" a user to a server.
Well, it's true that it will solve our issue, as long as the user uses a single IP address and never changes it during the session.
Application layer persistence
Since a web application server has to identify each user individually, to avoid serving one user's content to another, we may use this information, or at least try to reproduce the same behavior in the load balancer, to maintain persistence between a user and a server.
The information we'll use is the session cookie, either set by the load balancer itself or one set up by the application server.
What is the difference between persistence and affinity?
Affinity: when we use information from a layer below the application layer to pin a client request to a single server
Persistence: when we use application-layer information to stick a client to a single server
Sticky session: a session maintained by persistence
The main advantage of persistence over affinity is that it is much more accurate; but sometimes persistence is not doable, so we must rely on affinity.
With persistence, we are 100% sure that a user will get redirected to a single server.
With affinity, the user may be redirected to the same server...
Affinity configuration in HAProxy / Aloha load-balancer
The configuration below shows how to do affinity within HAProxy, based on client IP information:
frontend ft_web
  bind 0.0.0.0:80
  default_backend bk_web

backend bk_web
  balance source
  hash-type consistent   # optional
  server s1 192.168.10.11:80 check
  server s2 192.168.10.21:80 check
Session cookie set up by the load balancer
The configuration below shows how to configure the HAProxy / Aloha load balancer to inject a cookie into the client browser:
frontend ft_web
  bind 0.0.0.0:80
  default_backend bk_web

backend bk_web
  balance roundrobin
  cookie SERVERID insert indirect nocache
  server s1 192.168.10.11:80 check cookie s1
  server s2 192.168.10.21:80 check cookie s2
I have a Ruby on Rails application where I generate files in the background using Sidekiq. After a file is created, I attempt to fetch it. The problem is that sometimes, most likely due to the load balancer, the Sidekiq job runs on another of my EC2 instances, so the file is created but I can't fetch it from my current instance.
Is there a way to ensure a request is handled by the same EC2 instance? (It doesn't specifically have to be about Sidekiq, because I feel this issue applies more generically.)
Enable load balancer sticky sessions. This link is for the Classic ELB, but the other ELBs also support this feature.
Configure Sticky Sessions for Your Classic Load Balancer
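For a Classic ELB this can also be done from the AWS CLI; a minimal sketch, with a placeholder load balancer name and a 60-second cookie lifetime:

  # create a duration-based stickiness policy...
  aws elb create-lb-cookie-stickiness-policy \
      --load-balancer-name my-load-balancer \
      --policy-name my-sticky-policy \
      --cookie-expiration-period 60
  # ...and attach it to the port 80 listener
  aws elb set-load-balancer-policies-of-listener \
      --load-balancer-name my-load-balancer \
      --load-balancer-port 80 \
      --policy-names my-sticky-policy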
This seems to have something to do with the subnet/availability zone, but I'm new to using a VPC and it's eluding me.
VPC: 10.80.0.0/16
subnet: 10.80.1.0/24 (us-east-1b)
subnet: 10.80.2.0/24 (us-east-1a)
All instances are Windows Server 2012.
I have an internet-facing ELB created within my VPC (10.80.0.0/16). One instance has been added from AZ us-east-1a, on subnet 10.80.2.0/24. The instance is running IIS 7.5, with an app on port 80 and /health.aspx set up as the ELB health check.
Internal traffic on the VPC is flowing normally (unrestricted). I can request health.aspx on this instance from another instance in us-east-1b (10.80.1.0/24). I can also copy files from one instance to another.
Outbound traffic is unrestricted. I can RDP to the instance (when connected to our VPN), open a browser, request a web page, and get it.
The ELB says the instance is healthy and I can see the requests to health.aspx in the IIS logs. Both the ELB and the instance are configured with a security group that allows 80 and 443.
But if I try to request {elb-url}/health.aspx over the open internet, the request just times out. Similarly, with an elastic IP associated with the instance, a request to {elastic-ip}/health.aspx times out.
@Chris, thanks for the response... As it happens, I've already worked it out with some help from a friend. I'll post my findings here for posterity (in case anybody else is similarly confused about how ELB works).
This would be clearer with a diagram, but the summary is that in each availability zone you need to create both a public and a private subnet. When you add availability zones to your ELB, you need to select the public subnet for the zone. This had already been done in us-east-1b before I got to this setup, and I had simply missed this nuance of ELB configuration. So for the new availability zone, I had to do this...
us-east-1c
private subnet 10.1.3.0/24 (using a NAT instance as the default route)
public subnet 10.1.4.0/24 (using the internet gateway as the default route)
Then my instance goes in the private subnet as expected.
And the linchpin of this whole thing is (drum roll...)
When I add us-east-1c to my ELB, I have to select the public subnet...10.1.4.0. Otherwise the instances will pass the health check (since the ELB can communicate with any instance within my entire VPC) but the responses from the servers cannot make it back out to the public internet.
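To make that concrete, the route tables for the two subnets end up looking roughly like this (CIDRs from above; the gateway and instance IDs are placeholders):

  # public subnet 10.1.4.0/24 (where the ELB sits)
  10.1.0.0/16   local
  0.0.0.0/0     igw-xxxxxxxx   # internet gateway: responses can reach the internet

  # private subnet 10.1.3.0/24 (where the instances sit)
  10.1.0.0/16   local
  0.0.0.0/0     i-xxxxxxxx     # NAT instance: outbound-only internet access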
This is what is so confusing, and I still don't fully understand it. The instance can make a request for, say, www.google.com; I can RDP to it, open a browser, and get the web page. But a request from an outside host (like my laptop at home) would die. Strange.
PS: another note... make sure you have enough NAT capacity for your load. I think we ran into an issue where our NAT instance simply ran out of ports because too many web servers were trying to route outbound connections to third-party APIs through it. Quite honestly, I'm not good enough at this level of network/OS troubleshooting to be sure, but my theory is that our 8 instances of IIS were holding too many connections open to the NAT instance. We were also overloading the NIC on that micro instance. I moved us up to two large instances, one per AZ, and things smoothed back out. Both NAT instances are humming, and we're not seeing the hung processes in IIS anymore.
Debugging this kind of issue is always a challenge. Based on what you have written, I have a few ideas to suggest (which also apply to this problem generally) that come from dealing with it a number of times.
Have you checked both the security groups and the network ACLs? Bear in mind that network ACLs need to be specified in both directions, as they are stateless. Also bear in mind that ELBs are a bit unique in this regard: while they are associated with your VPC, they sometimes need extra rules to ensure connectivity. In the past I have debugged this by opening all network ACLs on all ports, then removing rules until it stopped working, in order to identify where the block was.
Security groups should be checked too. They are stateful, but make sure your load balancer's security group allows it to be hit from the web.
Have you checked that this isn't an application configuration problem? I don't know how IIS comes out of the box, but I would check that it is set up to respond to all hostnames.
Check that the ELB isn't an internal one, as that wouldn't be publicly addressable.
You say the ELB is configured with the health check, but it's worth checking that you also have a listener set up for port 80. It's on a separate tab in the dashboard, and you will need it in addition to the health check for connectivity through the ELB.
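If you prefer to verify from the command line, the listeners show up in the ELB description; a sketch with a placeholder load balancer name:

  aws elb describe-load-balancers --load-balancer-names my-elb \
      --query 'LoadBalancerDescriptions[].ListenerDescriptions'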
Hope one of these tips is useful to you.