How to update a value in all replicas when hitting an endpoint - docker

I have three replicas of the same application running in Kubernetes, and the application exposes an endpoint. Hitting the endpoint sets a boolean variable to true, which my application uses. The issue is that when I hit the endpoint, the variable is updated in only one of the replicas. How can I make the change in all replicas by hitting just one endpoint?

You need to store your data in a shared database of some kind, not locally in memory. If all you need is a temporary flag, Redis would be a popular choice, or for more durable stuff Postgres is the gold standard. But there's a wide and wonderful world of databases out there, explore which match your use case.
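As an illustration of the shared-store approach, here is a minimal Python sketch. It assumes a Redis-like client exposing `get`/`set` (redis-py's `Redis` client has both); `FLAG_KEY`, `handle_enable_endpoint` and `flag_is_enabled` are hypothetical names, and `InMemoryStore` only stands in for the real shared database so the example runs on its own.

```python
from typing import Protocol


class FlagStore(Protocol):
    """Anything with get/set, e.g. a redis.Redis client (assumed, not shown)."""

    def get(self, key: str): ...

    def set(self, key: str, value: str): ...


class InMemoryStore:
    """Stand-in for the shared store. In a real deployment every replica
    would point at the same external Redis/Postgres instead of this dict."""

    def __init__(self) -> None:
        self._data = {}

    def get(self, key: str):
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


FLAG_KEY = "feature:enabled"  # hypothetical key name


def handle_enable_endpoint(store: FlagStore) -> None:
    # The endpoint writes to the shared store instead of a local variable.
    store.set(FLAG_KEY, "1")


def flag_is_enabled(store: FlagStore) -> bool:
    # Every replica reads the flag from the store each time it needs it.
    value = store.get(FLAG_KEY)
    # redis-py returns bytes by default, so accept both bytes and str.
    return value in ("1", b"1")
```

Because no replica keeps its own copy of the flag, it does not matter which pod the Service routes the request to; the trade-off is a network round trip per read, which short-lived caching could reduce if slightly stale values are acceptable.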

Seems like you're trying to solve an issue with your app using Kubernetes.
Let me elaborate:
If you have several pods behind a service, you can't reach all of them with a single request. This has been proposed here, but in my opinion it isn't best practice.
If you need to share data between your apps, you can have them communicate with each other using a cluster service.
You probably assume you can share data using Kubernetes volumes, such as gcePersistentDisk or some other sort of volume, but then again, volumes were not meant to solve such problems.
In conclusion, the way I would solve such an issue is by having the pods communicate changes to each other. Kubernetes can't solve this for you.
EDIT:
Another approach could be shared storage (for example, a single pod running MongoDB), but I would say that's a bit of overkill for your issue, not least because your pods would probably need in-cluster communication to reach that pod anyway.
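One way to have the pods communicate a change, sketched in Python under some assumptions: the pods sit behind a hypothetical headless Service (clusterIP: None), so a DNS lookup of its name returns one address per ready pod, and `send` is a placeholder for whatever internal call each replica exposes for receiving the new value.

```python
import socket
from typing import Callable, Iterable, List


def discover_peers(headless_service: str, port: int) -> List[str]:
    """Resolve every pod IP behind a (hypothetical) headless Service.

    For a headless Service, cluster DNS returns one A record per ready
    pod rather than a single virtual IP.
    """
    infos = socket.getaddrinfo(headless_service, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def broadcast_flag(
    peers: Iterable[str],
    send: Callable[[str, bool], None],
    value: bool = True,
) -> List[str]:
    """Push the new flag value to every peer and return the peers that failed."""
    failed = []
    for peer in peers:
        try:
            send(peer, value)
        except OSError:
            # Collect network errors instead of aborting the whole fan-out.
            failed.append(peer)
    return failed
```

Peers that were unreachable, or that start after the broadcast, miss the update, so in this design each replica would also need to fetch the current value from a peer when it starts.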

Related

how do i know when should i use a stateless pod or a stateful one?

I am fairly new to Kubernetes and Docker, and I was studying the concepts of statelessness and statefulness. I understand that stateless microservices don't store data on the host, whereas stateful microservices require some kind of storage on the host that serves the requests. But if it's up to me, I will always use a stateful one, so why should I ever use a stateless pod? What is the advantage of statelessness?
For a typical Kubernetes Pod, it will be managed by a higher-level controller like a Deployment. You might set the Deployment to have replicas: 3 so that if one of them fails the other two can pick up the load. On an update the existing Pods will get deleted and recreated. If there's heavy load, you can set up a HorizontalPodAutoscaler to increase that replica count for you, which will create more pods when needed.
All of this is really straightforward if your containers are stateless, and there are no consequences to kubectl delete pod.
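For reference, the Deployment and HorizontalPodAutoscaler described above can be written out as manifests. This sketch builds them as Python dicts and prints JSON, which kubectl accepts as well as YAML; the name `web`, the image `example/web:1.0`, the CPU request and the 70% target are placeholder values, not recommendations.

```python
import json


def deployment(name: str, image: str, replicas: int = 3) -> dict:
    """Minimal apps/v1 Deployment; the selector must match the pod template labels."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        # A CPU request is needed for CPU-utilization-based autoscaling.
        "resources": {"requests": {"cpu": "100m"}},
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {"metadata": {"labels": labels}, "spec": {"containers": [container]}},
        },
    }


def autoscaler(name: str, min_replicas: int, max_replicas: int, cpu_percent: int) -> dict:
    """autoscaling/v2 HorizontalPodAutoscaler targeting the Deployment of the same name."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization", "averageUtilization": cpu_percent},
                },
            }],
        },
    }


if __name__ == "__main__":
    # Placeholder name and image; prints a v1 List holding both objects.
    items = [deployment("web", "example/web:1.0"), autoscaler("web", 3, 10, 70)]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Piping the output into `kubectl apply -f -` would create both objects. Nothing in either manifest refers to a particular pod, which is what makes deleting or adding replicas cheap when the containers are stateless.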
The problem with a stateful pod is, well, the state. Kubernetes gives you some choices on where to store data, but most of them can only be used on one pod at a time; if you have multiple replicas then each generally needs its own local storage, and the application needs to know how to reconcile the multiple copies of it. (Or, if you can set up something like an NFS server, the application needs to know how to handle concurrent writes.) Operationally, you need to know how to back up and restore all of the individual little volumes that are getting created along the way.
A standard approach is to minimize the number of places where state is stored, and have stateless applications use network I/O to read and write data in those few places. The state doesn't even need to be in the cluster: if your application is running in AWS, you could have containers that principally store data in RDS-hosted relational databases and Amazon's S3 object store but keep nothing locally, and you can then use normal backup and management approaches for those out-of-cluster stores.

Kubernetes With DPDK

I'm trying to figure out if Kubernetes will work for a certain use case. I understand the networking/clustering concept, and even the load balancing and how that can be used with things like nginx. However, assuming this is not deployed on a public cloud and things like ELB won't be available, could it still be used for a high-speed networking application using DPDK? For example, if we assume the cluster networking provided by k8s is only used for the control/management path, and the containers themselves handle the NIC directly with DPDK, is this something it's commonly used for?
Secondly, I understand the replication controller and petsets feature I think, but I'm not really clear on whether the intent of those features is for high availability or not. It seems that the "pod fails and the RC replaces it on a different node" isn't necessarily for HA, and there aren't really guarantees on how fast it builds a new pod. Am I incorrect?
For the second question: if the replication controller has a size larger than 1, it is highly available.
For example, if you have a service "web-svc" in front of the replication controller "web-app" with size 3, your request will be load balanced to one of the 3 pods:
web-svc ----> {web-app-pod1, web-app-pod2, web-app-pod3}
If some of the 3 pods fail, kubernetes will replace them with new ones.
And a pet set is similar to a replication controller, but is used for stateful applications like databases.
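To make the "replace failed pods" behaviour concrete, here is a toy Python model of one reconciliation pass. It is not the controller's actual code, and the `web-app-podN` naming simply mirrors the example above.

```python
import itertools


def reconcile(desired: int, running: set, prefix: str = "web-app-pod") -> set:
    """One pass of a toy replica controller.

    Creates pods until `desired` are running, or deletes extras if there are
    too many. Real controllers repeat this continuously against the API
    server and use random name suffixes; this only models the idea.
    """
    pods = set(running)
    ids = itertools.count(1)
    while len(pods) < desired:
        name = f"{prefix}{next(ids)}"
        if name not in pods:  # skip names still held by surviving pods
            pods.add(name)
    while len(pods) > desired:
        pods.remove(max(pods))  # drop an arbitrary extra pod
    return pods


if __name__ == "__main__":
    # web-app-pod2 has failed; one pass brings the count back to 3.
    print(sorted(reconcile(3, {"web-app-pod1", "web-app-pod3"})))
    # ['web-app-pod1', 'web-app-pod2', 'web-app-pod3']
```

A replacement only exists once the failure has been noticed and the new pod has been scheduled and started, so the availability in this answer comes from running several replicas at once, not from how fast a single one is replaced.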

How to improve Kubernetes security especially inter-Pods?

TL;DR Kubernetes allows all containers to access all other containers on the entire cluster, this seems to greatly increase the security risks. How to mitigate?
Unlike Docker, where one would usually only allow network connection between containers that need to communicate (via --link), each Pod on Kubernetes can access all other Pods on that cluster.
That means that for a standard Nginx + PHP/Python + MySQL/PostgreSQL, running on Kubernetes, a compromised Nginx would be able to access the database.
People used to run all those on a single machine, but that machine would have serious periodic updates (more than containers), and SELinux/AppArmor for serious people.
One can mitigate the risks somewhat by running each project (if you have several independent websites, for example) on its own cluster, but that seems wasteful.
The current Kubernetes security seems to be very incomplete. Is there already a way to have a decent security for production?
In the not-too-distant future we will introduce controls for network policy in Kubernetes. As of today that is not integrated, but several vendors (e.g. Weave, Calico) have policy engines that can work with Kubernetes.
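Network policy has since become part of the Kubernetes API as the NetworkPolicy resource (networking.k8s.io/v1). As a sketch of how it addresses the Nginx/PHP/MySQL example in the question, the Python below generates a policy letting only PHP pods reach the database. The labels `app=mysql` and `app=php` are assumptions, and the policy is only enforced if the cluster's network plugin supports NetworkPolicy (Calico, for example, does).

```python
import json


def db_ingress_policy(db_label: str, client_label: str, port: int) -> dict:
    """networking.k8s.io/v1 NetworkPolicy: only pods labelled app=<client_label>
    may open TCP connections to pods labelled app=<db_label> on <port>."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{db_label}-allow-{client_label}"},
        "spec": {
            # The pods this policy protects.
            "podSelector": {"matchLabels": {"app": db_label}},
            "policyTypes": ["Ingress"],
            "ingress": [{
                # The only pods allowed in, and only on this port.
                "from": [{"podSelector": {"matchLabels": {"app": client_label}}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical labels: app=mysql for the database, app=php for the PHP tier.
    print(json.dumps(db_ingress_policy("mysql", "php", 3306), indent=2))
```

Once a pod is selected by a policy with `Ingress` in `policyTypes`, inbound traffic that matches none of its rules is denied, so connection attempts from the Nginx pod would be dropped.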
As @tim-hockin says, we do plan to have a way to partition the network.
But, IMO, for systems with more moving parts (which is where Kubernetes should really shine), it will be better to focus on application security.
Taking your three-layer example, the PHP pod should be authorized to talk to the database, but the Nginx pod should not. So, if someone figures out a way to execute an arbitrary command in the Nginx pod, they might be able to send a request to the database Pod, but it should be rejected as not authorized.
I prefer the application-security approach because:
I don't think the --links approach will scale well to 10s of different microservices or more. It will be too hard to manage all the links.
I think as the number of devs in your org grows, you will need fine grained app-level security anyhow.
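A minimal Python sketch of the application-level check described above: each caller presents a per-service token, and the database side accepts only callers on an allow-list. The token table and service names are hypothetical; in a real cluster the credentials would come from Secrets or a proper identity system, and for MySQL itself the equivalent is giving only the PHP tier a database user with the needed grants.

```python
import hmac
import secrets

# Hypothetical per-service credentials, generated here only for the example.
SERVICE_TOKENS = {
    "php": secrets.token_hex(16),
    "nginx": secrets.token_hex(16),
}

# Only the PHP tier is authorized to query the database.
DB_ALLOWED_CALLERS = {"php"}


def authorize_db_request(caller: str, token: str) -> bool:
    """Authenticate the caller's token, then check it is allowed to use the DB."""
    expected = SERVICE_TOKENS.get(caller)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking the token through timing.
    if not hmac.compare_digest(expected, token):
        return False
    return caller in DB_ALLOWED_CALLERS
```

A compromised Nginx pod can still present its own valid token, but the request is rejected because Nginx is not on the database's allow-list.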
In terms of being like docker compose, it looks like docker compose currently only works on single machines, according to this page:
https://github.com/docker/compose/blob/master/SWARM.md

Kubernetes on Mesos

I Have the following setup in mind:
Kubernetes on Mesos (based on the kubernetes-mesos project) within a /16 network.
Each pod will have its own IP, and I believe this will allow for roughly 65,000 pods.
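For reference, the address arithmetic behind that estimate, using Python's standard `ipaddress` module (the `10.0.0.0/16` range is a placeholder): a /16 holds 2^16 = 65,536 addresses, and the usable count shrinks a little further if the range is split into per-node /24 subnets, as many pod-network setups do.

```python
import ipaddress

# Hypothetical /16 pod network.
pod_cidr = ipaddress.ip_network("10.0.0.0/16")

total = pod_cidr.num_addresses              # 2**16 addresses
usable = sum(1 for _ in pod_cidr.hosts())   # excludes network and broadcast

# Splitting into per-node /24s, counting only the host addresses of each.
per_node = list(pod_cidr.subnets(new_prefix=24))
usable_split = sum(sum(1 for _ in s.hosts()) for s in per_node)

print(total, usable, len(per_node), usable_split)
# 65536 65534 256 65024
```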
The idea is to provide isolation for each app, i.e., each app gets its own MySQL within the same pod, and the app accesses MySQL on localhost (within the pod).
If an additional service were needed, I'd use kubernetes rolling updates to add the service's container to the pod, the app will be able to access this new service on localhost as well.
Each application needs as much isolation as possible.
Are there any defects to such an implementation?
Do I have to use weave?
There's an option to specify the service-ip-range while running the kubernetes-mesos install.
One open question is how to scale a service; is this really viable?
Is there a better way to do this, i.e., to offer isolated services?
Thanks.
PS: I'm obviously a newbie at this and I'm trying to get the best possible setup running.
A common misconception is that a Pod should manage a vertical, multi-tier stack: for example a web tier + DB tier together.
It's interesting to read the Kubernetes design intent of Pods: they're for collecting 'helper' processes rather than composing a vertical stack.
To answer your questions, I'd recommend:
Define a Pod template for the web tier only. This can be scaled to any size required, using a replication controller (questions #1 and #3).
Define another Pod for MySQL.
Use the Service abstraction to locate these components.
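The Service part of that design could look like the following sketch, which generates a Service manifest for a MySQL pod as JSON and shows the DNS name cluster DNS gives it. The `app: mysql` label, the `default` namespace and the `cluster.local` domain are assumptions; the web tier would connect to that name rather than to a pod IP.

```python
import json


def service(name: str, selector: dict, port: int) -> dict:
    """v1 Service giving the pods that match `selector` a stable name and virtual IP."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": selector,
            "ports": [{"protocol": "TCP", "port": port, "targetPort": port}],
        },
    }


def service_dns_name(name: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    # Cluster DNS publishes Services as <service>.<namespace>.svc.<cluster-domain>.
    return f"{name}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    # Hypothetical: a MySQL pod labelled app=mysql in namespace "default".
    print(json.dumps(service("mysql", {"app": "mysql"}, 3306), indent=2))
    print(service_dns_name("mysql", "default"))  # mysql.default.svc.cluster.local
```

Because the web pods only know the Service name, the web Deployment can be scaled up or down without touching the MySQL pod.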
This sort of design will work for small applications, but you're right that it'll be tough to scale up if you suddenly want to have a couple of instances of a service hit the same MySQL backend.
You may want to look into putting each service into a separate namespace. Then a service's DNS lookups will be scoped to its own namespace by default so that it won't find other services' resources unless it's explicitly looking for them. This would let you put mysql (and any other dependencies) in a separate pod so that the frontend could be scaled independently.
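The namespace scoping works through the DNS search list kubelet writes into each pod's resolv.conf (plus any search domains inherited from the node). The Python below is a toy model of that lookup order, not a resolver: with the default `cluster.local` domain and `ndots:5`, a short name is tried against `<namespace>.svc.cluster.local` first, so `mysql` resolves inside the pod's own namespace and `mysql.team-b` has to name the other namespace explicitly. The namespaces `team-a` and `team-b` are hypothetical.

```python
def candidate_names(
    query: str, pod_namespace: str, cluster_domain: str = "cluster.local", ndots: int = 5
) -> list:
    """Names a pod's resolver would try, in order (toy model of resolv.conf search)."""
    if query.endswith("."):
        # A trailing dot marks a fully qualified name: no search list applied.
        return [query]
    search = [
        f"{pod_namespace}.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    expanded = [f"{query}.{suffix}" for suffix in search]
    if query.count(".") >= ndots:
        # Enough dots: the name is tried as-is before the search list.
        return [query] + expanded
    return expanded + [query]


if __name__ == "__main__":
    # A pod in the hypothetical team-a namespace looking up services.
    print(candidate_names("mysql", "team-a")[0])         # mysql.team-a.svc.cluster.local
    print(candidate_names("mysql.team-b", "team-a")[1])  # mysql.team-b.svc.cluster.local
```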

Docker and service clusters

We are looking into using Docker plus either Mesos/Marathon or Kubernetes for hosting a cluster. However, the one issue that we haven't really seen any answers for is how to allow clustered services to connect to each other correctly. All of the ones that I have seen need to know about at least one other node before they can join the cluster. Some need to know about every node. However, in Kubernetes and Mesos, there's no way to know what those IP addresses are ahead of time.
So, are there any best practices for this? If it helps, some technologies we're looking into deploying as containers are ElasticSearch, ActiveMQ, and MongoDB. There may be others.
However, the one issue that we haven't really seen any answers for is how to allow clustered services to connect to each other correctly.
I think you're talking about HA/replicated/sharded apps here.
At the moment, in kubernetes, you can accomplish this by making an api call listing all the "endpoints" of the service; that will tell you where your peers are running.
We'd eventually like to support the use case you describe in a more first-class manner.
I filed https://github.com/GoogleCloudPlatform/kubernetes/issues/3419 to maybe get something more standardized started here.
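A sketch of the API call mentioned above, using only Python's standard library from inside a pod: it reads the service account token and CA from their standard mount path, fetches the Service's Endpoints object, and turns it into (ip, port) pairs. The namespace and service names are whatever you pass in; the pod's service account needs RBAC permission to `get` endpoints, and newer clusters also expose the same information as EndpointSlices.

```python
import json
import ssl
import urllib.request

# Standard in-cluster locations for the service account credentials.
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"
API = "https://kubernetes.default.svc"


def fetch_endpoints(namespace: str, service: str) -> dict:
    """GET the Endpoints object for a Service from inside a pod."""
    with open(f"{SA_DIR}/token") as f:
        token = f.read().strip()
    ctx = ssl.create_default_context(cafile=f"{SA_DIR}/ca.crt")
    req = urllib.request.Request(
        f"{API}/api/v1/namespaces/{namespace}/endpoints/{service}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.load(resp)


def peer_addresses(endpoints: dict) -> list:
    """Flatten subsets[].addresses x subsets[].ports into sorted (ip, port) pairs.

    Only ready pods appear under `addresses`; unready ones are listed
    separately under `notReadyAddresses` and are ignored here.
    """
    peers = []
    for subset in endpoints.get("subsets") or []:
        for addr in subset.get("addresses", []):
            for port in subset.get("ports", []):
                peers.append((addr["ip"], port["port"]))
    return sorted(peers)
```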
I also wanted to setup an ElasticSearch cluster using Mesos/Marathon. As the existing "solutions" either were merely undocumented, or not working/outdated, I set up my own container.
If you like, have a look at https://github.com/tobilg/docker-elasticsearch-marathon
If you have a running Marathon installation (I use v0.8.1), then setting up an ElasticSearch cluster should be a matter of a few minutes.
UPDATE:
The container now uses Elasticsearch v1.5.2 and is able to run on the latest Marathon v0.8.2.
As for Kubernetes, it currently requires kube-controller-manager to be started with the --machines argument, given a list of minion IPs or hostnames.
I don't see any easy way to handle this correctly in Kubernetes right now. Yes, you could call the API to get the list of endpoints, but you would have to watch for changes and take action when the endpoints change...
I would prefer Mesos/Marathon, which is well prepared for this scenario. You should implement a custom framework for Mesos. There is already a framework for ElasticSearch: http://mesos.apache.org/documentation/latest/mesos-frameworks/
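The "watch for changes and take action" part could look like this minimal Python sketch. It polls a caller-supplied `fetch` function (hypothetical, e.g. one that lists the endpoint IPs of a Service) and yields the peers that joined or left; polling is a simpler substitute for the API server's watch stream (`?watch=true`) and reacts more slowly.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Set, Tuple


def diff_peers(old: frozenset, new: frozenset) -> Tuple[Set[str], Set[str]]:
    """Return (joined, left) between two membership snapshots."""
    return set(new - old), set(old - new)


def membership_changes(
    fetch: Callable[[], Iterable[str]],
    interval: float = 5.0,
    max_polls: Optional[int] = None,
) -> Iterator[Tuple[Set[str], Set[str]]]:
    """Poll `fetch` and yield (joined, left) whenever the peer set changes.

    `max_polls=None` polls forever; a number is mainly useful for testing.
    """
    current: frozenset = frozenset()
    polls = 0
    while max_polls is None or polls < max_polls:
        latest = frozenset(fetch())
        joined, left = diff_peers(current, latest)
        if joined or left:
            yield joined, left
            current = latest
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)
```

Each yielded pair is the point where a clustered service such as ElasticSearch would be told about new or departed nodes; handling that reconfiguration correctly is the part a dedicated Mesos framework takes care of.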
