Building platform-as-a-service - docker

I have been assigned a problem statement which goes as follows:
I am building platform-as-a-service from scratch, which has pipelined execution. Here pipelined execution means that output of a service can be input into another service. The platform can offer number of services, which can be pipelined together. Output of a service can be input to multiple services.
I am new to this field so how to go about this task is not very intuitive to me.
After researching a bit, I found out that I can use Docker to deploy services in containers. So I installed Docker on Ubuntu and installed few images and run them as service (for example, MongoDB). What I am thinking of is that I need to run the services in containers, and define a way of taking input and output to these services. But how exactly do I do this using Docker containers. As an example, I want to send a query as an input to MongoDB (running as a service) and want an output, which I want to feed into another service.
Am I thinking in the right direction? If not in what direction should I be thinking of going about implementing this task?
Is there a standard way of exchanging data between services? (For example output of on service as input to another)
Is there something that Docker offers that I can leverage?
NOTE: I cannot use any high level API which does this for me. I need to implement it myself.

Related

How to deploy a kubernetes cluster on multiple physical machines in the best manner?

I recently finished a project where I created an App consisting of several docker containers. The purpose of the app was to collect some data and safe it to an databank and also allow user interactions over an simple web gui. The app was hosted on four different Raspberry Pi's and it was possible to collect data from all physicial maschines through an api. Further you could do some simple machine learning tasks like calculating anomalies in the sensor data of the Pi's.
Now I'm trying to take the next step and using kubernetes for some load balancing and remote updates. My main goal is to remote update all raspberries from my master node. Which, in theory, would be a very handy feature. Also I want to share the ressources of the Pi's within the cluster for calculations.
I read a lot about Kubernets, Minikube, K3's, Kind and all the different approaches to set up an Kubernetes cluster, but feel like I am missing "a last puzzle piece".
So from what I understood I need an approach which allows me to set up an local (because all machines are laying on my desk/ no cloud needed) multi node cluster. My master node would be (idealy) my laptop, running Ubuntu in a virtual machine. My rasberry's would be my slave/worker nodes. If I would want to update my cluster I can use the kubernetes remote update functionality.
So my question out of this would be: Does it makes sense to use several rasberries as nodes in a kubernetes cluster and to manage them from one master node (laptop) and do you have any suggestions about the way to achieve this setup.
I usally dont like those question not containing any specific code or questions by myself, but feel like an simple hint could accelerate my project noteable. If it's the wrong place please feel free to delete this question.
Best regards
You didn't mention which rpi models you are using, but I assume you are not using rpi zeros.
My main goal is to remote update all raspberries from my master node.
Assuming that by that you mean updating your applications running in kubernetes that is installed on rpi then keep reading. Otherwise ignore all I wrote, and what you probably need is ansible or other simmilar provisioning/configuration-management/application-deployment tool.
Now answering to your question:
Does it makes sense to use several rasberries as nodes in a kubernetes cluster
yes, this is why people created k3s, so such setup is possible using less resources.
and to manage them from one master node (laptop)
assuming you will be using it for learning purpouses then why not. It is possible, but just be aware that when master node goes down (e.g. when you turn off your laptop), all cluster goes down (or at least api-server communication so you wont be able to change cluster's state). Also make sure you are using bridge networking interface for your VM so it is visible in your local network as a standalone instance.
and do you have any suggestions about the way to achieve this setup.
installing k3s on all nodes would be the easiest in your case. There are plenty of resources on the internet explaining how to achieve it.
One last thing I would like to explain is the thing with updates.
Speaking of kubernetes updates you need to know that kubernetes doesn't update itself automatically. You need to explicitly update it. New k8s version is beeing released every 3 months that sometimes "breaks" things and backward compatibility is not possible (so always read changelog before updating stuff because rollbacks may not be possible unless you backed up an etcd cluster earlier).
Speaking of updating applications - To run your app all you do is send yaml files describing your application to k8s and it handles the rest. So if you want to update your app just update the tag on container image to newer version and k8s will handle the updates. Read here more about update strategies in k8s.

Rational behind appending versions as Service/Deployment name on k8s with spring cloud skipper

I am kind of new the spring cloud dataflow world and while playing around with the framework, I see that if I have a stream = 'test-steram' with 1 application called 'app'. When I deploy using skipper to kubernetes, I see that It creates pod/deployment & service on kubernetes with name as
test-stream-app-v1.
My question is why do we need to have v1 in service/deployment names on k8s? What role does it play in the overall workflow using spring cloud dataflow?
------Follow up -----------
Just wanted to confirm few points to make sure i am on right track to understand the flow
My understanding is with traditional stream (bind through kafka topics) service (object on kubernetes) do not play a significant role.
Rolling Update (Red/Black) pattern has implemented in following way in skipper and versioning in deployment/service plays a role in following way.
Let's assume that app-v1 deployment already exists and upgrade is requested. Skipper creates app-v2 deployment and
wait for it to be ready. Once ready it destroys app-v1
If my above understanding is right I have following follow up questions...
I see that skipper can deploy and package (and it do not have to be a traditional stream) to work with. Is that the longer term plan or Skipper is only intended to work spring-cloud-dataflow streams?
In case of non-tradtional stream package, where an package has multiple apps(rest microservices) in a group, how this model of versioning will work? I mean when I want to call the microservice from other microservice, I cannot possibly know or less than ideal to know the release-version of the app?
#Anand. Congrats on the 1st post!
The naming convention goes by the idea that each of the stream application is "versioned" if Skipper is used with SCDF. The version gets bumped for when, as a user, when you rolling-upgrade and rolling-downgrade the streaming-application versions or the application-specific properties either on-demand or via CI/CD automation.
It is very relevant for continuous-delivery and continuous-deployment workflows, and we provide native options in SCDF through commands such as stream update .. and stream rollback .. respectively. For any of these operations, the applications will be rolling updated in K8s, and each action will bump the number in the application name. In your example, you'd see them as test-stream-app-v1, `test-stream-app-v2, etc.
With all the historical versions in a central place (i.e., Skipper's database), you'd be able to interact with them via stream history.. and stream manifest .. commands in SCDF.
To learn more about all this, watch this demo-webinar (starts # ~41.25), and also have a look at samples in the reference guide.
I hope this helps.

Kubernetes scaling pods using custom algorithm

Our cloud application consists of 3 tightly coupled Docker containers, Nginx, Web and Mongo. Currently we run these containers on a single machine. However as our users are increasing we are looking for a solution to scale. Using Kubernetes we would form a multi container pod. If we are to replicate we need to replicate all 3 containers as a unit. Our cloud application is consumed by mobile app users. Our app can only handle approx 30000 users per Worker node and we intend to place a single pod on a single worker node. Once a mobile device is connected to worker node it must continue to only use that machine ( unique IP address )
We plan on using Kubernetes to manage the containers. Load balancing doesn't work for our use case as a mobile device needs to be tied to a single machine once assigned and each Pod works independently with its own persistent volume. However we need a way of spinning up new Pods on worker nodes if the number of users goes over 30000 and so on.
The idea is we have some sort of custom scheduler which assigns a mobile device a Worker Node ( domain/ IPaddress) depending on the number of users on that node.
Is Kubernetes a good fit for this design and how could we implement a custom pod scale algorithm.
Thanks
Piggy-Backing on the answer of Jonah Benton:
While this is technically possible - your problem is not with Kubernetes it's with your Application! Let me point you the problem:
Our cloud application consists of 3 tightly coupled Docker containers, Nginx, Web, and Mongo.
Here is your first problem: Is you can only deploy these three containers together and not independently - you cannot scale one or the other!
While MongoDB can be scaled to insane loads - if it's bundled with your web server and web application it won't be able to...
So the first step for you is to break up these three components so they can be managed independently of each other. Next:
Currently we run these containers on a single machine.
While not strictly a problem - I have serious doubt's what it would mean to scale your application and what the challenges that come with scalability!
Once a mobile device is connected to worker node it must continue to only use that machine ( unique IP address )
Now, this IS a problem. You're looking to run an application on Kubernetes but I do not think you understand the consequences of doing that: Kubernetes orchestrates your resources. This means it will move pods (by killing and recreating) between nodes (and if necessary to the same node). It does this fully autonomous (which is awesome and gives you a good night sleep) If you're relying on clients sticking to a single nodes IP, you're going to get up in the middle of the night because Kubernetes tried to correct for a node failure and moved your pod which is now gone and your users can't connect anymore. You need to leverage the load-balancing features (services) in Kubernetes. Only they are able to handle the dynamic changes that happen in Kubernetes clusters.
Using Kubernetes we would form a multi container pod.
And we have another winner - No! You're trying to treat Kubernetes as if it were your on-premise infrastructure! If you keep doing so you're going to fail and curse Kubernetes in the process!
Now that I told you some of the things you're thinking wrong - what a person would I be if I did not offer some advice on how to make this work:
In Kubernetes your three applications should not run in one pod! They should run in separate pods:
your webservers work should be done by Ingress and since you're already familiar with nginx, this is probably the ingress you are looking for!
Your web application should be a simple Deployment and be exposed to ingress through a Service
your database should be a separate deployment which you can either do manually through a statefullset or (more advanced) through an operator and also exposed to the web application trough a Service
Feel free to ask if you have any more questions!
Building a custom scheduler and running multiple schedulers at the same time is supported:
https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
That said, to the question of whether kubernetes is a good fit for this design- my answer is: not really.
K8s can be difficult to operate, with the payoff being the level of automation and resiliency that it provides out of the box for whole classes of workloads.
This workload is not one of those. In order to gain any benefit you would have to write a scheduler to handle the edge failure and error cases this application has (what happens when you lose a node for a short period of time...) in a way that makes sense for k8s. And you would have to come up to speed with normal k8s operations.
With the information provided, hard pressed to see why one would use k8s for this workload over just running docker on some VMs and scripting some of the automation.

Docker Swarm - Deploying stack with shared code base across hosts

I have a question related with the best practices for deploying applications to the production based on the docker swarm.
In order to simplify discussion related with this question/issue lets consider following scenario:
Our swarm contains:
6 servers (different hosts)
on each of these servers, we will have one service
each service will have only one task/replica docker running
Memcached1 and Memcached2 uses public images from docker hub
"Recycle data 1" and "Recycle data 2" uses custom image from private repository
"Client 1" and "Client 2" uses custom image from private repository
So at the end, for our example application, we have 6 dockers running across 6 different servers. 2 dockers are memcached, and 4 of them are clients which are communicating with memcached.
"Client 1" and "Client 2" are going to insert data in the memcached based on the some kind of rules. "Recycle data 1" and "Recycle data 2" are going to update or delete data from memcached based on some kind of rules. Simple as that.
Our applications which are communicating with memcached are custom ones, and they are written by us. The code for these application reside on github (or any other repository). What is the best way to deploy this application to the production:
Build images which will contain copied code within the image which you can use to deploy things to the swarm
Build image which will use volume where code reside outside of the image.
Having in mind that I am deploying swarm to the production for the first time, I can see a lot of issues with way number 1. Having a code incorporate to the images seems non logical to me, having in mind that in 99% of the time, the updates which are going to happen are going to be code based. This will require building image every time when you want to update the code which runs on specific docker (no matter how small that change is).
Way number 2. seems much more logical to me. But at this specific moment I am not sure is this possible? So there are a number of questions here:
What is the best approach in case where we are going to host multiple dockers which will run the same code in the background?
Is it possible on docker swarm, to have one central host,server (manager, anywhere) where we can clone our repositories and share those repositores as volumes across the docker swarm? (in our example, all 4 customer services will mount volume where we have our code hosted)
If this is possible, what is the docker-compose.yml implementation for it?
After digging more deeper and working with docker and docker swarm mode for last 3 months, these are the answers on questions above:
Answer 1: In general, you should consider your docker image as "compiled" version of your program. Your image should contain either code base, or compiled version of the program (depends which programming language you are using), and that specific image represents your version of the app. Every single time when you want to deploy your next version, you will generate the new image.
This is probably best approach for 99% of the apps which are going to be hosted with the docker (exceptions are development environments and apps where you really want to bash and control things directly from the docker container by itself).
Answer 2: It is possible but it is extremely bad approach. As mentioned in answer one, the best one is to copy the app code directly into the image and "consider" your image (running container) as "app by itself".
I was not able to wrap my head around this concept at the begging, because this concept will not allow you to simply go to the server (or where ever you are hosting your docker) and change the app and restart docker (obviously because container will be at the same beginning again after restart using the same image, same base of code you deployed with that image). Any kind of change SHOULD and NEEDS to be deployed as different image with different version. That is what docker is all about.
Additionally, initial idea for sharing same code base across multiple swarm services is possible, but it totally ruins purpose of the versioning across docker swarm.
Consider having 3 services which are used as redundant services (failover), and you want to use new version on one of them as beta test. This will not be possible with the shared code base.

docker-compose per microservice in local and production enironments?

I want to be able to develop microservices locally, but also to 'push' them into production with minimal configurational changes. I used to put all microservices into one docker-compose locally; but I start to see this might no be the practical.
The new idea is to have single docker-compose per service. This does not means it will run with only one container; it might have more inside (like some datastore behind etc).
From that new point of view, let's take a look at the well-known docker voting app example, that consist of 5 components:
(P) Python webapp which lets you vote between two options
(R) Redis queue which collects new votes
(J) Java worker which consumes votes and stores them in…
(S) Postgres database backed by a Docker volume
(N) Node.js webapp which shows the results of the voting in real time
Let's say you want to push this example into production (so having just one docker-compose is not an option:). Not forget that more infrastructure-related components may be added on top of it (like kibana, prometheus...). And we want to be able to scale what we need; and we use e.g. swarm.
The question is:
How to organize this example: in single docker-composes or many?
What microservices do we have here? In other words, which components would you combine into single docker-compose? Example: J and S?
If services are not in single docker-compose, do we add them to same overlay network to use swarm dns feature?
and so on...
(I don't need details on how to install stuff, this question is about top-level organization)
Docker Compose is mostly for defining different container, configure and using a single command make them available (also for sequencing). So it is best suited for local development, integration testing and use it as part of your Continuous Integration process.
While not ruling out Docker compose can be used in production environment, I think it would be a good case of using Kubernetes which gives more control over scaling, managing multiple containers.
This blog has some example scenarios to try out (and many other resources which can be helpful)
https://renzedevries.wordpress.com/2016/05/31/deploying-a-docker-container-to-kubernetes-on-amazon-aws/comment-page-1/#comment-10

Resources