Spring Cloud Dataflow reducing cost of streams

Spring Cloud Dataflow reducing cost of streams - spring-cloud-dataflow

I'm using Spring Cloud Dataflow local server and deploying 60+ streams with a Kafka topic and custom sink. The memory/cpu usage cost is not currently scalable. I've set the Xmx to 64m for most streams.
Currently exploring my options.
Disable embedded Tomcat server. I tried this once and SCDF couldn't tell the deployment status of the stream.
Group multiple Kafka "source" topics to a single sink app. This is allowed by Kafka but unclear if SCDF will permit subscribing to multiple topics.
Switch to using Kubernetes deployer. Won't exactly reduce the memory/cpu usage but distribute it across multiple machines. Haven't pursued this option because Kubernetes isn't used in my org yet. Maybe this will force the issue.
Open to other ideas. Might also be able to tweak Kafka configs such as max.poll.records and reduce memory usage.
Thanks!

First, I'd like to clarify the differences between SCDF and Stream/Task apps in the data pipeline.
SCDF is a lightweight Spring Boot app that includes the DSL, REST-APIs, and the Dashboard. Simply put, it serves as the orchestrator to define and deploy stream and task/batch data pipelines made of stream and task applications respectively.
The actual business logic, its performance, and the underlying resource consumption are at the individual Stream/Task application level. SCDF doesn't interfere with the app's operation, nor it contributes to the resource load. Everything, in the end, is standalone Boot apps - standalone Java processes.
Now, to your exploratory steps.
Disable embedded Tomcat server. I tried this once and SCDF couldn't tell the deployment status of the stream.
SCDF is a REST server and it requires the application container (in this case Tomcat), you cannot disable it.
Group multiple Kafka "source" topics to a single sink app. This is allowed by Kafka but unclear if SCDF will permit subscribing to multiple topics.
Again, there is no relation between SCDF and the apps. SCDF orchestrates full-blown Stream/Task (aka: Boot apps) into coherent data pipeline. If you have to produce or consumer to/from multiple Kafka topics, it is done at application level. Checkout the multi-io sample for more details.
There's the facility to consume from multiple topics directly via named-destination, too. SCDF provides a DSL/UI capability to build fan-in and fan-out pipelines. Refer to docs for more details. This video could be useful, too.
Switch to using Kubernetes deployer.
SCDF's Local-server is generally recommended for development. Primarily because there's no resiliency baked into the Local-server implementation. For example, if the streaming apps crash for any reason, there's no mechanism to restart them automatically. This is exactly why we recommend either SCDF's Kubernetes or Cloud Foundry server implementations in production. The platform provides the resiliency and fault-tolerance by automatically restarting the apps under fault scenarios.
From resourcing standpoint, once again, it depends on each application. They are standalone microservice application doing a specific operation at runtime, and it is up to how much resources the business logic requires.

Related

How to collect messages (total number and size) between microservices?

I have a microservices based software architecture.
There is a php application which orchestrates the communication among microservices and the application's whole logic.
I need to simulate the communication between microservices as a graph.
There will be edges with weights , which will represent the affinities between microservices.
I am searching for a tool in order to collect all messages and their size.
I have read that there are distibuted tracing systems like Zipkin which i have already deployed, and could accomplish this task.
But, i cannot find how to collect the messages i want.
This is the php library i used for the instrumentation of my app
[https://github.com/openzipkin/zipkin-php]
Any ideas about other tools or how to use Zipkin differently to achieve my goal?

Let me add to this thread my three bits. Speaking of Envoy, yes, when attached to your application it adds a lot of useful features from observability bucket, e.g. network level statistics and tracing.
Here is the question, have you considered running your legacy apps inside service mesh, like Istio ?.
Istio simplifies deployment and configuration of Envoy for you. It injects sidecar container (istio-proxy, in fact Envoy instance) to your Pod application, and gives you these extra features like a set of service metrics out of the box*.
Example: Stats produced by Envoy in Prometheus format, like istio_request_bytes are visualized in Kiali Metrics dashboard for inbound traffic as request_size (check screenshot)
*as mentioned by #David Kruk, you still needs to have Prometheus server deployed in your cluster to be able to pull these metrics to Kiali dashboards.
You can learn more about Istio here. There is also a dedicated section on how to visualize metrics collected by Istio (e.g. request size).

Question regarding Monolithic vs. Microservice Architecture

I'm currently rethinking an architecture I was planning.
So suppose I have a system where there are about 8 different services interacting with a single database. Some services listen and react to database events and do stuff like sending SMS.
Then there's an API layer sitting on top of the database and a frontend connected to this API. So in my understanding this is rather monolithic.
In fact I don't see any advantage of using containers in this scenario. Their real advantage is that they can be swapped out, right? My intuition tells me that there is often no purpose in doing that except maybe some load balancing on API level. Instead many companies just seem to blindly jump on the hype train of containerizing everything.
Now the question arises, is docker the right tool for this context? In each forum people refrain from using docker for the sole purpose of a more resource efficient "VM" aggregating all services within a single container. However this is the only real scenario I'd see any advantages in using docker (the environment, e.g. alpine-linux, is the same on all customer's computers when rolling out the system).
Even docker-compose is not "grouping" containers together as a complete system only exposing port 443 but instead starts an infrastructure of multiple interacting containers. Oftentimes services like Kubernetes are then used for deploying these infrastructures on "nodes", i.e. VMs.
However, in my opinion it would be great to have a single self-contained container without putting them into a VM. This container would include every necessary service only exposing one port, e.g. 443.
Since I'm rather confused now, I'd really appreciate your help here.
Thanks in advance!

Kubernetes does many things and has many useful features. But Kubernetes also require that you architect your apps to follow The Twelve-Factor App principles. An important thing here is that your apps are stateless.
When the app is stateless, it is easy to scale out horizontally - this can also be done automatically when the load increases.
When the app is stateless, it is easy to do Rolling Deployments that upgrade the app to a new version without downtime.
You can run containers on bare metal Linux servers, but this is mostly very big servers. If you use a cloud, you probably want more VM instances, but distributed to 3 Availability Zones - for increased availability.
"Self-contained container - exposing one port". With Kubernetes, you typically use a private network and you only expose services via a single load balancer - typically on a port, but different URLs send traffic to different services.
Some services listen and react to database events and do stuff like sending SMS.
As I said, many things is easier when it is horizontal scalable, but this kind of app - that listen for events and react - is one of few examples where you can not scale horizontally. But it is a good fit for a serverless architecture instead, possibly on Kubernetes using Knative.
Now the question arises, is docker the right tool for this context?
My opinion is that most workload will run in containers. It is more a question about how it should be run in Kubernetes - one or multiple replicas. As stateless Deployments or stateful StatefulSet or some other way.

data flow server and skippers role

Does data-flow-server and skipper have any active role after the deployment of the stream applications (other than maintaining state). Asked differently, say for example if I have an http source and amqp sink , does any of the traffic from http to amqp go through the data-flow-server or skipper?

Neither SCDF nor Skipper interact with or contribute towards streaming or batch processing. They are both responsible for the designing and the deployment of data pipelines made of streaming/batch applications. To do the designing and the deployment part, we provide a variety of tools, including UI, CLI, RESTful APIs, and Java DSL.
To say this differently, if you deploy a stream or a task data pipeline using SCDF, the applications involved in the data pipeline are solely responsible for the data processing—no data will ever go through SCDF or Skipper. The applications are independently addressing the use-case requirement. SCDF (and Skipper) will help with central monitoring, management, and continuous deliveries of the applications.
You can read more about the responsibilities from the architecture section in docs.

Performance testing of Dockerized application hosted on Kubernetes

Our project involves containerisation of services / application and later they will be deployed on Kuberentes. My job is to do performance testing using Jmeter after the services are hosted on Kubernetes.
I am relatively new to Performance testing and have basic experience on Jmeter that I gained from working on it. I have understood how the app is load / perf tested using basic URLs or APIs but I want to know how I should go about handling performance testing for Docker containers hosted on Kubernetes.
How could I handle the above scenario?

JMeter doesn't know anything about the underlying technologies used at the backend, it just sends requests via Samplers, waits for responses and measures the elapsed time of the request and some other performance metrics. Later on you can generate HTML Reporting Dashboard to visualize the results
So your goal is to:
Identify the business use cases you need to implement for the performance testing
Identify network protocols which are being used under the hood of these business use cases
Create a JMeter Test Plan to precisely mimic the real user (or other application) accessing your system and doing what it supposed to be doing

Rational behind appending versions as Service/Deployment name on k8s with spring cloud skipper

I am kind of new the spring cloud dataflow world and while playing around with the framework, I see that if I have a stream = 'test-steram' with 1 application called 'app'. When I deploy using skipper to kubernetes, I see that It creates pod/deployment & service on kubernetes with name as
test-stream-app-v1.
My question is why do we need to have v1 in service/deployment names on k8s? What role does it play in the overall workflow using spring cloud dataflow?
------Follow up -----------
Just wanted to confirm few points to make sure i am on right track to understand the flow
My understanding is with traditional stream (bind through kafka topics) service (object on kubernetes) do not play a significant role.
Rolling Update (Red/Black) pattern has implemented in following way in skipper and versioning in deployment/service plays a role in following way.
Let's assume that app-v1 deployment already exists and upgrade is requested. Skipper creates app-v2 deployment and
wait for it to be ready. Once ready it destroys app-v1
If my above understanding is right I have following follow up questions...
I see that skipper can deploy and package (and it do not have to be a traditional stream) to work with. Is that the longer term plan or Skipper is only intended to work spring-cloud-dataflow streams?
In case of non-tradtional stream package, where an package has multiple apps(rest microservices) in a group, how this model of versioning will work? I mean when I want to call the microservice from other microservice, I cannot possibly know or less than ideal to know the release-version of the app?

#Anand. Congrats on the 1st post!
The naming convention goes by the idea that each of the stream application is "versioned" if Skipper is used with SCDF. The version gets bumped for when, as a user, when you rolling-upgrade and rolling-downgrade the streaming-application versions or the application-specific properties either on-demand or via CI/CD automation.
It is very relevant for continuous-delivery and continuous-deployment workflows, and we provide native options in SCDF through commands such as stream update .. and stream rollback .. respectively. For any of these operations, the applications will be rolling updated in K8s, and each action will bump the number in the application name. In your example, you'd see them as test-stream-app-v1, `test-stream-app-v2, etc.
With all the historical versions in a central place (i.e., Skipper's database), you'd be able to interact with them via stream history.. and stream manifest .. commands in SCDF.
To learn more about all this, watch this demo-webinar (starts # ~41.25), and also have a look at samples in the reference guide.
I hope this helps.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart