Spring Cloud Data Flow - spring-cloud-dataflow

Docker Compose Customization - as per the reference guide, if we point to MySQL in the docker-compose.yml, will that start the MySQL database process along with the other processes (Kafka, Zookeeper, and the Data Flow server), or do we need to manually start the MySQL database process separately before running docker-compose up?

Changing the docker-compose.yml file to point to the MySQL configuration does indeed start a
springdataflow_mysql_1 container process.
Creating and deploying streams persists the definitions to the STREAM_DEFINITIONS and STREAM_DEPLOYMENTS tables respectively, under the dataflow database.

Glad you got it working! You can customize the setup to swap in the DB or message broker of your choice. The promise of docker-compose is to bring up the described components in order, and there is simple logic (via depends_on) that waits for the middleware components to start. We describe the customization here.
Beyond that, auto-configuration kicks in to configure the environment for the desired database, as long as the right driver is on SCDF's classpath - see the supported databases. And yes, we already ship the open-source MariaDB driver, so it works just fine with MySQL.
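As a rough illustration of that customization, an override file along the following lines swaps the database to MySQL. The file name, image tag, credentials, and the exact environment variables on the dataflow-server service are assumptions for this sketch, not the shipped configuration, so treat the reference guide as authoritative.
# docker-compose.mysql.yml (illustrative name), layered on top of the shipped file with
# docker-compose -f docker-compose.yml -f docker-compose.mysql.yml up
version: '3'
services:
  mysql:
    image: mysql:5.7                    # assumed tag
    environment:
      - MYSQL_DATABASE=dataflow
      - MYSQL_ROOT_PASSWORD=rootpw      # hypothetical local-only credentials
  dataflow-server:
    environment:
      # standard Spring Boot datasource properties expressed as environment variables
      - SPRING_DATASOURCE_URL=jdbc:mysql://mysql:3306/dataflow
      - SPRING_DATASOURCE_USERNAME=root
      - SPRING_DATASOURCE_PASSWORD=rootpw
      - SPRING_DATASOURCE_DRIVER_CLASS_NAME=org.mariadb.jdbc.Driver
    depends_on:
      - mysql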

Related

Good habits in software development - what exactly are backing services, in a general definition?

In a document called "The Twelve-Factor App", I saw that the 4th factor is "backing services", and what I understood from it was that an ideal application must not differentiate local services from external services, which means that every single service must be an external service, accessible via a URL.
I also took a look at the fundamentals behind Docker, and my main point of confusion is: if I have a fully managed app on the same machine or VM, with a microservices architecture using Docker, where each container in isolation does what it is responsible for, does this app satisfy the 4th factor?
In other words, is container isolation considered a backing service, or is that not enough, so that to be considered a backing service, the service must be on another machine outside localhost and accessible via TCP/IP?
The important section from The Twelve-Factor App: Backing Services is this (emphasis mine):
To the app, both [local and third-party services] are attached resources, accessed via [...] locator/credentials stored in the config.
That is, the important part is not that the service is "external" or that it specifically has a URL, but that you can change the location of the database at deploy time. The example on that page is relevant: you could run a PostgreSQL database outside a container on the same host you're developing on, or in an adjacent Compose-managed container, or a Kubernetes StatefulSet+Service, or use a hosted database like Amazon RDS, but you should not need to change code to accommodate that difference.
Continuing with the example of a PostgreSQL database, the standard client libraries support an environment variable $PGHOST that specifies the database host name (also see the Config page; environment variables tend to be easier to configure in container environments). So you're following this practice with a Compose setup like:
version: '3.8'
services:
  database:
    image: postgres:14
  application:
    build: .
    environment:
      - PGHOST=database # <-- database host name as environment variable
Since this is configuration, and an environment variable, you could run the same application outside a container, pointing at an RDS database, without changing your code:
export PGHOST=database.012345678901.us-east-1.rds.amazonaws.com
./myapp
What doesn't follow this pattern? There are fairly routine questions that embed the database location directly in their code (and that's frequently localhost) and then try to massage the network environment to match the hard-coded developer setup (frequently by disabling Docker networking with network_mode: host). This won't work in clustered environments like Kubernetes, or if the database isn't in a container at all.
I keep harping on a database as an example here because databases are special: where containers can usually just be deleted and recreated, databases in particular have the actual data, they need to be backed up, and tasks like migrations have specific life cycles. The database in particular is often I/O bound and can benefit from dedicated hardware under load. It can be a good practice to run a database on bare metal or to use a hosted database solution, and then to run a cluster of completely stateless containers that call out to that external database.

How to configure SCDF Skipper to use pre-existing docker instance?

I'm currently evaluating the usage of Spring Cloud Data Flow for our infrastructure. We already use RabbitMQ and Kubernetes so that would be our target environment.
For local testing purposes I use dockerized MySQL and RabbitMQ, and I want SCDF Skipper to deploy the stream services to my local Docker instance so they can use the pre-existing MySQL and RabbitMQ containers (and I can manage and monitor everything in one single Docker instance).
My first approach was to use Skipper and the Data Flow Server from docker-compose, but since I failed to deploy anything, I switched to using the jars, following this tutorial:
https://dataflow.spring.io/docs/installation/local/manual/
By now, deployment of the stream works, but it fails to connect to my pre-existing, dockerized MySQL. That is because, by default, SCDF Skipper seems to deploy to an internal Docker instance.
So my question is:
Is there any way to configure SCDF Skipper to use the Docker instance on my local machine as the deployment target?
After another iteration of research, I stumbled upon
https://dataflow.spring.io/docs/installation/local/docker/#docker-stream--task-applications
Apparently, to use Skipper and the Data Flow Server from within Docker (DooD, Docker-out-of-Docker), you have to add another docker-compose.yml.
That does NOT solve how to use a pre-existing Docker instance when running Skipper locally from the jar, but it at least lets me run Skipper and the Data Flow Server as containers on a pre-existing Docker instance and thus use that instance as the deployment target.
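For reference, the heart of the DooD approach is mounting the host's Docker socket into the Skipper and Data Flow Server containers, so that any apps they deploy land on the host's Docker daemon. A minimal sketch of that idea; the service names skipper-server and dataflow-server and the override file layout are assumptions here, and the file shipped with the docs may differ:
# docker-compose.dood.yml (illustrative), layered on top of the base file with
# docker-compose -f docker-compose.yml -f docker-compose.dood.yml up
version: '3'
services:
  skipper-server:
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock # let Skipper drive the host's Docker daemon
  dataflow-server:
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock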

Need for a cyclic depends_on in Docker Compose

I have the following problem. I want to use this docker-compose file, since it takes care of the set-up after Matomo starts. I want to use it during development, and I need some data to be present in MariaDB after the containers start. I found the table I have to insert into, and the SQL script is already written. Now to my problem:
I need data in MariaDB, so I could use docker-entrypoint-initdb.d. Unfortunately, at that point there are no tables yet, since Matomo, which creates the table structure, waits until the DB is running. The Matomo container does not seem to have a comparable entrypoint that I could use.
Thus I more or less have Matomo depends_on MariaDB and MariaDB depends_on Matomo.
My question is: are there better options than building my own image, in which I adapt the start-up.sh to call my own entrypoint script that inserts the SQL? As mentioned, this is only for development, so I want to keep it simple.
Thanks in advance
Matthias
So we tried out some things.
First of all, we used a basic instance of Matomo and MariaDB and hoped that the configuration done during the first steps would only be needed once. If that were the case, we would have made a database dump and inserted it into MariaDB during start-up, since an entrypoint is available for that. Unfortunately, Matomo needs the IP of MariaDB, and this IP is not localhost; it depends on the Docker container and changes on every start-up. So this approach was not successful either.
After this we found out that Bitnami had changed their Docker image in exactly the way I had planned, a few days after I downloaded it. They added exactly what I needed: a post-init shell script hook.
Now I use that hook and everything is working.
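In compose terms the result looks roughly like the sketch below; the post-init directory path, file names, and credentials are assumptions for illustration, so check the Bitnami image documentation for the exact hook location:
version: '3'
services:
  mariadb:
    image: bitnami/mariadb:latest
    environment:
      - MARIADB_ROOT_PASSWORD=rootpw # hypothetical local-only credentials
  matomo:
    image: bitnami/matomo:latest
    depends_on:
      - mariadb
    # plus the image's usual database connection settings (omitted here)
    volumes:
      # Assumed post-init hook location; scripts there run after Matomo has created
      # its schema, so the seed script can safely insert into the existing tables.
      - ./seed-after-init.sh:/docker-entrypoint-init.d/seed-after-init.sh
      - ./seed.sql:/tmp/seed.sql
The seed-after-init.sh script would then simply run the MySQL client against the mariadb host with /tmp/seed.sql as input.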

Seeding sclorg/mariadb container

I'm using the SoftwareCollections MariaDB container and I can't seem to find a way to initialize the database with some users and data.
The official mariadb container provides the very handy /docker-entrypoint-initdb.d directory. The container runs all .sql and .sql.gz files at database initialization, but this type of functionality seems to be missing from the software collections image.
Why was this functionality not included with software collections? Is it included and I'm just not looking in the right place?
Typically, database containers allow you to set up a single admin user and password. You can use this account later to connect and seed any data you need.
This can be done at the application level by tools like Liquibase, or by a Kubernetes Job, depending on your use case.
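As a rough sketch of the Kubernetes Job approach (the host name, database, credentials, Secret, and ConfigMap below are all hypothetical):
apiVersion: batch/v1
kind: Job
metadata:
  name: seed-mariadb
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: seed
          image: mariadb:10.6 # used here only for its mysql client
          command: ["sh", "-c", "mysql -h mariadb -u admin -p\"$MYSQL_PASSWORD\" mydb < /seed/seed.sql"]
          env:
            - name: MYSQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mariadb-credentials # hypothetical Secret
                  key: password
          volumeMounts:
            - name: seed-sql
              mountPath: /seed
      volumes:
        - name: seed-sql
          configMap:
            name: seed-sql # ConfigMap containing seed.sql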

Docker compose: possible to model lazy startup of dependent services?

I posted this question originally on the Docker forums, but didn't get any response there.
I'm wondering what the best way would be to model a set of services; let's call them db, web, and batch. db is simply a running database server instance (think MySQL). web is a web application that needs to connect to the database. batch is a batch application that needs to connect to that same database (it can and will run in parallel with web). db needs to be running for either web or batch to run, but web and batch can run independently of each other (one or both can be running at the same time). If both are running at once, they need to be talking to the same database instance (so db actually uses volumes_from a separate data volume container). If the use case were simpler (think just db and web, which always run together), they would simply both be defined as services in the same compose file, with web having a link to db.
As far as I understand it, these can't all be defined in the same Docker compose configuration. Instead, I would need three different configurations. One for db, which is launched first, one for web (which uses external_links to find db), and a third for batch (which also uses external_links to find db). Is that correct, or is there some mechanism available that I'm not considering? Assuming a multi-configuration setup is needed, is there a way to "lazily" initialize the db composition if it's not running, when either the web or batch compositions are launched?
If web has a link defined to db in a docker-compose file, db will always start first.
As far as I know, Docker will never know when the database is actually up. It will be your web container's responsibility to start properly and retry until the database is up (with a timeout).
For your batch service, assuming that you don't want to start it every time you start your web and db containers (using docker-compose up or run), you can try extending your service; see the docs for more information on this, and a sketch follows below.
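A minimal sketch of the extends idea, using the older Compose file format that this answer's links and external_links suggest; the file names, images, and credentials are illustrative:
# common.yml
db:
  image: mysql:5.7
  environment:
    - MYSQL_ROOT_PASSWORD=rootpw # hypothetical credentials

# batch.yml, started on demand with: docker-compose -f batch.yml up
db:
  extends:
    file: common.yml
    service: db
batch:
  build: ./batch
  links:
    - db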
Either the applications in the web and batch images know how to handle database downtime and are able to wait for the db service to come up and auto-reconnect, or you have to make a shell script that runs when the Docker container is started and waits for the db to be available before starting the app.
Depending on the docker images you are using for the web and batch services, you would have to override CMD, ENTRYPOINT or both.
This question has examples of shell scripts that wait for a MySQL service to be up.
And here are other techniques for testing whether a network port is open.
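For illustration, one way to express that wait in the Compose file itself is a healthcheck on db combined with a conditional depends_on (available in the 2.1 file format); the image tags, credentials, and build paths below are placeholders:
version: '2.1'
services:
  db:
    image: mysql:5.7
    environment:
      - MYSQL_ROOT_PASSWORD=rootpw # hypothetical credentials
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
      interval: 5s
      timeout: 3s
      retries: 10
  web:
    build: ./web
    depends_on:
      db:
        condition: service_healthy # start web only once db reports healthy
A batch service defined in another file could declare the same dependency; the application should still retry on its own, since a database that is healthy at start-up is not guaranteed to stay reachable.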

Resources