I'm using the SoftwareCollections MariaDB container and I can't seem to find a way to initialize the database with some users and data.
The official mariadb container provides the very handy /docker-entrypoint-initdb.d directory. The container runs all .sql and .sql.gz files at database initialization, but this type of functionality seems to be missing from the software collections image.
Why was this functionality not included with software collections? Is it included and I'm just not looking in the right place?
Typically, database containers let you set up a single admin user and password through environment variables. You can use that account later to connect and seed any data you need.
This can be done at the application level with tools like Liquibase, or just with a Kubernetes Job, depending on your use case.
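As a rough sketch of that approach with docker-compose, assuming the centos/mariadb-102-centos7 flavour of the Software Collections image and its MYSQL_* environment variables (check your image's documentation for the exact names; the seed service, file paths, and credentials below are hypothetical):

# Hypothetical sketch - image tag, environment variables, and paths are assumptions
version: "3"
services:
  mariadb:
    image: centos/mariadb-102-centos7
    environment:
      MYSQL_ROOT_PASSWORD: rootpass   # admin credentials created at first start
      MYSQL_USER: app
      MYSQL_PASSWORD: apppass
      MYSQL_DATABASE: appdb
  # One-shot "job" that reuses the same image for its mysql client and
  # loads seed.sql once the server accepts connections
  seed:
    image: centos/mariadb-102-centos7
    depends_on:
      - mariadb
    volumes:
      - ./seed.sql:/tmp/seed.sql:ro
    command: >
      bash -c "until mysql -h mariadb -u app -papppass appdb < /tmp/seed.sql;
      do echo waiting for db; sleep 2; done"

The same idea translates to a Kubernetes Job that runs a client image with the seed script mounted from a ConfigMap.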
Can somebody explain with some examples why multi-container Docker apps are built, when you can contain your whole app in a single Docker container?
When you make a multi-container app you have to deal with networking. Isn't it easier to run a single image in a single container rather than two images in two containers?
There are several good reasons for this:
It's easier to reuse prebuilt images. If you need MySQL, or Redis, or an Nginx reverse proxy, these all exist as standard images on Docker Hub, and you can just include them in a multi-container Docker Compose setup. If you tried to put them into a single image, you'd have to install and configure them yourself.
The Docker tooling is built for single-purpose containers. If you want the logs of a multi-process container, docker logs will generally print out the supervisord logs, which aren't what you want; if you want to restart one of those containers, the docker stop; docker rm; docker run sequence will delete the whole thing. Instead with a multi-process container you need to use debugging tools like docker exec to do anything, which is harder to manage.
You can upgrade one part without affecting the rest. Upgrading the code in a container usually involves building a new image, stopping and deleting the old container, and running a new container from the new image. The "deleting the old container" part is important, and routine; if you need to delete your database to upgrade your application, you risk losing data.
You can scale one part without affecting the rest. This is more applicable in a cluster environment like Docker Swarm or Kubernetes. If your application is overloaded (especially in production) you'd like to run multiple copies of it, but it's hard to run multiple copies of a standard relational database. That essentially requires you to run these pieces separately, so you can run one proxy, five application servers, and one database.
Setting up a multi-container application shouldn't be especially difficult; the easiest way is to use Docker Compose, which will deal with things like creating a network for you.
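For example, a minimal docker-compose.yml along these lines (the service names and the application image are hypothetical) reuses stock Redis and Nginx images next to your own application image, each in its own container:

version: "3"
services:
  app:
    image: myorg/myapp:latest    # hypothetical application image
    environment:
      REDIS_HOST: redis          # reachable by service name on the Compose network
  redis:
    image: redis:7               # prebuilt image straight from Docker Hub
  proxy:
    image: nginx:1.25
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro

Each service gets its own logs (docker-compose logs app), and each can be rebuilt, restarted, or scaled without touching the others.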
For the sake of simplicity, I would say you can run only one application with a public entry point (like an API) in a single container. This approach is actually recommended by the official Docker documentation.
Microservices
Because of this constraint, you cannot run microservices that each require their own entry point in a single Docker container.
It becomes more of a discussion about the advantages of a monolithic application vs. microservices.
Database
Even if you decide to run only the monolithic application, you still need to connect a database to it. As you noticed, Docker adds a network-configuration layer, so if you want to run the database and the application locally, the easiest way is to use docker-compose to run both images (the database and your application) inside one automatically configured network.
# Application definition
application: <your app definition>
# Database definition
database:
  image: mysql:5.7
In my example, you can connect to your DB from your main app using the hostname database (for example database:3306, plus credentials) and it will work.
Scalability
However, why should we split the database image from the application image? In one word: scalability. For development purposes, you want a local DB, maybe in Docker because it is handy. For production purposes, you will run the application image somewhere (Kubernetes, Docker Swarm, Azure App Services, etc.). To handle multiple requests at the same time, you want to run multiple instances of your application. But what about the database? You cannot connect to an internal DB instance hosted in the same container, because the other instances of your app in other containers would each have a completely different set of data (with no synchronization).
Most often you will opt for a separate database server, whether it runs in a container or as a fully managed database (like Azure Cosmos DB or MongoDB Atlas), with its own configuration, scaling, and synchronization dedicated to the DB only. Your app just needs to know the proper URL for it. Most cloud providers expose such services out of the box, so you don't have to worry about the configuration yourself.
Easy to change
The last but not least argument is about changing the initial setup over time. You might change the database provider, or upgrade the version of an image in the future (such things are required from time to time). When you separate the images, you can modify one without touching the others. This decreases the cost of maintenance significantly.
Also, you can add additional services very easily. A different logging aggregator? No problem. An additional microservice running out of the box? Easy.
Docker Compose customization - as per the reference guide, if we point to mysql in the docker-compose.yml, will that start the MySQL database process along with the other processes (kafka, zookeeper, and dataflowserver), or do we need to manually start the MySQL database process separately before running docker-compose up?
Changing the docker-compose.yml file to point to the mysql configuration does indeed start a springdataflow_mysql_1 container process.
Creating and deploying streams persists those definitions to the STREAM_DEFINITIONS and STREAM_DEPLOYMENTS tables respectively in the DATAFLOW database.
Glad you got it working! You can customize it to swap in the DB or message broker of your choice. The promise of docker-compose is to bring up the described components in order, and there is simple logic (via depends_on) that waits for all the middleware components to start. We describe the customization here.
Otherwise, autoconfiguration will kick in to configure the environment for the desired database, as long as the right driver is on the classpath of SCDF - see the supported databases. And yes, we already ship the open-source MariaDB driver, so it works just fine with MySQL.
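For orientation, a customized compose file might look roughly like the sketch below; the image, service names, credentials, and exact property values are assumptions, so treat the SCDF reference guide as the authoritative source:

version: "3"
services:
  mysql:
    image: mysql:5.7
    environment:
      - MYSQL_ROOT_PASSWORD=rootpw   # hypothetical credentials
      - MYSQL_DATABASE=dataflow
  dataflow-server:
    image: springcloud/spring-cloud-dataflow-server   # check the guide for the exact image and tag
    depends_on:
      - mysql
    environment:
      # Standard Spring Boot datasource properties; the bundled MariaDB driver can talk to MySQL
      - SPRING_DATASOURCE_URL=jdbc:mysql://mysql:3306/dataflow
      - SPRING_DATASOURCE_USERNAME=root
      - SPRING_DATASOURCE_PASSWORD=rootpw
      - SPRING_DATASOURCE_DRIVER_CLASS_NAME=org.mariadb.jdbc.Driver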
I'm new to docker so I have a very simple question: Where do you put your config files?
Say you want to install MongoDB. You install it, but then you need to create/edit a config file. I don't think those files fit on GitHub since they're used for deployment, though it's not a bad place to store them.
I was just wondering if docker had any support for storing such config files so you can add them as part of running an image.
Do you have to use swarms?
Typically you'll store the configuration files on the Docker host and then use volumes to bind mount your configuration files in the container. This allows you to separately manage the configuration file from the running containers. When you make a change to the configuration, you can just restart the container.
You can then use a configuration management tool like Salt, Puppet, or Chef to manage copying/storing the configuration file onto the Docker host. Things like passwords can be managed by the secrets capabilities of the tool. When set up this way, changing a configuration file just means you need to restart your container and not build a new image.
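As a small illustration, a bind-mounted MongoDB config could look like the sketch below; the file names and the --config path are assumptions, so adjust them to your setup:

version: "3"
services:
  mongo:
    image: mongo:6
    volumes:
      # The config file lives on the Docker host, next to this compose file,
      # and is mounted read-only into the container
      - ./mongod.conf:/etc/mongod.conf:ro
      - mongo-data:/data/db
    # Tell mongod to read the mounted file
    command: ["mongod", "--config", "/etc/mongod.conf"]
volumes:
  mongo-data:

After editing mongod.conf on the host, a plain docker-compose restart mongo picks up the change without rebuilding any image.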
Yes, in most cases you definitely want to keep your Dockerfiles in version control. If your org (or you personally) use GitHub for this, that's fine, but stick them wherever your other repos are. One of the main ideas in DevOps is to treat infrastructure as code. In fact, one of the main benefits of something like a Dockerfile (or a chef cookbook, or a puppet file, etc) is that it is "used for deployment" but can also be version-controlled, meaningfully diffed, etc.
Working on a larger-than-usual project of mine, I am building a web application that will talk to several APIs of mine, each written in its own language. I use two databases, one being MariaDB and the second being Dgraph (a graph database).
Here is my local directory structure:
services - all my services
  api - contains all my APIs
    auth - contains my user auth/signup API
      v1 - contains my current (only) API version
    trial - contains an API of mine called trial
    etc...
  application - contains the app users will interact with
  daemon - contains my programs that will run as daemons
  tools - contains tools (import data, scrapers, etc)
databases - to contain my two configs (MariaDB and Dgraph)
Because some components are written in PHP7-NGINX while others are in PYTHON-FLASK-NGINX, how can I do a proper Docker setup with that in mind? Each service, API, daemon and tool is independent, and they all talk through their own REST endpoints.
Each has its own private github repository, and I want to be able to take each one and deploy it to its own server when needed.
I am new to Docker and all the reading I do confuses me: should I create a docker-compose.yml for each service or one for the entire project? But each service is deployed separately so how does docker-compose.yml know that?
Any pointers to a clean solution? Should I create a container for each service and in that container put NGINX, PHP or PYTHON, etc?
The usual approach is to put every independent component into a separate container. The general Docker idea is 1 container = 1 logical task. 1 task is not exactly 1 process; it's just the smallest independent unit.
So you would need to find 4 base images (existing ones from the Docker registry should probably fit):
PHP7-NGINX
PYTHON-FLASK-NGINX
MariaDB
Dgraph
You can use https://hub.docker.com/search/ to search for appropriate images.
Then create a custom Dockerfile for every component (taking either PHP7-NGINX or PYTHON-FLASK-NGINX as the parent image).
You probably won't need a custom Dockerfile for the databases. Typically, database images only require mounting a config file into the container using the --volume option, or passing environment variables (see the description of the base image for details).
After that, you can just write a docker-compose.yml and define there how your images are linked, along with other parameters. That would look like https://github.com/wodby/docker4drupal/blob/master/docker-compose.yml .
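As a rough sketch (the service names, build paths, and images below are assumptions based on your directory layout), such a file could start out like this:

version: "3"
services:
  auth:
    build: ./services/api/auth       # PHP7-NGINX based Dockerfile
  trial:
    build: ./services/api/trial      # PYTHON-FLASK-NGINX based Dockerfile
  application:
    build: ./services/application
    ports:
      - "80:80"
  mariadb:
    image: mariadb:10.5
    environment:
      MYSQL_ROOT_PASSWORD: example   # hypothetical credentials
    volumes:
      - ./databases/mariadb/my.cnf:/etc/mysql/conf.d/my.cnf:ro
  dgraph:
    image: dgraph/dgraph
    # See the Dgraph docs for the required subcommands (zero/alpha) and ports

Each service can keep its own GitHub repository; the compose file only references them (through build contexts or prebuilt images) for local development, while production can still deploy each image to its own server.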
By the way, GitHub is full of good examples of docker-compose.yml files.
If you are going to run services on different servers, then you can create a Swarm cluster, and run your docker-compose.yml against it: https://docs.docker.com/compose/swarm/ . After that, you can scale easily by deploying as many instances of each microservice as you need (that's why it's more useful to have separate images for every microservice).
I posted this question originally on the Docker forums, but didn't get any response there.
I'm wondering what the best way would be to model a set of services let's call them db, web, and batch. db is simply a running database server instance (think MySQL). web is a web application that needs to connect to the database. batch is a batch application that needs to connect to that same database (it can/will run in parallel with web). db needs to be running, for either web or batch to run. But web or batch can be run independently of each other (one or both can be running at the same time). If both are running at once, they need to be talking to the same database instance (so db is actually using volumes_from a separate data volume container). So if the use case was simpler (think just db and web, which always run together), then they would simply both be defined as services in the same compose file, with web having a link to db.
As far as I understand it, these can't all be defined in the same Docker compose configuration. Instead, I would need three different configurations. One for db, which is launched first, one for web (which uses external_links to find db), and a third for batch (which also uses external_links to find db). Is that correct, or is there some mechanism available that I'm not considering? Assuming a multi-configuration setup is needed, is there a way to "lazily" initialize the db composition if it's not running, when either the web or batch compositions are launched?
If web has a link defined to db in a docker-compose file, db will always start first.
As far as I know, Docker will never know when the database is up. It will be your web container's responsibility to start properly and retry until the database is up (with a timeout).
For your batch service, assuming that you don't want to start it every time you start your web and db containers (using a docker-compose up or run), you can try extending your service. See the docs for more information on this.
Either your applications in the web and batch images know how to handle database downtime and are able to wait for the db service to come up and auto-reconnect, or you have to write a shell script that runs when the container starts and waits for the db to be available before starting the app.
Depending on the docker images you are using for the web and batch services, you would have to override CMD, ENTRYPOINT or both.
This question has examples of shell scripts that wait for a MySQL service to be up.
And here are other techniques for testing whether a network port is open.
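One way to express the "wait for the database" part directly in Compose is a healthcheck combined with depends_on. This is a minimal sketch, assuming a MySQL image and a Compose file version that supports condition: service_healthy (2.1+ or the newer Compose Specification; older versions need the shell-script approach above), with hypothetical web and batch images:

version: "2.4"
services:
  db:
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: example   # hypothetical credentials
    volumes:
      - db-data:/var/lib/mysql       # shared, persistent data instead of volumes_from
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost", "-pexample"]
      interval: 5s
      timeout: 3s
      retries: 10
  web:
    image: myorg/web:latest          # hypothetical image
    depends_on:
      db:
        condition: service_healthy   # web only starts once MySQL answers pings
  batch:
    image: myorg/batch:latest        # hypothetical image
    depends_on:
      db:
        condition: service_healthy
volumes:
  db-data:

With this layout, docker-compose up web or docker-compose up batch each bring up db on demand (and share it when both run), which covers the "lazy" initialization case without splitting the project into three separate compose files.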