I have 3 docker MySQL container running and I am trying to replicate the data in one container to another.
if I create a database in one mysql-container the other mysql-container should get updated with the the changes I made.
I tried and I did not find a way to do that.
How to make this happen ?
You need to setup an standard replication between mysql containers.
You cannot use something like volume to share mysql data at file level because the mysql's will mess the things up and corrupt data for sure.
You can make a replication as Master-Slave scheme or Master-Master (according to your needs).
Refer to the docs to have detailed information.
I think #Robert is absolutely right in that you can probably start with a simple master-slave or master-master schema.
Since you mentioned Docker, I think the AutoPilot Pattern is a good repository to start. They have a solution for MySQL (replication, backups, failover, ...). Here is where you can find it
https://github.com/autopilotpattern/mysql
Related
My goal is to use Prometheus to monitor several microservices that can replicate or migrate.
I've seen a video of 2017 that says the only solution was through DNS A records, but it does not solve cross-Docker-network pulling, making the migration between different networks impossible right?
I would like you if you know if there is already a solution for this? (I didn't find). If not, is "file_sd_config" the best way to go? With this I'm able to change the file while having prometheus to read the updated file periodically.
Thanks!
I want to take a holistic approach backing up multiple machines running multiple Docker containers. Some might run, for example, Postgres databases. I want to back up this system, without having to have specific backup commands for different types of volumes.
It is fine to have a custom external script that sends e.g. signals to containers or runs Docker commands, but I strongly want to avoid anything specific to a certain image or type of image. In the example of Postgres, the documentation suggests running postgres-specific commands to backup databases, which goes against the design goals for the backup solution I am trying to create.
It is OK if I have to impose restrictions on the Docker images, as long as it is reasonably easy to implement by starting from existing Docker images and extending.
Any thoughts on how to solve this?
I just want to stress that I am not looking for a solution for how to back up Postgres databases under Docker, there are already many answers explaining how to do so. I am specifically looking for a way to back up any volume, without having to know what it is or having to run specific commands for its data.
(I considered whether this question belonged on SO or Serverfault, but I believe this is a problem to be solved by developers, hence it belongs here. Happy to move it if consensus is otherwise)
EDIT: To clarify, I want do something similar to what is explained in this question
How to deal with persistent storage (e.g. databases) in docker
but using the approach in the accepted answer is not going to work with Postgres (and I am sure other database containers) according to documentation.
I'm skeptical that there is a custom solution, holistic, multi machine, multi container, application/container agnostic approach. From my point of view there is a lot of orchestration activities necessary in the first place. And I wonder if you wouldn't use something like Kubernetes anyways that - supposedly - comes with its own backup solution.
For single machine, multi container setup I suggest to store your container's data, configuration, and eventual build scripts within one directory tree (e.g. /docker/) and use a standard file based backup program to backup the root directory.
Use docker-compose to managed your containers. This lets you store the configuration and even build options in a file(s). I have an individual compose file for each service, but a single one would also work.
Have a subdirectory for each service. Mount bind-mount directories aka volumes of the container there. If you need to adapt the build process more thoroughly you can easily store scripts, sources, Dockerfiles, etc. in there as well.
Since containers are supposed to be ephemeral, all persistent data should be in bind-mount and therefore in the main docker directory.
Can we share a common/single named volume across multiple hosts in docker engine swarm mode, what's the easiest way to do it ?
If you have an NFS server setup you can use use some nfs folder as a volume from docker compose like this:
volumes:
grafana:
driver: local
driver_opts:
type: nfs
o: addr=192.168.xxx.xx,rw
device: ":/PathOnServer"
In the grand scheme of things
The other answers are definitely correct. If you feel like you're still missing something or are coming to the conclusion that things might never really improve in this space, then you might want to reconsider the use of the typical POSIX-like hierarchical filesystem abstraction. Not all applications really need it (I might go as far as to say that few do). Maybe yours doesn't either.
In defense of filesystems
It is still very common in many circles, but usually these people know their remote/distributed filesystems very well and know how to set them up and leverage them properly (and they might be very good systems too, though often not with existing Docker volume drivers). Sometimes it's also in part because they're simply forced to (codebases that can't or shouldn't be rewritten to support other storage backends). Using, configuring or even writing arbitrary Docker volume drivers would be a secondary concern only.
Alternatives
If you have the option however, then evaluate other persistence solutions for your applications. Many implementations won't use POSIX filesystem interfaces but network interfaces instead, which pose no particular infrastructure-level difficulties in clusters such as Docker Swarm.
Solutions managed by third-parties (e.g. cloud providers)
Should you succeed in removing all dependencies to filesystems for persistent and shared data (it's still fine for transient local state), then you might claim to have fully "stateless" applications. Of course there is often always state persisted somewhere still, but the idea is that you don't handle it yourself. Many cloud providers (if that's where you're hosting things) will offer fully managed solutions for handling persistent state such that you don't have to care about it at all. If you're going this route, do consider managed services that use APIs compatible with implementations that you can use locally for testing (for example by running a Docker container based on an image for that implementation that is provided by a third-party or that you can maintain yourself).
DIY solutions
If you do want to manage persistent state yourself within a Docker Swarm cluster, then the filesystem abstraction is often inevitable (and you'd probably have more difficulties targeting block devices directly anyway). You'll want to play with node and service constraints to ensure the requirements of whatever you use to persist data are fulfilled. For certain things like a central DBMS server it could be easy ("always run the task on that specific node only"), for others it could be way more involved.
The task of setting up, scaling and monitoring such a setup is definitely not trivial, which is why many application developers are happy to let somebody else (e.g. cloud providers) do it. It's still a very cool space to explore however, though given you had to ask that question it's likely not something you should focus on if you're on a deadline.
Conclusion
As always, use the right abstraction for the job, and pause to think about what your strengths are and where to spend your resources.
From scratch, Docker does not support this by itself. You must use additional components either a docker plugin which would provide you with a new layer type for your volumes, or a sync tool directly on your FS which will sync the data for you.
From my point of view, the easiest solution is rsync or more accurately lsyncdn the daemon version of rsync. But I never tried it for docker volumes, so I can't tell if it handle it fine.
Other solutions are offered using Infinit.sh. It basically does the same thing as lsyncd does. It's a one way sync. So if your docker container are RW in their volumes it won't match your expectations. I tried this solution, and it works pretty well for RO operations. And not in production. It's still an alpha version. Infinit is also on the way to provide a docker driver. Not released yet. So I didn't even tried it. Too risky.
Other solutions I found but was unable to install (and so to try) are flocker and glusterFS. Both are designed to create FS Volume based on several HDD from several machines. But none of their repositories were working these past weeks.
Sorry for giving you only weak solutions, but I'm facing the same problem and haven't find yet a perfect solution.
Cheers,
Olivier
I'm currently using RethinkDB across cloud servers by manually joining each server at setup. I'm interested in moving over to a Swarm approach to make scaling and failover easier. The current approach is cumbersome to scale.
In the current manual approach, I simply create a local folder on each server for RDB and mount as a volume to store its data. However, using a Swarm means that I'd need to handle volumes more dynamically. Each container will need a distinct volume to keep data separate in case of errors.
Any recommendations on how to handle this scenario? A lot of the tutorials I've seen so far mention Flocker to manage persistent storage, but I can't see that being handled dynamically.
Currently I am struggling with a situation like this. I've created a temporary fix with GlusterFS.
What you do is install GlusterFS on all the Docker nodes and mount the folders. This way the data exists on all the nodes. But this is less than ideal if you have a lot of writes. This could be slow because of the way Gluster treats your data replication to prevent data loss. It is solid, but I have some issues with the speed.
In your case I would suggest looking into Flocker. Flocker is a volume plugin that migrates your data when a container moves to another host. I haven't had any experience with it, but in my case the concept of Flocker renders useless, I need my data in multiple containers on multiple hosts (Read only) This is where Gluster came into play
We are looking into using Docker plus either Mesos/Marathon or Kubernetes for hosting a cluster. However, the one issue that we haven't really seen any answers for is how to allow clustered services to connect to each other correctly. All of the ones that I have seen need to know about at least one other node before they can join the cluster. Some need to know about every node. However, in Kubernetes and Mesos, there's no way to know what those IP addresses are ahead of time.
So, are there any best practices for this? If it helps, some technologies we're looking into deploying as containers are ElasticSearch, ActiveMQ, and MongoDB. There may be others.
However, the one issue that we haven't really seen any answers for is how to allow clustered services to connect to each other correctly.
I think you're talking about HA/replicated/sharded apps here.
At the moment, in kubernetes, you can accomplish this by making an api call listing all the "endpoints" of the service; that will tell you where your peers are running.
We'd eventually like to support the use case you describe in a more first-class manner.
I filed https://github.com/GoogleCloudPlatform/kubernetes/issues/3419 to maybe get something more standardized started here.
I also wanted to setup an ElasticSearch cluster using Mesos/Marathon. As the existing "solutions" either were merely undocumented, or not working/outdated, I set up my own container.
If you like, have a look at https://github.com/tobilg/docker-elasticsearch-marathon
If you have a running Marathon installation (I use v0.8.1), then setting up an ElasticSearch cluster should be a matter of a few minutes.
UPDATE:
The container now uses Elasticsearch v1.5.2 and is able to run on the latest Marathon v0.8.2.
As for Kubernetes, it currently does require kube-controllers-manager to start with --machines argument given a list of minion IPs or hostnames.
I don't see any easy way how to handle this correctly in Kubernetes now. Yes, you could make a call to the API that returns list of endpoints but you must watch for changes and take an action when endpoints change...
I would prefer to use Mesos/Marathon that is well prepared for this scenario. You should implement custom Framework for Mesos. There is already Framework for ElasticSearch prepared: http://mesos.apache.org/documentation/latest/mesos-frameworks/