How can I configure RabbitMQ to retain messages on node restart in docker swarm?
I've marked the queues as durable and I'm setting the messages' delivery mode to 2. I'm mounting /var/lib/rabbitmq/mnesia to a persistent volume. I've docker exec'd into the container to verify that RabbitMQ is indeed creating files in said folder, and all seems well. Everything works on my local machine using docker-compose.
However, when the container crashes, docker swarm creates a new one, and this one seems to initialize a new Mnesia database instead of using the old one. The database's name seems to be related to the container's id. It's just a single node; I'm not configuring any clustering.
I haven't changed anything in rabbitmq.conf except for the cluster_name, since it seemed to be related to the folder being created, but that didn't solve it.
Relevant section of the docker swarm configuration:
rabbitmq:
  image: rabbitmq:3.9.11-management-alpine
  networks:
    - default
  environment:
    - RABBITMQ_DEFAULT_PASS=password
    - RABBITMQ_ERLANG_COOKIE=cookie
    - RABBITMQ_NODENAME=rabbit
  volumes:
    - rabbitmq:/var/lib/rabbitmq/mnesia
    - rabbitmq-conf:/etc/rabbitmq
  deploy:
    placement:
      constraints:
        - node.hostname==foomachine
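A commonly suggested workaround for this class of problem is to pin the service's hostname, so that the node name (rabbit@<hostname>) and therefore the Mnesia directory name stay stable when swarm replaces the container. A minimal sketch, with an illustrative hostname value that has not been verified against this exact swarm setup:

rabbitmq:
  image: rabbitmq:3.9.11-management-alpine
  # A fixed hostname keeps the node name, and the Mnesia directory named after
  # it, identical when swarm recreates the container.
  hostname: rabbitmq-node
  environment:
    - RABBITMQ_DEFAULT_PASS=password
  volumes:
    - rabbitmq:/var/lib/rabbitmq/mnesia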
Related
First, I do not know whether this issue is with Kafka or with Docker … I am a rookie regarding both topics. But I assume that it is more a Docker than a Kafka problem (in fact, it is probably my problem of not really understanding one or the other …).
I installed Docker on a Raspberry Pi 4 and created Docker images for Kafka and for Zookeeper; I had to create them myself because 64-bit Raspberry Pi was not supported by any of the existing images (at least I could not find one). But I got them working.
Next I implemented the Kafka Streams example (Wordcount) from the Kafka documentation; it runs fine, counting the words in all the texts you push into it and keeping the numbers from all previous runs. That is somewhat expected; at least it is described that way in the documentation.
So after some test runs I wanted to reset the whole thing.
I thought the easiest way to get there was to shut down the Docker containers, delete the mounted folders on the host and start over.
But that does not work: the word counters are still there! Meaning the word count did not start from 0 …
OK, next attempt: not only removing the containers, but also rebuilding the images, for both Zookeeper and Kafka, of course!
No difference! The word counts from all the previous runs were retained.
Using docker system prune --volumes made no difference either …
From my limited understanding of Docker, I assumed that any runtime data is stored in the container or in the mounted folders (volumes). So when I delete the containers and the folders on the Docker host that were mounted by the containers, I expect that any state would be gone.
Obviously not … so I missed something important here, most probably with Docker.
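As a side note, one way to double-check what a container actually persists is to inspect its mounts; the container name below is just an example:

# List the named volumes known to the Docker daemon.
docker volume ls

# Show exactly which host paths and volumes a given container has mounted.
docker inspect -f '{{ json .Mounts }}' kafka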
The docker-compose file I used:
version: '3'
services:
  zookeeper:
    image: tquadrat/zookeeper:latest
    ports:
      - "2181:2181"
      - "2888:2888"
      - "3888:3888"
      - "8080:8080"
    volumes:
      - /data/zookeeper/config:/config
      - /data/zookeeper/data:/data
      - /data/zookeeper/datalog:/datalog
      - /data/zookeeper/logs:/logs
    environment:
      ZOO_SERVERS: "server.1=zookeeper:2888:3888;2181"
    restart: always
  kafka:
    image: tquadrat/kafka:latest
    depends_on:
      - zookeeper
    ports:
      - "9091:9091"
    volumes:
      - /data/kafka/config:/config
      - /data/kafka/logs:/logs
    environment:
      KAFKA_LISTENERS: "INTERNAL://kafka:29091,EXTERNAL://:9091"
      KAFKA_ADVERTISED_LISTENERS: "INTERNAL://kafka:29091,EXTERNAL://TCON-PI4003:9091"
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: "INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT"
      KAFKA_INTER_BROKER_LISTENER_NAME: "INTERNAL"
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_DELETE_TOPIC_ENABLE: "true"
    restart: always
The script file I used to clear out the mounted folders:
#!/bin/sh
set -eux
DATA="/data"
KAFKA_DATA="$DATA/kafka"
ZOOKEEPER_DATA="$DATA/zookeeper"
sudo rm -R "$KAFKA_DATA"
sudo rm -R "$ZOOKEEPER_DATA"
mkdir -p "$KAFKA_DATA/config" "$KAFKA_DATA/logs"
mkdir -p "$ZOOKEEPER_DATA/config" "$ZOOKEEPER_DATA/data" "$ZOOKEEPER_DATA/datalog" "$ZOOKEEPER_DATA/logs"
Any ideas?
Kafka Streams stores its own state under the "state.dir" config on the host machine it's running on. In the Apache Kafka libraries, this is under /tmp by default. First check whether you have overridden that property in your code.
As far as Docker goes, try without volumes first.
Using docker system prune --volumes made no difference either …
That would clean unattached volumes made with docker volume create or volumes: in Compose, not host-mounted directories.
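For illustration, the difference looks roughly like this (the host paths are the ones from the compose file above):

# Named volumes (docker volume create, or a top-level "volumes:" key in
# Compose) are managed by Docker; these are what docker system prune --volumes
# removes when they are unattached.
docker volume ls

# Host bind-mounts such as /data/kafka/... are plain directories on the host;
# pruning does not touch them, so they have to be deleted there:
sudo rm -r /data/kafka /data/zookeeper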
As I assumed right from the beginning, the problem was mainly my lack of knowledge.
The behaviour I observed is not related to a magical data store for Docker that survives all attempts to kill it; it is not related to Docker at all.
I use those Docker images to run Zookeeper and the Kafka server on the Raspberry Pi. Then I switched back to my workstation, where I wrote the code (the "Wordcount" sample) that implements a Kafka Streams processor. When I started that from my IDE, it was executed on my local machine, accessing Kafka over the network.
My assumption was that any state is stored on the Kafka server, so that dumping that should reset the whole thing; as that did not work, I also dumped Zookeeper, and as this was to no avail as well, I removed nearly everything …
After some hints here I found that Kafka Streams processors maintain their own local state in a filesystem folder that is configured through state.dir (StreamsConfig.STATE_DIR_CONFIG) – see Configuring a Streams Application. This means that a Kafka Streams processor maintains its own local state independent from any Kafka server, and – as in my case when it runs on my local machine – also outside/unrelated to any Docker container …
According to the documentation, the default location should be /var/lib/kafka-streams, but this is not writeable in my environment – no idea where the Stream processor put its state instead.
After setting the configuration value state.dir for my Streams processor explicitly to a folder in my home directory, I could see that state on my disk, and after removing that, the word count started over with one.
A deeper look into the documentation for Kafka Streams revealed that I could have achieved the same with a call to KafkaStreams.cleanUp() before starting or after closing the stream processor (no removing of files on the filesystem required).
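For reference, a minimal sketch of both fixes, setting state.dir explicitly and calling cleanUp() before starting. The application id and state directory are illustrative values, and the trivial topology stands in for the Wordcount topology from the Kafka docs:

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class WordCountReset {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-demo");           // illustrative id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "TCON-PI4003:9091");      // external listener from the compose file
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Pin the local state directory explicitly so it is easy to find (and delete).
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/home/me/kafka-streams-state");  // illustrative path

        // Placeholder topology; the real Wordcount topology goes here
        // (topic names as in the Kafka Streams walkthrough).
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("streams-plaintext-input").to("streams-wordcount-output");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.cleanUp();   // wipes this application's local state, same effect as deleting state.dir
        streams.start();

        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}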
(The actual question is in the Query section below, towards the middle.)
Cross posted at https://developer.jboss.org/message/982355
Environment:
Infinispan 9.13,
Embedded cache in a cluster with jgroups,
Single file store,
Using JGroups inside Docker services in a single docker host/daemon (Not in AWS yet).
Infinispan.xml below:
<jgroups><stack-file name="external-file" path="${path.to.jgroups.xml}"/></jgroups>
Application = 2 webapps + database
Issue:
When I deploy the 2 webapps in separate Tomcats directly on a machine (not Docker yet), the Infinispan manager initializing the cache (in each webapp) forms a cluster using JGroups (i.e., it works). But with the exact same configuration (and the same channel name in jgroups), when the webapps are deployed as services in Docker, they don't join the same cluster; rather, they are separate and each has just one member in its view (logs below).
The services are Docker containers built from images (Linux + Tomcat + webapp) and are launched using docker compose v3.
I have tried the instructions at https://github.com/belaban/jgroups-docker for a container containing JGroups and a couple of demos. It suggests either using --network=host mode for the Docker services (this does work, but we cannot use it because the config files would need separate ports if we scale), or passing the external_addr=docker_host_IP_address field in jgroups.xml (this is NOT working, and the query is how to make it work).
It's not a timing issue: I also tried putting a significant delay in starting the second service deployed in the stack, but the two apps' Infinispan clusters still have just one member in their view (that container itself). Calling cacheManager.getMembers() also shows just one entry inside each app (it should show 2).
Log showing just one member in first app:
org.infinispan.remoting.transport.jgroups.JGroupsTransport.receiveClusterView ISPN000094: Received new cluster view for channel CHANNEL_NAME: [FirstContainerId-6292|0] (1) [FirstContainerId-6292].
org.infinispan.remoting.transport.jgroups.JGroupsTransport.startJGroupsChannelIfNeeded ISPN000079: Channel CHANNEL_NAME local address is FirstContainerId-6292, physical addresses are [10.xx.yy.zz:7800]
Log showing just one member in second app:
org.infinispan.remoting.transport.jgroups.JGroupsTransport.receiveClusterView ISPN000094: Received new cluster view for channel CHANNEL_NAME: [SecondContainerId-3502|0] (1) [SecondContainerId-3502]
29-Apr-2018 11:47:42.357 INFO [localhost-startStop-1] org.infinispan.remoting.transport.jgroups.JGroupsTransport.startJGroupsChannelIfNeeded ISPN000079: Channel CHANNEL_NAME local address is 58cfa4b95c16-3502, physical addresses are [10.xx.yy.zz:7800]
The docker compose V3 is below and shows the overlay network:
version: "3"
services:
app1:
image: app1:version
ports:
- "fooPort1:barPort"
volumes:
- "foo:bar"
networks:
- webnet
app2:
image: app2:version
ports:
- "fooPort2:barPort"
volumes:
- "foo:bar"
networks:
- webnet
volumes:
dbdata:
networks:
webnet:
Deployed using: docker stack deploy --compose-file docker-compose.yml OurStack
The JGroups.xml has the relevant config part below:
<TCP
external_addr="${ext-addr:docker.host.ip.address}"
bind_addr="${jgroups.tcp.address:127.0.0.1}"
bind_port="${jgroups.tcp.port:7800}"
enable_diagnostics="false"
thread_naming_pattern="pl"
send_buf_size="640k"
sock_conn_timeout="300"
bundler_type="sender-sends-with-timer"
thread_pool.min_threads="${jgroups.thread_pool.min_threads:1}"
thread_pool.max_threads="${jgroups.thread_pool.max_threads:10}"
thread_pool.keep_alive_time="60000"/>
<MPING bind_addr="${jgroups.tcp.address:127.0.0.1}"
mcast_addr="${jgroups.mping.mcast_addr:228.2.4.6}"
mcast_port="${jgroups.mping.mcast_port:43366}"
ip_ttl="${jgroups.udp.ip_ttl:2}"/>
The code is similar to:
DefaultCacheManager manager = new DefaultCacheManager(jgroupsConfigFile.getAbsolutePath());
Cache someCache = manager.getCache("SOME_CACHE").getAdvancedCache().withFlags(Flag.IGNORE_RETURN_VALUES);
Query:
How do we deploy with docker-compose (as two services in Docker containers) and the jgroups.xml above so that the Infinispan caches in the two webapps join and form a cluster, so both apps can access the same data that either one reads/writes in the cache? Right now they connect to the same channel name and each becomes a cluster with one member, even when we point jgroups to external_addr.
Tried so far:
Putting a delay in the second service's startup so the first has enough time to advertise itself.
JGroups: following "Running JGroups in Docker", I can deploy the belaban/jgroups containers as two services in a stack using docker compose and they are able to form a cluster (chat.sh inside the container shows a 2-member view).
Tried --net=host, which works but is infeasible. Tried external_addr=docker.host.ip in jgroups.xml, which would be the ideal solution, but it's not working (the log above is from that).
Thanks! Will try to provide any specific info if required.
Apparently external_addr="${ext-addr:docker.host.ip.address}" does not resolve (or resolves to null), so the bind_addr of 127.0.0.1 is used. Is docker.host.ip.address set by you (e.g. as an env variable)?
The external_addr attribute should point to a valid IP address.
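To illustrate the answer above: in JGroups the ${ext-addr:docker.host.ip.address} placeholder resolves the system property ext-addr and falls back to the literal default otherwise, so one option is to pass that property to the JVM from the compose file. A sketch, assuming the Tomcat-based image honours CATALINA_OPTS and with 192.168.1.10 standing in for the Docker host's routable address:

app1:
  image: app1:version
  environment:
    # Supplies the "ext-addr" system property referenced by external_addr in
    # jgroups.xml; replace the IP with the actual Docker host address.
    CATALINA_OPTS: "-Dext-addr=192.168.1.10"
  networks:
    - webnet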
I've created a docker-compose.yml file, and when trying to "up" it, I'm failing to have my RabbitMQ docker container persist to my host volume. It's complaining about the Erlang cookie file not being accessible by the owner only.
Any help with this would be greatly appreciated.
EDIT
So I added the above volume binding, and RabbitMQ seems to place files into that directory when I do a docker-compose up. I then add 2 messages and can see via the RabbitMQ console that the 2 messages are sitting in the queue... but when I perform a docker-compose down followed by a docker-compose up, expecting the 2 messages to still be there since the directory and files were created, they aren't, and the message count is 0 :(.
Maybe it's trying to use some privileged operations.
Try adding a privileged: true section to your service in the docker-compose.yml and do docker-compose up again.
If it works and you prefer to grant only the privileges RabbitMQ actually needs, replace privileged: true with a capabilities section for adding or dropping privileges:
cap_add:
  - ALL
  - <WHAT_YOU_PREFER>
cap_drop:
  - NET_ADMIN
  - SYS_ADMIN
  - <WHAT_YOU_PREFER>
For further information, please check Compose file documentation
EDIT:
In order to provide data persistence when a container fails, add a volumes section to the docker-compose.yml file:
volumes:
  - /your_host_dir_with_data:/destination_in_docker
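For RabbitMQ specifically, a minimal sketch of such a binding (paths and names are illustrative): mounting all of /var/lib/rabbitmq and pinning the hostname keeps the node's data directory stable across docker-compose down/up cycles, so durable queues and persistent messages survive.

rabbitmq:
  image: rabbitmq:3-management
  # A fixed hostname keeps the node name (rabbit@<hostname>) and therefore the
  # on-disk data directory constant between container recreations.
  hostname: my-rabbit
  ports:
    - "5672:5672"
    - "15672:15672"
  volumes:
    - /your_host_dir_with_data:/var/lib/rabbitmq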
I am setting up a Spring application to run using compose. The application needs to establish a connection to ActiveMQ either running locally for developers or to existing instances for staging/production.
I setup the following which is working great for local dev:
amq:
  image: rmohr/activemq:latest
  ports:
    - "61616:61616"
    - "8161:8161"
legacy-bridge:
  image: myco/myservice
  links:
    - amq
and in the application configuration I am declaring the AMQ connection as
broker-url=tcp://amq:61616
Running docker-compose up works great: ActiveMQ fires up locally and my application container starts and connects to it.
Now I need to set this up for staging/production, where the ActiveMQ instances run on existing hardware within the infrastructure. My thought is to either use Spring profiles to handle the different configurations, in which case the application configuration entry broker-url=tcp://amq:61616 would become something like broker-url=tcp://some.host.here:61616, or to find some way to create a DNS entry within my production docker-compose.yml that points an amq entry to the associated staging or production queues.
What is the best approach here, and if it is DNS, how do I set that up in Compose?
Thanks!
Using the extra_hosts flag
First thing that comes to mind is using Compose's extra_hosts flag:
legacy-bridge:
  image: myco/myservice
  extra_hosts:
    - "amq:1.2.3.4"
This will not create a DNS record, but an entry in the container's /etc/hosts file, effectively allowing you to continue using tcp://amq:61616 as your broker URL in your application.
Using an ambassador container
If you're not content with directly specifying the production broker's IP address and would like to leverage existing DNS records, you can use the ambassador pattern:
amq-ambassador:
  image: svendowideit/ambassador
  command: ["your-amq-dns-name", "61616"]
  ports:
    - 61616
legacy-bridge:
  image: myco/myservice
  links:
    - "amq-ambassador:amq"
I have a rabbitMQ docker container that I started using the following command:
docker run -d --name myrabbit1 -p 15672:15672 rabbitmq:3-management
I then log in to the management plugin and create users, vhosts, queues, etc.
I want to save all those settings so they can be loaded up again. To do that I tried committing to a new image:
docker commit myrabbit1 vbrabbit:withVhostAndQueues
I then start up my new container (after stopping the old one):
docker run -d --name vbrabbit2 -p 15672:15672 -p 5672:5672 vbrabbit:withVhostAndQueues
I expect that all the queues, vhosts, etc would be saved, but they are not.
What am I missing?
Result from docker ps -a:
I want to save all those settings so they can be loaded up again
Are you needing to create a copy of the container, with the same settings?
Or are you just looking to docker stop myrabbit1 and then later docker start myrabbit1 to run the same container again?
TL;DR
The RabbitMQ instance within the container is looking for its data in a different place. With the default configuration, the data storage/load location changes with every container creation. Thus the OP's data existed in the committed "final" image, but RabbitMQ wasn't loading it.
To fix it, statically set RABBITMQ_NODENAME, which in turn may require adding another line to /etc/hosts so that RabbitMQ can confirm the node is active.
Details
This happened to me with the rabbitmq:3.8.12-management Docker image.
It is caused by RabbitMQ's default configuration impacting how it does data storage. By default, RabbitMQ starts a node on UNIX systems with a name of rabbit@$HOSTNAME (see RABBITMQ_NODENAME in the configuration docs). In Docker, $HOSTNAME changes per container run; it defaults to the container id (e.g. something like dd84759287560).
In #jhilden's case, when the vbrabbit:withVhostAndQueues image is booted as a new container, RABBITMQ_NODENAME takes a different value than the one used to create and store the original vhosts, users, queues, etc. And as RabbitMQ stores data inside a directory named after RABBITMQ_NODENAME, the existing data isn't loaded on boot of vbrabbit:withVhostAndQueues: when $HOSTNAME changes, RABBITMQ_NODENAME changes, so the booting RabbitMQ instance cannot find any existing data (i.e. the data is there in the image, but under a different RABBITMQ_NODENAME, and isn't loaded).
Note: I've only looked into solving this for a single-instance local development cluster. If you're using RabbitMQ in Docker for a production deployment, you'd probably need to look into customized hostnames.
To fix this issue we set a static RABBITMQ_NODENAME for the container.
In the docker-compose v3 file we updated from:
# Before fix
rabbitmq:
  image: "rabbitmq:$RABBITMQ_VERSION"
  container_name: rabbitmq
  ports:
    - "5672:5672"
    - "15672:15672"
    - "61613:61613"
  volumes:
    - "./etc/rabbit-plugins:/etc/rabbitmq/enabled_plugins"
    - type: volume
      source: rabbitmq-data
      target: /var/lib/rabbitmq
into, after the fix:
rabbitmq:
  image: "rabbitmq:$RABBITMQ_VERSION"
  container_name: rabbitmq
  # Why do we set NODENAME and extra_hosts?
  #
  # It codifies that we're using the same RabbitMQ instance between container rebuilds.
  # If NODENAME is not set it defaults to "rabbit@$HOSTNAME", and because $HOSTNAME is
  # dynamically created in Docker it changes per container deployment. Why is a changing
  # host an issue? Because under the hood Rabbit stores data on a per-node basis. Thus,
  # without a static RABBITMQ_NODENAME, the directory the data is stored in changes per
  # restart, going from "rabbit@7745942c559e" on one deployment to "rabbit@036834720485"
  # on the next. Okay, but why do we need extra_hosts? Rabbit wants to resolve its own
  # host to confirm the management UI is functioning after deployment, and it does that
  # with an HTTP call. To resolve the static host from RABBITMQ_NODENAME we therefore
  # add it to the container's /etc/hosts file.
  environment:
    RABBITMQ_NODENAME: "rabbit@staticrabbit"
  extra_hosts:
    - "staticrabbit:127.0.0.1"
  ports:
    - "5672:5672"
    - "15672:15672"
    - "61613:61613"
  volumes:
    - "./etc/rabbit-plugins:/etc/rabbitmq/enabled_plugins"
    - type: volume
      source: rabbitmq-data
      target: /var/lib/rabbitmq