How to back up a Flume master node config? - flume

I am using Flume on a project and am looking for a way to back up the config for the master node. How would I go about doing this? I can't seem to find any info about backups in the user guide.
Also, I would like to run multiple copies of the master node so that if one fails, a copy can take over. I don't know how to do this either - does anyone have any suggestions?

You need to change the ZooKeeper log dir in flume-conf.xml; the default is a tmp directory, which doesn't preserve the configuration between restarts.
See this thread:
http://www.mail-archive.com/flume-user@incubator.apache.org/msg00173.html
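If it helps, the change is a property like this inside <configuration> in flume-conf.xml (a sketch, assuming the 0.9.x master with its embedded ZooKeeper; the property name is from the 0.9.x defaults, so verify it against your copy, and the path is illustrative):

<property>
  <name>flume.master.zk.logdir</name>
  <value>/var/lib/flume/zk-logs</value>
</property>

After changing it, restart the master so ZooKeeper writes its state to the persistent directory.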

Related

Rolling update with shared folders/files in Docker

I'm looking for a way to share files/folders between Docker containers; sharing a single file in particular gives me issues. I want to use Docker in production with docker-compose and use a deployment technique that gives me zero downtime (like blue/green or something else).
What I have so far is to deploy the new source code by checking out the git source first. I keep the old container running until the new one is up; then I stop the old one and remove it.
The problem I'm running into with shared files is that Docker doesn't lock files. So when two containers with the same application are up and both write to the same file, shared_database.db, this causes data corruption.
Folder structure from root looks like this:
/packages (git source)
/www (git source)
/shared_database.db (file I want to share across different deployments)
/www/public/files (folder I want to share across different deployments)
I've tried:
symlinks; unfortunately Docker doesn't support symlinks
mounting the shared files/folders in the docker-compose file under the volumes section, but since Docker doesn't lock files, this causes data corruption
If I need to make myself clearer or provide more info, I'd be happy to. Let me know.
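For reference, the swap sequence I describe above is roughly this (a sketch; the service names web_old/web_new are hypothetical):

# bring the new container up alongside the old one
docker-compose up -d --no-deps web_new
# ... wait until web_new is up and healthy ...
# then stop and remove the old one
docker-compose stop web_old
docker-compose rm -f web_old

During the overlap both containers have shared_database.db mounted, and that is exactly when the corruption happens.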

Using pimcore in Docker environment

We are currently rolling out Pimcore 5 in our Docker/Kubernetes environment, but we haven't found a good answer to the following question yet:
Which folders need to be persistent?
The documentation points out that the folders /var and /web/var are used to save logs and assets (from the admin interface). Are there any other folders that need to be persistent to keep the environment stable even after a container restart/rebuild?
Are there any problems with updates or downsides if we run a setup like this:
Git Repository for our Code Base
PHP-fpm Docker image that holds the code base (plus nginx and redis container)
Consistent Database
We would also like to share our results when we managed to come up with a good solution.
Thank you very much!
I know this question is kind of specific :)
Yes, /var and /web/var need to be on a persistent and shared filesystem.
Further hints regarding this setup are in the documentation:
https://pimcore.com/docs/master/Development_Documentation/Installation_and_Upgrade/System_Setup_and_Hosting/Cluster_Setup.html
https://pimcore.com/docs/4.6.x/Development_Documentation/Installation_and_Upgrade/System_Setup_and_Hosting/Amazon_AWS_Setup/index.html
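As a minimal sketch of what that means for a single container (assuming the Pimcore project root is /var/www/html - adjust to your image; the volume and image names are hypothetical):

# keep /var and /web/var on named volumes so they survive rebuilds
docker run -d \
  -v pimcore-var:/var/www/html/var \
  -v pimcore-web-var:/var/www/html/web/var \
  my-pimcore-image:latest

In Kubernetes the equivalent is mounting both paths from a shared PersistentVolumeClaim (e.g. NFS-backed), so every pod sees the same assets and logs.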

What's the best practice for Docker logging?

I'm using Docker with my web service.
When I deploy using Docker, I lose some log files (nginx access log, service log, system log, etc.), because the Docker deployment system takes the old container down and brings a new one up.
So I thought about this problem: the logging server and the service server (for the API) must be separated.
I'm considering these methods:
First, using Logstash (from ELK) and attaching all my log files.
Second, using a batch system that moves the log files to another server every midnight.
Is that okay? I'm hoping for a better answer.
Thanks.
There are many ways of logging that admins commonly use for containers:
1) Mount the log directory to the host, so even if the container goes down and up, the logs are persisted on the host (sketched below).
2) An ELK server, using Logstash/Filebeat to push logs to an Elasticsearch server by tailing the files, so new log contents are pushed as they appear.
3) For application logs, e.g. in Maven-based projects, there are many plugins that push logs to a server.
4) A batch system, which is not recommended, because if a container dies before midnight, its logs are lost.
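Option 1 could look like this (a sketch; the host path and image name are illustrative):

# bind-mount nginx's log directory to the host so the files survive
# container restarts and redeployments
docker run -d \
  -v /var/log/my-service/nginx:/var/log/nginx \
  my-web-service:latest

The same bind mount can also go under the volumes: section of a docker-compose service.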

Postgres stream replication master large diff between pg_current_xlog_location and sent_location

I have a Postgres master/slave streaming replication setup. After performing heavy write operations into the master using COPY, the pg_xlog folder starts piling up WAL segment files. After checking pg_current_xlog_location and sent_location on the master, and pg_last_xlog_receive_location on the slave, I found that there's a huge difference between pg_current_xlog_location and sent_location, while pg_last_xlog_receive_location on the slave indicates it's keeping up with sent_location.
According to the Postgres documentation (https://www.postgresql.org/docs/9.5/static/warm-standby.html#STREAMING-REPLICATION-MONITORING), a situation like this indicates the master is under heavy load, while in my case nothing else is running after the COPY statement is done. How should I debug this?
Another thing worth mentioning is that I'm running Postgres 9.5 inside Docker. The network between the two host machines is 2 Gbit/s.
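For reference, these are the values I'm comparing (a sketch using the 9.5 names; the last query runs on the slave):

psql -c "SELECT pg_current_xlog_location();"
psql -c "SELECT sent_location, write_location, replay_location FROM pg_stat_replication;"
psql -c "SELECT pg_last_xlog_receive_location();"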
I am thinking that archive_timeout is not set optimally. You can adjust it based on how often you need to push WAL to the slave.
WAL settings documentation:
https://www.postgresql.org/docs/current/static/continuous-archiving.html
Documentation home.
https://www.postgresql.org/
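For example, to force a WAL segment switch at least once a minute (a sketch; the value is illustrative, and archive_timeout only needs a config reload, not a restart):

psql -c "ALTER SYSTEM SET archive_timeout = '60s';"
psql -c "SELECT pg_reload_conf();"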

Neo4j server failed to start in openshift

I want to create a social network with the Django framework on OpenShift, so I need at least a graph DB (like Neo4j) and a relational DB (like MySQL). I had trouble adding Neo4j to my project because OpenShift has no cartridge for it, so I decided to install it with DIY, but I don't understand the functionality of the start and stop files in .openshift/action_hooks. I took the following steps to install Neo4j on the server:
1. SSH to my account:
ssh 1238716...@something-prolife.rhcloud.com
2. Go to a folder I have permission to write to (I went to app-root/repo/ and ran mkdir test in it), download the Neo4j package from here, and extract it to the test folder I created before:
tar -xvzf neo4j-community-1.9.4-unix.tar.gz
3. Finally, run the neo4j script to start it:
neo4j-community-1.9.4/bin/neo4j start
But I see these logs and can't run Neo4j:
process [3898]... waiting for server to be ready............ Failed to start within 120 seconds.
Neo4j Server may have failed to start, please check the logs.
How can I run this database in OpenShift? Where am I wrong? And where are the logs mentioned in "please check the logs"?
I've developed an OpenShift cartridge that fixes the permission issue on OpenShift. I had to change the classes HostBoundSocketFactory and SimpleAppServer in Neo4j to bind not to port 0 but to a port available on OpenShift.
You can check at: https://github.com/danielnatali/openshift-neo4j-cartridge
It's working for me.
I would also not place it in app-root/repo; instead I would put it in app-root/data.
You also need to use the IP of the gear - I think the env variable is something like OPENSHIFT_INTERNAL_IP. 127.0.0.1 is not available for binding, but I think the ports should be open.
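Put together, that could look like this (a sketch; the config key is the Neo4j 1.9 server property, and OPENSHIFT_INTERNAL_IP is my best guess at the variable name - check env on your gear):

# install under app-root/data instead of app-root/repo
cd ~/app-root/data
tar -xvzf neo4j-community-1.9.4-unix.tar.gz

# bind the web server to the gear's IP instead of 127.0.0.1
echo "org.neo4j.server.webserver.address=$OPENSHIFT_INTERNAL_IP" \
  >> neo4j-community-1.9.4/conf/neo4j-server.properties

neo4j-community-1.9.4/bin/neo4j start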
There are two ways Neo4j can run: embedded or standalone (exposed via a REST service).
Standalone is what you are trying to do. I think the right way to set up Neo4j would be to write a cartridge for OpenShift and then add the cartridge to your gear. There has been some discussion about this, but it seems that nobody has taken the time to do it. Check https://www.openshift.com/forums/openshift/neo4j-cartridge. If you decide to write your own cartridge, I might assist. Here are the docs: https://www.openshift.com/developers/download-cartridges.
The other option is running in embedded mode, which I have used. You need to set up a Java EE application (because the Neo4j embedded-mode libraries are only available for Java) and put the Neo4j libraries in your project. Then you expose some routes, check for parameters, and run your Neo4j queries inside the servlets.
