Unable to pull logs from Airflow Worker - docker

I've got a simple docker development setup for Airflow that includes separate containers for the Airflow UI and Worker. I'm encountering a 403 Forbidden error whenever I attempt to view the log for a task in the Airflow UI.
So far I've ensured they all have the same secret key (in fact, thanks to Docker volumes they're all reading the exact same configuration file), but this doesn't seem to help. I haven't done anything about time sync, but I'd expect Docker containers to effectively share the host's system clock anyway, so I don't see how they'd get out of sync in the first place.
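For reference, the setting all of the containers share via the mounted configuration file is the webserver secret key, i.e. something along these lines in airflow.cfg (the value here is just a placeholder):
[webserver]
# must be identical in the UI/webserver and worker containers
secret_key = some-long-random-string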
I can find the log file on the Airflow worker, and the task has run successfully, but something is obviously missing that should allow the Airflow UI to display it (and it would be much more convenient for my workflow to see the logs in the UI rather than having to rummage around on the worker).

Related

Rsyslog can't start inside of a docker container

I've got a Docker container running a service, and I need that service to send logs to rsyslog. It's an Ubuntu image running a set of services in the container. However, the rsyslog service cannot start inside this container, and I cannot determine why.
Running service rsyslog start (this image uses upstart, not systemd) returns only the output start: Job failed to start. There is no further information provided, even when I use --verbose.
Furthermore, there are no error logs from this failed startup process. Because rsyslog is the service that can't start, it's obviously not running, so nothing is getting logged. I'm not finding anything relevant in Upstart's logs either: /var/log/upstart/ only contains the logs of a few things that successfully started, as well as dmesg.log, which simply contains dmesg: klogctl failed: Operation not permitted. From what I can tell, that is due to a Docker limitation that can't really be worked around, and it's not even clear whether it's related to the issue.
Here's the interesting bit: I have the exact same container running on a different host, and it's not suffering from this issue. Rsyslog is able to start and run in the container just fine on that host. So obviously the cause is some difference between the hosts. But I don't know where to begin with that: there are LOTS of differences between the hosts (the working one is my local Windows system, the failing one is a virtual machine running in a cloud environment), so I wouldn't even know where to begin figuring out which differences could cause this issue and which couldn't.
I've exhausted everything that I know to check. My only option left is to come to Stack Overflow and ask for ideas.
Two questions here, really:
Is there any way to get more information out of the failure to start? start itself is a binary file, not a script, so I can't open it up and edit it. I'm reliant solely on the output of that command, and it's not logging anything anywhere useful.
What could possibly be different between these two hosts that could cause this issue? Are there any smoking guns or obvious candidates to check?
Regarding the container itself, unfortunately it's a container provided by a third party that I'm simply modifying. I can't really change anything fundamental about it, such as the fact that its entrypoint is /sbin/init (which is a very bad practice for Docker containers, and is the root cause of all of my troubles). This is also causing some issues with the Docker logging driver, which is why I'm stuck using syslog as the logging solution instead.

How to return logs to Airflow container from remotely called, Dockerized celery workers

I am working on a Dockerized Python/Django project that includes a container for Celery workers, into which I have been integrating the off-the-shelf Airflow Docker containers.
I have Airflow successfully running Celery tasks in the pre-existing container, by instantiating a Celery app with the Redis broker and result backend specified and making a remote call via send_task; however, none of the logging carried out by the Celery task makes it back to the Airflow logs.
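Roughly, the remote call looks like this (module, task, and URL names here are placeholders, not my actual code):
from celery import Celery

# Broker and result backend point at the same Redis instance the Django workers use
app = Celery("airflow_caller",
             broker="redis://redis:6379/0",
             backend="redis://redis:6379/0")

# Invoke a task registered in the remote worker container, by name
result = app.send_task("myproject.tasks.do_work", args=["some-arg"])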
Initially, as a proof of concept (I am completely new to Airflow), I had set it up to run the same code by exposing it to the Airflow containers and creating Airflow tasks to run it on the Airflow Celery worker container. This did result in all the logging being captured, but it's definitely not the way we want it architected, as it makes the Airflow containers very fat due to the repetition of dependencies from the Django project.
The documentation says "Most task handlers send logs upon completion of a task", but I wasn't able to find more detail that might give me a clue how to enable the same in my situation.
Is there any way to get these logs back to Airflow when running the Celery tasks remotely?
Instead of "returning the logs to Airflow", an easy-to-implement alternative (because Airflow natively supports it) is to activate remote logging. This way, all logs from all workers would end up e.g. on S3, and the webserver would automatically fetch them.
The following illustrates how to configure remote logging using an S3 backend. Other options (e.g. Google Cloud Storage, Elastic) can be implemented similarly.
1) Set remote_logging to True in airflow.cfg.
2) Build an Airflow connection URI. This example from the official docs is particularly useful IMO. One should end up with something like:
aws://AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI%2FK7MDENG%2FbPxRfiCYEXAMPLEKEY@/?endpoint_url=http%3A%2F%2Fs3%3A4566%2F
It is also possible to create the connection through the webserver GUI, if needed.
3) Make the connection URI available to Airflow. One way of doing so is to make sure that the environment variable AIRFLOW_CONN_{YOUR_CONNECTION_NAME} is set. Example for connection name REMOTE_LOGS_S3:
export AIRFLOW_CONN_REMOTE_LOGS_S3=aws://AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI%2FK7MDENG%2FbPxRfiCYEXAMPLEKEY@/?endpoint_url=http%3A%2F%2Fs3%3A4566%2F
4) Set remote_log_conn_id to the connection name (e.g. REMOTE_LOGS_S3) in airflow.cfg.
5) Set remote_base_log_folder in airflow.cfg to the desired bucket/prefix. Example:
remote_base_log_folder = s3://my_bucket_name/my/prefix
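Put together, the relevant airflow.cfg entries end up looking roughly like this (the connection name and bucket/prefix are the example values from above; note that in Airflow 1.10 these keys live in the [core] section, while in Airflow 2.x they are in [logging]):
[logging]
remote_logging = True
remote_log_conn_id = REMOTE_LOGS_S3
remote_base_log_folder = s3://my_bucket_name/my/prefix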
This related SO post goes into remote logging in more depth.
If debugging is needed, looking into any worker logs locally (i.e., inside the worker) should help.

Remove Airflow Scheduler logs

I am using Docker Apache airflow VERSION 1.9.0-2 (https://github.com/puckel/docker-airflow).
The scheduler produces a significant amount of logs, and the filesystem will quickly run out of space, so I am trying to programmatically delete the scheduler logs created by Airflow, found in the scheduler container at /usr/local/airflow/logs/scheduler.
I have all of these maintenance tasks set up:
https://github.com/teamclairvoyant/airflow-maintenance-dags
However, these tasks only delete logs on the worker, and the scheduler logs are in the scheduler container.
I have also set up remote logging, sending logs to S3, but as mentioned in the SO post Removing Airflow task logs, this setup does not stop Airflow from writing to the local machine.
Additionally, I have also tried creating a shared named volume between the worker and the scheduler, as outlined in Docker Compose - Share named volume between multiple containers. However, I get the following error in the worker:
ValueError: Unable to configure handler 'file.processor': [Errno 13] Permission denied: '/usr/local/airflow/logs/scheduler'
and the following error in scheduler:
ValueError: Unable to configure handler 'file.processor': [Errno 13] Permission denied: '/usr/local/airflow/logs/scheduler/2018-04-11'
So, how do people delete scheduler logs?
Inspired by this reply, I have added the airflow-log-cleanup.py DAG (with some changes to its parameters) from here to remove all old Airflow logs, including scheduler logs.
My changes are minor, except that, given my EC2 instance's disk size (7.7 GB for /dev/xvda1), the default value of 30 days for DEFAULT_MAX_LOG_AGE_IN_DAYS seemed too large (I only had 4 DAGs), so I changed it to 14 days; feel free to adjust it according to your environment:
DEFAULT_MAX_LOG_AGE_IN_DAYS = Variable.get("max_log_age_in_days", 30)
changed to
DEFAULT_MAX_LOG_AGE_IN_DAYS = Variable.get("max_log_age_in_days", 14)
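If you prefer not to edit the DAG file itself, the same value can also be overridden through the Airflow Variable it reads, e.g. via the Airflow 1.x CLI (in 2.x the equivalent is airflow variables set):
airflow variables --set max_log_age_in_days 14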
The following could be one option to resolve this issue.
Log in to the Docker container (make sure it is running) using the following mechanism:
#>docker exec -it <name-or-id-of-container> sh
Then use cron jobs to schedule an rm command on those log files.
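For example, a crontab entry along these lines (the path and retention period are assumptions, adjust to your setup) would prune old scheduler logs nightly:
# delete scheduler log files older than 7 days, then clean up empty date folders
0 1 * * * find /usr/local/airflow/logs/scheduler -type f -mtime +7 -delete && find /usr/local/airflow/logs/scheduler -mindepth 1 -type d -empty -delete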
This answer to "Removing Airflow Task logs" also fits your use case in Airflow 1.10.
Basically, you need to implement a custom log handler and configure Airflow logging to use that handler instead of the default (see UPDATING.md in the Airflow source repo, not the README or the docs!).
One word of caution: due to the way logging, multiprocessing, and Airflow's default handlers interact, it is safer to override handler methods than to extend them by calling super() in a derived handler class, because Airflow's default handlers don't use locks.
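Just to illustrate the override-rather-than-extend point (this is a plain logging.FileHandler sketch, not Airflow's actual handler classes):
import logging

class NonLockingFileHandler(logging.FileHandler):
    # Illustrative only: emit() is re-implemented outright instead of
    # extending FileHandler.emit() via super().
    def emit(self, record):
        try:
            if self.stream is None:  # the stream is opened lazily when delay=True
                self.stream = self._open()
            msg = self.format(record)
            self.stream.write(msg + self.terminator)
            self.stream.flush()
        except Exception:
            self.handleError(record)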
I spent a lot of time trying to add "maintenance" DAGs that would clear the logs generated by the different Airflow components started as Docker containers.
The problem was in fact more at the Docker level: each of those processes is responsible for tons of logs that, by default, are stored in JSON files by Docker. The solution was to change the logging driver so that logs are no longer stored on the Docker host instance, but sent directly to AWS CloudWatch Logs in my case.
I just had to add the following to each service in the docker-compose.yml file (https://github.com/puckel/docker-airflow):
logging:
  driver: awslogs
  options:
    awslogs-group: myAWSLogsGroupID
Note that the EC2 instance on which my "docker-composed" Airflow app is running has an AWS role that allows it to create a log stream and add log events (the CreateLogStream and PutLogEvents actions in the AWS IAM service).
If you run it on a machine outside of the AWS ecosystem, you'd need to ensure it has access to AWS through credentials.
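For reference, a minimal IAM policy for that role would look something like the sketch below (the resource ARN is a placeholder; scope it to your own account, region, and log group):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "arn:aws:logs:*:*:log-group:myAWSLogsGroupID:*"
    }
  ]
}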

What's the best practice for Docker logging?

I'm using Docker with my web service.
When I deploy using Docker, I lose some log files (nginx access log, service log, system log, etc.), because the Docker deployment process takes the old container down and brings a new one up.
So I've been thinking about this problem.
The logging server and the service server (for the API) must be separated!
I'm considering these methods:
First, using Logstash (in ELK), attaching all my log files.
Second, using a batch system that moves the log files to another server every midnight.
Is that okay?
I'm hoping for a better answer.
Thanks.
There are many approaches to logging that admins commonly use for containers:
1) Mount the log directory to the host, so even if the container goes down and up, the logs are persisted on the host (see the sketch after this list).
2) An ELK stack, using Logstash/Filebeat to push logs to the Elasticsearch server with the file-tailing option, so any new log content gets pushed to the server.
3) For application logs, e.g. in Maven-based projects, there are many plugins that push logs to a server.
4) A batch system, which is not recommended, because if a container dies before midnight its logs will be lost.
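A minimal docker-compose sketch of option 1 (the service name and paths are just examples):
services:
  web:
    image: nginx
    volumes:
      # nginx access/error logs persist on the host across container restarts and redeploys
      - ./logs/nginx:/var/log/nginx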

Is there any way to obtain detailed logging info when executing 'docker stack deploy'?

In Docker 17.03, when executing
docker stack deploy -c docker-compose.yml [stack-name]
the only info that is output is:
Creating network <stack-name>_myprivatenet
Creating service <stack-name>_mysql
Creating service <stack-name>_app
Is there a way to have Docker output more detailed info about what is happening during deployment?
For example, the following information would be extremely helpful:
an image (e.g. the 'mysql' image) is being downloaded from the registry (along with the registry's info)
if, say, the 'app' image cannot be downloaded from its private registry, an error message is output (e.g. due to incorrect or omitted credentials when a registry login is required)
Perhaps it could be provided via either of the following ways:
docker stack deploy --logs
docker stack log
Thanks!
docker stack logs is actually a requested feature in issue 31458
a request for a docker stack logs command which can show the logs for a docker stack, much like docker service logs works in 1.13.
docker-compose works similarly today, showing the interleaved logs for all containers deployed from a compose file.
This will be useful for troubleshooting any kind of errors that span across heterogeneous services.
This is still pending though, because, as Drew Erny (dperny) details:
there are some changes that have to be made to the API before we can pursue this, because right now we can only get the logs for 1 service at a time unless you make multiple calls (which is silly, because we can get the logs for multiple services in the same stream on swarmkit's side).
After I finish those API changes, this can be done entirely on the client side, and should be really straightforward. I don't know when the API changes will be in because I haven't started them yet, but I can let you know as soon as I have them!
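In the meantime, since the API can return logs for one service at a time, a workaround is to pull them per service (service names below are taken from the deploy output above, assuming docker service logs is available in your engine version):
docker service logs <stack-name>_app
docker service logs <stack-name>_mysql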
