Email on failure/retry with Airflow in a Docker container

I am trying to run the puckel Airflow Docker container using the docker-compose-LocalExecutor.yml file found here:
https://github.com/puckel/docker-airflow
I am not able to get Airflow to send me emails on failure or retry.
I've tried the following:
1. Editing the config file (airflow.cfg) with the SMTP host name:
[smtp]
# If you want airflow to send emails on retries, failure, and you want to use
# the airflow.utils.email.send_email_smtp function, you have to configure an
# smtp server here
smtp_host = smtp.mycompany.com
smtp_starttls = True
smtp_ssl = False
# Uncomment and set the user/pass settings if you want to use SMTP AUTH
# smtp_user = airflow
# smtp_password = airflow
smtp_port = 25
smtp_mail_from = myname@mycompany.com
2. Editing environment variables in the entrypoint.sh script included in the repo:
: "${AIRFLOW__SMTP__SMTP_HOST:="smtp-host"}"
: "${AIRFLOW__SMTP__SMTP_PORT:="25"}"
# Defaults and back-compat
: "${AIRFLOW_HOME:="/usr/local/airflow"}"
: "${AIRFLOW__CORE__FERNET_KEY:=${FERNET_KEY:=$(python -c "from cryptography.fernet import Fernet; FERNET_KEY = Fernet.generate_key().decode(); print(FERNET_KEY)")}}"
: "${AIRFLOW__CORE__EXECUTOR:=${EXECUTOR:-Sequential}Executor}"
export \
AIRFLOW_HOME \
AIRFLOW__CELERY__BROKER_URL \
AIRFLOW__CELERY__RESULT_BACKEND \
AIRFLOW__CORE__EXECUTOR \
AIRFLOW__CORE__FERNET_KEY \
AIRFLOW__CORE__LOAD_EXAMPLES \
AIRFLOW__CORE__SQL_ALCHEMY_CONN \
AIRFLOW__SMTP__SMTP_HOST \
AIRFLOW__SMTP__SMTP_PORT
if [ "$AIRFLOW__SMTP__SMTP_HOST" != "smtp-host" ]; then
  AIRFLOW__SMTP__SMTP_HOST="smtp-host"
  AIRFLOW__SMTP__SMTP_PORT=25
fi
I currently have a dag running that intentionally fails, but I am never alerted for retries or failures.

You don't need to do steps 1 and 2. Specify the environment variables in docker-compose.yml instead.
Also, make sure that you have enabled email sending on failure in the operator,
as in this example: https://gist.githubusercontent.com/aawgit/0753b3ef1d715257e442ceafbc8583d3/raw/fa45caa9150e08654d476c2d619ab50615942b46/email-notification-on-task-failure.py
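For reference, a minimal sketch along the lines of that gist, with email_on_failure and email_on_retry enabled in default_args (the DAG name, schedule and recipient address are placeholders, not taken from the question):
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2019, 1, 1),
    'email': ['myname@mycompany.com'],  # placeholder recipient
    'email_on_failure': True,           # mail on task failure
    'email_on_retry': True,             # mail on every retry
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG('email_on_failure_demo', default_args=default_args, schedule_interval=None)

# A task that always fails, so a failure mail should arrive once SMTP is configured.
always_fails = BashOperator(task_id='always_fails', bash_command='exit 1', dag=dag)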

Set the following variables to make the mail function work.
The exact values depend on the configuration of your SMTP mail server; the user name and password must be set correctly.
If you use docker-compose, setting the following variables in docker-compose.yml will automatically update the Airflow settings:
- AIRFLOW__SMTP__SMTP_HOST=smtp_host
- AIRFLOW__SMTP__SMTP_PORT=25  # or whichever port your SMTP host uses
- AIRFLOW__SMTP__SMTP_USER=mail_user_name_of_your_app
- AIRFLOW__SMTP__SMTP_PASSWORD=mail_user_password_of_your_app
- AIRFLOW__SMTP__SMTP_MAIL_FROM=mail_user_name_of_your_app@mail.server.name
Similarly, if you bring up the container manually, set the following environment variables:
AIRFLOW__SMTP__SMTP_HOST=smtp_host
AIRFLOW__SMTP__SMTP_PORT=25  # or whichever port your SMTP host uses
AIRFLOW__SMTP__SMTP_USER=mail_user_name_of_your_app
AIRFLOW__SMTP__SMTP_PASSWORD=mail_user_password_of_your_app
AIRFLOW__SMTP__SMTP_MAIL_FROM=mail_user_name_of_your_app@mail.server.name
export AIRFLOW__SMTP__SMTP_HOST AIRFLOW__SMTP__SMTP_PORT AIRFLOW__SMTP__SMTP_USER AIRFLOW__SMTP__SMTP_PASSWORD AIRFLOW__SMTP__SMTP_MAIL_FROM
# Then bring up docker:
docker run your_docker...
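Note that variables exported only in the host shell are not automatically visible inside the container; they still have to be passed to docker run. A sketch with explicit -e flags (the image name and values are placeholders):
docker run -d \
  -e AIRFLOW__SMTP__SMTP_HOST=smtp.mycompany.com \
  -e AIRFLOW__SMTP__SMTP_PORT=25 \
  -e AIRFLOW__SMTP__SMTP_USER=mail_user_name_of_your_app \
  -e AIRFLOW__SMTP__SMTP_PASSWORD=mail_user_password_of_your_app \
  -e AIRFLOW__SMTP__SMTP_MAIL_FROM=mail_user_name_of_your_app@mail.server.name \
  your_docker_image
Passing -e AIRFLOW__SMTP__SMTP_HOST without a value also works once the variable is exported on the host, since Docker then copies it from the calling environment.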

Related

Dockerize 'at' scheduler

I want to put the at daemon (atd) in a separate Docker container, to run as an external, environment-independent scheduler service.
I can run atd with the following Dockerfile and docker-compose.yml:
$ cat Dockerfile
FROM alpine
RUN apk add --update at ssmtp mailx
CMD [ "atd", "-f" ]
$ cat docker-compose.yml
version: '2'
services:
  scheduler:
    build: .
    working_dir: /mnt/scripts
    volumes:
      - "${PWD}/scripts:/mnt/scripts"
But there are problems:
1) There is no built-in option to redirect atd logs to /proc/self/fd/1 so that they show up in the docker logs command. at just has the -m option, which sends mail to the user.
Is it possible to redirect at output from user mail to /proc/self/fd/1 (maybe with some compile flags)?
2) Right now I add a new task with a command like docker-compose exec scheduler at -f test.sh now + 1 minute. Is that a good way? I think a better way would be to find the file where at stores its queue, add that file as a volume, update it externally and just send docker restart after the file changes.
But I can't find where at stores its data on Alpine Linux (I only found /var/spool/atd/.SEQ, where at stores the id of the last job). Does anyone know where at stores its data?
I would also be glad to hear any advice regarding dockerizing at.
UPD. I found where at stores its data on Alpine: it's the /var/spool/atd folder. When I create a task via the at command, it creates an executable file there with a name like a000040190a2ff and content like:
#!/bin/sh
# atrun uid=0 gid=0
# mail root 1
umask 22
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin; export PATH
HOSTNAME=e605e8017167; export HOSTNAME
HOME=/root; export HOME
cd /mnt/scripts || {
echo 'Execution directory inaccessible' >&2
exit 1
}
#!/usr/bin/env sh
echo "Hello world"
UPD2: the difference between running at with and without the -m option is the third line of the generated script.
With the -m option:
#!/bin/sh
# atrun uid=0 gid=0
# mail root 1
...
Without -m:
#!/bin/sh
# atrun uid=0 gid=0
# mail root 0
...
According to the official man page:
The user will be mailed standard error and standard output from his
commands, if any. Mail will be sent using the command
/usr/sbin/sendmail
and
-m
Send mail to the user when the job has completed even if there was no
output.
I tried to schedule a simple Hello World script and found that no mail was sent:
# mail -u root
No mail for root

How can I track if an Xdebug request is being sent from my browser / docker container?

I have a PhpStorm IDE set up with Xdebug, and a Docker container that has Xdebug installed and the browser module applied. However, when I try to 'pick up the phone' and refresh the page served from the environment, nothing is ever picked up.
I've tried a variety of ports and server names, following the tutorials I could find. I am not certain whether it's a security issue or a bad Docker setup, but I am relatively certain the PhpStorm setup is correct.
I've tried exposing ports in the Dockerfile (9000 & 9001)
My .php file is just echo and some math with break points applied:
echo("TEST 1<br>");
$test = 2;
echo("TEST " . $test . "<br>");
$testArray = xdebug_get_code_coverage();
var_dump($testArray);
phpinfo();
dd("TEST 3");
In my .env file, the following is defined:
PHP_IDE_CONFIG=serverName=jumbledowns-demo
XDEBUG_CONFIG=remote_host=localhost remote_port=9001
And my Dockerfile sets up Xdebug thus:
RUN pecl install xdebug; \
docker-php-ext-enable xdebug; \
echo "error_reporting = E_ALL" >> /usr/local/etc/php/conf.d/docker-php-ext-xdebug.ini; \
echo "display_startup_errors = On" >> /usr/local/etc/php/conf.d/docker-php-ext-xdebug.ini; \
echo "display_errors = On" >> /usr/local/etc/php/conf.d/docker-php-ext-xdebug.ini; \
echo "xdebug.remote_enable=1" >> /usr/local/etc/php/conf.d/docker-php-ext-xdebug.ini;
PHPinfo on the browser side shows the code as expected, and the following settings for Xdebug:
Directive            Local Value   Master Value
xdebug.remote_host   localhost     localhost
xdebug.remote_log    no value      no value
xdebug.remote_mode   req           req
xdebug.remote_port   9001          9000
I've tried using URLs and assigned IP addresses in the above and can change them as needed.
Ideally, I'd like to be able to have a command line log or output that is showing what Xdebug is trying to call and on what port.
In order to find out what Xdebug is trying to do, you need to set the xdebug.remote_log setting to something non-empty, such as xdebug.remote_log=/tmp/xdebug.log. That file will include all connection attempts, and communication protocol contents if a connection is made.
remote_host=localhost is almost certainly not correct though: PHP and Xdebug, running in your Docker container, need to open a connection to this host, and localhost is not going to be the right hostname/IP address from inside the container. It's more likely host.docker.internal or something like that; or rather, hard-code the IP address of the machine where your IDE runs, as it is accessible from your Docker container.
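Putting both suggestions together, the relevant lines in docker-php-ext-xdebug.ini might look like this (Xdebug 2.x setting names, matching the question; the log path and port are examples, not taken from the question):
xdebug.remote_enable=1
xdebug.remote_host=host.docker.internal  ; or the IDE machine's IP as reachable from the container
xdebug.remote_port=9000                  ; must match the port PhpStorm is listening on
xdebug.remote_log=/tmp/xdebug.log        ; records every connection attempt and the DBGp traffic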

Problem in specifying the network in cloud dataflow

I didn't configure the project and I get this error whenever I run my job 'The network default doesn't have rules that open TCP ports 1-65535 for internal connection with other VMs. Only rules with a target tag 'dataflow' or empty target tags set apply. If you don't specify such a rule, any pipeline with more than one worker that shuffles data will hang. Causes: No firewall rules associated with your network.'
import apache_beam as beam
from apache_beam.options.pipeline_options import (PipelineOptions, GoogleCloudOptions,
                                                  StandardOptions, SetupOptions, WorkerOptions)

p_options = PipelineOptions()
google_cloud_options = p_options.view_as(GoogleCloudOptions)
google_cloud_options.region = 'europe-west1'
google_cloud_options.project = 'my-project'
google_cloud_options.job_name = 'rim'
google_cloud_options.staging_location = 'gs://my-bucket/binaries'
google_cloud_options.temp_location = 'gs://my-bucket/temp'
p_options.view_as(StandardOptions).runner = 'DataflowRunner'
p_options.view_as(SetupOptions).save_main_session = True
p_options.view_as(StandardOptions).streaming = True
p_options.view_as(WorkerOptions).subnetwork = 'regions/europe-west1/subnetworks/test'
p = beam.Pipeline(options=p_options)
I tried to specify --network 'test' on the command line, since it is not the default configuration.
It looks like your default firewall rules were modified, and Dataflow detected this and prevented your job from launching. Could you verify that your project's firewall rules were not modified? Please take a look at the documentation here. You will also find a command there to restore the firewall rules:
gcloud compute firewall-rules create [FIREWALL_RULE_NAME] \
--network [NETWORK] \
--action allow \
--direction ingress \
--target-tags dataflow \
--source-tags dataflow \
--priority 0 \
--rules tcp:1-65535
Pick a name for the firewall rule and provide a network name, then pass that network name with --network when you launch the Dataflow job. If you have a network named 'default', Dataflow will try to use it automatically, so you won't need to pass --network. If you've deleted that network, you may wish to recreate it.
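For example, a minimal sketch of pointing the pipeline options from the question at that network (WorkerOptions accepts a network option alongside the subnetwork option already used above; 'test' is just the name from the question):
from apache_beam.options.pipeline_options import WorkerOptions

# Use the network that the firewall rule above was created for.
p_options.view_as(WorkerOptions).network = 'test'
# If you keep the subnetwork option as well, it must belong to that same network:
# p_options.view_as(WorkerOptions).subnetwork = 'regions/europe-west1/subnetworks/test'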
As of now (up to Apache Beam version 2.19.0), there is no provision in Dataflow to set a network tag on its VMs.
Instead, while creating the firewall rule, we should add a tag for dataflow:
gcloud compute firewall-rules create FIREWALL_RULE_NAME \
--network NETWORK \
--action allow \
--direction DIRECTION \
--target-tags dataflow \
--source-tags dataflow \
--priority 0 \
--rules tcp:12345-12346
See this link for more details
https://cloud.google.com/dataflow/docs/guides/routes-firewall

How to pass variable as attribute to xml configuration file in Wildfly with Docker

I'm trying to pass values from the docker-compose.yml file to the WildFly configuration dynamically.
I want flexibility in the mail configuration - just for a quick change of address, username, or port.
In this case, I tried to do that by forwarding environment variables from docker-compose.yml through the Dockerfile as arguments ("-Dargumentname=$environmentvariable").
Currently WildFly interrupts on start with this error:
[org.jboss.as.controller.management-operation] (ServerService Thread
Pool -- 45) WFLYCTL0013: Operation ("add") failed - address: ([
("subsystem" => "mail"),
("mail-session" => "default") ]) - failure description: "WFLYCTL0097: Wrong type for ssl. Expected [BOOLEAN] but was STRING"
The same happens if I try to pass PORT as the value in the outbound-socket-binding block.
I have no idea how to pass integers/booleans from the docker-compose file to the WildFly configuration.
docker-compose.yml (part)
...
services:
  some_service:
    image: image_name:tag
    environment:
      - USERNAME=some_username@...
      - PASSWORD=some_password
      - SSL=true        # I also tried with value 1
      - HOST=smtp.gmail.com
      - PORT=465        # also doesn't work
...
Dockerfile:
FROM some_wildfly_base_image
# install cgroup-bin package
USER root
RUN apt-get update
RUN apt-get install -y cgroup-bin
RUN apt-get install -y bc
USER jboss
ADD standalone-myapp.xml /opt/jboss/wildfly/standalone/configuration/
ADD standalone.conf /opt/jboss/wildfly/bin/
ADD modules/ /opt/jboss/wildfly/modules/
RUN wildfly/bin/add-user.sh usr usr --silent
# Set the default command to run on boot
# This will boot WildFly in the standalone mode and bind to all interface
CMD [ "/opt/jboss/wildfly/bin/standalone.sh", "-c", "standalone-myapp.xml", "-Dmail.username=$USERNAME", "-Dmail.password=$PASSWORD", "-Dmail.ssl=$SSL", "-Drm.host=$HOST", "-Drm.port=$PORT" ]
standalone-myapp.xml:
...
<subsystem xmlns="urn:jboss:domain:mail:2.0">
    <mail-session name="default" jndi-name="java:jboss/mail/Default">
        <smtp-server password="${mail.password}" username="${mail.username}" ssl="${mail.ssl}" outbound-socket-binding-ref="mail-smtp"/>
    </mail-session>
</subsystem>
...
<outbound-socket-binding name="mail-smtp">
    <remote-destination host="${rm.host}" port="465"/>
</outbound-socket-binding>
...
Almost there. In your docker-compose file you have defined environment variables, so you need to reference them as environment variables in your WildFly config. The easiest way is to prefix the variable name with the env. prefix. So in your example you have the environment variables HOST, SSL, USERNAME..., which you can reference in standalone.xml like this:
<smtp-server password="${env.PASSWORD}" username="${env.USERNAME}" ssl="${env.SSL}" outbound-socket-binding-ref="mail-smtp"/>
Without the env. prefix, JBoss/WildFly will try to resolve the expression as a JVM property, which you'd have to specify with a -D flag.
You can also use a default-value fallback in your expressions, such as:
ssl="${env.SSL:true}"
This way, ssl will be set to the value of the environment variable named SSL, and if no such variable exists, the server will fall back to true.
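The same pattern should also cover the port problem in your outbound-socket-binding; a sketch using the PORT variable from your compose file, with 465 as the fallback:
<outbound-socket-binding name="mail-smtp">
    <remote-destination host="${env.HOST}" port="${env.PORT:465}"/>
</outbound-socket-binding>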
Happy hacking

Environment variables and PHP

I have an Ubuntu server with a handful of custom environment variables set in /etc/environment, as per the Ubuntu community recommendation.
When I use PHP from the command line, I can use PHP's getenv() function to access these variables.
Also, if I run phpinfo() from the command line, I see all of my variables in the ENVIRONMENT section.
However, when trying to access the same data inside processes run by php5-fpm, it is not available. All I can see in the ENVIRONMENT section of phpinfo() is:
USER www-data
HOME /var/www
I know the command line uses this ini:
/etc/php5/cli/php.ini
And fpm uses:
/etc/php5/fpm/php.ini
I've not managed to find any differences between the two that would explain why the environment variables are not coming through in both.
Also, if I run:
sudo su www-data
and then echo the environment variables I am expecting, they are indeed available to the www-data user.
What do I need to do to get my environment variables into the PHP processes run by fpm?
It turns out that you have to explicitly set the ENV vars in the php-fpm.conf
Here's an example:
[global]
pid = /var/run/php5-fpm.pid
error_log = /var/log/php5-fpm.log
[www]
user = www-data
group = www-data
listen = /var/run/php5-fpm.sock
pm = dynamic
pm.max_children = 5
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3
chdir = /
env[MY_ENV_VAR_1] = 'value1'
env[MY_ENV_VAR_2] = 'value2'
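After reloading php-fpm, the pool-level variables should be visible to the scripts it serves; a minimal check using the variable names from the config above:
<?php
// Served through php-fpm (not the CLI), this should now print the pool-level values.
var_dump(getenv('MY_ENV_VAR_1'));
var_dump(getenv('MY_ENV_VAR_2'));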
1. Setting environment variables automatically in php-fpm.conf
clear_env = no
2. Setting environment variables manually in php-fpm.conf
env[MY_ENV_VAR_1] = 'value1'
env[MY_ENV_VAR_2] = 'value2'
Note: both methods are described in php-fpm.conf:
Clear environment in FPM workers Prevents arbitrary environment
variables from reaching FPM worker processes by clearing the
environment in workers before env vars specified in this pool
configuration are added. Setting to "no" will make all environment
variables available to PHP code via getenv(), $_ENV and $_SERVER.
Default Value: yes
clear_env = no
Pass environment variables like LD_LIBRARY_PATH. All $VARIABLEs are
taken from the current environment. Default Value: clean env
env[HOSTNAME] = $HOSTNAME
env[PATH] = /usr/local/bin:/usr/bin:/bin
env[TMP] = /tmp
env[TMPDIR] = /tmp
env[TEMP] = /tmp
I found a solution in this GitHub discussion.
The problem is that when you run php-fpm, the process does not load the environment.
You can load it in the startup script.
My php-fpm was installed by apt-get, so modify
/etc/init.d/php5-fpm
and add (note the space between the dot and the slash)
. /etc/profile
then modify /etc/profile to add
. /home/user/env.sh
In env.sh you can export whatever environment variables you need.
Then modify
php-fpm.conf
and add env[MY_ENV_VAR_1] = 'value1' under the [www] section.
Last, restart php-fpm. You'll get the environment loaded by fpm.
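For reference, a minimal sketch of such an env.sh (the path and variable names are just the ones used above):
#!/bin/sh
# /home/user/env.sh - sourced from /etc/profile, which the init script sources in turn
export MY_ENV_VAR_1=value1
export MY_ENV_VAR_2=value2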
Adding on to the answers above: I was running php-fpm7 and nginx in an alpine:3.8 Docker container. The problem I faced was that the environment variables of USER myuser were not getting copied to USER root.
My Docker entrypoint was:
sudo nginx # Runs nginx as daemon
sudo php-fpm7 -F -O # Runs php-fpm7 in foreground
The solution for this was
sudo -E nginx
sudo -E php-fpm7 -F -O
The -E option of sudo copies all environment variables of the current user to root.
Of course, your php-fpm.d/www.conf file should have clear_env=no.
And FYI, if you're using a daemon manager like supervisord, it has its own settings for copying the environment; for example, supervisord has a setting called copy_env=True.
