Is it possible to configure my Airflow environment (2.4.3) to auto restart the scheduler after adding a plugin?

I have an Airflow environment (v2.4.3) that gets the following error after I add a plugin and DAGs that reference the plugin:
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/opt/airflow/dags/hello_dag.py", line 6, in <module>
from operators.hello_operator import HelloOperator
ModuleNotFoundError: No module named 'operators'
The solution for this is to restart the scheduler manually.
However, I am wondering whether there is some configuration, via environment variables, that I can use to make the Airflow environment restart the scheduler automatically, without requiring me to do it manually.
I looked through the documentation but did not find anything, so I am asking here in case I missed something.
If it is not possible, do let me know.

You can try setting lazy_load_plugins to False so that the plugins are loaded by each process, instead of restarting the scheduler, although restarting is the recommended approach.
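For example, since Airflow maps any [section] option to an AIRFLOW__SECTION__OPTION environment variable, the [core] lazy_load_plugins option can be set like this:
export AIRFLOW__CORE__LAZY_LOAD_PLUGINS=False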
Restarting the scheduler and/or the other components is not something Airflow itself supports, and it should not be; it is a CD (deployment) task, and how to do it depends on how you deploy Airflow.
If you are deploying it on K8S, you can restart the deployment using kubectl rollout restart deployment/<name>. If you are using Docker, or you run it directly on your host machine, you can use Ansible to run docker restart <container name> for Docker, and for the host:
ps aux | grep "[a]irflow scheduler" | awk '{print $2}' | xargs kill && airflow scheduler <pass your args>
(The [a] in the grep pattern keeps grep from matching its own process.)
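As a minimal sketch of the Ansible route (the container name airflow-scheduler is a hypothetical placeholder; use whatever your deployment calls it):
- name: Restart the Airflow scheduler container
  ansible.builtin.command: docker restart airflow-scheduler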

Related

Problem running gcsfuse on Google App Engine

I am trying to run the Airflow webserver on App Engine Flexible; however, for it to work I need a mounted GCS bucket. I am using a custom runtime.
The reason I am doing it is to get the secured endpoint that App Engine provides together with IAP.
My app.yaml is a simple file with service name, env and runtime
My Dockerfile is mostly apt-get installs; the CMD mounts gcsfuse and runs the Airflow webserver, nothing complicated.
The error I am getting when trying to use gcsfuse in App Engine is:
daemonize.Run: readFromProcess: sub-process: mountWithArgs: mountWithConn: Mount: mount: running fusermount: exit status 1
stderr:
fusermount: fuse device not found, try 'modprobe fuse' first
I know that Google Composer exists, but it is way too expensive for my needs. So I prefer to create a VM with a scheduler and a webserver on GAE, sharing a GCS bucket: similar to what Composer gives, but without all the HA and the insane cost for the simple things I want to run.
I am trying to do this in App Engine; all the answers I have found so far mention GKE for some reason.
I know it is a privilege problem; however, in App Engine I do not see any option to set privileges. A way to do it would be very helpful.
Is it even possible to do what I want to do on App Engine?
This is possible. I'll show you how to do it manually; you might need a shell script to deal with multiple instances.
define several vars used in this manual
service=YOUR_APPENGINE_SERVICE
version=YOUR_APPENGINE_VERSION
project=PROJECTID
get instance list
gcloud app instances list --project $project
SERVICE VERSION ID VM_STATUS DEBUG_MODE
default *************** instance-id-1 RUNNING YES
default *************** instance-id-2 RUNNING
ssh into one instance
gcloud app instances ssh instance-id-1 --service $service --version $version --project $project
get image id
docker ps | grep gaeapp | awk '{print $2}'
you will get an imageid
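to make the docker run command further down work as written, capture that image id into the $imageid variable it references:
imageid=$(docker ps | grep gaeapp | awk '{print $2}')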
get env of gaeapp
docker exec gaeapp env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=*****
GAE_MEMORY_MB=614
GAE_INSTANCE=****
GAE_SERVICE=default
PORT=8080
GCLOUD_PROJECT=*****
GAE_VERSION=*****
GOOGLE_CLOUD_PROJECT=*****
restart gaeapp with privilege
docker rm -f gaeapp
docker run --privileged -d -p 8080:8080 --name gaeapp -e GAE_MEMORY_MB=614 -e GAE_INSTANCE=instance-id-1 -e GAE_SERVICE=$service -e PORT=8080 -e GCLOUD_PROJECT=$project -e GAE_VERSION=$version -e GOOGLE_CLOUD_PROJECT=$project $imageid
enter gaeapp (assuming you have gcsfuse installed and a service account key JSON at /test-service-account.json)
$ docker exec -it gaeapp bash
[in gaeapp] # GOOGLE_APPLICATION_CREDENTIALS=/test-service-account.json gcsfuse BUCKET /mnt/
Using mount point: /mnt
Opening GCS connection...
Opening bucket...
Mounting file system...
File system has been successfully mounted.
To be honest, I tried all possible solutions, and the above solution finally worked. Unfortunately, it only worked for 2-3 days. After some time, App Engine restarts the instances automatically, without any failure in the app, so all the changes made for gcsfuse disappeared.
The main thing for gcsfuse to work in a container is to run the Docker image in privileged mode, and App Engine does not allow that.
The final solution we are using is GKE, which is working fine.
Note: It was expected that GAE would have some provision for privileged mode, but it does not have one now. Google may introduce it in the future. Thanks!

How to know which command or compose file has been used to start Docker containers?

Is there any way to find the source of the script that started a Docker container? I have a setup where I cannot find any docker-compose.yml file, nor the bash script, etc., that would have run all the Docker containers currently running. I have a virtual machine that starts Docker containers on startup, but I have no idea which file is actually run.
I don't think there is an option that tells you which docker-compose file was used.
But you can check each of your project folders manually.
docker-compose works by matching containers to the project's docker-compose.yml file, so if you run sudo docker-compose ps in each of your project folders, it will compare the compose file used by the running containers with the one in that folder; if they match, the containers are listed, and if not, nothing is displayed.
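Depending on your Compose version, you may also be able to read this directly off the containers: Compose records the project name in a container label, and newer releases also record the config file path (the label names below are the standard Compose ones, but check what your version actually sets):
docker inspect --format '{{ index .Config.Labels "com.docker.compose.project" }}' <container>
docker inspect --format '{{ index .Config.Labels "com.docker.compose.project.config_files" }}' <container>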
If the containers are running automatically on reboot and you have no cron job, bash profile, rc.local, or any other startup script, that may mean they are containers with the --restart option set. You can change that by running the commands below:
docker ps -q | xargs docker update --restart no
docker ps -q | xargs docker stop
Then restart the machine. The containers should not start. If they do, then you have some script somewhere which is starting them.
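To check the current restart policy of each running container before changing anything, something like this works:
docker ps -q | xargs docker inspect --format '{{ .Name }}: {{ .HostConfig.RestartPolicy.Name }}'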

Why there is no init / initctl on the docker centos image

Using the public/common Docker centos image, I was installing some services that required an /etc/init directory, and hit a failure. I further noticed that initctl does not exist, meaning that init was not run.
How can the centos image be used with a fully functional init process?
example:
docker run -t -i centos /bin/bash
file /etc/init
/etc/init: cannot open ... no such file or directory ( /etc/init )
initctl
bash: initctl: command not found
A Docker container is more analogous to a process than a VM. That process can spawn other processes though, and the sub-processes will run in the same container. A common pattern is to use a process supervisor like supervisord as described in the Docker documentation. In general though, it's usually recommended to try and run one process per container if you can (so that, for example, you can monitor and cap memory and CPU at the process level).
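A minimal sketch of that supervisord pattern (assuming a CentOS 7 base where supervisor comes from EPEL; the program name myservice and its path are hypothetical placeholders):
Dockerfile:
FROM centos:7
RUN yum install -y epel-release && yum install -y supervisor
COPY supervisord.conf /etc/supervisord.conf
CMD ["/usr/bin/supervisord", "-c", "/etc/supervisord.conf"]
supervisord.conf (nodaemon keeps supervisord, and therefore the container, in the foreground):
[supervisord]
nodaemon=true
[program:myservice]
command=/usr/bin/myservice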

How to programmatically stop an executable war

I have a problem with stopping an executable war file running in the background as a service on a Linux system. I can start the executable war with the following script:
java -jar data.war &
but how do I stop it programmatically through Jenkins using the SSH plugin?
I would probably start with the STOP port mechanism. See this documentation for an example:
http://www.eclipse.org/jetty/documentation/current/quickstart-running-jetty.html#quickstart-stopping-jetty
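With Jetty's start.jar, the mechanism looks roughly like this; whether your executable data.war exposes the same STOP.PORT/STOP.KEY handling depends on how it was built, so treat this as a sketch:
java -DSTOP.PORT=8181 -DSTOP.KEY=secret -jar data.war &
java -DSTOP.PORT=8181 -DSTOP.KEY=secret -jar data.war --stop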
For those in need of the answer:
PROCESSID=$(ps aux | grep data | grep -v grep | grep -v root | awk '{print $2}')
kill -9 $PROCESSID
The above code kills the application with the name 'data' in the process list; the grep -v grep filters out the grep command itself, which would otherwise match its own command line.
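A more robust variant is to record the PID when starting the app and send a plain SIGTERM first, reserving kill -9 for processes that refuse to exit (the pid file path here is illustrative):
java -jar data.war & echo $! > /var/run/data.pid
kill $(cat /var/run/data.pid)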

Unicorn init script - not starting at boot

I'm very new to system administration and have no idea how init.d works. So maybe I'm doing something wrong here.
I'm trying to start unicorn on boot, but somehow it just fails to start every time. I am able to manually start/stop/restart it with a simple service app_name start. I can't understand why unicorn doesn't start at boot when manually starting and stopping the service works. Maybe some user permission issue?
My unicorn init script and the unicorn config files are available here https://gist.github.com/1956543
I'm setting up a development environment on Ubuntu 11.1 running inside a VM.
UPDATE - Could this be because of the VM? I'm currently sharing the entire codebase (folder) with the VM, which also happens to contain the unicorn config needed to start unicorn.
Any help would be greatly appreciated! Thanks.
To get Unicorn to run when your system boots, you need to associate the init.d script with the default set of "runlevels", which are the modes that Ubuntu enters as it boots.
There are several different runlevels, but you probably just want the default set. To install Unicorn here, run:
sudo update-rc.d <your service name> defaults
For more information, check out the update-rc.d man page.
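For example, if the init script is installed as /etc/init.d/unicorn_myapp (a hypothetical name):
sudo update-rc.d unicorn_myapp defaults
ls /etc/rc2.d/ | grep unicorn
The second command confirms that the S* start symlink was created for Ubuntu's default runlevel.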
You can configure a cron job to start the unicorn server on reboot:
crontab -e
and add
@reboot /bin/bash -l -c 'service unicorn_<your service name> start >> /<path to log file>/cron.log 2>&1'
Note the @reboot directive; with a leading # the line would be treated as a comment and never run.
