My container runs fine with docker-compose, but once I apply my Deployment on Kubernetes it fails every time with the same error, no matter what I try. I'm stuck and would love some input/help.
I don't know how to debug this; I'm a bit new to Kubernetes, so if someone can guide me, I can try to debug it myself.
The error stack trace is the following:
Running migration with alembic
INFO [src.core.config] value for BACKEND_CORS_ORIGINS= ['http://localhost:3000', 'http://localhost:8001']
Traceback (most recent call last):
File "pydantic/env_settings.py", line 197, in pydantic.env_settings.EnvSettingsSource.__call__
File "pydantic/env_settings.py", line 131, in pydantic.env_settings.Config.parse_env_var
File "/usr/local/lib/python3.9/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.9/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.9/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/alembic", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/site-packages/alembic/config.py", line 590, in main
CommandLine(prog=prog).main(argv=argv)
File "/usr/local/lib/python3.9/site-packages/alembic/config.py", line 584, in main
self.run_cmd(cfg, options)
File "/usr/local/lib/python3.9/site-packages/alembic/config.py", line 561, in run_cmd
fn(
File "/usr/local/lib/python3.9/site-packages/alembic/command.py", line 322, in upgrade
script.run_env()
File "/usr/local/lib/python3.9/site-packages/alembic/script/base.py", line 569, in run_env
util.load_python_file(self.dir, "env.py")
File "/usr/local/lib/python3.9/site-packages/alembic/util/pyfiles.py", line 94, in load_python_file
module = load_module_py(module_id, path)
File "/usr/local/lib/python3.9/site-packages/alembic/util/pyfiles.py", line 110, in load_module_py
spec.loader.exec_module(module) # type: ignore
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "alembic/env.py", line 22, in <module>
from src.core.config import settings
File "/app/src/core/config.py", line 72, in <module>
settings = Settings()
File "pydantic/env_settings.py", line 40, in pydantic.env_settings.BaseSettings.__init__
File "pydantic/env_settings.py", line 75, in pydantic.env_settings.BaseSettings._build_values
File "pydantic/env_settings.py", line 200, in pydantic.env_settings.EnvSettingsSource.__call__
pydantic.env_settings.SettingsError: error parsing env var "BACKEND_CORS_ORIGINS"
Alembic migration done
[2022-11-15 21:25:36 +0000] [1] [INFO] Starting gunicorn 20.1.0
[2022-11-15 21:25:36 +0000] [1] [INFO] Listening at: http://0.0.0.0:8001 (1)
[2022-11-15 21:25:36 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2022-11-15 21:25:36 +0000] [23] [INFO] Booting worker with pid: 23
[2022-11-15 21:25:37 +0000] [24] [INFO] Booting worker with pid: 24
[2022-11-15 21:25:37 +0000] [25] [INFO] Booting worker with pid: 25
[2022-11-15 21:25:40 +0000] [23] [ERROR] Exception in worker process
Now, here is how it is set up.
My docker-compose file builds using a Dockerfile. The end of the Dockerfile is as follows; there is an ENTRYPOINT before the CMD instruction:
ENTRYPOINT ["./docker-entrypoint.sh"]
CMD ["./run.sh"]
docker-entrypoint.sh
if [ -n "$DB_SERVER" ]; then
./wait-for-it.sh "$DB_SERVER:${DB_PORT:-3306}"
fi
# Run the main container command.
exec "$#"
This works because I can see the logs from the 'wait-for-it' script
run.sh (where the error comes from)
export APP_MODULE=${APP_MODULE-src.main:app}
export HOST=${HOST:-0.0.0.0}
export PORT=${PORT:-8001}
export BACKEND_CORS_ORIGINS=${BACKEND_CORS_ORIGINS:-'http://localhost:8001'}
echo "Running migration with alembic"
# Run migrations
alembic upgrade head
echo "Alembic migration done"
exec gunicorn -b $HOST:$PORT "$APP_MODULE" --reload --workers=3 --timeout 0 -k uvicorn.workers.UvicornWorker
I've set a default value for BACKEND_CORS_ORIGINS just in case.
In the stack trace above, on the 2nd line, you can see a log entry. It comes from inside the pydantic validator for BACKEND_CORS_ORIGINS.
Here is part of my config.py file:
class Settings(BaseSettings):
BACKEND_CORS_ORIGINS: List[AnyHttpUrl] = ["http://localhost:3000", "http://localhost:8001"]
#validator("BACKEND_CORS_ORIGINS", pre=True, allow_reuse=True)
def assemble_cors_origins(cls, value: Union[str, List[str]]) -> Union[List[str], str]:
logger.info(f"value for BACKEND_CORS_ORIGINS= {value}, type of BACKEND_CORS_ORIGINS= {type(value)}")
backend_cors_origins = None
if isinstance(value, str) and not value.startswith("["):
backend_cors_origins = [i.strip() for i in value.split(",")]
elif isinstance(value, (list, str)):
backend_cors_origins = value
logger.info(f"value for BACKEND_CORS_ORIGINS= {backend_cors_origins}, type of BACKEND_CORS_ORIGINS= {type(backend_cors_origins)}")
if backend_cors_origins:
return backend_cors_origins
raise ValueError(value)
For some reason, I am not seeing the last part of the log where I want to see the type.
And to finish, here is my Deployment.yml file for Kubernetes
apiVersion: apps/v1
kind: Deployment
metadata:
namespace: dev
name: api-name
spec:
replicas: 1
selector:
matchLabels:
app: api-name
template:
metadata:
labels:
app: api-name
spec:
imagePullSecrets:
- name: secret_name
containers:
- name: api-name
image: [private_registry]/api
env:
- name: MARIADB_USER
value: 'db_user'
- name: MARIADB_PASSWORD
value: 'password'
- name: MARIADB_SERVER
value: 'IP_DB_SERVER'
- name: MARIADB_DATABASE
value: 'db_name'
- name: MARIADB_PORT
value: '1234'
- name: BACKEND_CORS_ORIGINS
value: 'http://localhost:8001'
resources:
limits:
memory: "512Mi"
cpu: "500m"
ports:
- containerPort: 8001
---
apiVersion: v1
kind: Service
metadata:
namespace: dev
name: svc-api
spec:
type: NodePort
selector:
app: api-name
ports:
- port: 8001
targetPort: 8001
Can someone help or point me in the right direction? I've been trying everything I know to resolve this, without any luck.
Maybe I need more config files for Kubernetes?
If I remove the alembic command in the run.sh script, I still get the same error, but without the log from inside the @validator.
The issue seems to come from this export line:
BACKEND_CORS_ORIGINS=${BACKEND_CORS_ORIGINS:-'http://localhost:8001'}
Since the field is typed as a list of URLs, pydantic tries to json.loads the environment variable (that is the JSONDecodeError in your traceback), so the value has to be a JSON-encoded list rather than a bare URL. For example, in Deployment.yml:
- name: BACKEND_CORS_ORIGINS
  value: '["http://localhost:8001"]'
The default in run.sh needs the same JSON form.
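A minimal sketch of that behaviour, assuming pydantic v1 BaseSettings (the validator from config.py is omitted here, since the env-var parsing happens before validators run):
import json
import os
from typing import List

from pydantic import AnyHttpUrl, BaseSettings  # pydantic v1


class Settings(BaseSettings):
    BACKEND_CORS_ORIGINS: List[AnyHttpUrl] = []


# A bare URL fails: for complex field types, pydantic json.loads() the env var.
os.environ["BACKEND_CORS_ORIGINS"] = "http://localhost:8001"
try:
    Settings()
except Exception as exc:
    print(type(exc).__name__, exc)  # SettingsError: error parsing env var ...

# A JSON-encoded list parses fine.
os.environ["BACKEND_CORS_ORIGINS"] = json.dumps(["http://localhost:8001"])
print(Settings().BACKEND_CORS_ORIGINS)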
Related
We are facing a strange issue with EKS Fargate pods. We want to push logs to CloudWatch with a sidecar fluent-bit container, and for that we mount the separately created /logs/boot and /logs/access folders on both containers with an emptyDir: {} volume. But somehow the access folder is getting deleted. When we tested this setup in local Docker it produced the desired results and everything worked fine, but not when deployed on EKS Fargate. Below are our manifest files.
Dockerfile
FROM anapsix/alpine-java:8u201b09_server-jre_nashorn
ARG LOG_DIR=/logs
# Install base packages
RUN apk update
RUN apk upgrade
# RUN apk add ca-certificates && update-ca-certificates
# Dynamically set the JAVA_HOME path
RUN export JAVA_HOME="$(dirname $(dirname $(readlink -f $(which java))))" && echo $JAVA_HOME
# Add Curl
RUN apk --no-cache add curl
RUN mkdir -p $LOG_DIR/boot $LOG_DIR/access
RUN chmod -R 0777 $LOG_DIR/*
# Add metadata to the image to describe which port the container is listening on at runtime.
# Change TimeZone
RUN apk add --update tzdata
ENV TZ="Asia/Kolkata"
# Clean APK cache
RUN rm -rf /var/cache/apk/*
# Setting JAVA HOME
ENV JAVA_HOME=/opt/jdk
# Copy all files and folders
COPY . .
RUN rm -rf /opt/jdk/jre/lib/security/cacerts
COPY cacerts /opt/jdk/jre/lib/security/cacerts
COPY standalone.xml /jboss-eap-6.4-integration/standalone/configuration/
# Set the working directory.
WORKDIR /jboss-eap-6.4-integration/bin
EXPOSE 8177
CMD ["./erctl"]
Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: vinintegrator
namespace: eretail
labels:
app: vinintegrator
pod: fargate
spec:
selector:
matchLabels:
app: vinintegrator
pod: fargate
replicas: 2
template:
metadata:
labels:
app: vinintegrator
pod: fargate
spec:
securityContext:
fsGroup: 0
serviceAccount: eretail
containers:
- name: vinintegrator
imagePullPolicy: IfNotPresent
image: 653580443710.dkr.ecr.ap-southeast-1.amazonaws.com/vinintegrator-service:latest
resources:
limits:
memory: "7629Mi"
cpu: "1.5"
requests:
memory: "5435Mi"
cpu: "750m"
ports:
- containerPort: 8177
protocol: TCP
# securityContext:
# runAsUser: 506
# runAsGroup: 506
volumeMounts:
- mountPath: /jboss-eap-6.4-integration/bin
name: bin
- mountPath: /logs
name: logs
- name: fluent-bit
image: 657281243710.dkr.ecr.ap-southeast-1.amazonaws.com/fluent-bit:latest
imagePullPolicy: IfNotPresent
env:
- name: HOST_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
resources:
limits:
memory: 200Mi
requests:
cpu: 200m
memory: 100Mi
volumeMounts:
- name: fluent-bit-config
mountPath: /fluent-bit/etc/
- name: logs
mountPath: /logs
readOnly: true
volumes:
- name: fluent-bit-config
configMap:
name: fluent-bit-config
- name: logs
emptyDir: {}
- name: bin
persistentVolumeClaim:
claimName: vinintegrator-pvc
Below is the /logs folder ownership and permission. Please notice the 's' in drwxrwsrwx
drwxrwsrwx 3 root root 4096 Oct 1 11:50 logs
Below is the content of the logs folder. Notice that the access folder is missing: either it was never created or it was deleted.
/logs # ls -lrt
total 4
drwxr-sr-x 2 root root 4096 Oct 1 11:50 boot
/logs #
Below is the configmap of Fluent-Bit
apiVersion: v1
kind: ConfigMap
metadata:
name: fluent-bit-config
namespace: eretail
labels:
k8s-app: fluent-bit
data:
fluent-bit.conf: |
[SERVICE]
Flush 5
Log_Level info
Daemon off
Parsers_File parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 2020
#INCLUDE application-log.conf
application-log.conf: |
[INPUT]
Name tail
Path /logs/boot/*.log
Tag boot
[INPUT]
Name tail
Path /logs/access/*.log
Tag access
[OUTPUT]
Name cloudwatch_logs
Match *boot*
region ap-southeast-1
log_group_name eks-fluent-bit
log_stream_prefix boot-log-
auto_create_group On
[OUTPUT]
Name cloudwatch_logs
Match *access*
region ap-southeast-1
log_group_name eks-fluent-bit
log_stream_prefix access-log-
auto_create_group On
parsers.conf: |
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%LZ
Below is the error log of the Fluent Bit container:
AWS for Fluent Bit Container Image Version 2.14.0
Fluent Bit v1.7.4
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2021/10/01 06:20:33] [ info] [engine] started (pid=1)
[2021/10/01 06:20:33] [ info] [storage] version=1.1.1, initializing...
[2021/10/01 06:20:33] [ info] [storage] in-memory
[2021/10/01 06:20:33] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/10/01 06:20:33] [error] [input:tail:tail.1] read error, check permissions: /logs/access/*.log
[2021/10/01 06:20:33] [ warn] [input:tail:tail.1] error scanning path: /logs/access/*.log
[2021/10/01 06:20:38] [error] [net] connection #33 timeout after 5 seconds to: 169.254.169.254:80
[2021/10/01 06:20:38] [error] [net] socket #33 could not connect to 169.254.169.254:80
I suggest removing the following from your Dockerfile:
RUN mkdir -p $LOG_DIR/boot $LOG_DIR/access
RUN chmod -R 0777 $LOG_DIR/*
Those directories are created at image build time, but the emptyDir volume you mount at /logs starts empty and hides them, so they do not exist when the pod runs. Use the following method to set up the log directories and permissions at pod startup instead:
apiVersion: v1
kind: Pod # Deployment
metadata:
name: busy
labels:
app: busy
spec:
volumes:
- name: logs # Shared folder with ephemeral storage
emptyDir: {}
initContainers: # Setup your log directory here
- name: setup
image: busybox
command: ["bin/ash", "-c"]
args:
- >
mkdir -p /logs/boot /logs/access;
chmod -R 777 /logs
volumeMounts:
- name: logs
mountPath: /logs
containers:
- name: app # Run your application and logs to the directories
image: busybox
command: ["bin/ash","-c"]
args:
- >
while :; do echo "$(date): $(uname -r)" | tee -a /logs/boot/boot.log /logs/access/access.log; sleep 1; done
volumeMounts:
- name: logs
mountPath: /logs
- name: logger # Any logger that you like
image: busybox
command: ["bin/ash","-c"]
args: # tail the app logs, forward to CW etc...
- >
sleep 5;
tail -f /logs/boot/boot.log /logs/access/access.log
volumeMounts:
- name: logs
mountPath: /logs
The snippet runs on Fargate as well; run kubectl logs -f busy -c logger to see the tailing. In the real world, the "app" is your Java app and the "logger" is whatever log agent you prefer. Note that Fargate has a native logging capability using AWS Fluent Bit, so you do not need to run AWS Fluent Bit as a sidecar.
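For reference, a sketch of that native Fargate log routing, reusing the CloudWatch settings from the ConfigMap in the question. The aws-observability namespace and aws-logging ConfigMap names are the ones the AWS docs prescribe, so treat this as an outline rather than a drop-in file (the Fargate pod execution role also needs CloudWatch permissions):
kind: Namespace
apiVersion: v1
metadata:
  name: aws-observability
  labels:
    aws-observability: enabled
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: aws-logging
  namespace: aws-observability
data:
  output.conf: |
    [OUTPUT]
        Name cloudwatch_logs
        Match *
        region ap-southeast-1
        log_group_name eks-fluent-bit
        log_stream_prefix fargate-
        auto_create_group On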
I am using OpenShift to deploy a Django application which uses pyodbc to connect to an external database.
Currently I want to schedule a cronjob in OpenShift using a YAML file. The cronjob gets created with no problem but throws this error when run:
('IM004', "[IM004] [unixODBC][Driver Manager]Driver's SQLAllocHandle on SQL_HANDLE_HENV failed (0) (SQLDriverConnect)")
This error occurred before because OpenShift overrides the uid when running a container. I overcame it by following this workaround: https://github.com/VeerMuchandi/mssql-openshift-tools/blob/master/mssql-client/uid_entrypoint.sh
The error pops up again when the cronjob is run, and this may be due to the same uid issue. Following is my YAML file for scheduling the cronjob:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: samplecron
spec:
securityContext:
runAsUser: 1001
runAsGroup: 0
schedule: "*/5 * * * *"
concurrencyPolicy: "Forbid"
startingDeadlineSeconds: 60
suspend:
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 3
jobTemplate:
spec:
template:
metadata:
labels:
parent: "cronjobpi"
spec:
containers:
- name: samplecron
image: docker-registry.default.svc:5000/image-name
volumeMounts:
- mountPath: /path-to-mount
name: "volume-name"
command: [ "python3", "/script.py" ]
volumes:
- name: "vol-name"
restartPolicy: Never
Can someone suggest how I can provide the same userid information in the cronjob's YAML file, or any other way of solving this issue?
I was able to solve the issue using the entrypoint script I mentioned above. I included the command to run the Python script inside the .sh entrypoint script, and instead of command: [ "python3", "/script.py" ] I used command: [ "sh", "/entrypoint.sh" ]. The Python script connects to a DB server using pyodbc, and pyodbc.connect() fails if the UID of the container is not present in /etc/passwd, which is exactly what the entrypoint script takes care of.
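For anyone hitting the same thing, a rough sketch of such an entrypoint script, following the uid_entrypoint.sh pattern linked above (/script.py and USER_NAME are placeholders, and the image must have made /etc/passwd group-writable beforehand):
#!/bin/sh
# If the arbitrary OpenShift UID has no passwd entry (which is what breaks
# pyodbc.connect()), append one, then run the actual job.
if ! whoami > /dev/null 2>&1; then
  if [ -w /etc/passwd ]; then
    echo "${USER_NAME:-default}:x:$(id -u):0:${USER_NAME:-default} user:${HOME}:/sbin/nologin" >> /etc/passwd
  fi
fi
exec python3 /script.py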
I created an OpenMapTiles container:
using a volume for the /data directory
using the image klokantech/openmaptiles-server:1.6.
The container started nicely. I downloaded the planet file, and the service was working fine.
As I am going to push this to production: if the container dies, my orchestration system (Kubernetes) will restart it automatically, and I want it to pick up the previous configuration (so it doesn't need to download the planet file again or set any configuration).
So I killed my container and restarted it using the same volume as before.
Problem: when my container was restarted, the restarted OpenMapTiles server didn't have the previous configuration and I got this error in the UI:
OpenMapTiles Server is designed to work with data downloaded from OpenMapTiles.com, the following files are unknown and will not be used:
osm-2018-04-09-v3.8-planet.mbtiles
Also, in the logs, this appeared:
/usr/lib/python2.7/dist-packages/supervisor/options.py:298: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2018-05-09 09:20:18,359 CRIT Supervisor running as root (no user in config file)
2018-05-09 09:20:18,359 INFO Included extra file "/etc/supervisor/conf.d/openmaptiles.conf" during parsing
2018-05-09 09:20:18,382 INFO Creating socket tcp://localhost:8081
2018-05-09 09:20:18,383 INFO Closing socket tcp://localhost:8081
2018-05-09 09:20:18,399 INFO RPC interface 'supervisor' initialized
2018-05-09 09:20:18,399 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2018-05-09 09:20:18,399 INFO supervisord started with pid 1
2018-05-09 09:20:19,402 INFO spawned: 'wizard' with pid 11
2018-05-09 09:20:19,405 INFO spawned: 'xvfb' with pid 12
2018-05-09 09:20:20,407 INFO success: wizard entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2018-05-09 09:20:20,407 INFO success: xvfb entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
Starting OpenMapTiles Map Server (action: run)
Existing configuration found in /data/config.json
Data file "undefined" not found!
Starting installation...
Installation wizard started at http://:::80/
List of available downloads ready.
And I guess maybe it's this undefined in the config that is causing problems:
Existing configuration found in /data/config.json
Data file "undefined" not found!
This is my config file:
root@maptiles-0:/# cat /data/config.json
{
"styles": {
"standard": [
"dark-matter",
"klokantech-basic",
"osm-bright",
"positron"
],
"custom": [],
"lang": "",
"langLatin": true,
"langAlts": true
},
"settings": {
"serve": {
"vector": true,
"raster": true,
"services": true,
"static": true
},
"raster": {
"format": "PNG_256",
"hidpi": 2,
"maxsize": 2048
},
"server": {
"title": "",
"redirect": "",
"domains": []
},
"memcache": {
"size": 23.5,
"servers": [
"localhost:11211"
]
}
}
Should I mount a new volume somewhere else? Should I change my /data/config.json? I have no idea how to make it OK for the container to be killed.
I fixed this using the image klokantech/tileserver-gl:v2.3.1. With this image, you can download the vector tiles in the form of an MBTiles file from OpenMapTiles Downloads.
You can find the instructions here: https://openmaptiles.org/docs/host/tileserver-gl/
Also: I deployed it to Kubernetes using the following StatefulSet:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
labels:
name: maptiles
name: maptiles
spec:
replicas: 2
selector:
matchLabels:
name: maptiles
serviceName: maptiles
template:
metadata:
labels:
name: maptiles
spec:
containers:
- name: maptiles
command: ["/bin/sh"]
args:
- -c
- |
echo "[INFO] Startingcontainer"; if [ $(DOWNLOAD_MBTILES) = "true" ]; then
echo "[INFO] Download MBTILES_PLANET_URL";
rm /data/*
cd /data/
wget -q -c $(MBTILES_PLANET_URL)
echo "[INFO] Download finished";
fi; echo "[INFO] Start app in /usr/src/app"; cd /usr/src/app && npm install --production && /usr/src/app/run.sh;
env:
- name: MBTILES_PLANET_URL
value: 'https://openmaptiles.com/download/W...'
- name: DOWNLOAD_MBTILES
value: 'true'
livenessProbe:
failureThreshold: 120
httpGet:
path: /health
port: 80
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 5
ports:
- containerPort: 80
name: http
protocol: TCP
readinessProbe:
failureThreshold: 120
httpGet:
path: /health
port: 80
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 5
resources:
limits:
cpu: 500m
memory: 4Gi
requests:
cpu: 100m
memory: 2Gi
volumeMounts:
- mountPath: /data
name: maptiles
volumeClaimTemplates:
- metadata:
creationTimestamp: null
name: maptiles
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 60Gi
storageClassName: standard
I first deploy it with DOWNLOAD_MBTILES='true' and afterwards change it to DOWNLOAD_MBTILES='false' (so it doesn't wipe the map the next time it is deployed).
I tested it: with DOWNLOAD_MBTILES='false', you can kill the containers and they start again in a minute or so.
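As a side note, one quick way to flip that flag without editing the manifest (assuming the StatefulSet is named maptiles, as above):
# Updating the pod template triggers a rolling restart of the pods.
kubectl set env statefulset/maptiles DOWNLOAD_MBTILES=false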
I am facing an error while deploying Airflow on Kubernetes (precisely this version of Airflow: https://github.com/puckel/docker-airflow/blob/1.8.1/Dockerfile) regarding write permissions on the filesystem.
The error displayed on the logs of the pod is:
sed: couldn't open temporary file /usr/local/airflow/sed18bPUH: Read-only file system
sed: -e expression #1, char 131: unterminated `s' command
sed: -e expression #1, char 118: unterminated `s' command
Initialize database...
sed: couldn't open temporary file /usr/local/airflow/sedouxZBL: Read-only file system
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/airflow/configuration.py", line 769, in
....
with open(TEST_CONFIG_FILE, 'w') as f:
IOError: [Errno 30] Read-only file system: '/usr/local/airflow/unittests.cfg'
It seems that the filesystem is read-only, but I do not understand why. I am not sure if it is a Kubernetes misconfiguration (do I need a special RBAC for pods? No idea) or a problem with the Dockerfile.
The deployment file looks like the following:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: airflow
namespace: test
spec:
replicas: 1
revisionHistoryLimit: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 1
template:
metadata:
labels:
app: airflow
spec:
restartPolicy: Always
containers:
- name: webserver
image: davideberdin/docker-airflow:0.0.4
imagePullPolicy: Always
resources:
limits:
cpu: 1
memory: 1Gi
requests:
cpu: 50m
memory: 128Mi
securityContext: #does not have any effect
runAsUser: 0 #does not have any effect
ports:
- name: airflow-web
containerPort: 8080
args: ["webserver"]
volumeMounts:
- name: airflow-config-volume
mountPath: /usr/local/airflow
readOnly: false #does not have any effect
- name: airflow-logs
mountPath: /usr/local/logs
readOnly: false #does not have any effect
volumes:
- name: airflow-config-volume
secret:
secretName: airflow-config-secret
- name: airflow-parameters-volume
secret:
secretName: airflow-parameters-secret
- name: airflow-logs
emptyDir: {}
Any idea how I can make the filesystem writable? The container is running as USER airflow but I think that this user has root privileges.
Since Kubernetes version 1.9, volume mounts of secret, configMap, downwardAPI and projected volumes are read-only by default.
A workaround is to create an emptyDir volume, copy the contents into it, and execute/write whatever you need.
Here is a small snippet to demonstrate:
initContainers:
- name: copy-ro-scripts
image: busybox
command: ['sh', '-c', 'cp /scripts/* /etc/pre-install/']
volumeMounts:
- name: scripts
mountPath: /scripts
- name: pre-install
mountPath: /etc/pre-install
volumes:
- name: pre-install
emptyDir: {}
- name: scripts
configMap:
name: bla
The merged PR that causes this break :(
https://github.com/kubernetes/kubernetes/pull/58720
volumeMounts:
- name: airflow-config-volume
mountPath: /usr/local/airflow
volumes:
- name: airflow-config-volume
secret:
secretName: airflow-config-secret
This is the source of your problems, for two reasons: first, you have smashed the airflow user's home directory by volume-mounting your Secret directly into a place where the image expects a directory owned by airflow.
Separately, while I would have to fire up a cluster to confirm 100%, I am pretty sure that Secret volume mounts -- and I think their ConfigMap friends -- are read-only projections into the Pod filesystems; that suspicion certainly appears to match your experience. There is certainly no expectation that changes to those volumes propagate back up into the kubernetes cluster, so why pretend otherwise.
If you want to continue to attempt such a thing, you do actually have influence over the defaultMode of the files projected into that volumeMount, so you could set them to 0666, but caveat emptor for sure. The short version is, by far, not to smash $AIRFLOW_HOME with a volume mount.
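One way to follow that advice is to project only the config file into the home directory with subPath, instead of mounting the Secret over the whole of /usr/local/airflow. A sketch, assuming the Secret has an airflow.cfg key (the projected file itself stays read-only, but the rest of the home directory remains the image's own, writable filesystem):
volumeMounts:
  - name: airflow-config-volume
    mountPath: /usr/local/airflow/airflow.cfg
    subPath: airflow.cfg
    readOnly: true
volumes:
  - name: airflow-config-volume
    secret:
      secretName: airflow-config-secret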
I have two Docker containers, running Flask and redis respectively, that communicate well when linked using Docker container linking.
I am trying to deploy the same on Kubernetes using services and pods, but it's not working. I am learning Kubernetes, so I must be doing something wrong here.
Below are the Docker commands that work well:
$ docker run -d --name=redis -v /opt/redis:/redis -p 6379 redis_image redis-server
$ docker run -d -p 5000:5000 --link redis:redis --name flask flask_image
The kubernetes pod and services files are as below:
pod-redis.yaml
apiVersion: v1
kind: Pod
metadata:
name: redis
labels:
name: redis
app: redis
spec:
containers:
- name: redis
image: dharmit/redis
command:
- "redis-server"
volumeMounts:
- mountPath: /redis
name: redis-store
volumes:
- name: redis-store
hostPath:
path: /opt/redis
service-redis.yaml
apiVersion: v1
kind: Service
metadata:
name: redis
labels:
name: redis
spec:
ports:
- port: 6379
targetPort: 6379
selector:
app: redis
pod-flask.yaml
apiVersion: v1
kind: Pod
metadata:
name: flask
labels:
name: flask
app: flask
spec:
containers:
- name: flask
image: dharmit/flask
ports:
- containerPort: 5000
service-flask.yaml
apiVersion: v1
kind: Service
metadata:
name: flask
labels:
name: flask
spec:
ports:
- port: 5000
selector:
app: flask
When I do kubectl create -f /path/to/dir/, all services and pods start up fine and get listed by kubectl commands. But when I try to access port 5000, the Flask app complains that it cannot communicate with the redis container. Below are the service-related outputs:
flask service
Name: flask
Namespace: default
Labels: name=flask
Selector: app=flask
Type: ClusterIP
IP: 10.254.155.179
Port: <unnamed> 5000/TCP
Endpoints: 172.17.0.2:5000
Session Affinity: None
No events.
redis service
Name: redis
Namespace: default
Labels: name=redis
Selector: app=redis
Type: ClusterIP
IP: 10.254.153.217
Port: <unnamed> 6379/TCP
Endpoints: 172.17.0.1:6379
Session Affinity: None
No events.
And the output of the curl command:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
return self.wsgi_app(environ, start_response)
File "/usr/lib/python2.7/site-packages/flask/app.py", line 1820, in wsgi_app
response = self.make_response(self.handle_exception(e))
File "/usr/lib/python2.7/site-packages/flask/app.py", line 1403, in handle_exception
reraise(exc_type, exc_value, tb)
File "/usr/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
response = self.full_dispatch_request()
File "/usr/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/app/app/views.py", line 9, in index
if r.get("count") == None:
File "/usr/lib/python2.7/site-packages/redis/client.py", line 863, in get
return self.execute_command('GET', name)
File "/usr/lib/python2.7/site-packages/redis/client.py", line 570, in execute_command
connection.send_command(*args)
File "/usr/lib/python2.7/site-packages/redis/connection.py", line 556, in send_command
self.send_packed_command(self.pack_command(*args))
File "/usr/lib/python2.7/site-packages/redis/connection.py", line 532, in send_packed_command
self.connect()
File "/usr/lib/python2.7/site-packages/redis/connection.py", line 436, in connect
raise ConnectionError(self._error_message(e))
ConnectionError: Error -2 connecting to redis:6379. Name or service not known.
What am I doing wrong here?
You need to add a containerPort to your pod-redis.yaml:
- name: redis
image: dharmit/redis
command:
- "redis-server"
ports:
- containerPort: 6379
hostPort: 6379
You are trying to connect to redis:6379. But who is the hostname redis? Probably not the pod that you just launched.
In order to use hostnames with pods and services, check whether you can also deploy SkyDNS in your cluster. In your case, I presume that you only need to use the hostname of the redis service.
Edit
You don't want to connect to the pods directly; you want to go through the service.
So you can use the IP address of your service for connectivity.
Or you can have hostnames for your services in order to reach your pods.
For this, the easiest way is to use kube-dns (SkyDNS). Read the documentation on how to deploy it and how to use it.
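For the connection itself, a minimal sketch of what the Flask side can do once cluster DNS resolves the Service name (the names below match the manifests in the question; REDIS_HOST is just an optional override, not something the question defines):
import os

import redis

# "redis" is the Service name from service-redis.yaml; with cluster DNS
# running, it resolves to that Service's cluster IP.
redis_host = os.environ.get("REDIS_HOST", "redis")
r = redis.Redis(host=redis_host, port=6379)

r.incr("count")
print(r.get("count"))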