How to authenticate to Azurite using pyspark? - docker

I am building an application using two docker containers in the same network:
mcr.microsoft.com/azure-storage/azurite
jupyter/pyspark-notebook
Here is my docker-compose file:
version: "3.9"
services:
azurite:
image: mcr.microsoft.com/azure-storage/azurite:latest
ports:
- "10000:10000"
- "10001:10001"
- "10002:10002"
volumes:
- azurite_volume:/data
pyspark:
image: jupyter/pyspark-notebook:latest
ports:
- 10003:8888
user: root
working_dir: /home/${NB_USER}
environment:
- NB_USER=${NB_USER}
- CHOWN_HOME=yes
- GRANT_SUDO=yes
command: start-notebook.sh --NotebookApp.password="" --NotebookApp.token=""
volumes:
- /my/local/folder:/home/${NB_USER}/work
volumes:
azurite_volume:
driver: local
From the Jupyter notebook I am trying to connect to Azurite and read data from it. Here is my code:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('test') \
    .config(
        'fs.azure.account.key.devstoreaccount1.blob.core.windows.net',
        'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==') \
    .getOrCreate()

df = spark.read.json('wasb://my-container@devstoreaccount1/path/to/file.json')
However, this code returns an error:
org.apache.hadoop.fs.azure.AzureException: org.apache.hadoop.fs.azure.AzureException: Unable to access container bronze in account devstoreaccount1 using anonymous credentials, and no credentials found for them in the configuration.
The container in Azurite has already been set to "public", although that shouldn't be necessary because I am providing the credentials in the Spark config. Even so, the error tells me that I am using anonymous credentials...
I am probably setting the credentials incorrectly, but I couldn't find anywhere how to set them properly.
How can I set up the credentials to be able to read from Azurite using pyspark?
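For what it's worth, here is a minimal sketch of the first things worth checking (untested against this exact Azurite setup; the spark.hadoop. prefix and the full blob authority in the URI are my additions, not from the original code):

from pyspark.sql import SparkSession

# Sketch only. Two changes relative to the snippet above:
# 1. The "spark.hadoop." prefix, which guarantees the key is copied into the
#    Hadoop configuration that the wasb:// connector reads (a bare fs.azure.*
#    key set on the builder may never reach it, leaving the request anonymous).
# 2. The full wasb authority <container>@<account>.blob.core.windows.net,
#    matching the account-key entry.
spark = (
    SparkSession.builder
    .appName('test')
    .config(
        'spark.hadoop.fs.azure.account.key.devstoreaccount1.blob.core.windows.net',
        'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==')
    .getOrCreate()
)

# Equivalent alternative once the session exists:
# spark.sparkContext._jsc.hadoopConfiguration().set(
#     'fs.azure.account.key.devstoreaccount1.blob.core.windows.net', '<key>')

df = spark.read.json('wasb://my-container@devstoreaccount1.blob.core.windows.net/path/to/file.json')

Even with the key in place, note that this URI still points at the real Azure endpoint rather than at the azurite container; as far as I can tell, the wasb emulator mode (fs.azure.storage.emulator.account.name=devstoreaccount1) expects the emulator on 127.0.0.1:10000, so Azurite's blob port would have to be reachable on localhost from the pyspark container for that route to work.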

Related

Can't log MLflow artifacts to S3 with docker-based tracking server

I'm trying to set up a simple MLflow tracking server with Docker that uses a MySQL backend store and an S3 bucket for artifact storage. I'm using a simple docker-compose file to set this up on a server and supplying all of the credentials through a .env file. When I try to run the sklearn_elasticnet_wine example from the mlflow repo here: https://github.com/mlflow/mlflow/tree/master/examples/sklearn_elasticnet_wine using TRACKING_URI = "http://localhost:5005" from the machine hosting my tracking server, the run fails with the following error: botocore.exceptions.NoCredentialsError: Unable to locate credentials. I've verified that my environment variables are correct and available in my mlflow_server container. The runs show up in my backend store, so the run only seems to be failing at the artifact logging step. I'm not sure why this isn't working. I've seen examples of how to set up a tracking server online, including: https://towardsdatascience.com/deploy-mlflow-with-docker-compose-8059f16b6039. Some also use minio, but others just specify their s3 location as I have. I'm not sure what I'm doing wrong at this point. Do I need to explicitly set the ARTIFACT_URI as well? Should I be using Minio? Eventually, I'll be logging runs to the server from another machine, hence the nginx container. I'm pretty new to all of this so I'm hoping it's something really obvious and easy to fix, but so far the Google has failed me. TIA.
version: '3'
services:
  app:
    restart: always
    build: ./mlflow
    image: mlflow_server
    container_name: mlflow_server
    expose:
      - 5001
    ports:
      - "5001:5001"
    networks:
      - internal
    environment:
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
      - AWS_DEFAULT_REGION=${AWS_DEFAULT_REGION}
      - AWS_S3_BUCKET=${AWS_S3_BUCKET}
      - DB_USER=${DB_USER}
      - DB_PASSWORD=${DB_PASSWORD}
      - DB_PORT=${DB_PORT}
      - DB_NAME=${DB_NAME}
    command: >
      mlflow server
      --backend-store-uri mysql+pymysql://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:${DB_PORT}/${DB_NAME}
      --default-artifact-root s3://${AWS_S3_BUCKET}/mlruns/
      --host 0.0.0.0
      --port 5001
  nginx:
    restart: always
    build: ./nginx
    image: mlflow_nginx
    container_name: mlflow_nginx
    ports:
      - "5005:80"
    networks:
      - internal
    depends_on:
      - app
networks:
  internal:
    driver: bridge
Finally figured this out. I didn't realize that the client also needed to have access to the AWS credentials for S3 storage.
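In case it saves someone else the same head-scratching: the artifact upload goes from the client process straight to S3, so the environment running the example needs its own AWS credentials, not just the tracking server. A rough sketch (the values and file name below are placeholders, not from the original setup):

import os
import mlflow

# Placeholders for illustration only; in practice these would come from the
# shell environment or ~/.aws/credentials on the client machine.
os.environ["AWS_ACCESS_KEY_ID"] = "AKIA..."
os.environ["AWS_SECRET_ACCESS_KEY"] = "..."
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"

mlflow.set_tracking_uri("http://localhost:5005")

with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)
    # This call uploads to the artifact bucket from the client side and raises
    # NoCredentialsError if the variables above are missing.
    mlflow.log_artifact("model.pkl")  # assumes model.pkl exists locally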

Docker Compose Nextcloud privileges error

I'm trying to install Nextcloud with Docker on a Raspberry Pi 4, following this tutorial:
https://www.addictedtotech.net/installing-nextcloud-on-raspberry-pi-4/
version: '2'
services:
  db:
    image: yobasystems/alpine-mariadb:latest
    command: --transaction-isolation=READ-COMMITTED --binlog-format=ROW
    restart: always
    volumes:
      - /media/pi/Elements/nextclouddb:/var/lib/mysql
    environment:
      - MYSQL_ROOT_PASSWORD=YOURROOTPASSWORD
      - MYSQL_PASSWORD=YOURPASSWORD
      - MYSQL_DATABASE=nextcloud
      - MYSQL_USER=nextcloud
      - UID=1000
      - GID=1000
  app:
    image: nextcloud
    ports:
      - 8181:80
    links:
      - db
    volumes:
      - /media/pi/Elements/nextcloud:/var/www/html
    environment:
      - UID=1000
      - GID=1000
    restart: always
After launching the stack, an interface error appears:
You don't have permission to access this resource.Server unable to read htaccess file, denying access to be safe.
I've checked the directories, and the one where Nextcloud should be is empty, so I think it could be a privileges issue, but the UID and GID are the 'pi' user's numbers.
What can I try next?
Update: I've tried to create the DB on the internal drive and I see that the database is created (or updated?) by the systemd-timesyncd user, and I don't know why that user appears. Maybe because of the bridge between the two containers?
Thanks again
Did you add the user pi to the docker group? To do so: sudo usermod -aG docker pi.
Then confirm with the groups command to check that pi is a member of the docker group.

bitnami parse server with docker-compose give blank screen after dashboard login

I'm trying to run the bitnami parse-server Docker images locally (for testing) with the docker-compose configuration created by Bitnami (link).
I ran the commands provided on their page on Ubuntu 20.04:
$ curl -sSL https://raw.githubusercontent.com/bitnami/bitnami-docker-parse/master/docker-compose.yml > docker-compose.yml
$ docker-compose up -d
The dashboard runs fine in the browser at http://localhost/login, but after entering the user and password, the browser starts loading and then ends up with a blank white screen.
(screenshot: console errors)
Here's the docker-compose code:
version: '2'
services:
  mongodb:
    image: docker.io/bitnami/mongodb:4.2
    volumes:
      - 'mongodb_data:/bitnami/mongodb'
    environment:
      - ALLOW_EMPTY_PASSWORD=yes
      - MONGODB_USERNAME=bn_parse
      - MONGODB_DATABASE=bitnami_parse
      - MONGODB_PASSWORD=bitnami123
  parse:
    image: docker.io/bitnami/parse:4
    ports:
      - '1337:1337'
    volumes:
      - 'parse_data:/bitnami/parse'
    depends_on:
      - mongodb
    environment:
      - PARSE_DATABASE_HOST=mongodb
      - PARSE_DATABASE_PORT_NUMBER=27017
      - PARSE_DATABASE_USER=bn_parse
      - PARSE_DATABASE_NAME=bitnami_parse
      - PARSE_DATABASE_PASSWORD=bitnami123
  parse-dashboard:
    image: docker.io/bitnami/parse-dashboard:3
    ports:
      - '80:4040'
    volumes:
      - 'parse_dashboard_data:/bitnami'
    depends_on:
      - parse
volumes:
  mongodb_data:
    driver: local
  parse_data:
    driver: local
  parse_dashboard_data:
    driver: local
What am I missing here?
The parse-dashboard knows the parse backend through its docker-compose hostname parse.
So after login, the parse-dashboard (UI) will generate requests to that host, http://parse:1337/parse/serverInfo, based on the default parse backend hostname. More details about this here.
The problem is that your browser (on the host computer) doesn't know how to resolve the IP for the hostname parse. Hence the name resolution errors.
As a workaround, you can add an entry to your hosts file to have the parse hostname resolved to 127.0.0.1.
This post describes it well: Linked docker-compose containers making http requests
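For example (assuming a Linux or macOS host), an /etc/hosts line like 127.0.0.1 parse makes the browser resolve the parse hostname to localhost, where port 1337 is already published by the compose file above.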

Docker shared volume is not readable for a container after changing volume contents

I have the following compose file, where I'm sharing some generated HTML data from the Jenkins container to the host drive and reading that data from the host drive with the Nginx container. I'm using Ubuntu Server 18.04 on AWS.
The problem is that I can read the contents of jenkins/workspace/allure-report only once. After the HTML data is updated, it becomes inaccessible to Nginx, which throws a 403 status code.
I tried all the possible solutions but nothing works. The only ugly workaround is to restart the Nginx container after every HTML data update. I don't like this approach and am looking for some built-in Docker feature to resolve this.
What didn't help: sharing the volume directly between containers without using the docker host drive, using the rslave option, using a separate docker volume as a buffer between the two containers... I believe it should be much easier!
version: '2'
services:
  jenkins:
    container_name: jenkins
    image: "jenkins/jenkins"
    ports:
      - "8088:8080"
      - "50000:50000"
    env_file:
      - variables.env
    volumes:
      - ./jenkins:/var/jenkins_home
  selenoid:
    container_name: selenoid
    network_mode: bridge
    image: "aerokube/selenoid"
    # default directory for browsers.json is /etc/selenoid/
    command: -listen :4444 -conf /etc/selenoid/browsers.json -video-output-dir /opt/selenoid/video/ -timeout 3m
    ports:
      - "4444:4444"
    env_file:
      - variables.env
    volumes:
      - $PWD:/etc/selenoid/ # assumed current dir contains browsers.json
      - /var/run/docker.sock:/var/run/docker.sock
  selenoid-ui:
    container_name: selenoid-ui
    network_mode: bridge
    image: "aerokube/selenoid-ui"
    links:
      - selenoid
    ports:
      - "8080:8080"
    env_file:
      - variables.env
    command: ["--selenoid-uri", "http://selenoid:4444"]
  nginx:
    container_name: nginx
    image: "nginx"
    ports:
      - "80:80"
    volumes:
      - ./jenkins/workspace/allure-report:/usr/share/nginx/html:ro,rslave
Found the solution: the easiest way to get access to the dynamic data is to use volumes_from in the container you want to read from.
When I configured my compose file like that, I faced another issue: the 403 status was gone, but the data was static. That was my fault, though; I didn't use the "cp -r" command correctly, so my data had been copied only once.
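For reference, the change amounts to something like adding volumes_from: [jenkins] (compose file format 2; shown here as a hypothetical example matching the file above) under the nginx service, so Nginx mounts the same volumes as the Jenkins container at the same paths instead of re-reading the bind-mounted host directory; the Nginx root would presumably still need to point at the report path inside that volume.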

Docker Compose File can't get .env Variables

I am using docker-compose to run a Traefik container. The domain of this container should be set by an environment file, but every time I start this service it says:
WARNING: The DOMAIN variable is not set. Defaulting to a blank string
My compose-file setup:
version: '3.5'
networks:
  frontend:
    name: frontend
  backend:
    name: backend
services:
  Traefik:
    image: traefik:latest
    command: --api --docker --acme.email="test@test.de"
    restart: always
    container_name: Traefik
    networks:
      - backend
      - frontend
    env_file: ./env.env
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./traefik/traefik.toml:/traefik.toml
      - ./traefik/acme.json:/acme.json
    ports:
      - "80:80"
      - "443:443"
    labels:
      - "traefik.docker.network=frontend"
      - "traefik.enable=true"
      - "traefik.frontend.rule=Host:traefik.${DOMAIN}"
      - "traefik.port=8080"
      - "traefik.protocol=http"
My env.env file setup:
DOMAIN=fiture.de
Thanks for your Help!
env_file: ./env.env
The file env.env isn't loaded to parse the compose file; it is loaded to add environment variables inside the container being run. By the time docker processes the above instruction, the yaml file has already been loaded and the variables have already been expanded.
If you are using docker-compose to deploy containers on a single node, you can rename the file .env and docker-compose will load variables from that file before parsing the compose file.
If you are deploying with docker stack deploy, then you need to import the environment variables into your shell yourself. An example of doing that in bash looks like:
set -a && . ./env.env && set +a && docker stack deploy ...
