How does samza generate the container.id when the application is deployed in yarn? - apache-samza

Can someone let me know how Samza generates the samza.container.id / SAMZA_CONTAINER_ID when the application is deployed in YARN? I looked around in the Samza code base but was not able to locate the logic for the generation of samza.container.id.

In a YARN environment, Samza uses the YARN-generated container IDs, passed in as environment variables, to set each container process's samza.container.id.
That is, when the Samza AM process requests containers in YARN, the YARN RM replies with a set of allocated container objects of class org.apache.hadoop.yarn.api.records.Container. That is the resource class that uniquely identifies a container in YARN, and Container#getId().toString() is the container ID string that gets set as samza.container.id.
The code that picks up the container ID from the YARN RM's response is in YarnClusterResourceManager#onContainersAllocated().
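As a rough illustration (the exact environment variable name depends on the Samza version; SAMZA_CONTAINER_ID is the name mentioned in the question, and the value shown is only an example of YARN's container ID string format):
# Inside a YARN-launched Samza container, the YARN-assigned ID is available
# to the process through its environment, e.g.:
echo $SAMZA_CONTAINER_ID
# container_1528214340123_0042_01_000002   (example value only)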

Related

Install Keycloak adapter on WILDFLY that depends on ENVs in standalone.xml

I am trying to install the Keycloak adapter on my WildFly application server that runs as a Docker container. I am using the image jboss/wildfly:17.0.0.Final as the base image. I am having big trouble building my own image.
My Dockerfile:
FROM jboss/wildfly:17.0.0.Final
ENV WILDFLY_HOME /opt/jboss/wildfly
COPY keycloak-adapter.zip $WILDFLY_HOME
RUN unzip $WILDFLY_HOME/keycloak-adapter.zip -d $WILDFLY_HOME
# My standalone.xml that contains ENVs
COPY standalone.xml $WILDFLY_HOME/standalone/configuration/
# Here it crashes!
RUN $WILDFLY_HOME/bin/jboss-cli.sh --file=$WILDFLY_HOME/bin/adapter-elytron-install-offline.cli
The official documentation says:
Unzip the adapter zip file in $WILDFLY_HOME (/opt/jboss/wildfly) - I've done this, works.
In order to install the adapter (when the server is offline) you need to execute ./bin/jboss-cli.sh --file=bin/adapter-elytron-install-offline.cli, which basically starts the server (which is needed as you can't modify the configuration otherwise) and modifies the standalone.xml.
Here is the problem. My standalone.xml is parametrized with environment variables that are only set during runtime as it runs in multiple different environments. When the ENVs are not set, the server crashes and so does the command above.
The error during docker build at the last step:
Cannot start embedded server WFLYEMB0021: Cannot start embedded process: JBTHR00005: Operation failed WFLYSRV0056: Server boot has failed in an unrecoverable manner.
The cause
Despite the imprecise error message, I clearly identified the unset ENVs as the cause by running the container with bash, setting the required ENVs to some random values and executing the jboss-cli command again - and it worked.
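Roughly, that check looked like this (the image name and environment variable names here are placeholders for my real ones):
# Placeholder image and variable names; the real ones come from my standalone.xml
docker run -it --entrypoint bash my-wildfly-image
# then, inside the container:
export DB_URL=dummy DB_USER=dummy DB_PASSWORD=dummy
/opt/jboss/wildfly/bin/jboss-cli.sh --file=/opt/jboss/wildfly/bin/adapter-elytron-install-offline.cli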
I know that the docs say it's also possible to configure the adapter while the server is running, but this is not an option for me; I need this configured at the Docker build stage.
So the problem here is that they provide an offline installation that fails if the standalone.xml depends on environment variables, which are usually not set during docker build. Unfortunately, I could not find a way to tell the jboss-cli to ignore unset environment variables.
Do you know any workaround?

Separating Docker files and application source files to optimize production environment

I have a bunch of (Ruby) scripts stored on a server. Up until now, my team has used them by opening an accessor app that launches a list of the script names, and they select the script they want to run in that instance on the files in their working folder. The scripts are run directly from the server, so updates made to the script files are automatically reflected when a user runs the script.
The scripts require a fair amount of specific dependencies, so I'm trying to move to a Docker-based workflow to eliminate the problems we encounter with incongruent computer environments. I've been able to successfully build an image with our script library and run an instance of it on my computer.
However, all of the documentation and tutorials include the application source files when building an image, so that all the files are copied over by the Dockerfile. From my understanding, this means that any time the code in the application files needs to be updated, all the users will need to rebuild the image before trying to run anything. I would very rarely ever need to make changes to the environment settings/dependencies, but the app code is changed relatively frequently, so it seems like having every user rebuild an image every single time a line of app code is changed would actually slow down everyone's workflow considerably.
My question is this: Is it not possible to have Docker simply create the environment that a user must have to run the applications, but have the applications themselves still run directly off the server where they were originally stored? And does a new container need to be created every single time a user wants to run any one of the scripts? (The users are not tech-savvy.)
Generally you'd do this by using a Docker image instead of the checked-out tree of scripts. You can use a Docker registry to store a built copy of the image somewhere on the network; Docker Hub works for this, most large public-cloud providers have some version of this (AWS ECR, Google GCR, Azure ACR, ...), or you can run your own. The workflow for using this would generally look like
# Get any updates to the "latest" version of the image
# (can be run infrequently)
docker pull ourorg/scripts
# Actually run the script, injecting config files and credentials
docker run --rm \
-v $PWD/config:/config \
-v $HOME/.ssh:/config/.ssh \
ourorg/scripts \
some_script.rb
# Nothing in this example actually requires a local copy of the scripts
I'm envisioning a directory that has kind of a mix of scripts and support files and not a lot of organization to it. Still, you could write a simple Dockerfile that looks like
FROM ruby:2.7
WORKDIR /opt/scripts
# As of Bundler 2.1, there is no compatibility between Bundler
# versions; this must match exactly what is in Gemfile.lock
RUN gem install bundler -v 2.1.4
# Copy the scripts in and do basic installation
COPY Gemfile Gemfile.lock ./
RUN bundle install
COPY . .
ENV PATH /opt/scripts:$PATH
# Prefix all commands with...
ENTRYPOINT ["bundle", "exec"]
# The default command to run is...
CMD ["ls"]
On the back end you'd need a continuous integration service (Jenkins is popular, if a little unwieldy; there is a large selection of cloud-hosted ones) that can rebuild the Docker image whenever there's a commit to the source repository. You can generally rig this up so that it happens automatically whenever anybody pushes anything.
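The job itself only needs to run something like the following (image name taken from the example above; the exact tagging scheme is up to you):
# Rebuild and publish the image on every push to the scripts repository
docker build -t ourorg/scripts .
docker push ourorg/scripts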
This process makes more sense if most people are just using the set of scripts and few of them are developing them. It's also a little bit difficult to discover what the scripts are (though you might be able to docker run --rm ourorg/scripts ls).
Is it not possible to have Docker simply create the environment that a user must have to run the applications, but have the applications themselves still run directly off the server where they were originally stored?
This always strikes me as an ineffective use of Docker. You have all of the fiddly steps of your current workflow that require everyone to run a git pull or equivalent routinely, but you also have to inject the host source tree into the container. If there are OS incompatibilities in, for example, native gems in the vendor tree, you have to work around that.
# You still need to do this periodically
git pull
# And you also need to
sudo docker run \
--rm \
-v $PWD:/app \
-v $HOME/config:/config \
-v $HOME/.ssh:/config/.ssh \
-w /app \
ruby:2.7 \
bundle exec ./some_script.rb
Some of these details (especially the config file and credentials) you'd have to deal with even if you did build an image; some other details you could improve by building an image. Inside the image you need to correct the ownership and permissions on the ssh keys and replace the $PWD/vendor tree with something the container can run, without modifying the mounted host directories.
Is it not possible to have Docker simply create the environment that a user must have to run the applications, but have the applications themselves still run directly off the server where they were originally stored?
You can build an image with all the environment already installed then mount the directory with the scripts so the container can read the scripts from the host. Something like
docker run -it --rm -v /opt/myscripts:/myscripts myimage somescript.rb
Then your image Dockerfile would end with:
WORKDIR /myscripts
ENTRYPOINT ["/usr/bin/ruby"]
And does a new container need to be created every single time a user wants to run any one of the scripts?
Of course. A container is just an isolated process managed by Docker, and you could write a small wrapper so the users wouldn't need to type the full docker run command.
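For example, a minimal wrapper (paths and image name taken from the command above) might be:
#!/bin/sh
# run-script.sh: lets users type "./run-script.sh somescript.rb"
# instead of the full docker run command
exec docker run -it --rm -v /opt/myscripts:/myscripts myimage "$@"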

How do I use environment variables in a static site inside docker?

I have a react app built with webpack that I want to deploy inside a docker container. I'm currently using the DefinePlugin to pass the api url for the app along with some other environment variables into the app during the build phase. The relevant part of my webpack config looks something like:
plugins: [
  new DefinePlugin({
    GRAPHQL_API_URL: JSON.stringify(process.env.GRAPHQL_API_URL),
    DEBUG: process.env.DEBUG,
    ...
  }),
  ...
]
Since this strategy requires the environment variables at build time, my Dockerfile is a little icky, since I need to actually put the webpack build call as part of the CMD command:
FROM node:10.16.0-alpine
WORKDIR /usr/app/
COPY . ./
RUN npm install
# EXPOSE and serve -l ports should match
EXPOSE 3000
CMD npm run build && npm run serve -- -l 3000
I'd love for the build step in webpack to be a layer in the docker container (a RUN command), so I could potentially clean out all the source files after the build succeeds, and so start up is faster. Is there a standard strategy for dealing with this issue of using information from the docker environment when you are only serving static files?
How do I use environment variables in a static site inside docker?
This question is broader than your specific problem I think. The generic answer to this is, you can't, by nature of the fact that the content is static. If you need the API URL to be dynamic and modifiable at runtime then there needs to be some feature to support that. I'm not familiar enough with webpack to know if this can work but there is a lot of information at the following link that might help you.
Passing environment-dependent variables in webpack
Is there a standard strategy for dealing with this issue of using information from the docker environment when you are only serving static files?
If you are happy to have the API URL baked into the image then the standard strategy with static content in general is to use a multistage build. This generates the static content and then copies it to a new base image, leaving behind any dependencies that were required for the build.
https://docs.docker.com/develop/develop-images/multistage-build/
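As a rough sketch of what that could look like for this setup (assuming npm run build writes the static site to build/ and that serving it with the serve package is still acceptable; adjust the output directory and server to your project):
# Build stage: the API URL is baked in at build time via --build-arg
FROM node:10.16.0-alpine AS build
WORKDIR /usr/app/
COPY . ./
ARG GRAPHQL_API_URL
ENV GRAPHQL_API_URL=$GRAPHQL_API_URL
RUN npm install && npm run build

# Final stage: only the generated static files, no sources or node_modules
FROM node:10.16.0-alpine
RUN npm install -g serve
# "build" is an assumed webpack output directory; change it if yours differs
COPY --from=build /usr/app/build /srv/app
EXPOSE 3000
CMD ["serve", "-l", "3000", "/srv/app"]
You would then build with docker build --build-arg GRAPHQL_API_URL=... . and the webpack step becomes an ordinary image layer rather than part of CMD.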

Security of a Docker image

I am considering to package a Rust application into a Docker container.
The current version of that application contains various credential files used to register to Discord API or Google API through a service account key.
Would these files be accessible if I package my application as such?
[EDIT: added Dockerfile]
FROM rust:1.28.0
WORKDIR /usr/src/<application>
COPY . .
RUN cargo install --force --path .
CMD ["<application>"]
Never put actual credentials into anything that might be accessible to anyone other than you.
You basically have two options:
1) Have your application pull the required credentials from its environment, then set these variables when you start the container (see the docs).
2) Have your application read the credentials from a config file that doesn't get pulled into the Docker image. Then, when running the container, mount that file into it (see the docs).
You could actually do both: have an environment variable that tells your application whether it should look for a config file (maybe in production), and if that variable is unset, check the environment (for development).
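A rough sketch of the two options at docker run time (the variable name, file name and in-container path are only examples):
# Option 1: pass the credentials through the environment
docker run -e DISCORD_TOKEN=... <application>
# Option 2: mount a config file that is never baked into the image
docker run -v $PWD/credentials.json:/etc/<application>/credentials.json:ro <application>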
Edit: It's best practice to create a .dockerignore file in your build context, containing the name (or path) of the file holding the credentials.
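For example, if the credentials live in a credentials.json next to the Dockerfile (file name assumed), the .dockerignore only needs:
# .dockerignore: keep the credentials out of the build context
credentials.json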

Use custom modules with bitnami/magento container on kubernetes

I have a custom module that I want to install on a container running the bitnami/magento docker image within a kubernetes cluster.
I am currently trying to install the module from a local directory in the container's Dockerfile:
# run bitnami's magento container
FROM bitnami/magento:2.2.5-debian-9
# add magento_code directory to the bitnami magento install
# ./magento_data/code contains the module, i.e. Foo/Bar
ADD ./magento_data/code /opt/bitnami/magento/htdocs/app/code
After building and running this image the site pings back a 500 error. The pod logs show that Magento installs correctly but it doesn't know what to do with the custom module:
Exception #0 (UnexpectedValueException): Setup version for module 'Foo_Bar' is not specified
Therefore to get things working I have to open a shell to the container and run some commands:
$ php /opt/bitnami/magento/htdocs/bin/magento setup:upgrade
$ chown -R bitnami:daemon /opt/bitnami/magento/htdocs
The first sorts the magento set up issue, the second ensures the next time an http request comes in Magento is able to correctly generate any directories and files it needs.
This gives me a functioning container; however, Kubernetes is not able to rebuild this container on its own, as I am manually running a bunch of commands after Magento has installed.
I thought about running the above commands within the container's readinessProbe, but I'm not sure that would work, as I'm not 100% certain of the state of Magento when that is first called; it also seems very hacky.
Any advice on how to best set up custom modules within a bitnami/magento container would be much appreciated.
UPDATE:
Since opening this issue I've been discussing it further on GitHub: https://github.com/bitnami/bitnami-docker-magento/issues/82
I've got it working via the use of composer instead of manually adding the module to the app/code directory.
I was able to do this by first adding the module to Packagist, and then storing my Magento Marketplace authentication details in auth.json:
{
  "http-basic": {
    "repo.magento.com": {
      "username": <MAGENTO_MARKETPLACE_PUBLIC_KEY>,
      "password": <MAGENTO_MARKETPLACE_PRIVATE_KEY>
    }
  }
}
You can get the public & private key values by creating a new access key within the Marketplace. Place the file in the module's root, alongside your composer.json.
Once I had that I updated my Dockerfile to use the auth.json and require the custom module:
# run bitnami's magento container
FROM bitnami/magento:2.2.5
# Require custom modules
WORKDIR /opt/bitnami/magento/htdocs/
ADD ./auth.json .
RUN composer require foo/bar
I then completed a new install, creating the db container alongside the Magento container. However, it should also work fine with an existing db, so long as the module versions are the same.
