I would like to inherit environment variables in GNU Parallel. I have several 'scripts' (really just lists of commands, designed for use with GNU Parallel) with hundreds of lines each that all call different external programs. However, these external programs (out of my control) require that several environment variables be set before they will even run.
Setting/exporting them locally doesn't seem to help, and I don't see any way to add this information to a profile.
The documentation doesn't seem to have anything about this, and similar SO pages suggest wrapping the command in a script. However, that seems like an inelegant solution. Is there a way to export the current environment, or perhaps to specify the required variables in a script?
This works for me:
FOO="My brother's 12\" records"
export FOO
parallel echo 'FOO is "$FOO" Process id $$ Argument' ::: 1 2 3
To make it work for remote connections (through ssh) you need to quote the variable for shell expansion. parallel --shellquote can help you do that:
parallel -S server export FOO=$(parallel --shellquote ::: "$FOO")\;echo 'FOO is "$FOO" Process id $$ Argument' ::: 1 2 3
If that does not solve your issue, please consider showing an example that does not work.
-- Edit --
Look at --env, introduced in version 20121022.
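As a sketch (the server name is a placeholder), --env tells GNU Parallel which variables to copy into the environment the command runs in, including on remotes:

export FOO="My brother's 12\" records"
parallel --env FOO -S server echo 'FOO is "$FOO" Argument' ::: 1 2 3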
-- Edit --
Look at env_parallel, introduced in version 20160322.
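env_parallel copies the calling shell's environment (variables, aliases, functions) automatically; a minimal sketch for bash, assuming env_parallel.bash is on your PATH:

. "$(which env_parallel.bash)"   # activate env_parallel in this shell
FOO="My brother's 12\" records"
env_parallel echo 'FOO is "$FOO"' ::: 1 2 3
env_parallel -S server echo 'FOO is "$FOO"' ::: 1 2 3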
Is there a way to get all the environment variables that a Docker image accepts, including authentication settings and every other possible option, so as to make the best use of that image?
For example, I've run a redis:7.0.8 container and I want to use every possible feature this image offers.
First I used docker inspect and saw this:
"Env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"GOSU_VERSION=1.16",
"REDIS_VERSION=7.0.8",
"REDIS_DOWNLOAD_URL=http://download.redis.io/releases/redis-7.0.8.tar.gz",
"REDIS_DOWNLOAD_SHA=06a339e491306783dcf55b97f15a5dbcbdc01ccbde6dc23027c475cab735e914"
],
I also tried docker exec -it my-container env, which just showed me the same thing. I know there are more variables; for example, this doesn't include the following:
REDIS_PASSWORD
REDIS_ACLS
REDIS_TLS_CERT_FILE
Absent documentation, this is pretty much impossible.
Let's start by repeating @jonrsharpe's comment:
They accept any env var at all, but they won't respond to all of them.
Consider this Python code, for example:
#!/usr/bin/env python3
import os

def get_environ(d, name):
    return d.get(name, 'absent')

foo = os.environ.get('FOO', 'default_foo')
star_foo = get_environ(os.environ, foo)
print(star_foo)
This fragment looks up an environment variable $FOO. You could probably figure that out, if you knew the main process was in Python and recognized os.environ. But then it passes that value and the standard environment to a helper function, which looks up that environment variable by name. You'd need detailed static analysis to understand this is actually also an environment-variable lookup.
$ ./test.py
absent
$ default_foo=bar ./test.py
bar
$ FOO=BAR BAR=quux ./test.py
quux
$ I=3 ./test.py
absent
(A fair bit of the code I work with accesses environment variables rather haphazardly; it's not just "find the main function" but "find every ENV reference in every file in every library". Some frameworks like Spring Boot make it possible to set hundreds of configuration options via environment variables, and even if it were possible to get every possible setting here, the output would be prohibitive.)
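Even a crude textual sweep only gets you part of the way; a sketch of that kind of search (the path and file pattern are placeholders):

grep -rn --include='*.py' -e 'os\.environ' -e 'getenv' /path/to/unpacked/image/source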
"What environment variables are there" isn't standard container metadata. You'd have to identify the language the main container process runs, and do this sort of analysis on it, including compiled languages. That doesn't seem like a solvable problem.
I have a multi-process web app. The processes are contributed by different buildpacks. The default process will start the web application. I have a use case in which a given shell script should be executed before the default process invocation.
I have tried the following approach:
Create a custom-buildpack
Create a script that needs to be executed and invoke the web process in it.
Create a new process based on the above shell script by specifying it in the launch.toml definition
Make the buildpack launchable
The entrypoint.sh:
#!/usr/bin/env bash
# Some fancy stuff..
#Invoke the web process
/cnb/process/web
Create launch.toml from the build script of the custom buildpack, making the entrypoint process the default one:
cat > "$layers_dir/launch.toml" << EOL
[[processes]]
type = "entrypoint"
command = "bash"
args = ["$scriptlayer/bin/entrypoint.sh"]
default = true
EOL
echo -e '[types]\nlaunch = true' > "$layers_dir/assembly-scripts.toml"
Truncated pack inspect-image output
Processes:
TYPE SHELL COMMAND ARGS
entrypoint (default) bash bash /layers/gw_assembly-scripts/assembly-scripts/bin/entrypoint.sh
task bash catalina.sh run
tomcat bash catalina.sh run
web bash catalina.sh run
Is there any better CNB native approach to achieve this use case?
You have a couple of options here:
The simplest option would be to add a .profile script to the root of your application. It's a bash script, so anything you can write in bash can be done there; however, it's primarily for initializing your app and setting additional env variables.
This file runs prior to the command in your process type. I looked for documentation on this behavior, but only found it briefly mentioned in the buildpacks spec.
As an example, if I put .profile in the root of my application and inside that file, I write echo 'Hello World!'. I'll see Hello World! printed before any of my process types execute.
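Concretely, that experiment is just this file (a minimal sketch; the exported variable name is made up):

# .profile in the application root, run by the launcher before the process type
echo 'Hello World!'
export APP_GREETING='hello'   # hypothetical extra env var visible to the app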
If you want to create a buildpack, you can achieve something similar to the .profile script by having your buildpack include an exec.d binary.
This is a binary that's part of your launch image and gets run prior to any of your process types. It allows you to take actions to initialize an application and set additional environment variables dynamically before your application starts.
This mechanism is often used by buildpack authors to provide dynamic behavior at runtime based on changes to environment variables or Kubernetes service bindings. For example, turning on/off features like APM tools, debugging, and metrics.
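For illustration, a minimal exec.d sketch in bash (the names are hypothetical; per the buildpacks spec, an exec.d executable emits environment variables as TOML on file descriptor 3):

#!/usr/bin/env bash
# hypothetical exec.d script installed by a buildpack;
# runs at launch, before the process type starts
if [ -n "$ENABLE_APM" ]; then
  printf 'APM_AGENT_HOME = "/layers/example/apm"\n' >&3
fi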
A few other miscellaneous notes.
Neither of the options above allows you to change the actual process type. The process type to execute is selected before these options (.profile and exec.d) run, and you cannot influence the selection from within them. You can only use them to run things prior to the process type running.
The buildpack spec does not allow for a buildpack to modify the process types for another buildpack. So you cannot create a buildpack that wraps or modifies process types set by another buildpack. That said, a buildpack can override the process types set by another buildpack. Buildpacks that are later in the order group will override earlier buildpacks.
From the spec: A combined processes list derived from all launch.toml files such that process types from later buildpacks override identical process types from earlier buildpacks.
With buildpacks, the entrypoint is always the launcher. The launcher is a process that implements the application side of the buildpack specification. It runs .profile scripts and exec.d binaries, sets up buildpack-provided environment variables, and eventually launches the specified process type.
If you override the entrypoint for a container then the launcher won't run and none of the things it is supposed to do will happen. Sometimes this is desired, like if you're troubleshooting, but usually you want the launcher to be the entrypoint.
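If you do need a shell that still has the buildpack-provided environment, you can route your command through the launcher rather than bypassing it; a sketch (the image name is a placeholder, and this assumes the launcher is on the image's PATH, as it is in standard CNB images):

docker run --rm -it --entrypoint launcher my-app-image bash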
I just need a hint. I am trying to run the following command from the GNU Parallel tutorial:
parallel -S $SERVER1,$SERVER2 echo ::: running on more hosts
I replaced $SERVERX with known hosts on my network. When I execute the command, I am asked for my password for each server and after that nothing happens anymore. The cursor blinks all day long and I do not get any error message.
I tried different servers with the same result.
The verbose mode shows:
ssh $SERVER1 -- exec perl -e #GNU_Parallel\\=split/_/,\\"use_IPC::Open3\\;_use_MIME::Base64\\"\\;eval\\"#GNU_Parallel\\"\\;\\$SIG\{CHLD\}\\=\\"IGNORE\\"\\;my\\$zip\\=\(grep\{-x\\$_\}\\"/usr/local/bin/bzip2\\"\)\[0\]\\|\\|\\"bzip2\\"\\;open3\(\\$in,\\$out,\\"\>\\&STDERR\\",\\$zip,\\"-dc\\"\)\\;if\(my\\$perlpid\\=fork\)\{close\\$in\\;\\$eval\\=join\\"\\",\\<\\$out\>\\;close\\$out\\;\}else\{close\\$out\\;print\\$in\(decode_base64\(join\\"\\",#ARGV\)\)\\;close\\$in\\;exit\\;\}wait\\;eval\\$eval\\;
followed by random characters.
Something similar appears four times, presumably one for each of the four jobs I started. I'd be very happy for help.
I think you are expected to set up passwordless ssh logins to all the remotes so GNU Parallel can get into them. – Mark Setchell
This was the right suggestion. Setting up key authentication using ssh-keygen and ssh-copy-id did the job. Thank you very much, now it works. A short hint in the tutorial would have been great.
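For anyone landing here, the setup looks like this (a sketch; user@server1 and user@server2 are placeholders):

ssh-keygen -t ed25519        # create a key pair, if you don't already have one
ssh-copy-id user@server1     # append your public key to the server
ssh-copy-id user@server2
ssh user@server1 true        # should now succeed without a password prompt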
I am new to Lua scripting. I tried using
os.execute("export MY_VAR=10")
io.popen("export MY_VAR=10")
from a Lua script. I then try reading MY_VAR from the shell with echo $MY_VAR after the Lua script has finished, but I do not see MY_VAR set to 10.
How do we set an environment variable from a Lua script?
Your problem isn't a Lua problem. Your problem is a misunderstanding of how process environments work.
Every time you run os.execute or io.popen you are running a new process with a new environment.
So while you may be correctly setting MY_VAR in that process's environment (and it would affect any processes run as children of that process), it doesn't survive beyond the death of the launched process, and so cannot be seen by any other process.
If you want to affect the Lua process's environment (which would then, in turn, affect the environments of processes run by Lua) then you need a binding to the setenv system function (which Lua itself doesn't provide, as it doesn't pass the clean-C test that Lua uses for things it includes).
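Since a child process can never modify its parent's environment, the usual workaround is to have the Lua script print the assignments and let the calling shell evaluate them; a sketch (bindings such as luaposix expose setenv if you only need to affect the Lua process itself and its children):

eval "$(lua -e 'print("export MY_VAR=10")')"
echo "$MY_VAR"   # prints 10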
I wrote a script that does maintenance tasks for a Rails application. The script uses a class that uses models defined in the application. Just as an example, let's say the application defines the model User, and my class (used within the script) sends messages to it, like User.find id.
I am looking for ways to optimize this script, because right now it has to load the application environment: require '../config/environment'. This takes ~15 seconds.
Had the script not used the application codebase to do its job, I could have replaced the model abstractions with raw SQL. But unfortunately I can't do that, because I would have to repeat code in the script that is already present in the codebase. Not only would this violate the DRY principle and require a lot of work, the script would not be very maintainable in case the model methods I am using change.
I would like to hear ideas on how to approach this problem. The script is not run from the application itself, but from the shell (with Capistrano, for instance).
I hope I've described the problem clearly enough. Thank you.
Could you write a little daemon that sits blocked in a read on a pipe (or named fifo, or unix domain socket, or, with more complexity, a tcp port) and accepts 'commands' that would be run on your database?
#!/usr/bin/ruby
require '../config/environment'   # pay the ~15-second startup cost only once

while (true) do
  File.open("/tmp/fifo", "r") do |f|
    f.each_line do |line|
      case line
      when "cleanup" then puts "clean!"
      when "publish" then puts "published!"
      else puts "invalid command, ignoring"
      end
    end
  end
end
You could start this thing up with vixie cron's @reboot specifier, or you could run it via Capistrano commands, or run it out of init or init scripts. Then you write your Capistrano rules (that you have now) to simply echo commands into the fifo:
First,
mkfifo /tmp/fifo
In one terminal:
$ ./env.rb
In another terminal:
$ echo -n "cleanup" > /tmp/fifo
$ echo -n "publish" > /tmp/fifo
$ echo -n "go away" > /tmp/fifo
The output in the first terminal looks like this:
clean!
published!
invalid command, ignoring
You could make the matching as friendly (perhaps allowing a plain echo by calling line.chomp to strip the trailing newline, rather than requiring echo -n as my example does) or as unfriendly as you want. And the commands that get run can of course call into your model files to do their work.
Please make sure you choose a good location for your fifo -- /tmp/ is probably a bad place, as many distributions clear it on reboot. Also make sure you set the fifo owner and permissions (chown and chmod) appropriately for your application -- you might not want to allow your browser's Flash plugin to write to this file and command your database.