Passing a complex shell script via docker exec sh -c "..." - docker

I have a script that works fine in sh on a Linux host as well as inside an Alpine container. But when I try executing it using docker exec <containerID> sh -c "<script>", it misbehaves. The script's function is to output information similar to ps.
systick=$(getconf CLK_TCK); for c in /proc/*/cmdline; do d=$(dirname $c); name=$(grep Name: $d/status); pid=$(basename $d); uid=$(grep Uid: $d/status); uid=$(echo ${uid#Uid:} | xargs); uid=${uid%% *}; user=$(grep :$uid:[0-9] /etc/passwd); user=${user%%:*}; cmdline=$(cat $c|xargs -0 echo); starttime=$(($(awk '{print $22}' $d/stat) / systick)); uptime=$(awk '{print int($1)}' /proc/uptime); elapsed=$(($uptime-$starttime)); echo $pid $user $elapsed $cmdline; done
EDIT: sh -c "<script>" has the same behavior.

You cannot run this script as-is through docker exec because, inside double quotes, the variables and command substitutions are expanded by your local shell before the command is sent to the container (i.e., you are going to get values from your local machine, not from within the container).
In order to run it as you wish, you need to replace $ with \$ for every occurrence of $ in your script.
What might work better is to put your script into a file, then map the file to a location within the container using -v when the container is started (i.e., -v /host/path/script.sh:/path/to/script.sh; the host path must be absolute), and call the script via docker exec <containerID> /path/to/script.sh
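The difference between the two kinds of quoting can be seen without Docker at all. The sketch below is only an illustration: the variable name greeting is made up, and plain sh -c stands in for docker exec <containerID> sh -c, which receives its script argument in the same way.

```shell
# The outer (local) shell has its own value for the variable.
greeting=outer

# Double quotes: the local shell expands $greeting before sh -c ever runs,
# so the inner shell is handed the text "greeting=inner; echo outer".
expanded=$(sh -c "greeting=inner; echo $greeting")

# Single quotes: the text is passed through untouched, so the inner shell
# performs the expansion itself.
literal=$(sh -c 'greeting=inner; echo $greeting')

echo "double-quoted: $expanded"
echo "single-quoted: $literal"
```

Whatever the local shell expands inside double quotes is what the inner sh receives, which is why the original script picks up local values.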

Part 1: A Working Answer
A Working One-Liner (Quoted For Use By Docker)
getProcessDataDef='shellQuoteWordsDef='"'"'shellQuoteWords() { sq="'"'"'"'"'"'"'"'"'"; dq='"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'; for arg; do printf "'"'"'"'"'"'"'"'"'%s'"'"'"'"'"'"'"'"' " "$(printf '"'"'"'"'"'"'"'"'%s\n'"'"'"'"'"'"'"'"' "$arg" | sed -e "s#${sq}#${sq}${dq}${sq}${dq}${sq}#g")"; done; printf '"'"'"'"'"'"'"'"'\n'"'"'"'"'"'"'"'"'; }'"'"'; shellQuoteNullSeparatedStream() { xargs -0 sh -c "${shellQuoteWordsDef};"'"'"' shellQuoteWords "$@"'"'"' _; }; getProcessData() { systick=$(getconf CLK_TCK); for c in /proc/*/cmdline; do d=${c%/*}; pid=${d##*/}; name=$(awk '"'"'/^Name:/ { print $2 }'"'"' <"$d"/status); uid=$(awk '"'"'/^Uid:/ { print $2 }'"'"' <"$d"/status); pwent=$(getent passwd "$uid"); user=${pwent%%:*}; cmdline=$(shellQuoteNullSeparatedStream <"$c"); starttime=$(awk -v systick="$systick" '"'"'{print int($22 / systick)}'"'"' "$d"/stat); uptime=$(awk '"'"'{print int($1)}'"'"' /proc/uptime); elapsed=$((uptime-starttime)); echo "$pid $user $elapsed $cmdline"; done; }; getProcessData'
sh -c "$getProcessDataDef" # or docker exec <container> sh -c "$getProcessDataDef"
A Working One-Liner (Before Quoting/Escaping)
shellQuoteWordsDef='shellQuoteWords() { sq="'"'"'"; dq='"'"'"'"'"'; for arg; do printf "'"'"'%s'"'"' " "$(printf '"'"'%s\n'"'"' "$arg" | sed -e "s#${sq}#${sq}${dq}${sq}${dq}${sq}#g")"; done; printf '"'"'\n'"'"'; }'; shellQuoteNullSeparatedStream() { xargs -0 sh -c "${shellQuoteWordsDef};"' shellQuoteWords "$@"' _; }; getProcessData() { systick=$(getconf CLK_TCK); for c in /proc/*/cmdline; do d=${c%/*}; pid=${d##*/}; name=$(awk '/^Name:/ { print $2 }' <"$d"/status); uid=$(awk '/^Uid:/ { print $2 }' <"$d"/status); pwent=$(getent passwd "$uid"); user=${pwent%%:*}; cmdline=$(shellQuoteNullSeparatedStream <"$c"); starttime=$(awk -v systick="$systick" '{print int($22 / systick)}' "$d"/stat); uptime=$(awk '{print int($1)}' /proc/uptime); elapsed=$((uptime-starttime)); echo "$pid $user $elapsed $cmdline"; done; }; getProcessData "$@"
What Went Into That One-Liner
shellQuoteWordsDef='shellQuoteWords() { sq="'"'"'"; dq='"'"'"'"'"'; for arg; do printf "'"'"'%s'"'"' " "$(printf '"'"'%s\n'"'"' "$arg" | sed -e "s#${sq}#${sq}${dq}${sq}${dq}${sq}#g")"; done; printf '"'"'\n'"'"'; }'
shellQuoteNullSeparatedStream() {
xargs -0 sh -c "${shellQuoteWordsDef};"' shellQuoteWords "$@"' _
}
getProcessData() {
systick=$(getconf CLK_TCK)
for c in /proc/*/cmdline; do
d=${c%/*}; pid=${d##*/}
name=$(awk '/^Name:/ { print $2 }' <"$d"/status)
uid=$(awk '/^Uid:/ { print $2 }' <"$d"/status)
pwent=$(getent passwd "$uid")
user=${pwent%%:*}
cmdline=$(shellQuoteNullSeparatedStream <"$c")
starttime=$(awk -v systick="$systick" '{print int($22 / systick)}' "$d"/stat)
uptime=$(awk '{print int($1)}' /proc/uptime)
elapsed=$((uptime-starttime))
echo "$pid $user $elapsed $cmdline"
done
}
What Went Into The Shell-Quoting Helper Used By That One-Liner
To allow easier reading and editing, the function stringified above looks like:
# This is the function we're including in our code passed to xargs in-band above:
shellQuoteWords() {
sq="'"; dq='"'
for arg; do
printf "'%s' " "$(printf '%s\n' "$arg" | sed -e "s#${sq}#${sq}${dq}${sq}${dq}${sq}#g")"
done
printf '\n'
}
Part 2: How That Answer Was Created
Python has an excellent shlex.quote() function (or pipes.quote() in Python 2) that can be used to generate a shell-quoted version of a string. In this context, that can be used as follows:
Python 3.7.6 (default, Feb 27 2020, 15:15:00)
[Clang 7.1.0 (tags/RELEASE_710/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> s = r'''
... shellQuoteWords() {
... sq="'"; dq='"'
... for arg; do
... printf "'%s' " "$(printf '%s\n' "$arg" | sed -e "s#${sq}#${sq}${dq}${sq}${dq}${sq}#g")"
... done
... printf '\n'
... }
... '''
>>> import shlex
>>> print(shlex.quote(s))
'
shellQuoteWords() {
sq="'"'"'"; dq='"'"'"'"'"'
for arg; do
printf "'"'"'%s'"'"' " "$(printf '"'"'%s\n'"'"' "$arg" | sed -e "s#${sq}#${sq}${dq}${sq}${dq}${sq}#g")"
done
printf '"'"'\n'"'"'
}
'
That result is itself a perfectly valid string in shell. That is to say, one can run:
s='
shellQuoteWords() {
sq="'"'"'"; dq='"'"'"'"'"'
for arg; do
printf "'"'"'%s'"'"' " "$(printf '"'"'%s\n'"'"' "$arg" | sed -e "s#${sq}#${sq}${dq}${sq}${dq}${sq}#g")"
done
printf '"'"'\n'"'"'
}
'
eval "$s"
shellQuoteWords "hello world" 'hello world' "hello 'world'" 'hello "world"'
...and get completely valid output.
The same process was followed to generate a string that evaluated to the definition of getProcessData.
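One way to check what "valid output" means here is a round trip: quote some awkward words, then let the shell parse the quoted text again and confirm the original words come back. The following is a minimal sketch that reuses the helper from above; the variable names quoted, count, first and second are only illustrative.

```shell
# Same helper as above: wrap each argument in single quotes, rewriting
# every embedded single quote as '"'"'.
shellQuoteWords() {
  sq="'"; dq='"'
  for arg; do
    printf "'%s' " "$(printf '%s\n' "$arg" | sed -e "s#${sq}#${sq}${dq}${sq}${dq}${sq}#g")"
  done
  printf '\n'
}

# Quote two awkward words, then have the shell parse the result again.
quoted=$(shellQuoteWords "hello 'world'" 'hello "world"')
eval "set -- $quoted"
count=$#; first=$1; second=$2

echo "quoted: $quoted"
echo "count:  $count"
echo "first:  $first"
echo "second: $second"
```

If the quoting is correct, the two words survive unchanged, including their embedded quote characters.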

Related

Issue passing single quote as parameter from Jenkins to Ansible

The string parameter in jenkins dest_cmd works fine and gets passed and read by ansible as below:
Jenkins pipeline script:
ansible-playbook -i /web/runcmd/allmwhosts.hosts /web/runcmd/copyfiles.yml -e dest_user=$dest_user -e '{ dest_cmd: $dest_cmd }' --tags validate"
The above works fine for all dest_cmd values; however, it fails when the user enters single quotes ('), as you can see below:
[Pipeline] sh
+ ansible-playbook -i /web/runcmd/allmwhosts.hosts /web/runcmd/copyfiles.yml -e dest_user=wluser -e '{ dest_cmd: arp `hostname` | cut -d' ' -f4 }' --tags validate
usage: ansible-playbook [-h] [--version] [-v] [-k]
[--private-key PRIVATE_KEY_FILE] [-u REMOTE_USER]
[-c CONNECTION] [-T TIMEOUT]
[--ssh-common-args SSH_COMMON_ARGS]
[--sftp-extra-args SFTP_EXTRA_ARGS]
[--scp-extra-args SCP_EXTRA_ARGS]
[--ssh-extra-args SSH_EXTRA_ARGS] [--force-handlers]
Can you please suggest how to resolve this issue?
The problem is the use of quotes within the single-quoted string:
'{ dest_cmd: arp `hostname` | cut -d' ' -f4 }'
problem quotes                      ^ ^
Try using double quotes:
'{ dest_cmd: arp `hostname` | cut -d" " -f4 }'
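The effect of the inner quotes can be checked in any POSIX shell by counting how many words the shell produces. This is only a sketch of the shell's word splitting (it does not run ansible-playbook), and the shortened command text and the variable names broken_count and fixed_count are made up for the illustration.

```shell
# Inner single quotes end the string early, so the shell sees two words:
# "{ dest_cmd: cut -d" and " -f4 }".
set -- '{ dest_cmd: cut -d' ' -f4 }'
broken_count=$#

# Inner double quotes are ordinary characters inside single quotes,
# so the whole text stays one word.
set -- '{ dest_cmd: cut -d" " -f4 }'
fixed_count=$#

echo "with inner single quotes: $broken_count words"
echo "with inner double quotes: $fixed_count word"
```

With two words instead of one, the second word is no longer part of the -e value, which is consistent with ansible-playbook printing its usage message.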

cron not running in alpine docker

I have added the entry below to the entry-point.sh for my Dockerfile.
# start cron
/usr/sbin/crond &
exec "${DIST}/bin/ss" "$@"
my crontab.txt looks like below:
bash-4.4$ crontab -l
*/5 * * * * /cleanDisk.sh >> /apps/log/cleanDisk.log
So when I run the Docker container, I don't see any file called cleanDisk.log being created.
I have set up all permissions, and crond is running as a process in my container, as shown below.
bash-4.4$ ps -ef | grep cron
12 sdc 0:00 /usr/sbin/crond
208 sdc 0:00 grep cron
So, can anyone guide me as to why the log file is not getting created?
My cleanDisk.sh looks like the script below. Since it runs for the very first time and doesn't match all the criteria, I would expect it to at least print "No Error file found on Host $(hostname)" in cleanDisk.log.
#!/bin/bash
THRESHOLD_LIMIT=20
RETENTION_DAY=3
df -Ph /apps/ | grep -vE '^Filesystem|tmpfs|cdrom' | awk '{ print $5,$1 }' | while read output
do
#echo $output
used=$(echo $output | awk '{print $1}' | sed s/%//g)
partition=$(echo $output | awk '{print $2}')
if [ $used -ge ${THRESHOLD_LIMIT} ]; then
echo "The partition \"$partition\" on $(hostname) has used $used% at $(date)"
FILE_COUNT=$(find ${SDC_LOG} -maxdepth 1 -mtime +${RETENTION_DAY} -type f -name "sdc-*.sdc" -print | wc -l)
if [ ${FILE_COUNT} -gt 0 ]; then
echo "There are ${FILE_COUNT} files older than ${RETENTION_DAY} days on Host $(hostname)."
for FILENAME in $(find ${SDC_LOG} -maxdepth 1 -mtime +${RETENTION_DAY} -type f -name "sdc-*.sdc" -print);
do
ERROR_FILE_SIZE=$(stat -c%s ${FILENAME} | awk '{ split( "B KB MB GB TB PB" , v ); s=1; while( $1>1024 ){ $1/=1024; s++ } printf "%.2f %s\n", $1, v[s] }')
echo "Before Deleting Error file ${FILENAME}, the size was ${ERROR_FILE_SIZE}."
rm -rf ${FILENAME}
rc=$?
if [[ $rc -eq 0 ]];
then
echo "Error log file ${FILENAME} with size ${ERROR_FILE_SIZE} is deleted on Host $(hostname)."
fi
done
fi
if [ ${FILE_COUNT} -eq 0 ]; then
echo "No Error file found on Host $(hostname)."
fi
fi
done
edit
my docker file looks like this
FROM adoptopenjdk/openjdk8:jdk8u192-b12-alpine
ARG SDC_UID=20159
ARG SDC_GID=20159
ARG SDC_USER=sdc
RUN apk add --update --no-cache bash \
busybox-suid \
sudo && \
echo 'hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4' >> /etc/nsswitch.conf
RUN addgroup --system ${SDC_USER} && \
adduser --system --disabled-password -u ${SDC_UID} -G ${SDC_USER} ${SDC_USER}
ADD --chown=sdc:sdc crontab.txt /etc/crontabs/sdc/
RUN chgrp sdc /etc/cron.d /etc/crontabs /usr/bin/crontab
# Also tried to run like this but not working
# RUN /usr/bin/crontab -u sdc /etc/crontabs/sdc/crontab.txt
USER ${SDC_USER}
EXPOSE 18631
RUN /usr/bin/crontab /etc/crontabs/sdc/crontab.txt
ENTRYPOINT ["/docker-entrypoint.sh"]
CMD ["dc", "-exec"]

Merge jenkinsfile | withcredentials and sshagent

I am trying to execute ansible playbook in all EC2 AWS instances using Jenkinsfile function and assume-role.
But I am getting the error below.
Obtained devops/JenkinsfileDynamic from git git@bitbucket.org:tui-uk-dev/cng-airflow-dags.git
Running in Durability level: MAX_SURVIVABILITY
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
WorkflowScript: 33: illegal string body character after dollar sign;
solution: either escape a literal dollar sign "\$5" or bracket the value expression "${5}" @ line 33, column 134.
SION_TOKEN=${AWS_SESSION_TOKEN} AWS_DEFA
^
Jenkinsfile:-
def Host_Verification2() {
withCredentials([[$class: 'AmazonWebServicesCredentialsBinding', credentialsId: 'cant_be_disclosed']]) {
sh '''
aws sts assume-role --role-arn "arn:aws:iam::12345678901:role/cant_role_jenkins" --role-session-name "connect" > assume-role-output.txt
export AWS_ACCESS_KEY_ID=`cat assume-role-output.txt | jq -c '.Credentials.AccessKeyId' | tr -d '"' | tr -d ' '`
export AWS_SECRET_ACCESS_KEY=`cat assume-role-output.txt | jq -c '.Credentials.SecretAccessKey' | tr -d '"' | tr -d ' '`
export AWS_SESSION_TOKEN=`cat assume-role-output.txt | jq -c '.Credentials.SessionToken' | tr -d '"' | tr -d ' '`
rm assume-role-output.txt
sshagent(credentials: ['tuiuki-cng-dev']) {
sh '''
cd acm/
sudo AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}" AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY}" AWS_SESSION_TOKEN="${AWS_SESSION_TOKEN}" inventory/ec2.py --list --refresh-cache
sudo AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}" AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY}" AWS_SESSION_TOKEN="${AWS_SESSION_TOKEN}" AWS_DEFAULT_REGION="eu-central-1" ansible-playbook -i inventory/ec2.py plays/emr/find.yml
'''
}
'''
}
}
As the exception says:
solution: either escape a literal dollar sign "\$5" or bracket the value expression "${5}"
Try this:
sudo AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}" AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY}" ansible-playbook -i inventory/ec2.py plays/emr/findplaybooks.yml
sudo AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} AWS_SESSION_TOKEN=${AWS_SESSION_TOKEN} ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -i inventory/ec2.py --limit "tag_Name_cluster" plays/emr/find.yml --private-key=${SSH_KEY} -u hadoop

Sed command not working in Jenkins Pipeline

I have a file 'README.txt' which contains the line -
"version": "1.0.0-alpha-test.7"
Using a Jenkins Pipeline, I want to replace this line with
"version": "1.0.0-alpha-test.{BUILD_NUMBER}"
The following sed command works when I try it on a linux cluster
sed -i -E "s#(\"version\"[ ]*:[ ]*\".+alpha-test\.)[0-9]+\"#\1${BUILD_NUMBER}#g" README.txt
The same command does not work using a Jenkins Pipeline.
Tried with the following query but it doesn't work -
sh """
sed -i -E "s|([\"]version[\"][ ]*:[ ]*[\"].+alpha-test\\.)[0-9]+\"|\1${BUILD_NUMBER}|g" README.txt
cat README.txt
"""
/home/jenkins/workspace/test/test-pipeline@tmp/durable-eb774fcf/script.sh: 3: /home/jenkins/workspace/test/test-pipeline@tmp/durable-eb774fcf/script.sh: Syntax error: ")" unexpected
Best to use perl command instead...
script{
old_version = (sh (returnStdout: true, script:'''old_version=`cat version.cfg |grep VERSION=|cut -d "=" -f2`
echo $old_version''')).toString().trim()
sh """
if [ "$old_version" != $new_version ]; then
perl -pi -e "s,$old_version,$new_version,g" version.cfg
##-- git push operation --##
fi
"""
}

Parse URL in shell script

I have url like:
sftp://user@host.net/some/random/path
I want to extract user, host and path from this string. Any part can be random length.
[EDIT 2019]
This answer is not meant to be a catch-all, works-for-everything solution. It was intended to provide a simple alternative to the Python-based version, and it ended up having more features than the original.
It answered the basic question in a bash-only way and was then modified multiple times by me to accommodate a handful of requests from commenters. At this point, however, I think adding even more complexity would make it unmaintainable. I know not everything is straightforward (checking for a valid port, for example, requires comparing hostport and host), but I would rather not add even more complexity.
[Original answer]
Assuming your URL is passed as first parameter to the script:
#!/bin/bash
# extract the protocol
proto="$(echo $1 | grep :// | sed -e's,^\(.*://\).*,\1,g')"
# remove the protocol
url="$(echo ${1/$proto/})"
# extract the user (if any)
user="$(echo $url | grep @ | cut -d@ -f1)"
# extract the host and port
hostport="$(echo ${url/$user@/} | cut -d/ -f1)"
# by request host without port
host="$(echo $hostport | sed -e 's,:.*,,g')"
# by request - try to extract the port
port="$(echo $hostport | sed -e 's,^.*:,:,g' -e 's,.*:\([0-9]*\).*,\1,g' -e 's,[^0-9],,g')"
# extract the path (if any)
path="$(echo $url | grep / | cut -d/ -f2-)"
echo "url: $url"
echo " proto: $proto"
echo " user: $user"
echo " host: $host"
echo " port: $port"
echo " path: $path"
I must admit this is not the cleanest solution but it doesn't rely on another scripting
language like perl or python.
(Providing a solution using one of them would produce cleaner results ;) )
Using your example the results are:
url: user@host.net/some/random/path
proto: sftp://
user: user
host: host.net
port:
path: some/random/path
This will also work for URLs without a protocol/username or path.
In this case the respective variable will contain an empty string.
[EDIT]
If your bash version won't cope with the substitutions (${1/$proto/}) try this:
#!/bin/bash
# extract the protocol
proto="$(echo $1 | grep :// | sed -e's,^\(.*://\).*,\1,g')"
# remove the protocol -- updated
url=$(echo $1 | sed -e s,$proto,,g)
# extract the user (if any)
user="$(echo $url | grep @ | cut -d@ -f1)"
# extract the host and port -- updated
hostport=$(echo $url | sed -e s,$user@,,g | cut -d/ -f1)
# by request host without port
host="$(echo $hostport | sed -e 's,:.*,,g')"
# by request - try to extract the port
port="$(echo $hostport | sed -e 's,^.*:,:,g' -e 's,.*:\([0-9]*\).*,\1,g' -e 's,[^0-9],,g')"
# extract the path (if any)
path="$(echo $url | grep / | cut -d/ -f2-)"
The above, refined (added password and port parsing), and working in /bin/sh:
# extract the protocol
proto="`echo $DATABASE_URL | grep '://' | sed -e's,^\(.*://\).*,\1,g'`"
# remove the protocol
url=`echo $DATABASE_URL | sed -e s,$proto,,g`
# extract the user and password (if any)
userpass="`echo $url | grep @ | cut -d@ -f1`"
pass=`echo $userpass | grep : | cut -d: -f2`
if [ -n "$pass" ]; then
user=`echo $userpass | grep : | cut -d: -f1`
else
user=$userpass
fi
# extract the host -- updated
hostport=`echo $url | sed -e s,$userpass@,,g | cut -d/ -f1`
port=`echo $hostport | grep : | cut -d: -f2`
if [ -n "$port" ]; then
host=`echo $hostport | grep : | cut -d: -f1`
else
host=$hostport
fi
# extract the path (if any)
path="`echo $url | grep / | cut -d/ -f2-`"
Posted b/c I needed it, so I wrote it (based on @Shirkin's answer, obviously), and I figured someone else might appreciate it.
This solution works in principle the same way as Adam Ryczkowski's in this thread, but it has an improved regular expression based on RFC 3986 (with some changes) and fixes some errors (e.g. userinfo can contain the '_' character). It can also understand relative URIs (e.g. to extract the query or fragment).
#!/bin/bash
# Following regex is based on https://www.rfc-editor.org/rfc/rfc3986#appendix-B with
# additional sub-expressions to split authority into userinfo, host and port
#
readonly URI_REGEX='^(([^:/?#]+):)?(//((([^:/?#]+)@)?([^:/?#]+)(:([0-9]+))?))?(/([^?#]*))(\?([^#]*))?(#(.*))?'
# Capture groups:
#    1 scheme:      2 scheme
#    3 //…          4 authority
#    5 userinfo@    6 userinfo
#    7 host         8 :…          9 port
#   10 path        11 rpath
#   12 ?…          13 query
#   14 #…          15 fragment
parse_scheme () {
[[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[2]}"
}
parse_authority () {
[[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[4]}"
}
parse_user () {
[[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[6]}"
}
parse_host () {
[[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[7]}"
}
parse_port () {
[[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[9]}"
}
parse_path () {
[[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[10]}"
}
parse_rpath () {
[[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[11]}"
}
parse_query () {
[[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[13]}"
}
parse_fragment () {
[[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[15]}"
}
Using Python (best tool for this job, IMHO):
#!/usr/bin/env python3
import os
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse
uri = os.environ['NAUTILUS_SCRIPT_CURRENT_URI']
result = urlparse(uri)
user, host = result.netloc.split('@')
path = result.path
print('user=', user)
print('host=', host)
print('path=', path)
Further reading:
os.environ
urllib.parse.urlparse() (urlparse.urlparse() in Python 2)
If you really want to do it in shell, you can do something as simple as the following by using awk. This requires knowing how many fields you will actually be passed (e.g. no password sometimes and not others).
#!/bin/bash
FIELDS=($(echo "sftp://user@host.net/some/random/path" \
| awk '{split($0, arr, /[\/\@:]*/); for (x in arr) { print arr[x] }}'))
proto=${FIELDS[1]}
user=${FIELDS[2]}
host=${FIELDS[3]}
path=$(echo ${FIELDS[@]:3} | sed 's/ /\//g')
If you don't have awk and you do have grep, and you can require that each field have at least two characters and be reasonably predictable in format, then you can do:
#!/bin/bash
FIELDS=($(echo "sftp://user@host.net/some/random/path" \
| grep -o "[a-z0-9.-][a-z0-9.-]*" | tr '\n' ' '))
proto=${FIELDS[1]}
user=${FIELDS[2]}
host=${FIELDS[3]}
path=$(echo ${FIELDS[@]:3} | sed 's/ /\//g')
I just needed to do the same, so I was curious whether it's possible to do it in a single line, and this is what I've got:
#!/bin/bash
parse_url() {
eval $(echo "$1" | sed -e "s#^\(\(.*\)://\)\?\(\([^:@]*\)\(:\(.*\)\)\?@\)\?\([^/?]*\)\(/\(.*\)\)\?#${PREFIX:-URL_}SCHEME='\2' ${PREFIX:-URL_}USER='\4' ${PREFIX:-URL_}PASSWORD='\6' ${PREFIX:-URL_}HOST='\7' ${PREFIX:-URL_}PATH='\9'#")
}
URL=${1:-"http://user:pass@example.com/path/somewhere"}
PREFIX="URL_" parse_url "$URL"
echo "$URL_SCHEME://$URL_USER:$URL_PASSWORD@$URL_HOST/$URL_PATH"
How it works:
There is that crazy sed regex that captures all the parts of url, when all of them are optional (except for the host name)
Using those capture groups sed outputs env variables names with their values for relevant parts (like URL_SCHEME or URL_USER)
eval executes that output, causing those variables to be exported and available in the script
Optionally PREFIX could be passed to control output env variables names
PS: be careful when using this for arbitrary input since this code is vulnerable to script injections.
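To make the warning in the PS concrete, here is a small, harmless sketch of how a value pasted between single quotes and handed to eval can break out of the quoting. The input string and the names untrusted and injected are hypothetical; a real attacker would put an arbitrary command where the assignment is.

```shell
# Hypothetical untrusted input: the embedded single quote closes the
# string early, and everything after it is run as shell code by eval.
untrusted="x'; injected=yes; : '"

injected=no
# Mirrors the pattern used above, where sed emits NAME='value' for eval.
eval "URL_PATH='$untrusted'"

echo "URL_PATH=$URL_PATH"
echo "injected=$injected"
```

The assignment to injected runs even though it only ever appeared inside the "URL", which is why this approach should only be used on trusted input.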
Here's my take, loosely based on some of the existing answers, but it can also cope with GitHub SSH clone URLs:
#!/bin/bash
PROJECT_URL="git@github.com:heremaps/here-aaa-java-sdk.git"
# Extract the protocol (includes trailing "://").
PARSED_PROTO="$(echo $PROJECT_URL | sed -nr 's,^(.*://).*,\1,p')"
# Remove the protocol from the URL.
PARSED_URL="$(echo ${PROJECT_URL/$PARSED_PROTO/})"
# Extract the user (includes trailing "@").
PARSED_USER="$(echo $PARSED_URL | sed -nr 's,^(.*@).*,\1,p')"
# Remove the user from the URL.
PARSED_URL="$(echo ${PARSED_URL/$PARSED_USER/})"
# Extract the port (includes leading ":").
PARSED_PORT="$(echo $PARSED_URL | sed -nr 's,.*(:[0-9]+).*,\1,p')"
# Remove the port from the URL.
PARSED_URL="$(echo ${PARSED_URL/$PARSED_PORT/})"
# Extract the path (includes leading "/" or ":").
PARSED_PATH="$(echo $PARSED_URL | sed -nr 's,[^/:]*([/:].*),\1,p')"
# Remove the path from the URL.
PARSED_HOST="$(echo ${PARSED_URL/$PARSED_PATH/})"
echo "proto: $PARSED_PROTO"
echo "user: $PARSED_USER"
echo "host: $PARSED_HOST"
echo "port: $PARSED_PORT"
echo "path: $PARSED_PATH"
which gives
proto:
user: git@
host: github.com
port:
path: :heremaps/here-aaa-java-sdk.git
And for PROJECT_URL="ssh://sschuberth@git.eclipse.org:29418/jgit/jgit" you get
proto: ssh://
user: sschuberth@
host: git.eclipse.org
port: :29418
path: /jgit/jgit
You can use bash string manipulation, which is easy to learn; try it if you find regular expressions difficult. Since the URI comes from NAUTILUS_SCRIPT_CURRENT_URI, I guess it may contain a port, so I made that optional as well.
#!/bin/bash
#You can also use environment variable $NAUTILUS_SCRIPT_CURRENT_URI
X="sftp://user@host.net/some/random/path"
tmp=${X#*//};usr=${tmp%@*}
tmp=${X#*@};host=${tmp%%/*};[[ ${X#*://} == *":"* ]] && host=${host%:*}
tmp=${X#*//};path=${tmp#*/}
proto=${X%:*}
[[ ${X#*://} == *":"* ]] && tmp=${X##*:} && port=${tmp%%/*}
echo "Protocol:"$proto" User:"$usr" Host:"$host" Port:"$port" Path:"$path
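For readers new to these expansions, the four operators used above are # and ## (remove the shortest or longest matching prefix) and % and %% (remove the shortest or longest matching suffix). The sketch below walks through them one step at a time in plain POSIX sh. It assumes, purely for illustration, a URL that always has a scheme, a user, a port (2222 is made up) and a path; the variable names are not from the answer above.

```shell
url='sftp://user@host.net:2222/some/random/path'

proto=${url%%://*}        # longest suffix starting at "://" removed
rest=${url#*://}          # shortest prefix ending in "://" removed
authority=${rest%%/*}     # everything before the first "/"
path=${rest#*/}           # everything after the first "/"
user=${authority%@*}      # text before the last "@"
hostport=${authority#*@}  # text after the first "@"
host=${hostport%%:*}      # text before the first ":"
port=${hostport#*:}       # text after the first ":"

echo "proto=$proto user=$user host=$host port=$port path=$path"
```

When a part is missing, the pattern does not match and the expansion returns its input unchanged, so real code needs a check such as the *":"* test used in the answer above.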
I don't have enough reputation to comment, but I made a small modification to @patryk-obara's answer.
RFC3986 § 6.2.3. Scheme-Based Normalization
treats
http://example.com
http://example.com/
as equivalent. But I found that his regex did not match a URL like http://example.com. http://example.com/ (with the trailing slash) does match.
I inserted group 11, which changed / to (/|$). This matches either / or the end of the string. Now http://example.com does match.
readonly URI_REGEX='^(([^:/?#]+):)?(//((([^:/?#]+)@)?([^:/?#]+)(:([0-9]+))?))?((/|$)([^?#]*))(\?([^#]*))?(#(.*))?$'
# Capture groups:
#    1 scheme:      2 scheme
#    3 //...        4 authority
#    5 userinfo@    6 userinfo
#    7 host         8 :...         9 port
#   10 path        11 / or end-of-string   12 rpath
#   13 ?...        14 query
#   15 #...        16 fragment
If you have access to Bash >= 3.0 you can do this in pure bash as well, thanks to the re-match operator =~:
pattern='^(([[:alnum:]]+)://)?(([[:alnum:]]+)@)?([^:^@]+)(:([[:digit:]]+))?$'
if [[ "http://us@cos.com:3142" =~ $pattern ]]; then
proto=${BASH_REMATCH[2]}
user=${BASH_REMATCH[4]}
host=${BASH_REMATCH[5]}
port=${BASH_REMATCH[7]}
fi
It should be faster and less resource-hungry than all the previous examples, because no external process is spawned.
A simplistic approach to get just the domain from the full URL:
echo https://stackoverflow.com/questions/6174220/parse-url-in-shell-script | cut -d/ -f1-3
# OUTPUT>>> https://stackoverflow.com
Get only the path:
echo https://stackoverflow.com/questions/6174220/parse-url-in-shell-script | cut -d/ -f4-
# OUTPUT>>> questions/6174220/parse-url-in-shell-script
Not perfect, as the second command strips the preceding slash so you'll need to prepend it by hand.
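Putting the slash back can be done in the same step. A minimal sketch, reusing the URL from the example above (the variable names url and path are only illustrative):

```shell
url='https://stackoverflow.com/questions/6174220/parse-url-in-shell-script'

# cut drops the "/" that separates field 3 from field 4,
# so prepend it explicitly when building the path.
path="/$(printf '%s\n' "$url" | cut -d/ -f4-)"

echo "$path"
```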
An awk-based approach for getting just the path without the domain:
echo https://stackoverflow.com/questions/6174220/parse-url-in-shell-script/59971653 | awk -F"/" '{ for (i=4; i<=NF; i++) printf"/%s", $i }'
# OUTPUT>>> /questions/6174220/parse-url-in-shell-script/59971653
I did further parsing, expanding the solution given by @Shirkrin:
#!/bin/bash
parse_url() {
local query1 query2 path1 path2
# extract the protocol
proto="$(echo $1 | grep :// | sed -e's,^\(.*://\).*,\1,g')"
if [[ ! -z $proto ]] ; then
# remove the protocol
url="$(echo ${1/$proto/})"
# extract the user (if any)
login="$(echo $url | grep @ | cut -d@ -f1)"
# extract the host
host="$(echo ${url/$login@/} | cut -d/ -f1)"
# by request - try to extract the port
port="$(echo $host | sed -e 's,^.*:,:,g' -e 's,.*:\([0-9]*\).*,\1,g' -e 's,[^0-9],,g')"
# extract the uri (if any)
resource="/$(echo $url | grep / | cut -d/ -f2-)"
else
url=""
login=""
host=""
port=""
resource=$1
fi
# extract the path (if any)
path1="$(echo $resource | grep ? | cut -d? -f1 )"
path2="$(echo $resource | grep \# | cut -d# -f1 )"
path=$path1
if [[ -z $path ]] ; then path=$path2 ; fi
if [[ -z $path ]] ; then path=$resource ; fi
# extract the query (if any)
query1="$(echo $resource | grep ? | cut -d? -f2-)"
query2="$(echo $query1 | grep \# | cut -d\# -f1 )"
query=$query2
if [[ -z $query ]] ; then query=$query1 ; fi
# extract the fragment (if any)
fragment="$(echo $resource | grep \# | cut -d\# -f2 )"
echo "url: $url"
echo " proto: $proto"
echo " login: $login"
echo " host: $host"
echo " port: $port"
echo "resource: $resource"
echo " path: $path"
echo " query: $query"
echo "fragment: $fragment"
echo ""
}
parse_url "http://login:password@example.com:8080/one/more/dir/file.exe?a=sth&b=sth#anchor_fragment"
parse_url "https://example.com/one/more/dir/file.exe#anchor_fragment"
parse_url "http://login:password@example.com:8080/one/more/dir/file.exe#anchor_fragment"
parse_url "ftp://user@example.com:8080/one/more/dir/file.exe?a=sth&b=sth"
parse_url "/one/more/dir/file.exe"
parse_url "file.exe"
parse_url "file.exe#anchor"
I did not like the above methods, so I wrote my own. It is for FTP links; just replace ftp with http if you need it.
The first line is a small validation of the link, which should look like ftp://user:pass@host.com/path/to/something.
if ! echo "$url" | grep -q '^[[:blank:]]*ftp://[[:alnum:]]\+:[[:alnum:]]\+@[[:alnum:]\.]\+/.*[[:blank:]]*$'; then return 1; fi
login=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\1|' )
pass=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\2|' )
host=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\3|' )
dir=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\4|' )
My actual goal was to check ftp access by url. Here is the full result:
#!/bin/bash
test_ftp_url() # lftp may hang on some ftp problems, like no connection
{
local url="$1"
if ! echo "$url" | grep -q '^[[:blank:]]*ftp://[[:alnum:]]\+:[[:alnum:]]\+@[[:alnum:]\.]\+/.*[[:blank:]]*$'; then return 1; fi
local login=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\1|' )
local pass=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\2|' )
local host=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\3|' )
local dir=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\4|' )
exec 3>&2 2>/dev/null
exec 6<>"/dev/tcp/$host/21" || { exec 2>&3 3>&-; echo 'Bash network support is disabled. Skipping ftp check.'; return 0; }
read <&6
if ! echo "${REPLY//$'\r'}" | grep -q '^220'; then exec 2>&3 3>&- 6>&-; return 3; fi # 220 vsFTPd 3.0.2+ (ext.1) ready...
echo -e "USER $login\r" >&6; read <&6
if ! echo "${REPLY//$'\r'}" | grep -q '^331'; then exec 2>&3 3>&- 6>&-; return 4; fi # 331 Please specify the password.
echo -e "PASS $pass\r" >&6; read <&6
if ! echo "${REPLY//$'\r'}" | grep -q '^230'; then exec 2>&3 3>&- 6>&-; return 5; fi # 230 Login successful.
echo -e "CWD $dir\r" >&6; read <&6
if ! echo "${REPLY//$'\r'}" | grep -q '^250'; then exec 2>&3 3>&- 6>&-; return 6; fi # 250 Directory successfully changed.
echo -e "QUIT\r" >&6
exec 2>&3 3>&- 6>&-
return 0
}
test_ftp_url 'ftp://fz223free:fz223free@ftp.zakupki.gov.ru/out/nsi/nsiProtocol/daily'
echo "$?"
I found Adam Ryczkowski's answers helpful. The original solution did not handle /path in the URL, so I enhanced it a little bit.
pattern='^(([[:alnum:]]+):\/\/)?(([[:alnum:]]+)@)?([^:^@\/]+)(:([[:digit:]]+))?(\/?[^:^@]*)$'
url="http://us@cos.com:3142/path"
if [[ "$url" =~ $pattern ]]; then
proto=${BASH_REMATCH[2]}
user=${BASH_REMATCH[4]}
host=${BASH_REMATCH[5]}
port=${BASH_REMATCH[7]}
path=${BASH_REMATCH[8]}
echo "proto: $proto"
echo "user: $user"
echo "host: $host"
echo "port: $port"
echo "path= $path"
else
echo "URL did not match pattern: $url"
fi
The pattern is complex, so please use this site to understand it better: https://regex101.com/
I tested it with a bunch of URLs. However, if there are any issues, please let me know.
If you have access to Node.js:
export MY_URI=sftp://user@host.net/some/random/path
node -e "console.log(url.parse(process.env.MY_URI).auth)"
node -e "console.log(url.parse(process.env.MY_URI).host)"
node -e "console.log(url.parse(process.env.MY_URI).path)"
This will output:
user
host.net
/some/random/path
Here's a pure bash url parser. It supports git ssh clone style URLs as well as standard proto:// ones. The example ignores protocol, auths, and port but you can modify to collect as needed... I used regex101 for handy testing: https://regex101.com/r/5QyNI5/1
TEST_URLS=(
https://github.com/briceburg/tools.git
https://foo:12333@github.com:8080/briceburg/tools.git
git@github.com:briceburg/tools.git
https://me@gmail.com:12345@my.site.com:443/p/a/t/h
)
for url in "${TEST_URLS[@]}"; do
without_proto="${url#*:\/\/}"
without_auth="${without_proto##*@}"
[[ $without_auth =~ ^([^:\/]+)(:[[:digit:]]+\/|:|\/)?(.*) ]]
PROJECT_HOST="${BASH_REMATCH[1]}"
PROJECT_PATH="${BASH_REMATCH[3]}"
echo "given: $url"
echo " -> host: $PROJECT_HOST path: $PROJECT_PATH"
done
results in:
given: https://github.com/briceburg/tools.git
-> host: github.com path: briceburg/tools.git
given: https://foo:12333@github.com:8080/briceburg/tools.git
-> host: github.com path: briceburg/tools.git
given: git@github.com:briceburg/tools.git
-> host: github.com path: briceburg/tools.git
given: https://me@gmail.com:12345@my.site.com:443/p/a/t/h
-> host: my.site.com path: p/a/t/h

Resources