zgrep -i XXX XXX | grep -o "RID=[0-9|A-Z]*" |
uniq | cut -d "=" -f2 |
xargs -0 -I string echo "RequestID="string
My output is
But my requirement is to have the request ID prefixed before all the output.
I had a similar task and this worked for me. It might be what you are looking for:
zgrep -i XXX XXX | grep -o "RID=[0-9|A-Z]*" |
uniq | cut -d "=" -f2 |
xargs -I {} echo "RequestID="{}

Try -n option of xargs.
-n max-args
Use at most max-args arguments per command line. Fewer than max-args arguments will be used if the size (see the -s option) is exceeded, unless the -x option is given, in which case xargs will exit.
$ echo -e '1\n2' | xargs echo 'str ='
str = 1 2
$ echo -e '1\n2' | xargs -n 1 echo 'str ='
str = 1
str = 2
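Applied to the request-ID pipeline from the question, the same idea could look like the sketch below. The sample lines and the prefix_ids helper are made up for illustration; printf is used instead of echo so that no space ends up between the prefix and the ID.

```shell
#!/usr/bin/env bash
# Read IDs on stdin and print each one as RequestID=<id>.
# -n 1 makes xargs run printf once per input item.
prefix_ids() {
  xargs -n 1 printf 'RequestID=%s\n'
}

# Made-up sample lines standing in for the zgrep output.
printf '%s\n' 'x RID=A1B2 y' 'z RID=C3D4' \
  | grep -o 'RID=[0-9A-Z]*' | cut -d '=' -f2 | prefix_ids
```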


Search for part of string with grep in all files in folder and subfolders

I have .html files in directories and subdirectories. I need to extract all strings that start with "domain.com". Part of a string can look like this:
href="https://example.com/anotherfolder2" target="
What I want to extract is:
from all files in all folders, collected into one list with each match on its own line.
I found some highly upvoted examples on StackOverflow, but they did not work. I tried this (taken from some of those examples):
grep -Po '(?<=example.com=)[^,]*'
grep "example.com" your-directory -r | grep -o '".*"' | cut -d \" -f2| sed -e 's/https:\/\/example.com\///g'
grep "example.com" your-directory -r | grep -o '".*"' | cut -d \" -f2 extracts the content of the quoted string
sed -e 's/https:\/\/example.com\///g' strips the https://example.com/ prefix, leaving only the suffix
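A shorter variant of the same idea, as a rough sketch: let grep do the recursion and print only the matching part. The extract_links name and the throwaway demo files are assumptions for illustration, and the pattern assumes the links are plain http(s) URLs inside double quotes.

```shell
#!/usr/bin/env bash
# Print every example.com URL found in files under the given directory,
# one per line, duplicates removed.
# -r recurse, -h no file names, -o only the matching part, -E extended regex.
extract_links() {
  grep -rhoE 'https?://example\.com[^"[:space:]]*' "$1" | sort -u
}

# Demo on a throwaway directory with made-up content.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
printf '%s\n' '<a href="https://example.com/folder1" target="_blank">a</a>' > "$demo_dir/index.html"
printf '%s\n' '<a href="https://example.com/anotherfolder2" target="_blank">b</a>' > "$demo_dir/sub/page.html"
extract_links "$demo_dir"
rm -r "$demo_dir"
```

Append `| sed 's,^https\?://example\.com/,,'` (GNU sed) if only the part after the domain is wanted.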
echo "https://example.com/folder1" | tr -s '/' | tr '/' '\n' > file
sed -i '1d' file
sed -n '1p' file # This will give you example.com
sed -n '2p' file # This will give you folder1
sed -i 1s'#example\.com#newsite.com#' file
echo "http://" > nf
sed -n '2,$p' file >> nf
cat nf | tr '\n' '/' > newfile
cat newfile # This should be http://newsite.com/folder1
rm -v ./nf

Why does grep return lines in the middle of the string when I expect it to be anchored at the beginning?

I'm trying to extract the first word of the line when the line starts with whitespace, so I write the following command. But grep also returns the second word when it shouldn't. The ^ is supposed to match the beginning of the line:
echo -e " cat foo\n dog bar\n" | grep -Eo '^ +[^ ]+'
I expect it to return:
I'm running on MacOS 10.15.7.
As stated here in this report, this is actually a bug in BSD grep.
As a workaround, you can use these sed and awk commands to get equivalent output:
cat file
cat foo
dog bar
sed -E 's/(^[[:blank:]]+[^[:blank:]]+).*/\1/' file
awk 'match($0, /^[[:blank:]]+[^[:blank:]]+/){print substr($0, 1, RLENGTH)}' file
It's easy with something like this:
echo -e " cat foo\n dog bar\n" | grep -o '[^[:blank:]].*' | grep -o '^[^ ]\+'
or use awk like this:
echo -e " cat foo\n dog bar\n" | awk 'NF==2{print $1}'
or sed like this:
echo -e " cat foo\n dog bar\n" | grep -o '[^[:blank:]].*' | sed 's/ .*//'
or use cut like this:
echo -e " cat foo\n dog bar\n" | grep -o '[^[:blank:]].*' | cut -d" " -f1
dynamic exclusion of files through grep matching

I have a file source-push.sh which returns the list of files which I want to exclude from the results of find command.
It looks like this:
#!/usr/bin/env bash
find . -not \( -path './node_modules' -prune \) -name '*.js' | grep -vE $(echo $(./source-push.sh | xargs -I{} echo -n "{}|") | rev | cut -b2- | rev) | xargs -L1 standard --fix
find . -not \( -path './node_modules' -prune \) -name '*.css' | grep -vE $(echo $(./source-push.sh | xargs -I{} echo -n "{}|") | rev | cut -b2- | rev) | xargs -L1 stylelint --config stylelint.json
There must be a better way to do this. Any suggestions?
Instead of:
... | grep -vE $(echo $(./source-push.sh | xargs -I{} echo -n "{}|") | rev | cut -b2- | rev ) | ...
you can use the POSIX options -F and -f:
... | grep -v -F -f <( ./source-push.sh ) | ...
-F tells grep that the patterns are fixed strings
(avoiding the problem that your original code would break if the patterns contain characters that are special to grep -E)
-f file tells grep to use a list of patterns from file
<( ... ) is a bash way to present output of a program as a file (named pipe)
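A minimal sketch of the -F -f approach. The list_excluded function is a hypothetical stand-in for ./source-push.sh, and the file names are made up. One name contains a +, which grep -E would treat as a regex operator.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for ./source-push.sh: prints one path to exclude per line.
list_excluded() {
  printf '%s\n' './src/vendor.js' './src/a+b.js'
}

# Drop every input line that contains one of the excluded paths as a fixed string.
filter_excluded() {
  grep -v -F -f <(list_excluded)
}

printf '%s\n' './src/app.js' './src/vendor.js' './src/a+b.js' './src/aab.js' | filter_excluded
```

Note that -F still matches substrings; add -x if only whole-line matches should be excluded.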

How do I extract an integer value from a string in Unix

When I type this command
/usr/local/afs7/bin/afs_paftools -a about.afs | grep TOTAL_DOCUMENTS
I get a result
How can I extract the integer (74195) after the =
using the grep command?
One way is to use grep:
$ echo "TOTAL_DOCUMENTS = 74195" | grep -o '[0-9]\+'
or, since you know that it's the last field, use awk:
$ echo "TOTAL_DOCUMENTS = 74195" | awk '{print $NF}'
or just use awk for the lot:
your-command -a about.afs | awk '/TOTAL_DOCUMENTS/{print $NF}'
If there is no space around the =, use this awk:
echo "TOTAL_DOCUMENTS=74195" | awk -F= '{print $NF}'
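If the spacing around the = is not known in advance, a single sed expression can cover both cases. This is only a sketch; the extract_count name is made up.

```shell
#!/usr/bin/env bash
# Print the run of digits that follows the first "=" on each input line.
# Lines without "=<digits>" print nothing because of -n ... p.
extract_count() {
  sed -n 's/^[^=]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

echo "TOTAL_DOCUMENTS = 74195" | extract_count
echo "TOTAL_DOCUMENTS=74195" | extract_count
```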

Parse URL in shell script

I have a URL like:
I want to extract the user, host and path from this string. Any part can be of arbitrary length.
[EDIT 2019]
This answer is not meant to be a catch-all, works-for-everything solution. It was intended as a simple alternative to the Python-based version, and it ended up having more features than the original.
It answered the basic question in a bash-only way and was then modified several times by me to cover a handful of requests from commenters. At this point, however, I think adding even more complexity would make it unmaintainable. I know not everything is straightforward (checking for a valid port, for example, requires comparing hostport and host), but I would rather not add more complexity.
[Original answer]
Assuming your URL is passed as first parameter to the script:
# extract the protocol
proto="$(echo $1 | grep :// | sed -e's,^\(.*://\).*,\1,g')"
# remove the protocol
url="$(echo ${1/$proto/})"
# extract the user (if any)
user="$(echo $url | grep @ | cut -d@ -f1)"
# extract the host and port
hostport="$(echo ${url/$user@/} | cut -d/ -f1)"
# by request host without port
host="$(echo $hostport | sed -e 's,:.*,,g')"
# by request - try to extract the port
port="$(echo $hostport | sed -e 's,^.*:,:,g' -e 's,.*:\([0-9]*\).*,\1,g' -e 's,[^0-9],,g')"
# extract the path (if any)
path="$(echo $url | grep / | cut -d/ -f2-)"
echo "url: $url"
echo " proto: $proto"
echo " user: $user"
echo " host: $host"
echo " port: $port"
echo " path: $path"
I must admit this is not the cleanest solution but it doesn't rely on another scripting
language like perl or python.
(Providing a solution using one of them would produce cleaner results ;) )
Using your example the results are:
url: user@host.net/some/random/path
proto: sftp://
user: user
host: host.net
path: some/random/path
This will also work for URLs without a protocol/username or path.
In this case the respective variable will contain an empty string.
If your bash version won't cope with the substitutions (${1/$proto/}) try this:
# extract the protocol
proto="$(echo $1 | grep :// | sed -e's,^\(.*://\).*,\1,g')"
# remove the protocol -- updated
url=$(echo $1 | sed -e s,$proto,,g)
# extract the user (if any)
user="$(echo $url | grep @ | cut -d@ -f1)"
# extract the host and port -- updated
hostport=$(echo $url | sed -e s,$user@,,g | cut -d/ -f1)
# by request host without port
host="$(echo $hostport | sed -e 's,:.*,,g')"
# by request - try to extract the port
port="$(echo $hostport | sed -e 's,^.*:,:,g' -e 's,.*:\([0-9]*\).*,\1,g' -e 's,[^0-9],,g')"
# extract the path (if any)
path="$(echo $url | grep / | cut -d/ -f2-)"
The above, refined (added password and port parsing), and working in /bin/sh:
# extract the protocol
proto="`echo $DATABASE_URL | grep '://' | sed -e's,^\(.*://\).*,\1,g'`"
# remove the protocol
url=`echo $DATABASE_URL | sed -e s,$proto,,g`
# extract the user and password (if any)
userpass="`echo $url | grep @ | cut -d@ -f1`"
pass=`echo $userpass | grep : | cut -d: -f2`
if [ -n "$pass" ]; then
  user=`echo $userpass | grep : | cut -d: -f1`
else
  user=$userpass
fi
# extract the host -- updated
hostport=`echo $url | sed -e s,$userpass@,,g | cut -d/ -f1`
port=`echo $hostport | grep : | cut -d: -f2`
if [ -n "$port" ]; then
  host=`echo $hostport | grep : | cut -d: -f1`
else
  host=$hostport
fi
# extract the path (if any)
path="`echo $url | grep / | cut -d/ -f2-`"
Posted because I needed it, so I wrote it (based on @Shirkin's answer, obviously), and I figured someone else might appreciate it.
This solution works on the same principle as Adam Ryczkowski's in this thread, but it uses an improved regular expression based on RFC 3986 (with some changes) and fixes some errors (e.g. userinfo can contain the '_' character). It can also handle relative URIs (e.g. to extract the query or fragment).
#!/bin/bash
# Following regex is based on https://www.rfc-editor.org/rfc/rfc3986#appendix-B with
# additional sub-expressions to split authority into userinfo, host and port
readonly URI_REGEX='^(([^:/?#]+):)?(//((([^:/?#]+)@)?([^:/?#]+)(:([0-9]+))?))?(/([^?#]*))(\?([^#]*))?(#(.*))?'
#                    ↑↑            ↑  ↑↑↑            ↑         ↑ ↑            ↑ ↑        ↑  ↑        ↑ ↑
#                    |2 scheme     |  ||6 userinfo   7 host    | 9 port       | 11 rpath |  13 query | 15 fragment
#                    1 scheme:     |  |5 userinfo@             8 :…           10 path    12 ?…       14 #…
#                                  |  4 authority
#                                  3 //…
parse_scheme () {
  [[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[2]}"
}
parse_authority () {
  [[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[4]}"
}
parse_user () {
  [[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[6]}"
}
parse_host () {
  [[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[7]}"
}
parse_port () {
  [[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[9]}"
}
parse_path () {
  [[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[10]}"
}
parse_rpath () {
  [[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[11]}"
}
parse_query () {
  [[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[13]}"
}
parse_fragment () {
  [[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[15]}"
}
Using Python (best tool for this job, IMHO):
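#!/usr/bin/env python3
import os
from urllib.parse import urlparse  # Python 3 location of urlparse
# Assumption: the URI comes from the NAUTILUS_SCRIPT_CURRENT_URI environment variable
uri = os.environ['NAUTILUS_SCRIPT_CURRENT_URI']
result = urlparse(uri)
user, host = result.netloc.split('@')
path = result.path
print('user=', user)
print('host=', host)
print('path=', path)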
If you really want to do it in shell, you can do something as simple as the following by using awk. This requires knowing how many fields you will actually be passed (e.g. no password sometimes and not others).
FIELDS=($(echo "sftp://user@host.net/some/random/path" \
| awk '{split($0, arr, /[\/\@:]*/); for (x in arr) { print arr[x] }}'))
path=$(echo ${FIELDS[@]:3} | sed 's/ /\//g')
If you don't have awk and you do have grep, and you can require that each field have at least two characters and be reasonably predictable in format, then you can do:
FIELDS=($(echo "sftp://user@host.net/some/random/path" \
| grep -o "[a-z0-9.-][a-z0-9.-]*" | tr '\n' ' '))
path=$(echo ${FIELDS[@]:3} | sed 's/ /\//g')
I just needed to do the same and was curious whether it could be done in a single line. This is what I got:
parse_url() {
  eval $(echo "$1" | sed -e "s#^\(\(.*\)://\)\?\(\([^:@]*\)\(:\(.*\)\)\?@\)\?\([^/?]*\)\(/\(.*\)\)\?#${PREFIX:-URL_}SCHEME='\2' ${PREFIX:-URL_}USER='\4' ${PREFIX:-URL_}PASSWORD='\6' ${PREFIX:-URL_}HOST='\7' ${PREFIX:-URL_}PATH='\9'#")
}
PREFIX="URL_" parse_url "$URL"
How it works:
The crazy sed regex captures every part of the URL; all parts are optional except the host name.
Using those capture groups, sed outputs variable assignments for the relevant parts (like URL_SCHEME or URL_USER).
eval executes that output, which defines those variables in the current shell so the rest of the script can use them.
Optionally, PREFIX can be passed to control the names of the output variables.
PS: be careful when using this for arbitrary input since this code is vulnerable to script injections.
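One way to sidestep the injection risk is to skip eval and assign the fields with printf -v from a bash regex match. This is a sketch only, not a drop-in replacement: the parse_url_safe name and the regex are my own assumptions, it needs bash, it keeps an optional :port as part of the host, and it rejects URLs that have a query string but no path.

```shell
#!/usr/bin/env bash
# Eval-free sketch: fill ${PREFIX}SCHEME, USER, PASSWORD, HOST and PATH
# (PREFIX defaults to URL_) from a bash regex match. Returns 1 on no match.
parse_url_safe() {
  local prefix=${PREFIX:-URL_}
  # Groups: 2 scheme, 4 user, 6 password, 7 host[:port], 10 path without leading slash
  local re='^(([^:/?#@]+)://)?(([^:@/]*)(:([^@/]*))?@)?([^:/?@]*(:[0-9]+)?)(/(.*))?$'
  [[ $1 =~ $re ]] || return 1
  printf -v "${prefix}SCHEME"   '%s' "${BASH_REMATCH[2]}"
  printf -v "${prefix}USER"     '%s' "${BASH_REMATCH[4]}"
  printf -v "${prefix}PASSWORD" '%s' "${BASH_REMATCH[6]}"
  printf -v "${prefix}HOST"     '%s' "${BASH_REMATCH[7]}"
  printf -v "${prefix}PATH"     '%s' "${BASH_REMATCH[10]}"
}

parse_url_safe "sftp://user:secret@host.net/some/random/path"
echo "scheme=$URL_SCHEME user=$URL_USER host=$URL_HOST path=$URL_PATH"
```

Because the matched text is only ever passed to printf as data, quotes or semicolons in the input are stored literally instead of being executed.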
Here's my take, loosely based on some of the existing answers, but it can also cope with GitHub SSH clone URLs:
# Extract the protocol (includes trailing "://").
PARSED_PROTO="$(echo $PROJECT_URL | sed -nr 's,^(.*://).*,\1,p')"
# Remove the protocol from the URL.
PARSED_URL="${PROJECT_URL#"$PARSED_PROTO"}"
# Extract the user (includes trailing "@").
PARSED_USER="$(echo $PARSED_URL | sed -nr 's,^(.*@).*,\1,p')"
# Remove the user from the URL.
PARSED_URL="${PARSED_URL#"$PARSED_USER"}"
# Extract the port (includes leading ":").
PARSED_PORT="$(echo $PARSED_URL | sed -nr 's,.*(:[0-9]+).*,\1,p')"
# Remove the port from the URL.
PARSED_URL="${PARSED_URL/"$PARSED_PORT"/}"
# Extract the path (includes leading "/" or ":").
PARSED_PATH="$(echo $PARSED_URL | sed -nr 's,[^/:]*([/:].*),\1,p')"
# Remove the path from the URL.
PARSED_HOST="${PARSED_URL%"$PARSED_PATH"}"
echo "proto: $PARSED_PROTO"
echo "user: $PARSED_USER"
echo "host: $PARSED_HOST"
echo "port: $PARSED_PORT"
echo "path: $PARSED_PATH"
which gives
user: git@
host: github.com
path: :heremaps/here-aaa-java-sdk.git
And for PROJECT_URL="ssh://sschuberth@git.eclipse.org:29418/jgit/jgit" you get
proto: ssh://
user: sschuberth@
host: git.eclipse.org
port: :29418
path: /jgit/jgit
You can use bash string manipulation, which is easy to learn. Try it if you find regexes difficult. Since the URL comes from NAUTILUS_SCRIPT_CURRENT_URI, I guess it may contain a port, so I treat the port as optional.
#You can also use environment variable $NAUTILUS_SCRIPT_CURRENT_URI
tmp=${X#*@};host=${tmp%%/*};[[ ${X#*://} == *":"* ]] && host=${host%:*}
[[ ${X#*://} == *":"* ]] && tmp=${X##*:} && port=${tmp%%/*}
echo "Protocol:"$proto" User:"$usr" Host:"$host" Port:"$port" Path:"$path
I don't have enough reputation to comment, but I made a small modification to @patryk-obara's answer.
RFC 3986 § 6.2.3 (Scheme-Based Normalization) treats http://example.com and http://example.com/ as equivalent.
But I found that his regex did not match a URL like http://example.com, while http://example.com/ (with the trailing slash) does match.
I inserted group 11, which changes / to (/|$). This matches either / or the end of the string. Now http://example.com does match.
readonly URI_REGEX='^(([^:/?#]+):)?(//((([^:/?#]+)@)?([^:/?#]+)(:([0-9]+))?))?((/|$)([^?#]*))(\?([^#]*))?(#(.*))?$'
#                    ↑↑            ↑  ↑↑↑            ↑         ↑ ↑            ↑↑    ↑        ↑  ↑        ↑ ↑
#                    ||            |  |||            |         | |            ||    |        |  |        | |
#                    |2 scheme     |  ||6 userinfo   7 host    | 9 port       ||    12 rpath |  14 query | 16 fragment
#                    1 scheme:     |  |5 userinfo@             8 :...         ||             13 ?...     15 #...
#                                  |  4 authority                             |11 / or end-of-string
#                                  3 //...                                    10 path
If you have access to Bash >= 3.0 you can do this in pure bash as well, thanks to the re-match operator =~:
if [[ "http://us@cos.com:3142" =~ $pattern ]]; then
It should be faster and less resource-hungry than all the previous examples, because no external process is spawned.
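For illustration, a complete version of this idea could look like the following. The pattern here is a hypothetical one written for this sketch (optional scheme, optional user, host, optional port, no path), and the parse_simple_url name is made up.

```shell
#!/usr/bin/env bash
# Hypothetical pattern: optional scheme, optional user, host, optional port.
pattern='^(([[:alnum:]]+)://)?(([^@/]+)@)?([^:/@]+)(:([[:digit:]]+))?$'

# Sets proto, user, host and port from $1; returns 1 if the pattern does not match.
parse_simple_url() {
  if [[ "$1" =~ $pattern ]]; then
    proto=${BASH_REMATCH[2]}
    user=${BASH_REMATCH[4]}
    host=${BASH_REMATCH[5]}
    port=${BASH_REMATCH[7]}
  else
    return 1
  fi
}

if parse_simple_url "http://us@cos.com:3142"; then
  echo "proto=$proto user=$user host=$host port=$port"
fi
```

The pattern is kept in a variable and used unquoted on the right-hand side of =~, because quoting it there would make bash treat it as a literal string.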
A simplistic approach to get just the domain from the full URL:
echo https://stackoverflow.com/questions/6174220/parse-url-in-shell-script | cut -d/ -f1-3
# OUTPUT>>> https://stackoverflow.com
Get only the path:
echo https://stackoverflow.com/questions/6174220/parse-url-in-shell-script | cut -d/ -f4-
# OUTPUT>>> questions/6174220/parse-url-in-shell-script
Not perfect, as the second command strips the preceding slash so you'll need to prepend it by hand.
An awk-based approach for getting just the path without the domain:
echo https://stackoverflow.com/questions/6174220/parse-url-in-shell-script/59971653 | awk -F"/" '{ for (i=4; i<=NF; i++) printf"/%s", $i }'
# OUTPUT>>> /questions/6174220/parse-url-in-shell-script/59971653
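The same split can be done with parameter expansion alone, which keeps the leading slash and avoids the extra processes. A rough sketch, where the split_url name is made up and the URL is assumed to have the form scheme://host/path:

```shell
#!/usr/bin/env bash
# Split a URL into host and path using parameter expansion only.
split_url() {
  local rest=${1#*://}            # drop the scheme, if present
  host=${rest%%/*}                # text before the first slash
  case $rest in
    */*) path=/${rest#*/} ;;      # text after the first slash, slash restored
    *)   path= ;;                 # no path at all
  esac
}

split_url "https://stackoverflow.com/questions/6174220/parse-url-in-shell-script"
echo "host=$host"
echo "path=$path"
```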
I did further parsing, expanding the solution given by @Shirkrin:
parse_url() {
  local query1 query2 path1 path2
  # extract the protocol
  proto="$(echo $1 | grep :// | sed -e's,^\(.*://\).*,\1,g')"
  if [[ ! -z $proto ]] ; then
    # remove the protocol
    url="$(echo ${1/$proto/})"
    # extract the user (if any)
    login="$(echo $url | grep @ | cut -d@ -f1)"
    # extract the host
    host="$(echo ${url/$login@/} | cut -d/ -f1)"
    # by request - try to extract the port
    port="$(echo $host | sed -e 's,^.*:,:,g' -e 's,.*:\([0-9]*\).*,\1,g' -e 's,[^0-9],,g')"
    # extract the uri (if any)
    resource="/$(echo $url | grep / | cut -d/ -f2-)"
  else
    # no protocol: treat the whole argument as the resource
    url="" login="" host="" port=""
    resource=$1
  fi
  # extract the path (if any)
  path1="$(echo $resource | grep '?' | cut -d'?' -f1 )"
  path2="$(echo $resource | grep \# | cut -d\# -f1 )"
  path=$path1
  if [[ -z $path ]] ; then path=$path2 ; fi
  if [[ -z $path ]] ; then path=$resource ; fi
  # extract the query (if any)
  query1="$(echo $resource | grep '?' | cut -d'?' -f2-)"
  query2="$(echo $query1 | grep \# | cut -d\# -f1 )"
  query=$query2
  if [[ -z $query ]] ; then query=$query1 ; fi
  # extract the fragment (if any)
  fragment="$(echo $resource | grep \# | cut -d\# -f2 )"
  echo "url: $url"
  echo " proto: $proto"
  echo " login: $login"
  echo " host: $host"
  echo " port: $port"
  echo "resource: $resource"
  echo " path: $path"
  echo " query: $query"
  echo "fragment: $fragment"
  echo ""
}
parse_url "http://login:password@example.com:8080/one/more/dir/file.exe?a=sth&b=sth#anchor_fragment"
parse_url "https://example.com/one/more/dir/file.exe#anchor_fragment"
parse_url "http://login:password@example.com:8080/one/more/dir/file.exe#anchor_fragment"
parse_url "ftp://user@example.com:8080/one/more/dir/file.exe?a=sth&b=sth"
parse_url "/one/more/dir/file.exe"
parse_url "file.exe"
parse_url "file.exe#anchor"
I did not like the methods above, so I wrote my own. It is written for FTP links; just replace ftp with http if you need that.
The first line is a small validation of the link, which should look like ftp://user:pass@host.com/path/to/something.
if ! echo "$url" | grep -q '^[[:blank:]]*ftp://[[:alnum:]]\+:[[:alnum:]]\+@[[:alnum:]\.]\+/.*[[:blank:]]*$'; then return 1; fi
login=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\1|' )
pass=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\2|' )
host=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\3|' )
dir=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\4|' )
My actual goal was to check ftp access by url. Here is the full result:
test_ftp_url() # lftp may hang on some ftp problems, like no connection
{
  local url="$1"
  if ! echo "$url" | grep -q '^[[:blank:]]*ftp://[[:alnum:]]\+:[[:alnum:]]\+@[[:alnum:]\.]\+/.*[[:blank:]]*$'; then return 1; fi
  local login=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\1|' )
  local pass=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\2|' )
  local host=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\3|' )
  local dir=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\4|' )
  exec 3>&2 2>/dev/null
  exec 6<>"/dev/tcp/$host/21" || { exec 2>&3 3>&-; echo 'Bash network support is disabled. Skipping ftp check.'; return 0; }
  read <&6
  if ! echo "${REPLY//$'\r'}" | grep -q '^220'; then exec 2>&3 3>&- 6>&-; return 3; fi # 220 vsFTPd 3.0.2+ (ext.1) ready...
  echo -e "USER $login\r" >&6; read <&6
  if ! echo "${REPLY//$'\r'}" | grep -q '^331'; then exec 2>&3 3>&- 6>&-; return 4; fi # 331 Please specify the password.
  echo -e "PASS $pass\r" >&6; read <&6
  if ! echo "${REPLY//$'\r'}" | grep -q '^230'; then exec 2>&3 3>&- 6>&-; return 5; fi # 230 Login successful.
  echo -e "CWD $dir\r" >&6; read <&6
  if ! echo "${REPLY//$'\r'}" | grep -q '^250'; then exec 2>&3 3>&- 6>&-; return 6; fi # 250 Directory successfully changed.
  echo -e "QUIT\r" >&6
  exec 2>&3 3>&- 6>&-
  return 0
}
test_ftp_url 'ftp://fz223free:fz223free@ftp.zakupki.gov.ru/out/nsi/nsiProtocol/daily'
echo "$?"
I found Adam Ryczkowski's answers helpful. The original solution did not handle /path in URL, so I enhanced it a little bit.
if [[ "$url" =~ $pattern ]]; then
  echo "proto: $proto"
  echo "user: $user"
  echo "host: $host"
  echo "port: $port"
  echo "path= $path"
else
  echo "URL did not match pattern: $url"
fi
The pattern is complex, so please use this site to understand it better: https://regex101.com/
I tested it with a bunch of URLs. However, if there are any issues, please let me know.
If you have access to Node.js:
export MY_URI=sftp://user@host.net/some/random/path
node -e "console.log(url.parse(process.env.MY_URI).auth)"
node -e "console.log(url.parse(process.env.MY_URI).host)"
node -e "console.log(url.parse(process.env.MY_URI).path)"
This will output:
Here's a pure bash url parser. It supports git ssh clone style URLs as well as standard proto:// ones. The example ignores protocol, auths, and port but you can modify to collect as needed... I used regex101 for handy testing: https://regex101.com/r/5QyNI5/1
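for url in "${TEST_URLS[@]}"; do
  without_auth="${url#*://}"            # drop the protocol, if any
  without_auth="${without_auth##*@}"    # drop the auth part, if any
  [[ $without_auth =~ ^([^:\/]+)(:[[:digit:]]+\/|:|\/)?(.*) ]]
  PROJECT_HOST="${BASH_REMATCH[1]}"; PROJECT_PATH="${BASH_REMATCH[3]}"
  echo "given: $url"
  echo " -> host: $PROJECT_HOST path: $PROJECT_PATH"
done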
results in:
given: https://github.com/briceburg/tools.git
-> host: github.com path: briceburg/tools.git
given: https://foo:12333@github.com:8080/briceburg/tools.git
-> host: github.com path: briceburg/tools.git
given: git@github.com:briceburg/tools.git
-> host: github.com path: briceburg/tools.git
given: https://me@gmail.com:12345@my.site.com:443/p/a/t/h
-> host: my.site.com path: p/a/t/h
