RedHat Memory Used High

Looking for some help if you will..
I have a virtual machine on RedHat 6.5 with 32gb memory.
A free is showing 24.6gb used, 8.2gb free. Only 418mb is cached, 1.8gb buffers.
Executed a top and sorted by virtual used, and I can only account for about 6gb of that 24.6gb used.
A "ps aux" doesn't show any processes that could be taking the memory.
I am flummoxed and looking for advice on where to look to see what's taking the memory.
Any help would be appreciated.

The Bash script below will help you figure out how much memory each application is consuming.
#!/bin/bash
# Make sure only root can run our script
if [ "$(id -u)" != "0" ]; then
echo "This script must be run as root" 1>&2
exit 1
fi
### Functions
#This function will count memory statistic for passed PID
get_process_mem ()
{
PID=$1
#we need to check if 2 files exist
if [ -f /proc/$PID/status ];
then
if [ -f /proc/$PID/smaps ];
then
#here we count memory usage, Pss, Private and Shared = Pss-Private
Pss=`cat /proc/$PID/smaps | grep -e "^Pss:" | awk '{print $2}'| paste -sd+ | bc `
Private=`cat /proc/$PID/smaps | grep -e "^Private" | awk '{print $2}'| paste -sd+ | bc `
#we need to be sure that we count Pss and Private memory, to avoid errors
if [ -n "$Pss" ] && [ -n "$Private" ];
then
let Shared=${Pss}-${Private}
Name=`cat /proc/$PID/status | grep -e "^Name:" |cut -d':' -f2`
#we keep all results in bytes
let Shared=${Shared}*1024
let Private=${Private}*1024
let Sum=${Shared}+${Private}
echo -e "$Private + $Shared = $Sum \t $Name"
fi
fi
fi
}
#this function make conversion from bytes to Kb or Mb or Gb
convert()
{
value=$1
power=0
#if value 0, we make it like 0.00
if [ "$value" = "0" ];
then
value="0.00"
fi
#We make conversion till value bigger than 1024, and if yes we divide by 1024
while [ $(echo "${value} > 1024"|bc) -eq 1 ]
do
value=$(echo "scale=2;${value}/1024" |bc)
let power=$power+1
done
#this part get b,kb,mb or gb according to number of divisions
case $power in
0) reg=b;;
1) reg=kb;;
2) reg=mb;;
3) reg=gb;;
esac
echo -n "${value} ${reg} "
}
#to ensure that temp files not exist
[[ -f /tmp/res ]] && rm -f /tmp/res
[[ -f /tmp/res2 ]] && rm -f /tmp/res2
[[ -f /tmp/res3 ]] && rm -f /tmp/res3
#if argument passed script will show statistic only for that pid, if not – we list all processes in /proc/
#and get statistic for all of them, all result we store in file /tmp/res
if [ $# -eq 0 ]
then
pids=`ls /proc | grep -e '^[0-9][0-9]*$'`
for i in $pids
do
get_process_mem $i >> /tmp/res
done
else
get_process_mem $1>> /tmp/res
fi
#This will sort result by memory usage
cat /tmp/res | sort -gr -k 5 > /tmp/res2
#this part will get uniq names from process list, and we will add all lines with same process list
#we will count nomber of processes with same name, so if more that 1 process where will be
# process(2) in output
for Name in `cat /tmp/res2 | awk '{print $6}' | sort | uniq`
do
count=`cat /tmp/res2 | awk -v src=$Name '{if ($6==src) {print $6}}'|wc -l| awk '{print $1}'`
if [ $count = "1" ];
then
count=""
else
count="(${count})"
fi
VmSizeKB=`cat /tmp/res2 | awk -v src=$Name '{if ($6==src) {print $1}}' | paste -sd+ | bc`
VmRssKB=`cat /tmp/res2 | awk -v src=$Name '{if ($6==src) {print $3}}' | paste -sd+ | bc`
total=`cat /tmp/res2 | awk '{print $5}' | paste -sd+ | bc`
Sum=`echo "${VmRssKB}+${VmSizeKB}"|bc`
#all result stored in /tmp/res3 file
echo -e "$VmSizeKB + $VmRssKB = $Sum \t ${Name}${count}" >>/tmp/res3
done
#this make sort once more.
cat /tmp/res3 | sort -gr -k 5 | uniq > /tmp/res
#now we print result , first header
echo -e "Private \t + \t Shared \t = \t RAM used \t Program"
#after we read line by line of temp file
while read line
do
echo $line | while read a b c d e f
do
#we print all processes if Ram used if not 0
if [ $e != "0" ]; then
#here we use function that make conversion
echo -en "`convert $a` \t $b \t `convert $c` \t $d \t `convert $e` \t $f"
echo ""
fi
done
done < /tmp/res
#this part print footer, with counted Ram usage
echo "--------------------------------------------------------"
echo -e "\t\t\t\t\t\t `convert $total`"
echo "========================================================"
# we clean temporary file
[[ -f /tmp/res ]] && rm -f /tmp/res
[[ -f /tmp/res2 ]] && rm -f /tmp/res2
[[ -f /tmp/res3 ]] && rm -f /tmp/res3

I am going to take a wild stab at this. Without having access to the machine or additional information troubleshooting this will be difficult.
The /tmp file system can be special: when it is mounted as tmpfs it exists entirely in memory. That is not the default on RHEL 6 (there /dev/shm is the usual tmpfs mount), so check how /tmp is mounted first ( df -hT /tmp ). If it is tmpfs, check the disk usage on this directory and you may see where your memory is getting consumed. ( du -sh /tmp )
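Memory that the kernel itself holds is not attributed to any process, which is one possible reason top and ps cannot account for it. A minimal sketch, assuming a Linux /proc/meminfo with the usual Slab, PageTables, Shmem and huge page fields; the function name meminfo_summary is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: reads meminfo-formatted text on stdin and prints
# a few counters that never show up in per-process listings, in MiB.
meminfo_summary() {
    awk '
        $1 == "Slab:" || $1 == "PageTables:" || $1 == "Shmem:" {
            printf "%s %.1f MiB\n", $1, $2 / 1024
        }
        $1 == "HugePages_Total:" { pages = $2 }
        $1 == "Hugepagesize:"    { size_kb = $2 }
        END { printf "HugePages: %.1f MiB\n", pages * size_kb / 1024 }
    '
}

# Only read the live file where it exists (Linux).
if [ -r /proc/meminfo ]; then
    meminfo_summary < /proc/meminfo
fi
```

If one of these values is close to the missing amount, that is where to dig next (for example slabtop for Slab).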

how to monitor docker containers resource usage using shell script

how to monitor docker containers resource usage using a shell script.
I was just wondering can we use the docker stats command to get metrics to monitor docker containers resource usage

I have written a small shell script that will help to filter docker containers that are using max system resources. (I guess will work for one docker-swarm node cluster)
#!/bin/bash
#This script is used to complete the output of the docker stats command.
#The docker stats command does not compute the total amount of resources (RAM or CPU)
#Get the total amount of RAM, assumes there are at least 1024*1024 KiB, therefore > 1 GiB
docker stats | while read line
do
HOST_MEM_TOTAL=$(grep MemTotal /proc/meminfo | awk '{print $2/1024/1024}')
#echo "HOST TOTAL Memory: $HOST_MEM_TOTAL"
oldifs=$IFS
IFS=;
dStats=$(docker stats --no-stream --format "table {{.MemPerc}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.Name}}\t{{.ID}}" | sed -n '1!p')
#dStats=$( docker stats --no-stream --format "table {{.MemPerc}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.Name}}\t{{.ID}}")
SUM_RAM=`echo $dStats | tail -n +2 | sed "s/%//g" | awk '{s+=$1} END {print s}'`
SUM_CPU=`echo $dStats | tail -n +2 | sed "s/%//g" | awk '{s+=$2} END {print s}'`
SUM_RAM_QUANTITY=`LC_NUMERIC=C printf %.2f $(echo "$SUM_RAM*$HOST_MEM_TOTAL*0.01" | bc)`
# Output the result
echo "########################################### Start of Resources Output ##############################################" >> /tmp/emailFiles/Docker-Resources-Usage-Stats.txt
echo " " >> /tmp/emailFiles/Docker-Resources-Usage-Stats.txt
dat=$(date)
echo "Present date & Time is: $dat" >> /tmp/emailFiles/Docker-Resources-Usage-Stats.txt
#IFS=$olifs
#echo "MEM % CPU % MEM USAGE / LIMIT NAME CONTAINER ID" >> /tmp/emailFiles/Docker-Resources-Usage-Stats.txt
echo "MEM % CPU % MEM USAGE / LIMIT NAME CONTAINER ID" >> /tmp/emailFiles/Docker-Resources-Usage-Stats.txt
IFS=$'\r\n' GLOBIGNORE='*'
for i in $dStats
do
cpuPerc=$(echo $i | awk '{print $2}')
memPerc=$(echo $i | awk '{print $1}')
cpuPerc=${cpuPerc%"%"}
cpuPerc=${cpuPerc/.*}
memPerc=${memPerc%"%"}
memPerc=${memPerc/.*}
#if [ $cpuPerc -ge 100 ] && [ $memPerc -ge 35 ]
if [ $cpuPerc -ge 100 ] || [ $memPerc -ge 50 ]
then
#IFS=$oldifs
echo $i >> /tmp/emailFiles/Docker-Resources-Usage-Stats.txt
else
a="hello"
fi
done
#IFS=$oldifs
SUM_RAM=${SUM_RAM/.*}
SUM_CPU=${SUM_CPU/.*}
if [ $SUM_RAM -ge 70 ] && [ $SUM_CPU -ge 100 ]
#if [ $SUM_RAM -ge 70 ] || [ $SUM_CPU -ge 100 ]
then
echo " " >>/tmp/emailFiles/Docker-Resources-Usage-Stats.txt
echo "Total-MEMORY-Usage Total-CPU-Usage Used-MEM / Total-MEM" >> /tmp/emailFiles/Docker-Resources-Usage-Stats.txt
#echo -e "${SUM_RAM}%\t\t\t${SUM_CPU}%\t\t${SUM_RAM_QUANTITY}GiB / ${HOST_MEM_TOTAL}GiB\tTOTAL" >> /tmp/emailFiles/Docker-Resources-Usage-Stats.txt
echo -e "${SUM_RAM}%\t\t\t${SUM_CPU}%\t\t${SUM_RAM_QUANTITY}GiB / ${HOST_MEM_TOTAL}GiB" >> /tmp/emailFiles/Docker-Resources-Usage-Stats.txt
echo " ">>/tmp/emailFiles/Docker-Resources-Usage-Stats.txt
fi
disk_usage=$(df -hT | grep ext4 | awk '{print $6}')
#disk_usage=$(df -kv| grep sda1 | awk '{preint $5}')
disk_usage=${disk_usage%"%"}
#disk_usage=${disk_usage/.*}
if [ $disk_usage -ge 90 ]
then
#echo "Filesystem Size Used Avail Use% Mounted on" >>/tmp/emailFiles/Docker-Resources-Usage-Stats.txt
echo "Filesystem Type Size Used Avail Use% Mounted on" >> /tmp/emailFiles/Docker-Resources-Usage-Stats.txt
#df -kh | grep sda1 >> /tmp/emailFiles/Docker-Resources-Usage-Stats.txt
df -hT | grep ext4 >> /tmp/emailFiles/Docker-Resources-Usage-Stats.txt
echo " "
#cat /tmp/emailFiles/Docker-Resources-Usage-Stats.txt
fi
echo "########################################### End of Resources Output ################################################" >> /tmp/emailFiles/Docker-Resources-Usage-Stats.txt
echo " " >> /tmp/emailFiles/Docker-Resources-Usage-Stats.txt
done
Please modify it according to your requirement, if you find it useful.
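The totals that the script builds with echo, sed and awk can be isolated into one small filter. This is only a sketch: it assumes input lines whose first two whitespace-separated columns are the memory and CPU percentages, and sum_stats is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: sums the first two percentage columns of its input
# and prints "<total mem %> <total cpu %>".
sum_stats() {
    LC_ALL=C awk '
        { gsub(/%/, "", $1); gsub(/%/, "", $2); mem += $1; cpu += $2 }
        END { printf "%.2f %.2f\n", mem, cpu }
    '
}

# Sample lines stand in for a live Docker daemon here.
printf '1.50%% 10.00%% web\n2.25%% 0.50%% db\n' | sum_stats
```

Against a real daemon the input would come from something like docker stats --no-stream --format '{{.MemPerc}} {{.CPUPerc}} {{.Name}}' | sum_stats.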

Print file name if grep finds multiple occurrences of a string in file, else exit on failure

File1 contains
hello
hello
I need to write a grep command to print the filename if this file contains more than one "hello". Otherwise, I need grep to exit on failure.
So far I have
grep -c "hello" File1 | grep -v :0
but it outputs
2
How do I get the desired output, which should either be the filename File1 or no output at all? (From what I understand, no match is a non-zero exit code for grep.)

with GNU grep for -z:
grep -lz 'hello.*hello' file
e.g.:
$ seq 15 | grep -lz '3.*3'
(standard input)
$ echo $?
0
$ seq 5 | grep -lz '3.*3'
$ echo $?
1

Like this:
#!/bin/bash
count=$(grep -c "hello" "$1")
if ((count > 1)); then
echo "$1"
else
exit 1
fi
Usage:
chmod +x script.sh
./script.sh File1
Explanations:
((...)) is an arithmetic command, which returns an exit status of 0 if the expression is nonzero, or 1 if the expression is zero. Also used as a synonym for "let", if side effects (assignments) are needed. See http://mywiki.wooledge.org/ArithmeticExpression
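One caveat: grep -c counts matching lines, not matches, so a file whose only line is "hello hello" yields 1. A sketch that counts occurrences instead, assuming a grep that supports -o (GNU and BSD grep do; POSIX does not require it); print_if_repeated is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: prints the file name and succeeds only when the
# pattern occurs more than once anywhere in the file.
print_if_repeated() {
    pattern=$1
    file=$2
    # grep -o prints each match on its own line; $(( )) trims wc's padding.
    count=$(( $(grep -o -e "$pattern" -- "$file" | wc -l) ))
    if [ "$count" -gt 1 ]; then
        printf '%s\n' "$file"
    else
        return 1
    fi
}
```

Called as print_if_repeated hello File1, it prints File1 or returns a non-zero status.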

Using perl in a shell:
perl -0 -le '
my $filename = $ARGV[0];
print $filename if grep { /hello\nhello/ } <>
' file

Pause a process before disk is full

Certain processes (like git gc --aggressive) take a long time to run, take up lots of disk space, and die if I run out of disk space. I'd like to pause them if the disk will run out of space soon, so I have time to free up some space. How can I do this?

Here is an initial solution I came up with. Tested with Mac OS X. Suggestions welcome!
#!/bin/bash
# Change these variables as necessary.
FILESYSTEM="/dev/disk1"
DF=/usr/local/opt/coreutils/libexec/gnubin/df
OSASCRIPT=/usr/bin/osascript
if ! [[ -x $DF ]]; then echo "Error: $DF isn't executable."; exit 1; fi
PID=$1
STOPAT=$2
# Verify input
if [[ -n ${PID//[0-9]/} ]]; then echo "Error: The first parameter should be an integer"; exit 1; fi
if [[ -n ${STOPAT//[0-9]/} ]]; then echo "Error: The second parameter should be an integer"; exit 1; fi
RED='\033[0;31m'; PURPLE='\033[0;35m'; BLUE='\033[0;36m'
NC='\033[0m' # No Color
echo -e "Will pause the following process when there are ${PURPLE}$STOPAT${NC} bytes left on ${PURPLE}$FILESYSTEM${NC}"
PROCESS=`ps -p $PID | grep $PID`
echo -e "${BLUE}$PROCESS${NC}"
# Check every second to see if FILESYSTEM has more than STOPAT bytes left.
while true; do
left=`$DF | grep -m 1 $FILESYSTEM | tr -s ' ' | cut -d" " -f4`
echo -ne "$left bytes left\r";
if [[ $left -lt $STOPAT ]]; then
MSG="pausing process...$PID";
echo $MSG;
if [[ -x $OSASCRIPT ]]; then
$OSASCRIPT -e "display notification \"$MSG\""
fi
kill -TSTP $PID
break
fi
sleep 1s
done
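The free-space check does not need GNU df. A sketch assuming a POSIX df (-P forces one line per file system, -k reports 1024-byte blocks) and a threshold given in KiB rather than bytes; avail_kb and pause_when_low are hypothetical names. A process stopped this way can be resumed with kill -CONT once space has been freed.

```shell
#!/bin/sh
# Available space, in KiB, on the file system holding the given path.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Poll once per second; stop the process when space drops below min_kb.
# Returns 0 after pausing, 1 if the process exited first.
pause_when_low() {
    pid=$1
    dir=$2
    min_kb=$3
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$(avail_kb "$dir")" -lt "$min_kb" ]; then
            kill -TSTP "$pid"
            return 0
        fi
        sleep 1
    done
    return 1
}
```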

Select text between two patterns in huge html file?

1st issue : My code is working only if grep take constant pattern like this :
echo "$s" | grep -oP '(?<=class="A3">).*(?=</a>)'
2nd issue : assigning output to a variable not working too
Here is my script :
#!/bin/sh
filename="data.txt"
Ptr_ValidChannel="><a title=\"Id: "
Ptr_ChannelNameStart="<class=\"A3\">"
Ptr_ChannelNameEnd="</a>"
while read -r line
do
case "$line" in
# working 100%
#*$Ptr_ValidChannel*) echo "$line" | grep -oP '(?<=class="A3">).*?(?=</a>)' ;;
# not working
#*$Ptr_ValidChannel*) echo $line | grep -oP '(?<=$Ptr_ChannelNameStart).*?(?=$Ptr_ChannelNameEnd)' ;;
# not working
*$Ptr_ValidChannel*) myvar=$(echo $line | grep -oP '(?<=$Ptr_ChannelNameStart).*?(?=$Ptr_ChannelNameEnd)') ;;
esac
done < "$filename"
echo $var_name
exit
To simplify things the data.txt content is :
<TD WIDTH="15%"><a title="Id: I24 NEWS" class="A3">I24 News Français</a><br /><font color="#555555"> <a title="Sporadic or full 16/9 transmission"><img src="/169.gif"></a>
In my system the command :
ls -la /bin/sh
output is :
/bin/sh -> dash
best regards.
PS. NO BASH CODE PLEASE. ONLY SH.

After reading this article: dash as bin sh, I figured out what to do to make my code work correctly and more portably:
#! /bin/sh
filename='data.txt'
Ptr_ValidChannel='><a title="Id: '
Ptr_ChannelNameStart='class="A3">'
Ptr_ChannelNameEnd='</a>'
while read -r line
do
case "$line" in
*"$Ptr_ValidChannel"*) var_name=$(printf %s "$line" | grep -oP '(?<='"$Ptr_ChannelNameStart"').*?(?='"$Ptr_ChannelNameEnd"')'); printf %s "$var_name"; printf '\n'; ;;
esac
done < "$filename"
exit
Thank you for your comments.
Best regards.
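The original attempt failed because single quotes stop the shell from expanding $Ptr_ChannelNameStart inside the grep pattern; the working version closes the quotes around each variable. Note also that grep -P is a GNU extension. A sketch of the same extraction with plain sed, assuming the channel name contains no < character; extract_channel is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: prints the text between class="A3"> and </a>
# for every input line that contains such a link.
extract_channel() {
    sed -n 's/.*class="A3">\([^<]*\)<\/a>.*/\1/p'
}

# Example line shaped like the data.txt sample.
printf '%s\n' '<TD><a title="Id: I24 NEWS" class="A3">I24 News</a><br /><a title="x"></a>' \
    | extract_channel
```

Usage on the real file would be extract_channel < data.txt.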

Parse URL in shell script

I have url like:
sftp://user@host.net/some/random/path
I want to extract user, host and path from this string. Any part can be random length.

[EDIT 2019]
This answer is not meant to be a catch-all, works-for-everything solution. It was intended to provide a simple alternative to the Python-based version, and it ended up having more features than the original.
It answered the basic question in a bash-only way and was then modified multiple times by myself to include a handful of demands by commenters. I think at this point, however, adding even more complexity would make it unmaintainable. I know not all things are straightforward (checking for a valid port, for example, requires comparing hostport and host), but I would rather not add even more complexity.
[Original answer]
Assuming your URL is passed as first parameter to the script:
#!/bin/bash
# extract the protocol
proto="$(echo $1 | grep :// | sed -e's,^\(.*://\).*,\1,g')"
# remove the protocol
url="$(echo ${1/$proto/})"
# extract the user (if any)
user="$(echo $url | grep @ | cut -d@ -f1)"
# extract the host and port
hostport="$(echo ${url/$user@/} | cut -d/ -f1)"
# by request host without port
host="$(echo $hostport | sed -e 's,:.*,,g')"
# by request - try to extract the port
port="$(echo $hostport | sed -e 's,^.*:,:,g' -e 's,.*:\([0-9]*\).*,\1,g' -e 's,[^0-9],,g')"
# extract the path (if any)
path="$(echo $url | grep / | cut -d/ -f2-)"
echo "url: $url"
echo " proto: $proto"
echo " user: $user"
echo " host: $host"
echo " port: $port"
echo " path: $path"
I must admit this is not the cleanest solution but it doesn't rely on another scripting
language like perl or python.
(Providing a solution using one of them would produce cleaner results ;) )
Using your example the results are:
url: user@host.net/some/random/path
proto: sftp://
user: user
host: host.net
port:
path: some/random/path
This will also work for URLs without a protocol/username or path.
In this case the respective variable will contain an empty string.
[EDIT]
If your bash version won't cope with the substitutions (${1/$proto/}) try this:
#!/bin/bash
# extract the protocol
proto="$(echo $1 | grep :// | sed -e's,^\(.*://\).*,\1,g')"
# remove the protocol -- updated
url=$(echo $1 | sed -e s,$proto,,g)
# extract the user (if any)
user="$(echo $url | grep @ | cut -d@ -f1)"
# extract the host and port -- updated
hostport=$(echo $url | sed -e s,$user@,,g | cut -d/ -f1)
# by request host without port
host="$(echo $hostport | sed -e 's,:.*,,g')"
# by request - try to extract the port
port="$(echo $hostport | sed -e 's,^.*:,:,g' -e 's,.*:\([0-9]*\).*,\1,g' -e 's,[^0-9],,g')"
# extract the path (if any)
path="$(echo $url | grep / | cut -d/ -f2-)"

The above, refined (added password and port parsing), and working in /bin/sh:
# extract the protocol
proto="`echo $DATABASE_URL | grep '://' | sed -e's,^\(.*://\).*,\1,g'`"
# remove the protocol
url=`echo $DATABASE_URL | sed -e s,$proto,,g`
# extract the user and password (if any)
userpass="`echo $url | grep @ | cut -d@ -f1`"
pass=`echo $userpass | grep : | cut -d: -f2`
if [ -n "$pass" ]; then
user=`echo $userpass | grep : | cut -d: -f1`
else
user=$userpass
fi
# extract the host -- updated
hostport=`echo $url | sed -e s,$userpass@,,g | cut -d/ -f1`
port=`echo $hostport | grep : | cut -d: -f2`
if [ -n "$port" ]; then
host=`echo $hostport | grep : | cut -d: -f1`
else
host=$hostport
fi
# extract the path (if any)
path="`echo $url | grep / | cut -d/ -f2-`"
Posted b/c I needed it, so I wrote it (based on @Shirkin's answer, obviously), and I figured someone else might appreciate it.

This solution in principle works the same as Adam Ryczkowski's, in this thread - but has improved regular expression based on RFC3986, (with some changes) and fixes some errors (e.g. userinfo can contain '_' character). This can also understand relative URIs (e.g. to extract query or fragment).
#!/bin/bash
# Following regex is based on https://www.rfc-editor.org/rfc/rfc3986#appendix-B with
# additional sub-expressions to split authority into userinfo, host and port
#
readonly URI_REGEX='^(([^:/?#]+):)?(//((([^:/?#]+)@)?([^:/?#]+)(:([0-9]+))?))?(/([^?#]*))(\?([^#]*))?(#(.*))?'
# Capture groups:
#  1 scheme:    2 scheme     3 //…       4 authority
#  5 userinfo@  6 userinfo   7 host      8 :…
#  9 port      10 path      11 rpath    12 ?…
# 13 query     14 #…        15 fragment
parse_scheme () {
[[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[2]}"
}
parse_authority () {
[[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[4]}"
}
parse_user () {
[[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[6]}"
}
parse_host () {
[[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[7]}"
}
parse_port () {
[[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[9]}"
}
parse_path () {
[[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[10]}"
}
parse_rpath () {
[[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[11]}"
}
parse_query () {
[[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[13]}"
}
parse_fragment () {
[[ "$@" =~ $URI_REGEX ]] && echo "${BASH_REMATCH[15]}"
}

Using Python (best tool for this job, IMHO):
#!/usr/bin/env python3
import os
from urllib.parse import urlparse
uri = os.environ['NAUTILUS_SCRIPT_CURRENT_URI']
result = urlparse(uri)
user, host = result.netloc.split('@')
path = result.path
print('user=', user)
print('host=', host)
print('path=', path)
Further reading:
os.environ
urllib.parse.urlparse()

If you really want to do it in shell, you can do something as simple as the following by using awk. This requires knowing how many fields you will actually be passed (e.g. no password sometimes and not others).
#!/bin/bash
FIELDS=($(echo "sftp://user@host.net/some/random/path" \
| awk '{split($0, arr, /[\/\@:]*/); for (x in arr) { print arr[x] }}'))
proto=${FIELDS[1]}
user=${FIELDS[2]}
host=${FIELDS[3]}
path=$(echo ${FIELDS[@]:3} | sed 's/ /\//g')
If you don't have awk and you do have grep, and you can require that each field have at least two characters and be reasonably predictable in format, then you can do:
#!/bin/bash
FIELDS=($(echo "sftp://user@host.net/some/random/path" \
| grep -o "[a-z0-9.-][a-z0-9.-]*" | tr '\n' ' '))
proto=${FIELDS[1]}
user=${FIELDS[2]}
host=${FIELDS[3]}
path=$(echo ${FIELDS[@]:3} | sed 's/ /\//g')

Just needed to do the same, so was curious if it's possible to do it in single line, and this is what i've got:
#!/bin/bash
parse_url() {
eval $(echo "$1" | sed -e "s#^\(\(.*\)://\)\?\(\([^:@]*\)\(:\(.*\)\)\?@\)\?\([^/?]*\)\(/\(.*\)\)\?#${PREFIX:-URL_}SCHEME='\2' ${PREFIX:-URL_}USER='\4' ${PREFIX:-URL_}PASSWORD='\6' ${PREFIX:-URL_}HOST='\7' ${PREFIX:-URL_}PATH='\9'#")
}
URL=${1:-"http://user:pass@example.com/path/somewhere"}
PREFIX="URL_" parse_url "$URL"
echo "$URL_SCHEME://$URL_USER:$URL_PASSWORD@$URL_HOST/$URL_PATH"
How it works:
There is that crazy sed regex that captures all the parts of url, when all of them are optional (except for the host name)
Using those capture groups sed outputs env variables names with their values for relevant parts (like URL_SCHEME or URL_USER)
eval executes that output, causing those variables to be exported and available in the script
Optionally PREFIX could be passed to control output env variables names
PS: be careful when using this for arbitrary input since this code is vulnerable to script injections.
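The injection risk comes from eval running text derived from the URL. A sketch of an eval-free variant that fills the same URL_* names using only POSIX parameter expansion; parse_url_safe is a hypothetical name, the PREFIX option is dropped, and, like the sed version, any port stays part of URL_HOST:

```shell
#!/bin/sh
# Hypothetical eval-free variant: sets URL_SCHEME, URL_USER, URL_PASSWORD,
# URL_HOST and URL_PATH from its first argument.
parse_url_safe() {
    rest=$1
    URL_SCHEME='' URL_USER='' URL_PASSWORD='' URL_HOST='' URL_PATH=''
    # Scheme is everything before the first "://".
    case $rest in
        *://*) URL_SCHEME=${rest%%://*}; rest=${rest#*://} ;;
    esac
    # Split authority from path at the first slash.
    case $rest in
        */*) URL_PATH=${rest#*/}; rest=${rest%%/*} ;;
    esac
    # Userinfo is everything before the last "@" of the authority.
    case $rest in
        *@*)
            userinfo=${rest%@*}
            rest=${rest##*@}
            case $userinfo in
                *:*) URL_USER=${userinfo%%:*}; URL_PASSWORD=${userinfo#*:} ;;
                *)   URL_USER=$userinfo ;;
            esac
            ;;
    esac
    URL_HOST=$rest
}

parse_url_safe 'http://user:pass@example.com/path/somewhere'
printf '%s\n' "$URL_SCHEME://$URL_USER:$URL_PASSWORD@$URL_HOST/$URL_PATH"
```

Because the values are only ever assigned, never executed, quotes or semicolons in the input stay literal text.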

Here's my take, loosely based on some of the existing answers, but it can also cope with GitHub SSH clone URLs:
#!/bin/bash
PROJECT_URL="git@github.com:heremaps/here-aaa-java-sdk.git"
# Extract the protocol (includes trailing "://").
PARSED_PROTO="$(echo $PROJECT_URL | sed -nr 's,^(.*://).*,\1,p')"
# Remove the protocol from the URL.
PARSED_URL="$(echo ${PROJECT_URL/$PARSED_PROTO/})"
# Extract the user (includes trailing "@").
PARSED_USER="$(echo $PARSED_URL | sed -nr 's,^(.*@).*,\1,p')"
# Remove the user from the URL.
PARSED_URL="$(echo ${PARSED_URL/$PARSED_USER/})"
# Extract the port (includes leading ":").
PARSED_PORT="$(echo $PARSED_URL | sed -nr 's,.*(:[0-9]+).*,\1,p')"
# Remove the port from the URL.
PARSED_URL="$(echo ${PARSED_URL/$PARSED_PORT/})"
# Extract the path (includes leading "/" or ":").
PARSED_PATH="$(echo $PARSED_URL | sed -nr 's,[^/:]*([/:].*),\1,p')"
# Remove the path from the URL.
PARSED_HOST="$(echo ${PARSED_URL/$PARSED_PATH/})"
echo "proto: $PARSED_PROTO"
echo "user: $PARSED_USER"
echo "host: $PARSED_HOST"
echo "port: $PARSED_PORT"
echo "path: $PARSED_PATH"
which gives
proto:
user: git@
host: github.com
port:
path: :heremaps/here-aaa-java-sdk.git
And for PROJECT_URL="ssh://sschuberth@git.eclipse.org:29418/jgit/jgit" you get
proto: ssh://
user: sschuberth@
host: git.eclipse.org
port: :29418
path: /jgit/jgit

You can use bash string manipulation. It is easy to learn, so try it if you find regex difficult. As the URI comes from NAUTILUS_SCRIPT_CURRENT_URI, I guess it may contain a port, so I kept that optional too.
#!/bin/bash
#You can also use environment variable $NAUTILUS_SCRIPT_CURRENT_URI
X="sftp://user@host.net/some/random/path"
tmp=${X#*//};usr=${tmp%@*}
tmp=${X#*@};host=${tmp%%/*};[[ ${X#*://} == *":"* ]] && host=${host%:*}
tmp=${X#*//};path=${tmp#*/}
proto=${X%%:*}
[[ ${X#*://} == *":"* ]] && tmp=${X##*:} && port=${tmp%%/*}
echo "Protocol:"$proto" User:"$usr" Host:"$host" Port:"$port" Path:"$path

I don't have enough reputation to comment, but I made a small modification to @patryk-obara's answer.
RFC3986 § 6.2.3. Scheme-Based Normalization
treats
http://example.com
http://example.com/
as equivalent. But I found that his regex did not match a URL like http://example.com. http://example.com/ (with the trailing slash) does match.
I inserted group 11, which changed / to (/|$). This matches either / or the end of the string. Now http://example.com does match.
readonly URI_REGEX='^(([^:/?#]+):)?(//((([^:/?#]+)@)?([^:/?#]+)(:([0-9]+))?))?((/|$)([^?#]*))(\?([^#]*))?(#(.*))?$'
# Capture groups:
#  1 scheme:    2 scheme     3 //...    4 authority
#  5 userinfo@  6 userinfo   7 host     8 :...
#  9 port      10 path      11 / or end-of-string
# 12 rpath     13 ?...      14 query   15 #...      16 fragment

If you have access to Bash >= 3.0 you can do this in pure bash as well, thanks to the re-match operator =~:
pattern='^(([[:alnum:]]+)://)?(([[:alnum:]]+)@)?([^:^@]+)(:([[:digit:]]+))?$'
if [[ "http://us@cos.com:3142" =~ $pattern ]]; then
proto=${BASH_REMATCH[2]}
user=${BASH_REMATCH[4]}
host=${BASH_REMATCH[5]}
port=${BASH_REMATCH[7]}
fi
It should be faster and less resource-hungry than all the previous examples, because no external process is spawned.

A simplistic approach to get just the domain from the full URL:
echo https://stackoverflow.com/questions/6174220/parse-url-in-shell-script | cut -d/ -f1-3
# OUTPUT>>> https://stackoverflow.com
Get only the path:
echo https://stackoverflow.com/questions/6174220/parse-url-in-shell-script | cut -d/ -f4-
# OUTPUT>>> questions/6174220/parse-url-in-shell-script
Not perfect, as the second command strips the preceding slash so you'll need to prepend it by hand.
An awk-based approach for getting just the path without the domain:
echo https://stackoverflow.com/questions/6174220/parse-url-in-shell-script/59971653 | awk -F"/" '{ for (i=4; i<=NF; i++) printf"/%s", $i }'
# OUTPUT>>> /questions/6174220/parse-url-in-shell-script/59971653

I did further parsing, expanding the solution given by @Shirkrin:
#!/bin/bash
parse_url() {
local query1 query2 path1 path2
# extract the protocol
proto="$(echo $1 | grep :// | sed -e's,^\(.*://\).*,\1,g')"
if [[ ! -z $proto ]] ; then
# remove the protocol
url="$(echo ${1/$proto/})"
# extract the user (if any)
login="$(echo $url | grep @ | cut -d@ -f1)"
# extract the host
host="$(echo ${url/$login@/} | cut -d/ -f1)"
# by request - try to extract the port
port="$(echo $host | sed -e 's,^.*:,:,g' -e 's,.*:\([0-9]*\).*,\1,g' -e 's,[^0-9],,g')"
# extract the uri (if any)
resource="/$(echo $url | grep / | cut -d/ -f2-)"
else
url=""
login=""
host=""
port=""
resource=$1
fi
# extract the path (if any)
path1="$(echo $resource | grep ? | cut -d? -f1 )"
path2="$(echo $resource | grep \# | cut -d# -f1 )"
path=$path1
if [[ -z $path ]] ; then path=$path2 ; fi
if [[ -z $path ]] ; then path=$resource ; fi
# extract the query (if any)
query1="$(echo $resource | grep ? | cut -d? -f2-)"
query2="$(echo $query1 | grep \# | cut -d\# -f1 )"
query=$query2
if [[ -z $query ]] ; then query=$query1 ; fi
# extract the fragment (if any)
fragment="$(echo $resource | grep \# | cut -d\# -f2 )"
echo "url: $url"
echo " proto: $proto"
echo " login: $login"
echo " host: $host"
echo " port: $port"
echo "resource: $resource"
echo " path: $path"
echo " query: $query"
echo "fragment: $fragment"
echo ""
}
parse_url "http://login:password@example.com:8080/one/more/dir/file.exe?a=sth&b=sth#anchor_fragment"
parse_url "https://example.com/one/more/dir/file.exe#anchor_fragment"
parse_url "http://login:password@example.com:8080/one/more/dir/file.exe#anchor_fragment"
parse_url "ftp://user@example.com:8080/one/more/dir/file.exe?a=sth&b=sth"
parse_url "/one/more/dir/file.exe"
parse_url "file.exe"
parse_url "file.exe#anchor"

I did not like the methods above and wrote my own. It is for ftp links; just replace ftp with http if you need it.
The first line is a small validation of the link, which should look like ftp://user:pass@host.com/path/to/something.
if ! echo "$url" | grep -q '^[[:blank:]]*ftp://[[:alnum:]]\+:[[:alnum:]]\+@[[:alnum:]\.]\+/.*[[:blank:]]*$'; then return 1; fi
login=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\1|' )
pass=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\2|' )
host=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\3|' )
dir=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\4|' )
My actual goal was to check ftp access by url. Here is the full result:
#!/bin/bash
test_ftp_url() # lftp may hang on some ftp problems, like no connection
{
local url="$1"
if ! echo "$url" | grep -q '^[[:blank:]]*ftp://[[:alnum:]]\+:[[:alnum:]]\+@[[:alnum:]\.]\+/.*[[:blank:]]*$'; then return 1; fi
local login=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\1|' )
local pass=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\2|' )
local host=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\3|' )
local dir=$( echo "$url" | sed 's|[[:blank:]]*ftp://\([^:]\+\):\([^@]\+\)@\([^/]\+\)\(/.*\)[[:blank:]]*|\4|' )
exec 3>&2 2>/dev/null
exec 6<>"/dev/tcp/$host/21" || { exec 2>&3 3>&-; echo 'Bash network support is disabled. Skipping ftp check.'; return 0; }
read <&6
if ! echo "${REPLY//$'\r'}" | grep -q '^220'; then exec 2>&3 3>&- 6>&-; return 3; fi # 220 vsFTPd 3.0.2+ (ext.1) ready...
echo -e "USER $login\r" >&6; read <&6
if ! echo "${REPLY//$'\r'}" | grep -q '^331'; then exec 2>&3 3>&- 6>&-; return 4; fi # 331 Please specify the password.
echo -e "PASS $pass\r" >&6; read <&6
if ! echo "${REPLY//$'\r'}" | grep -q '^230'; then exec 2>&3 3>&- 6>&-; return 5; fi # 230 Login successful.
echo -e "CWD $dir\r" >&6; read <&6
if ! echo "${REPLY//$'\r'}" | grep -q '^250'; then exec 2>&3 3>&- 6>&-; return 6; fi # 250 Directory successfully changed.
echo -e "QUIT\r" >&6
exec 2>&3 3>&- 6>&-
return 0
}
test_ftp_url 'ftp://fz223free:fz223free@ftp.zakupki.gov.ru/out/nsi/nsiProtocol/daily'
echo "$?"

I found Adam Ryczkowski's answers helpful. The original solution did not handle /path in URL, so I enhanced it a little bit.
pattern='^(([[:alnum:]]+):\/\/)?(([[:alnum:]]+)@)?([^:^@\/]+)(:([[:digit:]]+))?(\/?[^:^@]*)$'
url="http://us@cos.com:3142/path"
if [[ "$url" =~ $pattern ]]; then
proto=${BASH_REMATCH[2]}
user=${BASH_REMATCH[4]}
host=${BASH_REMATCH[5]}
port=${BASH_REMATCH[7]}
path=${BASH_REMATCH[8]}
echo "proto: $proto"
echo "user: $user"
echo "host: $host"
echo "port: $port"
echo "path= $path"
else
echo "URL did not match pattern: $url"
fi
The pattern is complex, so please use this site to understand it better: https://regex101.com/
I tested it with a bunch of URLs. However, if there are any issues, please let me know.

If you have access to Node.js:
export MY_URI=sftp://user@host.net/some/random/path
node -e "console.log(url.parse(process.env.MY_URI).auth)"
node -e "console.log(url.parse(process.env.MY_URI).host)"
node -e "console.log(url.parse(process.env.MY_URI).path)"
This will output:
user
host.net
/some/random/path

Here's a pure bash url parser. It supports git ssh clone style URLs as well as standard proto:// ones. The example ignores protocol, auths, and port but you can modify to collect as needed... I used regex101 for handy testing: https://regex101.com/r/5QyNI5/1
TEST_URLS=(
https://github.com/briceburg/tools.git
https://foo:12333@github.com:8080/briceburg/tools.git
git@github.com:briceburg/tools.git
https://me@gmail.com:12345@my.site.com:443/p/a/t/h
)
for url in "${TEST_URLS[@]}"; do
without_proto="${url#*:\/\/}"
without_auth="${without_proto##*@}"
[[ $without_auth =~ ^([^:\/]+)(:[[:digit:]]+\/|:|\/)?(.*) ]]
PROJECT_HOST="${BASH_REMATCH[1]}"
PROJECT_PATH="${BASH_REMATCH[3]}"
echo "given: $url"
echo " -> host: $PROJECT_HOST path: $PROJECT_PATH"
done
results in:
given: https://github.com/briceburg/tools.git
-> host: github.com path: briceburg/tools.git
given: https://foo:12333@github.com:8080/briceburg/tools.git
-> host: github.com path: briceburg/tools.git
given: git@github.com:briceburg/tools.git
-> host: github.com path: briceburg/tools.git
given: https://me@gmail.com:12345@my.site.com:443/p/a/t/h
-> host: my.site.com path: p/a/t/h
