Select text between two patterns in huge html file? - grep

1st issue: my code only works if grep takes a constant pattern, like this:
echo "$s" | grep -oP '(?<=class="A3">).*(?=</a>)'
2nd issue: assigning the output to a variable does not work either.
Here is my script:
#!/bin/sh
filename="data.txt"
Ptr_ValidChannel="><a title=\"Id: "
Ptr_ChannelNameStart="<class=\"A3\">"
Ptr_ChannelNameEnd="</a>"
while read -r line
do
case "$line" in
# working 100%
#*$Ptr_ValidChannel*) echo "$line" | grep -oP '(?<=class="A3">).*?(?=</a>)' ;;
# not working
#*$Ptr_ValidChannel*) echo $line | grep -oP '(?<=$Ptr_ChannelNameStart).*?(?=$Ptr_ChannelNameEnd)' ;;
# not working
*$Ptr_ValidChannel*) myvar=$(echo $line | grep -oP '(?<=$Ptr_ChannelNameStart).*?(?=$Ptr_ChannelNameEnd)') ;;
esac
done < "$filename"
echo $var_name
exit
To simplify things, the content of data.txt is:
<TD WIDTH="15%"><a title="Id: I24 NEWS" class="A3">I24 News Français</a><br /><font color="#555555"> <a title="Sporadic or full 16/9 transmission"><img src="/169.gif"></a>
On my system the command:
ls -la /bin/sh
output is :
/bin/sh -> dash
best regards.
PS. NO BASH CODE PLEASE. ONLY SH.

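After reading the article "dash as bin sh", I figured out how to make my code work correctly and be more portable. Inside single quotes the shell does not expand $Ptr_ChannelNameStart, so grep received the variable name literally; the version below closes the single quotes around each variable and expands it in double quotes: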
#! /bin/sh
filename='data.txt'
Ptr_ValidChannel='><a title="Id: '
Ptr_ChannelNameStart='class="A3">'
Ptr_ChannelNameEnd='</a>'
while read -r line
do
case "$line" in
*"$Ptr_ValidChannel"*) var_name=$(printf %s "$line" | grep -oP '(?<='"$Ptr_ChannelNameStart"').*?(?='"$Ptr_ChannelNameEnd"')'); printf %s "$var_name"; printf '\n'; ;;
esac
done < "$filename"
exit
Thank you for your comments.
Best regards.
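Where grep -P is not available (it is a GNU extension), the same extraction can be sketched with POSIX parameter expansion alone, which dash supports. This is a swapped-in technique rather than the approach from the post; extract_between is a hypothetical helper name, the sample line is made up, and only the first match on a line is returned.

```shell
# extract_between LINE START END
# Hypothetical helper: prints the text between the first START and the
# next END in LINE; returns 1 if either marker is missing.
extract_between() {
    eb_line=$1 eb_start=$2 eb_end=$3
    case $eb_line in
        *"$eb_start"*) ;;
        *) return 1 ;;
    esac
    # Remove the shortest prefix that ends with START.
    eb_rest=${eb_line#*"$eb_start"}
    case $eb_rest in
        *"$eb_end"*) ;;
        *) return 1 ;;
    esac
    # Remove the longest suffix that begins with END, i.e. cut at the first END.
    eb_result=${eb_rest%%"$eb_end"*}
    printf '%s\n' "$eb_result"
}

# Prints: Some Channel
extract_between '<a title="Id: X" class="A3">Some Channel</a><br />' 'class="A3">' '</a>'
```

Because no external command is started per line, this may also be noticeably faster than piping every line through grep on a huge file.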

Related

Print file name if grep finds multiple occurrences of a string in file, else exit on failure

File1 contains
hello
hello
I need to write a grep command to print the filename if this file contains more than one "hello". Otherwise, I need grep to exit on failure.
So far I have
grep -c "hello" File1 | grep -v :0
but it outputs
2
How do I get the desired output, which should be either the filename File1 or no output at all? (From what I understand, no match means a non-zero exit code for grep.)
With GNU grep, using -z:
grep -lz 'hello.*hello' file
e.g.:
$ seq 15 | grep -lz '3.*3'
(standard input)
$ echo $?
0
$ seq 5 | grep -lz '3.*3'
$ echo $?
1
Like this:
#!/bin/bash
count=$(grep -c "hello" "$1")
if ((count > 1)); then
echo "$1"
else
exit 1
fi
Usage:
chmod +x script.sh
./script.sh File1
Explanations:
((...)) is an arithmetic command, which returns an exit status of 0 if the expression is nonzero, or 1 if the expression is zero. Also used as a synonym for "let", if side effects (assignments) are needed. See http://mywiki.wooledge.org/ArithmeticExpression
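On systems where /bin/sh is not bash, roughly the same check can be sketched in POSIX sh. print_if_multiple is a hypothetical function name; note that grep -c counts matching lines, not individual occurrences, so two matches on the same line count once.

```shell
# print_if_multiple PATTERN FILE
# Prints FILE and returns 0 if more than one line matches PATTERN,
# otherwise prints nothing and returns 1.
print_if_multiple() {
    # grep -c exits non-zero when the count is 0 or the file is unreadable.
    pim_count=$(grep -c -e "$1" -- "$2") || return 1
    [ "$pim_count" -gt 1 ] || return 1
    printf '%s\n' "$2"
}
```

Usage would be along the lines of: print_if_multiple hello File1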
Using perl in a shell:
perl -0 -le '
my $filename = $ARGV[0];
print $filename if grep { /hello\nhello/ } <>
' file

RedHat Memory Used High

Looking for some help if you will..
I have a virtual machine on RedHat 6.5 with 32gb memory.
A free is showing 24.6gb used, 8.2gb free. Only 418mb is cached, 1.8gb buffers.
Executed a top and sorted by virtual used, and I can only account for about 6gb of that 24.6gb used.
A "ps aux" doesn't show any processes that could be taking the memory.
I am flummoxed and looking for some advice on where I can look to see what's taking the memory.
Any help would be appreciated.
The Bash script below will help you figure out how much memory each application is consuming.
#!/bin/bash
# Make sure only root can run our script
if [ "$(id -u)" != "0" ]; then
echo "This script must be run as root" 1>&2
exit 1
fi
### Functions
#This function will count memory statistic for passed PID
get_process_mem ()
{
PID=$1
#we need to check if 2 files exist
if [ -f /proc/$PID/status ];
then
if [ -f /proc/$PID/smaps ];
then
#here we count memory usage, Pss, Private and Shared = Pss-Private
Pss=`cat /proc/$PID/smaps | grep -e "^Pss:" | awk '{print $2}'| paste -sd+ | bc `
Private=`cat /proc/$PID/smaps | grep -e "^Private" | awk '{print $2}'| paste -sd+ | bc `
#we need to be sure that we count Pss and Private memory, to avoid errors
if [ x"$Pss" != "x" -a x"$Private" != "x" ];
then
let Shared=${Pss}-${Private}
Name=`cat /proc/$PID/status | grep -e "^Name:" |cut -d':' -f2`
#we keep all results in bytes
let Shared=${Shared}*1024
let Private=${Private}*1024
let Sum=${Shared}+${Private}
echo -e "$Private + $Shared = $Sum \t $Name"
fi
fi
fi
}
#this function make conversion from bytes to Kb or Mb or Gb
convert()
{
value=$1
power=0
#if value 0, we make it like 0.00
if [ "$value" = "0" ];
then
value="0.00"
fi
#We make conversion till value bigger than 1024, and if yes we divide by 1024
while [ $(echo "${value} > 1024"|bc) -eq 1 ]
do
value=$(echo "scale=2;${value}/1024" |bc)
let power=$power+1
done
#this part get b,kb,mb or gb according to number of divisions
case $power in
0) reg=b;;
1) reg=kb;;
2) reg=mb;;
3) reg=gb;;
esac
echo -n "${value} ${reg} "
}
#to ensure that temp files not exist
[[ -f /tmp/res ]] && rm -f /tmp/res
[[ -f /tmp/res2 ]] && rm -f /tmp/res2
[[ -f /tmp/res3 ]] && rm -f /tmp/res3
#if argument passed script will show statistic only for that pid, if not, we list all processes in /proc/
#and get statistic for all of them, all result we store in file /tmp/res
if [ $# -eq 0 ]
then
pids=`ls /proc | grep -e '[0-9]' | grep -v '[A-Za-z]'`
for i in $pids
do
get_process_mem $i >> /tmp/res
done
else
get_process_mem $1>> /tmp/res
fi
#This will sort result by memory usage
cat /tmp/res | sort -gr -k 5 > /tmp/res2
#this part will get uniq names from process list, and we will add all lines with same process list
#we will count nomber of processes with same name, so if more that 1 process where will be
# process(2) in output
for Name in `cat /tmp/res2 | awk '{print $6}' | sort | uniq`
do
count=`cat /tmp/res2 | awk -v src=$Name '{if ($6==src) {print $6}}'|wc -l| awk '{print $1}'`
if [ $count = "1" ];
then
count=""
else
count="(${count})"
fi
VmSizeKB=`cat /tmp/res2 | awk -v src=$Name '{if ($6==src) {print $1}}' | paste -sd+ | bc`
VmRssKB=`cat /tmp/res2 | awk -v src=$Name '{if ($6==src) {print $3}}' | paste -sd+ | bc`
total=`cat /tmp/res2 | awk '{print $5}' | paste -sd+ | bc`
Sum=`echo "${VmRssKB}+${VmSizeKB}"|bc`
#all result stored in /tmp/res3 file
echo -e "$VmSizeKB + $VmRssKB = $Sum \t ${Name}${count}" >>/tmp/res3
done
#this make sort once more.
cat /tmp/res3 | sort -gr -k 5 | uniq > /tmp/res
#now we print result , first header
echo -e "Private \t + \t Shared \t = \t RAM used \t Program"
#after we read line by line of temp file
while read line
do
echo $line | while read a b c d e f
do
#we print all processes if Ram used if not 0
if [ $e != "0" ]; then
#here we use function that make conversion
echo -en "`convert $a` \t $b \t `convert $c` \t $d \t `convert $e` \t $f"
echo ""
fi
done
done < /tmp/res
#this part print footer, with counted Ram usage
echo "--------------------------------------------------------"
echo -e "\t\t\t\t\t\t `convert $total`"
echo "========================================================"
# we clean temporary file
[[ -f /tmp/res ]] && rm -f /tmp/res
[[ -f /tmp/res2 ]] && rm -f /tmp/res2
[[ -f /tmp/res3 ]] && rm -f /tmp/res3
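The cat | grep | awk | paste | bc pipelines that total Pss and Private in get_process_mem could probably be collapsed into a single awk call. A minimal sketch, assuming the usual smaps layout of one "Field: value kB" entry per line; sum_smaps_field is a hypothetical helper and not part of the script above.

```shell
# sum_smaps_field PREFIX FILE
# Adds up the second column (kB) of every line in FILE whose first
# column starts with PREFIX, e.g. "Pss:" or "Private".
sum_smaps_field() {
    awk -v prefix="$1" '
        index($1, prefix) == 1 { total += $2 }   # first field begins with PREFIX
        END { print total + 0 }                  # print 0 when nothing matched
    ' "$2"
}

# Example calls (reading another process's smaps usually needs root):
# sum_smaps_field Pss: /proc/self/smaps
# sum_smaps_field Private /proc/self/smaps
```

Printing total + 0 means an unreadable or empty file yields 0 rather than an empty string, which avoids the empty-value check the script performs before doing arithmetic.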
I am going to take a wild stab at this; without access to the machine or additional information, troubleshooting will be difficult.
On some systems /tmp is mounted as tmpfs, which means its contents live in memory rather than on disk (/dev/shm is another mount of this kind). If that is the case on your machine, check the usage of that directory and you may see where your memory is being consumed. ( df -h /tmp to see whether it is a tmpfs mount, du -sh /tmp for its usage )

trying to grep '--string' fails

I'm trying to grep for a string that starts with "--".
For some reason it is treated as a special character, and even when I use -F, grep gives me a syntax error:
[root@pc-01 /]# grep -F --restore .
-bash: --restore: command not found
any tips?
Thanks.
Try the following:
grep -F -- --restore filename
You can escape the first - :
Without escaping:
[root@TIAGO-TEST2 tmp]# echo '--aa --bb --cc' | grep -o '--b'
grep: option '--b' is ambiguous; possibilities: '--basic-regexp' '--binary' '--byte-offset' '--binary-files' '--before-context'
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
Escaping:
[root@TIAGO-TEST2 tmp]# echo '--aa --bb --cc' | grep -o '\--b'
--b
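Another option is -e, which POSIX grep provides to mark the next argument as the pattern even when it begins with a dash. A small sketch; grep_literal is a hypothetical wrapper name.

```shell
# grep_literal STRING FILE
# Searches FILE for STRING as a fixed string, even if STRING starts with "-".
grep_literal() {
    # -e marks "$1" as the pattern; "--" ends option parsing before the file name.
    grep -F -e "$1" -- "$2"
}
```

It would be called as: grep_literal --restore filename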

xargs: String concatenation

zgrep -i XXX XXX | grep -o "RID=[0-9|A-Z]*" |
uniq | cut -d "=" -f2 |
xargs -0 -I string echo "RequestID="string
My output is
RequestID=121212112
8127127128
8129129812
But my requirement is to have the request ID prefixed before all the output.
Any help is appreciated
I had a similar task and this worked for me. It might be what you are looking for:
zgrep -i XXX XXX | grep -o "RID=[0-9|A-Z]*" |
uniq | cut -d "=" -f2 |
xargs -I {} echo "RequestID="{}
Try the -n option of xargs.
-n max-args
Use at most max-args arguments per command line. Fewer than max-args arguments will be used if the size (see the -s option) is exceeded, unless the -x option is given, in which case xargs will exit.
Example:
$ echo -e '1\n2' | xargs echo 'str ='
str = 1 2
$ echo -e '1\n2' | xargs -n 1 echo 'str ='
str = 1
str = 2
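The likely reason the original command prefixes only the first line is -0: it makes xargs split its input on NUL bytes, and since the input contains none, all the IDs arrive as one multi-line item. A sketch of the per-line behaviour without -0; prefix_ids is a hypothetical function name and the sample IDs are the ones from the question.

```shell
# prefix_ids: reads one ID per line on stdin and prints each with a prefix.
prefix_ids() {
    # -I {} runs echo once per input line, replacing {} with that line.
    xargs -I {} echo "RequestID={}"
}

printf '%s\n' 121212112 8127127128 8129129812 | prefix_ids
# RequestID=121212112
# RequestID=8127127128
# RequestID=8129129812
```

Since xargs starts one echo per line here, sed 's/^/RequestID=/' would give the same output with a single process if the list is long.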

grep multiple extension current and subfolders

I'm trying to grep multiple extensions within the current and all sub-folders.
grep -i -r -n 'hello' somepath/*.{php,html}
This is only grepping the current folder but not sub-folders.
What would be a good way of doing this?
Using only grep:
grep -irn --include='*.php' --include='*.html' 'hello' somepath/
One of these:
find somepath '(' -name '*.php' -o -name '*.html' ')' -exec grep -i -n hello {} +
find somepath '(' -name '*.php' -o -name '*.html' ')' -print0 | xargs -0 grep -i -n hello
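The glob in the question does not recurse because the shell expands somepath/*.{php,html} to the files directly inside somepath before grep runs, so -r never receives a directory to descend into. A sketch of the find variant wrapped in a function; search_php_html is a hypothetical name, and /dev/null is passed as an extra file so that grep prints file names even when find hands it a single file.

```shell
# search_php_html TEXT DIR
# Case-insensitively searches *.php and *.html files under DIR for TEXT
# and prints matches as file:line:text.
search_php_html() {
    find "$2" -type f '(' -name '*.php' -o -name '*.html' ')' \
        -exec grep -i -n -e "$1" /dev/null {} +
}
```

For example: search_php_html hello somepath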
I was looking for the same thing, and when I decided to write a bash script for it, I started with vim codesearch and, surprise, I had already done this before!
#!/bin/bash
context="$3"
#sl = selected lines, mc = matches in context lines, ms = matches in selected lines, ln = line numbers
export GREP_COLORS="sl=32:mc=00;33:ms=05;40;31:ln="
if [[ "$context" == "" ]]; then context=5; fi
grep --color=always -n -a -R -i -C"$context" --exclude='*.mp*'\
--exclude='*.avi'\
--exclude='*.flv'\
--exclude='*.png'\
--exclude='*.gif'\
--exclude='*.jpg'\
--exclude='*.wav'\
--exclude='*.rar'\
--exclude='*.zip'\
--exclude='*.gz'\
--exclude='*.sql' "$2" "$1" | less -R
Paste this code into a file named codesearch and chmod it to 700 or 770.
I guess it is better kept here for the next time I forget.
The script shows the matches, and the context around them, in color:
./codesearch '/full/path' 'string to search'
Optionally, pass the number of context lines to show around each match (default 5):
./codesearch '/full/path' 'string to search' 3
I edited the code and added some eye candy.
Example: ./codesearch ./ 'eval' 2
Looks like this when you have enabled "allow blinking text" in terminal

Resources