grep regex, replace specific find in text file - grep

I'm trying to automatically replace a copyright string in files. The string has the following format:
"Copyright (C) 2004-2008 by"
but the years can differ. I want to find these lines in all files and replace the last year with the current one.
grep -r ' * Copyright (C) [0-9]\{4\}-[0-9]\{4\} by.' *
Now how can I replace the last group found with 2013? (Want to use from pipe)

grep doesn't do replacements. You can try sed, e.g.:
sed 's/Copyright (C) \([0-9]\{4\}\)-[0-9]\{4\} by/Copyright (C) \1-2013 by/'
or as Kent notes:
sed 's/\(Copyright (C) [0-9]\{4\}\)-[0-9]\{4\} by/\1-2013 by/'
or ssed:
ssed -R 's/(?<=Copyright \(C\) )([0-9]{4})-[0-9]{4}(?= by)/\1-2013/'

This is what I came up with to answer your question:
sed -i 's/Copyright (C) \([0-9]\{4\}\)-[0-9]\{4\} by/Copyright (C) \1-2013 by/' `find -type f`
I'm using the sed expression proposed by @Lev, but acting on the files directly. I added -i to save the changes to each file, and a "find" at the end of the command line so that sed processes files recursively.
Please note the different types of quotes used. The first two are single quotes, but the last two are backquotes, which make the shell run the "find" command and pass its output as arguments to "sed".
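A minimal sketch of a variation on the command above, assuming GNU find and GNU sed: it takes the year from date instead of hard-coding 2013, and lets find hand the file names to sed so that names containing spaces are not split by the shell. The scratch directory, the file name and the author name are made up for the demonstration.

```shell
# Scratch directory so no real files are touched (hypothetical demo data).
demo_dir=$(mktemp -d)
printf ' * Copyright (C) 2004-2008 by Example Author\n' > "$demo_dir/a file.c"

year=$(date +%Y)   # current year instead of a hard-coded 2013

# find passes the file names straight to sed, so names with spaces survive.
# "-i" with no suffix is the GNU sed spelling; BSD/macOS sed expects -i ''.
find "$demo_dir" -type f -exec \
  sed -i "s/\(Copyright (C) [0-9]\{4\}\)-[0-9]\{4\} by/\1-$year by/" {} +

result=$(cat "$demo_dir/a file.c")
printf '%s\n' "$result"
rm -r "$demo_dir"
```

The sed script is in double quotes here so that $year is expanded by the shell; the backslashes in front of the parentheses and braces are still passed through to sed unchanged.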

Related

How to grep a matching filename AND extension from pattern file to a text file?

Content of testfile.txt
/path1/abc.txt
/path2/abc.txt.1
/path3/abc.txt123
Content of pattern.txt
abc.txt$
Bash Command
grep -i -f pattern.txt testfile.txt
Output:
/path1/abc.txt
This is a working solution, but currently the $ in the pattern is added manually to each line, and the edited pattern file is uploaded to users. I am trying to avoid the manual amendment.
An alternative is to loop and read the file line by line, but that requires scripting skills or uploading scripts to the user environment.
I want to keep the original pattern files in an audited environment, so that users just log in and run simple cut-and-paste commands.
Is there a one-liner solution?
You can use sed to add $ to pattern.txt and then use grep, but you might run into issues due to regexp metacharacters like the . character. For example, abc.txt$ will also match abc1txt. And unless you take care of matching only the basename from the file path, abc.txt$ will also match /some/path/foobazabc.txt.
I'd suggest to use awk instead:
$ awk '!f{a[$0]; next} $NF in a' pattern.txt f=1 FS='/' testfile.txt
/path1/abc.txt
pattern.txt f=1 FS='/' testfile.txt here a flag f is set between the two files and field separator is also changed to / for the second file
!f{a[$0]; next} if flag f is not set (i.e. for the first file), build an array a with line contents as the key
$NF in a for the second file, if the last field matches a key in array a, print the line
Just noticed that you are also using -i option, so use this for case insensitive matching:
awk '!f{a[tolower($0)]; next} tolower($NF) in a'
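As a rough sketch of the same idea, the flag can be replaced by the common NR == FNR test, which is true only while awk reads the first file. The sample files below are recreated in a scratch directory, and pattern.txt is assumed to hold the bare file name without a trailing $; the array name names is arbitrary.

```shell
work=$(mktemp -d)
printf '%s\n' /path1/abc.txt /path2/abc.txt.1 /path3/abc.txt123 > "$work/testfile.txt"
printf '%s\n' 'ABC.txt' > "$work/pattern.txt"   # assumed: no trailing $

# First file: remember each lower-cased line as an array key.
# Second file: print lines whose last /-separated field is one of those keys.
matches=$(awk -F/ 'NR == FNR { names[tolower($0)]; next }
                   tolower($NF) in names' "$work/pattern.txt" "$work/testfile.txt")
printf '%s\n' "$matches"
rm -r "$work"
```

Setting -F/ for both files is harmless here because only $0 is used for the pattern file. The NR == FNR test misbehaves if the first file is empty, which the flag-based version above does not.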
Since pattern.txt contains only a single pattern, and you don't want to change it, since it is an audited file, you could do
grep -i -e "$(<pattern.txt)"'$' testfile.txt
instead. Note that this would break, if the maintainer of the file one day decided to actually write there a terminating $.
IMO, it would make more sense to explain to the maintainer of pattern.txt that he is supposed to place there a simple regular expression, which is going to match your testfile. In this case s/he can decide whether the pattern really should match only the right edge or some inner part of the lines.
If pattern.txt contains more than one line, and you want to add the $ to each line, you can likewise do a
grep -i -f <(sed 's/$/$/' <pattern.txt) testfile.txt
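The sed step can also deal with the two caveats raised earlier (the . metacharacter and matching only the basename). The following is a sketch under the assumption that pattern.txt holds plain file names, one per line; the extra sample paths and the anchored.txt file name are invented for the demonstration.

```shell
work=$(mktemp -d)
printf '%s\n' /path1/abc.txt /path2/abc.txt.1 /path3/abc1txt /path4/xabc.txt > "$work/testfile.txt"
printf '%s\n' 'abc.txt' > "$work/pattern.txt"

# 1) backslash-escape basic-regex metacharacters,
# 2) require a "/" directly before the name,
# 3) append "$" so the name must end the line.
sed -e 's/[][\.*^$]/\\&/g' -e 's|^|/|' -e 's/$/$/' "$work/pattern.txt" > "$work/anchored.txt"

anchored=$(cat "$work/anchored.txt")
matches=$(grep -i -f "$work/anchored.txt" "$work/testfile.txt")
printf '%s\n' "$anchored" "$matches"
rm -r "$work"
```

The original pattern.txt is only read, never modified; the derived patterns go to a separate temporary file.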
Since the '$' symbol marks the end of the pattern, the following script should work.
#!/bin/bash
file_pattern='pattern.txt' # path to pattern file
file_test='testfile.txt' # path to test file
while IFS=$ read -r line
do
echo "$line"
grep -wn "$line" "$file_test"
done < "$file_pattern"
You can remove the IFS descriptor if the pattern file comes with leading/trailing spaces.
Also, the grep option -w matches only whole words, and -n prints the line number.

Formatting text in Adb Shell

I was doing some adb shell work on Windows and got stuck at one point. Here's what I was doing.
I was printing all installed apps on my phone and getting their exact path.
zeroltetmo:/ # pm list packages -f
package:/system/app/FilterProvider/FilterProvider.apk=com.samsung.android.provider.filterprovider
package:/system/priv-app/CtsShimPrivPrebuilt/CtsShimPrivPrebuilt.apk=com.android.cts.priv.ctsshim
package:/system/app/YouTube/Youtube.apk=com.google.android.youtube
package:/system/app/vsimservice/vsimservice.apk=com.sec.vsimservice
package:/system/priv-app/WallpaperCropper/WallpaperCropper.apk=com.android.wallpapercropper
package:/system/framework/framework-res.apk=android
package:/system/framework/samsung-framework-res/samsung-framework-res.apk=com.samsung.android.framework.res
package:/data/app/com.whatsapp-1/base.apk=com.whatsapp
package:/data/app/ru.meefik.busybox-2/base.apk=ru.meefik.busybox
package:/data/app/com.google.android.play.games-1/base.apk=com.google.android.play.games
But,
I want this to print only the system/app directories, and only up to the folder name instead of the full path. What I'm doing is piping this to grep and using this pattern to get the result.
zeroltetmo:/ # pm list packages -f | grep -o "system/app.*\/"
system/app/FilterProvider/
system/app/RootPA/
system/app/YouTube/
system/app/ClipboardSaveService/
system/app/TetheringAutomation/
system/app/GoogleExtShared/
system/app/WfdBroker/
system/app/vsimservice/
system/app/USBSettings/
system/app/EasyOneHand3/
But the problem is this / at the end of folder name that I'm stuck with.
You can filter the trailing slashes out with sed like that:
pm list packages -f | grep -o "system/app.*/" | sed 's,/$,,'
Explanation of the sed command:
s stands for substitution
, delimits the command name from its arguments; it's easier to use a delimiter other than / when we want to replace /
/$ - string to be replaced. In this case it means slash at the end of the line
The string to replace /$ with is empty because we want to remove it.
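The grep and sed steps can probably be merged into a single sed call. This is only a sketch, run here against a few of the sample lines from the question rather than on a device; whether the sed shipped in a given adb shell accepts it is an assumption.

```shell
sample='package:/system/app/FilterProvider/FilterProvider.apk=com.samsung.android.provider.filterprovider
package:/system/priv-app/CtsShimPrivPrebuilt/CtsShimPrivPrebuilt.apk=com.android.cts.priv.ctsshim
package:/system/app/YouTube/Youtube.apk=com.google.android.youtube
package:/data/app/com.whatsapp-1/base.apk=com.whatsapp'

# -n plus the p flag prints only lines where the substitution happened.
# \( ... \) captures "system/app/<folder>"; everything else is dropped.
dirs=$(printf '%s\n' "$sample" | sed -n 's,^package:/\(system/app/[^/]*\)/.*,\1,p')
printf '%s\n' "$dirs"
```

On the device, the printf part would be replaced by pm list packages -f.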

Grep Tab, Carriage Return, & New Line

I'm trying to use Grep to find a string with Tabs, Carriage Returns, & New Lines. Any other method would be helpful also.
grep -R "\x0A\x0D\x09<p><b>Site Info</b></p>\x0A\x0D\x09<blockquote>\x0A\x0D\x09\x09<p>\x0A\x0D\x09</blockquote>\x0A\x0D</blockquote>\x0A\x0D<blockquote>\x0A\x0D\x09<p><b>More Site Info</b></p>" *
From this answer
If using GNU grep, you can use the Perl-style regexp:
$ grep -P '\t' *
Also from here
Use Ctrl+V, Ctrl+M to enter a literal Carriage Return character into your grep string. So:
grep -IUr --color "^M"
will work - if the ^M there is a literal CR that you input as I suggested.
If you want the list of files, you want to add the -l option as well.
Quoting this answer:
Grep is not sufficient for this operation.
pcregrep, which is found in most modern Linux systems, can be used ...
Bash Example
$ pcregrep -M "try:\n fro.*\n.*except" file.py
returns
try:
from tifffile import imwrite
except (ModuleNotFoundError, ImportError):
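For single-line matches, a portable alternative to -P and Ctrl+V is to let printf produce the control characters and store them in variables. A small sketch, with a made-up page.html standing in for the real files:

```shell
work=$(mktemp -d)
# One line starting with a tab and ending in CR LF, then a plain line.
printf '\t<p><b>Site Info</b></p>\r\nplain line\n' > "$work/page.html"

tab=$(printf '\t')   # literal tab character
cr=$(printf '\r')    # literal carriage return

tab_lines=$(grep -c "^${tab}<p>" "$work/page.html")   # lines starting with tab + <p>
cr_lines=$(grep -c "${cr}\$" "$work/page.html")       # lines ending in CR
printf 'tab: %s, cr: %s\n' "$tab_lines" "$cr_lines"
rm -r "$work"
```

This does not help with patterns that span several lines, since grep still sees one line at a time; that case needs something like the pcregrep -M approach quoted above.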

How do I extract partial path from pwd in tcsh?

I want to basically implement an alias (using cd) which takes me to the 5th directory in my pwd. i.e.
If my pwd is /hm/foo/bar/dir1/dir2/dir3/dir4/dir5, I want my alias, say cdf to take me to /hm/foo/bar/dir1/dir2 .
So basically I am trying to figure how I strip a given path to a given number of levels of directories in tcsh.
Any pointers?
Edit:
Okay, I came this far to print out the dir I want to cd into using awk:
alias cdf 'echo `pwd` | awk -F '\''/'\'' '\''BEGIN{OFS="/";} {print $1,$2,$3,$4,$5,$6,$7;}'\'''
I am finding it difficult to do a cd over this as it already turned into a mess of escaped characters.
This should do the trick:
alias cdf source ~/.tcsh/cdf.tcsh
And in ~/.tcsh/cdf.tcsh:
cd "`pwd | cut -d/ -f1-6`"
We use pwd to get the current path and pipe it to cut, which splits on the delimiter / (-d/) and keeps the first 6 fields (-f1-6). Because the path starts with /, field 1 is empty, so that amounts to the first 5 directories.
You can see cut as a very light awk; in many cases it's enough, and hugely simplifies things.
The problem with your alias is tcsh's quirky quoting rules. I'm not even going to try and fix that. We use source to evade all of that;
tcsh lacks functions, but you can sort of emulate them with this. Never said it was pretty.
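To illustrate how the cut field numbers relate to directory levels, here is a sketch written for a POSIX shell (sh/bash), not tcsh. The function name path_prefix is made up; in tcsh the same cut call would still have to live in a sourced file or an alias.

```shell
# path_prefix PATH LEVELS: keep the first LEVELS directories of an absolute path.
# An absolute path starts with "/", so field 1 is empty and the LEVELS
# directories occupy fields 2 .. LEVELS+1.
path_prefix() {
  printf '%s\n' "$1" | cut -d/ -f1-"$(( $2 + 1 ))"
}

target=$(path_prefix /hm/foo/bar/dir1/dir2/dir3/dir4/dir5 5)
printf '%s\n' "$target"
```

With 5 levels this selects fields 1-6, the same range the alias uses.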
@carpetsmoker's solution using cut is nice and simple. But since his solution awkwardly uses another file and source, here's a demonstration of how to avoid that. Using single quotes prevents the premature evaluation.
% alias cdf 'cd "`pwd | cut -d/ -f1-6`"'
% alias cdf
cd "`pwd | cut -d/ -f1-6`"
Here's a simple demonstration of how single quotes can work with backticks:
% alias pwd2 'echo `pwd`'
% alias pwd2
echo `pwd`
% pwd2
/home/shx2

simple filtering with `grep` , `awk`, `sed` or whatever else that's capable

I have a file, each line of which can be described by this grammar:
<text> <colon> <fullpath> <comma> <"by"> <text> <colon> <text> <colon> <text> <colon> <text>
Eg.,
needs fixing (Sunday): src/foo/io.c, by Smith : in progress : <... random comment ...>
How do I get the <fullpath> portion, which lies between the first <colon> and the first <comma>
(I'm not very inclined to write a program to parse this, though this looks like it could be done easily with javacc. Hoping to use some built-in tools like sed, awk, ...)
Or with a regex substitution
sed -n 's/^[^:]*: *\([^:,]*\),.*/\1/p' file
This is the Linux sed dialect; on a different platform you might need the -E option, in which case take out the backslashes before the round parentheses. The " *" after the colon skips the spaces in front of the path. Or just go with Perl instead:
perl -nle 'print $1 if m/:\s*(.*?),/' file
Assuming the input will be similar to what you have above:
awk '{print $4}' | tr -d ,
To process an entire file, pass the file name as an argument to awk in the command above.
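Counting whitespace-separated fields only works while the leading text always has the same number of words. A sketch that splits on the delimiters named in the question instead, assuming the text before the path contains neither a colon nor a comma:

```shell
line='needs fixing (Sunday): src/foo/io.c, by Smith : in progress : <... random comment ...>'

# Split on ":" or ","; field 2 is then the text between the first colon
# and the first comma. sub() strips the leading spaces from it.
path=$(printf '%s\n' "$line" | awk -F'[:,]' '{ sub(/^ +/, "", $2); print $2 }')
printf '%s\n' "$path"
```

For a whole file, the printf and pipe would be dropped and the file name given to awk as an argument.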
If you're using bash script to parse this stuff, you don't even need tools like awk or sed.
$ text="needs fixing (Sunday): src/foo/io.c, by Smith : in progress : <... comment ...>"
$ text=${text%%,*}
$ text=${text#*: }
$ echo "$text"
src/foo/io.c
Read about this on the bash man page under Parameter Expansion.
with GNU grep:
grep -oP '(?<=: ).*?(?=,)'
This may find more than one substring if there are subsequent commas in the line.

Resources