md5sum - ignore trailing whitespace - checksum

Say I have two files:
1.json
{"foo":"bar"}\n
2.json
{"foo":"bar"}
when using checksum routines, is there a way to ignore the trailing whitespace?
maybe something like this:
md5sum < <(cat file | trim_somehow)

You can use sed or xargs.
xargs is much simpler, but be careful with it. I'm not sure if it is safe to use in this context. Read the comments below this answer https://stackoverflow.com/a/12973694/4330274. (There are a lot of answers to your question in that post).
md5sum < <(cat file | xargs) will delete trailing/leading whitespaces (Also, as stated by dave_thompson_085 on the comments down below, it will compress each whiltespace sequence to one whitespace and it will remove quotation marks and backslashes) from the file before passing it to the md5sum utility.
Note: xargs appends a new line to the end of the input.
I recommend using sed for this purpose. It is much safer. Read this answer https://stackoverflow.com/a/3232433/4330274

Related

Grep Filenames from ls for specific part of them

I want to extract a specific part out of the filenames to work with them.
Example:
ls -1
REZ-Name1,Surname1-02-04-2012.png
REZ-Name2,Surname2-07-08-2013.png
....
So I want to get only the part with the name.
How can this be achieved ?
There are several ways to do this. Here's a loop:
for file in REZ-*-??-??-????.png
do
name=${file#*-}
name=${name%-??-??-????.png}
echo "($name)"
done
Given a variety of filenames with all sorts of edge cases from spacing, additional hyphens and line feeds:
REZ-Anna-Maria,de-la-Cruz-12-32-2015.png
REZ-Bjørn,Dæhlie-01-01-2015.png
REZ-First,Last-12-32-2015.png
REZ-John Quincy,Adams-11-12-2014.png
REZ-Ridiculous example # this is one filename
is ridiculous,but fun-22-11-2000.png # spanning two lines
it outputs:
(Anna-Maria,de-la-Cruz)
(Bjørn,Dæhlie)
(First,Last)
(John Quincy,Adams)
(Ridiculous example
is ridiculous,but fun)
If you're less concerned with correctness, you can simplify it further:
$ ls | grep -o '[^-]*,[^-]*'
Maria,de
Bjørn,Dæhlie
First,Last
John Quincy,Adams
is ridiculous,but fun
In this case, cut makes more sense than grep:
ls -l | cut -f2 -d-
cut the second field from the input, using '-' as the field delimiter. That other guy's answer will correctly handle some cases mine will not, but for one off uses, I generally find the semantics of cut to be much easier to remember.

grep from beginning of found word to end of word

I am trying to grep the output of a command that outputs unknown text and a directory per line. Below is an example of what I mean:
.MHuj.5.. /var/log/messages
The text and directory may be different from time to time or system to system. All I want to do though is be able to grep the directory out and send it to a variable.
I have looked around but cannot figure out how to grep to the end of a word. I know I can start the search phrase looking for a "/", but I don't know how to tell grep to stop at the end of the word, or if it will consider the next "/" a new word or not. The directories listed could change, so I can't assume the same amount of directories will be listed each time. In some cases, there will be multiple lines listed and each will have a directory list in it's output. Thanks for any help you can provide!
If your directory paths does not have spaces then you can do:
$ echo '.MHuj.5.. /var/log/messages' | awk '{print $NF}'
/var/log/messages
It's not clear from a single example whether we can generalize that e.g. the first occurrence of a slash marks the beginning of the data you want to extract. If that holds, try
grep -o '/.*' file
To fetch everything after the last space, try
grep -o '[^ ]*$' file
For more advanced pattern matching and extraction, maybe look at sed, or Awk or Perl or Python.
Your line can be described as:
^\S+\s+(\S+)$
That's assuming whitespace is your delimiter between the random text and the directory. It simply separates the whitespace from the non-whitespace and captures the second part.
Or you might want to look into the word boundary character class: \b.
I know you said to use grep, but I can't help to mention that this is trivially done using awk:
awk '{ print $NF }' input.txt
This is assuming that a whitespace is the delimiter and that the path does not contain any whitespaces.

Opposite of "only-matching" in grep?

Is there any way to do the opposite of showing only the matching part of strings in grep (the -o flag), that is, show everything except the part that matches the regex?
That is, the -v flag is not the answer, since that would not show files containing the match at all, but I want to show these lines, but not the part of the line that matches.
EDIT: I wanted to use grep over sed, since it can do "only-matching" matches on multi-line, with:
cat file.xml|grep -Pzo "<starttag>.*?(\n.*?)+.*?</starttag>"
This is a rather unusual requirement, I don't think grep would alternate the strings like that. You can achieve this with sed, though:
sed -n 's/$PATTERN//gp' file
EDIT in response to OP's edit:
You can do multiline matching with sed, too, if the file is small enough to load it all into memory:
sed -rn ':r;$!{N;br};s/<starttag>.*?(\n.*?)+.*?<\/starttag>//gp' file.xml
You can do that with a little help from sed:
grep "pattern" input_file | sed 's/pattern//g'
I don't think there is a way in grep.
If you use ack, you could output Perl's special variables $` and $' variables to show everything before and after the match, respectively:
ack string --output="\$`\$'"
Similarly if you wanted to output what did match along with other text, you could use $& which contains the matched string;
ack string --output="Matched: $&"

How to replace a path with another path in sed? [duplicate]

This question already has answers here:
Using different delimiters in sed commands and range addresses
(3 answers)
Closed 9 months ago.
I have a csh script (although I can change languages if it has any relevance) where I have to:
sed s/AAA/BBB/ file
The problem is that AAA and BBB are paths, and so contain '/'. AAA is fixed, so I can say:
sed s/\\\/A\\\/A\\\A/BBB/ file
However, BBB is based on variables, including $PWD. How do I escape the '/' in $PWD?
OR is there some other way I should be doing this entirely?
sed can use any separator instead of / in the s command. Just use something that is not encountered in your paths:
s+AAA+BBB+
and so on.
Alternatively (and if you don't want to guess), you can pre-process your path with sed to escape the slashes:
pwdesc=$(echo $PWD | sed 's_/_\\/_g')
and then do what you need with $pwdesc.
In circumstances where the replacement string or pattern string contain slashes, you can make use of the fact that GNU sed allows an alternative delimiter for the substitute command. Common choices for the delimiter are the pipe character | or the hash # - the best choice of delimiting character will often depend on the type of file being processed. In your case you can try
sed -i 's#/path/to/AAA#/path/to/BBB#g' your_file
Note: The g after last # is to change all occurrences in file if you want to change first ouccurence do not use g
sed -i "s|$fileWithPath|HAHA|g" file
EDIT 1
sed -i 's|path/to/foo|path/to/bar|g' file
Using csh for serious scripting is usually not recommended. However, that is tangential to the issue at hand.
You're probably after something like:
sed -e "s=$oldpath=$newpath="
where the shell variable $oldpath contains the value to be replaced and $newpath contains the replacement, and it is assumed that neither variable contains an equals sign. That is, you're allowed to choose the delimiter on pattern, and avoiding the usual / delimiter avoids problems with slashes in pathnames. If you think = might appear in your file names, choose something less likely to appear, such as control-A or control-G.
In my case the below method works.
sed -i 's/playstation/PS4/' input.txt
Can be also be written as
sed -i 's+playstation+PS4+' input.txt
sed : is stream editor
-i : Allows to edit the source file
+: Is delimiter.
I hope the above information works for you 😃.
You can use parenthesis expansion ${i/p/r} to escape the slashes.
In this case ${i//p/r} for escaping all occurrences.
$p1=${p1//\//\\/}
$p2=${p2//\//\\/}
sed s/$p1/$p2/ file
Or, more concise, in one line sed s/${p1//\//\\/}/${p2//\//\\/}/ file
The two fist slashes // are a separator in parenthesis expansion saying we are matching all occurrences, then \/ is for escaping the slash in the search template, the / as a second separator in the expansion, and then \\/ is the replacement, in witch the backslash must be escaped.
We just needed to get the /h/ network path references out of the path. if we pointed them back to the /c/ drive they would map to non-existant directories but resolve quickly. In my .bashrc I used
PATH=`echo $PATH | sed -e "s+/h/+/c/+g"`

Grep for beginning and end of line?

I have a file where I want to grep for lines that start with either -rwx or drwx AND end in any number.
I've got this, but it isnt quite right. Any ideas?
grep [^.rwx]*[0-9] usrLog.txt
The tricky part is a regex that includes a dash as one of the valid characters in a character class. The dash has to come immediately after the start for a (normal) character class and immediately after the caret for a negated character class. If you need a close square bracket too, then you need the close square bracket followed by the dash. Mercifully, you only need dash, hence the notation chosen.
grep '^[-d]rwx.*[0-9]$' "$#"
See: Regular Expressions and grep for POSIX-standard details.
It looks like you were on the right track... The ^ character matches beginning-of-line, and $ matches end-of-line. Jonathan's pattern will work for you... just wanted to give you the explanation behind it
It should be noted that not only will the caret (^) behave differently within the brackets, it will have the opposite result of placing it outside of the brackets. Placing the caret where you have it will search for all strings NOT beginning with the content you placed within the brackets. You also would want to place a period before the asterisk in between your brackets as with grep, it also acts as a "wildcard".
grep ^[.rwx].*[0-9]$
This should work for you, I noticed that some posters used a character class in their expressions which is an effective method as well, but you were not using any in your original expression so I am trying to get one as close to yours as possible explaining every minor change along the way so that it is better understood. How can we learn otherwise?
You probably want egrep. Try:
egrep '^[d-]rwx.*[0-9]$' usrLog.txt
are you parsing output of ls -l?
If you are, and you just want to get the file name
find . -iname "*[0-9]"
If you have no choice because usrLog.txt is created by something/someone else and you absolutely must use this file, other options include
awk '/^[-d].*[0-9]$/' file
Ruby(1.9+)
ruby -ne 'print if /^[-d].*[0-9]$/' file
Bash
while read -r line ; do case $line in [-d]*[0-9] ) echo $line; esac; done < file
Many answers provided for this question. Just wanted to add one more which uses bashism-
#! /bin/bash
while read -r || [[ -n "$REPLY" ]]; do
[[ "$REPLY" =~ ^(-rwx|drwx).*[[:digit:]]+$ ]] && echo "Got one -> $REPLY"
done <"$1"
#kurumi answer for bash, which uses case is also correct but it will not read last line of file if there is no newline sequence at the end(Just save the file without pressing 'Enter/Return' at the last line).

Resources