I have a csh script (although I can change languages if it has any relevance) where I have to:
sed s/AAA/BBB/ file
The problem is that AAA and BBB are paths, and so contain '/'. AAA is fixed, so I can say:
sed s/\\\/A\\\/A\\\/A/BBB/ file
However, BBB is based on variables, including $PWD. How do I escape the '/' in $PWD?
OR is there some other way I should be doing this entirely?
sed can use any separator instead of / in the s command. Just use something that is not encountered in your paths:
s+AAA+BBB+
and so on.
Alternatively (and if you don't want to guess), you can pre-process your path with sed to escape the slashes:
pwdesc=$(echo "$PWD" | sed 's_/_\\/_g')
and then do what you need with $pwdesc.
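Both approaches can be sketched side by side as follows. The paths and the sample input line are hypothetical values chosen only for illustration, and the sketch assumes a POSIX-style shell rather than csh:

```shell
# Hypothetical sample values; any paths containing '/' would do.
old='/A/A/A'
new='/home/user/project'
line='dir=/A/A/A'

# Option 1: use a delimiter ('|') that appears in neither path.
out1=$(printf '%s\n' "$line" | sed "s|$old|$new|")

# Option 2: keep '/' as the delimiter and backslash-escape each '/' first.
old_esc=$(printf '%s\n' "$old" | sed 's_/_\\/_g')
new_esc=$(printf '%s\n' "$new" | sed 's_/_\\/_g')
out2=$(printf '%s\n' "$line" | sed "s/$old_esc/$new_esc/")

printf '%s\n' "$out1" "$out2"
```

Option 1 is shorter, but it only works if you know the delimiter never occurs in the paths. Option 2 costs an extra sed call per variable.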
When the pattern or the replacement string contains slashes, you can make use of the fact that sed allows an alternative delimiter for the substitute command. Common choices for the delimiter are the pipe character | or the hash #; the best choice will often depend on the type of file being processed. In your case you can try
sed -i 's#/path/to/AAA#/path/to/BBB#g' your_file
Note: The g after the last # changes all occurrences on each line. To change only the first occurrence on each line, omit the g.
sed -i "s|$fileWithPath|HAHA|g" file
EDIT 1
sed -i 's|path/to/foo|path/to/bar|g' file
Using csh for serious scripting is usually not recommended. However, that is tangential to the issue at hand.
You're probably after something like:
sed -e "s=$oldpath=$newpath="
where the shell variable $oldpath contains the value to be replaced and $newpath contains the replacement, and it is assumed that neither variable contains an equals sign. That is, you're allowed to choose the delimiter on pattern, and avoiding the usual / delimiter avoids problems with slashes in pathnames. If you think = might appear in your file names, choose something less likely to appear, such as control-A or control-G.
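As a minimal sketch of the control-character idea, the following uses control-A as the delimiter. The two paths are hypothetical and deliberately contain both / and =. It assumes a POSIX-style shell in which printf understands octal escapes:

```shell
# Hypothetical paths that contain both '/' and '='.
oldpath='/usr/local=old/bin'
newpath='/opt/tools=new/bin'

# Control-A (octal 001) is very unlikely to occur in a file name.
sep=$(printf '\001')

result=$(printf '%s\n' 'PATH=/usr/local=old/bin:/bin' |
  sed -e "s${sep}${oldpath}${sep}${newpath}${sep}")
printf '%s\n' "$result"
```

Note that $oldpath is still interpreted as a regular expression, so characters such as . or * in it keep their special meaning.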
In my case the below method works.
sed -i 's/playstation/PS4/' input.txt
It can also be written as
sed -i 's+playstation+PS4+' input.txt
sed : is stream editor
-i : Allows to edit the source file
+: Is delimiter.
I hope the above information works for you 😃.
You can use bash parameter expansion ${i/p/r} to escape the slashes.
In this case, use ${i//p/r} to escape all occurrences.
p1=${p1//\//\\/}
p2=${p2//\//\\/}
sed "s/$p1/$p2/" file
Or, more concisely, in one line: sed "s/${p1//\//\\/}/${p2//\//\\/}/" file
The first two slashes // are the separator in the parameter expansion and say that we are matching all occurrences. Then \/ is the escaped slash in the search pattern, the next / is the second separator in the expansion, and \\/ is the replacement, in which the backslash must itself be escaped.
We just needed to get the /h/ network path references out of the path. If we pointed them back to the /c/ drive, they would map to non-existent directories but resolve quickly. In my .bashrc I used
PATH=`echo $PATH | sed -e "s+/h/+/c/+g"`
I have a text file using markup language (similar to wikipedia articles)
cat test.txt
This is a sample text having: colon in the text. and there is more [[in single or double: brackets]]. I need to select the first word only.
and second line with no [brackets] colon in it.
I need to select the word "having:" only because that is part of regular text. I tried
grep -v '[*:*]' test.txt
This will correctly avoid the tags, but does not select the expected word.
The square brackets specify a character class, so your regular expression looks for any occurrence of one of the characters * or : (or *, but we said that already, didn't we?)
grep has the option -o to only print the matching text, so something like
grep -ow '[^[:space:]]*:[^[:space:]]*' file.txt
would extract any text with a colon in it, surrounded by zero or more non-whitespace characters on each side. The -w option adds the condition that the match needs to be between word boundaries.
However, if you want to restrict in which context you want to match the text, you will probably need to switch to a more capable tool than plain grep. For example, you could use sed to preprocess each line to remove any bracketed text, and then look for matches in the remaining text.
sed -e 's/\[.*]//g' -e 's/ [^: ]*$/ /' -e 's/[^: ]* //g' -e 's/ /\n/' file.txt
(This assumes that your sed recognizes \n in the replacement string as a literal newline. There are simple workarounds available if it doesn't, but let's not go there if it's not necessary.)
In brief, we first replace any text between square brackets. (This needs to be improved if your input could contain multiple sequences of square brackets on a line with normal text between them. Your example only shows nested square brackets, but my approach is probably too simple for either case.) Then, we remove any words which don't contain a colon, with a special provision for the last word on the line, and some subsequent cleanup. Finally, we replace any remaining spaces with newlines, and (implicitly) print whatever is left. (This still ends up printing one newline too many, but that is easy to fix up later.)
Alternatively, we could use sed to remove any bracketed expressions, then use grep on the remaining tokens.
sed -e :a -e 's/\[[^][]*\]//' -e ta file.txt |
grep -ow '[^[:space:]]*:[^[:space:]]*'
The :a creates a label a and ta says to jump back to that label and try again if the regex matched. This one also demonstrates how to handle nested and repeated brackets. (I suppose it could be refactored into the previous attempt, so we could avoid the pipe to grep. But outlining different solution models is also useful here, I suppose.)
If you wanted to ensure that there is at least one non-colon character adjacent to the colon, you could do something like
... file.txt |
grep -owE '[^:[:space:]]+:[^[:space:]]*|[^[:space:]]*:[^: [:space:]]+'
where the -E option selects a slightly more modern regex dialect which allows us to use | between alternatives and + for one or more repetitions. (The original grep of the early 1970s did not have these features at all; much later, the POSIX standard grafted them on with a slightly wacky syntax which requires you to backslash them to remove the literal meaning and select the metacharacter behavior... but let's not go there.)
Notice also how [^:[:space:]] matches a single character which is not a colon or a whitespace character, where [:space:] is the (slightly arcane) special POSIX named character class which matches any whitespace character (regular space, horizontal tab, vertical tab, possibly Unicode whitespace characters, depending on locale).
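To see the effect of the two alternatives, here is a small sketch on a made-up line of tokens. The input is hypothetical, -w is left out to keep the example minimal, and it assumes a grep that supports -o (GNU and BSD grep do):

```shell
# Hypothetical input tokens: only those with a non-colon character
# next to the colon should be reported.
matches=$(printf '%s\n' 'a:b c d: :e ::' |
  grep -oE '[^:[:space:]]+:[^[:space:]]*|[^[:space:]]*:[^:[:space:]]+')
printf '%s\n' "$matches"
```

This should print a:b, d: and :e on separate lines. The token c has no colon and :: has no non-colon neighbor, so neither is reported.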
Awk easily lets you iterate over the tokens on a line. The requirement to ignore matches within square brackets complicates matters somewhat; you could keep a separate variable to keep track of whether you are inside brackets or not.
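awk '{ for(i=1; i<=NF; ++i) {
    # a token with a closing bracket ends the bracketed run; skip the token itself
    if($i ~ /\]/) { brackets=0; continue }
    # a token with an opening bracket starts a bracketed run
    if($i ~ /\[/) brackets=1;
    # skip tokens inside brackets (continue, not next: next would drop the rest of the line)
    if(brackets) continue;
    if($i ~ /:/) print $i } }' file.txt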
This again hard-codes some perhaps incorrect assumptions about how the brackets can be placed. It will behave unexpectedly if a single token contains a closing square bracket followed by an opening one, and has an oversimplified treatment of nested brackets (the first closing bracket after a series of opening brackets will effectively assume we are no longer inside brackets).
A combined solution using sed and awk:
sed 's/ /\n/g' test.txt | gawk 'i==0 && $0~/:$/{ print $0 }/\[/{ i++} /\]/ {i--}'
sed will change all spaces to a newline
awk (or gawk) will output all lines matching $0~/:$/, as long as i equals zero
The last part of the awk stuff keeps a count of the opening and closing brackets.
Another solution using sed and grep:
sed -r -e 's/\[.*\]+//g' -e 's/ /\n/g' test.txt | grep ':$'
's/\[.*\]+//g' will filter the stuff between brackets
's/ /\n/g' will replace a space with a newline
grep will only find lines ending with :
A third one using only awk:
gawk '{ for (t=1;t<=NF;t++){
if(i==0 && $t~/:$/) print $t;
i=i+gsub(/\[/,"",$t)-gsub(/\]/,"",$t) }}' test.txt
gsub returns the number of replacements.
The variable i is used to count the level of brackets. On every [ it is incremented by 1, and on every ] it is decremented by 1. This works because gsub(/\[/,"",$t) returns the number of replaced characters. For a token like [[][ the count is increased by (3-1=) 2. When a token has brackets AND a colon, my code will fail, because a token ending in : is matched before the brackets in it are counted.
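One possible way around that limitation, as a sketch rather than a complete parser: count the opening brackets of a token before testing it, and the closing brackets afterwards. The variable names (tok, opens, closes, depth) and the two input lines are made up for illustration:

```shell
result=$(printf '%s\n' \
  'text having: colon and [[in double: brackets]]. more' \
  'a [tag: here] and plain: word' |
awk '{
  for (t = 1; t <= NF; t++) {
    tok = $t                        # work on a copy so $t stays intact
    opens  = gsub(/\[/, "", tok)    # number of [ in this token
    closes = gsub(/\]/, "", tok)    # number of ] in this token
    depth += opens                  # enter brackets before testing
    if (depth == 0 && $t ~ /:$/) print $t
    depth -= closes                 # leave brackets after testing
  }
}')
printf '%s\n' "$result"
```

With this ordering, a token such as [tag: is treated as already inside brackets and is not printed. It still assumes that brackets are balanced and that a bracketed span does not cross a line boundary in some unexpected way.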
I'm trying to figure out a more elegant way to replace a unique piece of text in a file with a URL.
It seems that sed is interpreting the URL as part of its evaluation logic instead of just replacing the text from a bash variable.
My file looks something like:
$srcRemoveSoftwareURL = "softwareURL"
and I'm attempting to (case-sensitive) search/replace softwareURL with the actual URL.
I'm using a bash script to help with the manipulation and I'm setting up my variables like so:
STORAGE_ENDPOINT_URL="http://mywebsite.com"
sas_url="se=2021-07-20T18%3A42Z&sp=rl&spr=https&sv=2018-11-09&sr=s&sig=oI/T9oHqzfEtuTjAotLyLN3IXbkiADGTPQllkyJlvEA%3D"
softwareURL="$STORAGE_ENDPOINT_URL/1-remove-software.sh?$sas_url"
# the resulting URL is like this:
# http://mywebsite.com/1-remove-software.sh?se=2021-07-20T18%3A42Z&sp=rl&spr=https&sv=2018-11-09&sr=s&sig=oI/T9oHqzfEtuTjAotLyLN3IXbkiADGTPQllkyJlvEA%3D
I then use sed to replace the text:
sed "s|softwareURL|$softwareURL|" template_file.sh
I recognize that bash is taking preference for the $softwareURL variable and inserting it in, but then sed interprets the URL as part of some evaluation logic.
At the moment my resulting template file looks like so:
$srcRemoveSoftwareURL = "http://mywebsite.com/1-remove-software.sh?se=2021-07-20T18%3A42ZsoftwareURLsp=rlsoftwareURLspr=httpssoftwareURLsv=2018-11-09softwareURLsr=ssoftwareURLsig=oI/T9oHqzfEtuTjAotLyLN3IXbkiIIGTPQllkyJlvEA%3D
It seems that sed is also finding any ampersand & characters in the URL and replacing it with the literal softwareURL.
What I'm doing now is to pipe the result to sed again and replace softwareURL with & which seems a little inefficient.
Is there a better way to do this?
Any guidance is most welcome!
Thanks!
The issue with the current result is that in sed the & refers to the pattern matched in the first part of the sed/s command (in this case & == softwareURL).
One idea would be to escape all &'s in the replacement string, eg:
sed "s|softwareURL|$softwareURL|" template_file.sh # old
sed "s|softwareURL|${softwareURL//&/\\&}|" template_file.sh # new
With this new command generating:
$srcRemoveSoftwareURL = "http://mywebsite.com/1-remove-software.sh?se=2021-07-20T18%3A42Z&sp=rl&spr=https&sv=2018-11-09&sr=s&sig=oI/T9oHqzfEtuTjAotLyLN3IXbkiADGTPQllkyJlvEA%3D"
If you find yourself needing to make several replacements:
$ somevariable="abc & % $ 123 & % $ xyz"
$ somevariable="${somevariable//&/\\&}" # escape literal '&'
$ somevariable="${somevariable//%/\\%}" # escape literal '%'
$ somevariable="${somevariable//$/\\$}" # escape literal '$'
$ echo $somevariable
abc \& \% \$ 123 \& \% \$ xyz
NOTE: I'm not saying you need to escape these particular characters for sed ... just pointing out how to go about making multiple replacements in a variable.
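If you would rather not depend on bash-specific expansion, the escaping step can also be done with sed itself. The following is a sketch with a shortened, hypothetical URL. It escapes the three characters that are special in the replacement part of this particular command: the backslash, the &, and the | delimiter.

```shell
# Shortened, hypothetical URL containing '&' and '/'.
softwareURL='http://mywebsite.com/1-remove-software.sh?se=2021&sp=rl&sig=oI/T9o%3D'

# Prefix backslash, '&' and the '|' delimiter with a backslash.
escaped=$(printf '%s\n' "$softwareURL" | sed 's/[\\&|]/\\&/g')

template='$srcRemoveSoftwareURL = "softwareURL"'
result=$(printf '%s\n' "$template" | sed "s|softwareURL|$escaped|")
printf '%s\n' "$result"
```

The slashes in the URL need no escaping here because | is the delimiter. A replacement containing newlines would need additional handling that this sketch does not cover.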
I have some short script which looks like this:
It's a way to execute bash inside a groovy command.
sh (script: 'printf "${INFO} | sed 's/^[^\/]*://g'"',returnStdout: true).trim()
The value of INFO is test/word/fine.
With the script above I want to 'delete' everything till (and including) the first /. I can not make it work with the single quotes between single quotes. If that works I can check if my \/ will work.
Apparently Groovy allows you to use triple quotes so you don't have to force the command to be in single single quotes (sic).
sh """printf "${INFO}" | sed 's/^[^\/]*//'"""
Notice also the placement of the double quotes in the printf command. A better still solution would be to say printf '%s' "${INFO}" but ... do you really need the shell to interpolate the value of the variable INFO, and if so, why are you not simply doing sh 'echo "${INFO#*/}"'?
If indeed you only want the first occurrence to be replaced, the /g flag is superfluous, so I took it out. Your regex is anchored to the beginning of the string so it will only ever find a single match to replace, but saying "replace all occurrences on a line" when apparently that's precisely not what you want is misleading and confusing at best.
If indeed your test data doesn't contain a colon, the colon in your regex was wrong, so I took that out, too.
Commonly, we use a different separator like s%^[^/]*/%% so we don't have to backslash-escape slashes in our sed substitutions.
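Outside of Groovy, the two variants mentioned above can be compared directly in a shell. This is a minimal sketch using the sample value from the question:

```shell
INFO='test/word/fine'

# sed with '%' as the delimiter, so '/' needs no escaping.
with_sed=$(printf '%s\n' "$INFO" | sed 's%^[^/]*/%%')

# Shell parameter expansion: remove the shortest prefix matching '*/'.
with_expansion=${INFO#*/}

printf '%s\n' "$with_sed" "$with_expansion"
```

Both lines should read word/fine. The parameter expansion avoids starting an extra process.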
Solution 1: The following simple sed command may help.
echo "test/word/fine" | sed 's/\([^/]*\)\/\(.*\)/\2/'
Solution 2: There is no need for sed; use bash parameter expansion:
var="test/word/fine"
echo "${var#*/}"
word/fine
I am trying to grep the output of a command that outputs unknown text and a directory per line. Below is an example of what I mean:
.MHuj.5.. /var/log/messages
The text and directory may be different from time to time or system to system. All I want to do though is be able to grep the directory out and send it to a variable.
I have looked around but cannot figure out how to grep to the end of a word. I know I can start the search phrase looking for a "/", but I don't know how to tell grep to stop at the end of the word, or if it will consider the next "/" a new word or not. The directories listed could change, so I can't assume the same amount of directories will be listed each time. In some cases, there will be multiple lines listed and each will have a directory list in its output. Thanks for any help you can provide!
If your directory paths do not contain spaces, then you can do:
$ echo '.MHuj.5.. /var/log/messages' | awk '{print $NF}'
/var/log/messages
It's not clear from a single example whether we can generalize that e.g. the first occurrence of a slash marks the beginning of the data you want to extract. If that holds, try
grep -o '/.*' file
To fetch everything after the last space, try
grep -o '[^ ]*$' file
For more advanced pattern matching and extraction, maybe look at sed, or Awk or Perl or Python.
Your line can be described as:
^\S+\s+(\S+)$
That's assuming whitespace is your delimiter between the random text and the directory. It simply separates the whitespace from the non-whitespace and captures the second part.
Or you might want to look into the word boundary anchor: \b.
I know you said to use grep, but I can't help mentioning that this is trivially done using awk:
awk '{ print $NF }' input.txt
This assumes that whitespace is the delimiter and that the path does not contain any whitespace.
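Since the goal was to get the path into a variable, here is a minimal sketch of that last step, using the sample line from the question. It relies on the same assumption that the path is the last whitespace-separated field:

```shell
line='.MHuj.5.. /var/log/messages'

# Capture the last field with awk via command substitution.
path=$(printf '%s\n' "$line" | awk '{ print $NF }')

# Alternative without an external tool: strip everything up to the last space.
path2=${line##* }

printf '%s\n' "$path" "$path2"
```

For multi-line command output, the same awk call prints one path per line, which could then be read in a while read loop.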
Is there any way to do the opposite of showing only the matching part of strings in grep (the -o flag), that is, show everything except the part that matches the regex?
That is, the -v flag is not the answer, since that would not show files containing the match at all, but I want to show these lines, but not the part of the line that matches.
EDIT: I wanted to use grep over sed, since it can do "only-matching" matches on multi-line, with:
cat file.xml|grep -Pzo "<starttag>.*?(\n.*?)+.*?</starttag>"
This is a rather unusual requirement, I don't think grep would alternate the strings like that. You can achieve this with sed, though:
sed -n "s/$PATTERN//gp" file
EDIT in response to OP's edit:
You can do multiline matching with sed, too, if the file is small enough to load it all into memory:
sed -rn ':r;$!{N;br};s/<starttag>.*?(\n.*?)+.*?<\/starttag>//gp' file.xml
You can do that with a little help from sed:
grep "pattern" input_file | sed 's/pattern//g'
I don't think there is a way in grep.
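As a small worked sketch of the sed approach on made-up input, the following uses a hypothetical <b>...</b> pattern. The -n flag together with the p flag prints only the lines where something was removed, which mirrors what grep would have selected:

```shell
# Hypothetical two-line input; only the first line contains the pattern.
input='keep <b>drop</b> this
no tag here'

result=$(printf '%s\n' "$input" | sed -n 's/<b>[^<]*<\/b>//gp')
printf '%s\n' "$result"
```

Only the first line is printed, with the tagged part removed and the two surrounding spaces left in place.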
If you use ack, you could output Perl's special variables $` and $' to show everything before and after the match, respectively:
ack string --output="\$\`\$'"
Similarly if you wanted to output what did match along with other text, you could use $& which contains the matched string;
ack string --output="Matched: $&"