Regex for substitution in LaTeX

I have a tex file that contains text and latex commands of the form
This is an acronym \acsu{RNS} and another acronym \acf{FHE}.
The commands are \acsu{RNS} and \acf{FHE}. I would like to extract the text within the braces, so the output should be
This is an acronym RNS and another acronym FHE.
All of the commands start with \ac, followed by one or two more characters, like \acsu, \acs, \acf etc.
I tried the following sed command
sed -i.bkup 's/ [\][a-z]*{\([^]]*\)}/ \1/g' part.txt
but this removes only the opening \acsu{ of the first command and the closing } of the last one, so the output is
This is an acronym RNS} and another acronym \acf{FHE.
Note there are numerous commands in the tex file that involve {} brackets but I would like to replace only those starting with \ac. Any idea how to fix this?

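It seems that
sed -i.bkup 's/\\ac[a-z]*{\([^][}]*\)}/\1/g' file.txt
does the trick. The bracket expression [^][}]* cannot match a closing brace, so each match stops at the first } instead of running on to the last one on the line, and the \\ac prefix restricts the substitution to commands that start with \ac.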
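A minimal sketch of the same substitution applied to the sample sentence, without -i so that no file is modified. The extra \textbf{this} command is a hypothetical addition to show that commands not starting with \ac are left alone, and the bracket expression is simplified to [^}]* on the assumption that only the closing brace needs to be excluded.

```shell
# Sample line from the question plus a hypothetical non-\ac command.
input='This is an acronym \acsu{RNS} and another acronym \acf{FHE}. Keep \textbf{this}.'

# \\ac[a-z]*{  matches \ac followed by any lowercase letters and an opening brace
# \([^}]*\)    captures everything up to (not including) the first closing brace
result=$(printf '%s\n' "$input" | sed 's/\\ac[a-z]*{\([^}]*\)}/\1/g')

printf '%s\n' "$result"
```

Note that the pattern would also match any other command whose name begins with \ac, so it may need tightening if the file uses such commands.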

Related

Using sed to replace unique text with URL

I'm trying to figure out a more elegant way to replace a unique piece of text in a file with a URL.
It seems that sed is interpreting the URL as part of its evaluation logic instead of just replacing the text from a bash variable.
My file looks something like:
$srcRemoveSoftwareURL = "softwareURL"
and I'm attempting to (case-sensitive) search/replace softwareURL with the actual URL.
I'm using a bash script to help with the manipulation and I'm setting up my variables like so:
STORAGE_ENDPOINT_URL="http://mywebsite.com"
sas_url="se=2021-07-20T18%3A42Z&sp=rl&spr=https&sv=2018-11-09&sr=s&sig=oI/T9oHqzfEtuTjAotLyLN3IXbkiADGTPQllkyJlvEA%3D"
softwareURL="$STORAGE_ENDPOINT_URL/1-remove-software.sh?$sas_url"
# the resulting URL is like this:
# http://mywebsite.com/1-remove-software.sh?se=2021-07-20T18%3A42Z&sp=rl&spr=https&sv=2018-11-09&sr=s&sig=oI/T9oHqzfEtuTjAotLyLN3IXbkiADGTPQllkyJlvEA%3D
I then use sed to replace the text:
sed "s|softwareURL|$softwareURL|" template_file.sh
I recognize that bash expands the $softwareURL variable first and inserts its value, but then sed interprets the URL as part of some evaluation logic.
At the moment my resulting template file looks like so:
$srcRemoveSoftwareURL = "http://mywebsite.com/1-remove-software.sh?se=2021-07-20T18%3A42ZsoftwareURLsp=rlsoftwareURLspr=httpssoftwareURLsv=2018-11-09softwareURLsr=ssoftwareURLsig=oI/T9oHqzfEtuTjAotLyLN3IXbkiADGTPQllkyJlvEA%3D"
It seems that sed is also finding every ampersand & in the URL and replacing it with the literal softwareURL.
What I'm doing now is to pipe the result to sed again and replace softwareURL with & which seems a little inefficient.
Is there a better way to do this?
Any guidance is most welcome!
Thanks!
The issue with the current result is that in sed the & refers to the pattern matched in the first part of the sed/s command (in this case & == softwareURL).
One idea would be to escape all &'s in the replacement string, eg:
sed "s|softwareURL|$softwareURL|" template_file.sh # old
sed "s|softwareURL|${softwareURL//&/\\&}|" template_file.sh # new
With this new command generating:
$srcRemoveSoftwareURL = "http://mywebsite.com/1-remove-software.sh?se=2021-07-20T18%3A42Z&sp=rl&spr=https&sv=2018-11-09&sr=s&sig=oI/T9oHqzfEtuTjAotLyLN3IXbkiADGTPQllkyJlvEA%3D"
If you find yourself needing to make several replacements:
$ somevariable="abc & % $ 123 & % $ xyz"
$ somevariable="${somevariable//&/\\&}" # escape literal '&'
$ somevariable="${somevariable//%/\\%}" # escape literal '%'
$ somevariable="${somevariable//$/\\$}" # escape literal '$'
$ echo $somevariable
abc \& \% \$ 123 \& \% \$ xyz
NOTE: I'm not saying you need to escape these particular characters for sed ... just pointing out how to go about making multiple replacements in a variable.
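As a sketch of a more general variant, the replacement text can be escaped with sed itself before it is used. With | as the delimiter, the characters that are special in the replacement part are the backslash, the ampersand and the delimiter. The URL below is a hypothetical shortened stand-in for the one in the question, and the variable names escaped and template are assumptions.

```shell
# Hypothetical shortened URL; the real SAS token is longer.
softwareURL='http://mywebsite.com/1-remove-software.sh?se=2021&sp=rl&sig=oI/T9%3D'

# Put a backslash in front of every backslash, ampersand and pipe so that
# sed treats them literally in the replacement part of s|...|...|.
escaped=$(printf '%s\n' "$softwareURL" | sed 's/[\\&|]/\\&/g')

template='$srcRemoveSoftwareURL = "softwareURL"'
result=$(printf '%s\n' "$template" | sed "s|softwareURL|$escaped|")

printf '%s\n' "$result"
```

This assumes the value contains no newline, which would need separate handling.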

Remove two lines using sed

I'm writing a script that parses an HTML document, and I would like to remove two consecutive lines. How does sed work with newlines? I tried
sed 's/<!DOCTYPE.*\n<h1.*/<newstring>/g'
which didn't work. I tried this statement but it removes the whole document because it seems to remove all newlines:
sed ':a;N;$!ba;s/<!DOCTYPE.*\n<h1.*\n<b.*/<newstring>/g'
Any ideas? Maybe I should work with awk?
For the simple task of removing two lines if each matches some pattern, all you need to do is:
sed '/<!DOCTYPE.*/{N;/\n<h1.*/d}'
This uses an address matching the first line you want to delete. When the address matches, it executes:
N (next) - append the next line of input to the pattern space, separated by \n
Then it tests a second address against the contents of the second line (the part following \n). If that matches, it executes:
d (delete) - discard the pattern space and start the next cycle with the next unread line
If d isn't executed, both lines are printed as usual and execution continues as normal.
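The behaviour described above can be checked on a small sample. This is only a sketch: the HTML lines are hypothetical, and a ; is placed before the closing } because some non-GNU sed implementations require it.

```shell
# Case 1: <!DOCTYPE is followed by <h1, so both lines are deleted.
result=$(printf '%s\n' '<!DOCTYPE html>' '<h1>Title</h1>' '<p>keep me</p>' |
  sed '/<!DOCTYPE/{N;/\n<h1/d;}')

# Case 2: the second line is not <h1, so d never runs and both lines print.
other=$(printf '%s\n' '<!DOCTYPE html>' '<p>no heading</p>' |
  sed '/<!DOCTYPE/{N;/\n<h1/d;}')

printf '%s\n' "$result"
printf '%s\n' "$other"
```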
To adjust this for three lines, you need only use N again. If you want to pull in multiple lines until some delimiter is reached, you can use a line-pump, which looks something like this:
/<!DOCTYPE.*/{
:pump
N
/some-regex-to-stop-pump/!b pump
/regex-which-indicates-we-should-delete/d
}
However, writing a full XML parser in sed or awk is a Herculean task and you're likely better off using an existing solution.
If an XML parsing tool is definitely not an option, awk may be an option:
awk '/<!DOCTYPE/ { lne=NR+1;next } NR==lne && /<h1/ { next }1' file
When we encounter a line with "<!DOCTYPE", set the variable lne to the line number + 1 (NR+1) and then skip to the next line. Then, when the line number is equal to lne (NR==lne) and the line contains "<h1", skip to the next line. Print all other lines by using 1. Note that the "<!DOCTYPE" line is dropped even when the line after it does not contain "<h1".
For a document like this:
<b>...
<first...
<second...
<third...
<a ...
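this awk command works well (a multi-character regex RS is supported by GNU awk and mawk, but POSIX leaves it unspecified):
awk -v RS='<first[^\n]*\n<second[^\n]*\n<third[^\n]*\n' '{printf "%s", $0}'
It uses the three lines to be removed as the record separator, so they disappear when the records are printed back without any separator.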
This might work for you (GNU sed):
sed 'N;/<!DOCTYPE.*\n<h1.*/d;P;D' file
Append the following line and if the pattern matches both lines in the pattern space delete them.
Otherwise, print then delete the first of the two lines and repeat.
To replace the two lines with another string, use:
sed 'N;s/<!DOCTYPE.*\n<h1.*/another string/;P;D'
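A small sketch of the replacement variant on hypothetical input. It uses $!N instead of N, which is a deliberate swap: GNU sed prints the pattern space when N finds no further input, but POSIX allows sed to quit without printing, so $!N skips the append on the last line.

```shell
# Hypothetical four-line document; the middle two lines should be replaced.
doc=$(printf '%s\n' '<p>a</p>' '<!DOCTYPE html>' '<h1>T</h1>' '<p>b</p>')

# $!N  append the next line unless we are on the last line
# s    replace the two-line match with a single string
# P;D  print and delete the first line of the pattern space, then repeat
result=$(printf '%s\n' "$doc" | sed '$!N;s/<!DOCTYPE.*\n<h1.*/another string/;P;D')

printf '%s\n' "$result"
```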

Replacing part of LaTeX command using BBEdit grep

How can I use the BBEdit grep option to replace LaTeX commands like
\textcolor{blue}{Some text}
by the contents of the second set of braces, so
Some text
?
The BBEdit Grep Tutorial gives a lot of information and good examples on using the grep option in BBEdit. What you are trying to achieve is actually a variation of one of the examples. The solution is to enter the following:
Find: \\textcolor\{blue\}\{([^\}]*)\}
Replace: \1
The relevant part is the "Find" section. The first part, \\textcolor\{blue\}\{, searches for the literal text \textcolor{blue}{. You need the backslashes to escape special characters: \\ matches a literal backslash, and \{ and \} match literal braces.
Next, we have the cryptic sequence ([^\}]*). The (...) saves everything matched inside the parentheses into the variable \1, which you can use in the "Replace" section to insert the content. Inside [^\}], the leading ^ negates the character class, so it matches any single character that is not a closing brace. The trailing * repeats that match any number of times. Overall, this expression makes the grep match all characters up to the next closing brace and saves them into \1.
Finally, the expression ends with a \}, i.e. a closing brace, which is the end of what we want to find.
The "Replace" only contains \1, which is everything inside the parentheses (...) in the "Find" field.
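For comparison, here is a sketch of the same idea in sed rather than BBEdit. This is a translation under assumptions, not BBEdit syntax: in sed's basic regular expressions the braces are literal without a backslash, and the capture group is written \(...\). The second \textcolor{red} command is a hypothetical addition to show that only the blue one is replaced.

```shell
input='\textcolor{blue}{Some text} and \textcolor{red}{other text}'

# \\textcolor{blue}{  literal prefix (only the backslash needs escaping here)
# \([^}]*\)           capture everything up to the next closing brace
result=$(printf '%s\n' "$input" | sed 's/\\textcolor{blue}{\([^}]*\)}/\1/g')

printf '%s\n' "$result"
```

Like the BBEdit pattern, this stops at the first closing brace, so it does not handle nested braces inside the second argument.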

grep from beginning of found word to end of word

I am trying to grep the output of a command that outputs unknown text and a directory per line. Below is an example of what I mean:
.MHuj.5.. /var/log/messages
The text and directory may be different from time to time or system to system. All I want to do though is be able to grep the directory out and send it to a variable.
I have looked around but cannot figure out how to grep to the end of a word. I know I can start the search phrase looking for a "/", but I don't know how to tell grep to stop at the end of the word, or if it will consider the next "/" a new word or not. The directories listed could change, so I can't assume the same number of directories will be listed each time. In some cases, there will be multiple lines listed and each will have a directory listed in its output. Thanks for any help you can provide!
If your directory paths do not contain spaces then you can do:
$ echo '.MHuj.5.. /var/log/messages' | awk '{print $NF}'
/var/log/messages
It's not clear from a single example whether we can generalize that e.g. the first occurrence of a slash marks the beginning of the data you want to extract. If that holds, try
grep -o '/.*' file
To fetch everything after the last space, try
grep -o '[^ ]*$' file
For more advanced pattern matching and extraction, maybe look at sed, or Awk or Perl or Python.
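Since the goal is to end up with the path in a variable, both grep variants can be wrapped in command substitution. A minimal sketch using the sample line from the question; the variable names are assumptions, and grep -o is an extension available in GNU and BSD grep rather than a POSIX option.

```shell
line='.MHuj.5.. /var/log/messages'

# Everything from the first slash to the end of the line.
from_slash=$(printf '%s\n' "$line" | grep -o '/.*')

# The last run of non-space characters on the line.
last_word=$(printf '%s\n' "$line" | grep -o '[^ ]*$')

printf '%s\n' "$from_slash" "$last_word"
```

The two results differ when the leading text itself contains a slash or when the path contains a space, so the choice depends on which assumption holds for the real output.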
Your line can be described as:
^\S+\s+(\S+)$
That's assuming whitespace is your delimiter between the random text and the directory. It simply separates the whitespace from the non-whitespace and captures the second part.
Or you might want to look into the word boundary anchor: \b.
I know you said to use grep, but I can't help mentioning that this is trivially done using awk:
awk '{ print $NF }' input.txt
This is assuming that a whitespace is the delimiter and that the path does not contain any whitespaces.

How to replace a path with another path in sed?

I have a csh script (although I can change languages if it has any relevance) where I have to:
sed s/AAA/BBB/ file
The problem is that AAA and BBB are paths, and so contain '/'. AAA is fixed, so I can say:
sed s/\\\/A\\\/A\\\/A/BBB/ file
However, BBB is based on variables, including $PWD. How do I escape the '/' in $PWD?
OR is there some other way I should be doing this entirely?
sed can use any separator instead of / in the s command. Just use something that is not encountered in your paths:
s+AAA+BBB+
and so on.
Alternatively (and if you don't want to guess), you can pre-process your path with sed to escape the slashes:
pwdesc=$(printf '%s\n' "$PWD" | sed 's_/_\\/_g')
and then do what you need with $pwdesc.
In circumstances where the replacement string or pattern string contains slashes, you can make use of the fact that sed allows an alternative delimiter for the substitute command (this is standard sed behavior, not specific to GNU sed). Common choices for the delimiter are the pipe character | or the hash #; the best choice will often depend on the type of file being processed. In your case you can try
sed -i 's#/path/to/AAA#/path/to/BBB#g' your_file
Note: The g after the last # changes all occurrences on each line. If you want to change only the first occurrence on each line, do not use g.
sed -i "s|$fileWithPath|HAHA|g" file
With literal paths instead of a variable:
sed -i 's|path/to/foo|path/to/bar|g' file
Using csh for serious scripting is usually not recommended. However, that is tangential to the issue at hand.
You're probably after something like:
sed -e "s=$oldpath=$newpath="
where the shell variable $oldpath contains the value to be replaced and $newpath contains the replacement, and it is assumed that neither variable contains an equals sign. That is, you're allowed to choose the delimiter on pattern, and avoiding the usual / delimiter avoids problems with slashes in pathnames. If you think = might appear in your file names, choose something less likely to appear, such as control-A or control-G.
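A quick sketch of the alternate-delimiter approach with variables. The paths and the input line are hypothetical, and the example assumes, as stated above, that neither variable contains an equals sign. The = in the input text itself is harmless, because only the pattern and the replacement are parsed for the delimiter.

```shell
# Hypothetical old and new paths.
oldpath='/home/user/project'
newpath='/srv/project'

# = is the delimiter, so the slashes in both paths need no escaping.
result=$(printf '%s\n' 'root=/home/user/project/src' | sed -e "s=$oldpath=$newpath=")

printf '%s\n' "$result"
```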
In my case the method below works.
sed -i 's/playstation/PS4/' input.txt
It can also be written as
sed -i 's+playstation+PS4+' input.txt
sed: the stream editor
-i: edits the source file in place
+: the delimiter
I hope the above information works for you 😃.
In bash, you can use parameter expansion ${i/p/r} to escape the slashes.
In this case, use ${i//p/r} to escape all occurrences.
p1=${p1//\//\\/}
p2=${p2//\//\\/}
sed "s/$p1/$p2/" file
Or, more concisely, in one line: sed "s/${p1//\//\\/}/${p2//\//\\/}/" file
The first two slashes // are the separator in the parameter expansion, saying that we are matching all occurrences. Then \/ escapes the slash in the search pattern, the next / is the second separator in the expansion, and \\/ is the replacement, in which the backslash must itself be escaped.
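The escaping can be verified on a small example. This sketch assumes bash, since ${var//pattern/replacement} is not available in plain POSIX sh or csh, and the paths are hypothetical.

```shell
# Hypothetical paths containing slashes.
p1='/usr/local/old'
p2='/opt/new'

# Replace every / with \/ so the paths are safe inside s/.../.../
p1_esc=${p1//\//\\/}
p2_esc=${p2//\//\\/}

result=$(printf '%s\n' 'cd /usr/local/old/bin' | sed "s/$p1_esc/$p2_esc/")

printf '%s\n' "$p1_esc"
printf '%s\n' "$result"
```

This only escapes slashes. Other characters that are special to sed, such as & in the replacement or . in the pattern, would still need their own handling.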
We just needed to get the /h/ network path references out of the path. If we pointed them back to the /c/ drive, they would map to nonexistent directories but resolve quickly. In my .bashrc I used
PATH=`echo $PATH | sed -e "s+/h/+/c/+g"`
