How to join multiple lines in Notepad++?

In Notepad++, I have thousands of lines of data that need modifying. Some records are already on one line and end with "$", but others that should be on one line are split across several lines. How can I join them so that every record is on a single line ending with "$"?
Here is the data sample:
1.we love it $
2.its beautiful $
3.how
can
it? $
4. yes I love it $
5. sorry
its
ugly
too $
In that sample, records 1, 2 and 4 are each on one line, but records 3 and 5 are split across several lines. How can I join them? PS: "$" appears only at the end of each record, never anywhere else in the content.

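Use a regex replace (Search Mode: Regular expression):
find: (?<![$\r\n])[\r\n]+(( ) *)?
replace: $2
The lookbehind only lets the match start at a line break that is not preceded by $. Excluding \r and \n from the lookbehind as well stops it from matching the \n of a Windows \r\n pair that follows a $. The $2 preserves one of the leading spaces (if any) from the joined line.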
Given your input, the above produces:
1.we love it $
2.its beautiful $
3.how can it? $
4. yes I love it $
5. sorry its ugly too $
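Note that your sample input is "corrupt" in that it has trailing spaces after $ (e.g. on the first line). A line ending in "$ " would be joined to the next one, because the character before the line break is a space rather than $. Remove trailing spaces first, for example with Edit > Blank Operations > Trim Trailing Space.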
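To check the lookbehind idea outside Notepad++, here is a small Python sketch using the standard `re` module. The function name `join_records` is hypothetical. Python writes the group reference as `\2` instead of Notepad++'s `$2`. The sketch assumes Python 3.5+, where an unmatched group is replaced by an empty string.

```python
import re

# A run of line breaks whose first character is not preceded by "$" (nor by
# \r or \n, so the \n of a "\r\n" pair after "$" is left alone), optionally
# followed by leading spaces; group 2 captures a single one of those spaces.
JOIN_RE = re.compile(r"(?<![$\r\n])[\r\n]+(( ) *)?")


def join_records(text: str) -> str:
    """Join continuation lines so that every record ends with "$"."""
    # r"\2" plays the role of Notepad++'s $2; if the next line had no
    # leading space, group 2 did not match and is replaced by "".
    return JOIN_RE.sub(r"\2", text)


sample = "1.we love it $\n2.its beautiful $\n3.how\n can\n it? $\n"
print(join_records(sample))
```

Words can run together at a join. This happens when neither the continuation line has a leading space nor the previous line a trailing one: "x\ny $" becomes "xy $". Replacing with a literal space instead of the group avoids that.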

Related

Remove two lines using sed

I'm writing a script that parses an HTML document. I would like to remove two consecutive lines. How does sed handle newlines? I tried:
sed 's/<!DOCTYPE.*\n<h1.*/<newstring>/g'
which didn't work. I then tried the following, but it removes the whole document; it seems to remove all newlines:
sed ':a;N;$!ba;s/<!DOCTYPE.*\n<h1.*\n<b.*/<newstring>/g'
Any ideas? Maybe I should work with awk?
For the simple task of removing two lines if each matches some pattern, all you need to do is:
sed '/<!DOCTYPE.*/{N;/\n<h1.*/d}'
This uses an address matching the first line you want to delete. When the address matches, it executes:
N (Next): append the next line of input to the pattern space, separated by a newline (\n).
It then tests a second address, /\n<h1.*/, which matches only if the second line (the text following the \n) starts with <h1. If that matches, it executes:
d (delete): discard the pattern space (both lines) and start the next cycle with the next unread line.
If d isn't executed, then both lines will print by default and execution will continue as normal.
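The control flow of that sed command can be modelled in a few lines of Python over a list of lines. This is a sketch for illustration only; the function name `drop_doctype_h1` is hypothetical. It assumes GNU sed behaviour when N runs on the last line: the pattern space is printed and sed exits.

```python
import re


def drop_doctype_h1(lines):
    r"""Model of sed '/<!DOCTYPE.*/{N;/\n<h1.*/d}' on a list of lines."""
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if re.search(r"<!DOCTYPE", line):      # first address matched
            if i + 1 == len(lines):
                # N on the last line: GNU sed prints the pattern space and exits
                out.append(line)
                break
            nxt = lines[i + 1]                 # N: pull in the next line
            if not re.match(r"<h1", nxt):
                # /\n<h1.*/ did not match: both lines are printed as usual
                out.extend([line, nxt])
            # otherwise d: both lines are discarded
            i += 2
        else:
            out.append(line)                   # default action: print the line
            i += 1
    return out
```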
To adjust this for three lines, you need only use N again. If you want to pull in multiple lines until some delimiter is reached, you can use a line-pump, which looks something like this:
/<!DOCTYPE.*/{
:pump
N
/some-regex-to-stop-pump/!b pump
/regex-which-indicates-we-should-delete/d
}
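The line pump can be sketched the same way. Lines are appended until the whole accumulated block matches the stop regex, and then the block is tested against the delete regex. The parameters `start`, `stop` and `delete` are hypothetical stand-ins for the placeholder regexes in the sed script. The end-of-input case again assumes GNU sed's behaviour for N.

```python
import re


def pump_delete(lines, start, stop, delete):
    """Model of the sed line pump; start/stop/delete are placeholder regexes."""
    out = []
    i = 0
    while i < len(lines):
        if not re.search(start, lines[i]):
            out.append(lines[i])               # no match: print the line unchanged
            i += 1
            continue
        block = [lines[i]]
        i += 1
        while True:                            # :pump
            if i == len(lines):
                # N with no more input: GNU sed prints the pattern space and quits
                return out + block
            block.append(lines[i])             # N
            i += 1
            if re.search(stop, "\n".join(block)):
                break                          # stop regex matched: leave the pump
            # otherwise "b pump": go round again and append another line
        if not re.search(delete, "\n".join(block)):
            out.extend(block)                  # not deleted: the whole block prints
    return out
```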
However, writing a full XML parser in sed or awk is a Herculean task and you're likely better off using an existing solution.
If an XML parsing tool is definitely not an option, awk may be one:
awk '/<!DOCTYPE/ { lne=NR+1;next } NR==lne && /<h1/ { next }1' file
When we encounter a line containing "<!DOCTYPE", we set the variable lne to the next line number (NR+1) and skip to the next line. Then, when the current line number equals lne (NR==lne) and the line contains "<h1", we skip that line too. The trailing 1 prints every other line. Note that the <!DOCTYPE line itself is always skipped, even when no <h1 line follows it.
My solution for a document like this:
<b>...
<first...
<second...
<third...
<a ...
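this awk command works well with awk implementations that accept a regular expression as the record separator RS, such as GNU awk (gawk):
awk -v RS='<first[^\n]*\n<second[^\n]*\n<third[^\n]*\n' '{printf "%s", $0}' file
Each matching three-line block becomes a record separator, so printing the records without any separator deletes the block.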
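The same effect can be sketched in Python by treating the whole file as one string and deleting every match of the regex used as RS above. The function name `drop_block` is hypothetical.

```python
import re

# Same pattern as the RS value: consecutive lines starting with <first,
# <second and <third, each including its trailing newline.
BLOCK_RE = re.compile(r"<first[^\n]*\n<second[^\n]*\n<third[^\n]*\n")


def drop_block(text: str) -> str:
    """Delete every <first/<second/<third line triple from the whole text."""
    return BLOCK_RE.sub("", text)


print(drop_block("<b>...\n<first...\n<second...\n<third...\n<a ...\n"))
```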
This might work for you (GNU sed):
sed 'N;/<!DOCTYPE.*\n<h1.*/d;P;D' file
N appends the following line. If the pattern matches the two lines now in the pattern space, d deletes them both.
Otherwise, P prints the first of the two lines, D deletes it, and the cycle restarts with the remaining line.
To replace the two lines with another string, use:
sed 'N;s/<!DOCTYPE.*\n<h1.*/another string/;P;D'
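The N;P;D idiom is a sliding two-line window, and a Python model makes that explicit. This is a sketch only: the function name `sed_n_p_d` is hypothetical, and GNU sed's print-and-exit behaviour for N at end of input is assumed.

```python
import re

# The two-line pattern from the sed command: "<!DOCTYPE" on the first line,
# then a newline, then "<h1" at the start of the second line.
PAIR_RE = re.compile(r"<!DOCTYPE.*\n<h1")


def sed_n_p_d(lines):
    r"""Model of GNU sed 'N;/<!DOCTYPE.*\n<h1.*/d;P;D' on a list of lines."""
    out = []
    it = iter(lines)
    window = next(it, None)              # start of a cycle: read one line
    while window is not None:
        nxt = next(it, None)             # N: append the following line
        if nxt is None:
            out.append(window)           # N at end of input: GNU sed prints and exits
            break
        if PAIR_RE.search(window + "\n" + nxt):
            window = next(it, None)      # d: drop both lines, new cycle reads afresh
        else:
            out.append(window)           # P: print the first line
            window = nxt                 # D: delete it, restart with the second line
    return out
```

Because every line gets a turn as the first line of the window, the pair is also caught when it is preceded by another <!DOCTYPE line. For ["<!DOCTYPE a>", "<!DOCTYPE b>", "<h1>"], this returns ["<!DOCTYPE a>"]. The /<!DOCTYPE.*/{N;...d} form would instead leave all three lines in place.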

grep for argument of a latex command

I'm trying to sort out some broken references in a LaTeX file. They are commands such as \cref{ps.1.1}. I would like to grep my file and get only the argument of the command as output, in this case ps.1.1. grep -Po \\\\cref{.*?} my.tex gives me only the command rather than the rest of the line, but I'd also like to get rid of the \cref{ and } in the output so that I can iterate over the arguments.
Here is a Perl one-liner that prints only the matches, one per output line (even when several occur on the same input line), each prefixed with its line number.
perl -nle 'print "$.: $1" while(/\\cref\{(.*?)\}/g)' file.tex
It can be adapted depending on the exact output you want.
For example, to print only the first match on each line, drop the /g modifier and change while to if. With while and no /g, the same first match is found again and again, so the loop never ends. To match multiple patterns, add them to the regex (separated by | and grouped with ()) and add $2, $3 (...) to the print. To see the whole line, change $1 to $_. Etc.
A simple script would offer far more flexibility and processing opportunities.
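As an example of such a script, here is a Python sketch that does what the Perl one-liner does. For every \cref{...} on every line, it records the line number and the argument. The function name `cref_args` is hypothetical, and the lazy `(.*?)` stops at the first close brace, just as in the Perl version.

```python
import re

CREF_RE = re.compile(r"\\cref\{(.*?)\}")   # literal backslash, "cref{", lazy capture, "}"


def cref_args(text: str) -> list[str]:
    """Return "line: argument" for every \\cref{...} in the text."""
    results = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in CREF_RE.finditer(line):   # every match, even several per line
            results.append(f"{lineno}: {m.group(1)}")
    return results


for entry in cref_args("see \\cref{ps.1.1} and \\cref{ps.2}\nnone\n\\cref{a}\n"):
    print(entry)
```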

How to use grep to search for an exact word match in TextWrangler

TextWrangler can search using grep-style regular expressions.
I want to find and replace the following word: bauvol, but not bauvolumen.
I tried typing ^bauvol$ into the search field but that didn't do the trick, it didn't find anything, although the word is clearly there.
I think it's because, in grep, ^ and $ signify the start and end of a line, not of a word?
You want to use \b as word boundaries, as gromi08 said:
\bbauvol\b
If you want to reuse the matched word (so you can replace it, modify it, change its case, etc.), it is usually best to wrap it in parentheses, ( and ), so you can reference the captured group in the Replace box:
Find:
(\bbauvol\b)
Replace:
<some_tag>\1</some_tag>
Did you have anything specific you were trying to do with the result once you found it (cut it, duplicate it, etc.)?
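The same Find/Replace pair can be tried in Python, whose `re` module supports `\b` and `\1` in the same way. The function name `tag_word` and the `<some_tag>` element are illustrative only.

```python
import re


def tag_word(text: str, word: str = "bauvol") -> str:
    """Wrap whole-word occurrences of `word` in the placeholder <some_tag>."""
    # \b on both sides rejects "bauvolumen"; the parentheses capture group 1.
    pattern = r"(\b" + re.escape(word) + r"\b)"
    return re.sub(pattern, r"<some_tag>\1</some_tag>", text)


print(tag_word("bauvol and bauvolumen"))
```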
On the command line, use grep's -w option (see the grep man page).
This option matches the expression only as a whole word.
Therefore the command will be:
grep -w bauvol file.txt
And yes, ^ and $ are for start and end of line.

Grep for beginning and end of line?

I have a file where I want to grep for lines that start with either -rwx or drwx AND end in any number.
I've got this, but it isn't quite right. Any ideas?
grep [^.rwx]*[0-9] usrLog.txt
The tricky part is a regex that includes a dash as one of the valid characters in a character class. The dash has to come first in a (normal) character class, or immediately after the caret in a negated character class (putting it last also works). If you need a close square bracket too, put the close square bracket first and the dash last. Mercifully, you only need the dash, hence the notation chosen.
grep '^[-d]rwx.*[0-9]$' "$@"
See: Regular Expressions and grep for POSIX-standard details.
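The pattern can be checked with Python's `re` module, which treats `[-d]`, `.*` and the `^`/`$` anchors the same way here. The sample lines and the function name `matching_lines` are made up for illustration.

```python
import re

# -rwx or drwx at the start of the line, a digit at the very end.
PERM_RE = re.compile(r"^[-d]rwx.*[0-9]$")


def matching_lines(lines):
    """Keep only the lines that grep '^[-d]rwx.*[0-9]$' would print."""
    return [line for line in lines if PERM_RE.search(line)]


sample = [
    "-rwxr-xr-x 1 u g 4096 notes1",
    "drwxr-x--- 2 u g 4096 dir2",
    "-rw-r--r-- 1 u g 10 f3",
    "drwxr-xr-x 2 u g 4096 docs",
]
print(matching_lines(sample))
```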
It looks like you were on the right track. The ^ character matches the beginning of a line, and $ matches the end of a line. Jonathan's pattern will work for you; I just wanted to explain the reasoning behind it.
It should be noted that the caret (^) behaves differently inside brackets than outside them. Outside, it anchors the match to the beginning of the line. As the first character inside brackets, it negates the class, so your [^.rwx] matches any single character other than ., r, w or x. You would also want a period before the asterisk, because in grep * only repeats the preceding item; .* is what acts as a "wildcard". Finally, a bracket expression matches just one character, so to require -rwx or drwx at the start you need [-d] followed by a literal rwx:
grep '^[-d]rwx.*[0-9]$' usrLog.txt
Quoting the pattern keeps the shell from expanding it. I have tried to stay as close to your original expression as possible and to explain every change along the way, so that it is better understood. How can we learn otherwise?
You could also use egrep (equivalent to grep -E), although this pattern does not need any extended-regex features. Try:
egrep '^[d-]rwx.*[0-9]$' usrLog.txt
Are you parsing the output of ls -l?
If you are, and you just want to get the file names, use find instead:
find . -iname "*[0-9]"
If you have no choice because usrLog.txt is created by something or someone else and you absolutely must use this file, other options include:
awk '/^[-d]rwx.*[0-9]$/' file
Ruby (1.9+)
ruby -ne 'print if /^[-d]rwx.*[0-9]$/' file
Bash
while IFS= read -r line; do case $line in [-d]rwx*[0-9]) echo "$line";; esac; done < file
Many answers have been provided for this question. I just wanted to add one more that uses Bash-specific features (bashisms):
#! /bin/bash
while read -r || [[ -n "$REPLY" ]]; do
[[ "$REPLY" =~ ^(-rwx|drwx).*[[:digit:]]+$ ]] && echo "Got one -> $REPLY"
done <"$1"
kurumi's Bash answer, which uses case, is also correct, but it will not read the last line of the file if that line has no trailing newline (for example, if the file was saved without pressing Enter/Return after the last line).

Easiest way to remove Latex tag (but not its content)?

I am using TeXnicCenter to edit a LaTeX document.
I now want to remove a certain tag (say, \emph{blabla}), which occurs multiple times in my document, but not the tag's content (so in this example, I want to remove all emphasis).
What is the easiest way to do so?
A solution using another program that is readily available on Windows 7 is also fine.
Edit: In response to regex suggestions, it is important that it can deal with nested tags.
Edit 2: I really want to remove the tag from the text file, not just disable it.
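Using a regular expression, do something like s/\\emph\{([^\}]*)\}/\1/g. Note that [^\}]* stops at the first close brace, so this does not handle nested braces such as \emph{a \textbf{b} c}. If you are not familiar with regular expressions, this says: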
s -- replace
/ -- begin match section
\\emph\{ -- match \emph{
( -- begin capture
[^\}]* -- match any characters except (meaning up until) a close brace because:
[] a group of characters
^ means not or "everything except"
\} -- the close brace
and * means 0 or more times
) -- end capture, because this is the first (in this case only) capture, it is number 1
\} -- match end brace
/ -- begin replace section
\1 -- replace with captured section number 1
/ -- end regular expression, begin extra flags
g -- global flag, meaning do this every time the match is found not just the first time
This is Perl syntax, as that is what I am familiar with. The following Perl one-liners accomplish two tasks:
perl -pe 's/\\emph\{([^\}]*)\}/\1/g' filename prints the result to the terminal, so you can test it first
perl -pi.bak -e 's/\\emph\{([^\}]*)\}/\1/g' filename changes the file in place, keeping the original as filename.bak
On Windows (cmd.exe), use double quotes instead of single quotes around the expression.
Similar commands may be available in your editor, but if not this will (should) work.
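Because the question explicitly requires nested braces, which a single regex like the one above cannot balance, here is a small brace-counting sketch in Python. It keeps a stack with one entry per open brace, records whether that brace opened an \emph{, and drops only the matching close brace. The function name `strip_command` is hypothetical. The sketch assumes the command is written exactly as \emph{ with no space before the brace. It also ignores LaTeX comments and verbatim environments, which it would process like ordinary text.

```python
def strip_command(text: str, name: str = "emph") -> str:
    """Remove every \\name{...} wrapper but keep its contents, honouring nesting."""
    opener = "\\" + name + "{"
    out = []
    stack = []  # one entry per open brace: True if it belongs to a stripped command
    i = 0
    while i < len(text):
        if text.startswith(opener, i):
            stack.append(True)              # drop "\emph{" itself
            i += len(opener)
        elif text[i] == "\\" and i + 1 < len(text):
            out.append(text[i:i + 2])       # keep \{ \} \\ and other commands' first char
            i += 2
        elif text[i] == "{":
            stack.append(False)             # ordinary group: keep its braces
            out.append("{")
            i += 1
        elif text[i] == "}":
            if not (stack and stack.pop()):
                out.append("}")             # close of an ordinary (or unmatched) group
            i += 1                          # close of a stripped command is dropped
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(strip_command(r"a \emph{b \textbf{c} d} e"))
```

To apply it to a file, read the file into a string, pass it through the function and write the result back, after keeping a backup.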
This is Crowley's idea, which he should have posted as an answer, so I will do that for him. If you replace every \emph{ with {, you remove the emphasis without disturbing the other content. Because the closing brace is left alone, this also copes with nested braces. The text will still be in braces, but unless you have done something unusual it shouldn't matter.
The regex would be a simple s/\\emph\{/\{/g, but a plain search and replace in your editor will do that one too.
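To merely disable the emphasis rather than remove it from the file, you can redefine the command in the preamble:
\renewcommand{\emph}[1]{#1}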
Any reasonably advanced editor should let you do a search and replace using regular expressions, replacing \emph{bla} with bla, etc.
