Look for a second carriage return using InDesign GREP

I am looking for a grep expression for InDesign.
I have the following lines:
This is Line 1
This is Line 2
This is Line 3
This is Line 4
This is Line 5
This is Line 6
Lines 1 and 2 will be right-indented.
Lines 3 and 4 will be left-indented.
Lines 5 and 6 will again be right-indented.
There is a carriage return after each line except line 6.
I want to target the carriage returns after lines 2 and 4 and replace each with some other character or a forced line break.
How can I do that in GREP?

Find what: (.+\r.+)\r
Change to: $1#
where # is your symbol.
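For anyone who wants to check the pattern outside InDesign, here is a minimal Python sketch of the same find/change. It assumes the lines are separated by \r, and it uses [^\r]+ rather than .+ because Python's . also matches a carriage return. The function name and the default # symbol are illustrative only.

```python
import re

def replace_every_second_return(text: str, symbol: str = "#") -> str:
    """Replace the carriage return after every second line with `symbol`."""
    # Group 1 captures two lines joined by a carriage return; the carriage
    # return that follows the pair is the one that gets replaced.
    # A lambda avoids any special handling of backslashes in `symbol`.
    return re.sub(r"([^\r]+\r[^\r]+)\r", lambda m: m.group(1) + symbol, text)

lines = [f"This is Line {n}" for n in range(1, 7)]
print(repr(replace_every_second_return("\r".join(lines))))
```

Line 6 has no trailing carriage return, so the last pair is left untouched, which matches the situation described in the question.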

Related

extract lines from a list of text files

I have a list of files, and in each file there are 3 lines, each containing a word, that I need to extract.
Basically, each file looks like this:
random
random
random
random
LINE 1 TEXT RANDOM TEXT
random
LINE 2 TEXT RANDOM TEXT
random
random
LINE 3 TEXT RANDOM TEXT
and what I'm looking to get is a text file containing this (without the FILE * PART):
FILE1 - LINE 1 TEXT RANDOM TEXT | LINE 2 TEXT RANDOM TEXT | LINE 3 TEXT RANDOM TEXT
FILE2 - LINE 1 TEXT RANDOM TEXT | LINE 2 TEXT RANDOM TEXT | LINE 3 TEXT RANDOM TEXT
FILE3 - LINE 1 TEXT RANDOM TEXT | LINE 2 TEXT RANDOM TEXT | LINE 3 TEXT RANDOM TEXT
FILE4 - LINE 1 TEXT RANDOM TEXT | LINE 2 TEXT RANDOM TEXT | LINE 3 TEXT RANDOM TEXT
TEXT RANDOM TEXT stands for arbitrary text that I'm trying to find. Any help would be appreciated. I tried PowerGREP, but I couldn't find an option to retrieve only unique records from each file, meaning only 1 match per search term. Instead I get:
LINE 1
LINE 2
LINE 2
LINE 3
With PowerGREP, instead of 3 unique lines per file, I got 3 lines for some files and 4, 5, or 6 for others, because sometimes several lines contain one of the search terms.
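One possible approach, sketched in Python rather than PowerGREP, is to keep only the first line that contains each search term. The function name, the sample lines, and the search terms below are all hypothetical placeholders.

```python
def first_match_per_term(lines, terms):
    """Return the first line containing each term, in the order of `terms`."""
    found = {}
    for line in lines:
        for term in terms:
            # Keep only the first line seen for each term; later hits are ignored.
            if term not in found and term in line:
                found[term] = line.strip()
    return [found[term] for term in terms if term in found]

sample = [
    "random",
    "LINE 1 TEXT abc",
    "LINE 2 TEXT def",
    "random",
    "LINE 2 TEXT ghi",  # second hit for the same term, skipped
    "LINE 3 TEXT jkl",
]
print(" | ".join(first_match_per_term(sample, ["LINE 1", "LINE 2", "LINE 3"])))
```

To process a list of files, the same function could be called once per file on the file's lines, writing one joined line per file to the output.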

Multiple-line collapse in Notepad++

I want to use Notepad++ to do a multiple-line collapse. What I mean is that I am looking for a simple operation to turn
1
2
3
4
into
1 2
3 4
How about:
Ctrl+H
Find what: (.+)\R(.+)(\R)
Replace with: $1 $2$3
Replace all
Where \R stands for any kind of linebreak.
This will replace a linebreak between 2 lines by these 2 lines separated by a space.
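The same substitution can be tried outside Notepad++. This is a minimal Python sketch under two assumptions: Python's re module has no \R, so \r?\n stands in for "any line break", and [^\r\n]+ replaces .+ because Python's . would also match \r. The function name is illustrative.

```python
import re

def collapse_pairs(text: str) -> str:
    """Join each pair of consecutive lines with a single space."""
    # Groups 1 and 2 are the two lines; group 3 is the line break after the
    # pair, which is kept so the pairs stay on separate lines.
    return re.sub(r"([^\r\n]+)\r?\n([^\r\n]+)(\r?\n)", r"\1 \2\3", text)

print(collapse_pairs("1\n2\n3\n4\n"))
```

As with the Notepad++ pattern, the final pair is collapsed only if the last line ends with a line break.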

Why does my grep command output "--" between some lines?

I have a fasta file like the test one here:
>HWI-D00196:168:C66U5ANXX:3:1106:16404:19663 1:N:0:GCCAAT
CCTAGCACCATGATTTAATGTTTCTTTTGTACGTTCTTTCTTTGGAAACTGCACTTGTTGCAACCTTGCAAGCCATATAAACACATTTCAGATATAAGGCT
>HWI-D00196:168:C66U5ANXX:3:1106:16404:19663 2:N:0:GCCAAT
AAAACATAAATTTGAGCTTGACAAAAATTAAAAATGAGCCCAGCCTTATATCTGAAATGTGTTTATATGGCTTGCAAGGTTGCAACAAGTGCAGTTTCCAA
>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 1:N:0:GCCAAT
ATATTTGAATTATCAGAAATAAACACAAAGAAAACCTAGAACAGATAATTTCTTCCACATTATTGATCAGATACAGATTTCAAGGGTACCGTTGTGAATTG
>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 2:N:0:GCCAAT
AAACGATTGATAGATCTATTTGCATTATAAAAACATTAAAAAAACAAAATACTGATTAAATGTCGTCTTTCTATTCCACAATTTTATAGATCTCACTGTAT
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 1:N:0:GCCAAT
CTTACTTTGCCTCTCTCAGCCAATGTCTCCTGAGTCTAATTTTTTGGAGGCTAAGCTATGAGCTAATGATGGGTTCCATTTGGGGCCAATGCTTCAGCCTG
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 2:N:0:GCCAAT
CTATTAGTTCTTATCTTTGCCTGCAAATATAAGACTAGCGCTTGAGTAGCTGACAGAGACAAAGTAAGCTGGAGTGTTTATCACCTGGTCACTCCAATTGT
When I type in a simple grep command like:
grep -B1 "CTT" test.fasta
I get a really strange output in which "--" is sometimes placed on a newline above the grep hit like so:
>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 2:N:0:GCCAAT
AAACGATTGATAGATCTATTTGCATTATAAAAACATTAAAAAAACAAAATACTGATTAAATGTCGTCTTTCTATTCCACAATTTTATAGATCTCACTGTAT
--
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 2:N:0:GCCAAT
CTATTAGTTCTTATCTTTGCCTGCAAATATAAGACTAGCGCTTGAGTAGCTGACAGAGACAAAGTAAGCTGGAGTGTTTATCACCTGGTCACTCCAATTGT
I can't figure out why some fasta entries have this and others don't. I don't get this problem when I remove the -B1. I can remove those lines from my file with a grep -v "--" statement, but I'd really like to understand what's going on here.
You are asking for one line of leading context by using the -B1 option. This means grep will display both the line which matched and the line directly before it. Each match will be separated by -- on a line by itself as shown below:
$ man grep | grep -B1 context
-A num, --after-context=num
Print num lines of trailing context after each match. See also
--
-B num, --before-context=num
Print num lines of leading context before each match. See also
--
-C[num, --context=num]
Print num lines of leading and trailing context surrounding each
--
--context[=num]
Print num lines of leading and trailing context. The default is
The reason you aren't seeing -- between every match is that grep prints the separator only between groups of output lines that are not adjacent in the file. Consecutive matches, together with their context, form a single group. See the following example:
seq 13 | grep -B1 1
1
--
9
10
11
12
13
The seq command produces all the numbers between 1 and 13. Only the first line and the lines from 10 on contain a 1, so you see the 1 in its own group, then --, then the one line context, then the group of consecutive matching lines.
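The grouping rule can be made concrete with a rough Python emulation of grep -B. This is only a sketch of the observable behavior, not grep's actual implementation, and the function name is hypothetical.

```python
def grep_before(lines, needle, before=1):
    """Return matching lines with `before` lines of leading context, grep-style."""
    out = []
    last_printed = -1  # index of the last input line already emitted
    for i, line in enumerate(lines):
        if needle not in line:
            continue
        # Never re-emit a line that is already part of the output.
        start = max(i - before, last_printed + 1)
        # A gap between the previous group and this one gets a "--" separator.
        if out and start > last_printed + 1:
            out.append("--")
        out.extend(lines[start : i + 1])
        last_printed = i
    return out

numbers = [str(n) for n in range(1, 14)]  # same input as `seq 13`
print("\n".join(grep_before(numbers, "1", before=1)))
```

Run on the numbers 1 to 13, this prints the same seven lines as the seq example: the separator appears only where the output skips over lines 2 through 8.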
The GREP_COLORS section of the grep manpage says:
Specifies the colors and other attributes used to highlight various parts of the output. Its value is a colon-separated list
of capabilities that defaults to
ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36 with the rv and
ne boolean capabilities omitted (i.e., false).
and
se=36 SGR substring for separators that are inserted between
selected line fields (:), between context line fields, (-), and
between groups of adjacent lines when nonzero context is
specified (--). The default is a cyan text foreground over the
terminal's default background.
Consider file sample.txt :
$ cat sample.txt
ABBB
AAB
AAB
S
S
S
AABB
ABAA
BAA
CCC
$ grep -B2 'AAB' sample.txt
ABBB
AAB
AAB
--
S
S
AABB
Here -- is grep's way of telling you that the AAB before -- and the S after -- are not adjacent lines in the actual file.

How can I extract some data out of the middle of a noisy file using Perl 6?

I would like to do this using idiomatic Perl 6.
I found a wonderful contiguous chunk of data buried in a noisy output file.
I would like to simply print out the header line starting with Cluster Unique and all of the lines following it, up to, but not including, the first occurrence of an empty line. Here's what the file looks like:
</path/to/projects/projectname/ParameterSweep/1000.1.7.dir> was used as the working directory.
....
Cluster Unique Sequences Reads RPM
1 31 3539 3539
2 25 2797 2797
3 17 1679 1679
4 21 1636 1636
5 14 1568 1568
6 13 1548 1548
7 7 1439 1439
Input file: "../../filename.count.fa"
...
Here's what I want parsed out:
Cluster Unique Sequences Reads RPM
1 31 3539 3539
2 25 2797 2797
3 17 1679 1679
4 21 1636 1636
5 14 1568 1568
6 13 1548 1548
7 7 1439 1439
One-liner version
.say if /Cluster \s+ Unique/ ff^ /^\s*$/ for lines;
In English
Print every line from the input file, starting with the one containing the phrase Cluster Unique and ending just before the next empty line.
Same code with comments
.say # print the default variable $_
if # do the previous action (.say) "if" the following term is true
/Cluster \s+ Unique/ # Match $_ if it contains "Cluster Unique"
ff^ # Flip-flop operator: false until the preceding term becomes true,
# then true until the term after it becomes true (the ^ excludes that last line)
/^\s*$/ # Match $_ if it is an empty or whitespace-only line
for # Create a loop placing each element of the following list into $_
lines # Create a list of all of the lines in the file
; # End of statement
Expanded version
for lines() {
    .say if (
        $_ ~~ /Cluster \s+ Unique/ ff^ $_ ~~ /^\s*$/
    )
}
lines() is like <> in Perl 5. Each line from each file listed on the command line is read in one at a time. Since this is in a for loop, each line is placed in the default variable $_.
say is like print except that it also appends a newline. When written with a starting ., it acts directly on the default variable $_.
$_ is the default variable, which in this case contains one line from the file.
~~ is the match operator that is comparing $_ with a regular expression.
// Create a regular expression between the two forward slashes
\s+ matches one or more whitespace characters
ff is the flip-flop operator. It is false as long as the expression to its left is false. It becomes true when the expression to its left is evaluated as true. It becomes false again when the expression to its right becomes true, and it stays false unless the expression to its left matches again. In this case, if we used ^ff^ instead of ff^, then the header would not be included in the output.
When ^ comes before (or after) ff, it modifies ff so that it is also false on the iteration in which the expression to its left (or right) becomes true.
/^\s*$/ matches an empty line
^ matches the beginning of a string
\s* matches zero or more whitespace characters
$ matches the end of a string
By the way, the flip-flop operator in Perl 5 is .. when it is in a scalar context (it's the range operator in list context). But its features are not quite as rich as in Perl 6, of course.
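For readers more familiar with other languages, the state kept by ff^ can be written out by hand. The following Python generator is a rough analogue under that reading of the operator, not a translation of how Perl 6 implements it; the function name is made up for illustration.

```python
import re

def flip_flop_block(lines):
    """Yield lines from a 'Cluster ... Unique' header up to the next blank line."""
    active = False
    for line in lines:
        if not active:
            # Left-hand side of ff^: switch on at the header and emit it.
            if re.search(r"Cluster\s+Unique", line):
                active = True
                yield line
        elif re.fullmatch(r"\s*", line):
            # Right-hand side: a blank line switches off and is not emitted (the ^).
            active = False
        else:
            yield line

data = ["noise", "Cluster Unique Sequences Reads RPM", "1 31 3539 3539", "", "more noise"]
print(list(flip_flop_block(data)))
```

Because the flag is simply reset at the blank line, a second header later in the input would switch the output on again.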
I would like to do this using idiomatic Perl 6.
In Perl, the idiomatic way to locate a chunk in a file is to read the file in paragraph mode, then stop reading the file when you find the chunk you are interested in. If you are reading a 10GB file, and the chunk is found at the top of the file, it's inefficient to continue reading the rest of the file--much less perform an if test on every line in the file.
In Perl 6, you can read a paragraph at a time like this:
my $fname = 'data.txt';
my $infile = open(
    $fname,
    nl => "\n\n",   # Set what Perl considers the end of a line.
);  # Removed die() per Brad Gilbert's comment.
for $infile.lines() -> $para {
    if $para ~~ /^ 'Cluster Unique'/ {
        say $para.chomp;
        last;   # Quit reading the file.
    }
}
$infile.close;
# ^ Match start of string.
# 'Cluster Unique' By default, whitespace is insignificant in a Perl 6 regex. Quotes are one way to make whitespace significant.
However, in Perl 6 on Rakudo/MoarVM the open() function does not read the nl argument correctly, so you currently can't set paragraph mode.
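The early-exit idea does not depend on paragraph mode. As an illustration in Python (not Perl 6), a line-by-line reader can stop as soon as the block ends, so the rest of the input is never read. The function name and the demo input are assumptions made for this sketch.

```python
import io

def read_first_block(stream, header="Cluster Unique"):
    """Collect the block starting at `header`, stopping at the first blank line."""
    block = []
    for line in stream:  # file objects are read lazily, one line at a time
        if block:
            if not line.strip():
                break  # end of the block: stop reading the stream
            block.append(line.rstrip("\n"))
        elif line.startswith(header):
            block.append(line.rstrip("\n"))
    return block

demo = io.StringIO(
    "noise\nCluster Unique Sequences Reads RPM\n1 31 3539 3539\n\nmore noise\n"
)
print(read_first_block(demo))
```

With a real file, passing the object returned by open() in place of the StringIO would work the same way.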
Also, there are certain idioms that are considered by some to be bad practice, like:
Postfix if statements, e.g. say 'hello' if $y == 0.
Relying on the implicit $_ variable in your code, e.g. .say
So, depending on what side of the fence you live on, that would be considered a bad practice in Perl.

How can I tell HAML to keep whitespace at the end of my view's lines?

I have some HAML views in my Rails project that are used to send files to the user. They're not rendered as HTML, just plain text files that get downloaded. These files have to match a very specific format, and the format has an idiosyncrasy that ends up requiring every second line to end with a tab character.
Line 0a\t01234
Line 0b\t
Line 1a\t12345
Line 1b\t
Line 2a\t23456
Line 2b\t
The a lines have their tab characters printed fine, but the b lines do not. If I add any non-whitespace characters after the tab character, the tab gets printed. But when it's the last character on the line, it does not.
My view looks like
- @line_pairs.each do |line_pair|
  = line_pair.a.words + "\t" + line_pair.a.numbers
  = line_pair.b.words + "\t"
I'm sure that the tab character is not there (my editor shows them visually). There also is no space or anything of the like. I just get
Line 0a\t01234
Line 0b
Line 1a\t12345
Line 1b
Line 2a\t23456
Line 2b
Is there any way to fix this? Thanks for any help.
The Haml documentation says that tilde (~) acts just like = but preserves whitespace.
Does this work?
- @line_pairs.each do |line_pair|
  ~ line_pair.a.words + "\t" + line_pair.a.numbers
  ~ line_pair.b.words + "\t"
Try :preserve
:preserve
  - @line_pairs.each do |line_pair|
The proper solution as pointed out by matt is actually to just use ERB, as HAML is not meant to fill needs around controlling whitespace at this level of detail.
