extract lines from a list of text files - parsing

Okay, so I have a list of files, and from each file I need to extract 3 lines, each containing a particular word.
Basically, each file looks like this:
random
random
random
random
LINE 1 TEXT RANDOM TEXT
random
LINE 2 TEXT RANDOM TEXT
random
random
LINE 3 TEXT RANDOM TEXT
and what I'm looking to get is a text file containing this (without the FILE * PART):
FILE1 - LINE 1 TEXT RANDOM TEXT | LINE 2 TEXT RANDOM TEXT | LINE 3 TEXT RANDOM TEXT
FILE2 - LINE 1 TEXT RANDOM TEXT | LINE 2 TEXT RANDOM TEXT | LINE 3 TEXT RANDOM TEXT
FILE3 - LINE 1 TEXT RANDOM TEXT | LINE 2 TEXT RANDOM TEXT | LINE 3 TEXT RANDOM TEXT
FILE4 - LINE 1 TEXT RANDOM TEXT | LINE 2 TEXT RANDOM TEXT | LINE 3 TEXT RANDOM TEXT
TEXT RANDOM TEXT is obviously a random text that I'm looking to find. Any help would be appreciated. I tried powerGREP, but it doesn't have an option to retrieve only unique records from each file
(meaning only 1 match per search term; instead I get
LINE 1
LINE 2
LINE 2
LINE 3
)
With powerGREP, I searched for the terms, but instead of 3 unique lines per file I got 3 lines for some files and 4, 5, or 6 for others, because sometimes multiple lines contain one of the search terms.
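A minimal sketch of one way to get exactly one match per search term, assuming plain substring matching and that only the first line containing each term should be kept. The function names (first_matches, summarize) and the sample data are hypothetical, not taken from powerGREP.

```python
from typing import Iterable, List, Optional


def first_matches(lines: Iterable[str], terms: List[str]) -> List[Optional[str]]:
    """Return, for each search term, the first line containing it (or None)."""
    found = {}
    for line in lines:
        text = line.rstrip("\r\n")
        for term in terms:
            # Later lines with the same term are ignored, so each term
            # contributes at most one line per file.
            if term not in found and term in text:
                found[term] = text
    return [found.get(term) for term in terms]


def summarize(name: str, lines: Iterable[str], terms: List[str]) -> str:
    """Build one output row: name - match1 | match2 | match3."""
    matches = first_matches(lines, terms)
    return name + " - " + " | ".join(m if m is not None else "" for m in matches)


# Hypothetical file contents with a duplicate "LINE 2" line.
sample = ["random", "LINE 1 alpha", "random", "LINE 2 beta", "LINE 2 gamma", "LINE 3 delta"]
terms = ["LINE 1", "LINE 2", "LINE 3"]
print(summarize("FILE1", sample, terms))
```

For real files, the same function could be fed an open file object per file name, writing one row per file to the output.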

Related

Look for a second carriage return using InDesign Grep

I am looking for a grep expression for InDesign.
I have the following lines:
This is Line 1
This is Line 2
This is Line 3
This is Line 4
This is Line 5
This is Line 6
Line 1 and 2 will be right indent.
Line 3 and 4 will be left indent.
Line 5 and 6 will again be right indent.
There is a carriage return after each line, except line six.
I want to target the carriage return after line 2, 4 and replace it with some other character or forced linebreak.
How can I do that in grep?
Find what: (.+\r.+)\r
Change to: $1#
where # is your symbol.
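To see what the expression does outside InDesign, here is a rough Python sketch of the same idea. It is an assumption-laden translation: Python's . also matches \r, so the sketch uses [^\r]+ in place of .+, and the function name mark_every_second_return is made up for illustration.

```python
import re

# Two consecutive lines, then the carriage return that follows the second one.
PAIR = re.compile(r"([^\r]+\r[^\r]+)\r")


def mark_every_second_return(text: str, symbol: str = "#") -> str:
    """Replace the carriage return after every second line with symbol."""
    return PAIR.sub(lambda m: m.group(1) + symbol, text)


# Six lines separated by carriage returns, none after the last line.
text = "\r".join(f"This is Line {i}" for i in range(1, 7))
print(repr(mark_every_second_return(text)))
```

Because each match consumes a whole pair of lines, the returns after lines 2 and 4 are replaced while those after lines 1, 3, and 5 are left alone.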

Format text with indentation

I am writing multiple lines to a file.
Number  Name
        Jack
The first line is a header.
I am going to write lines in the format above.
If the number is null, I write the line like this:
file.print "Jack".indentation("Number".length + 2)
There are 2 spaces between Number and Name in the header.
But the output looks like this, because the spaces do not take up the same width as "Number":
Number Name
Jack
Is there any solution to format better?

swift uitextview limit number of line breaks to two at the time

I wonder if it's possible to limit the number of line breaks that can be entered in a row?
I have set a limit on the number of characters that can be inserted, but users can still insert text like:
Some text
(line break)
(line break)
(line break)
(line break)
(line break)
(line break)
some more text...
Is it possible to allow only two line breaks at a time? So that the above text ends up like:
Some text
(line break)
(line break)
some more text...
Since I can't edit the text in the backend, I must do it in the frontend.
I want to clean up the text and limit the number of line breaks in a row to two after the user has finished typing.
As long as there are 3 newline characters in a row in the text in the UITextView, you can replace them with only 2 newline characters.
while str.contains("\n\n\n") {
    str = str.replacingOccurrences(of: "\n\n\n", with: "\n\n")
}
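The same cleanup can also be written as a single regular-expression pass instead of a loop. A small Python sketch of that idea, assuming line breaks are plain \n characters (the function name limit_line_breaks is hypothetical):

```python
import re


def limit_line_breaks(text: str, max_breaks: int = 2) -> str:
    """Collapse any run of more than max_breaks newlines down to max_breaks."""
    pattern = r"\n{" + str(max_breaks + 1) + r",}"
    return re.sub(pattern, "\n" * max_breaks, text)


print(repr(limit_line_breaks("Some text\n\n\n\n\n\nsome more text...")))
```

Runs of one or two newlines are left untouched, so normal paragraph spacing survives.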

Sort columns while keeping rows intact

Let's say I have three columns, and each column has over 1,000 entries:
A B C
1 2 6
5 3 7
7 4 8
Now I reorder the elements in column A as
5
1
7
...
How can I sort columns B and C so that I have
5 3 7
1 2 6
7 4 8
...
Excel has the "custom list" sort feature that can do exactly what I want. All I need to do is enter column 1 as "5, 1, 7, ..." into the "custom list". However, it doesn't work if my column 1 has 1,000+ entries (I cannot paste the list there). I am looking for a solution with awk or grep.
If you're open to a Perl solution:
($in, $list) = @ARGV;
# Index every input line by its first column.
open IN, "< $in" or die;
while ($line = <IN>) {
    @F = split /\s+/, $line;
    if (defined $h{$F[0]}) {
        die "ERROR: multiple input lines have first column $F[0]\n";
    }
    $h{$F[0]} = $line;
}
# Print the stored lines in the order given by the list file.
open LIST, "< $list" or die;
while ($line = <LIST>) {
    @F = split /\s+/, $line;
    if (defined $h{$F[0]}) {
        print $h{$F[0]};
    } else {
        die "ERROR: no match found for $F[0]\n";
    }
}
Save this file as "script"
Save your input data as "input"
Save your custom list as "list"
Run: perl script input list
How it works:
Iterate through the input file which contains columns of data, separated by whitespace. If your input file is comma-separated, change /\s+/ to /,/
Split line into fields array @F
Store line into hash h, keyed based on first field $F[0]
Iterate through the list file
Split line into fields (this handles trailing whitespace)
Print contents of hash for that key
Sanity checking is also done
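The steps above can also be sketched in Python. This is an illustrative translation of the same hash-lookup idea, not the answer's script: the function name reorder_rows is made up, and skipping blank lines is an added assumption.

```python
from typing import Dict, Iterable, List


def reorder_rows(rows: Iterable[str], order: Iterable[str]) -> List[str]:
    """Return rows rearranged so their first columns follow the given order."""
    by_key: Dict[str, str] = {}
    for row in rows:
        fields = row.split()  # whitespace-separated columns
        if not fields:
            continue  # assumption: blank lines are ignored
        key = fields[0]
        if key in by_key:
            raise ValueError(f"multiple input lines have first column {key}")
        by_key[key] = row

    result = []
    for entry in order:
        fields = entry.split()  # tolerates trailing whitespace in the list
        if not fields:
            continue
        key = fields[0]
        if key not in by_key:
            raise KeyError(f"no match found for {key}")
        result.append(by_key[key])
    return result


rows = ["1 2 6", "5 3 7", "7 4 8"]
for line in reorder_rows(rows, ["5", "1", "7"]):
    print(line)
```

Building the dictionary first keeps the whole operation linear in the number of rows, which matters little at 1,000 entries but avoids rescanning the input for every list entry.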

Adding tabs to non delimited text file with empty and variable length columns

I have a non-delimited text file and want to parse it to add tabs at specific spots to delimit columns. The columns are sometimes empty or vary in length, which is why I need to add tabs to those specific spots. I had found the answer to this once a couple of years ago on the net using batch, but now can't find it or the code. I already have the following code to replace more than 2 spaces in the file, but this doesn't account for when the columns are empty.
gc $FileToOpen | % { $_ -replace ' +',"`t" } | set-content $FileToSave
So I need to read each line in fixed-size portions (a certain number of characters each) and insert a tab after each portion.
Here is a sample of the data file, the top row is the header and the data rows have no blank lines in between them:
MRUN Number Name X Exception Reason Data CDM# Quantity D.O.S
000000 00000000 Name W MODIFIER CANNOT BE FILED WITHOUT 08/13/2015 0000000 0 08/13/2015
000000 00000000 Name W MODIFIER CANNOT BE FILED WITHOUT 0000000 0 08/13/2015
The second data row is missing Data.
Using Ansgar's answer, here is my code, which does handle empty fields:
gc $FileToOpen |
? { $_ -match '^(.{8})(.{12})(.{20})(.{3})(.{34})(.{62})(.{10})(.{22})(.{10})$' } |
% { "{0}`t{1}`t{2}`t{3}`t{4}`t{5}`t{6}`t{7}`t{8}" -f $matches[1].Trim(), $matches[2].Trim(), $matches[3].Trim(), $matches[4].Trim(), $matches[5].Trim(), $matches[6].Trim(), $matches[7].Trim(), $matches[8].Trim(), $matches[9].Trim() } |
Set-Content $FileToSave
Thanks for your patience Ansgar, I know I tried it! I really do appreciate the help!
Since you seem to have an input file with fixed-width columns, you should probably use a regular expression for transforming the input into a tab-delimited format.
Assume the following input file:
A        B   C
foo     13  22
bar      4  17
baz    142  23
The file has 3 columns. The first column is 6 characters wide, the other two columns 4 characters each.
The transformation could be done with a regular expression like this:
Get-Content 'C:\path\to\input.txt' |
? { $_ -match '^(.{6})(.{4})(.{4})$' } |
% { "{0}`t{1}`t{2}" -f $matches[1].Trim(), $matches[2].Trim(), $matches[3].Trim() } |
Set-Content 'C:\path\to\output.txt'
The regular expression defines the columns by character count and captures them in groups (parentheses). The groups can then be accessed as the indexes 1 and above of the resulting $matches collection. Trimming removes the leading/trailing whitespace. The format operator (-f) then inserts the trimmed values into the tab-separated format string.
If the last column has a variable width (because its values are left-aligned and have no trailing spaces), you may need to change the regular expression to ^(.{6})(.{4})(.{0,4})$ to take care of that. The quantifier {0,4} means up to four times the preceding expression. Write the lower bound explicitly: the shorthand {,4} is not treated as a quantifier by the .NET regular expression engine that PowerShell uses.
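For comparison, the same fixed-width approach can be sketched in Python. This assumes the 6/4/4 column widths from the example above; the function name to_tab_delimited is hypothetical, and lines that do not fit the layout are skipped just as the -match filter skips them.

```python
import re
from typing import Optional

# Column widths 6, 4, and up to 4, mirroring the expression in the answer.
FIXED = re.compile(r"^(.{6})(.{4})(.{0,4})$")


def to_tab_delimited(line: str) -> Optional[str]:
    """Split a fixed-width line into trimmed, tab-separated fields."""
    match = FIXED.match(line)
    if match is None:
        return None  # line does not fit the layout
    return "\t".join(group.strip() for group in match.groups())


# Build sample lines by padding, so the widths are exact.
full = "foo".ljust(6) + "13".rjust(4) + "22".rjust(4)
empty_middle = "bar".ljust(6) + " " * 4 + "17".rjust(4)
print(repr(to_tab_delimited(full)))
print(repr(to_tab_delimited(empty_middle)))
```

Because the columns are cut by position rather than by runs of spaces, an empty column still yields an empty field between two tabs.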
