Edit whitespace in text file - powershell-2.0

Here is the problem. The user enters a string of fewer than 11 letters; the text file is then opened and "Foo" is replaced with the user's string, but the column alignment must stay the same. Say I have this data in my text file:
Foo bar
Foo bar
Foo bar
Now the user enters "bigdog" and "Foo" is replaced with it. The indentation of the text file must remain the same, see below.
bigdog bar
bigdog bar
bigdog bar
How can I accomplish that? So far my code looks like this:
$name = Read-Host 'write 10 letter string'
(Get-Content file.txt) |
Foreach-Object {$_ -replace 'Foo ',$name.PadRight(11, ' ')} |
Out-File file.txt
But simply adding spaces to 'Foo' does not cut it, so any suggestions?

Use the format operator (-f) for padding and aligning text. A negative width left-aligns the value:
$newstring = '{0,-10}' -f $name
(Get-Content 'file.txt') |
ForEach-Object { $_ -replace '^Foo {7}', $newstring } |
Out-File 'file.txt'
I'd recommend anchoring the search expression (^ matches the beginning of a string) unless you want to replace it anywhere in the string. The pattern must also consume the padding after Foo, otherwise the following columns shift: a number in curly brackets matches the expression preceding it the given number of times, so ^Foo {7} matches Foo followed by 7 spaces (10 characters in total, the same width as the replacement).
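To check the column arithmetic outside PowerShell, here is a minimal Python sketch of the same idea. It assumes the field is 10 characters wide (Foo plus 7 spaces of padding), as in the answer above; the function name replace_fixed_width and its parameters are made up for illustration.

```python
import re

def replace_fixed_width(line, new, old="Foo", width=10):
    """Replace `old` plus its padding at the start of `line` with `new`,
    left-aligned and padded to the same `width` (assumed to be 10)."""
    # `old` followed by exactly enough spaces to fill the field
    pattern = "^" + re.escape(old) + " {%d}" % (width - len(old))
    # A function replacement avoids backslash interpretation in `new`
    return re.sub(pattern, lambda m: new.ljust(width), line)

# Foo + 8 spaces + bar: a 10-character field, one separator space, then bar
original = "Foo" + " " * 8 + "bar"
updated = replace_fixed_width(original, "bigdog")
print(repr(updated))
print(len(original) == len(updated))  # True: the column position is unchanged
```

Lines that do not start with the padded field are returned unchanged, which mirrors the effect of the ^ anchor.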

Related

extract the adjacent character of selected letter

I have this text file:
# cat letter.txt
this
is
just
a
test
to
check
if
grep
works
The letter "e" appears in 3 words.
# grep e letter.txt
test
check
grep
Is there any way to return the letter printed on left of the selected character?
expected.txt
t
h
r

With shown samples in awk, could you please try following.
awk '/e/{print substr($0,index($0,"e")-1,1)}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/e/{ ##Looking if current line has e in it then do following.
print substr($0,index($0,"e")-1,1)
##Print 1 character, starting at the position just before the first e in the line.
}
' Input_file ##Mentioning Input_file name here.

You can use a positive lookahead to match a character that is followed by an e, without making the e part of the match. This needs grep's -P option (GNU grep):
grep -oP '.(?=e)' letter.txt
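The same lookahead can be tried in Python, whose re module supports the identical (?=...) syntax. This is only a sketch for experimenting with the pattern; the helper name chars_before is hypothetical, and the word list is copied from letter.txt in the question.

```python
import re

# Contents of letter.txt from the question, one word per line
words = ["this", "is", "just", "a", "test", "to", "check", "if", "grep", "works"]

def chars_before(letter, lines):
    """Return every character that is immediately followed by `letter`."""
    # The lookahead checks for `letter` without consuming it
    pattern = re.compile(".(?=" + re.escape(letter) + ")")
    found = []
    for line in lines:
        found.extend(pattern.findall(line))
    return found

print(chars_before("e", words))  # ['t', 'h', 'r']
```

Like the grep version, this reports nothing for an e that is the first character of a line, because there is no character in front of it to match.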

With sed:
sed -nE 's/.*(.)e.*/\1/p' letter.txt

Assuming you have this input file:
cat file
this
is
just
a
test
to
check
if
grep
works
egg
element
You may use this grep + sed solution to find letter or empty string before e:
grep -oE '(^|.)e' file | sed 's/.$//'
t
h
r
l
m
Or alternatively this single awk command should also work:
awk -F 'e' 'NF > 1 {
for (i=1; i<NF; i++) print substr($i, length($i), 1)
}' file

This might work for you (GNU sed):
sed -nE '/(.)e/{s//\n\1\n/;s/^[^\n]*\n//;P;D}' file
Turn off implicit printing and enable extended regexp -nE.
Focus only on lines that meet the requirements i.e. contain a character before e.
Surround the required character by newlines.
Remove any characters before and including the first newline.
Print the first line (up to the second newline).
Delete the first line (including the newline).
Repeat.
N.B. The solution will print each such character on a separate line.
To print all such characters from an input line on one line, use:
sed -nE '/(.e)/{s//\n\1/g;s/^/e/;s/e[^\n]*\n?//g;s/\B/ /g;p}' file
N.B. Remove the s/\B/ /g if space separation is not needed.

With GNU awk you can use empty string as FS to split the input as individual characters:
awk -v FS= '/[e]/ {for(i=2;i<=NF;i++) if ($i=="e") print $(i-1)}' file
t
h
r
The for loop starts at i=2, so an "e" at the beginning of a line is skipped.
Edit: to print an empty string when e is the first character of the word, add a /^[e]/ rule.
For example, with this input:
cat file2
grep
erroneously
egg
Wednesday
effectively
awk -v FS= '/^[e]/ {print ""} /[e]/ {for(i=2;i<=NF;i++) if ($i=="e") print $(i-1)}' file2
r
n
W
n
f
v

Grep with at least one matching value and at least one not matching

I have some files, and I want grep to return the lines that contain at least one Position:"Engineer" string AND at least one Position whose value is not "Engineer".
So for the file below, only the first line should be returned:
Position:"Engineer" Name:"Jes" Position:"Accountant" Name:"Criss"
Position:"Engineer" Name:"Eva" Position:"Engineer" Name:"Adam"
I could write something like
grep 'Position:"Engineer"' filename | grep 'Position:"Accountant"'
And this works fine (I get only the first line), but I don't know all of the possible values of Position, so the grep needs to be generic, something like
grep 'Position:"Engineer"' filename | grep -v 'Position:"Engineer"'
But this doesn't return anything (as the two greps contradict each other).
Do you have any idea how this can be done?

This line works:
grep "^Position:\"Engineer\"" filename | grep -v " Position:\"Engineer\""
The first expression, with "^", catches only the Position at the beginning of the line; the second expression, with a leading space, removes the lines whose second Position is also "Engineer".

You can avoid the pipe and additional subshell by using awk if that is allowed, e.g.
awk '
$1~/Engineer/ {if ($3~/Engineer/) next; print}
$3~/Engineer/ {if ($1~/Engineer/) next; print}
' file
The first rule checks whether the first field contains Engineer; if field 3 also contains Engineer, the record is skipped, otherwise it is printed. The second rule swaps the order of the tests. The result is that Engineer can appear in only one of the fields (either the first or the third, but not both).
Example Use/Output
With your sample input in file, you would have:
$ awk '
$1~/Engineer/ {if ($3~/Engineer/) next; print}
$3~/Engineer/ {if ($1~/Engineer/) next; print}
' file
Position:"Engineer" Name:"Jes" Position:"Accountant" Name:"Criss"

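Use a negative lookahead to require a Position whose value is not Engineer (this needs grep -P, i.e. GNU grep). The closing quote inside the lookahead keeps values that merely start with Engineer from being excluded:
grep 'Position:"Engineer"' filename | grep -P 'Position:"(?!Engineer")'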

With two greps in a pipe:
grep -F 'Position:"Engineer"' file | grep -Ev '(Position:"[^"]*").*\1'
or, perhaps more robustly
grep -F 'Position:"Engineer"' file | grep -v 'Position:"Engineer".*Position:"Engineer"'
In the general case, if you want to print the lines with unique Position fields,
grep -Ev '(Position:"[^"]*").*\1' file
should do the job, assuming all the lines have the format specified. This will work also when there are more than two Position fields in the line.
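The condition the question asks for (at least one Position equal to Engineer and at least one Position with any other value) can also be written out explicitly. The following Python sketch is one way to state that logic, assuming every value is enclosed in double quotes as in the sample; the function name has_mixed_positions is hypothetical.

```python
import re

def has_mixed_positions(line, wanted="Engineer"):
    """True if the line has at least one Position equal to `wanted`
    and at least one Position with a different value."""
    # Collect every quoted Position value on the line
    positions = re.findall(r'Position:"([^"]*)"', line)
    return wanted in positions and any(p != wanted for p in positions)

sample = [
    'Position:"Engineer" Name:"Jes" Position:"Accountant" Name:"Criss"',
    'Position:"Engineer" Name:"Eva" Position:"Engineer" Name:"Adam"',
]
for line in sample:
    if has_mixed_positions(line):
        print(line)
```

For the two sample lines this prints only the first one. Unlike the field-based awk answer, it does not depend on the Position entries being in fields 1 and 3.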

Join multiple lines into One (.cap file) CentOS

Single entry has multiple lines. Each entry is separated by two blank lines.
Each entry has to be made into a single line followed by a delimiter(;).
Sample Input:
Name:Sid
ID:123
Name:Jai
ID:234
Name:Arun
ID:12
I tried replacing the blank lines with cat test.cap | tr -s [:space:] ';'
Output:
Name:Sid;ID:123;Name:Jai;ID:234;Name:Arun;ID:12;
Expected Output:
Name:SidID:123;Name:JaiID:234;Name:ArunID:12;
The same happens with xargs.
I've used a sed command as well, but it only joined two lines into one, whereas I have 132 lines in one entry and 1000 such entries in one file.

You may use
cat file | awk 'BEGIN { FS = "\n"; RS = "\n\n"; ORS=";" } { gsub(/\n/, "", $0); print }' | sed 's/;;*$//' > output.file
Output:
Name:SidID:123;Name:JaiID:234;Name:ArunID:12
Notes:
FS = "\n" will set field separators to a newline
RS = "\n\n" will set your record separators to double newline
gsub(/\n/, "", $0) will remove all newlines from a found record
sed 's/;;*$//' will remove the trailing ; added by awk

Could you please try following.
awk 'NF{val=(val?$0~/^ID/?val $0";":val $0:$0)} END{print val}' Input_file
Output will be as follows.
Name:SidID:123;Name:JaiID:234;Name:ArunID:12;
Explanation: Adding explanation of above code too now.
awk ' ##Starting awk program here.
NF{ ##Checking condition if a LINE is NOT NULL and having some value in it.
val=(val?$0~/^ID/?val $0";":val $0:$0) ##Creating a variable val here whose value is concatenating its own value along with check if a line starts with string ID then add a semi colon at last else no need to add it then.
}
END{ ##Starting END section of awk here.
print val ##Printing value of variable val here.
}
' Input_file ##Mentioning Input_file name here.

This might work for you (GNU sed):
sed -r '/./{N;s/\n//;H};$!d;x;s/.//;s/\n|$/;/g' file
If it is not a blank line, append the following line and remove the newline between them. Append the result to the hold space and if it is not the end of the file, delete the current line. At the end of the file, swap to the hold space, remove the first character (which will be a newline) and then replace all newlines (append an extra semi-colon for the last line only) with semi-colons.
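For comparison, the same transformation can be sketched in Python: split the text on runs of blank lines, remove the newlines inside each entry, and append the delimiter. This assumes entries are separated by one or more blank lines and that the whole file fits in memory; the function name join_entries is made up for illustration.

```python
import re

def join_entries(text, delimiter=";"):
    """Collapse each blank-line-separated entry to one line, ending in `delimiter`."""
    # A newline followed by one or more (possibly whitespace-only) empty lines
    entries = re.split(r"\n(?:[ \t]*\n)+", text.strip())
    return "".join(entry.replace("\n", "") + delimiter for entry in entries)

sample = "Name:Sid\nID:123\n\n\nName:Jai\nID:234\n\n\nName:Arun\nID:12\n"
print(join_entries(sample))  # Name:SidID:123;Name:JaiID:234;Name:ArunID:12;
```

Because the split happens on blank lines rather than on a fixed line count, entries of 132 lines are handled the same way as the two-line sample.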

Powershell parse parts of a text file and save to CSV

All, I'm very new to powershell and am hoping someone can get me going on what I think would be a simple script.
I need to parse a text file, capture certain lines from it, and save those lines as a csv file.
For example, each alert is in its own text file. Each file is similar to this:
--start of file ---
Name John Smith
Dept Accounting
Codes bas-2349,cav-3928,deg-3942
iye-2830,tel-3890
Urls hxxp://blah.com
hxxp://foo.com, hxxp://foo2.com
Some text I dont care about
More text i dont care about
Comments
---------
"here is a multi line
comment I need
to capture"
Some text I dont care about
More text i dont care about
Date 3/12/2013
---END of file---
For each text file if I wanted to write only Name, Codes, and Urls to a CSV file. Could someone help me get going on this?
I'm more a PERL guy so I know I could write a regex for capturing a single line beginning with Name. However I am completely lost on how I could read the "Codes" line when it might be one line or it might be X lines long until I run into the Urls field.
Any help would be greatly appreciated!

Text parsing usually means regex. With regex, sometimes you need anchors to know when to stop a match and that can make you care about text you otherwise wouldn't. If you can specify that first line of "Some text I don't care about" you can use that to "anchor" your match of the URLs so you know when to stop matching.
$regex = @'
(?ms)Name (.+)?
Dept .+?
Codes (.+)?
Urls (.+)?
Some text I dont care about.+
Comments
---------
(.+)?
Some text I dont care about
'@
$file = 'c:\somedir\somefile.txt'
if ([IO.File]::ReadAllText($file) -match $regex)
{
$Name = $matches[1]
$Codes = $matches[2] -replace '\s+',','
$Urls = $matches[3] -replace '\s+',','
$comment = $matches[4] -replace '\s+',' '
}
$Name
$Codes
$Urls
$comment

If the file is not too big to be processed in memory, the simple way is to read it as an array of strings. (What "too big" means depends on your system. Anything sub-gigabyte should work without much of a hiccup.)
After you've read the file, set up head and tail counters pointing to element zero. Move the tail counter forward row by row until you find the Date row. You can match data with regexps. Now you know the start and end of a single record. For the next record, set the head counter to tail+1, tail to tail+2, and start scanning rows again. Lather, rinse, repeat until the end of the array is reached.
When a record is matched, you can extract the name with a regex. Codes and Urls are a bit trickier. Match the Codes row with a regex. Extract it and all the following rows for as long as they match the code pattern. The same goes for the Urls data. If continuation rows for Urls and Codes are always padded with leading whitespace, you could match that leading whitespace with a regexp to pick up those rows too.
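The continuation-row idea can be sketched as follows, in Python rather than PowerShell. It relies on the assumption mentioned above that continuation rows start with whitespace while field rows start with the field name; the function name parse_record and the wanted parameter are hypothetical, and the sample rows are taken from the question.

```python
def parse_record(lines, wanted=("Name", "Codes", "Urls")):
    """Collect the wanted fields, folding indented continuation rows
    into the field that precedes them."""
    collected = {}
    current = None
    for line in lines:
        if not line.strip():
            continue  # skip empty rows
        if line[0].isspace():
            # Continuation row: belongs to the most recent field, if wanted
            if current in collected:
                collected[current].append(line)
            continue
        # Field row: the first word is the field name
        current, _, rest = line.partition(" ")
        if current in wanted:
            collected[current] = [rest]
    record = {}
    for key, parts in collected.items():
        # Normalise "a,b" and "a, b" spread over several rows to "a,b,..."
        items = [item.strip() for part in parts for item in part.split(",")]
        record[key] = ",".join(item for item in items if item)
    return record

rows = [
    "Name John Smith",
    "Dept Accounting",
    "Codes bas-2349,cav-3928,deg-3942",
    "\tiye-2830,tel-3890",
    "Urls hxxp://blah.com",
    "\thxxp://foo.com, hxxp://foo2.com",
    "Some text I dont care about",
    "Date 3/12/2013",
]
print(parse_record(rows))
```

Each returned dictionary corresponds to one CSV row; in Python the standard csv.DictWriter could write it out, which plays the role Export-Csv has in PowerShell.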

Maybe something like this would do it:
foreach ($Line in gc file.txt) {
    switch -regex ($Line) {
        '^(Name|Dept|Codes|Urls)' {
            $Capture = $true
            break
        }
        '^[A-Za-z0-9_-]+' {
            $Capture = $false
            break
        }
    }
    if ($Capture) {
        $Line
    }
}
If you want the end result as a CSV file then you may use the Export-Csv cmdlet.

Assuming that c:\temp\file.txt contains:
Name John Smith
Dept Accounting
Codes bas-2349,cav-3928,deg-3942
iye-2830,tel-3890
Urls hxxp://blah.com
hxxp://foo.com
hxxp://foo2.com
Some text I dont care about
More text i dont care about
.
.
Date 3/12/2013
You can use regular expressions like this:
$a = Get-Content C:\temp\file.txt
$b = [regex]::match($a, "^.*Codes (.*)Urls (.*)Some.*$", "Multiline")
$codes = $b.groups[1].value -replace '[ ]{2,}',','
$urls = $b.groups[2].value -replace '[ ]{2,}',','

If all files have the same structure you could do something like this:
$srcdir = "C:\Test"
$outfile = "$srcdir\out.csv"
$re = '^Name (.*(?:\r\n .*)*)\r\n' +
'Dept .*(?:\r\n .*)*\r\n' +
'Codes (.*(?:\r\n .*)*)\r\n' +
'Urls (.*(?:\r\n .*)*)' +
'[\s\S]*$'
Get-ChildItem $srcdir -Filter *.txt | % {
    [io.file]::ReadAllText($_.FullName)
} | Select-String $re | % {
    $f = $_.Matches | % { $_.Groups } | ? { $_.Index -gt 0 }
    New-Object -TypeName PSObject -Prop @{
        'Name' = $f[0].Value;
        'Codes' = $f[1].Value;
        'Urls' = $f[2].Value;
    }
} | Export-Csv $outfile -NoTypeInformation

Print text between ( ) sed

This is an extension of my previous question. In that question, I needed to retrieve the text between parentheses where all the text was on a single line. Now I have this case:
(aop)
(abc
d)
This time, the open parenthesis can be on one line and the close parenthesis on another line, so:
(abc
d)
also counts as text between the delimiters '( )' and I need to print it as
abc
d
EDIT:
In response to possible confusions of my question, let me clarify a little. Basically, I need to print text between delimiters which could span multiple lines.
for example I have this text in my file:
randomtext(1234
567) randomtext
randomtext(abc)randomtext
Now I want Sed to pick out text between the delimiter "(" and ")". So the output would be:
1234
567
abc
Notice that the left and right brackets are not on the same line but they still count as a delimiter for 1234 567, so I need to print that part of the text. (note, I only want the text between the first pair of delimiters).
Any help would be appreciated.

Ah! another tricky sed puzzle :)
I believe this code will work for your problem:
sed -n '/(/,/)/{:a; $!N; /)/!{$!ba}; s/.*(\([^)]*\)).*/\1/p}' file
OUTPUT
For the provided input it produced:
1234
567
abc
Explanation:
-n suppresses the regular sed output
/(/,/)/ is for range selection between ( and )
:a is for marking a label a
$!N means append the next line of input into the current pattern space
/)/! means do some actions if ) is not matched in current pattern space
/)/!{$!ba} means go to label a if ) is not matched in current pattern space (and the last line has not been reached)
s/.*(\([^)]*\)).*/\1/ means replace content between ( and ) by just the content thus stripping out parenthesis
\1 is for back reference of group 1 i.e. text between \( and \)
p is for printing the replaced content
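The key point of the substitution is that [^)]* is a negated character class, so it also matches newlines once several lines have been joined in the pattern space. A small Python sketch shows the same behaviour on the sample text from the question; the function name between_parens is made up, and the whole input is assumed to be available as one string.

```python
import re

def between_parens(text):
    """Return the text between each ( and the next ), across line breaks."""
    # [^)]* matches any character except ), including newlines
    return re.findall(r"\(([^)]*)\)", text)

sample = "randomtext(1234\n567) randomtext\nrandomtext(abc)randomtext"
for chunk in between_parens(sample):
    print(chunk)
```

This prints 1234, 567 and abc on separate lines, matching the output shown above.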

This link has the answer. I am paraphrasing to match your need:
sed -n '1h;1!H;${;g;s/.*(\([^)]*\)).*/\1/;p}' < your_input

The answer given didn't work for my case. What worked for me was:
cat file | tr -d '\n'
This puts the whole file on a single line by deleting the line breaks.
I then piped the result into the answer here. (Note: instead of brackets, OPEN and CLOSE are used in that question.)
