I was wondering if it's possible to use grep to find all of the text that is in between the following two strings:
mutablePath = CGPathCreateMutable();
...
CGPathAddPath(skinMutablePath, NULL, mutablePath);
Basically, the first and last lines will always be the same, and there will be a whole bunch of random stuff in between. I would like to count the number of lines that appear between all instances of the first and last line from above.
Is this even possible?
Here's another awk solution:
awk '/^mutablePath = CGPathCreateMutable\(\);$/ { m=1; c=0 }
/^CGPathAddPath\(skinMutablePath, NULL, mutablePath\);$/ { print c-1; m=0 }
m { c++ }' file
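For comparison, the same start/stop state machine can be sketched in Python. This is a minimal sketch, assuming the marker lines match exactly (as the anchored awk patterns require); the function name count_between is hypothetical. Instead of printing as it goes, it returns one count per block.

```python
START = "mutablePath = CGPathCreateMutable();"
END = "CGPathAddPath(skinMutablePath, NULL, mutablePath);"


def count_between(lines):
    """Return the number of lines strictly between each START/END pair."""
    counts = []
    inside = False
    count = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if line == START:
            # (Re)start counting at every START line, like m=1; c=0 in awk
            inside, count = True, 0
        elif line == END and inside:
            counts.append(count)
            inside = False
        elif inside:
            count += 1
    return counts
```

Called on an open file object, it returns one integer per START/END block.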
You can't do this with grep, but you can do it with awk. This is untested but should work:
awk 'BEGIN { state = 0; count = 0 }
/^CGPathAddPath\(skinMutablePath, NULL, mutablePath\);$/ { print count; state = 0; count = 0 }
state { count++ }
/^mutablePath = CGPathCreateMutable\(\);$/ { state = 1; count = 0 }' FILE_OF_INTEREST
Here's an awk solution if you have access to awk as well as grep (it prints the total number of lines between the markers, summed over all blocks):
awk '/^CGPathAddPath\(skinMutablePath, NULL, mutablePath\);$/ { in_block = 0 }
in_block { count++ }
/^mutablePath = CGPathCreateMutable\(\);$/ { in_block = 1 }
END { print count + 0 }' input
I've got a Python script that runs through some logs, and I figured it would be instructive to benchmark it against some other approaches before deploying it. When looking at awk, I'm hoping to minimize overhead to give it a 'fair' shake at beating the somewhat optimized Python variant.
My log entries look like:
--------
SomeField=SomeValue
OptionallyAppearingField=WhoKnowsWhat
AnotherField=AnotherValue
ExtraStuff=OneBonusKey=1,SecondBonusKey=2,ThirdBonusKey=3,...
--------
And I'm keen to get the value of AnotherField when one of our ThirdBonusKeys exists and has a certain value (actually just the number 1).
The 'stupid' way here is to set our RS to '--------' and then just apply a regex to $0 twice, first to see if ThirdBonusKey=1 is in the record, and then to extract AnotherField=(desired_value).
But that seems like an unfair comparison, given it's just throwing a regex at the problem (twice!). Without a guaranteed ordering of fields to leverage awk's cool FS skills, is there a quicker or more appropriate approach here? It's possible that the answer is just "this is not a job for awk", and that's okay too, I guess.
Cyrus has kindly pointed out that the sketch of code I gave above is not technically code, and he's technically correct, so here's a reasonably stupid implementation:
awk 'BEGIN{RS="--------"} { if ($0 ~ /ThirdBonusKey=1/) { for(i=1;i<=NF;i++) {if ($i ~ "AnotherField=") { print $i }}}}' file
Given input
--------
SomeField=SomeValue
OptionallyAppearingField=WhoKnowsWhat
AnotherField=DesiredValue1
ExtraStuff=OneBonusKey=1,SecondBonusKey=2,ThirdBonusKey=1,...
--------
SomeField=SomeValue
OptionallyAppearingField=WhoKnowsWhat
AnotherField=DesiredValue2
ExtraStuff=OneBonusKey=1,SecondBonusKey=2,ThirdBonusKey=0,...
--------
SomeField=
ExtraStuff=
--------
we'd expect output
AnotherField=DesiredValue1
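For a baseline to benchmark against, a single-pass Python version of this extraction might look like the sketch below. It is an assumption about what such a script could look like (the asker's actual Python script isn't shown): it remembers the most recent AnotherField line and emits it when a line contains ThirdBonusKey=1 as a complete key=value item, so it assumes AnotherField appears before ExtraStuff within each record. The name matching_fields is hypothetical.

```python
import re

# ThirdBonusKey=1 as a whole item: preceded by "=" or ",", followed by "," or end of line
BONUS = re.compile(r"[=,]ThirdBonusKey=1(,|$)")


def matching_fields(lines):
    """Yield the AnotherField=... line of each record whose ThirdBonusKey is 1."""
    val = None
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("AnotherField="):
            val = line
        elif BONUS.search(line) and val is not None:
            yield val
        elif line.startswith("--------"):
            # Forget the previous record's value at each separator
            val = None
```

Because it never builds a record, it does one cheap prefix check and at most one regex search per line.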
I'd expect this to be the most efficient:
$ awk '/^AnotherField=/{val=$0; next} /[=,]ThirdBonusKey=1(,|$)/{print val}' file
AnotherField=DesiredValue1
but this is more robust and easier to enhance to do anything else you want later:
$ cat tst.awk
BEGIN { FS="[,=[:space:]]"; OFS="=" }
/^-+$/ {
if ( f["ExtraStuff_ThirdBonusKey"] == 1 ) {
print "AnotherField", f["AnotherField"]
}
delete f
next
}
{
if ( $1 == "ExtraStuff" ) {
pfx = $1
sub(/[^=]+=/,"")
f[pfx] = $0
pfx = pfx "_"
}
else {
pfx = ""
}
for (i=1; i<NF; i+=2) {
f[pfx $i] = $(i+1)
}
}
$ awk -f tst.awk file
AnotherField=DesiredValue1
That second script first stores all of the values in an array f[] so you can access them by name. Here's what the contents of that array look like:
$ cat tst.awk
BEGIN { FS="[,=[:space:]]"; OFS="=" }
/^-+$/ {
for (i in f) printf "> f[%s]=%s\n", i, f[i]
if ( f["ExtraStuff_ThirdBonusKey"] == 1 ) {
print "AnotherField", f["AnotherField"]
}
print "----"
delete f
next
}
{
if ( $1 == "ExtraStuff" ) {
pfx = $1
sub(/[^=]+=/,"")
f[pfx] = $0
pfx = pfx "_"
}
else {
pfx = ""
}
for (i=1; i<NF; i+=2) {
f[pfx $i] = $(i+1)
}
}
$ awk -f tst.awk file
----
> f[OptionallyAppearingField]=WhoKnowsWhat
> f[AnotherField]=DesiredValue1
> f[ExtraStuff_SecondBonusKey]=2
> f[ExtraStuff_ThirdBonusKey]=1
> f[ExtraStuff_OneBonusKey]=1
> f[SomeField]=SomeValue
> f[ExtraStuff]=OneBonusKey=1,SecondBonusKey=2,ThirdBonusKey=1,...
AnotherField=DesiredValue1
----
> f[OptionallyAppearingField]=WhoKnowsWhat
> f[AnotherField]=DesiredValue2
> f[ExtraStuff_SecondBonusKey]=2
> f[ExtraStuff_ThirdBonusKey]=0
> f[ExtraStuff_OneBonusKey]=1
> f[SomeField]=SomeValue
> f[ExtraStuff]=OneBonusKey=1,SecondBonusKey=2,ThirdBonusKey=0,...
----
> f[SomeField]=
> f[ExtraStuff]=
----
Given that array, you can create whatever conditions and/or print whatever combinations of fields you want, regardless of the field order in the input and in any output order.
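The same store-everything-then-decide idea maps directly onto a Python dict per record. A minimal sketch, assuming records are separated by lines of dashes and every other non-empty line is key=value; ExtraStuff sub-keys are stored with an ExtraStuff_ prefix as in tst.awk. The name parse_records is hypothetical.

```python
import re

SEPARATOR = re.compile(r"^-+$")


def parse_records(lines):
    """Yield one dict per record, mirroring the f[] array in tst.awk."""
    f = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if SEPARATOR.match(line):
            if f:              # the first separator has no record before it
                yield f
            f = {}
            continue
        if not line:
            continue
        key, _, value = line.partition("=")
        f[key] = value
        if key == "ExtraStuff":
            # Store OneBonusKey=1,... as ExtraStuff_OneBonusKey etc.;
            # items without "=" (such as the literal "...") are skipped
            for item in value.split(","):
                sub, sep, subval = item.partition("=")
                if sep:
                    f[key + "_" + sub] = subval
    if f:                      # last record if the input doesn't end with dashes
        yield f
```

To reproduce the awk output, print "AnotherField=" + rec["AnotherField"] for every rec where rec.get("ExtraStuff_ThirdBonusKey") == "1".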
Let me state up front that I have little to no experience with PowerShell so far. A previous system generates the wrong output for me, so I want to use PowerShell to fix it. From the system I get output that looks like this:
TEST1^|^9999^|^Y^|^NOT IN^|^('1','2','3')^|^N^|^LIKE^|^('4','5','6','7')^|^...^|^Y^|^NOT IN^|^('8','9','10','11','12')
TEST2^|^9998^|^Y^|^NOT IN^|^('4','5','6')^|^N^|^LIKE^|^('6','7','8','9')^|^...^|^Y^|^NOT IN^|^('1','2','15','16','17')^|^Y^|^NOT IN^|^('18','19','20','21','22')
As you can see, each line has a starting part (TEST1^|^9999^|^) followed by 1 to n tuples (for example: Y^|^NOT IN^|^('1','2','3')^|^).
This is how I want it to look:
TEST1^|^9999^|^Y^|^NOT IN^|^('1','2','3')
TEST1^|^9999^|^N^|^LIKE^|^('4','5','6','7')
TEST1^|^9999^|^Y^|^NOT IN^|^('8','9','10','11','12')
TEST2^|^9998^|^Y^|^NOT IN^|^('4','5','6')
TEST2^|^9998^|^N^|^LIKE^|^('6','7','8','9')
TEST2^|^9998^|^Y^|^NOT IN^|^('1','2','15','16','17')
TEST2^|^9998^|^Y^|^NOT IN^|^('18','19','20','21','22')
So each tuple should be printed on its own line, with the starting part attached in front.
My idea was to use the PowerShell equivalent of AWK, but so far I don't understand how to deal with an indeterminate number of tuples and repeat the starting block for each one.
Thanks so much in advance for your help!
I'd split the lines at ^|^ and recombine the fields of the resulting array in a loop. Something like this:
$sp = '^|^'
Get-Content 'C:\path\to\input.txt' | % {
$a = $_ -split [regex]::Escape($sp)
for ($i=2; $i -lt $a.length; $i+=3) {
"{0}$sp{1}$sp{2}$sp{3}$sp{4}" -f $a[0,1,$i,($i+1),($i+2)]
}
} | Set-Content 'C:\path\to\output.txt'
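The same split-and-regroup idea in Python, as a minimal sketch: it assumes the header is always the first two fields and every tuple has exactly three fields (so a literal "..." placeholder like the one in the question's sample would throw the grouping off). Python's str.split treats ^|^ literally, so no escaping is needed. The function name explode is hypothetical.

```python
SEP = "^|^"


def explode(line, head_fields=2, group=3):
    """Split one input line into one output line per tuple, repeating the header."""
    # str.split() treats the separator literally, so ^ and | need no escaping
    parts = line.rstrip("\r\n").split(SEP)
    head = parts[:head_fields]
    return [
        SEP.join(head + parts[i:i + group])
        for i in range(head_fields, len(parts), group)
    ]
```

Applying explode to each line of the input file and writing every returned string on its own line gives the desired output.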
The data looks quite regular, so you could loop over it using | as the delimiter and counting the following cells in 3s:
$data = @"
TEST1^|^9999^|^Y^|^NOT IN^|^('1','2','3')^|^N^|^LIKE^|^('4','5','6','7')^|^Y^|^NOT IN^|^('8','9','10','11','12')
TEST2^|^9998^|^Y^|^NOT IN^|^('4','5','6')^|^N^|^LIKE^|^('6','7','8','9')^|^Y^|^NOT IN^|^('1','2','15','16','17')^|^Y^|^NOT IN^|^('18','19','20','21','22')
"@
$data.split("`n") | % {
    $ds = $_.Trim().split("|")
    $heading = "$($ds[0])|$($ds[1])"
    $line = ""
    $j = 0
    for($i = 2; $i -lt $ds.length; $i += 1) {
        # Strip the trailing ^ after the value list but keep the leading one
        $line += "|$($ds[$i])" -replace "\^(\((?:'\d+',?)+\))\^?",'^$1'
        $j += 1
        if($j -eq 3) {
            Write-Output "$heading$line"
            $line = ""
            $j = 0
        }
    }
}
Parsing an arbitrary-length string record into row records is quite error-prone. A simple solution is to process the data row by row and create the output as you go.
Here is a simple illustration of how to process a single row. Processing the whole input file and writing the output is left as a trivial exercise for the reader.
$s = "TEST1^|^9999^|^Y^|^NOT IN^|^('1','2','3')^|^N^|^LIKE^|^('4','5','6','7')^|^Y^|^NOT IN^|^('8','9','10','11','12')"
$t = $s.Split([char[]]')', [StringSplitOptions]::RemoveEmptyEntries) # Split on ')' only
$testNum = ([regex]::match($t[0], "(?i)(test\d+\^\|\^\d+)")).value # Hunt for the 1st column values
$t[0] = $t[0] + ')' # Restore the ')' removed by the split
for($i=1;$i -lt $t.Length; ++$i) { $t[$i] = $testNum + $t[$i] + ')' } # Prepend the 1st column values and restore the ')'
$t
TEST1^|^9999^|^Y^|^NOT IN^|^('1','2','3')
TEST1^|^9999^|^N^|^LIKE^|^('4','5','6','7')
TEST1^|^9999^|^Y^|^NOT IN^|^('8','9','10','11','12')
I am parsing text output from a disk array that lists information about LUN snapshots in a predictable format. After trying every other way to get this data out of the array in a usable manner, the only thing I can do is generate this text file and parse it. The output looks like this:
SnapView logical unit name: deleted_for_security_reasons
SnapView logical unit ID: 60:06:01:60:52:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX
Target Logical Unit: 291
State: Inactive
This repeats all through the file with one line break between each group. I want to identify a group, parse each of the four lines, create a new PSObject, add the value for each line as a new NoteProperty, and then add the new object to a collection.
What I can't figure out is how, once I identify the first line in the block of four lines, to then process the text from lines two, three, and four. I'm looping through each line, finding the start of a block, and then processing it. Here's what I have so far, with comments where the magic goes:
$snaps = get-content C:\powershell\snaplist.txt
$snapObjects = @()
foreach ($line in $snaps)
{
if ([regex]::ismatch($line,"SnapView logical unit name"))
{
$snapObject = new-object system.Management.Automation.PSObject
$snapObject | add-member -membertype noteproperty -name "SnapName" -value $line.replace("SnapView logical unit name: ","")
#Go to the next line and add the UID
#Go to the next line and add the TLU
#Go to the next line and add the State
$snapObjects += $snapObject
}
}
I have scoured Google and Stack Overflow trying to figure out how to reference the line number of the object I'm iterating through, and I can't figure it out. Maybe I rely on foreach loops too much and that's affecting my thinking; I don't know.
As you say, I think you're thinking too much foreach when you should be thinking for. The below modification should be more along the lines of what you're looking for:
$snaps = get-content C:\powershell\snaplist.txt
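$snapObjects = @()
for ($i = 0; $i -lt $snaps.length; $i++)
{
if ([regex]::ismatch($snaps[$i],"SnapView logical unit name"))
{
$snapObject = new-object system.Management.Automation.PSObject
$snapObject | add-member -membertype noteproperty -name "SnapName" -value ($snaps[$i]).replace("SnapView logical unit name: ","")
# $snaps[$i+1] is the next line: add the UID
$snapObject | add-member -membertype noteproperty -name "UID" -value ($snaps[$i+1]).replace("SnapView logical unit ID: ","")
# $snaps[$i+2]: add the TLU
$snapObject | add-member -membertype noteproperty -name "TLU" -value ($snaps[$i+2]).replace("Target Logical Unit: ","")
# $snaps[$i+3]: add the State
$snapObject | add-member -membertype noteproperty -name "State" -value ($snaps[$i+3]).replace("State: ","")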
$snapObjects += $snapObject
}
}
A while loop may be even cleaner because then you can increment $i by 4 instead of 1 when you hit this case, but since the other 3 lines won't trigger the "if" statement... there's no danger, just a few wasted cycles.
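The while-loop variant described above, sketched in Python: when a block's first line is found, the next three lines are read by index and the index jumps ahead by four. It assumes each block is complete and that the value on each line follows the first ": " (the UID's own colons have no space after them, so they survive). parse_snaps is a hypothetical name.

```python
PREFIX = "SnapView logical unit name:"


def parse_snaps(lines):
    """Return one dict per four-line snapshot block."""
    lines = [l.rstrip("\r\n") for l in lines]
    snaps = []
    i = 0
    while i < len(lines):
        if lines[i].startswith(PREFIX) and i + 3 < len(lines):
            # The value is everything after the first ": " on each of the four lines
            name, uid, tlu, state = (l.split(": ", 1)[1].strip() for l in lines[i:i + 4])
            snaps.append({"SnapName": name, "UID": uid, "TLU": tlu, "State": state})
            i += 4  # skip the rest of this block
        else:
            i += 1
    return snaps
```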
Another possibility
function Get-Data {
$foreach.MoveNext() | Out-Null
$null, $returnValue = $foreach.Current.Split(":")
$returnValue
}
foreach($line in (Get-Content "C:\test.dat")) {
if($line -match "SnapView logical unit name") {
$null, $Name = $line.Split(":")
$ID = Get-Data
$Unit = Get-Data
$State = Get-Data
New-Object PSObject -Property @{
Name = $Name.Trim()
ID = ($ID -join ":").Trim()
Unit = $Unit.Trim()
State = $State.Trim()
}
}
}
Name ID Unit State
---- -- ---- -----
deleted_for_security_reasons 60:06:01:60:52:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX 291 Inactive
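The Get-Data trick works because $foreach holds the enumerator driving the loop, so advancing it inside the body consumes the following lines. Calling next() on the loop's own iterator behaves the same way in Python. A minimal sketch, assuming every block is complete (a truncated last block would make next() raise an error); the function names are hypothetical.

```python
def value_of(line):
    """Return the text after the first ':' (later colons, as in the UID, are kept)."""
    return line.split(":", 1)[1].strip()


def parse_with_iterator(lines):
    it = iter(lines)
    for line in it:
        if "SnapView logical unit name" in line:
            # next(it) advances the same iterator the for loop uses,
            # much like $foreach.MoveNext() in the PowerShell answer
            ident = value_of(next(it))
            unit = value_of(next(it))
            state = value_of(next(it))
            yield {"Name": value_of(line), "ID": ident, "Unit": unit, "State": state}
```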
switch -regex -file C:\powershell\snaplist.txt {
'^.+me:\s+(\S*)' {$SnapName = $Matches[1]}
'^.+ID:\s+(\S*)' {$UID = $Matches[1]}
'^.+it:\s+(\S*)' {$TLU = $Matches[1]}
'^.+te:\s+(\S*)' {
New-Object PSObject -Property @{
SnapName = $SnapName
UID = $UID
TLU = $TLU
State = $Matches[1]
}
}
}
Try this:
Get-Content "c:\temp\test.txt" | ConvertFrom-String -Delimiter ": " -PropertyNames Intitule, Value
If you have multiple groups, try this:
$template=@"
{Data:SnapView logical unit name: {UnitName:reasons}
SnapView logical unit ID: {UnitId:12:3456:Zz}
Target Logical Unit: {Target:123456789}
State: {State:A State}}
"@
Get-Content "c:\temp\test.txt" | ConvertFrom-String -TemplateContent $template | % {
[pscustomobject]@{
UnitName=$_.Data.UnitName
UnitId=$_.Data.UnitId
Target=$_.Data.Target
State=$_.Data.State
}
}
I am trying to extract the 4th column from a CSV file (comma-separated, skipping the first 2 header lines) using this command:
awk -F',' 'NR <= 2 {next} {print $4}' filename.csv | more
However, it doesn't work because the first column contains a comma, so the 4th field is not really the 4th column. Below is an example of a row:
"sdfsdfsd, sfsdf", 454,fgdfg, I_want_this_column,sdfgdg,34546, 456465, etc
Unless you have specific reasons for using awk, I would recommend using a CSV parsing library. Many scripting languages have one built-in (or at least available) and they'll save you from these headaches.
If your first column is always quoted:
$ awk 'BEGIN{ FS="\042[ ]*," } { m=split($2,a,","); print a[3] } ' file
I_want_this_column
If the column you want is always the second to last:
$ awk -F"," '{print $(NF-1)}' file
I_want_this_column
You can try this demo script to break down the columns:
awk 'BEGIN{ FS="," }
{
    for(i=1;i<=NF;i++){
        # inside a quoted field: keep appending pieces until the closing quote
        if(f==1){
            s=s","$i
            if($i ~ /\042[ ]*$/){
                a[++j]=s
                s="";f=0
            }
        }
        # opening quote without a closing quote: start collecting
        else if($i ~ /^[ ]*\042/ && $i !~ /^[ ]*\042.*\042[ ]*$/){
            s=$i
            f=1
        }
        # normal (or fully quoted) field
        else{
            a[++j]=$i
        }
    }
}
END{
    # print columns
    for(p=1;p<=j;p++){
        print "Field "p,": "a[p]
    }
} ' file
output
$ cat file
"sdfsdfsd, sfsdf", "454,fgdfg blah , words ", I_want_this_column,sdfgdg
$ ./shell.sh
Field 1 : "sdfsdfsd, sfsdf"
Field 2 :  "454,fgdfg blah , words "
Field 3 :  I_want_this_column
Field 4 : sdfgdg
You shouldn't use awk here. Use Python's csv module, Perl's Text::CSV or Text::CSV_XS module, or another real CSV parser.
Related question -
parse csv file using gawk
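A minimal sketch of the csv-module approach in Python, assuming the file uses standard double-quote quoting. skipinitialspace=True makes the reader ignore the blank after each comma, so a quoted field preceded by a space is still recognized as quoted. The header count of 2 comes from the question; fourth_column is a hypothetical name.

```python
import csv
import itertools


def fourth_column(lines, header_rows=2):
    """Yield the 4th field of every row after the header rows."""
    reader = csv.reader(lines, skipinitialspace=True)
    # islice skips the header rows without reading the whole file into memory
    for row in itertools.islice(reader, header_rows, None):
        if len(row) >= 4:
            yield row[3]
```

With a real file, open it with newline="" (as the csv documentation recommends) and pass the file object to fourth_column.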
If you can't avoid awk, this piece of code does the job you need:
BEGIN {FS=",";}
{
f=0;
j=0;
for (i = 1; i <=NF ; ++i) {
if (f) {
a[j] = a[j] "," $(i);
if ($(i) ~ "\"$") {
f = 0;
}
}
else {
++j;
a[j] = $(i);
if ((a[j] ~ "^\"[^\"]*$")) {
f = 1;
}
}
}
for (i = 1; i <= j; ++i) {
gsub("^\"","",a[i]);
gsub("\"$","",a[i]);
gsub("\"\"","\"",a[i]);
print "i = \"" a[i] "\"";
}
}
Working with CSV files that have quoted fields with commas inside can be difficult with the standard UNIX text tools.
I wrote a program called csvquote to make the data easy for them to handle. In your case, you could use it like this:
csvquote filename.csv | awk -F, 'NR <= 2 {next} {print $4}' | csvquote -u | more
or you could use cut and tail like this:
csvquote filename.csv | tail -n +3 | cut -d, -f4 | csvquote -u | more
The code and docs are here: https://github.com/dbro/csvquote
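The idea behind this kind of tool can be sketched in a few lines of Python: while inside double quotes, replace each comma with a character that never appears in the data, run the ordinary tools, then turn that character back into a comma. This is an assumption about the approach, not csvquote's actual code: it uses the ASCII unit separator (\x1f) as the stand-in and only handles commas, not newlines embedded in quoted fields.

```python
SUB = "\x1f"  # assumed stand-in character; must not occur in the data


def sanitize(text, delim=",", quote='"'):
    """Replace delimiters inside quoted fields with SUB."""
    out = []
    in_quotes = False
    for ch in text:
        if ch == quote:
            # An escaped "" toggles twice, so the quoting state is preserved
            in_quotes = not in_quotes
        elif ch == delim and in_quotes:
            ch = SUB
        out.append(ch)
    return "".join(out)


def restore(text, delim=","):
    """Undo sanitize(), like csvquote -u at the end of the pipeline."""
    return text.replace(SUB, delim)
```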
Using grep, Vim's grep, or another Unix shell command, I'd like to find the functions in a large .cpp file that contain a specific word in their body.
In the files I'm working with, the word I'm looking for is on an indented line, and the corresponding function header is the first line above that indented line that starts at position 0 and is not a '{'.
For example, searching for JOHN_DOE in the following code snippet:
int foo ( int arg1 )
{
/// code
}
void bar ( std::string arg2 )
{
/// code
aFunctionCall( JOHN_DOE );
/// more code
}
should give me
void bar ( std::string arg2 )
The algorithm that I hope to catch in grep/vim/unix shell scripts would probably best use the indentation and formatting assumptions, rather than attempting to parse C/C++.
Thanks for your suggestions.
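The indentation heuristic described above can be sketched as a short Python script: scan forward, remember the most recent line that starts in column 0 and isn't a brace, and report it when an indented line contains the word. Remembering the last header while scanning forward is equivalent to walking upward from each match. This is a sketch under those formatting assumptions only; it also skips closing } lines and will treat column-0 lines such as #include or comments as headers. functions_containing is a hypothetical name.

```python
import re

# Column-0 line that is not whitespace, "{" or "}"
HEADER = re.compile(r"^[^\s{}]")


def functions_containing(lines, word):
    """Return the header line of each function whose indented body mentions word."""
    found = []
    header = None
    for line in lines:
        line = line.rstrip("\r\n")
        if HEADER.match(line):
            header = line
        elif word in line and line[:1].isspace() and header is not None:
            if not found or found[-1] != header:
                found.append(header)
    return found
```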
I'll probably get voted down for this!
I am an avid (G)VIM user but when I want to review or understand some code I use Source Insight. I almost never use it as an actual editor though.
It does exactly what you want in this case, e.g. show all the functions/methods that use some highlighted data type/define/constant/etc... in a relations window...
Ouch! There goes my rep.
As far as I know, this can't be done. Here's why:
First, you have to search across lines. No problem: in Vim, prefixing a character class with \_ (as in \_.) makes it include newlines, so {\_.*} would match everything between those braces across multiple lines.
So now you need to match the pattern for a function header (brittle even if you get it to work), then, and here's the problem, whatever lines lie between it and your search string, and finally your search string itself. So you might have a regex like
/^\(void \+\a\+ *(.*)\)\_.*JOHN_DOE
But what happens is that the first time Vim finds a function header, it starts matching, and then it matches every character until it finds JOHN_DOE, which includes all the function headers in between.
So the problem is that, as far as I know, there's no way to tell vim to match every character except for this regex pattern. And even if there was, a regex is not the tool for this job. It's like opening a beer with a hammer. What we should do is write a simple script that gives you this info, and I have.
fun! FindMyFunction(searchPattern, funcPattern)
  " Jump to the next match; give up if the pattern isn't in the buffer
  if search(a:searchPattern) == 0
    echo "Pattern not found"
    return
  endif
  let lineNumber = line(".") - 1
  let lineString = getline(lineNumber)
  " Walk upwards until a line matches the function-header pattern
  while lineString !~ a:funcPattern
    let lineNumber = lineNumber - 1
    if lineNumber < 1
      echo "Function not found :/"
      return
    endif
    let lineString = getline(lineNumber)
  endwhile
  echo lineString
endfunction
That should give you the result you want and it's way easier to share, debug, and repurpose than a regular expression spit from the mouth of Cthulhu himself.
Tough call, although as a starting point I would suggest this wonderful VIM Regex Tutorial.
You cannot do that reliably with a regular expression, because code is not a regular language. You need a real parser for the language in question.
Arggh! I admit this is a bit over the top:
A little program to filter stdin, strip comments, and put function bodies on the same line. It'll get fooled by namespaces and function definitions inside class declarations, among other things. But it might be a good start:
#include <stdio.h>
#include <assert.h>
int main() {
enum {
NORMAL,
LINE_COMMENT,
MULTI_COMMENT,
IN_STRING,
} state = NORMAL;
unsigned depth = 0;
for(char c=getchar(),prev=0; !feof(stdin); prev=c,c=getchar()) {
switch(state) {
case NORMAL:
if('/'==c && '/'==prev)
state = LINE_COMMENT;
else if('*'==c && '/'==prev)
state = MULTI_COMMENT;
else if('#'==c)
state = LINE_COMMENT;
else if('\"'==c) {
state = IN_STRING;
putchar(c);
} else {
if(('}'==c && !--depth) || (';'==c && !depth)) {
putchar(c);
putchar('\n');
} else {
if('{'==c)
depth++;
else if('/'==prev && NORMAL==state)
putchar(prev);
else if('\t'==c)
c = ' ';
if(' '==c && ' '!=prev)
putchar(c);
else if(' '<c && '/'!=c)
putchar(c);
}
}
break;
case LINE_COMMENT:
if('\n'==c)
state = NORMAL;
break;
case MULTI_COMMENT:
if('/'==c && '*'==prev) {
c = '\0';
state = NORMAL;
}
break;
case IN_STRING:
if('\"'==c && '\\'!=prev)
state = NORMAL;
putchar(c);
break;
default:
assert(!"bug");
}
}
putchar('\n');
return 0;
}
It's C++ (it also compiles as C99), so just put it in a file, compile it to an executable named 'stripper', and then:
cat my_source.cpp | ./stripper | grep JOHN_DOE
So consider the input:
int foo ( int arg1 )
{
/// code
}
void bar ( std::string arg2 )
{
/// code
aFunctionCall( JOHN_DOE );
/// more code
}
The output of "cat example.cpp | ./stripper" is:
int foo ( int arg1 ) { }
void bar ( std::string arg2 ){ aFunctionCall( JOHN_DOE ); }
The output of "cat example.cpp | ./stripper | grep JOHN_DOE" is:
void bar ( std::string arg2 ){ aFunctionCall( JOHN_DOE ); }
The job of finding the function name (I'd guess it's the last identifier preceding the first "(") is left as an exercise to the reader.
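Following that guess, the name could be pulled out of each collapsed line with a regular expression. A minimal Python sketch, assuming the name is the (possibly ::-qualified) identifier directly before the first "(" that has one; operator overloads, templates, and macros are not handled, and function_name is a hypothetical name.

```python
import re

# An identifier, optionally qualified with ::, followed by optional spaces and "("
NAME = re.compile(r"([A-Za-z_]\w*(?:::[A-Za-z_]\w*)*)\s*\(")


def function_name(collapsed_line):
    """Return the guessed function name, or None if no identifier precedes a "("."""
    m = NAME.search(collapsed_line)
    return m.group(1) if m else None
```

Piping the stripper output through grep JOHN_DOE and then applying function_name to each resulting line would give just the names.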
For that kind of thing, although it comes down to primitive searching again, I would recommend the compview plugin. It opens a search window, so you can see the entire line where each match occurred and jump to it automatically. It gives a nice overview.
Like Robert said, a regex will help. In command mode, start a regex search by typing the "/" character followed by your regex.
Ctags may also be of use to you. It can generate a tag file for a project. This tag file lets you jump directly from a function call to its definition, even if it's in another file, using "CTRL+]".
You can use grep -r -n -H JOHN_DOE *; it will look for "JOHN_DOE" in the files recursively, starting from the current directory.
You can use the following code to find the function that contains the text expression:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public void findFunction(File file, String expression) {
    String lineWithNameOfFunction = "";
    boolean matchFound = false;
    // try-with-resources closes the reader automatically
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String match;
        // readLine() returns null at end of file
        while ((match = br.readLine()) != null) {
            if (match.endsWith(") {") || match.endsWith("){") || match.endsWith(")")) {
                // this here is because I guessed that a method header starts at column 0
                if (match.charAt(0) != ' ' && !match.startsWith("\t")) {
                    lineWithNameOfFunction = match;
                }
            }
            if (match.contains(expression)) {
                matchFound = true;
                break;
            }
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
    if (matchFound)
        System.out.println(lineWithNameOfFunction);
    else
        System.out.println("No matching function found");
}
I wrote this in Java, tested it, and it works like a charm. It has a few drawbacks, but for starters it's fine: I didn't add support for multiple functions containing the same expression, and maybe some other things. Try it.