Why doesn't grep work on this file?

I'm trying to grep this file. Here is a sample of the file (Note: my problem is obviously not present if you just copy/paste this sample and run grep)
'startTime': 1415066802,
'timeout': 6,
'totalRequests': 9201823,
'write': 0}]}
INFO:root:Running setup module stop (cwd=/home/techempower/FrameworkBenchmarks/frameworks/Java)
benchmark: 3% |# | Rough ETA: 17:27:56
--------------------------------------------------------------------------------
Running Test: activeweb-raw
--------------------------------------------------------------------------------
INFO:root:Running setup module start (cwd=/home/techempower/FrameworkBenchmarks/frameworks/Java)
INFO:root:Called setup.py start
INFO:root:Sleeping 60 seconds to ensure framework is ready
I'd like to extract lines like these:
benchmark: 1% | | Rough ETA: 00:00:01
Here's the output I get when I run grep:
$ cat NhHR | grep Rough
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
It appears that grep is detecting the text, but the lines being returned do not include the detected text (as though it's not printing in my terminal?). Printing context lines doesn't give me any further clues.
Does anyone know how I can get grep to work for this file, or why it's not working currently?

It looks to me as though the matched line contains a carriage return just before the long run of dashes. When the line is printed to the terminal, the carriage return moves the cursor back to the start of the line, so the dashes overwrite the non-dashed part. Try redirecting grep's output to a file and opening that file in an editor; you should see the matched part.
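A minimal sketch of how one might confirm and work around this, assuming the file really does contain carriage returns as suggested above. The sample line is constructed for illustration and is not taken from the real file. `cat -v` renders a carriage return as `^M`, and `tr` turns each carriage return into a newline so the hidden part ends up on a line of its own.

```shell
# Build a one-line sample: progress text, a carriage return, then dashes.
sample=$(mktemp)
printf 'benchmark: 3%% |# | Rough ETA: 17:27:56\r----------\n' > "$sample"

# Make the carriage return visible (it shows up as ^M).
visible=$(grep Rough "$sample" | cat -v)

# Split at carriage returns, then keep only the part we want.
cleaned=$(tr '\r' '\n' < "$sample" | grep Rough)

printf '%s\n' "$visible" "$cleaned"
rm -f "$sample"
```

If the `^M` shows up in the first line of output, the terminal overwrite is the likely explanation and the `tr` step should recover the progress lines.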

Related

Grep finds the word and then gets stuck (bash)

I have a while loop (extracting two variables) in which one command does not work. Even when I run the command directly in the console, substituting the variable by hand, it prints the result but then keeps running without making any progress.
The command is meant to search the first three lines of a big file.gct for an item taken from another file, and print the match together with everything before it on that line.
If anyone knows why it gets stuck and how to fix it, or knows an alternative that works well in loops without using more RAM, I would appreciate it.
head -3 file_2 | grep -E -o ".{0,1000}$variable."
A rough example of what the big file (file_2) looks like:
head -3 file_2
| #1.2 |
| 57000 | 17300 |
|Irrelevant|Irrelevant2| DATA-B12-18 | DATA-Y17-72 | DATA-A12-44 | .... |
When I run in the terminal: head -3 file_2 | grep -E -o ".{0,1000}DATA-B12-18"
the output is:
Irrelevant Irrelevant2 DATA-B12-18 and then it hangs.

grep in pipeline: why it does not work

I want to extract certain information from the output of a program. But my method does not work. I write a rather simple script.
#!/usr/bin/env python
print "first hello world."
print "second"
After making the script executable, I type ./test | grep "first|second". I expect it to show the two sentences. But it does not show anything. Why?
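Escape the | in the expression. In grep's default basic regular expressions an unescaped | is a literal character, so the pattern only matches lines containing the exact text first|second.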
$ ./test | grep "first\|second"
first hello world.
second
Also bear in mind that the shebang is #!/usr/bin/env python, not just #/usr/bin/env python.
Use \| instead of |:
./test | grep "first\|second"
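The difference can be sketched side by side. The `out` function below is a hypothetical stand-in for `./test`, with a third line added so that not everything matches. The `\|` form is assumed to be available as in GNU grep; `grep -E` is an alternative that uses extended regular expressions, where a bare `|` means alternation.

```shell
# Hypothetical stand-in for ./test, plus one line that should not match.
out() { printf 'first hello world.\nsecond\nthird\n'; }

# Basic regex: | is literal, so nothing matches (|| true hides grep's exit status 1).
basic=$(out | grep 'first|second' || true)

# Basic regex with escaped alternation (GNU grep extension).
escaped=$(out | grep 'first\|second')

# Extended regex: a bare | is alternation.
extended=$(out | grep -E 'first|second')

printf 'basic: [%s]\n' "$basic"
printf '%s\n' "$extended"
```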

grep specific pattern from a log file

I am passing all my svn commit log messages to a file and want to grep only the JIRA issue numbers from that.
Some lines might have more than 1 issue number, but I want to grab only the first occurrence.
The pattern is XXXX-999 (number of alpha and numeric char is not constant)
Also, I don't want the entire line to be displayed, just the JIRA number, without duplicates. I use the following command but it didn't work.
Could someone help please?
cat /tmp/jira.txt | grep '^[A-Z]+[-]+[0-9]'
Log file sample
------------------------------------------------------------------------
r62086 | userx | 2015-05-12 11:12:52 -0600 (Tue, 12 May 2015) | 1 line
Changed paths:
M /projects/trunk/gradle.properties
ABC-1000 This is a sample commit message
------------------------------------------------------------------------
r62084 | usery | 2015-05-12 11:12:12 -0600 (Tue, 12 May 2015) | 1 line
Changed paths:
M /projects/training/package.jar
EFG-1001 Test commit
Output expected:
ABC-1000
EFG-1001
First of all, the second + is in the wrong place: it should come after the [0-9] expression. Note also that + only acts as a quantifier in extended regular expressions, which is why the commands below pass -E.
Second, I think all you need to do this is use the -o option to grep (to display only the matching portion of the line), then pipe the grep output through sort -u, like this:
cat /tmp/jira.txt | grep -oE '^[A-Z]+-[0-9]+' | sort -u
Although if it were me, I'd skip the cat step and just give the filename to grep, as so:
grep -oE '^[A-Z]+-[0-9]+' /tmp/jira.txt | sort -u
Six of one, half a dozen of the other, really.
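As a quick check, the command can be run against the log sample from the question. This is only a sketch: the temporary file stands in for /tmp/jira.txt, the third commit entry is invented to show de-duplication and the "first occurrence only" behaviour, and `LC_ALL=C` is an added assumption so that `[A-Z]` matches only uppercase ASCII letters.

```shell
# Temporary stand-in for /tmp/jira.txt, based on the sample in the question.
log=$(mktemp)
cat > "$log" <<'EOF'
------------------------------------------------------------------------
r62086 | userx | 2015-05-12 11:12:52 -0600 (Tue, 12 May 2015) | 1 line
Changed paths:
M /projects/trunk/gradle.properties
ABC-1000 This is a sample commit message
------------------------------------------------------------------------
r62084 | usery | 2015-05-12 11:12:12 -0600 (Tue, 12 May 2015) | 1 line
Changed paths:
M /projects/training/package.jar
EFG-1001 Test commit
------------------------------------------------------------------------
ABC-1000 Invented follow-up that also mentions XYZ-2000
EOF

# -o prints only the match; ^ limits it to an issue key at the start of a line.
issues=$(LC_ALL=C grep -oE '^[A-Z]+-[0-9]+' "$log" | sort -u)
printf '%s\n' "$issues"
rm -f "$log"
```

Because of the `^` anchor, only an issue key at the very start of a line is reported, so a second key later on the same line (XYZ-2000 here) is ignored.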

grep in a textfile all lines containing 'xxx' and the previous line

I want to print all lines in a tomcat catalina.out log containing xxx. A simple thing to accomplish using:
cat catalina.out | grep xxx
However, in the log file the line above each line containing xxx holds the date and time when the item was logged. I would like to see those preceding lines too. How could I accomplish this?
grep -B1
-B[n] lets you see [n] lines before the pattern that you are looking for.
You can also use -A for 'lines after', and -C for 'context' (lines both above and below).
You can also simplify your grep call and remove the pipe with grep xxx -B1 catalina.out.
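A small sketch of `-B1` on a made-up log, assuming each message is preceded by a timestamp line as described in the question. The file contents are invented for illustration and are not real catalina.out output.

```shell
# Invented log: a timestamp line followed by the message it belongs to.
log=$(mktemp)
cat > "$log" <<'EOF'
2015-05-12 11:12:52 request received
INFO handler xxx started
2015-05-12 11:13:00 request received
INFO handler yyy started
2015-05-12 11:14:10 request received
WARN handler xxx failed
EOF

# Print every line containing xxx together with the one line before it.
with_context=$(grep -B1 xxx "$log")
printf '%s\n' "$with_context"
rm -f "$log"
```

When the matches are not adjacent, grep typically prints a `--` line between the groups, which is worth keeping in mind if the output is fed to another tool.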

grep is unable to find all pattern matching "\[\[\[\["

I am having problems with using grep along with a pipe. The scenario is as follows:
I am running a Python script that prints debug messages to the screen. I use ./prog | grep "\[\[\[\[" to catch the strings with "[[[[" in them. It returns some matching lines but not others (another observation: the results found by grep come before the results not found by grep in the file). I have run ./prog without the pipe and grep, and it outputs all the strings with the "[[[[" pattern.
The problem is that the left square bracket is a special character in regular expressions. "grep" is not just a string matcher. Regular expressions are an involved language that lets you describe patterns of text. Grep is trying to interpret [[[[ as a regular expression, not just a string.
As your question subject suggests, you can usually escape special characters with a backslash. So the following might work:
./prog | grep '\[\[\[\['
You can also "escape" square brackets by putting them inside square brackets. Thus, [[][[][[][[] or [[]{4} if your version of grep handles it.
You also need to determine whether your program, ./prog, is sending output to "standard output" or "standard error". You can put all your stderr through the pipe with:
./prog 2>&1 | egrep '[[]{4}'
UPDATE:
[ghoti@pc ~]$ printf '[[[[\n[[[\n[[[[\n[[[[[\n[[\n' | grep '\[\[\[\['
[[[[
[[[[
[[[[[
[ghoti@pc ~]$ printf '[[[[\n[[[\n[[[[\n[[[[[\n[[\n' | egrep '[[]{4}'
[[[[
[[[[
[[[[[
[ghoti@pc ~]$
Obviously, my results do not match yours. If you can provide more details as to the data you're processing, it will be helpful in trying to duplicate your results.
Error messages are usually sent to stderr, not stdout; your pipe is filtering stdout. (Your "another observation" hints at this.) You can redirect stderr along with stdout to the pipe:
./prog 2>&1 | grep '\[\[\[\['
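The effect of the redirection can be sketched with a small shell function. Here `prog` is a hypothetical stand-in for `./prog` that writes one matching line to each stream; the real script may behave differently.

```shell
# Hypothetical stand-in for ./prog: one line on stdout, one on stderr.
prog() {
  echo '[[[[ from stdout'
  echo '[[[[ from stderr' >&2
}

# Without redirection the pipe only carries stdout (stderr is discarded here
# so it does not clutter the terminal).
stdout_only=$(prog 2>/dev/null | grep -c '\[\[\[\[')

# With 2>&1 both streams go through the pipe and grep sees both lines.
both=$(prog 2>&1 | grep -c '\[\[\[\[')

echo "stdout only: $stdout_only, with 2>&1: $both"
```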
