Grep for lines that start with " followed by 10 digits ","" - grep

e.g.
Find line "1437421130",""
but not "1437421130","92729392"
Can't figure out how to handle the double quotes. (I am an idiot when it comes to grep.)
I tried
echo "1437421130","" | grep '"\d{10}",""'
echo "1437421130","" | grep '"[0-9]{10}",""'

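This worked on CentOS: put the pattern in single quotes so the double quotes need no escaping, and add -E so that {10} is treated as a repetition count. Use [0-9] rather than \d, which grep -E does not treat as a digit class.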
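A minimal sketch of that advice, assuming GNU or any POSIX grep. Note that the unquoted echo in the question never sends the double quotes to grep, because the shell removes them first; single-quoting the test data keeps them. The variable names are only for illustration.

```shell
# Single quotes keep the double quotes as literal data.
line1='"1437421130",""'
line2='"1437421130","92729392"'

# -E enables the {10} interval; ^ anchors the match at the start of the line.
matches=$(printf '%s\n' "$line1" "$line2" | grep -E '^"[0-9]{10}",""')

printf '%s\n' "$matches"   # prints "1437421130","" (quotes included)
```

Only the first line is printed, since the second has a digit where the closing "" is expected.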

Related

how to avoid lookbehind assertion is not fixed length

I have a file that contains a version number that I need to output. This version number is part of a string in this file that looks something like this:
https://some-link:1234/path/to/file/name-of-file/1.2.345/name-of-file_CXP123456-1.2.345.jar"
I need to get the version number, which is 1.2.345.
This grep command works: grep -Po '(?<=/name-of-file_CXP123456-/)\d.\d.\d\d\d'. However, the CXP number changes and as such I thought I could do something like this: grep -Po '(?<=/name-of-file_*-/)\d.\d.\d\d\d' but that gives the following:
grep: lookbehind assertion is not fixed length
Is there anything I can add to the grep statement to avoid this?
Ultimately, this is part of a stage in Jenkins to get this version number. The sh command looks something like this:
VERSION = sh 'ssh -tt user@ip-address "cat dir/file*.content | grep -Po '(?<=/name-of-file_*-/)\d.\d.\d\d\d' 1>&2"'
You can use
grep -Po '/name-of-file_.*-\K\d+(?:\.\d+)+'
See the regex demo. Details:
/name-of-file_ - a literal text
.* - zero or more characters other than line break characters, as many as possible
- - a hyphen
\K - a match reset operator that omits all text matched so far from the memory buffer
\d+ - one or more digits
(?:\.\d+)+ - one or more sequences of a . and one or more digits.
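The pattern described above can be tried against the sample string from the question. This is a sketch that assumes GNU grep built with PCRE support (the -P option); the url and version variable names are made up for the example.

```shell
url='https://some-link:1234/path/to/file/name-of-file/1.2.345/name-of-file_CXP123456-1.2.345.jar'

# -o prints only the match; \K discards everything matched before it,
# so just the dotted number after the last hyphen is reported.
version=$(printf '%s\n' "$url" | grep -Po '/name-of-file_.*-\K\d+(?:\.\d+)+')

printf '%s\n' "$version"   # prints 1.2.345
```

Because \K is not a look-behind, the text before it can have any length, which is why the changing CXP number no longer matters.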
You don't need lookbehind for this job. You also don't need PCREs, or grep at all.
#!/usr/bin/env bash
# ^^^^- bash, *not* sh
case $BASH_VERSION in '') echo "ERROR: bash required" >&2; exit 1;; esac
string="https://some-link:1234/path/to/file/name-of-file/1.2.345/name-of-file_CXP123456-1.2.345.jar"
regex='.*/name-of-file_CXP[[:digit:]]+-([[:digit:].]+)[.]jar'
if [[ $string =~ $regex ]]; then
  echo "Version is ${BASH_REMATCH[1]}"
else
  echo "No version found in $string"
fi
Maybe too long for a comment... It looks like the version number is the second-to-last field if you split on the forward slash?
rev | cut -d/ -f 2 | rev
awk -F/ '{print $(NF-1)}'
perl -lanF/ -e 'print $F[-2]'
Or even something like: basename "$(dirname "$(cat filename)")"
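As a quick check of the field-splitting idea, here is a sketch that runs two of the variants on the sample URL from the question. It assumes the version directory is always the second-to-last path component; the variable names are illustrative only.

```shell
url='https://some-link:1234/path/to/file/name-of-file/1.2.345/name-of-file_CXP123456-1.2.345.jar'

# Split on "/" and print the second-to-last field.
by_awk=$(printf '%s\n' "$url" | awk -F/ '{print $(NF-1)}')

# dirname drops the file name, basename keeps the last remaining component.
by_dirname=$(basename "$(dirname "$url")")

printf '%s\n%s\n' "$by_awk" "$by_dirname"   # prints 1.2.345 twice
```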
For those who are really desperate, there is another solution, which requires you to pre-build your regex string.
It's not a solution I would recommend, but if there is really no other way, no one can stop you.
Even with this you won't get truly dynamic look-behinds, and it is still quite limited, but it is an option available to you.
The idea is to build one look-behind for each possible length you need.
So, for example, match only if the text is not preceded by a # (look-behinds covering 0 to 100 characters):
reg='';
for ((i = 0 ; i <= 100 ; i++)); do reg+='(?<!#.{'"${i}"'})'; done;
reg+='someVariableName=.*?($|;|\\n)';
grep --perl-regexp "$reg" /usr/local/mgmsbox/msc/scripts/msc.cfg
This might not be the best example, but it gets the idea across.
This solution has its own pitfalls. For example, escape sequences such as \n need a doubled backslash (\\n), and any characters that the shell should not interpret should be put in a single-quoted string (or use printf).
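To see the pre-built look-behinds at work, here is a smaller sketch under stated assumptions: GNU grep with -P, a range of 0 to 20 characters instead of 100, a POSIX while loop instead of the bash for loop, and a made-up three-line config in place of the real file.

```shell
# Build (?<!#.{0})(?<!#.{1})...(?<!#.{20}), one fixed-length look-behind per distance.
reg=''
i=0
while [ "$i" -le 20 ]; do
  reg="${reg}(?<!#.{${i}})"
  i=$((i + 1))
done
reg="${reg}someVariableName="

# Hypothetical config: one active setting and two commented-out ones.
cfg='someVariableName=1
# someVariableName=2
#x someVariableName=3'

active=$(printf '%s\n' "$cfg" | grep -P "$reg")
printf '%s\n' "$active"   # prints someVariableName=1
```

Each look-behind has a fixed length, so PCRE accepts all of them, and together they reject any match that has a # within the covered distance.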

Regex for line containing one or more spaces or dashes

I have a .txt file with city names, one per line. Some of them consist of several words separated by one or more spaces, or of words joined with '-'. I need a bash command that will echo those lines. Currently I'm using cat piped into grep, but I can't get both spaces and dashes into one search, and I had problems checking for multiple spaces.
print lines with dash:
cat file.txt | grep ".*-.*"
print lines with spaces:
cat file.txt | grep ".*\s.*"
though when I try:
cat file.txt | grep ".*\s+.*"
I get nothing.
Thanks for help
Something like that should work:
grep -E -- ' |-' file.txt
Explanation:
-E: to interpret patterns as extended regular expressions
--: to signify the end of command options
' |-': the line contains either a space or a dash (the dash needs no backslash here)
This does not directly address your question, but is too much to put in a comment.
You don't need the .* in your patterns. .* at the beginning or end of a pattern is useless, because it means "0 or more of any character" and so will always match.
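These commands are all equivalent:
cat file.txt | grep ".*-.*"
cat file.txt | grep -- "-.*"
cat file.txt | grep -- "-"
(The -- is needed once the pattern starts with a dash; otherwise grep may try to parse it as an option.)
Plus you don't need to cat and pipe:
grep -- "-" file.txt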
When a grep pattern matches, the default action is to print the whole line, so the .* in all your patterns is redundant and can be deleted. Also, you don't have to use cat file |, since you can give the file to grep directly after the pattern, i.e. grep 'pattern' file.txt.
Here are some more details:
grep ".*-.*" = grep -- "-" - returns any lines containing a - char (-- signals the end of options; the next argument is the pattern)
grep ".*\s.*" = grep "\s" - matches and returns lines containing a whitespace char (GNU grep only)
grep ".*\s+.*" = grep "\s+" - returns lines containing a whitespace char followed by a literal + char (since this is a POSIX BRE pattern, the unescaped + matches a literal plus symbol).
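The BRE versus ERE treatment of + can be checked with a small sketch. It assumes GNU grep and uses a made-up city list; [[:space:]] stands in for \s so that the pattern is portable.

```shell
cities='Warsaw
New York
Los  Angeles
Bielsko-Biala'

# BRE (no -E): the + is a literal plus sign, so no line matches.
bre_count=$(printf '%s\n' "$cities" | grep -c '[[:space:]]+' || true)

# ERE (-E): the + means "one or more", so both lines with spaces match.
ere_count=$(printf '%s\n' "$cities" | grep -Ec '[[:space:]]+' || true)

printf 'BRE: %s, ERE: %s\n' "$bre_count" "$ere_count"   # prints BRE: 0, ERE: 2
```

The || true only keeps the script going when grep exits with status 1 for "no match".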
You want
grep "[[:space:]-]" file.txt
See the online demo:
#!/bin/bash
s='abc - def
ghi
jkl mno'
grep '[[:space:]-]' <<< "$s"
Output:
abc - def
jkl mno
The [[:space:]-] POSIX BRE and ERE (enabled with -E option) compliant pattern matches either any whitespace (with the [:space:] POSIX character class) or a hyphen.
Note that [\s-] won't work since \s inside a bracket expression is not treated as a regex escape sequence but as a mere \ or s.

Cutting a length of specific string with grep

Let's say we have a string "test123" in a text file.
How do we cut out only "test12"? Or, say there is other garbage after "test123", such as test123x19853, and we want to cut out "test123x"?
I tried grep -a "test123.\{1,4\}" testasd.txt and so on, but just can't get it right.
I also looked for examples, but never found what I'm looking for.
expr:
kent$ x="test123x19853"
kent$ echo $(expr "$x" : '\(test.\{1,4\}\)')
test123x
What you need is -o, which prints only the matched part:
$ echo "test123x19853"|grep -o "test.\{1,4\}"
test123x
$ echo "test123x19853"|grep -oP "test.{1,4}"
test123x
-o, --only-matching show only the part of a line matching PATTERN
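The same -o option also covers the first half of the question, cutting out only "test12". A sketch, assuming the wanted piece is "test" followed by exactly two digits (that reading of the question is an assumption):

```shell
# BRE interval \{2\}: "test" plus exactly two digits, printed alone thanks to -o.
short=$(printf '%s\n' 'test123x19853' | grep -o 'test[0-9]\{2\}')

# "test" plus up to four more characters of any kind.
long=$(printf '%s\n' 'test123x19853' | grep -o 'test.\{1,4\}')

printf '%s\n%s\n' "$short" "$long"   # prints test12, then test123x
```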
If you are OK with awk, then try the following (note that this looks for a run of letters followed by a run of digits; it doesn't limit the count to 4 or 5).
echo "test123x19853" | awk 'match($0,/[a-zA-Z]+[0-9]+/){print substr($0,RSTART,RLENGTH)}'
If you want only 1 to 4 digits after the first run of letters, then try the following (my awk is an old version, so I'm using --re-interval; you can drop it if you have a recent version).
echo "test123x19853" | awk --re-interval 'match($0,/[a-zA-Z]+[0-9]{1,4}/){print substr($0,RSTART,RLENGTH)}'

Use grep to report back only line numbers

I have a file that possibly contains bad formatting (in this case, the occurrence of the pattern \\backslash). I would like to use grep to return only the line numbers where this occurs (as in, the match was here, go to line # x and fix it).
However, there doesn't seem to be a way to print the line number (grep -n) and not the match or line itself.
I can use another regex to extract the line numbers, but I want to make sure grep cannot do it by itself. grep -no comes closest, I think, but still displays the match.
try:
grep -n "text to find" file.ext | cut -f1 -d:
If you're open to using AWK:
awk '/textstring/ {print FNR}' textfile
In this case, FNR is the line number. AWK is a great tool when you're looking at grep|cut, or any time you're looking to take grep output and manipulate it.
All of these answers require grep to generate the entire matching lines, then pipe it to another program. If your lines are very long, it might be more efficient to use just sed to output the line numbers:
sed -n '/pattern/=' filename
Bash version (yields the line number of the first match only):
lineno=$(grep -n "pattern" filename)
lineno=${lineno%%:*}
I recommend the answers with sed and awk for just getting the line number, rather than using grep to get the entire matching line and then removing that from the output with cut or another tool. For completeness, you can also use Perl:
perl -nE 'say $. if /pattern/' filename
or Ruby:
ruby -ne 'puts $. if /pattern/' filename
using only grep:
grep -n "text to find" file.ext | grep -Po '^[^:]+'
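If grep prefixes each match with a file name (because it searched several files or was run with -H), the line number is the second colon-separated field, not the first:
grep -nH "text to find" file.txt | cut -f2 -d: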
To count the lines that match the pattern:
grep -n "Pattern" in_file.ext | wc -l
To print the lines that match the pattern:
sed -n '/pattern/p' file.ext
To display the line numbers on which the pattern was matched:
grep -n "pattern" file.ext | cut -f1 -d:
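For comparison, here is a sketch that runs three of the approaches from this thread on the same throwaway file. The file contents and the pattern "bad" are invented for the example.

```shell
# Throwaway sample file with matches on lines 2 and 4.
tmp=$(mktemp)
printf '%s\n' 'fine' 'bad line' 'fine again' 'another bad one' > "$tmp"

via_cut=$(grep -n 'bad' "$tmp" | cut -d: -f1)   # grep prints "N:line", cut keeps N
via_sed=$(sed -n '/bad/=' "$tmp")               # = prints the current line number
via_awk=$(awk '/bad/ {print FNR}' "$tmp")       # FNR is the line number within the file

rm -f "$tmp"
printf '%s\n' "$via_cut"   # prints 2 and 4, one per line
```

All three variables hold the same two line numbers; the sed and awk variants never emit the matching text at all.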

How to escape parenthesis in grep

I want to grep for a function call 'init()' in all JavaScript files in a directory. How do I do this using grep?
Particularly, how do I escape parenthesis, ()?
It depends. If you use regular grep, you don't escape:
echo '(foo)' | grep '(fo*)'
You actually have to escape if you want to use the parentheses as grouping.
If you use extended regular expressions, you do escape:
echo '(foo)' | grep -E '\(fo*\)'
If you want to search for exactly the string "init()" then use fgrep "init()" or grep -F "init()".
Both of these will do fixed string matching, i.e. will treat the pattern as a plain string to search for and not as a regex. I believe it is also faster than doing a regex search.
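$ echo "init()" | grep -Ein 'init\([^)]*\)'
1:init()
$ echo "init(test)" | grep -Ein 'init\([^)]*\)'
1:init(test)
$ echo "initwhat" | grep -Ein 'init\([^)]*\)'
To search a directory tree instead of standard input, add -r and a path, e.g. grep -Erin 'init\([^)]*\)' .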
Change to the directory that contains the JavaScript files (if you know where they are), then run the following.
grep 'init()' *.js
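The escaping rules from the answers above can be put side by side in a short sketch. The three sample lines are made up, and the input comes from printf rather than real .js files.

```shell
sample='init()
init(test)
initwhat'

# BRE: unescaped parentheses are literal characters.
bre=$(printf '%s\n' "$sample" | grep 'init()')

# ERE: parentheses group, so literal ones need a backslash.
ere=$(printf '%s\n' "$sample" | grep -E 'init\(\)')

# Fixed string: no regex interpretation at all.
fixed=$(printf '%s\n' "$sample" | grep -F 'init()')

# ERE again, this time allowing arguments between the parentheses.
with_args=$(printf '%s\n' "$sample" | grep -E 'init\([^)]*\)')

printf '%s\n' "$bre" "$ere" "$fixed"   # prints init() three times
```

The first three commands select only the init() line, while the last one also selects init(test).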

Resources