Regular expression to validate field structure - grep

I would like to write a regular expression for grep on Linux that verifies that a field contains 15 digits and that the fifth digit (counting from the left) is either a 5 or a 6.
I have managed to require at most 15 digits, but I cannot work out how to require that the fifth one is a 5 or 6. What I have so far:
grep -E "^[0-9]{1,15}"
Any idea?

For exactly 15 digits where the fifth position is either 5 or 6:
grep -E "^[0-9]{4}[56][0-9]{10}$"
^ Start of string
[0-9]{4} Match 4 digits
[56] Match either 5 or 6
[0-9]{10} Match 10 digits
$ End of string
To match the first five characters followed by 0 to 10 further digits, allowing a partial match such as 123462222233333 within 12346222223333344444:
grep -Eo "^[0-9]{4}[56][0-9]{0,10}"
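The two patterns above can also be checked outside grep. The following is a minimal Python sketch, not part of the original answer; the helper names is_valid_field and leading_match are made up for illustration. It uses fullmatch in place of the ^ and $ anchors, and match to mimic the anchored prefix returned by grep -Eo.

```python
import re

# Exactly 15 digits with a 5 or 6 in the fifth position (same pattern as the grep -E call).
FULL = re.compile(r"[0-9]{4}[56][0-9]{10}")
# Four digits, a 5 or 6, then 0 to 10 more digits (same pattern as the grep -Eo call).
PREFIX = re.compile(r"[0-9]{4}[56][0-9]{0,10}")


def is_valid_field(value: str) -> bool:
    """Return True only if the whole string matches the 15-digit pattern."""
    return FULL.fullmatch(value) is not None


def leading_match(value: str):
    """Return the matched prefix (like grep -o), or None if the start does not match."""
    m = PREFIX.match(value)
    return m.group(0) if m else None


print(is_valid_field("123452222233333"))      # fifth digit is 5
print(is_valid_field("123472222233333"))      # fifth digit is 7
print(leading_match("12346222223333344444"))  # 20 digits, only a prefix can match
```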

Related

Grep for line having only 2 or 3 digits

I'm trying to print lines containing a 2- or 3-digit number, along with the rest of the line. I came up with this code:
grep -P '[[:digit:]]{2,3}' address
But this also prints the line with 4 digits. I don't know why this is happening.
Output:
This code doesn't work either:
grep -E '[0-9]{2,3}' address
Here is the file containing address text:
12 main st
123 main street
1234 main street
I have already specified 2 or 3 digits with {2,3}, yet the filter doesn't work and the line with more than 3 digits is still printed. Can anyone assist me with this? Thank you so much.
You can use inverted grep (-v) to filter out all lines containing 4 or more consecutive digits:
grep -vE '[0-9]{4}' address
EDIT:
I noticed you want only lines with 2 or 3 digits, and the first command also lets through lines with a single digit.
Here's the fix, again using the same method:
grep -E '[0-9]{2,3}' address | grep -vE '[0-9]{4}'
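The original pattern prints the 4-digit line because an unanchored [0-9]{2,3} matches any two or three consecutive digits, including those inside a longer run. As an alternative to the two-step grep, here is a minimal Python sketch (a different technique from the answer above, using lookarounds) that requires the run not to be preceded or followed by another digit. The function name lines_with_short_number is hypothetical.

```python
import re

# A run of exactly 2 or 3 digits: no digit immediately before it, none immediately after.
TWO_OR_THREE = re.compile(r"(?<![0-9])[0-9]{2,3}(?![0-9])")


def lines_with_short_number(lines):
    """Keep the lines that contain at least one standalone 2- or 3-digit number."""
    return [line for line in lines if TWO_OR_THREE.search(line)]


sample = ["12 main st", "123 main street", "1234 main street"]
for line in lines_with_short_number(sample):
    print(line)
```

Unlike the grep pipeline, this sketch keeps a line that has a 2- or 3-digit number even when a longer number appears elsewhere on the same line.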

grep for path in process(ps) containing number

I would like to grep for a process path that contains a variable part. Example:
This is one of the processes running.
/var/www/vhosts/rcsdfg/psd_folr/rcerr-m-deve-udf-172/bin/magt queue:consumers:start customer.import_proditns --single-thread --max-messages=1000
I would like to grep for "psd_folr/rcerr-m-deve-udf-172/bin/magt queue" in the running processes.
The catch is that the number 172 keeps changing, although it is always a 3-digit number. I tried the command below, but it does not return any output. Any suggestions?
sudo ps axu | grep "psd_folr/rcerr-m-deve-udf-'^[0-9]$'/bin/magt queue"
The most relevant part of your regular expression is -'^[0-9]$'/, which has the following problems:
the apostrophes have no special meaning to grep here, so it looks for literal apostrophes
the caret ^ matches the beginning of a line, but there is no beginning of a line at this place in ps's output
the dollar $ matches the end of a line, but there is no end of a line at this place in ps's output
you want to match 3 digits, but [0-9] matches only a single one
Thus, that part of your expression should be changed either to -[0-9]+/ to match any number of digits (+ matches the preceding element one or more times) or to -[0-9]{3}/ to match exactly three digits ({n} matches the preceding element exactly n times).
If you alter your command, give grep the -E flag so it uses extended regular expressions; otherwise you need to escape the plus or the braces:
sudo ps axu | grep -E "psd_folr/rcerr-m-deve-udf-[0-9]+/bin/magt queue"
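To see the corrected expression at work without a running process, here is a minimal Python sketch. It assumes the exactly-three-digits variant, and SAMPLE is simply the process line quoted in the question; the function name matches_process is made up for illustration.

```python
import re

# Process line copied from the question; the 172 part is the one that changes.
SAMPLE = (
    "/var/www/vhosts/rcsdfg/psd_folr/rcerr-m-deve-udf-172/bin/magt "
    "queue:consumers:start customer.import_proditns --single-thread --max-messages=1000"
)

# Same expression as the grep -E call, but with {3} so that exactly three digits are required.
PATTERN = re.compile(r"psd_folr/rcerr-m-deve-udf-[0-9]{3}/bin/magt queue")


def matches_process(text: str) -> bool:
    """Return True if the line contains the path with any 3-digit number."""
    return PATTERN.search(text) is not None


print(matches_process(SAMPLE))
print(matches_process(SAMPLE.replace("172", "905")))  # a different 3-digit number
print(matches_process(SAMPLE.replace("172", "17")))   # only two digits
```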

Lua patterns - why does custom set '[+-_]' match alphanumeric characters?

I was playing around with some patterns today to try to match some specific characters in a string, and ran into something unusual that I'm hoping someone can explain.
I had created a set looking for a list of characters within some strings, and noticed I was getting back some unexpected results. I eliminated the characters in the set until I got down to just three, and it seems to be these three that are responsible:
str = "alpha.5dc1704B40bc7f.beta.123456789.gamma.987654321.delta.abc123ABC321"
result = ""
for a in string.gmatch(str, '[+-_]') do
result = result .. a .. " "
end
print(result)
. 5 1 7 0 4 B 4 0 7 . . 1 2 3 4 5 6 7 8 9 . . 9 8 7 6 5 4 3 2 1 . . 1 2 3 A B C 3 2 1
Why are these characters getting returned here (looks like any number or uppercase letter, plus dots)? I note that if I change up the order of the set, I don't get the same output - '[_+-]' or '[-_+]' or '[+_-]' or '[-+_]' all return nothing, as expected.
What is it about '[+-_]' that's causing a match here? I can't figure out what I'm telling lua that is being interpreted as instructions to match these characters.
When a - is between other characters inside square brackets, it means everything between those two. For example, [a-z] is all of the lowercase letters, and [A-F] is A, B, C, D, E, and F. [+-_] means every ASCII character between + and _, which includes all the digits, all the uppercase letters, and a lot of punctuation. To match a literal hyphen, put it first or last in the set, as in [_+-], or escape it as %-.
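The range can be made visible by listing every character between + and _. This is a minimal sketch in Python rather than Lua; Python's re module treats a hyphen between two characters in a set as a range in the same way, and TEXT is just the first part of the string from the question.

```python
import re

# Every character from '+' (code 43) through '_' (code 95), inclusive.
covered = "".join(chr(c) for c in range(ord("+"), ord("_") + 1))
print(covered)

TEXT = "alpha.5dc1704B40bc7f"

# Hyphen between two characters: interpreted as the range '+' to '_'.
as_range = re.findall(r"[+-_]", TEXT)
# Hyphen first in the set: interpreted as a literal hyphen.
as_literal = re.findall(r"[-+_]", TEXT)

print(as_range)
print(as_literal)
```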

Only output values within a certain range

I run a command that produces lots of lines in my terminal; each line is a float.
I only want certain numbers to be output as a line in my terminal.
I know that I can pipe the results to egrep:
| egrep "(369|433|375|368)"
if I want only certain values to appear. But is it possible to only have lines that have a value within ± 50 of 350 (for example) to appear?
grep matches text patterns and has no notion of numeric value, so you have to either:
work out a string pattern for the number range you want (e.g., for 300-399 you might use something like grep -E '^3[0-9][0-9]', with additional context added to the expression to cover the fractional digits)
convert the number strings to actual numbers in whatever programming language you prefer and filter them that way
I'd strongly encourage you to take the second option.
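I would go with awk here. This example keeps values strictly between 250 and 350; for 350 ± 50, change the bounds to 300 and 400: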
./yourProgram | awk '$1>250 && $1<350'
e.g.
echo -e "12.3\n342.678\n287.99999" | awk '$1>250 && $1<350'
342.678
287.99999
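The second option (convert to numbers, then filter) can be sketched in Python as well. This is an illustration only: the function name within and its parameters center and tolerance are made up, and the bounds are treated as inclusive, which differs from the strict comparisons in the awk command.

```python
def within(values, center=350.0, tolerance=50.0):
    """Yield the input strings whose numeric value lies within center ± tolerance (inclusive)."""
    for raw in values:
        try:
            number = float(raw)
        except ValueError:
            continue  # skip lines that are not numbers
        if abs(number - center) <= tolerance:
            yield raw


lines = ["12.3", "342.678", "287.99999", "400.0", "not a number"]
for kept in within(lines):
    print(kept)
```

In a pipeline, the same function could be fed from sys.stdin instead of a list.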

Adding tabs to non delimited text file with empty and variable length columns

I have a non-delimited text file and want to parse it to add tabs at specific spots to delimit columns. The columns are sometimes empty or vary in length, which is why I need to add tabs at those specific spots. I found the answer to this a couple of years ago on the net using batch, but now I can't find it or the code. I already have the following code to replace runs of two or more spaces in the file with a tab, but this doesn't account for empty columns.
gc $FileToOpen | % { $_ -replace ' {2,}',"`t" } | set-content $FileToSave
So I need to read each line in fixed-size portions (a certain number of characters each) and add a tab after each portion.
Here is a sample of the data file, the top row is the header and the data rows have no blank lines in between them:
MRUN Number Name X Exception Reason Data CDM# Quantity D.O.S
000000 00000000 Name W MODIFIER CANNOT BE FILED WITHOUT 08/13/2015 0000000 0 08/13/2015
000000 00000000 Name W MODIFIER CANNOT BE FILED WITHOUT 0000000 0 08/13/2015
The second data row is missing Data.
Using Ansgar's answer, here is my code, which does handle empty fields:
gc $FileToOpen |
? { $_ -match '^(.{8})(.{12})(.{20})(.{3})(.{34})(.{62})(.{10})(.{22})(.{10})$' } |
% { "{0}`t{1}`t{2}`t{3}`t{4}`t{5}`t{6}`t{7}`t{8}" -f $matches[1].Trim(), $matches[2].Trim(), $matches[3].Trim(), $matches[4].Trim(), $matches[5].Trim(), $matches[6].Trim(), $matches[7].Trim(), $matches[8].Trim(), $matches[9].Trim() } |
Set-Content $FileToSave
Thanks for your patience Ansgar, I know I tried it! I really do appreciate the help!
Since you seem to have an input file with fixed-width columns, you should probably use a regular expression for transforming the input into a tab-delimited format.
Assume the following input file:
A     B   C
foo   13  22
bar   4   17
baz   142 23
The file has 3 columns. The first column is 6 characters wide, the other two columns 4 characters each.
The transformation could be done with a regular expression like this:
Get-Content 'C:\path\to\input.txt' |
? { $_ -match '^(.{6})(.{4})(.{4})$' } |
% { "{0}`t{1}`t{2}" -f $matches[1].Trim(), $matches[2].Trim(), $matches[3].Trim() } |
Set-Content 'C:\path\to\output.txt'
The regular expression defines the columns by character count and captures them in groups (parentheses). The groups can then be accessed as the indexes 1 and above of the resulting $matches collection. Trimming removes the leading/trailing whitespace. The format operator (-f) then inserts the trimmed values into the tab-separated format string.
If the last column has a variable width (because its values are left-aligned and have no trailing spaces), you may need to change the regular expression to ^(.{6})(.{4})(.{0,4})$ to take care of that. The quantifier {0,4} means the preceding expression may occur up to four times. Note that the shorthand {,4} is not a quantifier in .NET regular expressions, which PowerShell uses; there it is matched as literal text.
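The same capture-by-width idea can be tried outside PowerShell. Below is a minimal Python sketch under the assumptions of the example file (column widths 6 and 4, with a left-aligned last column of up to 4 characters); the function name to_tsv is hypothetical.

```python
import re

# Column widths from the example: 6 characters, 4 characters, then up to 4 characters.
ROW = re.compile(r"^(.{6})(.{4})(.{0,4})$")


def to_tsv(line: str):
    """Split a fixed-width line into trimmed fields joined by tabs, or return None if it does not fit."""
    m = ROW.match(line)
    if m is None:
        return None
    return "\t".join(field.strip() for field in m.groups())


rows = [
    "A     B   C",
    "foo   13  22",
    "bar   4   17",
    "baz   142 23",
    "qux       9",   # empty middle column
]
for row in rows:
    print(repr(to_tsv(row)))
```

Because the fields are cut by position rather than by whitespace, an empty column still produces its own tab, which is what the space-replacement approach could not do.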

Resources