How to apply 2 regex patterns simultaneously in grep

I would like to check for 2 patterns across a piece of text (a total of 4 delimiters).
A match on either of the two patterns is fine. However, when both match, I would like the results printed side by side.
This is the INPUT TEXT (two separate long lines; the actual input is much longer, so I have chosen the relevant snippets shown below):
Query Response #5: [2659] ID,(0010,0030) DA #8 [19801024] PatientÆs Birth Date,,,(0020,000D) UI #42 Station Title,,,>(0040,0002) DA #8 [20301212] Scheduled Procedure Step Start Date,,,>(0040,0003) TM #4Information
Query Response #6: ID, (0010,0030) DA #8 [19410203] PatientÆs Birth Date, Title,,,>(0040,0002) DA #8 [20210826] Scheduled Procedure Step Start Date, FIND]
This is the DESIRED OUTPUT:
19801024 20301212
19410203 20210826
There are 2 pairs of delimiters. The 1st set of delimiters is this:
(0010,0030) DA #8 [
] Patient
The 2nd set of delimiters is this:
(0040,0002) DA #8 [
] Scheduled Procedure Step Start Date
I am able to apply each pair of delimiters on its own. Specifically, when I do this:
grep -o -P "(?<=0010,0030\) DA #8 \[).*(?=\] Patient)"
I get this output:
19801024
19410203
When I apply the 2nd pair of delimiters like this:
grep -o -P "(?<=0040,0002\) DA #8 \[).*(?=\] Scheduled Procedure Step Start Date)"
I get this output:
20301212
20210826
How do I write a single combined grep command whose output looks like this?
19801024 20301212
19410203 20210826
I tried the following approach without success:
grep -e "(?<=0040,0002\) DA #8 \[).*(?=\] Scheduled Procedure Step Start Date)" -e "(?<=0010,0030\) DA #8 \[).*(?=\] Patient)"
The error message I get is as follows:
grep: Unmatched ) or \)
Thanks in advance. I hope my question is clear. (Please note that I'm using grep under Windows 10, so the outer quotation marks have to be double quotation marks.)
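The error happens because, without -P, grep reads each -e pattern as a basic regular expression. In a basic regular expression \) closes a group that was never opened. Lookarounds such as (?<=...) only work with -P. A minimal sketch of one combined call follows. It assumes GNU grep built with PCRE support, and that every line holds both fields with the (0010,0030) one first. The two lookaround patterns are joined with | inside a single pattern, and paste glues each pair of matches onto one line. The function name extract_dates is hypothetical. [0-9]+ is used instead of .* so a match can never stretch to a later "] Patient" on a long line. Under Windows cmd.exe the single quotes would have to become double quotes, and paste may need to be installed from GNU coreutils.

```shell
#!/bin/sh
# Sketch: one grep -P call with an alternation (|) of both lookaround patterns;
# -o prints each match on its own line, and paste joins every two lines with a space.
# Assumes GNU grep with PCRE (-P) and that each line contains both fields,
# the (0010,0030) one first. extract_dates is a hypothetical name.
extract_dates() {
    grep -oP '(?<=0010,0030\) DA #8 \[)[0-9]+(?=\] Patient)|(?<=0040,0002\) DA #8 \[)[0-9]+(?=\] Scheduled Procedure Step Start Date)' |
        paste -d' ' - -
}

printf '%s\n' \
    'Query Response #5: [2659] ID,(0010,0030) DA #8 [19801024] PatientÆs Birth Date,,,(0020,000D) UI #42 Station Title,,,>(0040,0002) DA #8 [20301212] Scheduled Procedure Step Start Date,,,>(0040,0003) TM #4Information' \
    'Query Response #6: ID, (0010,0030) DA #8 [19410203] PatientÆs Birth Date, Title,,,>(0040,0002) DA #8 [20210826] Scheduled Procedure Step Start Date, FIND]' |
    extract_dates
```

If a line can contain only one of the two fields, the pairing done by paste will drift. In that case a single substitution per line, as in the sed answer, is safer.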

This is actually a job for sed more than grep:
sed -E 's/.*\(0010,0030\) DA #8 \[([^]]+)\] Patient.*\(0040,0002\) DA #8 \[([^]]+)\] Scheduled Procedure Step Start Date.*/\1 \2/' file
19801024 20301212
19410203 20210826
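Without -n, sed prints every input line. Lines that lack either field therefore come out unchanged. A small variation adds -n and the p flag so that only lines where the substitution succeeded are printed. It assumes a sed that supports -E (GNU or BSD), and pair_dates is a hypothetical wrapper name.

```shell
#!/bin/sh
# Same substitution as the answer above, but -n turns off automatic printing and
# the trailing p prints only lines where the substitution actually happened.
# pair_dates is a hypothetical name; sed -E (GNU or BSD) is assumed.
pair_dates() {
    sed -nE 's/.*\(0010,0030\) DA #8 \[([^]]+)\] Patient.*\(0040,0002\) DA #8 \[([^]]+)\] Scheduled Procedure Step Start Date.*/\1 \2/p'
}

printf '%s\n' \
    'Query Response #6: ID, (0010,0030) DA #8 [19410203] PatientÆs Birth Date, Title,,,>(0040,0002) DA #8 [20210826] Scheduled Procedure Step Start Date, FIND]' \
    'a line without either field' |
    pair_dates
```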

With your shown samples only, please try the following awk program. It was written and tested in GNU awk and should work in any awk. In short, PatientÆs Birth Date is used as the field separator for all lines. The main program first checks whether the 1st field matches the regex ^Query Response.*DA #8 \[[0-9]+\]$. If it does, it takes the value between [ and ] (excluding the brackets) and saves it in the variable val. It then checks whether the 2nd field matches ^,.*DA #8[[:space:]]+. If it does, it again takes the value between [ and ] and prints val followed by the current value of $2, which is the required output.
awk -F' PatientÆs Birth Date' '
$1~/^Query Response.*DA #8 \[[0-9]+\]$/{
  val=""
  gsub(/.*\[|\]$/,"",$1)
  val=$1
}
$2~/^,.*DA #8[[:space:]]+/{
  gsub(/.*\[|\].*/,"",$2)
  print val,$2
}
' Input_file

Related

How to grep after a certain pattern?

I have an input file such as
file;14;19;;;hello 2019
file2;2019;2020;;;this is a test 2020
file3;25;31;this is a number 31
I would like to grep for numbers only after ;;;. For example, if I grep for 2019, it should give me
file;14;19;;;hello 2019
instead of what I get with grep '2019' file:
file;14;19;;;hello 2019
file2;2019;2020;;;this is a test 2020
How can I accomplish this task?
Regular expressions can include more than fixed text; it sounds like all you need is:
grep ';;;.*[0-9]' inputFile.txt
This will deliver all lines that have the text ;;; followed by a digit somewhere after that in the line. In terms of explanation:
;;; is the literal text, three semicolons;
.* is zero or more of any character;
[0-9] is any digit.
That will give you lines with any number. If you want a specific number, use that for the final bullet point above.
Just keep in mind that this will also give you the line xyzzy ;;; 920194 if you go looking for 2019.
If you want just the 2019 numbers (i.e., without any digits on either side), you can use the zero-width negative look-behind and look-ahead assertions, assuming your version of grep supports Perl-compatible regular expressions (PCRE, which GNU grep does with the -P flag):
grep -P ';;;.*(?<![0-9])2019(?![0-9])' inputFile.txt
This can be read as:
;;; is the literal text, three semicolons;
.* is zero or more of any character;
(?<![0-9]) means the next match cannot be preceded by a digit;
2019 is the number you're looking for;
(?![0-9]) means the previous match cannot be followed by a digit.
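A quick way to see the difference on the sample data is shown below, and it includes the xyzzy line mentioned above. The sketch assumes GNU grep with -P. grep_number_after_sep is a hypothetical helper, and its argument is pasted into the regex unescaped, so it should contain digits only.

```shell
#!/bin/sh
# Hypothetical helper: print lines where NUMBER appears somewhere after ";;;"
# with no digit directly before or after it. Assumes GNU grep with -P and
# that NUMBER is digits only (it is inserted into the regex as-is).
grep_number_after_sep() {
    grep -P ";;;.*(?<![0-9])$1(?![0-9])"
}

# Only the first line qualifies: in the second, 2019 is before ";;;",
# and in the third, 2019 is surrounded by other digits.
printf '%s\n' 'file;14;19;;;hello 2019' 'file2;2019;2020;;;this is a test 2020' 'xyzzy ;;; 920194' |
    grep_number_after_sep 2019
```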
Use this Perl one-liner:
perl -F';' -lane 'print if $F[-1] =~ /2019/' in_file
Example:
( echo 'file;14;19;;;hello 2019' ; echo 'file2;2019;2020;;;this is a test 2020' ) | perl -F';' -lane 'print if $F[-1] =~ /2019/'
Prints:
file;14;19;;;hello 2019
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into the array @F on whitespace, or on the regex given with the -F option.
-F';' : Split into @F on semicolons (;) rather than on whitespace.
$F[-1] : the last element of the array @F, i.e., the last field of the input line split on semicolons. Alternatively, use $F[5] (the 6th element; arrays are 0-indexed) if you need to count from the left.
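For comparison, the same last-field test can be written in awk, where $NF plays the role of $F[-1]. This is a swapped-in alternative rather than part of the Perl answer. last_field_matches is a hypothetical name, and its argument is used as a dynamic regex.

```shell
#!/bin/sh
# awk counterpart of: perl -F';' -lane 'print if $F[-1] =~ /2019/'
# -F';' splits on semicolons; $NF is the last field; a pattern with no action
# prints the matching line. last_field_matches is a hypothetical name.
last_field_matches() {
    awk -F';' -v re="$1" '$NF ~ re'
}

printf '%s\n' 'file;14;19;;;hello 2019' 'file2;2019;2020;;;this is a test 2020' |
    last_field_matches 2019
```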
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

difference in behavior of "tr" command on "dash" (-) between busybox and Ubuntu/Raspbian/etc

I have a function in a script which is used to validate that input strings don't contain any unacceptable characters. In this case, allowable characters are alpha, numeric, underscore, dash, period, and space.
#!/bin/sh
pattern="\_\-\. [a-zA-Z0-9]"
while [ 1 ]; do
    echo "enter your test string"
    read string
    echo "result:"
    echo "$string" | tr -cd "$pattern" | sed 's/\[//' | sed 's/\]//'
    echo
    echo
done
Testing on Raspbian (Raspberry Pi):
pi@raspberrypi:~ $ ./trtest.sh
enter your test string
dash-dash
result:
dash-dash
enter your test string
under_score
result:
under_score
Testing on an Onion board (OpenWRT/busybox):
root@Omega-FD22:~# ./trtest.sh
enter your test string
dash-dash
result:
dashdash <<<----- I'm not expecting this
enter your test string
under_score
result:
under_score
So,
#1 I am not sure why there is a difference in behavior between "tr" in these two cases, specifically on the "dash" character.
#2 If there's another way to do this, I'm open to it.
Thanks for any insights.
DL
FYI, one of my diligent colleagues figured it out, so I am passing on his solution. If you move "\-" to the end of the pattern string, it works in both environments. The technical reasons are somewhat beyond my ability to explain, but I'm glad it works.
Before:
pattern="\_\-\. [a-zA-Z0-9]"
After:
pattern="\_\. [a-zA-Z0-9]\-"
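The thread does not explain why BusyBox drops the dash. A plausible cause, not verified against the BusyBox source, is that its tr still treats the escaped "-" between two characters as a range operator. An unescaped dash placed last in the set is the usual portable way to make it literal. The sketch below validates a string against the same allowed set (letters, digits, underscore, period, space, dash) without any backslashes. It deletes every allowed character and checks that nothing is left over. is_valid is a hypothetical name.

```shell
#!/bin/sh
# Validate a string against: letters, digits, underscore, period, space, dash.
# The dash is last in the set so tr treats it as a literal, not a range operator.
# is_valid is a hypothetical name.
is_valid() {
    # Delete all allowed characters; anything remaining is disallowed.
    leftover=$(printf '%s' "$1" | tr -d 'A-Za-z0-9_. -')
    [ -z "$leftover" ]
}

for s in 'dash-dash' 'under_score' 'semi;colon'; do
    if is_valid "$s"; then echo "$s: ok"; else echo "$s: rejected"; fi
done
```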

Parsing text from .txt files

I have a tab-separated log file, but I need only a few characters from the lines that begin with 30.10.
Using the command
awk '/^30.10/{print}' FOOD_ORDERS_201907041307.DEL
I get this output:
30.1006 35470015000205910002019070420190705 00000014870000000034
30.1006 35470015000205900002019070420190705 00000014890000000029
30.1006 35470023000205920002019070420190705 00000014900000000011
What I need to extract is 3547 and the last n characters at the very end, after the zeros.
So, expected output will be:
3547
34
29
11
But if the last 10 characters contain leading zeros followed by a number, I need that number.
While your question is unclear, your answer to Ed Morton's comment provides a bit more clarity on what you are trying to achieve. What is still unclear is exactly what you want from the third field. From your question and the various comments, it appears that if the line begins with 30.10, you want the first 4 digits of the second field and the rightmost digits that are [1-9] from the third field.
If that accurately captures what you need, then awk with a combination of substr, match and length string functions can isolate the digits you are interested in. For example:
awk '/^30.10/ {
    l = match($3, /[1-9]+$/)
    print substr($2, 1, 4) " " substr($3, l, length($3) - l + 1)
}' test
This takes the input file (borrowed from Dudi Boy's answer), e.g.
$ cat test
30.1006 35470015000205910002019070420190705 00000014870000000034
30.1006 35470015000205900002019070420190705 00000014890000001143
30.1006 35470015000205900002019070420190705 00000014890000000029
30.1006 35470023000205920002019070420190705 00000014900000000011
and return to you:
3547 34
3547 1143
3547 29
3547 11
Let me know if that accurately captures what you need.
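One caveat with the [1-9]+$ match is that zeros inside the number are lost: ...0000001043 gives 43. If the field ends in 0, match() returns 0 and the whole field is printed. The question instead asks for the number in the last 10 characters without its leading zeros. A sketch of that reading follows. It takes the last 10 characters of the third field and adds 0, which makes awk convert them to a number and drop the leading zeros. The 10-character width comes from the question, the dot in 30\.10 is escaped so it matches only a literal period, and extract_fields is a hypothetical name.

```shell
#!/bin/sh
# For lines starting with 30.10: first 4 chars of field 2, and the last
# 10 chars of field 3 converted to a number (leading zeros dropped).
# extract_fields is a hypothetical name; the width 10 comes from the question.
extract_fields() {
    awk '/^30\.10/ { print substr($2, 1, 4), substr($3, length($3) - 9) + 0 }'
}

printf '30.1006\t35470015000205910002019070420190705\t00000014870000000034\n30.1006\t35470015000205900002019070420190705\t00000014890000001043\n' |
    extract_fields
```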
Here is a simple awk script to do the task:
script.awk
/^30.10/ {                                    # for each line starting with 30.10
    last2chars = substr($3, length($3)-1);    # extract the last 2 chars of the 3rd field into last2chars
    if ($3 ~ /00001143$/) last2chars = 1143;  # if the 3rd field ends with 1143, update last2chars accordingly
    print last2chars;                         # output variable last2chars
}
input.txt
30.1006 35470015000205910002019070420190705 00000014870000000034
30.1006 35470015000205900002019070420190705 00000014890000001143
30.1006 35470015000205900002019070420190705 00000014890000000029
30.1006 35470023000205920002019070420190705 00000014900000000011
running:
awk -f script.awk input.txt
output:
34
1143
29
11
Got part of it:
awk '/^30.10/{print}' FOOD_ORDERS_201907041307.DEL | sed 's/.*\(..\)/\1/'

XTEXT: Avoiding grammar match when used as a parameter

I'm still new to Xtext, so my apologies if this is a simple question.
I have a custom scripting language for which I am attempting to use XTEXT for syntax checking only. The language has one command per line, and has the format:
COMMAND:PARAMETERS
I have run into an issue when a parameter for a command is also a command keyword. The relevant part of the grammar file:
Model:
(commands += AbstractCommand)*
;
AbstractCommand:
Command1 | Command2
;
Command1:
command = 'command1' ':' value = Parameter
;
Command2:
command = 'command2' ':' value = Parameter
;
Parameter:
value = QualifiedParameter
;
QualifiedParameter:
(ID | ' ' | INT | '.' | '-' )+
;
The problem arises when one of the commands uses another command as its parameter. The rules of the language don't allow an actual 2nd command on the same line; in this case, it is just plain text that happens to have the same value as an existing command. For example, assume Command1 and Command2 expect a complete sentence as their parameter. Some sample valid commands would be:
Command1:This is a sentence
Command2:This is also a sentence
Command1:This sentence has Command2 in it
All 3 commands are valid, but the last line will generate an error "missing ":" at " ", because "Command2" has its own rules for parsing.
I've been reading the XTEXT documentation, and it seems like I can use first token set predicates to avoid reading the second token when the first is identified, but I cannot find any examples of this.
I am not sure I understand your question. Maybe what you are looking for is the following:
Model: greetings+=Greeting*;
Greeting: "Hello" name=MyID "!";
MyID: "Hello" | ID;
This now allows parsing:
Hello You!
Hello Hello!

extract a line from a file using csh

I am writing a csh script that will extract a line from a file xyz.
The xyz file contains a number of lines of code, and the line I am interested in appears 2-3 lines into the file.
I tried the following code
set product1 = `grep -e '<product_version_info.*/>' xyz`
I want the script, as soon as it finds that line, to save it in a variable as a string and stop reading the file immediately, i.e., it should not read any further after extracting the line.
Please help!
grep has an -m or --max-count flag that tells it to stop after a specified number of matches. Hopefully your version of grep supports it.
set product1 = `grep -m 1 -e '<product_version_info.*/>' xyz`
From the grep man page:
-m NUM, --max-count=NUM
Stop reading a file after NUM matching lines. If the input is
standard input from a regular file, and NUM matching lines are
output, grep ensures that the standard input is positioned to
just after the last matching line before exiting, regardless of
the presence of trailing context lines. This enables a calling
process to resume a search. When grep stops after NUM matching
lines, it outputs any trailing context lines. When the -c or
--count option is also used, grep does not output a count
greater than NUM. When the -v or --invert-match option is also
used, grep stops after outputting NUM non-matching lines.
As an alternative, you can always use the command below to check only the first few lines (since the line always occurs within the first 2-3 lines):
set product1 = `head -3 xyz | grep -e '<product_version_info.*/>'`
I think you're asking to return the first matching line in the file. If so, one solution is to pipe the grep result to head:
set product1 = `grep -e '<product_version_info.*/>' xyz | head -1`
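If your grep lacks -m, POSIX sed can do the same thing and also stop reading at the first match. It prints the line with p and quits with q. From csh the same command should work inside backquotes, e.g. set product1 = `sed -n '/<product_version_info.*\/>/{p;q;}' xyz`. The sketch below reads standard input so it can be tried without the real xyz file. first_product_line is a hypothetical name.

```shell
#!/bin/sh
# Print the first line matching the pattern, then quit so the rest of the
# input is not read. The "/" inside the pattern is escaped because "/" also
# delimits the sed address. first_product_line is a hypothetical name.
first_product_line() {
    sed -n '/<product_version_info.*\/>/{p;q;}'
}

printf '%s\n' 'some code' '<product_version_info a="1"/>' '<product_version_info a="2"/>' |
    first_product_line
```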
