HAPI: Hl7 Parser do not separate the segment of the Hl7 message - hl7

I'm getting started to Hl7 using HAPI API.
I have an Hl7 test message (picked up from a real case) and I want to read the field of the various segment (MSH, PID, PV1, etc.)
The problem is that the Parser do not split the message in segments, but put all the message in the MSH segment.
I tried both the PipeParser and the GenericParser.
The message is an ORU_R01 message, version 2.3.1 (I used the debug and the getVersion() method to get those information)
Here is the test class
import java.io.IOException;
import ca.uhn.hl7v2.DefaultHapiContext;
import ca.uhn.hl7v2.HL7Exception;
import ca.uhn.hl7v2.HapiContext;
import ca.uhn.hl7v2.model.Message;
import ca.uhn.hl7v2.model.v231.group.ORU_R01_PIDPD1NK1NTEPV1PV2;
import ca.uhn.hl7v2.model.v231.group.ORU_R01_PIDPD1NK1NTEPV1PV2ORCOBRNTEOBXNTECTI;
import ca.uhn.hl7v2.model.v231.message.ORU_R01;
import ca.uhn.hl7v2.model.v231.segment.MSH;
import ca.uhn.hl7v2.model.v231.segment.PID;
import ca.uhn.hl7v2.parser.Parser;
public class Hl7EncodeTester
{
public static void main (String[] args) throws HL7Exception, IOException
{
String msg = "MSH|^~\\&|REPO|BCS|THE|Theorema|201412170140||ORU^R01|20140012620415|P|2.3.1||||||||PID|||711697^^^^PI^PRIAMO~RSSMHL69B15B081W^^^^NN||ROSSINI^MICHELE||19690215|F|||VIA^POMPEI^6^ASSO^CO^22063||031700239|||M||RSSMHL69B15B081W|300CP102||||ASSO|||100|||||PV1|||265^INRCA Ospedale^^959^^^0310^^||||||||||||||||||||||||||||||||||||||||||||||||||OBR|1||2014047190|69^RADIOLOGIA||||201412162247||||||||||||0310692014047190||201412162247|||||||||||||||OBX|1|FT|20140001988995||REFERTO FORMATO P7M BASE64 ||||||F|||20141216||CFmedicorefertante^CognomeMedicoRefertante^NomeMedicoRefertante||||TXA|1||multipart|||20141216|||||CFmedicorefertante^^^^^^^^^^^^COMPILATORE||||2014047190||AU|||||||NTE|1|O||";
HapiContext context = new DefaultHapiContext();
// Parser p = context.getPipeParser();
Parser p = context.getGenericParser();
Message adt = p.parse(msg);
ORU_R01 oruMsg = (ORU_R01) adt;
System.out.println("ORU_RO1 message = " + oruMsg.encode());
MSH msh = oruMsg.getMSH();
System.out.println("MSH segment = " + msh.encode());
ORU_R01_PIDPD1NK1NTEPV1PV2ORCOBRNTEOBXNTECTI segment1 =
oruMsg.getPIDPD1NK1NTEPV1PV2ORCOBRNTEOBXNTECTI();
ORU_R01_PIDPD1NK1NTEPV1PV2 segment2= segment1.getPIDPD1NK1NTEPV1PV2();
PID pid = segment2.getPID();
System.out.println("Patient Name = " + pid.getPatientName(0).getXpn2_GivenName().getValue());
}
}
This is the console output:
ORU_RO1 message = MSH|^~\&|REPO|BCS|THE|Theorema|201412170140||ORU^R01|20140012620415|P|2.3.1||||||||PID|||711697^^^^PI^PRIAMO~RSSMHL69B15B081W^^^^NN||ROSSINI^MICHELE||19690215|F|||VIA^POMPEI^6^ASSO^CO^22063||031700239|||M||RSSMHL69B15B081W|300CP102||||ASSO|||100|||||PV1|||265^INRCA Ospedale^^959^^^0310||||||||||||||||||||||||||||||||||||||||||||||||||OBR|1||2014047190|69^RADIOLOGIA||||201412162247||||||||||||0310692014047190||201412162247|||||||||||||||OBX|1|FT|20140001988995||REFERTO FORMATO P7M BASE64 ||||||F|||20141216||CFmedicorefertante^CognomeMedicoRefertante^NomeMedicoRefertante||||TXA|1||multipart|||20141216|||||CFmedicorefertante^^^^^^^^^^^^COMPILATORE||||2014047190||AU|||||||NTE|1|O
MSH segment = MSH|^~\&|REPO|BCS|THE|Theorema|201412170140||ORU^R01|20140012620415|P|2.3.1||||||||PID|||711697^^^^PI^PRIAMO~RSSMHL69B15B081W^^^^NN||ROSSINI^MICHELE||19690215|F|||VIA^POMPEI^6^ASSO^CO^22063||031700239|||M||RSSMHL69B15B081W|300CP102||||ASSO|||100|||||PV1|||265^INRCA Ospedale^^959^^^0310||||||||||||||||||||||||||||||||||||||||||||||||||OBR|1||2014047190|69^RADIOLOGIA||||201412162247||||||||||||0310692014047190||201412162247|||||||||||||||OBX|1|FT|20140001988995||REFERTO FORMATO P7M BASE64 ||||||F|||20141216||CFmedicorefertante^CognomeMedicoRefertante^NomeMedicoRefertante||||TXA|1||multipart|||20141216|||||CFmedicorefertante^^^^^^^^^^^^COMPILATORE||||2014047190||AU|||||||NTE|1|O
Patient Name = null
The whole message and the MSH segment I obtained are the same, I cannot split the message in segments (there are supposed to be: MSH, PID, PV1, OBR, OBX, TXA and NTE)
Someone knows a solution to my problem?

There's no segment terminator in your message. So the parser can't find it. Note that this terminator is binary and hard-coded in the standard
HL7 Messaging Standard Version 2.5 → Control → Message Framework → 2.5.4 Message delimiters
...The segment terminator is always a carriage return (in ASCII, a hex 0D). The other delimiters are defined in the MSH segment, with the field delimiter in the 4th character position...
See also:
blog article hl7standards.com: HL7 Separator Characters

Related

quickfixj DataDictionary Validation Issue

I'm new to Java. Working on QuikcFixJ.
I'm trying to load custom.xml file using datadictionary and later parsing msg string from this datadictionary. Not sure what is wrong with datadictionary , its not throwing any error whether I pass correct fix.xml file path or incorrect one.
Later when I pass msg string to Message object it just parse initial 3 tags irrespective of whatever xml file I mention in datadictionary.
Please give me some pointers to resolve it.
Adding code snippet:
public Message createMsg(String type, String message)
{
message = "35=" + type + "\001" + message;
message = "8=FIX4.2\0019=" + message.length() + "\001"+message ;
String msgCharStr = message.replaceAll("\\|", "\001");
int checksum = MessageUtils.checksum(msgCharStr);
msgCharStr = msgCharStr +"\00110=" + (1 +
MessageUtils.checksum(msgCharStr)) + "\001" ;
quickfix.Message msg = new Message();
msg.fromString(msgCharStr, null,true);
}
This code works fine for regular neworder or quoterequest msgs.
But, when I pass quoterequest msg with custom repeating group which I loaded via UseDictionary config settings, it gives undesired output msg.
e.g: Lets say if I pass input, "35=R|146=1|...|453=2|448=XX|447=G|452=1|448=YY|447=D|452=2|......"
parser gives output as :
"35=R|146=1|...|448=XX|447=G|452=1|453=2|448=YY|447=D|452=2|......"
----Code re-build steps for new custom FIX42.xml ---
Copied quickfixj2.3.1.zip (url: https://github.com/quickfix-j/quickfixj/releases )
Updated xml at path quickfixj-messages/quickfixj-messages-fix42/src/main/resources/FIX42.xml
run mvn build command.
Maven re-build jar for below:
quickfixj-all
quickfixj-core/
quickfixj-messages/
quickfixj-codegenerator/
quickfixj-dictgenerator/
Tried using all above jar files in my RobotFramework with an assumption that it will now refer the modified QuoteRequest msg which has repeating group support.
To my surprise, its output was exactly same. No changes.
After quickfixj rebuild for customised xml file.
Final version of code snippet:
public Message createMsg(String type, String message)
{
message = "35=" + type + "\001" + message;
message = "8=FIX4.2\0019=" + message.length() + "\001"+message ;
String msgCharStr = message.replaceAll("\\|", "\001");
int checksum = MessageUtils.checksum(msgCharStr);
msgCharStr = msgCharStr +"\00110=" + (1 +
MessageUtils.checksum(msgCharStr)) + "\001" ;
DataDictioanry dd = DataDictionary("FIX42.xml");
quickfix.Message msg = new Message();
msg.fromString(msgCharStr, dd ,true);
}

NLTK Word Extraction

So I am trying to read a txt file, process it by taking out the stop words, and then output that result into a new file. However, I keep getting the following error:
TypeError: expected a string or other character buffer object
This is my code:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
f=open('tess1.txt','rU')
stop_words = set(stopwords.words('english'))
raw=f.read()
word_tokens = word_tokenize(raw)
text = nltk.Text(word_tokens)
filtered_sentence = [w for w in word_tokens if not w in stop_words]
if w not in stop_words:
filtered_sentence.append(w)
K = open("tess12.txt", "w")
K.write(filtered_sentence)
K.close()
print(filtered_sentence)
The solution's to write a string inside the buffer:
K.write(str(filtered_sentence))

Printing the same character several times without a loop

I would like to "beautify" the output of one of my Dart scripts, like so:
-----------------------------------------
OpenPGP signing notes from key `CD42FF00`
-----------------------------------------
<Paragraph>
And I wonder if there is a particularly simple and/or optimized way of printing the same character x times in Dart. In Python, print "-" * x would print the "-" character x times.
Learning from this answer, for the purpose of this question, I wrote the following minimal code, which makes use of the core Iterable class:
main() {
// Obtained with '-'.codeUnitAt(0)
const int FILLER_CHAR = 45;
String headerTxt;
Iterable headerBox;
headerTxt = 'OpenPGP signing notes from key `CD42FF00`';
headerBox = new Iterable.generate(headerTxt.length, (e) => FILLER_CHAR);
print(new String.fromCharCodes(headerBox));
print(headerTxt);
print(new String.fromCharCodes(headerBox));
// ...
}
This gives the expected output, but is there a better way in Dart to print a character (or string) x times? In my example, I want to print the "-" character headerTxt.length times.
The original answer is from 2014, so there must have been some updates to the Dart language: a simple string multiplied by an int works.
main() {
String title = 'Dart: Strings can be "multiplied"';
String line = '-' * title.length
print(line);
print(title);
print(line);
}
And this will be printed as:
---------------------------------
Dart: Strings can be "multiplied"
---------------------------------
See Dart String's multiply * operator docs:
Creates a new string by concatenating this string with itself a number of times.
The result of str * n is equivalent to str + str + ...(n times)... + str.
Returns an empty string if times is zero or negative.
I use this way.
void main() {
print(new List.filled(40, "-").join());
}
So, your case.
main() {
const String FILLER = "-";
String headerTxt;
String headerBox;
headerTxt = 'OpenPGP signing notes from key `CD42FF00`';
headerBox = new List.filled(headerTxt.length, FILLER).join();
print(headerBox);
print(headerTxt);
print(headerBox);
// ...
}
Output:
-----------------------------------------
OpenPGP signing notes from key `CD42FF00`
-----------------------------------------

ANTLR best way to include meta-data in lexing/parsing (custom objects, kind of annotation)

I plan to include text metadata (like bold, font-size, etc.) in the process of parsing to achieve better recognition.
For instance, I have a given structure, where a word on its own line word/r/n which is bold and sized 24px, is the title for some article. In order to get better recognition results, I want to take the characters as well as the metadata in account. In terms of ANTRL I'm not sure how this could be done best. I'd like to do something like:
Wrap each character of the original text into a custom object with fields for the metadata and pass that to ANTLR.
Preprocess the text and insert at specific places annotations for the metadata which is considered by the grammer.
I really like to take option 1. but I'm not sure which part from ANTLR I need to subclass etc. Do I have to start at the ANTLRInputStream-Object, in order to get a proper stream for a subclassed Lexer to get custom Tokens for a subclassed Parser etc. Is there a more elegant way, especially in querying the tokens while parsing with actions in a {} block ?
If anyone has some hints and/or experiences this would be great!
EDIT:
Here is a more specific simple example: I have a file wich includes the encoding of metadata which I parse forehand. the actual text including newline look like the following:
entryOne
Here is some content one.
entryTwo
Here is some content two.
Where the titlesentryOneand entryTwo are originally font-size of 24px and the content is font-size of 12px (as exemplary given values). Char by char I create a new instance of a custom object encapsulating the character as String and the font-size.
I initialize respective objects for each of the characters with fields of the font-size, e.g for the first letter of entryOne like
MyChar aTitelChar = new MyChar("e", 24);
For the content, like the second line Here is some content one. I create instances of MyChar like:
MyChar aContentChar= new MyChar("H", 12);
All characters of the texts are wrapped in instances of the below MyChar-Class and added to a List<MyChar> in order to produce a new input for ANTLR.
below is the Java Class for the characters:
public class MyChar {
private int fontSizePx;
private String text;
public MyChar(String text, int fontSizePx) {
this.text = text;
this.fontSizePx = fontSizePx;
}
public int getFontSizePx() {
return fontSizePx;
}
public String getText() {
return text;
}
}
I want that my grammar matches the above two entries (or more formatted this way) which in turn consist each of a title and a content which is terminated with a fullstop. This grammar could look like this:
rule: entry+ NEWLINE
;
entry:
title
content
;
title:
letters NEWLINE
;
content:
(letters)+ '.' NEWLINE
;
letters:
LETTERS
;
LETTERS:
('a'..'z' | 'A'..'Z')+
;
WS:
(' ' | '\t' | 'f' ) + {$channel = HIDDEN;};
NEWLINE:'\r'? '\n';
Now, for instance, what I want to do is to find out if it's really a title of an entry by checking the font-size of all letters encompassing the title-token before titel-rule returns. In case the input conforms to the grammar but is actually some kind of mistake (the original metadata-encoded file starts with something that conforms to the title-rule but its actually the content) the author of the grammar could sort that out if he knows that the original font-size for titles is 24 and check this. If one of the letter-tokens doesn't equal to font-size 24 throw an exception/don't return/do smthg. appropriate.
The thing I'm pondering on is where to plug in the List<MyChar> to provide this functionality (to query kinds of metadata while parsing in context of ANTLR). I'm experimenting with ANTLR's Classes but as I'm new to ANTLR I thought probably some of the experienced users can point me in the right direction, like where would be a good insertion points for custom objects? should I start by implenting CharStream and override some methods? Probably there is something which ANTLR provides which I haven't found yet?
Here's one way to accomplish what I think you're going for, using the parser to manage matching input to metadata. Note that I made whitespace significant because it's part of the content and can't be skipped. I also made periods part of content to simplify the example, rather than using them as a marker.
SysEx.g
grammar SysEx;
#header {
import java.util.List;
}
#parser::members {
private List<MyChar> metadata;
private int curpos;
private boolean isTitleInput(String input) {
return isFontSizeInput(input, 24);
}
private boolean isContentInput(String input){
return isFontSizeInput(input, 12);
}
private boolean isFontSizeInput(String input, int fontSize){
List<MyChar> sublist = metadata.subList(curpos, curpos + input.length());
System.out.println(String.format("Testing metadata for input=\%s, font-size=\%d", input, fontSize));
int start = curpos;
//move our metadata pointer forward.
skipInput(input);
for (int i = 0, count = input.length(); i < count; ++i){
MyChar chardata = sublist.get(i);
char c = input.charAt(i);
if (chardata.getText().charAt(0) != c){
//This character doesn't match the metadata (ERROR!)
System.out.println(String.format("Content mismatch at metadata position \%d: metadata=(\%s,\%d); input=\%c", start + i, chardata.getText(), chardata.getFontSizePx(), c));
return false;
} else if (chardata.getFontSizePx() != fontSize){
//The font is wrong.
System.out.println(String.format("Format mismatch at metadata position \%d: metadata=(\%s,\%d); input=\%c", start + i, chardata.getText(), chardata.getFontSizePx(), c));
return false;
}
}
//All characters check out.
return true;
}
private void skipInput(String str){
curpos += str.length();
System.out.println("\t\tMoving metadata pointer ahead by " + str.length() + " to " + curpos);
}
}
rule[List<MyChar> metadata]
#init {
this.metadata = metadata;
}
: entry+ EOF
;
entry
: title content
{System.out.println("Finished reading entry.");}
;
title
: line {isTitleInput($line.text)}? newline {System.out.println("Finished reading title " + $line.text);}
;
content
: line {isContentInput($line.text)}? newline {System.out.println("Finished reading content " + $line.text);}
;
newline
: (NEWLINE{skipInput($NEWLINE.text);})+
;
line returns [String text]
#init {
StringBuilder builder = new StringBuilder();
}
#after {
$text = builder.toString();
}
: (ANY{builder.append($ANY.text);})+
;
NEWLINE:'\r'? '\n';
ANY: .; //whitespace can't be skipped because it's content.
A title is a line that matches the title metadata (size 24 font) followed by one or more newline characters.
A content is a line that matches the content metadata (size 12 font) followed by one or more newline characters. As mentioned above, I removed the check for a period for simplification.
A line is a sequence of characters that does not include newline characters.
A validating semantic predicate (the {...}? after line) is used to validate that the line matches the metadata.
Here is the code I used to test the grammar (minus imports, for brevity):
SysExGrammar.java
public class SysExGrammar {
public static void main(String[] args) throws Exception {
//Create some metadata that matches our input.
List<MyChar> matchingMetadata = new ArrayList<MyChar>();
appendMetadata(matchingMetadata, "entryOne\r\n", 24);
appendMetadata(matchingMetadata, "Here is some content one.\r\n", 12);
appendMetadata(matchingMetadata, "entryTwo\r\n", 24);
appendMetadata(matchingMetadata, "Here is some content two.\r\n", 12);
parseInput(matchingMetadata);
System.out.println("Finished example #1");
//Create some metadata that doesn't match our input (negative test).
List<MyChar> mismatchingMetadata = new ArrayList<MyChar>();
appendMetadata(mismatchingMetadata, "entryOne\r\n", 24);
appendMetadata(mismatchingMetadata, "Here is some content one.\r\n", 12);
appendMetadata(mismatchingMetadata, "entryTwo\r\n", 12); //content font size!
appendMetadata(mismatchingMetadata, "Here is some content two.\r\n", 12);
parseInput(mismatchingMetadata);
System.out.println("Finished example #2");
}
private static void parseInput(List<MyChar> metadata) throws Exception {
//Test setup
InputStream resource = SysExGrammar.class.getResourceAsStream("SysExTest.txt");
CharStream input = new ANTLRInputStream(resource);
resource.close();
SysExLexer lexer = new SysExLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
SysExParser parser = new SysExParser(tokens);
parser.rule(metadata);
System.out.println("Parsing encountered " + parser.getNumberOfSyntaxErrors() + " syntax errors");
}
private static void appendMetadata(List<MyChar> metadata, String string,
int fontSize) {
for (int i = 0, count = string.length(); i < count; ++i){
metadata.add(new MyChar(string.charAt(i) + "", fontSize));
}
}
}
SysExTest.txt (note this uses Windows newlines (\r\n)
entryOne
Here is some content one.
entryTwo
Here is some content two.
Test output (trimmed; the second example has deliberately-mismatched metadata):
Parsing encountered 0 syntax errors
Finished example #1
Parsing encountered 2 syntax errors
Finished example #2
This solution requires that each MyChar corresponds to a character in the input (including newline characters, although you can remove that limitation if you like -- I would remove it if I didn't already have this answer written up ;) ).
As you can see, it's possible to tie the metadata to the parser and everything works as expected. I hope this helps.

How to avoid building intermediates and useless AST nodes with ANTLR3?

I wrote an ANTLR3 grammar subdivided into smaller rules to increase readability.
For example:
messageSequenceChart:
'msc' mscHead bmsc 'endmsc' end
;
# Where mscHead is a shortcut to :
mscHead:
mscName mscParameterDecl? timeOffset? end
mscInstInterface? mscGateInterface
;
I know the built-in ANTLR AST building feature allows the user to declare intermediate AST nodes that won't be in the final AST. But what if you build the AST by hand?
messageSequenceChart returns [msc::MessageSequenceChart* n = 0]:
'msc' mscHead bmsc'endmsc' end
{
$n = new msc::MessageSequenceChart(/* mscHead subrules accessors like $mscHead.mscName.n ? */
$bmsc.n);
}
;
mscHead:
mscName mscParameterDecl? timeOffset? end
;
The documentation does not talk about such a thing. So it looks like I will have to create nodes for every intermediate rules to be able to access their subrules result.
Does anyone know a better solution ?
Thank you.
You can solve this by letting your sub-rule(s) return multiple values and accessing only those you're interested in.
The following demo shows how to do it. Although it is not in C, I am confident that you'll be able to adjust it so that it fits your needs:
grammar Test;
parse
: sub EOF {System.out.printf("second=\%s\n", $sub.second);}
;
sub returns [String first, String second, String third]
: a=INT b=INT c=INT
{
$first = $a.text;
$second = $b.text;
$third = $c.text;
}
;
INT
: '0'..'9'+
;
SPACE
: ' ' {$channel=HIDDEN;}
;
And if your parse the input "12 34 56" with the generated parser, second=34 is printed to the console, as you can see after running:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
TestLexer lex = new TestLexer(new ANTLRStringStream("12 34 56"));
TokenStream tokens = new TokenRewriteStream(lex);
TestParser parser = new TestParser(tokens);
parser.parse();
}
}
So, a shortcut from the parse rule like $sub.INT, or $sub.$a to access one of the three INT tokens, in not possible, unfortunately.

Resources