Problems with smooks - edi

I am currently evaluating smooks (www.smooks.org). It looks just like what we need but I am having problems getting a simple example to work.
I've got an ant script which downloads me all the dependencies including the mapping and binding jars for EDIFACT messages.
I am trying to convert a simple EDIFACT APERAK message to Java using their EJC (I am using JavaSE for this little test).
The conversion fails with an exception that some block wasn't expected (see below). So I am wondering whether I am missing some configuration (notice the first few lines in the log output).
Has anyone worked with Smooks' EJC? How can I get more info about what line it is complaining?
So this is the code:
D00BInterchangeFactory factory = D00BInterchangeFactory.getInstance();
File file = new File("aperak.edi");
BufferedInputStream ediSource = new BufferedInputStream(new FileInputStream(file));
StreamResult xmlStream = new StreamResult();
StringWriter xmlWriter = new StringWriter();
xmlStream.setWriter(xmlWriter);
UNEdifactInterchange interchange = factory.fromUNEdifact(ediSource);
//System.err.println("MEssage "+xmlWriter.toString());
if(interchange instanceof UNEdifactInterchange41){
UNEdifactInterchange41 interchange41 = (UNEdifactInterchange41)interchange;
for(UNEdifactMessage41 message: interchange41.getMessages()){
Object messageObj = message.getMessage();
System.err.println("Ref Num "+message.getMessageHeader().getMessageRefNum());
if(messageObj instanceof Aperak){
Aperak aperak = (Aperak)message.getMessage();
System.err.println("Aperak "+aperak);
}
}
}
When I run it I get this exception
02-Nov-2011 15:58:09 org.milyn.delivery.ContentDeliveryConfigBuilder$ContentHandlerExtractionStrategy addCDU
WARNING: ContentHandlerFactory [org.milyn.delivery.JavaContentHandlerFactory] unable to create resource processing instance for resource [Target Profile: [[*]], Selector: [cdu-creator], Selector Namespace URI: [null], Resource: [org.milyn.smooks.scripting.groovy.GroovyContentHandlerFactory], Num Params: [1]]. org/codehaus/groovy/control/CompilationFailedException
02-Nov-2011 15:58:10 org.milyn.delivery.ContentDeliveryConfigBuilder$ContentHandlerExtractionStrategy addCDU
WARNING: ContentHandlerFactory [org.milyn.delivery.JavaContentHandlerFactory] unable to create resource processing instance for resource [Target Profile: [[*]], Selector: [cdu-creator], Selector Namespace URI: [null], Resource: [org.milyn.smooks.scripting.groovy.GroovyContentHandlerFactory], Num Params: [1]]. org/codehaus/groovy/control/CompilationFailedException
Exception in thread "main" org.milyn.SmooksException: Failed to filter source.
at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:86)
at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:61)
at org.milyn.Smooks._filter(Smooks.java:516)
at org.milyn.Smooks.filterSource(Smooks.java:475)
at org.milyn.Smooks.filterSource(Smooks.java:449)
at org.milyn.edi.unedifact.d00b.D00BInterchangeFactory.fromUNEdifact(D00BInterchangeFactory.java:58)
at org.milyn.edi.unedifact.d00b.D00BInterchangeFactory.fromUNEdifact(D00BInterchangeFactory.java:40)
at EDITestReader.readFile(EDITestReader.java:37)
at EDITestReader.main(EDITestReader.java:59)
Caused by: org.xml.sax.SAXException: Unknown/Unexpected UN/EDIFACT control block segment code '
UN'.
at org.milyn.edisax.unedifact.handlers.r41.UNEdifact41ControlBlockHandlerFactory.getControlBlockHandler(UNEdifact41ControlBlockHandlerFactory.java:53)
at org.milyn.edisax.unedifact.UNEdifactInterchangeParser.parse(UNEdifactInterchangeParser.java:95)
at org.milyn.smooks.edi.unedifact.UNEdifactReader.parse(UNEdifactReader.java:77)
at org.milyn.delivery.sax.SAXParser.parse(SAXParser.java:70)
at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:75)
... 8 more
enter code here
The actual EDIFACT message is fairly simple:
UNA:+.? '
UNB+UNOC:3+IMP+XXX+20110902:1024+44090560'
UNH+440905601+APERAK:D:00B:UN:IMP10'
BGM+313++9+RE'
RFF+ACW:XXXXXXXXX1109020'
DTM+182:201109021018:203'
RFF+BM:XXXXXXXXX'
RFF+AGO:XXXXXXX1109020'
RFF+EQ:XXXXXXXX'
NAD+MS+IMP'
CTA+MS+:EDI'
COM+XXXXXXXXXXX:TE'
COM+support#XXXXX.XX:EM'
ERC+200:IMP02:DAK'
FTX+AAO+++ERR4045?: Gest.datum ist mehr als 90 Tage kleiner als das Tagesdatum+DE'
UNT+14+440905601'
UNZ+1+44090560'
When I remove the leading UNA and UNB segment it comes up with this exception: ([APERAK][D:00B:UN]. Must be a minimum of 1 instances of segment [BGM]). There is a BGM segment so I am not sure why it is complaining.
Caused by: org.milyn.edisax.EDIParseException: EDI message processing failed [APERAK][D:00B:UN]. Must be a minimum of 1 instances of segment [BGM]. Currently at segment number 2.
at org.milyn.edisax.EDIParser.mapSegments(EDIParser.java:460)
at org.milyn.edisax.EDIParser.mapSegments(EDIParser.java:411)
at org.milyn.edisax.EDIParser.parse(EDIParser.java:387)
at org.milyn.edisax.EDIParser.parse(EDIParser.java:371)
at org.milyn.edisax.unedifact.handlers.r41.UNHHandler.process(UNHHandler.java:80)
at org.milyn.edisax.unedifact.UNEdifactInterchangeParser.parse(UNEdifactInterchangeParser.java:98)
at org.milyn.smooks.edi.unedifact.UNEdifactReader.parse(UNEdifactReader.java:77)
at org.milyn.delivery.sax.SAXParser.parse(SAXParser.java:70)
UPDATE:
When I remove the carriage returns from the message
UNH+440905601+APERAK:D:00B:UN:IMP10'BGM+313++9+RE'RFF+ACW:XXXXXXXXX1109020'DTM+182:201109021018:203'
it works fine. But how do I get smooks to accept carriage returns and whitespaces and the two leading UNA/UNB segments? I probably skipped some part of message processing smooks normally does.
UPDATE 2:
Figured out UNA/UNB segments are supported (my mistake) but I am still having problems with the carriage returns.
Renat suggested to use a 'ignoreNewLines' option on the EDIParser. I've tried that but it doesn't seem to make a difference. I've also tried to configure smooks with this:
<?xml version="1.0" encoding="UTF-8"?>
<smooks-resource-list
xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
xmlns:edi="http://www.milyn.org/xsd/smooks/edi-1.4.xsd">
<edi:reader mappingModel="/org/milyn/smooks/edi/xsd14/edi-to-xml-mapping.xml" ignoreNewLines="true" />
</smooks-resource-list>
Again with no success.
I have the feeling that the D00AInterchangeFactory (or what every version you use) configures its EDIParser differently and ignoreNewLines is ignored.
Is there a way to get the EDIParser the InterchangeFactory is using?

You need to add enable 'ignore new line' switch on the EDIParser.
You have multiple ways to do that, for example you can use a XMLReader#setFeature()
http://download.oracle.com/javase/1.5.0/docs/api/org/xml/sax/XMLReader.html#setFeature(java.lang.String,%20boolean)
or directly via EDIParser method call. See samples here
https://gist.github.com/825845
and here
https://gist.github.com/825843
Renat

Related

How to remove non-ascii char from MQ messages with ESQL

CONCLUSION:
For some reason the flow wouldn't let me convert the incoming message to a BLOB by changing the Message Domain property of the Input Node so I added a Reset Content Descriptor node before the Compute Node with the code from the accepted answer. On the line that parses the XML and creates the XMLNSC Child for the message I was getting a 'CHARACTER:Invalid wire format received' error so I took that line out and added another Reset Content Descriptor node after the Compute Node instead. Now it parses and replaces the Unicode characters with spaces. So now it doesn't crash.
Here is the code for the added Compute Node:
CREATE FUNCTION Main() RETURNS BOOLEAN
BEGIN
DECLARE NonPrintable BLOB X'0001020304050607080B0C0E0F101112131415161718191A1B1C1D1E1F7F808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9FA0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D7D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF1F2F3F4F5F6F7F8F9FAFBFCFDFEFF';
DECLARE Printable BLOB X'20202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020';
DECLARE Fixed BLOB TRANSLATE(InputRoot.BLOB.BLOB, NonPrintable, Printable);
SET OutputRoot = InputRoot;
SET OutputRoot.BLOB.BLOB = Fixed;
RETURN TRUE;
END;
UPDATE:
The message is being parsed as XML using XMLNSC. Thought that would cause a problem, but it does not appear to be.
Now I'm using PHP. I've created a node to plug into the legacy flow. Here's the relevant code:
class fixIncompetence {
function evaluate ($output_assembly,$input_assembly) {
$output_assembly->MRM = $input_assembly->MRM;
$output_assembly->MQMD = $input_assembly->MQMD;
$tmp = htmlentities($input_assembly->MRM->VALUE_TO_FIX, ENT_HTML5|ENT_SUBSTITUTE,'UTF-8');
if (!empty($tmp)) {
$output_assembly->MRM->VALUE_TO_FIX = $tmp;
}
// Ensure there are no null MRM fields. MessageBroker is strict.
foreach ($output_assembly->MRM as $key => $val) {
if (empty($val)) {
$output_assembly->MRM->$key = '';
}
}
}
}
Right now I'm getting a vague error about read only messages, but before that it wasn't working either.
Original Question:
For some reason I am unable to impress upon the senders of our MQ
messages that smart quotes, endashes, emdashes, and such crash our XML
parser.
I managed to make a working solution with SQL queries, but it wasted
too many resources. Here's the last thing I tried, but it didn't work
either:
CREATE FUNCTION CLEAN(IN STR CHAR) RETURNS CHAR BEGIN
SET STR = REPLACE('–',STR,'–');
SET STR = REPLACE('—',STR,'—');
SET STR = REPLACE('·',STR,'·');
SET STR = REPLACE('“',STR,'“');
SET STR = REPLACE('”',STR,'”');
SET STR = REPLACE('‘',STR,'&lsqo;');
SET STR = REPLACE('’',STR,'’');
SET STR = REPLACE('•',STR,'•');
SET STR = REPLACE('°',STR,'°');
RETURN STR;
END;
As you can see I'm not very good at this. I have tried reading about
various ESQL string functions without much success.
So in ESQL you can use the TRANSLATE function.
The following is a snippet I use to clean up a BLOB containing non-ASCII low hex values so that it then be cast into a usable character string.
You should be able to modify it to change your undesired characters into something more benign. Basically each hex value in NonPrintable gets translated into its positional equivalent in Printable, in this case always a full-stop i.e. x'2E' in ASCII. You'll need to make your BLOB's long enough to cover the desired range of hex values.
DECLARE NonPrintable BLOB X'000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F';
DECLARE Printable BLOB X'2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E';
SET WorkBlob = TRANSLATE(WorkBlob, NonPrintable, Printable);
BTW if messages with invalid characters only come in every now and then I'd probably specify BLOB on the input node and then use something similar to the following to invoke the XMLNSC parser.
CREATE LASTCHILD OF OutputRoot DOMAIN 'XMLNSC'
PARSE(InputRoot.BLOB.BLOB CCSID InputRoot.Properties.CodedCharSetId ENCODING InputRoot.Properties.Encoding);
With the exception terminal wired up you can then correct the BLOB's of any messages containing parser breaking invalid characters before attempting to reparse.
Finally my best wishes as I've had a number of battles over the years with being forced to correct invalid message content in the "Integration Layer" after all that's what it's meant to do.

Error with index global in Lua recipe for foldit.org

I am just starting writing scripts for Foldit. This is my first experience of Lua, so my status is "noob with insight". In my first script, I want to print out contactmap data. I have copied and pasted the functions from http://foldit.wikia.com/wiki/Foldit_Lua_Functions. The error I get is:
attempt to index global 'structure' (a nil value)
Here is my code:
segmentCount = structure.GetCount()
print ("Segment count: " .. segmentCount)
for source = 1, segmentCount do
for target = source + 1, segmentCount do
inContact = contactmap.IsContact(source, target)
heat = contactmap.GetHeat(source, target)
print("Segments "..source..", ".. target..": heat = "..heat)
end
end
Thanks in advance for any enlightenment.
EDIT
The problem was that I had originally loaded a V1 recipe, and foldit continued to treat it as V1 syntax. The solution was to create a New (ScriptV2) recipe, and load exactly the same code.
You seem to be doing everything according to the documentation. The error means that contactmap is not defined (has nil value) and the code fails on accessing its elements. I'd check the version of Foldit you are using as the documentation says that contactmap is new in version 2.

Groovy- searching and excretion xml code from log file

I have so many texts in log file but sometimes i got responses as a xml code and I have to cut this xml code and move to other files.
For example:
sThread1....dsadasdsadsadasdasdasdas.......dasdasdasdadasdasdasdadadsada
important xml code to cut and move to other file: <response><important> 1 </import...></response>
important xml code to other file: <response><important> 2 </important...></response>
sThread2....dsadasdsadsadasdasdasdas.......dasdasdasdadasdasdasdadadsada
Hindrance: xml code starting from difference numbers of sign (not always start in the same number of sign)
Please help me with finding method how to find xml code in text
Right now i tested substring() method but xml code not always start from this same sign :(
EDIT:
I found what I wanted, function which I searched was indexOf().
I needed a number of letter where String "Response is : " ending: so I used:
int positionOfXmlInLine = lineTxt.indexOf("<response")
And after this I can cut string to the end of the line :
def cuttedText = lineTxt.substring(positionOfXmlInLine);
So I have right now only a XML text/code from log file.
Next is a parsing XML value like BDKosher wrote under it.
Hoply that will help someone You guys
You might be able to leverage XmlSlurper for this, assuming your XML is valid enough. The code below will take each line of the log, wrap it in a root element, and parse it. Once parsed, it extracts and prints out the value of the <important> element's value attribute, but instead you could do whatever you need to do with the data:
def input = '''
sThread1..sdadassda..sdadasdsada....sdadasdas...
important code to cut and move to other file: **<response><important value="1"></important></response>**
important code to other file: ****<response><important value="3"></important></response>****
sThread2..dsadasd.s.da.das.d.as.das.d.as.da.sd.a.
'''
def parser = new XmlSlurper()
input.eachLine { line, lineNo ->
def output = parser.parseText("<wrapper>$line</wrapper>")
if (!output.response.isEmpty()) {
println "Line $lineNo is of importance ${output.response.important.#value.text()}"
}
}
This prints out:
Line 2 is of importance 1
Line 3 is of importance 3

Exception when parsing a message

I am using a message set to parse a file and I am getting the following exception.I am not able to understand what it actually means.Please help me understand and resolve the problem.
<ParserException>
<File>/build/S000_P/src/MTI/MTIforBroker/MtiImbParser2/MtiImbFIHandler.cpp</File>
<Line>1017</Line>
<Function>MtiImbFIHandler::endMessageContent</Function>
<Type></Type>
<Name></Name>
<Label></Label>
<Catalog>BIPmsgs</Catalog>
<Severity>3</Severity>
<Number>5288</Number>
<Text>MTI. Not all the buffer was used when reading message</Text>
<Insert>
<Type>2</Type>
<Text>0</Text>
</Insert>
<Insert>
<Type>2</Type>
<Text>1659</Text>
</Insert>
</ParserException>
This kind of error happens when the message set "ends" before the data.
For example:
we have got a msg set like this:
NAME - 5 CHAR
SURNAME - 5 CHAR
Data is:
MARIOROSSIAAAAA
Parsed will raise that kind of exception because the msg set don't expect the string "AAAAA" but only:
NAME = "MARIO"
SURNAME = "ROSSI"
Without the precise example is not possibile go any further.

How to validate the connection string comming from a text file

there is a way by which we parse any time input given by user...
here i am taking the input of a connection string from a text file
i am supposed to assume that the user who keeps this string in the text file can be trusted.
but what if he makes a typo a mistake unknowingly, that will cause the coming code to cause exception,
i would like a way to check the string for the correct format, like parse it some way to see if it is the way an connection tring should be, and then possibly use that parsed result.
edit
as requested the sample code i using, and the connection the sqladaptor using
//Connect to a remote instance of SQL Server.
Server srv;
ServerConnection conContext = new ServerConnection();
conContext.ServerInstance = #"A-63A9D4D7E7934\SECOND";
conContext.LoginSecure = false;
conContext.Login = "sa";
conContext.Password = "two";
srv = new Server(conContext);
Response.Write(srv.Information.Version);
//Data Source=A-63A9D4D7E7834\SECOND;Initial Catalog=replicate;User ID=sa;Password=***********
A regular expresion can't validate an incorrect password, user name, or host name, so it is possible for your user to generate a valid string that will still cause an exception.
Also, you can't build a regex that will cover every possible combination of connection arguments.
The only robust solution, would be to wrap the connection code in a try/catch block and provide an appropriate error message.

Resources