Coding in third-party EDI interface - edi

First time posting here, so please bear with me. I'm working on a project with a third-party software provider whose product has a built-in EDI interface. They have installed some templates that parse the data that comes through in an .xml file and dump all the data into the correct tables and columns in our database.
These templates were installed about 4 years ago, well before my time, and everyone who was involved in the project at the time has since moved on, leaving no documentation on what programming language this is or how everything works together. It's a pretty terrible interface: there is no way to debug or see what the code is doing without sending the same dummy data over and over again until I can figure out what it's doing.
tl;dr: I have some code that I'm working on and have no resources or documentation on what language it is or how it works from start to finish. Can someone help me identify it, please? If I call the third party for help, they will charge my company $225/hour just to look at the code. They say it will take them 4 hours to look at it, and maybe a few more hours to fix it.
Example of code:
if (N702<>"" or N711 <> "" or N715<>"" or N722<>"") {
    edi_stop_note$comment_type <- "OC";
    edi_stop_note$comments <- concat("Trlr: ",N702," Desc: ",N711_desc," Len: ",N715," Type: ",N722);
    ADDNOTE("edi_stop","edi_stop_note");
}
Another example:
LineCnt <- "-1";
Stop_No <- "0";
Stop_Seq <- "0";
edi_order$shipper_stop_id <-edi_stop$id;
edi_order$version <- edi_version;
edi_order$gs_date_time <- date_time(gs04_date, gs05_time);
edi_order$gs06_group_cntlno <- gs06_incntlno;
edi_order$st02_trxset_cntlno <- st02_cntlno;
edi_order$partner_id <- gs02_partner;
edi_order$reply_created <- "N";
edi_order$isa13_intr_cntlno <- isa13_intr_cntlno;
edi_order$direction <- "I";
edi_order$alt_partner_id <- alt_partner_id;
TIME <- cur_time("4");
DATE <- cur_date("6") ;
B204 <- "";


Confirming existence of a string in an xml table Lua

Good afternoon everyone,
My problem is that I have 2 XML lists
<List1> <Agency>String</Agency> </List1>
In Lua I need to create a program that parses these lists. When the user inputs a string, the program needs to tell the user whether the string belongs to List 1 or List 2, or whether it does not exist in either. I'm new to Lua and to programming generally, and I would be very grateful for your answers. I have LuaExpat as a plugin, but I can't seem to read from a file; I can only do some beginner tricks if the XML list is written in the code. At a later time this small program will be fed by an RSS feed.
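local lxp = require "lxp"
local stuff = {}
xmldata="<Top><A/> <B a='1'/> <B a='2'/><B a='3'/><C a='3'/></Top>"
-- collect the attributes of every <B> element
function doFunc(parser, name, attr)
  if not (name == 'B') then return end
  stuff[#stuff+1]= attr
end
local xml = lxp.new{StartElement = doFunc}
xml:parse(xmldata)
xml:close()
print(#stuff)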
This code comes from a tutorial on the web and it works fine: it prints 3. Now I want to know how to do the same from an actual file. If I open the file with "r" or "rb" and put that in the xmldata variable, running the same code returns either empty space or nil.
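One likely cause is that the parser is being handed a file handle rather than the file's contents. In Lua that would mean reading the opened file (for example with `handle:read("*a")`) before calling `parse`. The same idea is sketched below in Python, whose `xml.parsers.expat` module wraps the same Expat library. The function name `collect_attrs_from_file` and the temporary file are illustrative assumptions, not part of the question.

```python
import os
import tempfile
import xml.parsers.expat


def collect_attrs_from_file(path, wanted="B"):
    """Parse the XML file at `path` and collect attributes of `wanted` elements."""
    found = []

    def start_element(name, attrs):
        # Called by Expat for every opening tag.
        if name == wanted:
            found.append(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    # Open in binary mode and let Expat read the contents itself.
    with open(path, "rb") as handle:
        parser.ParseFile(handle)
    return found


# Usage: write the tutorial's XML to a temporary file, then parse it from disk.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as tmp:
    tmp.write("<Top><A/> <B a='1'/> <B a='2'/><B a='3'/><C a='3'/></Top>")
print(len(collect_attrs_from_file(tmp.name)))
os.remove(tmp.name)
```

The point of the sketch is only that the parser must receive the bytes of the file, either by reading them first or by using a call that reads from the handle.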

What do I need to add to use monadUserState with alex when parsing?

I am trying to write a program that will understand a language where embedded comments are allowed. Such as:
/* Here's a comment
/* This comment is further embedded */ second comment is closed
Must close first comment */
This should be recognized as a comment (and as such not stop at the first */ it sees unless it has only seen 1 comment opening prior).
This would be an easy issue to fix in C: I could simply have a counter that is incremented when it sees a comment open and decremented when it sees a comment close. If the counter is at 0, we're in the "code section".
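The counter idea can be sketched as follows. This is a minimal illustration in Python rather than C or Haskell, and the function name `strip_nested_comments` is an assumption made for the example.

```python
def strip_nested_comments(source):
    """Return `source` with (possibly nested) /* ... */ comments removed."""
    out = []
    depth = 0  # number of comments currently open
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            # Only keep characters while no comment is open.
            if depth == 0:
                out.append(source[i])
            i += 1
    if depth != 0:
        raise ValueError("unterminated comment")
    return "".join(out)


print(repr(strip_nested_comments("a /* x /* y */ z */ b")))
```

In a lexer the same depth value would live in the user state, with one action incrementing it, one decrementing it, and a switch back to the normal start code when it reaches 0.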
However, without having state in Haskell, it's a little more challenging.
I've read up on monadUserState, which supposedly allows you to keep track of state for exactly this type of parsing. However, I can't find much reading material on it aside from the tutorial page on alex.
When I try to compile it gives the error
templates\wrappers.hs:213:16: Not in scope: `alexEOF`
It should be noted that I directly changed from the "basic" wrapper to the "monadUserState" without changing my code (I don't know what to add in order to use it). It says that this must be initialized in the user code:
data AlexState = AlexState {
        alex_pos :: !AlexPosn,    -- position at current input location
        alex_inp :: String,       -- the current input
        alex_chr :: !Char,        -- the character before the input
        alex_bytes :: [Byte],     -- rest of the bytes for the current char
        alex_scd :: !Int,         -- the current startcode
        alex_ust :: AlexUserState -- AlexUserState will be defined in the user program
    }
I'm a bit of a lexing noob and I'm not at all sure what I should be adding here to make it at least compile... then I can worry about the logic of the thing.
Update: Working example available here:
The file "tiger.x" (link) in the alex github repo contains an example of how to track embedded comments using the monadUserState wrapper.
Well, unfortunately that example doesn't compile but the ideas there should work.
Basically, these lines perform embedded comment processing:
<0> "/*" { enterNewComment `andBegin` state_comment }
<state_comment> "/*" { embedComment }
<state_comment> "*/" { unembedComment }
<state_comment> . ;
<state_comment> \n { skip }
As for alexEOF, the idea is to add an EOF token to your token data type:
data Tokens = ... | EOF
and define alexEOF as:
alexEOF = return EOF
See the file tests/tokens_monadUserState_bytestring.x in the alex repo for an example of this.

Where can I find a large tabbed hierarchical data set for parser testing?

First, apologies as I realize this is only tangentially related to parser programming.
I've spent hours looking for a text file containing something like the following but with hundreds (hopefully thousands) of sub-entries. A complete biological classification file would be perfect. A massive version of the following would be great, as my parser parses simple tabbed files:
TL,DR - I need a massive single-file hierarchical data set something like the following:
The best I've been able to find are tree-of-life images (from which I transcribed the sample data set above). A single file with a TON of real data would be awesome. It doesn't have to be a biological classification data set, but I would really like the data to reflect something in the real-world. (My parser feeds a menu - would be great if the remainder of my testing was with a data set that actually meant something!) Even if the file is not tabbed but the data was fairly easily regex'ed to a tabbed format... that would be great.
Any ideas? Thanks!
It is possible that the xml layout was changed since the last answer but the code submitted above is no longer accurate. The resulting dump is extraneous. Some of the nodes have aliases (denoted as 'othername') that are reported as distinct nodes themselves.
I used the script below to generate the correct dump.
$reader = new XMLReader();
$reader->open(''); //15963 is the primates index
$set = -1; // -1 means we are not inside an OTHERNAMES block (flag values assumed)
while ($reader->read()) {
    switch ($reader->nodeType) {
        case XMLReader::ELEMENT:
            if ($reader->name == "OTHERNAMES"){ $set = 1; }  // aliases follow: skip their names
            if ($reader->name == "NODES"){ $set = -1; }
            if ($reader->name == "NODE"){ $set = -1; }
            if ($reader->name == "NAME" AND $set == -1){
                echo str_repeat("\t", $reader->depth - 2); //repeat tabs for depth
                $node = $reader->expand();
                echo $node->textContent . "\n";
            }
            break;
    }
}
This turned out to be such a pain in the ass. I finally tracked down a data feed from "The Tree of Life Web Project". I made the PHP script below to provide the basic functionality my post was looking for.
Change the node_id to have it print a tabbed representation of any of the site's data: just take the id from the page you're browsing on their site and change the node_id below.
Be aware though - their data feeds serve up large files, so definitely download the file to your own server (and change the "open" method below to point to the local file) if you're going to hit it more than once or twice.
More info on data feeds can be found here:
$reader = new XMLReader();
$reader->open(''); //15963 is the primates index
while ($reader->read()) {
    switch ($reader->nodeType) {
        case XMLReader::ELEMENT:
            if ($reader->name == "NAME"){
                echo str_repeat("\t", $reader->depth - 2); //repeat tabs for depth
                $node = $reader->expand();
                echo $node->textContent . "\n";
            }
            break;
    }
}
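For comparison, here is a rough Python sketch of the same nested-XML-to-tabbed-text conversion. The `SAMPLE` document is a made-up stand-in whose element names (NODE, NAME, NODES, OTHERNAMES) follow the scripts above; the real feed's layout may differ. Because only the NAME that is a direct child of each NODE is read, alias names under OTHERNAMES are not printed as separate nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: layout assumed from the element names used above.
SAMPLE = (
    "<TREE><NODE><NAME>Primates</NAME>"
    "<OTHERNAMES><OTHERNAME><NAME>Primata</NAME></OTHERNAME></OTHERNAMES>"
    "<NODES>"
    "<NODE><NAME>Strepsirrhini</NAME></NODE>"
    "<NODE><NAME>Haplorhini</NAME>"
    "<NODES><NODE><NAME>Tarsiiformes</NAME></NODE></NODES>"
    "</NODE>"
    "</NODES></NODE></TREE>"
)


def tabbed_lines(node, depth=0):
    """Return one tab-indented line per NODE, depth-first."""
    # findtext only looks at direct children, so alias names are skipped.
    lines = ["\t" * depth + node.findtext("NAME", default="")]
    children = node.find("NODES")
    if children is not None:
        for child in children.findall("NODE"):
            lines.extend(tabbed_lines(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE)
lines = tabbed_lines(root.find("NODE"))
print("\n".join(lines))
```

This loads the whole document into memory, so for the large feeds mentioned above a streaming reader like the XMLReader approach is the safer choice.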

Parse arbitrary user input

I have a database full of messages from a bulletin board. The board uses BB codes as formatting style. I.e.:
I'm not formatted
This is [b]bold[/b] text
Tags can also [i][b]be[/b] nested[/i]
And the [b]nesting [i]can be[/b] rather[/i] ugly
My ultimate goal is to convert these messages to well-formed XML (no discussion here ;) ). I don't want to use regular expressions, which will fail at some point (in fact, they already do).
First step: parse a message into some kind of internal representation (a graph, a tree, etc.). And I'm stuck at this point. The actual extraction is not that big a problem, but the storage is.
How do I represent this kind of markup in some meaningful structure? My problem seems to be similar (or almost identical) to a browser building a DOM from an HTML file, so I think there are some strategies to solve it. I know the solution will not be perfect, but I'm willing to invest a vast amount of time to build the best one possible.
Question: Do you have any tips, hints or comments? Any articles or papers you can recommend? Or a book which discusses these topics? I'm grateful for any input.
And the [b]nesting [i]can be[/b] rather[/i] ugly
I've written a parser very similar to what you are looking to do except that it would throw an error on your fourth example. Something to the effect of "Unexpected end tag [/b] within [i]".
I think that what you want to do is very doable but internally you will want to create a tree as if your original text was:
"And the [b]nesting [i]can be[/i][/b][i] rather[/i] ugly". (I don't think this would be necessary if you didn't need to convert it to XML later. If there were no need to convert to XML you could keep a linked list of text sections where each section is marked with its format combination)
Two possible approaches to this problem come to mind (of course there could be better possibilities). 1) Preprocess and insert the missing end and begin tags where necessary. 2) Build your parse tree and where there are overlapping tags imply the missing ones based on the current context. I think approach number (2) would be simpler and cleaner.
You could model your tree based on a composite pattern where you have an AbstractElement class, a TextElement class that extends AbstractElement, and a Tag class that extends AbstractElement and contains a list of sub-elements of type AbstractElement.
You would start by creating a root Tag instance. You would then call rootTag.parse(text). You would need a scanner that could return 3 types of tokens: text, start-tags, and end-tags. The scanner would allow you to push tokens onto it, which it would return before any normal scanned token. This would allow you to push new start tag tokens on after encountering and dealing with the unexpected end tag. You would also have to know when you are done with input. I'll use a 4th token type for that.
/* methods within class Tag */
public void parse(String text) {
    MyScanner scanner = new MyScanner(text);
    parse(scanner);
}
/* returns next token */
private Token parse(MyScanner scanner) {
    Token firstToken = scanner.getNextToken();
    return parse(scanner, firstToken);
}
private Token parse(MyScanner scanner, Token token) {
    while (!token.isDone() && !token.isEndTag()) {
        if (token.isStartTag()) {
            Tag subTag = new Tag(token.getValue());
            subElements.add(subTag); /* subElements: this Tag's list of children (field name assumed) */
            token = scanner.getNextToken();
            token = subTag.parse(scanner, token);
        }
        else {
            TextElement text = new TextElement(token.getValue());
            subElements.add(text);
            token = scanner.getNextToken();
        }
    }
    if (token.isEndTag()) {
        if (!token.getValue().equals(getName())) {
            /* unexpected end tag: push a start tag for this tag so it is re-opened later */
            scanner.push(new Token(Token.START_TAG, getName()));
        }
        else {
            token = scanner.getNextToken();
        }
    }
    return token;
}
So if you were to parse "And the [b]nesting [i]can be[/b] rather[/i] ugly", the following should get created.
rootTag.parse should be adding:
    TextElement: "And the "
    Tag: "b"
        TextElement: "nesting "
        Tag: "i"
            TextElement: "can be"
            (... at this point the odd [/b] is encountered ...)
            (... push "i" start tag on the scanner ...)
        (... here the [/b] is encountered (again) ...)
    Tag: "i" (this was scanned because it had been pushed to the scanner)
        TextElement: " rather"
    TextElement: " ugly"
Note: Coding within a text area does not lend itself well to testing and debugging. Accept this answer as a hint or a possibility, not as your definite answer.
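As a runnable illustration of the repair idea, here is a short Python sketch. It swaps in a simpler technique than the scanner push-back described above: it keeps a stack of open tags, and on an out-of-order end tag it closes the inner tags, closes the requested tag, then re-opens the inner ones. The name `bbcode_to_xml` and the restriction to simple lowercase tags such as [b] and [i] are assumptions for the example. A regular expression is used only to find tag tokens, not to do the nesting.

```python
import re
from xml.sax.saxutils import escape

# Assumption: tags are simple lowercase names without attributes.
TOKEN = re.compile(r"\[(/?)([a-z]+)\]")


def bbcode_to_xml(text):
    """Convert BB code to a well-nested XML fragment, repairing overlaps."""
    out = []
    open_tags = []  # stack of currently open tag names
    pos = 0
    for match in TOKEN.finditer(text):
        out.append(escape(text[pos:match.start()]))
        pos = match.end()
        closing, name = match.group(1), match.group(2)
        if not closing:
            open_tags.append(name)
            out.append(f"<{name}>")
        elif name in open_tags:
            reopened = []
            # Close inner tags that are still open, remembering them.
            while open_tags[-1] != name:
                inner = open_tags.pop()
                out.append(f"</{inner}>")
                reopened.append(inner)
            open_tags.pop()
            out.append(f"</{name}>")
            # Re-open the inner tags in their original order.
            for inner in reversed(reopened):
                open_tags.append(inner)
                out.append(f"<{inner}>")
        else:
            # Stray end tag with no matching start: keep it as literal text.
            out.append(escape(match.group(0)))
    out.append(escape(text[pos:]))
    while open_tags:
        out.append(f"</{open_tags.pop()}>")
    return "".join(out)


print(bbcode_to_xml("And the [b]nesting [i]can be[/b] rather[/i] ugly"))
```

For the overlapping example this yields the same nesting as the rewritten text suggested earlier, with the [i] closed before [/b] and re-opened after it.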

DBF Large Char Field

I have a database file that I believe was created with Clipper but can't say for sure (I have .ntx files for indexes, which I understand is what Clipper uses). I am trying to create a C# application that will read this database using the System.Data.OleDb namespace.
For the most part I can successfully read the contents of the tables, but there is one field that I cannot. This field is called CTRLNUMS and is defined as a CHAR(750). I have read various articles found through Google searches that suggest fields larger than 255 chars have to be read through a different process than the normal assignment to a string variable. So far I have not been successful with any approach that I have found.
The following is a sample code snippet I am using to read the table, and it includes two options I used to read the CTRLNUMS field. Both options resulted in 238 characters being returned even though there are 750 characters stored in the field.
Here is my connection string:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\datadir;Extended Properties=DBASE IV;
Can anyone tell me the secret to reading larger fields from a DBF file?
using (OleDbConnection conn = new OleDbConnection(connectionString))
using (OleDbCommand cmd = new OleDbCommand())
{
    conn.Open();
    cmd.Connection = conn;
    cmd.CommandType = CommandType.Text;
    cmd.CommandText = string.Format("SELECT ITEM,CTRLNUMS FROM STUFF WHERE ITEM = '{0}'", stuffId);
    using (OleDbDataReader dr = cmd.ExecuteReader())
    {
        if (dr.Read())
        {
            stuff.StuffId = dr["ITEM"].ToString();
            // Option 1: read the field as a string
            string ctrlNums = dr["CTRLNUMS"].ToString();
            // Option 2: read the field in chunks with GetChars
            char[] buffer = new char[750];
            int index = 0;
            int readSize = 5;
            while (index < 750)
            {
                long charsRead = dr.GetChars(dr.GetOrdinal("CTRLNUMS"), index, buffer, index, readSize);
                index += (int)charsRead;
                if (charsRead < readSize)
                    break; // fewer characters than requested: end of data
            }
        }
    }
}
You can find a description of the DBF structure here:
What I think Clipper used to do was modify the Field structure so that, in Character fields, the Decimal Places held the high-order byte of the size, so Character field sizes were really 256*Decimals+Size.
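A small Python sketch of that calculation, assuming the usual dBASE layout (a 32-byte file header followed by 32-byte field descriptors, with the field length at byte 16 and the decimal count at byte 17 of each descriptor, terminated by 0x0D). The function name `read_field_descriptors` and the hand-built header are illustrative only. Note that 750 = 256*2 + 238, which would be consistent with the 238 characters reported in the question.

```python
def read_field_descriptors(header_bytes):
    """Return (name, type, size, decimals) for each field in a DBF header."""
    fields = []
    offset = 32  # field descriptors start after the 32-byte file header
    while offset + 32 <= len(header_bytes) and header_bytes[offset] != 0x0D:
        raw = header_bytes[offset:offset + 32]
        name = raw[:11].split(b"\x00", 1)[0].decode("ascii")
        ftype = chr(raw[11])
        size, decimals = raw[16], raw[17]
        if ftype == "C":
            # Clipper convention described above: the decimal count holds
            # the high-order byte of a character field's length.
            size = 256 * decimals + size
            decimals = 0
        fields.append((name, ftype, size, decimals))
        offset += 32
    return fields


# Hand-built header with one CHAR(750) field: length byte 238, decimals 2.
descriptor = b"CTRLNUMS".ljust(11, b"\x00") + b"C" + bytes(4) + bytes([238, 2]) + bytes(14)
header = bytes(32) + descriptor + b"\x0d"
print(read_field_descriptors(header))
```

A reader that ignores the decimal count for character fields would see a length of 238 here, which is why a native parser needs this adjustment.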
I may have a C# class that reads dbfs (natively, not ADO/DAO), it could be modified to handle this case. Let me know if you're interested.
Are you still looking for an answer? Is this a one-off job or something that needs doing regularly?
I have a Python module that is primarily intended to extract data from all kinds of DBF files ... it doesn't yet handle the length_high_byte = decimal_places hack, but it's a trivial change. I'd be quite happy to (a) share this with you and/or (b) get a copy of such a DBF file for testing.
Added later: Extended-length feature added, and tested against files I've created myself. Offer to share code with anyone who would like to test it still stands. Still interested in getting some "real" files myself for testing.
3 suggestions that might be worth a shot...
1 - use Access to create a linked table to the DBF file, then use .Net to hit the table in the access database instead of going direct to the DBF.
2 - try the FoxPro OLEDB provider
3 - parse the DBF file by hand. Example is here.
My guess is that #1 should work the easiest, and #3 will give you the opportunity to fine tune your cussing skills. :)
