Is there a clear algorithm for sorting/ordering the loops in an X12 file?

Even though loops are kind of a logical concept in X12 (not directly physically represented in the text), every transaction set defines a set of loops that it can contain, including identifiers for the loops and an ordering for them. My question is, what is the rule for sorting loops, generically? Is there a concise set of rules that can be expressed in some code that should be able to take a collection of loops (with known identifiers such as 1000A, 2300BB, etc) and properly sort them?
The context of my question is that I'm working on a general-purpose library that applications will use to construct a model of an X12 document/transaction-set (and write out the text such a model represents). It has objects to represent Elements, Segments, and Loops. Ordering of Segments in a particular Loop is easy; it's dictated by the Implementation Guides. But I'm trying to get Loop ordering (within a Transaction Set) to work generically; that's what I'm asking about.
It seems that the general rule is that Loops are ordered based on their identifiers, using the numeric portion as the primary sort key and the alpha portion as the secondary sort key. Of course, child loops are nested after their parent and before the loop that follows the parent in that sort order (eg: 1000A, 2000A, 2010A, 2010B, 2100, 2300 - where 2010A and 2010B are children of 2000A).
I understand that the spec and Implementation Guides contain all of this info; I'm looking for the all-encompassing rule about loop ordering (not Segment ordering). Is there any concise way to express the rule algorithmically? Is there even a hard-and-fast rule at all?
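For what it's worth, the numeric-primary/alpha-secondary rule described above is easy to express in code. Here is a minimal Python sketch (the loop_sort_key name and the regex are mine, purely for illustration; it deliberately ignores the hierarchical-nesting caveat):

import re

def loop_sort_key(loop_id):
    # Split an identifier like '2010BB' into (numeric, alpha) sort keys.
    # Assumes identifiers are digits followed by an optional alpha suffix.
    m = re.match(r"^(\d+)([A-Z]*)$", loop_id)
    if not m:
        raise ValueError("unrecognized loop identifier: %r" % loop_id)
    return (int(m.group(1)), m.group(2))

loops = ["2300", "1000A", "2010B", "2000A", "2100", "2010A"]
print(sorted(loops, key=loop_sort_key))
# ['1000A', '2000A', '2010A', '2010B', '2100', '2300']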

As I mentioned in my comments, the standard has a loop value. Take a look at my screenshot of the Liaison Dictionary Viewer: the CLM segment has a LOOP value of 100, and the segments underneath are children of the CLM segment (extended tag). Any "order" can be defined arbitrarily by the partner, or can be in any (undefined) order provided the data is qualified. But that loop can occur at most 100 times and can have repeating segments inside it.
The implementation guide will give you the correct order your partner wants them in. It seems like you're writing your own syntax validation engine though.

What does it mean for a fact used in Isabelle to have a number after the name?

Most of the proofs suggested by Sledgehammer use this notation of a number inside parentheses:
by (smt (z3) ApplyAllResult.distinct(1)
ApplyResult.case(1)
ApplyResult.case(2)
ApplyResult.exhaust
applyInput.simps(1))
What does it mean for a fact to have such a number?
Isabelle permits the use of fact lists indexed by a natural number starting with 1. Given a fact list fs and an index i, you can access an individual fact from the list by using the syntax fs(i). You can also select multiple facts from the list using multiple indexes (e.g., fs(1,3)), ranges (e.g., fs(2-5), fs(3-)), or a combination of both (e.g., fs(2,4-6)).
Examples of predefined fact lists are assms (which contains the assumptions of a theorem) and f.simps (which contains the equations defining function f).

Is there a well-defined difference between "normalizing" and "canonicalizing" data?

I understand canonicalization and normalization to mean removing any non-meaningful or ambiguous parts of a data's presentation, turning effectively identical data into actually identical data.
For example, if you want to get the hash of some input data and it's important that anyone else hashing the canonically same data gets the same hash, you don't want one file indenting with tabs and the other using spaces (and no other difference) to cause two very different hashes.
In the case of JSON:
object properties would be placed in a standard order (perhaps alphabetically)
unnecessary white spaces would be stripped
indenting either standardized or stripped
the data may even be re-modeled in an entirely new syntax, to enforce the above
Is my definition correct, and the terms are interchangeable? Or is there a well-defined and specific difference between canonicalization and normalization of input data?
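(For concreteness, here is roughly what I mean for JSON, sketched in Python. json.dumps with sort_keys and tight separators is just one plausible canonicalization, not an official one; real schemes such as RFC 8785 also pin down number and string encoding.)

import hashlib
import json

def canonical_json(value):
    # Sort object keys and strip insignificant whitespace so that
    # equivalent documents serialize to identical byte strings.
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")

a = json.loads('{ "b": 1,  "a": 2 }')  # extra whitespace, different key order
b = json.loads('{"a": 2, "b": 1}')
assert hashlib.sha256(canonical_json(a)).digest() == hashlib.sha256(canonical_json(b)).digest()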
"Canonicalize" & "normalize" (from "canonical (form)" & "normal form") are two related general mathematical terms that also have particular uses in particular contexts per some exact meaning given there. It is reasonable to label a particular process by one of those terms when the general meaning applies.
Your characterizations of those specific uses are fuzzy. The formal meanings for general & particular cases are more useful.
Sometimes, given a bunch of things, we partition them (all) into (disjoint) groups, aka equivalence classes, of ones that we consider, in some particular sense, to be similar or the same, aka equivalent. The members of a group/class are the same/equivalent according to some particular equivalence relation.
We pick a particular member as the representative thing from each group/class & call it the canonical form for that group & its members. Two things are equivalent exactly when they are in the same equivalence class. Two things are equivalent exactly when their canonical forms are equal.
A normal form might be a canonical form or just one of several distinguished members.
To canonicalize/normalize is to find or use a canonical/normal form of a thing.
From Wikipedia's article on canonical form:
The distinction between "canonical" and "normal" forms varies by subfield. In most fields, a canonical form specifies a unique representation for every object, while a normal form simply specifies its form, without the requirement of uniqueness.
Applying the definition to your example: do you have a bunch of values that you are partitioning, and are you picking some member(s) from each class instead of the other members of that class? Well, you have JSON values, and short of re-modeling them you are partitioning them by what same-class member they map to under a function. So you can reasonably call the resulting JSON values canonical forms of the inputs. If you characterize re-modeling as applicable to all inputs, then you can also reasonably call the post-re-modeling form of those canonical values canonical forms of the re-modeled input values. But if not, then people probably won't complain if you call the re-modeled values canonical forms of the input values, even though technically they wouldn't be.
Consider a set of objects, each of which can have multiple representations. From your example, that would be the set of JSON objects, where each object has multiple valid representations, e.g., with different permutations of its members, different amounts of white space, etc.
Canonicalization is the process of converting any representation of a given object to one and only one representation, unique per object (a.k.a., its canonical form). To test whether two representations are of the same object, it suffices to test equality of their canonical forms; see also Wikipedia's definition.
Normalization is the process of converting any representation of a given object to a set of representations (a.k.a., "normal forms") that is unique per object. In such case, equality between two representations is achieved by "subtracting" their normal forms and comparing the result with a normal form of "zero" (typically a trivial comparison). Normalization may be a better option when canonical forms are difficult to implement consistently, e.g., because they depend on arbitrary choices (like ordering of variables).
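A toy illustration of that difference, in Python (a fractions example of my own): reducing to lowest terms with a positive denominator is a canonical form; the "subtract and compare with zero" test is the normalization-style check.

from math import gcd

def canonical(n, d):
    # Canonical form of n/d: lowest terms, positive denominator.
    g = gcd(n, d)
    n, d = n // g, d // g
    return (-n, -d) if d < 0 else (n, d)

# Canonical-form test: equal iff the representatives are identical.
assert canonical(2, 4) == canonical(-1, -2) == (1, 2)

# Normalization-style test: subtract, then compare with "zero".
def difference_is_zero(a, b):
    (n1, d1), (n2, d2) = a, b
    return n1 * d2 - n2 * d1 == 0

assert difference_is_zero((2, 4), (1, 2))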
Section 1.2 of the "A=B" book has some really good examples of both concepts.

Rails - Simplifying calculation models & objects

I have asked a few questions about this recently and I am getting where I need to go, but have perhaps not been specific enough in my last questions to get all the way there. So, I am trying to put together a structure for calculating some metrics based on app data, which should be flexible to allow additional metrics to be added easily (and securely), and also relatively simple to use in my views.
The overall goal is that I will be able to have a custom helper that allows something like the following in my view:
calculate_metric(@metrics.where(:name => 'profit'), @customer, @start_date, @end_date)
This should be fairly self-explanatory - the name can be substituted with any of the available metric names, and the calculation can be performed for any customer or group of customers, for any given time period.
Where the complexity arises is in how to store the formula for calculating the metric - I have shown below the current structure that I have put together for doing this:
You will note that the key models are metric, operation, operation_type and operand. This kind of structure works ok when the formula is very simple, like profit - one would only have two operands, @customer.sales.selling_price.sum and @customer.sales.cost_price.sum, with one operation of type subtraction. Since we don't need to store any intermediate values, register_target will be 1, as will return_register.
I don't think I need to write out a full example to show where it becomes more complicated, but suffice to say if I wanted to calculate the percentage of customers with email addresses for customers who opened accounts between two dates (but did not necessarily buy), this would become much more complex since the helper function would need to know how to handle the date variations.
As such, it seems like this structure is overly complicated, and would be hard to use for anything other than a simple formula - can anyone suggest a better way of approaching this problem?
EDIT: On the basis of the answer from Railsdog, I have made some slight changes to my model, and re-uploaded the diagram for clarity. Essentially, I have ensured that the reporting_category model can be used to hide intermediate operands from users, and that operands that may be used in user calculations can be presented in a categorised format. All I need now is for someone to assist me in modifying my structure to allow an operation to use either an actual operand or the result of a previous operation, in a Rails-esque way.
Thanks for all of your help so far!
Oy vey. It's been years (like 15) since I did something similar to what it seems like you are attempting. My app was used to model particulate deposition rates for industrial incinerators.
In the end, all the computations boiled down to two operands and an operator (order of operations, parentheticals, etc.). Operands were either constants, db values, or the result of another computation (a pointer to another computation). Any Operand (through model methods) could evaluate itself, whether that value was intrinsic or required a child computation to evaluate itself first.
The interface wasn't particularly elegant (that's the real challenge I think), but the users were scientists, and they understood the computation decomposition.
Thinking about your issue, I'd have any individual Metric able to return its value, and create the necessary methods to arrive at that answer. After all, a single metric just needs to know how to combine its two operands using the indicated operator. If an operand is itself a metric, you just ask it what its value is.
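A rough sketch of that composite idea, in Python rather than Rails (all names here are mine; in a real app these would be methods on the models): an operand evaluates to a constant, a stored value, or the result of another operation, recursively.

OPS = {"+": lambda a, b: a + b,
       "-": lambda a, b: a - b,
       "*": lambda a, b: a * b,
       "/": lambda a, b: a / b}

class Constant:
    def __init__(self, value):
        self.value = value
    def evaluate(self):
        return self.value

class Operation:
    # Two operands and an operator; an operand may itself be an Operation,
    # which gives "the result of a previous operation" for free.
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right
    def evaluate(self):
        return OPS[self.op](self.left.evaluate(), self.right.evaluate())

# profit = sales - costs; in real life each side would be a db lookup
profit = Operation("-", Constant(1000), Constant(700))
print(profit.evaluate())  # 300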

Is it acceptable to store the previous state as a global variable?

One of the biggest problems with designing a lexical analyzer/parser combination is overzealousness in designing the analyzer. (f)lex isn't designed to have parser logic, which can sometimes interfere with the design of mini-parsers (by means of yy_push_state(), yy_pop_state(), and yy_top_state()).
My goal is to parse a document of the form:
CODE1 this is the text that might appear for a 'CODE' entry
SUBCODE1 the CODE group will have several subcodes, which
may extend onto subsequent lines.
SUBCODE2 however, not all SUBCODEs span multiple lines
SUBCODE3 still, however, there are some SUBCODES that span
not only one or two lines, but any number of lines.
this makes it a challenge to use something like \r\n
as a record delimiter.
CODE2 Moreover, it's not guaranteed that a SUBCODE is the
only way to exit another SUBCODE's scope. There may
be CODE blocks that accomplish this.
In the end, I've decided that this section of the project is better left to the lexical analyzer, since I don't want to create a pattern that matches each line (and identifies continuation records). Part of the reason is that I want the lexical parser to have knowledge of the contents of each line, without incorporating its own tokenizing logic. That is to say, if I match ^SUBCODE[ ][ ].{71}\r\n (all records are blocked in 80-character records), I would not be able to harness the power of flex to tokenize the structured data residing in .{71}.
Given these constraints, I'm thinking about doing the following:
1. Entering a CODE1 state from the <INITIAL> start condition results in calls to:
yy_push_state(CODE_STATE)
yy_push_state(CODE_CODE1_STATE)
(do something with the contents of the CODE1 state identifier, if such contents exist)
yy_push_state(SUBCODE_STATE) (to tell the analyzer to expect SUBCODE states belonging to the CODE_CODE1_STATE - this is where the analyzer begins to masquerade as a parser)
2. The <SUBCODE1_STATE> start condition is nested as follows: <CODE_STATE>{ <CODE_CODE1_STATE>{ <SUBCODE_STATE>{ <SUBCODE1_STATE>{ (perform actions based on the matching patterns) } } } }. It also sets the global previous_state variable to yy_top_state(), to wit SUBCODE1_STATE.
3. Within <SUBCODE1_STATE>'s scope, \r\n will call yy_pop_state(). If a continuation record is present (which is a pattern at the highest scope against which all text is matched), yy_push_state(continuation_record_states[previous_state]) is called, bringing us back to the scope in 2. continuation_record_states[] maps each state to its continuation-record state, which is used by the parser.
As you can see, this is quite complicated, which leads me to conclude that I'm massively over-complicating the task.
Questions
For states lacking an extremely clear token signifying the end of their scope, is my proposed solution acceptable?
Given that I want to tokenize the input using flex, is there any way to do so without start conditions?
The biggest problem I'm having is that each record (beginning with the (SUB)CODE prefix) is unique, but the information appearing after the (SUB)CODE prefix is not. Therefore, it almost appears mandatory to have multiple states like this, and the abstract CODE_STATE and SUBCODE_STATE states would act as groupings for each of the concrete SUBCODE[0-9]+_STATE and CODE[0-9]+_STATE states.
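For comparison, here is a deliberately flex-free sketch in Python of the record grouping I'm describing (the CODE/SUBCODE prefixes come from my sample above; everything else is illustrative). It classifies each line and folds continuation lines into the preceding record, which is essentially the state tracking described in 2 and 3:

import re

HEADER = re.compile(r"^(CODE|SUBCODE)(\d+)\s+(.*)$")

def group_records(lines):
    # Each record becomes [kind, number, text]; lines without a
    # (SUB)CODE prefix are continuations of the previous record.
    records = []
    for line in lines:
        m = HEADER.match(line)
        if m:
            records.append([m.group(1), int(m.group(2)), m.group(3)])
        elif records:
            records[-1][2] += " " + line.strip()
        else:
            raise ValueError("continuation line before any record")
    return records

sample = [
    "CODE1 this is the text that might appear for a 'CODE' entry",
    "SUBCODE1 the CODE group will have several subcodes, which",
    "         may extend onto subsequent lines.",
]
print(group_records(sample))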
I would look at how the OMeta parser handles these things.

Time series modeling in F# -- seq vs array vs vector vs list vs generic list

If I want to make a time series type in F# to hold stock prices, which basic type should I use? We need to:
Select a subset based on time index,
Calculate basic statistics for a subset like mean, STD or for several subsets like correlations,
Append item for new data and fast update statistics or technical indicators,
Do linear regression between time series, etc
I have read that array has better performance, seq has a smaller memory footprint, list is better for adding items, and the F# vector is easier for certain math calculations. To balance all the trade-offs, how would you model a stock price time series in F#? Thanks.
As a concrete representation you can choose either array or list or some other .NET collection type. A sequence seq<'T> is an abstract type, and both array and list are automatically also sequences - this means that when you write some code that works with sequences, it will work with any concrete data type (array, list or any other .NET collection).
So, when writing data processing, you can use Seq by default (as it gives you great flexibility - it doesn't matter what concrete representation you use) and then optimize some operations to use the concrete representation (whatever that will be) if you need something to run faster.
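(The same idea expressed in Python terms, purely as an analogy: write against the abstract iterable and any concrete collection works.)

from typing import Iterable

def mean(xs: Iterable[float]) -> float:
    # Works for any concrete collection: list, tuple, generator...
    xs = list(xs)
    return sum(xs) / len(xs)

print(mean([1.0, 2.0, 3.0]))   # list
print(mean((1.0, 2.0, 3.0)))   # tuple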
Regarding the concrete representation - I think the crucial question is whether you want to add elements without changing original data structure (immutable list or array used in an immutable way) or whether you want to mutate the data structure (e.g. use some mutable .NET collection).
If you need to add new items frequently then you can either use an immutable list (which supports adding elements at the front) or a mutable collection (a plain array won't do, as it cannot be resized).
If you're working on a more sophisticated system, I would recommend taking a look at ObservableCollection<T> (see MSDN). This is a collection that automatically notifies you when it is changed. In response to the notification, you could update your statistics (it also tells you which elements were added, so you don't need to recalculate everything). However, F# doesn't have any libraries for working with this type, so you'll need to write a lot of things yourself.
If you're adding data only rarely, or adding them in larger groups, you could use an array (and allocate a new array each time you add items). If you have only a relatively small number of items in the collection, you could use lists (where adding an item is easy).
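On the "append item and fast update statistics" point, the incremental trick is language-neutral; here is a sketch of Welford's online algorithm in Python (straightforward to transliterate into an F# type), which updates mean and variance in O(1) per appended price:

class RunningStats:
    # Welford's online algorithm: O(1) mean/variance update per append.
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the mean

    def append(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for price in [100.0, 101.5, 99.25, 102.0]:
    stats.append(price)
print(stats.mean, stats.variance())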
For numerical calculations, the F# PowerPack (and types like vector) offers only a quite limited set of features, so you may need to look at some third-party libraries. Extreme Optimization is a commercial library with some F# examples, and Math.NET is an open-source alternative.
Otherwise, it is difficult to give any concrete advice - can you add some more details about your system? (e.g., how large the data set is, how many items need to be added and how often, etc.)
