Is there an EDI Segment that can contain more than 256 characters? - edi

Is there an EDI x12 segment that has no character limit? We often use the MSG segment for open text fields but this is capped at 256 characters, so we’re looking for an alternative that can handle 500+ characters.

The short answer
The MTX Text segment allows you to send messages up to 4096 characters long, which is the longest available in X12. You can't just swap out an MSG segment for an MTX segment, though. You can only use MTX if it's included in the transaction set, and that depends on which X12 'release' (version) you're using.
For the 005010 release (one of the more popular ones), here are the transaction sets that MTX appears in:
105 Business Entity Filings
113 Election Campaign and Lobbyist Reporting
150 Tax Rate Notification
155 Business Credit Report
179 Environmental Compliance Reporting
194 Grant or Assistance Application
251 Pricing Support
274 Healthcare Provider Information
284 Commercial Vehicle Safety Reports
500 Medical Event Reporting
620 Excavation Communication
625 Well Information
650 Maintenance Service Order
805 Contract Pricing Proposal
806 Project Schedule Reporting
814 General Request, Response or Confirmation
832 Price/Sales Catalog
836 Procurement Notices
840 Request for Quotation
843 Response to Request for Quotation
850 Purchase Order
855 Purchase Order Acknowledgment
860 Purchase Order Change Request - Buyer Initiated
865 Purchase Order Change Acknowledgment/Request - Seller Initiated
Some additional clarification
Technically, character limits don't apply to X12 segments – what you're referring to is an X12 element. A segment is just a container for elements, and the element you're referring to is the element referenced in MSG01 (the first element of the MSG segment).
Each X12 element references an ID number. For each element, the ID number points to a dictionary that specifies the name, description, type, minimum length, and maximum length. In the case of MSG01, it points to data element [933][1].
Data element 933, the one you're currently using, actually has a character limit of 264 characters (more than 256 characters, but not by much). Note: the link above is to the 005010 X12 release, but I checked back to 003010 and up to 008030 and it seems to be 264 characters all the way through.
Now, back to your original question: is there a data element that allows for a larger character payload?
The answer is that there are 8 data elements that accept a payload larger than 264 characters.
Two of them are binary data types, which we can likely eliminate off the bat:
785. Binary Data. A string of octets which can assume any binary pattern from hexadecimal 00 to FF. Note: The maximum length is dependent upon the maximum data value that can be entered in DE 784, which value is 999,999,999,999,999. Max characters: 999999999999999.
1700. Transformed Data. Binary or filtered data having one or more security policy options applied; transformed data may represent compressed, encrypted, or compressed and encrypted plaintext. Max characters: 10000000000000000.
The rest are strings, which is promising:
364. Communication Number. Complete communications number including country or area code when applicable. Max characters: 2048.
1565. Look-up Value. Value used to identify a certificate containing a public key. Max characters: 4096.
1566. Keying Material. Additional material required for decrypting the one-time key. Max characters: 512.
1567. One-time Encryption Key. Hexadecimally filtered encrypted one-time key. Max characters: 512.
1573. Encoded Security Value. Encoded representation of the Security Value specified by the Security Value Qualifier. Max characters: 10000000000000000.
And, last but not least:
1551. Textual Data. To transmit large volumes of message text. Max characters: 4096.
Looks like a winner!
Note that element 1551 appears in only one segment: MTX, which was introduced in the 003060 X12 release. And in the initial 003060 release, it was only included in one X12 Transaction Set: 194 Grant or Assistance Application (which makes sense – a longer field was needed for grant applications).
It seems that as new releases were developed, the MTX segment made its way into more and more transaction sets – likely for exactly the reason you're asking. In 003070, it was included in 5 transaction sets; in 004010, 15; in 005010, 24, and so on.
The MTX segment uses element 1551 in both MTX02 and MTX03, so you can get double the length by using both of them. Note that there's a 'relational condition': If MTX-03 is present, then MTX-02 is required (in other words, you can't use MTX03 if you don't use MTX02 first).
Depending on the transaction set, the MTX segment may also be repeatable.
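As a rough illustration of filling MTX02 and then MTX03, here is a minimal Python sketch. It rests on several assumptions that are not part of the answer above: `*` as the element separator and `~` as the segment terminator (both are set per interchange), MTX01 left empty, text that contains neither delimiter, and a hypothetical helper name `build_mtx_segments`. Whether the segment may repeat still depends on your transaction set.

```python
MAX_1551 = 4096  # maximum length of data element 1551 (Textual Data)

def build_mtx_segments(text, elem_sep="*", seg_term="~"):
    """Split text into MTX segments, filling MTX02 first and then MTX03."""
    if elem_sep in text or seg_term in text:
        raise ValueError("text contains a delimiter character")
    # Cut the text into pieces that each fit in one 1551 element.
    chunks = [text[i:i + MAX_1551] for i in range(0, len(text), MAX_1551)]
    segments = []
    # Each segment carries up to two pieces: MTX02, then optionally MTX03.
    # MTX03 is only ever emitted after MTX02, which satisfies the
    # relational condition mentioned above.
    for i in range(0, len(chunks), 2):
        elements = ["MTX", ""] + chunks[i:i + 2]  # "" is the unused MTX01
        segments.append(elem_sep.join(elements) + seg_term)
    return segments
```

A 5000-character note would come out as a single segment with 4096 characters in MTX02 and the remaining 904 in MTX03.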
Long story short: if the MTX segment is in the transaction set / release you're using, you're likely in luck.
Hope this helps.

Use multiple MSG segments and split the text so that each piece fits within the maximum allowed length. Free-text segments are usually defined with a repeat count greater than 1, so you should be okay. That's how everybody does it.
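A minimal Python sketch of that approach, splitting on word boundaries with the standard `textwrap` module so words are not cut in half. The 264-character limit is the one quoted for data element 933 in the other answer; the `*` and `~` delimiters and the helper name `build_msg_segments` are assumptions for illustration, and the text is assumed not to contain either delimiter.

```python
import textwrap

MAX_MSG01 = 264  # maximum length of data element 933 (MSG01)

def build_msg_segments(text, elem_sep="*", seg_term="~"):
    """Wrap text on word boundaries into one MSG segment per piece."""
    # textwrap.wrap never returns a piece longer than `width`; a single
    # word longer than the limit is broken across pieces.
    pieces = textwrap.wrap(text, width=MAX_MSG01)
    return [f"MSG{elem_sep}{piece}{seg_term}" for piece in pieces]
```

The receiver has to join the pieces again (with a space between them), and the number of segments you can emit is bounded by the repeat count in your implementation guide.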

Related

How many values can be stored per physical address in Memory?

I've read that you can only store one value per physical address in RAM. This value could be an instruction or data. Is this because, when the CPU reads a word from RAM, it can only deal with one value at a time, be that an instruction, an int, or a string? Is there a technical reason you can't fit more than one value per index? I've read about scalar processors, but aren't they really old? Couldn't you fit two or more values in the width of a 64-bit word, for example? Or am I missing something really obvious here? I guess I'm asking whether this is a programming concept or whether there is an actual technical/hardware reason the CPU can't deal with more than one value per read of a word from RAM.
Thanks
Rob
Most recent computers use addresses that point to a "Byte" location in memory.
Each machine instruction that includes "load (or store) from memory" functionality includes either an implicit or explicit specification of the number of bytes to be loaded/stored, starting at the target byte address. Common sizes are 1, 2, 4, 8 Bytes (corresponding to single data items of the most commonly supported sizes).
It is up to the application program to decide how to interpret the bytes and what operations to perform on them. It is certainly common to store the characters of a string in consecutive byte memory locations and process 4 or 8 characters at a time using 32-bit (4-Byte) or 64-bit (8-Byte) load and store instructions. Operations on the individual bytes (characters) may involve masking, shifting, and copying within the processor's general-purpose registers, but since the late 1990s, many/most microprocessors have included instructions specifically designed to treat the contents of a register as multiple independent (smaller) values.
"Packing" multiple data items into consecutive bytes of memory need not be limited to the sizes of registers for supported arithmetic types (1, 2, 4, 8 Bytes). Since about 2000, many processors have also included "Single Instruction Multiple Data" (SIMD) instructions to load bigger payloads into a set of "SIMD registers". (Common sizes are 16 and 32 Bytes, but some processors support 64 Byte registers.) Systems that include these SIMD load and store instructions typically also include instructions to operate on the SIMD registers "in parallel" -- treating the register contents as multiple independent values. It is common to provide instructions to treat the contents of a 256-bit (32-Byte) register as 32 1-Byte values, 16 2-Byte values, 8 4-Byte values, or 4 8-Byte values. The details vary by processor architecture and generation.
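To make the "same bytes, different interpretation" point concrete, here is a small Python sketch using the standard `struct` module. It only models the idea in software: the byte values and the little-endian byte order are arbitrary assumptions, and no actual SIMD instructions are involved.

```python
import struct

# Eight consecutive bytes, as they might sit at byte addresses N .. N+7.
raw = bytes([0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08])

# The same bytes read with different load widths (little-endian assumed).
as_one_u64 = struct.unpack("<Q", raw)     # one 8-byte value
as_two_u32 = struct.unpack("<2I", raw)    # two 4-byte values
as_four_u16 = struct.unpack("<4H", raw)   # four 2-byte values
as_eight_u8 = struct.unpack("<8B", raw)   # eight 1-byte values

# Extracting one packed byte from the 64-bit word by shifting and masking,
# as a program would do in a general-purpose register.
word = as_one_u64[0]
third_byte = (word >> 16) & 0xFF

print(hex(word), [hex(v) for v in as_two_u32], hex(third_byte))
```

Nothing in memory marks where one value ends and the next begins; the width in the load instruction (here, the format string) decides that.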

What should be the maximum length of the field considering it also have sub-fields/components?

If a field (for example in the PID segment) in an HL7 message contains sub-fields/components (e.g. the field PID.11.1 with the & character delimiter), how can I calculate the length of the field?
The maximum length of a field is the total number of characters across all of its sub-fields/components.
Let us continue with your example, PID.11.1. The maximum length of the field is 106, with datatype XAD (Extended Address). This datatype may have multiple sub-fields/components. Note that the Length column there is shown as zero.
So the maximum length of 106 can be consumed by only one component, or it can be split by two or more components.
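A small Python sketch of that calculation, under stated assumptions: `^` and `&` are taken as the component and sub-component delimiters (the usual HL7 v2 defaults, but a message can declare others), escape sequences are ignored, and the function names are hypothetical. The answer does not say whether the delimiters themselves count toward the limit, so that is left as a parameter.

```python
import re

def field_length(field, count_separators=False):
    """Length of an HL7 v2 field made of ^ components and & sub-components."""
    if count_separators:
        return len(field)
    # Sum only the characters of the components themselves.
    return sum(len(part) for part in re.split(r"[\^&]", field))

def fits(field, max_length=106, count_separators=False):
    """True if the field stays within max_length (106 for the XAD example)."""
    return field_length(field, count_separators) <= max_length

address = "123 Main St^Apt 4^Springfield^IL^62701"
print(field_length(address), field_length(address, count_separators=True))
```

Here the five components add up to 34 characters, or 38 if the four separators are counted.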
Just a suggestion: apart from the standard, you must also take into account the other party that is supposed to consume the message. They might apply additional length-related validations that do not match the specification.

Is information stored in registers/memory structured as binary?

Looking at this question on Quora ("Are data stored in registers and memory in hex or binary?"), I think the top answer is saying that data persistence is achieved through physical properties of hardware and is not directly relatable to either binary or hex.
I've always thought of computers as 'binary', but have just realized that that only applies to the usage of components (magnetic up/down or an on/off transistor) and not necessarily the organisation of, for example, memory contents.
i.e. you could, theoretically, create an abstraction in memory that used 'binary components' but that wasn't binary, like this:
100000110001010001100
100001001001010010010
111101111101010100001
100101000001010010010
100100111001010101100
And then recognize that as the (badly-drawn) image of 'hello', rather than the ASCII encoding of 'hello'.
An answer on SO (What's the difference between a word and byte?) mentions that processors can handle 'words', i.e. several bytes at a time, so while information representation has to be binary I don't see why information processing has to be.
Can computers do arithmetic on hex directly? In this case, would the internal representation of information in memory/registers be in binary or hex?
Perhaps "digital computer" would be a good starting term and then from there "binary digit" ("bit"). Electronically, the terms for the values are sometimes "high" and "low". You are right, everything after that depends on the operation. Most of the time, groups of bits are operated on together. Commonly groups are 1, 8, 16, 32 and 64 bits. The meaning of the bits depends on the program but some operations go hand-in-hand with some level of meaning.
When the meaning of a group of bits is not known or important, humans like to be able to discern the value of each bit. Binary could be used, but more than 8 bits is hard to read. Although it is rare to operate on groups of 4 bits, hexadecimal is much more readable and is generally used regardless of the number of bits. Sometimes octal is used, but that's based on contexts where there is some meaning to a subgrouping of 3 bits or an avoidance of digits beyond 9.
Integers can be stored in two's complement format, and CPUs often have instructions for such integers. One such operation is negation. For a group of 8 bits, it would map 1 to -1, …, 127 to -127, and -1 to 1, …, -127 to 127, and 0 to 0 and -128 to -128. Decimal is likely the most valuable to humans here, not base 256, base 2 or base 16. In unsigned hexadecimal, that would be 01 to FF, …, 00 to 00, 80 to 80.
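The negation mapping can be checked with a short Python sketch. The helper names are made up for illustration; the point is that the stored pattern is the same 8 bits whether you print it in binary, hex, or signed decimal.

```python
def negate8(bits):
    """Two's-complement negation of an 8-bit pattern: invert, add 1, keep 8 bits."""
    return ((bits ^ 0xFF) + 1) & 0xFF

def as_signed8(bits):
    """Interpret an 8-bit pattern as a signed two's-complement integer."""
    return bits - 256 if bits & 0x80 else bits

# Print each pattern and its negation in binary, hex, and signed decimal.
for bits in (0x01, 0x7F, 0xFF, 0x00, 0x80):
    result = negate8(bits)
    print(f"{bits:08b} (hex {bits:02X}, signed {as_signed8(bits):4d}) -> "
          f"{result:08b} (hex {result:02X}, signed {as_signed8(result):4d})")
```

The hardware only ever sees the bit patterns; binary, hexadecimal, and decimal are three ways of writing the same value down for a human.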
For an intro to how a CPU might do integer addition on a group of bits, see adder circuits.
Other number formats include IEEE-754 floating point and binary-coded decimal.
I think you understand that digital circuits are binary. So, based on the above, yes, operations do operate on a higher conceptual level despite the actual storage.

How are Urbit phonetic names encoded?

Urbit points (network addresses) are identified by 32-bit integers, but they're typically not referred to by their number. Instead, I usually see them represented in a human-pronounceable form where every byte is converted into a three-letter syllable. For example:
8 bits galaxy ~lyt
16 bits star ~diglyt
32 bits planet ~picder-ragsyt
64 bits moon ~diglyt-diglyt-picder-ragsyt
128 bits comet ~racmus-mollen-fallyt-linpex--watres-sibbur-modlux-rinmex
I initially assumed that every byte had a single text representation, but have seen that planets names usually don't include the name of their star, so it must be more complicated than that.
How does Urbit's phonetic name encoding system (@p-names) work?
Urbit's phonetic naming system encodes unsigned integers as human-readable strings. These unsigned integers sometimes represent the byte strings they encode to in big-endian (although that representation can't track leading zeros, so the byte length must be communicated out-of-band if needed). The phonetic naming scheme operates on these big-endian bytes.
The phonetic naming system has two variants. For general use there is @q-encoding, which is suitable for values of any length, and is frequently used to represent binary data in Hoon code or when interacting with the Dojo REPL. For Urbit point names there is @p-encoding, which is based on @q-encoding but modifies certain cases.
@q-Encoding: Pairs of Syllables
Urbit phonetic names are made up of 3-letter syllables, organized in two lists of 256 syllables each. Each syllable consists of a consonant, a vowel, then another consonant. The "prefix" syllable list uses the vowels a, i, and o, and the "suffix" syllable list uses the vowels e, u, and y, with one exception: zod, the first entry in the suffix list. The full syllable lists are included below.
Values fitting in one byte, from 0x00 to 0xFF, are encoded by taking the corresponding syllable from the suffix list. Examples: 0x00 becomes ~zod, 0x01 becomes ~nec.
Values fitting in two bytes, from 0x0100 to 0xFFFF, are encoded by looking up the syllable corresponding to the high byte in the prefix list and concatenating the syllable corresponding to the low byte in the suffix list. Examples: 0x0100 becomes ~marzod, 0x0101 becomes ~marnec.
Larger values are encoded by splitting them into two-byte pairs in big-endian order, encoding each as described above for values fitting in two bytes, and joining the results with - hyphen/minus characters. If the value is an odd number of bytes, the first byte pair is padded with a leading zero. Examples: 0x01_0000 becomes ~doznec-dozzod, 0x0101_0101 becomes ~marnec-marnec.
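The three rules above can be sketched in a few lines of Python. This is an illustration of the description in this answer, not the official implementation: `encode_q` is a hypothetical name, and to keep the block short it carries only the first four entries of each syllable list, so it can only encode values whose bytes are all below 4. A real encoder needs all 256 entries of each list (given in full at the end of this answer).

```python
# First four entries of each list only; the full lists have 256 entries each.
PREFIXES = ["doz", "mar", "bin", "wan"]
SUFFIXES = ["zod", "nec", "bud", "wes"]

def encode_q(value):
    """Encode a non-negative integer using the pair-of-syllables scheme."""
    if value < 0:
        raise ValueError("value must be non-negative")
    # Big-endian bytes with no leading zero bytes (at least one byte for 0).
    length = max(1, (value.bit_length() + 7) // 8)
    data = value.to_bytes(length, "big")
    if length == 1:
        # One-byte values use a single suffix syllable.
        return "~" + SUFFIXES[data[0]]
    if length % 2 == 1:
        # Odd byte count: pad the first pair with a leading zero byte.
        data = b"\x00" + data
    pairs = [PREFIXES[data[i]] + SUFFIXES[data[i + 1]]
             for i in range(0, len(data), 2)]
    return "~" + "-".join(pairs)

for n in (0x00, 0x01, 0x0100, 0x0101, 0x01_0000, 0x0101_0101):
    print(hex(n), encode_q(n))
```

Running it reproduces the examples in the text: ~zod, ~nec, ~marzod, ~marnec, ~doznec-dozzod, and ~marnec-marnec.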
@p-Encoding: Scrambling Planets
The @p-encoding scheme is the same as @q for most values. However, it is different for values between 17 and 64 bits, which correspond to the IDs of planets and moons.
Planets are intended to correspond to real individuals on the Urbit network. Each planet is spawned from a star, and the 16 lower bits of the planet's ID are those of its parent star's ID. Under the @q-encoding system, this would also mean that the last two syllables of every planet's name would be its star's name. The Urbit developers didn't want each individual's name on the network to include the name of the star that happened to spawn their planet initially: that would artificially associate them with the star forever, even though they could immediately transfer their planet to a different star.
Their solution was to scramble all planet names randomly, to obfuscate the relationship between a planet's name and its parent star's name. This is implemented as a custom (obviously non-secure) cipher over the space of possible planet IDs. Because each star has 2^16 - 1 planets, the number of planets is not a power of two, so a conventional block cipher won't work directly. Instead, they use the construction described in Ciphers with Arbitrary Finite Domains (Black and Rogaway 2002) over a custom Feistel-style block cipher optimized for speed (and compatibility).
This scrambling is applied on planet IDs, and on the lower 32 bits of a moon ID (which correspond to its parent planet's ID). Under @p-encoding, the planet with ID 0x01_0101 becomes ~ralnyt-botdyt, showing no connection to its parent star ~marnec. The star-planet relationship is the only one that is obfuscated. If you look at the names of a planet's moons, they include the name of the planet directly: for example, ~ralnyt-botdyt's moon 0x01_0001_0101 becomes ~doznec-ralnyt-botdyt, and 0x02_0001_0101 becomes ~dozbud-ralnyt-botdyt.
Implementations
When writing Hoon code, such as at the Dojo REPL, you can use the standard @p and @q functions directly to encode values to the corresponding phonetic names. In Hoon, a @p-encoded value is identified with the prefix ~ and a @q-encoded value is identified with the prefix .~, and either can be decoded back with the @u function. Hoon also uses . the period character as a (mandatory) thousands separator in integer literals.
> `@p`1.529.729.032
~diglyt-diglyt
> `@q`1.529.729.032
.~fonbyn-mopful
> `@u`~diglyt-diglyt
1.529.729.032
> `@u`.~diglyt-diglyt
3.246.440.832
In JavaScript, the official urbit-ob package provides similar functions.
import ob from "urbit-ob";
ob.patp(1529729032); // ~diglyt-diglyt
ob.patq(1529729032); // ~fonbyn-mopful
ob.patp2dec("~diglyt-diglyt"); // 1529729032
ob.patq2dec("~diglyt-diglyt"); // 3246440832
Full Syllable Lists
prefixes = ["doz","mar","bin","wan","sam","lit","sig","hid","fid","lis","sog",
"dir","wac","sab","wis","sib","rig","sol","dop","mod","fog","lid","hop","dar",
"dor","lor","hod","fol","rin","tog","sil","mir","hol","pas","lac","rov","liv",
"dal","sat","lib","tab","han","tic","pid","tor","bol","fos","dot","los","dil",
"for","pil","ram","tir","win","tad","bic","dif","roc","wid","bis","das","mid",
"lop","ril","nar","dap","mol","san","loc","nov","sit","nid","tip","sic","rop",
"wit","nat","pan","min","rit","pod","mot","tam","tol","sav","pos","nap","nop",
"som","fin","fon","ban","mor","wor","sip","ron","nor","bot","wic","soc","wat",
"dol","mag","pic","dav","bid","bal","tim","tas","mal","lig","siv","tag","pad",
"sal","div","dac","tan","sid","fab","tar","mon","ran","nis","wol","mis","pal",
"las","dis","map","rab","tob","rol","lat","lon","nod","nav","fig","nom","nib",
"pag","sop","ral","bil","had","doc","rid","moc","pac","rav","rip","fal","tod",
"til","tin","hap","mic","fan","pat","tac","lab","mog","sim","son","pin","lom",
"ric","tap","fir","has","bos","bat","poc","hac","tid","hav","sap","lin","dib",
"hos","dab","bit","bar","rac","par","lod","dos","bor","toc","hil","mac","tom",
"dig","fil","fas","mit","hob","har","mig","hin","rad","mas","hal","rag","lag",
"fad","top","mop","hab","nil","nos","mil","fop","fam","dat","nol","din","hat",
"nac","ris","fot","rib","hoc","nim","lar","fit","wal","rap","sar","nal","mos",
"lan","don","dan","lad","dov","riv","bac","pol","lap","tal","pit","nam","bon",
"ros","ton","fod","pon","sov","noc","sor","lav","mat","mip","fip"]
suffixes = ["zod","nec","bud","wes","sev","per","sut","let","ful","pen","syt",
"dur","wep","ser","wyl","sun","ryp","syx","dyr","nup","heb","peg","lup","dep",
"dys","put","lug","hec","ryt","tyv","syd","nex","lun","mep","lut","sep","pes",
"del","sul","ped","tem","led","tul","met","wen","byn","hex","feb","pyl","dul",
"het","mev","rut","tyl","wyd","tep","bes","dex","sef","wyc","bur","der","nep",
"pur","rys","reb","den","nut","sub","pet","rul","syn","reg","tyd","sup","sem",
"wyn","rec","meg","net","sec","mul","nym","tev","web","sum","mut","nyx","rex",
"teb","fus","hep","ben","mus","wyx","sym","sel","ruc","dec","wex","syr","wet",
"dyl","myn","mes","det","bet","bel","tux","tug","myr","pel","syp","ter","meb",
"set","dut","deg","tex","sur","fel","tud","nux","rux","ren","wyt","nub","med",
"lyt","dus","neb","rum","tyn","seg","lyx","pun","res","red","fun","rev","ref",
"mec","ted","rus","bex","leb","dux","ryn","num","pyx","ryg","ryx","fep","tyr",
"tus","tyc","leg","nem","fer","mer","ten","lus","nus","syl","tec","mex","pub",
"rym","tuc","fyl","lep","deb","ber","mug","hut","tun","byl","sud","pem","dev",
"lur","def","bus","bep","run","mel","pex","dyt","byt","typ","lev","myl","wed",
"duc","fur","fex","nul","luc","len","ner","lex","rup","ned","lec","ryd","lyd",
"fen","wel","nyd","hus","rel","rud","nes","hes","fet","des","ret","dun","ler",
"nyr","seb","hul","ryl","lud","rem","lys","fyn","wer","ryc","sug","nys","nyl",
"lyn","dyn","dem","lux","fed","sed","bec","mun","lyr","tes","mud","nyt","byr",
"sen","weg","fyr","mur","tel","rep","teg","pec","nel","nev","fes"]

Maximum length of a domain name without the http://www. & .com parts

What is the maximum length of the 'name' part in a domain?
I'm referring to the google in http://www.google.com. How long can the google part be without what's before and after it?
Each label may contain up to 63 characters.
"URI producers should use names
that conform to the DNS syntax, even when use of DNS is not
immediately apparent, and should limit these names to no more than
255 characters in length."
https://www.rfc-editor.org/rfc/rfc3986
"The DNS itself places only one restriction on the particular labels
that can be used to identify resource records. That one restriction
relates to the length of the label and the full name. The length of
any one label is limited to between 1 and 63 octets. A full domain
name is limited to 255 octets (including the separators)."
https://www.rfc-editor.org/rfc/rfc2181
The full domain name may not exceed a total length of 253 characters in its external dotted-label specification.
http://en.wikipedia.org/wiki/Domain_Name_System
If you are getting anywhere close to 253 characters, I think you should look for a shorter domain name...
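If you want to check a name against these limits in code, a minimal Python sketch could look like the following. It uses 63 octets per label and 253 characters for the dotted name as quoted above; the function name is hypothetical, the name is assumed to be ASCII (an internationalized name would need converting to its punycode form first), and only lengths are checked, not which characters are allowed.

```python
MAX_LABEL = 63   # octets per label
MAX_NAME = 253   # characters in dotted form, without a trailing dot

def check_domain_length(name):
    """Return a list of length problems found in a dotted domain name."""
    if name.endswith("."):
        name = name[:-1]  # ignore an optional trailing root dot
    encoded = name.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    problems = []
    if len(encoded) > MAX_NAME:
        problems.append(f"name is {len(encoded)} octets, limit is {MAX_NAME}")
    for label in encoded.split(b"."):
        if not 1 <= len(label) <= MAX_LABEL:
            problems.append(f"label {label.decode()!r} is {len(label)} octets")
    return problems

print(check_domain_length("www.google.com"))
print(check_domain_length("a" * 64 + ".com"))
```

An empty list means the name passes both checks. Note that each label is checked separately, so "google" in www.google.com could itself be up to 63 characters.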
TLDR Answer
Use these limits:
Labels: 61 octets.
Names: 253 octets.
Many applications will work even if you exceed these limits (like Gmail), but there are many older applications that will not.
Source
RFC1035: Domain Names - Implementation And Specification (published November 1987), an accepted Internet Standard, gives the following limits to subdomains and to the entire domain length when viewed in a browser...
Various objects and parameters in the DNS have size limits. They are
listed below. Some could be easily changed, others are more
fundamental.
labels 63 octets [bytes/characters] or less
names 255 octets [bytes/characters] or less
The working level of these are:
Labels: 61 octets.
Names: 253 octets.
That's because RFC821 (published August 1982) defines emails in the format of user@domain.com, and the smallest value for user would be one character. That leaves one character for @, and then you only have 253 characters left for the domain.com part.
This was reconfirmed numerous times...
RFC2181: Clarifications to the DNS Specification (published July 1997) : Only a proposed standard. "A full domain name is limited to 255 octets (including the separators)."
RFC3986: Uniform Resource Identifier (URI): Generic Syntax (published January 2005) : Accepted Internet standard. "URI producers should use names that conform to the DNS syntax, even when use of DNS is not immediately apparent, and should limit these names to no more than 255 characters in length."
RFC5321: Simple Mail Transfer Protocol (published October 2008) : Only a proposed standard. This RFC gives the max length of label or subdomains to be 64, one more than the others of 63. I recommend sticking with 63. "The maximum total length of a domain name or number is 255 octets."
You may have 63 characters per label (or subdomain) and 255 characters per name (this includes the TLD).
Notice that it gives the definition in octets. That's because it's looking at physical bytes, not literal bytes. For instance, \. is interpreted as . (one literal byte), because the \ escapes it, but it is encoded as \. (two physical bytes). These octet limits are physical byte limits.
As a demonstration, this website has a 63-character domain name, the maximum allowed for a single label:
http://63-characters-is-the-longest-possible-domain-name-for-a-website.com
