EDI X12 tool is not able to convert the perpendicular as a segment separator

While processing an inbound EDI 210 X12 file, I receive the following exception: EdiInvoice Service process failed: '', hexadecimal value 0x15, is an invalid character. Line 2, position 37.'. This happens because the X12 input file has a perpendicular symbol at position 106 of the ISA segment (the ISA16 element).
Can you please provide a solution to handle this symbol?

It is not uncommon to define a segment separator like "|" in the X12 ISA segment (the character right after ISA16, which is character 106 of the ISA segment). Have a look at a related tutorial.
To my knowledge, characters with ASCII codes below 128 (hex 0x80) are allowed.
If your EdiInvoice service cannot handle partner-specific segment separators, you most likely have to contact the developer of your tool or the provider of your service first.
As suggested by eppye: If the sending partner can switch to an "easier" segment separator, this would also be an option, but there has to be a good reason for the partner to invest time and effort.
If the syntax of the EDI 210 X12 message conforms to the specification, the sending partner has no obligation to change anything.

No, neither CR nor CR+LF are allowed as Segment Terminator.
X12 is based on the idea of "graphical characters", to be independent of character encodings. CR is a non-printable character, not a graphical character.
The allowable chars in the basic char list are
"ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
!"&'()*+,-./:;?="
The extended character list is
"abcdefghijklmnopqrstuvwxyz%@[]_{}\|<>~^`#$"
Supporting the extended characters is a Trading Partner agreement item.
To be "X12 legal", a character from the basic set must be used. If the extended set is supported, a char from it may be used.
"X12 legal" is only as important as your trading partner thinks it is (or their software).
Using CR or CR+LF is not uncommon, even if frowned upon by purists.
Having a CR or CR+LF after each Segment Terminator is more common.

Some companies do use a non-printable character as the segment terminator. AFAIK this is OK in ANSI X12. It is sort of smart: you are not allowed to use the segment terminator in your data, and data will (almost ;-)) never contain hex 15. I have seen hex 07, carriage return, etc.
Possible solutions:
1. Contact the provider of the service; they should fix it.
2. Ask the EDI partner if they can use another segment terminator.
3. Pre-process the file and replace that character. This may not be possible.
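Option 3 could look roughly like the Python sketch below. It assumes the interchange starts with a standard fixed-length ISA segment (106 characters including the terminator), that the terminator byte never occurs in the data, and that "~" is acceptable to the receiving tool. The function names are made up for illustration, and files containing binary segments would need more care.

```python
ISA_LENGTH = 106  # fixed length of the ISA segment, including its terminator


def read_delimiters(raw: bytes) -> dict:
    """Read the delimiters from the fixed positions of the ISA segment."""
    if len(raw) < ISA_LENGTH or raw[:3] != b"ISA":
        raise ValueError("ISA header missing or truncated")
    return {
        "element": raw[3:4],        # character 4: element separator
        "component": raw[104:105],  # character 105: ISA16
        "segment": raw[105:106],    # character 106: segment terminator
    }


def replace_segment_terminator(raw: bytes, new: bytes = b"~") -> bytes:
    """Swap the partner's segment terminator for `new` (hypothetical helper)."""
    old = read_delimiters(raw)["segment"]
    if old == new:
        return raw
    if new in raw:
        # The new terminator must not collide with data or other delimiters.
        raise ValueError("replacement terminator already occurs in the file")
    return raw.replace(old, new)
```

The collision check is the reason this approach may not always be possible: if "~" already appears inside an element, a different replacement character has to be chosen.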

Not sure which EDI tool you're using, but another option is to define the element and segment terminators in your partner profile in your tool. I've done this in Sterling Integrator and know others support this as well.

In your case, you need to confirm the rules below with your trading partner so that ISA16 is not mistaken for the segment terminator or the suffix.
ISA16 (sub-element separator):
: if the type is Char
3a if the type is Hex
Limited to the values in the ASCII character set.
Segment terminator:
~ if the type is Char
7e if the type is Hex
May be left empty, but if it is, you need to designate a suffix. This element is limited to the values in the ASCII character set.
Suffix, one of:
None
CR (carriage return)
LF (line feed)
CRLF (carriage return/line feed)
Possible combinations of segment terminator and suffix:
Segment terminator
Segment terminator + carriage return
Segment terminator + line feed
Segment terminator + carriage return/line feed
Carriage return
Line feed
Carriage return/line feed
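To illustrate how these combinations behave when reading a file, here is a small Python sketch. The function name and parameters are hypothetical and not taken from any particular EDI tool; it only shows that the terminator and suffix together form the actual separator between segments.

```python
def split_segments(text: str, terminator: str = "~", suffix: str = "") -> list:
    """Split an interchange into segments for a given terminator/suffix pair."""
    separator = terminator + suffix
    if not separator:
        # An empty terminator is only valid when a suffix is designated.
        raise ValueError("either a terminator or a suffix is required")
    # Drop the empty string left over after the final separator.
    return [segment for segment in text.split(separator) if segment]


# Segment terminator + CRLF suffix
print(split_segments("ST*210*0001~\r\nB3**123~\r\n", "~", "\r\n"))
# Line feed only (empty terminator, LF suffix)
print(split_segments("ST*210*0001\nB3**123\n", "", "\n"))
```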

Related

EdiFabric - Change separator from caret to colon (or any change of separator)

This applies to EdiFabric 6.7.2.
In the code below, I'm loading a list in an 837 EDI format on a D_1328_7 field.
When the EDI is output the List<string> is arranged with a caret (^) separator. I've been informed by the consumer of my EDI 837 file that the separator must be a colon (:).
For the life of me I can't figure out how to change it, or even where the caret separator comes from.
object837.G_HL[_heirachy_HL_Index].G_CLM[_HL_G_CLM_Index].G_LX[_LX_Index].S_SV1.D_1328_7 = new List<string>();
object837.G_HL[_heirachy_HL_Index].G_CLM[_HL_G_CLM_Index].G_LX[_LX_Index].S_SV1.D_1328_7.Add(_diagnosisPointer1);
object837.G_HL[_heirachy_HL_Index].G_CLM[_HL_G_CLM_Index].G_LX[_LX_Index].S_SV1.D_1328_7.Add(_diagnosisPointer2);
OUTPUT = SV1*HC:98940*75*UN*1*11**1^2**N**
Note the caret between 1 and 2.
Which version are you using, and which 837 transaction?
The caret '^' is the default repetition separator for X12. When you generate EDI, you can specify explicitly which separators to use by passing an InterchangeContext to ToEdi(InterchangeContext context = null).

How to mask specific elements in HL7?

Currently I am learning how to work with HL7 and how to parse it in Python. Now I am wondering what happens if a value in an HL7 segment contains a pipe sign, i.e. '|'. How is this sign handled? Without masking, it would break the HL7 parser. Is there a way to mask it?
\F\
You should read the relevant sections of chapter 2 of the version 2 standard about how escaping works in version 2.
HL7 defines escape sequences for the separators, such as |.
When you look at an HL7 message, the five delimiters in use appear right after MSH:
MSH|^~\&
| is the field separator (F)
^ is the component separator (S)
~ is the repetition separator, for the second-level elements (R)
\ is the escape character (E)
& is the sub-component separator (T)
So to escape one of the special characters like |, you take the escape character, add the defined letter (F, S, etc.), and close with the escape character again.
In the case above, to escape the | you would put \F\. The escape character itself is escaped as \E\.
If you like you can also change the delimiters after the MSH completely, but I don't recommend that.
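Since the question is about Python, here is a minimal sketch of that escaping. It assumes the default delimiters |^~\& shown above; the function names are made up for illustration, and a real HL7 library will usually do this for you.

```python
import re

# Default HL7 v2 delimiters mapped to their escape sequences.
ESCAPES = {
    "|": "\\F\\",   # field separator
    "^": "\\S\\",   # component separator
    "~": "\\R\\",   # repetition separator
    "\\": "\\E\\",  # escape character
    "&": "\\T\\",   # sub-component separator
}
UNESCAPES = {"F": "|", "S": "^", "R": "~", "E": "\\", "T": "&"}


def escape_hl7(text: str) -> str:
    """Escape delimiter characters in a field value, one character at a time."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)


def unescape_hl7(text: str) -> str:
    """Turn \\F\\, \\S\\, \\R\\, \\E\\ and \\T\\ back into the literal characters."""
    return re.sub(r"\\([FSRET])\\", lambda m: UNESCAPES[m.group(1)], text)


print(escape_hl7("Height|Weight"))      # Height\F\Weight
print(unescape_hl7("Height\\F\\Weight"))  # Height|Weight
```

Escaping is done per character rather than with chained replace calls, so that the backslashes introduced by one escape sequence are not escaped a second time.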

Mime encoded headers with extra '=' (==?utf-8?b?base64string?=)

This might be a silly question but... here it goes!
I wrote my own MIME parser in native C++. It's a nightmare with the encodings! It was stable for the last 3 months or so but recently I noticed this Subject: header.
Subject: =?UTF-8?B?T2ZpY2luYSBkZSBJbmZvcm1hY2nDs24sIEluaWNpYXRpdmFzIHkgUmVjbGFt?===?UTF-8?B?YWNpb25lcw==?=
which should decode to this:
Subject: Oficina de Información, Iniciativas y Reclamaciones
The problem is that there is one extra = (equals sign) joining the two (why 2?) encoded elements, and I don't understand why they are separated. In theory the format should be =?charset?encoding?encoded_string?=, but I found another subject that starts with two =.
==?UTF-8?B?blahblahlblah?=
How should I handle the extra =?
I could replace ==? with =? (which I am) before doing anything (and it works)... but I'm wondering if there's any kind of spec regarding this so I don't hack my way into proper functionality.
PS: How much I hate these relic protocols! All text communications should be UTF-8 and XML :)
In MIME headers encoded words are used (RFC 2047 Section 2.).
... (why 2?)
To stay within the 75-character limit on an encoded word, which exists because of the 78-character line length limit (or to use two different charsets in one header, for example one for Chinese and one for Polish).
RFC 2047:
An 'encoded-word' may not be more than 75 characters long,
including 'charset', 'encoding', 'encoded-text', and delimiters.
If it is desirable to encode more text than will fit in an
'encoded-word' of 75 characters, multiple 'encoded-word's
(separated by CRLF SPACE) may be used.
Here's the example from RFC2047 (note there is no '=' in between):
Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
=?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
Your subject should be decoded as:
"Oficina de Información, Iniciativas y Reclam=aciones"
mraq's answer is incorrect. Soft line breaks apply only to the 'Quoted Printable' Content-Transfer-Encoding, which can be used in a MIME body.
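The parser in the question is C++, but the decoding rule is easy to show in a few lines of Python. This is only a sketch under the reading above: every encoded word is decoded on its own, whitespace between adjacent encoded words is dropped, and anything else (including the stray =) is kept as literal text. The function name is made up for illustration.

```python
import base64
import quopri
import re

# =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def decode_encoded_words(header: str) -> str:
    """Decode RFC 2047 encoded words, leaving all other text untouched."""
    # Whitespace between two adjacent encoded words is not part of the text.
    header = re.sub(r"(\?=)\s+(=\?)", r"\1\2", header)

    def decode_one(match):
        charset, encoding, payload = match.groups()
        if encoding.upper() == "B":
            raw = base64.b64decode(payload)
        else:
            # "Q" encoding: like quoted-printable, with "_" meaning space.
            raw = quopri.decodestring(payload.encode("ascii"), header=True)
        return raw.decode(charset, errors="replace")

    return ENCODED_WORD.sub(decode_one, header)


subject = ("=?UTF-8?B?T2ZpY2luYSBkZSBJbmZvcm1hY2nDs24sIEluaWNpYXRpdmFzIHkgUmVjbGFt?="
           "==?UTF-8?B?YWNpb25lcw==?=")
print(decode_encoded_words(subject))
```

With this approach the extra = simply survives as a literal character, so whether to strip it afterwards is a separate decision about tolerating a broken sender.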
It is called the "Soft Line Break" and it is the heritage of the SMTP protocol.
Quoting page 20 of RFC2045
(Soft Line Breaks) The Quoted-Printable encoding
REQUIRES that encoded lines be no more than 76
characters long. If longer lines are to be encoded
with the Quoted-Printable encoding, "soft" line breaks
must be used. An equal sign as the last character on a
encoded line indicates such a non-significant ("soft")
line break in the encoded text.
And also Wikipedia on Quoted-printable
A soft line break consists of an "=" at the end of an encoded line,
and does not appear as a line break in the decoded text.
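For comparison, this is what a soft line break looks like where it is valid, in a Quoted-Printable body rather than in a header. A minimal Python sketch using the standard quopri module; the helper name and the sample body are made up for illustration.

```python
import quopri


def decode_qp_body(data: bytes, charset: str = "utf-8") -> str:
    """Decode a Quoted-Printable body; "=" at end of line is a soft break."""
    return quopri.decodestring(data).decode(charset)


body = b"Oficina de Informaci=C3=B3n, Iniciativas y Reclam=\r\naciones"
print(decode_qp_body(body))
```

The "=" followed by CRLF disappears in the decoded text, which is different from the header case above where the "=" sits between two encoded words on the same line.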
From what I can see in the MIME RFC double equal signs are not valid input (for encoding), but keep in mind you could interpret the first equal sign as what it is and then use the following stuff for decoding. But seriously, those extra equal signs look like artifacts, maybe from an incorrect encoder.

The origin on why '%20' is used as a space in URLs

I am interested in knowing why '%20' is used as a space in URLs, particularly why %20 was used and why we even need it in the first place.
It's called percent encoding. Some characters can't appear literally in a URI (for example #, as it marks the start of the URL fragment), so they are represented with characters that can (# becomes %23).
Here's an excerpt from that same article:
When a character from the reserved set (a "reserved character") has
special meaning (a "reserved purpose") in a certain context, and a URI
scheme says that it is necessary to use that character for some other
purpose, then the character must be percent-encoded.
Percent-encoding a reserved character involves converting the
character to its corresponding byte value in ASCII and then
representing that value as a pair of hexadecimal digits. The digits,
preceded by a percent sign ("%") which is used as an escape character,
are then used in the URI in place of the reserved character. (For a
non-ASCII character, it is typically converted to its byte sequence in
UTF-8, and then each byte value is represented as above.)
The space character's character code is 32:
> ' '.charCodeAt(0)
32
Which is 20 in base-16:
> ' '.charCodeAt(0).toString(16)
"20"
Tack a percent sign in front of it and you get %20.
Because URLs have strict syntactic rules: / is a special path separator character, spaces are not allowed in a URL, and all characters have to come from a certain subset of ASCII. To embed arbitrary characters in URLs regardless of these restrictions, bytes can be percent-encoded. The byte 0x20 represents a space in the ASCII encoding (and most other encodings), hence %20 is the URL-encoded version of it.
It uses percent encoding. You can see the Percent Encoding part of the RFC for Uniform Resource Identifier (URI): Generic Syntax
A percent-encoding mechanism is used to represent a data octet in a
component when that octet's corresponding character is outside the
allowed set or is being used as a delimiter of, or within, the
component. A percent-encoded octet is encoded as a character
triplet, consisting of the percent character "%" followed by the two
hexadecimal digits representing that octet's numeric value. For
example, "%20" is the percent-encoding for the binary octet
"00100000" (ABNF: %x20), which in US-ASCII corresponds to the space
character (SP).
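The same mechanism can be reproduced in a few lines of Python. The helper percent_encode_char is made up for illustration and encodes every byte unconditionally; urllib.parse.quote and unquote are the standard library functions you would normally use.

```python
from urllib.parse import quote, unquote


def percent_encode_char(ch: str) -> str:
    """Percent-encode every UTF-8 byte of a character (illustrative helper)."""
    return "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))


print(percent_encode_char(" "))  # %20
print(percent_encode_char("#"))  # %23
print(percent_encode_char("é"))  # %C3%A9
print(quote("a b#c"))            # a%20b%23c
print(unquote("a%20b%23c"))      # a b#c
```

The "é" case shows the non-ASCII rule from the excerpt: the character is first turned into its UTF-8 bytes, and each byte gets its own %XX triplet.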

Parsing "–" with Erlang re

I've parsed an HTML page with mochiweb_html and want to parse the following text fragment
0 – 1
Basically I want to split the string on the spaces and dash character and extract the numbers in the first characters.
Now the string above is represented as the following Erlang list
[48,32,226,128,147,32,49]
I'm trying to split it using the following regex:
{ok, P}=re:compile("\\xD2\\x80\\x93"), %% characters 226, 128, 147
re:split([48,32,226,128,147,32,49], P, [{return, list}])
But this doesn't work; it seems the \xD2 character is the problem [if I remove it from the regex, the split occurs]
Could someone possibly explain:
1. what I'm doing wrong here?
2. why the '–' character seemingly requires three integers for representation [226, 128, 147]?
Thanks.
226,128,147 is E2,80,93 in hex, so the pattern has to start with \xE2 rather than \xD2.
> {ok, P} = re:compile("\xE2\x80\x93").
...
> re:split([48,32,226,128,147,32,49], P, [{return, list}]).
["0 "," 1"]
As to your second question, about why a dash takes 3 bytes to encode: the dash in your input isn't an ASCII hyphen (hex 2D) but a Unicode en-dash (hex 2013). Your code is receiving this in UTF-8 encoding, rather than the more obvious UCS-2 encoding. Hex 2013 comes out to hex E28093 in UTF-8 encoding.
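If you want to double-check the byte values outside Erlang, a quick Python sketch shows the same thing. It only assumes that the input list is UTF-8, as described above.

```python
import re

en_dash = "\u2013"                 # Unicode en-dash, code point hex 2013
encoded = en_dash.encode("utf-8")

print(list(encoded))   # [226, 128, 147]
print(encoded.hex())   # e28093

# Same split as the Erlang example, done on the raw bytes.
data = bytes([48, 32, 226, 128, 147, 32, 49])
parts = re.split(rb"\xE2\x80\x93", data)
print(parts)           # [b'0 ', b' 1']

# Extract the numbers once the bytes are decoded as UTF-8.
numbers = [int(part) for part in data.decode("utf-8").split(en_dash)]
print(numbers)         # [0, 1]
```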
If your next question is "why UTF-8", it's because it's far easier to retrofit an old system using 8-bit characters and null-terminated C style strings to use Unicode via UTF-8 than to widen everything to UCS-2 or UCS-4. UTF-8 remains compatible with ASCII and C strings, so the conversion can be done piecemeal over the course of years, or decades if need be. Wide characters require a "Big Bang" one-time conversion effort, where everything has to move to the new system at once. UTF-8 is therefore far more popular on systems with legacies dating back to before the early 90s, when Unicode was created.

Resources