Parsing FETCH multiple UID - imap

I need to write IMAP code to parse the result of this command:
tag FETCH 1,2,3,4 ALL
Most of the time, the response is something like this
* 1 FETCH (FLAGS ... ) ENVELOPE ("time" "subject" ... )\r\n
* 2 FETCH (FLAGS ... ) ENVELOPE ("time" "subject" ... )\r\n
....
tag OK FETCH COMPLETE
And so on, where each envelope starts with an asterisk and the UID, and ends with a CRLF, so I can use the CRLF as a parse point.
The problem is that some servers respond using IMAP string literals, i.e. {150}\r\n ..., and since the \r\n is part of the string literal I can no longer use it as a parse point.
One idea is to use the * UID as a parse point, but if someone coincidentally uses that in an email subject, or whatnot, it will break the algorithm, so I believe it's a bad idea to do that.
Can someone tell me how to effectively parse this type of response without using CRLF? Thank you very much.
Edit: Hopefully to improve the question: I am trying to parse each individual ENVELOPE into its own string, based on parse points that identify the start of one string and the end of another.

The trick you need is one to distinguish the two kinds of line feeds, and it exists: Start reading, read until you see a CRLF, then look at the start of what you have. Is it tag space OK, NO, BAD or PREAUTH? If so, you have a complete response. If not, look at the last 10-15 characters. Are they "{", a number, optionally a plus sign, and "}" and the CRLF? If so, read until the next CRLF and repeat. If not, you have a complete response.
Note that in IMAP, you have to act on a response before you can parse the next one. MSN handling breaks if you don't, there may be other problems too.
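A minimal sketch of that loop in Python, assuming two hypothetical I/O helpers (read_line, which returns one CRLF-terminated chunk from the socket, and read_exact, which returns exactly n octets); instead of merely reading on to the next CRLF, it consumes the number of octets the literal announces:
import re

# An IMAP literal announcement such as {150} or {150+} at the very end of a line.
LITERAL_RE = re.compile(rb"\{(\d+)\+?\}\r\n$")

def read_response(read_line, read_exact):
    # Collect one complete response line, treating {n} literals as opaque data.
    response = b""
    while True:
        chunk = read_line()
        response += chunk
        m = LITERAL_RE.search(chunk)
        if m is None:
            # No trailing literal: this CRLF really ends the response
            # (a "* 1 FETCH ..." line or the tagged OK/NO/BAD completion).
            return response
        # The following bytes (CRLFs included) belong to the literal, so
        # consume exactly that many octets and keep reading the same line.
        response += read_exact(int(m.group(1)))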

Related

Convert WideString to String

How can I remove the empty widestring characters from a string?
e.g:
'H e l l o' convert to 'Hello'
I read a text blob field from a DB and want to store it in another table. When I read it, I get the extra spaces; when I store it, the extra spaces remain and the value cannot be read correctly by the next query.
DXE, Firebird 2.5
UPDATE:
I use the IBQuery -> DataSetProvider -> ClientDataSet without design-time created fields.
It seems that the IBQuery retrieves the data in that wrong format.
Current Write Code:
blobStream := TStringStream.Create;
...
blobStream.WriteString(blobText); //blobText = 'H e l l o';
ibsql.ParamByName('ABLOBCOL').LoadFromStream(blobStream);
...
ibsql.ExecQuery;
...
'H e l l o' is stored in the FB database, but it must be 'Hello'. Since it seems to be a bug in IBQuery, I need a way to convert that string.
First of all, I'm not going to attempt to describe how to remove every other character from a string. Whilst that might appear to solve your problem it merely papers thinly over the gaping cracks. What you have here is a classic text encoding mismatch. The real solution to your problem will involve fixing the mismatch.
I suspect the problem arises in code that you have not shown. As I understand your question now, you have a string variable blobText that contains incorrectly encoded text. But the code in the question takes blobText as input, and so the damage has already been done by the time we reach the code in the question. The key to solving this is the code that puts duff data into blobText.
You need to find the code which assigns to blobText and sort out the encoding problem there. It looks like you have taken UTF-16 encoded text and interpreted it as if it had an 8 bit encoding. I could speculate as to how that would happen, but it would be better for you to look at the actual code, the code that assigns to blobText. If you cannot work it out, please do post an update to the question.
I'm pretty confident that there are no bugs in the database libraries and that this is just an encoding mismatch.
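For illustration only, here is that kind of mismatch reproduced in Python: UTF-16 encoded bytes read back through an 8-bit encoding acquire a zero byte between every character, which is exactly the 'H e l l o' effect:
data = "Hello".encode("utf-16-le")   # b'H\x00e\x00l\x00l\x00o\x00'
misread = data.decode("latin-1")     # 8-bit decode keeps the NULs: 'H\x00e\x00l\x00l\x00o\x00'
print(misread)                       # the NULs usually render as blanks: H e l l o
print(data.decode("utf-16-le"))      # decoding with the encoding actually used gives: Hello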

split a binary string with null bytes inside

Consider a binary string composed of messages separated by a single null byte:
<message><null><message><null> ... <message><null>
I would like to split them. Easy, I do:
binary:split(Bin,<<0>>,[global]),
But ...
But one message is composed of two parts:
<length><texte>
length has a fixed size of 4 bytes, and it can contain null bytes!
So the split function cannot cut the string correctly.
Is there a way to do this according to the Erlang state of the art?
If all messages have a 4 byte length header, I'd recommend using erlang:decode_packet(Type,Bin,Options) where Type is set to 4. This will return {ok, Message, Rest} where Message is your first message and Rest is the rest of the binary. Just rinse and repeat until you reach the end of the binary (you might have to take care of the null bytes yourself in between).
If, however, not all messages have a 4 byte length prefix and there's no deterministic way of detecting that header, it is probably impossible to reliably parse such a list.
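If you do end up handling it by hand, the same rinse-and-repeat loop looks roughly like this (a Python sketch for illustration; it assumes a 4-byte big-endian length header that counts only the <texte> bytes, with a single null byte after each message):
import struct

def split_messages(data):
    # Split <4-byte length><texte><null>... framing into a list of <texte> parts.
    messages = []
    pos = 0
    while pos < len(data):
        (length,) = struct.unpack_from(">I", data, pos)  # 4-byte big-endian header
        start = pos + 4
        messages.append(data[start:start + length])
        pos = start + length
        if pos < len(data) and data[pos] == 0:           # skip the null separator
            pos += 1
    return messages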

How to unlex using Flex(The Fast Lexical Analyzer)?

Is there any way to put a token back into the input stream using Flex? I imagine some function like yyunlex().
There is the macro REJECT, which will put the token back into the stream and continue matching the other rules as though the first match didn't happen. If you just want to put some characters back into the stream, @Kizaru's answer will suffice.
Example snippet:
%%
a |
ab |
abc |
abcd ECHO; REJECT;
.|\n printf("xx%c", *yytext);
%%
You have a few options.
You can put each character for the token back onto the input stream using unput(ch) where ch is the character. This call puts ch as the next character on the input stream (next character to be considered in scanning). So you could do this if you save the string during the token match.
You might want to look into yyless(0), which will put all of the characters from the token back onto the input stream too. I never used this one though, so I'm not sure if there are any gotchas. You can specify an integer n, which will put all but the first n characters back on the input stream.
Now, if you're going to do this often during scanning/parsing, you might want to use lex just to build tokens and place the tokens onto your own data structure for parsing. This is akin to what bison and yacc's generated yyparse() function does.

Dynamically generate short URLs for a SQL database?

My client has a database of over 400,000 customers. Each customer is assigned a GUID. He wants me to select all the records, create a dynamic "short URL" that includes this GUID as a parameter, and then save this short URL to a field on each client's record.
The first question I have is: do any of the URL shortening sites allow you to programmatically create short URLs on the fly like this?
TinyURL allows you to do it (not widely documented); for example:
http://tinyurl.com/api-create.php?url=http://www.stackoverflow.com/
becomes http://tinyurl.com/6fqmtu
So you could have
http://tinyurl.com/api-create.php?url=http://mysite.com/user/xxxx-xxxx-xxxx-xxxx
to http://tinyurl.com/64dva66.
The GUID doesn't end up being that clear, but the URLs should be unique.
Note that you'd have to pass this through an HTTPWebRequest and get the response.
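For illustration, the same round trip sketched in Python (the api-create.php endpoint is the one shown above; the GUID in the example is a placeholder):
from urllib.parse import urlencode
from urllib.request import urlopen

def make_short_url(long_url):
    # Ask TinyURL's api-create.php endpoint for a short URL; it replies in plain text.
    query = urlencode({"url": long_url})
    with urlopen("http://tinyurl.com/api-create.php?" + query) as response:
        return response.read().decode("ascii").strip()

# e.g. one short URL per customer record (placeholder GUID):
print(make_short_url("http://mysite.com/user/xxxx-xxxx-xxxx-xxxx"))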
You can use Google's URL shortener; they have an API.
Here are the docs for that: http://code.google.com/apis/urlshortener/v1/getting_started.html
This URL is not sufficiently short?
http://www.clientsdomain.com/?customer=267E7DDD-8D01-4F38-A3D8-DCBAA2179609
NOTE: Personally I think your client is asking for something strange. By asking you to create a URL field on each customer record (which will be based on the Customer's GUID through a deterministic algorithm) he is in fact essentially asking you to denormalize the database.
The algorithm URL shortening sites use is very simple:
Store the URL and map it to its sequence number.
Convert the sequence number (id) to a fixed-length string.
Using just six lowercase letters for the second step will give you many more combinations (26^6) than the current application needs, and there's nothing preventing the use of a larger sequence at some point in time. You can use shorter sequences if you allow for numbers and/or uppercase letters.
The algorithm for the conversion is a base conversion (like when converting to hex), padding with whatever symbol represents zero. This is some Python code for the conversion:
LOWER = [chr(x + ord('a')) for x in range(25)]   # 25 letters, 'a'..'y' (use range(26) to include 'z')
DIGITS = [chr(x + ord('0')) for x in range(10)]  # '0'..'9'
MAP = DIGITS + LOWER                             # 35 symbols in total

def i2text(i, l):
    # Convert the id i to a fixed-length string of l symbols from MAP,
    # least significant symbol first, padded with MAP[0].
    n = len(MAP)
    result = ''
    while i != 0:
        c = i % n
        result += MAP[c]
        i //= n
    padding = MAP[0] * l
    return (padding + result)[-l:]

print(i2text(0, 4))
print(i2text(1, 4))
print(i2text(12, 4))
print(i2text(36, 4))
print(i2text(400000, 4))
print(i2text(1600000, 4))
Results:
0000
0001
000c
0011
kib9
4b21
Your URLs would then be of the form http://mydomain.com/myapp/short/kib9.

Reading EDI Formatted Files

I'm new to EDI, and I have a question.
I have read that you can get most of what you need about an EDI format by looking at the last 3 characters of the ISA line. This would be fine if every EDI file used line breaks to separate entities, but I have found that many are single-line files with any number of characters used as breaks. I have noticed that the VERY last character in every EDI I've parsed is the break character. I've looked at a few hundred, and have found no exceptions to this. If I first grab that character, and use it to obtain the last 3 characters of the ISA line, should I reasonably expect that I will be able to parse data from an EDI?
I don't know if this helps, but the EDI 'types' in question tend to be 850, 875. I'm not sure if that is a standard or not, but it may be worth mentioning.
The transaction type of the EDI doesn't really matter (850 = order, 875 = grocery PO). Having written a few EDI parsers, here are a few things I've found:
You should be able to count on the ISA (and the ISA only) being fixed width (105 characters if memory serves).
Strip off the first 105 characters. Everything after that and before the first occurrence of "GS" is your line terminator (this can be anything, including a 0x07, the beep, so watch out if you're outputting to stdout for debugging or you may have a bunch of beeps coming out of the speaker). Normally this is 1 or 2 characters; sometimes it can be more (if the person sending you the data adds an extra terminator for some reason). Once you have the line terminator, you can get the segment (field) delimiter. I normally pull the 3rd character of the GS line and use that, though the 4th character of the ISA line should work as well.
Also be aware that you can get a file with multiple ISAs in it. In that case you cannot count on the line or field separators being the same for each ISA.
Another thing: it is also possible (again, not sure if it's in the spec) for an EDI file to have a variable-length ISA. This is very rare, but I had to accommodate it. If that happens you have to parse the line into its fields; the last field in the ISA is only a character long, so you can determine the real length of the ISA from it. If it were me, I wouldn't worry about this unless you see a file like it. It is a rare occurrence.
What I've said above may not be to the letter of the "spec" ... that is, I'm not sure it's legal to have different line separators in the same file but in different ISAs, but it is technically possible and I accommodate it because I have to process files that come through in that manner. The EDI processor I use processes upwards of 5000 files a day from over 3000 possible sources of data (so I see a lot of weird stuff).
best regards,
don
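A rough Python sketch of the delimiter detection described above (it assumes a single, fixed-width 105-character ISA followed by a GS segment, and takes the element delimiter from the 4th character of the ISA, as suggested):
def detect_delimiters(edi):
    # Pull the element delimiter and segment terminator out of a raw EDI string.
    isa = edi[:105]                      # the ISA is fixed width, per the answer above
    element_delimiter = isa[3]           # 4th character of the ISA
    rest = edi[105:]
    gs_pos = rest.find("GS")             # assumes a GS segment follows the ISA
    segment_terminator = rest[:gs_pos]   # may be 1-2 chars, or oddballs like 0x07
    return element_delimiter, segment_terminator

def split_segments(edi):
    # Split the interchange into segments, and each segment into its elements.
    element_delimiter, segment_terminator = detect_delimiters(edi)
    return [seg.split(element_delimiter)
            for seg in edi.split(segment_terminator) if seg]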
EDI content is composed of segments and elements.
To parse it, you will need to break it up into segments first, and then elements like so (in PHP):
<?php
$edi = "YOUR EDI STRING!";
$segment_delimiter = "~";
$element_delimiter = "*";

// First break it into segments
$segments = explode($segment_delimiter, $edi);

// Now break each segment into elements
$segs_and_elems = array();
foreach ($segments as $segment) {
    $segs_and_elems[] = explode($element_delimiter, $segment);
}

// To echo out what type of EDI this is, for example:
foreach ($segs_and_elems as $seg) {
    if ($seg[0] == "GS") { echo($seg[1]); }
}
?>
Hope this helps get you started.
For header information, the following Java will let you get the basic info pretty easily.
C# has split as well, and the code looks very similar.
try {
    String sCurrentLine;
    BufferedReader fileContent = new BufferedReader(new FileReader(filePathName));
    sCurrentLine = fileContent.readLine();
    // Get the delimiter after ISA; if you know your field delimiter, just force it.
    // We look at lots of different senders' messages, so we are never sure what it will be.
    String delimiterElement = sCurrentLine.substring(3, 4); // grab the element delimiter they are using
    // Quote the delimiter so regex metacharacters such as "*" are treated literally,
    // then split out the ISA elements (if everything is on one line of course).
    String[] splitMessage = sCurrentLine.split(java.util.regex.Pattern.quote(delimiterElement), 16);
    String senderQualifier = splitMessage[5]; // who sent something we need: fixed qualifier (ISA05)
    String senderID = splitMessage[6];        // who sent something we need: fixed alias (ISA06)
    String ISA = splitMessage[13];            // interchange control number (ISA13)
    String testIndicator = splitMessage[15];  // test/production indicator (ISA15)
    String dateStamp = splitMessage[9];       // interchange date (ISA09)
    String timeStamp = splitMessage[10];      // interchange time (ISA10)
    // ... do stuff with the pieces of info ...
} catch (IOException e) {
    e.printStackTrace();
}
