Writing a small validator and parser for string representations of IPv4 addresses, I wonder whether IP representations like 127.000.000.001 are out there. I know this looks strange, but it appears to be a legal IP representation after all.
Yes, it can happen; here's an example.
The problem is that leading zeroes in an IP address can sometimes mean that the address is written in octal.
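For what it's worth, here is a minimal sketch in Java of a strict validator that simply rejects leading zeroes, so the decimal-versus-octal ambiguity never arises (Ipv4Validator is a made-up name, not from any library):

import java.util.regex.Pattern;

// A minimal sketch of a strict dotted-decimal IPv4 validator. It rejects
// leading zeroes outright, so ambiguous forms such as "127.000.000.001"
// (which some C libraries would read as octal) never reach the parser.
public final class Ipv4Validator {

    // Each octet: "0", or a non-zero digit followed by up to two more digits.
    private static final Pattern OCTET = Pattern.compile("0|[1-9][0-9]{0,2}");

    public static boolean isStrictIpv4(String s) {
        String[] parts = s.split("\\.", -1);   // -1 keeps trailing empty parts
        if (parts.length != 4) {
            return false;
        }
        for (String part : parts) {
            if (!OCTET.matcher(part).matches()) {
                return false;                  // leading zero or non-digit
            }
            if (Integer.parseInt(part) > 255) {
                return false;                  // octet out of range
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isStrictIpv4("127.0.0.1"));       // true
        System.out.println(isStrictIpv4("127.000.000.001")); // false: leading zeroes
        System.out.println(isStrictIpv4("256.1.1.1"));       // false: out of range
    }
}

Whether you reject such inputs or accept them as decimal is a policy decision, but rejecting them keeps the parser unambiguous.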
RFC 2616 says that method names are case-sensitive.
Trying to simplify the parsing routine I'm writing, I've got a question: what could happen if I treat these names case-insensitively?
There are statements in the standard saying that programs SHOULD be tolerant. As far as I can see, this looks like a case where tolerance would apply.
One more question I have is about leading and trailing spaces and tabs where the standard forbids them. For example, inside the Request-Line only spaces are allowed.
What if my parser allows tabs as separators? What about leading spaces before the Request-Line?
One rule of thumb says: Be conservative in what you do, be liberal in what you accept from others.
So go for it: be as tolerant as you can as long as the intent of the input is clear, and if it simplifies your parser, even better.
1) RFC 2616 is obsolete. You should be looking at RFC 7230.
2) If you treat method names case-insensitively, you'll fail once there are two different names that are the same when compared case-insensitively. Unlikely? Yes. Impossible? No.
3) WRT request line parsing: there's absolutely no point in being "liberal" here. In the best case, you'll accept requests that never are made. In the worst case, you'll introduce security holes because you don't know what you're doing.
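To make "strict" concrete, here is a rough Java sketch of request-line parsing along the lines of RFC 7230 (the RequestLine record and its error messages are my own illustration, not taken from any library):

// request-line = method SP request-target SP HTTP-version CRLF
// Exactly one SP between the three parts, no leading whitespace, no tabs.
public record RequestLine(String method, String target, String version) {

    public static RequestLine parse(String line) {
        String[] parts = line.split(" ", -1);  // -1 keeps empty parts visible
        if (parts.length != 3
                || parts[0].isEmpty() || parts[1].isEmpty() || parts[2].isEmpty()) {
            throw new IllegalArgumentException("400 Bad Request: malformed request-line");
        }
        // Method tokens are case-sensitive, so "get" is not the same method as "GET".
        if (!parts[0].matches("[!#$%&'*+.^_`|~0-9A-Za-z-]+")) {
            throw new IllegalArgumentException("400 Bad Request: invalid method token");
        }
        if (!parts[2].matches("HTTP/[0-9]\\.[0-9]")) {
            throw new IllegalArgumentException("400 Bad Request: invalid HTTP-version");
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        System.out.println(parse("GET /index.html HTTP/1.1"));   // parses fine
        // parse("GET  /index.html HTTP/1.1") throws: two spaces, not one SP
        // parse("GET\t/index.html HTTP/1.1") throws: a tab is not SP
    }
}

Rejecting anything that deviates keeps the parser tiny and avoids the ambiguities that the "liberal" route invites.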
My application is parsing incoming emails. I try to parse them as well as possible, but every now and then I get one with puzzling content. This time it's an email that looks to be ASCII, but the specified charset is ansi_x3.110-1983.
My application handles it correctly by defaulting to ASCII, but it throws a warning which I'd like to stop receiving, so my question is: what is ansi_x3.110-1983 and what should I do with it?
According to this page on the IANA's site, ANSI_X3.110-1983 is also known as:
iso-ir-99
CSA_T500-1983
NAPLPS
csISO99NAPLPS
Of those, only the name NAPLPS seems interesting or informative. If you can, consider getting in touch with the people sending those mails. If they're really using Prodigy in this day and age, I'd be amazed.
The IANA site also has a pointer to RFC 1345, which contains a description of the bytes and the characters that they map to. Compared to ISO-8859-1, the control characters are the same, as are most of the punctuation, all of the numbers and letters, and most of the remaining characters in the first 7 bits.
You could possibly use the mapping in the RFC to write a tool that converts the characters, if someone hasn't written one already. To be honest, it may be easier to simply ignore the warnings about the weird character set, given that the character mapping is close enough to what is expected anyway.
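If you just want to silence the warning rather than translate the characters, here is a minimal Java sketch of that fallback idea (CharsetFallback is a made-up name; adapt it to whatever your parsing stack actually uses):

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch of a tolerant charset lookup: use the declared charset when the
// runtime knows it, otherwise fall back to ISO-8859-1, which is close enough
// to ANSI_X3.110-1983 over most of the 7-bit range described in RFC 1345.
public final class CharsetFallback {

    public static Charset resolve(String declaredName) {
        try {
            if (declaredName != null && Charset.isSupported(declaredName)) {
                return Charset.forName(declaredName);
            }
        } catch (IllegalArgumentException ignored) {
            // Illegal charset name: fall through to the default below.
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        byte[] body = "plain ASCII body".getBytes(StandardCharsets.US_ASCII);
        // Decoded with the declared charset if the JVM knows it, else ISO-8859-1.
        String text = new String(body, resolve("ansi_x3.110-1983"));
        System.out.println(text);
    }
}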
As already pointed out in the title, I got the following error:
Character #\u009C cannot be represented in the character set CHARSET:CP1252
while trying to print out a string returned by drakma:http-request. As far as I understand the error, the problem is that the Windows encoding (CP1252) does not support this character.
Therefore, to be able to process it, I might (or must) convert the whole string.
My question is: which package/library supports converting strings to particular character sets efficiently?
A similar question is this one, but just ignoring the error would not help in my case.
Drakma already does the job of "converting strings": after all, when it reads from some random webserver, it just gets a stream of bytes. It then has to convert that to a lisp string. You probably want to bind *drakma-default-external-format* to something else, although I can't remember off-hand what the allowable values are. Maybe something like :utf-8?
I'm having a strange problem that is affecting at least some of my international users of my Delphi 6 application. Here's the scenario:
My program requests status reports periodically from an external device that acts as an HTTP server.
The device sends back the status report as a response document containing a series of fields delimited with the pipe character, in name=value pair format (e.g. field1=-0.437).
I split the report string into the fields and then again to get each field name and numeric value.
I use StrToFloat() to convert the floating point field values in string format and assign the result of that function to a Variant variable.
This works fine on most PCs, but some of my international users are getting EConvertError exceptions when I try to use StrToFloat() on the numeric values. Here's a concrete example of an error message from my logs:
EConvertError: '-0.685' is not a valid floating point value
As you can see, -0.685 is a valid floating point number, yet I am getting the EConvertError exception. Normally I would expect to see a comma where the decimal point is, or some other locale-specific punctuation problem, but the number appears fine in this case. Also, to the best of my knowledge the external device does not even have the option to set the character set.
So what subtle nuance of Delphi 6 and international character sets might be causing this problem, perhaps related to the user's Windows XP/Win7 locale settings? Note that I use standard Delphi 6 string variables throughout my program, so I don't see how a multi-byte character set issue could be the root cause. Has anyone had this problem and knows what to do about it?
Your remote user's machine is expecting , for the decimal separator. When it encounters . the EConvertError exception is raised. On a machine which expects , as the decimal separator (e.g. most European and South American countries) -0.685 is indeed not a valid floating point value.
Normally I would expect to see a comma where the decimal point is, or some other locale specific punctuation problem, but the number appears fine in this case.
Your current problem is just the flip-side of the above issue. Normally, because your locale uses . as the separator, you are accustomed to seeing problems when data with , is used instead. Put yourself in the position of somebody from a country which uses , as the separator: they will be accustomed to seeing exceptions when data with . is used.
You could solve the problem by normalising the input to use the same decimal separator as the machine locale. On a modern Delphi you could instead use the StrToFloat overload that receives a TFormatSettings parameter and explicitly specify that . is to be used as the decimal separator for this conversion. Unfortunately that facility is not available in Delphi 6.
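Purely as an illustration of the normalisation route (in Java rather than Delphi 6, which lacks the TFormatSettings overload), a sketch might look like this; SeparatorNormalizer is a made-up name:

import java.text.DecimalFormatSymbols;

// Sketch of the "normalise the input" approach: rewrite the device's '.'
// to whatever decimal separator the current machine locale uses before
// handing the string to the locale-aware parser (StrToFloat in the Delphi case).
public final class SeparatorNormalizer {

    public static String normalize(String deviceValue) {
        char localSep = DecimalFormatSymbols.getInstance().getDecimalSeparator();
        // The device always emits '.', so rewrite it to the local separator.
        return deviceValue.replace('.', localSep);
    }

    public static void main(String[] args) {
        // On a German or Belgian machine this prints "-0,685", which the
        // locale-aware parser will then accept.
        System.out.println(normalize("-0.685"));
    }
}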
I faced this issue with Belgian users. I also had to manually replace the '.' or ',' in the input data.
Also, if you are inserting the data into a (SQL) database, you will have to replace the ',' with '.' when inserting the data.
Why use - instead of _ in URLs?
A URL containing '_' seems to have no bad effects.
Underscores are not allowed in a host name. Thus some_place.com is not a valid URL because the host name is not valid. Underscores are permissible elsewhere in URLs. Thus some-place.com/which_place/ is perfectly legitimate, other concerns aside.
From RFC 1738:
host
[...] Fully qualified domain names take the form as described
in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123
[5]: a sequence of domain labels separated by ".", each domain
label starting and ending with an alphanumerical character and
possibly also containing "-" characters. The rightmost domain
label will never start with a digit, though, which
syntactically distinguishes all domain names from the IP
addresses.
When you read a_long_sentence_with_many_underscores, because you are reading it by letter or word recognition, your eye tracks along the middle of the line, but when you reach an underscore, your eye is more likely to track down a bit and back up for the next word.
When you read a-long-sentence-with-many-dashes, your eye keeps tracking along the same horizon, and by sight, it is easier for your brain to try and ignore them.
Another good reason is that Google and other search engines rank URLs that match search terms higher when the word separator is a dash.
One main reason is that most anchor tags have text-decoration: underline, which effectively hides your underscore.
And a non-tech-savvy user won't automatically assume that there is an underscore :)
By the way... it seems several Java network libraries will not be able to interpret a URL correctly when it contains an underscore:
import java.net.URI;

URI dashed = URI.create("http://www.google-plus.com/");
System.out.println(dashed.getHost());      // prints www.google-plus.com

URI underscored = URI.create("http://www.google_plus.com/");
System.out.println(underscored.getHost()); // prints null
It's easier to type (at least on my German keyboard) and easier to see.