ElasticMQ does not reject illegal string as epected - amazon-sqs

I have been using ElasticMQ, it is supposed to behave like AWS SQS.
According to AWS SQS online documentation, any illegal strings that are out of this range #x9 | #xA | #xD | #x20 to #xD7FF | #xE000 to #xFFFD | #x10000 to #x10FFFF, will be rejected.
But when I created an illegal string (only consists of one illegal char), and sent it to ElasticMQ, it was not rejected.
ElasticMQ did behave strangely though: for this illegal string (one char), the MD5 checksum calculated on the client side was very different from the MD5 checksum returned by ElasticMQ.
Can anyone verify if ElasticMQ is able to reject illegal strings?

Related

JFlex maximum read length

Given a positional language like the old IBM RPG, we can have a line such as
CCCCCDIDENTIFIER E S 10
Where characters
1-5: comment
6: specification type
7-21: identifier name
...And so on
Now, given that JFlex is based on RegExp, we would have a RegExp such as:
[a-zA-Z][a-zA-Z0-9]{0,14} {0,14}
for the identifier name token.
This RegExp however can match tokens longer than the 15 characters possible for identifier name, requiring yypushbacks.
Thus, is there a way to limit how many characters JFlex reads for a particular token?
Regular expression based lexical analysis is really not the right tool to parse fixed-field inputs. You can just split the input into fields at the known character positions, which is way easier and a lot faster. And it doesn't require fussing with regular expressions.
Anyway, [a-zA-Z][a-zA-Z0-9]{0,14}[ ]{0,14} wouldn't be the right expression even if it did properly handle the token length, since the token is really the word at the beginning, without space characters.
In the case of fixed-length fields which contain something more complicated than a single identifier, you might want to feed the field into a lexer, using a StringReader or some such.
Although I'm sure it's not useful, here's a regular expression which matches 15 characters which start with a word and are completed with spaces:
[a-zA-Z][ ]{14} |
[a-zA-Z][a-zA-Z0-9][ ]{13} |
[a-zA-Z][a-zA-Z0-9]{2}[ ]{12} |
[a-zA-Z][a-zA-Z0-9]{3}[ ]{11} |
[a-zA-Z][a-zA-Z0-9]{4}[ ]{10} |
[a-zA-Z][a-zA-Z0-9]{5}[ ]{9} |
[a-zA-Z][a-zA-Z0-9]{6}[ ]{8} |
[a-zA-Z][a-zA-Z0-9]{7}[ ]{7} |
[a-zA-Z][a-zA-Z0-9]{8}[ ]{6} |
[a-zA-Z][a-zA-Z0-9]{9}[ ]{5} |
[a-zA-Z][a-zA-Z0-9]{10}[ ]{4} |
[a-zA-Z][a-zA-Z0-9]{11}[ ]{3} |
[a-zA-Z][a-zA-Z0-9]{12}[ ]{2} |
[a-zA-Z][a-zA-Z0-9]{13}[ ] |
[a-zA-Z][a-zA-Z0-9]{14}
(That might have to be put on one very long line.)

freeradius 3.0.13 used custome attribute reply

I am configuring Freeradius.
It is authentication reject when i try to use custom attribute register in radreply table.
This is my test operation and error case.
Ⅰ. Insert record to radreply table
> insert into radreply values(null,"user1","Custom-TEST1",":=","56789");
> select * from radreply;
+----+----------+---------------+----+-----------+
| id | username | attribute | op | value |
+----+----------+---------------+----+-----------+
| 1 | user1 | Custom-NET01 | := | 12345 |
+----+----------+---------------+----+-----------+
Ⅱ. Configure dictionary file
$ vi /etc/raddb/dictionary
  ATTRIBUTE Custom-TEST1 3000 integer
I checked by debbug mode. Then i can see this messages.
(0) sql: ERROR: Error parsing value: Unknown or invalid value "10.0.0.1" for attribute Custom-TEST1
(0) sql: ERROR: Error parsing user data from database result
(0) sql: ERROR: SQL query error getting reply attributes
Please tell me how to solve this error.
The issue here is you've defined your custom attribute as an integer (which in this case means a 32bit unsigned integer). You're then later, trying to assign an IPv4 address to this integer, which is invalid.
If you want to assign an ipv4 address to your attribute, it needs to be defined as an ipaddr type.

YQL | why I'm getting a syntax error?

I'm trying to query a simple Google search using YQL but apparently it seems not working. Here's my exact query
http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%27https%3A%2F%2Fwww.google.com/search?q=Google+Guice&ie=utf-8%27%0A&format=json
And the error is
{"error":{"lang":"en-US","description":"Query syntax error(s) [line 1:74 mismatched character ' ' expecting ''']"}}
The error is pointing to line 1:74 which is near 20where. This is also the encoded version of the URL, and it is difficult for me to exactly understand where the error is.
Here's your url:
http://query.yahooapis.com/v1/public/yql?
q=select%20*%20from%20html%20where%20url%3D%27https%3A%2F%2F
www.google.com/search?q=Google+Guice&ie=utf-8%27%0A&format=json
The URL query parts are separated into the following (separated by &):
+--------+---------------------------------------------------+
| q | select%20*%20from%20html%20where%20url%3D%27https |
| | %3A%2F%2Fwww.google.com/search?q=Google+Guice |
+--------+---------------------------------------------------+
| ie | utf-8%27%0A |
+--------+---------------------------------------------------+
| format | json |
+--------+---------------------------------------------------+
As you can see, YQL is not receiving the full query string as you wanted it to. This is because the & character that should be part of the query string has not been url-encoded to %26.
The URL should look like …Guice%26ie=utf….
Aside: There are a few other issues that you are going to face. The first is that the Google search URL embedded into the query is malformed since it will contain a literal space character between Google and Guice, which Google does not accept. Secondly, the URL is restricted by Google's robots.txt so even if the URL is fixed, you won't be able to get any results from there.

URL encoding the space character: + or %20?

When is a space in a URL encoded to +, and when is it encoded to %20?
From Wikipedia (emphasis and link added):
When data that has been entered into HTML forms is submitted, the form field names and values are encoded and sent to the server in an HTTP request message using method GET or POST, or, historically, via email. The encoding used by default is based on a very early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with "+" instead of "%20". The MIME type of data encoded this way is application/x-www-form-urlencoded, and it is currently defined (still in a very outdated manner) in the HTML and XForms specifications.
So, the real percent encoding uses %20 while form data in URLs is in a modified form that uses +. So you're most likely to only see + in URLs in the query string after an ?.
This confusion is because URLs are still 'broken' to this day.
From a blog post:
Take "http://www.google.com" for instance. This is a URL. A URL is a Uniform Resource Locator and is really a pointer to a web page (in most cases). URLs actually have a very well-defined structure since the first specification in 1994.
We can extract detailed information about the "http://www.google.com" URL:
+---------------+-------------------+
| Part | Data |
+---------------+-------------------+
| Scheme | http |
| Host | www.google.com |
+---------------+-------------------+
If we look at a more complex URL such as:
"https://bob:bobby#www.lunatech.com:8080/file;p=1?q=2#third"
we can extract the following information:
+-------------------+---------------------+
| Part | Data |
+-------------------+---------------------+
| Scheme | https |
| User | bob |
| Password | bobby |
| Host | www.lunatech.com |
| Port | 8080 |
| Path | /file;p=1 |
| Path parameter | p=1 |
| Query | q=2 |
| Fragment | third |
+-------------------+---------------------+
https://bob:bobby#www.lunatech.com:8080/file;p=1?q=2#third
\___/ \_/ \___/ \______________/ \__/\_______/ \_/ \___/
| | | | | | \_/ | |
Scheme User Password Host Port Path | | Fragment
\_____________________________/ | Query
| Path parameter
Authority
The reserved characters are different for each part.
For HTTP URLs, a space in a path fragment part has to be encoded to "%20" (not, absolutely not "+"), while the "+" character in the path fragment part can be left unencoded.
Now in the query part, spaces may be encoded to either "+" (for backwards compatibility: do not try to search for it in the URI standard) or "%20" while the "+" character (as a result of this ambiguity) has to be escaped to "%2B".
This means that the "blue+light blue" string has to be encoded differently in the path and query parts:
"http://example.com/blue+light%20blue?blue%2Blight+blue".
From there you can deduce that encoding a fully constructed URL is impossible without a syntactical awareness of the URL structure.
This boils down to:
You should have %20 before the ? and + after.
Source
I would recommend %20.
Are you hard-coding them?
This is not very consistent across languages, though.
If I'm not mistaken, in PHP urlencode() treats spaces as + whereas Python's urlencode() treats them as %20.
EDIT:
It seems I'm mistaken. Python's urlencode() (at least in 2.7.2) uses quote_plus() instead of quote() and thus encodes spaces as "+".
It seems also that the W3C recommendation is the "+" as per here: http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1
And in fact, you can follow this interesting debate on Python's own issue tracker about what to use to encode spaces: http://bugs.python.org/issue13866.
EDIT #2:
I understand that the most common way of encoding " " is as "+", but just a note, it may be just me, but I find this a bit confusing:
import urllib
print(urllib.urlencode({' ' : '+ '})
>>> '+=%2B+'
A space may only be encoded to "+" in the "application/x-www-form-urlencoded" content-type key-value pairs query part of an URL. In my opinion, this is a may, not a must. In the rest of URLs, it is encoded as %20.
In my opinion, it's better to always encode spaces as %20, not as "+", even in the query part of an URL, because it is the HTML specification (RFC 1866) that specified that space characters should be encoded as "+" in "application/x-www-form-urlencoded" content-type key-value pairs (see paragraph 8.2.1. subparagraph 1.)
This way of encoding form data is also given in later HTML specifications. For example, look for relevant paragraphs about application/x-www-form-urlencoded in HTML 4.01 Specification, and so on.
Here is a sample string in a URL where the HTML specification allows encoding spaces as pluses: "http://example.com/over/there?name=foo+bar". So, only after "?", spaces can be replaced by pluses. In other cases, spaces should be encoded to %20. But since it's hard to determine the context correctly, it's the best practice to never encode spaces as "+".
I would recommend to percent-encode all character except "unreserved" defined in RFC 3986, p.2.3
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
The implementation depends on the programming language that you chose.
If your URL contains national characters, first encode them to UTF-8 and then percent-encode the result.
To summarize the (somewhat conflicting) answers here, I think it can be boiled down to:
| standard | + | %20 |
|---------------+-----+-----|
| URL | no | yes |
| query string | yes | yes |
| form params | yes | no |
| mailto query | no | yes |
So historically I think what happened is:
The RFC specified a pretty clear standard about the form of URLs and how they are encoded. In this context the query is just a "string", there is no specification how key/value pairs should be encoded
The HTTP guys put out a standard of how key/value pairs are to be encoded in form params, and borrowed from the URL encoding standard, except that spaces should be encoded as +.
The web guys said: cool we have a way to encode key/value pairs let's put that into the URL query string
Result: We end up with two different ways how to encode spaces in a URL depending on which part you're talking about. But it doesn't even violate the URL standard. From URL perspective the "query" is just a blackbox. If you want to use other encodings there besides percent encoding: knock yourself out.
But as the email example shows it can be problematic to borrow from the form-params implementation for an URL query string. So ultimately using %20 is safer but there might not be out of the box library support for it.

In a URL, should spaces be encoded using %20 or +? [duplicate]

This question already has answers here:
URL encoding the space character: + or %20?
(5 answers)
Closed 6 years ago.
In a URL, should I encode the spaces using %20 or +? For example, in the following example, which one is correct?
www.mydomain.com?type=xbox%20360
www.mydomain.com?type=xbox+360
Our company is leaning to the former, but using the Java method URLEncoder.encode(String, String) with "xbox 360" (and "UTF-8") returns the latter.
So, what's the difference?
Form data (for GET or POST) is usually encoded as application/x-www-form-urlencoded: this specifies + for spaces.
URLs are encoded as RFC 1738 which specifies %20.
In theory I think you should have %20 before the ? and + after:
example.com/foo%20bar?foo+bar
According to the W3C (and they are the official source on these things), a space character in the query string (and in the query string only) may be encoded as either "%20" or "+". From the section "Query strings" under "Recommendations":
Within the query string, the plus sign is reserved as shorthand notation for a space. Therefore, real plus signs must be encoded. This method was used to make query URIs easier to pass in systems which did not allow spaces.
According to section 3.4 of RFC2396 which is the official specification on URIs in general, the "query" component is URL-dependent:
3.4. Query Component
The query component is a string of information to be interpreted by
the resource.
query = *uric
Within a query component, the characters ";", "/", "?", ":", "#",
"&", "=", "+", ",", and "$" are reserved.
It is therefore a bug in the other software if it does not accept URLs with spaces in the query string encoded as "+" characters.
As for the third part of your question, one way (though slightly ugly) to fix the output from URLEncoder.encode() is to then call replaceAll("\\+","%20") on the return value.
This confusion is because URL is still 'broken' to this day
Take "http://www.google.com" for instance. This is a URL. A URL
is a Uniform Resource Locator and is really a pointer to a web page
(in most cases). URLs actually have a very well-defined structure
since the first specification in 1994.
We can extract detailed information about the "http://www.google.com"
URL:
+---------------+-------------------+
| Part | Data |
+---------------+-------------------+
| Scheme | http |
| Host address | www.google.com |
+---------------+-------------------+
If we look at a more
complex URL such as
"https://bob:bobby#www.lunatech.com:8080/file;p=1?q=2#third" we can
extract the following information:
+-------------------+---------------------+
| Part | Data |
+-------------------+---------------------+
| Scheme | https |
| User | bob |
| Password | bobby |
| Host address | www.lunatech.com |
| Port | 8080 |
| Path | /file |
| Path parameters | p=1 |
| Query parameters | q=2 |
| Fragment | third |
+-------------------+---------------------+
The reserved characters are different for each part
For HTTP URLs, a space in a path fragment part has to be encoded to
"%20" (not, absolutely not "+"), while the "+" character in the path
fragment part can be left unencoded.
Now in the query part, spaces may be encoded to either "+" (for
backwards compatibility: do not try to search for it in the URI
standard) or "%20" while the "+" character (as a result of this
ambiguity) has to be escaped to "%2B".
This means that the "blue+light blue" string has to be encoded
differently in the path and query parts:
"http://example.com/blue+light%20blue?blue%2Blight+blue". From there
you can deduce that encoding a fully constructed URL is impossible
without a syntactical awareness of the URL structure.
What this boils down to is
you should have %20 before the ? and + after
Source
It shouldn't matter, any more than if you encoded the letter A as %41.
However, if you're dealing with a system that doesn't recognize one form, it seems like you're just going to have to give it what it expects regardless of what the "spec" says.
You can use either - which means most people opt for "+" as it's more human readable.
When encoding query values, either form, plus or percent-20, is valid; however, since the bandwidth of the internet isn't infinite, you should use plus, since it's two fewer bytes.

Resources