Why is Ejabberd handling PUT and POST requests differently? - erlang

This seems to cause strange behaviour when PUT has a body larger than a certain length (in my case it is 902 bytes), i.e. ejabberd trims the body (in my case it receives malformed JSON).
Github reference: https://github.com/processone/ejabberd/blob/master/src/ejabberd_http.erl#L403
If I change the case statement to:
case Method of
    _AnyMethod ->
        case recv_data(State) of
            {ok, Data} ->
                LQuery = case catch parse_urlencoded(Data) of
                             {'EXIT', _Reason} -> [];
                             LQ -> LQ
                         end,
                {State, {LPath, LQuery, Data}};
            error ->
                {State, false}
        end
end
then the body is parsed correctly.
Is this a configuration issue? How can I force Ejabberd to correctly parse the JSON body?

Looks like you've found a bug.
As you've noticed, for POST requests the function recv_data is called, which checks the Content-Length header and reads that many bytes from the socket. For PUT requests, however, it only uses Trail, which is the data that had already been received while reading the HTTP request headers. (This happens in the receive_headers function, which passes a length of 0 to the recv function, meaning that it won't wait for any specific amount of data.)
How much of the request body is received is going to depend on the size of the headers, as well as the way the client sends the request. If for example the client first sends the headers in one network packet, and then the request body in the next network packet, ejabberd wouldn't pick up the request body at all.
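To make the difference concrete, here is a minimal, language-neutral sketch in Python (not ejabberd's code) of what reading a body by Content-Length involves: start from the trailing bytes that arrived with the headers, then keep receiving until the declared length is reached. The names read_body, sock_recv and fake_recv are hypothetical and exist only for this illustration.

```python
import io


def read_body(sock_recv, trail: bytes, content_length: int) -> bytes:
    """Return exactly content_length body bytes.

    trail is whatever body data arrived together with the headers;
    sock_recv(n) is a hypothetical callable that returns up to n more
    bytes, or b"" once the peer has closed the connection.
    """
    body = bytearray(trail[:content_length])
    while len(body) < content_length:
        chunk = sock_recv(content_length - len(body))
        if not chunk:
            raise ConnectionError("connection closed before the full body arrived")
        body += chunk
    return bytes(body)


# Simulate a client whose body arrives in several network packets.
payload = b'{"key": "' + b"x" * 1000 + b'"}'
trail = payload[:100]              # only 100 bytes arrived with the headers
pending = io.BytesIO(payload[100:])


def fake_recv(n: int) -> bytes:
    # Deliver at most 400 bytes per call, like a fragmented TCP stream.
    return pending.read(min(n, 400))


body = read_body(fake_recv, trail, len(payload))
```

Using only trail here would yield the first 100 bytes of the JSON document, which is the kind of truncation described in the question.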

Related

TIdHTTP - Get only Responsecode

I am using the TIdHTTP component and its GET function.
The GET function sends a complete request, which is fine.
However, I would like to save some traffic on the GET response and only receive the response code, which is in the first "line" of an HTTP response.
Is it possible to drop the connection after that, so that no further content is transferred?
As mentioned, I only need the response code from a website.
Alternatively, I thought about using Indy's TCP component (with the SSL IOHandler), crafting my own HTTP request header, and then receiving the response code and disconnecting on success, but I don't know how to do that.
TIdHTTP has an OnHeadersAvailable event that is intended for this very task. It is triggered after the response headers have been read and before the body content is read, if any. It has a VContinue output parameter that you can set to False to cancel any further reading.
Update: Something I just discovered: When setting VContinue=False in the OnHeadersAvailable event, TIdHTTP will set Response.KeepAlive=False and skip reading the response body (OK so far), but after the response is done being processed, TIdHTTP checks the KeepAlive value, and the property getter returns True if the socket hasn't been closed on the server's end (HTTP 1.1 uses keep-alives by default). This causes TIdHTTP to not close its end of the socket, and will leave any response body unread. If you then re-use the same TIdHTTP object for a new HTTP request, it will end up processing any unread body data from the previous response before it sees the response headers of the new request.
You can work around this issue by setting the Request.Connection property to 'close' before calling TIdHTTP.Get(). That tells the server to close its end of the socket connection after sending the response (although, I just found that when requesting an HTTPS url, especially after an HTTP request directs to HTTPS, TIdHTTP clears the Request.Connection value!). Or, simply call TIdHTTP.Disconnect() after TIdHTTP.Get() exits.
I have now updated TIdHTTP to:
1. no longer clear the Request.Connection when preparing an HTTPS request.
2. close its end of the socket connection if either:
   a. OnHeadersAvailable returns VContinue=False
   b. the Request.Connection property (or, if connected to a proxy, the Request.ProxyConnection property) has been set to 'close', regardless of the server's response.
Usually you would use TIdHttp.Head, because HEAD requests are intended for doing just that.
If the server does not accept HEAD requests like in OP's case, you can assign the OnWorkBegin event of your TIdHttp instance, and call TIdHttp(Sender).Disconnect; there. This immediately closes the connection, the download does not continue, but you still have the meta data like response code, content length etc.

Can an HTTP 100 Continue response have a message body?

I am writing an HTTP proxy in Delphi 6 using the Synapse library.
I know that a regular response has the following syntax:
A Status-line
Zero or more header (General|Response|Entity) fields followed by CRLF
An empty line indicating the end of the header fields
Optionally a message-body
But 100 Continue is not a regular response; it is just an interim response that tells the client to continue, and it must be followed by a final regular response.
So, should I expect a body in a 100 Continue response?
No, 1xx status responses must not have a body. See http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-26.html#rfc.section.3.3.p.5:
"The presence of a message body in a response depends on both the request method to which it is responding and the response status code (Section 3.1.2). Responses to the HEAD request method (Section 4.3.2 of [Part2]) never include a message body because the associated response header fields (e.g., Transfer-Encoding, Content-Length, etc.), if present, indicate only what their values would have been if the request method had been GET (Section 4.3.1 of [Part2]). 2xx (Successful) responses to a CONNECT request method (Section 4.3.6 of [Part2]) switch to tunnel mode instead of having a message body. All 1xx (Informational), 204 (No Content), and 304 (Not Modified) responses do not include a message body. All other responses do include a message body, although the body might be of zero length."

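For the 100 Continue question above, the rule quoted from the HTTP draft can be restated as a small predicate, which is roughly what a proxy needs in order to decide whether to look for a body after the header block. This is a minimal sketch in Python rather than Delphi; the function name response_has_body is made up for the illustration, and it only encodes the cases listed in the quoted paragraph.

```python
def response_has_body(request_method: str, status: int) -> bool:
    """Decide whether a response carries a message body.

    Encodes only the cases from the quoted paragraph; a body that is
    present may still have zero length.
    """
    method = request_method.upper()
    if method == "HEAD":
        return False                      # headers only, never a body
    if method == "CONNECT" and 200 <= status < 300:
        return False                      # connection switches to tunnel mode
    if 100 <= status < 200 or status in (204, 304):
        return False                      # 1xx, No Content, Not Modified
    return True


# A proxy that sees "100 Continue" should go straight on to the next
# status line instead of trying to read a body.
continue_has_body = response_has_body("POST", 100)
final_has_body = response_has_body("POST", 200)
```
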
Can't get request body in onresponse hook

I want to log all requests along with responses to db. I'm using hooks for that. But it looks like I can't get request body in 'onresponse' hook, it's always <<>>. In 'onrequest' hook I can get request body.
My hooks defined as:
request_hook(Req) ->
    %% All is OK: ReqBody contains what I sent:
    {ok, ReqBody, Req2} = cowboy_req:body(Req),
    io:format("request_hook: body = ~p", [ReqBody]),
    Req2.
response_hook(_Status, _Headers, _Body, Req) ->
    %% ReqBody is always <<>> at this point. Why?
    {ok, ReqBody, Req2} = cowboy_req:body(Req),
    io:format("response_hook: body = ~p", [ReqBody]),
    Req2.
Is this a bug in cowboy or normal behaviour?
I'm using the latest cowboy available at the time of writing this post (commit: aab63d605c595d8d0cd33646d13942d6cb372b60).
Recent versions of Cowboy (since v0.8.2, as far as I know) use the following approach to improve performance: cowboy_req:body(Req) returns the body together with a new Req structure that no longer contains the request body. In other words, this is normal behaviour, and you can retrieve the request body only once.
Cowboy does not read the request body up front, because it can be huge; the body stays in the socket until it is needed (that is, until cowboy_req:body/1 is called).
For the same reason, once you have retrieved the body in a hook, it is no longer available in the handler.
So if you want to implement logging and still have the body available in the handler, you can save the body to a shared location on request and explicitly remove it on response.
request_hook(Req) ->
    %% Limit the maximum body length for security reasons;
    %% here we expect the body to be smaller than 80000 bytes.
    {ok, Body, Req2} = cowboy_req:body(80000, Req),
    put(req_body, Body), %% store the body in the process dictionary
    Req2.

response_hook(_RespCode, _RespHeaders, _RespBody, Req) ->
    %% The request body saved by request_hook/1 is available here for logging.
    ReqBody = get(req_body),
    io:format("response_hook: request body = ~p~n", [ReqBody]),
    Req.

%% Clean up the body stored in the process dictionary,
%% since Cowboy uses one process for several
%% requests in keep-alive mode.
terminate(_Reason, _Req, _St) ->
    erase(req_body),
    ok.

Twitter stream API - Erlang client

I'm very new to the Erlang world and I'm trying to write a client for the Twitter Stream API. I'm using httpc:request to make a POST request and I constantly get a 401 error, so I'm obviously doing something wrong in how I'm sending the request... What I have looks like this:
fetch_data() ->
    Method = post,
    URL = "https://stream.twitter.com/1.1/statuses/filter.json",
    Headers = "Authorization: OAuth oauth_consumer_key=\"XXX\", oauth_nonce=\"XXX\", oauth_signature=\"XXX%3D\", oauth_signature_method=\"HMAC-SHA1\", oauth_timestamp=\"XXX\", oauth_token=\"XXX-XXXXX\", oauth_version=\"1.0\"",
    ContentType = "application/json",
    Body = "{\"track\":\"keyword\"}",
    HTTPOptions = [],
    Options = [],
    R = httpc:request(Method, {URL, Headers, ContentType, Body}, HTTPOptions, Options),
    R.
At this point I'm confident there's no issue with the signature as the same signature works just fine when trying to access the API with curl. I'm guessing there's some issue with how I'm making the request.
The response I'm getting with the request made the way demonstrated above is:
{ok,{{"HTTP/1.1",401,"Unauthorized"},
[{"cache-control","must-revalidate,no-cache,no-store"},
{"connection","close"},
{"www-authenticate","Basic realm=\"Firehose\""},
{"content-length","1243"},
{"content-type","text/html"}],
"<html>\n<head>\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"/>\n<title>Error 401 Unauthorized</title>\n</head>\n<body>\n<h2>HTTP ERROR: 401</h2>\n<p>Problem accessing '/1.1/statuses/filter.json'. Reason:\n<pre> Unauthorized</pre>\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n</body>\n</html>\n"}}
When trying with curl I'm using this:
curl --request 'POST' 'https://stream.twitter.com/1.1/statuses/filter.json' --data 'track=keyword' --header 'Authorization: OAuth oauth_consumer_key="XXX", oauth_nonce="XXX", oauth_signature="XXX%3D", oauth_signature_method="HMAC-SHA1", oauth_timestamp="XXX", oauth_token="XXX-XXXX", oauth_version="1.0"' --verbose
and I'm getting the events just fine.
Any help on this would be greatly appreciated, new with Erlang and I've been pulling my hair out on this one for quite a while.
There are several issues with your code:
In Erlang you are encoding parameters as a JSON body while with curl, you are encoding them as form data (application/x-www-form-urlencoded). Twitter API expects the latter. In fact, you get a 401 because the OAuth signature does not match, as you included the track=keyword parameter in the computation while Twitter's server computes it without the JSON body, as it should per OAuth RFC.
You are using httpc with default options. This will not work with the streaming API as the stream never ends. You need to process results as they arrive. For this, you need to pass {sync, false} option to httpc. See also stream and receiver options.
Finally, while httpc can work initially to access the Twitter streaming API, it brings little value compared to the code you need to develop around it to stream from the Twitter API. Depending on your needs, you might want to replace it with a simple client built directly on ssl, especially considering that ssl can decode HTTP packets (what is left for you is the HTTP chunked encoding).
For example, if your keywords are rare, you might get a timeout from httpc. Besides, it might be easier to update the list of keywords or your code with no downtime without httpc.
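The first issue above (JSON body versus form data) is easy to see by building both bodies from the same parameters. This is a small sketch in Python rather than Erlang, purely to show the two encodings side by side; curl's --data option sends the form-encoded variant with Content-Type application/x-www-form-urlencoded.

```python
import json
from urllib.parse import urlencode

params = {"track": "keyword"}

# What curl --data 'track=keyword' sends.
form_body = urlencode(params)
form_content_type = "application/x-www-form-urlencoded"

# What the Erlang code in the question sends instead.
json_body = json.dumps(params)
json_content_type = "application/json"

print(form_content_type, form_body)
print(json_content_type, json_body)
```

Both bodies carry the same parameter, but only the form-encoded one matches the request the signature was computed for.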
A streaming client directly based on ssl could be implemented as a gen_server (or a simple process, if you do not follow OTP principles) or even better a gen_fsm to implement reconnection strategies. You could proceed as follows:
1. Connect using ssl:connect/3,4, specifying that you want the socket to decode the HTTP packets with {packet, http_bin} and that you want the socket to be configured in passive mode with {active, false}.
2. Send the HTTP request packet (preferably as an iolist, with binaries) with ssl:send/2,3. It should span several lines separated by CRLF (\r\n), with first the query line (GET /1.1/statuses/filter.json?... HTTP/1.1) and then the headers, including the OAuth headers. Make sure you include Host: stream.twitter.com as well. End with an empty line.
3. Receive the HTTP response. You can implement this with a loop (since the socket is in passive mode), calling ssl:recv/2,3 until you get http_eoh (end of headers). Note whether the server will send you chunked data or not by looking at the Transfer-Encoding response header.
4. Configure the socket in active mode with ssl:setopts/2 and specify that you want packets as raw and data in binary format. In fact, if data is chunked, you could continue to use the socket in passive mode. You could also get data line by line or get data as strings. This is a matter of taste: raw is the safest bet, while line by line requires that you check the buffer size to prevent truncation of a long JSON-encoded tweet.
5. Receive data from Twitter as messages sent to your process, either with receive (simple process) or in the handle_info handler (if you implemented this with a gen_server). If data is chunked, you will first receive the chunk size, then the tweets, and finally the end of the chunk (cf. RFC 2616). Be prepared for tweets that span several chunks (i.e. maintain some kind of buffer). The best approach here is to do the minimum decoding in this process and send tweets to another process, possibly in binary format.
You should also handle errors and socket being closed by Twitter. Make sure you follow Twitter's guidelines for reconnection.
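The buffering described in the last step is the part that is easiest to get wrong, so here is a minimal sketch of it, written in Python rather than Erlang to keep it short. It assumes tweets are JSON objects separated by \r\n; ChunkedDecoder, split_lines and chunk are hypothetical names, and chunk trailers and error handling are left out.

```python
class ChunkedDecoder:
    """Incrementally strips HTTP chunk framing from arbitrary fragments."""

    def __init__(self):
        self._buf = b""
        self._remaining = None   # None means the next thing is a size line
        self.done = False

    def feed(self, data: bytes) -> bytes:
        self._buf += data
        out = bytearray()
        while self._buf and not self.done:
            if self._remaining is None:
                line, sep, rest = self._buf.partition(b"\r\n")
                if not sep:
                    break                      # size line not complete yet
                size = int(line.split(b";", 1)[0].strip(), 16)
                self._buf = rest
                if size == 0:
                    self.done = True           # zero-length chunk ends the stream
                else:
                    self._remaining = size
            elif self._remaining > 0:
                take = self._buf[:self._remaining]
                out += take
                self._buf = self._buf[len(take):]
                self._remaining -= len(take)
            else:
                if len(self._buf) < 2:
                    break                      # CRLF after chunk data incomplete
                self._buf = self._buf[2:]
                self._remaining = None
        return bytes(out)


def split_lines(buffer: bytes):
    """Return (complete non-empty lines, unfinished remainder)."""
    *lines, rest = buffer.split(b"\r\n")
    return [line for line in lines if line], rest


def chunk(data: bytes) -> bytes:
    return b"%x\r\n%s\r\n" % (len(data), data)


# Two tweets, with the second one spanning two chunks.
stream = b'{"text":"first"}\r\n{"text":"second"}\r\n'
wire = chunk(stream[:25]) + chunk(stream[25:]) + b"0\r\n\r\n"

decoder, pending, tweets = ChunkedDecoder(), b"", []
for i in range(0, len(wire), 7):               # data arrives in 7-byte fragments
    pending += decoder.feed(wire[i:i + 7])
    complete, pending = split_lines(pending)
    tweets.extend(complete)
```

Keeping the chunk framing and the tweet delimiting in two separate buffers mirrors the advice above: do the minimum decoding in the socket-owning process and hand complete tweets to another one.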

Streaming Results from Mochiweb

I have written a web-service using Erlang and Mochiweb. The web service returns a lot of results and takes some time to finish the computation.
I'd like to return each result as soon as the program finds it, instead of returning them all once the computation has finished.
Edit:
I found that I can use a chunked response to stream results, but I can't find a way to close the connection. Any idea on how to close a MochiWeb request?
To stream data of as yet unknown size with HTTP 1.1, you can use HTTP chunked transfer encoding. In this encoding, each chunk of data is preceded by its size in hexadecimal. The last chunk is a zero-length chunk, with the chunk size coded as 0 and without any data.
If the client doesn't support HTTP 1.1, the server can send the data as binary chunks and close the connection at the end of the stream.
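To show what this framing looks like on the wire, here is a minimal sketch in Python (not MochiWeb code; encode_chunk and encode_chunked are names made up for the illustration). Each chunk is its size in hexadecimal, CRLF, the data, CRLF, and the stream ends with a zero-length chunk.

```python
def encode_chunk(data: bytes) -> bytes:
    """Frame one chunk: hex size, CRLF, data, CRLF."""
    return b"%x\r\n%s\r\n" % (len(data), data)


def encode_chunked(parts):
    """Yield the wire form of each part, then the terminating chunk."""
    for part in parts:
        if part:                      # an empty part would read as end of stream
            yield encode_chunk(part)
    yield encode_chunk(b"")           # zero-length chunk: 0 CRLF CRLF


wire = b"".join(encode_chunked([b"result 1\n", b"result 2\n"]))
print(wire)
```

Because an empty chunk is the end-of-stream marker, the sketch skips empty parts instead of sending them mid-stream.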
In MochiWeb it works as follows:
Start the HTTP response with Response = Request:respond({Code, ResponseHeaders, chunked}). (By the way, have a look at the comments in the code.)
Chunks can then be sent to the client with Response:write_chunk(Data). To signal the end of the stream to the client, send a zero-length chunk: Response:write_chunk(<<>>).
When handling of the current request is over, MochiWeb decides whether the connection should be closed or can be reused as an HTTP persistent connection.
