Twitter stream API - Erlang client

I'm very new to the Erlang world and I'm trying to write a client for the Twitter Stream API. I'm using httpc:request to make a POST request and I constantly get a 401 error; I'm obviously doing something wrong with how I'm sending the request... What I have looks like this:
fetch_data() ->
    Method = post,
    URL = "https://stream.twitter.com/1.1/statuses/filter.json",
    Headers = "Authorization: OAuth oauth_consumer_key=\"XXX\", oauth_nonce=\"XXX\", oauth_signature=\"XXX%3D\", oauth_signature_method=\"HMAC-SHA1\", oauth_timestamp=\"XXX\", oauth_token=\"XXX-XXXXX\", oauth_version=\"1.0\"",
    ContentType = "application/json",
    Body = "{\"track\":\"keyword\"}",
    HTTPOptions = [],
    Options = [],
    R = httpc:request(Method, {URL, Headers, ContentType, Body}, HTTPOptions, Options),
    R.
At this point I'm confident there's no issue with the signature as the same signature works just fine when trying to access the API with curl. I'm guessing there's some issue with how I'm making the request.
The response I'm getting with the request made the way demonstrated above is:
{ok,{{"HTTP/1.1",401,"Unauthorized"},
[{"cache-control","must-revalidate,no-cache,no-store"},
{"connection","close"},
{"www-authenticate","Basic realm=\"Firehose\""},
{"content-length","1243"},
{"content-type","text/html"}],
"<html>\n<head>\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"/>\n<title>Error 401 Unauthorized</title>\n</head>\n<body>\n<h2>HTTP ERROR: 401</h2>\n<p>Problem accessing '/1.1/statuses/filter.json'. Reason:\n<pre> Unauthorized</pre>\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n</body>\n</html>\n"}}
When trying with curl I'm using this:
curl --request 'POST' 'https://stream.twitter.com/1.1/statuses/filter.json' --data 'track=keyword' --header 'Authorization: OAuth oauth_consumer_key="XXX", oauth_nonce="XXX", oauth_signature="XXX%3D", oauth_signature_method="HMAC-SHA1", oauth_timestamp="XXX", oauth_token="XXX-XXXX", oauth_version="1.0"' --verbose
and I'm getting the events just fine.
Any help on this would be greatly appreciated; I'm new to Erlang and I've been pulling my hair out on this one for quite a while.

There are several issues with your code:
In Erlang you are encoding the parameters as a JSON body, while with curl you are encoding them as form data (application/x-www-form-urlencoded). The Twitter API expects the latter. In fact, you get a 401 because the OAuth signature does not match: you included the track=keyword parameter in the signature computation, while Twitter's server computes the signature without the JSON body, as it should per the OAuth RFC.
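A minimal sketch of the corrected request, assuming AuthHeader is the same pre-computed OAuth header string that works with curl (it is not computed here):

fetch_data(AuthHeader) ->
    URL = "https://stream.twitter.com/1.1/statuses/filter.json",
    %% httpc expects headers as a list of {Field, Value} pairs,
    %% not a single string.
    Headers = [{"Authorization", AuthHeader}],
    %% Form encoding, matching curl's --data and the parameters
    %% included in the OAuth signature.
    ContentType = "application/x-www-form-urlencoded",
    Body = "track=keyword",
    httpc:request(post, {URL, Headers, ContentType, Body}, [], []).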
You are using httpc with default options. This will not work with the streaming API, as the stream never ends: you need to process results as they arrive. For that, pass the {sync, false} option to httpc:request/4. See also the stream and receiver options.
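A minimal sketch of an asynchronous request, again assuming a pre-computed AuthHeader; with {stream, self}, httpc delivers the body to the calling process as messages (the inets application must be started beforehand):

stream_data(AuthHeader) ->
    URL = "https://stream.twitter.com/1.1/statuses/filter.json",
    Headers = [{"Authorization", AuthHeader}],
    Request = {URL, Headers, "application/x-www-form-urlencoded", "track=keyword"},
    {ok, RequestId} =
        httpc:request(post, Request, [], [{sync, false}, {stream, self}]),
    receive_loop(RequestId).

receive_loop(RequestId) ->
    receive
        {http, {RequestId, stream_start, _Headers}} ->
            receive_loop(RequestId);
        {http, {RequestId, stream, BinBodyPart}} ->
            %% A body part may contain zero, one or several tweets;
            %% buffering and JSON decoding are left out of this sketch.
            io:format("received ~p bytes~n", [byte_size(BinBodyPart)]),
            receive_loop(RequestId);
        {http, {RequestId, stream_end, _Headers}} ->
            ok;
        {http, {RequestId, {error, Reason}}} ->
            {error, Reason}
    end.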
Finally, while httpc can work initially for accessing the Twitter streaming API, it brings little value compared to the code you need to develop around it to stream from the Twitter API. Depending on your needs, you might want to replace it with a simple client built directly on ssl, especially since ssl can decode HTTP packets for you (what is left for you is the HTTP chunked encoding).
For example, if your keywords are rare, you might get a timeout from httpc. Besides, without httpc it might be easier to update the list of keywords, or your code, with no downtime.
A streaming client directly based on ssl could be implemented as a gen_server (or a simple process, if you do not follow OTP principles) or, even better, a gen_fsm to implement reconnection strategies. You could proceed as follows (a minimal sketch follows the steps):
Connect using ssl:connect/3,4 specifying that you want the socket to decode the HTTP packets with {packet, http_bin} and you want the socket to be configured in passive mode {active, false}.
Send the HTTP request packet (preferably as an iolist, with binaries) with ssl:send/2,3. It spans several lines separated by CRLF (\r\n): first the request line (GET /1.1/statuses/filter.json?... HTTP/1.1), then the headers including the OAuth headers. Make sure you include Host: stream.twitter.com as well. End with an empty line.
Receive the HTTP response. You can implement this with a loop (since the socket is in passive mode), calling ssl:recv/2,3 until you get http_eoh (end of headers). Note down whether the server will send you data chunked or not by looking at the Transfer-Encoding response header.
Configure the socket in active mode with ssl:setopts/2 and specify you want packets as raw and data in binary format. In fact, if data is chunked, you could continue to use the socket in passive mode. You could also get data line by line or get data as strings. This is a matter of taste: raw is the safest bet, line by line requires that you check the buffer size to prevent truncation of a long JSON-encoded tweet.
Receive data from Twitter as messages sent to your process, either with receive (simple process) or in the handle_info handler (if you implemented this with a gen_server). If data is chunked, you will first receive the chunk size, then the tweet data, and eventually the end of the chunk (cf. RFC 2616). Be prepared for tweets that span several chunks (i.e. maintain some kind of buffer). The best approach here is to do the minimum decoding in this process and send tweets to another process, possibly in binary format.
You should also handle errors and socket being closed by Twitter. Make sure you follow Twitter's guidelines for reconnection.
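A minimal sketch of the connect-and-read-headers part, assuming a pre-computed AuthHeader; chunk decoding, buffering, reconnection, and the OTP wrapping are left out:

%% Assumes the ssl application is started; certificate options omitted.
connect(AuthHeader) ->
    {ok, Sock} = ssl:connect("stream.twitter.com", 443,
                             [binary, {packet, http_bin}, {active, false}]),
    Request = [<<"GET /1.1/statuses/filter.json?track=keyword HTTP/1.1\r\n">>,
               <<"Host: stream.twitter.com\r\n">>,
               <<"Authorization: ">>, AuthHeader, <<"\r\n">>,
               <<"\r\n">>],
    ok = ssl:send(Sock, Request),
    read_headers(Sock, false).

%% Loop until http_eoh, remembering whether the body is chunked.
read_headers(Sock, Chunked) ->
    case ssl:recv(Sock, 0) of
        {ok, {http_response, _Version, 200, _Reason}} ->
            read_headers(Sock, Chunked);
        {ok, {http_response, _Version, Status, _Reason}} ->
            {error, {http_status, Status}};
        {ok, {http_header, _, 'Transfer-Encoding', _, <<"chunked">>}} ->
            read_headers(Sock, true);
        {ok, {http_header, _, _Name, _, _Value}} ->
            read_headers(Sock, Chunked);
        {ok, http_eoh} ->
            %% Headers are done: switch to raw packets in active mode;
            %% body data now arrives as {ssl, Sock, Data} messages.
            ok = ssl:setopts(Sock, [{packet, raw}, {active, true}]),
            {ok, Sock, Chunked};
        {error, Reason} ->
            {error, Reason}
    end.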

Related

Why is Ejabberd handling PUT and POST requests differently?

This seems to cause strange behaviour when a PUT has a body larger than a certain length (in my case 902 bytes): ejabberd truncates the body (in my case it receives malformed JSON).
Github reference: https://github.com/processone/ejabberd/blob/master/src/ejabberd_http.erl#L403
If I change the case statement to:
case Method of
    _AnyMethod ->
        case recv_data(State) of
            {ok, Data} ->
                LQuery = case catch parse_urlencoded(Data) of
                             {'EXIT', _Reason} -> [];
                             LQ -> LQ
                         end,
                {State, {LPath, LQuery, Data}};
            error ->
                {State, false}
        end
end
then the body is parsed correctly.
Is this a configuration issue? How can I force Ejabberd to correctly parse the JSON body?
Looks like you've found a bug.
As you've noticed, for POST requests the function recv_data is called, which checks the Content-Length header and reads that many bytes from the socket. For PUT requests, however, it only uses Trail, which is the data that has already been received while reading the HTTP request headers. (This happens in the receive_headers function, which passes a length of 0 to the recv function, meaning that it won't wait for any specific amount of data.)
How much of the request body is received is going to depend on the size of the headers, as well as the way the client sends the request. If for example the client first sends the headers in one network packet, and then the request body in the next network packet, ejabberd wouldn't pick up the request body at all.
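A hypothetical illustration of the two code paths (not ejabberd's actual code), given a passive-mode gen_tcp socket in binary mode and Trail, the bytes left over after parsing the headers:

%% POST path: wait until the full declared body has arrived.
body_post(Sock, Trail, ContentLength) when byte_size(Trail) < ContentLength ->
    {ok, Rest} = gen_tcp:recv(Sock, ContentLength - byte_size(Trail)),
    <<Trail/binary, Rest/binary>>;
body_post(_Sock, Trail, ContentLength) ->
    binary:part(Trail, 0, ContentLength).

%% PUT path: only what happened to be buffered already, possibly
%% truncated if the body arrived in a later network packet.
body_put(_Sock, Trail, _ContentLength) ->
    Trail.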

iOS / Obj-C - Get HTTP status code response message (reason phrase)

I'm working with an API that sends back HTTP 406's for many different errors, along with a custom message (reason phrase). It may look something like:
406 Not Acceptable: User is already logged in
406 Not Acceptable: Missing password field
406 Not Acceptable: Node does not exist.
I can get the 406 status code and the standard "Not Acceptable" string using:
NSHTTPURLResponse *HTTPResponse = (NSHTTPURLResponse *)response;
NSInteger statusCode = [HTTPResponse statusCode];
[NSHTTPURLResponse localizedStringForStatusCode:HTTPResponse.statusCode];
However I really require the reason phrase message to know how to handle the response. How can I get it, preferably using the standard iOS SDK?
I really require the reason phrase message to know how to handle the response.
Then the API is broken. The reason phrase is a debugging aid only. It's not meant to inform client behaviour.
From RFC 2616 § 6.1.1:
The Status-Code is intended for use by automata and the Reason-Phrase is intended for the human user. The client is not required to examine or display the Reason-Phrase.
If there is information about the response that cannot be conveyed by the status code alone, the proper place for it is as a header or in the response body. The reason phrase is not a correct place to put information necessary for a client to use.
Status code 406 means that the server cannot respond with the accept-header specified in the request.
406 Not Acceptable The requested resource is only capable of generating content not acceptable according to the Accept headers sent in the request.
These error codes are not iOS-specific. If you want to display different messages depending on the underlying reason, check with your API/web server how the cases are distinguished, and use conditions in your code to display your custom message for each of them.
Ultimately, you can get the reason phrase using the ASIHTTPRequest library.
http://allseeing-i.com/ASIHTTPRequest/
It was just as simple to use in my case as AFNetworking and NSURLSession.

python: how to fetch a URL? (with improper response headers)

I want to build a small script in Python which needs to fetch a URL. The server is kind of crappy though and replies with pure ASCII without any headers.
When I try:
import urllib.request
response = urllib.request.urlopen(url)
print(response.read())
I obtain an http.client.BadStatusLine: 100 error because this isn't a properly formatted HTTP response.
Is there another way to fetch a URL and get the raw content, without trying to parse the response?
Thanks
It's difficult to answer your direct question without a bit more information; not knowing exactly how the (web) server in question is broken.
That said, you might try using something a bit lower-level, a socket for example. Here's one way (python2.x style, and untested):
#!/usr/bin/env python
import socket
from urlparse import urlparse

def geturl(url, timeout=10, receive_buffer=4096):
    parsed = urlparse(url)
    try:
        host, port = parsed.netloc.split(':')
    except ValueError:
        host, port = parsed.netloc, 80

    sock = socket.create_connection((host, port), timeout)
    # HTTP requires CRLF line endings; a bare '\n\n' only works with
    # lenient servers.
    sock.sendall('GET %s HTTP/1.0\r\n\r\n' % parsed.path)

    # Read until the server closes the connection (recv returns '').
    response = [sock.recv(receive_buffer)]
    while response[-1]:
        response.append(sock.recv(receive_buffer))
    return ''.join(response)

print geturl('http://www.example.com/')  # the trailing / is needed if no other path element is present
And here's a stab at a python3.2 conversion (you may not need to decode from bytes, if writing the response to a file for example):
#!/usr/bin/env python
import socket
from urllib.parse import urlparse

ENCODING = 'ascii'

def geturl(url, timeout=10, receive_buffer=4096):
    parsed = urlparse(url)
    try:
        host, port = parsed.netloc.split(':')
    except ValueError:
        host, port = parsed.netloc, 80

    sock = socket.create_connection((host, port), timeout)
    request = 'GET %s HTTP/1.0\r\n\r\n' % parsed.path
    sock.sendall(bytes(request, ENCODING))

    # Read until the server closes the connection (recv returns b'').
    response = [sock.recv(receive_buffer)]
    while response[-1]:
        response.append(sock.recv(receive_buffer))
    return ''.join(r.decode(ENCODING) for r in response)

print(geturl('http://www.example.com/'))
HTH!
Edit: You may need to adjust what you put in the request, depending on the web server in question. Guanidene's excellent answer provides several resources to guide you on that path.
What you need to do in this case is send a raw HTTP request using sockets.
You would need to do a bit of low-level network programming using the socket Python module in this case. (Network sockets return all the information sent by the server as it is, so you can interpret the response however you wish. An HTTP client, by contrast, interprets the response in terms of the standard HTTP format: a status line, headers, and a body. The high-level urllib module hides this header information from you and just returns the data.)
You also need some basic information about HTTP requests. For your case, you just need to know about the GET request. See its definition here - http://djce.org.uk/dumprequest - and an example session here - http://en.wikipedia.org/wiki/HTTP#Example_session. (If you wish to capture live traces of the HTTP requests sent by your browser, you will need packet-sniffing software like Wireshark.)
Once you know the basics of the socket module and HTTP requests, you can go through this example - http://coding.debuntu.org/python-socket-simple-tcp-client - which shows how to send an HTTP request over a socket to a server and read the reply back. You can also refer to this unclear question on SO.
(You can google python socket http to get more examples.)
(Tip: I am not a Java fan, but still, if you don't find enough convincing examples on this topic under Python, try finding them under Java, and then translate them to Python accordingly.)
urllib.urlretrieve('http://google.com/abc.jpg', 'abc.jpg')

Streaming Results from Mochiweb

I have written a web service using Erlang and MochiWeb. The web service returns a lot of results and takes some time to finish the computation.
I'd like to return results as soon as the program finds them, instead of returning them only once it has found them all.
Edit:
I found that I can use a chunked response to stream results, but it seems I can't find a way to close the connection. So, any idea on how to close a MochiWeb request?
To stream data of yet unknown size over HTTP 1.1 you can use HTTP chunked transfer encoding. In this encoding, each chunk of data is prefixed with its size in hexadecimal. The last chunk is a zero-length chunk: its size is coded as 0 and it carries no data.
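For example, sending "Hello" (5 bytes) and "World!" (6 bytes) as two chunks, followed by the terminating chunk, looks like this on the wire (with the CRLF line endings shown explicitly):

5\r\n
Hello\r\n
6\r\n
World!\r\n
0\r\n
\r\n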
If the client doesn't support HTTP 1.1, the server can send the data as plain binary and close the connection at the end of the stream.
In MochiWeb it all works as follows:
Start the HTTP response with Response = Request:respond({Code, ResponseHeaders, chunked}). (By the way, look at the code comments);
Chunks can then be sent to the client with Response:write_chunk(Data). To indicate the end of the stream to the client, send a chunk of zero length: Response:write_chunk(<<>>).
When handling of the current request is over, MochiWeb decides whether the connection should be closed or can be reused as an HTTP persistent connection.
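A minimal sketch of a MochiWeb loop handler that streams results as they are produced; produce_results/1 is a hypothetical callback, not part of MochiWeb:

%% MochiWeb request callback, using the old parameterized-module API.
loop(Req) ->
    Response = Req:respond({200, [{"Content-Type", "application/json"}], chunked}),
    %% produce_results/1 (hypothetical) calls Emit(Result) for each
    %% result as soon as it is computed.
    Emit = fun(Result) -> Response:write_chunk(Result) end,
    produce_results(Emit),
    %% A zero-length chunk ends the stream; MochiWeb then decides
    %% whether to keep the connection alive or close it.
    Response:write_chunk(<<>>).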

Supporting the "Expect: 100-continue" header with ASP.NET MVC

I'm implementing a REST API using ASP.NET MVC, and a little stumbling block has come up in the form of the Expect: 100-continue request header for requests with a post body.
RFC 2616 states that:
Upon receiving a request which includes an Expect request-header field with the "100-continue" expectation, an origin server MUST either respond with 100 (Continue) status and continue to read from the input stream, or respond with a final status code. The origin server MUST NOT wait for the request body before sending the 100 (Continue) response. If it responds with a final status code, it MAY close the transport connection or it MAY continue to read and discard the rest of the request. It MUST NOT perform the requested method if it returns a final status code.
This sounds to me like I need to make two responses to the request, i.e. it needs to immediately send an HTTP 100 Continue response, then continue reading from the original request stream (i.e. HttpContext.Request.InputStream) without ending the request, and finally send the resultant status code (for the sake of argument, let's say it's a 204 No Content result).
So, questions are:
Am I reading the specification right, that I need to make two responses to a request?
How can this be done in ASP.NET MVC?
w.r.t. (2) I have tried using the following code before proceeding to read the input stream...
HttpContext.Response.StatusCode = 100;
HttpContext.Response.Flush();
HttpContext.Response.Clear();
...but when I try to set the final 204 status code I get the error:
System.Web.HttpException: Server cannot set status after HTTP headers have been sent.
The .NET Framework by default always sends the Expect: 100-continue header for every HTTP 1.1 POST. This behavior can be controlled programmatically per request via the System.Net.ServicePoint.Expect100Continue property, like so:
HttpWebRequest httpReq = GetHttpWebRequestForPost();
httpReq.ServicePoint.Expect100Continue = false;
It can also be globally controlled programmatically:
System.Net.ServicePointManager.Expect100Continue = false;
...or globally through configuration:
<system.net>
  <settings>
    <servicePointManager expect100Continue="false"/>
  </settings>
</system.net>
Thank you Lance Olson and Phil Haack for this info.
100-continue should be handled by IIS. Is there a reason why you want to do this explicitly?
IIS handles the 100.
That said, no, it's not two responses. In HTTP, when Expect: 100-continue comes in as part of the message headers, the client should wait until it receives the interim 100 (Continue) response before sending the content.
Because of the way ASP.NET is architected, you have little control over the output stream. Any data that gets written to the stream is automatically put in a 200 response with chunked encoding whenever you flush, whether you're in buffered mode or not.
Sadly, all this stuff is hidden away in internal methods all over the place, and the result is that if you rely on ASP.NET, as MVC does, you're pretty much unable to bypass it.
Wait till you try and access the input stream in a non-buffered way. A whole load of pain.
Seb
