Sending non-ASCII characters to a Zapier catch hook

A Zapier webhook has been set up to catch JSON sent to it.
The issue is that if the JSON contains any non-standard characters, e.g. accented characters, the hook never catches the data (no error is displayed, it just doesn't log anything).
If the catch hook is switched to a 'catch raw hook' then the data is received, but I then don't know how to transform the raw data into JSON for later steps. With the catch raw hook, the data caught looks like the following (with a special character ø in the name value):
raw_body: [{"id":2426,"name":"James Hømmett"}]
headers__http_host: hooks.zapier.com
headers__http_x_request_id: b8578a4455fea95c3287e939e304752c
headers__http_x_real_ip: [redacted IP address]
headers__http_x_forwarded_for: [redacted IP address]
headers__http_x_forwarded_host: hooks.zapier.com
headers__http_x_forwarded_port: 443
headers__http_x_forwarded_proto: https
headers__http_x_scheme: https
headers__http_x_original_forwarded_for: [redacted IP address]
headers__content_length: 559
headers__http_accept_encoding: gzip,deflate
headers__content_type: application/json; charset=utf-8
headers__http_user_agent: Apache-HttpClient/4.5.13 (Java/11.0.9.1)
As you can see, charset=utf-8 is specified in the Content-Type header.
The JSON validates with jsonlint.com
Any ideas?

If you're on a paid account, you can add a Code by Zapier step that returns JSON.parse(inputData.raw_body), so that the data is available in future steps.
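If you'd rather use the Python flavour of Code by Zapier, here's a minimal sketch of the same idea. It assumes you've mapped the raw hook's raw_body field into the Code step's input data under the key raw_body; the step's result is whatever you assign to output.

import json

# Code by Zapier (Python) sketch - assumes the raw hook's raw_body field was
# mapped into this step's input_data as "raw_body".
records = json.loads(input_data["raw_body"])   # e.g. [{"id":2426,"name":"James Hømmett"}]

# Zapier expects `output` to be a dict or a list of dicts; later steps can
# then reference id and name as ordinary fields.
output = records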
But not handling non-ASCII characters is likely a bug, so it's worth reaching out to support if you haven't already: https://zapier.com/contact

Related

SendGrid Inbound Parse Japanese (encoding: shift_jis) text garbled

Currently we are using SendGrid Inbound Parse to receive emails.
We handle the Inbound Parse webhook request with an Azure HttpTrigger function implemented in C# (.NET 6).
When the received email is in UTF-8 encoding, everything's okay.
However, when we tried to receive an email in shift_jis encoding, the headers are okay,
but the Japanese characters in text and html are garbled.
From the Inbound Parse request, we got the charsets as below:
subject: UTF-8
to: UTF-8
from: UTF-8
cc: UTF-8
html: shift_jis
text: shift_jis
And the string we got directly from request.form["text"] (or "html") was already garbled, like "�e�L�X�gshiftJis-007"
(it should be "テキストshiftJis-007"), so we cannot use the string from the request directly.
Then we tried to convert it (with the System.Text.Encoding.Convert method) from the charset encoding (shift_jis) to UTF-8,
and the result was different from the original string but still unreadable: "?e?L?X?gshiftJis-007".
Our questions are:
When using a C# HttpTrigger Azure Function to handle the Inbound Parse webhook request
(the request data is passed through AspNetCore):
What encoding is the html/text string in the Inbound Parse webhook request
when the email is sent in an encoding other than UTF-8?
How do we read text and html in shift_jis encoding (or other encodings besides UTF-8)
correctly from an Inbound Parse webhook request?
Twilio Developer Evangelist here. I would recommend reaching out to the support team, because this requires investigating the payload to figure out what is going on.
I also tried to replicate the issue on my end using the send_raw option. Here's the payload, and it does contain shift_jis characters. You may be able to process the payload manually.
(stripped X-Mailer info)
'Content-Type: text/plain; charset="shift_jis"\n' +
'X-Mailer: \n' +
'Content-Transfer-Encoding: quoted-printable\n' +
'\n' +
'\n' +
'=83e=83L=83X=83gshiftJis-007\n'
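For illustration only (outside the C#/Azure context of the question), here is a minimal Python sketch showing that the quoted-printable body above does round-trip back to the original Japanese text once the bytes are interpreted as shift_jis:

import quopri

# Quoted-printable body from the raw payload above.
raw = b"=83e=83L=83X=83gshiftJis-007"

decoded = quopri.decodestring(raw)   # undo quoted-printable -> raw shift_jis bytes
text = decoded.decode("shift_jis")   # decode with the charset declared in the MIME part
print(text)                          # テキストshiftJis-007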

How to use "quoted-printable" content-transfer-encoding with BizTalk AS2 receiving?

I'm currently using BizTalk Server 2013 R2 to exchange EDI as well as non-EDI documents using AS2 with a number of different trading partners. I recently added a new trading partner and after receiving a number of documents successfully I started seeing this error occur every now and then:
An output message of the component "Microsoft.BizTalk.EdiInt.PipelineComponents" in receive pipeline "Microsoft.BizTalk.EdiInt.DefaultPipelines.AS2Receive, Microsoft.BizTalk.Edi.EdiIntPipelines, Version=3.0.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" is suspended due to the following error: The content transfer encoding quoted-printable is not supported..
The sequence number of the suspended message is 2.
After some investigation I found that the AS2 platform of the trading partner in question will sometimes set the Content-Transfer-Encoding of the MIME body part to quoted-printable when the enclosed XML payload contains non-ASCII characters. When this happens the message is suspended (non-resumable) with the error above.
Messages received from this trading partner are encrypted and signed, but not compressed, and are received using an HTTP request-response (two-way) port configured with the out-of-the-box AS2Receive pipeline. I've tried using a custom pipeline with the AS2 decoder, S/MIME decoder and AS2 disassembler components, but this does not seem to have any effect; the error stays the same.
I've also tried receiving unencrypted messages from the trading partner (by mutual agreement) but seem to be doing something wrong here as well, as the message passed to the Message Box then ends up not being disassembled properly (the MIME part boundaries and AS2 signature are still visible in the actual message payload). Since the trading partner won't allow sending of unencrypted messages in a production environment anyway, I need to get this working with encryption. They also cannot change their platform's behavior, as this will reportedly affect all of their other trading partners.
Here are the unfolded HTTP headers (ellipses denote redacted values) of the encrypted and signed AS2 message at the point of being suspended:
Date: Mon, 20 Jan 2020 17:30:53 GMT
Content-Length: 8014
Content-Type: application/pkcs7-mime; name="smime.p7m"; smime-type=enveloped-data
From: ...
Host: ...
User-Agent: Jakarta Commons-HttpClient/3.1
AS2-To: ...
Subject: AS2 Message from ... to ...
Message-Id: <1C20200120-173053-740219#xxx.xxx.130.163>
Disposition-Notification-To: <mailto:...> ...
Disposition-Notification-Options: signed-receipt-protocol=optional, pkcs7-signature; signed-receipt-micalg=optional, sha1
AS2-From: ...
AS2-Version: 1.1
content-disposition: attachment; filename="smime.p7m"
X-Original-URL: /as2
Here is the unencrypted payload (ellipses denote redacted content) when the exact same message is sent from the source party without encryption:
------=_Part_16155_1587439544.1579506174880
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
...
------=_Part_16155_1587439544.1579506174880
Content-Type: application/pkcs7-signature; name=smime.p7s; smime-type=signed-data
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="smime.p7s"
Content-Description: S/MIME Cryptographic Signature
...
------=_Part_16155_1587439544.1579506174880--
Question: does BizTalk Server support the quoted-printable encoding method? If it does, what am I doing wrong? If it does not, what are my options in terms of a workaround?
For anyone else who may encounter this same issue, I thought I'd share the solution I ended up with.
Since the error was encountered during AS2 receive pipeline processing, my solution naturally focused on creating a custom receive pipeline component that does more or less the same as the out-of-the-box AS2 decoder component, but with support for the quoted-printable encoding method:
1. Decode and decrypt the CMS/PKCS#7 data envelope
This is actually the easiest step with only 5 lines of code:
EnvelopedCms envelopedCms = new EnvelopedCms();
envelopedCms.Decode(encryptedData);
envelopedCms.Decrypt();
byte[] decryptedData = envelopedCms.Encode();
string decryptedMessageString = Encoding.ASCII.GetString(decryptedData);
- encryptedData is a byte array instantiated from the body-part data stream of the AS2 message received by the HTTP adapter.
- The Decrypt method automatically searches the user and computer certificate stores for the appropriate certificate private key and uses it to decrypt the AS2 payload. For more information on the EnvelopedCms class follow this link.
2. Convert any quoted-printable content in the payload to normal UTF-8 text
First we have to get the MIME boundary name from the content type string at the beginning of the decrypted payload:
int firstBlankLineInMessage = decryptedMessageString.IndexOf(Environment.NewLine + Environment.NewLine);
string contentType = decryptedMessageString.Substring(0, firstBlankLineInMessage);
Regex boundaryRegex = new Regex("boundary=\"(?<boundary>.*)\"");
Match boundaryMatch = boundaryRegex.Match(contentType);
if (!boundaryMatch.Success)
throw new Exception("Failed to get boundary name from content type");
string boundary = "--" + boundaryMatch.Groups["boundary"].Value;
Then we split the envelope and re-merge without the content-type header part:
string[] messageParts = decryptedMessageString.Split(new string[] {boundary}, StringSplitOptions.RemoveEmptyEntries);
string signedMessageString = boundary + messageParts[1] + boundary + messageParts[2] + boundary + "--\r\n";
Next we get the Content-Transfer-Encoding value from the MIME body-part headers:
int firstBlankLineInBodyPart = messageParts[1].IndexOf(Environment.NewLine + Environment.NewLine);
string partHeaders = messageParts[1].Substring(0, firstBlankLineInBodyPart);
Regex cteRegex = new Regex("Content-Transfer-Encoding: (?<cte>.*)");
Match cteMatch = cteRegex.Match(partHeaders);
if (!cteMatch.Success)
throw new Exception("Failed to get CTE from body part headers");
string cte = cteMatch.Groups["cte"].Value;
string payload = messageParts[1].Substring(firstBlankLineInBodyPart).Trim();
And finally we check the CTE and decode if necessary:
if (cte == "quoted-printable")
{
// Get the charset from the body-part Content-Type header
Regex charsetRegex = new Regex("Content-Type: .*charset=(?<charset>.*)");
Match charsetMatch = charsetRegex.Match(partHeaders);
if (!charsetMatch.Success)
throw new Exception("Failed to get charset from body part headers");
string charset = charsetMatch.Groups["charset"].Value;
// Keep the decoded result (assumes QuotedPrintableDecode returns the decoded string)
payload = QuotedPrintableDecode(payload, charset);
}
Note: There are many different implementations out there for decoding QP, including a .NET implementation that has (reportedly) been found buggy by some users. I decided to use this implementation shared by Gonzalo.
3. Update the Content-Type HTTP header and BizTalk message body-part stream
string httpHeaders = objHttpHeaders.ToString().Replace("Content-Type: application/pkcs7-mime; name=\"smime.p7m\"; smime-type=enveloped-data", "Content-Type: application/xml");
inMessage.Context.Write("InboundHttpHeaders", "http://schemas.microsoft.com/BizTalk/2003/http-properties", httpHeaders);
MemoryStream payloadStream = new MemoryStream(Encoding.UTF8.GetBytes(payload));
payloadStream.Seek(0, SeekOrigin.Begin);
pipelineContext.ResourceTracker.AddResource(payloadStream);
inMessage.BodyPart.Data = payloadStream;
- pipelineContext is the IPipelineContext variable passed to the Execute method of the custom pipeline component
- inMessage is the IBaseMessage variable passed to the Execute method
Last Thoughts
The code above can still be improved in a number of ways:
Checking HTTP headers for encryption before attempting to decrypt
Re-encrypting payload before passing message to AS2 disassembler component (if required by BizTalk party configuration)
Adding support for compression
If you'd like a copy of the source code drop me a message and I'll see about upping it to an online repo.
I had a ticket opened with Microsoft BizTalk tech support on the issue. Their response was that
"The quoted-printable encoding is not supported by MS BizTalk Server 2013 R2" and most likely it is not supported by MS BizTalk Server 2020 either.

Problem tweeting with ESP8266 via ThingSpeak

I programmed my ESP8266 to read the soil moisture. Depending on the moisture, a water pump gets activated. Now I want the ESP to tweet different sentences, depending on the situation.
Therefore I connected my Twitter account to thingspeak.com and followed this code.
Connecting to the internet works fine.
Problems:
It does not tweet every time, and when it does tweet, only the first word of a sentence shows up on Twitter.
According to the forum where I found the code, I already tried to replace all the spaces between the words with "%20". However, then nothing shows up on Twitter at all. Also, single words are not always posted to Twitter.
This is the code I have problems with:
// if connection to thingspeak.com is successful, send your tweet!
if (client.connect("184.106.153.149", 80))
{
client.print("GET /apps/thingtweet/1/statuses/update?key=" + API + "&status=" + tweet + " HTTP/1.1\r\n");
client.print("Host: api.thingspeak.com\r\n");
client.print("Accept: */*\r\n");
client.print("User-Agent: Mozilla/4.0 (compatible; esp8266 Lua; Windows NT 5.1)\r\n");
client.print("\r\n");
Serial.println("tweeted " + tweet);
}
I don't get any error messages.
Maybe you could help me make it visible whether the tweet was really sent, and show me how to tweet a whole sentence.
I am using the Arduino IDE version 1.8.9 and I am uploading to this board
The rest of the code works fine. The only problem is the tweeting.
Update
I now tried a few different things:
Checking server response
Works and helps a lot. The results are:
Single words as String don't get any response at all
Same for Strings like "Test%20Tweet"
Strings with multiple words like "Test Tweet" get the following response and the first word of the String shows up as a tweet
HTTP/1.1 200 OK
Server: nginx/1.7.5
Date: Wed, 19 Jun 2019 18:44:22 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 1
Connection: keep-alive
Status: 200 OK
X-Frame-Options: SAMEORIGIN
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, PUT, OPTIONS, DELETE, PATCH
Access-Control-Allow-Headers: origin, content-type, X-Requested-With
Access-Control-Max-Age: 1800
ETag: W/"RANDOM_CHARS"
Cache-Control: max-age=0, private, must-revalidate
X-Request-Id: THE_ID
1
I think the Content-Length might be the problem?
But I don't know how to change it in this code.
Checking if the connection succeeded
I implemented this in my code and it never shows up on the monitor. So I think I never have a problem with connecting.
Use a hostname instead of IP address
I tried it and never got a bad request. On the other hand, nothing shows up on Twitter at all.
Check if your tweet variable contains any new-line characters (carriage return or line feed). For example, the following variable would cause problems
String tweet = "Tweet no. 1\r\n";
due to the new-line characters at the end. These characters will cause the first line of the HTTP request to be cut short. I.e., instead of
GET /apps/thingtweet/1/statuses/update?key=api_key&status=Tweet no. 1 HTTP/1.1\r\n
it would become
GET /apps/thingtweet/1/statuses/update?key=api_key&status=Tweet no. 1\r\n
and the server would reject it with a 400 (Bad request) error.
On the other hand
String tweet = "Tweet no. 1";
would be fine.
If your tweets may contain such characters, then try encoding them before passing them to client.print():
tweet.replace("\r", "%0D");
tweet.replace("\n", "%0A");
Use a hostname instead of IP address
According to https://uk.mathworks.com/help/thingspeak/writedata.html, the relevant hostname for the API you are using is api.thingspeak.com. Use that instead of the IP address. This is preferable because the IP address a hostname points to can change regularly. (The IP address you are using doesn't even seem to be correct - and may already be out of date.)
I.e., change
if (client.connect("184.106.153.149", 80)) {
to
if (client.connect("api.thingspeak.com", 80)) {
API endpoint
Are you sure you are using the correct API endpoint? According to the link above, it looks like the API endpoint you need is https://api.thingspeak.com/update.json - so you may need to change
client.print("GET /apps/thingtweet/1/statuses/update?key=" + API + "&status=" + tweet + " HTTP/1.1\r\n");
to
client.print("GET /update.json?api_key=" + API + "&status=" + tweet + " HTTP/1.1\r\n");
Check if the connection succeeded
Presently, your device sends the HTTP request if it connects to the server successfully, but gives no indication if the connection fails! So add an else block to handle that scenario and notify the user via the serial console.
if (client.connect("api.thingspeak.com", 80)) {
client.print("GET /apps/thingtweet/1/statuses/update?key=" + API + "&status=" + tweet + " HTTP/1.1\r\n");
// etc.
}
else {
Serial.println("Connection to the server failed!");
}
Checking server response
To check the response from the server, add the following block to your main loop - which will print the server response via the serial console.
delay(50);
while (client.available()) {
String response_line = client.readString();
Serial.println(response_line);
}
To clarify: that code should go inside your loop() function.
The response should include a status line - such as HTTP/1.1 200 OK if the request was successful, or HTTP/1.1 400 Bad Request if there was a problem.
In the case of a Bad request response, the full message will quite likely contain more information about the precise reason the request failed.
HTTP vs HTTPS
Lastly, are you sure that the API supports plain, unencrypted HTTP as well as HTTPS? If not, that may be your problem.

How to handle docker API /images/create?

The Docker API image creation / pull endpoint (/v1.6/images/create) apparently always returns
HTTP/1.1 200 OK
Content-Type: application/json
no matter if the process is a success or a failure.
Furthermore, the payload is not valid JSON.
e.g. /v1.6/images/create?fromImage=whatevertheflush
returns:
{"status":"Pulling repository whatevertheflush"}{"error":"Server error: 404 trying to fetch remote history for whatevertheflush","errorDetail":{"code":404,"message":"Server error: 404 trying to fetch remote history for whatevertheflush"}}
Not being valid JSON, and the HTTP error code not being forwarded / used, makes it awkward for clients to handle errors.
Indeed, docker-py just pukes the payload back out (https://github.com/dotcloud/docker-py/blob/master/docker/client.py#L374). And DockerHTTPClient from OpenStack tries to return a value based on the HTTP status code, which is always 200... (https://github.com/openstack/nova/blob/master/nova/virt/docker/client.py#L191)
Now, I understand the pull might take a long time, and that it somewhat makes sense to start streaming an answer to the client, but I can't help thinking something is wrong here.
So, this is three fold:
am I missing something entirely here?
if not: if you are implementing a client application (say, in Python), how would you handle this (elegantly, if possible :))? try to detect valid json blocks, load them, and exit whenever we "think" something is wrong?
if not: is this going to change (for the better) in future docker versions?
This question is a bit old, but for the future reader who has landed on this page, I'd like to let you know you're not alone, we feel your pain. This API is indeed as terrible as it looks.
The TL;DR answer is "the /images/create response format is undocumented; discard the output and query /images/XXX/json after your create call completes."
I wrote some orchestration tools a few years ago, and I found the /images/create API to be extremely annoying. But let's dive in:
There is no documented schema for the 200 response; the v1.19 docs simply gave examples of a few records. The v1.37 docs (latest at the time I write this) don't even go that far; no details of the response are provided at all.
The response is sent as Transfer-Encoding: chunked, and each record sent is preceded by the byte count in hex. Here's a low-level excerpt (bypassing curl, so we can see what actually gets sent on the wire):
host-4:~ rg$ telnet localhost 2375
Trying ::1...
Connected to localhost.
Escape character is '^]'.
POST /images/create?fromImage=jenkins/jenkins:latest HTTP/1.1
Host: localhost:2375
User-Agent: foo/1.0
Accept: */*
HTTP/1.1 200 OK
Api-Version: 1.39
Content-Type: application/json
Docker-Experimental: true
Ostype: linux
Server: Docker/18.09.1 (linux)
Date: Wed, 06 Feb 2019 16:53:19 GMT
Transfer-Encoding: chunked
39
{"status":"Pulling from jenkins/jenkins","id":"latest"}
5e
{"status":"Digest: sha256:abd3e3f96fbc3445c420fda590f37e2bd3377f69affd47b63b3d826d084c5ddc"}
45
{"status":"Status: Image is up to date for jenkins/jenkins:latest"}
0
Yes, it streams the image download progress -- client libraries that don't give low-level access to the chunked records may just concatenate the data before it's provided to you. As you encountered, early versions of the API returned JSON records with the only delimiter being the chunked transfer encoding, so client code received a concatenated block of undelimited JSON and had to parse it by tracking curlies/quotes/escape chars! It has since been updated to now emit records delimited by newlines, but can we count on them always being there? Who knows! This behavior changed without ceremony, and was not preserved if you call older versions of the API on newer daemons.
It returns 200 OK immediately, which doesn't represent success or failure. (Given the nature of the call, I'd imagine it should probably return 202 Accepted instead. Ideally, we'd get a Location header pointing to a new URL that we could use to query the progress/status.)
The response data returned is huge, spammy, and just... silly. If you have a docker instance listening on TCP, try curl -Nv -X POST http://yourdocker:2375/images/create?fromImage=jenkins/jenkins:latest -o /tmp/omgwtf.txt. You'll be amazed. A ton of bandwidth is wasted transferring server-rendered ASCII bar graphs! In fact, the records report each layer's progress three different ways: as numeric fields for current and total bytes, as a bar graph, and as a pretty-printed string with MB or GB units. Why isn't this just rendered on the client? Great question.
Instead, your client has to parse kilobytes or megabytes of this spam.
The bar graph has a randomly escaped unicode rep of the > character, despite being safely inside a JSON string. Someone was just throwing escape calls at the wall to see what stuck? ¯\_(ツ)_/¯
The records themselves are pretty arbitrary. There's an id field that changes what it references, and the only way to tell what kind of record it is is to parse the human-readable string: Pulling from XXX vs Pulling fs layer vs Downloading, etc. As far as I can tell, the only real way to know it's done is to track all the ids and ensure you get a Pull complete for each by the time the socket closes.
You might be able to look for Status: Downloaded newer image for XXX but I'm not sure if there are multiple possible responses for this.
As I mentioned at the start, you'll probably have the best luck requesting /images/XXX/json after /images/create claims to be complete. The combination of the two calls will give a pretty reliable indication of whether /images/create worked or not.
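To make that concrete, here is a minimal Python sketch of that two-call approach (using the requests package against a daemon assumed to be listening on tcp://localhost:2375, not the docker-py client): stream /images/create, watch for in-band error records, then confirm via /images/XXX/json.

import json
import requests

DOCKER = "http://localhost:2375"   # assumed daemon address

def pull_and_verify(image):
    # Stream the /images/create response; on recent daemons each record is a
    # newline-delimited JSON object (older daemons concatenate them).
    with requests.post(DOCKER + "/images/create",
                       params={"fromImage": image}, stream=True) as resp:
        resp.raise_for_status()            # only catches transport-level failures
        for line in resp.iter_lines():
            if not line:
                continue
            record = json.loads(line)
            if "error" in record:          # pull errors arrive in-band, still as 200 OK
                raise RuntimeError(record["error"])
    # The stream ended without an error record; double-check by inspecting the
    # image, which returns 404 if the pull did not actually produce an image.
    inspect = requests.get(DOCKER + "/images/" + image + "/json")
    inspect.raise_for_status()
    return inspect.json()

print(pull_and_verify("jenkins/jenkins:latest")["Id"])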
Here's a longer block of concatenated client response that shows a few different record types. Edited for brevity:
{"status":"Pulling from jenkins/jenkins","id":"latest"}
{"status":"Pulling fs layer","progressDetail":{},"id":"ab1fc7e4bf91"}
{"status":"Pulling fs layer","progressDetail":{},"id":"35fba333ff52"}
{"status":"Pulling fs layer","progressDetail":{},"id":"f0cb1fa13079"}
{"status":"Pulling fs layer","progressDetail":{},"id":"3d1dd648b5ad"}
{"status":"Pulling fs layer","progressDetail":{},"id":"a9f886e483d6"}
{"status":"Pulling fs layer","progressDetail":{},"id":"4346341d3c49"}
..
"status":"Waiting","progressDetail":{},"id":"3d1dd648b5ad"}
{"status":"Waiting","progressDetail":{},"id":"a9f886e483d6"}
{"status":"Waiting","progressDetail":{},"id":"4346341d3c49"}
{"status":"Waiting","progressDetail":{},"id":"006f2208d67a"}
{"status":"Waiting","progressDetail":{},"id":"fb85cf26717d"}
{"status":"Waiting","progressDetail":{},"id":"52ca068dbca7"}
{"status":"Waiting","progressDetail":{},"id":"82f4759b8d12"}
...
{"status":"Downloading","progressDetail":{"current":110118,"total":10780995},"progress":"[\u003e ] 110.1kB/10.78MB","id":"35fba333ff52"}
{"status":"Downloading","progressDetail":{"current":457415,"total":45344749},"progress":"[\u003e ] 457.4kB/45.34MB","id":"ab1fc7e4bf91"}
{"status":"Downloading","progressDetail":{"current":44427,"total":4340040},"progress":"[\u003e ] 44.43kB/4.34MB","id":"f0cb1fa13079"}
{"status":"Downloading","progressDetail":{"current":817890,"total":10780995},"progress":"[===\u003e ] 817.9kB/10.78MB","id":"35fba333ff52"}
{"status":"Downloading","progressDetail":{"current":1833671,"total":45344749},"progress":"[==\u003e ] 1.834MB/45.34MB","id":"ab1fc7e4bf91"}
{"status":"Downloading","progressDetail":{"current":531179,"total":4340040},"progress":"[======\u003e ] 531.2kB/4.34MB","id":"f0cb1fa13079"}
{"status":"Downloading","progressDetail":{"current":1719010,"total":10780995},"progress":"[=======\u003e ] 1.719MB/10.78MB","id":"35fba333ff52"}
{"status":"Downloading","progressDetail":{"current":3205831,"total":45344749},"progress":"[===\u003e ] 3.206MB/45.34MB","id":"ab1fc7e4bf91"}
{"status":"Downloading","progressDetail":{"current":1129195,"total":4340040},"progress":"[=============\u003e ] 1.129MB/4.34MB","id":"f0cb1fa13079"}
{"status":"Downloading","progressDetail":{"current":2640610,"total":10780995},"progress":"[============\u003e ] 2.641MB/10.78MB","id":"35fba333ff52"}
{"status":"Downloading","progressDetail":{"current":1719019,"total":4340040},"progress":"[===================\u003e ] 1.719MB/4.34MB","id":"f0cb1fa13079"}
{"status":"Downloading","progressDetail":{"current":4586183,"total":45344749},"progress":"[=====\u003e ] 4.586MB/45.34MB","id":"ab1fc7e4bf91"}
{"status":"Downloading","progressDetail":{"current":3549922,"total":10780995},"progress":"[================\u003e ] 3.55MB/10.78MB","id":"35fba333ff52"}
{"status":"Downloading","progressDetail":{"current":2513643,"total":4340040},"progress":"[============================\u003e ] 2.514M
...
{"status":"Pull complete","progressDetail":{},"id":"6d9b49fc8a28"}
{"status":"Extracting","progressDetail":{"current":380,"total":380},"progress":"[==================================================\u003e] 380B/380B","id":"6302e8b6563c"}
{"status":"Extracting","progressDetail":{"current":380,"total":380},"progress":"[==================================================\u003e] 380B/380B","id":"6302e8b6563c"}
{"status":"Pull complete","progressDetail":{},"id":"6302e8b6563c"}
{"status":"Extracting","progressDetail":{"current":1548,"total":1548},"progress":"[==================================================\u003e] 1.548kB/1.548kB","id":"7348f018cf93"}
{"status":"Extracting","progressDetail":{"current":1548,"total":1548},"progress":"[==================================================\u003e] 1.548kB/1.548kB","id":"7348f018cf93"}
{"status":"Pull complete","progressDetail":{},"id":"7348f018cf93"}
{"status":"Extracting","progressDetail":{"current":3083,"total":3083},"progress":"[==================================================\u003e] 3.083kB/3.083kB","id":"c651ee7bd59e"}
{"status":"Extracting","progressDetail":{"current":3083,"total":3083},"progress":"[==================================================\u003e] 3.083kB/3.083kB","id":"c651ee7bd59e"}
{"status":"Pull complete","progressDetail":{},"id":"c651ee7bd59e"}
{"status":"Digest: sha256:abd3e3f96fbc3445c420fda590f37e2bd3377f69affd47b63b3d826d084c5ddc"}
{"status":"Status: Downloaded newer image for jenkins/jenkins:latest"}
This code runs the Internet now. =8-O
This particular endpoint actually returns chunked encoding. An example via curl:
$ curl -v -X POST http://localhost:4243/images/create?fromImage=base
* About to connect() to localhost port 4243 (#0)
* Trying ::1...
* Connection refused
* Trying 127.0.0.1...
* connected
* Connected to localhost (127.0.0.1) port 4243 (#0)
> POST /images/create?fromImage=base HTTP/1.1
> User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8y zlib/1.2.5
> Host: localhost:4243
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Fri, 07 Feb 2014 04:21:59 GMT
< Transfer-Encoding: chunked
<
* Connection #0 to host localhost left intact
{"status":"Pulling repository base"}{"status":"Pulling image (ubuntu-quantl) from base","progressDetail":{},"id":"b750fe79269d"}{"status":"Pulling image (ubuntu-quantl) from base, endpoint: https://cdn-registry-1.docker.io/v1/","progressDetail":{},"id":"b750fe79269d"}{"status":"Pulling dependent layers","progressDetail":{},"id":"b750fe79269d"}{"status":"Download complete","progressDetail":{},"id":"27cf78414709"}{"status":"Download complete","progressDetail":{},"id":"b750fe79269d"}{"status":"Download complete","progressDetail":{},"id":"b750fe79269d"}* Closing connection #0
Now I'm not sure how you go about parsing this in Python, but in Ruby, I can use Yajl like so:
parts = []
Yajl::Parser.parse(body) { |o| parts << o }
puts parts
{"status"=>"Pulling repository base"}
{"status"=>"Pulling image (ubuntu-quantl) from base", "progressDetail"=>{}, "id"=>"b750fe79269d"}
{"status"=>"Pulling image (ubuntu-quantl) from base, endpoint: https://cdn-registry-1.docker.io/v1/", "progressDetail"=>{}, "id"=>"b750fe79269d"}
{"status"=>"Pulling dependent layers", "progressDetail"=>{}, "id"=>"b750fe79269d"}
{"status"=>"Download complete", "progressDetail"=>{}, "id"=>"27cf78414709"}
{"status"=>"Download complete", "progressDetail"=>{}, "id"=>"b750fe79269d"}
{"status"=>"Download complete", "progressDetail"=>{}, "id"=>"b750fe79269d"}
Using Docker v1.9 I still have this problem to deal with.
I also found an issue in the Docker GitHub repository: Docker uses invalid JSON format in some API functions #16925
There, a contributor suggests using a Content-Type HTTP header like this: application/json; boundary=NL
This did not work for me.
Then, while struggling with my custom parser, I found this question on Stack Overflow: How to handle a huge stream of JSON dictionaries?

Twitter stream API - Erlang client

I'm very new to the Erlang world and I'm trying to write a client for the Twitter Stream API. I'm using httpc:request to make a POST request and I constantly get a 401 error; I'm obviously doing something wrong with how I'm sending the request... What I have looks like this:
fetch_data() ->
Method = post,
URL = "https://stream.twitter.com/1.1/statuses/filter.json",
Headers = "Authorization: OAuth oauth_consumer_key=\"XXX\", oauth_nonce=\"XXX\", oauth_signature=\"XXX%3D\", oauth_signature_method=\"HMAC-SHA1\", oauth_timestamp=\"XXX\", oauth_token=\"XXX-XXXXX\", oauth_version=\"1.0\"",
ContentType = "application/json",
Body = "{\"track\":\"keyword\"}",
HTTPOptions = [],
Options = [],
R = httpc:request(Method, {URL, Headers, ContentType, Body}, HTTPOptions, Options),
R.
At this point I'm confident there's no issue with the signature as the same signature works just fine when trying to access the API with curl. I'm guessing there's some issue with how I'm making the request.
The response I'm getting with the request made the way demonstrated above is:
{ok,{{"HTTP/1.1",401,"Unauthorized"},
[{"cache-control","must-revalidate,no-cache,no-store"},
{"connection","close"},
{"www-authenticate","Basic realm=\"Firehose\""},
{"content-length","1243"},
{"content-type","text/html"}],
"<html>\n<head>\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"/>\n<title>Error 401 Unauthorized</title>\n</head>\n<body>\n<h2>HTTP ERROR: 401</h2>\n<p>Problem accessing '/1.1/statuses/filter.json'. Reason:\n<pre> Unauthorized</pre>\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n</body>\n</html>\n"}}
When trying with curl I'm using this:
curl --request 'POST' 'https://stream.twitter.com/1.1/statuses/filter.json' --data 'track=keyword' --header 'Authorization: OAuth oauth_consumer_key="XXX", oauth_nonce="XXX", oauth_signature="XXX%3D", oauth_signature_method="HMAC-SHA1", oauth_timestamp="XXX", oauth_token="XXX-XXXX", oauth_version="1.0"' --verbose
and I'm getting the events just fine.
Any help on this would be greatly appreciated; I'm new to Erlang and I've been pulling my hair out on this one for quite a while.
There are several issues with your code:
In Erlang you are encoding the parameters as a JSON body, while with curl you are encoding them as form data (application/x-www-form-urlencoded). The Twitter API expects the latter. In fact, you get a 401 because the OAuth signatures do not match: you included the track=keyword parameter in the signature computation, while Twitter's server computes the signature without the JSON body, as it should per the OAuth RFC.
You are using httpc with default options. This will not work with the streaming API as the stream never ends. You need to process results as they arrive. For this, you need to pass {sync, false} option to httpc. See also stream and receiver options.
Eventually, while httpc can work initially to access the Twitter streaming API, it brings little value compared to the code you need to develop around it to stream from the Twitter API. Depending on your needs, you might want to replace it with a simple client built directly on ssl, especially considering that it can decode HTTP packets for you (what is left for you is the HTTP chunked encoding).
For example, if your keywords are rare, you might get a timeout from httpc. Besides, it might be easier to update the list of keywords or your code with no downtime without httpc.
A streaming client directly based on ssl could be implemented as a gen_server (or a simple process, if you do not follow OTP principles) or even better a gen_fsm to implement reconnection strategies. You could proceed as follows:
Connect using ssl:connect/3,4 specifying that you want the socket to decode the HTTP packets with {packet, http_bin} and you want the socket to be configured in passive mode {active, false}.
Send the HTTP request packet (preferably as an iolist, with binaries) with ssl:send/2,3. It shall spread on several lines separated with CRLF (\r\n), with first the query line (GET /1.1/statuses/filter.json?... HTTP/1.1) and then the headers including the OAuth headers. Make sure you include Host: stream.twitter.com as well. End with an empty line.
Receive the HTTP response. You can implement this with a loop (since the socket is in passive mode), calling ssl:recv/2,3 until you get http_eoh (end of headers). Note down whether the server will send you data chunked or not by looking at the Transfer-Encoding response header.
Configure the socket in active mode with ssl:setopts/2 and specify you want packets as raw and data in binary format. In fact, if data is chunked, you could continue to use the socket in passive mode. You could also get data line by line or get data as strings. This is a matter of taste: raw is the safest bet, line by line requires that you check the buffer size to prevent truncation of a long JSON-encoded tweet.
Receive data from Twitter as messages sent to your process, either with receive (simple process) or in handle_info handler (if you implemented this with a gen_server). If data is chunked, you shall first receive the chunk size, then the tweets and the end of the chunk eventually (cf RFC 2616). Be prepared to have tweets that spread on several chunks (i.e. maintain some kind of buffer). The best here is to do the minimum decoding in this process and send tweets to another process, possibly in binary format.
You should also handle errors and socket being closed by Twitter. Make sure you follow Twitter's guidelines for reconnection.
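Not Erlang, but to illustrate the first two points compactly, here is a minimal Python sketch (assuming the requests and requests-oauthlib packages and placeholder credentials) that sends the parameters form-encoded, so they are included in the OAuth signature, and consumes the stream as it arrives:

import requests
from requests_oauthlib import OAuth1

# Placeholder credentials; OAuth1 includes form-encoded parameters such as
# track=keyword in the signature base string, matching what curl does.
auth = OAuth1("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")

resp = requests.post("https://stream.twitter.com/1.1/statuses/filter.json",
                     data={"track": "keyword"},   # form data, not a JSON body
                     auth=auth, stream=True)      # stream=True: the response never "ends"
resp.raise_for_status()

for line in resp.iter_lines():                    # one JSON-encoded tweet per line
    if line:                                      # skip keep-alive newlines
        print(line.decode("utf-8"))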
