SendGrid Inbound Parse: Japanese (shift_jis) text is garbled

Currently we are using SendGrid Inbound Parse to receive emails.
We handle the Inbound Parse webhook request with an Azure HttpTrigger function implemented in C# (.NET 6).
When the received email is UTF-8 encoded, everything is okay.
However, when we tried to receive an email in shift_jis encoding, the headers were okay,
but the Japanese characters in text and html were garbled.
From the Inbound Parse request, we got the charsets below:
subject: UTF-8
to: UTF-8
from: UTF-8
cc: UTF-8
html: shift_jis
text: shift_jis
And the string we got directly from request.form["text"] (or "html") was already garbled like "�e�L�X�gshiftJis-007"
(should be "テキストshiftJis-007"), so we cannot use string in request directly.
Then we tried to convert it (with the System.Text.Encoding.Convert method) from the reported charset (shift_jis) to UTF-8,
but the result, although different from the garbled string, was still unreadable: "?e?L?X?gshiftJis-007".
Our questions, for a C# HttpTrigger Azure Function handling the Inbound Parse webhook request
(the request data is passed through ASP.NET Core), are:
1. What encoding is used for the html/text strings in the Inbound Parse webhook request
when the email is sent in an encoding other than UTF-8?
2. How can we correctly read text and html in shift_jis (or any other non-UTF-8 encoding)
from an Inbound Parse webhook request?

Twilio Developer Evangelist here. I would recommend reaching out to the support team, because figuring out what is going on requires investigating the payload.
I also tried to replicate the issue on my end using the send_raw option. Here's the payload, and it does contain the shift_jis characters. You may be able to process the payload manually.
(stripped X-Mailer info)
'Content-Type: text/plain; charset="shift_jis"\n' +
'X-Mailer: \n' +
'Content-Transfer-Encoding: quoted-printable\n' +
'\n' +
'\n' +
'=83e=83L=83X=83gshiftJis-007\n'
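As a rough illustration of processing that raw payload manually, here is a minimal Python sketch (not the C# function from the question). It assumes you have already isolated the quoted-printable body line and the charset from the part headers; the names `raw_body` and `decode_part` are made up for this example. It also shows that decoding the same bytes as UTF-8 reproduces the garbled string from the question, at which point the original bytes are lost and no later conversion can bring them back.

```python
import quopri

# Body line of the raw MIME part shown above (quoted-printable, charset shift_jis).
raw_body = b"=83e=83L=83X=83gshiftJis-007"


def decode_part(qp_bytes: bytes, charset: str) -> str:
    """Undo the quoted-printable transfer encoding, then decode with the declared charset."""
    return quopri.decodestring(qp_bytes).decode(charset)


# Correct: use the charset declared in the part's Content-Type header.
text = decode_part(raw_body, "shift_jis")

# Incorrect: treating the same bytes as UTF-8 replaces every 0x83 byte with U+FFFD.
garbled = quopri.decodestring(raw_body).decode("utf-8", errors="replace")

print(text)
print(garbled)
```

The same two steps (transfer decoding first, charset decoding second) apply in any language; in .NET the charset lookup for shift_jis may additionally require registering the code pages encoding provider.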

Related

receive space instead of + symbol in api from Swift IOS

I am passing a Base64-encoded string in an API call, but when the API receives it, each "+" shows up as a space.
E.g. I pass "kv+lluLOKRkGK6v+BqNPAPsx", but the API
receives "kv lluLOKRkGK6v BqNPAPsx".
Can anyone explain why Swift is not sending the "+" symbol to the API,
and how to solve this?
Use the proper Base64 variant that does not have the + character for REST calls: RFC 4648 base64url.
If you can't change the encoding, simply transcode the Base64 you receive.
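A minimal Python sketch of the idea, assuming the value travels as a query or form parameter (the question does not say so explicitly, but in form-urlencoded data an unescaped "+" is decoded as a space). The parameter name `token` and the helper names are made up for illustration.

```python
import base64
from urllib.parse import parse_qs, quote

original = "kv+lluLOKRkGK6v+BqNPAPsx"

# Form/query decoding turns an unescaped '+' into a space, as seen in the question.
received = parse_qs("token=" + original)["token"][0]

# Option 1: percent-encode the value before sending, so '+' travels as %2B.
escaped = quote(original, safe="")


def to_base64url(b64: str) -> str:
    """Option 2: switch to the RFC 4648 base64url alphabet ('-' and '_')."""
    return b64.replace("+", "-").replace("/", "_")


def repair_received(value: str) -> str:
    """Option 3 (server side): a space is never valid Base64, so map it back to '+'."""
    return value.replace(" ", "+")


print(received)
print(escaped)
print(to_base64url(original))
print(repair_received(received))
```

Option 3 only works because a space cannot occur in valid Base64; fixing the sender (option 1 or 2) is the cleaner route.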

Sending non-ascii characters to a Zapier catch hook

A zapier web hook has been set up to catch JSON sent to it.
The issue is that if the JSON contains any non-standard characters, e.g. accented characters, the hook never catches the data (no error is displayed, it just doesn't log anything).
If the catch hook is switched to a 'catch raw hook' then the data is received, but I then don't know how to transform the raw data into JSON for future steps. With the catch raw hook, the data caught looks like this (with a special char ø in the name value):
raw_body
[{"id":2426,"name":"James Hømmett"}]
headers__http_host
hooks.zapier.com
headers__http_x_request_id
b8578a4455fea95c3287e939e304752c
headers__http_x_real_ip
[redacted IP address]
headers__http_x_forwarded_for
[redacted IP address]
headers__http_x_forwarded_host
hooks.zapier.com
headers__http_x_forwarded_port
443
headers__http_x_forwarded_proto
https
headers__http_x_scheme
https
headers__http_x_original_forwarded_for
[redacted IP address]
headers__content_length
559
headers__http_accept_encoding
gzip,deflate
headers__content_type
application/json; charset=utf-8
headers__http_user_agent
Apache-HttpClient/4.5.13 (Java/11.0.9.1)
As you can see, charset=utf-8 is specified in the content-type header.
The JSON validates with jsonlint.com
Any ideas?
If you're on a paid account, you can add a Code by Zapier step that returns JSON.parse(inputData.raw_body), so that the data is available in future steps.
But, not handling non-ascii characters is likely a bug, so it's worth reaching out to support if you haven't already: https://zapier.com/contact
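For comparison, the same parsing step outside Zapier looks like this in Python. This is only a sketch of what "turn raw_body into structured data" means, not Zapier's API; the variable names are made up, and the body is the one from the question, encoded as UTF-8 as the content-type header declares.

```python
import json

# The raw request body from the question, as UTF-8 bytes.
raw_body = '[{"id":2426,"name":"James Hømmett"}]'.encode("utf-8")

# json.loads accepts UTF-8 bytes directly and returns Python lists and dicts.
records = json.loads(raw_body)

for record in records:
    print(record["id"], record["name"])
```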

How to use "quoted-printable" content-transfer-encoding with BizTalk AS2 receiving?

I'm currently using BizTalk Server 2013 R2 to exchange EDI as well as non-EDI documents using AS2 with a number of different trading partners. I recently added a new trading partner and after receiving a number of documents successfully I started seeing this error occur every now and then:
An output message of the component "Microsoft.BizTalk.EdiInt.PipelineComponents" in receive pipeline "Microsoft.BizTalk.EdiInt.DefaultPipelines.AS2Receive, Microsoft.BizTalk.Edi.EdiIntPipelines, Version=3.0.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" is suspended due to the following error: The content transfer encoding quoted-printable is not supported..
The sequence number of the suspended message is 2.
After some investigation I found that the AS2 platform of the trading partner in question will sometimes set the Content-Transfer-Encoding of the MIME body part to quoted-printable when the enclosed XML payload contains non-ASCII characters. When this happens the message is suspended (non-resumable) with the error above.
Messages received from this trading partner are encrypted and signed, but not compressed - and received using a HTTP request-response (two-way) port configured with the out-of-the-box AS2Receive pipeline. I've tried using a custom pipeline with the AS Decoder, S/MIME decoder and AS2 disassembler components, but this does not seem to have any effect - the error stays the same.
I've also tried receiving unencrypted messages from the trading partner (by mutual agreement) but seem to be doing something wrong here as well as the message passed to the Message Box then ends up not being disassembled properly (the MIME part boundaries and AS2 signature is still visible in the actual message payload). Since the trading partner won't allow sending of unencrypted messages in a production environment anyway, I need to get this working with encryption. They also cannot change their platform's behavior as this will reportedly affect all of their other trading partners.
Here are the unfolded HTTP headers (ellipses denote redacted values) of the encrypted and signed AS2 message received at the point of being suspended:
Date: Mon, 20 Jan 2020 17:30:53 GMT
Content-Length: 8014
Content-Type: application/pkcs7-mime; name="smime.p7m"; smime-type=enveloped-data
From: ...
Host: ...
User-Agent: Jakarta Commons-HttpClient/3.1
AS2-To: ...
Subject: AS2 Message from ... to ...
Message-Id: <1C20200120-173053-740219#xxx.xxx.130.163>
Disposition-Notification-To: <mailto:...> ...
Disposition-Notification-Options: signed-receipt-protocol=optional, pkcs7-signature; signed-receipt-micalg=optional, sha1
AS2-From: ...
AS2-Version: 1.1
content-disposition: attachment; filename="smime.p7m"
X-Original-URL: /as2
Here is the unencrypted payload (ellipses denote redacted content) when the exact same message is sent from the source party without encryption:
------=_Part_16155_1587439544.1579506174880
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
...
------=_Part_16155_1587439544.1579506174880
Content-Type: application/pkcs7-signature; name=smime.p7s; smime-type=signed-data
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="smime.p7s"
Content-Description: S/MIME Cryptographic Signature
...
------=_Part_16155_1587439544.1579506174880--
Question: does BizTalk Server support the quoted-printable encoding method? If it does, what am I doing wrong? If it does not, what are my options in terms of a workaround?
For anyone else that may encounter this same issue, I thought I'd share the solution I ended up with.
Since the error was encountered during AS2 receive pipeline processing, my solution naturally focussed on creating a custom receive pipeline component that does more or less the same as the out-of-the-box AS2 decoder component, but with support for the quoted-printable encoding method:
1. Decode and decrypt the CMS/PKCS#7 data envelope
This is actually the easiest step with only 5 lines of code:
EnvelopedCms envelopedCms = new EnvelopedCms();
envelopedCms.Decode(encryptedData);
envelopedCms.Decrypt();
byte[] decryptedData = envelopedCms.Encode();
string decryptedMessageString = Encoding.ASCII.GetString(decryptedData);
-encryptedData is a byte array instantiated from the body-part data stream of the AS2 message received by the HTTP adapter.
-The Decrypt method automatically searches the user and computer certificate stores for the appropriate certificate private key and uses it to decrypt the AS2 payload. For more information, see the documentation for the EnvelopedCms class.
2. Convert any quoted-printable content in the payload to normal UTF-8 text
First we have to get the MIME boundary name from the content type string at the beginning of the decrypted payload:
int firstBlankLineInMessage = decryptedMessageString.IndexOf(Environment.NewLine + Environment.NewLine);
string contentType = decryptedMessageString.Substring(0, firstBlankLineInMessage);
Regex boundaryRegex = new Regex("boundary=\"(?<boundary>.*)\"");
Match boundaryMatch = boundaryRegex.Match(contentType);
if (!boundaryMatch.Success)
throw new Exception("Failed to get boundary name from content type");
string boundary = "--" + boundaryMatch.Groups["boundary"].Value;
Then we split the envelope and re-merge without the content-type header part:
string[] messageParts = decryptedMessageString.Split(new string[] {boundary}, StringSplitOptions.RemoveEmptyEntries);
string signedMessageString = boundary + messageParts[1] + boundary + messageParts[2] + boundary + "--\r\n";
Next we get the Content-Transfer-Encoding value in the MIME body-part header:
int firstBlankLineInBodyPart = messageParts[1].IndexOf(Environment.NewLine + Environment.NewLine);
string partHeaders = messageParts[1].Substring(0, firstBlankLineInBodyPart);
Regex cteRegex = new Regex("Content-Transfer-Encoding: (?<cte>.*)");
Match cteMatch = cteRegex.Match(partHeaders);
if (!cteMatch.Success)
throw new Exception("Failed to get CTE from body part headers");
// Trim: in .NET regexes "." also matches '\r', so the capture can end with a stray carriage return
string cte = cteMatch.Groups["cte"].Value.Trim();
string payload = messageParts[1].Substring(firstBlankLineInBodyPart).Trim();
And finally we check the CTE and decode if necessary:
if (cte == "quoted-printable")
{
// Get charset
Regex charsetRegex = new Regex("Content-Type: .*charset=(?<charset>.*)");
Match charsetMatch = charsetRegex.Match(partHeaders);
if (!charsetMatch.Success)
throw new Exception("Failed to get charset from body part headers");
string charset = charsetMatch.Groups["charset"].Value.Trim();
// Assumes the QuotedPrintableDecode helper returns the decoded string
payload = QuotedPrintableDecode(payload, charset);
}
Note: There are many different implementations out there for decoding QP, including a .NET implementation that has (reportedly) been found buggy by some users. I decided to use this implementation shared by Gonzalo.
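To make step 2 concrete without depending on BizTalk, here is a minimal Python sketch of the same idea: read the Content-Transfer-Encoding and charset of a MIME body part, then undo quoted-printable. It uses Python's standard email package instead of the hand-rolled regexes above, and the sample message, boundary and function name are invented for illustration.

```python
import email
from email import policy

# A made-up, already-decrypted multipart payload with one quoted-printable part.
raw_message = (
    b'Content-Type: multipart/mixed; boundary="demo-boundary"\r\n'
    b"\r\n"
    b"--demo-boundary\r\n"
    b"Content-Type: text/plain; charset=UTF-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"<name>Bj=C3=B8rn</name>\r\n"
    b"--demo-boundary--\r\n"
)


def decode_first_part(raw_bytes: bytes):
    """Return (transfer encoding, charset, decoded text) of the first body part."""
    message = email.message_from_bytes(raw_bytes, policy=policy.default)
    part = next(message.iter_parts())
    cte = str(part.get("Content-Transfer-Encoding", "7bit")).strip().lower()
    charset = part.get_content_charset() or "us-ascii"
    # get_payload(decode=True) undoes quoted-printable or base64 and returns bytes.
    body_bytes = part.get_payload(decode=True)
    return cte, charset, body_bytes.decode(charset)


cte, charset, text = decode_first_part(raw_message)
print(cte, charset, text.strip())
```

The order matters in any implementation: transfer decoding yields bytes, and only then is the charset applied to obtain text.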
3. Update the Content-Type HTTP header and BizTalk message body-part stream
string httpHeaders = objHttpHeaders.ToString().Replace("Content-Type: application/pkcs7-mime; name=\"smime.p7m\"; smime-type=enveloped-data", "Content-Type: application/xml");
inMessage.Context.Write("InboundHttpHeaders", "http://schemas.microsoft.com/BizTalk/2003/http-properties", httpHeaders);
MemoryStream payloadStream = new MemoryStream(Encoding.UTF8.GetBytes(payload));
payloadStream.Seek(0, SeekOrigin.Begin);
pipelineContext.ResourceTracker.AddResource(payloadStream);
inMessage.BodyPart.Data = payloadStream;
-pipelineContext is the IPipelineContext variable passed to the Execute method of the custom pipeline component
-inMessage is the IBaseMessage variable passed to the Execute method
Last Thoughts
The code above can still be improved in a number of ways:
Checking HTTP headers for encryption before attempting to decrypt
Re-encrypting payload before passing message to AS2 disassembler component (if required by BizTalk party configuration)
Adding support for compression
If you'd like a copy of the source code drop me a message and I'll see about upping it to an online repo.
I had a ticket opened with Microsoft BizTalk tech support on the issue. Their response was that
"the quoted-printable encoding is not supported by MS BizTalk Server 2013 R2", and that it is most likely not supported by MS BizTalk Server 2020 either.

Generating Oauth authorization token using base64 encoding

I am trying to follow the guide to generate Oauth authentication tokens for YAHOO DSP API.
Base64 encoding is a way of encoding binary data into text so that it can be easily transmitted across a network without error.
In this step, you will take the client ID and client secret that the YDN console generated for you and encode them using the base64 protocol. You can use an online encoding service like base64encode.org.
No matter which service you use, ensure that no spaces are appended to the CLIENT_ID and CLIENT_SECRET keys and separate the CLIENT_ID and CLIENT_SECRET with a colon, i.e. CLIENT_ID:CLIENT_SECRET.
The generated value will now be referenced as ENCODED(CLIENT_ID:CLIENT_SECRET) in this guide.
An example is given:
CLIENT_ID = dj0yJmk9N2pIazlsZk1iTzIxJmQ9WVdrOWVEUmpVMFpWTXpRbWNHbzlNQS0tJnM9Y29uc3VtZXJzZWNyZXQmeD00NA–
CLIENT_SECRET= a7e13ea3740b933496d88755ff341bfb824805a6
AUTHORIZATION = ZGoweUptazlOMnBJYXpsc1prMWlUekl4Sm1ROVdWZHJPV1ZFVW1wVk1GcFdUWHBSYldOSGJ6bE5RUzB0Sm5NOVkyOXVjM1Z0WlhKelpXTnlaWFFtZUQwME5BLS06YTdlMTNlYTM3NDBiOTMzNDk2ZDg4NzU1ZmYzNDFiZmI4MjQ4MDVhNg==
Using the recommended website I get the wrong AUTHORIZATION.
I have tried both encoding the whole thing at once ie. encode(CLIENT_ID:CLIENT_SECRET), and each element individually encode(CLIENT_ID):encode(CLIENT_SECRET).
Attempt encoding whole thing:
ZGoweUptazlOMnBJYXpsc1prMWlUekl4Sm1ROVdWZHJPV1ZFVW1wVk1GcFdUWHBSYldOSGJ6bE5RUzB0Sm5NOVkyOXVjM1Z0WlhKelpXTnlaWFFtZUQwME5B4oCTOiBhN2UxM2VhMzc0MGI5MzM0OTZkODg3NTVmZjM0MWJmYjgyNDgwNWE2
Attempt encoding each element:
ZGoweUptazlOMnBJYXpsc1prMWlUekl4Sm1ROVdWZHJPV1ZFVW1wVk1GcFdUWHBSYldOSGJ6bE5RUzB0Sm5NOVkyOXVjM1Z0WlhKelpXTnlaWFFtZUQwME5B4oCT:YTdlMTNlYTM3NDBiOTMzNDk2ZDg4NzU1ZmYzNDFiZmI4MjQ4MDVhNg==
Expected result:
ZGoweUptazlOMnBJYXpsc1prMWlUekl4Sm1ROVdWZHJPV1ZFVW1wVk1GcFdUWHBSYldOSGJ6bE5RUzB0Sm5NOVkyOXVjM1Z0WlhKelpXTnlaWFFtZUQwME5BLS06YTdlMTNlYTM3NDBiOTMzNDk2ZDg4NzU1ZmYzNDFiZmI4MjQ4MDVhNg==
The difference between 'each element' and the expected result is only a few characters corresponding to the end of client_ID and the colon.
B4oCT: should be BLS06.
Links to full documentation:
https://developer.yahoo.com/dsp/api/docs/authentication/tokens.html
https://developer.yahoo.com/dsp/api/docs/traffic/info/sandbox.html
Update:
The final character of the CLIENT_ID is '–'. This is some sort of non-standard character that is interpreted as two dashes, i.e. '--', in UTF-8 and Windows-1258.
One difference to note: when you decode the expected output, you get your client ID as
dj0yJmk9N2pIazlsZk1iTzIxJmQ9WVdrOWVEUmpVMFpWTXpRbWNHbzlNQS0tJnM9Y29uc3VtZXJzZWNyZXQmeD00NA--
instead of
dj0yJmk9N2pIazlsZk1iTzIxJmQ9WVdrOWVEUmpVMFpWTXpRbWNHbzlNQS0tJnM9Y29uc3VtZXJzZWNyZXQmeD00NA–
Note the two "-" characters at the end.
The OAuth client auth token is always generated using Base64 encoding in the following format:
Base64_Encoding(CLIENT_ID:CLIENT_SECRET)
In most cases the text is converted to bytes as "UTF-8" before this Base64 encoding is applied.
It looks like Yahoo requires this token with a different encoding. On "https://www.base64encode.org/", if you encode your "CLIENT_ID:CLIENT_SECRET" with "Windows-1254" as the destination charset, you will receive the expected result. So it looks like both encoding and decoding here are done with the "Windows-1254" charset in place.
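The computation itself can be sketched in a few lines of Python. This is only an illustration using the example values quoted in the question (with the client ID written with two ASCII hyphens at the end, as in the decoded expected output); `basic_token` is a made-up helper name. It shows how the final characters of the client ID change the token: two hyphens followed by the colon encode to "LS06", while a single en dash encoded as UTF-8 yields "4oCT". When the ID ends in plain ASCII hyphens, every character is ASCII, so common single-byte charsets and UTF-8 produce the same bytes.

```python
import base64

CLIENT_ID = (
    "dj0yJmk9N2pIazlsZk1iTzIxJmQ9WVdrOWVEUmpVMFpWTXpRbWNHbzlNQS0t"
    "JnM9Y29uc3VtZXJzZWNyZXQmeD00NA--"
)
CLIENT_SECRET = "a7e13ea3740b933496d88755ff341bfb824805a6"


def basic_token(client_id: str, client_secret: str, charset: str = "utf-8") -> str:
    """Base64-encode 'CLIENT_ID:CLIENT_SECRET' with no extra whitespace."""
    raw = f"{client_id}:{client_secret}".encode(charset)
    return base64.b64encode(raw).decode("ascii")


# Client ID ending in two ASCII hyphens.
token = basic_token(CLIENT_ID, CLIENT_SECRET)

# Same ID, but with the two hyphens replaced by one en dash (U+2013), as copied from the guide.
token_en_dash = basic_token(CLIENT_ID[:-2] + "\u2013", CLIENT_SECRET)

print(token)
print(token_en_dash)
```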

Twitter stream API - Erlang client

I'm very new to the Erlang world and I'm trying to write a client for the Twitter Stream API. I'm using httpc:request to make a POST request and I constantly get a 401 error, so I'm obviously doing something wrong in how I'm sending the request. What I have looks like this:
fetch_data() ->
Method = post,
URL = "https://stream.twitter.com/1.1/statuses/filter.json",
Headers = "Authorization: OAuth oauth_consumer_key=\"XXX\", oauth_nonce=\"XXX\", oauth_signature=\"XXX%3D\", oauth_signature_method=\"HMAC-SHA1\", oauth_timestamp=\"XXX\", oauth_token=\"XXX-XXXXX\", oauth_version=\"1.0\"",
ContentType = "application/json",
Body = "{\"track\":\"keyword\"}",
HTTPOptions = [],
Options = [],
R = httpc:request(Method, {URL, Headers, ContentType, Body}, HTTPOptions, Options),
R.
At this point I'm confident there's no issue with the signature as the same signature works just fine when trying to access the API with curl. I'm guessing there's some issue with how I'm making the request.
The response I'm getting with the request made the way demonstrated above is:
{ok,{{"HTTP/1.1",401,"Unauthorized"},
[{"cache-control","must-revalidate,no-cache,no-store"},
{"connection","close"},
{"www-authenticate","Basic realm=\"Firehose\""},
{"content-length","1243"},
{"content-type","text/html"}],
"<html>\n<head>\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"/>\n<title>Error 401 Unauthorized</title>\n</head>\n<body>\n<h2>HTTP ERROR: 401</h2>\n<p>Problem accessing '/1.1/statuses/filter.json'. Reason:\n<pre> Unauthorized</pre>\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n</body>\n</html>\n"}}
When trying with curl I'm using this:
curl --request 'POST' 'https://stream.twitter.com/1.1/statuses/filter.json' --data 'track=keyword' --header 'Authorization: OAuth oauth_consumer_key="XXX", oauth_nonce="XXX", oauth_signature="XXX%3D", oauth_signature_method="HMAC-SHA1", oauth_timestamp="XXX", oauth_token="XXX-XXXX", oauth_version="1.0"' --verbose
and I'm getting the events just fine.
Any help on this would be greatly appreciated, new with Erlang and I've been pulling my hair out on this one for quite a while.
There are several issues with your code:
In Erlang you are encoding parameters as a JSON body while with curl, you are encoding them as form data (application/x-www-form-urlencoded). Twitter API expects the latter. In fact, you get a 401 because the OAuth signature does not match, as you included the track=keyword parameter in the computation while Twitter's server computes it without the JSON body, as it should per OAuth RFC.
You are using httpc with default options. This will not work with the streaming API as the stream never ends. You need to process results as they arrive. For this, you need to pass {sync, false} option to httpc. See also stream and receiver options.
Finally, while httpc can work initially to access the Twitter streaming API, it brings little value compared with the code you need to develop around it to stream from the Twitter API. Depending on your needs, you might want to replace it with a simple client built directly on ssl, especially considering that ssl can decode HTTP packets (what is left for you is the HTTP chunked encoding).
For example, if your keywords are rare, you might get a timeout from httpc. Besides, it might be easier to update the list of keywords or your code with no downtime without httpc.
A streaming client directly based on ssl could be implemented as a gen_server (or a simple process, if you do not follow OTP principles) or even better a gen_fsm to implement reconnection strategies. You could proceed as follows:
Connect using ssl:connect/3,4 specifying that you want the socket to decode the HTTP packets with {packet, http_bin} and you want the socket to be configured in passive mode {active, false}.
Send the HTTP request packet (preferably as an iolist, with binaries) with ssl:send/2,3. It should span several lines separated by CRLF (\r\n), with the request line first (GET /1.1/statuses/filter.json?... HTTP/1.1) and then the headers, including the OAuth headers. Make sure you include Host: stream.twitter.com as well. End with an empty line.
Receive the HTTP response. You can implement this with a loop (since the socket is in passive mode), calling ssl:recv/2,3 until you get http_eoh (end of headers). Note down whether the server will send you data chunked or not by looking at the Transfer-Encoding response header.
Configure the socket in active mode with ssl:setopts/2 and specify you want packets as raw and data in binary format. In fact, if data is chunked, you could continue to use the socket in passive mode. You could also get data line by line or get data as strings. This is a matter of taste: raw is the safest bet, line by line requires that you check the buffer size to prevent truncation of a long JSON-encoded tweet.
Receive data from Twitter as messages sent to your process, either with receive (simple process) or in the handle_info handler (if you implemented this with a gen_server). If the data is chunked, you will first receive the chunk size, then the tweets, and finally the end of the chunk (cf. RFC 2616). Be prepared for tweets that span several chunks (i.e. maintain some kind of buffer). The best approach here is to do the minimum decoding in this process and send tweets to another process, possibly in binary format.
You should also handle errors and socket being closed by Twitter. Make sure you follow Twitter's guidelines for reconnection.
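The buffering described in the last steps can be sketched independently of Erlang. Below is a minimal Python illustration (not the answer's code) of an incremental decoder for an HTTP/1.1 chunked body that returns only complete CRLF-terminated lines, however the bytes are split across socket reads and chunks. The class and helper names are made up; chunk extensions are ignored, trailers are not handled, and it assumes each tweet arrives as one line, with blank lines treated as keep-alives and dropped.

```python
class ChunkedLineDecoder:
    """Incrementally decode a chunked HTTP body and return complete lines."""

    def __init__(self):
        self._raw = b""    # undecoded bytes as read from the socket
        self._body = b""   # decoded body bytes not yet forming a full line
        self.finished = False

    def feed(self, data: bytes) -> list:
        self._raw += data
        while not self.finished:
            header_end = self._raw.find(b"\r\n")
            if header_end < 0:
                break  # the chunk-size line is still incomplete
            size_field = self._raw[:header_end].split(b";", 1)[0].strip()
            size = int(size_field.decode("ascii"), 16)
            if size == 0:
                self.finished = True  # last chunk; trailers ignored in this sketch
                break
            chunk_end = header_end + 2 + size
            if len(self._raw) < chunk_end + 2:
                break  # wait for the rest of the chunk and its trailing CRLF
            self._body += self._raw[header_end + 2:chunk_end]
            self._raw = self._raw[chunk_end + 2:]
        # Keep the last (possibly partial) line in the buffer for the next call.
        *complete, self._body = self._body.split(b"\r\n")
        return [line for line in complete if line]


def encode_chunk(data: bytes) -> bytes:
    """Helper for the demo: wrap data as one HTTP chunk."""
    return format(len(data), "x").encode("ascii") + b"\r\n" + data + b"\r\n"


tweet1 = b'{"id":1}\r\n'
tweet2 = b'{"id":2,"text":"hello"}\r\n'
# The second tweet deliberately spans two chunks.
stream = encode_chunk(tweet1 + tweet2[:10]) + encode_chunk(tweet2[10:]) + b"0\r\n\r\n"

decoder = ChunkedLineDecoder()
lines = []
for start in range(0, len(stream), 7):  # simulate small, arbitrary socket reads
    lines.extend(decoder.feed(stream[start:start + 7]))
print(lines)
```

In the Erlang design above, the equivalent state (raw buffer plus partial line) would live in the gen_server or gen_fsm state, with each complete line forwarded to another process for JSON decoding.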
