RegEx to find in-line images in a plain text email message - parsing

Certain mail clients allow for the sender to place images directly in the body of their email (instead of as a traditional attachment). When I receive one of these emails in my application, I need to be able to look at only the text/plain message body and determine that the sender embedded an inline image.
I'm trying to craft a RegEx to find image placeholders in the text/plain message body so I can swap them for <img> tags in my own HTML-enabled version of the message. (Wacky, I know, but this is the requirement).
The problem I'm finding is that the placeholders differ based on the sending mail client. For example, when sent from MS Outlook, the text/plain body of the multi-part message looks like this:
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Check out this image:
[cid:image001.jpg#01CB50D4.769583B0]
Isn't it cool??
A similar message sent from Gmail is a little bit different:
Content-Type: text/plain; charset=ISO-8859-1
Check out this image:
[image: image001.jpg]
Isn't it cool??
The text/html body and image/jpeg part with the base64 encoded image follow.
Has anyone done any research on this before and compiled a list or built a RegEx specifically for this purpose?
I realize a more reliable way to achieve my goal is to look at the text/html portion of the message--which seems to be a bit more standardized from the few tests I've done--but unfortunately I don't have access to that in this scenario.
I'm using C#, if that matters to anyone.
Here's a list of text/plain image placeholders I've compiled thus far:
Gmail: [image: filename.jpg]
Outlook 2007: [cid:filename.jpg#01CB50D4.769583B0]
Thunderbird 3.0.7: none

I'd suggest going with the HTML part. If you just want to find a placeholder in the plain text part, this very simple regular expression should be sufficient (PCRE):
^\[.*\]$
At least this works for the examples above. If you'd like to identify the image name, a slightly more complicated expression would be required. Mind that this will catch all lines starting with [ and ending with ], no matter what the contents are. If you'd like to limit the regexp to some file types, try this:
(?i)^\[.*(\.jpg|\.jpeg|\.png|\.gif|\.bmp).*\]$
Examples will work in Perl, since you didn't mention language...
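As a rough illustration of the "identify the image name" case, here is a minimal Python sketch that covers only the two placeholder styles listed in the question (Gmail's [image: ...] and Outlook 2007's [cid:...#...]). The names PLACEHOLDER_RE and find_inline_images are made up for this example, and other mail clients may well use formats it does not catch.

```python
import re

# Covers only the two placeholder styles listed in the question.
# Other mail clients may use formats that are not handled here.
PLACEHOLDER_RE = re.compile(
    r"""^\[                  # opening bracket at the start of a line
        (?:cid:|image:\s*)   # Outlook "cid:" or Gmail "image: " prefix
        (?P<name>[^\]#]+)    # file name, up to '#' or ']'
        (?:\#[^\]]*)?        # optional Outlook suffix after '#'
        \]\s*$               # closing bracket at the end of the line
    """,
    re.IGNORECASE | re.MULTILINE | re.VERBOSE,
)


def find_inline_images(plain_body):
    """Return the file names of image placeholders found in a text/plain body."""
    return [m.group("name").strip() for m in PLACEHOLDER_RE.finditer(plain_body)]


if __name__ == "__main__":
    sample = "Check out this image:\n[image: image001.jpg]\nIsn't it cool??"
    print(find_inline_images(sample))
```

Requiring a known prefix avoids matching every bracketed line, at the cost of missing clients that use a different placeholder.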


Burp reporting XSS vulnerability in unescaped HTML in JSON response

I have a Rails/Ember one-page app. Burp reports that
The value of the 'content_type' JSON parameter is copied into the HTML
document as plain text between tags. The payload
da80b<script>alert(1)</script>4f31e was submitted in the content_type
JSON parameter. This input was echoed unmodified in the application's
response.
I can't quite parse this message, with its references to "is copied into" and "was submitted", but basically what is happening is:
A PUT or POST from the client contains ...<script>...</script>... in some field.
The server handles this request, and sends back the created object in JSON format, which includes the string in question
The client then displays that string, using the standard Ember/Handlebars {{content_type}}, which HTML-escapes the string and inserts it into the DOM, so the browser displays it on the screen as originally entered (and of course does NOT execute it).
So yes, the input was indeed echoed unmodified in the application's response. However, the application's response was not HTML, in which case there would indeed be a problem, but JSON, containing strings which when referred to by Handlebars will always be escaped properly for proper display in the browser.
So my question is: is this in fact a vulnerability? I have taken great care with my Ember app and can prove that no data from JSON objects is ever inserted "raw" into the DOM. Or is this a false positive, triggered merely because the unescaped string can be found in the response by an unintelligent string comparison that does not take into account that the JSON will be processed and escaped by the client-side framework?
To put it a different way, in a classic webapp spitting out HTML from the server, we know that user input such as the above must be escaped/sanitized properly. Unsanitized data "on the wire" in and of itself represents a vulnerability. However, in a one-page app based on JSON coming back from the server, the escaping/sanitization occurs in the client; the JSON on the "wire" may contain unsanitized data, and this is as expected. Am I missing something here?
There are subtle ways in which you can trick IE9 and older into treating JSON as HTML. So even if the server's response has a Content-Type header of application/json, IE will second guess it. This is called content type sniffing, and can be disabled by adding the X-Content-Type-Options: nosniff header.
JSON is not an executable format so your understanding is correct.
I did a demo of this exact problem in my talk on securing single page web apps at OWASP AppSec EU 2013, which someone put up on YouTube here: http://m.youtube.com/watch?v=Femsrx0m9bU
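To make the division of labour concrete, here is a small Python sketch (not the Rails/Ember code from the question). The server side returns the string unmodified inside JSON and adds the nosniff header mentioned above. The escaping happens only at render time. json_response and render_text_node are hypothetical helper names, and render_text_node merely stands in for what an escaping template does.

```python
import html
import json


def json_response(payload):
    """Build (headers, body) for a JSON API response.

    The body is deliberately not HTML-escaped: it is JSON, not HTML.
    The nosniff header asks browsers not to second-guess the declared type.
    """
    headers = {
        "Content-Type": "application/json; charset=utf-8",
        "X-Content-Type-Options": "nosniff",
    }
    return headers, json.dumps(payload)


def render_text_node(value):
    """Stand-in for an escaping template: escape just before inserting into HTML."""
    return html.escape(value)


if __name__ == "__main__":
    headers, body = json_response({"content_type": "<script>alert(1)</script>"})
    print(headers)
    print(body)  # the raw string is still present "on the wire"
    print(render_text_node(json.loads(body)["content_type"]))
```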

Can I use a UUID as my image file name for uniqueness?

I am naming the image files in my Rails app with a UUID, e.g. 67bc91b6-fdb3-11e0-bbff-f04da2baa2d4.png
However, I was not able to display this image correctly in my page. Any clue as to why this happens?
The name of a file does not matter, and a UUID (or a hash of the content) is certainly a good way to generate a unique filename. Make sure you're sending the appropriate headers. For example, if you access your URL with curl -I, you should see the correct Content-Type:
$ curl -I http://localhost/APP/image/67bc91b6-fdb3-11e0-bbff-f04da2baa2d4.png
HTTP/1.1 200 OK
Cache-Control: public, max-age=4
Content-Type: image/png
Content-Length: 198542
If the content-type is not set or set to the wrong format, or Content-Length is not equal to the filesize, make it so.
I've done this in other languages (C#) and had no issues; there is nothing on the web that restricts use of a UUID as an image file name. Something else must be afoot here.
If you would like shorter filenames, which is always a good thing, you can use Base36 encoding.
http://en.wikipedia.org/wiki/Base_36
Since it is case insensitive, it works well on all operating systems, and the file names are a lot shorter than a UUID written in hex.
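A minimal sketch of the Base36 idea in Python (the question is about Rails, so treat this as an illustration only). uuid_to_base36 and base36_to_uuid are names invented for this example. A 128-bit UUID needs at most 25 Base36 characters, compared with 32 hex digits.

```python
import uuid

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def uuid_to_base36(value):
    """Encode the UUID's 128-bit integer value as a lowercase Base36 string."""
    number = value.int
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def base36_to_uuid(text):
    """Decode a Base36 string (either case) back into a UUID."""
    return uuid.UUID(int=int(text, 36))


if __name__ == "__main__":
    original = uuid.UUID("67bc91b6-fdb3-11e0-bbff-f04da2baa2d4")
    short_name = uuid_to_base36(original) + ".png"
    print(short_name)
```

Because int(text, 36) accepts both upper and lower case, the round trip still works on file systems that change the case of names.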

Changing charset when retrieving messages from mail server!

I'm currently creating a little mail client and facing a problem with charsets.
I use Indy's TIdIMAP4 component to retrieve data from the mail server. When I try to retrieve mail bodies, accented letters like ä, ü etc. are converted to =E4, =FC respectively, as it is using charset ISO-8859-1.
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable
How can I make the server send me data in another charset, like UTF-8? What would be the best solution for this problem?
Thanks in advance!
It is not the charset that is producing strings like =E4 and =FC, it is the Content-Transfer-Encoding instead. $E4 and $FC are the binary representations of ä and ü in ISO-8859-1, but they are 8-bit values. Email is still largely a 7-bit environment. Unless both clients and servers negotiate 8-bit transfers during their communications, octets above $7F have to be encoded in a 7-bit compatible manner to pass through email gateways safely, especially legacy ones that still exist. quoted-printable is a commonly used 7-bit encoding in email for textual content. base64 is another one, but it is not human-readable so it tends to be used for binary data instead of textual data (though it can be used for text).
In any case, you cannot make the server deliver the email data to you in another encoding. The server is merely delivering the original email data as-is that was originally delivered to it by the sender. If you want the data in UTF-8, then you have to re-encode it yourself after downloading it. Indy will handle the decoding for you.
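The two steps described above (undo the transfer encoding, then convert the charset) can be sketched in Python as follows. This is only an illustration of the idea, not Indy code. qp_body_to_utf8 is a hypothetical helper, and the default charset is an assumption taken from the headers in the question.

```python
import quopri


def qp_body_to_utf8(raw_body, charset="iso-8859-1"):
    """Undo quoted-printable, then re-encode the text as UTF-8.

    raw_body is the body exactly as downloaded (bytes).
    charset should come from the part's Content-Type header.
    """
    decoded_bytes = quopri.decodestring(raw_body)  # "=E4" becomes byte 0xE4
    text = decoded_bytes.decode(charset)           # bytes -> Unicode text
    return text.encode("utf-8")                    # Unicode text -> UTF-8 bytes


if __name__ == "__main__":
    print(qp_body_to_utf8(b"=E4 and =FC").decode("utf-8"))
```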

Translate binary characters to a human readable string?

So let's say we have a string that is like this:
‰û]M§Äq¸ºþe Ø·¦ŸßÛµÖ˜eÆÈym™ÎB+KºªXv©+Å+óS—¶ê'å‚4ŒBFJF󒉚Ү}Fó†ŽxöÒ&‹¢ T†^¤( OêIº ò|<)ð
How do I turn it into a human-readable string of chars? It was weird HTML output from a web server, and I think it is text because half the web page loaded correctly. Do I need to read it with C or Python or something? That's only a snippet of the string.
If that is in fact supposed to be a human-readable string, you'll need to figure out what character encoding it uses and translate. It's also possible that the string is compressed, encrypted, or represents binary data. It would be helpful to know where you got your string from.
I'm guessing your web server isn't sending the correct MIME type. I'd suggest taking a look at the HTTP headers using Firefox's Live Headers plugin. If a web server decides to send you a PDF but doesn't set the MIME type, you'll just see garbage on your screen. Alternatively, save the page to a file, and then run these commands from Cygwin or a Unix shell:
file mypage.htm
strings mypage.htm
The first will tell you if the header bytes follow any recognizable pattern. The second will strip out and display all the human readable text.
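If no Unix shell is at hand, a rough Python approximation of the same two checks might look like this. printable_runs loosely imitates what strings does for ASCII, and looks_gzipped checks for the gzip magic bytes. Both names are invented here, and the gzip check is only my assumption about one possible cause of the garbage.

```python
import re


def printable_runs(data, min_length=4):
    """Return runs of printable ASCII that are at least min_length bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def looks_gzipped(data):
    """True if the data starts with the two-byte gzip magic number."""
    return data[:2] == b"\x1f\x8b"


if __name__ == "__main__":
    # Small inline sample; in practice read the saved page in binary mode.
    sample = b"\x00\x01<html>hello\xff\xfe"
    print("gzip header:", looks_gzipped(sample))
    print(printable_runs(sample))
```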

Load testing multipart form

I'm trying to load-test a Rails application using JMeter. A critical part of the application involves a form that includes both text inputs and file uploads. It works fine in a browser, but when I try to post that page in JMeter, Rails is saving all of the parts of the multipart form as temp files, which causes things to break when it's looking for a string and gets a tempfile instead.
It appears that the difference is that, from a browser, the piece of the multipart request that contains a text input looks like this:
-----------------------------7d93b4186074c
Content-Disposition: form-data; name="field_name"
test
-----------------------------7d93b4186074c
while from JMeter it looks like this:
-----------------------------7d159c1302d0y0
Content-Disposition: form-data; name="field_name"
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
test
-----------------------------7d159c1302d0y0
So apparently Rails sees the former and interprets it as a plain text value and treats it as a string, but sees the latter and saves it to a temp file.
I have not been able to find a setting to convince JMeter not to send the additional headers in the multipart form for non-file fields.
Is there a way to convince Rails to ignore those headers and treat the text/plain text as strings instead of text files? Or a quick way to put a filter in front of my controller that will strip the extra headers?
Alternately, is there a better tool to load-test a Rails application that includes file upload?
Turns out these days you can just tick "use browser compatible headers" in JMeter. Could've saved myself a hell of a lot of time there :-)
So, I have customized the multipart request posting code in JMeter's source to put out a request that Rails understands. The change is easy, as shown below, but creating a compiling Java/JMeter environment took time. :(
Anyways, now I can successfully upload a file by multipart post via JMeter.
In src/protocol/http/org/apache/jmeter/protocol/http/sampler/PostWriter.java:
writeStartFileMultipart()
    //writeln(out, "Content-Transfer-Encoding: binary"); // $NON-NLS-1$
writeFormMultipart()
    /*****
    writeln(out, "Content-Type: text/plain; charset=" + charSet); // $NON-NLS-1$
    writeln(out, "Content-Transfer-Encoding: 8bit"); // $NON-NLS-1$
    *****/
P.S.
A tip for creating the build environment for 2.4:
comment out the 3rd party libraries check in the build.xml file.
copy lib/xstream-1.3.1.jar from the binary archive into the lib/ directory.
There may be a better way, but I ended up adding a quick filter to turn the text/plain tempfiles into strings within the parameter hash:
def change_text_files_to_strings
  params.each_pair do |key, value|
    params[key] = value.read if value.class.to_s == 'Tempfile' && value.content_type.start_with?('text/plain')
  end
end
By the way, it turns out that JMeter is correct here and Rails is incorrect: according to RFC 2388, any part of a multipart/form-data request may carry a Content-Type header (not just files), and a part without one defaults to text/plain. So Rails really shouldn't be using the presence of a Content-Type header to determine whether a part is a file. Ah well.
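To illustrate the point, here is a small Python sketch that decides whether a part is a file by looking for a filename parameter in Content-Disposition instead of checking for a Content-Type header. It uses the standard email parser. is_file_part and part_content_type are hypothetical names, and this is not how Rails or Rack actually implement multipart parsing.

```python
from email import message_from_string


def is_file_part(part_text):
    """Treat a part as a file upload only if it carries a filename parameter."""
    part = message_from_string(part_text)
    return part.get_filename() is not None


def part_content_type(part_text):
    """Content type of the part; the parser falls back to text/plain if absent."""
    return message_from_string(part_text).get_content_type()


if __name__ == "__main__":
    browser_part = 'Content-Disposition: form-data; name="field_name"\n\ntest'
    jmeter_part = (
        'Content-Disposition: form-data; name="field_name"\n'
        "Content-Type: text/plain; charset=utf-8\n"
        "Content-Transfer-Encoding: 8bit\n\ntest"
    )
    for part in (browser_part, jmeter_part):
        print(is_file_part(part), part_content_type(part))
```

With this rule, the browser-style part and the JMeter-style part from the question are both treated as plain string fields.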
I also used the solution above as ColdFusion was sending similar headers (minus the Content-Transfer-Encoding) with each piece of form data. I wonder if there's a better way.
EDIT: Anyone know if this has been fixed in Rails 3?
What kind of error do you get? Something like
NoMethodError (undefined method `rewind' for "1":String):
There is an issue with Rack that could explain your problem. See https://github.com/rack/rack/issuesearch?state=open&q=rewind#issue/116
We had a similar issue. In addition to the above answers, we also correlated the X-CSRF-Token in the HTTP Header Manager for that request, and were then able to upload the required media as many times as we wanted.
