IMAP - rule for differentiating between inline and regular attachments

I am working on an email client, and I wonder what the correct algorithm is for deciding whether an attachment is a regular attachment (a downloadable file like a PDF, video, audio, etc.) or an inline attachment (which is just an embedded part of an HTML message).
Until recently, I checked whether the body type (assuming the message part is not multipart; otherwise I would recursively parse it further) is not TEXT, that is, whether it's APPLICATION, IMAGE, AUDIO or VIDEO. If so, I looked at whether the ninth element is equal to ATTACHMENT or INLINE. I thought that if it's INLINE, then it is an embedded HTML part rather than a regular attachment.
However, I recently came across an email that contained an HTML message body and regular attachments. The problem is that its body structure looked like this:
1. multipart/mixed
1.1. multipart/alternative
1.1.1. text/plain
1.1.2. multipart/relative
1.1.2.1. text/html
1.1.2.2. Inline jpeg
1.1.2.3. Inline jpeg
1.2. pdf inline (why 'inline'? Should be 'attachment')
1.3. pdf inline (why 'inline'? Should be 'attachment')
The question is, why are downloadable PDF files marked INLINE? And what is the appropriate algorithm for determining whether a file is an embedded HTML part or a downloadable file? Should I look at the parent subtype to see whether it's relative or not, and disregard the inline vs. attachment parameters?

There really is no defined one-size-fits-all algorithm. The inline or attachment disposition is something the sender sets; it is a hint about whether they want the part displayed inline (automatically rendered), as an attachment (displayed in a list), or neither (no preference).
There are also what are sometimes called "embedded" attachments: attachments that have a Content-ID (this is in the body structure response) and are referenced by a cid: URL in an <img> tag or the like.
So, this pretty much has to be done heuristically.
It really depends on your needs and your client's capabilities, but here is a list of heuristics you may consider using in some combination (some of these are mutually exclusive):
If it is marked 'attachment', treat it as an attachment.
If it is marked 'inline', and it is something you can display inline (image/*, maybe text/* if you like), then treat it as inline.
If it has a Content-ID, treat it inline.
If it has a Content-ID and the HTML section references it, treat it as embedded (that is, the HTML viewer will render it); if it is not referenced, treat it as inline (or as an attachment) as your requirements dictate.
If it is neither, and it is something you want to treat as inline, then treat it as inline.
If nothing applies, treat it as an attachment.
Ignore the disposition, and treat it as inline if you wish (such as always making all images inline).
Also, the original meaning of inline was only that the sender wanted the part automatically rendered; this is often conflated with being referenced by the HTML section (which I've called embedded). These are not quite the same.
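As an illustration, here is one possible combination of the heuristics above, sketched in JavaScript. The part objects are a hypothetical shape you would populate from the BODYSTRUCTURE response (type, disposition, Content-ID), and both the order of the checks and the choice of image/* as the only inline-renderable type are assumptions, not a standard algorithm. It is meant for leaf parts other than the text/plain or text/html body you actually display. With the structure from the question, the JPEGs referenced from the HTML come out as embedded, and the two PDFs marked inline fall through to attachment.

```javascript
// Collect the Content-IDs referenced from an HTML body through cid: URLs.
// A cid: URL carries the Content-ID without its angle brackets and may be
// %-encoded (RFC 2392), so references are decoded before comparison.
function referencedCids(html) {
  const refs = new Set();
  const re = /\bcid:([^"'\s>)]+)/gi;
  let m;
  while ((m = re.exec(html)) !== null) {
    let id = m[1];
    try {
      id = decodeURIComponent(id);
    } catch (e) {
      // Malformed %-sequence: keep the raw value.
    }
    refs.add(id);
  }
  return refs;
}

// "<abc@example.com>" -> "abc@example.com"
function stripAngles(contentId) {
  return contentId.trim().replace(/^<|>$/g, "");
}

// Classify one leaf (non-multipart) part that is not the displayed body.
// `part` is a hypothetical object filled in from BODYSTRUCTURE:
//   { type: "IMAGE", disposition: "INLINE" | "ATTACHMENT" | null,
//     contentId: "<abc@example.com>" | null }
function classifyPart(part, cidRefs) {
  const type = part.type.toLowerCase();
  const disposition = part.disposition ? part.disposition.toLowerCase() : null;

  // 1. Referenced from the HTML body: the HTML viewer renders it ("embedded").
  if (part.contentId && cidRefs.has(stripAngles(part.contentId))) {
    return "embedded";
  }
  // 2. The sender explicitly asked for an attachment.
  if (disposition === "attachment") {
    return "attachment";
  }
  // 3. Marked inline or no preference, and renderable inline by this client.
  //    Treating only image/* as renderable is an assumption; widen as needed.
  if (type === "image") {
    return "inline";
  }
  // 4. Nothing else applies (e.g. an application/pdf marked inline).
  return "attachment";
}
```

Checking the Content-ID reference first means a cid-referenced image is never listed twice, regardless of the disposition the sender chose.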

Related

When do I need to encode with multiple codecs in Grails?

I'm not clear on when (or if) I should use multiple Grails encodeAsXXX calls.
This reference says you need to encodeAsURL and then encodeAsJavaScript: http://grailsrocks.com/blog/2013/4/19/can-i-pwn-your-grails-application
It also says you need to encodeAsURL and then encodeAsHTML. I don't understand why this is necessary in the case shown, but not all the time.
Are there other cases where I should be using multiple chained encoders?
If I'm rendering a URL into an HTML attribute, should I encodeAsURL and then encodeAsHTML?
If I'm rendering a URL into a JavaScript variable sent as part of an HTML document (via a SCRIPT element), should I encodeAsURL, then encodeAsJavaScript, then encodeAsHTML?
If I'm rendering a string into a JavaScript variable sent as part of an HTML document, should I encodeAsJavaScript and then encodeAsHTML?
The official docs - https://docs.grails.org/latest/guide/security.html - don't show any examples of multiple chained encoders.
I can't see how I can understand what to do here except by finding the source for all the encoders and looking at what they encode and what's valid on the receiving end - but I figure it shouldn't be that hard for a developer and there is probably something simple I'm missing or some instructions I haven't found.
FWIW, I think the encoders I'm talking about are these ones:
https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/util/JavaScriptUtils.html#javaScriptEscape-java.lang.String-
https://docs.oracle.com/javase/7/docs/api/java/net/URLEncoder.html#encode(java.lang.String,%20java.lang.String)
https://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/web/util/HtmlUtils.html#htmlEscape-java.lang.String-
It is certainly important to always consider XSS, but reading your question, I think you are overestimating what you need to do. As long as you're using Grails 2.3 or higher and your grails.views.default.codec is set to html (which it will be by default), everything rendered in your GSP with ${} will be escaped properly for you.
It is only when you are intentionally bypassing the escaping, such as if you need to get sanitized user input back into valid JavaScript within your GSP for some reason, that you would need to use the encodeAsXXX methods or similar.
I would argue (and the article makes a mention of this as well) that this should raise a smell anyway, as you probably should have that JavaScript encapsulated in a different file or TagLib where the escaping is handled.
Bottom line, use the encoding methods only if you are overriding the default HTML encoding, otherwise ${} handles it for you.
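When you do bypass the default codec, the ordering question from above comes down to encoding for the innermost context first and the outermost context last. The sketch below is a JavaScript analogue, not Grails code: encodeURIComponent stands in for encodeAsURL and htmlEscape is a hypothetical hand-written helper standing in for encodeAsHTML. Note that encodeURIComponent is not identical to the java.net.URLEncoder linked in the question (for example, it emits %20 rather than + for a space). The href is deliberately single-quoted to show why the HTML step still matters: encodeURIComponent leaves ' unescaped, so without it a value could break out of the attribute.

```javascript
// Escape the characters that are significant in HTML text and in quoted
// attribute values. (Hypothetical helper, not a Grails codec.)
function htmlEscape(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Build a link whose href carries a user-supplied query value.
// Innermost context first: the value is a URL component, so URL-encode it;
// the finished URL then sits inside an HTML attribute, so HTML-escape it.
// `baseUrl` is assumed to be trusted, application-controlled text.
function searchLink(baseUrl, query) {
  const url = baseUrl + "?q=" + encodeURIComponent(query);
  return "<a href='" + htmlEscape(url) + "'>" + htmlEscape(query) + "</a>";
}
```

The link text only needs the HTML step, because it is never interpreted as a URL.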

Replace URL in Text Body with an Image Tag for that URL

As the title suggests I would like to find a way to replace URLs within a body text (for a blog) with image tags for those URLs. I suspect I will need to do some form of regex. Has anyone done something like this before?
EDIT:
To describe the use case a bit more: I have a blog-esque site I am building. I would like blog writers to be able to 'drop' URLs into text posts (separated by newlines) and have Rails intelligently parse the string and replace any URLs with images (perhaps in a helper method).
The sanest approach is to use something like Markdown (or exactly Markdown) and ensure that your posts are marked up correctly. Redcarpet (https://github.com/vmg/redcarpet) seems to be the most up-to-date Markdown gem.
Alternatively, if you want to do this by yourself, it would still be prudent to mark up a link in some way. For example, {image src=link_to_the_image_here}.
This will make finding images within the body of text easier.
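A minimal sketch of the do-it-yourself route, written in JavaScript rather than as a Rails helper. It assumes the exact {image src=...} syntax from the example above, with no spaces inside the URL, and only accepts http/https URLs. Everything else in the post is HTML-escaped, on the assumption that posts are otherwise plain text.

```javascript
// HTML-escape text and attribute values.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Matches the hypothetical {image src=URL} markup; the URL runs until "}" or whitespace.
const IMAGE_MARKUP = /\{image src=([^}\s]+)\}/g;

// Turn a plain-text post into HTML, replacing {image src=...} with <img> tags.
function renderPost(text) {
  let out = "";
  let last = 0;
  for (const m of text.matchAll(IMAGE_MARKUP)) {
    out += escapeHtml(text.slice(last, m.index));
    const url = m[1];
    if (/^https?:\/\//i.test(url)) {
      out += '<img src="' + escapeHtml(url) + '">';
    } else {
      // Refuse other schemes (e.g. javascript:) and leave the markup as literal text.
      out += escapeHtml(m[0]);
    }
    last = m.index + m[0].length;
  }
  return out + escapeHtml(text.slice(last));
}
```

Escaping the URL as well turns & in query strings into &amp;, which is what an HTML attribute expects.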

How to get HTML tag text by id using Lua

There is a webpage parser which takes a page containing several tags in a certain structure, where the divs are badly nested. I need to extract a certain div element and copy it, with all its content, to a new HTML file.
Since I am new to Lua, I may need basic clarification for things that might seem simple.
The ease of extraction of data is going to largely depend on the page itself. If the page uses the exact same tag information throughout its entirety, it'll be much more difficult to extract than it would if it has named tags.
If you're able to find a version of the page that returns JSON, then you're that much better off. Here's a snippet from something I wrote to grab definitions from a webpage that did not offer JSON:
local actualword, definition = string.match(wayup,"<html.-<td class='word'>%c(.-)%c</td>.-<div class=\"definition\">(.-)</div>")
Essentially, this code searched down the page until it found the class "word", and took the word after it (%c is the pattern for control characters). It continued on to "definition" and captured that, as well.
As you can see, it's a bit convoluted, but I had the luck of having specifically named tags for what I wanted.
Edited to fit your comment: as a side note I should have mentioned before, if you're familiar with regular expressions, Lua patterns work along similar lines and can capture what you need. In this case, the pattern captures the whole div in its totality:
local data = string.match(page, "(<div id=\"aa\"><div>.-</div>.-</div>)")
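The patterns above only cope with a fixed amount of nesting, because Lua patterns, like regular expressions, cannot match arbitrarily nested tags. One alternative is to find the opening tag and then count <div> and </div> tags until the depth returns to zero. Below is a sketch of that depth-counting idea in JavaScript; the same loop can be written in Lua with string.find and its init argument. It is a heuristic, not an HTML parser: it assumes the id value is quoted and ignores comments, scripts and "div" text inside attribute values, so badly nested markup may be cut differently than a browser would.

```javascript
// Return the outer HTML of the first <div> whose id attribute equals `id`,
// including nested <div> elements, or null if it is missing or unbalanced.
function extractDivById(html, id) {
  // Escape regex metacharacters in the id before building the pattern.
  const esc = id.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Opening tag with id="..." or id='...' (\s before "id" avoids data-id etc.).
  const start = new RegExp("<div\\b[^>]*\\sid\\s*=\\s*([\"'])" + esc + "\\1[^>]*>", "i");
  const m = start.exec(html);
  if (!m) return null;

  // Scan the following div tags, tracking the nesting depth.
  const tags = /<\/?div\b[^>]*>/gi;
  tags.lastIndex = m.index + m[0].length;
  let depth = 1;
  let t;
  while ((t = tags.exec(html)) !== null) {
    depth += t[0][1] === "/" ? -1 : 1;
    if (depth === 0) {
      return html.slice(m.index, t.index + t[0].length);
    }
  }
  return null; // no matching </div>
}
```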
It's rarely the fault of the language, but rather of the webpage itself, that makes it hard to data-mine anything. Since webpages can have hundreds of lines of code, it's hard to pinpoint exactly what you want without coming across garbage information. That's why I prefer a simplified result such as JSON: there are JSON modules for Lua that can encode and decode it, so you can get precisely the information you need.

How to keep attributes with parseFragment in Firefox extension

In a Firefox extension we use parseFragment to parse a string of HTML (received from a 3rd-party server) into a sanitized DocumentFragment, as is required by Mozilla. The only problem is that the parser removes attributes we need, for example the class attribute.
Is it possible somehow to keep class attributes while parsing HTML with parseFragment?
P.S. I know that in Gecko 14.0 they replaced this function with another which supports sanitizing parameters. But what to do with Gecko < 14.0?
No, the whitelist is hardcoded and cannot be adjusted. However, the class attribute is in the whitelist and should be kept, you probably meant the style attribute? If you need a customized behavior you will have to use a different solution (like DOMParser which can parse HTML documents in Firefox 12).
As to older Firefox versions, you can parse XHTML data with DOMParser there. If you really have HTML, then I am only aware of one way to parse it without immediately inserting it into a document (which might cause various security issues): range.createContextualFragment(). You need an HTML document for that; if you don't have one, a hidden <iframe> loading about:blank will do. Here is how it works:
// Get the HTML document
var doc = document.getElementById("dummyFrame").contentDocument;
// Parse data
var fragment = doc.createRange().createContextualFragment(htmlData);
// Sanitize it
sanitizeData(fragment);
Here, sanitizing the data is your own responsibility. You probably want to base your sanitization on Mozilla's whitelist that I linked to above: remove all tags and attributes that are not on that list, and also make sure to check the links. The style attribute is a special case: it used to be insecure, but IMHO it no longer is, given that -moz-binding isn't supported on the web any more.
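A minimal sketch of what the sanitizeData(fragment) call above could look like, using only standard DOM properties (childNodes, nodeType, localName, attributes, removeChild, removeAttribute). The allow-lists are short illustrative placeholders, not Mozilla's actual whitelist. Disallowed elements are removed together with their content, only absolute http/https links are kept, and style is dropped as a conservative choice even though the answer argues it is now safe.

```javascript
// Illustrative allow-lists only; they are NOT Mozilla's actual whitelist.
const ALLOWED_TAGS = new Set(["a", "b", "i", "em", "strong", "p", "br", "div", "span", "ul", "ol", "li", "img"]);
const ALLOWED_ATTRS = new Set(["class", "href", "src", "alt", "title"]);
const URL_ATTRS = new Set(["href", "src"]);
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Accept only absolute http(s) URLs (rejects javascript:, data:, relative links).
function isSafeUrl(value) {
  return /^\s*https?:\/\//i.test(value);
}

// Recursively strip everything not on the allow-lists from `node`'s subtree.
function sanitizeData(node) {
  // Copy the list first: removing children mutates node.childNodes.
  for (const child of Array.from(node.childNodes)) {
    if (child.nodeType === TEXT_NODE) continue;
    if (child.nodeType !== ELEMENT_NODE || !ALLOWED_TAGS.has(child.localName.toLowerCase())) {
      // Comments, processing instructions and disallowed elements go, with their content.
      node.removeChild(child);
      continue;
    }
    for (const attr of Array.from(child.attributes)) {
      const name = attr.name.toLowerCase();
      if (!ALLOWED_ATTRS.has(name) || (URL_ATTRS.has(name) && !isSafeUrl(attr.value))) {
        child.removeAttribute(attr.name);
      }
    }
    sanitizeData(child);
  }
}
```

Removing a disallowed element outright, rather than unwrapping it, is the simpler and safer default; unwrapping would keep more text but needs more care.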

setting innerHTML in xul

I have code in my browser.xul; what I am trying to do is fetch data from an HTML file and insert it into my div element.
I am trying to use div.innerHTML but I am getting an exception:
Component returned failure code: 0x804e03f7 [nsIDOMNSHTMLElement.innerHTML]
I tried to parse the HTML using Components.interfaces.nsIScriptableUnescapeHTML and to append the parsed HTML to my div, but my problem is that style (attribute and tag) and script aren't parsed.
First a warning: if your HTML data comes from the web then you are trying to build a security hole into your extension. HTML code from the web should never be trusted (even when coming from your own web server and via HTTPS) and you should really use nsIScriptableUnescapeHTML. Styles should be part of your extension, using styles from the web isn't safe. For more information: https://developer.mozilla.org/En/Displaying_web_content_in_an_extension_without_security_issues
As to your problem, this error code is NS_ERROR_HTMLPARSER_STOPPARSING which seems to mean a parsing error. I guess that you are trying to feed it regular HTML code rather than XHTML (which would be XML-compliant). Either way, a better way to parse XHTML code would be DOMParser, this gives you a document that you can then insert into the right place.
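For the XHTML route, here is a minimal sketch of parsing with DOMParser and detecting a parse failure. It assumes Firefox's DOMParser behaviour, where malformed XML does not throw but yields a document whose root element is <parsererror>. The optional parser argument is only there so the helper can be exercised outside a browser. The markup needs to declare the XHTML namespace (xmlns="http://www.w3.org/1999/xhtml") for its elements to be treated as HTML, and parsing does not sanitize anything, so the warning above still applies.

```javascript
// Parse an XHTML string and throw if it is not well-formed.
// `parser` is optional and injectable; normally this is just new DOMParser().
function parseXhtml(text, parser) {
  const p = parser || new DOMParser();
  const doc = p.parseFromString(text, "application/xhtml+xml");
  const root = doc.documentElement;
  // Firefox reports XML errors by returning a document whose root element
  // is <parsererror> instead of throwing an exception.
  if (!root || root.localName === "parsererror") {
    throw new Error("Not well-formed XHTML: " + (root ? root.textContent : "empty document"));
  }
  return doc;
}
```

The result could then be inserted with something like div.appendChild(document.importNode(doc.documentElement, true)).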
If the point is really to parse HTML code (not XHTML) then you have two options. One is using an <iframe> element and displaying your data there. You can generate a data: URL from your HTML data:
frame.src = "data:text/html;charset=utf-8," + encodeURIComponent(htmlData);
If you don't want to display the data in a frame you will still need a frame (can be hidden) that has an HTML document loaded (can be about:blank). You then use Range.createContextualFragment() to parse your HTML string:
var range = frame.contentDocument.createRange();
range.selectNode(frame.contentDocument.documentElement);
var fragment = range.createContextualFragment(htmlData);
XML documents don't have innerHTML, and nsIScriptableUnescapeHTML is one way to get the html parsed but it's designed for uses where the HTML might not be safe; as you've found out it throws away the script nodes (and a few other things).
There are a couple of alternatives, however. You can use the responseXML property, although this may be suboptimal unless you're receiving XHTML content.
You could also use an iframe. It may seem old-fashioned, but an iframe's job is to take a URL (the src property) and render the content it receives, which necessarily means parsing it and building a DOM. In general, when an extension running as chrome does this, it will have to take care not to give the remote content the same chrome privileges. Luckily that's easily managed; just put type="content" on the iframe. However, since you're looking to import the DOM into your XUL document wholesale, you must have already ensured that this remote content will always be safe. You're evidently using an HTTPS connection, and you've taken extra care to verify the identity of the server by making sure it sends the right certificate. You've also verified that the server hasn't been hacked and isn't delivering malicious content.
