Undo email wordwrap line breaks in Ruby - ruby-on-rails

My Rails app processes incoming emails by splitting them into multiple lines. This is what I currently use on the plain text version of the body: lines = email.body.split("\n")
This works well unless the sentences are longer than ~74 characters as most email clients will automatically add a line break per RFC 2822.
Example email: https://gist.github.com/marckohlbrugge/39c17b928eb17d330d63
Looking at the plain text part there seems to be no way to discern between a line break added by the user versus the email client. You could ignore any line break happening at the 75th position, but I think there might be a chance of false positives. (I could be wrong.)
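One heuristic for the plain text part, sketched here in Ruby: treat a break as client-inserted only when the first word of the next line would not have fit on the current line. The `unwrap` helper and the wrap width of 76 are assumptions for illustration. Clients wrap at different columns, and a user who presses Enter at just the wrong spot still produces a false positive.

```ruby
# Assumed wrap column; real mail clients vary (commonly 72 to 78).
WRAP_WIDTH = 76

# Joins lines that look like they were wrapped by the mail client.
# A break counts as "soft" when the previous physical line plus a space
# plus the first word of the current line would exceed the wrap width.
def unwrap(text, width: WRAP_WIDTH)
  result = []
  last_physical_length = nil

  text.split("\n").each do |line|
    first_word = line[/\A\S+/]
    wrapped = last_physical_length && last_physical_length > 0 && first_word &&
              last_physical_length + 1 + first_word.length > width

    if wrapped
      result[-1] = "#{result[-1]} #{line}"
    else
      result << line
    end
    last_physical_length = line.length
  end

  result
end

body = "This sentence is long enough that the mail client had to wrap it onto a\n" \
       "second line.\nShort line.\nAnother one."
puts unwrap(body)
```

Short lines are left alone, so deliberate breaks after short sentences survive.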
The HTML part has all the information we need, but I'm not sure about a universal way to process this. Is replacing every div and br with a newline and then stripping all other HTML elements enough? What about all the other block-element tags? What about inline elements styled as block-elements? What if an email doesn't have an HTML part?
I did find some interesting code examples in Convert HTML to plain text (with inclusion of <br>s), but replacing a list of HTML tags with newlines doesn't seem like a complete (exhaustive) solution.
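For reference, the tag-replacement approach being questioned looks roughly like this. It is a minimal sketch using only regular expressions: the `BLOCK_TAGS` list and the `html_to_lines` name are assumptions, it also drops blank lines, and it shares the limitations raised above (inline elements styled as blocks, tags missing from the list).

```ruby
# Assumed, non-exhaustive list of block-level tags.
BLOCK_TAGS = %w[div p li tr h1 h2 h3 h4 h5 h6 blockquote pre].freeze

# Turns <br> and closing block tags into newlines, strips all other tags,
# and returns the non-empty lines.
def html_to_lines(html)
  text = html.gsub(/<br\s*\/?>/i, "\n")
  text = text.gsub(%r{</(?:#{BLOCK_TAGS.join('|')})\s*>}i, "\n")
  text = text.gsub(/<[^>]+>/, "")
  text.split("\n").map(&:strip).reject(&:empty?)
end

puts html_to_lines("<div>First line<br>Second line</div><p>Third <b>bold</b> line</p>")
```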

Is it worth looking at something like this mail library as they've probably already thought about the edge cases? ;)

Related

In Reflected XSS, why do we need to sanitize single quote, double quote, ampersand, and backslash

Based on this article
https://resources.infosecinstitute.com/topic/how-to-prevent-cross-site-scripting-attacks/
Reflected XSS happens when data injected is reflected in the response. I get the idea that if I, for example, have a search box in my page and the search term inputted by a user is displayed in the page, someone could write as a search term:
<script>alert('x');</script>
and that would be read as regular HTML element in the page that displays the response.
But let's say greater than and less than are already blocked in input (meaning they wouldn't be able to put in script tags or any tag). What's the issue if I allow single quote, double quote, ampersand, and backslash to be reflected in the response? I'm trying to make sense of it, but I am not sure if I am understanding correctly.
Today the web stack is big and complex with many languages. We have HTML, CSS, JavaScript, VB-Script, SVG, URLs…
Each with its own rules for:
Encoding
Quoting
Commenting
Escaping
Also, each one can be nested inside the others.
Just replacing <> fixes some issues, but not all of them, because you don't know where your data will end up. Is it in HTML? In an HTML attribute? Inside a JavaScript string? Each context needs a different encoding to be safe.
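A small Ruby illustration of the attribute case: the payload contains no angle brackets, so a filter that only strips `<` and `>` lets it through, while HTML-escaping the quotes keeps it inside the attribute value. `strip_angle_brackets` and `render_input` are hypothetical helpers; `ERB::Util.html_escape` is from the standard library.

```ruby
require "erb"

# Hypothetical filter that only removes angle brackets.
def strip_angle_brackets(input)
  input.delete("<>")
end

# Places the value inside a double-quoted HTML attribute.
def render_input(value)
  %(<input value="#{value}">)
end

# No < or > needed: the double quote closes the attribute early.
payload = %q(" onmouseover="alert(1))

unsafe = render_input(strip_angle_brackets(payload))
safe = render_input(ERB::Util.html_escape(payload))

puts unsafe # <input value="" onmouseover="alert(1)">
puts safe   # <input value="&quot; onmouseover=&quot;alert(1)">
```

The same payload would need yet another encoding if it were placed inside a JavaScript string rather than an HTML attribute.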
So, the world is a bit more complicated.....

Mandrill Adding a Space to a URL

Using Mandrill I'm sending an email that has a link:
<a href="http://www.slotted.co/NzIyNnx0c2NvdHRAc2xvdHRlZC5jbw==">
http://www.slotted.co/NzIyNnx0c2NvdHRAc2xvdHRlZC5jbw==
</a>
As expected Mandrill replaces my HREF with a tracking link:
http://mandrillapp.com/track/click/30319089/www.slotted.co?p=eyJzIjoiT1h4VE04RlV2bWp5R2YzNjZkNnNWaFpOemJ3IiwidiI6MSwicCI6IntcInVcIjozMDMxOTA4OSxcInZcIjoxLFwidXJsXCI6XCJodHRwOlxcXC9cXFwvd3d3LnNsb3R0ZWQuY29cXFwvTnpJeU5ueDBjMiBOdmRIUkFjMnh2ZEhSbFpDNWpidz09XCIsXCJpZFwiOlwiM2NmMWE4MzUzNGE1NDg4ZTg1OTUwMDkxZmFhY2M5NTNcIixcInVybF9pZHNcIjpbXCI3YWM1ODFiMTJkY2E0YWM4YzZlMmM3ZDU2OWU2YzQ5MmMxNDIxMDJmXCJdfSJ9
This link redirects to:
http://www.slotted.co/NzIyNnx0c2%20NvdHRAc2xvdHRlZC5jbw==
Notice the extra %20 in the middle of the path which obviously breaks the link. You can try it yourself.
Seems like a bug, but I'm still on the free plan, so no way to report it. Any suggestions?
See this answer:
We typically see this kind of issue with SMTP libraries or frameworks
that generate HTML with no true line breaks. The SMTP specs state that
the line length for email shouldn't exceed 1000 characters. When that
limit is reached, a line break gets inserted automatically when the
message data is being transmitted over SMTP. This unfortunately often
happens right in the middle of a word or a URL, for example. You'll
want to take a look at your SMTP library to see if you can modify how
line breaks are being handled.
If you're using HTML line breaks like <br> that are being used to
indicate a break, those unfortunately won't help in this case. Adding
your own line breaks (not HTML line breaks, but actual line breaks in
the data such as a newline or end of line, usually \r\n) will help
ensure that the forced line breaks aren't arbitrarily added in the
SMTP conversation in inconvenient places.
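As a rough Ruby sketch of that advice, real CRLF breaks can be inserted between adjacent tags before the HTML is handed to the SMTP library. `break_between_tags` and `longest_line` are hypothetical helper names. Note that whitespace between inline elements can render as a visible space, so this is only safe where tags directly follow one another in block-level markup.

```ruby
# RFC 5322 limits a line to 998 characters, excluding the CRLF.
MAX_LINE = 998

# Inserts a real line break wherever one tag directly follows another.
def break_between_tags(html)
  html.gsub("><", ">\r\n<")
end

# Length of the longest physical line, for checking against MAX_LINE.
def longest_line(text)
  text.split("\r\n").map(&:length).max || 0
end

html = "<table><tr>" + "<td>cell</td>" * 200 + "</tr></table>"
wrapped = break_between_tags(html)

puts longest_line(html) > MAX_LINE     # single-line HTML is over the limit
puts longest_line(wrapped) <= MAX_LINE # with real breaks it is not
```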

How to replace Mandrill's *| |* symbols?

Is there any way to replace Mandrill's *| |* symbols?
The CMS I'm using (MODX) has its own symbols to enclose the tags, e.g. [[+ ]]
The case is that I also have "read on web" link, where the page on the web needs to generate dynamic content as well.
I have googled and searched on http://help.mandrill.com but still no luck.
Any hint will be appreciated.
You wouldn't be able to use different symbols in your emails - those are how Mandrill's system recognizes merge tags and to replace them in the HTML and/or text of your email. You'd need to convert any placeholders you have or want for the email to that format, so you can pass the data to Mandrill as expected. If it's going to mirror what you're putting on the web, then you probably just want to have something that transforms strings, for example, to convert your CMS tags to Mandrill tags specifically for the emails.
@kaitlin-mandrill,
Exactly,
I just figured it out.
I need to replace it right before it is sent.
More or less, this is the code.
Hopefully it's useful for anyone else.
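A minimal Ruby sketch of such a conversion, assuming simple MODX placeholders of the form [[+name]] with no filters or properties. The `modx_to_mandrill` name and the upper-casing of tag names are illustrative assumptions, not the poster's code.

```ruby
# Converts simple MODX placeholders like [[+first_name]] into
# Mandrill merge tags like *|FIRST_NAME|*. Anything else is left untouched.
def modx_to_mandrill(html)
  html.gsub(/\[\[\+\s*(\w+)\s*\]\]/) { "*|#{Regexp.last_match(1).upcase}|*" }
end

puts modx_to_mandrill("Hi [[+first_name]], read this on [[+site_url]]")
# Hi *|FIRST_NAME|*, read this on *|SITE_URL|*
```

Running it just before the message is handed to Mandrill keeps the web version of the page on the CMS's own tags.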

How do I set a maximum line length for Rails Slim HTML email templates?

I'm using Slim as the templating language for my HTML email. When pretty mode is turned off in production, it puts all the HTML on one line. When the emails go through Sendgrid, a line break is introduced at the 998th character, breaking the HTML. Sendgrid does this to comply with the email RFC.
How can I turn pretty mode off while rendering the email, tell Slim to respect the maximum line length, or introduce a hard line break?
Adding a few of these
= "\r\n"
throughout the email template solved the problem.
Just add a data-force-encoding="✓" attribute to the body tag. That makes Rails send the email as quoted-printable (the trick is simply that the body now contains a UTF-8 character). See: https://github.com/slim-template/slim/issues/123
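To see why quoted-printable sidesteps the line limit, here is a small Ruby sketch using the standard `Array#pack("M")` directive. In Rails the mail library does the encoding itself; this only illustrates the effect. The encoder inserts soft line breaks (a trailing `=`) that the receiving client removes again.

```ruby
# One very long line of HTML, as produced with pretty mode off.
html = "<p>" + "x" * 2000 + "</p>"

# Quoted-printable encoding adds soft line breaks ("=\n") at regular intervals.
encoded = [html].pack("M")

# Decoding removes the soft breaks and restores the original content.
decoded = encoded.unpack1("M").force_encoding("UTF-8")

puts encoded.lines.map { |line| line.chomp.length }.max
puts decoded == html
```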

regular expression for emails NOT ending with replace script

I'm currently modifying my regex for this:
Extracting email addresses in an html block in ruby/rails
Basically, I'm making another obfuscator that uses ROT13 by parsing a block of text for all links that contain a mailto href (using hpricot). One use case this doesn't catch is when the user just typed in an email address (without turning it into a link via TinyMCE).
So here's the basic flow of my method:
1. parse a block of text for all tags with href="mailto:..."
2. replace each tag with a javascript function that changes this into ROT13 (using this script: http://unixmonkey.net/?p=20)
3. once all links are obfuscated, pass the resulting block of text into another function that parses for all emails (this one has an email regex that reverses the email address and then wraps it in a span that reverses it back)
Step 3 is supposed to clean the block of text of remaining emails that AREN'T in href attributes (meaning they weren't parsed by hpricot). The problem is that the emails that were converted to ROT13 are still found by my regex. What I want to catch are just the emails that WEREN'T CONVERTED to ROT13.
How do I do this? Well, all emails that WERE CONVERTED have a trailing "'.replace" after them, meaning I need to get all emails WITHOUT that string. So far I have this regex:
/\b([A-Z0-9._%+-]+@[A-Z0-9.-]+.[A-Z]{2,4}('.replace))\b/i
But this gets all the emails with the trailing '.replace. I want the opposite, and I'm currently stumped. Any help from the regex gurus out there?
MORE INFO:
Here's the regex + the block of text im parsing:
http://www.rubular.com/r/NqXIHrNqjI
As you can see, the first two 'email addresses' are already obfuscated using ROT13. I need a regex that gets the emails ohhellzyeah@ribute.com and kaboom@yahoo.com
On negative lookaheads
You can use a negative lookahead to assert that a pattern doesn't match.
For example, the following regex matches all strings that don't end with the string ".replace":
^(?!.*\.replace$).*$
As another example, this regex matches all a*b*, except aabb:
^(?!aabb$)a*b*$
Ideally,
See also
regular-expressions.info/Lookaheads and anchors
Flavor comparison - unfortunately, Ruby 1.8 doesn't support lookbehinds (Ruby 1.9 and later do)
Specific solution
The following regex works in this scenario: (see on rubular.com):
/\b([A-Z0-9._%+-]+@(?![A-Z0-9.-]*'\.replace\b)[A-Z0-9.-]+\.[A-Z]{2,4})\b/i
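A quick Ruby check of that pattern against a sample block. The sample text and the `plain_emails` helper are made up for illustration; the obfuscated address is followed by '.replace as described in the question.

```ruby
# Matches email addresses that are NOT immediately followed by '.replace
EMAIL_RE = /\b([A-Z0-9._%+-]+@(?![A-Z0-9.-]*'\.replace\b)[A-Z0-9.-]+\.[A-Z]{2,4})\b/i

# Returns every address in the text that has not been ROT13-obfuscated.
def plain_emails(text)
  text.scan(EMAIL_RE).flatten
end

text = "document.write('znvy@rknzcyr.pbz'.replace(/[a-z]/g, rot13)); " \
       "mail ohhellzyeah@ribute.com or kaboom@yahoo.com"

p plain_emails(text)
# ["ohhellzyeah@ribute.com", "kaboom@yahoo.com"]
```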
