Gemoji breaks Kramdown's HTML - ruby-on-rails

Why does Kramdown's autolinking parser break when running it over a gemojified text field?
For [Test](http://google.com "Test") I'm getting:
Test
instead of the expected output:
Test
Live app: http://runnable.com/VAL1VuMjrGFur2yx/forem-gemoji-kramdown (see the Test post)
application_helper.rb:
def add_emojify_and_kramdown(text)
  raw(Kramdown::Document.new(emojify(text)).to_html)
end
[...snip...]
def emojify(text)
  h(text).to_str.gsub(/:([a-z0-9\+\-_]+):/) do |match|
    if emoji = Emoji.find_by_alias($1)
      '![' + $1 + '](' + asset_path("emoji/#{emoji.image_filename}") + ')'
    else
      match
    end
  end
end
Some additional info:
raw(Kramdown::Document.new(text).to_html) returns the expected output, but without Gemoji
raw(emojify(text)) doesn't change anything seeing as how text contains no emojis
raw(emojify(Kramdown::Document.new(text).to_html)) returns the expected output, but as raw HTML

The first thing your emojify method does is h(text), which HTML escapes the input, converting
[Test](http://google.com "Test")
into
[Test](http://google.com &quot;Test&quot;)
Kramdown then operates on this string, and since it no longer contains quote marks it assumes the whole contents of (...) is the URL, producing:
Test
To get it to work you just need to drop the call to h: text.gsub(.... You’ll likely need to think about how to manage your string safety if this is external data.
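The effect of the escaping step can be reproduced outside Rails. The sketch below uses ERB::Util.html_escape from the standard library (the function behind Rails' h helper) and then shows an emojify variant that skips the escaping. The EMOJI_IMAGES hash and the emojify_markdown name are hypothetical stand-ins for Emoji.find_by_alias and asset_path, used only so the example runs on its own.

```ruby
require 'erb'

markdown = '[Test](http://google.com "Test")'

# html_escape turns the quote marks into entities, so the Markdown
# title syntax is no longer recognisable to a Markdown parser.
escaped = ERB::Util.html_escape(markdown)
puts escaped

# Hypothetical lookup table standing in for Emoji.find_by_alias + asset_path.
EMOJI_IMAGES = { 'smile' => '/assets/emoji/smile.png' }.freeze

# Same substitution as emojify, but without the h(...) call,
# so the Markdown reaches the parser unescaped.
def emojify_markdown(text)
  text.gsub(/:([a-z0-9+\-_]+):/) do |match|
    name = Regexp.last_match(1)
    path = EMOJI_IMAGES[name]
    path ? "![#{name}](#{path})" : match
  end
end

puts emojify_markdown('[Test](http://google.com "Test") :smile:')
```

Because the text is no longer escaped before parsing, any sanitising of untrusted input would have to happen at another point, for example on the HTML that Kramdown produces.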

Related

Check if String Contains an Emoji in Ruby

In ruby, here is how you can check for a substring in a string:
str = "hello world"
str.include?("lo")
=> true
When I am attempting to save an emoji in a text column in a rails application (the text column within a mysql database is utf8), it comes back with this error:
Incorrect string value: \xF0\x9F\x99\x82
For my situation in a rails application, it suffices to see if an emoji is present in the submitted text. If an emoji is present: raise a validation error. Example:
class MyModel < ApplicationRecord
  validate :cannot_contain_emojis
  private
  def cannot_contain_emojis
    if my_column.include?("/\xF0")
      errors.add(:my_column, 'Cannot include emojis')
    end
  end
end
Note: I am checking for \xF0 because, according to this site, all or most emojis appear to begin with this byte.
This however does not work. It continues to return false even when it is true. I'm pretty sure the issue is that my include statement doesn't work because the emoji is not converted to bytes for the comparison.
Question
How can I make a validation to check that an emoji is not passed in?
Example bytes for a smiley face in UTF8: \xF0\x9F\x99\x82
You can use the Emoji Unicode property to test for Emoji using a Regexp, something like this:
def cannot_contain_emojis
  if /\p{Emoji}/ =~ my_column
    errors.add(:my_column, 'Cannot include emojis')
  end
end
Unicode® Technical Standard #51 "UNICODE EMOJI" contains a more sophisticated regex:
\p{RI} \p{RI}
| \p{Emoji}
( \p{EMod}
| \x{FE0F} \x{20E3}?
| [\x{E0020}-\x{E007E}]+ \x{E007F} )?
(\x{200D} \p{Emoji}
( \p{EMod}
| \x{FE0F} \x{20E3}?
| [\x{E0020}-\x{E007E}]+ \x{E007F} )?
)*
[Note: some of those properties are not implemented in Onigmo / Ruby.]
However, checking for Emojis is probably not going to be enough. It is pretty clear that your text processing is somehow broken at some point. And if it is broken by an Emoji, then there is a chance it will also be broken by my name, or the name of Ruby's creator 松本 行弘, or by the completely normal English word “naïve”.
Instead of playing a game of whack-a-mole trying to detect every Emoji, mathematical symbol, Arabic letter, typographically correct punctuation mark, etc., it would be much better simply to fix the text processing.
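The bytes in the error message, \xF0\x9F\x99\x82, form a single four-byte UTF-8 character. If the real constraint is that the column cannot store four-byte characters (which is how MySQL's legacy utf8 charset behaves), a narrower check than "is it an emoji" is possible. This is only a sketch under that assumption; the method name is hypothetical.

```ruby
# Returns true if any character in the text needs four bytes in UTF-8.
def contains_four_byte_char?(text)
  text.each_char.any? { |char| char.bytesize == 4 }
end

puts contains_four_byte_char?("smile \u{1F642}")   # slightly smiling face, 4 bytes
puts contains_four_byte_char?("na\u00EFve")        # "naïve", 2 bytes at most
puts contains_four_byte_char?("\u677E\u672C")      # "松本", 3 bytes each
```

A check like this leaves accented letters and CJK text alone, which addresses the whack-a-mole concern above, though storing the text in a column that accepts four-byte characters would remove the need for the validation entirely.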
I found Jörg's solution was only working when passing in the string itself and not a variable. Not sure why that is.
/\p{Emoji}/ =~ "🎃"
=> 0
value = "1f383"
=> "1f383"
/\p{Emoji}/ =~ value
=> 0
/\p{Emoji}/ =~ "hello"
=> nil
Regardless I'd recommend using the unicode-emoji gem, as its approach is comprehensive. Its source code and documentation can be found on GitHub.
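One likely explanation for the "1f383" result above: the Unicode Emoji property also covers the ASCII digits, # and *, because they are the base of keycap emoji. The string "1f383" therefore matches at index 0 on its leading "1", whether or not it is held in a variable. The sketch below compares \p{Emoji} with the stricter \p{Emoji_Presentation} property. Note that Emoji_Presentation misses symbols that default to text presentation, so it is not a complete detector either.

```ruby
pumpkin = "\u{1F383}"   # jack-o-lantern emoji
code    = "1f383"       # plain ASCII text, no emoji in it

# \p{Emoji} matches digits, so the plain code string is a false positive.
puts(/\p{Emoji}/.match?(pumpkin))
puts(/\p{Emoji}/.match?(code))

# \p{Emoji_Presentation} only matches characters drawn as emoji by default.
puts(/\p{Emoji_Presentation}/.match?(pumpkin))
puts(/\p{Emoji_Presentation}/.match?(code))
puts(/\p{Emoji_Presentation}/.match?("hello"))
```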

How to output JSON in Rails without escaping back slashes

I need to output some JSON for a customer in a somewhat unusual format. My app is written with Rails 5.
Desired JSON:
{
"key": "\/Date(0000000000000)\/"
}
The timestamp value needs to have a \/ at both the start and end of the string. As far as I can tell, this seems to be a format commonly used in .NET services. I'm stuck trying to get the slashes to output correctly.
I reduced the problem to a vanilla Rails 5 application with a single controller action. All the permutations of escapes I can think of have failed so far.
def index
  render json: {
    a: '\/Date(0000000000000)\/',
    b: "\/Date(0000000000000)\/",
    c: '\\/Date(0000000000000)\\/',
    d: "\\/Date(0000000000000)\\/"
  }
end
Which outputs the following:
{
"a": "\\/Date(0000000000000)\\/",
"b": "/Date(0000000000000)/",
"c": "\\/Date(0000000000000)\\/",
"d": "\\/Date(0000000000000)\\/"
}
For the sake of discussion, assume that the format cannot be changed since it is controlled by a third party.
I have uploaded a test app to Github to demonstrate the problem. https://github.com/gregawoods/test_app_ignore_me
After some brainstorming with coworkers (thanks #TheZanke), we came upon a solution that works with the native Rails JSON output.
WARNING: This code overrides some core behavior in ActiveSupport. Use at your own risk, and apply judicious unit testing!
We tracked this down to the JSON encoding in ActiveSupport. All strings eventually are encoded via the ActiveSupport::JSON.encode. We needed to find a way to short circuit that logic and simply return the unencoded string.
First we extended the EscapedString#to_json method found here.
module EscapedStringExtension
  def to_json(*)
    if starts_with?('noencode:')
      "\"#{self}\"".gsub('noencode:', '')
    else
      super
    end
  end
end
module ActiveSupport::JSON::Encoding
  class JSONGemEncoder
    class EscapedString
      prepend EscapedStringExtension
    end
  end
end
Then in the controller we add a noencode: flag to the json hash. This tells our version of to_json not to do any additional encoding.
def index
  render json: {
    a: '\/Date(0000000000000)\/',
    b: 'noencode:\/Date(0000000000000)\/',
  }
end
The rendered output shows that b gives us what we want, while a preserves the standard behavior.
$ curl http://localhost:3000/sales/index.json
{"a":"\\/Date(0000000000000)\\/","b":"\/Date(0000000000000)\/"}
Meditate on this:
Ruby treats forward-slashes the same in double-quoted and single-quoted strings.
"/" # => "/"
'/' # => "/"
In a double-quoted string "\/" means \ is escaping the following character. Because / doesn't have an escaped equivalent it results in a single forward-slash:
"\/" # => "/"
In a single-quoted string a backslash is kept as a literal backslash followed by the next character, with two exceptions: \\ represents a single backslash and \' represents a single quote. That is why the first and third lines below produce the same string:
'\/' # => "\\/"
"\\/" # => "\\/"
'\\/' # => "\\/"
Learning this is one of the most confusing parts about dealing with strings in languages, and this isn't restricted to Ruby, it's something from the early days of programming.
Knowing the above:
require 'json'
puts JSON[{ "key": "\/value\/" }]
puts JSON[{ "key": '/value/' }]
puts JSON[{ "key": '\/value\/' }]
# >> {"key":"/value/"}
# >> {"key":"/value/"}
# >> {"key":"\\/value\\/"}
you should be able to make more sense of what you're seeing in your results and in the JSON output above.
I think the rules for this were originally created for C, so "Escape sequences in C" might help.
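Looking at the individual characters makes the difference between the three spellings easier to see. This is a small illustration only; the variable names are arbitrary.

```ruby
single  = '\/'    # single quotes: backslash is kept, so two characters
double  = "\/"    # double quotes: backslash escapes the slash, so one character
escaped = '\\/'   # escaped backslash followed by a slash, again two characters

p single.chars
p double.chars
p escaped.chars
p single == escaped
```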
I think this is the simplest way: post-process the generated JSON string with
.gsub("/",'//').gsub('\/','')
For the input {:key=>"\\/Date(0000000000000)\\/"} (as printed), the first gsub produces
{"key":"\\//Date(0000000000000)\\//"}
and the second gives you
{"key":"\/Date(0000000000000)\/"}
as you needed.
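One caveat with the two-step replacement is that the first gsub doubles every slash in the document, including ones that are not preceded by a backslash (a URL value, for instance). A narrower variant, sketched here with the standard json library rather than Rails' encoder, replaces only the escaped-backslash-plus-slash sequence. It still rewrites every such sequence in the output, so it assumes no other value legitimately contains one.

```ruby
require 'json'

# In single quotes, '\/' is a backslash followed by a slash.
payload = { key: '\/Date(0000000000000)\/' }

# The generator escapes the backslash, giving \\/ in the JSON text.
json = JSON.generate(payload)

# Replace the three-character sequence \\/ with the two-character \/.
# The block form avoids backslash handling in a replacement string.
fixed = json.gsub('\\\\/') { '\\/' }

puts json
puts fixed
```

Since \/ is a valid JSON escape for a plain slash, a standards-compliant parser reading the result sees the value /Date(0000000000000)/.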

Performance implications of using :coffeescript filter inside HAML templates?

So HAML 4 includes a coffeescript filter, which allows us coffee-loving rails people to do neat things like this:
- word = "Awesome."
:coffeescript
  $ ->
    alert "No semicolons! #{word}"
My question: For the end user, is this slower than using the equivalent :javascript filter? Does using the coffeescript filter mean the coffeescript will be compiled to javascript on every page load (which would obviously be a performance disaster), or does this only happen once when the application is started?
It depends.
When Haml compiles a filter it checks to see if the filter text contains any interpolation (#{...}). If there isn’t any then it will be the same text to transform on each request, so the conversion is done once at compile time and the result included in the template.
If there is interpolation in the filter text, then the actual text to transform will vary on each request, so the Coffeescript will need to be compiled each time.
Here’s an example. First with no interpolation:
:coffeescript
  $ ->
    alert "No semicolons! Awesome"
This generates the code (use haml -d to see the generated Ruby code):
_hamlout.buffer << "<script>\n (function() {\n $(function() {\n return alert(\"No semicolons! Awesome\");\n });\n \n }).call(this);\n</script>\n";
This code simply adds a string to the buffer, so no Coffeescript is being recompiled.
Now with interpolation:
- word = "Awesome."
:coffeescript
  $ ->
    alert "No semicolons! #{word}"
This generates:
word = "Awesome."
_hamlout.buffer << "#{
find_and_preserve(Haml::Filters::Coffee.render_with_options(
"$ ->
alert \"No semicolons! #{word}\"\n", _hamlout.options))
}\n";
Here, since Haml needs to wait to see what the value of the interpolation is, the Coffeescript is recompiled each time.
You can avoid compiling the Coffeescript on each request by not having any interpolation inside your :coffeescript filters.
The :javascript filter behaves similarly, checking to see if there is any interpolation, but since the :javascript filter only outputs some text to the buffer when it runs there is much less of a performance hit using it. You could possibly combine :javascript and :coffeescript filters, putting interpolated data in :javascript and keeping your :coffeescript static:
- word = "Awesome"
:javascript
  var message = "No semicolons! #{word}";
:coffeescript
  alert message
matt's answer is clear on what is going on. I made a helper to add locals to :coffeescript filters from a hash. This way you don't need to use global JavaScript variables. As a side note: on Linux, the slowdown is really negligible. On Windows, however, the performance impact is significant (easily more than 100ms per block to compile).
module HamlHelper
  def coffee_with_locals locals={}, &block
    block_content = capture_haml do
      block.call
    end
    return block_content if locals.blank?
    javascript_locals = "\nvar "
    javascript_locals << locals.map{ |key, value| j(key.to_s) + ' = ' + value.to_json.gsub('</', '<\/') }.join(",\n ")
    javascript_locals << ";\n"
    content_node = Nokogiri::HTML::DocumentFragment.parse(block_content)
    content_node.search('script').each do |script_tag|
      # This will match the '(function() {' at the start of coffeescript's compiled code
      split_coffee = script_tag.content.partition(/\(\s*function\s*\(\s*\)\s*\{/)
      script_tag.content = split_coffee[0] + split_coffee[1] + javascript_locals + split_coffee[2]
    end
    content_node.to_s.html_safe
  end
end
It allows you to do the following:
= coffee_with_locals "test" => "hello ", :something => ["monde", "mundo", "world"], :signs => {:interogation => "?", :exclamation => "!"} do
  :coffeescript
    alert(test + something[2] + signs['exclamation'])
Since there is no interpolation, the code is compiled once, as normal.
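The string that the helper splices into the compiled script can be looked at in isolation. The sketch below rebuilds just that "var ..." declaration using only the standard json library. The method name javascript_locals_for is hypothetical, and the Rails j(...) escaping of the key is left out, so keys are assumed to be safe JavaScript identifiers.

```ruby
require 'json'

# Builds a JavaScript "var a = ..., b = ...;" declaration from a Ruby hash.
def javascript_locals_for(locals)
  return '' if locals.empty?

  assignments = locals.map do |key, value|
    # Escape "</" so a value cannot close the surrounding <script> tag.
    "#{key} = #{value.to_json.gsub('</') { '<\/' }}"
  end
  "\nvar #{assignments.join(",\n    ")};\n"
end

puts javascript_locals_for('test' => 'hello ', something: %w[monde mundo world])
```

Because the values are serialised at render time and injected after CoffeeScript compilation, the filter body itself stays free of interpolation.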

Regex in Ruby: expression not found

I'm having trouble with a regex in Ruby (on Rails). I'm relatively new to this.
The test string is:
http://www.xyz.com/017010830343?$ProdLarge$
I am trying to remove "$ProdLarge$". In other words, the $ signs and anything between.
My regular expression is:
\$\w+\$
Rubular says my expression is ok. http://rubular.com/r/NDDQxKVraK
But when I run my code, the app says it isn't finding a match. Code below:
some_array.each do |x|
  logger.debug "scan #{x.scan('\$\w+\$')}"
  logger.debug "String? #{x.instance_of?(String)}"
  x.gsub!('\$\w+\$','scl=1')
  ...
My logger debug line shows a result of "[]". String is confirmed as being true. And the gsub line has no effect.
What do I need to correct?
Use /regex/ instead of 'regex':
> "http://www.xyz.com/017010830343?$ProdLarge$".gsub(/\$\w+\$/, 'scl=1')
=> "http://www.xyz.com/017010830343?scl=1"
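The reason is that scan and gsub treat a String pattern as literal text to search for, not as a regular expression. A short comparison using the URL from the question:

```ruby
url = 'http://www.xyz.com/017010830343?$ProdLarge$'

# String pattern: looks for the literal characters \$\w+\$, which never occur.
literal_hits = url.scan('\$\w+\$')

# Regexp pattern: matches a dollar sign, word characters, and a dollar sign.
regex_hits = url.scan(/\$\w+\$/)

p literal_hits
p regex_hits
p url.gsub(/\$\w+\$/, 'scl=1')
```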
Don't use a regex for this task, use a tool designed for it, URI. To remove the query:
require 'uri'
url = URI.parse('http://www.xyz.com/017010830343?$ProdLarge$')
url.query = nil
puts url.to_s
=> http://www.xyz.com/017010830343
To change to a different query use this instead of url.query = nil:
url.query = 'scl=1'
puts url.to_s
=> http://www.xyz.com/017010830343?scl=1
URI will automatically encode values if necessary, saving you the trouble. If you need even more URL management power, look at Addressable::URI.
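When the new query is built from separate keys and values, URI.encode_www_form from the standard library can assemble and encode it. The label parameter below is a made-up example, included only to show a value that needs encoding.

```ruby
require 'uri'

url = URI.parse('http://www.xyz.com/017010830343?$ProdLarge$')

# encode_www_form joins the pairs with "&" and encodes spaces as "+".
url.query = URI.encode_www_form(scl: 1, label: 'large size')

puts url.to_s
```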

Truncate Markdown?

I have a Rails site, where the content is written in markdown. I wish to display a snippet of each, with a "Read more.." link.
How do I go about this? Simply truncating the raw text will not work, for example..
>> "This is an [example](http://example.com)"[0..25]
=> "This is an [example](http:"
Ideally I want to allow the author to (optionally) insert a marker to specify what to use as the "snippet", if not it would take 250 words, and append "..." - for example..
This article is an example of something or other.
This segment will be used as the snippet on the index page.
^^^^^^^^^^^^^^^
This text will be visible once clicking the "Read more.." link
The marker could be thought of like an EOF marker (which can be ignored when displaying the full document)
I am using maruku for the Markdown processing (RedCloth is very biased towards Textile, BlueCloth is extremely buggy, and I wanted a native-Ruby parser which ruled out peg-markdown and RDiscount)
Alternatively (since the Markdown is translated to HTML anyway) truncating the HTML correctly would be an option - although it would be preferable to not markdown() the entire document, just to get the first few lines.
So, the options I can think of are (in order of preference)..
Add a "truncate" option to the maruku parser, which will only parse the first x words, or till the "excerpt" marker.
Write/find a parser-agnostic Markdown truncate'r
Write/find an intelligent HTML truncating function
Write/find an intelligent HTML truncating function
The following from http://mikeburnscoder.wordpress.com/2006/11/11/truncating-html-in-ruby/, with some modifications will correctly truncate HTML, and easily allow appending a string before the closing tags.
>> puts "<p><b>Something</p>".truncate_html(5, at_end = "...")
=> <p><b>Someth...</b></p>
The modified code:
require 'rexml/parsers/pullparser'
class String
  # Truncates HTML after roughly len characters of text and closes any tags left open.
  # If at_end is given, it is appended before the closing tags.
  def truncate_html(len = 30, at_end = nil)
    p = REXML::Parsers::PullParser.new(self)
    tags = []
    new_len = len
    results = ''
    while p.has_next? && new_len > 0
      p_e = p.pull
      case p_e.event_type
      when :start_element
        tags.push p_e[0]
        results << "<#{tags.last}#{attrs_to_s(p_e[1])}>"
      when :end_element
        results << "</#{tags.pop}>"
      when :text
        results << p_e[0][0..new_len]
        new_len -= p_e[0].length
      else
        results << "<!-- #{p_e.inspect} -->"
      end
    end
    # Append the caller's suffix rather than a hard-coded "..."
    results << at_end if at_end
    tags.reverse.each do |tag|
      results << "</#{tag}>"
    end
    results
  end
  private
  def attrs_to_s(attrs)
    if attrs.empty?
      ''
    else
      ' ' + attrs.to_a.map { |attr| %{#{attr[0]}="#{attr[1]}"} }.join(' ')
    end
  end
end
Here's a solution that works for me with Textile.
Convert it to HTML.
Truncate it.
Remove any HTML tag that got cut in half with
html_string = html_string.gsub(/<[^>]*$/, "")
Then use Hpricot to clean it up and close unclosed tags:
html_string = Hpricot( html_string ).to_s
I do this in a helper, and with caching there's no performance issue.
You could use a regular expression to find a line consisting of nothing but "^" characters:
markdown_string = <<-eos
This article is an example of something or other.
This segment will be used as the snippet on the index page.
^^^^^^^^^^^^^^^
This text will be visible once clicking the "Read more.." link
eos
preview = markdown_string[0...(markdown_string =~ /^\^+$/)]
puts preview
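If a document has no marker line, the =~ above returns nil and the slice no longer does what is intended. A sketch that also covers the fallback from the question (the first 250 words plus "...") might look like this. The snippet method name and its keyword arguments are hypothetical, and the word-based fallback flattens line breaks, so it can still cut through Markdown syntax.

```ruby
# Returns the text before the marker line, or the first max_words words.
def snippet(markdown, marker: /^\^+$/, max_words: 250)
  marker_index = markdown =~ marker
  return markdown[0...marker_index].rstrip if marker_index

  words = markdown.split
  return markdown if words.length <= max_words

  words.first(max_words).join(' ') + '...'
end

puts snippet("This segment is the snippet.\n^^^^^^^^\nThis text is hidden.")
puts snippet('one two three four', max_words: 2)
```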
Rather than trying to truncate the text, why not have two input boxes, one for the "opening blurb" and one for the main "guts"? That way your authors will know exactly what is being shown when, without having to rely on some sort of funky EOF marker.
I have to agree with the "two inputs" approach. The content writer need not worry, since you can change the background logic to combine the two inputs when showing the full content.
full_content = input1 + input2 # perhaps with some complementary HTML, for better formatting
Not sure if it applies to this case, but adding the solution below for the sake of completeness. You can use strip_tags method if you are truncating Markdown-rendered contents:
truncate(strip_tags(markdown(article.contents)), length: 50)
Sourced from:
http://devblog.boonecommunitynetwork.com/rails-and-markdown/
A simpler option that just works:
truncate(markdown(item.description), length: 100, escape: false)
