sanitize gem issue with < and > - ruby-on-rails

I am using the sanitize gem https://github.com/rgrove/sanitize to remove some HTML tags from a string.
However, before sanitizing the string in my controller, the string is being set as follows:
&lt;p&gt;This is &lt;b&gt;bold&lt;/b&gt; and this &lt;span style="text-decoration: underline;"&gt;is&lt;/span&gt; &lt;i&gt;italics&lt;/i&gt; ok? This &lt;em&gt;is not &lt;/em&gt;a problem.&lt;/p&gt;
meaning that < and > are being replaced by &lt; and &gt;.
How can I use the sanitize gem to remove, for example, <i> and </i> when these tags are being represented as &lt;i&gt; and &lt;/i&gt; in the controller?

If you want the escaped HTML tags (&lt; and &gt;) to be treated as HTML for the purposes of sanitizing, then you'll have to unescape them first:
require 'cgi'
Sanitize.clean(CGI.unescapeHTML(your_string))
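
To see what the unescape step does on its own, here is a minimal sketch that uses only Ruby's standard library. The sample string is made up for illustration, and the sanitizer call is left as a comment because its entry point depends on the gem version (as far as I know, newer releases use Sanitize.fragment where older ones used Sanitize.clean).

```ruby
require 'cgi'

# Hypothetical input: markup that arrived with its angle brackets escaped.
escaped = '&lt;p&gt;This is &lt;i&gt;italics&lt;/i&gt; ok?&lt;/p&gt;'

# Turn the entities back into real tags so a sanitizer can recognise them.
unescaped = CGI.unescapeHTML(escaped)
puts unescaped  # => <p>This is <i>italics</i> ok?</p>

# The sanitizer would then run on `unescaped`, e.g. Sanitize.clean(unescaped)
```

Only unescape input that you are about to pass through the sanitizer; rendering the unescaped string directly would reintroduce whatever markup it contains.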

Related

Thymeleaf allow only ruby tag, escape other tags

I want to show the ruby html tag in a thymeleaf template like this:
<h1 th:text="(${author.displayNameReading} != null) ? '<ruby><rb>' + ${author.displayName} + '</rb><rt>' + ${author.displayNameReading} + '</rt></ruby>' : ${author.displayName}" th:lang="${author.locale}">Some author name</h1>
If I use th:text, it will be escaped. It works if I use utext, but then I'm going to lose all the security for other html tags.
Is it possible to only allow the ruby, rt and rb tags inside th:text?
Why try and stuff everything into a th:text attribute? You can easily split out all that information into new tags -- which is both more readable (formatted like regular html, less string concatenation) and more secure (no need for th:utext). Something like this for example:
<h1 th:lang="${author.locale}">
<ruby th:if="${author.displayNameReading != null}">
<rb th:text="${author.displayName}" />
<rt th:text="${author.displayNameReading}" />
</ruby>
<span th:unless="${author.displayNameReading != null}" th:text="${author.displayName}" />
</h1>

How to convert HTML with Ruby helper method ERB into Slim syntax

Trying to convert some HTML code in a template to Slim syntax. The original code uses a Ruby helper method (in Rails) to dynamically determine the class of the li element.
Here is the original code in HTML:
<li class="<%= is_active_controller('dashboards') %>">
The online converter gives:
| <li class="
= is_active_controller('dashboards')
| ">
This not only is ugly and clunky--it doesn't work.
I've tried various options without success. Such as:
li class=is_active_controller('dashboards')
...as well as several other variations without success.
li class=(is_active_controller('dashboards'))

Dealing with quotes in html.erb

I'm setting a data element of an image in my html.erb file:
<img src="<%=image%>" data-description="<%= auto_link(step.description)%>"/>
The issue is that there are sometimes quotes in my step.description that interfere, so that data-description is not set correctly, such as:
<img src="..." data-description="<pre><code class=" language-java"="" style="width: 193px; height: 257px; margin-left: -96.5px; margin-top: -128.5px; opacity: 1;">
How can I remove conflicting quotes in my erb file?
There's a helper method called j (short for escape_javascript) that escapes quotes in a string, which makes it possible to put a string containing quotes into an attribute on an element, as you're trying to do.
So, change your code to:
<img src="<%=image%>" data-description="<%=j auto_link(step.description)%>"/>
Just adding that j will do it for any sort of string with quotes.
If you're also putting HTML inside an HTML attribute you will have to escape html too with the html_escape helper:
<img src="<%=image%>" data-description="<%=h j(auto_link(step.description))%>"/>
h is short for html_escape. That should escape the tags inside the attribute and not break your layout.
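
As a rough illustration of the HTML-escaping step outside Rails, the sketch below uses ERB::Util.html_escape from Ruby's standard library, which performs the same kind of entity escaping as the h helper. The description string and the step.png file name are hypothetical.

```ruby
require 'erb'

# Hypothetical description containing markup and double quotes.
description = '<code class="language-java">int x;</code>'

# html_escape turns <, >, & and quotes into entities, so the value can sit
# inside a double-quoted attribute without closing it early.
safe = ERB::Util.html_escape(description)

tag = %(<img src="step.png" data-description="#{safe}"/>)
puts tag
```

Because the escaped value no longer contains a literal double quote, the browser reads the whole description as a single attribute value and decodes the entities when the attribute is accessed.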

How can I get all attributes from nokogiri nodeset

Nokogiri::XML::Element has a method called attributes that returns all attributes as a hash. A NodeSet object has no such method, so we need to specify an attribute key to get its value. I know that XPath can extract attributes, but I couldn't think of a solution for the following situation:
Normally, there is only one attr called match-type in match element document:
<D:match match-type="starts-with">appren</D:match>
But now I need to check that only the match-type attr is allowed in this element tag, so that an element like this is rejected:
<D:match caseless="bogus" match-type="starts-with">appren</D:match>
My idea is to get all attributes inside this element and find out the size of the attributes other than 'match-type'.
Is there a way to do that? Thanks!
This isn't going to directly answer your question, because it's not clear whether you've tried anything. Instead, this code can be modified to do what you want but you're going to need to figure out what to change:
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<html>
<body>
<a id="some_id" href="/foo/bar/index.html" class='bold'>anchor text</a>
<a id="some_other_id" href="/foo/bar/index2.html" class='italic'>anchor text</a>
</body>
</html>
EOT
doc.search('a').map{ |node| node.keys.reject{ |k| k == 'id' }.map{ |p| node[p].size }.inject(:+) } # => [23, 26]

Getting attribute's value in Nokogiri to extract link URLs

I have a document which looks like this:
<div id="block">
  <a href="http://google.com">link</a>
</div>
I can't get Nokogiri to get me the value of href attribute. I'd like to store the address in a Ruby variable as a string.
html = <<HTML
<div id="block">
  <a href="http://google.com">link</a>
</div>
HTML
doc = Nokogiri::HTML(html)
doc.xpath('//div/a/@href')
#=> [#<Nokogiri::XML::Attr:0x80887798 name="href" value="http://google.com">]
Or if you wanna be more specific about the div:
>> doc.xpath('//div[@id="block"]/a/@href')
=> [#<Nokogiri::XML::Attr:0x80887798 name="href" value="http://google.com">]
>> doc.xpath('//div[@id="block"]/a/@href').first.value
=> "http://google.com"
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open("[insert URL here]"))
href = doc.css('#block a')[0]["href"]
The variable href is assigned the value of the "href" attribute of the <a> element inside the element with id 'block'. The call doc.css('#block a') returns a NodeSet of the matching <a> elements. [0] picks the first element in that set, and ["href"] reads that element's href attribute, returning the URL as a string.
Having struggled with this question in various forms, I decided to write myself a tutorial disguised as an answer. It may be helpful to others.
Starting with this snippet:
require 'rubygems'
require 'nokogiri'
html = <<HTML
<div id="block1">
  <a href="http://google.com">link1</a>
</div>
<div id="block2">
  <a href="http://stackoverflow.com">link2</a>
<a id="tips">just a bookmark</a>
</div>
HTML
doc = Nokogiri::HTML(html)
extracting all the links
We can use xpath or css to find all the <a> elements and then keep only the ones that have an href attribute:
nodeset = doc.xpath('//a') # Get all anchors via xpath
nodeset.map {|element| element["href"]}.compact # => ["http://google.com", "http://stackoverflow.com"]
nodeset = doc.css('a') # Get all anchors via css
nodeset.map {|element| element["href"]}.compact # => ["http://google.com", "http://stackoverflow.com"]
But there's a better way: in the above cases, the .compact is necessary because the searches return the "just a bookmark" element as well. We can use a more refined search to find just the elements that contain an href attribute:
attrs = doc.xpath('//a/@href') # Get anchors w href attribute via xpath
attrs.map {|attr| attr.value} # => ["http://google.com", "http://stackoverflow.com"]
nodeset = doc.css('a[href]') # Get anchors w href attribute via css
nodeset.map {|element| element["href"]} # => ["http://google.com", "http://stackoverflow.com"]
finding a specific link
To find a link within the <div id="block2">
nodeset = doc.xpath('//div[@id="block2"]/a/@href')
nodeset.first.value # => "http://stackoverflow.com"
nodeset = doc.css('div#block2 a[href]')
nodeset.first['href'] # => "http://stackoverflow.com"
If you know you're searching for just one link, you can use at_xpath or at_css instead:
attr = doc.at_xpath('//div[@id="block2"]/a/@href')
attr.value # => "http://stackoverflow.com"
element = doc.at_css('div#block2 a[href]')
element['href'] # => "http://stackoverflow.com"
find a link from associated text
What if you know the text associated with a link and want to find its url? A little xpath-fu (or css-fu) comes in handy:
element = doc.at_xpath('//a[text()="link2"]')
element["href"] # => "http://stackoverflow.com"
element = doc.at_css('a:contains("link2")')
element["href"] # => "http://stackoverflow.com"
find text from a link
And what if you want to find the text associated with a particular link?
Not a problem:
element = doc.at_xpath('//a[@href="http://stackoverflow.com"]')
element.text # => "link2"
element = doc.at_css('a[href="http://stackoverflow.com"]')
element.text # => "link2"
useful references
In addition to the extensive Nokogiri documentation, I came across some useful links while writing this up:
a handy Nokogiri cheat sheet
a tutorial on parsing HTML with Nokogiri
interactively test CSS selector queries
doc = Nokogiri::HTML("HTML ...")
link = doc.at_css("div[id='block'] > a") # first matching <a> element
result = link['href'] # => "http://google.com"
data = '<html lang="en" class="">
<head>
<a href="https://example.com/9f40a.css" media="all" rel="stylesheet" /> link1</a>
<a href="https://example.com/4e5fb.css" media="all" rel="stylesheet" />link2</a>
<a href="https://example.com/5s5fb.css" media="all" rel="stylesheet" />link3</a>
</head>
</html>'
Here is my attempt for the sample HTML above:
doc = Nokogiri::HTML(data)
doc.xpath('//@href').map(&:value)
=> ["https://example.com/9f40a.css", "https://example.com/4e5fb.css", "https://example.com/5s5fb.css"]
document.at_css("#block a")["href"]
where document is the parsed Nokogiri HTML document.
