How to replace the last space with &nbsp; in Ruby - ruby-on-rails

How can I replace the last space with &nbsp; in Ruby?
I have in database this:
<h1>Hello dear friend!</h1>
<p>How are you?</p>
<figure><img src="..." alt="..." /></figure>
<p>Bye!</p>
And I need to have this "output":
<h1>Hello dear&nbsp;friend!</h1>
<p>How are&nbsp;you?</p>
<figure><img src="..." alt="..." /></figure>
<p>Bye!</p>
I tried to play with nokogiri:
text = Nokogiri::HTML::DocumentFragment.parse(...)
text.css('h1, h2, h3, h4, h5, h6, p, li').each do |tag|
  tag_arr = tag.content.split(' ')
  tag_last_words = tag_arr[tag_arr.length-2..tag_arr.length]
  tag_return = tag_arr[0..-2].push(tag_last_words.join('&nbsp;'))
  tag_return = tag_return.join(' ')
  tag.content = tag_return
end
but I can't beat some "bugs":
all attributes and inner tags (HTML) are deleted
instead of a non-breaking space I get the literal text &nbsp;
Why? To keep a single word from wrapping onto a new line on mobile devices. (JS is not an option in my case.)
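Both symptoms come from assigning tag.content: Nokogiri replaces the element's children with a single text node and escapes the string, so inner tags vanish and the ampersand in the entity is escaped. One way around this is to leave the nodes alone and edit the markup string directly. The following is only a minimal regex-based sketch, not a Nokogiri solution; the helper name glue_last_words is made up for illustration, and it assumes the last two words of each element are plain text sitting directly before the closing tag.

```ruby
# Hypothetical helper: replace the last space before the closing tag of
# headings, paragraphs and list items with &nbsp;.
# Assumption: the final word is plain text immediately before the closing tag.
def glue_last_words(html)
  html.gsub(%r{[ ](?=[^\s<>]+</(?:h[1-6]|p|li)>)}, "&nbsp;")
end

html = <<~HTML
  <h1>Hello dear friend!</h1>
  <p>How are you?</p>
  <figure><img src="..." alt="..." /></figure>
  <p>Bye!</p>
HTML

puts glue_last_words(html)
```

Attributes and nested tags are untouched because only a single space character is ever replaced. If the last word is wrapped in an inline tag such as <strong>, this pattern will not match and that element is left as it was.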

Related

How to scrape a span name in Nokogiri in Ruby?

I want to scrape data off a website. The data is in the text of a span.
The HTML looks like this:
<p class="text-muted text-small">
<span class="text-muted">Votes:</span>
<span name="nv" data-value="1564808">1,564,808</span>
<span class="ghost">|</span>
<span class="text-muted">Gross:</span>
<span name="nv" data-value="107,928,762">$107.93M</span>
</p>
I want to search the whole page and get the votes value (data-value 1564808, shown as 1,564,808), not the $107.93M gross value.
I tried various ways to get the data, for instance:
@votes = []
html_content =
  open("https://www.imdb.com/list/ls057823854/sort=list_order,asc&st_dt=&mode=detail&page=1").read
doc = Nokogiri::HTML(html_content)
doc.css(".text-muted['span name=nv']").each do |i|
  @votes << i.text.strip
end
Try this code:
doc.css('div.lister-item-content > p.text-muted > span[name = nv]:nth-child(2)').map(&:text)
Which results in:
["1,564,941", "373,745", "2,004,624", "1,077,404", "887,189", "305,554", "207,904", "1,074,609", "748,393", "789,255", "1,224,753", "754,008", "634,752", "1,056,328", "1,604,158", "1,438,194", "629,504", "1,158,452", "517,609", "539,263", "1,443,979", "1,290,159", "161,981", "830,992", "1,427,193", "299,532", "289,184", "705,138", "615,264", "1,147,650", "1,030,826", "1,018,932", "921,730", "524,568", "557,482", "1,973,773", "813,743", "367,587", "342,800", "188,210", "649,467", "1,068,455", "547,990", "527,123", "805,964", "420,447", "441,780", "318,295", "1,004,742", "446,096", "203,977", "581,108", "1,754,019", "616,804", "484,534", "265,048", "958,244", "289,190", "651,605", "503,185", "320,564", "660,685", "476,016", "432,155", "588,572", "374,705", "378,561", "337,801", "463,467", "508,822", "187,810", "1,128,184", "221,361", "261,529", "322,314", "324,435", "116,258", "318,628", "1,334,595", "222,651", "1,155,754", "228,713", "205,956", "271,162", "293,774", "33,136", "80,385", "703,048", "195,712", "274,244", "233,133", "121,874", "208,462", "513,797", "485,112", "120,750", "135,232", "57,411", "125,431", "297,193"]
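The selector returns the displayed text, so the values still carry thousands separators. If plain integers are wanted (the form stored in data-value), a small follow-up step could look like this sketch; votes_text holds just the first three strings from the result above as sample input.

```ruby
# Sample input: the first few strings from the result list above.
votes_text = ["1,564,941", "373,745", "2,004,624"]

# Remove the thousands separators, then convert each string to an Integer.
votes = votes_text.map { |v| v.delete(",").to_i }

p votes
```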

Nokogiri parsing missing element create issue

I have a plain HTML document with no CSS classes or ids, and I need to export some of its content to an Excel sheet. I tried Nokogiri, but it selects elements by CSS selector.
Has anybody done something like this?
<html>
<head></head>
<body>
***NOTE***
<br>
Items
<br>
<br>
Invoice Number : [78945824] PO Number : [4587958]
<br>
Track It : 12345
<br>
<br>
Items
<br>
<br>
Invoice Number : [79546828] PO Number : [4567892]
<br>
<br>
<br>
Items
<br>
<br>
Invoice Number : [78976824] PO Number : [897569]
<br>
Track It : 12345
<br>
</body>
</html>
I am able to retrieve the PO numbers and tracking numbers:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
PAGE_URL = "a.html"
page = Nokogiri::HTML(open(PAGE_URL))
data = page.css("body").text
po_numbers = data.scan(/Invoice Number : \[\d+\] PO Number : \[(\d+)\]/).flatten
tracking_numbers = page.css("a").text.split
[["PO Number", "Tracking Number"]].concat(po_numbers.zip(tracking_numbers))
puts po_numbers
puts tracking_numbers
=> po_numbers = ["4587958", "4567892", "4587958"]
=> tracking_numbers = ["12543", "12356"]
When we zip those together, we get:
=> po_numbers.zip(tracking_numbers)
=> [["4587958", "12543"], ["4567892", "12356"], ["4587958", nil]]
What we want is:
=> [["4587958", "12543"], ["4567892", nil], ["4587958", "12356"]]
Try this
data = page.css("body").text
data = data.gsub(" ","").split(/\n/)
po=[]
track=[]
data.each do |i|
  if i.include? "PONumber"
    po << i.split("PONumber:").last.scan(/\d+/)[0]
  end
  if i.include? "TrackIt"
    track << i.split("TrackIt:").last
  end
end
po.zip(track)
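Collecting PO numbers and tracking numbers into two separate arrays still loses the pairing when an item has no "Track It" line. A sketch that keeps them aligned is to split the text into one chunk per item first and look for both values inside each chunk. It assumes every item starts with a line containing only "Items", as in the sample; body_text is a stand-in for page.css("body").text.

```ruby
# Stand-in for page.css("body").text from the sample document.
body_text = <<~TEXT
  ***NOTE***
  Items

  Invoice Number : [78945824] PO Number : [4587958]
  Track It : 12345

  Items

  Invoice Number : [79546828] PO Number : [4567892]


  Items

  Invoice Number : [78976824] PO Number : [897569]
  Track It : 12345
TEXT

# One chunk per item; drop(1) discards the text before the first "Items".
rows = body_text.split(/^\s*Items\s*$/).drop(1).map do |chunk|
  po       = chunk[/PO Number : \[(\d+)\]/, 1]
  tracking = chunk[/Track It : (\d+)/, 1] # nil when the item has no tracking line
  [po, tracking]
end

p rows
```

Each row is built from a single item, so a missing tracking number shows up as nil in the right place instead of shifting the later values.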
If you can use a regex to scan for all the PO numbers (po_numbers), you can do the same for the tracking numbers (tracking_numbers):
tracking_numbers = data.scan(/Tracking no : (\d*)/).flatten
If the returned array contains nil where a tracking number is missing, you can walk through both arrays by index:
po_numbers.each_with_index do |elm, index|
  p "PO Number: #{elm}, Tracking Number: #{tracking_numbers[index]}"
end
Update
This regex matches the updated HTML:
/Track It :\s*(?:<a href=".*">\s*(\d+)\s*<\/a>|$)/
It matches both an empty tracking number and one wrapped in a link.

How to split value from a string in ruby

My example string is shown below. I want to split out every value into an array or hash so I can process each element.
<div id="test">
accno: 123232323 <br>
id: 5443534534534 <br>
name: test_name <br>
url: www.google.com <br>
</div>
How can I fetch each value into a hash or array?
With regex it's easy:
s = '<div id="test">
accno: 123232323 <br>
id: 5443534534534 <br>
name: test_name <br>
url: www.google.com <br>
</div>'
p s.scan(/\s+(.*?)\:\s+(.*?)<br>/).map.with_object({}) { |i, h| h[i[0].to_sym] = i[1].strip }
Or you can restrict the keys (accno, id, name, url) with ([a-z]+) if they contain only lowercase letters:
p s.scan(/\s+([a-z]+)\:\s+(.*?)<br>/).map.with_object({}) { |i, h| h[i[0].to_sym] = i[1].strip }
Result:
{:accno=>"123232323", :id=>"5443534534534", :name=>"test_name", :url=>"www.google.com"}
Update
In case of:
<div id="test"> accno: 123232323 id: 5443534534534 name: test_name url: www.google.com </div>
the regex will be:
/([a-z]+)\:\s*(.*?)\s+/
([a-z]+) is the hash key. If it can contain - or _, add them to the character class: ([a-z\-_]+). This scheme presumes that the key is followed by : (perhaps with a space) and then by some text up to the next space. Use (\s+|<) at the end instead of \s+ if the line can end without a space, as in url: www.google.com</div>
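Putting those two adjustments together (keys that may contain - or _, and a value that ends at whitespace or at the next tag) could look like the sketch below. FIELD_RE and parse_fields are names chosen here for illustration, and the input deliberately ends without a space before </div>.

```ruby
# Key: lowercase letters, "-" or "_".
# Value: everything up to the next whitespace or the start of a tag.
FIELD_RE = /([a-z_\-]+):\s*(.*?)(?:\s+|<)/

# Hypothetical helper: turn "key: value" pairs into a Hash.
def parse_fields(text)
  text.scan(FIELD_RE).to_h
end

s = '<div id="test"> accno: 123232323 id: 5443534534534 name: test_name url: www.google.com</div>'
p parse_fields(s)
```

The attribute id="test" is not picked up because the key must be followed directly by a colon.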
If you are processing HTML, use an HTML/XML parser like Nokogiri to pull out the text content of the required <div> tag using a CSS selector. Then parse the text into fields.
To install nokogiri:
gem install nokogiri
Then process the page and text:
require "nokogiri"
require "open-uri"
# re matches: spaces (word) colon spaces (anything) space
re_fields = /\s+(?<field>\w+):\s+(?<data>.*?)\s/
# Somewhere to store the results
record = {}
# URI.open comes from open-uri; plain open() no longer fetches URLs in Ruby 3.0+
page = Nokogiri::HTML( URI.open("http://example.com/divtest.html") )
# Select the text from <div id=test> and scan into fields with the regex
page.css( "div#test" ).text.scan( re_fields ){ |field, data|
  record[ field ] = data
}
p record
Results in:
{"accno"=>"123232323", "id"=>"5443534534534", "name"=>"test_name", "url"=>"www.google.com"}
The page.css( "blah" ) selector can also be accessed as an array if you are processing multiple elements, which can be looped through with .each
# Somewhere to store the results
records = []
# Select the text from <div id=test> and scan into fields with the regex
page.css( "div#test" ).each{ |div|
  record = {}
  div.text.scan( re_fields ){ |field, data|
    record[field] = data
  }
  records.push record
}
p records

Generate table of contents like on Wikipedia, without JavaScript

I have a page that is formatted like so:
<h1>Header</h1>
<h2>Subheader</h2>
<h3>Subsubheader</h3>
<h1>Another header</h1>
Is it possible to server-side generate a table of contents / outline at the start of the page, like Wikipedia does in its articles? I use Ruby on Rails.
EDIT: WITHOUT JavaScript!
I created a class for this purpose today. It depends on http://www.nokogiri.org/, but that gem comes with Rails already.
Put this in app/models/toc.rb:
class Toc
  attr_accessor :html
  TOC_CLASS = "toc".freeze
  TOC_ELEMENT = "p".freeze
  TOC_ITEMS = "h1 | h2 | h3 | h4 | h5".freeze
  UNIQUEABLE_ELEMENTS = "h1 | h2 | h3 | h4 | h5 | p".freeze
  def initialize(content)
    @html = Nokogiri::HTML.fragment content
  end
  # Returns the HTML with a freshly built table of contents prepended.
  def generate
    clear
    set_uniq_ids
    toc = create_container
    html.xpath(TOC_ITEMS).each { |node| toc << toc_item_tag(node) }
    html.prepend_child toc
    return html.to_s
  end
  private
  # Remove a table of contents left over from an earlier run.
  def clear
    html.search(".#{TOC_CLASS}").remove
  end
  # Give every heading and paragraph an id so the links have a target.
  def set_uniq_ids
    html.xpath(UNIQUEABLE_ELEMENTS).
      each { |node| node["id"] = rand_id }
  end
  # Eight random lowercase letters.
  def rand_id
    (0...8).map { ('a'..'z').to_a[rand(26)] }.join
  end
  def create_container
    toc = Nokogiri::XML::Node.new TOC_ELEMENT, html
    toc["class"] = TOC_CLASS
    return toc
  end
  def toc_item_tag(node)
    "<a data-turbolinks='false' class=\"toc-link toc-link-#{node.name}\" href=\"##{node["id"]}\">#{node.text}</a>"
  end
end
Use it like
toc = Toc.new article.body
body_with_toc = toc.generate
article.update body: body_with_toc
You need to turn your heading hierarchy into a data source that looks something like this:
@toc = [ ['header', 0], ['subheader', 1], ['subsubheader', 2],
  ['header2', 0], ['header3', 0], ['subheader2', 1]
]
Then it is easy to render it in a template, for example:
<%- @toc.each do |item, distance| %>
  <%= ('&nbsp;' * distance * 5).html_safe %>
  <%= item %>
  <br/>
<%- end %>
Would give you:
header
     subheader
          subsubheader
header2
header3
     subheader2
Of course, you can also use distance to choose a style or font size for each entry instead of an indentation depth, but I hope you get the main idea.
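The answer does not show how to build the [title, depth] list from the page itself. One possible way, sketched here with a plain regex rather than an HTML parser, is to scan for h1 to h6 tags and use the heading level minus one as the depth. The helper name heading_outline is made up, and the sketch assumes headings are not nested inside one another.

```ruby
# Hypothetical helper: collect [title, depth] pairs from h1..h6 headings.
# Depth is the heading level minus one, so h1 => 0, h2 => 1, and so on.
def heading_outline(html)
  html.scan(%r{<h([1-6])[^>]*>(.*?)</h\1>}m).map do |level, title|
    [title.strip, level.to_i - 1]
  end
end

page = <<~HTML
  <h1>Header</h1>
  <h2>Subheader</h2>
  <h3>Subsubheader</h3>
  <h1>Another header</h1>
HTML

p heading_outline(page)
```

The resulting array has the same shape as the hand-written list above, so it can be passed to the same template.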
Yes, it is possible. You don't really need Rails for this; you can also use JavaScript to generate a table of contents.
Here is an example library that you can use:
http://www.kryogenix.org/code/browser/generated-toc/
Alternatively, you could create the anchor links as you loop through the elements in your Rails ERB/Haml views.

How to display data in two columns in Rails

I'm new to Ruby on Rails. How can I display products in two columns?
When I write the following, the right column will display the same products, but I want to display
some in the left and some in the right columns.
#main_container
  .left_col
    %div{"data-hook" => "___homepage_featured_products"}
      %h3
        Featured Activities
      - @featured.each do |pr|
        - @product = pr
        %a.box{:href=>url_for(@product), :title=>"#{@product.name} | #{@product.location}"}
          - if @product.images[0]
            .img{:style=>"background-image: url('#{@product.images[0].attachment.url(:original)}')"}
          .details
            %h3
              = @product.name.truncate 20
            %p.infos
              = image_tag @product.activity_type.icon, :class=>"pictogram" rescue ''
              %span= @product.activity_type.name.titleize rescue ''
              \/
              %span.price= number_to_currency @product.price rescue ''
              \/
              = @product.location
              \/
              = @product.level
            %p
              = @product.description.truncate(120) rescue ''
  .right_col
You could put each product into its own div, and then use CSS to float them to the left so that a maximum of 2 boxes will appear next to each other horizontally. This will give the effect of a 2 column layout. As an example:
#main_container { width: 900px; }
.featured_product { width: 450px; float: left; }
Add padding etc as needed.
Alternatively you could split the array after you retrieve it from the database and run the code twice, once in the left column and once in the right:
@left, @right = @featured.in_groups_of((@featured.count / 2.0).ceil, false)
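in_groups_of is an ActiveSupport extension, and with a single product it returns only one group, so the right-hand variable ends up nil. As an illustration of the same split in plain Ruby, here is a small sketch; split_in_two is a made-up helper name, and the sample array stands in for the featured products.

```ruby
# Hypothetical helper: split a list into a left and a right half.
# With an odd count the extra item goes to the left column.
def split_in_two(items)
  half = (items.size / 2.0).ceil
  [items.first(half), items.drop(half)]
end

featured = %w[kayak climbing surfing hiking diving]
left, right = split_in_two(featured)

p left
p right
```

Both halves are always arrays, so each column can loop over its half without a nil check.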
