Nokogiri parsing missing element create issue - ruby-on-rails
I have a plain HTML document with no CSS, and I need to export some of its content to an Excel sheet. I tried Nokogiri, but it works on a CSS-selector basis.
Has anybody tried this?
<html>
<head></head>
<body>
***NOTE***
<br>
Items
<br>
<br>
Invoice Number : [78945824] PO Number : [4587958]
<br>
Track It : 12345
<br>
<br>
Items
<br>
<br>
Invoice Number : [79546828] PO Number : [4567892]
<br>
<br>
<br>
Items
<br>
<br>
Invoice Number : [78976824] PO Number : [897569]
<br>
Track It : 12345
<br>
</body>
</html>
I am able to retrieve the PO numbers and tracking numbers:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
PAGE_URL = "a.html"
page = Nokogiri::HTML(open(PAGE_URL))
data = page.css("body").text
po_numbers = data.scan(/Invoice Number : \[\d+\] PO Number : \[(\d+)\]/).flatten
tracking_numbers = page.css("a").text.split
[["PO Number", "Tracking Number"]].concat(po_numbers.zip(tracking_numbers))
puts po_numbers
puts tracking_numbers
=> po_numbers = ["4587958", "4567892", "4587958"]
=> tracking_numbers = ["12543", "12356"]
When we zip those together, we get:
=> po_numbers.zip(tracking_numbers)
=> [["4587958", "12543"], ["4567892", "12356"], ["4587958", nil]]
What we want is:
=> [["4587958", "12543"], ["4567892", nil], ["4587958", "12356"]]
Try this:
data = page.css("body").text
data = data.gsub(" ", "").split(/\n/)
po = []
track = []
data.each do |i|
  if i.include? "PONumber"
    po << i.split("PONumber:").last.scan(/\d+/)[0]
  end
  if i.include? "TrackIt"
    track << i.split("TrackIt:").last
  end
end
po.zip(track)
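Note that zipping two independently collected arrays still pairs values by position, so a block without a tracking line shifts every later pair. One possible way around this is to split the text into one chunk per "Items" heading and read both values from the same chunk. The following is a minimal sketch, assuming the body text has the line layout shown in the question; the `text` sample stands in for `page.css("body").text`, and the variable names are illustrative only.

```ruby
# Sample text standing in for page.css("body").text (assumed layout).
text = <<~TEXT
  ***NOTE***
  Items
  Invoice Number : [78945824] PO Number : [4587958]
  Track It : 12345
  Items
  Invoice Number : [79546828] PO Number : [4567892]
  Items
  Invoice Number : [78976824] PO Number : [897569]
  Track It : 12345
TEXT

# One chunk per "Items" heading; drop whatever precedes the first heading.
blocks = text.split(/^[ \t]*Items[ \t]*$/).drop(1)

rows = blocks.map do |block|
  po       = block[/PO Number : \[(\d+)\]/, 1]
  tracking = block[/Track It : (\d+)/, 1] # nil when the block has no tracking line
  [po, tracking]
end

p rows
```

Because each pair comes from a single block, a missing tracking number shows up as nil in the right row instead of shifting the rows after it.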
If you can use a regex to scan for all the PO numbers (po_numbers), you can do the same for the tracking numbers (tracking_numbers):
tracking_numbers = data.scan(/Tracking no : (\d*)/).flatten
Because (\d*) also matches when no digits follow, the returned array has an empty entry for each missing tracking number, so you can walk through both arrays by index:
po_numbers.each_with_index do |elm, index|
  p "PO Number: #{elm}, Tracking Number: #{tracking_numbers[index]}"
end
Update
This regex matches the updated HTML:
/Track It :\s*(?:<a href=".*">\s*(\d+)\s*<\/a>|$)/
It matches both an empty tracking number and one wrapped in a link.
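As a quick check of that pattern, here is a small sketch run against two made-up lines. The updated HTML is not shown in the thread, so both sample lines and the example.com URL are assumptions for illustration.

```ruby
re = /Track It :\s*(?:<a href=".*">\s*(\d+)\s*<\/a>|$)/

# Hypothetical sample lines: one with a linked tracking number, one left empty.
lines = [
  'Track It : <a href="http://example.com/track/12345">12345</a>',
  'Track It : '
]

# String#[] with a capture index returns nil when the group did not take part in the match.
tracking_numbers = lines.map { |line| line[re, 1] }

p tracking_numbers
```

The greedy `.*` inside the href is fine for one link per line, but it could over-match if a line ever held several links.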
Related
How to scrape a span name in Nokogiri in Ruby?
I want to scrape data off a website. The data is in the text of a span. The HTML looks like this:
<p class="text-muted text-small">
  <span class="text-muted">Votes:</span>
  <span name="nv" data-value="1564808">1,564,808</span>
  <span class="ghost">|</span>
  <span class="text-muted">Gross:</span>
  <span name="nv" data-value="107,928,762">$107.93M</span>
</p>
I want to search the whole page and get the value of the data-value, which is 1,564,808, not the $107.93M value. I tried various ways to get the data, for instance:
@votes = []
html_content = open("https://www.imdb.com/list/ls057823854/sort=list_order,asc&st_ dt=&mod e=detail&page=1").read
doc = Nokogiri::HTML(html_content)
doc.css(".text-muted['span name=nv']").each do |i|
  @votes << i.text.strip
end
Try this code:
doc.css('div.lister-item-content > p.text-muted > span[name = nv]:nth-child(2)').map(&:text)
Which results in:
["1,564,941", "373,745", "2,004,624", "1,077,404", "887,189", "305,554", "207,904", "1,074,609", "748,393", "789,255", "1,224,753", "754,008", "634,752", "1,056,328", "1,604,158", "1,438,194", "629,504", "1,158,452", "517,609", "539,263", "1,443,979", "1,290,159", "161,981", "830,992", "1,427,193", "299,532", "289,184", "705,138", "615,264", "1,147,650", "1,030,826", "1,018,932", "921,730", "524,568", "557,482", "1,973,773", "813,743", "367,587", "342,800", "188,210", "649,467", "1,068,455", "547,990", "527,123", "805,964", "420,447", "441,780", "318,295", "1,004,742", "446,096", "203,977", "581,108", "1,754,019", "616,804", "484,534", "265,048", "958,244", "289,190", "651,605", "503,185", "320,564", "660,685", "476,016", "432,155", "588,572", "374,705", "378,561", "337,801", "463,467", "508,822", "187,810", "1,128,184", "221,361", "261,529", "322,314", "324,435", "116,258", "318,628", "1,334,595", "222,651", "1,155,754", "228,713", "205,956", "271,162", "293,774", "33,136", "80,385", "703,048", "195,712", "274,244", "233,133", "121,874", "208,462", "513,797", "485,112", "120,750", "135,232", "57,411", "125,431", "297,193"]
How can I read a file in Ruby on Rails
I'm new to Rails and I am trying to read a .txt file that looks like this:
ThomasLinde ; PeterParker ; Monday
JulkoAndrovic ; KeludowigFrau ; Tuesday
JohannesWoellenstein ; SiegmundoKrugmando ; Wednesday
Now I want to read each "column" of the .txt file to display it on a page of my application. My idea for the code looks like this:
if (File.exist?("Zuordnung_x.txt"))
  fi = File.open("Zuordnung_x.txt", "r")
  fi.each { |line|
    sa = line.split(";")
    @nanny_name = sa[0]
    @customer_name = sa[1]
    @period_name = sa[2]
  }
  fi.close
else
  @nanny_name = nil
  @customer_name = nil
  @period_name = nil
  flash.now[:not_available] = "Nothing happened!"
end
That is my idea, but it gives me only one line. Any ideas? Or am I only able to read one line if I use @nanny_name?
You only need one variable holding an array, and push every line to it.
@result = []
if (File.exist?("Zuordnung_x.txt"))
  fi = File.open("Zuordnung_x.txt", "r")
  fi.each do |line|
    sa = line.split(";")
    @result << {nanny_name: sa[0], customer_name: sa[1], period_name: sa[2]}
  end
  fi.close
else
  flash.now[:not_available] = "Nothing happened!"
end
In the view template, iterate over @result, for example:
<% @result.each do |row| %>
  <p><%= "#{row[:nanny_name]} serves the customer #{row[:customer_name]} on #{row[:period_name]}" %></p>
<% end %>
Optional: if you just use split, you will probably get strings with whitespace at the beginning or end:
"ThomasLinde ; PeterParker ; Monday".split(';')
=> ["ThomasLinde ", " PeterParker ", " Monday"]
To handle that, strip every value of the array like this:
"ThomasLinde ; PeterParker ; Monday".split(';').map(&:strip)
=> ["ThomasLinde", "PeterParker", "Monday"]
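The collect-into-an-array idea and the strip step can be combined in one small method. The following is a sketch only: `read_assignments` is a hypothetical helper name, and a temporary file stands in for Zuordnung_x.txt so the snippet runs on its own.

```ruby
require "tempfile"

# Hypothetical helper (not from the answer above): returns one hash per non-blank line.
def read_assignments(path)
  return [] unless File.exist?(path)

  File.foreach(path, chomp: true).reject { |line| line.strip.empty? }.map do |line|
    nanny, customer, period = line.split(";").map(&:strip)
    { nanny_name: nanny, customer_name: customer, period_name: period }
  end
end

# Demo with a temporary file standing in for Zuordnung_x.txt.
rows = Tempfile.create("zuordnung") do |f|
  f.write("ThomasLinde ; PeterParker ; Monday\nJulkoAndrovic ; KeludowigFrau ; Tuesday\n")
  f.flush
  read_assignments(f.path)
end

p rows
```

File.foreach opens and closes the file itself, so there is no explicit close to forget. In a controller the return value could be assigned to @result.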
How to split value from a string in ruby
My example string is listed here. I want to split every value into an array or hash so that I can process each element.
<div id="test">
  accno: 123232323 <br>
  id: 5443534534534 <br>
  name: test_name <br>
  url: www.google.com <br>
</div>
How can I fetch each value into a hash or array?
With regex it's easy:
s = '<div id="test"> accno: 123232323 <br> id: 5443534534534 <br> name: test_name <br> url: www.google.com <br> </div>'
p s.scan(/\s+(.*?)\:\s+(.*?)<br>/).map.with_object({}) { |i, h| h[i[0].to_sym] = i[1].strip }
Or you can restrict the keys (accno, id, name, url) to a pattern like ([a-z]+) if they contain only lowercase letters. This is the safer choice: on the one-line string above, the lazy (.*?) in the first pattern starts matching at the space inside <div id="test">, so the first key picks up the id="test"> attribute text as well.
p s.scan(/\s+([a-z]+)\:\s+(.*?)<br>/).map.with_object({}) { |i, h| h[i[0].to_sym] = i[1].strip }
Result:
{:accno=>"123232323", :id=>"5443534534534", :name=>"test_name", :url=>"www.google.com"}
Update, in case the input is:
<div id="test"> accno: 123232323 id: 5443534534534 name: test_name url: www.google.com </div>
the regex will be:
/([a-z]+)\:\s*(.*?)\s+/
([a-z]+) is the hash key; if it could contain - or _, add them to the character class, like ([a-z\-_]+). This scheme presumes that the key is followed by : (perhaps with a space) and then some text up to the next space. Use (\s+|<) at the end instead if the line can end without a space: url: www.google.com</div>
If you are processing HTML, use an HTML/XML parser like Nokogiri to pull out the text content of the required <div> tag using a CSS selector. Then parse the text into fields.
To install Nokogiri:
gem install nokogiri
Then process the page and text:
require "nokogiri"
require "open-uri"
# re matches: spaces (word) colon spaces (anything) space
re_fields = /\s+(?<field>\w+):\s+(?<data>.*?)\s/
# Somewhere to store the results
record = {}
# URI.open comes from open-uri; a bare open("http://...") no longer works on Ruby 3
page = Nokogiri::HTML( URI.open("http://example.com/divtest.html") )
# Select the text from <div id=test> and scan into fields with the regex
page.css( "div#test" ).text.scan( re_fields ){ |field, data| record[ field ] = data }
p record
Results in:
{"accno"=>"123232323", "id"=>"5443534534534", "name"=>"test_name", "url"=>"www.google.com"}
The result of page.css( "blah" ) can also be treated like an array if you are processing multiple elements, and looped through with .each:
# Somewhere to store the results
records = []
# Select the text from each matching <div> and scan into fields with the regex
page.css( "div#test" ).each{ |div|
  record = {}
  div.text.scan( re_fields ){ |field, data| record[field] = data }
  records.push record
}
p records
Simple string concatenation in rails in view page
I am new to Rails. Can someone please explain to me the concept of string concatenation using variables in view page and the controller?
For example, in the controller:
def show
  @firstname = 'Test'
  @lastname = 'User'
end
In the view page:
Full Name : <%= "#{@firstname} #{@lastname}" %>
Scenario: if you want to keep two variables in the view page and concatenate them, you need to add the space yourself.
View page:
<%
  string_1 = "With"
  string_2 = "Rails"
  addition_1 = string_1 + string_2
  addition_2 = string_1 + " " + string_2
%>
<h1> First Addition -> <%= addition_1 %> </h1>
<h1> Second Addition -> <%= addition_2 %> </h1>
Output:
First Addition -> WithRails
Second Addition -> With Rails
In the view:
<%
  var1 = "ruby"
  var2 = "on"
  var3 = var1 + var2
%>
Finally:
<% f_var = "Ruby #{var3}" %>
But this type of code is not recommended in a view, as it does not look good. You should use a helper method for this type of requirement.
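A helper for this could look roughly like the sketch below. The module name `NamesHelper` and the method `full_name` are hypothetical; in a Rails app such a module would typically live under app/helpers and be available in views.

```ruby
# Hypothetical helper module (names are illustrative, not from the answers above).
module NamesHelper
  # Joins the non-blank parts with a single space.
  def full_name(first_name, last_name)
    [first_name, last_name].map { |part| part.to_s.strip }.reject(&:empty?).join(" ")
  end
end

# Plain-Ruby demo; in a view this would be called as full_name(@firstname, @lastname).
include NamesHelper
puts full_name("Test", "User")
```

Keeping the joining logic in one method means the view only outputs the result, and a missing first or last name does not leave a stray space.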
Split @blogs into three divs using size of description field as weight
I have a collection of Blog items.
@blogs = Blog.find(:all)
Each blog has a description text field with some text. What I would like to do is split the @blogs objects into 3 divs, with roughly the same number of characters in each column.
<div id="left">
  @blog1 (653 characters)
</div>
<div id="center">
  @blog2 (200 characters)
  @blog5 (451 characters)
</div>
<div id="right">
  @blog3 (157 characters)
  @blog4 (358 characters)
  @blog6 (155 characters)
</div>
I can't figure out how to do that without getting really complicated and probably inefficient. So far I have thought about converting the description field (size) to a percentage of the total characters in the @blogs collection, but how do I match/split the elements so that I get closest to 33% in each column, like a super simple Tetris game? :) Any thoughts?
Here's a quick hack that isn't perfect, but might get you pretty close. The algorithm is simple:
Sort items by size.
Partition items into N bins.
Re-sort each bin by date (or another field, per your desired presentation order).
Here's a quick proof of concept:
#!/usr/bin/env ruby
# mock out some simple Blog class for this example
class Blog
  attr_accessor :size, :date
  def initialize
    @size = rand(700) + 100
    @date = Time.now + rand(1000)
  end
end
# create some mocked data for this example
@blogs = Array.new(10) { Blog.new }
# sort by size
sorted = @blogs.sort_by { |b| b.size }
# bin into NumBins, dealing the sorted items out round-robin
NumBins = 3
bins = Array.new(NumBins) { Array.new }
sorted.each_slice(NumBins) do |b|
  b.each_with_index { |x,i| bins[i] << x }
end
# sort each bin by date
bins.each do |bloglist|
  bloglist.sort_by! { |b| b.date }
end
# output
bins.each_with_index do |bloglist,column|
  puts
  puts "Column Number: #{column+1}"
  bloglist.each do |b|
    puts "Blog: Size = #{b.size}, Date = #{b.date}"
  end
  total = bloglist.inject(0) { |sum,b| sum + b.size }
  puts "TOTAL SIZE: #{total}"
end
For more ideas, look up the multiprocessor scheduling problem.
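One common greedy heuristic for that scheduling problem is to take the items largest first and always drop the next one into the currently lightest column. This is a different technique from the round-robin binning above, and it is not guaranteed to be optimal. A minimal sketch, using the character counts from the question with placeholder labels:

```ruby
# Sizes taken from the question; the labels are placeholders.
blogs = { blog1: 653, blog2: 200, blog3: 157, blog4: 358, blog5: 451, blog6: 155 }

columns = Array.new(3) { { items: [], total: 0 } }

# Largest first; each item goes to whichever column is currently lightest.
blogs.sort_by { |_name, size| -size }.each do |name, size|
  lightest = columns.min_by { |col| col[:total] }
  lightest[:items] << name
  lightest[:total] += size
end

columns.each_with_index do |col, i|
  puts "Column #{i + 1}: #{col[:items].join(', ')} (#{col[:total]} characters)"
end
```

For these six sizes the column totals come out as 653, 608 and 713 characters. Each column can then be re-sorted by date for display, as in the proof of concept.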