Screen scraping when results can't be found? - nokogiri

I have the following code in a screen-scraping rake task:
page = agent.get("https://domainname.co.uk/unit/27/logs?type=incoming&page=8")
page = agent.page.search("table tbody tr").each do |row|
  next if (!row.at('td'))
  time, source, destination, duration = row.search('td')[1..5].map{ |td| td.text.strip }
  parsed_time = Time.parse(time)
  unless Call.find_by_time(parsed_time)
    Call.create({:time => parsed_time, :source => source, :destination => destination, :duration => duration})
  end
end
This section of the script navigates to page 8 and then creates a call record for each table row of data.
If the page I navigate to doesn't contain any call logs, it shows the following markup instead:
<tr class='no-data'>
<td colspan='7'>There are no call records matching the search criteria</td>
</tr>
When the rake task reaches a page with no call logs, the task fails to complete with the following error:
rake aborted!
can't convert nil into String
So, is there a way, when using Nokogiri and Mechanize, to recover from the nil? Is there a simple way of checking whether <tr class='no-data'> exists before trying to import the data?
Update with suggested code
Error message
Scraping Page 9
rake aborted!
can't convert nil into String
Code
puts 'Scraping Page 9'
if agent.page.root.css('tr.no-data').empty?
  page = agent.get("https://domaindname.co.uk/27/logs?type=incoming&page=9")
  page = agent.page.search("table tbody tr").each do |row|
    next if (!row.at('td'))
    time, source, destination, duration = row.search('td')[1..5].map{ |td| td.text.strip }
    parsed_time = Time.parse(time)
    unless Call.find_by_time(parsed_time)
      Call.create({:time => parsed_time, :source => source, :destination => destination, :duration => duration})
    end
  end
else
  puts 'No calls on this page'
end

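You can check whether that element exists on the page you have just fetched:
if agent.page.root.css('tr.no-data').empty?
  # the "no-data" row doesn't exist, so the page has call records: do the normal thing
else
  # the "no-data" row exists: there is nothing to import on this page
end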
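Looking at the updated code, two things likely still trigger the error. The tr.no-data check runs against agent.page before agent.get has loaded page 9, so it inspects the previously loaded page. And the no-data row is not skipped by next if (!row.at('td')), because it does contain a td; row.search('td')[1..5] then yields no values, time ends up nil and Time.parse(time) raises. A per-row guard avoids both issues. The following is a minimal sketch in plain Ruby; call_attributes is a hypothetical helper (not part of Nokogiri or Mechanize), and the five-cell minimum is an assumption based on the columns the question reads:

```ruby
require 'time'

# Hypothetical helper: converts the stripped text of one row's <td> cells into
# call attributes, or returns nil for rows that are not call records, such as
# the single-cell "no-data" row. Assumes the question's layout, where time,
# source, destination and duration sit in columns 1 to 4.
def call_attributes(cells)
  return nil if cells.length < 5

  time, source, destination, duration = cells[1..4]
  {
    :time => Time.parse(time),
    :source => source,
    :destination => destination,
    :duration => duration
  }
end
```

Inside the scraping loop, fetch the page first, then for each row build the cells with row.search('td').map { |td| td.text.strip }, call call_attributes, skip the row when it returns nil, and otherwise create the Call as before.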

Related

Create a link for next page in rails

I'm using the Twilio API in a rails app to show a user a list of their recordings. Say a user has 11 recordings total, and I'm showing them 3 per page.
twilio_controller.rb
def calls
  @user = current_user
  @account_sid = @user.twilio_account_sid
  @auth_token = @user.twilio_auth_token
  @page_size = 3
  @page = params[:page_id] || 0
  @sub_account_client = Twilio::REST::Client.new(@account_sid, @auth_token)
  @subaccount = @sub_account_client.account
  @recordings = @subaccount.recordings
  @recordingslist = @recordings.list({:page_size => @page_size, :page => @page})
end
calls.html.erb
<% @recordingslist.each do |recording| %>
  <tr>
    <td><%= recording.sid %></td>
  </tr>
<% end %>
<%= link_to "Next Page", twilio_calls_path(@page + 1) %>
routes.rb
#twilio routes
post 'twilio/callhandler'
get 'twilio/calls'
match 'twilio/calls' => 'twilio#page', :as => :twilio_page # Allow `recordings/page` to return the first page of results
match 'twilio/calls/:page_id' => 'twilio#page', :as => :twilio_page
Paging info is built into the Twilio response, such that
@recordingslist.next_page
gives me the next 3 recordings (verified in the Rails console). How do I link to that so that when a user clicks the link, the table loads the next 3 results?
Thanks!
You can use a gem like Kaminari for pagination.
https://github.com/amatsuda/kaminari
I would recommend using the paging functionality that ships with twilio-ruby. According to the docs, ListResource.list() accepts paging arguments.
Start by creating a route for your Twilio list view. Make sure you can pass a page_id parameter; this is how your controller action will know which page to render:
# config/routes.rb
# The optional (/:page_id) segment also lets `recordings/page` return the first page of results
get 'recordings/page(/:page_id)' => 'twilio#page', :as => :twilio_page
Then, in the page action, read the page_id parameter (or set it to 1 if there is no page_id) and pass the page number and page size as arguments to @recordings.list:
# app/controllers/twilio_controller.rb
def page
  page_size = 3
  @page = params[:page_id] || 1
  # @account_sid and @auth_token must already be set, e.g. from the current user as in the question
  @sub_account_client = Twilio::REST::Client.new(@account_sid, @auth_token)
  @subaccount = @sub_account_client.account
  @recordings = @subaccount.recordings
  @recordingslist = @recordings.list({:page_size => page_size, :page => @page})
end
Finally, in your view, pass the page number to twilio_page_path in your link_to helper; just make sure to adjust the page number accordingly (+1 for the next page, -1 for the previous page):
# view
<%= link_to "Next Page", twilio_page_path(@page.to_i + 1) %>
<%= link_to "Previous Page", twilio_page_path(@page.to_i - 1) %>
Note that if you're at the start or end of your list, you may inadvertently end up passing an invalid page_id. Therefore, you may want to implement some exception handling in your page controller action:
# app/controllers/twilio_controller.rb
def page
  begin
    @page = Integer(params[:page_id] || 1) # raises ArgumentError if `page_id` isn't a number
  rescue ArgumentError
    @page = 1 # If `page_id` is invalid
  end
  @page = 1 if @page < 1 # e.g. "Previous Page" clicked on the first page
  # Remaining logic...
end
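The out-of-range case can also be avoided in the view by only rendering links that point at pages that exist. A minimal sketch, assuming your app knows the total number of recordings (how you obtain it depends on your twilio-ruby version, so that part is left open) and the 1-based page numbers used in this answer; page_links is a hypothetical helper, not part of twilio-ruby or Rails. With the question's example of 11 recordings shown 3 per page, it yields 4 pages:

```ruby
# Hypothetical helper (not part of twilio-ruby or Rails): works out which
# "Previous"/"Next" page numbers exist for 1-based pages.
def page_links(current_page, total_items, page_size)
  # Ceiling division; an empty list still has one (empty) page.
  last_page = [(total_items + page_size - 1) / page_size, 1].max
  # Pull out-of-range or string page numbers (e.g. params[:page_id]) into range.
  current = current_page.to_i.clamp(1, last_page)
  {
    :previous => current > 1 ? current - 1 : nil,
    :next => current < last_page ? current + 1 : nil,
    :last => last_page
  }
end
```

In the view this might be used as <% links = page_links(@page, total_recordings, 3) %> followed by <%= link_to "Next Page", twilio_page_path(links[:next]) if links[:next] %>, where total_recordings is again an assumption about what your app can supply.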

Displaying XML Hashes in Rails Views Not Working

I have narrowed down a 33,364-entry XML file to the 1,068 entries that I need. Now I am trying to gather pieces of information from each of those nodes and store each piece in a hash, so that I can list the relevant data in a Rails view.
Here is the code in my controller (home_controller.rb):
class HomeController < ApplicationController
  # REQUIRE LIBRARIES
  require 'nokogiri'
  require 'open-uri'
  def search
  end
  def listing
    @properties = {}
    # OPEN THE XML FILE
    mits_feed = File.open("app/assets/xml/mits.xml")
    # OUTPUT THE XML DOCUMENT
    doc = Nokogiri::XML(mits_feed)
    doc.xpath("//Property/PropertyID/Identification[@OrganizationName='northsteppe']").each do |property|
      # GATHER PROPERTY INFORMATION
      information = {
        "street_address" => property.xpath("Address/AddressLine1").text,
        "city" => property.xpath("Address/City").text,
        "zipcode" => property.xpath("Address/PostalCode").text,
        "short_description" => property.xpath("Information/ShortDescription").text,
        "long_description" => property.xpath("Information/LongDescription").text,
        "rent" => property.xpath("Information/Rents/StandardRent").text,
        "application_fee" => property.xpath("Fee/ApplicationFee").text,
        "bedrooms" => property.xpath("Floorplan/Room[@RoomType='Bedroom']/Count").text,
        "bathrooms" => property.xpath("Floorplan/Room[@RoomType='Bathroom']/Count").text,
        "bathrooms" => property.xpath("ILS_Unit/Availability/VacancyClass").text
      }
      # MERGE NEW PROPERTY INFORMATION TO THE EXISTING HASH
      @properties.merge(information)
    end
  end
end
I'm not getting any errors and my view loads fine, but it comes up blank. Here is my view file (listing.html.erb):
<div class="propertiesHolder">
  <% if @properties %>
    <ul>
      <% @properties.each do |property| %>
        <li><%= property.information.street_address %></li>
      <% end %>
    </ul>
  <% else %>
    <h1>There are no properties that match your search</h1>
  <% end %>
</div>
Does anyone know why this comes up blank? I would have expected an error if I had done something wrong in the code. I also tried just outputting "Hello World" as text for each |property|, and that also came up blank. Thank you!
Ruby's Hash#merge does not mutate the hash it is called on. It returns a new hash that combines the two.
Example
h1 = { "a" => 100, "b" => 200 }
h2 = { "b" => 254, "c" => 300 }
h1.merge(h2)
#=> {"a"=>100, "b"=>254, "c"=>300}
h1
#=> {"a"=>100, "b"=>200}
Note how h1 still retains its original values?
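If you do want the receiver itself to change, Hash#merge! (also available as Hash#update) modifies it in place and returns it. A short sketch reusing h1 and h2 from the example:

```ruby
h1 = { "a" => 100, "b" => 200 }
h2 = { "b" => 254, "c" => 300 }

copy = h1.merge(h2) # new hash; h1 is left unchanged
h1.merge!(h2)       # mutates h1 in place and returns it
```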
What you will want to do is rename your information hash to @properties. I suggest this because you are merging a hash that holds data (information) into an empty hash (@properties), so rather than merging the two, just use the hash that already holds the data.
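One caveat when collecting many properties: every property hash has the same keys, so merging them into a single hash (with merge or merge!) keeps only the last property's values. A minimal sketch of that collision, and of one alternative (collecting one hash per property in an array), using plain hashes as hypothetical stand-ins for the Nokogiri nodes; the addresses are placeholders:

```ruby
# Two property hashes with the same keys: merging keeps only the second one's values.
first  = { "street_address" => "1 Example St" }
second = { "street_address" => "2 Example St" }
merged = first.merge(second)

# Hypothetical stand-ins for the nodes returned by doc.xpath(...) in the question.
nodes = [
  { :street => "1 Example St", :city => "Example City" },
  { :street => "2 Example St", :city => "Example City" }
]

# Build one hash per node and keep all of them in an array.
properties = nodes.map do |node|
  { "street_address" => node[:street], "city" => node[:city] }
end
```

With @properties built as an array like this, the view would read each value with hash syntax, e.g. <%= property["street_address"] %>, and test @properties.any? rather than @properties, since an empty array or hash is still truthy in Ruby.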

Rails issue with params hash

I need to export a previously rendered table from my view to PDF. I build the array of hashes as follows:
@pdfdata = []
__index = 0
@people.each do |p|
  @pdfdata[__index] = {
    :name => p.name.to_s,
    :surname => p.surname.to_s
  }
  __index += 1
end
and send it to the controller in order to export it to a PDF as follows:
hidden_field_tag(:pdfdata, @pdfdata)
When I read params[:pdfdata] in the controller, it arrives as a string, and I can't find a way to turn it back into the data short of writing my own string parser. Is there a better way to do this?
Modify your code a little to get:
@people.each_with_index do |p, i|
  @pdfdata[i] = {
    :name => p.name.to_s,
    :surname => p.surname.to_s
  }
end
and use this gem to create the hidden hash fields:
https://github.com/brianhempel/hash_to_hidden_fields
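Another option, not part of the answer above, is to serialize the array to JSON with Ruby's standard json library and parse it back in the controller. A minimal sketch, with a hypothetical Person struct and sample names standing in for the @people records:

```ruby
require 'json'

# Hypothetical stand-in for the @people records in the question.
Person = Struct.new(:name, :surname)
people = [Person.new("Ada", "Lovelace"), Person.new("Alan", "Turing")]

# Build the array of hashes directly with map instead of a manual index.
pdfdata = people.map { |p| { :name => p.name.to_s, :surname => p.surname.to_s } }

# In the view: hidden_field_tag(:pdfdata, pdfdata.to_json)
field_value = pdfdata.to_json

# In the controller, params[:pdfdata] holds that JSON string; parse it back.
restored = JSON.parse(field_value, :symbolize_names => true)
```

hidden_field_tag HTML-escapes the attribute value and the browser submits the unescaped text, so params[:pdfdata] should arrive as the original JSON string; symbolize_names only matters if you want symbol keys back.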

How to update partial after completing job

This is my first post here, but I have already gotten some great info from this site, so I thought someone might be able to help.
In my view I have a form; once it is submitted, the controller passes the job off to Navvy, and that works really well. The problem is that I would like another partial on the same page as the form to update with the new info once Navvy has finished. In my controller I have:
Navvy::Job.enqueue(GetZip, :get_zip, @series, :job_options => {:priority => 8})
And then in my Navvy block, which is located in config/initializers/navvy.rb, I have:
class GetZip
  def self.get_zip(params)
    fetch = Fetch.new
    fetch.get_zip(params)
    # What to put here to update partial #
  end
end
This works as expected. I am just not sure how to get the partial in my view updated once the Navvy job has completed. Any suggestions would be greatly appreciated.
The problem here is that once you've fired up a background process, you're no longer tied to that user's request (which is, of course, the point of a background process).
So your background job is not aware of what the user is doing in the meantime. For example they could navigate to another page, or even leave your website.
You could get your background job to create a record in a database that indicates it has started processing and update this status when it has finished processing. That way when the user next refreshes the page you can check the status of that record and the partial can be updated accordingly.
If you want the page to update automatically, you could poll the status of that record with Ajax requests. You could start the polling when the form is first submitted, rather than having it run all the time.
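The status-record idea can be sketched in plain Ruby. This is a minimal, synchronous illustration under stated assumptions: an in-memory hash stands in for the database table (in Rails this would be a model, e.g. a hypothetical JobStatus), run_job stands in for the Navvy job, and partial_ready? is what a refresh or a polling request would check. Keying the status by a job id keeps different users' jobs apart.

```ruby
# In-memory stand-in for a database table of job statuses. In a Rails app this
# would be a model (for example a hypothetical JobStatus) so that the Navvy
# worker and the web request can both see it.
class JobStatusStore
  def initialize
    @statuses = {}
  end

  def set(job_id, state)
    @statuses[job_id] = state
  end

  def get(job_id)
    @statuses.fetch(job_id, "unknown")
  end
end

STATUSES = JobStatusStore.new

# Stand-in for the background job: record "processing", do the work (passed
# in as a block here), then record "finished".
def run_job(job_id)
  STATUSES.set(job_id, "processing")
  yield
  STATUSES.set(job_id, "finished")
end

# What a page refresh or an Ajax polling request would check before
# re-rendering the partial.
def partial_ready?(job_id)
  STATUSES.get(job_id) == "finished"
end
```

The page would poll with the id of its own job (for instance the id of the status record created when the form is submitted) and stop once partial_ready? returns true.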
Here's what I'm using.
This is where the Job is called:
Setting::progress = "Starting..."
Navvy::Job.enqueue(EmailWorker, :async_email, stuff)
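Setting is super simple:
class Setting < ActiveRecord::Base
  def Setting::progress=(value)
    setn = Setting.find_or_initialize_by_name("email_status")
    setn.value = value
    setn.save
  end
  # Reader used by the update action below
  def Setting::progress
    setn = Setting.find_by_name("email_status")
    setn && setn.value
  end
end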
and the navvy job EmailWorker is:
class EmailWorker
  def self.async_email(options)
    # send one at a time
    total = options[:list].length
    errors = []
    options[:list].each_with_index do |email_addr, n|
      begin
        Setting::progress = "#{n+1}/#{total}: #{email_addr}"
        Postoffice.one_notice(email_addr, options).deliver
      rescue Exception => e
        Setting::progress = "#{email_addr} #{e.message}"
        errors << "<span class='red'>#{email_addr}</span> #{e.message}"
      end
      # get stack level too deep errors at random when this is removed. I don't get it.
      sleep 0.05
    end
    Setting::progress = "DONE sending #{total} with #{errors.length} errors<br/>#{errors.join("<br/>")}"
    "Done" # just for display in Navvy console output
  end
end
Then the triggering page has this:
<%-
  # had to add this back as rails 3 took it away
  def periodically_call_remote(options = {})
    frequency = options[:frequency] || 10 # every ten seconds by default
    code = "new PeriodicalExecuter(function() {#{remote_function(options)}}, #{frequency})"
    javascript_tag(code)
  end
-%>
<script type="text/javascript">
  //<![CDATA[
  check_var = true;
  //]]>
</script>
<%= periodically_call_remote(
      :condition => "check_var == true",
      :url => { :controller => "application",
                :action => "update" },
      :frequency => '3') -%>
<div id="progress">
  <p>Monitoring progress of emails</p>
</div>
And here's the method that's called repeatedly:
def update
  raise "not for direct" if (!request.xhr?)
  @progress = Setting::progress
  @again = !(/^DONE/ =~ @progress)
  render :action => "update"
end
This triggers an in-place update via update.rjs:
page.assign('check_var', @again)
page.replace_html "progress", :partial => "info", :locals => { :info => @progress }
page.visual_effect :highlight, 'progress', :startcolor => "#3333ff", :restorecolor => '#ffffff', :duration => 0.5
One word of warning about Navvy: since it runs as a background task until it is killed, it will keep executing your old app code after a cap deploy. You must kill the Navvy process running from the old "current" release and start it again from the new "current".

Rails/ajax - whitespace and request.raw_post

I am starting to learn Ajax with Rails.
I have a catalog index page with a text_field_tag that queries the database for matching "section" results.
Index.html.erb
<h1>Catalogs</h1>
<label>Search by Section:</label>
<%=text_field_tag :section %>
<%= observe_field(:section,
:frequency=> 0.1,
:update=> "article_list",
:url=>{ :action => :get_article_list }) %>
<div id="article_list"></div>
Catalogs_controller.rb
def index
end
def get_article_list
  @section = request.raw_post.split(/&/).first
  @catalogList = "<ol>"
  Catalog.find(:all, :conditions => ["section = ?", @section]).each do |catalog|
    @catalogList += "<li>" + catalog.title + "</li>"
  end
  @catalogList += "</ol>"
  render :text => @catalogList
end
Question:
request.raw_post returns something like:
xml&authenticity_token=tgtxV3knlPvrJqT9qazs4BIcKYeFy2hGDIrQxVUTvFM%3D
so I use
request.raw_post.split(/&/).first
to get the section query ("xml"). That works, but how do I handle a query that contains whitespace (like "Open Source")? I do have Open Source sections in my database, but request.raw_post.split(/&/).first returns Open%20Source. How can I handle this? Do I have to use a full-text search engine, or is there another way?
Thanks a lot for your explanation!
Look over your logs; in them you will see the POST and the params being passed. You should not need to do your own query-string splitting. You should be able to use params[:section] to get the posted data.
As your comment implies, something is missing: your observe_field needs to tell the Rails helper what to send. Check out http://apidock.com/rails/ActionView/Helpers/PrototypeHelper/observe_field. Anyhow, you'll want to do something like:
observe_field(... # lots of parameters
:with => 'section'
)
And that should give you params[:section].
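For reference, the %20 is ordinary URL (form) encoding, which Rails decodes for you when it builds params, so no full-text search engine is needed for the whitespace. If you ever do need to decode a raw body by hand, Ruby's standard uri library can do it. A minimal sketch; the body is modeled on the one in the question, with the value sent under a section key as :with => 'section' would produce (an assumption about the exact body the helper sends):

```ruby
require 'uri'

# A form body modeled on the question's, with the value sent under a `section`
# key and a space encoded as %20.
raw_post = "section=Open%20Source&authenticity_token=tgtxV3knlPvrJqT9qazs4BIcKYeFy2hGDIrQxVUTvFM%3D"

# Decode every key/value pair; %XX escapes and '+' become plain characters.
fields = URI.decode_www_form(raw_post).to_h
section = fields["section"] # the same value Rails exposes as params[:section]

# A single component (like the question's unkeyed first segment) can be decoded directly.
first_segment = "Open%20Source&authenticity_token=abc".split("&").first
decoded = URI.decode_www_form_component(first_segment)
```

The decoded "Open Source" can then go straight into the existing ["section = ?", ...] condition.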
