How to display output from rake task to the browser? - ruby-on-rails

I have a rake file that scrapes data from a website. While the script is scraping, I use the rake-progressbar gem to track its progress. However, right now I can only see the progress in my terminal, and only if I run the rake task there by typing: "rake testing2".
What I want now is to see the progress in the browser when I click the link that triggers my rake task.
This is my home view, which contains the link that triggers the rake file (testing2.rake):
<div>
<p>Find me in app/views/home/index.html.erb</p>
<h3>Scrape data:</h3>
<%= link_to "Scrape",:action => 'scrape' %>
</div>
This is my home controller:
class HomeController < ApplicationController
def index
end
def scrape
%x[rake testing2]
redirect_to root_url
end
end
This is my rake file (testing2.rake), which contains the code that scrapes the data as well as the progress bar code:
require 'mechanize'
require 'date'
require 'json'
require 'rake-progressbar'
task :testing2 => [:environment] do
agent = Mechanize.new
last_page_number = 1
for pg_number in 1..last_page_number do
puts "Scraping..."
page = agent.get("https://www.congress.gov/members?page=#{pg_number}")
member_links = page.links_with(href: %r{.*/member/\w+})
# Size the bar to the number of links so it completes with the last one.
bar = RakeProgressbar.new(member_links.size)
members = member_links.map do |link|
member = link.click
name = member.search('title').text.split('|')[0]
institution = member.search('td~ td+ td').text.split(':')[0]
# Advance the bar before the hash so the hash stays the block's return value.
bar.inc
{
name: name.strip,
institution: institution.strip
}
end
bar.finished
end
end
When I run the task in the terminal, the progress bar shows up there as expected.
So, how can I display this progress in the browser?

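Progress bar:
Store the progress in the database and have the browser page read it from there. As written, %x[rake testing2] blocks the request until the task finishes, so the browser has nothing to show in the meantime.
Parsing:
Remove the rake task and run the scraping as an ActiveJob when the user clicks the link.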
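The idea of storing progress where the browser can read it might look roughly like the sketch below. It is plain Ruby rather than a real ActiveJob: ProgressStore is a hypothetical in-memory stand-in for a database table, and scrape_all, the job id and the sample links are made up for illustration. In a Rails app the loop would live in the job's perform method and a controller action would return the stored percentage to the page.

```ruby
# Hypothetical in-memory stand-in for a database table that holds progress.
class ProgressStore
  def initialize
    @mutex = Mutex.new
    @progress = {}
  end

  # Record how many items of the total have been processed for a job.
  def update(job_id, done, total)
    @mutex.synchronize { @progress[job_id] = { done: done, total: total } }
  end

  # Integer percentage for a job; 0 when the job is unknown or empty.
  def percent(job_id)
    entry = @mutex.synchronize { @progress[job_id] }
    return 0 unless entry && entry[:total] > 0
    (entry[:done] * 100) / entry[:total]
  end
end

# Processes each link with the given block and records progress after each one.
def scrape_all(links, store, job_id)
  links.each_with_index.map do |link, index|
    result = yield(link)
    store.update(job_id, index + 1, links.size)
    result
  end
end

store = ProgressStore.new
# The block stands in for the real scraping work (link.click and parsing).
results = scrape_all(%w[a b c d], store, 'job-1') { |link| link.upcase }
puts "job-1 is #{store.percent('job-1')}% done"
```

The browser side would then poll an endpoint that calls something like store.percent for the job and updates a progress element on the page.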

Related

Rails Mechanize two part form

I have a simple task that I want to do using mechanize, but the website has a login page that asks for your email, then once you have input it and clicked the submit, it asks for your password on the same page. How would I handle the second submit on the same page?
I tried this:
task :estimatesite => :environment do
require 'mechanize'
mechanize = Mechanize.new
page = mechanize.get('https://estimatesite.com/auth/login')
form = page.forms.first
form['user_search_email[email]'] = 'myemail@email.com'
form['check_distinct_user_password[plainPassword]'] = 'mypassword'
page = form.submit
end
I also tried submitting the email first and then the password:
task :estimatesite => :environment do
require 'mechanize'
mechanize = Mechanize.new
page = mechanize.get('https://estimatesite.com/auth/login')
form = page.forms.first
form['user_search_email[email]'] = 'myemail@email.com'
page = form.submit
form['check_distinct_user_password[plainPassword]'] = 'mypassword'
page = form.submit
end
But neither seems to work.
task :estimatesite => :environment do
require 'mechanize'
agent = Mechanize.new
agent.get('https://estimatesite.com/auth/login')
# Step 1: submit the email; the response is the page that asks for the password.
form = agent.page.forms.first
form['user_search_email[email]'] = 'myemail@email.com'
form.submit
# Step 2: fetch the form from the new page (assumed to be its first form).
form = agent.page.forms.first
form['check_distinct_user_password[plainPassword]'] = 'mypassword'
form.submit
end
Reference:
http://railscasts.com/episodes/191-mechanize?autoplay=true

Ruby LoadError issue with 'require'

I have researched this for quite some time, and have yet to solve my issue. Here is the error that I am receiving:
C:/Ruby23/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- nexpose-runner/constants (LoadError)
from C:/Ruby23/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
from scan.rb:4:in `<main>'
Here is my code:
require 'nexpose'
require 'csv'
require 'json'
require 'nexpose-runner/constants'
require 'nexpose-runner/scan_run_description'
module NexposeRunner
module Scan
def Scan.start(options)
run_details = ScanRunDescription.new(options)
run_details.verify
nsc = get_new_nexpose_connection(run_details)
site = create_site(run_details, nsc)
start_scan(nsc, site, run_details)
reports = generate_reports(nsc, site, run_details)
verify_run(reports[0])
end
def self.generate_reports(nsc, site, run_details)
puts "Scan complete for #{run_details.site_name}, Generating Vulnerability Report"
vulnerbilities = generate_report(CONSTANTS::VULNERABILITY_REPORT_QUERY, site.id, nsc)
generate_csv(vulnerbilities, CONSTANTS::VULNERABILITY_REPORT_NAME)
puts "Scan complete for #{run_details.site_name}, Generating Vulnerability Detail Report"
vuln_details = generate_report(CONSTANTS::VULNERABILITY_DETAIL_REPORT_QUERY, site.id, nsc)
generate_csv(vuln_details, CONSTANTS::VULNERABILITY_DETAIL_REPORT_NAME)
puts "Scan complete for #{run_details.site_name}, Generating Software Report"
software = generate_report(CONSTANTS::SOFTWARE_REPORT_QUERY, site.id, nsc)
generate_csv(software, CONSTANTS::SOFTWARE_REPORT_NAME)
puts "Scan complete for #{run_details.site_name}, Generating Policy Report"
policies = generate_report(CONSTANTS::POLICY_REPORT_QUERY, site.id, nsc)
generate_csv(policies, CONSTANTS::POLICY_REPORT_NAME)
puts "Scan complete for #{run_details.site_name}, Generating Audit Report"
generate_template_report(nsc, site.id, CONSTANTS::AUDIT_REPORT_FILE_NAME, CONSTANTS::AUDIT_REPORT_NAME, CONSTANTS::AUDIT_REPORT_FORMAT)
puts "Scan complete for #{run_details.site_name}, Generating Xml Report"
generate_template_report(nsc, site.id, CONSTANTS::XML_REPORT_FILE_NAME, CONSTANTS::XML_REPORT_NAME, CONSTANTS::XML_REPORT_FORMAT)
[vulnerbilities, software, policies]
end
def self.verify_run(vulnerabilities)
raise StandardError, CONSTANTS::VULNERABILITY_FOUND_MESSAGE if vulnerabilities.count > 0
end
def self.start_scan(nsc, site, run_details)
puts "Starting scan for #{run_details.site_name} using the #{run_details.scan_template} scan template"
scan = site.scan nsc
begin
sleep(3)
stats = nsc.scan_statistics(scan.id)
status = stats.status
puts "Current #{run_details.site_name} scan status: #{status.to_s} -- PENDING: #{stats.tasks.pending.to_s} ACTIVE: #{stats.tasks.active.to_s} COMPLETED #{stats.tasks.completed.to_s}"
end while status == Nexpose::Scan::Status::RUNNING
end
def self.create_site(run_details, nsc)
puts "Creating a nexpose site named #{run_details.site_name}"
site = Nexpose::Site.new run_details.site_name, run_details.scan_template
run_details.ip_addresses.each { |address|
site.add_ip address
}
if run_details.engine
site.engine = run_details.engine
end
site.save nsc
puts "Created site #{run_details.site_name} successfully with the following host(s) #{run_details.ip_addresses.join(', ')}"
site
end
def self.get_new_nexpose_connection(run_details)
nsc = Nexpose::Connection.new run_details.connection_url, run_details.username, run_details.password, run_details.port
nsc.login
puts 'Successfully logged into the Nexpose Server'
nsc
end
def self.generate_report(sql, site, nsc)
report = Nexpose::AdhocReportConfig.new(nil, 'sql')
report.add_filter('version', '1.3.0')
report.add_filter('query', sql)
report.add_filter('site', site)
report_output = report.generate(nsc)
CSV.parse(report_output.chomp, {:headers => :first_row})
end
def self.generate_template_report(nsc, site, file_name, report_name, report_format)
adhoc = Nexpose::AdhocReportConfig.new(report_name, report_format, site)
data = adhoc.generate(nsc)
File.open(file_name, 'w') { |file| file.write(data) }
end
def self.generate_csv(csv_output, name)
CSV.open(name, 'w') do |csv_file|
csv_file << csv_output.headers
csv_output.each do |row|
csv_file << row
if name == CONSTANTS::VULNERABILITY_REPORT_NAME
puts '--------------------------------------'
puts "IP: #{row[0]}"
puts "Vulnerability: #{row[1]}"
puts "Date Vulnerability was Published: #{row[2]}"
puts "Severity: #{row[3]}"
puts "Summary: #{row[4]}"
puts '--------------------------------------'
end
end
end
end
end
end
In the command prompt, I run the script with the following command (the file is called scan.rb):
ruby scan.rb "http://localhost:3780" "username" "password" "3780" "webpage" "ip-address" "full-audit-widget-corp"
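So far, I've tried changing require to require_relative, as well as rearranging the paths (putting in the whole path, for example). Neither has worked.
I also made sure to have the Ruby Development Kit installed.
Thanks!
Please check the local gem list: gem list --local
require looks files up on Ruby's load path, which includes the installed gems, so require 'nexpose-runner/constants' fails when the nexpose-runner gem is not installed for the Ruby that runs scan.rb.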
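The same check can be made from inside Ruby. The snippet below is only a sketch: gem_installed? and requirable? are hypothetical helper names, while Gem::Specification.find_all_by_name is the RubyGems call that lists installed versions of a gem.

```ruby
require 'rubygems'

# True if at least one version of the named gem is installed locally.
def gem_installed?(name)
  !Gem::Specification.find_all_by_name(name).empty?
end

# True if the feature can be required, false if Ruby raises LoadError.
def requirable?(feature)
  require feature
  true
rescue LoadError
  false
end

puts "nexpose-runner installed? #{gem_installed?('nexpose-runner')}"
puts "nexpose-runner/constants loadable? #{requirable?('nexpose-runner/constants')}"
```

If the first line prints false, the LoadError comes from the missing gem rather than from the require statement itself.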

My scraped data is empty (Rails and mechanize)

I am writing a simple script to scrape data from this link: https://www.congress.gov/members.
The script goes through each member's link, follows it, and scrapes data from the page it leads to. The script is a .rake file in a Ruby on Rails application.
Below is the script:
require 'mechanize'
require 'date'
require 'json'
require 'openssl'
module OpenSSL
module SSL
remove_const :VERIFY_PEER
end
end
OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE
I_KNOW_THAT_OPENSSL_VERIFY_PEER_EQUALS_VERIFY_NONE_IS_WRONG = nil
task :testing do
agent = Mechanize.new
page = agent.get("https://www.congress.gov/members")
page_links = page.links_with(href: %r{^/member/\w+})
product_links = page_links[0...2]
products = product_links.map do |link|
product = link.click
state = product.search('td:nth-child(1)').text
website = product.search('.member_website+ td').text
{
state: state,
website: website
}
end
puts JSON.pretty_generate(products)
end
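When I ran this script, the scraped data came back empty.
Your regular expression does not match the links. The ^ anchor only matches an href that begins with /member/, so it misses any link whose href has something in front of that, such as a full URL.
Try this: page_links = page.links_with(href: %r{.*/member/\w+})
You can test regular expressions here: http://rubular.com/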
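The difference between the two patterns can be checked without touching the site. In this sketch the href values are hypothetical (the real ones on congress.gov may differ); it only illustrates how the ^ anchor behaves against a relative path versus an absolute URL.

```ruby
# Hypothetical href values, for illustration only.
hrefs = [
  '/member/jane-doe/D000001',
  'https://www.congress.gov/member/john-roe/R000002',
  '/about'
]

anchored = %r{^/member/\w+}   # pattern from the question
relaxed  = %r{.*/member/\w+}  # pattern from the answer

# Array#grep keeps the elements that match the pattern.
anchored_hits = hrefs.grep(anchored)
relaxed_hits  = hrefs.grep(relaxed)

puts "anchored: #{anchored_hits.inspect}"
puts "relaxed:  #{relaxed_hits.inspect}"
```

The anchored pattern keeps only the relative path, while the relaxed one also keeps the absolute URL.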

Mechanize ruby cannot see all content in linkedin

I've installed the mechanize gem in a Rails app, and to test it I'm just copying and pasting the code below into the irb console. It logs in, and I can put Orange into the search field and submit, but the next page has no content with "Orange", nor any of the Orange employees that I see in my browser. Does LinkedIn have some security features to stop this, or am I doing something wrong?
require 'rubygems'
require 'mechanize'
require 'nokogiri'
require 'open-uri'
#create agent
agent = Mechanize.new { |agent|
agent.user_agent_alias = 'Mac Safari 4'
}
agent.follow_meta_refresh = true
#visit page
page = agent.get("https://www.linkedin.com/")
#login
login_form = page.form('login')
login_form.session_key = "email"
login_form.session_password = "password"
page = agent.submit(login_form, login_form.buttons.first)
# get the form
form = agent.page.form_with(:name => "commonSearch")
#fill form out
form.keywords = 'Orange France'
# get the button you want from the form
button = form.button_with(:value => "Search")
# submit the form using that button
agent.submit(form, button)
agent.page.link_with(:text => "Orange")
=> nil
Mechanize does not execute JavaScript, so it cannot see content that a page loads with JavaScript, such as the LinkedIn search results in this scenario.
One workaround is to look in the page's body for the data embedded there, extract it with a regular expression, and parse the matches as JSON.
For example:
url = "http://www.linkedin.com/vsearch/p?type=people&keywords=dario+barrionuevo"
results = agent.get(url).body.scan(/\{"person"\:\{.*?\}\}/)
person = results.first # You'd use an each here, but for the example we'll get the first
json = JSON.parse(person)
json['person']['firstName'] # => 'Dario'
json['person']['lastName'] # => 'Barrionuevo'
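The each-style loop mentioned in the comment could look like the following offline sketch. The body string is a made-up sample, not real LinkedIn markup, and the non-greedy pattern only works while each "person" object contains no nested objects of its own.

```ruby
require 'json'

# Made-up sample of a page body with JSON embedded in a script tag.
body = '<script>var data = [{"person":{"firstName":"Dario","lastName":"Barrionuevo"}},' \
       '{"person":{"firstName":"Ada","lastName":"Lovelace"}}];</script>'

# String#scan returns every substring that matches the pattern.
results = body.scan(/\{"person":\{.*?\}\}/)

# Parse each match and keep the inner "person" hash.
people = results.map { |match| JSON.parse(match)['person'] }

people.each do |person|
  puts "#{person['firstName']} #{person['lastName']}"
end
```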

How can I create log files on loop using mechanize with ruby

I am trying to create more than one log file on localhost.
The first file is sign_in.rb:
require 'mechanize'
@agent = Mechanize.new
page = @agent.get('http://localhost:3000/users/sign_in')
form = page.forms.first
form["user[username]"] = 'admin'
form["user[password]"] = '123456'
@agent.submit(form, form.buttons.first)
pp page
The second is profile_page.rb:
require 'mechanize'
require_relative 'sign_in'
page = @agent.get('http://localhost:3000/users/admin')
form = page.forms.first
form.radiobuttons_with(:name => 'read_permission_level')[1].check
@agent.submit(form, form.buttons.first)
pp page
How can I combine these two files and run them in a loop in order to create more than one log file?
I don't know much about Mechanize, but is there any reason you can't simply combine the two bits of code and put them in a while loop? I don't know how often you need to call Mechanize.new. To make more than one log file, simply open two different files and write to them.
require 'mechanize'
require 'pp' # needed to reference PP directly
log1 = File.open("first.log", "w")
log2 = File.open("second.log", "w")
@agent = Mechanize.new
while true
# @agent = Mechanize.new # not sure if this is needed
page = @agent.get('http://localhost:3000/users/sign_in')
form = page.forms.first
form["user[username]"] = 'admin'
form["user[password]"] = '123456'
@agent.submit(form, form.buttons.first)
PP.pp page, log1
# @agent = Mechanize.new # not sure if this is needed
page = @agent.get('http://localhost:3000/users/admin')
form = page.forms.first
form.radiobuttons_with(:name => 'read_permission_level')[1].check
@agent.submit(form, form.buttons.first)
PP.pp page, log2
end
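If the goal is one log file per pass through the loop rather than two files that keep growing, a bounded loop that opens a new file each time is one option. This is a sketch only: write_logs, the run_N.log naming and the sample hash are hypothetical, and the hash stands in for the Mechanize page that would be pretty-printed in the real script.

```ruby
require 'pp'
require 'tmpdir'

# Writes one pretty-printed log file per run and returns the file paths.
def write_logs(dir, runs)
  (1..runs).map do |i|
    path = File.join(dir, "run_#{i}.log")
    # The block form of File.open closes the file when the block ends.
    File.open(path, 'w') do |log|
      PP.pp({ run: i, status: 'ok' }, log) # stand-in for: PP.pp page, log
    end
    path
  end
end

paths = write_logs(Dir.mktmpdir, 3)
puts paths
```

Using the block form of File.open also makes sure each log is flushed and closed before the next run starts.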
