I have written a ruby script (code below) to scrape from Deliveroo.co.uk.
Right now I run it manually by going to terminal and typing in 'ruby ....rb'.
How do I automate things so that this script runs automatically every hour?
Also, how do I save the output from each run without overwriting the previous output?
The code is below. Thank you.
require 'open-uri'
require 'nokogiri'
require 'csv'

# Store URL to be scraped
url = "https://deliveroo.co.uk/restaurants/london/maida-vale?postcode=W92DE"

# Parse the page with Nokogiri
# (URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs)
page = Nokogiri::HTML(URI.open(url))

# Collect each field into its own array
name = []
page.css('span.list-item-title.restaurant-name').each do |line|
  name << line.text.strip
end

category = []
page.css('span.restaurant-detail.detail-cat').each do |line|
  category << line.text.strip
end

delivery_time = []
page.css('span.restaurant-detail.detail-time').each do |line|
  delivery_time << line.text.strip
end

distance = []
page.css('span.restaurant-detail.detail-distance').each do |line|
  distance << line.text.strip
end

status = []
page.css('li.restaurant--details').each do |line|
  if line.attr("class").include? "unavailable"
    sts = "closed"
  else
    sts = "open"
  end
  status << sts
end

# Write data to CSV file
CSV.open("deliveroo.csv", "w") do |file|
  file << ["Name", "Category", "Delivery Time", "Distance", "Status"]
  name.length.times do |i|
    file << [name[i], category[i], delivery_time[i], distance[i], status[i]]
  end
end
There are two questions here; I'll try to answer both.
How to run periodically:
What you are looking for is a cron job. There are many resources out there on creating one.
Look into cron itself, or gems like whenever / clockwork.
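For reference, a crontab entry for an hourly run has the form `0 * * * * cd /path/to/project && ruby scraper.rb` (the path and script name are placeholders). The `cd` matters because cron does not run jobs from your project directory, so a relative path like deliveroo.csv would otherwise end up somewhere else. If cron is not available, a plain Ruby loop is a crude stand-in. The sketch below is not cron, and `seconds_until_next_hour` and `run_hourly` are hypothetical helper names:

```ruby
# Seconds from `now` until the start of the next full hour.
def seconds_until_next_hour(now = Time.now)
  3600 - (now.min * 60 + now.sec)
end

# Hypothetical helper: run the given block, sleep until the next
# full hour, and repeat forever. The process must stay alive.
def run_hourly
  loop do
    yield
    sleep seconds_until_next_hour
  end
end
```

You would wrap the scraping code in a method and call something like `run_hourly { scrape }`. Unlike cron, this stops as soon as the process exits or the machine reboots.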
Save output between multiple runs: To keep the output from every run, you can write to a file directly in Ruby, much as you are doing right now.
The way you're saving it right now is:
CSV.open("deliveroo.csv", "w") do |file|
The "w" mode opens the file and overwrites any content already in it. Try "a" (append) instead:
CSV.open("deliveroo.csv", "a") do |file|
Read more here about opening files in different modes: File opening mode in Ruby
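One caveat with append mode: the script writes the header row on every run, so the header would be repeated throughout the file. A minimal sketch of one way around that, writing the header only when the file is new or empty (`append_rows` is a hypothetical helper, not part of the CSV library):

```ruby
require 'csv'

# Append rows to a CSV file, writing the header row only when the
# file does not exist yet or is still empty.
def append_rows(path, header, rows)
  write_header = !File.exist?(path) || File.zero?(path)
  CSV.open(path, "a") do |csv|
    csv << header if write_header
    rows.each { |row| csv << row }
  end
end
```

In the scraper this would replace the CSV.open block, with the rows built from the name, category, delivery_time, distance and status arrays. Adding a timestamp column to each row would make it possible to tell the runs apart.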
Related
I want to create a cron task for a daily report. I need guidance on where to create my class in my project (in which folder) and how to instantiate an object of that class from the Rails console. Will that class inherit from ApplicationController? Also, since I will be querying my database, will my models be directly accessible in this file, or do I have to include them somehow, as we do in Django?
I have created a class in /lib/tasks/daily_report.rb, but I don't understand how to use that file to create a task.
require 'csv'

module Reports
  class Report
    class << self
      def collect_data
        row_data = []
        headers = ["Mobile", "Buildings", "Owners", "Tenants", "Members", "Total People"]
        # Add the headers as one row; push(*headers) would add each title as its own row
        row_data << headers
        puts "in side collect data"
        date = Date.today.to_s
        mobile = ["mobiles"]
        for i in mobile do
          row = []
          row << i
          build_count = Buildings.where(created_at: date, added_by: i).count
          row << build_count
          puts "build_count"
          owners_count = Residents.where(created_at: date, added_by: i, role: "owner").count
          row << owners_count
          puts "owners_count"
          tenants_count = Residents.where(created_at: date, added_by: i, role: "tenant").count
          row << tenants_count
          members_count = MemeberRelations.where(created_at: date, added_by: i).count
          row << members_count
          total_people = owners_count + tenants_count + members_count
          row << total_people
          row_data << row
        end
        puts row_data
        return row_data
      end

      def generate_csv()
        puts "walk away"
        row_data = self.collect_data
        CSV.open('/home/rajdeep/police-api/daily_report.csv', 'w') do |csv|
          row_data.each { |ar| csv << ar }
        end
      end
    end
  end
end
If you wish to manage cron tasks from Rails, try the whenever gem.
Add it to your Gemfile:
Gemfile
gem 'whenever', require: false
Run the initialize task from the root of your app:
$ bundle exec wheneverize .
This will create an initial config/schedule.rb file for you (as long
as the config folder is already present in your project)
(from the gem docs).
After that, set how often the job should run in config/schedule.rb. For example:
config/schedule.rb
every :hour do # Many shortcuts available: :hour, :day, :month, :year, :reboot
  runner "Reports::Report.generate_csv"
end
More syntax options of schedule.rb here
UPDATE AFTER COMMENTS
Assuming you are still in a Rails context, create the file in the public folder under the application root path.
result_file = "#{Rails.root}/public/cards-excluded.csv"
CSV.open(result_file, 'w') do |csv|
  row_data.each { |ar| csv << ar }
end
ANOTHER UPDATE LATER
Okay, although this is not relevant to the original question, let's try to solve your problem.
We'll assume that you have a Rails application, not a custom Ruby library.
First, create the module in the file your_rails_app/lib/reports.rb:
module Reports
  class Report
    class << self
      def collect_data
        # your current code and line below, explicit return
        return row_data
      end

      def generate_csv
        row_data = collect_data # btw unnecessary assignment
        CSV.open('/home/rajdeep/police-api/daily_report.csv', 'w') do |csv|
          row_data.each { |ar| csv << ar }
        end
      end
    end
  end
end
Second, make sure that your lib files are on the autoload path. Check this in your config/application.rb:
config.autoload_paths += %W(#{config.root}/lib)
Third, use the Reports module like this (> means that you're in the Rails console, rails c):
> Reports::Report.generate_csv
I am running a transaction download script through Ruby. I was wondering if it is possible to label each .csv it creates with the current date/time the script was run. Below is the end of the script.
CSV.open("transaction_report.csv", "w") do |csv|
  csv << header_row
  search_results.each do |transaction|
    transaction_details_row = header_row.map{ |attribute| transaction.send(attribute) }
    csv << transaction_details_row
  end
end
Like this?
CSV.open("transaction_report-#{Time.now}.csv", "w") do |csv|
  csv << header_row
  search_results.each do |transaction|
    transaction_details_row = header_row.map{ |attribute| transaction.send(attribute) }
    csv << transaction_details_row
  end
end
This just appends the time of generation to the file name. For example:
"transaction_report-#{Time.now}.csv"
# => "transaction_report-2019-10-10 16:09:07 +0100.csv"
If you want to avoid spaces in the file name, you can sub these out like so:
"transaction_report-#{Time.now.to_s.gsub(/\s/, '-')}.csv"
# => "transaction_report-2019-10-10-16:09:40-+0100.csv"
Is that what you're after? It sounds right based on the question, though happy to update if you're able to correct me :)
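If you'd rather control the format directly, Time#strftime avoids the spaces and also the colons, which are not allowed in file names on Windows. A small sketch (`timestamped_filename` is a made-up helper name, and the format string is just one reasonable choice):

```ruby
# Build a file name with a sortable, filesystem-safe timestamp,
# e.g. "transaction_report-20191010-160907.csv".
def timestamped_filename(prefix, time = Time.now)
  "#{prefix}-#{time.strftime('%Y%m%d-%H%M%S')}.csv"
end
```

You would then open the file with `CSV.open(timestamped_filename("transaction_report"), "w")`. Because the timestamp runs from year down to second, the files also sort chronologically by name.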
I have a list of names (names.txt) separated by line. After I loop through each line, I'd like to move it to another file (processed.txt).
My current implementation to loop through each line:
open("names.txt") do |csv|
  csv.each_line do |line|
    url = line.split("\n")
    puts url
    # Remove line from this file and move it to processed.txt
  end
end
def readput
  @names = File.readlines("names.txt")
  File.open("processed.txt", "w+") do |f|
    f.puts(@names)
  end
end
You can do it like this:
File.open('processed.txt', 'a') do |file|
  open("names.txt") do |csv|
    csv.each_line do |line|
      url = line.chomp
      # Do something interesting with url...
      file.puts url
    end
  end
end
This will result in processed.txt containing all of the urls that were processed with this code.
Note: Removing the line from names.txt is not practical using this method. See How do I remove lines of data in the middle of a text file with Ruby for more information. If this is a real goal of this solution, it will be a much larger implementation with some design considerations that need to be defined.
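If the file is small and nothing else writes to it while the script runs, one possible approach is to rewrite names.txt once after the loop, keeping only the lines that were not processed. This is a sketch under those assumptions, not a robust implementation. `move_processed_lines` is a hypothetical helper, and the block is expected to return true when a line was handled successfully:

```ruby
# Append every successfully processed line to processed_path, then
# rewrite source_path so it holds only the lines that were not processed.
def move_processed_lines(source_path, processed_path)
  remaining = []
  File.open(processed_path, "a") do |processed|
    File.foreach(source_path) do |line|
      name = line.chomp
      next if name.empty?
      if yield(name)
        processed.puts name
      else
        remaining << name
      end
    end
  end
  # The source file is closed by now, so it is safe to overwrite it.
  File.write(source_path, remaining.map { |name| "#{name}\n" }.join)
end
```

A call might look like `move_processed_lines("names.txt", "processed.txt") { |name| handle(name) }`, where `handle` stands in for whatever you do with each line. If the script crashes partway through, names.txt is left untouched while processed.txt may already contain some lines, so a real implementation would need to decide how to recover from that.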
I am trying to scrape a list of restaurants for my zip code from Deliveroo.co.uk
I need to add a way to figure out whether a restaurant is open or closed... from the website it's very clear, but I just need to update my code to reflect this.
How do I go about doing this? I need to create something like a 'status' variable and then set each restaurant to 'open' or 'closed'.
Here is the website I'm trying to scrape from: https://deliveroo.co.uk/restaurants/london/maida-vale?postcode=W92DE&time=1800&day=today
And my code is below.
thanks.
require 'open-uri'
require 'nokogiri'
require 'csv'

# Store URL to be scraped
url = "https://deliveroo.co.uk/restaurants/london/maida-vale?postcode=W92DE"

# Parse the page with Nokogiri
# (URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs)
page = Nokogiri::HTML(URI.open(url))

# Collect each field into its own array
name = []
page.css('span.list-item-title.restaurant-name').each do |line|
  name << line.text
end

category = []
page.css('span.restaurant-detail.detail-cat').each do |line|
  category << line.text
end

delivery_time = []
page.css('span.restaurant-detail.detail-time').each do |line|
  delivery_time << line.text
end

distance = []
page.css('span.restaurant-detail.detail-distance').each do |line|
  distance << line.text
end

status = []

# Write data to CSV file
CSV.open("deliveroo.csv", "w") do |file|
  file << ["Name", "Category", "Delivery Time", "Distance", "Status"]
  name.length.times do |i|
    file << [name[i], category[i], delivery_time[i], distance[i]]
  end
end
We need to check whether each li.restaurant--details element has the class unavailable: if it does, the restaurant is closed; otherwise it is open.
status = []
page.css('li.restaurant--details').each do |line|
  if line.attr("class").include? "unavailable"
    sts = "closed"
  else
    sts = "open"
  end
  status << sts
end
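Note that include? on the class string is a substring match, so it would also match any class name that merely contains "unavailable". A slightly stricter sketch splits the attribute into individual class names first (`status_for` is a hypothetical helper, not part of Nokogiri):

```ruby
# Map an element's class attribute to "open" or "closed".
# Only an exact "unavailable" class name counts as closed;
# a missing attribute (nil) is treated as open.
def status_for(class_attribute)
  class_attribute.to_s.split.include?("unavailable") ? "closed" : "open"
end
```

Inside the loop this would be used as `status << status_for(line.attr("class"))`.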
Btw, you should also strip the surrounding whitespace when you read the restaurant name, etc.:
page.css('span.list-item-title.restaurant-name').each do |line|
  name << line.text.strip
end
You can see my full code here: https://gist.github.com/vinhnglx/4eaeb2e8511dd1454f42
ruby n00b here in hope of some guidance. I am looking to scrape a website (600-odd names and links on one page) and output to CSV. The scraping itself works fine (the output correctly fills the terminal as the script runs), but I can't get the CSV to populate. The code:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'csv'

url = "http://www.example.com/page/"
page = Nokogiri::HTML(open(url))

page.css('.item').each do |item|
  name = item.at_css('a').text
  link = item.at_css('a')[:href]
  foo = puts "#{name}"
  bar = "#{link}"
  CSV.open("file.csv", "wb") do |csv|
    csv << [foo, bar]
  end
end

puts "upload complete!"
...replacing the csv << [foo, bar] with csv << [name, link] just puts the final iteration into the CSV. I feel there's something basic I am missing here. Thanks for reading.
The problem is that you're calling CSV.open for every single item, so each iteration overwrites the file with the newest item. At the end, you're left with only the last item in the CSV file.
Move the CSV.open call before page.css('.item').each and it should work:
CSV.open("file.csv", "wb") do |csv|
  page.css('.item').each do |item|
    name = item.at_css('a').text
    link = item.at_css('a')[:href]
    csv << [name, link]
  end
end