rails mechanize run through each url in a postgres table - ruby-on-rails

Edit: Per my comment below, I guess a better question would be: what is the proper way to have mechanize go through each url and update its name column? (Each name would be unique to the url.) Below is what I've been basing my exercise on.
I have a postgres table that goes like...
| name (string) | url (text) |
The url column is already populated with various urls, such as this one:
http://www.a4apps.com/Websites/SampleCalendar/tabid/89/ctl/Register/Default.aspx
I am trying to run a mechanize rake task that will run through each url and update the name based on the text it finds at a css tag.
namespace :db do
  desc "Fetch css from db urls"
  task :fetch_css => :environment do
    require 'rubygems'
    require 'mechanize'
    require 'open-uri'
    agent = Mechanize.new
    url = Mytable.pluck(:url)
    agent.get(url)
    agent.page.search('#dnn_ctr444_ContentPane').each do |item|
      name = item.css('.EventNextPrev:nth-child(1) a').text
      Mytable.update(:name => name)
    end
  end
end
When I run the rake task it returns:
rake aborted!
bad URI(is not URI?): %255B%2522http://www.a4apps.com/Websites/SampleCalendar/tabid/89/Default.aspx%2522,%2520%2522http://www.a4apps.com/Websites/SampleCalendar/tabid/89/ctl/Privacy/Default.aspx%2522,%2520%2522http://www.a4apps.com/Websites/SampleCalendar/tabid/89/ctl/Terms/Default.aspx%2522,%2520%2522http://www.a4apps.com/Websites/SampleCalendar/tabid/89/ctl/Register/Default.aspx%2522%255D
Thanks for any help. If there's any way I can make the question easier to answer, please let me know.
Mike

I feel a little lonely answering my own questions lately but I'll post my answers in the event that someone else finds themselves in the same bind. Also, maybe others will tell me if my solution has any fatal flaws that I am not seeing yet. Here is my final rake that seems to be working, getting urls from my table, running mechanize on them and updating the table with the info found at the urls...
require 'mechanize'

namespace :db do
  desc "Fetch css from db urls"
  task :fetch_css => :environment do
    agent = Mechanize.new # one agent is enough; no need to build one per row
    Mytable.all.each do |info| # for each row do...
      page = agent.get(info.url) # fetch the page at the url stored in the current db row...
      nombre = page.search('.EventNextPrev:nth-child(1) a').text # pull the text at the css selector...
      info.update_attributes(:name => nombre) # and update the db with the css result.
    end
  end
end
Thanks.
Mike
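For anyone wondering why the first version failed: the percent-encoded text in the error looks like the string form of a whole Array, which is what pluck(:url) returns. A minimal sketch of that difference, using only Ruby's standard uri library (no Mechanize or database involved, and the two urls are simply copied from the error message above):

```ruby
require 'uri'

# What Mytable.pluck(:url) hands back: an Array of strings, not a single url.
urls = [
  'http://www.a4apps.com/Websites/SampleCalendar/tabid/89/Default.aspx',
  'http://www.a4apps.com/Websites/SampleCalendar/tabid/89/ctl/Register/Default.aspx'
]

# The Array's string form (brackets, quotes, commas) is not a valid URI.
array_error = nil
begin
  URI.parse(urls.to_s)
rescue URI::InvalidURIError => e
  array_error = e.class
end

# Handling one url at a time parses fine.
hosts = urls.map { |u| URI.parse(u).host }
```

This is why iterating over the rows and calling agent.get once per url, as in the answer above, avoids the error.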

Related

Writing TestCase for CSV import rake task

I have a simple Rails application that imports data from a CSV file, and the import works properly. However, I have no idea where to start with testing this rake task, or where the test should live in a modular Rails app. Any help would be appreciated. Thanks!
Hint
My Rails structure is a little different from a traditional Rails structure, as I have written a modular Rails app. The rake task lives inside an engine:
engines/csv_importer/lib/tasks/web_import.rake
The rake task that imports from csv..
require 'open-uri'
require 'csv'

namespace :web_import do
  desc 'Import users from csv'
  task users: :environment do
    url = 'http://blablabla.com/content/people.csv'
    # Force the encoding to avoid UndefinedConversionError "\xC3" from ASCII-8BIT to UTF-8.
    # Note: on Ruby 3.0+ this call needs to be URI.open(url) instead of open(url).
    csv_string = open(url).read.force_encoding('UTF-8')
    counter = 0
    duplicate_counter = 0
    CSV.parse(csv_string, headers: true, header_converters: :symbol) do |row|
      next unless row[:name].present? && row[:email_address].present?
      user = CsvImporter::User.create row.to_h
      if user.persisted?
        counter += 1
      else
        duplicate_counter += 1
        # Report each rejected row while `user` still refers to it
        p "Email duplicate record: #{user.email_address} - #{user.errors.full_messages.join(',')}" if user.errors.any?
      end
    end
    p "Imported #{counter} users, #{duplicate_counter} duplicate rows ain't added in total"
  end
end
Mounted csv_importer in my parent structure
This makes the csv_importer engine available in the root of the application.
Rails.application.routes.draw do
  mount CsvImporter::Engine => '/', as: 'csv_importer'
end
To migrate correctly from the root of the application, I added an initializer:
/engines/csv_importer/lib/csv_importer/engine.rb
module CsvImporter
  class Engine < ::Rails::Engine
    isolate_namespace CsvImporter
    # This enables me to be able to correctly migrate the database from the parent application.
    initializer :append_migrations do |app|
      unless app.root.to_s.match(root.to_s)
        config.paths['db/migrate'].expanded.each do |p|
          app.config.paths['db/migrate'] << p
        end
      end
    end
  end
end
With this setup I am able to run the app like any other Rails application. I explained this so that anyone who helps will understand the context for writing a test for the rake task inside the engine.
What I have done so far towards writing a test
task import: [:environment] do
desc 'Import CSV file'
task test: :environment do
# CSV.import 'people.csv'
Rake::Task['app:test:db'].invoke
end
end
How does one write a test for a rake task in a modular app? Thanks!
I haven't worked with engines, but is there a way to just put the CSV importing logic into its own class?
namespace :web_import do
  desc 'Import users from csv'
  task users: :environment do
    WebImport.new(url: 'http://blablabla.com/content/people.csv').call
  end
end

class WebImport # (or whatever name you want)
  def initialize(url:)
    @url = url
  end

  def call
    # counter, CSV parse, etc. (the body of the original rake task goes here)
  end
end
That way you can jump into the Rails console to run WebImport, and you can also write a test that isolates WebImport. When you write Rake tasks and jobs (Sidekiq etc.), you want the Rake task to be as thin a wrapper as possible around the actual meat of the code (which in this case is the CSV parsing). Separate the "trigger the csv parse" code from the "actually parse the csv" code into their own classes or files.
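To make that concrete, here is a rough sketch of what an isolated, testable importer could look like. It is not the asker's code: the csv_string: and creator: parameters and the Result struct are made up for illustration, so the class can run without Rails, a database, or a network call. In the real app the creator would presumably wrap CsvImporter::User.create and return whether the record persisted.

```ruby
require 'csv'

# Hypothetical importer: receives the CSV text and a callable that "creates"
# a record and returns true or false, so no Rails or network is needed here.
class WebImport
  Result = Struct.new(:imported, :duplicates)

  def initialize(csv_string:, creator:)
    @csv_string = csv_string
    @creator = creator
  end

  def call
    result = Result.new(0, 0)
    CSV.parse(@csv_string, headers: true, header_converters: :symbol) do |row|
      # Skip rows that lack a name or an email address.
      next if row[:name].to_s.empty? || row[:email_address].to_s.empty?

      if @creator.call(row.to_h)
        result.imported += 1
      else
        result.duplicates += 1
      end
    end
    result
  end
end

# Usage with an in-memory CSV and a fake creator that rejects repeated emails.
csv = "Name,Email Address\nAnn,ann@example.com\n,missing@example.com\nAnn,ann@example.com\n"
seen = {}
creator = lambda do |attrs|
  next false if seen.key?(attrs[:email_address])
  seen[attrs[:email_address]] = true
end
result = WebImport.new(csv_string: csv, creator: creator).call
```

A test for the engine can then assert on the returned counts directly, and the rake task only has to download the file and hand the text to the class.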

NameError: uninitialized constant Class::Class in a rake-task

I'm trying to write a rake task like this:
require 'open-uri'
namespace :news_parser do
  desc 'Parsing news from 6 news sites'
  task :parse_news do
    load 'lib/news_parser.rb'
    ProcherkParser.new.save_novelties
    VikkaParser.new.save_novelties
    InfomistParser.new.save_novelties
    ZmiParser.new.save_novelties
    VycherpnoParser.new.save_novelties
    ProvceParser.new.save_novelties
  end
end
In my lib/news_parser.rb I have classes and instance methods, which perfectly work in a rails console, by doing the following:
load 'lib/news_parser.rb'
ProcherkParser.new.save_novelties
It saves all the information I need to my db. But how can I do it in a rake task? Any help would be appreciated. Thanks.
Does it work when you replace
task :parse_news do
with
task :parse_news => :environment do
?
It will load your Rails environment before your task, and your code should work just like in the rails console.
Also, you could DRY your code a bit:
require 'open-uri'
namespace :news_parser do
  desc 'Parsing news from 6 news sites'
  task :parse_news => :environment do
    load 'lib/news_parser.rb'
    [ProcherkParser, VikkaParser, InfomistParser, ZmiParser, VycherpnoParser, ProvceParser].each do |parser_klass|
      parser_klass.new.save_novelties
    end
  end
end
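The effect of the `=> :environment` prerequisite can be seen in isolation with plain Rake. The sketch below is only an illustration: the :environment task here is a stand-in defined by hand (in a Rails app it is provided by Rails and boots the application), and the `calls` array exists just to record the order in which the task bodies run.

```ruby
require 'rake'

calls = [] # records the order in which task bodies run

# Stand-in for the :environment task that Rails normally defines.
Rake::Task.define_task(:environment) { calls << :environment }

# Same shape as `task :parse_news => :environment do ... end` in a Rakefile.
Rake::Task.define_task(:parse_news => :environment) { calls << :parse_news }

Rake::Task[:parse_news].invoke
```

Rake runs the prerequisite first, so by the time the body of parse_news executes, whatever :environment set up (in Rails, the loaded app and its constants) is already available.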
This is how I do it in my app:
namespace :raw_logs do
  desc 'Download - Decrypt - Save raw logs'
  task process: :environment do
    LogItem.create(org_id: 12)
  end
end
My rake task is rake raw_logs:process

Csv import to database from url

I'm trying to load everything in the csv at this url into my database.
Here is the code I have so far:
require 'open-uri'
namespace populate do
  task reload: :environment do
    Afftable.delete_all
    url = 'something.com/format=csv'
    CSV.open(url).each do |row|
      Afftable.create(name: row[0], address: row[1])
    end
  end
end
When I try running this command I get this error:
Command: bundle exec rake populate:reload
Error: NameError: undefined local variable or method `populate' for main:Object
My database already has columns for all the headers in the csv file. (I've not touched the create call yet, as I don't really know what I'm doing with it.)
I think that you need a colon in front of populate.
namespace :populate do
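Once the namespace is fixed, there is probably a second hurdle: CSV.open expects a local file path, so a url would have to be fetched first (for example with URI.open from open-uri) and the resulting text parsed. A minimal sketch of the parsing half, where a StringIO stands in for the downloaded response and the rows_from helper and sample rows are made up for illustration:

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: turns CSV text read from any IO-like object into the
# attribute hashes that would be passed to Afftable.create.
def rows_from(io)
  CSV.parse(io.read).map { |row| { name: row[0], address: row[1] } }
end

# StringIO stands in for the object that URI.open(url) would return.
io = StringIO.new("Acme,1 Main St\nGlobex,2 Side Ave\n")
rows = rows_from(io)
```

Inside the rake task, each hash in rows could then be handed to Afftable.create.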

Rake task to download and unzip

I would like to update a cities table every week to reflect changes in cities across the world. I am creating a Rake task for the purpose. If possible, I would like to do this without adding another gem dependency.
The file is a publicly available zip archive at geonames.org/15000cities.zip.
My attempt:
require 'net/http'
require 'zip'
namespace :geocities do
  desc "Rake task to fetch Geocities city list every 3 days"
  task :fetch do
    uri = URI('http://download.geonames.org/export/dump/cities15000.zip')
    zipped_folder = Net::HTTP.get(uri)
    Zip::File.open(zipped_folder) do |unzipped_folder| #erroring here
      unzipped_folder.each do |file|
        Rails.root.join("", "list_of_cities.txt").write(file)
      end
    end
  end
end
The return from rake geocities:fetch
rake aborted!
ArgumentError: string contains null byte
As detailed, I'm trying to unzip the file and save it to a list_of_cities.txt file. Once I have the methodology down for accomplishing this, I believe I can figure out how to update my db based on the file. (If you have opinions on how best to handle the actual db update, other than my planned way, I'd love to hear them, but that seems like a different post entirely.)
This will save zipped_folder to disk, then unzip it and save its contents:
require 'net/http'
require 'zip'
namespace :geocities do
  desc "Rake task to fetch Geocities city list every 3 days"
  task :fetch do
    uri = URI('http://download.geonames.org/export/dump/cities15000.zip')
    zipped_folder = Net::HTTP.get(uri)
    File.open('cities.zip', 'wb') do |file|
      file.write(zipped_folder)
    end
    zip_file = Zip::File.open('cities.zip')
    zip_file.each do |file|
      file.extract
    end
  end
end
This will extract all files inside the zip file, in this case cities15000.txt.
You can then read the contents of cities15000.txt and update your database.
If you want to extract to a different file name, you can pass it to file.extract like this:
zip_file.each do |file|
  file.extract('list_of_cities.txt')
end
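As for why the original attempt raised "string contains null byte": Net::HTTP.get returns the response body, and Zip::File.open most likely treats its argument as a file path. The sketch below reproduces that situation with the standard library only. The six bytes are a made-up stand-in for the real download, and File.open stands in for the path handling inside the zip library.

```ruby
require 'tempfile'

# Hypothetical stand-in for the body returned by Net::HTTP.get: binary zip
# data, which contains null bytes.
zipped_bytes = "PK\x03\x04\x00\x00".b

# Treating the bytes as a path fails before any zip parsing happens.
error_message = nil
begin
  File.open(zipped_bytes)
rescue ArgumentError => e
  error_message = e.message
end

# Writing the bytes to disk first yields a real path to hand to a zip library.
saved_size = nil
Tempfile.create(['cities', '.zip']) do |tmp|
  tmp.binmode
  tmp.write(zipped_bytes)
  tmp.flush
  saved_size = File.size(tmp.path)
end
```

That is the same idea as the answer above: save the download under a file name, then open that file name.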
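I think it can be done more easily by shelling out to wget and unzip instead of using Ruby libraries:
namespace :geocities do
  desc "Rake task to fetch Geocities city list every 3 days"
  task :fetch do
    # wget saves cities15000.zip in the current directory; unzip cannot read from a pipe, so run it on the saved file
    `wget -c --tries=10 http://download.geonames.org/export/dump/cities15000.zip && unzip -o cities15000.zip`
  end
end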

Calling a Rake task from a controller with parameters

I have a instance variable in my controller that queries my table for a value, and I need to send that value through to my rake task.
So here are the 2 relevant lines in my controller:
@turl = Fteam.where(:id => @ids).select(:TeamUrl)
system "rake updateTm:update[@turl]"
Here is my rake file:
namespace :updateTm do
  desc "Import Players"
  task :update, [:Tmurl] => :environment do |t, args|
    require 'rubygems'
    require 'nokogiri'
    require 'open-uri'
    require 'mechanize'
    agent = Mechanize.new
    puts "This is the selected Team URL: #{args.Tmurl}"
  end
end
This is what the rake task returns:
This is the selected Team URL: @turl
My guess is that the controller is not passing the variable correctly. So how can I pass the actual value of the variable to the rake task so the output is correct?
You probably need to use interpolation:
"rake updateTm:update[#{@turl}]"
It is a .rb file, and in Ruby a variable inside a string is not expanded unless it is interpolated.
You should use
system "rake updateTm:update[#{@turl}]"
Thanks for all the help and ideas everyone, I came up with a solution.
Instead of @turl = Fteam.where(:id => @ids).select(:TeamUrl)
I changed that to @turl = Fteam.where(:id => @ids).pluck(:TeamUrl)
That gave me the actual value that I needed rather than the ActiveRecord relation, which was causing the error because it could not be translated into a value the rake task could understand.
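Putting the two fixes together, a small sketch of how the command string differs with and without interpolation. The url is a made-up example, and the local variable team_urls stands in for what pluck(:TeamUrl) would return (an Array of plain values):

```ruby
# Hypothetical value standing in for Fteam.where(...).pluck(:TeamUrl).
team_urls = ['http://example.com/teams/1']

# Without interpolation the variable name itself ends up in the command.
literal = 'rake updateTm:update[team_urls]'

# With interpolation the actual value is placed inside the brackets.
interpolated = "rake updateTm:update[#{team_urls.first}]"

# Passing the command as separate arguments, e.g. system(*argv), skips the
# shell, so characters in the url are not reinterpreted by it.
argv = ['rake', "updateTm:update[#{team_urls.first}]"]
```

Note that pluck returns an Array even for a single row, hence the .first here.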

Resources