Rake task to download and unzip - ruby-on-rails

I would like to update a cities table every week to reflect changes in cities across the world. I am creating a Rake task for the purpose. If possible, I would like to do this without adding another gem dependency.
The zipped file is a publicly available zipped file at geonames.org/15000cities.zip.
My attempt:
require 'net/http'
require 'zip'
namespace :geocities do
  desc "Rake task to fetch Geocities city list every 3 days"
  task :fetch do
    uri = URI('http://download.geonames.org/export/dump/cities15000.zip')
    zipped_folder = Net::HTTP.get(uri)
    Zip::File.open(zipped_folder) do |unzipped_folder| #erroring here
      unzipped_folder.each do |file|
        Rails.root.join("", "list_of_cities.txt").write(file)
      end
    end
  end
end
The return from rake geocities:fetch
rake aborted!
ArgumentError: string contains null byte
As detailed, I'm trying to unzip the file and save it to a list_of_cities.txt file. Once I have the methodology down for accomplishing this, I believe I can figure out how to update my db based on the file. (If you have opinions on how best to handle the actual db update, other than my planned way, I'd love to hear them, but that seems like a different post entirely.)

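Zip::File.open expects the path of a file on disk, but Net::HTTP.get returns the raw bytes of the archive as a string, which is why you get the null byte error. This will save zipped_folder to disk, then unzip it and save its contents: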
require 'net/http'
require 'zip'
namespace :geocities do
  desc "Rake task to fetch Geocities city list every 3 days"
  task :fetch do
    uri = URI('http://download.geonames.org/export/dump/cities15000.zip')
    zipped_folder = Net::HTTP.get(uri)
    # Write in binary mode so the archive bytes are stored unchanged
    File.open('cities.zip', 'wb') do |file|
      file.write(zipped_folder)
    end
    # The block form closes the archive when the block finishes
    Zip::File.open('cities.zip') do |zip_file|
      zip_file.each do |file|
        file.extract
      end
    end
  end
end
This will extract all files inside the zip file, in this case cities15000.txt.
You can then read the contents of cities15000.txt and update your database.
If you want to extract to a different file name, you can pass it to file.extract like this:
zip_file.each do |file|
  file.extract('list_of_cities.txt')
end
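For the "read the contents and update your database" step, here is a minimal sketch of parsing the extracted file. It assumes the dump is tab-separated with the column positions shown below (my reading of the GeoNames readme, so please verify them), and the helper names parse_city_line and each_city are made up for illustration:

```ruby
# Column positions are an ASSUMPTION about the GeoNames dump layout
# (tab-separated); check the GeoNames readme before relying on them.
GEONAMES_COLUMNS = {
  geonameid: 0,
  name: 1,
  latitude: 4,
  longitude: 5,
  country_code: 8,
  population: 14
}.freeze

# Turns one tab-separated line into a hash of the fields we care about.
# Integer() and Float() raise on malformed values instead of silently
# returning 0, which makes bad rows visible.
def parse_city_line(line)
  fields = line.chomp.split("\t", -1) # -1 keeps trailing empty fields
  {
    geonameid: Integer(fields[GEONAMES_COLUMNS[:geonameid]]),
    name: fields[GEONAMES_COLUMNS[:name]],
    latitude: Float(fields[GEONAMES_COLUMNS[:latitude]]),
    longitude: Float(fields[GEONAMES_COLUMNS[:longitude]]),
    country_code: fields[GEONAMES_COLUMNS[:country_code]],
    population: Integer(fields[GEONAMES_COLUMNS[:population]])
  }
end

# Streams the file line by line, so the whole dump is never held in memory.
def each_city(path)
  return enum_for(:each_city, path) unless block_given?

  File.foreach(path, encoding: 'UTF-8') { |line| yield parse_city_line(line) }
end
```

Inside the task you could then loop with each_city('cities15000.txt') and create or update one row per hash, using whatever model your cities table is backed by.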

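I think it can be done more easily without Ruby, just using wget and unzip. Note that wget saves the archive to disk and unzip cannot read an archive from a pipe, so the two commands have to run one after the other:
namespace :geocities do
  desc "Rake task to fetch Geocities city list every 3 days"
  task :fetch do
    # sh is Rake's shell helper; it aborts the task if the command fails.
    # unzip -o overwrites the previously extracted file without prompting.
    sh 'wget -c --tries=10 http://download.geonames.org/export/dump/cities15000.zip && unzip -o cities15000.zip'
  end
end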

Related

Writing TestCase for CSV import rake task

I have a simple Rails application that imports data from a CSV file, and the import works properly. However, I have no idea where to start with testing this rake task, or where the test belongs in a modular Rails app. Any help would be appreciated. Thanks!
Hint
My Rails structure is a little different from traditional Rails structures, as I have written a modular Rails app. The rake task lives inside an engine:
engines/csv_importer/lib/tasks/web_import.rake
The rake task that imports from CSV:
require 'open-uri'
require 'csv'
namespace :web_import do
  desc 'Import users from csv'
  task users: :environment do
    url = 'http://blablabla.com/content/people.csv'
    # I forced encoding to avoid UndefinedConversionError "\xC3" from ASCII-8BIT to UTF-8
    csv_string = open(url).read.force_encoding('UTF-8')
    counter = 0
    duplicate_counter = 0
    user = []
    CSV.parse(csv_string, headers: true, header_converters: :symbol) do |row|
      next unless row[:name].present? && row[:email_address].present?
      user = CsvImporter::User.create row.to_h
      if user.persisted?
        counter += 1
      else
        duplicate_counter += 1
      end
    end
    p "Email duplicate record: #{user.email_address} - #{user.errors.full_messages.join(',')}" if user.errors.any?
    p "Imported #{counter} users, #{duplicate_counter} duplicate rows ain't added in total"
  end
end
Mounted csv_importer in my parent structure
This makes the csv_importer engine available in the root of the application.
Rails.application.routes.draw do
  mount CsvImporter::Engine => '/', as: 'csv_importer'
end
To correctly migrate in the root of the application, I added an initializer:
/engines/csv_importer/lib/csv_importer/engine.rb
module CsvImporter
  class Engine < ::Rails::Engine
    isolate_namespace CsvImporter
    # This enables me to be able to correctly migrate the database from the parent application.
    initializer :append_migrations do |app|
      unless app.root.to_s.match(root.to_s)
        config.paths['db/migrate'].expanded.each do |p|
          app.config.paths['db/migrate'] << p
        end
      end
    end
  end
end
With this setup I am able to run the Rails app like any other Rails application. I explained this so that anyone who wants to help understands what I need with regard to writing a test for the rake task inside the engine.
What I have done so far with regard to writing the test:
task import: [:environment] do
  desc 'Import CSV file'
  task test: :environment do
    # CSV.import 'people.csv'
    Rake::Task['app:test:db'].invoke
  end
end
How does someone write a test for a rake task in a modular app? Thanks!
I haven't worked with engines, but is there a way to just put the CSV importing logic into its own class?
namespace :web_import do
  desc 'Import users from csv'
  task users: :environment do
    WebImport.new(url: 'http://blablabla.com/content/people.csv').call
  end
end

class WebImport # (or whatever name you want)
  # Keyword argument, to match the WebImport.new(url: ...) call above
  def initialize(url:)
    @url = url
  end

  def call
    # counter, CSV parse, etc...
  end
end
That way you can jump into the Rails console to run WebImport, and you can also write a test that isolates WebImport. When you write Rake tasks and jobs (Sidekiq etc.), you want the Rake task to be as thin a wrapper as possible around the actual meat of the code (in this case, the CSV parsing). Separate the "trigger the CSV parse" code from the "actually parse the CSV" code into their own classes or files.
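To make that concrete, here is a rough sketch of what such a class could look like when the download and the model are passed in, so a test can swap in fakes. The fetcher and model parameters and the Result struct are my own additions for illustration, not something from the question:

```ruby
require 'csv'

# Hypothetical service object; all names here are illustrative only.
class WebImport
  Result = Struct.new(:imported, :skipped)

  # fetcher: anything responding to #call(url) and returning the CSV text.
  # model:   anything responding to #create(attributes) and returning an
  #          object that responds to #persisted? (e.g. CsvImporter::User).
  def initialize(url:, model:, fetcher:)
    @url = url
    @model = model
    @fetcher = fetcher
  end

  # Parses the CSV and counts rows that were saved vs. rejected by the model.
  def call
    result = Result.new(0, 0)
    csv_string = @fetcher.call(@url)
    CSV.parse(csv_string, headers: true, header_converters: :symbol) do |row|
      # Rows without a name or email are ignored and not counted at all
      next if blank?(row[:name]) || blank?(row[:email_address])

      record = @model.create(row.to_h)
      if record.persisted?
        result.imported += 1
      else
        result.skipped += 1
      end
    end
    result
  end

  private

  # Plain-Ruby stand-in for ActiveSupport's blank?, so the class has no
  # Rails dependency.
  def blank?(value)
    value.to_s.strip.empty?
  end
end
```

In the rake task you would pass the real model and a lambda that downloads the file. In a test you pass a lambda that returns a CSV string and a fake model, so no network or database is needed to check the counting logic.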

The case of the disappearing ActiveRecord attribute

Following the instructions in https://stackoverflow.com/a/24496452/102675 I wound up with the following:
namespace :db do
  desc 'Drop, create, migrate, seed and populate sample data'
  task seed_sample_data: [:drop, :create, :migrate, :seed, :populate_sample_data] do
    puts 'Sample Data Populated. Ready to go!'
  end
  desc 'Populate the database with sample data'
  task populate_sample_data: :environment do
    puts Inspector.column_names.include?('how_to_fix')
    # create my sample data
  end
end
As you would expect, I get true if I run bundle exec rake db:populate_sample_data
BUT if I run bundle exec rake db:seed_sample_data I get all the migration output and then false. In other words I can't see the Inspector attribute how_to_fix even though it definitely exists as proved by the other rake run. Where did my attribute go?
My guess is that this is a "caching" problem. Can you try the following?
task populate_sample_data: :environment do
  Inspector.reset_column_information
  # ...
end
P.S. We used to have a similar problem when working with different databases that had the same schema (except for a few columns here and there).

Error: Importing Data CSV Rails 4

I'm very new to the concept of importing data into a SQL database with CSV. I've followed some Stack Overflow posts but I'm getting an error. The error states Errno::ENOENT: No such file or directory @ rb_sysopen - products.csv after running rake import:data. I have required csv in my application.rb, and I have created a CSV file and placed it in TMP. Here is my code so far. I understand I may be asking a lot of the community, but if someone answers this question, could you provide some more insight into CSV and rake functions? Thanks so much!!!
import.rake
namespace :import do
  desc "imports data from a csv file"
  task :data => :environment do
    require 'csv'
    CSV.foreach('tmp/products.csv') do |row|
      name = row[0]
      price = row[1].to_i
      Product.create( name: name, price: price )
    end
  end
end
Specify the full path to the CSV file.
For example, if the file is in /tmp/ use:
CSV.foreach('/tmp/products.csv') do |row|
If the products.csv file is in your application's tmp directory use:
CSV.foreach(Rails.root.join('tmp', 'products.csv')) do |row|
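The underlying issue is that a relative path is resolved against the current working directory of the process, which is not necessarily your app's root. Below is a small sketch of resolving against an explicit root and failing early with a clearer message. The helper names csv_path and ensure_csv_exists! are hypothetical, and app_root stands in for Rails.root so the snippet runs without Rails:

```ruby
require 'pathname'

# Resolves a CSV path against an explicit application root instead of the
# process's current working directory. `app_root` stands in for Rails.root.
def csv_path(relative_path, app_root)
  Pathname.new(app_root).join(relative_path)
end

# Fails early with a readable message instead of a bare Errno::ENOENT.
def ensure_csv_exists!(path)
  return path if File.file?(path)

  raise ArgumentError, "CSV file not found: #{path} (cwd: #{Dir.pwd})"
end
```

In the task you could call ensure_csv_exists!(csv_path('tmp/products.csv', Rails.root)) before handing the path to CSV.foreach.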
I ran into something similar; in my case the cause was forgetting to put braces inside the parentheses, so you might want to try going from:
Product.create( name: name, price: price )
to:
Product.create({ name: name, price: price })
Check out the smarter_csv Gem.
In its simplest form you can do this:
SmarterCSV.process('tmp/products.csv').each do |hash|
  Product.create( hash )
end
Add smarter_csv to your Gemfile, so it's auto-loaded when you require the environment in your Rake task.
This gives you:
namespace :import do
  desc 'imports data from given csv file'
  task :data, [:filename] => :environment do |t, args|
    # File.exists? was removed in Ruby 3.2; File.exist? works on all versions
    fail "File not found" unless File.exist? args[:filename]
    options = {} # add options if needed
    SmarterCSV.process( args[:filename], options).each do |hash|
      Product.create( hash )
    end
  end
end
Call it like this:
rake import:data['/tmp/products.csv']
See also: https://github.com/tilo/smarter_csv

How to integrate my own scraper in Rails app?

I have just created a Rails app with a model app/models/post.rb and have written a scraper scrapers/base_scraper.rb (class BaseScraper) that collects data from the target site into the hash variable data. Now I want to insert the values of data into the Post model. How do I do it properly in Rails? I have heard something about Rake but have no idea how to use it properly. Help me please!
Assuming that data stores just one post and that each of the keys stored in the data hash is a valid Post field (column name), you can simply do this:
Post.create(data)
If you want to launch the whole process from the console, you can create a rake task under the lib/tasks directory of your project with the following:
# scraper.rake
namespace :scraper do
  desc "Run scraper"
  task :run => :environment do
    data = BaseScraper.your_collect_data_class_method
    Post.create(data) if data
  end
end
task :default => 'scraper:run'
And then run it from the console as a rake task with rake scraper:run (or plain rake, since it is set as the default task).
Of course I also assume that the scrapers dir is in your Rails load path.
If not, add it in your application.rb file.
# application.rb
...
module YourApp
  class Application < Rails::Application
    ...
    config.autoload_paths += Dir["#{config.root}/scrapers/"]
    ...
  end
end

Rails Generate Custom Rakefile

I'm working on a project that is migrating data from a customer's old_busted DB into Rails objects to be worked on later. Similarly, I need to convert these objects into a CSV and upload it to a neutral FTP server (this is to allow a coworker to build the example pages through Sugar CRM). I've created rake files to do all of this, and it was successful. Now, I'm going to continue this process for each object that I create in Rails (relative to the previous DB) and, best case, want these generated when I run rails generate scaffold <object>.
Here is my import rake:
desc "Import Clients from db"
task :get_busted_clients => [:environment] do
  @old_clients = Busted::Client.all
  @old_clients.each do |row|
    @client = Client.new();
    @client.client_id = row.NUMBER
    @client.save
  end
end
Here is my CSV convert/FTP upload rake:
desc "Exports db's to local CSV and uploads them to FTP"
task :export_clients_CSV => [:environment] do
  # Required libraries for CSV read/write and NET/FTP IO #
  require 'csv'
  require 'net/ftp'
  # Pull all Client objects into clients for reading #
  clients = Client.all
  puts "Creating CSV file for <Clients> and updating column names..."
  # Open a new CSV file that uses the column headers from Client #
  CSV.open("clients.csv", "wb",
           :write_headers => true, :headers => Client.column_names) do |csv|
    puts "--Loading each entry..."
    # Load all entries from Client into the CSV file row by row #
    clients.each do |client|
      # This line specifically puts the attributes in the rows WITH RESPECT TO#
      # THE COLUMNS
      csv << client.attributes.values_at(*Client.column_names)
    end
    puts "--Done loading each entry..."
  end
  puts "...Data populated. Finished building CSV. Closing File."
  puts "------------------------"
  # Upload CSV File to FTP server by requesting new FTP connection, assigning credentials
  # and informing the client what file to look for and what to name it
  puts "Uploading <Clients>..."
  ftp = Net::FTP.new('192.168.xxx.xxx')
  ftp.login(user = "user", passwd = "passwd")
  ftp.puttextfile("clients.csv", "clients.csv")
  ftp.quit()
  puts "...Finished."
end
I ran rails generate generator get_busted and put this in my get_busted_generator.rb:
class GetBustedGenerator < Rails::Generators::NamedBase
  source_root File.expand_path('../templates', __FILE__)
  def generate_get_busted
    copy_file "getbusted.rake", "lib/tasks/#{file_name}.rake"
  end
end
After that, I got lost. I can't find anything on templating a rake file or the syntax for doing so.
Rails has been a recent endeavor and I may be overlooking something in the design of the solution to my problem.
TL;DR: Is templating a rake file a bad thing? Are there alternative solutions? If not, what's the syntax for generating either script custom to the object (or point me in the right direction, please)?
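On the syntax part of the question: as far as I know, Rails generators are built on Thor, whose template action renders an ERB file from source_root, whereas copy_file copies it verbatim. So the template can be ordinary ERB. The sketch below uses only the standard library to show what rendering such a template could look like. The template text and the render_rake_task helper are hypothetical, and the three values are passed in by hand here, standing in for the helpers a NamedBase generator would provide:

```ruby
require 'erb'

# Hypothetical template, roughly what templates/getbusted.rake.erb might hold.
RAKE_TEMPLATE = <<~'TEMPLATE'
  desc "Import <%= class_name %> records from db"
  task :get_busted_<%= plural_name %> => [:environment] do
    Busted::<%= class_name %>.all.each do |row|
      record = <%= class_name %>.new
      record.<%= file_name %>_id = row.NUMBER
      record.save
    end
  end
TEMPLATE

# Renders the template with the given names and returns the rake file text.
def render_rake_task(class_name:, file_name:, plural_name:)
  ERB.new(RAKE_TEMPLATE).result_with_hash(
    class_name: class_name, file_name: file_name, plural_name: plural_name
  )
end

# Prints the generated rake task for a Client model.
puts render_rake_task(class_name: 'Client', file_name: 'client', plural_name: 'clients')
```

In the generator itself, the equivalent would presumably be to rename the file to getbusted.rake.erb and call template instead of copy_file. I have not tested that against your setup, so treat it as a starting point.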
