How to integrate my own scraper in a Rails app? - ruby-on-rails

I have just created a Rails app with a model, app/models/post.rb, and have written a scraper, scrapers/base_scraper.rb (class BaseScraper), that collects data from the target site into the hash variable data. Now I want to insert the values of data into the Post model. How do I do this properly in Rails? I have heard something about Rake but have no idea how to use it properly. Please help!

Assuming that data stores just one post and that each key stored in the data hash is a valid Post field (column name), you can simply do this:
Post.create(data)
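If the scraper's hash may contain keys that are not Post columns, ActiveRecord raises an unknown-attribute error on create. A minimal sketch of filtering the hash first, using a hypothetical plain-Ruby stand-in for Post so it runs without Rails (Hash#slice needs Ruby 2.5 or later). column_names returns strings, so this assumes string keys; symbol keys would need data.transform_keys(&:to_s) first.

```ruby
# Hypothetical plain-Ruby stand-in for the Post model; in Rails, column_names and
# create are provided by ActiveRecord.
class Post
  def self.column_names
    %w[id title body created_at updated_at]
  end

  # Stand-in for ActiveRecord's create: just returns the attributes it was given.
  def self.create(attrs)
    attrs
  end
end

# Example scraper output with one key that is not a Post column.
data = { "title" => "Hello", "body" => "First post", "author_avatar" => "x.png" }

# Keep only the keys that match Post columns, then create the record.
attrs = data.slice(*Post.column_names)
Post.create(attrs)
```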
If you want to launch the whole process from the command line, you can create a rake task in the lib/tasks directory of your app with the following:
# scraper.rake
namespace :scraper do
  desc "Run scraper"
  task :run => :environment do
    data = BaseScraper.your_collect_data_class_method
    Post.create(data) if data
  end
end
# Top-level alias so that `rake scraper` runs scraper:run
task :scraper => 'scraper:run'
And then run it from the command line with rake scraper (or rake scraper:run).
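In Rake a namespace is not itself a task, so rake scraper only works if a top-level task with that name exists; otherwise call rake scraper:run directly. A minimal sketch of that behaviour using Rake's Ruby DSL outside Rails, with a stand-in task body in place of the real scrape-and-save code:

```ruby
require 'rake'
extend Rake::DSL # ensure task/namespace are callable at the top level of this script

runs = 0

namespace :scraper do
  # Stand-in for the real body (scrape, then Post.create).
  task(:run) { runs += 1 }
end

# The namespace alone does not define a task called "scraper".
alias_defined_before = Rake::Task.task_defined?("scraper")

# Top-level task named after the namespace, delegating to scraper:run.
task scraper: "scraper:run"

Rake::Task["scraper"].invoke
```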
This also assumes that the scrapers directory is in your Rails autoload path. If it is not, add it in your application.rb file:
# application.rb
...
module YourApp
  class Application < Rails::Application
    ...
    config.autoload_paths << "#{config.root}/scrapers"
    ...
  end
end

Related

How to properly call methods within a Rake task in Rails?

I'm just starting with RoR, so I have a question about implementing rake tasks.
I have a rake file with a few lines that run some scheduled tasks using the Twitter gem API.
Running the rake task works properly, but I want to organize my code.
Right now the rake file looks like this:
desc "This task is called by the scheduler"
task :update_feed => :environment do
  require 'twitter'
  client = Twitter::REST::Client.new do |config|
    # Client configs
  end
  here.i.do.some.stuff
end
I'm pretty sure that having the client config right there isn't a good idea.
On the other hand, I want to keep the methods in the model so I can use them elsewhere.
So I tried this, unsuccessfully:
# On tweet.rb model file:
class Tweet < ApplicationRecord
  validates :tweet_id, uniqueness: true
  def here_i_do_some_stuff
    # do some stuff
    # Plus config the client
  end
end
# And the rake file
task :update_feed => :environment do
  Tweet.here_i_do_some_stuff
end
And that's not working:
rake aborted!
NoMethodError: undefined method `here_i_do_some_stuff' for Tweet (call 'Tweet.connection' to establish a connection):Class
I don't really know how to do this. I tried other approaches, like writing the code in other files and requiring them, but I think this is the correct way; I just don't know how to make it run properly.
Thanks, JR.
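The error message says the method is undefined for Tweet:Class: def here_i_do_some_stuff defines an instance method, while the rake task calls it on the class itself. Prefixing it with self. (def self.here_i_do_some_stuff) makes it a class method. A minimal plain-Ruby sketch of the difference, with no ActiveRecord or Twitter client; update_feed is a hypothetical name:

```ruby
# Plain-Ruby sketch (no ActiveRecord, no Twitter client) of the two kinds of method.
class Tweet
  # Instance method: needs a Tweet object, e.g. Tweet.new.here_i_do_some_stuff.
  def here_i_do_some_stuff
    "instance method"
  end

  # Class method (note the self. prefix): callable as Tweet.update_feed,
  # which is how a rake task calls it.
  def self.update_feed
    "class method"
  end
end

class_result = Tweet.update_feed
instance_result = Tweet.new.here_i_do_some_stuff

# Calling the instance method on the class reproduces the NoMethodError from the question.
error = begin
  Tweet.here_i_do_some_stuff
  nil
rescue NoMethodError => e
  e
end
```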

Cannot load namespaced model when invoking rake task

I have a rake task that invokes other rake tasks so my development data can be reset easily.
The first rake task (lib/tasks/populate.rake):
# Rake task to populate development database with test data
# Run it with "rake db:populate"
namespace :db do
  desc 'Erase and fill database'
  task populate: :environment do
    ...
    Rake::Task['test_data:create_company_plans'].invoke
    Rake::Task['test_data:create_companies'].invoke
    Rake::Task['test_data:create_users'].invoke
    ...
  end
end
The second rake task (lib/tasks/populate_sub_scripts/create_company_plans.rake):
namespace :test_data do
  desc 'Create Company Plans'
  task create_company_plans: :environment do
    Company::ProfilePlan.create!(name: 'Basic', trial_period_days: 30, price_monthly_cents: 4000)
    Company::ProfilePlan.create!(name: 'Professional', trial_period_days: 30, price_monthly_cents: 27_500)
    Company::ProfilePlan.create!(name: 'Enterprise', trial_period_days: 30, price_monthly_cents: 78_500)
  end
end
When I run bin/rake db:populate, I get this error:
rake aborted! LoadError: Unable to autoload constant Company::ProfilePlan, expected /home/.../app/models/company/profile_plan.rb to define it
But when I run the second rake task on its own, it works.
The model (path: /home/.../app/models/company/profile_plan.rb):
class Company::ProfilePlan < ActiveRecord::Base
  # == Constants ============================================================
  # == Attributes ===========================================================
  # == Extensions ===========================================================
  monetize :price_monthly_cents
  # == Relationships ========================================================
  has_many :profile_subscriptions
  # == Validations ==========================================================
  # == Scopes ===============================================================
  # == Callbacks ============================================================
  # == Class Methods ========================================================
  # == Instance Methods =====================================================
end
Rails 5.0.1
Ruby 2.4.0
The app was just upgraded from Rails 4.2 to 5.
It works when I require the whole path:
require "#{Rails.root}/app/models/company/profile_plan.rb"
This seems strange to me, because the error message shows that Rails knows the correct path to the model. Does anyone know why I have to require the file when the task is invoked from another rake task?
Thank you very much
Well, it seems that rake doesn't eager load, so when you run create_company_plans.rake on its own it loads the referenced objects; however, when you invoke it from another rake task, it doesn't know you will need them, so they are not loaded.
You can take a look at this other Q&A, which was similar to yours.
I think you may not need to require the whole path; since app/models is on the load path, just:
require 'company/profile_plan'
From what I understand, you can probably overcome the problem by calling reenable and then invoke on each task, as shown below. Pardon me if this doesn't work.
['test_data:create_company_plans', 'test_data:create_companies'].each do |task|
  Rake::Task[task].reenable
  Rake::Task[task].invoke
end
There is more info on this in the Stack Overflow question how-to-run-rake-tasks-from-within-rake-tasks.
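For context on the reenable/invoke suggestion: Rake runs a given task at most once per process, so a second invoke is a no-op until reenable resets it. A minimal sketch with a stand-in task body that only counts executions (it illustrates Rake's invocation rules and is not by itself a fix for the autoload error):

```ruby
require 'rake'
extend Rake::DSL # ensure task is callable at the top level of this script

runs = 0
task(:create_company_plans) { runs += 1 } # stand-in body: just counts executions

plans = Rake::Task[:create_company_plans]
plans.invoke   # runs the body
plans.invoke   # skipped: Rake invokes each task at most once
plans.reenable # clear the "already invoked" flag
plans.invoke   # runs the body again
```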

How do I access one of my models from within a ruby script in the /lib/ folder in my Rails 3 app?

I tried putting my script in a class that inherited from my model, like so:
class ScriptName < MyModel
But when I ran rake my_script at the command line, I got this error:
rake aborted!
uninitialized constant MyModel
What am I doing wrong?
Also, should I name my file my_script.rb or my_script.rake?
Just require the file. I do this in one of my rake tasks (which I name my_script.rake):
require "#{Rails.root.to_s}/app/models/my_model.rb"
Here's a full example:
# lib/tasks/my_script.rake
require "#{Rails.root.to_s}/app/models/video.rb"
class Vid2 < Video
  def self.say_hello
    "Hello I am vid2"
  end
end
namespace :stuff do
  desc "hello"
  task :hello => :environment do
    puts "saying hello..."
    puts Vid2.say_hello
    puts "Finished!"
  end
end
But a better design is to have the rake task simply call a helper method. The benefits are that it's easier to scan the available rake tasks, easier to debug, and the code the rake task runs becomes very testable. You could add a rake_helper_spec.rb file, for example.
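# /lib/rake_helper.rb
class Vid2 < Video
  def self.say_hello
    "Hello I am vid2"
  end
end
# lib/tasks/my_script.rake
namespace :stuff do
  desc "hello"
  task :hello => :environment do
    # lib/ is not autoloaded by default and the file name doesn't match Vid2, so require it here
    require "#{Rails.root}/lib/rake_helper"
    puts Vid2.say_hello
  end
end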
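A minimal plain-Ruby sketch of that design, assuming a hypothetical HelloHelper module in place of lib/rake_helper.rb: the task body is a one-line call, so a spec can exercise the logic without going through Rake.

```ruby
require 'rake'
extend Rake::DSL # ensure task/namespace are callable at the top level of this script

# Hypothetical helper (would live in lib/rake_helper.rb); plain Ruby, no Rake involved.
module HelloHelper
  def self.say_hello
    "Hello I am vid2"
  end
end

printed = []

namespace :stuff do
  task(:hello) { printed << HelloHelper.say_hello } # the task only delegates
end

# A spec can call the helper directly...
direct = HelloHelper.say_hello
# ...while the task just forwards to it.
Rake::Task["stuff:hello"].invoke
```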
All I had to do to get this to work was put my requires above the task definition, and then declare the :environment dependency like so:
task :my_script => :environment do
  # some code here
end
Doing just that gave me access to all my models. I didn't need to require 'active_record' or even require my model.
Just specifying :environment made all my models accessible.
I was also having a problem with Nokogiri; all I did was remove it from the top of my file as a require and add it to my Gemfile.
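In Rake terms, :environment is a prerequisite task: Rails defines it to load the application, and listing it makes Rake run it before your task body. A minimal sketch with a hypothetical stand-in :environment task (it only defines a MyModel constant), so it runs outside Rails:

```ruby
require 'rake'
extend Rake::DSL # ensure task is callable at the top level of this script

# Hypothetical stand-in for Rails' :environment task. The real one boots the app;
# this one only defines a MyModel constant to mimic "models are now loaded".
task :environment do
  model = Class.new do
    def self.greeting
      "loaded"
    end
  end
  Object.const_set(:MyModel, model)
end

result = nil

# Declaring :environment as a prerequisite makes Rake run it before this body.
task my_script: :environment do
  result = MyModel.greeting
end

Rake::Task[:my_script].invoke
```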

Resque worker not recognizing the Rails Mongoid model

I am using Resque in my application for delayed jobs, so that I can send emails and SMS to a large number of users asynchronously. The data is stored in MongoDB, and Mongoid is the ODM that connects Rails and Mongo.
My Mongoid model looks like this:
class Item
  include Mongoid::Document
  include Geo::LocationHelper
  field :name, :type => String
  field :desc, :type => String
  # resque queue name
  @queue = :item_notification
  # resque perform method
  def self.perform(item_id)
    @item = Item.find(item_id)
  end
end
I am able to add jobs to Resque; I have verified this using resque-web. But whenever I start a Resque worker with
QUEUE=item_notification rake resque:work
I get uninitialized constant Item. Since I am using Resque as a Rails gem and starting rake in the Rails root, I believe my Mongoid models should be loaded.
After a lot of digging, I found that we can explicitly ask rake to load the environment with
QUEUE=item_notification rake environment resque:work
but I still get the same error, uninitialized constant Item.
Can someone help me out?
and my
Actually, it's a problem in the development environment. After adding these lines to the resque.rake task file
# load the Rails app all the time
namespace :resque do
  puts "Loading Rails environment for Resque"
  task :setup => :environment do
    # Pre-loads column info for ActiveRecord models; not needed for Mongoid models
    ActiveRecord::Base.send(:descendants).each { |klass| klass.columns }
  end
end
it works fine.
The code is taken from the Resque wiki on GitHub.
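The fix works because declaring a task that already exists adds prerequisites to it instead of replacing it, and resque:work runs resque:setup first. A minimal sketch of that ordering with stand-in task bodies that only record what ran (the real tasks come from Resque and Rails; the work-depends-on-setup relationship is assumed here as in the wiki snippet):

```ruby
require 'rake'
extend Rake::DSL # ensure task/namespace are callable at the top level of this script

order = []

# Stand-ins for tasks defined elsewhere (hypothetical bodies that only record the order).
task(:environment) { order << :environment }
namespace :resque do
  task(:setup) { order << :setup }
  task(work: :setup) { order << :work }
end

# What the resque.rake snippet does: re-declaring resque:setup adds :environment
# as a prerequisite; it does not replace the existing task.
namespace :resque do
  task setup: :environment
end

Rake::Task["resque:work"].invoke
```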

Monkeypatch a model in a rake task to use a method provided by a plugin?

During some recent refactoring we changed how our user avatars are stored, not realizing that once deployed it would affect all existing users. So now I'm trying to write a rake task to fix this, doing something like this:
namespace :fix do
  desc "Create associated ImageAttachment using data in the Users photo fields"
  task :user_avatars => :environment do
    class User
      # Paperclip
      has_attached_file :photo ... <paperclip stuff, styles etc>
    end
    User.all.each do |user|
      i = ImageAttachment.new
      i.photo_url = user.photo.url
      user.image_attachments << i
    end
  end
end
When I try running that, though, I get undefined method `has_attached_file' for User:Class
I'm able to do this in script/console, but it seems the rake task can't find the Paperclip plugin's methods.
The rake task is probably not loading the full Rails environment. You can force it to do so with something like this:
require File.expand_path(File.dirname(__FILE__) + "/../config/environment")
where the path leads to your environment.rb file. If this fixes the issue, include it inside this task specifically, because you probably do not want all your rake tasks to load the environment by default. In fact, a rake task may not even be the best place to do what you're trying to do. You could also try creating a script in the script directory.
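The underlying failure is that has_attached_file is a class-level macro that only exists once the plugin has been loaded into the model's ancestry. A minimal plain-Ruby sketch of that behaviour, assuming a hypothetical FakeAttachmentPlugin in place of Paperclip:

```ruby
# Hypothetical stand-in for a plugin that adds a class-level macro, in the way
# Paperclip provides has_attached_file to models once it is loaded.
module FakeAttachmentPlugin
  def has_fake_attachment(name)
    define_method("#{name}_url") { "/uploads/#{name}.png" }
  end
end

class User; end

# Before the plugin is mixed in, the macro does not exist on the class.
error = begin
  User.has_fake_attachment(:photo)
  nil
rescue NoMethodError => e
  e.class
end

# Mixing in the plugin (in a real app, loading the environment and plugin does this)
# adds the macro; after that, reopening the class and calling it works.
User.extend(FakeAttachmentPlugin)

class User
  has_fake_attachment :photo
end

url = User.new.photo_url
```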

Resources