Here's the code for the scraper:
class Scrape
  def perform
    url = "# a long url"
    agent = Mechanize.new
    agent.get(url)
    while(agent.page.link_with(:text => "Next Page \u00BB")) do
      agent.page.search(".content").each do |item|
        puts "."
        House.create!({
          # attributes...
        })
      end
      agent.page.link_with(:text => "Next Page \u00BB").click
    end
  end
end
On my local environment I can run it in the rails console just by typing
Scrape.new.delay.perform # to queue the job
rake jobs:work
and it works perfectly.
However, running the analogous commands in the Heroku console (with a worker running instead of rake jobs:work) doesn't seem to do anything. I tried logging some lines to the Heroku log, and I can get the url variable to log (so the method is at least being called), but the "." that is there to mark each iteration of the loop never appears, and no Houses are created in the database.
Anyone any ideas what might be wrong?
I solved this problem myself; it was a pretty obscure bug. I was using Ruby 1.9.2 in my local environment, but the app was deployed on a Ruby 1.8.7 stack.
The important difference is the change in character encoding handling between the two Ruby versions: on 1.8.7, Mechanize couldn't find a link containing the Unicode-escaped character "\u00BB", so the while condition was never true and no scraping happened.
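A minimal sketch of one way to avoid depending on the escape at all, assuming Ruby 1.9 or later for the byte check. The link_texts array is a hypothetical stand-in for the link texts a scraped page would contain; the idea is to match only the ASCII part of the label.

```ruby
# On Ruby 1.9+, "\u00BB" is the single character » (two bytes in UTF-8).
raquo = "\u00BB"
raquo_bytes = raquo.bytes.to_a

# Hypothetical stand-in for the link texts found on a scraped page.
link_texts = ["Home", "Next Page \u00BB", "About"]

# Match only the ASCII prefix, so the lookup does not rely on how the
# interpreter reads the \u escape.
next_link = link_texts.find { |text| text =~ /\ANext Page/ }
puts next_link
```

With Mechanize, the same idea would be agent.page.link_with(:text => /Next Page/), since link_with accepts a regular expression for :text.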
Let's say I wanted a greeting every time the Rails console comes up:
Scotts-MBP-4:ucode scott$ rails c
Loading development environment (Rails 4.2.1)
Hello there! I'm a custom greeting
2.1.5 :001 >
Where would I put the puts 'Hello there! I\'m a custom greeting' statement?
Another Stackoverflow answer suggested, and I've read this elsewhere too, that I can put that in an initializer like this:
# config/initializers/console_greeting.rb
if defined?(Rails::Console)
  puts 'Hello there! I\'m a custom greeting'
end
That doesn't work for me though :(. Even without the if defined?(Rails::Console) I still don't get output. Seems like initializers are not run when I enter a console, despite what others suggest.
I use ~/.irbrc for similar purposes (I require a gem in each console session). For example, my .irbrc:
if (defined? Rails)
  # Rails specific
end
# common for all irb sessions
You could use your project name to limit executing code to only one project's console:
if (defined? Rails) && (defined? YourProject)
  # code goes here
end
The following will work in Rails 6:
Just pass a block to Rails.application.console, e.g.:
# config/initializers/custom_console_message.rb
if Rails.env.production?
  Rails.application.console do
    puts "Custom message here"
  end
end
Now when starting the rails production console, the custom message will be printed. This code will not be executed when you start rails server.
Remove the if Rails.env.production? if you want this to run in all environments.
In my Rails app, there are some cases where code writes output with the puts command (for debugging purposes). Is there any way to follow this output in the Rails console (with the rails c command)? Or is there any other way to debug/view logs in the Rails console?
Thanks!
For debugging, use Rails.logger instead of puts. For example:
Rails.logger.info "Some debugging info"
This will be written to a log file in the rails_app_root/log directory. If you are running locally in the development environment, it goes to rails_app_root/log/development.log.
To watch log entries as they come in, you can use the tail command, like this:
tail -f log/development.log
Hope it helps.
Rails.logger is the best solution. One more thing I want to add: if you want a separate log file, you could do something like this:
def read(args)
  unless args.blank?
    cache_key = self.get_cache_key(args)
  end
  logger = Logger.new("#{Rails.root}/log/cache_read.log")
  logger.error("cache read scope == #{cache_key.to_s}")
end
That way the cache_read.log file contains only the log output of this method.
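A small variation on the idea above, as a sketch only: build the dedicated logger once and reuse it, instead of creating a new Logger on every call. CacheLog and its methods are hypothetical names, and a temporary directory stands in for "#{Rails.root}/log" so the example runs outside Rails.

```ruby
require 'logger'
require 'tmpdir'

# Hypothetical helper that owns the dedicated log file.
module CacheLog
  # Stand-in for "#{Rails.root}/log/cache_read.log".
  def self.path
    @path ||= File.join(Dir.mktmpdir, "cache_read.log")
  end

  # Memoized, so every caller shares one Logger instance.
  def self.logger
    @logger ||= Logger.new(path)
  end
end

CacheLog.logger.error("cache read scope == example_key")
CacheLog.logger.error("cache read scope == another_key")
```

Any method can then call CacheLog.logger.error(...) and the output stays in that one file.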
I have a Resque job which pulls a CSV list of data off a remote server and then runs through the 40k+ entries to add any new items to an existing database table. The job runs fine, but it severely slows down the response time of any subsequent request to the server. In the console where I launched 'bundle exec rails server', I see no print statements while the job is running. However, once I hit my Rails server (via a page refresh), I see multiple SELECT/INSERT statements roll by before the server responds. The SELECT/INSERT statements are clearly generated by my Resque job, but oddly they wait to print to the console until I hit the server through the browser.
It sure feels like I'm doing something wrong or not following the 'rails way'. Advice?
Here is the code in my Resque job which does the SELECT/INSERTs:
# data is an array of hashes formed from parsing csv input. Max size is 1000
ActiveRecord::Base.transaction do
  data.each do |h|
    MyModel.find_or_create_by_X_and_Y( h[:x], h[:y], h )
  end
end
Software Stack
Rails 3.2.0
postgresql 9.1
Resque 1.20.0
EDIT
I've finally taken the time to debug this a bit more. Even a very simple worker, like the one below, slows down the next server response. In the console where I launched the Rails server process, I can see that the delay occurs because stdout from the worker is printed only after I ping the server.
def perform()
  s = Time.now
  0.upto( 90000 ) do |i|
    Rails.logger.debug i * i
  end
  e = Time.now
  Rails.logger.info "Start: #{s} ---- End #{e}"
  Rails.logger.info "Total Time: #{e - s }"
end
I can get the Rails server back to its normal responsiveness if I suppress stdout when I launch it, but it doesn't seem like that should be necessary:
bundle exec rails server > /dev/null
Any input on a better way to solve this issue?
I think this answer to "Logging issues with Resque" will help.
The Rails server, in development mode, has the log file open. My understanding -- I need to confirm this -- is that it flushes the log before writing anything new to it, in order to preserve the order. If you have the Rails server attached to a terminal, it wants to output all of the changes first! This can lead to large delays if your workers have written large quantities to the log.
Note: this has been happening to me for some time, but I just put my finger on it recently.
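To illustrate the buffering idea in plain Ruby (this is not the Rails logger itself, only a sketch of the underlying IO behaviour): a file handle with sync enabled hands each write to the operating system immediately, so another reader sees it without waiting for a later flush. The file name is hypothetical.

```ruby
require 'tmpdir'

# Hypothetical log file in a temporary directory.
path = File.join(Dir.mktmpdir, "worker_example.log")

log = File.open(path, "a")
log.sync = true          # write through on every call instead of buffering
log.puts "job started"

# A second reader (here, the same process) sees the line right away,
# even though the handle has not been flushed or closed yet.
seen_before_close = File.read(path)
log.close
```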
I want to authenticate Users when they try to fire up a TCP connection in my Rails app. Here's the current code I have, it's very simplistic but should give you an idea of what I want to do.
TcpServer.rb
module TcpServer
  def receive_data(data)
    (@buf ||= '') << data
    if line = @buf.slice!(/(.+)\r?\n/)
      commands = data.split(";")
      case commands[0]
      when /start/i
        if !User.authenticate(commands[1], commands[2])
          close_connection
          puts "Subscription invalid."
        else
          puts "Subscription validated."
        end
      end
    end
  end
  EventMachine::run do
    host = "localhost"
    port = "5587"
    EventMachine::start_server host, port, TcpServer
    puts "TcpServer started @ #{host}:#{port}"
  end
end
What do I need to require or include in order to access my User model from that module? Or is this just a completely incorrect way to do it? If so, what do you suggest?
The issue is I wasn't running it with Rails.
I was running it with:
ruby lib/TcpServer.rb
rather than:
script/runner lib/TcpServer.rb
No includes or requires needed, Rails did it automagically.
Dir.glob(Rails.root.join('app/models/*.rb')).each { |file| require file }
The above will load all models if you need them. To load a single model, require just that file instead, e.g. require 'user' (you may need to give the full path, and the ".rb" extension can be left off).
You should make a decision as to whether you think this type of integration server should be part of the running Rails app or potentially another "application" that is part of the same code base. You could keep the core internals here and start your EM server with a custom rake task and load the Rails env through that rake task.
namespace :tcp do
  task :start => :environment do
    # server load and start here
  end
end
If I am going to open a new means of execution then I prefer to keep these running in separate processes to keep any errors from causing both to go down together. (I would look at Resque jobs/workers as a good example of how to keep code in the same Rails app without forcing them to run in the same process)
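The line-buffering step in receive_data from the question can be tried out without EventMachine or Rails. The sketch below is only an illustration: LineBuffer and feed are hypothetical names, the regular expression is simplified from the original, and it splits the completed line rather than the raw chunk, which is an assumption about the intent.

```ruby
# Hypothetical stand-in for the connection's buffering and parsing step.
class LineBuffer
  def initialize
    @buf = String.new
  end

  # Appends raw data and returns the parsed commands of every complete line.
  def feed(data)
    @buf << data
    commands = []
    # Remove one complete line at a time; a partial line stays in the buffer.
    while (line = @buf.slice!(/\A[^\n]*\n/))
      commands << line.chomp.split(";")
    end
    commands
  end
end

buffer = LineBuffer.new
first  = buffer.feed("start;ali")       # incomplete line, nothing parsed yet
second = buffer.feed("ce;secret\r\n")   # completes the line
```

Here first is empty because no newline has arrived, and second holds the single command split into its three fields.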
I'm trying to write to my log files while running a rake task. It works fine in development mode, but as soon as I switch to the production environment, nothing is written to the log files.
I read here
How do I use a custom log for my rake tasks in Ruby on Rails?
that this is the normal behavior and also found a #wontfix ticket in lighthouse.
My question: Is there a way to output what's going on while my rake task is running? It performs some crawling and runs for hours. I would prefer the output to go to a specific log file such as /log/crawler.log.
Right now I'm just using this command to write to the log files:
ActiveRecord::Base.logger.info "Log Text"
Thanks!
The problem you are having is that rails ignores 'info' level logs in production mode.
I'd recommend reading this: http://www.ruby-doc.org/stdlib/libdoc/logger/rdoc/classes/Logger.html
and creating your own logger:
logger = Logger.new('logfile.log')
logger.info "Something happened"
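As a sketch of how the level threshold behaves, using only the standard library Logger (the log level a Rails app uses in production depends on its configuration and is not shown here): messages below the logger's level are dropped, and lowering the level lets them through. A StringIO stands in for the log file.

```ruby
require 'logger'
require 'stringio'

out = StringIO.new            # stand-in for a log file
logger = Logger.new(out)

logger.level = Logger::WARN
logger.info "dropped: below the threshold"

logger.level = Logger::INFO
logger.info "Something happened"

log_text = out.string
puts log_text
```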
You can make a new logger with Logger.new("file.log") and then call its methods like this:
task :import_stuff => :environment do
  require 'csv'
  l = Logger.new("stuff.log")
  csv_file = "#{RAILS_ROOT}/stuff.csv"
  CSV.open(csv_file, 'r') do |row|
    l.info row[1]
  end
end
Maybe you need to write out the buffer where you need it:
logger.flush
or you can turn on auto flushing:
task :foo => :environment do
  Rails.logger.auto_flushing = 1
  Rails.logger.info "bar"
end