I wrote a Ruby script to download an image from a URL:
require 'open-uri'

imageAddress = ARGV[0]
targetPath = ARGV[1]
fullFileNamePath = "#{targetPath}test.jpg"

begin
  File.open(fullFileNamePath, 'wb') do |fo|
    fo.write open(imageAddress).read
  end
rescue OpenURI::HTTPError => ex
  puts ex
  File.delete(fullFileNamePath)
end
Example Usage:
ruby download_image.rb "https://images.genius.com/b015b15e476c92d10a834d523575d3c9.1000x1000x1.jpg" "/Users/Me/Downloads/"
The problem is that sometimes I run into this error in the output:
520 Origin Error
Then, when I try the same URL in my browser, I get an error page with a 'Retry for a live version' button.
If I reload the page or click that 'Retry for a live version' button, the page loads.
Then, if I run the script again, it downloads the image just fine.
So how can I replicate this page-reload / 'Retry for a live version' behavior using Ruby, without switching to my browser? Running the script again on its own doesn't do the job.
It sounds like you are looking for a delay and retry: if the script fails with a '520 Origin Error', wait and try again.
This is a quickly built recursive function. The attempts parameter caps how many times it loops, so it breaks after a few tries. (It isn't tested, may contain errors, and is meant as an example.)
def getFile(imageAddress, fullFileNamePath, attempts = 3)
  begin
    File.open(fullFileNamePath, 'wb') do |fo|
      fo.write open(imageAddress).read
    end
  rescue OpenURI::HTTPError => ex
    puts ex
    File.delete(fullFileNamePath) if File.exist?(fullFileNamePath)
    if ex.message.start_with?('520') && attempts > 1
      sleep(30) # generally a good time to pause before retrying
      getFile(imageAddress, fullFileNamePath, attempts - 1)
    end
  end
end
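With real arguments filled in, calling it from the question's script might look like this:

getFile(imageAddress, fullFileNamePath)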
This is my code, adapted from How do I temporarily redirect stderr in Ruby? (which can't be used as-is, because native extensions write to the stream directly):
def silence_stdout(log = '/dev/null')
  orig = $stdout.dup
  $stdout.reopen(File.new(log, 'w'))
  begin
    yield
  ensure
    $stdout = orig
  end
end

silence_stdout('ttt.log') do
  # do something
end
But I have a problem: the log file only gets its content after Puma stops (Ctrl + C).
Should I perhaps close the file? I don't understand how to do it; all my attempts to close it end with "log writing failed. closed stream" or "no block given (yield)".
Any advice is appreciated.
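A rough, untested sketch of how the helper above could flush as it goes: enable sync on the redirected file, restore the original stream with reopen, and only then close the log file, so nothing is still writing to it when it closes.

def silence_stdout(log = '/dev/null')
  orig = $stdout.dup            # keep a copy of the real STDOUT
  log_file = File.new(log, 'w')
  log_file.sync = true          # write through instead of buffering until exit
  $stdout.reopen(log_file)
  begin
    yield
  ensure
    $stdout.reopen(orig)        # restore the original stream first
    log_file.close              # then flush and release the log file
  end
end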
I'm building a website crawler that (later on) uses these links to read out information.
The current rake task goes through all the possible pages one by one and checks whether the request goes through (valid response) or returns a 404/503 error (invalid page). If it's valid, the page's URL gets saved into my database.
Now, as you can see, the task requests 50,000 pages in total and thus takes some time.
I have read about Sidekiq and how it can perform these tasks asynchronously, making this a lot faster.
My question: as you can see, my task increments the counter after each loop. That won't work with Sidekiq, I guess, since it will just run the script several times independently of itself, am I right?
How would I get around the problem of each instance needing its own counter, then?
Hopefully my question makes sense. Thank you very much!
desc "Validate Pages"
task validate_url: :environment do
require 'rubygems'
require 'open-uri'
require 'nokogiri'
counter = 1
base_url = "http://example.net/file"
until counter > 50000 do
begin
url = "#{base_url}_#{counter}/"
open(url)
page = Page.new
page.url = url
page.save!
puts "Saved #{url} !"
counter += 1
rescue OpenURI::HTTPError => ex
logger ||= Logger.new("validations.log")
if ex.io.status[0] == "503"
logger.info "#{ex} # #{counter}"
end
puts "#{ex} # #{counter}"
counter += 1
rescue SocketError => ex
logger ||= Logger.new("validations.log")
logger.info "#{ex} # #{counter}"
puts "#{ex} # #{counter}"
counter += 1
end
end
end
A simple Redis INCR operation will create and/or increment a global counter for your jobs to use. You can use Sidekiq's redis connection to implement a counter trivially:
Sidekiq.redis do |conn|
  conn.incr("my-counter")
end
If you run this asynchronously, you will have many instances of the same job, so the simplest approach is to use something like Redis: it gives you a fast way to check and update the counter. Just make sure you take care of the counter itself: updates must be atomic (or locked) while a job is using it, so you don't end up with wrong results.
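A rough, untested sketch of how a worker could pull its page number from that shared Redis counter. The ValidatePageWorker class name and the "page-counter" key are placeholders, not part of the answers above, and Page is the model from the question's rake task.

require 'sidekiq'
require 'open-uri'

class ValidatePageWorker
  include Sidekiq::Worker

  BASE_URL = "http://example.net/file".freeze

  def perform
    # Atomically grab the next page number; INCR is safe across many workers.
    counter = Sidekiq.redis { |conn| conn.incr("page-counter") }
    return if counter > 50_000

    url = "#{BASE_URL}_#{counter}/"
    open(url)                     # raises OpenURI::HTTPError on 404/503
    Page.create!(url: url)
    puts "Saved #{url} !"
  rescue OpenURI::HTTPError, SocketError => ex
    Sidekiq.logger.info "#{ex} # #{counter}"
  end
end

# Enqueue as many jobs as there are pages, e.g. from the rake task:
# 50_000.times { ValidatePageWorker.perform_async }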
I need to call a command (in a Sinatra or Rails app) like this:
`command sub`
The command logs output while it is executing, and I want to see that log continuously as the process runs.
But I can only get the log as a string after the command has finished, with:
result = `command sub`
So, is there a way to implement this?
On Windows I have the best experience with IO.popen. Here is a sample:
require 'logger'

$log = Logger.new("#{__FILE__}.log", 'monthly')

# here comes the full command line; in this case it is a java program
command = %Q{java -jar getscreen.jar #{$userid} #{$password}}
$log.debug command

STDOUT.sync = true

begin
  # Note the somewhat strange 2>&1 syntax: it redirects file descriptor 2 into 1.
  # By convention, 0 is stdin, 1 is stdout, 2 is stderr.
  IO.popen(command + " 2>&1") do |pipe|
    pipe.sync = true
    while str = pipe.gets # for every line the external program returns
      # do something with the captured line
    end
  end
rescue => e
  $log.error "#{__LINE__}:#{e}"
  $log.error e.backtrace
end
There are six ways to do it, but the one you're using isn't the right fit because backticks wait for the process to return.
Pick one from here:
http://tech.natemurray.com/2007/03/ruby-shell-commands.html
I would use Open3.popen3 if I were you.
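For reference, a minimal, untested sketch of that approach using Open3. It uses popen2e rather than popen3 because it merges stdout and stderr, which keeps the streaming loop simple; "command sub" is just the placeholder from the question.

require 'open3'

Open3.popen2e("command sub") do |stdin, stdout_and_stderr, wait_thr|
  stdin.close
  stdout_and_stderr.each_line do |line|
    puts line                     # handle each line as soon as it arrives
  end
  puts "exit status: #{wait_thr.value.exitstatus}"
end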
I want to check if the URL that the user inputs is in fact a valid page.
I tried:
if Nokogiri::HTML(open("http://example.com"))
  # DO REQUIRED TASK
end
But that immediately throws an error upon attempting to open the page. I want to return the result of whether it is a document of any kind.
I either get the error:
no such file or directory
or:
getaddrinfo: Name or service not known
depending on how I try to make the check.
I'd start with something like:
require 'nokogiri'
require 'open-uri'

begin
  doc = Nokogiri.HTML(open(url))
rescue Exception => e
  puts "Couldn't read \"#{ url }\": #{ e }"
  exit
end

puts doc.errors.empty? ? "No problems found" : doc.errors
Nokogiri sets the document's errors array to the values of any errors that occurred during the parsing process.
This only addresses one part of the issue though. Malicious people like to break things, and this would be very easy to break. In general, be very careful about anything a user gives you, especially if your site is exposed to the wild internet.
Prior to telling OpenURI to load the file to give to Nokogiri, you should sniff the URL and do some sanity checks using an HTTP HEAD request to find out the size and MIME type of the content being retrieved. Once you know those, you can try loading the file.
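As a rough, untested sketch of that HEAD-request check (the looks_like_html? name, the size limit, and the accepted content type are arbitrary assumptions):

require 'net/http'
require 'uri'

def looks_like_html?(url, max_bytes = 5_000_000)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)    # HEAD: headers only, no body download
  end
  return false unless response.is_a?(Net::HTTPSuccess)
  return false unless response['Content-Type'].to_s.start_with?('text/html')

  length = response['Content-Length']
  length.nil? || length.to_i <= max_bytes
rescue SocketError, URI::InvalidURIError, Errno::ECONNREFUSED
  false
end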
Firstly, it's bad style to 'rescue Exception => e' in Ruby.
[Refer: http://daniel.fone.net.nz/blog/2013/05/28/why-you-should-never-rescue-exception-in-ruby/ ]
Secondly, for this case, "rescue OpenURI::HTTPError => e" would be more suitable.
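A minimal sketch of that suggestion, reusing the snippet above (SocketError is added here because of the getaddrinfo error mentioned in the question):

require 'nokogiri'
require 'open-uri'

begin
  doc = Nokogiri::HTML(open(url))
rescue OpenURI::HTTPError, SocketError => e
  puts "Couldn't read \"#{url}\": #{e}"
  exit
end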
I'm not familiar with handling exceptions, but something like:
begin
  page = Nokogiri::HTML(open("http://example.com"))
rescue
  puts "not a document of any kind"
end

do_something_with(page) if page
...should do the trick.
or (after reading your comment):
begin
  page = open("http://example.com")
rescue
  puts "not a document of any kind"
end

Nokogiri::HTML(page) if page
Using Rails 3.1.1 and Heroku.
I have 1,000 products in my app. They all have a very slow controller, which is effectively solved by fragment caching. Although the data doesn't change very often, the cache still needs to expire (which I do by sweeping) periodically, in my case once a week.
Now, after sweeping the cached views, I don't want my users to create the new fragments by accessing the products one after another (about 6-8 seconds for the first load, 2-3 seconds for the cached load). I assume I can do that with some sort of script that loads each product page one by one and thus makes the server create those fragments.
I can imagine this can be handled in three ways:
1. Run a script on my local machine that tries to access each URL with some sort of GET command. Downside: not very pretty, and it will affect visitor stats in a way I would not prefer.
2. Run the same kind of script on the server after the sweeper, so that it loads each product. How can I do that, in that case?
3. Use a smart Rails command to do this automatically. Is there such an elegant command?
I made this script and it works. The product.slug is there because I have friendly_id installed; it produces URLs such as www.mydomain.com/productabc-123/, which are then read by Nokogiri (the Nokogiri gem is needed for this solution).
PLEASE NOTE THAT I SWITCHED FROM FRAGMENT CACHING TO ACTION CACHING IN THIS SOLUTION (as opposed to the question, where I am using fragment caching). The important difference is the cache check, Rails.cache.exist?('views/www.mydomain.com/' + product.slug); for fragment caching it should be the fragment name there instead.
require 'nokogiri'
require 'open-uri'

Product.all.each do |product|
  url = 'http://www.mydomain.com/' + product.slug
  begin
    if Rails.cache.exist?('views/www.mydomain.com/' + product.slug)
      puts url + " is already in cache"
    else
      doc = Nokogiri::HTML(open(url))
      puts "Reads " + url

      # Verifies that the caching worked. Only for troubleshooting.
      if Rails.cache.exist?('views/www.mydomain.com/' + product.slug)
        puts "--->" + url + " is NOW in the cache"
      else
        puts "--->" + url + " is still not in the cache!"
      end
      sleep 1
    end
  rescue Timeout::Error
    puts 'Timeout rescue of ' + url
    puts 'Sleep for 5 sec'
    sleep 5
    retry
  rescue
    puts 'Normal rescue of ' + url
  end
end
Create a script that runs as a rake task, or better yet a worker, that loops through the pages and curls each one. There is no need to include a gem when you can just call curl:
`curl -A "CacheRefresher" #{ENV['HOSTNAME']}/api/v1/#{klass.name.underscore.pluralize}/#{id} >/dev/null 2>&1`