I am trying to post updates from Heroku to Twitter, with media, using CarrierWave.
http://rdoc.info/gems/twitter/Twitter/API/Tweets#update_with_media-instance_method
I can do it without media, but when I try media, I keep running into problems.
Twitter.update_with_media("message", File.new(picture.picture_url.to_s))
I get the error:
Errno::ENOENT (No such file or directory - https://amazonlinktopicture)
Any ideas? I tried File.open as well and it didn't work.
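File.new expects a path on the local filesystem, so the S3 URL is treated as a nonexistent local file name, hence the Errno::ENOENT. The image has to be downloaded first and handed to the gem as an IO object. A minimal sketch of the idea, assuming open-uri and the question's picture object (the answers below flesh this out):

require 'open-uri'

# Download the remote image to an IO (Tempfile or StringIO) first;
# pass the IO, not the URL string, to update_with_media.
media = URI.parse(picture.picture_url.to_s).open
Twitter.update_with_media("message", media)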
Just for the benefit of others:
> Source
require 'twitter'
require 'open-uri'
config = YAML.load_file('twitter.yml')
Twitter.configure do |c|
  c.consumer_key = config['consumer_key']
  c.consumer_secret = config['consumer_secret']
  c.oauth_token = config['oauth_token']
  c.oauth_token_secret = config['oauth_token_secret']
end
# Tempfile
begin
  uri = URI.parse('https://dev.twitter.com/sites/default/files/images_terms/tweet-display-guidelines-20110405.png')
  media = uri.open
  media.instance_eval("def original_filename; '#{File.basename(uri.path)}'; end")
  p Twitter.update_with_media(Time.now.to_s, media)
rescue => e
  p e
end
# StringIO
begin
  uri = URI.parse('http://a3.twimg.com/a/1315421129/images/logos/twitter_newbird_blue.png')
  media = uri.open
  media.instance_eval("def original_filename; '#{File.basename(uri.path)}'; end")
  p Twitter.update_with_media(Time.now.to_s, media)
rescue => e
  p e
end
require 'open-uri'
Twitter.update_with_media("message", open(picture.picture_url.to_s) {|f| f.read })
begin
  twitter_client = Twitter::REST::Client.new do |c|
    c.consumer_key = config['consumer_key']
    c.consumer_secret = config['consumer_secret']
    # twitter gem >= 5.0 (which provides Twitter::REST::Client) renamed
    # oauth_token/oauth_token_secret to access_token/access_token_secret
    c.access_token = config['oauth_token']
    c.access_token_secret = config['oauth_token_secret']
  end
  twitter_client.update_with_media(message, open(picture.picture_url))
rescue Exception => exc
  # message = exc.message
end
Related
I have used Typhoeus to stream a zip file into memory and am iterating through each file to extract the XML doc. To read the XML file I use Nokogiri, but I am getting an error: Errno::ENOENT: No such file or directory @ rb_sysopen - my_xml_doc.xml.
I looked up the error and saw that Ruby is most likely running the script in the wrong directory. I am a little confused: do I need to save the XML doc to memory first before I can read it?
Here is my code to clarify further:
Controller
def index
  url = "http://feed.omgili.com/5Rh5AMTrc4Pv/mainstream/posts/"
  html_response = Typhoeus.get(url)
  doc = Nokogiri::HTML(html_response.response_body)
  path_array = []
  doc.search("a").each do |value|
    path_array << value.content if value.content.include?(".zip")
  end
  path_array.each do |zip_link|
    download_file = File.open zip_link, "wb"
    request = Typhoeus::Request.new("#{url}#{zip_link}")
    binding.pry
    request.on_headers do |response|
      if response.code != 200
        raise "Request failed"
      end
    end
    request.on_body do |chunk|
      download_file.write(chunk)
    end
    request.run
    Zip::File.open(download_file) do |zipfile|
      zipfile.each do |file|
        binding.pry
        doc = Nokogiri::XML(File.read(file.name))
      end
    end
  end
end
file
=> #<Zip::Entry:0x007ff88998373
 @comment="",
 @comment_length=0,
 @compressed_size=49626,
 @compression_method=8,
 @crc=20393847,
 @dirty=false,
 @external_file_attributes=0,
 @extra={},
 @extra_length=0,
 @filepath=nil,
 @follow_symlinks=false,
 @fstype=0,
 @ftype=:file,
 @gp_flags=2056,
 @header_signature=009890,
 @internal_file_attributes=0,
 @last_mod_date=18769,
 @last_mod_time=32626,
 @local_header_offset=0,
 @local_header_size=nil,
 @name="my_xml_doc.xml",
 @name_length=36,
 @restore_ownership=false,
 @restore_permissions=false,
 @restore_times=true,
 @size=138793,
 @time=2016-10-17 15:59:36 -0400,
 @unix_gid=nil,
 @unix_perms=nil,
 @unix_uid=nil,
 @version=20,
 @version_needed_to_extract=20,
 @zipfile="some_zip_file.zip">
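The root cause: file.name ("my_xml_doc.xml") is only the entry's path inside the archive, so File.read(file.name) looks for that file in the process's working directory and raises Errno::ENOENT. The entry has to be read out of the zip itself, and no intermediate file is needed. A minimal sketch using rubyzip's entry stream API (names taken from the code above):

Zip::File.open(download_file.path) do |zipfile|
  zipfile.each do |entry|
    # Read the entry's bytes straight out of the archive, not the filesystem.
    doc = Nokogiri::XML(entry.get_input_stream.read)
  end
end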
This is the solution I came up with:
Gems:
gem 'typhoeus'
gem 'rubyzip'
gem 'redis', '~>3.2'
Controller:
def xml_to_redis_list(url)
  html_response = Typhoeus.get(url)
  doc = Nokogiri::HTML(html_response.response_body)
  @redis = Redis.new
  path_array = []
  doc.search("a").each do |value|
    path_array << value.content if value.content.include?(".zip")
  end
  path_array.each do |zip_link|
    download_file = File.open zip_link, "wb"
    request = Typhoeus::Request.new("#{url}#{zip_link}")
    request.on_headers do |response|
      if response.code != 200
        raise "Request failed"
      end
    end
    request.on_body do |chunk|
      download_file.write(chunk)
    end
    request.run
    while download_file.size == 0
      sleep 1
    end
    zip_download = Zip::File.open(download_file.path)
    Zip::File.open("#{Rails.root}/#{zip_download.name}") do |zip_file|
      zip_file.each do |file|
        xml_string = zip_file.read(file.name)
        check_if_xml_duplicate(xml_string)
        @redis.rpush("NEWS_XML", xml_string)
      end
    end
    File.delete("#{Rails.root}/#{zip_link}")
  end
end

def check_if_xml_duplicate(xml_string)
  @redis.lrem("NEWS_XML", -1, xml_string)
end
I have Ruby on Rails 4.
How can I check a proxy and get information about it (timeout, etc.) to see whether it works?
I parse a page with Nokogiri through a proxy.
page = Nokogiri::HTML(open("http://bagche.ru/home/radio_streem/", :proxy => "http://213.135.96.35:3129", :read_timeout=>10))
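open-uri alone already gives a crude health check: if the proxy is dead or too slow, the call raises (Errno::ECONNREFUSED, Net::ReadTimeout, or Timeout::Error, depending on the Ruby version), and timing the call gives a rough latency figure. A minimal sketch along those lines, reusing the question's proxy and URL:

require 'open-uri'

def proxy_time(proxy, url)
  started = Time.now
  open(url, :proxy => proxy, :read_timeout => 10) { |f| f.status }
  Time.now - started # rough latency in seconds
rescue StandardError, Timeout::Error => e
  puts "proxy failed: #{e.class}"
  nil
end

p proxy_time("http://213.135.96.35:3129", "http://bagche.ru/home/radio_streem/")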
gem install curb
require 'net/http'
require 'net/ping'
require 'curb'

def proxy_check
  @proxies = Proxy.all
  url = "ya.ru"
  @proxies.each do |p|
    proxy = Net::Ping::TCP.new(p.proxy_address, p.proxy_port.to_i)
    if proxy.ping?
      @resp = Curl::Easy.new(url) { |easy|
        easy.proxy_url = p.proxy_address
        easy.proxy_port = p.proxy_port.to_i
        # easy.timeout = 90
        # easy.connect_timeout = 30
        easy.follow_location = true
        easy.proxy_tunnel = true
      }
      begin
        @resp.perform
      rescue
        puts "CURL_GET -e- fail " + p.proxy_address
      end
      if @resp.response_code == 200
        p.proxy_status = 1
        p.proxy_timeout = @resp.total_time
      else
        p.proxy_status = 0
        puts "CURL_GET fail " + p.proxy_address
      end
    else
      p.proxy_status = 0
      puts "ping fail " + p.proxy_address
    end
    p.save
  end
end
It returns 200 if the proxy is available.
I am having a hell of a problem here.
I'm using ruby on rails:
ruby 1.8.7 (2011-12-10 patchlevel 356)
rails 2.3.14
I'm trying a simple open with open-uri on the following address:
http://jollymag.net/n/10390-летни-секс-пози-във-водата.html (link is NSFW)
However, the resulting file, when read, produces a weird (broken) string.
This was tested on ruby 1.9.3 and rails 3.2.x too.
require 'open-uri'
url = 'http://jollymag.net/n/10390-летни-секс-пози-във-водата.html'
url = URI.encode(url)
file = open(url)
doc = file.collect.to_s # <- the document is broken
document = Nokogiri::HTML.parse(doc,nil,"utf8")
puts document # <- the document after nokogiri has one line of content
I tried Iconv stuff and others but nothing works. The above code is more or less a minimal isolated case for the exact problem.
I appreciate any help since I'm trying to figure this bug for a couple of days now.
Regards,
Yavor
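One quick way to confirm what is coming back, before any parsing, is to inspect the metadata open-uri attaches to the returned object; a gzip-compressed body shows up in content_encoding. A small diagnostic sketch using the same URL:

require 'open-uri'

file = open(URI.encode('http://jollymag.net/n/10390-летни-секс-пози-във-водата.html'))
p file.content_type      # e.g. "text/html"
p file.content_encoding  # ["gzip"] would explain the "broken" string
p file.charset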
So the problem was a tricky one for me.
It appears that some servers return only a gzip-ed response.
So in order to read it, you of course have to decode it accordingly.
I decided to post my whole crawl code so people might find a more complete solution to such problems. This is part of a bigger class, so it refers to self a lot of the time.
Hope it helps!
SHINSO_HEADERS = {
  'Accept' => '*/*',
  'Accept-Charset' => 'utf-8, windows-1251;q=0.7, *;q=0.6',
  'Accept-Encoding' => 'gzip,deflate',
  'Accept-Language' => 'bg-BG, bg;q=0.8, en;q=0.7, *;q=0.6',
  'Connection' => 'keep-alive',
  'From' => 'support@xenium.bg',
  'Referer' => 'http://svejo.net/',
  'User-Agent' => 'Mozilla/5.0 (compatible; Shinso/1.0;'
}
def crawl(url_address)
  self.errors = Array.new
  begin
    begin
      url_address = URI.parse(url_address)
    rescue URI::InvalidURIError
      url_address = URI.decode(url_address)
      url_address = URI.encode(url_address)
      url_address = URI.parse(url_address)
    end
    url_address.normalize!
    stream = ""
    timeout(10) { stream = url_address.open(SHINSO_HEADERS) }
    if stream.size > 0
      url_crawled = URI.parse(stream.base_uri.to_s)
    else
      self.errors << "Server said status 200 OK but document file is zero bytes."
      return
    end
  rescue Exception => exception
    self.errors << exception
    return
  end
  # extract information before html parsing
  self.url_posted = url_address.to_s
  self.url_parsed = url_crawled.to_s
  self.url_host = url_crawled.host
  self.status = stream.status
  self.content_type = stream.content_type
  self.content_encoding = stream.content_encoding
  self.charset = stream.charset
  if stream.content_encoding.include?('gzip')
    document = Zlib::GzipReader.new(stream).read
  elsif stream.content_encoding.include?('deflate')
    document = Zlib::Inflate.inflate(stream.read)
  #elsif stream.content_encoding.include?('x-gzip') or
  #elsif stream.content_encoding.include?('compress')
  else
    document = stream.read
  end
  self.charset_guess = CharGuess.guess(document)
  if not self.charset_guess.blank? and
     not self.charset_guess == 'utf-8' and
     not self.charset_guess == 'utf8'
    document = Iconv.iconv("UTF-8", self.charset_guess, document).to_s
  end
  document = Nokogiri::HTML.parse(document, nil, "utf8")
  document.xpath('//script').remove
  document.xpath('//SCRIPT').remove
  for item in document.xpath('//*[translate(@src, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz")]')
    item.set_attribute('src', make_absolute_address(item['src']))
  end
  document = document.to_s.gsub(/<!--(.|\s)*?-->/, '')
  #document = document.to_s.gsub(/\<![ \r\n\t]*(--([^\-]|[\r\n]|-[^\-])*--[ \r\n\t]*)\>/,'')
  self.content = Nokogiri::HTML.parse(document, nil, "utf8")
end
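Condensed down to just the fix for the original question, the essential steps are: fetch with open-uri, check content_encoding, gunzip when needed, and only then hand the markup to Nokogiri. A minimal standalone sketch of that core:

require 'open-uri'
require 'zlib'
require 'nokogiri'

url = URI.encode('http://jollymag.net/n/10390-летни-секс-пози-във-водата.html')
stream = open(url)
body = if stream.content_encoding.include?('gzip')
  Zlib::GzipReader.new(stream).read # server sent a gzip-ed body
else
  stream.read
end
doc = Nokogiri::HTML.parse(body, nil, 'utf-8')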
First of all, thanks to all of you for helping programmers like me with your valuable input in solving day-to-day issues.
This is my first question on Stack Overflow, as I have been experiencing this problem for almost a week.
We are building a crawler which crawls specific websites and extracts content from them. We are using Mechanize to achieve this. As it was taking loads of time, we decided to run the crawling process as a background task using the resque gem with redis, but while sending the process to the background I get the error mentioned in the title.
My code in lib/parsers/home.rb:
require 'resque'
require File.dirname(__FILE__)+"/../index"
class Home < Index
  Resque.enqueue(Index, :page)

  def self.perform(page)
    super(page)
    search_form = page.form_with :name => "frmAgent"
    results_page = search_form.submit
    total_entries = results_page.parser.xpath('//*[@id="PagingTable"]/tr[2]/td[2]').text
    if total_entries =~ /(\d+)\s*$/
      total_entries = $1
    else
      total_entries = "unknown"
    end
    start_res_idx = 1
    while true
      puts "Found #{total_entries} entries"
      detail_links = results_page.parser.xpath('//*[@id="MainTable"]/tr/td/a')
      detail_links.each do |d_link|
        if d_link.attribute("class")
          next
        else
          data_page = @agent.get d_link.attribute("href")
          fields = get_fields_from_page data_page
          save_result_page page.uri.to_s, fields
          # break
        end
      end
      site_done
    end
  rescue Exception => e
    puts "error: #{e}"
  end
end
and the superclass in lib/index.rb is
require 'resque'
require 'mechanize'
require 'mechanize/form'

class Index
  @queue = :Index_queue

  def initialize(site)
    @site = site
    @agent = Mechanize.new
    @agent.user_agent = Mechanize::AGENT_ALIASES['Windows Mozilla']
    @agent.follow_meta_refresh = true
    @rows_parsed = 0
    @rows_total = 0
  rescue Exception => e
    log "Unable to login: #{e.message}"
  end

  def run
    log "Parsing..."
    url = "unknown"
    if @site.url
      url = @site.url
      log "Opening #{url} as a data page"
      @page = @agent.get(url)
      # perform method should be overridden in subclasses
      @data = self.perform(@page)
    else
      # some sites do not have a "datapage" URL;
      # for example, after login you're already on your very own datapage.
      # This is to be addressed in the 'perform' method of the subclass.
      @data = self.perform(nil)
    end
  rescue Exception => e
    puts "Failed to parse URL '#{url}', exception => " + e.message
    set_site_status("error " + e.message)
  end

  # overriding method
  def self.perform(page)
  end

  def save_result_page(url, result_params)
    result = Result.find_by_sql(["select * from results where site_id = ? AND ref_code = ?", @site.id, utf8(result_params[:ref_code])]).first
    if result.nil?
      result_params[:site_id] = @site.id
      result_params[:time_crawled] = DateTime.now().strftime "%Y-%m-%d %H:%M:%S"
      result_params[:link] = url
      result = Result.create result_params
    else
      result.result_fields.each do |f|
        f.delete
      end
      result.link = url
      result.time_crawled = DateTime.now().strftime "%Y-%m-%d %H:%M:%S"
      result.html = result_params[:html]
      fields = []
      result_params[:result_fields_attributes].each do |f|
        fields.push ResultField.new(f)
      end
      result.result_fields = fields
      result.save
    end
    @rows_parsed += 1
    msg = "Saved #{@rows_parsed}"
    msg += " of #{@rows_total}" if @rows_total.to_i > 0
    log msg
    return result
  end
end
What's wrong with this code?
Thanks
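For what it's worth, the usual Resque pattern differs from the code above in two ways: Resque.enqueue is called from application code with JSON-serializable arguments (enqueueing :page at class-body load time, or passing a Mechanize page object, won't survive the round trip through Redis), and the worker rebuilds heavyweight objects inside self.perform. A hedged sketch of that shape, where the queue name follows the question and the URL argument is an illustrative choice:

require 'resque'
require 'mechanize'

class Home
  @queue = :Index_queue

  # Resque serializes arguments to JSON, so pass plain data such as a URL
  # and re-create the Mechanize agent/page inside the worker.
  def self.perform(url)
    agent = Mechanize.new
    page = agent.get(url)
    # ... parse the page as in the question ...
  end
end

# Somewhere in the app, enqueue with plain arguments:
Resque.enqueue(Home, "http://example.com/agents")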
I'm trying to run an rspec test.
You can see most of that code here.
Maybe it's relevant: CoRegEmailWorker.perform contains this:
ProvisionalUser.where("unsubscribed = false AND disabled = false AND (email_sent_count < ? OR email_sent_count is NULL) AND (last_email_sent <= ? OR last_email_sent IS NULL) AND sign_up_date IS NULL",
                      ProvisionalUser::EMAIL_COUNT_LIMIT, email_sending_interval.hours.ago).
  each { |user|
    begin
      user.send_email
    rescue Exception => ex
      logger.error ex
    end
  }
and ProvisionalUser has this method:
def send_email
  self.email_sent_count = self.email_sent_count.nil? ? 1 : self.email_sent_count + 1
  self.last_email_sent = DateTime.now
  self.disabled = true if self.email_sent_count == EMAIL_COUNT_LIMIT
  self.save!
  ProvisionalUserNotifier.send_registration_invite(self.id).deliver
end
Finally, ProvisionalUserNotifier inherits from MailGunNotifier which inherits from ActionMailer.
The problem I'm having is that the deliveries array is not being populated. In my config/environments/test.rb I have this:
config.action_mailer.perform_deliveries = true
config.action_mailer.delivery_method = :test
I'm not certain what else is needed here.
I've even gone so far as to try this:
require "spec_helper"
require "action_mailer"
describe "unsubscribe functionality" do
pu1 = ProvisionalUser.new
pu1.email = 'contact_me#test.com'
pu1.partner = 'partner'
pu1.first_name = 'joe'
pu1.save!
before(:each) do
ActionMailer::Base.delivery_method = :test
ActionMailer::Base.perform_deliveries = true
ActionMailer::Base.deliveries = []
end
it "should send emails to subscribed users only" do
unsubscribed_user = FactoryGirl.build(:unsubscribed_user)
unsubscribed_user.save!
subscribed_user = FactoryGirl.create(:subscribed_user)
CoRegEmailWorker.perform
ActionMailer::Base.deliveries.length.should == 1
ActionMailer::Base.deliveries.first.email.should =~ subscribed_user.email
#sent.first.email.should_not =~ unsubscribed_user.email
#sent.first.email.should =~ subscribed_user.email
end
def sent
ActionMailer::Base.deliveries
end
end
Wow, that was really annoying. Because the exception was being eaten, I wasn't seeing that I was missing a necessary value for the subject of the email to work.
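One way to keep a worker's rescue from eating failures like this in the test suite is to log in production but re-raise everywhere else. A small sketch of that idea applied to the loop in CoRegEmailWorker (the environment check is an assumption, not the original code):

begin
  user.send_email
rescue Exception => ex
  logger.error ex
  raise unless Rails.env.production? # surface the real error in tests/dev
end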