I'm trying to get all connections (interactions) on a Facebook page from a certain point in time onward. I'm using the koala gem and filtering the request with "since: 1.month.ago.to_i", which seems to work fine. However, this gives me 25 results at a time. If I raise the limit to 446 (the maximum, it seems), that works better. But if I use .next_page to fetch the next set of results within the given time range, it instead just gives me the next set of results without obeying the time range.
For example, let's say I don't increase the limit and I have 25 results per request. I do something like:
@api.get_connections(@fan_page_id, "feed", {since: 1.month.ago.to_i})
Let's assume there are 30 results for this, and the first request gets me 25 (the default limit). Then, if I do this:
@api.get_connections(@fan_page_id, "feed", {since: 1.month.ago.to_i}).next_page
instead of returning the last 5 results, it returns 25 more, 20 of which are not within "since: 1.month.ago.to_i". I have a while loop cycling through the pages, but I don't know where to stop, since it just keeps returning results no matter what as long as I keep calling .next_page.
Is there a better way of doing this?
If not, what's the best way to check that the post I'm looking at in the loop is still within the time range I want, and to break out if not?
Here's my code:
def perform(fan_page_id, pagination_options = {})
  @since_date = pagination_options[:since_date] if pagination_options[:since_date]
  @limit = pagination_options[:limit] if pagination_options[:limit]

  @oauth = Koala::Facebook::OAuth.new
  @api = Koala::Facebook::API.new @oauth.get_app_access_token

  fb_page = @api.get_object(fan_page_id)
  @fan_page_id = fb_page["id"]

  # Collect all the users who liked, commented, or liked *and* commented on a post
  process_posts(@api.get_connections(@fan_page_id, "feed", {since: @since_date})) do |post|
    ## do stuff based on each post
  end
end
private

# Take each post from the specified feed and perform the provided
# code on each post in that feed.
#
# @param [Koala::Facebook::API::GraphCollection] feed An API response containing a page's feed
def process_posts(feed, options = {})
  raise ArgumentError unless block_given?
  current_feed = feed
  begin
    current_feed.each { |post| yield(post) }
    current_feed = current_feed.next_page
  end while current_feed && current_feed.any? # next_page returns nil after the last page
end
current = @api.get_connections(@fan_page_id, "feed", {since: 1.month.ago.to_i})
next_feed = current.next_page   # `next` is a reserved word in Ruby, so use another name
next_feed = next_feed.next_page
.....
Please try these, I think they work.
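To stop paging once you fall outside the window, you can also check each post's created_time yourself and bail out. A minimal sketch (assuming Rails for 1.month.ago, and that each post hash carries the Graph API's ISO 8601 "created_time" field):
require 'time'

since_time = 1.month.ago
feed = @api.get_connections(@fan_page_id, "feed", { since: since_time.to_i })

catch(:out_of_range) do
  while feed && feed.any?
    feed.each do |post|
      # Feed posts come back newest-first, so the first post older than
      # the cutoff means every later page is out of range too.
      throw :out_of_range if Time.parse(post["created_time"]) < since_time
      ## do stuff based on each post
    end
    feed = feed.next_page
  end
end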
In a Rails 6.x app, I have a controller method which backgrounds queries that take longer than 2 minutes (to avoid a browser timeout), advises the user, stores the results, and sends a link that can retrieve them to generate a live page (with Highcharts charts). This works fine.
Now, I'm trying to implement the same logic with a method that backgrounds the creation of a report, via a Tempfile, and attaches the contents to an email, if the query runs too long. This code works just fine if the 2-minute timeout is NOT reached, but the Tempfile is empty at the commented line if the timeout IS reached.
I've tried wrapping the second part in another thread, and wrapping the internals of each thread with a mutex, but this is all getting a bit over my head. I haven't done a lot of multithreading, and every time I do, I feel like I stumble around until I get it. This time, I can't even seem to stumble into it.
I don't know if the problem is with my thread(s), or a race condition with the Tempfile object. I've had trouble using Tempfiles before, because they seem to disappear quicker than I can close them. Is this one getting cleaned up before it can be sent? The file handle actually still exists on the file system at the commented point, even though it's empty, so I'm not clear on what's happening.
def report
  queue = Queue.new
  file = Tempfile.new('report')

  thr = Thread.new do
    query = %Q(blah blah blah)
    @calibrations = ActiveRecord::Base.connection.exec_query query

    query = %Q(blah blah blah)
    @tunings = ActiveRecord::Base.connection.exec_query query

    if queue.empty?
      unless @tunings.empty?
        CSV.open(file.path, 'wb') do |csv|
          csv << ["headers...", @parameters].flatten
          @calibrations.each do |c|
            line = [c["h1"], c["h2"], c["h3"], c["h4"], c["h5"], c["h6"], c["h7"], c["h8"]]
            t = @tunings.find { |row| row["code"] == c["code"] }
            @parameters.each do |parameter|
              line << t[parameter.downcase]
            end
            csv << line
          end
        end
      end
      send_data file.read, :type => 'text/csv; charset=iso-8859-1; header=present', :disposition => "attachment; filename=\"report.csv\""
    else
      # When "timed out", `file` is empty here
      NotificationMailer.report_ready(current_user, file.read).deliver_later
    end
  end

  give_up_at = Time.now + 120.seconds
  while Time.now < give_up_at
    break unless thr.alive?
    sleep 1
  end

  if thr.alive?
    queue << "Timeout"
    render html: "Your report is taking longer than 2 minutes to generate. To avoid a browser timeout, it will finish in the background, and the report will be sent to you in email."
  end
end
The reason the file is empty is that you give the query 120 seconds to complete. If it hasn't finished by then, you push "Timeout" onto the queue. The query is still running inside the thread and has not yet reached the point where it checks whether the queue is empty. When the query does complete, the queue is no longer empty, so the thread skips the part that writes the CSV file and goes straight to the NotificationMailer.report_ready line. At that point the file is still empty because nothing was ever written into it.
In the end I think you need to rethink the overall logic of what you are trying to accomplish: there needs to be more communication between the threads and the top level.
Each thread needs to tell the top level whether it has already sent the result, and the top level needs to let the thread know that it's past time to send the result directly, and that it should email the result instead.
Here is some code that I think / hope will give some insight into how to approach this problem.
timeout_limit = 10
query_times = [5, 15, 1, 15]

timeout = []
sent_response = []
send_via_email = []

puts "time out is set to #{timeout_limit} seconds"

query_times.each_with_index do |query_time, query_id|
  puts "starting query #{query_id} that will take #{query_time} seconds"
  timeout[query_id] = false
  sent_response[query_id] = false
  send_via_email[query_id] = false

  Thread.new do
    ## do query
    sleep query_time
    if timeout[query_id]
      puts "query #{query_id} has completed, emailing result now"
      send_via_email[query_id] = true
    else
      puts "query #{query_id} has completed, displaying results now"
      sent_response[query_id] = true
    end
  end

  give_up_at = Time.now + timeout_limit
  while Time.now < give_up_at
    break if sent_response[query_id]
    sleep 1
  end

  unless sent_response[query_id]
    puts "query #{query_id} timed out, we will email the result of your query when it is completed"
    timeout[query_id] = true
  end
end

# simulate server environment
loop { }
=>
time out is set to 10 seconds
starting query 0 that will take 5 seconds
query 0 has completed, displaying results now
starting query 1 that will take 15 seconds
query 1 timed out, we will email the result of your query when it is completed
starting query 2 that will take 1 seconds
query 2 has completed, displaying results now
starting query 3 that will take 15 seconds
query 1 has completed, emailing result now
query 3 timed out, we will email the result of your query when it is completed
query 3 has completed, emailing result now
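Applied to the original report action, the handshake might look roughly like the sketch below. It is not production-ready: write_csv is a hypothetical helper standing in for the CSV.open block from the question, and the timed_out flag has the same check-then-act race discussed above, so in production you would guard it with a Mutex (or use a Queue as in your original code). The key changes are that the CSV is written unconditionally before deciding how to deliver it, and send_data runs in the controller thread rather than the worker.
def report
  file = Tempfile.new('report')
  timed_out = false

  thr = Thread.new do
    # Run the queries and write the CSV first, *then* decide delivery
    write_csv(file) # hypothetical helper wrapping the question's CSV.open block
    if timed_out
      NotificationMailer.report_ready(current_user, file.read).deliver_later
    end
  end

  if thr.join(120) # wait up to 2 minutes; join returns nil on timeout
    send_data file.read, :type => 'text/csv; charset=iso-8859-1; header=present',
                         :disposition => "attachment; filename=\"report.csv\""
  else
    timed_out = true
    render html: "Your report is taking longer than 2 minutes to generate. It will finish in the background, and the report will be sent to you in email."
  end
end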
I'm trying to get all Stripe::BalanceTransaction records except those that are already in my JsonStripeEvent table.
What I did =>
def perform(*args)
  last_recorded_txt = REDIS.get('last_recorded_stripe_txn_last')
  txns = Stripe::BalanceTransaction.all(limit: 100, expand: ['data.source', 'data.source.application_fee'], ending_before: last_recorded_txt)
  REDIS.set('last_recorded_stripe_txn_last', txns.data[0].id) unless txns.data.empty?
  txns.auto_paging_each do |txn|
    if txn.type.eql?('charge') || txn.type.eql?('payment')
      begin
        JsonStripeEvent.create(data: txn.to_json)
      rescue StandardError => e
        Rails.logger.error "Error while saving data from stripe #{e}"
        REDIS.set('last_recorded_stripe_txn_last', txn.id)
        break
      end
    end
  end
end
But it doesn't get the new ones from the API.
Could anyone help me with this? :)
Thanks
I think it's because the way auto_paging_each works is almost opposite to what you expect :)
As you can see from its source, auto_paging_each calls Stripe::ListObject#next_page, which is implemented as follows:
def next_page(params={}, opts={})
  return self.class.empty_list(opts) if !has_more
  last_id = data.last.id

  params = filters.merge({
    :starting_after => last_id,
  }).merge(params)

  list(params, opts)
end
It simply takes the last (already fetched) item and adds its id as the starting_after filter.
So what happens:
You fetch the 100 "latest" (let's say) records, ordered by descending date (the default order for the BalanceTransaction API, according to the Stripe docs).
When you then call auto_paging_each on this dataset, it takes the last record, adds its id as the starting_after filter, and repeats the query.
The repeated query returns nothing, because there is nothing newer (starting later) than the set you initially fetched.
Since no newer items are available, the iteration stops after the first step.
What you could do here:
First of all, ensure that my hypothesis is correct :) - put breakpoint(s) inside Stripe::ListObject and check. Then either 1) rewrite your code to use starting_after traversal logic instead of ending_before - auto_paging_each should then work fine - or 2) rewrite your code to control the fetching order manually.
Personally, I'd vote for (2): slightly more verbose (probably), but straightforward and "visible" control flow is better than poorly documented magic.
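For illustration, (2) might look something like this - a sketch built on the code from the question (same REDIS key and filters assumed; the error handling from your version is omitted). It pages forward manually with ending_before, processing each page oldest-to-newest and advancing the cursor as it goes:
def perform(*args)
  cursor = REDIS.get('last_recorded_stripe_txn_last')

  loop do
    txns = Stripe::BalanceTransaction.all(
      limit: 100,
      expand: ['data.source', 'data.source.application_fee'],
      ending_before: cursor
    )
    break if txns.data.empty?

    # Pages come back newest-first, so walk each page in reverse to
    # process oldest-to-newest, advancing the cursor as we go.
    txns.data.reverse_each do |txn|
      if txn.type.eql?('charge') || txn.type.eql?('payment')
        JsonStripeEvent.create(data: txn.to_json)
      end
      cursor = txn.id
      REDIS.set('last_recorded_stripe_txn_last', cursor)
    end
  end
end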
I have an API that I am pulling data from, and I want to collect all the tags from this API...but I don't know the number of tags in advance, and the API throttles access via the max number of results returned in any 1 call (100). It has an unlimited number of pages though.
So a call may look like this: Tag.update_tags(100, 5), where 100 is the max number of objects returned in one call and 5 is the page to begin on (i.e. if you assume that the tags are stored sequentially, this says: return the tag records with IDs in the range 401 - 500).
The issue is, I don't want to manually have to enter 5 (i.e. I don't know what the upper limit is). There is no way for me to ping the total number of tags (if there were, I would simply divide it and put this call in a loop up to that number).
All I do know is that once it reaches a page that doesn't have any results, it will return an empty array [].
So, how do I loop through all the tags and stop when the result returned is an empty array (which would be the final result returned and therefore not evaluated)?
What does that loop look like?
Use an unconditional loop with a break statement when the result returns the empty array.
i = 1
loop do
  result = call_to_api(i)
  break if result.empty?   # don't process the final, empty page
  do_something_with(result)
  i += 1
end
Of course in a production scenario you want something a little more robust, including exception handlers, some progress log reporting, and some kind of concrete iteration limit to ensure that the loop does not become infinite.
Update
Here's an example using a class to wrap up the logic.
require 'net/http'
require 'uri'
require 'json'

class Api
  DEFAULT_OPTIONS = { :start_position => 1, :max_iterations => 1000 }

  def initialize(base_uri, config = {})
    @base_uri = base_uri
    @config = DEFAULT_OPTIONS.merge(config)
    @position = @config[:start_position]
    @results_count = 0
  end

  def each(&block)
    advance(&block) while can_advance?
    log("Processed #{@results_count} results")
  end

  def advance(&block)
    yield result
    @results_count += result.count
    @position += 1
    @current_result = nil
  end

  def result
    @current_result ||= begin
      response = Net::HTTP.get_response(current_uri)
      JSON.parse(response.body)
    rescue StandardError => e
      log("Request failed: #{e.message}")
      [] # treat a failed request as an empty page so iteration stops
    end
  end

  def can_advance?
    @position < (@config[:start_position] + @config[:max_iterations]) && result.any?
  end

  def current_uri
    URI.parse("#{@base_uri}?page=#{@position}")
  end

  def log(message)
    puts message
  end
end
api = Api.new('http://somesite.com/api/v1/resource')
api.each do |result|
  do_something_with(result)
end
There's also an angle here that allows for concurrency, by setting the start position and iteration count for each thread; the concurrent HTTP requests would definitely speed this up.
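That might look something like the following sketch (building on the Api class above; each worker takes a disjoint slice of pages, so this suits APIs where you know roughly how many pages exist, since a slice stops early at its first empty page):
threads = 4.times.map do |n|
  Thread.new do
    api = Api.new('http://somesite.com/api/v1/resource',
                  :start_position => n * 250 + 1,
                  :max_iterations => 250)
    api.each { |result| do_something_with(result) }
  end
end
threads.each(&:join)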
Hmmm. You can get 100 items at a time, and start at a particular page. How to implement the iteration depends on what you want to do. Let's suppose that you want to collect all the unique tags. Establish a map (for example, a HashMap), then retrieve one page at a time and process it. When you hit a page that's empty, you're done.
// Implements a map and methods to update it
MyHashMap uniqueTags = new MyHashMap();
// Stores a page of tags
Page page;

do {
    // get a page of tags
    page = readTags();
    // merge the page's tags into the map; stop once a page comes back empty
    if (page != null) {
        uniqueTags.getUniqueTags(page);
    }
} while (page != null);
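In Ruby, using the Tag.update_tags(max, page) call from the question, the same idea might look like this (a sketch; it assumes the call returns [] once a page is empty):
require 'set'

unique_tags = Set.new
page = 1

loop do
  tags = Tag.update_tags(100, page) # returns [] past the last page
  break if tags.empty?
  unique_tags.merge(tags)
  page += 1
end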
I am retrieving results from NCBI's online Blast tool with 'net/http' and 'uri'. To do this I have to search through an html page to check if one of the lines is "Status=WAITING" or "Status=READY". When the Blast tool has finished the status will change to ready and results will be posted on the html page.
I have a working version to check the status and then retrieve the information that I need, but it is inefficient and is broken into two methods when I believe that there could be some way to put them into one.
def waitForBlast(rid)
  get = Net::HTTP.post_form(URI.parse('http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?'), {:RID => rid, :CMD => 'Get'})
  get.body.each_line { |line| (waitForBlast(rid) if line.strip == "Status=WAITING") if line[/Status=/] }
end

def returnBlast(rid)
  blast_array = Array.new
  get = Net::HTTP.post_form(URI.parse('http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?'), {:RID => rid, :CMD => 'Get'})
  get.body.each_line { |line| blast_array.push(line[/<a href=#\d+>/][/\d+/]) if line[/<a href=#\d+>/] }
  return blast_array
end
The first method checks the status, and it is my main concern because it is recursive. I believe (and correct me if I'm wrong) that, designed as is, it takes too much computing power, when all I need is some way to recheck the results within the same method (adding a time delay would be a bonus). The second method is fine, but I would prefer it combined with the first somehow. Any help appreciated.
Take a look at this implementation. This is what he does:
require 'open-uri'

res = 'http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&FORMAT_OBJECT=SearchInfo&RID=' + @rid

while open(res).read[/Status=(\w+)/, 1] == 'WAITING'
  @logger.debug("Status=WAITING")
  sleep(3)
end
I think using a string scan might be a bit more efficient than iterating over every line in the page, but I haven't looked at its implementation so I may be wrong.
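Combining the polling and the retrieval into one method could then look something like this (a sketch, assuming open-uri and the same Blast.cgi endpoint; the regexes mirror the ones in the question):
require 'open-uri'

def fetch_blast_results(rid)
  url = "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&RID=#{rid}"
  # Poll with a delay until the job leaves the WAITING state
  sleep(3) while open(url).read[/Status=(\w+)/, 1] == 'WAITING'
  # Then pull the hit anchors out of the results page
  open(url).read.scan(/<a href=#(\d+)>/).flatten
end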
Hi, I have this piece of code:
class Place < ActiveRecord::Base
  def self.find_or_create_by_latlon(lat, lon)
    place_id = call_external_webapi
    result = Place.where(:place_id => place_id).limit(1)
    result = Place.create(:place_id => place_id, ... ) if result.empty? #!
    result
  end
end
Then I'd like to do this in another model or controller:
p = Post.new
p.place = Place.find_or_create_by_latlon(XXXXX, YYYYY) # race-condition
p.save
But Place.find_or_create_by_latlon takes too long to fetch the data when it has to create the record, and sometimes in production p.place is nil.
How can I force it to wait for the response before executing p.save?
Thanks for your advice.
You're right that this is a race condition and it can often be triggered by people who double click submit buttons on forms. What you might do is loop back if you encounter an error.
result = Place.find_by_place_id(...) ||
         Place.create(...) ||
         Place.find_by_place_id(...)
There are more elegant ways of doing this, but the basic method is here.
I had to deal with a similar problem. In our backend, a user is created from a token if the user doesn't exist. AFTER a user record is already created, a slow API call gets sent to update the user's information.
def self.find_or_create_by_facebook_id(facebook_id)
  User.find_by_facebook_id(facebook_id) || User.create(facebook_id: facebook_id)
rescue ActiveRecord::RecordNotUnique => e
  User.find_by_facebook_id(facebook_id)
end

def self.find_by_token(token)
  facebook_id = get_facebook_id_from_token(token)
  user = User.find_or_create_by_facebook_id(facebook_id)
  if user.unregistered?
    user.update_profile_from_facebook
    user.mark_as_registered
    user.save
  end
  return user
end
The first step of the strategy is to remove the slow API call (in my case update_profile_from_facebook) from the create method. Because the operation takes so long, including it in the call to create significantly increases the chance of duplicate insert operations.
The second step is to add a unique constraint to your database column to ensure duplicates aren't created.
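For that second step, the constraint can be added with a migration along these lines (a sketch; the table and column names are assumed from the example above):
class AddUniqueIndexToUsersFacebookId < ActiveRecord::Migration
  def change
    # The database-level guarantee that makes the rescue above safe
    add_index :users, :facebook_id, :unique => true
  end
end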
The final step is to create a function that will catch the RecordNotUnique exception in the rare case where duplicate insert operations are sent to the database.
This may not be the most elegant solution but it worked for us.
I hit this inside a Sidekiq job that retried, got the error repeatedly, and eventually cleared itself. The best explanation I've found is in a blog post here. The gist is that Postgres keeps an internally stored value for incrementing the primary key, and that value can get out of sync. This rings true for me because I'm setting the primary key rather than just using an incremented value, so that's likely how this cropped up. The solution from the comments in the link above is to call ActiveRecord::Base.connection.reset_pk_sequence!(table_name). This cleared up the issue for me.
begin
  result = Place.where(:place_id => place_id).limit(1)
  result = Place.create(:place_id => place_id, ... ) if result.empty? #!
rescue ActiveRecord::StatementInvalid => error
  @save_retry_count = (@save_retry_count || 1)
  ActiveRecord::Base.connection.reset_pk_sequence!(:place)
  retry if (@save_retry_count -= 1) >= 0
  raise error
end